samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2024-12-23 17:34:34 +03:00


start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00			`/*`
			`ctdb recovery daemon`

			`Copyright (C) Ronnie Sahlberg 2007`

ctdb is GPL not LGPL (This used to be ctdb commit 8624378010d1c2a1438e1e701339dfba7276f960) 2007-05-31 07:50:53 +04:00			`This program is free software; you can redistribute it and/or modify`
			`it under the terms of the GNU General Public License as published by`
update lib/replace from samba4 (This used to be ctdb commit f0555484105668c01c21f56322992e752e831109) 2007-07-10 09:29:31 +04:00			`the Free Software Foundation; either version 3 of the License, or`
ctdb is GPL not LGPL (This used to be ctdb commit 8624378010d1c2a1438e1e701339dfba7276f960) 2007-05-31 07:50:53 +04:00			`(at your option) any later version.`

			`This program is distributed in the hope that it will be useful,`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00			`but WITHOUT ANY WARRANTY; without even the implied warranty of`
ctdb is GPL not LGPL (This used to be ctdb commit 8624378010d1c2a1438e1e701339dfba7276f960) 2007-05-31 07:50:53 +04:00			`MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the`
			`GNU General Public License for more details.`

			`You should have received a copy of the GNU General Public License`
update lib/replace from samba4 (This used to be ctdb commit f0555484105668c01c21f56322992e752e831109) 2007-07-10 09:29:31 +04:00			`along with this program; if not, see <http://www.gnu.org/licenses/>.`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00			`*/`

			`#include "includes.h"`
			`#include "lib/events/events.h"`
			`#include "system/filesys.h"`
better timeout handling for calls, controls and traverses (This used to be ctdb commit 63346a6c59d4821b4c443939b5d88db8cd20f5fe) 2007-05-10 08:06:48 +04:00			`#include "system/time.h"`
let each node verify that they have a correct assignment of public ip addresses (i.e. htey hold those they should hold and they dont hold any of those they shouldnt hold) if an inconsistency is found, mark the local node as recovery mode active and wait for the recovery master to trigger a full blown recovery (This used to be ctdb commit 55a5bfc8244c5b9cdda3f11992f384f00566b5dc) 2007-09-14 04:16:36 +04:00			`#include "system/network.h"`
when we are shutting down, we should first shut down the recovery daemon (This used to be ctdb commit 39ade6b329adcd3234124d6a8daaa6181abf739b) 2007-10-22 06:34:08 +04:00			`#include "system/wait.h"`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00			`#include "popt.h"`
			`#include "cmdline.h"`
			`#include "../include/ctdb.h"`
			`#include "../include/ctdb_private.h"`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`#include "db_wrap.h"`
ensure the recovery daemon is not clagged up by vacuum calls (This used to be ctdb commit ff7e80e247bf5a86adda0ef850d901478449675b) 2008-01-08 13:28:42 +03:00			`#include "dlinklist.h"`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00
added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00
			`struct ban_state {`
			`struct ctdb_recoverd *rec;`
			`uint32_t banned_node;`
			`};`

implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`/*`
			`private state of recovery daemon`
			`*/`
			`struct ctdb_recoverd {`
			`struct ctdb_context *ctdb;`
add a new file <reclock>.pnn where each recovery daemon can lock that byte at offset==pnn to offer an alternative way to detect which nodes are active instead of relying on CONNECTED being accurate. (This used to be ctdb commit 21d3319eaf463e2a00637d440ee2d4d15f53bf09) 2008-02-29 04:37:42 +03:00			`int rec_file_fd;`
change recmaster from being a local variable in monitor_cluster() to be a member of the ctdb_recoverd structure (This used to be ctdb commit b7f955338f50c92374b4f559268fb3a1a516aefa) 2008-03-02 23:53:46 +03:00			`uint32_t recmaster;`
store the num_active variable (number of connected/active nodes) inside the rec structure and avoid passing this as an extra parameter to do_recovery() (This used to be ctdb commit 8bb229aa3b4bd41e48d4e4e2e148d8680c8ba436) 2008-02-29 04:55:20 +03:00			`uint32_t num_active;`
add a num_connected field to the rec structure that holds the number of connected nodes num_active only contains the number of active nodes and would thus not count banned nodes (This used to be ctdb commit 06d3ce470766ef0b60d68ccd84de5437146cc147) 2008-03-03 02:24:17 +03:00			`uint32_t num_connected;`
add a new tunable : reclockpingperiod once every such interval : * the recovery master on each node will uppdate the "connected" count in the reclock count file (ctdb getreclock) * if the node thinks it is a recovery master but it detects another node that is DISCONNECTED but which still holds a lock to the reclock count file this may mean that we have a split cluster. if that other node that is DISCONNECTED but still holds the lock on hte reclock pnn count file, is MORE connected than the local node, yield the recmaster role and let the other half of the lcuster take over this add a second, last chance mechanism to detect split clusters. IF the cluster is split but GPFS is not yet split, this mechanism makes the largest half of the cluster become the active half. (This used to be ctdb commit 07af425f444531942cce8abff112c1524228d287) 2008-03-03 01:19:30 +03:00			`struct ctdb_node_map *nodemap;`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`uint32_t last_culprit;`
			`uint32_t culprit_counter;`
			`struct timeval first_recover_time;`
added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00			`struct ban_state **banned_nodes;`
use a priority time for the election data, not just the vnn (This used to be ctdb commit a691f9c5cd77194005f0d98483da94b07a48d57d) 2007-06-07 12:37:27 +04:00			`struct timeval priority_time;`
prevent recursion in the calling of ctdb_takeover_run (This used to be ctdb commit 0fbdeb7c91b965d9bc5ecc7b24e31070378d8f1d) 2007-09-13 08:08:18 +04:00			`bool need_takeover_run;`
- merge from ronnie - add a flag to check that recovery completed correctly. If not, re-trigger it in monitoring (This used to be ctdb commit d5ed941d9bab4af30d8b5f9b77bdf43d9218d69b) 2007-09-14 03:49:12 +04:00			`bool need_recovery;`
- catch ESTALE in the recovery lock by trying a read() - priortise nodes that are unbanned and healthy in the election (This used to be ctdb commit 929feb475dfdf7283f0e99b50b179e1c91d3a39f) 2007-10-05 07:28:21 +04:00			`uint32_t node_flags;`
make election handling much more scalable (This used to be ctdb commit 05938d462b92bd9ecb8e35f53651bded47c48675) 2007-11-13 02:27:44 +03:00			`struct timed_event *send_election_te;`
			`struct timed_event *election_timeout;`
ensure the recovery daemon is not clagged up by vacuum calls (This used to be ctdb commit ff7e80e247bf5a86adda0ef850d901478449675b) 2008-01-08 13:28:42 +03:00			`struct vacuum_info *vacuum_info;`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`};`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00
make recovery daemon values tunable (This used to be ctdb commit ec29dbf2f5110428df8b97801443ba7addf61353) 2007-06-04 14:22:44 +04:00			`#define CONTROL_TIMEOUT() timeval_current_ofs(ctdb->tunable.recover_timeout, 0)`
added health monitoring logic to ctdb, so a node loses its public IP address if one of the sybsystem event scripts reports a problem (This used to be ctdb commit c7a089256d86cec21097453bce5acbccee87413f) 2007-06-06 04:25:46 +04:00			`#define MONITOR_TIMEOUT() timeval_current_ofs(ctdb->tunable.recover_interval, 0)`
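
Both macros above turn a tunable number of seconds into an absolute deadline by adding it to the current time. The following is a minimal sketch of that arithmetic only. `deadline_after()` is a hypothetical helper, not the Samba `timeval_current_ofs()` implementation, and it takes the current time as a parameter (an assumption made so the result is deterministic) instead of reading the clock.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/time.h>

/* Hypothetical helper: return `now` advanced by `secs` seconds and `usecs`
 * microseconds, keeping tv_usec normalised to the range [0, 1000000). */
static struct timeval deadline_after(struct timeval now, uint32_t secs, uint32_t usecs)
{
	struct timeval tv;
	uint64_t usec_total = (uint64_t)now.tv_usec + usecs;

	/* carry whole seconds out of the microsecond sum */
	tv.tv_sec = now.tv_sec + (time_t)secs + (time_t)(usec_total / 1000000);
	tv.tv_usec = (suseconds_t)(usec_total % 1000000);
	return tv;
}
```

With a tunable of, say, 20 seconds, a call such as `deadline_after(now, 20, 0)` yields the point in time after which a control would be treated as timed out.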
raise the control timeout in recovery (This used to be ctdb commit 43424ff66daf28c202c12982f20a9f662b6fb125) 2007-05-24 07:49:27 +04:00
convert much of the recovery logic to be async and parallel across all nodes (This used to be ctdb commit 8b72a02bf1045d8befb342a4111ca1316889262e) 2008-01-05 01:35:43 +03:00
added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00			`/*`
			`unban a node`
			`*/`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`static void ctdb_unban_node(struct ctdb_recoverd *rec, uint32_t pnn)`
added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00			`{`
			`struct ctdb_context *ctdb = rec->ctdb;`

merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_NOTICE,("Unbanning node %u\n", pnn));`
add log output for when ctdb_ban_node() and ctdb_unban_node() are called when these functions are called to ban or unban a node make sure we update the CTDB_NODE_BANNED flag in rec->node_flags since this field and flag are checked during the election process (This used to be ctdb commit 740c632ae96a2d34327d1b575780aaf079d93f4f) 2007-11-23 04:36:14 +03:00
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`if (!ctdb_validate_pnn(ctdb, pnn)) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,("Bad pnn %u in ctdb_unban_node\n", pnn));`
handle CTDB_CURRENT_NODE in ban commands (This used to be ctdb commit fefb53f1d22c5458a1e107f8352818aee87983de) 2007-06-07 10:48:31 +04:00			`return;`
			`}`

rework banning/unbanning nodes ctdb_recoverd.c Always handle banning/unbanning locally on the node that is being banned/unbanned instead of on the recovery master. This means that if a ban request comes in to the recovery master for a remote node, we pass the request on to the remote node instead of setting up the ban and ban timeouts locally. ctdb.c send ban/unban requests to the node being banned/unbanned instead of to the recmaster (This used to be ctdb commit 880dd9f5fd0b91e450da93e195cc5c62cb1dcd6e) 2007-12-03 07:45:53 +03:00			`/* If we are unbanning a different node then just pass the ban info on */`
			`if (pnn != ctdb->pnn) {`
			`TDB_DATA data;`
			`int ret;`

merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_NOTICE,("Unbanning remote node %u. Passing the ban request on to the remote node.\n", pnn));`
rework banning/unbanning nodes ctdb_recoverd.c Always handle banning/unbanning locally on the node that is being banned/unbanned instead of on the recovery master. This means that if a ban request comes in to the recovery master for a remote node, we pass the request on to the remote node instead of setting up the ban and ban timeouts locally. ctdb.c send ban/unban requests to the node being banned/unbanned instead of to the recmaster (This used to be ctdb commit 880dd9f5fd0b91e450da93e195cc5c62cb1dcd6e) 2007-12-03 07:45:53 +03:00
			`data.dptr = (uint8_t *)&pnn;`
			`data.dsize = sizeof(uint32_t);`

			`ret = ctdb_send_message(ctdb, pnn, CTDB_SRVID_UNBAN_NODE, data);`
			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,("Failed to unban node %u\n", pnn));`
rework banning/unbanning nodes ctdb_recoverd.c Always handle banning/unbanning locally on the node that is being banned/unbanned instead of on the recovery master. This means that if a ban request comes in to the recovery master for a remote node, we pass the request on to the remote node instead of setting up the ban and ban timeouts locally. ctdb.c send ban/unban requests to the node being banned/unbanned instead of to the recmaster (This used to be ctdb commit 880dd9f5fd0b91e450da93e195cc5c62cb1dcd6e) 2007-12-03 07:45:53 +03:00			`return;`
			`}`

			`return;`
add log output for when ctdb_ban_node() and ctdb_unban_node() are called when these functions are called to ban or unban a node make sure we update the CTDB_NODE_BANNED flag in rec->node_flags since this field and flag are checked during the election process (This used to be ctdb commit 740c632ae96a2d34327d1b575780aaf079d93f4f) 2007-11-23 04:36:14 +03:00			`}`

rework banning/unbanning nodes ctdb_recoverd.c Always handle banning/unbanning locally on the node that is being banned/unbanned instead of on the recovery master. This means that if a ban request comes in to the recovery master for a remote node, we pass the request on to the remote node instead of setting up the ban and ban timeouts locally. ctdb.c send ban/unban requests to the node being banned/unbanned instead of to the recmaster (This used to be ctdb commit 880dd9f5fd0b91e450da93e195cc5c62cb1dcd6e) 2007-12-03 07:45:53 +03:00			`/* make sure we remember we are no longer banned in case`
			`there is an election */`
			`rec->node_flags &= ~NODE_FLAGS_BANNED;`

merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_INFO,("Clearing ban flag on node %u\n", pnn));`
rework banning/unbanning nodes ctdb_recoverd.c Always handle banning/unbanning locally on the node that is being banned/unbanned instead of on the recovery master. This means that if a ban request comes in to the recovery master for a remote node, we pass the request on to the remote node instead of setting up the ban and ban timeouts locally. ctdb.c send ban/unban requests to the node being banned/unbanned instead of to the recmaster (This used to be ctdb commit 880dd9f5fd0b91e450da93e195cc5c62cb1dcd6e) 2007-12-03 07:45:53 +03:00			`ctdb_ctrl_modflags(ctdb, CONTROL_TIMEOUT(), pnn, 0, NODE_FLAGS_BANNED);`

change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`if (rec->banned_nodes[pnn] == NULL) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_INFO,("No ban recorded for this node. ctdb_unban_node() request ignored\n"));`
added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00			`return;`
			`}`

change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`talloc_free(rec->banned_nodes[pnn]);`
			`rec->banned_nodes[pnn] = NULL;`
added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00			`}`
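
The two `ctdb_ctrl_modflags()` call sites in this file pass `(0, NODE_FLAGS_BANNED)` to unban and `(NODE_FLAGS_BANNED, 0)` to ban, which suggests the last two arguments are a mask of flags to set followed by a mask of flags to clear. The sketch below models only that set/clear step. `mod_flags()` and `EXAMPLE_FLAG_BANNED` are hypothetical names, the flag value is assumed for illustration, and applying the set mask before the clear mask is an assumption rather than something this file confirms.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value, for illustration only; the real flag is defined in the
 * ctdb headers. */
#define EXAMPLE_FLAG_BANNED 0x00000008u

/* Hypothetical model of a "modify flags" request: first set the bits in
 * `set`, then clear the bits in `clear`, leaving all other bits alone. */
static uint32_t mod_flags(uint32_t flags, uint32_t set, uint32_t clear)
{
	flags |= set;
	flags &= ~clear;
	return flags;
}
```

Clearing a flag that is not set is a no-op in this model, which matches the way the unban path above clears the ban flag before checking whether a ban was actually recorded.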


			`/*`
			`called when a ban has timed out`
			`*/`
			`static void ctdb_ban_timeout(struct event_context *ev, struct timed_event *te, struct timeval t, void *p)`
			`{`
			`struct ban_state *state = talloc_get_type(p, struct ban_state);`
			`struct ctdb_recoverd *rec = state->rec;`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`uint32_t pnn = state->banned_node;`
added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_NOTICE,("Ban timeout. Node %u is now unbanned\n", pnn));`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`ctdb_unban_node(rec, pnn);`
added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00			`}`

			`/*`
			`ban a node for a period of time`
			`*/`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`static void ctdb_ban_node(struct ctdb_recoverd *rec, uint32_t pnn, uint32_t ban_time)`
added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00			`{`
			`struct ctdb_context *ctdb = rec->ctdb;`

merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_NOTICE,("Banning node %u for %u seconds\n", pnn, ban_time));`
add log output for when ctdb_ban_node() and ctdb_unban_node() are called when these functions are called to ban or unban a node make sure we update the CTDB_NODE_BANNED flag in rec->node_flags since this field and flag are checked during the election process (This used to be ctdb commit 740c632ae96a2d34327d1b575780aaf079d93f4f) 2007-11-23 04:36:14 +03:00
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`if (!ctdb_validate_pnn(ctdb, pnn)) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,("Bad pnn %u in ctdb_ban_node\n", pnn));`
handle CTDB_CURRENT_NODE in ban commands (This used to be ctdb commit fefb53f1d22c5458a1e107f8352818aee87983de) 2007-06-07 10:48:31 +04:00			`return;`
			`}`

add config option for disabling bans (This used to be ctdb commit 153b911f7f957d4c564b04f5aa878033a02da9e4) 2007-10-15 07:22:58 +04:00			`if (0 == ctdb->tunable.enable_bans) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_INFO,("Bans are disabled - ignoring ban of node %u\n", pnn));`
add config option for disabling bans (This used to be ctdb commit 153b911f7f957d4c564b04f5aa878033a02da9e4) 2007-10-15 07:22:58 +04:00			`return;`
			`}`

rework banning/unbanning nodes ctdb_recoverd.c Always handle banning/unbanning locally on the node that is being banned/unbanned instead of on the recovery master. This means that if a ban request comes in to the recovery master for a remote node, we pass the request on to the remote node instead of setting up the ban and ban timeouts locally. ctdb.c send ban/unban requests to the node being banned/unbanned instead of to the recmaster (This used to be ctdb commit 880dd9f5fd0b91e450da93e195cc5c62cb1dcd6e) 2007-12-03 07:45:53 +03:00			`/* If we are banning a different node then just pass the ban info on */`
			`if (pnn != ctdb->pnn) {`
			`struct ctdb_ban_info b;`
			`TDB_DATA data;`
			`int ret;`

merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_NOTICE,("Banning remote node %u for %u seconds. Passing the ban request on to the remote node.\n", pnn, ban_time));`
rework banning/unbanning nodes ctdb_recoverd.c Always handle banning/unbanning locally on the node that is being banned/unbanned instead of on the recovery master. This means that if a ban request comes in to the recovery master for a remote node, we pass the request on to the remote node instead of setting up the ban and ban timeouts locally. ctdb.c send ban/unban requests to the node being banned/unbanned instead of to the recmaster (This used to be ctdb commit 880dd9f5fd0b91e450da93e195cc5c62cb1dcd6e) 2007-12-03 07:45:53 +03:00
			`b.pnn = pnn;`
			`b.ban_time = ban_time;`

			`data.dptr = (uint8_t *)&b;`
			`data.dsize = sizeof(b);`

			`ret = ctdb_send_message(ctdb, pnn, CTDB_SRVID_BAN_NODE, data);`
			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,("Failed to ban node %u\n", pnn));`
rework banning/unbanning nodes ctdb_recoverd.c Always handle banning/unbanning locally on the node that is being banned/unbanned instead of on the recovery master. This means that if a ban request comes in to the recovery master for a remote node, we pass the request on to the remote node instead of setting up the ban and ban timeouts locally. ctdb.c send ban/unban requests to the node being banned/unbanned instead of to the recmaster (This used to be ctdb commit 880dd9f5fd0b91e450da93e195cc5c62cb1dcd6e) 2007-12-03 07:45:53 +03:00			`return;`
			`}`
add log output for when ctdb_ban_node() and ctdb_unban_node() are called when these functions are called to ban or unban a node make sure we update the CTDB_NODE_BANNED flag in rec->node_flags since this field and flag are checked during the election process (This used to be ctdb commit 740c632ae96a2d34327d1b575780aaf079d93f4f) 2007-11-23 04:36:14 +03:00
rework banning/unbanning nodes ctdb_recoverd.c Always handle banning/unbanning locally on the node that is being banned/unbanned instead of on the recovery master. This means that if a ban request comes in to the recovery master for a remote node, we pass the request on to the remote node instead of setting up the ban and ban timeouts locally. ctdb.c send ban/unban requests to the node being banned/unbanned instead of to the recmaster (This used to be ctdb commit 880dd9f5fd0b91e450da93e195cc5c62cb1dcd6e) 2007-12-03 07:45:53 +03:00			`return;`
use a priority time for the election data, not just the vnn (This used to be ctdb commit a691f9c5cd77194005f0d98483da94b07a48d57d) 2007-06-07 12:37:27 +04:00			`}`

merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_NOTICE,("self ban - lowering our election priority\n"));`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`ctdb_ctrl_modflags(ctdb, CONTROL_TIMEOUT(), pnn, NODE_FLAGS_BANNED, 0);`
added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00
rework banning/unbanning nodes ctdb_recoverd.c Always handle banning/unbanning locally on the node that is being banned/unbanned instead of on the recovery master. This means that if a ban request comes in to the recovery master for a remote node, we pass the request on to the remote node instead of setting up the ban and ban timeouts locally. ctdb.c send ban/unban requests to the node being banned/unbanned instead of to the recmaster (This used to be ctdb commit 880dd9f5fd0b91e450da93e195cc5c62cb1dcd6e) 2007-12-03 07:45:53 +03:00			`/* banning ourselves - lower our election priority */`
			`rec->priority_time = timeval_current();`

			`/* make sure we remember we are banned in case there is an`
			`election */`
			`rec->node_flags |= NODE_FLAGS_BANNED;`

check for recursive bans in ctdb_ban_node() and remove the previous ban if this is an attempt to ban an already banned node (This used to be ctdb commit 214f2d7b04d0a491d466fc85c8d016efde416f9e) 2007-11-23 04:38:37 +03:00			`if (rec->banned_nodes[pnn] != NULL) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_NOTICE,("Re-banning an already banned node. Remove previous ban and set a new ban.\n"));`
check for recursive bans in ctdb_ban_node() and remove the previous ban if this is an attempt to ban an already banned node (This used to be ctdb commit 214f2d7b04d0a491d466fc85c8d016efde416f9e) 2007-11-23 04:38:37 +03:00			`talloc_free(rec->banned_nodes[pnn]);`
			`rec->banned_nodes[pnn] = NULL;`
			`}`

for the banned status, we should allocate this structure as a child of the banned_nodes array and not the rec structure so that ban_state is destroyed when the banned_nodes array gets destroyed (and so that when this struct is destroyed, that any pending ctdb_ban_timeout events are also destroyed.) othervise we may end up with multiple ban_timeout timed events going in parallell since we destroy/recreate the banned_nodes structure during election but we never destroy/recreate the rec structure. (This used to be ctdb commit fbd663d56a2a4421a5c0e541962c87e2e9c7cd82) 2007-12-03 03:39:17 +03:00			`rec->banned_nodes[pnn] = talloc(rec->banned_nodes, struct ban_state);`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`CTDB_NO_MEMORY_FATAL(ctdb, rec->banned_nodes[pnn]);`
added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`rec->banned_nodes[pnn]->rec = rec;`
			`rec->banned_nodes[pnn]->banned_node = pnn;`
added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00
			`if (ban_time != 0) {`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`event_add_timed(ctdb->ev, rec->banned_nodes[pnn],`
added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00			`timeval_current_ofs(ban_time, 0),`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`ctdb_ban_timeout, rec->banned_nodes[pnn]);`
added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00			`}`
			`}`
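
The local-ban path above keeps one `ban_state` slot per node, replaces any existing ban when the same node is banned again, and arms an expiry timer only when `ban_time` is non-zero. The sketch below models that bookkeeping in plain C. All `example_*` names are hypothetical, and it uses `malloc()` and an explicit `example_expire()` poll in place of talloc and the event loop, so it does not reproduce the real behaviour where freeing the `ban_state` also cancels its pending timed event.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define EXAMPLE_NUM_NODES 4

struct example_ban {
	uint32_t banned_node;
	int has_expiry;        /* 0 when ban_time was 0: no timer armed */
	uint32_t expires_at;   /* only meaningful when has_expiry != 0 */
};

struct example_ban_table {
	struct example_ban *slots[EXAMPLE_NUM_NODES];
};

/* Remove the ban for pnn, if any. Unknown or unbanned nodes are ignored. */
static void example_unban(struct example_ban_table *t, uint32_t pnn)
{
	if (pnn >= EXAMPLE_NUM_NODES) {
		return;
	}
	free(t->slots[pnn]);
	t->slots[pnn] = NULL;
}

/* Ban pnn at time `now` for `ban_time` seconds (0 = until unbanned).
 * A re-ban drops the previous ban first. Returns 0 on success, -1 on error. */
static int example_ban(struct example_ban_table *t, uint32_t pnn,
		       uint32_t now, uint32_t ban_time)
{
	struct example_ban *b;

	if (pnn >= EXAMPLE_NUM_NODES) {
		return -1;
	}
	example_unban(t, pnn);

	b = malloc(sizeof(*b));
	if (b == NULL) {
		return -1;
	}
	b->banned_node = pnn;
	b->has_expiry = (ban_time != 0);
	b->expires_at = now + ban_time;
	t->slots[pnn] = b;
	return 0;
}

/* Stand-in for the timed event: lift every ban whose expiry has passed. */
static void example_expire(struct example_ban_table *t, uint32_t now)
{
	uint32_t i;

	for (i = 0; i < EXAMPLE_NUM_NODES; i++) {
		struct example_ban *b = t->slots[i];
		if (b != NULL && b->has_expiry && now >= b->expires_at) {
			example_unban(t, i);
		}
	}
}
```

Replacing the slot on a re-ban means only the newest expiry applies, which is the same effect the real code gets by freeing the old `ban_state` before allocating a new one.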

add async versions of the freeze node control and freeze all nodes in parallell (This used to be ctdb commit f34e89f54d9f4380e76eb1b5b2385a4d8500b505) 2007-08-27 04:31:22 +04:00			`enum monitor_result { MONITOR_OK, MONITOR_RECOVERY_NEEDED, MONITOR_ELECTION_NEEDED, MONITOR_FAILED};`


merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`/*`
			`run the "recovered" eventscript on all nodes`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`*/`
Update some debug statements. Dont say that recovery failed if the failed function was invoked from outside of recovery (This used to be ctdb commit 3038d0b74895b51af4f85f2f304508ed16d245f4) 2008-05-15 06:28:52 +04:00			`static int run_recovered_eventscript(struct ctdb_context *ctdb, struct ctdb_node_map *nodemap, const char *caller)`
add async versions of the freeze node control and freeze all nodes in parallell (This used to be ctdb commit f34e89f54d9f4380e76eb1b5b2385a4d8500b505) 2007-08-27 04:31:22 +04:00			`{`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`TALLOC_CTX *tmp_ctx;`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`uint32_t *nodes;`
add async versions of the freeze node control and freeze all nodes in parallell (This used to be ctdb commit f34e89f54d9f4380e76eb1b5b2385a4d8500b505) 2007-08-27 04:31:22 +04:00
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`tmp_ctx = talloc_new(ctdb);`
			`CTDB_NO_MEMORY(ctdb, tmp_ctx);`
add async versions of the freeze node control and freeze all nodes in parallell (This used to be ctdb commit f34e89f54d9f4380e76eb1b5b2385a4d8500b505) 2007-08-27 04:31:22 +04:00
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`nodes = list_of_active_nodes(ctdb, nodemap, tmp_ctx, true);`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`if (ctdb_client_async_control(ctdb, CTDB_CONTROL_END_RECOVERY,`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`nodes,`
			`CONTROL_TIMEOUT(), false, tdb_null,`
			`NULL, NULL,`
			`NULL) != 0) {`
Update some debug statements. Dont say that recovery failed if the failed function was invoked from outside of recovery (This used to be ctdb commit 3038d0b74895b51af4f85f2f304508ed16d245f4) 2008-05-15 06:28:52 +04:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to run the 'recovered' event when called from %s\n", caller));`

merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`talloc_free(tmp_ctx);`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`return -1;`
add async versions of the freeze node control and freeze all nodes in parallell (This used to be ctdb commit f34e89f54d9f4380e76eb1b5b2385a4d8500b505) 2007-08-27 04:31:22 +04:00			`}`

merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`talloc_free(tmp_ctx);`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`return 0;`
add async versions of the freeze node control and freeze all nodes in parallell (This used to be ctdb commit f34e89f54d9f4380e76eb1b5b2385a4d8500b505) 2007-08-27 04:31:22 +04:00			`}`

add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`/*`
			`remember the trouble maker`
			`*/`
			`static void ctdb_set_culprit(struct ctdb_recoverd *rec, uint32_t culprit)`
			`{`
			`struct ctdb_context *ctdb = rec->ctdb;`

			`if (rec->last_culprit != culprit ||`
			`timeval_elapsed(&rec->first_recover_time) > ctdb->tunable.recovery_grace_period) {`
			`DEBUG(DEBUG_NOTICE,("New recovery culprit %u\n", culprit));`
			`/* either a new node is the culprit, or we've decided to forgive them */`
			`rec->last_culprit = culprit;`
			`rec->first_recover_time = timeval_current();`
			`rec->culprit_counter = 0;`
			`}`
			`rec->culprit_counter++;`
			`}`
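
The culprit bookkeeping above can be tried in isolation. The following is only a sketch under stated assumptions: `struct culprit_state` and `note_culprit` are hypothetical stand-ins for the relevant fields of `struct ctdb_recoverd` and for `ctdb_set_culprit`, and plain `time_t` seconds replace the `timeval` helpers so the block compiles on its own.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the culprit fields of struct ctdb_recoverd. */
struct culprit_state {
	uint32_t last_culprit;
	uint32_t culprit_counter;
	time_t first_recover_time;
};

/*
  Record one failure for 'culprit' at time 'now'. The streak restarts when a
  different node is blamed, or when more than 'grace_period' seconds have
  passed since the current streak began.
*/
static void note_culprit(struct culprit_state *st, uint32_t culprit,
			 time_t now, double grace_period)
{
	if (st->last_culprit != culprit ||
	    difftime(now, st->first_recover_time) > grace_period) {
		/* either a new node is the culprit, or the old one is forgiven */
		st->last_culprit = culprit;
		st->first_recover_time = now;
		st->culprit_counter = 0;
	}
	st->culprit_counter++;
}
```

A caller would compare `culprit_counter` against some threshold to decide when a node has caused enough recoveries to be banned; that threshold is not shown in this part of the file.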


			`/* this callback is called for every node that failed to execute the`
			`start recovery event`
			`*/`
			`static void startrecovery_fail_callback(struct ctdb_context *ctdb, uint32_t node_pnn, int32_t res, TDB_DATA outdata, void *callback_data)`
			`{`
			`struct ctdb_recoverd *rec = talloc_get_type(callback_data, struct ctdb_recoverd);`

			`DEBUG(DEBUG_ERR, (__location__ " Node %u failed the startrecovery event. Setting it as recovery fail culprit\n", node_pnn));`

			`ctdb_set_culprit(rec, node_pnn);`
			`}`

merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`/*`
			`run the "startrecovery" eventscript on all nodes`
			`*/`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`static int run_startrecovery_eventscript(struct ctdb_recoverd *rec, struct ctdb_node_map *nodemap)`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`{`
			`TALLOC_CTX *tmp_ctx;`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`uint32_t *nodes;`
			`struct ctdb_context *ctdb = rec->ctdb;`
added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`tmp_ctx = talloc_new(ctdb);`
			`CTDB_NO_MEMORY(ctdb, tmp_ctx);`

add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`nodes = list_of_active_nodes(ctdb, nodemap, tmp_ctx, true);`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`if (ctdb_client_async_control(ctdb, CTDB_CONTROL_START_RECOVERY,`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`nodes,`
			`CONTROL_TIMEOUT(), false, tdb_null,`
			`NULL,`
			`startrecovery_fail_callback,`
			`rec) != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to run the 'startrecovery' event. Recovery failed.\n"));`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`talloc_free(tmp_ctx);`
			`return -1;`
			`}`

			`talloc_free(tmp_ctx);`
			`return 0;`
			`}`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`static void async_getcap_callback(struct ctdb_context *ctdb, uint32_t node_pnn, int32_t res, TDB_DATA outdata, void *callback_data)`
Expand the client async framework so that it can take a callback function. This allows us to use the async framework also for controls that return outdata. Add a "capabilities" field to the ctdb_node structure. This field is only initialized and kept valid inside the recovery daemon context and not inside the main ctdb daemon. change the GET_CAPABILITIES control to return the capabilities in outdata instead of in the res return variable. When performing a recovery inside the recovery daemon, read the capabilities from all connected nodes and update the ctdb->nodes list of nodes. when building the new vnnmap after the database rebuild in recovery, do not include any nodes which lack the LMASTER capability in the new vnnmap. Unless there are no available connected node that sports the LMASTER capability in which case we let the local node (recmaster) take on the lmaster role temporarily (i.e. become a member of the vnnmap list) (This used to be ctdb commit 0f1883c69c689b28b0c04148774840b2c4081df6) 2008-05-06 09:42:59 +04:00			`{`
			`if ( (outdata.dsize != sizeof(uint32_t)) || (outdata.dptr == NULL) ) {`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`DEBUG(DEBUG_ERR, (__location__ " Invalid length/pointer for getcap callback : %u %p\n", (unsigned)outdata.dsize, outdata.dptr));`
Expand the client async framework so that it can take a callback function. This allows us to use the async framework also for controls that return outdata. Add a "capabilities" field to the ctdb_node structure. This field is only initialized and kept valid inside the recovery daemon context and not inside the main ctdb daemon. change the GET_CAPABILITIES control to return the capabilities in outdata instead of in the res return variable. When performing a recovery inside the recovery daemon, read the capabilities from all connected nodes and update the ctdb->nodes list of nodes. when building the new vnnmap after the database rebuild in recovery, do not include any nodes which lack the LMASTER capability in the new vnnmap. Unless there are no available connected node that sports the LMASTER capability in which case we let the local node (recmaster) take on the lmaster role temporarily (i.e. become a member of the vnnmap list) (This used to be ctdb commit 0f1883c69c689b28b0c04148774840b2c4081df6) 2008-05-06 09:42:59 +04:00			`return;`
			`}`
			`ctdb->nodes[node_pnn]->capabilities = *((uint32_t *)outdata.dptr);`
			`}`
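
The length-and-pointer check in the callback above is a general pattern for reading a fixed-size value out of a reply buffer. A minimal sketch, assuming a hypothetical `struct blob` in place of `TDB_DATA` and a hypothetical helper name `blob_to_uint32`; it uses `memcpy` rather than a pointer cast so that it does not depend on the buffer being aligned, which is a substitution and not what the callback itself does.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for TDB_DATA: a pointer plus a byte count. */
struct blob {
	unsigned char *dptr;
	size_t dsize;
};

/*
  Copy a uint32_t out of 'data' into '*out'.
  Returns 0 on success, -1 if the buffer is missing or is not exactly
  sizeof(uint32_t) bytes long. '*out' is left untouched on failure.
*/
static int blob_to_uint32(struct blob data, uint32_t *out)
{
	if (data.dptr == NULL || data.dsize != sizeof(uint32_t)) {
		return -1;
	}
	memcpy(out, data.dptr, sizeof(uint32_t));
	return 0;
}
```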

			`/*`
			`update the node capabilities for all connected nodes`
			`*/`
			`static int update_capabilities(struct ctdb_context *ctdb, struct ctdb_node_map *nodemap)`
			`{`
			`uint32_t *nodes;`
			`TALLOC_CTX *tmp_ctx;`

			`tmp_ctx = talloc_new(ctdb);`
			`CTDB_NO_MEMORY(ctdb, tmp_ctx);`

			`nodes = list_of_active_nodes(ctdb, nodemap, tmp_ctx, true);`
			`if (ctdb_client_async_control(ctdb, CTDB_CONTROL_GET_CAPABILITIES,`
			`nodes, CONTROL_TIMEOUT(),`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`false, tdb_null,`
			`async_getcap_callback, NULL,`
			`NULL) != 0) {`
Expand the client async framework so that it can take a callback function. This allows us to use the async framework also for controls that return outdata. Add a "capabilities" field to the ctdb_node structure. This field is only initialized and kept valid inside the recovery daemon context and not inside the main ctdb daemon. change the GET_CAPABILITIES control to return the capabilities in outdata instead of in the res return variable. When performing a recovery inside the recovery daemon, read the capabilities from all connected nodes and update the ctdb->nodes list of nodes. when building the new vnnmap after the database rebuild in recovery, do not include any nodes which lack the LMASTER capability in the new vnnmap. Unless there are no available connected node that sports the LMASTER capability in which case we let the local node (recmaster) take on the lmaster role temporarily (i.e. become a member of the vnnmap list) (This used to be ctdb commit 0f1883c69c689b28b0c04148774840b2c4081df6) 2008-05-06 09:42:59 +04:00			`DEBUG(DEBUG_ERR, (__location__ " Failed to read node capabilities.\n"));`
			`talloc_free(tmp_ctx);`
			`return -1;`
			`}`

			`talloc_free(tmp_ctx);`
			`return 0;`
			`}`

implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`/*`
			`change recovery mode on all nodes`
			`*/`
break out the setting/clearing of recovery mode into a dedicated helper function (This used to be ctdb commit dba4e4f8aa4f2fde1e9f8d93bdf3a33f7de8ce18) 2007-05-06 03:53:12 +04:00			`static int set_recovery_mode(struct ctdb_context *ctdb, struct ctdb_node_map *nodemap, uint32_t rec_mode)`
			`{`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`TDB_DATA data;`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`uint32_t *nodes;`
			`TALLOC_CTX *tmp_ctx;`

			`tmp_ctx = talloc_new(ctdb);`
			`CTDB_NO_MEMORY(ctdb, tmp_ctx);`

add async versions of the freeze node control and freeze all nodes in parallell (This used to be ctdb commit f34e89f54d9f4380e76eb1b5b2385a4d8500b505) 2007-08-27 04:31:22 +04:00			`/* freeze all nodes */`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`nodes = list_of_active_nodes(ctdb, nodemap, tmp_ctx, true);`
add async versions of the freeze node control and freeze all nodes in parallell (This used to be ctdb commit f34e89f54d9f4380e76eb1b5b2385a4d8500b505) 2007-08-27 04:31:22 +04:00			`if (rec_mode == CTDB_RECOVERY_ACTIVE) {`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`if (ctdb_client_async_control(ctdb, CTDB_CONTROL_FREEZE,`
			`nodes, CONTROL_TIMEOUT(),`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`false, tdb_null,`
			`NULL, NULL,`
			`NULL) != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to freeze nodes. Recovery failed.\n"));`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`talloc_free(tmp_ctx);`
add async versions of the freeze node control and freeze all nodes in parallell (This used to be ctdb commit f34e89f54d9f4380e76eb1b5b2385a4d8500b505) 2007-08-27 04:31:22 +04:00			`return -1;`
			`}`
			`}`

make sure we start the freeze process quickly on all nodes when we are going to do recovery - this prevents serialisation of freeze, which can take a long time (This used to be ctdb commit 52675c19e420d83d9556a3e73d9a4b490078aa9c) 2007-06-11 17:03:23 +04:00
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`data.dsize = sizeof(uint32_t);`
			`data.dptr = (unsigned char *)&rec_mode;`
break out the setting/clearing of recovery mode into a dedicated helper function (This used to be ctdb commit dba4e4f8aa4f2fde1e9f8d93bdf3a33f7de8ce18) 2007-05-06 03:53:12 +04:00
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`if (ctdb_client_async_control(ctdb, CTDB_CONTROL_SET_RECMODE,`
			`nodes, CONTROL_TIMEOUT(),`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`false, data,`
			`NULL, NULL,`
			`NULL) != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to set recovery mode. Recovery failed.\n"));`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`talloc_free(tmp_ctx);`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`return -1;`
			`}`
separate out the freeze/thaw handling from recovery (This used to be ctdb commit 0b0640bd8b8334961f240e0cf276ac112cd6e616) 2007-05-12 09:15:27 +04:00
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`if (rec_mode == CTDB_RECOVERY_NORMAL) {`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`if (ctdb_client_async_control(ctdb, CTDB_CONTROL_THAW,`
			`nodes, CONTROL_TIMEOUT(),`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`false, tdb_null,`
			`NULL, NULL,`
			`NULL) != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to thaw nodes. Recovery failed.\n"));`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`talloc_free(tmp_ctx);`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`return -1;`
separate out the freeze/thaw handling from recovery (This used to be ctdb commit 0b0640bd8b8334961f240e0cf276ac112cd6e616) 2007-05-12 09:15:27 +04:00			`}`
break out the setting/clearing of recovery mode into a dedicated helper function (This used to be ctdb commit dba4e4f8aa4f2fde1e9f8d93bdf3a33f7de8ce18) 2007-05-06 03:53:12 +04:00			`}`

merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`talloc_free(tmp_ctx);`
break out the setting/clearing of recovery mode into a dedicated helper function (This used to be ctdb commit dba4e4f8aa4f2fde1e9f8d93bdf3a33f7de8ce18) 2007-05-06 03:53:12 +04:00			`return 0;`
			`}`

implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`/*`
			`change recovery master on all nodes`
			`*/`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`static int set_recovery_master(struct ctdb_context *ctdb, struct ctdb_node_map *nodemap, uint32_t pnn)`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`{`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`TDB_DATA data;`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`TALLOC_CTX *tmp_ctx;`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`uint32_t *nodes;`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00
			`tmp_ctx = talloc_new(ctdb);`
			`CTDB_NO_MEMORY(ctdb, tmp_ctx);`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`data.dsize = sizeof(uint32_t);`
			`data.dptr = (unsigned char *)&pnn;`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`nodes = list_of_active_nodes(ctdb, nodemap, tmp_ctx, true);`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`if (ctdb_client_async_control(ctdb, CTDB_CONTROL_SET_RECMASTER,`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`nodes,`
			`CONTROL_TIMEOUT(), false, data,`
			`NULL, NULL,`
			`NULL) != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to set recmaster. Recovery failed.\n"));`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`talloc_free(tmp_ctx);`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`return -1;`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`}`

merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`talloc_free(tmp_ctx);`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`return 0;`
			`}`

implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00
			`/*`
			`ensure all other nodes have attached to any databases that we have`
			`*/`
			`static int create_missing_remote_databases(struct ctdb_context *ctdb, struct ctdb_node_map *nodemap,`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`uint32_t pnn, struct ctdb_dbid_map *dbmap, TALLOC_CTX *mem_ctx)`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`{`
recovery daemon this program is a client to the local ctdb daemon every second it pulls all vnnmap and nodemaps from all nodes that are available and checks if a recovery is required a recovery is required if : * all nodes do NOT have an identical vnnmap and generation * all nodes do NOT have an identical nodemap * there are active nodes that are NOT in the nodemap * there are nodes in the nodemap that are NOT active During recovery, the recovery tool will also make sure that all nodes know about and have created all databases. (This used to be ctdb commit 2f2650467bac7e8954de7c17cb34f46b0bdbcd26) 2007-05-04 09:21:40 +04:00			`int i, j, db, ret;`
			`struct ctdb_dbid_map *remote_dbmap;`

update to rhe recovery daemon ctdb_ctrl_ calls are timedout due to nodes arriving or leaving the cluster it crashes the recovery daemon afterwards with a SEGV but no useful stack backtrace (This used to be ctdb commit cd3abc7349e86555ccd87cd47a1dcc2adad2f46c) 2007-05-06 00:58:01 +04:00			`/* verify that all other nodes have all our databases */`
			`for (j=0; j<nodemap->num; j++) {`
			`/* we don't need to check ourselves */`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`if (nodemap->nodes[j].pnn == pnn) {`
update to rhe recovery daemon ctdb_ctrl_ calls are timedout due to nodes arriving or leaving the cluster it crashes the recovery daemon afterwards with a SEGV but no useful stack backtrace (This used to be ctdb commit cd3abc7349e86555ccd87cd47a1dcc2adad2f46c) 2007-05-06 00:58:01 +04:00			`continue;`
			`}`
			`/* don't check nodes that are unavailable */`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`if (nodemap->nodes[j].flags & NODE_FLAGS_INACTIVE) {`
update to rhe recovery daemon ctdb_ctrl_ calls are timedout due to nodes arriving or leaving the cluster it crashes the recovery daemon afterwards with a SEGV but no useful stack backtrace (This used to be ctdb commit cd3abc7349e86555ccd87cd47a1dcc2adad2f46c) 2007-05-06 00:58:01 +04:00			`continue;`
			`}`

change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`ret = ctdb_ctrl_getdbmap(ctdb, CONTROL_TIMEOUT(), nodemap->nodes[j].pnn,`
formatting fixes (This used to be ctdb commit ed63a2057698aed3931762605b2ea2368681af2b) 2007-06-07 12:39:37 +04:00			`mem_ctx, &remote_dbmap);`
update to rhe recovery daemon ctdb_ctrl_ calls are timedout due to nodes arriving or leaving the cluster it crashes the recovery daemon afterwards with a SEGV but no useful stack backtrace (This used to be ctdb commit cd3abc7349e86555ccd87cd47a1dcc2adad2f46c) 2007-05-06 00:58:01 +04:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to get dbids from node %u\n", nodemap->nodes[j].pnn));`
update to rhe recovery daemon ctdb_ctrl_ calls are timedout due to nodes arriving or leaving the cluster it crashes the recovery daemon afterwards with a SEGV but no useful stack backtrace (This used to be ctdb commit cd3abc7349e86555ccd87cd47a1dcc2adad2f46c) 2007-05-06 00:58:01 +04:00			`return -1;`
			`}`

			`/* step through all local databases */`
			`for (db=0; db<dbmap->num;db++) {`
			`const char *name;`


			`for (i=0;i<remote_dbmap->num;i++) {`
added support for persistent databases in ctdbd (This used to be ctdb commit 3115090a0d882beca9d70761130b74bb0821f201) 2007-09-21 06:24:02 +04:00			`if (dbmap->dbs[db].dbid == remote_dbmap->dbs[i].dbid) {`
update to rhe recovery daemon ctdb_ctrl_ calls are timedout due to nodes arriving or leaving the cluster it crashes the recovery daemon afterwards with a SEGV but no useful stack backtrace (This used to be ctdb commit cd3abc7349e86555ccd87cd47a1dcc2adad2f46c) 2007-05-06 00:58:01 +04:00			`break;`
			`}`
			`}`
			`/* the remote node already has this database */`
			`if (i!=remote_dbmap->num) {`
			`continue;`
			`}`
			`/* ok so we need to create this database */`
added support for persistent databases in ctdbd (This used to be ctdb commit 3115090a0d882beca9d70761130b74bb0821f201) 2007-09-21 06:24:02 +04:00			`ret = ctdb_ctrl_getdbname(ctdb, CONTROL_TIMEOUT(), pnn, dbmap->dbs[db].dbid,`
			`mem_ctx, &name);`
update to rhe recovery daemon ctdb_ctrl_ calls are timedout due to nodes arriving or leaving the cluster it crashes the recovery daemon afterwards with a SEGV but no useful stack backtrace (This used to be ctdb commit cd3abc7349e86555ccd87cd47a1dcc2adad2f46c) 2007-05-06 00:58:01 +04:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to get dbname from node %u\n", pnn));`
update to rhe recovery daemon ctdb_ctrl_ calls are timedout due to nodes arriving or leaving the cluster it crashes the recovery daemon afterwards with a SEGV but no useful stack backtrace (This used to be ctdb commit cd3abc7349e86555ccd87cd47a1dcc2adad2f46c) 2007-05-06 00:58:01 +04:00			`return -1;`
			`}`
added support for persistent databases in ctdbd (This used to be ctdb commit 3115090a0d882beca9d70761130b74bb0821f201) 2007-09-21 06:24:02 +04:00			`ret = ctdb_ctrl_createdb(ctdb, CONTROL_TIMEOUT(), nodemap->nodes[j].pnn,`
			`mem_ctx, name, dbmap->dbs[db].persistent);`
update to rhe recovery daemon ctdb_ctrl_ calls are timedout due to nodes arriving or leaving the cluster it crashes the recovery daemon afterwards with a SEGV but no useful stack backtrace (This used to be ctdb commit cd3abc7349e86555ccd87cd47a1dcc2adad2f46c) 2007-05-06 00:58:01 +04:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to create remote db:%s\n", name));`
update to rhe recovery daemon ctdb_ctrl_ calls are timedout due to nodes arriving or leaving the cluster it crashes the recovery daemon afterwards with a SEGV but no useful stack backtrace (This used to be ctdb commit cd3abc7349e86555ccd87cd47a1dcc2adad2f46c) 2007-05-06 00:58:01 +04:00			`return -1;`
			`}`
			`}`
			`}`
recovery daemon this program is a client to the local ctdb daemon every second it pulls all vnnmap and nodemaps from all nodes that are available and checks if a recovery is required a recovery is required if : * all nodes do NOT have an identical vnnmap and generation * all nodes do NOT have an identical nodemap * there are active nodes that are NOT in the nodemap * there are nodes in the nodemap that are NOT active During recovery, the recovery tool will also make sure that all nodes know about and have created all databases. (This used to be ctdb commit 2f2650467bac7e8954de7c17cb34f46b0bdbcd26) 2007-05-04 09:21:40 +04:00
add a helper function to create all missing remote databases detected during recovery (This used to be ctdb commit 04758c6f7d8f61260be6d2472380cb7904984427) 2007-05-06 04:04:37 +04:00			`return 0;`
			`}`
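
The nested loops above compute a set difference: every database id that is present in the local map but absent from the remote map. Below is a minimal standalone sketch of that lookup, assuming a simplified `struct dbid_list` in place of `struct ctdb_dbid_map`; the struct and both function names are hypothetical and not part of ctdb.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct ctdb_dbid_map: a plain list of ids. */
struct dbid_list {
	size_t num;
	const uint32_t *ids;
};

/* Return 1 if dbid appears in list, 0 otherwise. */
static int dbid_list_contains(const struct dbid_list *list, uint32_t dbid)
{
	size_t i;

	for (i = 0; i < list->num; i++) {
		if (list->ids[i] == dbid) {
			return 1;
		}
	}
	return 0;
}

/*
 * Collect the ids that are in local but not in remote.
 * At most out_max ids are written to out; the return value is the
 * total number of missing ids, which may exceed out_max.
 */
static size_t find_missing_dbids(const struct dbid_list *local,
				 const struct dbid_list *remote,
				 uint32_t *out, size_t out_max)
{
	size_t db, n = 0;

	assert(local != NULL && remote != NULL);

	for (db = 0; db < local->num; db++) {
		if (dbid_list_contains(remote, local->ids[db])) {
			continue;
		}
		if (n < out_max) {
			out[n] = local->ids[db];
		}
		n++;
	}
	return n;
}
```

In the recovery daemon the missing ids are not collected; each one is resolved to a name and created on the remote node as soon as it is found.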


implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`/*`
			`ensure we are attached to any databases that anyone else is attached to`
			`*/`
			`static int create_missing_local_databases(struct ctdb_context *ctdb, struct ctdb_node_map *nodemap,`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`uint32_t pnn, struct ctdb_dbid_map **dbmap, TALLOC_CTX *mem_ctx)`
create a helper function to make sure the local node that does recovery has all the databases that exist on any other remote node (This used to be ctdb commit 0f436e3d40fea6e6a146019b0c664e80e81e88b4) 2007-05-06 04:12:42 +04:00			`{`
			`int i, j, db, ret;`
			`struct ctdb_dbid_map *remote_dbmap;`

			`/* verify that we have every database that any other node has */`
			`for (j=0; j<nodemap->num; j++) {`
			`/* we don't need to check ourselves */`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`if (nodemap->nodes[j].pnn == pnn) {`
create a helper function to make sure the local node that does recovery has all the databases that exist on any other remote node (This used to be ctdb commit 0f436e3d40fea6e6a146019b0c664e80e81e88b4) 2007-05-06 04:12:42 +04:00			`continue;`
			`}`
			`/* don't check nodes that are unavailable */`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`if (nodemap->nodes[j].flags & NODE_FLAGS_INACTIVE) {`
create a helper function to make sure the local node that does recovery has all the databases that exist on any other remote node (This used to be ctdb commit 0f436e3d40fea6e6a146019b0c664e80e81e88b4) 2007-05-06 04:12:42 +04:00			`continue;`
			`}`

change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`ret = ctdb_ctrl_getdbmap(ctdb, CONTROL_TIMEOUT(), nodemap->nodes[j].pnn,`
formatting fixes (This used to be ctdb commit ed63a2057698aed3931762605b2ea2368681af2b) 2007-06-07 12:39:37 +04:00			`mem_ctx, &remote_dbmap);`
create a helper function to make sure the local node that does recovery has all the databases that exist on any other remote node (This used to be ctdb commit 0f436e3d40fea6e6a146019b0c664e80e81e88b4) 2007-05-06 04:12:42 +04:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to get dbids from node %u\n", nodemap->nodes[j].pnn));`
create a helper function to make sure the local node that does recovery has all the databases that exist on any other remote node (This used to be ctdb commit 0f436e3d40fea6e6a146019b0c664e80e81e88b4) 2007-05-06 04:12:42 +04:00			`return -1;`
			`}`

			`/* step through all databases on the remote node */`
			`for (db=0; db<remote_dbmap->num;db++) {`
			`const char *name;`

			`for (i=0;i<(*dbmap)->num;i++) {`
added support for persistent databases in ctdbd (This used to be ctdb commit 3115090a0d882beca9d70761130b74bb0821f201) 2007-09-21 06:24:02 +04:00			`if (remote_dbmap->dbs[db].dbid == (*dbmap)->dbs[i].dbid) {`
create a helper function to make sure the local node that does recovery has all the databases that exist on any other remote node (This used to be ctdb commit 0f436e3d40fea6e6a146019b0c664e80e81e88b4) 2007-05-06 04:12:42 +04:00			`break;`
			`}`
			`}`
			`/* we already have this db locally */`
			`if (i!=(*dbmap)->num) {`
			`continue;`
			`}`
			`/* ok so we need to create this database and`
			`rebuild dbmap`
			`*/`
change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`ret = ctdb_ctrl_getdbname(ctdb, CONTROL_TIMEOUT(), nodemap->nodes[j].pnn,`
added support for persistent databases in ctdbd (This used to be ctdb commit 3115090a0d882beca9d70761130b74bb0821f201) 2007-09-21 06:24:02 +04:00			`remote_dbmap->dbs[db].dbid, mem_ctx, &name);`
create a helper function to make sure the local node that does recovery has all the databases that exist on any other remote node (This used to be ctdb commit 0f436e3d40fea6e6a146019b0c664e80e81e88b4) 2007-05-06 04:12:42 +04:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to get dbname from node %u\n",`
change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`nodemap->nodes[j].pnn));`
create a helper function to make sure the local node that does recovery has all the databases that exist on any other remote node (This used to be ctdb commit 0f436e3d40fea6e6a146019b0c664e80e81e88b4) 2007-05-06 04:12:42 +04:00			`return -1;`
			`}`
added support for persistent databases in ctdbd (This used to be ctdb commit 3115090a0d882beca9d70761130b74bb0821f201) 2007-09-21 06:24:02 +04:00			`ret = ctdb_ctrl_createdb(ctdb, CONTROL_TIMEOUT(), pnn, mem_ctx, name,`
			`remote_dbmap->dbs[db].persistent);`
create a helper function to make sure the local node that does recovery has all the databases that exist on any other remote node (This used to be ctdb commit 0f436e3d40fea6e6a146019b0c664e80e81e88b4) 2007-05-06 04:12:42 +04:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to create local db:%s\n", name));`
create a helper function to make sure the local node that does recovery has all the databases that exist on any other remote node (This used to be ctdb commit 0f436e3d40fea6e6a146019b0c664e80e81e88b4) 2007-05-06 04:12:42 +04:00			`return -1;`
			`}`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`ret = ctdb_ctrl_getdbmap(ctdb, CONTROL_TIMEOUT(), pnn, mem_ctx, dbmap);`
create a helper function to make sure the local node that does recovery has all the databases that exist on any other remote node (This used to be ctdb commit 0f436e3d40fea6e6a146019b0c664e80e81e88b4) 2007-05-06 04:12:42 +04:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to reread dbmap on node %u\n", pnn));`
create a helper function to make sure the local node that does recovery has all the databases that exist on any other remote node (This used to be ctdb commit 0f436e3d40fea6e6a146019b0c664e80e81e88b4) 2007-05-06 04:12:42 +04:00			`return -1;`
			`}`
			`}`
			`}`

			`return 0;`
			`}`

create a helper function for recovery that pulls and merges all remote databases onto the local node (This used to be ctdb commit 5cecc47449c369f91e83389a94b987ac32b1e3f4) 2007-05-06 04:16:48 +04:00
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`/*`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`pull the remote database contents from one node into the recdb`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`*/`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`static int pull_one_remote_database(struct ctdb_context *ctdb, uint32_t srcnode,`
			`struct tdb_wrap *recdb, uint32_t dbid)`
create a helper function for recovery that pulls and merges all remote databases onto the local node (This used to be ctdb commit 5cecc47449c369f91e83389a94b987ac32b1e3f4) 2007-05-06 04:16:48 +04:00			`{`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`int ret;`
			`TDB_DATA outdata;`
			`struct ctdb_control_pulldb_reply *reply;`
			`struct ctdb_rec_data *rec;`
			`int i;`
			`TALLOC_CTX *tmp_ctx = talloc_new(recdb);`
create a helper function for recovery that pulls and merges all remote databases onto the local node (This used to be ctdb commit 5cecc47449c369f91e83389a94b987ac32b1e3f4) 2007-05-06 04:16:48 +04:00
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`ret = ctdb_ctrl_pulldb(ctdb, srcnode, dbid, CTDB_LMASTER_ANY, tmp_ctx,`
			`CONTROL_TIMEOUT(), &outdata);`
			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,(__location__ " Unable to copy db from node %u\n", srcnode));`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`talloc_free(tmp_ctx);`
			`return -1;`
			`}`

			`reply = (struct ctdb_control_pulldb_reply *)outdata.dptr;`

			`if (outdata.dsize < offsetof(struct ctdb_control_pulldb_reply, data)) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,(__location__ " invalid data in pulldb reply\n"));`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`talloc_free(tmp_ctx);`
			`return -1;`
			`}`

			`rec = (struct ctdb_rec_data *)&reply->data[0];`

			`for (i=0;`
			`i<reply->count;`
			`rec = (struct ctdb_rec_data *)(rec->length + (uint8_t *)rec), i++) {`
			`TDB_DATA key, data;`
			`struct ctdb_ltdb_header *hdr;`
			`TDB_DATA existing;`

			`key.dptr = &rec->data[0];`
			`key.dsize = rec->keylen;`
			`data.dptr = &rec->data[key.dsize];`
			`data.dsize = rec->datalen;`

			`hdr = (struct ctdb_ltdb_header *)data.dptr;`

			`if (data.dsize < sizeof(struct ctdb_ltdb_header)) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_CRIT,(__location__ " bad ltdb record\n"));`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`talloc_free(tmp_ctx);`
			`return -1;`
			`}`

			`/* fetch the existing record, if any */`
			`existing = tdb_fetch(recdb->tdb, key);`

			`if (existing.dptr != NULL) {`
			`struct ctdb_ltdb_header header;`
			`if (existing.dsize < sizeof(struct ctdb_ltdb_header)) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_CRIT,(__location__ " Bad record size %u from node %u\n",`
catch internal traversal errors (This used to be ctdb commit 8caa85ad71be5d20a8d6f0cb3d52aff6905657a4) 2008-01-07 06:08:25 +03:00			`(unsigned)existing.dsize, srcnode));`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`free(existing.dptr);`
			`talloc_free(tmp_ctx);`
			`return -1;`
create a helper function for recovery that pulls and merges all remote databases onto the local node (This used to be ctdb commit 5cecc47449c369f91e83389a94b987ac32b1e3f4) 2007-05-06 04:16:48 +04:00			`}`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`header = *(struct ctdb_ltdb_header *)existing.dptr;`
			`free(existing.dptr);`
			`if (!(header.rsn < hdr->rsn ||`
			`(header.dmaster != ctdb->recovery_master && header.rsn == hdr->rsn))) {`
create a helper function for recovery that pulls and merges all remote databases onto the local node (This used to be ctdb commit 5cecc47449c369f91e83389a94b987ac32b1e3f4) 2007-05-06 04:16:48 +04:00			`continue;`
			`}`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`}`

			`if (tdb_store(recdb->tdb, key, data, TDB_REPLACE) != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_CRIT,(__location__ " Failed to store record\n"));`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`talloc_free(tmp_ctx);`
			`return -1;`
create a helper function for recovery that pulls and merges all remote databases onto the local node (This used to be ctdb commit 5cecc47449c369f91e83389a94b987ac32b1e3f4) 2007-05-06 04:16:48 +04:00			`}`
			`}`

new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`talloc_free(tmp_ctx);`

create a helper function for recovery that pulls and merges all remote databases onto the local node (This used to be ctdb commit 5cecc47449c369f91e83389a94b987ac32b1e3f4) 2007-05-06 04:16:48 +04:00			`return 0;`
			`}`
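
The loop in `pull_one_remote_database` steps through a buffer of variable-length records by adding each record's `length` to the current position. Below is a bounds-checked sketch of that traversal, assuming a simplified header layout. `struct rec_header`, `walk_records` and `count_bytes` are hypothetical names, and the real `struct ctdb_rec_data` has a different layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed header; key bytes then data bytes follow it directly. */
struct rec_header {
	uint32_t length;   /* size of header + key + data (plus any padding) */
	uint32_t keylen;
	uint32_t datalen;
};

typedef int (*rec_fn)(const uint8_t *key, uint32_t keylen,
		      const uint8_t *data, uint32_t datalen, void *priv);

/*
 * Visit count records packed back to back in buf.
 * Returns 0 on success, -1 if a record does not fit in the buffer
 * or the callback reports an error.
 */
static int walk_records(const uint8_t *buf, size_t size, uint32_t count,
			rec_fn fn, void *priv)
{
	size_t off = 0;
	uint32_t i;

	for (i = 0; i < count; i++) {
		struct rec_header h;

		if (size - off < sizeof(h)) {
			return -1;
		}
		/* copy the header out so unaligned offsets are safe */
		memcpy(&h, buf + off, sizeof(h));

		if (h.length < sizeof(h) || h.length > size - off) {
			return -1;
		}
		if ((uint64_t)h.keylen + h.datalen > h.length - sizeof(h)) {
			return -1;
		}
		if (fn(buf + off + sizeof(h), h.keylen,
		       buf + off + sizeof(h) + h.keylen, h.datalen, priv) != 0) {
			return -1;
		}
		off += h.length;
	}
	return 0;
}

/* Example callback: add up the data bytes seen so far. */
static int count_bytes(const uint8_t *key, uint32_t keylen,
		       const uint8_t *data, uint32_t datalen, void *priv)
{
	(void)key;
	(void)keylen;
	(void)data;
	*(size_t *)priv += datalen;
	return 0;
}
```

Unlike the loop above, this sketch checks every `length`, `keylen` and `datalen` against the remaining buffer before touching the payload, which is one way to guard against a truncated or corrupt reply.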

implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`/*`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`pull all the remote database contents into the recdb`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`*/`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`static int pull_remote_database(struct ctdb_context *ctdb, struct ctdb_node_map *nodemap,`
			`struct tdb_wrap *recdb, uint32_t dbid)`
break the code that repoints dmaster for all local and remote records into a separate helper function (This used to be ctdb commit d5ab30d0ac21e736eb34eaa19bccfee5f0ce7cfb) 2007-05-06 04:22:13 +04:00			`{`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`int j;`
break the code that repoints dmaster for all local and remote records into a separate helper function (This used to be ctdb commit d5ab30d0ac21e736eb34eaa19bccfee5f0ce7cfb) 2007-05-06 04:22:13 +04:00
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`/* pull all records from all other nodes across onto this node`
			`(this merges based on rsn)`
			`*/`
			`for (j=0; j<nodemap->num; j++) {`
			`/* don't merge from nodes that are unavailable */`
			`if (nodemap->nodes[j].flags & NODE_FLAGS_INACTIVE) {`
			`continue;`
			`}`
			`if (pull_one_remote_database(ctdb, nodemap->nodes[j].pnn, recdb, dbid) != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,(__location__ " Failed to pull remote database from node %u\n",`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`nodemap->nodes[j].pnn));`
more optimisations to recovery (This used to be ctdb commit 9a41ad0a842cd4f3792d6e84b5c809b7ff6f342e) 2008-01-02 14:44:46 +03:00			`return -1;`
break the code that repoints dmaster for all local and remote records into a separate helper function (This used to be ctdb commit d5ab30d0ac21e736eb34eaa19bccfee5f0ce7cfb) 2007-05-06 04:22:13 +04:00			`}`
			`}`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00
break the code that repoints dmaster for all local and remote records into a separate helper function (This used to be ctdb commit d5ab30d0ac21e736eb34eaa19bccfee5f0ce7cfb) 2007-05-06 04:22:13 +04:00			`return 0;`
			`}`
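
The "merges based on rsn" rule applied for each pulled record can be read off the condition in `pull_one_remote_database`: an incoming record replaces the stored one when its RSN is higher, or when the RSNs are equal and the stored copy's dmaster is not the recovery master. Below is a minimal sketch of that decision in isolation; `struct rec_hdr` and `incoming_wins` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the per-record header: only the merge fields. */
struct rec_hdr {
	uint64_t rsn;       /* record sequence number */
	uint32_t dmaster;   /* node currently holding the record */
};

/*
 * Return 1 if the incoming record should be stored, 0 if the existing
 * copy should be kept. existing may be NULL when no copy is stored yet.
 */
static int incoming_wins(const struct rec_hdr *existing,
			 const struct rec_hdr *incoming,
			 uint32_t recmaster)
{
	if (existing == NULL) {
		return 1;
	}
	if (existing->rsn < incoming->rsn) {
		return 1;
	}
	if (existing->rsn == incoming->rsn && existing->dmaster != recmaster) {
		return 1;
	}
	return 0;
}
```

Under this rule a copy whose dmaster is the recovery master is only displaced by a strictly newer record, while on a tie any other stored copy is overwritten by the incoming one.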

implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00
			`/*`
			`update flags on all active nodes`
			`*/`
			`static int update_flags_on_all_nodes(struct ctdb_context *ctdb, struct ctdb_node_map *nodemap)`
			`{`
			`int i;`
			`for (i=0;i<nodemap->num;i++) {`
			`struct ctdb_node_flag_change c;`
			`TDB_DATA data;`

change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`c.pnn = nodemap->nodes[i].pnn;`
change the structure used for node flag change messages so that we can see both the old flags as well as the new flags (so we can tell which flags changed) send the CTDB_SRVID_RECONFIGURE messages to connected nodes only, not to every node, connected or not, in the cluster. in the handler inside the recovery daemon which is invoked for node flag change messages, only do a takeover_run() and redistribute the ip addresses IF it was the disabled or the unhealthy flags that changed. Also send out the cluster reconfigured message in this case. If any of the other flags changed we dont need to do the takeover_run(0 here since that will be done during recovery. (This used to be ctdb commit 5549b2058e2c148a8ca9d419123acf3247bb8829) 2007-08-21 11:25:15 +04:00			`c.old_flags = nodemap->nodes[i].flags;`
			`c.new_flags = nodemap->nodes[i].flags;`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00
			`data.dptr = (uint8_t *)&c;`
			`data.dsize = sizeof(c);`

propogate flag changes to all connected nodes (This used to be ctdb commit 711d1f7e20f1e98caaf08a57df0b1825ff6e97a0) 2007-06-09 15:58:50 +04:00			`ctdb_send_message(ctdb, CTDB_BROADCAST_CONNECTED,`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`CTDB_SRVID_NODE_FLAGS_CHANGED, data);`

			`}`
			`return 0;`
			`}`

verify that the recmaster has the correct flags for us and if not tell the recmaster what the flags should be (This used to be ctdb commit 3387597926ad71e4140cc504b828486d99a3ec8e) 2008-06-26 05:08:09 +04:00			`static int update_our_flags_on_all_nodes(struct ctdb_context *ctdb, uint32_t pnn, struct ctdb_node_map *nodemap)`
			`{`
			`struct ctdb_node_flag_change c;`
			`TDB_DATA data;`

			`c.pnn = nodemap->nodes[pnn].pnn;`
			`c.old_flags = nodemap->nodes[pnn].flags;`
			`c.new_flags = nodemap->nodes[pnn].flags;`

			`data.dptr = (uint8_t *)&c;`
			`data.dsize = sizeof(c);`

			`ctdb_send_message(ctdb, CTDB_BROADCAST_CONNECTED,`
			`CTDB_SRVID_NODE_FLAGS_CHANGED, data);`

			`return 0;`
			`}`
create a helper function for recovery to push all local databases out onto the remote nodes (This used to be ctdb commit 1ba76d374652cfa29e56fb77c7190349e42d3bcc) 2007-05-06 04:38:44 +04:00
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`/*`
			`ensure all nodes have the same vnnmap we do`
			`*/`
added automatic vacuuming of empty records during recovery (This used to be ctdb commit f9181a784ac7009df5e9c996f4e0c3e99098b59a) 2007-05-23 11:21:14 +04:00			`static int update_vnnmap_on_all_nodes(struct ctdb_context *ctdb, struct ctdb_node_map *nodemap,`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`uint32_t pnn, struct ctdb_vnn_map *vnnmap, TALLOC_CTX *mem_ctx)`
break out the code to update all nodes to the new vnnmap into a helper function (This used to be ctdb commit 81d39177949b54715710907d14ddc888dc09b064) 2007-05-06 04:42:18 +04:00			`{`
			`int j, ret;`

			`/* push the new vnn map out to all the nodes */`
			`for (j=0; j<nodemap->num; j++) {`
			`/* don't push to nodes that are unavailable */`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`if (nodemap->nodes[j].flags & NODE_FLAGS_INACTIVE) {`
break out the code to update all nodes to the new vnnmap into a helper function (This used to be ctdb commit 81d39177949b54715710907d14ddc888dc09b064) 2007-05-06 04:42:18 +04:00			`continue;`
			`}`

change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`ret = ctdb_ctrl_setvnnmap(ctdb, CONTROL_TIMEOUT(), nodemap->nodes[j].pnn, mem_ctx, vnnmap);`
break out the code to update all nodes to the new vnnmap into a helper function (This used to be ctdb commit 81d39177949b54715710907d14ddc888dc09b064) 2007-05-06 04:42:18 +04:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to set vnnmap for node %u\n", nodemap->nodes[j].pnn));`
break out the code to update all nodes to the new vnnmap into a helper function (This used to be ctdb commit 81d39177949b54715710907d14ddc888dc09b064) 2007-05-06 04:42:18 +04:00			`return -1;`
			`}`
			`}`

			`return 0;`
			`}`

implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00
added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00			`/*`
			`handler for when the admin bans a node`
			`*/`
			`static void ban_handler(struct ctdb_context *ctdb, uint64_t srvid,`
			`TDB_DATA data, void *private_data)`
			`{`
			`struct ctdb_recoverd *rec = talloc_get_type(private_data, struct ctdb_recoverd);`
			`struct ctdb_ban_info *b = (struct ctdb_ban_info *)data.dptr;`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`TALLOC_CTX *mem_ctx = talloc_new(ctdb);`
added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00
			`if (data.dsize != sizeof(*b)) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,("Bad data in ban_handler\n"));`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`talloc_free(mem_ctx);`
added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00			`return;`
			`}`

rework banning/unbanning nodes ctdb_recoverd.c Always handle banning/unbanning locally on the node that is being banned/unbanned instead of on the recovery master. This means that if a ban request comes in to the recovery master for a remote node, we pass the request on to the remote node instead of setting up the ban and ban timeouts locally. ctdb.c send ban/unban requests to the node being banned/unbanned instead of to the recmaster (This used to be ctdb commit 880dd9f5fd0b91e450da93e195cc5c62cb1dcd6e) 2007-12-03 07:45:53 +03:00			`if (b->pnn != ctdb->pnn) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,("Got a ban request for pnn:%u but our pnn is %u. Ignoring ban request\n", b->pnn, ctdb->pnn));`
			`talloc_free(mem_ctx);`
added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00			`return;`
			`}`

merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_NOTICE,("Node %u has been banned for %u seconds\n",`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`b->pnn, b->ban_time));`
rework banning/unbanning nodes ctdb_recoverd.c Always handle banning/unbanning locally on the node that is being banned/unbanned instead of on the recovery master. This means that if a ban request comes in to the recovery master for a remote node, we pass the request on to the remote node instead of setting up the ban and ban timeouts locally. ctdb.c send ban/unban requests to the node being banned/unbanned instead of to the recmaster (This used to be ctdb commit 880dd9f5fd0b91e450da93e195cc5c62cb1dcd6e) 2007-12-03 07:45:53 +03:00
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`ctdb_ban_node(rec, b->pnn, b->ban_time);`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`talloc_free(mem_ctx);`
added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00			`}`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00
			`/*`
added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00			`handler for when the admin unbans a node`
			`*/`
			`static void unban_handler(struct ctdb_context *ctdb, uint64_t srvid,`
			`TDB_DATA data, void *private_data)`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`{`
added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00			`struct ctdb_recoverd *rec = talloc_get_type(private_data, struct ctdb_recoverd);`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`TALLOC_CTX *mem_ctx = talloc_new(ctdb);`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`uint32_t pnn;`
added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00
			`if (data.dsize != sizeof(uint32_t)) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,("Bad data in unban_handler\n"));`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`talloc_free(mem_ctx);`
added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00			`return;`
			`}`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`pnn = *(uint32_t *)data.dptr;`
added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00
rework banning/unbanning nodes ctdb_recoverd.c Always handle banning/unbanning locally on the node that is being banned/unbanned instead of on the recovery master. This means that if a ban request comes in to the recovery master for a remote node, we pass the request on to the remote node instead of setting up the ban and ban timeouts locally. ctdb.c send ban/unban requests to the node being banned/unbanned instead of to the recmaster (This used to be ctdb commit 880dd9f5fd0b91e450da93e195cc5c62cb1dcd6e) 2007-12-03 07:45:53 +03:00			`if (pnn != ctdb->pnn) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,("Got an unban request for pnn:%u but our pnn is %u. Ignoring unban request\n", pnn, ctdb->pnn));`
			`talloc_free(mem_ctx);`
added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00			`return;`
			`}`

merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_NOTICE,("Node %u has been unbanned.\n", pnn));`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`ctdb_unban_node(rec, pnn);`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`talloc_free(mem_ctx);`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`}`


ensure the recovery daemon is not clagged up by vacuum calls (This used to be ctdb commit ff7e80e247bf5a86adda0ef850d901478449675b) 2008-01-08 13:28:42 +03:00			`struct vacuum_info {`
			`struct vacuum_info *next, *prev;`
			`struct ctdb_recoverd *rec;`
			`uint32_t srcnode;`
			`struct ctdb_db_context *ctdb_db;`
			`struct ctdb_control_pulldb_reply *recs;`
			`struct ctdb_rec_data *r;`
			`};`

			`static void vacuum_fetch_next(struct vacuum_info *v);`

added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 09:23:27 +03:00			`/*`
ensure the recovery daemon is not clagged up by vacuum calls (This used to be ctdb commit ff7e80e247bf5a86adda0ef850d901478449675b) 2008-01-08 13:28:42 +03:00			`called when a vacuum fetch has completed - just free it and do the next one`
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 09:23:27 +03:00			`*/`
			`static void vacuum_fetch_callback(struct ctdb_client_call_state *state)`
			`{`
ensure the recovery daemon is not clagged up by vacuum calls (This used to be ctdb commit ff7e80e247bf5a86adda0ef850d901478449675b) 2008-01-08 13:28:42 +03:00			`struct vacuum_info *v = talloc_get_type(state->async.private, struct vacuum_info);`
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 09:23:27 +03:00			`talloc_free(state);`
ensure the recovery daemon is not clagged up by vacuum calls (This used to be ctdb commit ff7e80e247bf5a86adda0ef850d901478449675b) 2008-01-08 13:28:42 +03:00			`vacuum_fetch_next(v);`
			`}`


			`/*`
			`process the next element from the vacuum list`
			`*/`
			`static void vacuum_fetch_next(struct vacuum_info *v)`
			`{`
			`struct ctdb_call call;`
			`struct ctdb_rec_data *r;`

			`while (v->recs->count) {`
			`struct ctdb_client_call_state *state;`
			`TDB_DATA data;`
			`struct ctdb_ltdb_header *hdr;`

			`ZERO_STRUCT(call);`
			`call.call_id = CTDB_NULL_FUNC;`
			`call.flags = CTDB_IMMEDIATE_MIGRATION;`

			`r = v->r;`
			`v->r = (struct ctdb_rec_data *)(r->length + (uint8_t *)r);`
			`v->recs->count--;`

			`call.key.dptr = &r->data[0];`
			`call.key.dsize = r->keylen;`

			`/* ensure we don't block this daemon - just skip a record if we can't get`
			`the chainlock */`
			`if (tdb_chainlock_nonblock(v->ctdb_db->ltdb->tdb, call.key) != 0) {`
			`continue;`
			`}`

			`data = tdb_fetch(v->ctdb_db->ltdb->tdb, call.key);`
fixed a memory leak in the recovery daemon (This used to be ctdb commit 73c27cf4c62cbe44b2b8fd00f907974d0808500c) 2008-01-15 12:11:44 +03:00			`if (data.dptr == NULL) {`
			`tdb_chainunlock(v->ctdb_db->ltdb->tdb, call.key);`
			`continue;`
			`}`

			`if (data.dsize < sizeof(struct ctdb_ltdb_header)) {`
			`free(data.dptr);`
ensure the recovery daemon is not clagged up by vacuum calls (This used to be ctdb commit ff7e80e247bf5a86adda0ef850d901478449675b) 2008-01-08 13:28:42 +03:00			`tdb_chainunlock(v->ctdb_db->ltdb->tdb, call.key);`
			`continue;`
			`}`

			`hdr = (struct ctdb_ltdb_header *)data.dptr;`
			`if (hdr->dmaster == v->rec->ctdb->pnn) {`
			`/* it's already local */`
fixed a memory leak in the recovery daemon (This used to be ctdb commit 73c27cf4c62cbe44b2b8fd00f907974d0808500c) 2008-01-15 12:11:44 +03:00			`free(data.dptr);`
ensure the recovery daemon is not clagged up by vacuum calls (This used to be ctdb commit ff7e80e247bf5a86adda0ef850d901478449675b) 2008-01-08 13:28:42 +03:00			`tdb_chainunlock(v->ctdb_db->ltdb->tdb, call.key);`
			`continue;`
			`}`

fixed a memory leak in the recovery daemon (This used to be ctdb commit 73c27cf4c62cbe44b2b8fd00f907974d0808500c) 2008-01-15 12:11:44 +03:00			`free(data.dptr);`

ensure the recovery daemon is not clagged up by vacuum calls (This used to be ctdb commit ff7e80e247bf5a86adda0ef850d901478449675b) 2008-01-08 13:28:42 +03:00			`state = ctdb_call_send(v->ctdb_db, &call);`
			`tdb_chainunlock(v->ctdb_db->ltdb->tdb, call.key);`
			`if (state == NULL) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,(__location__ " Failed to setup vacuum fetch call\n"));`
ensure the recovery daemon is not clagged up by vacuum calls (This used to be ctdb commit ff7e80e247bf5a86adda0ef850d901478449675b) 2008-01-08 13:28:42 +03:00			`talloc_free(v);`
			`return;`
			`}`
			`state->async.fn = vacuum_fetch_callback;`
			`state->async.private = v;`
			`return;`
			`}`

			`talloc_free(v);`
			`}`
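
The pointer step in `vacuum_fetch_next` (`r->length + (uint8_t *)r`) relies on the records being packed back to back, each one carrying its own total length. The following is a minimal sketch of that layout, not the real ctdb wire format: the 8-byte header (`length`, then `keylen`), the 4-byte padding, and the names `demo_pack` and `demo_walk` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header: two uint32_t fields (length, keylen) followed by the
   key bytes. This only mimics the idea behind struct ctdb_rec_data. */
#define DEMO_HDR_SIZE (2 * sizeof(uint32_t))

/* Append one record holding `key` at offset `off`; returns the new offset. */
static size_t demo_pack(uint8_t *buf, size_t off, const char *key)
{
	uint32_t keylen = (uint32_t)strlen(key);
	/* total record size, rounded up so the next header stays 4-byte aligned */
	uint32_t length = (uint32_t)((DEMO_HDR_SIZE + keylen + 3) & ~(size_t)3);

	assert(buf != NULL);
	memcpy(buf + off, &length, sizeof(length));
	memcpy(buf + off + sizeof(length), &keylen, sizeof(keylen));
	memcpy(buf + off + DEMO_HDR_SIZE, key, keylen);
	return off + length;
}

/* Visit `count` packed records and return the sum of their key lengths. */
static size_t demo_walk(const uint8_t *buf, uint32_t count)
{
	const uint8_t *p = buf;
	size_t total = 0;

	while (count--) {
		uint32_t length, keylen;

		/* read the header with memcpy to avoid alignment assumptions */
		memcpy(&length, p, sizeof(length));
		memcpy(&keylen, p + sizeof(length), sizeof(keylen));
		total += keylen;
		/* same step as v->r = (struct ctdb_rec_data *)(r->length + (uint8_t *)r) */
		p += length;
	}
	return total;
}
```

A caller would pack a few keys into a zeroed buffer with `demo_pack` and then pass the buffer and the record count to `demo_walk`. As in the original loop, the count alone bounds the walk, so a corrupt `length` field would walk off the end of the buffer.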


			`/*`
			`destroy a vacuum info structure`
			`*/`
			`static int vacuum_info_destructor(struct vacuum_info *v)`
			`{`
			`DLIST_REMOVE(v->rec->vacuum_info, v);`
			`return 0;`
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 09:23:27 +03:00			`}`


			`/*`
			`handler for vacuum fetch`
			`*/`
			`static void vacuum_fetch_handler(struct ctdb_context *ctdb, uint64_t srvid,`
			`TDB_DATA data, void *private_data)`
			`{`
ensure the recovery daemon is not clagged up by vacuum calls (This used to be ctdb commit ff7e80e247bf5a86adda0ef850d901478449675b) 2008-01-08 13:28:42 +03:00			`struct ctdb_recoverd *rec = talloc_get_type(private_data, struct ctdb_recoverd);`
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 09:23:27 +03:00			`struct ctdb_control_pulldb_reply *recs;`
			`int ret, i;`
			`TALLOC_CTX *tmp_ctx = talloc_new(ctdb);`
			`const char *name;`
			`struct ctdb_dbid_map *dbmap=NULL;`
			`bool persistent = false;`
			`struct ctdb_db_context *ctdb_db;`
			`struct ctdb_rec_data *r;`
ensure the recovery daemon is not clagged up by vacuum calls (This used to be ctdb commit ff7e80e247bf5a86adda0ef850d901478449675b) 2008-01-08 13:28:42 +03:00			`uint32_t srcnode;`
			`struct vacuum_info *v;`
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 09:23:27 +03:00
			`recs = (struct ctdb_control_pulldb_reply *)data.dptr;`
ensure the recovery daemon is not clagged up by vacuum calls (This used to be ctdb commit ff7e80e247bf5a86adda0ef850d901478449675b) 2008-01-08 13:28:42 +03:00			`r = (struct ctdb_rec_data *)&recs->data[0];`

			`if (recs->count == 0) {`
			`talloc_free(tmp_ctx);`
			`return;`
			`}`

			`srcnode = r->reqid;`

			`for (v=rec->vacuum_info;v;v=v->next) {`
only match vacuum list if on the same database (This used to be ctdb commit 27e56955e93027534780cc7549ddb224670d82b6) 2008-01-09 02:22:20 +03:00			`if (srcnode == v->srcnode && recs->db_id == v->ctdb_db->db_id) {`
ensure the recovery daemon is not clagged up by vacuum calls (This used to be ctdb commit ff7e80e247bf5a86adda0ef850d901478449675b) 2008-01-08 13:28:42 +03:00			`/* we're already working on records from this node */`
			`talloc_free(tmp_ctx);`
			`return;`
			`}`
			`}`
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 09:23:27 +03:00
			`/* work out if the database is persistent */`
			`ret = ctdb_ctrl_getdbmap(ctdb, CONTROL_TIMEOUT(), CTDB_CURRENT_NODE, tmp_ctx, &dbmap);`
			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to get dbids from local node\n"));`
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 09:23:27 +03:00			`talloc_free(tmp_ctx);`
			`return;`
			`}`

			`for (i=0;i<dbmap->num;i++) {`
			`if (dbmap->dbs[i].dbid == recs->db_id) {`
			`persistent = dbmap->dbs[i].persistent;`
			`break;`
			`}`
			`}`
			`if (i == dbmap->num) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to find db_id 0x%x on local node\n", recs->db_id));`
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 09:23:27 +03:00			`talloc_free(tmp_ctx);`
			`return;`
			`}`

			`/* find the name of this database */`
			`if (ctdb_ctrl_getdbname(ctdb, CONTROL_TIMEOUT(), CTDB_CURRENT_NODE, recs->db_id, tmp_ctx, &name) != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,(__location__ " Failed to get name of db 0x%x\n", recs->db_id));`
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 09:23:27 +03:00			`talloc_free(tmp_ctx);`
			`return;`
			`}`

			`/* attach to it */`
add a parameter for the tdb-flags to the client function ctdb_attach() so that we can pass TDB_NOSYNC when we attach to a persistent database and want fast unsafe writes instead of slow but safe tdb_transaction writes. enhance the ctdb_persistent test suite to test both safe and unsafe writes (This used to be ctdb commit 4948574f5a290434f3edd0c052cf13f3645deec4) 2008-06-04 04:46:20 +04:00			`ctdb_db = ctdb_attach(ctdb, name, persistent, 0);`
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 09:23:27 +03:00			`if (ctdb_db == NULL) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,(__location__ " Failed to attach to database '%s'\n", name));`
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 09:23:27 +03:00			`talloc_free(tmp_ctx);`
			`return;`
			`}`

ensure the recovery daemon is not clagged up by vacuum calls (This used to be ctdb commit ff7e80e247bf5a86adda0ef850d901478449675b) 2008-01-08 13:28:42 +03:00			`v = talloc_zero(rec, struct vacuum_info);`
			`if (v == NULL) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_CRIT,(__location__ " Out of memory\n"));`
			`talloc_free(tmp_ctx);`
ensure the recovery daemon is not clagged up by vacuum calls (This used to be ctdb commit ff7e80e247bf5a86adda0ef850d901478449675b) 2008-01-08 13:28:42 +03:00			`return;`
			`}`
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 09:23:27 +03:00
ensure the recovery daemon is not clagged up by vacuum calls (This used to be ctdb commit ff7e80e247bf5a86adda0ef850d901478449675b) 2008-01-08 13:28:42 +03:00			`v->rec = rec;`
			`v->srcnode = srcnode;`
			`v->ctdb_db = ctdb_db;`
			`v->recs = talloc_memdup(v, recs, data.dsize);`
			`if (v->recs == NULL) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_CRIT,(__location__ " Out of memory\n"));`
ensure the recovery daemon is not clagged up by vacuum calls (This used to be ctdb commit ff7e80e247bf5a86adda0ef850d901478449675b) 2008-01-08 13:28:42 +03:00			`talloc_free(v);`
			`talloc_free(tmp_ctx);`
			`return;`
			`}`
			`v->r = (struct ctdb_rec_data *)&v->recs->data[0];`
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 09:23:27 +03:00
ensure the recovery daemon is not clagged up by vacuum calls (This used to be ctdb commit ff7e80e247bf5a86adda0ef850d901478449675b) 2008-01-08 13:28:42 +03:00			`DLIST_ADD(rec->vacuum_info, v);`
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 09:23:27 +03:00
ensure the recovery daemon is not clagged up by vacuum calls (This used to be ctdb commit ff7e80e247bf5a86adda0ef850d901478449675b) 2008-01-08 13:28:42 +03:00			`talloc_set_destructor(v, vacuum_info_destructor);`
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 09:23:27 +03:00
ensure the recovery daemon is not clagged up by vacuum calls (This used to be ctdb commit ff7e80e247bf5a86adda0ef850d901478449675b) 2008-01-08 13:28:42 +03:00			`vacuum_fetch_next(v);`
			`talloc_free(tmp_ctx);`
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 09:23:27 +03:00			`}`

added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00
add a tuneable to control how long we wait after a successful recovery before we alow another recovery to be initiated (This used to be ctdb commit f3b43519423b7a73e6a2dd986bdf11203b8653cf) 2007-07-04 02:36:59 +04:00			`/*`
			`called when ctdb_wait_timeout should finish`
			`*/`
			`static void ctdb_wait_handler(struct event_context *ev, struct timed_event *te,`
			`struct timeval yt, void *p)`
			`{`
			`uint32_t *timed_out = (uint32_t *)p;`
			`(*timed_out) = 1;`
			`}`

			`/*`
			`wait for a given number of seconds`
			`*/`
			`static void ctdb_wait_timeout(struct ctdb_context *ctdb, uint32_t secs)`
			`{`
			`uint32_t timed_out = 0;`
			`event_add_timed(ctdb->ev, ctdb, timeval_current_ofs(secs, 0), ctdb_wait_handler, &timed_out);`
			`while (!timed_out) {`
			`event_loop_once(ctdb->ev);`
			`}`
			`}`
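
`ctdb_wait_timeout` blocks by running the event loop until a timed callback sets a flag through its `void *` argument. The pattern can be sketched without the real event library. Everything named `demo_*` below is hypothetical: it is a one-slot stand-in for `event_add_timed` and `event_loop_once` that counts loop iterations instead of seconds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*demo_handler_fn)(void *p);

/* Hypothetical one-slot "event loop": one pending handler that fires
   after a fixed number of loop iterations. */
struct demo_loop {
	demo_handler_fn fn;
	void *private_data;
	unsigned ticks_left;
};

/* Run one iteration; fire the pending handler when its countdown hits zero. */
static void demo_loop_once(struct demo_loop *loop)
{
	if (loop->fn != NULL && loop->ticks_left > 0 && --loop->ticks_left == 0) {
		demo_handler_fn fn = loop->fn;
		loop->fn = NULL;
		fn(loop->private_data);
	}
}

/* Same shape as ctdb_wait_handler: the private pointer is the caller's flag. */
static void demo_wait_handler(void *p)
{
	uint32_t *timed_out = (uint32_t *)p;
	*timed_out = 1;
}

/* Same shape as ctdb_wait_timeout; returns how many iterations it took. */
static unsigned demo_wait(struct demo_loop *loop, unsigned ticks)
{
	uint32_t timed_out = 0;
	unsigned iterations = 0;

	assert(ticks > 0);
	loop->fn = demo_wait_handler;
	loop->private_data = &timed_out;
	loop->ticks_left = ticks;

	/* the flag lives on this stack frame, so we must not return before it is set */
	while (!timed_out) {
		demo_loop_once(loop);
		iterations++;
	}
	return iterations;
}
```

Because the flag is a stack variable, the wait function has to keep looping until the handler has run. Returning early would leave the pending event holding a dangling pointer.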

make election handling much more scalable (This used to be ctdb commit 05938d462b92bd9ecb8e35f53651bded47c48675) 2007-11-13 02:27:44 +03:00			`/*`
			`called when an election times out (ends)`
			`*/`
			`static void ctdb_election_timeout(struct event_context *ev, struct timed_event *te,`
			`struct timeval t, void *p)`
			`{`
			`struct ctdb_recoverd *rec = talloc_get_type(p, struct ctdb_recoverd);`
			`rec->election_timeout = NULL;`
			`}`


			`/*`
			`wait for an election to finish. It finishes election_timeout seconds after`
			`the last election packet is received`
			`*/`
			`static void ctdb_wait_election(struct ctdb_recoverd *rec)`
			`{`
			`struct ctdb_context *ctdb = rec->ctdb;`
			`while (rec->election_timeout) {`
			`event_loop_once(ctdb->ev);`
			`}`
			`}`

sync flags between nodes in monitor loop in recmaster (This used to be ctdb commit 6eef86e06388fc53a1212f1e2783ae174c6cd210) 2007-10-15 08:28:51 +04:00			`/*`
when we as the recovery daemon on the recovery master detects that the flags differ between the local ctdb daemon and the remote node we can force a flags update on all nodes and not just the local daemon (This used to be ctdb commit a924eb89c966ecbae029ca137e06cffd40cc70fd) 2007-11-23 03:31:42 +03:00			`Update our local flags from all remote connected nodes.`
			`This is only run when we are, or believe we are, the recovery master`
sync flags between nodes in monitor loop in recmaster (This used to be ctdb commit 6eef86e06388fc53a1212f1e2783ae174c6cd210) 2007-10-15 08:28:51 +04:00			`*/`
dont manipulate ctdb->monitoring_mode directly from the SET_MON_MODE control, instead call ctdb_start/stop_monitoring() ctdb_stop_monitoring() dont allocate a new monitoring context, leave it NULL. Also set the monitoring_mode in this function so that ctdb_stop/start_monitoring() and ->monitoring_mode are kept in sync. Add a debug message to log that we have stopped monitoring. ctdb_start_monitoring() check whether monitoring is already active and make the function idempotent. Create the monitoring context when monitoring is started. Update ->monitoring_mode once the monitoring has been started. Add a debug message to log that we have started monitoring. When we temporarily stop monitoring while running an event script, restart monitoring after the event script wrapper returns instead of in the event script callback. Let monitoring_mode start out as DISABLED and let it be enabled once we call ctdb_start_monitoring. dont check for MONITORING_DISABLED in check_fore_dead_nodes(). If monitoring is disabled, this event handler will not be called. (This used to be ctdb commit 3a93ae8bdcffb1adbd6243844f3058fc742f76aa) 2007-11-30 00:44:34 +03:00			`static int update_local_flags(struct ctdb_recoverd *rec, struct ctdb_node_map *nodemap)`
sync flags between nodes in monitor loop in recmaster (This used to be ctdb commit 6eef86e06388fc53a1212f1e2783ae174c6cd210) 2007-10-15 08:28:51 +04:00			`{`
			`int j;`
dont manipulate ctdb->monitoring_mode directly from the SET_MON_MODE control, instead call ctdb_start/stop_monitoring() ctdb_stop_monitoring() dont allocate a new monitoring context, leave it NULL. Also set the monitoring_mode in this function so that ctdb_stop/start_monitoring() and ->monitoring_mode are kept in sync. Add a debug message to log that we have stopped monitoring. ctdb_start_monitoring() check whether monitoring is already active and make the function idempotent. Create the monitoring context when monitoring is started. Update ->monitoring_mode once the monitoring has been started. Add a debug message to log that we have started monitoring. When we temporarily stop monitoring while running an event script, restart monitoring after the event script wrapper returns instead of in the event script callback. Let monitoring_mode start out as DISABLED and let it be enabled once we call ctdb_start_monitoring. dont check for MONITORING_DISABLED in check_fore_dead_nodes(). If monitoring is disabled, this event handler will not be called. (This used to be ctdb commit 3a93ae8bdcffb1adbd6243844f3058fc742f76aa) 2007-11-30 00:44:34 +03:00			`struct ctdb_context *ctdb = rec->ctdb;`
sync flags between nodes in monitor loop in recmaster (This used to be ctdb commit 6eef86e06388fc53a1212f1e2783ae174c6cd210) 2007-10-15 08:28:51 +04:00			`TALLOC_CTX *mem_ctx = talloc_new(ctdb);`

			`/* get the nodemap for all active remote nodes and verify`
			`they are the same as for this node`
			`*/`
			`for (j=0; j<nodemap->num; j++) {`
			`struct ctdb_node_map *remote_nodemap=NULL;`
			`int ret;`

			`if (nodemap->nodes[j].flags & NODE_FLAGS_DISCONNECTED) {`
			`continue;`
			`}`
			`if (nodemap->nodes[j].pnn == ctdb->pnn) {`
			`continue;`
			`}`

			`ret = ctdb_ctrl_getnodemap(ctdb, CONTROL_TIMEOUT(), nodemap->nodes[j].pnn,`
			`mem_ctx, &remote_nodemap);`
			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to get nodemap from remote node %u\n",`
sync flags between nodes in monitor loop in recmaster (This used to be ctdb commit 6eef86e06388fc53a1212f1e2783ae174c6cd210) 2007-10-15 08:28:51 +04:00			`nodemap->nodes[j].pnn));`
move ctdb_set_culprit higher up in the file when we are the recmaster and we update the local flags for all the nodes, if one of the nodes fail to respond and give us his flags, set that node as a "culprit" as one of the first things to do in the monitor_cluster loop, check if the current culprit has caused too many (20) failures and if so ban that node. this is for the situation where a remote node may still be CONNECTED but it fails to respond to the getnodemap control causing the recovery master to loop in monitor_cluster aborting the monitoring when the node fails to respond but before anything will trigger a call to do_recovery(). If one or more of the databases or nodes are frozen at this stage, this would lead to smbd being blocked for potentially a longish time. (This used to be ctdb commit 83b0261f2cb453195b86f547d360400103a8b795) 2007-11-28 07:04:20 +03:00			`ctdb_set_culprit(rec, nodemap->nodes[j].pnn);`
sync flags between nodes in monitor loop in recmaster (This used to be ctdb commit 6eef86e06388fc53a1212f1e2783ae174c6cd210) 2007-10-15 08:28:51 +04:00			`talloc_free(mem_ctx);`
If update_local_flags() finds that a node has changed its BANNED status so it differs from what the local ctdb daemon on the recovery master thinks it should be we should call for a re-election (This used to be ctdb commit 21ad6039c31ef5cc0e40a35a41220f91943947cb) 2007-11-23 03:53:06 +03:00			`return MONITOR_FAILED;`
sync flags between nodes in monitor loop in recmaster (This used to be ctdb commit 6eef86e06388fc53a1212f1e2783ae174c6cd210) 2007-10-15 08:28:51 +04:00			`}`
			`if (nodemap->nodes[j].flags != remote_nodemap->nodes[j].flags) {`
If update_local_flags() finds that a node has changed its BANNED status so it differs from what the local ctdb daemon on the recovery master thinks it should be we should call for a re-election (This used to be ctdb commit 21ad6039c31ef5cc0e40a35a41220f91943947cb) 2007-11-23 03:53:06 +03:00			`struct ctdb_node_flag_change c;`
			`TDB_DATA data;`
add an extra log if we get a modflags control but it doesnt change any flags in update_local_flags() (this is only called if we are or we belive we are the recmaster) when we detect that the flags of a remote node is different from what our local node thinks the flags should be for that remote node we should send a node-flag-changed message to the local daemon so that it updates the flags for that node. (This used to be ctdb commit 36225e4e271f7a4065398253747fb20054f99a53) 2007-11-23 02:52:29 +03:00
If update_local_flags() finds that a node has changed its BANNED status so it differs from what the local ctdb daemon on the recovery master thinks it should be we should call for a re-election (This used to be ctdb commit 21ad6039c31ef5cc0e40a35a41220f91943947cb) 2007-11-23 03:53:06 +03:00			`/* We should tell our daemon about this so it`
add an extra log if we get a modflags control but it doesnt change any flags in update_local_flags() (this is only called if we are or we belive we are the recmaster) when we detect that the flags of a remote node is different from what our local node thinks the flags should be for that remote node we should send a node-flag-changed message to the local daemon so that it updates the flags for that node. (This used to be ctdb commit 36225e4e271f7a4065398253747fb20054f99a53) 2007-11-23 02:52:29 +03:00			`updates its flags or else we will log the same`
			`message again in the next iteration of recovery.`
when we as the recovery daemon on the recovery master detects that the flags differ between the local ctdb daemon and the remote node we can force a flags update on all nodes and not just the local daemon (This used to be ctdb commit a924eb89c966ecbae029ca137e06cffd40cc70fd) 2007-11-23 03:31:42 +03:00			`Since we are the recovery master we can just as`
			`well update the flags on all nodes.`
add an extra log if we get a modflags control but it doesnt change any flags in update_local_flags() (this is only called if we are or we belive we are the recmaster) when we detect that the flags of a remote node is different from what our local node thinks the flags should be for that remote node we should send a node-flag-changed message to the local daemon so that it updates the flags for that node. (This used to be ctdb commit 36225e4e271f7a4065398253747fb20054f99a53) 2007-11-23 02:52:29 +03:00			`*/`
			`c.pnn = nodemap->nodes[j].pnn;`
			`c.old_flags = nodemap->nodes[j].flags;`
			`c.new_flags = remote_nodemap->nodes[j].flags;`

			`data.dptr = (uint8_t *)&c;`
			`data.dsize = sizeof(c);`

rework banning/unbanning nodes ctdb_recoverd.c Always handle banning/unbanning locally on the node that is being banned/unbanned instead of on the recovery master. This means that if a ban request comes in to the recovery master for a remote node, we pass the request on to the remote node instead of setting up the ban and ban timeouts locally. ctdb.c send ban/unban requests to the node being banned/unbanned instead of to the recmaster (This used to be ctdb commit 880dd9f5fd0b91e450da93e195cc5c62cb1dcd6e) 2007-12-03 07:45:53 +03:00			`ctdb_send_message(ctdb, ctdb->pnn,`
add an extra log if we get a modflags control but it doesnt change any flags in update_local_flags() (this is only called if we are or we belive we are the recmaster) when we detect that the flags of a remote node is different from what our local node thinks the flags should be for that remote node we should send a node-flag-changed message to the local daemon so that it updates the flags for that node. (This used to be ctdb commit 36225e4e271f7a4065398253747fb20054f99a53) 2007-11-23 02:52:29 +03:00			`CTDB_SRVID_NODE_FLAGS_CHANGED,`
			`data);`

If update_local_flags() finds that a node has changed its BANNED status so it differs from what the local ctdb daemon on the recovery master thinks it should be we should call for a re-election (This used to be ctdb commit 21ad6039c31ef5cc0e40a35a41220f91943947cb) 2007-11-23 03:53:06 +03:00			`/* Update our local copy of the flags in the recovery`
			`daemon.`
			`*/`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_NOTICE,("Remote node %u had flags 0x%x, local had 0x%x - updating local\n",`
If update_local_flags() finds that a node has changed its BANNED status so it differs from what the local ctdb daemon on the recovery master thinks it should be we should call for a re-election (This used to be ctdb commit 21ad6039c31ef5cc0e40a35a41220f91943947cb) 2007-11-23 03:53:06 +03:00			`nodemap->nodes[j].pnn, remote_nodemap->nodes[j].flags,`
			`nodemap->nodes[j].flags));`
sync flags between nodes in monitor loop in recmaster (This used to be ctdb commit 6eef86e06388fc53a1212f1e2783ae174c6cd210) 2007-10-15 08:28:51 +04:00			`nodemap->nodes[j].flags = remote_nodemap->nodes[j].flags;`
If update_local_flags() finds that a node has changed its BANNED status so it differs from what the local ctdb daemon on the recovery master thinks it should be we should call for a re-election (This used to be ctdb commit 21ad6039c31ef5cc0e40a35a41220f91943947cb) 2007-11-23 03:53:06 +03:00
			`/* If the BANNED flag has changed for the node`
			`this is a good reason to do a new election.`
			`*/`
			`if ((c.old_flags ^ c.new_flags) & NODE_FLAGS_BANNED) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_NOTICE,("Remote node %u had different BANNED flags 0x%x, local had 0x%x - trigger a re-election\n",`
If update_local_flags() finds that a node has changed its BANNED status so it differs from what the local ctdb daemon on the recovery master thinks it should be we should call for a re-election (This used to be ctdb commit 21ad6039c31ef5cc0e40a35a41220f91943947cb) 2007-11-23 03:53:06 +03:00			`nodemap->nodes[j].pnn, c.new_flags,`
			`c.old_flags));`
			`talloc_free(mem_ctx);`
			`return MONITOR_ELECTION_NEEDED;`
			`}`

sync flags between nodes in monitor loop in recmaster (This used to be ctdb commit 6eef86e06388fc53a1212f1e2783ae174c6cd210) 2007-10-15 08:28:51 +04:00			`}`
			`talloc_free(remote_nodemap);`
			`}`
			`talloc_free(mem_ctx);`
If update_local_flags() finds that a node has changed its BANNED status so it differs from what the local ctdb daemon on the recovery master thinks it should be we should call for a re-election (This used to be ctdb commit 21ad6039c31ef5cc0e40a35a41220f91943947cb) 2007-11-23 03:53:06 +03:00			`return MONITOR_OK;`
sync flags between nodes in monitor loop in recmaster (This used to be ctdb commit 6eef86e06388fc53a1212f1e2783ae174c6cd210) 2007-10-15 08:28:51 +04:00			`}`
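
The BANNED check in update_local_flags() above relies on XOR to isolate the bits that differ between the old and new flag words. A minimal standalone sketch of that test follows. `EXAMPLE_FLAG_BANNED` and `flag_bit_changed` are hypothetical names chosen for illustration, and the bit value is an assumption rather than the real definition of NODE_FLAGS_BANNED.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for NODE_FLAGS_BANNED; the bit value is an assumption. */
#define EXAMPLE_FLAG_BANNED 0x00000008u

/*
 * Return true when 'bit' is set in exactly one of the two flag words,
 * i.e. the flag was toggled between the old and the new state.
 * XOR leaves a 1 only in positions where the two words differ.
 */
static bool flag_bit_changed(uint32_t old_flags, uint32_t new_flags, uint32_t bit)
{
	return ((old_flags ^ new_flags) & bit) != 0;
}
```

Changes to other bits do not trip the check, which is why an unrelated flag update does not by itself trigger a re-election.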


create a define to represent the 'invalid' generation id we used in two places. create a new helper function to generate new generation id values that know about the invalid id and avoids generating it. update the ctdb status tool to know about the invalid generation id and print the string INVALID instead (This used to be ctdb commit 4fbcd189543cb8a92227fdcd3d158472e558ccda) 2007-08-22 06:38:31 +04:00			`/* Create a new random generation id.`
			`The generation id cannot be the INVALID_GENERATION id`
			`*/`
			`static uint32_t new_generation(void)`
			`{`
			`uint32_t generation;`

			`while (1) {`
			`generation = random();`

			`if (generation != INVALID_GENERATION) {`
			`break;`
			`}`
			`}`

			`return generation;`
			`}`
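
The retry loop in new_generation() can be exercised deterministically by injecting the number source. The sketch below is an illustration under stated assumptions: `pick_generation`, `demo_source` and `EXAMPLE_INVALID_GENERATION` are hypothetical names, and the reserved value 1 is assumed rather than taken from the ctdb headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed reserved value; the real INVALID_GENERATION lives in the ctdb headers. */
#define EXAMPLE_INVALID_GENERATION 1u

/* Draw values from 'next' until one differs from the reserved id. */
static uint32_t pick_generation(uint32_t (*next)(void))
{
	uint32_t generation;

	do {
		generation = next();
	} while (generation == EXAMPLE_INVALID_GENERATION);

	return generation;
}

/* Deterministic demo source: yields the reserved value twice, then 7 forever. */
static uint32_t demo_source(void)
{
	static const uint32_t values[] = {
		EXAMPLE_INVALID_GENERATION, EXAMPLE_INVALID_GENERATION, 7u
	};
	static size_t i = 0;
	uint32_t v = values[i];

	if (i + 1 < sizeof(values) / sizeof(values[0])) {
		i++;
	}
	return v;
}
```

In production the source would be a wrapper around random(), as in the function above. The injected version only makes the skip-the-reserved-value behaviour easy to check.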
we are the culprit if we can't get the reclock (This used to be ctdb commit 1d320e113c6134ff6822b985a47131d8204af35a) 2007-10-05 06:01:40 +04:00
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00
			`/*`
			`create a temporary working database`
			`*/`
			`static struct tdb_wrap *create_recdb(struct ctdb_context *ctdb, TALLOC_CTX *mem_ctx)`
			`{`
			`char *name;`
			`struct tdb_wrap *recdb;`

			`/* open up the temporary recovery database */`
			`name = talloc_asprintf(mem_ctx, "%s/recdb.tdb", ctdb->db_directory);`
			`if (name == NULL) {`
			`return NULL;`
			`}`
			`unlink(name);`
			`recdb = tdb_wrap_open(mem_ctx, name, ctdb->tunable.database_hash_size,`
			`TDB_NOLOCK, O_RDWR|O_CREAT|O_EXCL, 0600);`
			`if (recdb == NULL) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_CRIT,(__location__ " Failed to create temp recovery database '%s'\n", name));`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`}`

			`talloc_free(name);`

			`return recdb;`
			`}`


			`/*`
			`a traverse function for pulling all relevant records from recdb`
			`*/`
			`struct recdb_data {`
			`struct ctdb_context *ctdb;`
			`struct ctdb_control_pulldb_reply *recdata;`
			`uint32_t len;`
catch internal traversal errors (This used to be ctdb commit 8caa85ad71be5d20a8d6f0cb3d52aff6905657a4) 2008-01-07 06:08:25 +03:00			`bool failed;`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`};`

			`static int traverse_recdb(struct tdb_context *tdb, TDB_DATA key, TDB_DATA data, void *p)`
			`{`
			`struct recdb_data *params = (struct recdb_data *)p;`
			`struct ctdb_rec_data *rec;`
			`struct ctdb_ltdb_header *hdr;`

			`/* skip empty records */`
			`if (data.dsize <= sizeof(struct ctdb_ltdb_header)) {`
			`return 0;`
			`}`

			`/* update the dmaster field to point to us */`
			`hdr = (struct ctdb_ltdb_header *)data.dptr;`
			`hdr->dmaster = params->ctdb->pnn;`

			`/* add the record to the blob ready to send to the nodes */`
			`rec = ctdb_marshall_record(params->recdata, 0, key, NULL, data);`
catch internal traversal errors (This used to be ctdb commit 8caa85ad71be5d20a8d6f0cb3d52aff6905657a4) 2008-01-07 06:08:25 +03:00			`if (rec == NULL) {`
			`params->failed = true;`
			`return -1;`
			`}`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`params->recdata = talloc_realloc_size(NULL, params->recdata, rec->length + params->len);`
			`if (params->recdata == NULL) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_CRIT,(__location__ " Failed to expand recdata to %u\n",`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`rec->length + params->len));`
catch internal traversal errors (This used to be ctdb commit 8caa85ad71be5d20a8d6f0cb3d52aff6905657a4) 2008-01-07 06:08:25 +03:00			`params->failed = true;`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`return -1;`
			`}`
			`params->recdata->count++;`
			`memcpy(params->len+(uint8_t *)params->recdata, rec, rec->length);`
			`params->len += rec->length;`
			`talloc_free(rec);`

			`return 0;`
			`}`
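
traverse_recdb() above grows a single blob and copies each marshalled record onto its tail. The same pattern can be sketched with the standard allocator in place of talloc. `struct blob` and `blob_append` are hypothetical names used only for this illustration. The sketch keeps the old pointer until realloc() has succeeded, so the failure path never touches a NULL buffer.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* A growing byte buffer holding back-to-back variable-length records. */
struct blob {
	uint8_t *buf;   /* packed records, owned by the caller */
	size_t len;     /* bytes used so far */
	uint32_t count; /* number of records appended */
};

/* Append one record; returns 0 on success, -1 if the buffer could not grow. */
static int blob_append(struct blob *b, const void *rec, size_t rec_len)
{
	uint8_t *grown;

	if (rec_len == 0) {
		return 0; /* nothing to copy; mirrors skipping empty records */
	}

	grown = realloc(b->buf, b->len + rec_len);
	if (grown == NULL) {
		return -1; /* b->buf is unchanged and still valid */
	}

	memcpy(grown + b->len, rec, rec_len);
	b->buf = grown;
	b->len += rec_len;
	b->count++;
	return 0;
}
```

A caller would start from a zero-initialised `struct blob`, call `blob_append()` once per record, and free `buf` when the blob has been sent.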

			`/*`
			`push the recdb database out to all nodes`
			`*/`
			`static int push_recdb_database(struct ctdb_context *ctdb, uint32_t dbid,`
			`struct tdb_wrap *recdb, struct ctdb_node_map *nodemap)`
			`{`
			`struct recdb_data params;`
			`struct ctdb_control_pulldb_reply *recdata;`
			`TDB_DATA outdata;`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`TALLOC_CTX *tmp_ctx;`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`uint32_t *nodes;`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00
			`tmp_ctx = talloc_new(ctdb);`
			`CTDB_NO_MEMORY(ctdb, tmp_ctx);`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00
			`recdata = talloc_zero(recdb, struct ctdb_control_pulldb_reply);`
			`CTDB_NO_MEMORY(ctdb, recdata);`

			`recdata->db_id = dbid;`

			`params.ctdb = ctdb;`
			`params.recdata = recdata;`
			`params.len = offsetof(struct ctdb_control_pulldb_reply, data);`
catch internal traversal errors (This used to be ctdb commit 8caa85ad71be5d20a8d6f0cb3d52aff6905657a4) 2008-01-07 06:08:25 +03:00			`params.failed = false;`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00
			`if (tdb_traverse_read(recdb->tdb, traverse_recdb, &params) == -1) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,(__location__ " Failed to traverse recdb database\n"));`
catch internal traversal errors (This used to be ctdb commit 8caa85ad71be5d20a8d6f0cb3d52aff6905657a4) 2008-01-07 06:08:25 +03:00			`talloc_free(params.recdata);`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`talloc_free(tmp_ctx);`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`return -1;`
			`}`

catch internal traversal errors (This used to be ctdb commit 8caa85ad71be5d20a8d6f0cb3d52aff6905657a4) 2008-01-07 06:08:25 +03:00			`if (params.failed) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,(__location__ " Failed to traverse recdb database\n"));`
catch internal traversal errors (This used to be ctdb commit 8caa85ad71be5d20a8d6f0cb3d52aff6905657a4) 2008-01-07 06:08:25 +03:00			`talloc_free(params.recdata);`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`talloc_free(tmp_ctx);`
catch internal traversal errors (This used to be ctdb commit 8caa85ad71be5d20a8d6f0cb3d52aff6905657a4) 2008-01-07 06:08:25 +03:00			`return -1;`
			`}`

new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`recdata = params.recdata;`

			`outdata.dptr = (void *)recdata;`
			`outdata.dsize = params.len;`

add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`nodes = list_of_active_nodes(ctdb, nodemap, tmp_ctx, true);`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`if (ctdb_client_async_control(ctdb, CTDB_CONTROL_PUSH_DB,`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`nodes,`
			`CONTROL_TIMEOUT(), false, outdata,`
			`NULL, NULL,`
			`NULL) != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,(__location__ " Failed to push recdb records to nodes for db 0x%x\n", dbid));`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`talloc_free(recdata);`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`talloc_free(tmp_ctx);`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`return -1;`
			`}`

added debug constants to allow for better mapping to syslog levels (This used to be ctdb commit 7ba8f1dde318eab03f4257e5a89fd23e7281e502) 2008-02-04 09:44:24 +03:00			`DEBUG(DEBUG_NOTICE, (__location__ " Recovery - pushed remote database 0x%x of size %u\n",`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`dbid, recdata->count));`

			`talloc_free(recdata);`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`talloc_free(tmp_ctx);`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00
			`return 0;`
			`}`


			`/*`
			`go through a full recovery on one database`
			`*/`
			`static int recover_database(struct ctdb_recoverd *rec,`
			`TALLOC_CTX *mem_ctx,`
			`uint32_t dbid,`
			`uint32_t pnn,`
added paranoid transaction ids (This used to be ctdb commit afc1da53873cdbd31fcc8c6b22fae262e344cf6e) 2008-01-06 05:24:55 +03:00			`struct ctdb_node_map *nodemap,`
			`uint32_t transaction_id)`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`{`
			`struct tdb_wrap *recdb;`
			`int ret;`
			`struct ctdb_context *ctdb = rec->ctdb;`
			`TDB_DATA data;`
added paranoid transaction ids (This used to be ctdb commit afc1da53873cdbd31fcc8c6b22fae262e344cf6e) 2008-01-06 05:24:55 +03:00			`struct ctdb_control_wipe_database w;`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`uint32_t *nodes;`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00
			`recdb = create_recdb(ctdb, mem_ctx);`
			`if (recdb == NULL) {`
			`return -1;`
			`}`

			`/* pull all remote databases onto the recdb */`
			`ret = pull_remote_database(ctdb, nodemap, recdb, dbid);`
			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to pull remote database 0x%x\n", dbid));`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`return -1;`
			`}`

added debug constants to allow for better mapping to syslog levels (This used to be ctdb commit 7ba8f1dde318eab03f4257e5a89fd23e7281e502) 2008-02-04 09:44:24 +03:00			`DEBUG(DEBUG_NOTICE, (__location__ " Recovery - pulled remote database 0x%x\n", dbid));`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00
			`/* wipe all the remote databases. This is safe as we are in a transaction */`
added paranoid transaction ids (This used to be ctdb commit afc1da53873cdbd31fcc8c6b22fae262e344cf6e) 2008-01-06 05:24:55 +03:00			`w.db_id = dbid;`
			`w.transaction_id = transaction_id;`

			`data.dptr = (void *)&w;`
			`data.dsize = sizeof(w);`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`nodes = list_of_active_nodes(ctdb, nodemap, recdb, true);`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`if (ctdb_client_async_control(ctdb, CTDB_CONTROL_WIPE_DATABASE,`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`nodes,`
			`CONTROL_TIMEOUT(), false, data,`
			`NULL, NULL,`
			`NULL) != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to wipe database. Recovery failed.\n"));`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`talloc_free(recdb);`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`return -1;`
			`}`

			`/* push out the correct database. This sets the dmaster and skips`
			`the empty records */`
			`ret = push_recdb_database(ctdb, dbid, recdb, nodemap);`
			`if (ret != 0) {`
			`talloc_free(recdb);`
			`return -1;`
			`}`

			`/* all done with this database */`
			`talloc_free(recdb);`

			`return 0;`
			`}`

create a define to represent the 'invalid' generation id we used in two places. create a new helper function to generate new generation id values that know about the invalid id and avoids generating it. update the ctdb status tool to know about the invalid generation id and print the string INVALID instead (This used to be ctdb commit 4fbcd189543cb8a92227fdcd3d158472e558ccda) 2007-08-22 06:38:31 +04:00
- moved cmdline options that are only relevant to ctdbd into ctdbd.c - fixed a valgrind error on failing to send a control - don't mark node dead when already disconnected - moved node list lock code into common code (This used to be ctdb commit bcc0432d0fea7ef223f82ccee81cf35c18144b1b) 2007-06-02 04:03:28 +04:00			`/*`
			`we are the recmaster, and recovery is needed - start a recovery run`
			`*/`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`static int do_recovery(struct ctdb_recoverd *rec,`
store the num_active variable (number of connected/active nodes) inside the rec structure and avoid passing this as an extra parameter to do_recovery() (This used to be ctdb commit 8bb229aa3b4bd41e48d4e4e2e148d8680c8ba436) 2008-02-29 04:55:20 +03:00			`TALLOC_CTX *mem_ctx, uint32_t pnn,`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`struct ctdb_node_map *nodemap, struct ctdb_vnn_map *vnnmap,`
make it possible to re-start a recovery without marking the current node as the culprit. (This used to be ctdb commit 3a69fad0b1dee4a482461680c556358409e53c4d) 2008-06-13 05:47:42 +04:00			`int32_t culprit)`
add a helper function to create all missing remote databases detected during recovery (This used to be ctdb commit 04758c6f7d8f61260be6d2472380cb7904984427) 2007-05-06 04:04:37 +04:00			`{`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`struct ctdb_context *ctdb = rec->ctdb;`
create a helper function to make sure the local node that does recovery has all the databases that exist on any other remote node (This used to be ctdb commit 0f436e3d40fea6e6a146019b0c664e80e81e88b4) 2007-05-06 04:12:42 +04:00			`int i, j, ret;`
add a helper function to create all missing remote databases detected during recovery (This used to be ctdb commit 04758c6f7d8f61260be6d2472380cb7904984427) 2007-05-06 04:04:37 +04:00			`uint32_t generation;`
			`struct ctdb_dbid_map *dbmap;`
added paranoid transaction ids (This used to be ctdb commit afc1da53873cdbd31fcc8c6b22fae262e344cf6e) 2008-01-06 05:24:55 +03:00			`TDB_DATA data;`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`uint32_t *nodes;`
- moved cmdline options that are only relevant to ctdbd into ctdbd.c - fixed a valgrind error on failing to send a control - don't mark node dead when already disconnected - moved node list lock code into common code (This used to be ctdb commit bcc0432d0fea7ef223f82ccee81cf35c18144b1b) 2007-06-02 04:03:28 +04:00
added debug constants to allow for better mapping to syslog levels (This used to be ctdb commit 7ba8f1dde318eab03f4257e5a89fd23e7281e502) 2008-02-04 09:44:24 +03:00			`DEBUG(DEBUG_NOTICE, (__location__ " Starting do_recovery\n"));`
added some debug lines to help track down a problem (This used to be ctdb commit 2ca31e9de179f76e392a26cc8305e2473357c760) 2007-10-18 10:27:36 +04:00
- merge from ronnie - add a flag to check that recovery completed correctly. If not, re-trigger it in monitoring (This used to be ctdb commit d5ed941d9bab4af30d8b5f9b77bdf43d9218d69b) 2007-09-14 03:49:12 +04:00			`/* if recovery fails, force it again */`
			`rec->need_recovery = true;`

make it possible to re-start a recovery without marking the current node as the culprit. (This used to be ctdb commit 3a69fad0b1dee4a482461680c556358409e53c4d) 2008-06-13 05:47:42 +04:00			`if (culprit != -1) {`
			`ctdb_set_culprit(rec, culprit);`
			`}`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00
			`if (rec->culprit_counter > 2*nodemap->num) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_NOTICE,("Node %u has caused %u recoveries in %.0f seconds - banning it for %u seconds\n",`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`culprit, rec->culprit_counter, timeval_elapsed(&rec->first_recover_time),`
			`ctdb->tunable.recovery_ban_period));`
added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00			`ctdb_ban_node(rec, culprit, ctdb->tunable.recovery_ban_period);`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`}`

- make specification of a recovery lock file compulsory - die if someone other than the recmaster can get the recovery lock (This used to be ctdb commit a827d0d0e430ca8ad5d521367e45097185492869) 2007-06-02 05:36:42 +04:00			`if (!ctdb_recovery_lock(ctdb, true)) {`
we are the culprit if we can't get the reclock (This used to be ctdb commit 1d320e113c6134ff6822b985a47131d8204af35a) 2007-10-05 06:01:40 +04:00			`ctdb_set_culprit(rec, pnn);`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,("Unable to get recovery lock - aborting recovery\n"));`
- moved cmdline options that are only relevant to ctdbd into ctdbd.c - fixed a valgrind error on failing to send a control - don't mark node dead when already disconnected - moved node list lock code into common code (This used to be ctdb commit bcc0432d0fea7ef223f82ccee81cf35c18144b1b) 2007-06-02 04:03:28 +04:00			`return -1;`
			`}`
add a helper function to create all missing remote databases detected during recovery (This used to be ctdb commit 04758c6f7d8f61260be6d2472380cb7904984427) 2007-05-06 04:04:37 +04:00
added debug constants to allow for better mapping to syslog levels (This used to be ctdb commit 7ba8f1dde318eab03f4257e5a89fd23e7281e502) 2008-02-04 09:44:24 +03:00			`DEBUG(DEBUG_NOTICE, (__location__ " Recovery initiated due to problem with node %u\n", culprit));`
- startup frozen, and do an initial recovery - fixed a bug in traverse - get a lock on the node list file in the recmaster recovery daemon (This used to be ctdb commit 162a5647535ad1cb3e8e5d4042a2784365fb1913) 2007-05-23 08:35:19 +04:00
add a helper function to create all missing remote databases detected during recovery (This used to be ctdb commit 04758c6f7d8f61260be6d2472380cb7904984427) 2007-05-06 04:04:37 +04:00			`/* get a list of all databases */`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`ret = ctdb_ctrl_getdbmap(ctdb, CONTROL_TIMEOUT(), pnn, mem_ctx, &dbmap);`
add a helper function to create all missing remote databases detected during recovery (This used to be ctdb commit 04758c6f7d8f61260be6d2472380cb7904984427) 2007-05-06 04:04:37 +04:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to get dbids from node :%u\n", pnn));`
add a helper function to create all missing remote databases detected during recovery (This used to be ctdb commit 04758c6f7d8f61260be6d2472380cb7904984427) 2007-05-06 04:04:37 +04:00			`return -1;`
			`}`

new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`/* we do the db creation before we set the recovery mode, so the freeze happens`
			`on all databases we will be dealing with. */`
add a helper function to create all missing remote databases detected during recovery (This used to be ctdb commit 04758c6f7d8f61260be6d2472380cb7904984427) 2007-05-06 04:04:37 +04:00
create a helper function to make sure the local node that does recovery has all the databases that exist on any other remote node (This used to be ctdb commit 0f436e3d40fea6e6a146019b0c664e80e81e88b4) 2007-05-06 04:12:42 +04:00			`/* verify that we have all the databases any other node has */`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`ret = create_missing_local_databases(ctdb, nodemap, pnn, &dbmap, mem_ctx);`
create a helper function to make sure the local node that does recovery has all the databases that exist on any other remote node (This used to be ctdb commit 0f436e3d40fea6e6a146019b0c664e80e81e88b4) 2007-05-06 04:12:42 +04:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to create missing local databases\n"));`
create a helper function to make sure the local node that does recovery has all the databases that exist on any other remote node (This used to be ctdb commit 0f436e3d40fea6e6a146019b0c664e80e81e88b4) 2007-05-06 04:12:42 +04:00			`return -1;`
recovery daemon this program is a client to the local ctdb daemon every second it pulls all vnnmap and nodemaps from all nodes that are available and checks if a recovery is required a recovery is required if : * all nodes do NOT have an identical vnnmap and generation * all nodes do NOT have an identical nodemap * there are active nodes that are NOT in the nodemap * there are nodes in the nodemap that are NOT active During recovery, the recovery tool will also make sure that all nodes know about and have created all databases. (This used to be ctdb commit 2f2650467bac7e8954de7c17cb34f46b0bdbcd26) 2007-05-04 09:21:40 +04:00			`}`

			`/* verify that all other nodes have all our databases */`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`ret = create_missing_remote_databases(ctdb, nodemap, pnn, dbmap, mem_ctx);`
add a helper function to create all missing remote databases detected during recovery (This used to be ctdb commit 04758c6f7d8f61260be6d2472380cb7904984427) 2007-05-06 04:04:37 +04:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to create missing remote databases\n"));`
add a helper function to create all missing remote databases detected during recovery (This used to be ctdb commit 04758c6f7d8f61260be6d2472380cb7904984427) 2007-05-06 04:04:37 +04:00			`return -1;`
recovery daemon this program is a client to the local ctdb daemon every second it pulls all vnnmap and nodemaps from all nodes that are available and checks if a recovery is required a recovery is required if : * all nodes do NOT have an identical vnnmap and generation * all nodes do NOT have an identical nodemap * there are active nodes that are NOT in the nodemap * there are nodes in the nodemap that are NOT active During recovery, the recovery tool will also make sure that all nodes know about and have created all databases. (This used to be ctdb commit 2f2650467bac7e8954de7c17cb34f46b0bdbcd26) 2007-05-04 09:21:40 +04:00			`}`

added debug constants to allow for better mapping to syslog levels (This used to be ctdb commit 7ba8f1dde318eab03f4257e5a89fd23e7281e502) 2008-02-04 09:44:24 +03:00			`DEBUG(DEBUG_NOTICE, (__location__ " Recovery - created remote databases\n"));`
- merged ctdb_store test from ronnie - added DatabaseHashSize tunable - added logging of events inside recovery (for timing) (This used to be ctdb commit 3593cdb928b91e217faf1b3c537fa28dc82cdace) 2007-06-17 17:31:44 +04:00
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`/* set recovery mode to active on all nodes */`
			`ret = set_recovery_mode(ctdb, nodemap, CTDB_RECOVERY_ACTIVE);`
			`if (ret!=0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to set recovery mode to active on cluster\n"));`
break the code that repoints dmaster for all local and remote records into a separate helper function (This used to be ctdb commit d5ab30d0ac21e736eb34eaa19bccfee5f0ce7cfb) 2007-05-06 04:22:13 +04:00			`return -1;`
recovery daemon this program is a client to the local ctdb daemon every second it pulls all vnnmap and nodemaps from all nodes that are available and checks if a recovery is required a recovery is required if : * all nodes do NOT have an identical vnnmap and generation * all nodes do NOT have an identical nodemap * there are active nodes that are NOT in the nodemap * there are nodes in the nodemap that are NOT active During recovery, the recovery tool will also make sure that all nodes know about and have created all databases. (This used to be ctdb commit 2f2650467bac7e8954de7c17cb34f46b0bdbcd26) 2007-05-04 09:21:40 +04:00			`}`

merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`/* execute the "startrecovery" event script on all nodes */`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`ret = run_startrecovery_eventscript(rec, nodemap);`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`if (ret!=0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to run the 'startrecovery' event on cluster\n"));`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`return -1;`
			`}`

new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`/* pick a new generation number */`
			`generation = new_generation();`
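The body of new_generation() is not shown in this part of the file. The history only notes that the helper knows about the "invalid" generation id and avoids producing it. A minimal sketch of that idea follows, assuming a sentinel value and rand() as the random source. The names sketch_new_generation and SKETCH_INVALID_GENERATION are hypothetical and are not the ctdb definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed sentinel for this sketch; the real value is defined elsewhere in ctdb. */
#define SKETCH_INVALID_GENERATION 1u

/*
 * Draw random generation ids until one differs from the sentinel.
 * rand() is a stand-in for whatever random source the real helper uses.
 */
static uint32_t sketch_new_generation(void)
{
	uint32_t generation;

	do {
		generation = (uint32_t)rand();
	} while (generation == SKETCH_INVALID_GENERATION);

	/* Postcondition: callers never see the sentinel. */
	assert(generation != SKETCH_INVALID_GENERATION);
	return generation;
}
```

The retry loop keeps the sentinel free to mean "no valid generation", so callers never have to special-case a freshly generated id.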
break the code that repoints dmaster for all local and remote records into a separate helper function (This used to be ctdb commit d5ab30d0ac21e736eb34eaa19bccfee5f0ce7cfb) 2007-05-06 04:22:13 +04:00
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`/* change the vnnmap on this node to use the new generation`
			`number, but not on any other node.`
			`This guarantees that if we abort the recovery prematurely`
			`for some reason (e.g. a node stops responding),`
			`we can simply return immediately and will re-enter`
			`recovery again shortly.`
			`I.e. we deliberately leave the cluster with an inconsistent`
			`generation id so that we can abort recovery at any stage and`
			`just restart it from scratch.`
more optimisations to recovery (This used to be ctdb commit 9a41ad0a842cd4f3792d6e84b5c809b7ff6f342e) 2008-01-02 14:44:46 +03:00			`*/`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`vnnmap->generation = generation;`
			`ret = ctdb_ctrl_setvnnmap(ctdb, CONTROL_TIMEOUT(), pnn, mem_ctx, vnnmap);`
more optimisations to recovery (This used to be ctdb commit 9a41ad0a842cd4f3792d6e84b5c809b7ff6f342e) 2008-01-02 14:44:46 +03:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to set vnnmap for node %u\n", pnn));`
more optimisations to recovery (This used to be ctdb commit 9a41ad0a842cd4f3792d6e84b5c809b7ff6f342e) 2008-01-02 14:44:46 +03:00			`return -1;`
			`}`
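The reasoning in the comment above can be illustrated with a small check: once the local node alone carries the new generation, any later comparison of generation ids across nodes sees a mismatch, so an aborted recovery is picked up again. This is a sketch under that assumption only. sketch_generations_differ is a hypothetical helper, not the check the recovery daemon actually performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Sketch only: returns true when the generation ids reported by the nodes
 * are not all identical, which this sketch treats as "recovery required".
 */
static bool sketch_generations_differ(const uint32_t *gens, size_t num)
{
	size_t i;

	assert(gens != NULL || num == 0);

	for (i = 1; i < num; i++) {
		if (gens[i] != gens[0]) {
			return true;
		}
	}
	return false;
}
```

If three nodes all report generation 7 and the local node is then switched to 8 before recovery aborts, the helper reports a difference on the next pass, which is the property the comment relies on.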

added paranoid transaction ids (This used to be ctdb commit afc1da53873cdbd31fcc8c6b22fae262e344cf6e) 2008-01-06 05:24:55 +03:00			`data.dptr = (void *)&generation;`
			`data.dsize = sizeof(uint32_t);`

add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`nodes = list_of_active_nodes(ctdb, nodemap, mem_ctx, true);`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`if (ctdb_client_async_control(ctdb, CTDB_CONTROL_TRANSACTION_START,`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`nodes,`
			`CONTROL_TIMEOUT(), false, data,`
			`NULL, NULL,`
			`NULL) != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to start transactions. Recovery failed.\n"));`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`return -1;`
			`}`
more optimisations to recovery (This used to be ctdb commit 9a41ad0a842cd4f3792d6e84b5c809b7ff6f342e) 2008-01-02 14:44:46 +03:00
added debug constants to allow for better mapping to syslog levels (This used to be ctdb commit 7ba8f1dde318eab03f4257e5a89fd23e7281e502) 2008-02-04 09:44:24 +03:00			`DEBUG(DEBUG_NOTICE,(__location__ " started transactions on all nodes\n"));`
more optimisations to recovery (This used to be ctdb commit 9a41ad0a842cd4f3792d6e84b5c809b7ff6f342e) 2008-01-02 14:44:46 +03:00
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`for (i=0;i<dbmap->num;i++) {`
added paranoid transaction ids (This used to be ctdb commit afc1da53873cdbd31fcc8c6b22fae262e344cf6e) 2008-01-06 05:24:55 +03:00			`if (recover_database(rec, mem_ctx, dbmap->dbs[i].dbid, pnn, nodemap, generation) != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Failed to recover database 0x%x\n", dbmap->dbs[i].dbid));`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`return -1;`
			`}`
			`}`

added debug constants to allow for better mapping to syslog levels (This used to be ctdb commit 7ba8f1dde318eab03f4257e5a89fd23e7281e502) 2008-02-04 09:44:24 +03:00			`DEBUG(DEBUG_NOTICE, (__location__ " Recovery - starting database commits\n"));`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00
			`/* commit all the changes */`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`if (ctdb_client_async_control(ctdb, CTDB_CONTROL_TRANSACTION_COMMIT,`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`nodes,`
			`CONTROL_TIMEOUT(), false, data,`
			`NULL, NULL,`
			`NULL) != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to commit recovery changes. Recovery failed.\n"));`
create a helper function for recovery to push all local databases out onto the remote nodes (This used to be ctdb commit 1ba76d374652cfa29e56fb77c7190349e42d3bcc) 2007-05-06 04:38:44 +04:00			`return -1;`
recovery daemon this program is a client to the local ctdb daemon every second it pulls all vnnmap and nodemaps from all nodes that are available and checks if a recovery is required a recovery is required if : * all nodes do NOT have an identical vnnmap and generation * all nodes do NOT have an identical nodemap * there are active nodes that are NOT in the nodemap * there are nodes in the nodemap that are NOT active During recovery, the recovery tool will also make sure that all nodes know about and have created all databases. (This used to be ctdb commit 2f2650467bac7e8954de7c17cb34f46b0bdbcd26) 2007-05-04 09:21:40 +04:00			`}`

added debug constants to allow for better mapping to syslog levels (This used to be ctdb commit 7ba8f1dde318eab03f4257e5a89fd23e7281e502) 2008-02-04 09:44:24 +03:00			`DEBUG(DEBUG_NOTICE, (__location__ " Recovery - committed databases\n"));`
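The sequence above (start transactions on every active node, recover each database, then commit everywhere, returning -1 as soon as any step fails) can be sketched in isolation. The types and functions below are hypothetical stand-ins. They do not use the ctdb async control API, and a node that is not "reachable" simply models a control that failed or timed out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_txn_state {
	SKETCH_TXN_NONE,
	SKETCH_TXN_STARTED,
	SKETCH_TXN_COMMITTED
};

struct sketch_txn_node {
	bool reachable;               /* false models a failed or timed-out control */
	enum sketch_txn_state state;
};

/* Apply one step to every node; fail if any single node could not take it. */
static int sketch_broadcast(struct sketch_txn_node *nodes, size_t num,
			    enum sketch_txn_state next)
{
	size_t i;
	int failures = 0;

	assert(nodes != NULL || num == 0);

	for (i = 0; i < num; i++) {
		if (!nodes[i].reachable) {
			failures++;
			continue;
		}
		nodes[i].state = next;
	}
	return failures == 0 ? 0 : -1;
}

/* Start everywhere, then commit everywhere; give up on the first failure. */
static int sketch_recover(struct sketch_txn_node *nodes, size_t num)
{
	if (sketch_broadcast(nodes, num, SKETCH_TXN_STARTED) != 0) {
		return -1;
	}
	/* per-database recovery would run here */
	if (sketch_broadcast(nodes, num, SKETCH_TXN_COMMITTED) != 0) {
		return -1;
	}
	return 0;
}
```

Because the sketch returns before the commit step whenever a start fails, no node ends up committed in that case. The caller can then retry the whole recovery from scratch.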
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00
create a helper function for recovery to push all local databases out onto the remote nodes (This used to be ctdb commit 1ba76d374652cfa29e56fb77c7190349e42d3bcc) 2007-05-06 04:38:44 +04:00
Expand the client async framework so that it can take a callback function. This allows us to use the async framework also for controls that return outdata. Add a "capabilities" field to the ctdb_node structure. This field is only initialized and kept valid inside the recovery daemon context and not inside the main ctdb daemon. change the GET_CAPABILITIES control to return the capabilities in outdata instead of in the res return variable. When performing a recovery inside the recovery daemon, read the capabilities from all connected nodes and update the ctdb->nodes list of nodes. when building the new vnnmap after the database rebuild in recovery, do not include any nodes which lack the LMASTER capability in the new vnnmap. Unless there are no available connected node that sports the LMASTER capability in which case we let the local node (recmaster) take on the lmaster role temporarily (i.e. become a member of the vnnmap list) (This used to be ctdb commit 0f1883c69c689b28b0c04148774840b2c4081df6) 2008-05-06 09:42:59 +04:00			`/* update the capabilities for all nodes */`
			`ret = update_capabilities(ctdb, nodemap);`
			`if (ret!=0) {`
			`DEBUG(DEBUG_ERR, (__location__ " Unable to update node capabilities.\n"));`
			`return -1;`
			`}`

implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`/* build a new vnn map with all the currently active and`
			`unbanned nodes that have the LMASTER capability */`
create a define to represent the 'invalid' generation id we used in two places. create a new helper function to generate new generation id values that know about the invalid id and avoids generating it. update the ctdb status tool to know about the invalid generation id and print the string INVALID instead (This used to be ctdb commit 4fbcd189543cb8a92227fdcd3d158472e558ccda) 2007-08-22 06:38:31 +04:00			`generation = new_generation();`
remove old s3 recovery code fixed vnnmap wire format in recover daemon (This used to be ctdb commit e03fab7bfe0cf43f40c49a3d63e75dc44001d8d8) 2007-05-10 02:49:57 +04:00			`vnnmap = talloc(mem_ctx, struct ctdb_vnn_map);`
			`CTDB_NO_MEMORY(ctdb, vnnmap);`
recovery daemon this program is a client to the local ctdb daemon every second it pulls all vnnmap and nodemaps from all nodes that are available and checks if a recovery is required a recovery is required if : * all nodes do NOT have an identical vnnmap and generation * all nodes do NOT have an identical nodemap * there are active nodes that are NOT in the nodemap * there are nodes in the nodemap that are NOT active During recovery, the recovery tool will also make sure that all nodes know about and have created all databases. (This used to be ctdb commit 2f2650467bac7e8954de7c17cb34f46b0bdbcd26) 2007-05-04 09:21:40 +04:00			`vnnmap->generation = generation;`
Expand the client async framework so that it can take a callback function. This allows us to use the async framework also for controls that return outdata. Add a "capabilities" field to the ctdb_node structure. This field is only initialized and kept valid inside the recovery daemon context and not inside the main ctdb daemon. change the GET_CAPABILITIES control to return the capabilities in outdata instead of in the res return variable. When performing a recovery inside the recovery daemon, read the capabilities from all connected nodes and update the ctdb->nodes list of nodes. when building the new vnnmap after the database rebuild in recovery, do not include any nodes which lack the LMASTER capability in the new vnnmap. Unless there are no available connected node that sports the LMASTER capability in which case we let the local node (recmaster) take on the lmaster role temporarily (i.e. become a member of the vnnmap list) (This used to be ctdb commit 0f1883c69c689b28b0c04148774840b2c4081df6) 2008-05-06 09:42:59 +04:00			`vnnmap->size = 0;`
added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00			`vnnmap->map = talloc_zero_array(vnnmap, uint32_t, vnnmap->size);`
Expand the client async framework so that it can take a callback function. This allows us to use the async framework also for controls that return outdata. Add a "capabilities" field to the ctdb_node structure. This field is only initialized and kept valid inside the recovery daemon context and not inside the main ctdb daemon. change the GET_CAPABILITIES control to return the capabilities in outdata instead of in the res return variable. When performing a recovery inside the recovery daemon, read the capabilities from all connected nodes and update the ctdb->nodes list of nodes. when building the new vnnmap after the database rebuild in recovery, do not include any nodes which lack the LMASTER capability in the new vnnmap. Unless there are no available connected node that sports the LMASTER capability in which case we let the local node (recmaster) take on the lmaster role temporarily (i.e. become a member of the vnnmap list) (This used to be ctdb commit 0f1883c69c689b28b0c04148774840b2c4081df6) 2008-05-06 09:42:59 +04:00			`CTDB_NO_MEMORY(ctdb, vnnmap->map);`
recovery daemon this program is a client to the local ctdb daemon every second it pulls all vnnmap and nodemaps from all nodes that are available and checks if a recovery is required a recovery is required if : * all nodes do NOT have an identical vnnmap and generation * all nodes do NOT have an identical nodemap * there are active nodes that are NOT in the nodemap * there are nodes in the nodemap that are NOT active During recovery, the recovery tool will also make sure that all nodes know about and have created all databases. (This used to be ctdb commit 2f2650467bac7e8954de7c17cb34f46b0bdbcd26) 2007-05-04 09:21:40 +04:00			`for (i=j=0;i<nodemap->num;i++) {`
Expand the client async framework so that it can take a callback function. This allows us to use the async framework also for controls that return outdata. Add a "capabilities" field to the ctdb_node structure. This field is only initialized and kept valid inside the recovery daemon context and not inside the main ctdb daemon. change the GET_CAPABILITIES control to return the capabilities in outdata instead of in the res return variable. When performing a recovery inside the recovery daemon, read the capabilities from all connected nodes and update the ctdb->nodes list of nodes. when building the new vnnmap after the database rebuild in recovery, do not include any nodes which lack the LMASTER capability in the new vnnmap. Unless there are no available connected node that sports the LMASTER capability in which case we let the local node (recmaster) take on the lmaster role temporarily (i.e. become a member of the vnnmap list) (This used to be ctdb commit 0f1883c69c689b28b0c04148774840b2c4081df6) 2008-05-06 09:42:59 +04:00			`if (nodemap->nodes[i].flags & NODE_FLAGS_INACTIVE) {`
			`continue;`
recovery daemon this program is a client to the local ctdb daemon every second it pulls all vnnmap and nodemaps from all nodes that are available and checks if a recovery is required a recovery is required if : * all nodes do NOT have an identical vnnmap and generation * all nodes do NOT have an identical nodemap * there are active nodes that are NOT in the nodemap * there are nodes in the nodemap that are NOT active During recovery, the recovery tool will also make sure that all nodes know about and have created all databases. (This used to be ctdb commit 2f2650467bac7e8954de7c17cb34f46b0bdbcd26) 2007-05-04 09:21:40 +04:00			`}`
Expand the client async framework so that it can take a callback function. This allows us to use the async framework also for controls that return outdata. Add a "capabilities" field to the ctdb_node structure. This field is only initialized and kept valid inside the recovery daemon context and not inside the main ctdb daemon. change the GET_CAPABILITIES control to return the capabilities in outdata instead of in the res return variable. When performing a recovery inside the recovery daemon, read the capabilities from all connected nodes and update the ctdb->nodes list of nodes. when building the new vnnmap after the database rebuild in recovery, do not include any nodes which lack the LMASTER capability in the new vnnmap. Unless there are no available connected node that sports the LMASTER capability in which case we let the local node (recmaster) take on the lmaster role temporarily (i.e. become a member of the vnnmap list) (This used to be ctdb commit 0f1883c69c689b28b0c04148774840b2c4081df6) 2008-05-06 09:42:59 +04:00			`if (!(ctdb->nodes[i]->capabilities & CTDB_CAP_LMASTER)) {`
			`/* this node cannot be an lmaster */`
			`DEBUG(DEBUG_DEBUG, ("Node %d can't be an LMASTER, skipping it\n", i));`
			`continue;`
			`}`

			`vnnmap->size++;`
fixed realloc bug Should always use type safe talloc functions when possible. In this case we were allocating bytes instead of uint32_t (This used to be ctdb commit cb14ee57dd0a589242da1ac2830bb7939df460a5) 2008-05-08 13:59:24 +04:00			`vnnmap->map = talloc_realloc(vnnmap, vnnmap->map, uint32_t, vnnmap->size);`
Expand the client async framework so that it can take a callback function. This allows us to use the async framework also for controls that return outdata. Add a "capabilities" field to the ctdb_node structure. This field is only initialized and kept valid inside the recovery daemon context and not inside the main ctdb daemon. change the GET_CAPABILITIES control to return the capabilities in outdata instead of in the res return variable. When performing a recovery inside the recovery daemon, read the capabilities from all connected nodes and update the ctdb->nodes list of nodes. when building the new vnnmap after the database rebuild in recovery, do not include any nodes which lack the LMASTER capability in the new vnnmap. Unless there are no available connected node that sports the LMASTER capability in which case we let the local node (recmaster) take on the lmaster role temporarily (i.e. become a member of the vnnmap list) (This used to be ctdb commit 0f1883c69c689b28b0c04148774840b2c4081df6) 2008-05-06 09:42:59 +04:00			`CTDB_NO_MEMORY(ctdb, vnnmap->map);`
			`vnnmap->map[j++] = nodemap->nodes[i].pnn;`

recovery daemon this program is a client to the local ctdb daemon every second it pulls all vnnmap and nodemaps from all nodes that are available and checks if a recovery is required a recovery is required if : * all nodes do NOT have an identical vnnmap and generation * all nodes do NOT have an identical nodemap * there are active nodes that are NOT in the nodemap * there are nodes in the nodemap that are NOT active During recovery, the recovery tool will also make sure that all nodes know about and have created all databases. (This used to be ctdb commit 2f2650467bac7e8954de7c17cb34f46b0bdbcd26) 2007-05-04 09:21:40 +04:00			`}`
Expand the client async framework so that it can take a callback function. This allows us to use the async framework also for controls that return outdata. Add a "capabilities" field to the ctdb_node structure. This field is only initialized and kept valid inside the recovery daemon context and not inside the main ctdb daemon. change the GET_CAPABILITIES control to return the capabilities in outdata instead of in the res return variable. When performing a recovery inside the recovery daemon, read the capabilities from all connected nodes and update the ctdb->nodes list of nodes. when building the new vnnmap after the database rebuild in recovery, do not include any nodes which lack the LMASTER capability in the new vnnmap. Unless there are no available connected node that sports the LMASTER capability in which case we let the local node (recmaster) take on the lmaster role temporarily (i.e. become a member of the vnnmap list) (This used to be ctdb commit 0f1883c69c689b28b0c04148774840b2c4081df6) 2008-05-06 09:42:59 +04:00			`if (vnnmap->size == 0) {`
			`DEBUG(DEBUG_NOTICE, ("No suitable lmasters found. Adding local node (recmaster) anyway.\n"));`
			`vnnmap->size++;`
fixed realloc bug Should always use type safe talloc functions when possible. In this case we were allocating bytes instead of uint32_t (This used to be ctdb commit cb14ee57dd0a589242da1ac2830bb7939df460a5) 2008-05-08 13:59:24 +04:00			`vnnmap->map = talloc_realloc(vnnmap, vnnmap->map, uint32_t, vnnmap->size);`
Expand the client async framework so that it can take a callback function. This allows us to use the async framework also for controls that return outdata. Add a "capabilities" field to the ctdb_node structure. This field is only initialized and kept valid inside the recovery daemon context and not inside the main ctdb daemon. change the GET_CAPABILITIES control to return the capabilities in outdata instead of in the res return variable. When performing a recovery inside the recovery daemon, read the capabilities from all connected nodes and update the ctdb->nodes list of nodes. when building the new vnnmap after the database rebuild in recovery, do not include any nodes which lack the LMASTER capability in the new vnnmap. Unless there are no available connected node that sports the LMASTER capability in which case we let the local node (recmaster) take on the lmaster role temporarily (i.e. become a member of the vnnmap list) (This used to be ctdb commit 0f1883c69c689b28b0c04148774840b2c4081df6) 2008-05-06 09:42:59 +04:00			`CTDB_NO_MEMORY(ctdb, vnnmap->map);`
			`vnnmap->map[0] = pnn;`
			`}`
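The vnnmap construction above (skip inactive nodes, skip nodes without the LMASTER capability, and fall back to the local node when nothing qualifies) can be sketched without talloc. Everything here is a hypothetical stand-in: the struct, the flag values and sketch_build_vnnmap are illustrative names, and plain realloc() replaces talloc_realloc().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical flag values for this sketch only; not the real ctdb constants. */
#define SKETCH_NODE_INACTIVE 0x1u
#define SKETCH_CAP_LMASTER   0x1u

struct sketch_node {
	uint32_t pnn;
	uint32_t flags;
	uint32_t capabilities;
};

/*
 * Collect the pnn of every active, LMASTER-capable node into a newly
 * allocated array. If no node qualifies, fall back to local_pnn alone.
 * Returns the array (caller frees) and stores its length in *size,
 * or returns NULL on allocation failure.
 */
static uint32_t *sketch_build_vnnmap(const struct sketch_node *nodes, size_t num,
				     uint32_t local_pnn, size_t *size)
{
	uint32_t *map = NULL;
	size_t n = 0;
	size_t i;

	assert(size != NULL);

	for (i = 0; i < num; i++) {
		uint32_t *tmp;

		if (nodes[i].flags & SKETCH_NODE_INACTIVE) {
			continue;
		}
		if (!(nodes[i].capabilities & SKETCH_CAP_LMASTER)) {
			continue;
		}
		tmp = realloc(map, (n + 1) * sizeof(*map));
		if (tmp == NULL) {
			free(map);
			return NULL;
		}
		map = tmp;
		map[n++] = nodes[i].pnn;
	}

	if (n == 0) {
		/* no suitable lmaster: use the local node anyway */
		map = malloc(sizeof(*map));
		if (map == NULL) {
			return NULL;
		}
		map[0] = local_pnn;
		n = 1;
	}

	*size = n;
	return map;
}
```

Sizing the allocation with sizeof(*map) serves the same purpose as the type-safe talloc_realloc() call in the code above: the array grows in whole uint32_t elements rather than in bytes.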
recovery daemon this program is a client to the local ctdb daemon every second it pulls all vnnmap and nodemaps from all nodes that are available and checks if a recovery is required a recovery is required if : * all nodes do NOT have an identical vnnmap and generation * all nodes do NOT have an identical nodemap * there are active nodes that are NOT in the nodemap * there are nodes in the nodemap that are NOT active During recovery, the recovery tool will also make sure that all nodes know about and have created all databases. (This used to be ctdb commit 2f2650467bac7e8954de7c17cb34f46b0bdbcd26) 2007-05-04 09:21:40 +04:00
break out the code to update all nodes to the new vnnmap into a helper function (This used to be ctdb commit 81d39177949b54715710907d14ddc888dc09b064) 2007-05-06 04:42:18 +04:00			`/* update to the new vnnmap on all nodes */`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`ret = update_vnnmap_on_all_nodes(ctdb, nodemap, pnn, vnnmap, mem_ctx);`
break out the code to update all nodes to the new vnnmap into a helper function (This used to be ctdb commit 81d39177949b54715710907d14ddc888dc09b064) 2007-05-06 04:42:18 +04:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to update vnnmap on all nodes\n"));`
break out the code to update all nodes to the new vnnmap into a helper function (This used to be ctdb commit 81d39177949b54715710907d14ddc888dc09b064) 2007-05-06 04:42:18 +04:00			`return -1;`
recovery daemon this program is a client to the local ctdb daemon every second it pulls all vnnmap and nodemaps from all nodes that are available and checks if a recovery is required a recovery is required if : * all nodes do NOT have an identical vnnmap and generation * all nodes do NOT have an identical nodemap * there are active nodes that are NOT in the nodemap * there are nodes in the nodemap that are NOT active During recovery, the recovery tool will also make sure that all nodes know about and have created all databases. (This used to be ctdb commit 2f2650467bac7e8954de7c17cb34f46b0bdbcd26) 2007-05-04 09:21:40 +04:00			`}`

added debug constants to allow for better mapping to syslog levels (This used to be ctdb commit 7ba8f1dde318eab03f4257e5a89fd23e7281e502) 2008-02-04 09:44:24 +03:00			`DEBUG(DEBUG_NOTICE, (__location__ " Recovery - updated vnnmap\n"));`
recovery daemon this program is a client to the local ctdb daemon every second it pulls all vnnmap and nodemaps from all nodes that are available and checks if a recovery is required a recovery is required if : * all nodes do NOT have an identical vnnmap and generation * all nodes do NOT have an identical nodemap * there are active nodes that are NOT in the nodemap * there are nodes in the nodemap that are NOT active During recovery, the recovery tool will also make sure that all nodes know about and have created all databases. (This used to be ctdb commit 2f2650467bac7e8954de7c17cb34f46b0bdbcd26) 2007-05-04 09:21:40 +04:00
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`/* update recmaster to point to us for all nodes */`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`ret = set_recovery_master(ctdb, nodemap, pnn);`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`if (ret!=0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to set recovery master\n"));`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`return -1;`
			`}`

added debug constants to allow for better mapping to syslog levels (This used to be ctdb commit 7ba8f1dde318eab03f4257e5a89fd23e7281e502) 2008-02-04 09:44:24 +03:00			`DEBUG(DEBUG_NOTICE, (__location__ " Recovery - updated recmaster\n"));`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`/*`
			`update all nodes to have the same flags that we have`
			`*/`
			`ret = update_flags_on_all_nodes(ctdb, nodemap);`
			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to update flags on all nodes\n"));`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`return -1;`
			`}`

added debug constants to allow for better mapping to syslog levels (This used to be ctdb commit 7ba8f1dde318eab03f4257e5a89fd23e7281e502) 2008-02-04 09:44:24 +03:00			`DEBUG(DEBUG_NOTICE, (__location__ " Recovery - updated flags\n"));`
- merged ctdb_store test from ronnie - added DatabaseHashSize tunable - added logging of events inside recovery (for timing) (This used to be ctdb commit 3593cdb928b91e217faf1b3c537fa28dc82cdace) 2007-06-17 17:31:44 +04:00
Fix the chicken and egg problem with ctdb/samba and a registry smb.conf This attempts to fix the problem of ctdb event scripts blocking due to attempted access to the ctdb databases during recovery. The changes are: - now only the 'shutdown' and 'startrecovery' events can be called with the databases locked in recovery. The event scripts must ensure that for these two events no database access is attempted - the recovered, takeip and releaseip events could previously be called inside a recovery. The code now ensures that this doesn't happen, delaying the events till after recovery has finished - the 50.samba event script now avoids using testparm unless it is really needed This needs extensive testing. (This used to be ctdb commit e3cdb8f2be6a44ec877efcd75c7297edb008a80b) 2008-05-14 14:57:04 +04:00			`/* disable recovery mode */`
			`ret = set_recovery_mode(ctdb, nodemap, CTDB_RECOVERY_NORMAL);`
			`if (ret!=0) {`
			`DEBUG(DEBUG_ERR, (__location__ " Unable to set recovery mode to normal on cluster\n"));`
			`return -1;`
			`}`

			`DEBUG(DEBUG_NOTICE, (__location__ " Recovery - disabled recovery mode\n"));`

added IP takeover logic for public IPs to ctdb (This used to be ctdb commit 374adb729472670f35cef41269b8719f49c0de0e) 2007-05-25 11:04:13 +04:00			`/*`
fix a bug where the public ip addresses of the cluster would not be redistributed across the cluster after a recovery was performed. Remove a bogus check inside the recovery daemon that ONLY redistributed public addresses IFF the local node had/served public addresses. This was a valid optimization long ago when we enforced that all nodes must use the same public addresses file but is invalid today where we can have different public addresses configs on all nodes and even have some nodes that do NOT use public addresses at all. (This used to be ctdb commit 5833e6b99d9afaf35dc8354df8676b9115418b23) 2008-05-09 07:41:31 +04:00			`tell nodes to takeover their public IPs`
added IP takeover logic for public IPs to ctdb (This used to be ctdb commit 374adb729472670f35cef41269b8719f49c0de0e) 2007-05-25 11:04:13 +04:00			`*/`
fix a bug where the public ip addresses of the cluster would not be redistributed across the cluster after a recovery was performed. Remove a bogus check inside the recovery daemon that ONLY redistributed public addresses IFF the local node had/served public addresses. This was a valid optimization long ago when we enforced that all nodes must use the same public addresses file but is invalid today where we can have different public addresses configs on all nodes and even have some nodes that do NOT use public addresses at all. (This used to be ctdb commit 5833e6b99d9afaf35dc8354df8676b9115418b23) 2008-05-09 07:41:31 +04:00			`rec->need_takeover_run = false;`
			`ret = ctdb_takeover_run(ctdb, nodemap);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR, (__location__ " Unable to setup public takeover addresses\n"));`
			`return -1;`
added IP takeover logic for public IPs to ctdb (This used to be ctdb commit 374adb729472670f35cef41269b8719f49c0de0e) 2007-05-25 11:04:13 +04:00			`}`
read the current debuglevel in each loop in the recovery daemon so that we pick up when they change in the parent daemon (This used to be ctdb commit 792d5471ff0c2947b6e66183925860de27f30eaf) 2008-02-18 11:38:04 +03:00			`DEBUG(DEBUG_NOTICE, (__location__ " Recovery - takeip finished\n"));`

merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`/* execute the "recovered" event script on all nodes */`
Update some debug statements. Dont say that recovery failed if the failed function was invoked from outside of recovery (This used to be ctdb commit 3038d0b74895b51af4f85f2f304508ed16d245f4) 2008-05-15 06:28:52 +04:00			`ret = run_recovered_eventscript(ctdb, nodemap, "do_recovery");`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`if (ret!=0) {`
Update some debug statements. Dont say that recovery failed if the failed function was invoked from outside of recovery (This used to be ctdb commit 3038d0b74895b51af4f85f2f304508ed16d245f4) 2008-05-15 06:28:52 +04:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to run the 'recovered' event on cluster. Recovery process failed.\n"));`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`return -1;`
			`}`

read the current debuglevel in each loop in the recovery daemon so that we pick up when they change in the parent daemon (This used to be ctdb commit 792d5471ff0c2947b6e66183925860de27f30eaf) 2008-02-18 11:38:04 +03:00			`DEBUG(DEBUG_NOTICE, (__location__ " Recovery - finished the recovered event\n"));`

send a message to clients when an IP has been released (This used to be ctdb commit 8b7ab0b00253462593d368052c2cb10a385b4e63) 2007-05-25 18:05:30 +04:00			`/* send a message to all clients telling them that the cluster`
			`has been reconfigured */`
change the structure used for node flag change messages so that we can see both the old flags as well as the new flags (so we can tell which flags changed) send the CTDB_SRVID_RECONFIGURE messages to connected nodes only, not to every node, connected or not, in the cluster. in the handler inside the recovery daemon which is invoked for node flag change messages, only do a takeover_run() and redistribute the ip addresses IF it was the disabled or the unhealthy flags that changed. Also send out the cluster reconfigured message in this case. If any of the other flags changed we dont need to do the takeover_run(0 here since that will be done during recovery. (This used to be ctdb commit 5549b2058e2c148a8ca9d419123acf3247bb8829) 2007-08-21 11:25:15 +04:00			`ctdb_send_message(ctdb, CTDB_BROADCAST_CONNECTED, CTDB_SRVID_RECONFIGURE, tdb_null);`
recovery daemon this program is a client to the local ctdb daemon every second it pulls all vnnmap and nodemaps from all nodes that are available and checks if a recovery is required a recovery is required if : * all nodes do NOT have an identical vnnmap and generation * all nodes do NOT have an identical nodemap * there are active nodes that are NOT in the nodemap * there are nodes in the nodemap that are NOT active During recovery, the recovery tool will also make sure that all nodes know about and have created all databases. (This used to be ctdb commit 2f2650467bac7e8954de7c17cb34f46b0bdbcd26) 2007-05-04 09:21:40 +04:00
added debug constants to allow for better mapping to syslog levels (This used to be ctdb commit 7ba8f1dde318eab03f4257e5a89fd23e7281e502) 2008-02-04 09:44:24 +03:00			`DEBUG(DEBUG_NOTICE, (__location__ " Recovery complete\n"));`
add a tuneable to control how long we wait after a successful recovery before we alow another recovery to be initiated (This used to be ctdb commit f3b43519423b7a73e6a2dd986bdf11203b8653cf) 2007-07-04 02:36:59 +04:00
- merge from ronnie - add a flag to check that recovery completed correctly. If not, re-trigger it in monitoring (This used to be ctdb commit d5ed941d9bab4af30d8b5f9b77bdf43d9218d69b) 2007-09-14 03:49:12 +04:00			`rec->need_recovery = false;`

add a tuneable to control how long we wait after a successful recovery before we alow another recovery to be initiated (This used to be ctdb commit f3b43519423b7a73e6a2dd986bdf11203b8653cf) 2007-07-04 02:36:59 +04:00			`/* We just finished a recovery successfully.`
			`We now wait for rerecovery_timeout before we allow`
			`another recovery to take place.`
			`*/`
added debug constants to allow for better mapping to syslog levels (This used to be ctdb commit 7ba8f1dde318eab03f4257e5a89fd23e7281e502) 2008-02-04 09:44:24 +03:00			`DEBUG(DEBUG_NOTICE, (__location__ " New recoveries suppressed for the rerecovery timeout\n"));`
add a tuneable to control how long we wait after a successful recovery before we alow another recovery to be initiated (This used to be ctdb commit f3b43519423b7a73e6a2dd986bdf11203b8653cf) 2007-07-04 02:36:59 +04:00			`ctdb_wait_timeout(ctdb, ctdb->tunable.rerecovery_timeout);`
added debug constants to allow for better mapping to syslog levels (This used to be ctdb commit 7ba8f1dde318eab03f4257e5a89fd23e7281e502) 2008-02-04 09:44:24 +03:00			`DEBUG(DEBUG_NOTICE, (__location__ " Rerecovery timeout elapsed. Recovery reactivated.\n"));`
add a tuneable to control how long we wait after a successful recovery before we alow another recovery to be initiated (This used to be ctdb commit f3b43519423b7a73e6a2dd986bdf11203b8653cf) 2007-07-04 02:36:59 +04:00
recovery daemon this program is a client to the local ctdb daemon every second it pulls all vnnmap and nodemaps from all nodes that are available and checks if a recovery is required a recovery is required if : * all nodes do NOT have an identical vnnmap and generation * all nodes do NOT have an identical nodemap * there are active nodes that are NOT in the nodemap * there are nodes in the nodemap that are NOT active During recovery, the recovery tool will also make sure that all nodes know about and have created all databases. (This used to be ctdb commit 2f2650467bac7e8954de7c17cb34f46b0bdbcd26) 2007-05-04 09:21:40 +04:00			`return 0;`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`}`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00
add a test in the function that checks whether the cluster needs recovery or not that all active nodes are in normal mode. If we discover that some node is still in recoverymode it may indicate that a previous recovery ended prematurely and thus we should start a new recovery (This used to be ctdb commit c15517872e6c98c8c425a8d47d2b348ecb0620b0) 2007-05-06 22:41:12 +04:00
choose the most connected node first (This used to be ctdb commit c7c17a79fa4f28509e34b6f635fa62517dc458c2) 2007-06-07 13:17:27 +04:00			`/*`
			`elections are won by first checking the number of connected nodes, then`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`the priority time, then the pnn`
choose the most connected node first (This used to be ctdb commit c7c17a79fa4f28509e34b6f635fa62517dc458c2) 2007-06-07 13:17:27 +04:00			`*/`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`struct election_message {`
choose the most connected node first (This used to be ctdb commit c7c17a79fa4f28509e34b6f635fa62517dc458c2) 2007-06-07 13:17:27 +04:00			`uint32_t num_connected;`
use a priority time for the election data, not just the vnn (This used to be ctdb commit a691f9c5cd77194005f0d98483da94b07a48d57d) 2007-06-07 12:37:27 +04:00			`struct timeval priority_time;`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`uint32_t pnn;`
- catch ESTALE in the recovery lock by trying a read() - priortise nodes that are unbanned and healthy in the election (This used to be ctdb commit 929feb475dfdf7283f0e99b50b179e1c91d3a39f) 2007-10-05 07:28:21 +04:00			`uint32_t node_flags;`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`};`

choose the most connected node first (This used to be ctdb commit c7c17a79fa4f28509e34b6f635fa62517dc458c2) 2007-06-07 13:17:27 +04:00			`/*`
			`form this node's election data`
			`*/`
			`static void ctdb_election_data(struct ctdb_recoverd *rec, struct election_message *em)`
			`{`
			`int ret, i;`
			`struct ctdb_node_map *nodemap;`
			`struct ctdb_context *ctdb = rec->ctdb;`

			`ZERO_STRUCTP(em);`

change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`em->pnn = rec->ctdb->pnn;`
choose the most connected node first (This used to be ctdb commit c7c17a79fa4f28509e34b6f635fa62517dc458c2) 2007-06-07 13:17:27 +04:00			`em->priority_time = rec->priority_time;`
- catch ESTALE in the recovery lock by trying a read() - priortise nodes that are unbanned and healthy in the election (This used to be ctdb commit 929feb475dfdf7283f0e99b50b179e1c91d3a39f) 2007-10-05 07:28:21 +04:00			`em->node_flags = rec->node_flags;`
choose the most connected node first (This used to be ctdb commit c7c17a79fa4f28509e34b6f635fa62517dc458c2) 2007-06-07 13:17:27 +04:00
			`ret = ctdb_ctrl_getnodemap(ctdb, CONTROL_TIMEOUT(), CTDB_CURRENT_NODE, rec, &nodemap);`
			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,(__location__ " unable to get election data\n"));`
choose the most connected node first (This used to be ctdb commit c7c17a79fa4f28509e34b6f635fa62517dc458c2) 2007-06-07 13:17:27 +04:00			`return;`
			`}`

			`for (i=0;i<nodemap->num;i++) {`
			`if (!(nodemap->nodes[i].flags & NODE_FLAGS_DISCONNECTED)) {`
			`em->num_connected++;`
			`}`
			`}`
make sure we lose all elections for recmaster role if we do not have the recmaster capability. (unless there are no other node at all available with this capability) (This used to be ctdb commit 8556e9dc897c6b9b9be0b52f391effb1f72fbd80) 2008-05-06 07:56:56 +04:00
			`/* we shouldn't try to win this election if we can't be a recmaster */`
			`if ((ctdb->capabilities & CTDB_CAP_RECMASTER) == 0) {`
			`em->num_connected = 0;`
			`em->priority_time = timeval_current();`
			`}`

choose the most connected node first (This used to be ctdb commit c7c17a79fa4f28509e34b6f635fa62517dc458c2) 2007-06-07 13:17:27 +04:00			`talloc_free(nodemap);`
			`}`
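The counting step in `ctdb_election_data()` can be illustrated in isolation. The following is a minimal sketch, not the daemon's code: `DEMO_FLAG_DISCONNECTED` and `demo_count_connected()` are hypothetical names, the flag value is an assumption, and a plain array of flag words stands in for the node map.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for NODE_FLAGS_DISCONNECTED; the real value is
   defined in the ctdb headers and is not shown in this listing. */
#define DEMO_FLAG_DISCONNECTED 0x1

/* Count the nodes whose flags do not carry the disconnected bit.
   A node that cannot act as recmaster reports 0, so that it loses the
   "most connected" comparison against any capable node. */
static uint32_t demo_count_connected(const uint32_t *flags, size_t num,
				     int can_be_recmaster)
{
	uint32_t connected = 0;
	size_t i;

	for (i = 0; i < num; i++) {
		if (!(flags[i] & DEMO_FLAG_DISCONNECTED)) {
			connected++;
		}
	}

	return can_be_recmaster ? connected : 0;
}
```

The original function also resets `priority_time` to the current time for a node without the recmaster capability; the sketch leaves that part out to stay focused on the count.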

			`/*`
			`see if the given election data wins`
			`*/`
			`static bool ctdb_election_win(struct ctdb_recoverd *rec, struct election_message *em)`
			`{`
			`struct election_message myem;`
- catch ESTALE in the recovery lock by trying a read() - priortise nodes that are unbanned and healthy in the election (This used to be ctdb commit 929feb475dfdf7283f0e99b50b179e1c91d3a39f) 2007-10-05 07:28:21 +04:00			`int cmp = 0;`
choose the most connected node first (This used to be ctdb commit c7c17a79fa4f28509e34b6f635fa62517dc458c2) 2007-06-07 13:17:27 +04:00
			`ctdb_election_data(rec, &myem);`

make sure we lose all elections for recmaster role if we do not have the recmaster capability. (unless there are no other node at all available with this capability) (This used to be ctdb commit 8556e9dc897c6b9b9be0b52f391effb1f72fbd80) 2008-05-06 07:56:56 +04:00			`/* we can't win if we don't have the recmaster capability */`
			`if ((rec->ctdb->capabilities & CTDB_CAP_RECMASTER) == 0) {`
			`return false;`
			`}`

simplify election handling make sure we read and update the flags from all remote nodes before we reach the first codepath that can call do_recovery() since during do_recovery() we need to know what the flags are. (This used to be ctdb commit e85f3806483ea420559d449e0e4d81bec996740f) 2007-10-11 00:16:36 +04:00			`/* we can't win if we are banned */`
			`if (rec->node_flags & NODE_FLAGS_BANNED) {`
merge from ronnie (This used to be ctdb commit d18712caba11855010be52f90bac656683076676) 2007-10-15 08:17:49 +04:00			`return false;`
simplify election handling make sure we read and update the flags from all remote nodes before we reach the first codepath that can call do_recovery() since during do_recovery() we need to know what the flags are. (This used to be ctdb commit e85f3806483ea420559d449e0e4d81bec996740f) 2007-10-11 00:16:36 +04:00			`}`
- catch ESTALE in the recovery lock by trying a read() - priortise nodes that are unbanned and healthy in the election (This used to be ctdb commit 929feb475dfdf7283f0e99b50b179e1c91d3a39f) 2007-10-05 07:28:21 +04:00
simplify election handling make sure we read and update the flags from all remote nodes before we reach the first codepath that can call do_recovery() since during do_recovery() we need to know what the flags are. (This used to be ctdb commit e85f3806483ea420559d449e0e4d81bec996740f) 2007-10-11 00:16:36 +04:00			`/* we will automatically win if the other node is banned */`
			`if (em->node_flags & NODE_FLAGS_BANNED) {`
merge from ronnie (This used to be ctdb commit d18712caba11855010be52f90bac656683076676) 2007-10-15 08:17:49 +04:00			`return true;`
- catch ESTALE in the recovery lock by trying a read() - priortise nodes that are unbanned and healthy in the election (This used to be ctdb commit 929feb475dfdf7283f0e99b50b179e1c91d3a39f) 2007-10-05 07:28:21 +04:00			`}`

choose the most connected node first (This used to be ctdb commit c7c17a79fa4f28509e34b6f635fa62517dc458c2) 2007-06-07 13:17:27 +04:00			`/* try to use the most connected node */`
- catch ESTALE in the recovery lock by trying a read() - priortise nodes that are unbanned and healthy in the election (This used to be ctdb commit 929feb475dfdf7283f0e99b50b179e1c91d3a39f) 2007-10-05 07:28:21 +04:00			`if (cmp == 0) {`
			`cmp = (int)myem.num_connected - (int)em->num_connected;`
			`}`
choose the most connected node first (This used to be ctdb commit c7c17a79fa4f28509e34b6f635fa62517dc458c2) 2007-06-07 13:17:27 +04:00
			`/* then the longest running node */`
			`if (cmp == 0) {`
later times are a lower priority, not a higher priority (This used to be ctdb commit e96424e7d366df29767c4eeaccdcc0cc975cb8ae) 2007-06-07 13:21:55 +04:00			`cmp = timeval_compare(&em->priority_time, &myem.priority_time);`
choose the most connected node first (This used to be ctdb commit c7c17a79fa4f28509e34b6f635fa62517dc458c2) 2007-06-07 13:17:27 +04:00			`}`

			`if (cmp == 0) {`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`cmp = (int)myem.pnn - (int)em->pnn;`
choose the most connected node first (This used to be ctdb commit c7c17a79fa4f28509e34b6f635fa62517dc458c2) 2007-06-07 13:17:27 +04:00			`}`

			`return cmp > 0;`
			`}`
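The ordering that `ctdb_election_win()` applies (capability and ban checks first, then connected-node count, then priority time, then pnn) can be tried out on its own. This is a sketch under stated assumptions rather than the daemon's implementation: every `sketch_` name is hypothetical, the banned flag value is made up, and the capability check is passed in as a plain boolean instead of being read from the ctdb context.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/time.h>

/* Hypothetical stand-in for NODE_FLAGS_BANNED (real value not shown here). */
#define SKETCH_FLAG_BANNED 0x1

/* Same fields as struct election_message above. */
struct sketch_election_message {
	uint32_t num_connected;
	struct timeval priority_time;
	uint32_t pnn;
	uint32_t node_flags;
};

/* Three-way compare: negative if a is earlier than b, positive if later. */
static int sketch_timeval_compare(const struct timeval *a,
				  const struct timeval *b)
{
	if (a->tv_sec != b->tv_sec) {
		return a->tv_sec < b->tv_sec ? -1 : 1;
	}
	if (a->tv_usec != b->tv_usec) {
		return a->tv_usec < b->tv_usec ? -1 : 1;
	}
	return 0;
}

/* Returns true if "mine" beats "theirs". */
static bool sketch_election_win(const struct sketch_election_message *mine,
				const struct sketch_election_message *theirs,
				bool can_be_recmaster)
{
	int cmp;

	/* hard disqualifiers come before any comparison */
	if (!can_be_recmaster) {
		return false;
	}
	if (mine->node_flags & SKETCH_FLAG_BANNED) {
		return false;
	}
	if (theirs->node_flags & SKETCH_FLAG_BANNED) {
		return true;
	}

	/* most connected node first */
	cmp = (int)mine->num_connected - (int)theirs->num_connected;

	/* then the longest running node: an earlier priority time wins,
	   hence the swapped argument order */
	if (cmp == 0) {
		cmp = sketch_timeval_compare(&theirs->priority_time,
					     &mine->priority_time);
	}

	/* finally the pnn breaks any remaining tie */
	if (cmp == 0) {
		cmp = (int)mine->pnn - (int)theirs->pnn;
	}

	return cmp > 0;
}
```

Because each later criterion is consulted only when the earlier ones tie, two nodes that evaluate the same pair of messages reach opposite verdicts, except when both are banned or both messages are identical.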
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00
			`/*`
			`send out an election request`
			`*/`
make election handling much more scalable (This used to be ctdb commit 05938d462b92bd9ecb8e35f53651bded47c48675) 2007-11-13 02:27:44 +03:00			`static int send_election_request(struct ctdb_recoverd *rec, uint32_t pnn)`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`{`
			`int ret;`
			`TDB_DATA election_data;`
			`struct election_message emsg;`
			`uint64_t srvid;`
use a priority time for the election data, not just the vnn (This used to be ctdb commit a691f9c5cd77194005f0d98483da94b07a48d57d) 2007-06-07 12:37:27 +04:00			`struct ctdb_context *ctdb = rec->ctdb;`
simplify election handling make sure we read and update the flags from all remote nodes before we reach the first codepath that can call do_recovery() since during do_recovery() we need to know what the flags are. (This used to be ctdb commit e85f3806483ea420559d449e0e4d81bec996740f) 2007-10-11 00:16:36 +04:00
added health monitoring logic to ctdb, so a node loses its public IP address if one of the sybsystem event scripts reports a problem (This used to be ctdb commit c7a089256d86cec21097453bce5acbccee87413f) 2007-06-06 04:25:46 +04:00			`srvid = CTDB_SRVID_RECOVERY;`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00
choose the most connected node first (This used to be ctdb commit c7c17a79fa4f28509e34b6f635fa62517dc458c2) 2007-06-07 13:17:27 +04:00			`ctdb_election_data(rec, &emsg);`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00
			`election_data.dsize = sizeof(struct election_message);`
			`election_data.dptr = (unsigned char *)&emsg;`


			`/* first we assume we will win the election and set`
			`recoverymaster to be ourself on the current node`
			`*/`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`ret = ctdb_ctrl_setrecmaster(ctdb, CONTROL_TIMEOUT(), pnn, pnn);`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " failed to send recmaster election request\n"));`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`return -1;`
			`}`


			`/* send an election message to all active nodes */`
			`ctdb_send_message(ctdb, CTDB_BROADCAST_ALL, srvid, election_data);`

			`return 0;`
			`}`

unban all nodes when we release recmaster role or when we win an election (This used to be ctdb commit 48fb7483b3fe391e2d0b78718af29f69a641525e) 2007-06-09 14:11:51 +04:00			`/*`
			`this function will unban all nodes in the cluster`
			`*/`
			`static void unban_all_nodes(struct ctdb_context *ctdb)`
			`{`
			`int ret, i;`
			`struct ctdb_node_map *nodemap;`
			`TALLOC_CTX *tmp_ctx = talloc_new(ctdb);`

			`ret = ctdb_ctrl_getnodemap(ctdb, CONTROL_TIMEOUT(), CTDB_CURRENT_NODE, tmp_ctx, &nodemap);`
			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,(__location__ " failed to get nodemap to unban all nodes\n"));`
unban all nodes when we release recmaster role or when we win an election (This used to be ctdb commit 48fb7483b3fe391e2d0b78718af29f69a641525e) 2007-06-09 14:11:51 +04:00			`return;`
			`}`

			`for (i=0;i<nodemap->num;i++) {`
			`if ( (!(nodemap->nodes[i].flags & NODE_FLAGS_DISCONNECTED))`
			`&& (nodemap->nodes[i].flags & NODE_FLAGS_BANNED) ) {`
change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`ctdb_ctrl_modflags(ctdb, CONTROL_TIMEOUT(), nodemap->nodes[i].pnn, 0, NODE_FLAGS_BANNED);`
unban all nodes when we release recmaster role or when we win an election (This used to be ctdb commit 48fb7483b3fe391e2d0b78718af29f69a641525e) 2007-06-09 14:11:51 +04:00			`}`
			`}`

			`talloc_free(tmp_ctx);`
			`}`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00
make election handling much more scalable (This used to be ctdb commit 05938d462b92bd9ecb8e35f53651bded47c48675) 2007-11-13 02:27:44 +03:00
			`/*`
			`we think we are winning the election - send a broadcast election request`
			`*/`
			`static void election_send_request(struct event_context *ev, struct timed_event *te, struct timeval t, void *p)`
			`{`
			`struct ctdb_recoverd *rec = talloc_get_type(p, struct ctdb_recoverd);`
			`int ret;`

			`ret = send_election_request(rec, ctdb_get_pnn(rec->ctdb));`
			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,("Failed to send election request!\n"));`
make election handling much more scalable (This used to be ctdb commit 05938d462b92bd9ecb8e35f53651bded47c48675) 2007-11-13 02:27:44 +03:00			`}`

			`talloc_free(rec->send_election_te);`
			`rec->send_election_te = NULL;`
			`}`

add improvements to tracking memory usage in ctdbd adn the recovery daemon and a ctdb command to pull the talloc memory map from a recovery daemon ctdb rddumpmemory (This used to be ctdb commit d23950be7406cf288f48b660c0f57a9b8d7bdd05) 2008-04-01 08:34:54 +04:00			`/*`
			`handler for memory dumps`
			`*/`
			`static void mem_dump_handler(struct ctdb_context *ctdb, uint64_t srvid,`
			`TDB_DATA data, void *private_data)`
			`{`
			`TALLOC_CTX *tmp_ctx = talloc_new(ctdb);`
			`TDB_DATA *dump;`
			`int ret;`
			`struct rd_memdump_reply *rd;`

			`if (data.dsize != sizeof(struct rd_memdump_reply)) {`
			`DEBUG(DEBUG_ERR, (__location__ " Wrong size of return address.\n"));`
			`talloc_free(tmp_ctx);`
			`return;`
			`}`
			`rd = (struct rd_memdump_reply *)data.dptr;`

			`dump = talloc_zero(tmp_ctx, TDB_DATA);`
			`if (dump == NULL) {`
			`DEBUG(DEBUG_ERR, (__location__ " Failed to allocate memory for memdump\n"));`
			`talloc_free(tmp_ctx);`
			`return;`
			`}`
			`ret = ctdb_dump_memory(ctdb, dump);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR, (__location__ " ctdb_dump_memory() failed\n"));`
			`talloc_free(tmp_ctx);`
			`return;`
			`}`

			`DEBUG(DEBUG_ERR, ("recovery master memory dump\n"));`

			`ret = ctdb_send_message(ctdb, rd->pnn, rd->srvid, *dump);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR,("Failed to send rd memdump reply message\n"));`
			`talloc_free(tmp_ctx);`
			`return;`
			`}`

			`talloc_free(tmp_ctx);`
			`}`

recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`/*`
			`handler for recovery master elections`
			`*/`
			`static void election_handler(struct ctdb_context *ctdb, uint64_t srvid,`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`TDB_DATA data, void *private_data)`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`{`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`struct ctdb_recoverd *rec = talloc_get_type(private_data, struct ctdb_recoverd);`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`int ret;`
			`struct election_message *em = (struct election_message *)data.dptr;`
			`TALLOC_CTX *mem_ctx;`

make election handling much more scalable (This used to be ctdb commit 05938d462b92bd9ecb8e35f53651bded47c48675) 2007-11-13 02:27:44 +03:00			`/* we got an election packet - update the timeout for the election */`
			`talloc_free(rec->election_timeout);`
			`rec->election_timeout = event_add_timed(ctdb->ev, ctdb,`
			`timeval_current_ofs(ctdb->tunable.election_timeout, 0),`
			`ctdb_election_timeout, rec);`

setup the random number generator a bit better (This used to be ctdb commit 708585eb0ed31b0df6543a1d7a20b82e751877c2) 2007-05-10 07:10:23 +04:00			`mem_ctx = talloc_new(ctdb);`
choose the most connected node first (This used to be ctdb commit c7c17a79fa4f28509e34b6f635fa62517dc458c2) 2007-06-07 13:17:27 +04:00
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`/* someone called an election. check their election data`
			`and if we disagree and we would rather be the elected node,`
			`send a new election message to all other nodes`
			`*/`
choose the most connected node first (This used to be ctdb commit c7c17a79fa4f28509e34b6f635fa62517dc458c2) 2007-06-07 13:17:27 +04:00			`if (ctdb_election_win(rec, em)) {`
make election handling much more scalable (This used to be ctdb commit 05938d462b92bd9ecb8e35f53651bded47c48675) 2007-11-13 02:27:44 +03:00			`if (!rec->send_election_te) {`
			`rec->send_election_te = event_add_timed(ctdb->ev, rec,`
			`timeval_current_ofs(0, 500000),`
			`election_send_request, rec);`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`}`
			`talloc_free(mem_ctx);`
should be sufficient to unban nodes when we unbecome recmaster (This used to be ctdb commit 8a6c4e675b4b877a9d0a7a3701973573ff0b71e8) 2007-06-09 14:13:25 +04:00			`/*unban_all_nodes(ctdb);*/`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`return;`
			`}`
make election handling much more scalable (This used to be ctdb commit 05938d462b92bd9ecb8e35f53651bded47c48675) 2007-11-13 02:27:44 +03:00
			`/* we didn't win */`
			`talloc_free(rec->send_election_te);`
			`rec->send_election_te = NULL;`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00
- startup frozen, and do an initial recovery - fixed a bug in traverse - get a lock on the node list file in the recmaster recovery daemon (This used to be ctdb commit 162a5647535ad1cb3e8e5d4042a2784365fb1913) 2007-05-23 08:35:19 +04:00			`/* release the recmaster lock */`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`if (em->pnn != ctdb->pnn &&`
fixed a race condition in the handling of the recovery lock (This used to be ctdb commit 3b98c5ad23662259b0eed399ab0c8037cf9b2b0b) 2007-06-03 04:29:14 +04:00			`ctdb->recovery_lock_fd != -1) {`
- make specification of a recovery lock file compulsory - die if someone other than the recmaster can get the recovery lock (This used to be ctdb commit a827d0d0e430ca8ad5d521367e45097185492869) 2007-06-02 05:36:42 +04:00			`close(ctdb->recovery_lock_fd);`
			`ctdb->recovery_lock_fd = -1;`
unban all nodes when we release recmaster role or when we win an election (This used to be ctdb commit 48fb7483b3fe391e2d0b78718af29f69a641525e) 2007-06-09 14:11:51 +04:00			`unban_all_nodes(ctdb);`
- startup frozen, and do an initial recovery - fixed a bug in traverse - get a lock on the node list file in the recmaster recovery daemon (This used to be ctdb commit 162a5647535ad1cb3e8e5d4042a2784365fb1913) 2007-05-23 08:35:19 +04:00			`}`

recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`/* ok, let that guy become recmaster then */`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`ret = ctdb_ctrl_setrecmaster(ctdb, CONTROL_TIMEOUT(), ctdb_get_pnn(ctdb), em->pnn);`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " failed to send recmaster election request\n"));`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`talloc_free(mem_ctx);`
			`return;`
			`}`

added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00			`/* release any bans */`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`rec->last_culprit = (uint32_t)-1;`
			`talloc_free(rec->banned_nodes);`
added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00			`rec->banned_nodes = talloc_zero_array(rec, struct ban_state *, ctdb->num_nodes);`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`CTDB_NO_MEMORY_FATAL(ctdb, rec->banned_nodes);`

recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`talloc_free(mem_ctx);`
			`return;`
			`}`
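
The resend logic in election_handler can be read as a simple coalescing rule: while this node would win, at most one resend timer is pending, and losing cancels it. Below is a minimal sketch of that rule on its own, without the event loop. The names `ex_election` and `ex_on_election_packet` are hypothetical and do not exist in ctdb; the boolean `we_would_win` stands in for the result of ctdb_election_win().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant part of struct ctdb_recoverd. */
struct ex_election {
	bool resend_pending;   /* models rec->send_election_te != NULL */
	unsigned scheduled;    /* how many resend timers were created */
};

/*
 * Process one incoming election packet.
 * Returns true only when a new resend timer was scheduled.
 */
static bool ex_on_election_packet(struct ex_election *e, bool we_would_win)
{
	assert(e != NULL);

	if (we_would_win) {
		/* coalesce: keep a single pending resend, however many
		   packets arrive before it fires */
		if (!e->resend_pending) {
			e->resend_pending = true;
			e->scheduled++;
			return true;
		}
		return false;
	}

	/* we lost: cancel any pending resend */
	e->resend_pending = false;
	return false;
}
```

With this shape, a burst of packets from many nodes results in one outgoing election message per timer interval rather than one per packet received.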


implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`/*`
			`force the start of the election process`
			`*/`
add a new tunable : reclockpingperiod once every such interval : * the recovery master on each node will uppdate the "connected" count in the reclock count file (ctdb getreclock) * if the node thinks it is a recovery master but it detects another node that is DISCONNECTED but which still holds a lock to the reclock count file this may mean that we have a split cluster. if that other node that is DISCONNECTED but still holds the lock on hte reclock pnn count file, is MORE connected than the local node, yield the recmaster role and let the other half of the lcuster take over this add a second, last chance mechanism to detect split clusters. IF the cluster is split but GPFS is not yet split, this mechanism makes the largest half of the cluster become the active half. (This used to be ctdb commit 07af425f444531942cce8abff112c1524228d287) 2008-03-03 01:19:30 +03:00			`static void force_election(struct ctdb_recoverd *rec, uint32_t pnn,`
use a priority time for the election data, not just the vnn (This used to be ctdb commit a691f9c5cd77194005f0d98483da94b07a48d57d) 2007-06-07 12:37:27 +04:00			`struct ctdb_node_map *nodemap)`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`{`
			`int ret;`
use a priority time for the election data, not just the vnn (This used to be ctdb commit a691f9c5cd77194005f0d98483da94b07a48d57d) 2007-06-07 12:37:27 +04:00			`struct ctdb_context *ctdb = rec->ctdb;`
when starting a new election, also force all nodes into recovery mode so there is no internode traffic to interfere with our election (This used to be ctdb commit ccfb67a076c72a0e7f2b6dc5fce9c19f652ba2ad) 2007-05-10 03:48:14 +04:00
			`/* set all nodes to recovery mode to stop all internode traffic */`
			`ret = set_recovery_mode(ctdb, nodemap, CTDB_RECOVERY_ACTIVE);`
			`if (ret!=0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to set recovery mode to active on cluster\n"));`
when starting a new election, also force all nodes into recovery mode so there is no internode traffic to interfere with our election (This used to be ctdb commit ccfb67a076c72a0e7f2b6dc5fce9c19f652ba2ad) 2007-05-10 03:48:14 +04:00			`return;`
			`}`
make election handling much more scalable (This used to be ctdb commit 05938d462b92bd9ecb8e35f53651bded47c48675) 2007-11-13 02:27:44 +03:00
			`talloc_free(rec->election_timeout);`
			`rec->election_timeout = event_add_timed(ctdb->ev, ctdb,`
			`timeval_current_ofs(ctdb->tunable.election_timeout, 0),`
			`ctdb_election_timeout, rec);`

			`ret = send_election_request(rec, pnn);`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`if (ret!=0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " failed to initiate recmaster election\n"));`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`return;`
			`}`

moved system specific ip code to system.c (This used to be ctdb commit 9de9e4ccda9665108baac12a8716b189d26340b1) 2007-05-26 08:01:08 +04:00			`/* wait for a few seconds to collect all responses */`
make election handling much more scalable (This used to be ctdb commit 05938d462b92bd9ecb8e35f53651bded47c48675) 2007-11-13 02:27:44 +03:00			`ctdb_wait_election(rec);`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`}`



			`/*`
			`handler for when a node changes its flags`
			`*/`
			`static void monitor_handler(struct ctdb_context *ctdb, uint64_t srvid,`
			`TDB_DATA data, void *private_data)`
			`{`
			`int ret;`
			`struct ctdb_node_flag_change *c = (struct ctdb_node_flag_change *)data.dptr;`
			`struct ctdb_node_map *nodemap=NULL;`
			`TALLOC_CTX *tmp_ctx;`
change the structure used for node flag change messages so that we can see both the old flags as well as the new flags (so we can tell which flags changed) send the CTDB_SRVID_RECONFIGURE messages to connected nodes only, not to every node, connected or not, in the cluster. in the handler inside the recovery daemon which is invoked for node flag change messages, only do a takeover_run() and redistribute the ip addresses IF it was the disabled or the unhealthy flags that changed. Also send out the cluster reconfigured message in this case. If any of the other flags changed we dont need to do the takeover_run(0 here since that will be done during recovery. (This used to be ctdb commit 5549b2058e2c148a8ca9d419123acf3247bb8829) 2007-08-21 11:25:15 +04:00			`uint32_t changed_flags;`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`int i;`
prevent recursion in the calling of ctdb_takeover_run (This used to be ctdb commit 0fbdeb7c91b965d9bc5ecc7b24e31070378d8f1d) 2007-09-13 08:08:18 +04:00			`struct ctdb_recoverd *rec = talloc_get_type(private_data, struct ctdb_recoverd);`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00
			`if (data.dsize != sizeof(*c)) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,(__location__ " Invalid data in ctdb_node_flag_change\n"));`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`return;`
			`}`

			`tmp_ctx = talloc_new(ctdb);`
			`CTDB_NO_MEMORY_VOID(ctdb, tmp_ctx);`

			`ret = ctdb_ctrl_getnodemap(ctdb, CONTROL_TIMEOUT(), CTDB_CURRENT_NODE, tmp_ctx, &nodemap);`
fixed segv on failed ctdb_ctrl_getnodemap (This used to be ctdb commit 5daf9a72f0e60a9af7cf32ae6d759be7d94857ec) 2007-12-27 02:07:01 +03:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,(__location__ " ctdb_ctrl_getnodemap failed in monitor_handler\n"));`
fixed segv on failed ctdb_ctrl_getnodemap (This used to be ctdb commit 5daf9a72f0e60a9af7cf32ae6d759be7d94857ec) 2007-12-27 02:07:01 +03:00			`talloc_free(tmp_ctx);`
			`return;`
			`}`

implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00
			`for (i=0;i<nodemap->num;i++) {`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`if (nodemap->nodes[i].pnn == c->pnn) break;`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`}`

			`if (i == nodemap->num) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_CRIT,(__location__ " Flag change for non-existent node %u\n", c->pnn));`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`talloc_free(tmp_ctx);`
			`return;`
			`}`

change the structure used for node flag change messages so that we can see both the old flags as well as the new flags (so we can tell which flags changed) send the CTDB_SRVID_RECONFIGURE messages to connected nodes only, not to every node, connected or not, in the cluster. in the handler inside the recovery daemon which is invoked for node flag change messages, only do a takeover_run() and redistribute the ip addresses IF it was the disabled or the unhealthy flags that changed. Also send out the cluster reconfigured message in this case. If any of the other flags changed we dont need to do the takeover_run(0 here since that will be done during recovery. (This used to be ctdb commit 5549b2058e2c148a8ca9d419123acf3247bb8829) 2007-08-21 11:25:15 +04:00			`changed_flags = c->old_flags ^ c->new_flags;`

when a remote node has sent us a message to update the flags for a node, dont let those messages modify the DISCONNECTED flag. the DISCONNECTED flag must be managed locally since it describes whether the local node can communicate with the remote node or not (This used to be ctdb commit 5650673205d335a32d4f27f66847ea66752a00f0) 2007-07-09 07:21:17 +04:00			`/* Don't let messages from remote nodes change the DISCONNECTED flag.`
			`This flag is handled locally based on whether the local node`
			`can communicate with the node or not.`
			`*/`
change the structure used for node flag change messages so that we can see both the old flags as well as the new flags (so we can tell which flags changed) send the CTDB_SRVID_RECONFIGURE messages to connected nodes only, not to every node, connected or not, in the cluster. in the handler inside the recovery daemon which is invoked for node flag change messages, only do a takeover_run() and redistribute the ip addresses IF it was the disabled or the unhealthy flags that changed. Also send out the cluster reconfigured message in this case. If any of the other flags changed we dont need to do the takeover_run(0 here since that will be done during recovery. (This used to be ctdb commit 5549b2058e2c148a8ca9d419123acf3247bb8829) 2007-08-21 11:25:15 +04:00			`c->new_flags &= ~NODE_FLAGS_DISCONNECTED;`
nicer handling of DISCONNECTED flag when we update the node flags from a remote message (This used to be ctdb commit 9a50ad22be61a09761ffda89de91ef3221917c84) 2007-07-09 11:40:15 +04:00			`if (nodemap->nodes[i].flags&NODE_FLAGS_DISCONNECTED) {`
change the structure used for node flag change messages so that we can see both the old flags as well as the new flags (so we can tell which flags changed) send the CTDB_SRVID_RECONFIGURE messages to connected nodes only, not to every node, connected or not, in the cluster. in the handler inside the recovery daemon which is invoked for node flag change messages, only do a takeover_run() and redistribute the ip addresses IF it was the disabled or the unhealthy flags that changed. Also send out the cluster reconfigured message in this case. If any of the other flags changed we dont need to do the takeover_run(0 here since that will be done during recovery. (This used to be ctdb commit 5549b2058e2c148a8ca9d419123acf3247bb8829) 2007-08-21 11:25:15 +04:00			`c->new_flags |= NODE_FLAGS_DISCONNECTED;`
nicer handling of DISCONNECTED flag when we update the node flags from a remote message (This used to be ctdb commit 9a50ad22be61a09761ffda89de91ef3221917c84) 2007-07-09 11:40:15 +04:00			`}`
when a remote node has sent us a message to update the flags for a node, dont let those messages modify the DISCONNECTED flag. the DISCONNECTED flag must be managed locally since it describes whether the local node can communicate with the remote node or not (This used to be ctdb commit 5650673205d335a32d4f27f66847ea66752a00f0) 2007-07-09 07:21:17 +04:00
change the structure used for node flag change messages so that we can see both the old flags as well as the new flags (so we can tell which flags changed) send the CTDB_SRVID_RECONFIGURE messages to connected nodes only, not to every node, connected or not, in the cluster. in the handler inside the recovery daemon which is invoked for node flag change messages, only do a takeover_run() and redistribute the ip addresses IF it was the disabled or the unhealthy flags that changed. Also send out the cluster reconfigured message in this case. If any of the other flags changed we dont need to do the takeover_run(0 here since that will be done during recovery. (This used to be ctdb commit 5549b2058e2c148a8ca9d419123acf3247bb8829) 2007-08-21 11:25:15 +04:00			`if (nodemap->nodes[i].flags != c->new_flags) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_NOTICE,("Node %u has changed flags - now 0x%x was 0x%x\n", c->pnn, c->new_flags, c->old_flags));`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`}`

change the structure used for node flag change messages so that we can see both the old flags as well as the new flags (so we can tell which flags changed) send the CTDB_SRVID_RECONFIGURE messages to connected nodes only, not to every node, connected or not, in the cluster. in the handler inside the recovery daemon which is invoked for node flag change messages, only do a takeover_run() and redistribute the ip addresses IF it was the disabled or the unhealthy flags that changed. Also send out the cluster reconfigured message in this case. If any of the other flags changed we dont need to do the takeover_run(0 here since that will be done during recovery. (This used to be ctdb commit 5549b2058e2c148a8ca9d419123acf3247bb8829) 2007-08-21 11:25:15 +04:00			`nodemap->nodes[i].flags = c->new_flags;`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`ret = ctdb_ctrl_getrecmaster(ctdb, tmp_ctx, CONTROL_TIMEOUT(),`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`CTDB_CURRENT_NODE, &ctdb->recovery_master);`

			`if (ret == 0) {`
hang the ctdb_req_control structure off the ctdb_client_control_state struct so that if we timeout a control we can print debug info such as what opcode failed and to which node we dont need the *status parameter to ctdb_client_control_state create async versions of the getrecmaster control pass a memory context to getrecmaster (This used to be ctdb commit 558b680c82f830fba82c283c78c2de8a0b150b75) 2007-08-23 07:00:10 +04:00			`ret = ctdb_ctrl_getrecmode(ctdb, tmp_ctx, CONTROL_TIMEOUT(),`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`CTDB_CURRENT_NODE, &ctdb->recovery_mode);`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`}`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00
			`if (ret == 0 &&`
change ctdb->vnn to ctdb->pnn (This used to be ctdb commit 8c776e5707e503ec6586aae39ac6b3ea5a2fd2bc) 2007-09-04 04:06:36 +04:00			`ctdb->recovery_master == ctdb->pnn &&`
remove some unnessecary tests if ->vnn is null or not (This used to be ctdb commit f0169ac8166a19d65ce254496e21d095aed87c2f) 2008-05-15 07:28:19 +04:00			`ctdb->recovery_mode == CTDB_RECOVERY_NORMAL) {`
change the structure used for node flag change messages so that we can see both the old flags as well as the new flags (so we can tell which flags changed) send the CTDB_SRVID_RECONFIGURE messages to connected nodes only, not to every node, connected or not, in the cluster. in the handler inside the recovery daemon which is invoked for node flag change messages, only do a takeover_run() and redistribute the ip addresses IF it was the disabled or the unhealthy flags that changed. Also send out the cluster reconfigured message in this case. If any of the other flags changed we dont need to do the takeover_run(0 here since that will be done during recovery. (This used to be ctdb commit 5549b2058e2c148a8ca9d419123acf3247bb8829) 2007-08-21 11:25:15 +04:00			`/* Only do the takeover run if the perm disabled or unhealthy`
			`flags changed since these will cause an ip failover but not`
			`a recovery.`
			`If the node became disconnected or banned this will also`
			`lead to an ip address failover but that is handled`
			`during recovery`
			`*/`
			`if (changed_flags & NODE_FLAGS_DISABLED) {`
prevent recursion in the calling of ctdb_takeover_run (This used to be ctdb commit 0fbdeb7c91b965d9bc5ecc7b24e31070378d8f1d) 2007-09-13 08:08:18 +04:00			`rec->need_takeover_run = true;`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`}`
			`}`

			`talloc_free(tmp_ctx);`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`}`
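
The flag handling in monitor_handler combines two bit operations: an XOR of old and new flags to find which bits changed, and a mask that keeps the locally known DISCONNECTED bit regardless of what the remote message says. The following is a small sketch of both steps in isolation. The `EX_FLAG_*` values and the helper names are assumptions made for illustration; the real definitions live in ctdb's headers and may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values (assumed, not taken from ctdb's headers). */
#define EX_FLAG_DISCONNECTED  0x00000001u
#define EX_FLAG_UNHEALTHY     0x00000002u
#define EX_FLAG_PERM_DISABLED 0x00000004u
#define EX_FLAG_BANNED        0x00000008u
#define EX_FLAG_DISABLED      (EX_FLAG_UNHEALTHY | EX_FLAG_PERM_DISABLED)

/*
 * Take the flags reported by a remote node, but let the local view
 * decide the DISCONNECTED bit.
 */
static uint32_t ex_merge_flags(uint32_t local_flags, uint32_t remote_new_flags)
{
	uint32_t merged = remote_new_flags & ~EX_FLAG_DISCONNECTED;

	merged |= local_flags & EX_FLAG_DISCONNECTED;
	return merged;
}

/*
 * True when the difference between old and new flags touches the
 * unhealthy or permanently-disabled bits, which call for an IP
 * takeover run but not a full recovery.
 */
static bool ex_needs_takeover_run(uint32_t old_flags, uint32_t new_flags)
{
	uint32_t changed = old_flags ^ new_flags;

	return (changed & EX_FLAG_DISABLED) != 0;
}
```

A change limited to the banned or disconnected bits returns false here, matching the comment in the handler that those cases are dealt with during recovery.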
add a test in the function that checks whether the cluster needs recovery or not that all active nodes are in normal mode. If we discover that some node is still in recoverymode it may indicate that a previous recovery ended prematurely and thus we should start a new recovery (This used to be ctdb commit c15517872e6c98c8c425a8d47d2b348ecb0620b0) 2007-05-06 22:41:12 +04:00
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00
break checking that the recoverymode on all nodes are ok out into its own function (This used to be ctdb commit 813cf9a252af96da24122b80f24aabeed2911939) 2007-08-23 07:48:39 +04:00
change the monitoring of recmode in the recovery daemon to use a fully async eventdriven api for controls (This used to be ctdb commit 8d0e43428c507967a0d96e6a4c5c821ac269c546) 2007-08-27 03:40:10 +04:00			`struct verify_recmode_normal_data {`
			`uint32_t count;`
			`enum monitor_result status;`
			`};`

			`static void verify_recmode_normal_callback(struct ctdb_client_control_state *state)`
			`{`
change async.private to async.private_data since private is a reserved work in c++ (This used to be ctdb commit 79eb28f6cd5dcc30b04966d202a050eaf98a2552) 2007-09-26 08:25:32 +04:00			`struct verify_recmode_normal_data *rmdata = talloc_get_type(state->async.private_data, struct verify_recmode_normal_data);`
change the monitoring of recmode in the recovery daemon to use a fully async eventdriven api for controls (This used to be ctdb commit 8d0e43428c507967a0d96e6a4c5c821ac269c546) 2007-08-27 03:40:10 +04:00

			`/* one more node has responded with recmode data*/`
			`rmdata->count--;`

			`/* if we failed to get the recmode, then return an error and let`
			`the main loop try again.`
			`*/`
			`if (state->state != CTDB_CONTROL_DONE) {`
			`if (rmdata->status == MONITOR_OK) {`
			`rmdata->status = MONITOR_FAILED;`
			`}`
			`return;`
			`}`

			`/* if we got a response, then the recmode will be stored in the`
			`status field`
			`*/`
			`if (state->status != CTDB_RECOVERY_NORMAL) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_NOTICE, (__location__ " Node:%u was in recovery mode. Restart recovery process\n", state->c->hdr.destnode));`
change the monitoring of recmode in the recovery daemon to use a fully async eventdriven api for controls (This used to be ctdb commit 8d0e43428c507967a0d96e6a4c5c821ac269c546) 2007-08-27 03:40:10 +04:00			`rmdata->status = MONITOR_RECOVERY_NEEDED;`
			`}`

			`return;`
			`}`


			`/* verify that all nodes are in normal recovery mode */`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`static enum monitor_result verify_recmode(struct ctdb_context *ctdb, struct ctdb_node_map *nodemap)`
break checking that the recoverymode on all nodes are ok out into its own function (This used to be ctdb commit 813cf9a252af96da24122b80f24aabeed2911939) 2007-08-23 07:48:39 +04:00			`{`
change the monitoring of recmode in the recovery daemon to use a fully async eventdriven api for controls (This used to be ctdb commit 8d0e43428c507967a0d96e6a4c5c821ac269c546) 2007-08-27 03:40:10 +04:00			`struct verify_recmode_normal_data *rmdata;`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`TALLOC_CTX *mem_ctx = talloc_new(ctdb);`
change the monitoring of recmode in the recovery daemon to use a fully async eventdriven api for controls (This used to be ctdb commit 8d0e43428c507967a0d96e6a4c5c821ac269c546) 2007-08-27 03:40:10 +04:00			`struct ctdb_client_control_state *state;`
			`enum monitor_result status;`
			`int j;`
break checking that the recoverymode on all nodes are ok out into its own function (This used to be ctdb commit 813cf9a252af96da24122b80f24aabeed2911939) 2007-08-23 07:48:39 +04:00
change the monitoring of recmode in the recovery daemon to use a fully async eventdriven api for controls (This used to be ctdb commit 8d0e43428c507967a0d96e6a4c5c821ac269c546) 2007-08-27 03:40:10 +04:00			`rmdata = talloc(mem_ctx, struct verify_recmode_normal_data);`
			`CTDB_NO_MEMORY_FATAL(ctdb, rmdata);`
			`rmdata->count = 0;`
			`rmdata->status = MONITOR_OK;`
break checking that the recoverymode on all nodes are ok out into its own function (This used to be ctdb commit 813cf9a252af96da24122b80f24aabeed2911939) 2007-08-23 07:48:39 +04:00
			`/* loop over all active nodes and send an async getrecmode call to`
			`them*/`
			`for (j=0; j<nodemap->num; j++) {`
			`if (nodemap->nodes[j].flags & NODE_FLAGS_INACTIVE) {`
			`continue;`
			`}`
change the monitoring of recmode in the recovery daemon to use a fully async eventdriven api for controls (This used to be ctdb commit 8d0e43428c507967a0d96e6a4c5c821ac269c546) 2007-08-27 03:40:10 +04:00			`state = ctdb_ctrl_getrecmode_send(ctdb, mem_ctx,`
break checking that the recoverymode on all nodes are ok out into its own function (This used to be ctdb commit 813cf9a252af96da24122b80f24aabeed2911939) 2007-08-23 07:48:39 +04:00			`CONTROL_TIMEOUT(),`
change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`nodemap->nodes[j].pnn);`
change the monitoring of recmode in the recovery daemon to use a fully async eventdriven api for controls (This used to be ctdb commit 8d0e43428c507967a0d96e6a4c5c821ac269c546) 2007-08-27 03:40:10 +04:00			`if (state == NULL) {`
			`/* we failed to send the control, treat this as`
			`an error and try again next iteration`
			`*/`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,("Failed to call ctdb_ctrl_getrecmode_send during monitoring\n"));`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`talloc_free(mem_ctx);`
break checking that the recoverymode on all nodes are ok out into its own function (This used to be ctdb commit 813cf9a252af96da24122b80f24aabeed2911939) 2007-08-23 07:48:39 +04:00			`return MONITOR_FAILED;`
			`}`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00
change the monitoring of recmode in the recovery daemon to use a fully async eventdriven api for controls (This used to be ctdb commit 8d0e43428c507967a0d96e6a4c5c821ac269c546) 2007-08-27 03:40:10 +04:00			`/* set up the callback functions */`
			`state->async.fn = verify_recmode_normal_callback;`
change async.private to async.private_data since private is a reserved work in c++ (This used to be ctdb commit 79eb28f6cd5dcc30b04966d202a050eaf98a2552) 2007-09-26 08:25:32 +04:00			`state->async.private_data = rmdata;`
change the monitoring of recmode in the recovery daemon to use a fully async eventdriven api for controls (This used to be ctdb commit 8d0e43428c507967a0d96e6a4c5c821ac269c546) 2007-08-27 03:40:10 +04:00
			`/* one more control to wait for to complete */`
			`rmdata->count++;`
break checking that the recoverymode on all nodes are ok out into its own function (This used to be ctdb commit 813cf9a252af96da24122b80f24aabeed2911939) 2007-08-23 07:48:39 +04:00			`}`

change the monitoring of recmode in the recovery daemon to use a fully async eventdriven api for controls (This used to be ctdb commit 8d0e43428c507967a0d96e6a4c5c821ac269c546) 2007-08-27 03:40:10 +04:00
			`/* now wait for up to the maximum number of seconds allowed`
			`or until all nodes we expect a response from have replied`
			`*/`
			`while (rmdata->count > 0) {`
			`event_loop_once(ctdb->ev);`
			`}`

			`status = rmdata->status;`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`talloc_free(mem_ctx);`
change the monitoring of recmode in the recovery daemon to use a fully async eventdriven api for controls (This used to be ctdb commit 8d0e43428c507967a0d96e6a4c5c821ac269c546) 2007-08-27 03:40:10 +04:00			`return status;`
break checking that the recoverymode on all nodes are ok out into its own function (This used to be ctdb commit 813cf9a252af96da24122b80f24aabeed2911939) 2007-08-23 07:48:39 +04:00			`}`

change the monitoring of recmode in the recovery daemon to use a fully async eventdriven api for controls (This used to be ctdb commit 8d0e43428c507967a0d96e6a4c5c821ac269c546) 2007-08-27 03:40:10 +04:00
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`struct verify_recmaster_data {`
when a node disgrees with us re who is recmaster make it mark that node as a lcuprit so it eventually gets banned (This used to be ctdb commit eff3f326f8ce6070c9f3c430cd14d1b71a8db220) 2008-04-21 18:56:27 +04:00			`struct ctdb_recoverd *rec;`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`uint32_t count;`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`uint32_t pnn;`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`enum monitor_result status;`
			`};`

change the api for managing callbacks to controls so that isntead of passing it as a parameter we set the callback function explicitely from the caller if the ..._send() function returned a valid state pointer. (This used to be ctdb commit aa939570662786455f63299b62c99882cff29d42) 2007-08-24 04:42:06 +04:00			`static void verify_recmaster_callback(struct ctdb_client_control_state *state)`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`{`
change async.private to async.private_data since private is a reserved work in c++ (This used to be ctdb commit 79eb28f6cd5dcc30b04966d202a050eaf98a2552) 2007-09-26 08:25:32 +04:00			`struct verify_recmaster_data *rmdata = talloc_get_type(state->async.private_data, struct verify_recmaster_data);`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00

			`/* one more node has responded with recmaster data*/`
			`rmdata->count--;`

			`/* if we failed to get the recmaster, then return an error and let`
			`the main loop try again.`
			`*/`
change the api for managing callbacks to controls so that isntead of passing it as a parameter we set the callback function explicitely from the caller if the ..._send() function returned a valid state pointer. (This used to be ctdb commit aa939570662786455f63299b62c99882cff29d42) 2007-08-24 04:42:06 +04:00			`if (state->state != CTDB_CONTROL_DONE) {`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`if (rmdata->status == MONITOR_OK) {`
			`rmdata->status = MONITOR_FAILED;`
			`}`
change the api for managing callbacks to controls so that isntead of passing it as a parameter we set the callback function explicitely from the caller if the ..._send() function returned a valid state pointer. (This used to be ctdb commit aa939570662786455f63299b62c99882cff29d42) 2007-08-24 04:42:06 +04:00			`return;`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`}`

			`/* if we got a response, then the recmaster will be stored in the`
			`status field`
			`*/`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`if (state->status != rmdata->pnn) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,("Node %d does not agree we are the recmaster. Need a new recmaster election\n", state->c->hdr.destnode));`
when a node disgrees with us re who is recmaster make it mark that node as a lcuprit so it eventually gets banned (This used to be ctdb commit eff3f326f8ce6070c9f3c430cd14d1b71a8db220) 2008-04-21 18:56:27 +04:00			`ctdb_set_culprit(rmdata->rec, state->c->hdr.destnode);`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`rmdata->status = MONITOR_ELECTION_NEEDED;`
			`}`

change the api for managing callbacks to controls so that isntead of passing it as a parameter we set the callback function explicitely from the caller if the ..._send() function returned a valid state pointer. (This used to be ctdb commit aa939570662786455f63299b62c99882cff29d42) 2007-08-24 04:42:06 +04:00			`return;`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`}`


			`/* verify that all nodes agree that we are the recmaster */`
when a node disgrees with us re who is recmaster make it mark that node as a lcuprit so it eventually gets banned (This used to be ctdb commit eff3f326f8ce6070c9f3c430cd14d1b71a8db220) 2008-04-21 18:56:27 +04:00			`static enum monitor_result verify_recmaster(struct ctdb_recoverd *rec, struct ctdb_node_map *nodemap, uint32_t pnn)`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`{`
when a node disgrees with us re who is recmaster make it mark that node as a lcuprit so it eventually gets banned (This used to be ctdb commit eff3f326f8ce6070c9f3c430cd14d1b71a8db220) 2008-04-21 18:56:27 +04:00			`struct ctdb_context *ctdb = rec->ctdb;`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`struct verify_recmaster_data *rmdata;`
			`TALLOC_CTX *mem_ctx = talloc_new(ctdb);`
			`struct ctdb_client_control_state *state;`
			`enum monitor_result status;`
			`int j;`

			`rmdata = talloc(mem_ctx, struct verify_recmaster_data);`
			`CTDB_NO_MEMORY_FATAL(ctdb, rmdata);`
when a node disgrees with us re who is recmaster make it mark that node as a lcuprit so it eventually gets banned (This used to be ctdb commit eff3f326f8ce6070c9f3c430cd14d1b71a8db220) 2008-04-21 18:56:27 +04:00			`rmdata->rec = rec;`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`rmdata->count = 0;`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`rmdata->pnn = pnn;`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`rmdata->status = MONITOR_OK;`

			`/* loop over all active nodes and send an async getrecmaster call to`
			`them*/`
			`for (j=0; j<nodemap->num; j++) {`
			`if (nodemap->nodes[j].flags & NODE_FLAGS_INACTIVE) {`
			`continue;`
			`}`
			`state = ctdb_ctrl_getrecmaster_send(ctdb, mem_ctx,`
get rid of the explicit global timeout used in the previous example and try this time by relying on the timeouts for the individual controls (This used to be ctdb commit 448a0eb4fd896dc545aa0b4bb2ba4628491578be) 2007-08-23 13:38:54 +04:00			`CONTROL_TIMEOUT(),`
change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`nodemap->nodes[j].pnn);`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`if (state == NULL) {`
			`/* we failed to send the control, treat this as`
			`an error and try again next iteration`
			`*/`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,("Failed to call ctdb_ctrl_getrecmaster_send during monitoring\n"));`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`talloc_free(mem_ctx);`
			`return MONITOR_FAILED;`
			`}`

change the api for managing callbacks to controls so that isntead of passing it as a parameter we set the callback function explicitely from the caller if the ..._send() function returned a valid state pointer. (This used to be ctdb commit aa939570662786455f63299b62c99882cff29d42) 2007-08-24 04:42:06 +04:00			`/* set up the callback functions */`
			`state->async.fn = verify_recmaster_callback;`
change async.private to async.private_data since private is a reserved work in c++ (This used to be ctdb commit 79eb28f6cd5dcc30b04966d202a050eaf98a2552) 2007-09-26 08:25:32 +04:00			`state->async.private_data = rmdata;`
change the api for managing callbacks to controls so that isntead of passing it as a parameter we set the callback function explicitely from the caller if the ..._send() function returned a valid state pointer. (This used to be ctdb commit aa939570662786455f63299b62c99882cff29d42) 2007-08-24 04:42:06 +04:00
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`/* one more control to wait for to complete */`
			`rmdata->count++;`
			`}`


			`/* now wait for up to the maximum number of seconds allowed`
			`or until all nodes we expect a response from have replied`
			`*/`
get rid of the explicit global timeout used in the previous example and try this time by relying on the timeouts for the individual controls (This used to be ctdb commit 448a0eb4fd896dc545aa0b4bb2ba4628491578be) 2007-08-23 13:38:54 +04:00			`while (rmdata->count > 0) {`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`event_loop_once(ctdb->ev);`
			`}`

			`status = rmdata->status;`
			`talloc_free(mem_ctx);`
			`return status;`
			`}`

add a new file <reclock>.pnn where each recovery daemon can lock that byte at offset==pnn to offer an alternative way to detect which nodes are active instead of relying on CONNECTED being accurate. (This used to be ctdb commit 21d3319eaf463e2a00637d440ee2d4d15f53bf09) 2008-02-29 04:37:42 +03:00			`/*`
			`this function writes the number of connected nodes we have for this pnn`
			`to the pnn slot in the reclock file`
			`*/`
			`static void`
update the reclock pnn count for how many nodes are connected to the current node once every 60 seconds (This used to be ctdb commit bf1863cc9e2539b2c3e53c664b493b459ebfcc8b) 2008-02-29 05:14:47 +03:00			`ctdb_recoverd_write_pnn_connect_count(struct ctdb_recoverd *rec)`
add a new file <reclock>.pnn where each recovery daemon can lock that byte at offset==pnn to offer an alternative way to detect which nodes are active instead of relying on CONNECTED being accurate. (This used to be ctdb commit 21d3319eaf463e2a00637d440ee2d4d15f53bf09) 2008-02-29 04:37:42 +03:00			`{`
add a num_connected field to the rec structure that holds the number of connected nodes num_active only contains the number of active nodes and would thus not count banned nodes (This used to be ctdb commit 06d3ce470766ef0b60d68ccd84de5437146cc147) 2008-03-03 02:24:17 +03:00			`const char count = rec->num_connected;`
add a new file <reclock>.pnn where each recovery daemon can lock that byte at offset==pnn to offer an alternative way to detect which nodes are active instead of relying on CONNECTED being accurate. (This used to be ctdb commit 21d3319eaf463e2a00637d440ee2d4d15f53bf09) 2008-02-29 04:37:42 +03:00			`struct ctdb_context *ctdb = talloc_get_type(rec->ctdb, struct ctdb_context);`

close and reopen the reclock pnn file at regular intervals. handle failure to get/hold the reclock pnn file better and just treat it as a transient backend filesystem error and try again later instead of shutting down the recovery daemon when we have lost the pnn file and if we are recmaster release the recmaster role so that someone else can become recmaster isntead (This used to be ctdb commit e513277fb09b951427be8351d04c877e0a15359d) 2008-05-06 07:27:17 +04:00			`if (rec->rec_file_fd == -1) {`
			`DEBUG(DEBUG_CRIT,(__location__ " Unable to write pnn count. pnnfile is not open.\n"));`
			`return;`
			`}`

add a new file <reclock>.pnn where each recovery daemon can lock that byte at offset==pnn to offer an alternative way to detect which nodes are active instead of relying on CONNECTED being accurate. (This used to be ctdb commit 21d3319eaf463e2a00637d440ee2d4d15f53bf09) 2008-02-29 04:37:42 +03:00			`if (pwrite(rec->rec_file_fd, &count, 1, ctdb->pnn) == -1) {`
			`DEBUG(DEBUG_CRIT, (__location__ " Failed to write pnn count\n"));`
close and reopen the reclock pnn file at regular intervals. handle failure to get/hold the reclock pnn file better and just treat it as a transient backend filesystem error and try again later instead of shutting down the recovery daemon when we have lost the pnn file and if we are recmaster release the recmaster role so that someone else can become recmaster isntead (This used to be ctdb commit e513277fb09b951427be8351d04c877e0a15359d) 2008-05-06 07:27:17 +04:00			`close(rec->rec_file_fd);`
			`rec->rec_file_fd = -1;`
add a new file <reclock>.pnn where each recovery daemon can lock that byte at offset==pnn to offer an alternative way to detect which nodes are active instead of relying on CONNECTED being accurate. (This used to be ctdb commit 21d3319eaf463e2a00637d440ee2d4d15f53bf09) 2008-02-29 04:37:42 +03:00			`}`
			`}`

			`/*`
			`this function opens the reclock file and sets a byte-range lock for the single`
			`byte at offset pnn.`
			`the existence/non-existence of such a lock provides an alternative mechanism`
			`to know whether a remote node(recovery daemon) is running or not.`
			`*/`
			`static void`
			`ctdb_recoverd_get_pnn_lock(struct ctdb_recoverd *rec)`
			`{`
			`struct ctdb_context *ctdb = talloc_get_type(rec->ctdb, struct ctdb_context);`
			`struct flock lock;`
			`char *pnnfile = NULL;`

			`DEBUG(DEBUG_INFO, ("Setting PNN lock for pnn:%d\n", ctdb->pnn));`

			`if (rec->rec_file_fd != -1) {`
close and reopen the reclock pnn file at regular intervals. handle failure to get/hold the reclock pnn file better and just treat it as a transient backend filesystem error and try again later instead of shutting down the recovery daemon when we have lost the pnn file and if we are recmaster release the recmaster role so that someone else can become recmaster isntead (This used to be ctdb commit e513277fb09b951427be8351d04c877e0a15359d) 2008-05-06 07:27:17 +04:00			`close(rec->rec_file_fd);`
			`rec->rec_file_fd = -1;`
add a new file <reclock>.pnn where each recovery daemon can lock that byte at offset==pnn to offer an alternative way to detect which nodes are active instead of relying on CONNECTED being accurate. (This used to be ctdb commit 21d3319eaf463e2a00637d440ee2d4d15f53bf09) 2008-02-29 04:37:42 +03:00			`}`

			`pnnfile = talloc_asprintf(rec, "%s.pnn", ctdb->recovery_lock_file);`
			`CTDB_NO_MEMORY_FATAL(ctdb, pnnfile);`

			`rec->rec_file_fd = open(pnnfile, O_RDWR|O_CREAT, 0600);`
			`if (rec->rec_file_fd == -1) {`
			`DEBUG(DEBUG_CRIT,(__location__ " Unable to open %s - (%s)\n",`
			`pnnfile, strerror(errno)));`
close and reopen the reclock pnn file at regular intervals. handle failure to get/hold the reclock pnn file better and just treat it as a transient backend filesystem error and try again later instead of shutting down the recovery daemon when we have lost the pnn file and if we are recmaster release the recmaster role so that someone else can become recmaster isntead (This used to be ctdb commit e513277fb09b951427be8351d04c877e0a15359d) 2008-05-06 07:27:17 +04:00			`talloc_free(pnnfile);`
			`return;`
add a new file <reclock>.pnn where each recovery daemon can lock that byte at offset==pnn to offer an alternative way to detect which nodes are active instead of relying on CONNECTED being accurate. (This used to be ctdb commit 21d3319eaf463e2a00637d440ee2d4d15f53bf09) 2008-02-29 04:37:42 +03:00			`}`

			`set_close_on_exec(rec->rec_file_fd);`
			`lock.l_type = F_WRLCK;`
			`lock.l_whence = SEEK_SET;`
			`lock.l_start = ctdb->pnn;`
			`lock.l_len = 1;`
			`lock.l_pid = 0;`

			`if (fcntl(rec->rec_file_fd, F_SETLK, &lock) != 0) {`
			`close(rec->rec_file_fd);`
			`rec->rec_file_fd = -1;`
			`DEBUG(DEBUG_CRIT,(__location__ " Failed to get pnn lock on '%s'\n", pnnfile));`
close and reopen the reclock pnn file at regular intervals. handle failure to get/hold the reclock pnn file better and just treat it as a transient backend filesystem error and try again later instead of shutting down the recovery daemon when we have lost the pnn file and if we are recmaster release the recmaster role so that someone else can become recmaster isntead (This used to be ctdb commit e513277fb09b951427be8351d04c877e0a15359d) 2008-05-06 07:27:17 +04:00			`talloc_free(pnnfile);`
			`return;`
add a new file <reclock>.pnn where each recovery daemon can lock that byte at offset==pnn to offer an alternative way to detect which nodes are active instead of relying on CONNECTED being accurate. (This used to be ctdb commit 21d3319eaf463e2a00637d440ee2d4d15f53bf09) 2008-02-29 04:37:42 +03:00			`}`


			`DEBUG(DEBUG_NOTICE,(__location__ " Got pnn lock on '%s'\n", pnnfile));`
			`talloc_free(pnnfile);`

			`/* we start out with 0 connected nodes */`
update the reclock pnn count for how many nodes are connected to the current node once every 60 seconds (This used to be ctdb commit bf1863cc9e2539b2c3e53c664b493b459ebfcc8b) 2008-02-29 05:14:47 +03:00			`ctdb_recoverd_write_pnn_connect_count(rec);`
			`}`

			`/*`
			`called when we need to do the periodical reclock pnn count update`
			`*/`
			`static void ctdb_update_pnn_count(struct event_context *ev, struct timed_event *te,`
			`struct timeval t, void *p)`
			`{`
add a new tunable : reclockpingperiod once every such interval : * the recovery master on each node will uppdate the "connected" count in the reclock count file (ctdb getreclock) * if the node thinks it is a recovery master but it detects another node that is DISCONNECTED but which still holds a lock to the reclock count file this may mean that we have a split cluster. if that other node that is DISCONNECTED but still holds the lock on hte reclock pnn count file, is MORE connected than the local node, yield the recmaster role and let the other half of the lcuster take over this add a second, last chance mechanism to detect split clusters. IF the cluster is split but GPFS is not yet split, this mechanism makes the largest half of the cluster become the active half. (This used to be ctdb commit 07af425f444531942cce8abff112c1524228d287) 2008-03-03 01:19:30 +03:00			`int i, count;`
			`struct ctdb_recoverd *rec = talloc_get_type(p, struct ctdb_recoverd);`
			`struct ctdb_context *ctdb = rec->ctdb;`
			`struct ctdb_node_map *nodemap = rec->nodemap;`
update the reclock pnn count for how many nodes are connected to the current node once every 60 seconds (This used to be ctdb commit bf1863cc9e2539b2c3e53c664b493b459ebfcc8b) 2008-02-29 05:14:47 +03:00
close and reopen the reclock pnn file at regular intervals. handle failure to get/hold the reclock pnn file better and just treat it as a transient backend filesystem error and try again later instead of shutting down the recovery daemon when we have lost the pnn file and if we are recmaster release the recmaster role so that someone else can become recmaster isntead (This used to be ctdb commit e513277fb09b951427be8351d04c877e0a15359d) 2008-05-06 07:27:17 +04:00			`/* close and reopen the pnn lock file */`
			`ctdb_recoverd_get_pnn_lock(rec);`

update the reclock pnn count for how many nodes are connected to the current node once every 60 seconds (This used to be ctdb commit bf1863cc9e2539b2c3e53c664b493b459ebfcc8b) 2008-02-29 05:14:47 +03:00			`ctdb_recoverd_write_pnn_connect_count(rec);`

add a new tunable : reclockpingperiod once every such interval : * the recovery master on each node will uppdate the "connected" count in the reclock count file (ctdb getreclock) * if the node thinks it is a recovery master but it detects another node that is DISCONNECTED but which still holds a lock to the reclock count file this may mean that we have a split cluster. if that other node that is DISCONNECTED but still holds the lock on hte reclock pnn count file, is MORE connected than the local node, yield the recmaster role and let the other half of the lcuster take over this add a second, last chance mechanism to detect split clusters. IF the cluster is split but GPFS is not yet split, this mechanism makes the largest half of the cluster become the active half. (This used to be ctdb commit 07af425f444531942cce8abff112c1524228d287) 2008-03-03 01:19:30 +03:00			`event_add_timed(rec->ctdb->ev, rec->ctdb,`
			`timeval_current_ofs(ctdb->tunable.reclock_ping_period, 0),`
			`ctdb_update_pnn_count, rec);`

			`/* check if there is a split cluster and yield the recmaster role`
			`if the other half of the cluster is larger`
			`*/`
			`DEBUG(DEBUG_DEBUG, ("CHECK FOR SPLIT CLUSTER\n"));`
			`if (rec->nodemap == NULL) {`
			`return;`
			`}`
			`if (rec->rec_file_fd == -1) {`
			`return;`
			`}`
			`/* only test this if we think we are the recmaster */`
			`if (ctdb->pnn != rec->recmaster) {`
			`DEBUG(DEBUG_DEBUG, ("We are not recmaster, skip test\n"));`
			`return;`
			`}`
			`if (ctdb->recovery_lock_fd == -1) {`
close and reopen the reclock pnn file at regular intervals. handle failure to get/hold the reclock pnn file better and just treat it as a transient backend filesystem error and try again later instead of shutting down the recovery daemon when we have lost the pnn file and if we are recmaster release the recmaster role so that someone else can become recmaster isntead (This used to be ctdb commit e513277fb09b951427be8351d04c877e0a15359d) 2008-05-06 07:27:17 +04:00			`DEBUG(DEBUG_ERR, (__location__ " Lost reclock pnn file. Yielding recmaster role\n"));`
			`close(ctdb->recovery_lock_fd);`
			`ctdb->recovery_lock_fd = -1;`
			`force_election(rec, ctdb->pnn, rec->nodemap);`
add a new tunable : reclockpingperiod once every such interval : * the recovery master on each node will uppdate the "connected" count in the reclock count file (ctdb getreclock) * if the node thinks it is a recovery master but it detects another node that is DISCONNECTED but which still holds a lock to the reclock count file this may mean that we have a split cluster. if that other node that is DISCONNECTED but still holds the lock on hte reclock pnn count file, is MORE connected than the local node, yield the recmaster role and let the other half of the lcuster take over this add a second, last chance mechanism to detect split clusters. IF the cluster is split but GPFS is not yet split, this mechanism makes the largest half of the cluster become the active half. (This used to be ctdb commit 07af425f444531942cce8abff112c1524228d287) 2008-03-03 01:19:30 +03:00			`return;`
			`}`
			`for (i=0; i<nodemap->num; i++) {`
			`/* we don't need to check ourselves */`
			`if (nodemap->nodes[i].pnn == ctdb->pnn) {`
			`continue;`
			`}`
			`/* don't check nodes that are connected to us */`
			`if (!(nodemap->nodes[i].flags & NODE_FLAGS_DISCONNECTED)) {`
			`continue;`
			`}`
			`/* check if the node is "connected" and how connected it is */`
			`count = ctdb_read_pnn_lock(rec->rec_file_fd, nodemap->nodes[i].pnn);`
			`if (count < 0) {`
			`continue;`
			`}`
			`/* check if that node is more connected than us */`
add a num_connected field to the rec structure that holds the number of connected nodes num_active only contains the number of active nodes and would thus not count banned nodes (This used to be ctdb commit 06d3ce470766ef0b60d68ccd84de5437146cc147) 2008-03-03 02:24:17 +03:00			`if (count > rec->num_connected) {`
add a new tunable : reclockpingperiod once every such interval : * the recovery master on each node will uppdate the "connected" count in the reclock count file (ctdb getreclock) * if the node thinks it is a recovery master but it detects another node that is DISCONNECTED but which still holds a lock to the reclock count file this may mean that we have a split cluster. if that other node that is DISCONNECTED but still holds the lock on hte reclock pnn count file, is MORE connected than the local node, yield the recmaster role and let the other half of the lcuster take over this add a second, last chance mechanism to detect split clusters. IF the cluster is split but GPFS is not yet split, this mechanism makes the largest half of the cluster become the active half. (This used to be ctdb commit 07af425f444531942cce8abff112c1524228d287) 2008-03-03 01:19:30 +03:00			`DEBUG(DEBUG_ERR, ("DISCONNECTED Node %u is more connected than we are, yielding recmaster role\n", nodemap->nodes[i].pnn));`
			`close(ctdb->recovery_lock_fd);`
			`ctdb->recovery_lock_fd = -1;`
			`force_election(rec, ctdb->pnn, rec->nodemap);`
			`return;`
			`}`
			`}`
add a new file <reclock>.pnn where each recovery daemon can lock that byte at offset==pnn to offer an alternative way to detect which nodes are active instead of relying on CONNECTED being accurate. (This used to be ctdb commit 21d3319eaf463e2a00637d440ee2d4d15f53bf09) 2008-02-29 04:37:42 +03:00			`}`
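The split-cluster check above can be sketched in isolation as follows. This is a minimal illustration under stated assumptions, not the ctdb implementation: `struct sketch_node`, `SKETCH_FLAG_DISCONNECTED` and `sketch_find_larger_half()` are hypothetical names, and the per-node `lock_count` field stands in for the value that `ctdb_read_pnn_lock()` returns in the code above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only; not the ctdb structures. */
#define SKETCH_FLAG_DISCONNECTED 0x1u

struct sketch_node {
	uint32_t pnn;
	uint32_t flags;
	int lock_count;	/* connected count read from the node's slot, <0 on failure */
};

/*
 * Return the pnn of the first DISCONNECTED node that reports more
 * connections than we have, or -1 if no such node exists.
 * A tie does not count: we only yield to a strictly larger half.
 */
static int64_t sketch_find_larger_half(const struct sketch_node *nodes,
				       size_t num, uint32_t our_pnn,
				       int our_connected)
{
	size_t i;

	assert(nodes != NULL || num == 0);

	for (i = 0; i < num; i++) {
		/* no need to check ourselves */
		if (nodes[i].pnn == our_pnn) {
			continue;
		}
		/* nodes we can still talk to are in our half */
		if (!(nodes[i].flags & SKETCH_FLAG_DISCONNECTED)) {
			continue;
		}
		/* the count could not be read for this node */
		if (nodes[i].lock_count < 0) {
			continue;
		}
		if (nodes[i].lock_count > our_connected) {
			return (int64_t)nodes[i].pnn;
		}
	}
	return -1;
}
```

A caller acting as recmaster would presumably give up the role and force an election when the result is not -1, as the function above does.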
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00
make recovery daemon values tunable (This used to be ctdb commit ec29dbf2f5110428df8b97801443ba7addf61353) 2007-06-04 14:22:44 +04:00			`/*`
			`the main monitoring loop`
			`*/`
clean out some more cruft (This used to be ctdb commit ad16c5fe2748b48a6f6c79976359d56d9bed33f4) 2007-06-05 11:57:07 +04:00			`static void monitor_cluster(struct ctdb_context *ctdb)`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00			`{`
change recmaster from being a local variable in monitor_cluster() to be a member of the ctdb_recoverd structure (This used to be ctdb commit b7f955338f50c92374b4f559268fb3a1a516aefa) 2008-03-02 23:53:46 +03:00			`uint32_t pnn;`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00			`TALLOC_CTX *mem_ctx=NULL;`
change the signature for ctdb_ctrl_getnodemap() so that a timeout parameter is added. change ctdb_get_connected_nodes in the same way (This used to be ctdb commit d85f23bcf4c1230225abb2f4a053c70b68d469aa) 2007-05-04 03:01:01 +04:00			`struct ctdb_node_map *nodemap=NULL;`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`struct ctdb_node_map *remote_nodemap=NULL;`
			`struct ctdb_vnn_map *vnnmap=NULL;`
			`struct ctdb_vnn_map *remote_vnnmap=NULL;`
read the current debuglevel in each loop in the recovery daemon so that we pick up when they change in the parent daemon (This used to be ctdb commit 792d5471ff0c2947b6e66183925860de27f30eaf) 2008-02-18 11:38:04 +03:00			`int32_t debug_level;`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`int i, j, ret;`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`struct ctdb_recoverd *rec;`
let each node verify that they have a correct assignment of public ip addresses (i.e. htey hold those they should hold and they dont hold any of those they shouldnt hold) if an inconsistency is found, mark the local node as recovery mode active and wait for the recovery master to trigger a full blown recovery (This used to be ctdb commit 55a5bfc8244c5b9cdda3f11992f384f00566b5dc) 2007-09-14 04:16:36 +04:00			`struct ctdb_all_public_ips *ips;`
- catch ESTALE in the recovery lock by trying a read() - priortise nodes that are unbanned and healthy in the election (This used to be ctdb commit 929feb475dfdf7283f0e99b50b179e1c91d3a39f) 2007-10-05 07:28:21 +04:00			`char c;`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_NOTICE,("monitor_cluster starting\n"));`
added some debug lines to help track down a problem (This used to be ctdb commit 2ca31e9de179f76e392a26cc8305e2473357c760) 2007-10-18 10:27:36 +04:00
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`rec = talloc_zero(ctdb, struct ctdb_recoverd);`
			`CTDB_NO_MEMORY_FATAL(ctdb, rec);`

			`rec->ctdb = ctdb;`
added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00			`rec->banned_nodes = talloc_zero_array(rec, struct ban_state *, ctdb->num_nodes);`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`CTDB_NO_MEMORY_FATAL(ctdb, rec->banned_nodes);`

use a priority time for the election data, not just the vnn (This used to be ctdb commit a691f9c5cd77194005f0d98483da94b07a48d57d) 2007-06-07 12:37:27 +04:00			`rec->priority_time = timeval_current();`

add a new file <reclock>.pnn where each recovery daemon can lock that byte at offset==pnn to offer an alternative way to detect which nodes are active instead of relying on CONNECTED being accurate. (This used to be ctdb commit 21d3319eaf463e2a00637d440ee2d4d15f53bf09) 2008-02-29 04:37:42 +03:00			`/* open the rec file fd and lock our slot */`
			`rec->rec_file_fd = -1;`
			`ctdb_recoverd_get_pnn_lock(rec);`

add improvements to tracking memory usage in ctdbd adn the recovery daemon and a ctdb command to pull the talloc memory map from a recovery daemon ctdb rddumpmemory (This used to be ctdb commit d23950be7406cf288f48b660c0f57a9b8d7bdd05) 2008-04-01 08:34:54 +04:00			`/* register a message port for sending memory dumps */`
			`ctdb_set_message_handler(ctdb, CTDB_SRVID_MEM_DUMP, mem_dump_handler, rec);`

implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`/* register a message port for recovery elections */`
			`ctdb_set_message_handler(ctdb, CTDB_SRVID_RECOVERY, election_handler, rec);`

			`/* and one for when nodes are disabled/enabled */`
			`ctdb_set_message_handler(ctdb, CTDB_SRVID_NODE_FLAGS_CHANGED, monitor_handler, rec);`
added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00
			`/* and one for when nodes are banned */`
			`ctdb_set_message_handler(ctdb, CTDB_SRVID_BAN_NODE, ban_handler, rec);`

			`/* and one for when nodes are unbanned */`
			`ctdb_set_message_handler(ctdb, CTDB_SRVID_UNBAN_NODE, unban_handler, rec);`
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 09:23:27 +03:00
			`/* register a message port for vacuum fetch */`
			`ctdb_set_message_handler(ctdb, CTDB_SRVID_VACUUM_FETCH, vacuum_fetch_handler, rec);`
update the reclock pnn count for how many nodes are connected to the current node once every 60 seconds (This used to be ctdb commit bf1863cc9e2539b2c3e53c664b493b459ebfcc8b) 2008-02-29 05:14:47 +03:00
			`/* update the reclock pnn file connected count on a regular basis */`
add a new tunable : reclockpingperiod once every such interval : * the recovery master on each node will uppdate the "connected" count in the reclock count file (ctdb getreclock) * if the node thinks it is a recovery master but it detects another node that is DISCONNECTED but which still holds a lock to the reclock count file this may mean that we have a split cluster. if that other node that is DISCONNECTED but still holds the lock on hte reclock pnn count file, is MORE connected than the local node, yield the recmaster role and let the other half of the lcuster take over this add a second, last chance mechanism to detect split clusters. IF the cluster is split but GPFS is not yet split, this mechanism makes the largest half of the cluster become the active half. (This used to be ctdb commit 07af425f444531942cce8abff112c1524228d287) 2008-03-03 01:19:30 +03:00			`event_add_timed(ctdb->ev, ctdb,`
			`timeval_current_ofs(ctdb->tunable.reclock_ping_period, 0),`
			`ctdb_update_pnn_count, rec);`
update the reclock pnn count for how many nodes are connected to the current node once every 60 seconds (This used to be ctdb commit bf1863cc9e2539b2c3e53c664b493b459ebfcc8b) 2008-02-29 05:14:47 +03:00
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00			`again:`
			`if (mem_ctx) {`
			`talloc_free(mem_ctx);`
			`mem_ctx = NULL;`
			`}`
			`mem_ctx = talloc_new(ctdb);`
			`if (!mem_ctx) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_CRIT,(__location__ " Failed to create temporary context\n"));`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00			`exit(-1);`
			`}`

			`/* we only check for recovery once every recover_interval seconds */`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`ctdb_wait_timeout(ctdb, ctdb->tunable.recover_interval);`
make recovery daemon values tunable (This used to be ctdb commit ec29dbf2f5110428df8b97801443ba7addf61353) 2007-06-04 14:22:44 +04:00
merge from ronnie (This used to be ctdb commit 0aa6e04438aa5ec727815689baa19544df042cf7) 2008-01-07 08:17:22 +03:00			`/* verify that the main daemon is still running */`
			`if (kill(ctdb->ctdbd_pid, 0) != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_CRIT,("CTDB daemon is no longer available. Shutting down recovery daemon\n"));`
merge from ronnie (This used to be ctdb commit 0aa6e04438aa5ec727815689baa19544df042cf7) 2008-01-07 08:17:22 +03:00			`exit(-1);`
			`}`

make election handling much more scalable (This used to be ctdb commit 05938d462b92bd9ecb8e35f53651bded47c48675) 2007-11-13 02:27:44 +03:00			`if (rec->election_timeout) {`
			`/* an election is in progress */`
			`goto again;`
			`}`

read the current debuglevel in each loop in the recovery daemon so that we pick up when they change in the parent daemon (This used to be ctdb commit 792d5471ff0c2947b6e66183925860de27f30eaf) 2008-02-18 11:38:04 +03:00			`/* read the debug level from the parent and update locally */`
			`ret = ctdb_ctrl_get_debuglevel(ctdb, CTDB_CURRENT_NODE, &debug_level);`
			`if (ret !=0) {`
			`DEBUG(DEBUG_ERR, (__location__ " Failed to read debuglevel from parent\n"));`
			`goto again;`
			`}`
			`LogLevel = debug_level;`

move ctdb_set_culprit higher up in the file when we are the recmaster and we update the local flags for all the nodes, if one of the nodes fail to respond and give us his flags, set that node as a "culprit" as one of the first things to do in the monitor_cluster loop, check if the current culprit has caused too many (20) failures and if so ban that node. this is for the situation where a remote node may still be CONNECTED but it fails to respond to the getnodemap control causing the recovery master to loop in monitor_cluster aborting the monitoring when the node fails to respond but before anything will trigger a call to do_recovery(). If one or more of the databases or nodes are frozen at this stage, this would lead to smbd being blocked for potentially a longish time. (This used to be ctdb commit 83b0261f2cb453195b86f547d360400103a8b795) 2007-11-28 07:04:20 +03:00
			`/* We must check if we need to ban a node here, but we want to do this`
			`as early as possible so we don't wait until we have pulled the node`
			`map from the local node. That's why we have the hardcoded value 20.`
			`*/`
			`if (rec->culprit_counter > 20) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_NOTICE,("Node %u has caused %u failures in %.0f seconds - banning it for %u seconds\n",`
move ctdb_set_culprit higher up in the file when we are the recmaster and we update the local flags for all the nodes, if one of the nodes fail to respond and give us his flags, set that node as a "culprit" as one of the first things to do in the monitor_cluster loop, check if the current culprit has caused too many (20) failures and if so ban that node. this is for the situation where a remote node may still be CONNECTED but it fails to respond to the getnodemap control causing the recovery master to loop in monitor_cluster aborting the monitoring when the node fails to respond but before anything will trigger a call to do_recovery(). If one or more of the databases or nodes are frozen at this stage, this would lead to smbd being blocked for potentially a longish time. (This used to be ctdb commit 83b0261f2cb453195b86f547d360400103a8b795) 2007-11-28 07:04:20 +03:00			`rec->last_culprit, rec->culprit_counter, timeval_elapsed(&rec->first_recover_time),`
			`ctdb->tunable.recovery_ban_period));`
			`ctdb_ban_node(rec, rec->last_culprit, ctdb->tunable.recovery_ban_period);`
			`}`

make recovery daemon values tunable (This used to be ctdb commit ec29dbf2f5110428df8b97801443ba7addf61353) 2007-06-04 14:22:44 +04:00			`/* get relevant tunables */`
get all the tunables at once in recovery daemon (This used to be ctdb commit 8e60be6c22aab145e68b16ede5f32f4430c2af93) 2007-06-07 12:05:25 +04:00			`ret = ctdb_ctrl_get_all_tunables(ctdb, CONTROL_TIMEOUT(), CTDB_CURRENT_NODE, &ctdb->tunable);`
			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,("Failed to get tunables - retrying\n"));`
get all the tunables at once in recovery daemon (This used to be ctdb commit 8e60be6c22aab145e68b16ede5f32f4430c2af93) 2007-06-07 12:05:25 +04:00			`goto again;`
			`}`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00
change ctdb_ctrl_getvnn to ctdb_ctrl_getpnn (This used to be ctdb commit ef47cc4cd416065c69382e4d9e76c30a0a34e42f) 2007-09-04 04:38:48 +04:00			`pnn = ctdb_ctrl_getpnn(ctdb, CONTROL_TIMEOUT(), CTDB_CURRENT_NODE);`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`if (pnn == (uint32_t)-1) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,("Failed to get local pnn - retrying\n"));`
- startup frozen, and do an initial recovery - fixed a bug in traverse - get a lock on the node list file in the recmaster recovery daemon (This used to be ctdb commit 162a5647535ad1cb3e8e5d4042a2784365fb1913) 2007-05-23 08:35:19 +04:00			`goto again;`
			`}`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00
add a test in the function that checks whether the cluster needs recovery or not that all active nodes are in normal mode. If we discover that some node is still in recoverymode it may indicate that a previous recovery ended prematurely and thus we should start a new recovery (This used to be ctdb commit c15517872e6c98c8c425a8d47d2b348ecb0620b0) 2007-05-06 22:41:12 +04:00			`/* get the vnnmap */`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`ret = ctdb_ctrl_getvnnmap(ctdb, CONTROL_TIMEOUT(), pnn, mem_ctx, &vnnmap);`
add a test in the function that checks whether the cluster needs recovery or not that all active nodes are in normal mode. If we discover that some node is still in recoverymode it may indicate that a previous recovery ended prematurely and thus we should start a new recovery (This used to be ctdb commit c15517872e6c98c8c425a8d47d2b348ecb0620b0) 2007-05-06 22:41:12 +04:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to get vnnmap from node %u\n", pnn));`
add a test in the function that checks whether the cluster needs recovery or not that all active nodes are in normal mode. If we discover that some node is still in recoverymode it may indicate that a previous recovery ended prematurely and thus we should start a new recovery (This used to be ctdb commit c15517872e6c98c8c425a8d47d2b348ecb0620b0) 2007-05-06 22:41:12 +04:00			`goto again;`
			`}`


start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00			`/* get number of nodes */`
add a new tunable : reclockpingperiod once every such interval : * the recovery master on each node will uppdate the "connected" count in the reclock count file (ctdb getreclock) * if the node thinks it is a recovery master but it detects another node that is DISCONNECTED but which still holds a lock to the reclock count file this may mean that we have a split cluster. if that other node that is DISCONNECTED but still holds the lock on hte reclock pnn count file, is MORE connected than the local node, yield the recmaster role and let the other half of the lcuster take over this add a second, last chance mechanism to detect split clusters. IF the cluster is split but GPFS is not yet split, this mechanism makes the largest half of the cluster become the active half. (This used to be ctdb commit 07af425f444531942cce8abff112c1524228d287) 2008-03-03 01:19:30 +03:00			`if (rec->nodemap) {`
			`talloc_free(rec->nodemap);`
			`rec->nodemap = NULL;`
			`nodemap=NULL;`
			`}`
			`ret = ctdb_ctrl_getnodemap(ctdb, CONTROL_TIMEOUT(), pnn, rec, &rec->nodemap);`
change the signature for ctdb_ctrl_getnodemap() so that a timeout parameter is added. change ctdb_get_connected_nodes in the same way (This used to be ctdb commit d85f23bcf4c1230225abb2f4a053c70b68d469aa) 2007-05-04 03:01:01 +04:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to get nodemap from node %u\n", pnn));`
change the signature for ctdb_ctrl_getnodemap() so that a timeout parameter is added. change ctdb_get_connected_nodes in the same way (This used to be ctdb commit d85f23bcf4c1230225abb2f4a053c70b68d469aa) 2007-05-04 03:01:01 +04:00			`goto again;`
			`}`
add a new tunable : reclockpingperiod once every such interval : * the recovery master on each node will uppdate the "connected" count in the reclock count file (ctdb getreclock) * if the node thinks it is a recovery master but it detects another node that is DISCONNECTED but which still holds a lock to the reclock count file this may mean that we have a split cluster. if that other node that is DISCONNECTED but still holds the lock on hte reclock pnn count file, is MORE connected than the local node, yield the recmaster role and let the other half of the lcuster take over this add a second, last chance mechanism to detect split clusters. IF the cluster is split but GPFS is not yet split, this mechanism makes the largest half of the cluster become the active half. (This used to be ctdb commit 07af425f444531942cce8abff112c1524228d287) 2008-03-03 01:19:30 +03:00			`nodemap = rec->nodemap;`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00
prevent a re-ban loop for single node clusters (This used to be ctdb commit b20a3369655bcba274c99091157ba7466994e848) 2008-01-04 04:11:29 +03:00			`/* check which node is the recovery master */`
change recmaster from being a local variable in monitor_cluster() to be a member of the ctdb_recoverd structure (This used to be ctdb commit b7f955338f50c92374b4f559268fb3a1a516aefa) 2008-03-02 23:53:46 +03:00			`ret = ctdb_ctrl_getrecmaster(ctdb, mem_ctx, CONTROL_TIMEOUT(), pnn, &rec->recmaster);`
prevent a re-ban loop for single node clusters (This used to be ctdb commit b20a3369655bcba274c99091157ba7466994e848) 2008-01-04 04:11:29 +03:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to get recmaster from node %u\n", pnn));`
prevent a re-ban loop for single node clusters (This used to be ctdb commit b20a3369655bcba274c99091157ba7466994e848) 2008-01-04 04:11:29 +03:00			`goto again;`
			`}`

change recmaster from being a local variable in monitor_cluster() to be a member of the ctdb_recoverd structure (This used to be ctdb commit b7f955338f50c92374b4f559268fb3a1a516aefa) 2008-03-02 23:53:46 +03:00			`if (rec->recmaster == (uint32_t)-1) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_NOTICE,(__location__ " Initial recovery master set - forcing election\n"));`
add a new tunable : reclockpingperiod once every such interval : * the recovery master on each node will uppdate the "connected" count in the reclock count file (ctdb getreclock) * if the node thinks it is a recovery master but it detects another node that is DISCONNECTED but which still holds a lock to the reclock count file this may mean that we have a split cluster. if that other node that is DISCONNECTED but still holds the lock on hte reclock pnn count file, is MORE connected than the local node, yield the recmaster role and let the other half of the lcuster take over this add a second, last chance mechanism to detect split clusters. IF the cluster is split but GPFS is not yet split, this mechanism makes the largest half of the cluster become the active half. (This used to be ctdb commit 07af425f444531942cce8abff112c1524228d287) 2008-03-03 01:19:30 +03:00			`force_election(rec, pnn, nodemap);`
prevent a re-ban loop for single node clusters (This used to be ctdb commit b20a3369655bcba274c99091157ba7466994e848) 2008-01-04 04:11:29 +03:00			`goto again;`
			`}`

when monitoring the node from the recovery daemon, check that the recovery daemon and the ctdb daemon both agree on whether the node is banned or not and if they disagree then reban the node again after logging an error to the debug log (This used to be ctdb commit 6cd6e534493066edd4bb2c6ae5be0e9a9d495aa0) 2007-11-23 04:41:29 +03:00			`/* check that we (recovery daemon) and the local ctdb daemon`
			`agree on whether we are banned or not`
			`*/`
			`if (nodemap->nodes[pnn].flags & NODE_FLAGS_BANNED) {`
			`if (rec->banned_nodes[pnn] == NULL) {`
change recmaster from being a local variable in monitor_cluster() to be a member of the ctdb_recoverd structure (This used to be ctdb commit b7f955338f50c92374b4f559268fb3a1a516aefa) 2008-03-02 23:53:46 +03:00			`if (rec->recmaster == pnn) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_NOTICE,("Local ctdb daemon on recmaster thinks this node is BANNED but the recovery master disagrees. Unbanning the node\n"));`
prevent a re-ban loop for single node clusters (This used to be ctdb commit b20a3369655bcba274c99091157ba7466994e848) 2008-01-04 04:11:29 +03:00
			`ctdb_unban_node(rec, pnn);`
			`} else {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_NOTICE,("Local ctdb daemon on non-recmaster thinks this node is BANNED but the recovery master disagrees. Re-banning the node\n"));`
prevent a re-ban loop for single node clusters (This used to be ctdb commit b20a3369655bcba274c99091157ba7466994e848) 2008-01-04 04:11:29 +03:00			`ctdb_ban_node(rec, pnn, ctdb->tunable.recovery_ban_period);`
			`ctdb_set_culprit(rec, pnn);`
			`}`
when monitoring the node from the recovery daemon, check that the recovery daemon and the ctdb daemon both agree on whether the node is banned or not and if they disagree then reban the node again after logging an error to the debug log (This used to be ctdb commit 6cd6e534493066edd4bb2c6ae5be0e9a9d495aa0) 2007-11-23 04:41:29 +03:00			`goto again;`
			`}`
			`} else {`
			`if (rec->banned_nodes[pnn] != NULL) {`
change recmaster from being a local variable in monitor_cluster() to be a member of the ctdb_recoverd structure (This used to be ctdb commit b7f955338f50c92374b4f559268fb3a1a516aefa) 2008-03-02 23:53:46 +03:00			`if (rec->recmaster == pnn) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_NOTICE,("Local ctdb daemon on recmaster does not think this node is BANNED but the recovery master disagrees. Unbanning the node\n"));`
when monitoring the node from the recovery daemon, check that the recovery daemon and the ctdb daemon both agree on whether the node is banned or not and if they disagree then reban the node again after logging an error to the debug log (This used to be ctdb commit 6cd6e534493066edd4bb2c6ae5be0e9a9d495aa0) 2007-11-23 04:41:29 +03:00
prevent a re-ban loop for single node clusters (This used to be ctdb commit b20a3369655bcba274c99091157ba7466994e848) 2008-01-04 04:11:29 +03:00			`ctdb_unban_node(rec, pnn);`
			`} else {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_NOTICE,("Local ctdb daemon on non-recmaster does not think this node is BANNED but the recovery master disagrees. Re-banning the node\n"));`
when monitoring the node from the recovery daemon, check that the recovery daemon and the ctdb daemon both agree on whether the node is banned or not and if they disagree then reban the node again after logging an error to the debug log (This used to be ctdb commit 6cd6e534493066edd4bb2c6ae5be0e9a9d495aa0) 2007-11-23 04:41:29 +03:00
prevent a re-ban loop for single node clusters (This used to be ctdb commit b20a3369655bcba274c99091157ba7466994e848) 2008-01-04 04:11:29 +03:00			`ctdb_ban_node(rec, pnn, ctdb->tunable.recovery_ban_period);`
			`ctdb_set_culprit(rec, pnn);`
			`}`
when monitoring the node from the recovery daemon, check that the recovery daemon and the ctdb daemon both agree on whether the node is banned or not and if they disagree then reban the node again after logging an error to the debug log (This used to be ctdb commit 6cd6e534493066edd4bb2c6ae5be0e9a9d495aa0) 2007-11-23 04:41:29 +03:00			`goto again;`
			`}`
			`}`

- catch ESTALE in the recovery lock by trying a read() - priortise nodes that are unbanned and healthy in the election (This used to be ctdb commit 929feb475dfdf7283f0e99b50b179e1c91d3a39f) 2007-10-05 07:28:21 +04:00			`/* remember our own node flags */`
			`rec->node_flags = nodemap->nodes[pnn].flags;`
add a test in the function that checks whether the cluster needs recovery or not that all active nodes are in normal mode. If we discover that some node is still in recoverymode it may indicate that a previous recovery ended prematurely and thus we should start a new recovery (This used to be ctdb commit c15517872e6c98c8c425a8d47d2b348ecb0620b0) 2007-05-06 22:41:12 +04:00
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`/* count how many active nodes there are */`
add a num_connected field to the rec structure that holds the number of connected nodes num_active only contains the number of active nodes and would thus not count banned nodes (This used to be ctdb commit 06d3ce470766ef0b60d68ccd84de5437146cc147) 2008-03-03 02:24:17 +03:00			`rec->num_active = 0;`
			`rec->num_connected = 0;`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`for (i=0; i<nodemap->num; i++) {`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`if (!(nodemap->nodes[i].flags & NODE_FLAGS_INACTIVE)) {`
store the num_active variable (number of connected/active nodes) inside the rec structure and avoid passing this as an extra parameter to do_recovery() (This used to be ctdb commit 8bb229aa3b4bd41e48d4e4e2e148d8680c8ba436) 2008-02-29 04:55:20 +03:00			`rec->num_active++;`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`}`
add a num_connected field to the rec structure that holds the number of connected nodes num_active only contains the number of active nodes and would thus not count banned nodes (This used to be ctdb commit 06d3ce470766ef0b60d68ccd84de5437146cc147) 2008-03-03 02:24:17 +03:00			`if (!(nodemap->nodes[i].flags & NODE_FLAGS_DISCONNECTED)) {`
			`rec->num_connected++;`
			`}`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`}`
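The counting loop above keeps two separate tallies because a banned node is still connected but no longer active. A minimal standalone sketch of that distinction follows. The `SKETCH_` names and flag values are hypothetical stand-ins, not the definitions from the ctdb headers, and treating "inactive" as "disconnected or banned" is an assumption made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values for illustration only; the real
   NODE_FLAGS_* definitions live in the ctdb headers and may differ. */
#define SKETCH_FLAG_DISCONNECTED 0x1u
#define SKETCH_FLAG_BANNED       0x8u
/* Assumption: a node is "inactive" if it is disconnected or banned. */
#define SKETCH_FLAG_INACTIVE (SKETCH_FLAG_DISCONNECTED | SKETCH_FLAG_BANNED)

struct sketch_counts {
	uint32_t num_active;    /* nodes that are neither disconnected nor banned */
	uint32_t num_connected; /* nodes that are reachable, banned or not */
};

/* Walk the per-node flag words once and produce both tallies. */
static struct sketch_counts sketch_count_nodes(const uint32_t *flags, size_t num)
{
	struct sketch_counts c = { 0, 0 };
	size_t i;

	for (i = 0; i < num; i++) {
		if (!(flags[i] & SKETCH_FLAG_INACTIVE)) {
			c.num_active++;
		}
		if (!(flags[i] & SKETCH_FLAG_DISCONNECTED)) {
			c.num_connected++;
		}
	}
	return c;
}
```

Under these assumptions a banned node raises `num_connected` but not `num_active`, which is why the two counters can diverge.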


recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`/* verify that the recmaster node is still active */`
			`for (j=0; j<nodemap->num; j++) {`
change recmaster from being a local variable in monitor_cluster() to be a member of the ctdb_recoverd structure (This used to be ctdb commit b7f955338f50c92374b4f559268fb3a1a516aefa) 2008-03-02 23:53:46 +03:00			`if (nodemap->nodes[j].pnn==rec->recmaster) {`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`break;`
			`}`
setup the random number generator a bit better (This used to be ctdb commit 708585eb0ed31b0df6543a1d7a20b82e751877c2) 2007-05-10 07:10:23 +04:00			`}`
- startup frozen, and do an initial recovery - fixed a bug in traverse - get a lock on the node list file in the recmaster recovery daemon (This used to be ctdb commit 162a5647535ad1cb3e8e5d4042a2784365fb1913) 2007-05-23 08:35:19 +04:00
			`if (j == nodemap->num) {`
change recmaster from being a local variable in monitor_cluster() to be a member of the ctdb_recoverd structure (This used to be ctdb commit b7f955338f50c92374b4f559268fb3a1a516aefa) 2008-03-02 23:53:46 +03:00			`DEBUG(DEBUG_ERR, ("Recmaster node %u not in list. Force reelection\n", rec->recmaster));`
add a new tunable : reclockpingperiod once every such interval : * the recovery master on each node will uppdate the "connected" count in the reclock count file (ctdb getreclock) * if the node thinks it is a recovery master but it detects another node that is DISCONNECTED but which still holds a lock to the reclock count file this may mean that we have a split cluster. if that other node that is DISCONNECTED but still holds the lock on hte reclock pnn count file, is MORE connected than the local node, yield the recmaster role and let the other half of the lcuster take over this add a second, last chance mechanism to detect split clusters. IF the cluster is split but GPFS is not yet split, this mechanism makes the largest half of the cluster become the active half. (This used to be ctdb commit 07af425f444531942cce8abff112c1524228d287) 2008-03-03 01:19:30 +03:00			`force_election(rec, pnn, nodemap);`
- startup frozen, and do an initial recovery - fixed a bug in traverse - get a lock on the node list file in the recmaster recovery daemon (This used to be ctdb commit 162a5647535ad1cb3e8e5d4042a2784365fb1913) 2007-05-23 08:35:19 +04:00			`goto again;`
			`}`

first check that recovery master is connected (we know this from our own flags) then pull the flags off recovery master before checking if it is banned (This used to be ctdb commit 94c1d234e57a40eda2d8b892dd9fbe1ffc4b3433) 2007-10-11 01:10:17 +04:00			`/* if recovery master is disconnected we must elect a new recmaster */`
			`if (nodemap->nodes[j].flags & NODE_FLAGS_DISCONNECTED) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_NOTICE, ("Recmaster node %u is disconnected. Force reelection\n", nodemap->nodes[j].pnn));`
add a new tunable : reclockpingperiod once every such interval : * the recovery master on each node will uppdate the "connected" count in the reclock count file (ctdb getreclock) * if the node thinks it is a recovery master but it detects another node that is DISCONNECTED but which still holds a lock to the reclock count file this may mean that we have a split cluster. if that other node that is DISCONNECTED but still holds the lock on hte reclock pnn count file, is MORE connected than the local node, yield the recmaster role and let the other half of the lcuster take over this add a second, last chance mechanism to detect split clusters. IF the cluster is split but GPFS is not yet split, this mechanism makes the largest half of the cluster become the active half. (This used to be ctdb commit 07af425f444531942cce8abff112c1524228d287) 2008-03-03 01:19:30 +03:00			`force_election(rec, pnn, nodemap);`
first check that recovery master is connected (we know this from our own flags) then pull the flags off recovery master before checking if it is banned (This used to be ctdb commit 94c1d234e57a40eda2d8b892dd9fbe1ffc4b3433) 2007-10-11 01:10:17 +04:00			`goto again;`
			`}`
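The two checks above (recmaster missing from the node list, recmaster present but disconnected) can be read as a single classification step. The sketch below shows that step in isolation. `sketch_node`, `sketch_check_recmaster` and the flag value are hypothetical names introduced for illustration and are not part of ctdb.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag value; the real NODE_FLAGS_DISCONNECTED may differ. */
#define SKETCH_FLAG_DISCONNECTED 0x1u

struct sketch_node {
	uint32_t pnn;   /* physical node number */
	uint32_t flags; /* node state bits */
};

enum sketch_recmaster_state {
	SKETCH_RECMASTER_OK,
	SKETCH_RECMASTER_NOT_IN_LIST,  /* would force a re-election */
	SKETCH_RECMASTER_DISCONNECTED  /* would force a re-election */
};

/* Locate the recmaster by pnn, then inspect its flags. */
static enum sketch_recmaster_state
sketch_check_recmaster(const struct sketch_node *nodes, size_t num,
		       uint32_t recmaster)
{
	size_t j;

	for (j = 0; j < num; j++) {
		if (nodes[j].pnn == recmaster) {
			break;
		}
	}
	if (j == num) {
		/* the loop ran off the end: recmaster is unknown to us */
		return SKETCH_RECMASTER_NOT_IN_LIST;
	}
	if (nodes[j].flags & SKETCH_FLAG_DISCONNECTED) {
		return SKETCH_RECMASTER_DISCONNECTED;
	}
	return SKETCH_RECMASTER_OK;
}
```

As in the original loop, the index `j` left over after the search doubles as the "not found" signal when it equals the node count.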

merge from ronnie (This used to be ctdb commit d18712caba11855010be52f90bac656683076676) 2007-10-15 08:17:49 +04:00			`/* grab the nodemap from the recovery master to check if it is banned */`
simplify election handling make sure we read and update the flags from all remote nodes before we reach the first codepath that can call do_recovery() since during do_recovery() we need to know what the flags are. (This used to be ctdb commit e85f3806483ea420559d449e0e4d81bec996740f) 2007-10-11 00:16:36 +04:00			`ret = ctdb_ctrl_getnodemap(ctdb, CONTROL_TIMEOUT(), nodemap->nodes[j].pnn,`
			`mem_ctx, &remote_nodemap);`
			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to get nodemap from recovery master %u\n",`
simplify election handling make sure we read and update the flags from all remote nodes before we reach the first codepath that can call do_recovery() since during do_recovery() we need to know what the flags are. (This used to be ctdb commit e85f3806483ea420559d449e0e4d81bec996740f) 2007-10-11 00:16:36 +04:00			`nodemap->nodes[j].pnn));`
			`goto again;`
			`}`


			`if (remote_nodemap->nodes[j].flags & NODE_FLAGS_INACTIVE) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_NOTICE, ("Recmaster node %u no longer available. Force reelection\n", nodemap->nodes[j].pnn));`
add a new tunable : reclockpingperiod once every such interval : * the recovery master on each node will uppdate the "connected" count in the reclock count file (ctdb getreclock) * if the node thinks it is a recovery master but it detects another node that is DISCONNECTED but which still holds a lock to the reclock count file this may mean that we have a split cluster. if that other node that is DISCONNECTED but still holds the lock on hte reclock pnn count file, is MORE connected than the local node, yield the recmaster role and let the other half of the lcuster take over this add a second, last chance mechanism to detect split clusters. IF the cluster is split but GPFS is not yet split, this mechanism makes the largest half of the cluster become the active half. (This used to be ctdb commit 07af425f444531942cce8abff112c1524228d287) 2008-03-03 01:19:30 +03:00			`force_election(rec, pnn, nodemap);`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`goto again;`
			`}`
let each node verify that they have a correct assignment of public ip addresses (i.e. htey hold those they should hold and they dont hold any of those they shouldnt hold) if an inconsistency is found, mark the local node as recovery mode active and wait for the recovery master to trigger a full blown recovery (This used to be ctdb commit 55a5bfc8244c5b9cdda3f11992f384f00566b5dc) 2007-09-14 04:16:36 +04:00
verify that the recmaster has the correct flags for us and if not tell the recmaster what the flags should be (This used to be ctdb commit 3387597926ad71e4140cc504b828486d99a3ec8e) 2008-06-26 05:08:09 +04:00
			`/* verify that we and the recmaster agree on our flags */`
			`if (nodemap->nodes[pnn].flags != remote_nodemap->nodes[pnn].flags) {`
			`DEBUG(DEBUG_ERR, (__location__ " Recmaster disagrees on our flags flags:0x%x recmaster_flags:0x%x Broadcasting out flags.\n", nodemap->nodes[pnn].flags, remote_nodemap->nodes[pnn].flags));`

			`update_our_flags_on_all_nodes(ctdb, pnn, nodemap);`
			`}`

let each node verify that they have a correct assignment of public ip addresses (i.e. htey hold those they should hold and they dont hold any of those they shouldnt hold) if an inconsistency is found, mark the local node as recovery mode active and wait for the recovery master to trigger a full blown recovery (This used to be ctdb commit 55a5bfc8244c5b9cdda3f11992f384f00566b5dc) 2007-09-14 04:16:36 +04:00			`/* verify that the public ip address allocation is consistent */`
remove some unnessecary tests if ->vnn is null or not (This used to be ctdb commit f0169ac8166a19d65ce254496e21d095aed87c2f) 2008-05-15 07:28:19 +04:00			`ret = ctdb_ctrl_get_public_ips(ctdb, CONTROL_TIMEOUT(), CTDB_CURRENT_NODE, mem_ctx, &ips);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR, ("Unable to get public ips from node %u\n", pnn));`
			`goto again;`
			`}`
			`for (j=0; j<ips->num; j++) {`
			`/* verify that we have the ip addresses we should have`
			`and we don't have ones we shouldn't have.`
			`if we find an inconsistency we set recmode to`
			`active on the local node and wait for the recmaster`
			`to do a full blown recovery`
			`*/`
			`if (ips->ips[j].pnn == pnn) {`
			`if (!ctdb_sys_have_ip(ips->ips[j].sin)) {`
			`DEBUG(DEBUG_CRIT,("Public address '%s' is missing and we should serve this ip\n", inet_ntoa(ips->ips[j].sin.sin_addr)));`
			`ret = ctdb_ctrl_freeze(ctdb, CONTROL_TIMEOUT(), CTDB_CURRENT_NODE);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR,(__location__ " Failed to freeze node due to public ip address mismatches\n"));`
			`goto again;`
let each node verify that they have a correct assignment of public ip addresses (i.e. htey hold those they should hold and they dont hold any of those they shouldnt hold) if an inconsistency is found, mark the local node as recovery mode active and wait for the recovery master to trigger a full blown recovery (This used to be ctdb commit 55a5bfc8244c5b9cdda3f11992f384f00566b5dc) 2007-09-14 04:16:36 +04:00			`}`
remove some unnessecary tests if ->vnn is null or not (This used to be ctdb commit f0169ac8166a19d65ce254496e21d095aed87c2f) 2008-05-15 07:28:19 +04:00			`ret = ctdb_ctrl_setrecmode(ctdb, CONTROL_TIMEOUT(), CTDB_CURRENT_NODE, CTDB_RECOVERY_ACTIVE);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR,(__location__ " Failed to activate recovery mode due to public ip address mismatches\n"));`
			`goto again;`
			`}`
			`}`
			`} else {`
			`if (ctdb_sys_have_ip(ips->ips[j].sin)) {`
			`DEBUG(DEBUG_CRIT,("We are still serving a public address '%s' that we should not be serving.\n", inet_ntoa(ips->ips[j].sin.sin_addr)));`
			`ret = ctdb_ctrl_freeze(ctdb, CONTROL_TIMEOUT(), CTDB_CURRENT_NODE);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR,(__location__ " Failed to freeze node due to public ip address mismatches\n"));`
			`goto again;`
			`}`
			`ret = ctdb_ctrl_setrecmode(ctdb, CONTROL_TIMEOUT(), CTDB_CURRENT_NODE, CTDB_RECOVERY_ACTIVE);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR,(__location__ " Failed to activate recovery mode due to public ip address mismatches\n"));`
			`goto again;`
let each node verify that they have a correct assignment of public ip addresses (i.e. htey hold those they should hold and they dont hold any of those they shouldnt hold) if an inconsistency is found, mark the local node as recovery mode active and wait for the recovery master to trigger a full blown recovery (This used to be ctdb commit 55a5bfc8244c5b9cdda3f11992f384f00566b5dc) 2007-09-14 04:16:36 +04:00			`}`
			`}`
			`}`
			`}`
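The public address check above reduces to one rule per address: the node should hold an address if and only if it is the assigned owner. The sketch below expresses that rule as a pure function. `sketch_ip` and `sketch_ip_allocation_consistent` are hypothetical, and the `held_locally` field stands in for whatever a probe such as `ctdb_sys_have_ip()` would report. The freeze and recovery-mode controls that follow a mismatch in the original are deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_ip {
	uint32_t owner_pnn; /* node that is supposed to serve this address */
	bool held_locally;  /* stand-in for a local "do we have this ip" probe */
};

/* Return true when every address we own is present locally and no
   address owned by another node is present locally. */
static bool sketch_ip_allocation_consistent(const struct sketch_ip *ips,
					     size_t num, uint32_t my_pnn)
{
	size_t j;

	for (j = 0; j < num; j++) {
		bool should_hold = (ips[j].owner_pnn == my_pnn);

		if (should_hold != ips[j].held_locally) {
			/* missing an address we own, or still serving
			   one that belongs to another node */
			return false;
		}
	}
	return true;
}
```

In the original, a `false` result corresponds to freezing the local node and setting its recovery mode to active so that the recmaster triggers a full recovery.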
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00
			`/* if we are not the recmaster then we do not need to check`
			`if recovery is needed`
			`*/`
change recmaster from being a local variable in monitor_cluster() to be a member of the ctdb_recoverd structure (This used to be ctdb commit b7f955338f50c92374b4f559268fb3a1a516aefa) 2008-03-02 23:53:46 +03:00			`if (pnn != rec->recmaster) {`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`goto again;`
			`}`

simplify election handling make sure we read and update the flags from all remote nodes before we reach the first codepath that can call do_recovery() since during do_recovery() we need to know what the flags are. (This used to be ctdb commit e85f3806483ea420559d449e0e4d81bec996740f) 2007-10-11 00:16:36 +04:00
sync flags between nodes in monitor loop in recmaster (This used to be ctdb commit 6eef86e06388fc53a1212f1e2783ae174c6cd210) 2007-10-15 08:28:51 +04:00			`/* ensure our local copies of flags are right */`
dont manipulate ctdb->monitoring_mode directly from the SET_MON_MODE control, instead call ctdb_start/stop_monitoring() ctdb_stop_monitoring() dont allocate a new monitoring context, leave it NULL. Also set the monitoring_mode in this function so that ctdb_stop/start_monitoring() and ->monitoring_mode are kept in sync. Add a debug message to log that we have stopped monitoring. ctdb_start_monitoring() check whether monitoring is already active and make the function idempotent. Create the monitoring context when monitoring is started. Update ->monitoring_mode once the monitoring has been started. Add a debug message to log that we have started monitoring. When we temporarily stop monitoring while running an event script, restart monitoring after the event script wrapper returns instead of in the event script callback. Let monitoring_mode start out as DISABLED and let it be enabled once we call ctdb_start_monitoring. dont check for MONITORING_DISABLED in check_fore_dead_nodes(). If monitoring is disabled, this event handler will not be called. (This used to be ctdb commit 3a93ae8bdcffb1adbd6243844f3058fc742f76aa) 2007-11-30 00:44:34 +03:00			`ret = update_local_flags(rec, nodemap);`
If update_local_flags() finds that a node has changed its BANNED status so it differs from what the local ctdb daemon on the recovery master thinks it should be we should call for a re-election (This used to be ctdb commit 21ad6039c31ef5cc0e40a35a41220f91943947cb) 2007-11-23 03:53:06 +03:00			`if (ret == MONITOR_ELECTION_NEEDED) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_NOTICE,("update_local_flags() called for a re-election.\n"));`
add a new tunable : reclockpingperiod once every such interval : * the recovery master on each node will uppdate the "connected" count in the reclock count file (ctdb getreclock) * if the node thinks it is a recovery master but it detects another node that is DISCONNECTED but which still holds a lock to the reclock count file this may mean that we have a split cluster. if that other node that is DISCONNECTED but still holds the lock on hte reclock pnn count file, is MORE connected than the local node, yield the recmaster role and let the other half of the lcuster take over this add a second, last chance mechanism to detect split clusters. IF the cluster is split but GPFS is not yet split, this mechanism makes the largest half of the cluster become the active half. (This used to be ctdb commit 07af425f444531942cce8abff112c1524228d287) 2008-03-03 01:19:30 +03:00			`force_election(rec, pnn, nodemap);`
If update_local_flags() finds that a node has changed its BANNED status so it differs from what the local ctdb daemon on the recovery master thinks it should be we should call for a re-election (This used to be ctdb commit 21ad6039c31ef5cc0e40a35a41220f91943947cb) 2007-11-23 03:53:06 +03:00			`goto again;`
			`}`
			`if (ret != MONITOR_OK) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,("Unable to update local flags\n"));`
sync flags between nodes in monitor loop in recmaster (This used to be ctdb commit 6eef86e06388fc53a1212f1e2783ae174c6cd210) 2007-10-15 08:28:51 +04:00			`goto again;`
simplify election handling make sure we read and update the flags from all remote nodes before we reach the first codepath that can call do_recovery() since during do_recovery() we need to know what the flags are. (This used to be ctdb commit e85f3806483ea420559d449e0e4d81bec996740f) 2007-10-11 00:16:36 +04:00			`}`

allow different nodes in the cluster to use different public_addresses files so that we can partition the cluster into different subsets of nodes which each serve a different subset of the public addresses (This used to be ctdb commit 889e0fe69e4c88c6166282b12843b8d9727552d6) 2007-09-04 17:15:23 +04:00			`/* update the list of public ips that a node can handle for`
			`all connected nodes`
			`*/`
			`for (j=0; j<nodemap->num; j++) {`
			`if (nodemap->nodes[j].flags & NODE_FLAGS_INACTIVE) {`
			`continue;`
			`}`
			`/* release any existing data */`
			`if (ctdb->nodes[j]->public_ips) {`
			`talloc_free(ctdb->nodes[j]->public_ips);`
			`ctdb->nodes[j]->public_ips = NULL;`
			`}`
			`/* grab a new shiny list of public ips from the node */`
			`if (ctdb_ctrl_get_public_ips(ctdb, CONTROL_TIMEOUT(),`
			`ctdb->nodes[j]->pnn,`
			`ctdb->nodes,`
			`&ctdb->nodes[j]->public_ips)) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,("Failed to read public ips from node : %u\n",`
allow different nodes in the cluster to use different public_addresses files so that we can partition the cluster into different subsets of nodes which each serve a different subset of the public addresses (This used to be ctdb commit 889e0fe69e4c88c6166282b12843b8d9727552d6) 2007-09-04 17:15:23 +04:00			`ctdb->nodes[j]->pnn));`
			`goto again;`
			`}`
			`}`


recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`/* verify that all active nodes agree that we are the recmaster */`
when a node disgrees with us re who is recmaster make it mark that node as a lcuprit so it eventually gets banned (This used to be ctdb commit eff3f326f8ce6070c9f3c430cd14d1b71a8db220) 2008-04-21 18:56:27 +04:00			`switch (verify_recmaster(rec, nodemap, pnn)) {`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`case MONITOR_RECOVERY_NEEDED:`
			`/* can not happen */`
			`goto again;`
			`case MONITOR_ELECTION_NEEDED:`
add a new tunable : reclockpingperiod once every such interval : * the recovery master on each node will uppdate the "connected" count in the reclock count file (ctdb getreclock) * if the node thinks it is a recovery master but it detects another node that is DISCONNECTED but which still holds a lock to the reclock count file this may mean that we have a split cluster. if that other node that is DISCONNECTED but still holds the lock on hte reclock pnn count file, is MORE connected than the local node, yield the recmaster role and let the other half of the lcuster take over this add a second, last chance mechanism to detect split clusters. IF the cluster is split but GPFS is not yet split, this mechanism makes the largest half of the cluster become the active half. (This used to be ctdb commit 07af425f444531942cce8abff112c1524228d287) 2008-03-03 01:19:30 +03:00			`force_election(rec, pnn, nodemap);`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`goto again;`
			`case MONITOR_OK:`
			`break;`
			`case MONITOR_FAILED:`
			`goto again;`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`}`


- merge from ronnie - add a flag to check that recovery completed correctly. If not, re-trigger it in monitoring (This used to be ctdb commit d5ed941d9bab4af30d8b5f9b77bdf43d9218d69b) 2007-09-14 03:49:12 +04:00			`if (rec->need_recovery) {`
			`/* a previous recovery didn't finish */`
make it possible to re-start a recovery without marking the current node as the culprit. (This used to be ctdb commit 3a69fad0b1dee4a482461680c556358409e53c4d) 2008-06-13 05:47:42 +04:00			`do_recovery(rec, mem_ctx, pnn, nodemap, vnnmap, -1);`
- merge from ronnie - add a flag to check that recovery completed correctly. If not, re-trigger it in monitoring (This used to be ctdb commit d5ed941d9bab4af30d8b5f9b77bdf43d9218d69b) 2007-09-14 03:49:12 +04:00			`goto again;`
			`}`

add a test in the function that checks whether the cluster needs recovery or not that all active nodes are in normal mode. If we discover that some node is still in recoverymode it may indicate that a previous recovery ended prematurely and thus we should start a new recovery (This used to be ctdb commit c15517872e6c98c8c425a8d47d2b348ecb0620b0) 2007-05-06 22:41:12 +04:00			`/* verify that all active nodes are in normal mode`
			`and not in recovery mode`
			`*/`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`switch (verify_recmode(ctdb, nodemap)) {`
break checking that the recoverymode on all nodes are ok out into its own function (This used to be ctdb commit 813cf9a252af96da24122b80f24aabeed2911939) 2007-08-23 07:48:39 +04:00			`case MONITOR_RECOVERY_NEEDED:`
store the num_active variable (number of connected/active nodes) inside the rec structure and avoid passing this as an extra parameter to do_recovery() (This used to be ctdb commit 8bb229aa3b4bd41e48d4e4e2e148d8680c8ba436) 2008-02-29 04:55:20 +03:00			`do_recovery(rec, mem_ctx, pnn, nodemap, vnnmap, ctdb->pnn);`
break checking that the recoverymode on all nodes are ok out into its own function (This used to be ctdb commit 813cf9a252af96da24122b80f24aabeed2911939) 2007-08-23 07:48:39 +04:00			`goto again;`
			`case MONITOR_FAILED:`
			`goto again;`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`case MONITOR_ELECTION_NEEDED:`
			`/* cannot happen here; falls through to MONITOR_OK */`
break checking that the recoverymode on all nodes are ok out into its own function (This used to be ctdb commit 813cf9a252af96da24122b80f24aabeed2911939) 2007-08-23 07:48:39 +04:00			`case MONITOR_OK:`
			`break;`
add a test in the function that checks whether the cluster needs recovery or not that all active nodes are in normal mode. If we discover that some node is still in recoverymode it may indicate that a previous recovery ended prematurely and thus we should start a new recovery (This used to be ctdb commit c15517872e6c98c8c425a8d47d2b348ecb0620b0) 2007-05-06 22:41:12 +04:00			`}`


- catch ESTALE in the recovery lock by trying a read() - priortise nodes that are unbanned and healthy in the election (This used to be ctdb commit 929feb475dfdf7283f0e99b50b179e1c91d3a39f) 2007-10-05 07:28:21 +04:00			`/* we should have the reclock - check it is not stale */`
			`if (ctdb->recovery_lock_fd == -1) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_CRIT,("recovery master doesn't have the recovery lock\n"));`
store the num_active variable (number of connected/active nodes) inside the rec structure and avoid passing this as an extra parameter to do_recovery() (This used to be ctdb commit 8bb229aa3b4bd41e48d4e4e2e148d8680c8ba436) 2008-02-29 04:55:20 +03:00			`do_recovery(rec, mem_ctx, pnn, nodemap, vnnmap, ctdb->pnn);`
- catch ESTALE in the recovery lock by trying a read() - priortise nodes that are unbanned and healthy in the election (This used to be ctdb commit 929feb475dfdf7283f0e99b50b179e1c91d3a39f) 2007-10-05 07:28:21 +04:00			`goto again;`
			`}`

add a control to get the name of the reclock file from the daemon (This used to be ctdb commit 9effb22cc1616d684352d7ebabb359e69adb0f52) 2008-02-29 02:03:39 +03:00			`if (pread(ctdb->recovery_lock_fd, &c, 1, 0) == -1) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_CRIT,("failed read from recovery_lock_fd - %s\n", strerror(errno)));`
- catch ESTALE in the recovery lock by trying a read() - priortise nodes that are unbanned and healthy in the election (This used to be ctdb commit 929feb475dfdf7283f0e99b50b179e1c91d3a39f) 2007-10-05 07:28:21 +04:00			`close(ctdb->recovery_lock_fd);`
			`ctdb->recovery_lock_fd = -1;`
store the num_active variable (number of connected/active nodes) inside the rec structure and avoid passing this as an extra parameter to do_recovery() (This used to be ctdb commit 8bb229aa3b4bd41e48d4e4e2e148d8680c8ba436) 2008-02-29 04:55:20 +03:00			`do_recovery(rec, mem_ctx, pnn, nodemap, vnnmap, ctdb->pnn);`
- catch ESTALE in the recovery lock by trying a read() - priortise nodes that are unbanned and healthy in the election (This used to be ctdb commit 929feb475dfdf7283f0e99b50b179e1c91d3a39f) 2007-10-05 07:28:21 +04:00			`goto again;`
			`}`
break checking that the recoverymode on all nodes are ok out into its own function (This used to be ctdb commit 813cf9a252af96da24122b80f24aabeed2911939) 2007-08-23 07:48:39 +04:00
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`/* get the nodemap for all active remote nodes and verify`
			`they are the same as for this node`
			`*/`
			`for (j=0; j<nodemap->num; j++) {`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`if (nodemap->nodes[j].flags & NODE_FLAGS_INACTIVE) {`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`continue;`
			`}`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`if (nodemap->nodes[j].pnn == pnn) {`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`continue;`
			`}`

change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`ret = ctdb_ctrl_getnodemap(ctdb, CONTROL_TIMEOUT(), nodemap->nodes[j].pnn,`
formatting fixes (This used to be ctdb commit ed63a2057698aed3931762605b2ea2368681af2b) 2007-06-07 12:39:37 +04:00			`mem_ctx, &remote_nodemap);`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to get nodemap from remote node %u\n",`
change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`nodemap->nodes[j].pnn));`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`goto again;`
			`}`

			`/* if the nodes disagree on how many nodes there are`
			`then this is a good reason to try recovery`
			`*/`
			`if (remote_nodemap->num != nodemap->num) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Remote node:%u has different node count. %u vs %u of the local node\n",`
change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`nodemap->nodes[j].pnn, remote_nodemap->num, nodemap->num));`
store the num_active variable (number of connected/active nodes) inside the rec structure and avoid passing this as an extra parameter to do_recovery() (This used to be ctdb commit 8bb229aa3b4bd41e48d4e4e2e148d8680c8ba436) 2008-02-29 04:55:20 +03:00			`do_recovery(rec, mem_ctx, pnn, nodemap, vnnmap, nodemap->nodes[j].pnn);`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`goto again;`
			`}`

			`/* if the nodes disagree on which nodes exist and are`
			`active, then that is also a good reason to do recovery`
			`*/`
			`for (i=0;i<nodemap->num;i++) {`
change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`if (remote_nodemap->nodes[i].pnn != nodemap->nodes[i].pnn) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Remote node:%u has different nodemap pnn for %d (%u vs %u).\n",`
change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`nodemap->nodes[j].pnn, i,`
			`remote_nodemap->nodes[i].pnn, nodemap->nodes[i].pnn));`
store the num_active variable (number of connected/active nodes) inside the rec structure and avoid passing this as an extra parameter to do_recovery() (This used to be ctdb commit 8bb229aa3b4bd41e48d4e4e2e148d8680c8ba436) 2008-02-29 04:55:20 +03:00			`do_recovery(rec, mem_ctx, pnn, nodemap,`
change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`vnnmap, nodemap->nodes[j].pnn);`
more detail in recovery message (This used to be ctdb commit bc18a39efcf1fa5edfadc4c2f842f7cf035e4fbd) 2007-06-11 15:37:09 +04:00			`goto again;`
			`}`
a better way to fix the DISCONNECT\|BANNED vs DISCONNECT bug (This used to be ctdb commit 5c638d7731c5a268de02d3a37828ac7aec9a12de) 2007-07-09 06:55:15 +04:00			`if ((remote_nodemap->nodes[i].flags & NODE_FLAGS_INACTIVE) !=`
			`(nodemap->nodes[i].flags & NODE_FLAGS_INACTIVE)) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Remote node:%u has different nodemap flag for %d (0x%x vs 0x%x)\n",`
change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`nodemap->nodes[j].pnn, i,`
more detail in recovery message (This used to be ctdb commit bc18a39efcf1fa5edfadc4c2f842f7cf035e4fbd) 2007-06-11 15:37:09 +04:00			`remote_nodemap->nodes[i].flags, nodemap->nodes[i].flags));`
store the num_active variable (number of connected/active nodes) inside the rec structure and avoid passing this as an extra parameter to do_recovery() (This used to be ctdb commit 8bb229aa3b4bd41e48d4e4e2e148d8680c8ba436) 2008-02-29 04:55:20 +03:00			`do_recovery(rec, mem_ctx, pnn, nodemap,`
change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`vnnmap, nodemap->nodes[j].pnn);`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`goto again;`
			`}`
			`}`

			`}`


			`/* there had better be the same number of lmasters in the vnn map`
setup the random number generator a bit better (This used to be ctdb commit 708585eb0ed31b0df6543a1d7a20b82e751877c2) 2007-05-10 07:10:23 +04:00			`as there are active nodes or we will have to do a recovery`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`*/`
store the num_active variable (number of connected/active nodes) inside the rec structure and avoid passing this as an extra parameter to do_recovery() (This used to be ctdb commit 8bb229aa3b4bd41e48d4e4e2e148d8680c8ba436) 2008-02-29 04:55:20 +03:00			`if (vnnmap->size != rec->num_active) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " The vnnmap count is different from the number of active nodes. %u vs %u\n",`
store the num_active variable (number of connected/active nodes) inside the rec structure and avoid passing this as an extra parameter to do_recovery() (This used to be ctdb commit 8bb229aa3b4bd41e48d4e4e2e148d8680c8ba436) 2008-02-29 04:55:20 +03:00			`vnnmap->size, rec->num_active));`
			`do_recovery(rec, mem_ctx, pnn, nodemap, vnnmap, ctdb->pnn);`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`goto again;`
			`}`

			`/* verify that all active nodes in the nodemap also exist in`
			`the vnnmap.`
			`*/`
			`for (j=0; j<nodemap->num; j++) {`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`if (nodemap->nodes[j].flags & NODE_FLAGS_INACTIVE) {`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`continue;`
			`}`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`if (nodemap->nodes[j].pnn == pnn) {`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`continue;`
			`}`

			`for (i=0; i<vnnmap->size; i++) {`
change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`if (vnnmap->map[i] == nodemap->nodes[j].pnn) {`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`break;`
			`}`
			`}`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`if (i == vnnmap->size) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Node %u is active in the nodemap but did not exist in the vnnmap\n",`
change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`nodemap->nodes[j].pnn));`
store the num_active variable (number of connected/active nodes) inside the rec structure and avoid passing this as an extra parameter to do_recovery() (This used to be ctdb commit 8bb229aa3b4bd41e48d4e4e2e148d8680c8ba436) 2008-02-29 04:55:20 +03:00			`do_recovery(rec, mem_ctx, pnn, nodemap, vnnmap, nodemap->nodes[j].pnn);`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`goto again;`
			`}`
			`}`


also verify that the generation id is the same on all the nodes and if not, trigger a recovery (This used to be ctdb commit 46b8a66ee70419c153acf45eeec88c1fc8f230ce) 2007-05-04 05:57:45 +04:00			`/* verify that all other nodes have the same vnnmap`
			`and are from the same generation`
			`*/`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`for (j=0; j<nodemap->num; j++) {`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`if (nodemap->nodes[j].flags & NODE_FLAGS_INACTIVE) {`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`continue;`
			`}`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`if (nodemap->nodes[j].pnn == pnn) {`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`continue;`
			`}`

change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`ret = ctdb_ctrl_getvnnmap(ctdb, CONTROL_TIMEOUT(), nodemap->nodes[j].pnn,`
formatting fixes (This used to be ctdb commit ed63a2057698aed3931762605b2ea2368681af2b) 2007-06-07 12:39:37 +04:00			`mem_ctx, &remote_vnnmap);`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to get vnnmap from remote node %u\n",`
change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`nodemap->nodes[j].pnn));`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`goto again;`
			`}`

also verify that the generation id is the same on all the nodes and if not, trigger a recovery (This used to be ctdb commit 46b8a66ee70419c153acf45eeec88c1fc8f230ce) 2007-05-04 05:57:45 +04:00			`/* verify the vnnmap generation is the same */`
			`if (vnnmap->generation != remote_vnnmap->generation) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Remote node %u has different generation of vnnmap. %u vs %u (ours)\n",`
change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`nodemap->nodes[j].pnn, remote_vnnmap->generation, vnnmap->generation));`
store the num_active variable (number of connected/active nodes) inside the rec structure and avoid passing this as an extra parameter to do_recovery() (This used to be ctdb commit 8bb229aa3b4bd41e48d4e4e2e148d8680c8ba436) 2008-02-29 04:55:20 +03:00			`do_recovery(rec, mem_ctx, pnn, nodemap, vnnmap, nodemap->nodes[j].pnn);`
also verify that the generation id is the same on all the nodes and if not, trigger a recovery (This used to be ctdb commit 46b8a66ee70419c153acf45eeec88c1fc8f230ce) 2007-05-04 05:57:45 +04:00			`goto again;`
			`}`

update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`/* verify the vnnmap size is the same */`
			`if (vnnmap->size != remote_vnnmap->size) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Remote node %u has different size of vnnmap. %u vs %u (ours)\n",`
change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`nodemap->nodes[j].pnn, remote_vnnmap->size, vnnmap->size));`
store the num_active variable (number of connected/active nodes) inside the rec structure and avoid passing this as an extra parameter to do_recovery() (This used to be ctdb commit 8bb229aa3b4bd41e48d4e4e2e148d8680c8ba436) 2008-02-29 04:55:20 +03:00			`do_recovery(rec, mem_ctx, pnn, nodemap, vnnmap, nodemap->nodes[j].pnn);`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`goto again;`
			`}`

			`/* verify the vnnmap is the same */`
			`for (i=0;i<vnnmap->size;i++) {`
			`if (remote_vnnmap->map[i] != vnnmap->map[i]) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Remote node %u has different vnnmap.\n",`
change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`nodemap->nodes[j].pnn));`
store the num_active variable (number of connected/active nodes) inside the rec structure and avoid passing this as an extra parameter to do_recovery() (This used to be ctdb commit 8bb229aa3b4bd41e48d4e4e2e148d8680c8ba436) 2008-02-29 04:55:20 +03:00			`do_recovery(rec, mem_ctx, pnn, nodemap,`
change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`vnnmap, nodemap->nodes[j].pnn);`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`goto again;`
			`}`
			`}`
			`}`

added health monitoring logic to ctdb, so a node loses its public IP address if one of the sybsystem event scripts reports a problem (This used to be ctdb commit c7a089256d86cec21097453bce5acbccee87413f) 2007-06-06 04:25:46 +04:00			`/* we might need to change who has what IP assigned */`
prevent recursion in the calling of ctdb_takeover_run (This used to be ctdb commit 0fbdeb7c91b965d9bc5ecc7b24e31070378d8f1d) 2007-09-13 08:08:18 +04:00			`if (rec->need_takeover_run) {`
			`rec->need_takeover_run = false;`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00
			`/* execute the "startrecovery" event script on all nodes */`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`ret = run_startrecovery_eventscript(rec, nodemap);`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`if (ret!=0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to run the 'startrecovery' event on cluster\n"));`
store the num_active variable (number of connected/active nodes) inside the rec structure and avoid passing this as an extra parameter to do_recovery() (This used to be ctdb commit 8bb229aa3b4bd41e48d4e4e2e148d8680c8ba436) 2008-02-29 04:55:20 +03:00			`do_recovery(rec, mem_ctx, pnn, nodemap,`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`vnnmap, ctdb->pnn);`
			`}`

added health monitoring logic to ctdb, so a node loses its public IP address if one of the sybsystem event scripts reports a problem (This used to be ctdb commit c7a089256d86cec21097453bce5acbccee87413f) 2007-06-06 04:25:46 +04:00			`ret = ctdb_takeover_run(ctdb, nodemap);`
			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to setup public takeover addresses - starting recovery\n"));`
store the num_active variable (number of connected/active nodes) inside the rec structure and avoid passing this as an extra parameter to do_recovery() (This used to be ctdb commit 8bb229aa3b4bd41e48d4e4e2e148d8680c8ba436) 2008-02-29 04:55:20 +03:00			`do_recovery(rec, mem_ctx, pnn, nodemap,`
fixed several places where we set the recovery culprit incorrectly (This used to be ctdb commit d9da73395fa443801fc68ec53a42b548e832d58a) 2007-10-05 07:51:31 +04:00			`vnnmap, ctdb->pnn);`
added health monitoring logic to ctdb, so a node loses its public IP address if one of the sybsystem event scripts reports a problem (This used to be ctdb commit c7a089256d86cec21097453bce5acbccee87413f) 2007-06-06 04:25:46 +04:00			`}`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00
			`/* execute the "recovered" event script on all nodes */`
Update some debug statements. Dont say that recovery failed if the failed function was invoked from outside of recovery (This used to be ctdb commit 3038d0b74895b51af4f85f2f304508ed16d245f4) 2008-05-15 06:28:52 +04:00			`ret = run_recovered_eventscript(ctdb, nodemap, "monitor_cluster");`
dont check whether the "recovered" event was successful or not since this event wont run unless the recovery mode is normal but we can not know what the recovery mode will be in the future on a remote node so since we issue these commands that will execute in the future at some other node it is pointless to try to check if it worked or not in particular if "failure to successfully run the eventscript" would then trigger a full new recovery which is disruptive and expensive. (This used to be ctdb commit 2c292039a0139dcf5bb2bd964eb6f8902d094c50) 2008-05-15 09:01:01 +04:00			`#if 0`
			`// we cant check whether the event completed successfully`
			`// since this script WILL fail if the node is in recovery mode`
			`// and if that race happens, the code here would just cause a second`
			`// cascading recovery.`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`if (ret!=0) {`
Update some debug statements. Dont say that recovery failed if the failed function was invoked from outside of recovery (This used to be ctdb commit 3038d0b74895b51af4f85f2f304508ed16d245f4) 2008-05-15 06:28:52 +04:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to run the 'recovered' event on cluster. Update of public ips failed.\n"));`
store the num_active variable (number of connected/active nodes) inside the rec structure and avoid passing this as an extra parameter to do_recovery() (This used to be ctdb commit 8bb229aa3b4bd41e48d4e4e2e148d8680c8ba436) 2008-02-29 04:55:20 +03:00			`do_recovery(rec, mem_ctx, pnn, nodemap,`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`vnnmap, ctdb->pnn);`
			`}`
dont check whether the "recovered" event was successful or not since this event wont run unless the recovery mode is normal but we can not know what the recovery mode will be in the future on a remote node so since we issue these commands that will execute in the future at some other node it is pointless to try to check if it worked or not in particular if "failure to successfully run the eventscript" would then trigger a full new recovery which is disruptive and expensive. (This used to be ctdb commit 2c292039a0139dcf5bb2bd964eb6f8902d094c50) 2008-05-15 09:01:01 +04:00			`#endif`
added health monitoring logic to ctdb, so a node loses its public IP address if one of the sybsystem event scripts reports a problem (This used to be ctdb commit c7a089256d86cec21097453bce5acbccee87413f) 2007-06-06 04:25:46 +04:00			`}`

force an update of the flags from the recmaster after each monitoring run (This used to be ctdb commit 251aeadc8b16a9c27a4bae78c97ad6e93e6cfdf4) 2008-06-26 07:08:37 +04:00
reduce loglevel of the info message we are updating the flags on all nodes (This used to be ctdb commit 9a98a21979558dcd6421b3fcb97d21ab82b792d8) 2008-06-26 07:15:41 +04:00			`DEBUG(DEBUG_INFO, (__location__ " Update flags on all nodes\n"));`
force an update of the flags from the recmaster after each monitoring run (This used to be ctdb commit 251aeadc8b16a9c27a4bae78c97ad6e93e6cfdf4) 2008-06-26 07:08:37 +04:00			`/*`
			`update all nodes to have the same flags that we have`
			`*/`
			`ret = update_flags_on_all_nodes(ctdb, nodemap);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR, (__location__ " Unable to update flags on all nodes\n"));`
test (This used to be ctdb commit 4f2d722cf29175c3c207e6ebb6d4f9e370767249) 2008-06-26 08:14:37 +04:00			`goto again;`
force an update of the flags from the recmaster after each monitoring run (This used to be ctdb commit 251aeadc8b16a9c27a4bae78c97ad6e93e6cfdf4) 2008-06-26 07:08:37 +04:00			`}`

update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`goto again;`

start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00			`}`

added health monitoring logic to ctdb, so a node loses its public IP address if one of the sybsystem event scripts reports a problem (This used to be ctdb commit c7a089256d86cec21097453bce5acbccee87413f) 2007-06-06 04:25:46 +04:00			`/*`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`event handler for when the main ctdbd dies`
			`*/`
moved the recovery daemon into the main ctdbd and enable it by default (This used to be ctdb commit 2a7d42124731f43d013cb76a798525eab4cc1ee0) 2007-05-15 09:13:36 +04:00			`static void ctdb_recoverd_parent(struct event_context *ev, struct fd_event *fde,`
			`uint16_t flags, void *private_data)`
			`{`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ALERT,("recovery daemon parent died - exiting\n"));`
moved the recovery daemon into the main ctdbd and enable it by default (This used to be ctdb commit 2a7d42124731f43d013cb76a798525eab4cc1ee0) 2007-05-15 09:13:36 +04:00			`_exit(1);`
			`}`

Monitor that the recovery daemon is still running from the main ctdb daemon and if it has terminated, then we shut down the main daemon as well (This used to be ctdb commit 7e587acaf8006254e89ff9b4bf48454821c85863) 2008-05-06 05:19:17 +04:00			`/*`
			`called regularly to verify that the recovery daemon is still running`
			`*/`
			`static void ctdb_check_recd(struct event_context *ev, struct timed_event *te,`
			`struct timeval yt, void *p)`
			`{`
			`struct ctdb_context *ctdb = talloc_get_type(p, struct ctdb_context);`

			`/* make sure we harvest the child if signals are blocked for some`
			`reason`
			`*/`
			`waitpid(ctdb->recoverd_pid, 0, WNOHANG);`

			`if (kill(ctdb->recoverd_pid, 0) != 0) {`
			`DEBUG(DEBUG_ERR,("Recovery daemon (pid:%d) is no longer running. Shutting down main daemon\n", (int)ctdb->recoverd_pid));`

			`ctdb_stop_recoverd(ctdb);`
			`ctdb_stop_keepalive(ctdb);`
			`ctdb_stop_monitoring(ctdb);`
			`ctdb_release_all_ips(ctdb);`
ctdb->methods becomes NULL when we shutdown the transport. If we shutdown the transport and CTDB later decides to send a command out for queueing, the call to ctdb->methods->allocate_pkt() will SEGV. This could trigger for example when we are in the process of shuttind down CTDBD and have already shutdown the transport but we are still waiting for the "shutdown" eventscripts to finish. If the event scripts now take much much longer to execute for some reason, this race condition becomes much more probable. Decorate all dereferencing of ctdb->methods-> with a check that ctdb->menthods is non-NULL (This used to be ctdb commit c4c2c53918da6fb566d6e9cbd6b02e61ae2921e7) 2008-05-11 08:28:33 +04:00			`if (ctdb->methods != NULL) {`
			`ctdb->methods->shutdown(ctdb);`
			`}`
Monitor that the recovery daemon is still running from the main ctdb daemon and if it has terminated, then we shut down the main daemon as well (This used to be ctdb commit 7e587acaf8006254e89ff9b4bf48454821c85863) 2008-05-06 05:19:17 +04:00			`ctdb_event_script(ctdb, "shutdown");`

			`exit(10);`
			`}`

			`event_add_timed(ctdb->ev, ctdb,`
			`timeval_current_ofs(30, 0),`
			`ctdb_check_recd, ctdb);`
			`}`

implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`/*`
			`startup the recovery daemon as a child of the main ctdb daemon`
			`*/`
moved the recovery daemon into the main ctdbd and enable it by default (This used to be ctdb commit 2a7d42124731f43d013cb76a798525eab4cc1ee0) 2007-05-15 09:13:36 +04:00			`int ctdb_start_recoverd(struct ctdb_context *ctdb)`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00			`{`
			`int ret;`
moved the recovery daemon into the main ctdbd and enable it by default (This used to be ctdb commit 2a7d42124731f43d013cb76a798525eab4cc1ee0) 2007-05-15 09:13:36 +04:00			`int fd[2];`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00
moved the recovery daemon into the main ctdbd and enable it by default (This used to be ctdb commit 2a7d42124731f43d013cb76a798525eab4cc1ee0) 2007-05-15 09:13:36 +04:00			`if (pipe(fd) != 0) {`
			`return -1;`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00			`}`

merge from ronnie (This used to be ctdb commit 0aa6e04438aa5ec727815689baa19544df042cf7) 2008-01-07 08:17:22 +03:00			`ctdb->ctdbd_pid = getpid();`

when we are shutting down, we should first shut down the recovery daemon (This used to be ctdb commit 39ade6b329adcd3234124d6a8daaa6181abf739b) 2007-10-22 06:34:08 +04:00			`ctdb->recoverd_pid = fork();`
			`if (ctdb->recoverd_pid == -1) {`
moved the recovery daemon into the main ctdbd and enable it by default (This used to be ctdb commit 2a7d42124731f43d013cb76a798525eab4cc1ee0) 2007-05-15 09:13:36 +04:00			`return -1;`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00			`}`
moved the recovery daemon into the main ctdbd and enable it by default (This used to be ctdb commit 2a7d42124731f43d013cb76a798525eab4cc1ee0) 2007-05-15 09:13:36 +04:00
when we are shutting down, we should first shut down the recovery daemon (This used to be ctdb commit 39ade6b329adcd3234124d6a8daaa6181abf739b) 2007-10-22 06:34:08 +04:00			`if (ctdb->recoverd_pid != 0) {`
moved the recovery daemon into the main ctdbd and enable it by default (This used to be ctdb commit 2a7d42124731f43d013cb76a798525eab4cc1ee0) 2007-05-15 09:13:36 +04:00			`close(fd[0]);`
Monitor that the recovery daemon is still running from the main ctdb daemon and if it has terminated, then we shut down the main daemon as well (This used to be ctdb commit 7e587acaf8006254e89ff9b4bf48454821c85863) 2008-05-06 05:19:17 +04:00			`event_add_timed(ctdb->ev, ctdb,`
			`timeval_current_ofs(30, 0),`
			`ctdb_check_recd, ctdb);`
moved the recovery daemon into the main ctdbd and enable it by default (This used to be ctdb commit 2a7d42124731f43d013cb76a798525eab4cc1ee0) 2007-05-15 09:13:36 +04:00			`return 0;`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00			`}`

moved the recovery daemon into the main ctdbd and enable it by default (This used to be ctdb commit 2a7d42124731f43d013cb76a798525eab4cc1ee0) 2007-05-15 09:13:36 +04:00			`close(fd[1]);`

- make calling of recovered event script async - shutdown sockets before calling shutdown script (This used to be ctdb commit c5e099feef94a014a77742b6cc1d0afe78ef9da9) 2007-06-02 02:41:19 +04:00			`/* shutdown the transport */`
ctdb->methods becomes NULL when we shutdown the transport. If we shutdown the transport and CTDB later decides to send a command out for queueing, the call to ctdb->methods->allocate_pkt() will SEGV. This could trigger for example when we are in the process of shuttind down CTDBD and have already shutdown the transport but we are still waiting for the "shutdown" eventscripts to finish. If the event scripts now take much much longer to execute for some reason, this race condition becomes much more probable. Decorate all dereferencing of ctdb->methods-> with a check that ctdb->menthods is non-NULL (This used to be ctdb commit c4c2c53918da6fb566d6e9cbd6b02e61ae2921e7) 2008-05-11 08:28:33 +04:00			`if (ctdb->methods) {`
			`ctdb->methods->shutdown(ctdb);`
			`}`
- make calling of recovered event script async - shutdown sockets before calling shutdown script (This used to be ctdb commit c5e099feef94a014a77742b6cc1d0afe78ef9da9) 2007-06-02 02:41:19 +04:00
			`/* get a new event context */`
don't start the transport connecting to the other nodes until after the startup event script has run (This used to be ctdb commit afca3cc74211aa2e18b1f74d36b2add8dffcfdc7) 2007-05-30 07:26:50 +04:00			`talloc_free(ctdb->ev);`
			`ctdb->ev = event_context_init(ctdb);`

moved the recovery daemon into the main ctdbd and enable it by default (This used to be ctdb commit 2a7d42124731f43d013cb76a798525eab4cc1ee0) 2007-05-15 09:13:36 +04:00			`event_add_fd(ctdb->ev, ctdb, fd[0], EVENT_FD_READ|EVENT_FD_AUTOCLOSE,`
			`ctdb_recoverd_parent, &fd[0]);`
setup the random number generator a bit better (This used to be ctdb commit 708585eb0ed31b0df6543a1d7a20b82e751877c2) 2007-05-10 07:10:23 +04:00
moved the recovery daemon into the main ctdbd and enable it by default (This used to be ctdb commit 2a7d42124731f43d013cb76a798525eab4cc1ee0) 2007-05-15 09:13:36 +04:00			`close(ctdb->daemon.sd);`
			`ctdb->daemon.sd = -1;`

			`srandom(getpid() ^ time(NULL));`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00
The recovery daemon does not need to be a realtime task (This used to be ctdb commit f552acf7c1f9dd37eb35d9716ea3fb02304aae8f) 2008-01-16 14:08:33 +03:00			`/* the recovery daemon does not need to be realtime */`
			`if (ctdb->do_setsched) {`
			`ctdb_restore_scheduler(ctdb);`
			`}`

start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00			`/* initialise ctdb */`
moved the recovery daemon into the main ctdbd and enable it by default (This used to be ctdb commit 2a7d42124731f43d013cb76a798525eab4cc1ee0) 2007-05-15 09:13:36 +04:00			`ret = ctdb_socket_connect(ctdb);`
			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ALERT, (__location__ " Failed to init ctdb\n"));`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00			`exit(1);`
			`}`

moved the recovery daemon into the main ctdbd and enable it by default (This used to be ctdb commit 2a7d42124731f43d013cb76a798525eab4cc1ee0) 2007-05-15 09:13:36 +04:00			`monitor_cluster(ctdb);`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ALERT,("ERROR: ctdb_recoverd finished!?\n"));`
moved the recovery daemon into the main ctdbd and enable it by default (This used to be ctdb commit 2a7d42124731f43d013cb76a798525eab4cc1ee0) 2007-05-15 09:13:36 +04:00			`return -1;`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00			`}`
when we are shutting down, we should first shut down the recovery daemon (This used to be ctdb commit 39ade6b329adcd3234124d6a8daaa6181abf739b) 2007-10-22 06:34:08 +04:00
			`/*`
			`shutdown the recovery daemon`
			`*/`
			`void ctdb_stop_recoverd(struct ctdb_context *ctdb)`
			`{`
			`if (ctdb->recoverd_pid == 0) {`
			`return;`
			`}`

merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_NOTICE,("Shutting down recovery daemon\n"));`
when we are shutting down, we should first shut down the recovery daemon (This used to be ctdb commit 39ade6b329adcd3234124d6a8daaa6181abf739b) 2007-10-22 06:34:08 +04:00			`kill(ctdb->recoverd_pid, SIGTERM);`
			`}`
