samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2024-12-23 17:34:34 +03:00

4196 lines

114 KiB

C


start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00			`/*`
			`ctdb recovery daemon`

			`Copyright (C) Ronnie Sahlberg 2007`

ctdb is GPL not LGPL (This used to be ctdb commit 8624378010d1c2a1438e1e701339dfba7276f960) 2007-05-31 07:50:53 +04:00			`This program is free software; you can redistribute it and/or modify`
			`it under the terms of the GNU General Public License as published by`
update lib/replace from samba4 (This used to be ctdb commit f0555484105668c01c21f56322992e752e831109) 2007-07-10 09:29:31 +04:00			`the Free Software Foundation; either version 3 of the License, or`
ctdb is GPL not LGPL (This used to be ctdb commit 8624378010d1c2a1438e1e701339dfba7276f960) 2007-05-31 07:50:53 +04:00			`(at your option) any later version.`

			`This program is distributed in the hope that it will be useful,`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00			`but WITHOUT ANY WARRANTY; without even the implied warranty of`
ctdb is GPL not LGPL (This used to be ctdb commit 8624378010d1c2a1438e1e701339dfba7276f960) 2007-05-31 07:50:53 +04:00			`MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the`
			`GNU General Public License for more details.`

			`You should have received a copy of the GNU General Public License`
update lib/replace from samba4 (This used to be ctdb commit f0555484105668c01c21f56322992e752e831109) 2007-07-10 09:29:31 +04:00			`along with this program; if not, see <http://www.gnu.org/licenses/>.`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00			`*/`

ctdb-daemon: Remove dependency on includes.h Instead of includes.h, include the required header files explicitly. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-26 08:50:46 +03:00			`#include "replace.h"`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00			`#include "system/filesys.h"`
better timeout handling for calls, controls and traverses (This used to be ctdb commit 63346a6c59d4821b4c443939b5d88db8cd20f5fe) 2007-05-10 08:06:48 +04:00			`#include "system/time.h"`
let each node verify that they have a correct assignment of public ip addresses (i.e. htey hold those they should hold and they dont hold any of those they shouldnt hold) if an inconsistency is found, mark the local node as recovery mode active and wait for the recovery master to trigger a full blown recovery (This used to be ctdb commit 55a5bfc8244c5b9cdda3f11992f384f00566b5dc) 2007-09-14 04:16:36 +04:00			`#include "system/network.h"`
when we are shutting down, we should first shut down the recovery daemon (This used to be ctdb commit 39ade6b329adcd3234124d6a8daaa6181abf739b) 2007-10-22 06:34:08 +04:00			`#include "system/wait.h"`
ctdb-daemon: Remove dependency on includes.h Instead of includes.h, include the required header files explicitly. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-26 08:50:46 +03:00
			`#include <popt.h>`
			`#include <talloc.h>`
			`#include <tevent.h>`
			`#include <tdb.h>`

ctdb-util: Rename db_wrap to tdb_wrap and make it a build subsystem This makes it consistent with Samba, to ease transition. Update unit test code to link to with tdb_wrap instead of including db_wrap.c. There are some potential whitespace fixes in this commit that have been ignored. CTDB's lib/tdb_wrap will be deleted after the transition to Samba's lib/tdb_wrap, so there's no point polishing it too much. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2014-08-15 09:46:33 +04:00			`#include "lib/tdb_wrap/tdb_wrap.h"`
ctdb-recoverd: Change include of dlinklist.h to contain directory This makes it consistent with the rest of the code and avoids problems when some variant of lib/util isn't in the include path. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2014-08-15 10:18:05 +04:00			`#include "lib/util/dlinklist.h"`
ctdb-daemon: Remove dependency on includes.h Instead of includes.h, include the required header files explicitly. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-26 08:50:46 +03:00			`#include "lib/util/debug.h"`
			`#include "lib/util/samba_util.h"`

			`#include "ctdb_private.h"`
			`#include "ctdb_client.h"`
			`#include "ctdb_logging.h"`

ctdb-daemon: Separate prototypes for system specific functions This groups function prototypes for system specific functions in common/system.h and removes them from ctdb_private.h. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-23 06:11:53 +03:00			`#include "common/system.h"`
ctdb-daemon: Remove dependency on includes.h Instead of includes.h, include the required header files explicitly. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-26 08:50:46 +03:00			`#include "common/cmdline.h"`
ctdb-daemon: Separate prototypes for common client/server functions This groups function prototypes for common client/server functions in common/common.h and removes them from ctdb_private.h. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-23 06:17:34 +03:00			`#include "common/common.h"`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00
added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00
recoverd: Factor out the SRVID handling code The code that handles IP reallocate requests can be reused. This also changes the result back to a SRVID caller to the PNN on success or a negative error code on failure. None of the callers currently look at the result so this is harmless... but it will be useful later. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit e4eae6e3291baa299a1d0f733ab11b138ee699a3) 2013-08-16 14:02:34 +04:00			`/* List of SRVID requests that need to be processed */`
			`struct srvid_list {`
			`struct srvid_list *next, *prev;`
ctdb-daemon: Rename struct srvid_request to ctdb_srvid_message Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 06:32:49 +03:00			`struct ctdb_srvid_message *request;`
recoverd: Factor out the SRVID handling code The code that handles IP reallocate requests can be reused. This also changes the result back to a SRVID caller to the PNN on success or a negative error code on failure. None of the callers currently look at the result so this is harmless... but it will be useful later. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit e4eae6e3291baa299a1d0f733ab11b138ee699a3) 2013-08-16 14:02:34 +04:00			`};`

			`struct srvid_requests {`
			`struct srvid_list *requests;`
add a new command "ctdb ipreallocate", this command will force the recovery master to perform a full ip reallocation process. the ctdb command will block until the ip reallocation has comleted (This used to be ctdb commit abad7b97fe0c066b33f6e75d0953bbed892a3216) 2009-07-02 07:00:26 +04:00			`};`

recoverd: Factor out the SRVID handling code The code that handles IP reallocate requests can be reused. This also changes the result back to a SRVID caller to the PNN on success or a negative error code on failure. None of the callers currently look at the result so this is harmless... but it will be useful later. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit e4eae6e3291baa299a1d0f733ab11b138ee699a3) 2013-08-16 14:02:34 +04:00			`static void srvid_request_reply(struct ctdb_context *ctdb,`
ctdb-daemon: Rename struct srvid_request to ctdb_srvid_message Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 06:32:49 +03:00			`struct ctdb_srvid_message *request,`
recoverd: Factor out the SRVID handling code The code that handles IP reallocate requests can be reused. This also changes the result back to a SRVID caller to the PNN on success or a negative error code on failure. None of the callers currently look at the result so this is harmless... but it will be useful later. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit e4eae6e3291baa299a1d0f733ab11b138ee699a3) 2013-08-16 14:02:34 +04:00			`TDB_DATA result)`
			`{`
			`/* Someone that sent srvid==0 does not want a reply */`
			`if (request->srvid == 0) {`
			`talloc_free(request);`
			`return;`
			`}`

			`if (ctdb_client_send_message(ctdb, request->pnn, request->srvid,`
			`result) == 0) {`
			`DEBUG(DEBUG_INFO,("Sent SRVID reply to %u:%llu\n",`
			`(unsigned)request->pnn,`
			`(unsigned long long)request->srvid));`
			`} else {`
			`DEBUG(DEBUG_ERR,("Failed to send SRVID reply to %u:%llu\n",`
			`(unsigned)request->pnn,`
			`(unsigned long long)request->srvid));`
			`}`

			`talloc_free(request);`
			`}`

			`static void srvid_requests_reply(struct ctdb_context *ctdb,`
			`struct srvid_requests **requests,`
			`TDB_DATA result)`
			`{`
			`struct srvid_list *r;`

			`for (r = (*requests)->requests; r != NULL; r = r->next) {`
			`srvid_request_reply(ctdb, r->request, result);`
			`}`

			`/* Free the list structure... */`
			`TALLOC_FREE(*requests);`
			`}`

			`static void srvid_request_add(struct ctdb_context *ctdb,`
			`struct srvid_requests **requests,`
ctdb-daemon: Rename struct srvid_request to ctdb_srvid_message Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 06:32:49 +03:00			`struct ctdb_srvid_message *request)`
recoverd: Factor out the SRVID handling code The code that handles IP reallocate requests can be reused. This also changes the result back to a SRVID caller to the PNN on success or a negative error code on failure. None of the callers currently look at the result so this is harmless... but it will be useful later. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit e4eae6e3291baa299a1d0f733ab11b138ee699a3) 2013-08-16 14:02:34 +04:00			`{`
			`struct srvid_list *t;`
			`int32_t ret;`
			`TDB_DATA result;`

			`if (*requests == NULL) {`
			`*requests = talloc_zero(ctdb, struct srvid_requests);`
			`if (*requests == NULL) {`
			`goto nomem;`
			`}`
			`}`

			`t = talloc_zero(*requests, struct srvid_list);`
			`if (t == NULL) {`
			`/* If requests was just allocated above then free it */`
			`if ((*requests)->requests == NULL) {`
			`TALLOC_FREE(*requests);`
			`}`
			`goto nomem;`
			`}`

ctdb-daemon: Rename struct srvid_request to ctdb_srvid_message Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 06:32:49 +03:00			`t->request = (struct ctdb_srvid_message *)talloc_steal(t, request);`
recoverd: Factor out the SRVID handling code The code that handles IP reallocate requests can be reused. This also changes the result back to a SRVID caller to the PNN on success or a negative error code on failure. None of the callers currently look at the result so this is harmless... but it will be useful later. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit e4eae6e3291baa299a1d0f733ab11b138ee699a3) 2013-08-16 14:02:34 +04:00			`DLIST_ADD((*requests)->requests, t);`

			`return;`

			`nomem:`
			`/* Failed to add the request to the list. Send a fail. */`
			`DEBUG(DEBUG_ERR, (__location__`
			`" Out of memory, failed to queue SRVID request\n"));`
			`ret = -ENOMEM;`
			`result.dsize = sizeof(ret);`
			`result.dptr = (uint8_t *)&ret;`
			`srvid_request_reply(ctdb, request, result);`
			`}`
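The three SRVID helpers above implement a reply-later queue. `srvid_request_add()` parks a request, `srvid_requests_reply()` later answers every parked request with one result, a request with `srvid == 0` is freed without a reply, and an allocation failure is answered immediately with `-ENOMEM`. The following is a minimal sketch of the same pattern in standard C. It assumes `malloc`/`free` in place of talloc, a singly linked list in place of `DLIST_ADD`, and an injected `demo_send_fn` callback in place of `ctdb_client_send_message()`. All `demo_` names are hypothetical and are not part of ctdb.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ctdb_srvid_message: the node (pnn)
 * and message channel (srvid) that expect the reply. */
struct demo_srvid_msg {
	uint32_t pnn;
	uint64_t srvid;
};

/* Queue node; the real code uses a talloc'd doubly linked list. */
struct demo_srvid_list {
	struct demo_srvid_list *next;
	struct demo_srvid_msg *request;
};

/* Injected transport, standing in for ctdb_client_send_message(). */
typedef int (*demo_send_fn)(uint32_t pnn, uint64_t srvid, int32_t result);

/* Example transport: records how many replies went out and the last result. */
static int demo_sent_count;
static int32_t demo_last_result;

static int demo_record_send(uint32_t pnn, uint64_t srvid, int32_t result)
{
	(void)pnn;
	(void)srvid;
	demo_sent_count++;
	demo_last_result = result;
	return 0;
}

/* Reply to one request (unless srvid == 0) and release it. */
static void demo_reply_one(struct demo_srvid_msg *req, demo_send_fn send_fn,
			   int32_t result)
{
	if (req->srvid != 0) {
		(void)send_fn(req->pnn, req->srvid, result);
	}
	free(req);
}

/* Park a request; if the queue node cannot be allocated, fail it now. */
static void demo_srvid_add(struct demo_srvid_list **head,
			   struct demo_srvid_msg *req, demo_send_fn send_fn)
{
	struct demo_srvid_list *t;

	assert(head != NULL);
	t = malloc(sizeof(*t));
	if (t == NULL) {
		demo_reply_one(req, send_fn, -ENOMEM);
		return;
	}
	t->request = req;	/* queue node takes ownership */
	t->next = *head;	/* add at the head, as DLIST_ADD does */
	*head = t;
}

/* Answer every parked request with the same result and empty the queue. */
static void demo_srvid_reply_all(struct demo_srvid_list **head,
				 demo_send_fn send_fn, int32_t result)
{
	struct demo_srvid_list *r;

	assert(head != NULL);
	r = *head;
	while (r != NULL) {
		struct demo_srvid_list *next = r->next;

		demo_reply_one(r->request, send_fn, result);
		free(r);
		r = next;
	}
	*head = NULL;
}
```

Suppose requests from nodes 1, 2 and 3 are queued, where the request from node 2 carries `srvid == 0`. A single `demo_srvid_reply_all()` call then sends two replies and leaves the queue empty.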

ctdb-recoverd: Add a new abstraction ctdb_op_disable() This can be used to disable and re-enable an operation, and do all the relevant sanity checking. Most of this is from existing functions disable_takeover_runs_handler(), clear_takeover_runs_disable() and reenable_takeover_runs(). Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-02-08 12:50:38 +03:00			`/* An abstraction to allow an operation (takeover runs, recoveries,`
			`* ...) to be disabled for a given timeout */`
			`struct ctdb_op_state {`
			`struct tevent_timer *timer;`
			`bool in_progress;`
			`const char *name;`
			`};`

			`static struct ctdb_op_state *ctdb_op_init(TALLOC_CTX *mem_ctx, const char *name)`
			`{`
			`struct ctdb_op_state *state = talloc_zero(mem_ctx, struct ctdb_op_state);`

			`if (state != NULL) {`
			`state->in_progress = false;`
			`state->name = name;`
			`}`

			`return state;`
			`}`

			`static bool ctdb_op_is_disabled(struct ctdb_op_state *state)`
			`{`
			`return state->timer != NULL;`
			`}`

			`static bool ctdb_op_begin(struct ctdb_op_state *state)`
			`{`
			`if (ctdb_op_is_disabled(state)) {`
			`DEBUG(DEBUG_NOTICE,`
			`("Unable to begin - %s are disabled\n", state->name));`
			`return false;`
			`}`

			`state->in_progress = true;`
			`return true;`
			`}`

			`static bool ctdb_op_end(struct ctdb_op_state *state)`
			`{`
			`return state->in_progress = false;`
			`}`

			`static bool ctdb_op_is_in_progress(struct ctdb_op_state *state)`
			`{`
			`return state->in_progress;`
			`}`

			`static void ctdb_op_enable(struct ctdb_op_state *state)`
			`{`
			`TALLOC_FREE(state->timer);`
			`}`

ctdb-daemon: Stop using tevent compatibility definitions Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-26 08:50:09 +03:00			`static void ctdb_op_timeout_handler(struct tevent_context *ev,`
			`struct tevent_timer *te,`
ctdb-recoverd: Add a new abstraction ctdb_op_disable() This can be used to disable and re-enable an operation, and do all the relevant sanity checking. Most of this is from existing functions disable_takeover_runs_handler(), clear_takeover_runs_disable() and reenable_takeover_runs(). Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-02-08 12:50:38 +03:00			`struct timeval yt, void *p)`
			`{`
			`struct ctdb_op_state *state =`
			`talloc_get_type(p, struct ctdb_op_state);`

			`DEBUG(DEBUG_NOTICE,("Reenabling %s after timeout\n", state->name));`
			`ctdb_op_enable(state);`
			`}`

			`static int ctdb_op_disable(struct ctdb_op_state *state,`
			`struct tevent_context *ev,`
			`uint32_t timeout)`
			`{`
			`if (timeout == 0) {`
			`DEBUG(DEBUG_NOTICE,("Reenabling %s\n", state->name));`
			`ctdb_op_enable(state);`
			`return 0;`
			`}`

			`if (state->in_progress) {`
			`DEBUG(DEBUG_ERR,`
			`("Unable to disable %s - in progress\n", state->name));`
			`return -EAGAIN;`
			`}`

			`DEBUG(DEBUG_NOTICE,("Disabling %s for %u seconds\n",`
			`state->name, timeout));`

			`/* Clear any old timers */`
			`talloc_free(state->timer);`

			`/* Arrange for the timeout to occur */`
			`state->timer = tevent_add_timer(ev, state,`
			`timeval_current_ofs(timeout, 0),`
			`ctdb_op_timeout_handler, state);`
			`if (state->timer == NULL) {`
			`DEBUG(DEBUG_ERR,(__location__ " Unable to setup timer\n"));`
			`return -ENOMEM;`
			`}`

			`return 0;`
			`}`
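The `ctdb_op_state` helpers form a small state machine. `ctdb_op_disable()` with a non-zero timeout blocks `ctdb_op_begin()` until a tevent timer fires, a timeout of 0 re-enables immediately, and a second disable replaces the earlier timer. Disabling is refused with `-EAGAIN` while the operation is in progress. The sketch below reproduces those rules without tevent. It stores a deadline and re-enables lazily when the state is next queried. That polling is an assumption of this sketch: in the daemon, the timer callback `ctdb_op_timeout_handler()` does the re-enabling. Times are plain `time_t` seconds supplied by the caller, and the `demo_` names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

struct demo_op_state {
	bool disabled;		/* stands in for "timer != NULL" */
	time_t reenable_at;	/* when the timer would have fired */
	bool in_progress;
	const char *name;
};

static void demo_op_enable(struct demo_op_state *s)
{
	s->disabled = false;
}

/* Lazy replacement for the timer: re-enable once the deadline has passed. */
static bool demo_op_is_disabled(struct demo_op_state *s, time_t now)
{
	if (s->disabled && now >= s->reenable_at) {
		demo_op_enable(s);
	}
	return s->disabled;
}

static bool demo_op_begin(struct demo_op_state *s, time_t now)
{
	if (demo_op_is_disabled(s, now)) {
		return false;
	}
	s->in_progress = true;
	return true;
}

static void demo_op_end(struct demo_op_state *s)
{
	s->in_progress = false;
}

/* timeout == 0 re-enables; a running operation cannot be disabled;
 * a new timeout replaces any earlier one. */
static int demo_op_disable(struct demo_op_state *s, time_t now,
			   uint32_t timeout)
{
	assert(s != NULL);
	if (timeout == 0) {
		demo_op_enable(s);
		return 0;
	}
	if (s->in_progress) {
		return -EAGAIN;
	}
	s->disabled = true;
	s->reenable_at = now + (time_t)timeout;
	return 0;
}
```

Passing `now` explicitly keeps the sketch deterministic. A real caller would pass `time(NULL)`.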

new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`struct ctdb_banning_state {`
			`uint32_t count;`
			`struct timeval last_reported_time;`
			`};`

implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`/*`
			`private state of recovery daemon`
			`*/`
			`struct ctdb_recoverd {`
			`struct ctdb_context *ctdb;`
change recmaster from being a local variable in monitor_cluster() to be a member of the ctdb_recoverd structure (This used to be ctdb commit b7f955338f50c92374b4f559268fb3a1a516aefa) 2008-03-02 23:53:46 +03:00			`uint32_t recmaster;`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`uint32_t last_culprit_node;`
ctdb-daemon: Rename struct ctdb_node_map to ctdb_node_map_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:22:48 +03:00			`struct ctdb_node_map_old *nodemap;`
use a priority time for the election data, not just the vnn (This used to be ctdb commit a691f9c5cd77194005f0d98483da94b07a48d57d) 2007-06-07 12:37:27 +04:00			`struct timeval priority_time;`
prevent recursion in the calling of ctdb_takeover_run (This used to be ctdb commit 0fbdeb7c91b965d9bc5ecc7b24e31070378d8f1d) 2007-09-13 08:08:18 +04:00			`bool need_takeover_run;`
- merge from ronnie - add a flag to check that recovery completed correctly. If not, re-trigger it in monitoring (This used to be ctdb commit d5ed941d9bab4af30d8b5f9b77bdf43d9218d69b) 2007-09-14 03:49:12 +04:00			`bool need_recovery;`
- catch ESTALE in the recovery lock by trying a read() - priortise nodes that are unbanned and healthy in the election (This used to be ctdb commit 929feb475dfdf7283f0e99b50b179e1c91d3a39f) 2007-10-05 07:28:21 +04:00			`uint32_t node_flags;`
ctdb-daemon: Stop using tevent compatibility definitions Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-26 08:50:09 +03:00			`struct tevent_timer *send_election_te;`
			`struct tevent_timer *election_timeout;`
recoverd: Factor out the SRVID handling code The code that handles IP reallocate requests can be reused. This also changes the result back to a SRVID caller to the PNN on success or a negative error code on failure. None of the callers currently look at the result so this is harmless... but it will be useful later. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit e4eae6e3291baa299a1d0f733ab11b138ee699a3) 2013-08-16 14:02:34 +04:00			`struct srvid_requests *reallocate_requests;`
ctdb-recoverd: Reimplement disabling takeover runs using ctdb_op_disable() Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-02-08 12:52:12 +03:00			`struct ctdb_op_state *takeover_run;`
ctdb-recoverd: Reimplement ReRecoveryTimeout using ctdb_op_disable() Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-02-06 06:47:33 +03:00			`struct ctdb_op_state *recovery;`
ctdb-daemon: Rename struct ctdb_control_get_ifaces to ctdb_iface_list_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-28 11:43:48 +03:00			`struct ctdb_iface_list_old *ifaces;`
recoverd: Fix the implementation of CTDB_SRVID_REBALANCE_NODE The current implementation has a few flaws: * A takeover run is called unconditionally when the timer goes even if the recovery master role has moved. This means a node other than the recovery master can incorrectly do a takeover run. * The rebalancing target nodes are cleared in the setup for a takeover run, regardless of whether the takeover run succeeds. * The timer to force a rebalance isn't cleared if another takeover run occurs before the deadline. Any forced rebalancing will happen in the first takeover run and when the timer expires some time later then an unnecessary takeover run will occur. * If the recovery master role moves then the rebalancing data will stay on the original node and affect the next takeover run to occur if the recovery master role should come back to the original node. Instead, store an array of rebalance target nodes in the recovery master context. This is passed as an extra argument to ctdb_takeover_run() each time it is called and is cleared when a takeover run succeeds. The timer hangs off the array of rebalance target nodes, which is cleared if the node isn't the recovery master. This means that it is possible to lose rebalance data if the recovery master role moves. However, that's a difficult problem to solve. The best way of approaching it is probably to try to stop the recovery master role from jumping around unnecesarily when inactive nodes join the cluster. The long term solution is to avoid this nonsense completely. The IP allocation algorithm needs to cache state between runs so that it knows which nodes have just become healthy. This also needs recovery master stability. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit c51c1efe5fc7fa668597f2acd435dee16e410fc9) 2013-09-04 08:30:04 +04:00			`uint32_t *force_rebalance_nodes;`
ctdb-recoverd: Use capabilities API Simplify update_capabilities() using the capabilities API and store the capabilities in new field rec->caps rather than scattered around ctdb->nodes. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2014-07-31 09:26:03 +04:00			`struct ctdb_node_capabilities *caps;`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`};`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00
make recovery daemon values tunable (This used to be ctdb commit ec29dbf2f5110428df8b97801443ba7addf61353) 2007-06-04 14:22:44 +04:00			`#define CONTROL_TIMEOUT() timeval_current_ofs(ctdb->tunable.recover_timeout, 0)`
added health monitoring logic to ctdb, so a node loses its public IP address if one of the sybsystem event scripts reports a problem (This used to be ctdb commit c7a089256d86cec21097453bce5acbccee87413f) 2007-06-06 04:25:46 +04:00			`#define MONITOR_TIMEOUT() timeval_current_ofs(ctdb->tunable.recover_interval, 0)`
raise the control timeout in recovery (This used to be ctdb commit 43424ff66daf28c202c12982f20a9f662b6fb125) 2007-05-24 07:49:27 +04:00
ctdb-daemon: Stop using tevent compatibility definitions Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-26 08:50:09 +03:00			`static void ctdb_restart_recd(struct tevent_context *ev,`
			`struct tevent_timer *te, struct timeval t,`
			`void *private_data);`
convert much of the recovery logic to be async and parallel across all nodes (This used to be ctdb commit 8b72a02bf1045d8befb342a4111ca1316889262e) 2008-01-05 01:35:43 +03:00
added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00			`/*`
			`ban a node for a period of time`
			`*/`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`static void ctdb_ban_node(struct ctdb_recoverd *rec, uint32_t pnn, uint32_t ban_time)`
added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00			`{`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`int ret;`
added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00			`struct ctdb_context *ctdb = rec->ctdb;`
ctdb-daemon: Rename struct ctdb_ban_time to ctdb_ban_state Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-28 10:18:33 +03:00			`struct ctdb_ban_state bantime;`

change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`if (!ctdb_validate_pnn(ctdb, pnn)) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,("Bad pnn %u in ctdb_ban_node\n", pnn));`
handle CTDB_CURRENT_NODE in ban commands (This used to be ctdb commit fefb53f1d22c5458a1e107f8352818aee87983de) 2007-06-07 10:48:31 +04:00			`return;`
			`}`

recoverd: Print banning message only after verifying pnn Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 4be8dff3a4451192f838497b4747273685959bed) 2013-06-24 08:18:58 +04:00			`DEBUG(DEBUG_NOTICE,("Banning node %u for %u seconds\n", pnn, ban_time));`

new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`bantime.pnn = pnn;`
			`bantime.time = ban_time;`
add log output for when ctdb_ban_node() and ctdb_unban_node() are called when these functions are called to ban or unban a node make sure we update the CTDB_NODE_BANNED flag in rec->node_flags since this field and flag are checked during the election process (This used to be ctdb commit 740c632ae96a2d34327d1b575780aaf079d93f4f) 2007-11-23 04:36:14 +03:00
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`ret = ctdb_ctrl_set_ban(ctdb, CONTROL_TIMEOUT(), pnn, &bantime);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR,(__location__ " Failed to ban node %u\n", pnn));`
rework banning/unbanning nodes ctdb_recoverd.c Always handle banning/unbanning locally on the node that is being banned/unbanned instead of on the recovery master. This means that if a ban request comes in to the recovery master for a remote node, we pass the request on to the remote node instead of setting up the ban and ban timeouts locally. ctdb.c send ban/unban requests to the node being banned/unbanned instead of to the recmaster (This used to be ctdb commit 880dd9f5fd0b91e450da93e195cc5c62cb1dcd6e) 2007-12-03 07:45:53 +03:00			`return;`
use a priority time for the election data, not just the vnn (This used to be ctdb commit a691f9c5cd77194005f0d98483da94b07a48d57d) 2007-06-07 12:37:27 +04:00			`}`

added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00			`}`

add async versions of the freeze node control and freeze all nodes in parallell (This used to be ctdb commit f34e89f54d9f4380e76eb1b5b2385a4d8500b505) 2007-08-27 04:31:22 +04:00			`enum monitor_result { MONITOR_OK, MONITOR_RECOVERY_NEEDED, MONITOR_ELECTION_NEEDED, MONITOR_FAILED};`


add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`/*`
			`remember the trouble maker`
			`*/`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`static void ctdb_set_culprit_count(struct ctdb_recoverd *rec, uint32_t culprit, uint32_t count)`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`{`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`struct ctdb_context *ctdb = talloc_get_type(rec->ctdb, struct ctdb_context);`
			`struct ctdb_banning_state *ban_state;`

			`if (culprit >= ctdb->num_nodes) {`
			`DEBUG(DEBUG_ERR,("Trying to set culprit %u but num_nodes is %u\n", culprit, ctdb->num_nodes));`
			`return;`
			`}`

recoverd: Do not set banning credits on a node if current node is inactive If the current node is banned or stopped, then it should not assign banning credits to other nodes since the current node will not have up-to-date flags of other nodes. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 38304f88e0c634e97d4687c25adef975f71537b8) 2013-06-28 08:10:47 +04:00			`/* If we are banned or stopped, do not set other nodes as culprits */`
			`if (rec->node_flags & NODE_FLAGS_INACTIVE) {`
			`DEBUG(DEBUG_NOTICE, ("This node is INACTIVE, cannot set culprit node %u\n", culprit));`
			`return;`
			`}`

new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`if (ctdb->nodes[culprit]->ban_state == NULL) {`
			`ctdb->nodes[culprit]->ban_state = talloc_zero(ctdb->nodes[culprit], struct ctdb_banning_state);`
			`CTDB_NO_MEMORY_VOID(ctdb, ctdb->nodes[culprit]->ban_state);`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00
			`}`
			`ban_state = ctdb->nodes[culprit]->ban_state;`
			`if (timeval_elapsed(&ban_state->last_reported_time) > ctdb->tunable.recovery_grace_period) {`
			`/* this was the first time in a long while this node`
			`misbehaved so we will forgive any old transgressions.`
			`*/`
			`ban_state->count = 0;`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`}`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00
			`ban_state->count += count;`
			`ban_state->last_reported_time = timeval_current();`
			`rec->last_culprit_node = culprit;`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`}`
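The accounting in ctdb_set_culprit_count() can be modeled on its own. A node that has not misbehaved for longer than the recovery grace period has its old credits forgiven, and the new credits are then added. The sketch below is a minimal, self-contained model and not the ctdb implementation. The `struct ban_state` type, the `set_culprit_count()` helper, and the use of plain seconds (a `double`) in place of `struct timeval` are all assumptions for illustration. It bounds-checks with `>=`, because valid PNNs run from 0 to `num_nodes - 1`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-node banning state, loosely modeled on struct ctdb_banning_state. */
struct ban_state {
	uint32_t count;              /* accumulated banning credits */
	double last_reported_time;   /* seconds; stands in for struct timeval */
};

/*
 * Add `count` credits to node `culprit`. If the node's last offence is older
 * than `grace_period` seconds, its old credits are forgiven first.
 * Returns 0 on success, -1 if the PNN is out of range.
 */
static int set_culprit_count(struct ban_state *states, uint32_t num_nodes,
			     uint32_t culprit, uint32_t count,
			     double now, double grace_period)
{
	struct ban_state *bs;

	assert(states != NULL);
	if (culprit >= num_nodes) {
		return -1;	/* valid PNNs are 0 .. num_nodes-1 */
	}
	bs = &states[culprit];

	if (now - bs->last_reported_time > grace_period) {
		bs->count = 0;	/* first offence in a long while: forgive */
	}
	bs->count += count;
	bs->last_reported_time = now;
	return 0;
}
```

A zero-initialized state behaves like the `talloc_zero()`ed one above. Its "last reported" time is far in the past, so the first offence always starts from a clean count.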

If we can not pull a database from a node during recovery, mark this node as a "culprit" so that it will eventually become banned. (This used to be ctdb commit 69dc3bf60b86d8df6dc5c7c6ebf303e847fb2ba9) 2009-04-24 07:58:32 +04:00			`/*`
			`remember the trouble maker`
			`*/`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`static void ctdb_set_culprit(struct ctdb_recoverd *rec, uint32_t culprit)`
If we can not pull a database from a node during recovery, mark this node as a "culprit" so that it will eventually become banned. (This used to be ctdb commit 69dc3bf60b86d8df6dc5c7c6ebf303e847fb2ba9) 2009-04-24 07:58:32 +04:00			`{`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`ctdb_set_culprit_count(rec, culprit, 1);`
If we can not pull a database from a node during recovery, mark this node as a "culprit" so that it will eventually become banned. (This used to be ctdb commit 69dc3bf60b86d8df6dc5c7c6ebf303e847fb2ba9) 2009-04-24 07:58:32 +04:00			`}`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00
recoverd: Track failure of "recovered" event, banning culprits Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 9550c497e6d6ef5ee44826c4bd9ed5ad65174263) 2012-09-24 08:32:04 +04:00			`/* this callback is called for every node that failed to execute the`
			`recovered event`
			`*/`
			`static void recovered_fail_callback(struct ctdb_context *ctdb, uint32_t node_pnn, int32_t res, TDB_DATA outdata, void *callback_data)`
			`{`
			`struct ctdb_recoverd *rec = talloc_get_type(callback_data, struct ctdb_recoverd);`

			`DEBUG(DEBUG_ERR, (__location__ " Node %u failed the recovered event. Setting it as recovery fail culprit\n", node_pnn));`

			`ctdb_set_culprit(rec, node_pnn);`
			`}`
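Failure callbacks like recovered_fail_callback() are passed to the async control helper along with an opaque `callback_data` pointer. The helper invokes the callback once for every node whose control fails, and the callback casts the pointer back to the recovery daemon context. The sketch below models that contract with hypothetical names (`run_async_control()`, `struct recoverd`, `fail_odd_nodes()`). It runs the per-node operation sequentially. The real ctdb_client_async_control() sends the controls to all nodes and then waits for the replies, so this is only an approximation of its callback behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified fail-callback signature (the real one also carries ctdb and outdata). */
typedef void (*fail_cb_t)(uint32_t node_pnn, int32_t res, void *callback_data);

/* Hypothetical per-node operation: returns 0 on success, non-zero on failure. */
typedef int32_t (*node_op_t)(uint32_t node_pnn);

/*
 * Run `op` on every node in `nodes`, calling `fail_cb` once per failed node.
 * Returns 0 only if every node succeeded, -1 otherwise.
 */
static int run_async_control(const uint32_t *nodes, size_t num_nodes,
			     node_op_t op, fail_cb_t fail_cb, void *callback_data)
{
	int ret = 0;
	size_t i;

	assert(num_nodes == 0 || nodes != NULL);
	for (i = 0; i < num_nodes; i++) {
		int32_t res = op(nodes[i]);
		if (res != 0) {
			if (fail_cb != NULL) {
				fail_cb(nodes[i], res, callback_data);
			}
			ret = -1;
		}
	}
	return ret;
}

/* Stand-in for struct ctdb_recoverd: remembers culprits reported to it. */
struct recoverd {
	uint32_t last_culprit_node;
	unsigned int num_failures;
};

/* Fail callback: recover the typed context from the opaque pointer. */
static void record_culprit(uint32_t node_pnn, int32_t res, void *callback_data)
{
	struct recoverd *rec = callback_data;

	(void)res;
	rec->last_culprit_node = node_pnn;
	rec->num_failures++;
}

/* Demo operation: odd-numbered nodes fail. */
static int32_t fail_odd_nodes(uint32_t node_pnn)
{
	return (node_pnn % 2 != 0) ? -1 : 0;
}
```

The helper's overall return value still reports failure. The callback is only how the caller learns *which* nodes to blame.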

			`/*`
			`run the "recovered" eventscript on all nodes`
			`*/`
ctdb-daemon: Rename struct ctdb_node_map to ctdb_node_map_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:22:48 +03:00			`static int run_recovered_eventscript(struct ctdb_recoverd *rec, struct ctdb_node_map_old *nodemap, const char *caller)`
recoverd: Track failure of "recovered" event, banning culprits Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 9550c497e6d6ef5ee44826c4bd9ed5ad65174263) 2012-09-24 08:32:04 +04:00			`{`
			`TALLOC_CTX *tmp_ctx;`
			`uint32_t *nodes;`
			`struct ctdb_context *ctdb = rec->ctdb;`

			`tmp_ctx = talloc_new(ctdb);`
			`CTDB_NO_MEMORY(ctdb, tmp_ctx);`

			`nodes = list_of_active_nodes(ctdb, nodemap, tmp_ctx, true);`
			`if (ctdb_client_async_control(ctdb, CTDB_CONTROL_END_RECOVERY,`
			`nodes, 0,`
			`CONTROL_TIMEOUT(), false, tdb_null,`
			`NULL, recovered_fail_callback,`
			`rec) != 0) {`
			`DEBUG(DEBUG_ERR, (__location__ " Unable to run the 'recovered' event when called from %s\n", caller));`

			`talloc_free(tmp_ctx);`
			`return -1;`
			`}`

			`talloc_free(tmp_ctx);`
			`return 0;`
			`}`
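run_recovered_eventscript(), like the other helpers in this section, sends its control only to the nodes returned by `list_of_active_nodes(ctdb, nodemap, tmp_ctx, true)`. That helper is defined elsewhere. The sketch below is a guess at the kind of filtering it performs and is not its actual code. Two assumptions are made: inactive nodes (the source treats banned and stopped nodes as INACTIVE) are skipped, and the final boolean means "include the local node". The flag values and the names `active_node_list()` and `struct node_entry` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits for illustration only; ctdb defines its own
 * NODE_FLAGS_* values, including a combined NODE_FLAGS_INACTIVE mask. */
#define EX_FLAG_DISCONNECTED 0x01u
#define EX_FLAG_BANNED       0x02u
#define EX_FLAG_STOPPED      0x04u
#define EX_FLAGS_INACTIVE (EX_FLAG_DISCONNECTED | EX_FLAG_BANNED | EX_FLAG_STOPPED)

struct node_entry {
	uint32_t pnn;
	uint32_t flags;
};

/*
 * Copy the PNNs of all nodes that are not inactive into `out` (which must
 * have room for `num` entries), skipping `self_pnn` unless include_self is
 * true. Returns the number of PNNs written.
 */
static size_t active_node_list(const struct node_entry *nodes, size_t num,
			       uint32_t self_pnn, bool include_self,
			       uint32_t *out)
{
	size_t i, n = 0;

	assert(num == 0 || (nodes != NULL && out != NULL));
	for (i = 0; i < num; i++) {
		if (nodes[i].flags & EX_FLAGS_INACTIVE) {
			continue;	/* banned, stopped or disconnected */
		}
		if (!include_self && nodes[i].pnn == self_pnn) {
			continue;
		}
		out[n++] = nodes[i].pnn;
	}
	return n;
}
```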

add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`/* this callback is called for every node that failed to execute the`
			`start recovery event`
			`*/`
			`static void startrecovery_fail_callback(struct ctdb_context *ctdb, uint32_t node_pnn, int32_t res, TDB_DATA outdata, void *callback_data)`
			`{`
			`struct ctdb_recoverd *rec = talloc_get_type(callback_data, struct ctdb_recoverd);`

			`DEBUG(DEBUG_ERR, (__location__ " Node %u failed the startrecovery event. Setting it as recovery fail culprit\n", node_pnn));`

			`ctdb_set_culprit(rec, node_pnn);`
			`}`

merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`/*`
			`run the "startrecovery" eventscript on all nodes`
			`*/`
ctdb-daemon: Rename struct ctdb_node_map to ctdb_node_map_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:22:48 +03:00			`static int run_startrecovery_eventscript(struct ctdb_recoverd *rec, struct ctdb_node_map_old *nodemap)`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`{`
			`TALLOC_CTX *tmp_ctx;`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`uint32_t *nodes;`
			`struct ctdb_context *ctdb = rec->ctdb;`
added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`tmp_ctx = talloc_new(ctdb);`
			`CTDB_NO_MEMORY(ctdb, tmp_ctx);`

add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`nodes = list_of_active_nodes(ctdb, nodemap, tmp_ctx, true);`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`if (ctdb_client_async_control(ctdb, CTDB_CONTROL_START_RECOVERY,`
initial attempt at freezing databases in priority order (This used to be ctdb commit e8d692590da1070c87a4144031e3306d190ebed2) 2009-10-12 05:08:39 +04:00			`nodes, 0,`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`CONTROL_TIMEOUT(), false, tdb_null,`
			`NULL,`
			`startrecovery_fail_callback,`
			`rec) != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to run the 'startrecovery' event. Recovery failed.\n"));`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`talloc_free(tmp_ctx);`
			`return -1;`
			`}`

			`talloc_free(tmp_ctx);`
			`return 0;`
			`}`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00
Expand the client async framework so that it can take a callback function. This allows us to use the async framework also for controls that return outdata. Add a "capabilities" field to the ctdb_node structure. This field is only initialized and kept valid inside the recovery daemon context and not inside the main ctdb daemon. change the GET_CAPABILITIES control to return the capabilities in outdata instead of in the res return variable. When performing a recovery inside the recovery daemon, read the capabilities from all connected nodes and update the ctdb->nodes list of nodes. when building the new vnnmap after the database rebuild in recovery, do not include any nodes which lack the LMASTER capability in the new vnnmap. Unless there are no available connected node that sports the LMASTER capability in which case we let the local node (recmaster) take on the lmaster role temporarily (i.e. become a member of the vnnmap list) (This used to be ctdb commit 0f1883c69c689b28b0c04148774840b2c4081df6) 2008-05-06 09:42:59 +04:00			`/*`
			`update the node capabilities for all connected nodes`
			`*/`
ctdb-recoverd: Use capabilities API Simplify update_capabilities() using the capabilities API and store the capabilities in new field rec->caps rather than scattered around ctdb->nodes. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2014-07-31 09:26:03 +04:00			`static int update_capabilities(struct ctdb_recoverd *rec,`
ctdb-daemon: Rename struct ctdb_node_map to ctdb_node_map_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:22:48 +03:00			`struct ctdb_node_map_old *nodemap)`
Expand the client async framework so that it can take a callback function. This allows us to use the async framework also for controls that return outdata. Add a "capabilities" field to the ctdb_node structure. This field is only initialized and kept valid inside the recovery daemon context and not inside the main ctdb daemon. change the GET_CAPABILITIES control to return the capabilities in outdata instead of in the res return variable. When performing a recovery inside the recovery daemon, read the capabilities from all connected nodes and update the ctdb->nodes list of nodes. when building the new vnnmap after the database rebuild in recovery, do not include any nodes which lack the LMASTER capability in the new vnnmap. Unless there are no available connected node that sports the LMASTER capability in which case we let the local node (recmaster) take on the lmaster role temporarily (i.e. become a member of the vnnmap list) (This used to be ctdb commit 0f1883c69c689b28b0c04148774840b2c4081df6) 2008-05-06 09:42:59 +04:00			`{`
ctdb-recoverd: Use capabilities API Simplify update_capabilities() using the capabilities API and store the capabilities in new field rec->caps rather than scattered around ctdb->nodes. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2014-07-31 09:26:03 +04:00			`uint32_t *capp;`
Expand the client async framework so that it can take a callback function. This allows us to use the async framework also for controls that return outdata. Add a "capabilities" field to the ctdb_node structure. This field is only initialized and kept valid inside the recovery daemon context and not inside the main ctdb daemon. change the GET_CAPABILITIES control to return the capabilities in outdata instead of in the res return variable. When performing a recovery inside the recovery daemon, read the capabilities from all connected nodes and update the ctdb->nodes list of nodes. when building the new vnnmap after the database rebuild in recovery, do not include any nodes which lack the LMASTER capability in the new vnnmap. Unless there are no available connected node that sports the LMASTER capability in which case we let the local node (recmaster) take on the lmaster role temporarily (i.e. become a member of the vnnmap list) (This used to be ctdb commit 0f1883c69c689b28b0c04148774840b2c4081df6) 2008-05-06 09:42:59 +04:00			`TALLOC_CTX *tmp_ctx;`
ctdb-recoverd: Use capabilities API Simplify update_capabilities() using the capabilities API and store the capabilities in new field rec->caps rather than scattered around ctdb->nodes. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2014-07-31 09:26:03 +04:00			`struct ctdb_node_capabilities *caps;`
			`struct ctdb_context *ctdb = rec->ctdb;`
Expand the client async framework so that it can take a callback function. This allows us to use the async framework also for controls that return outdata. Add a "capabilities" field to the ctdb_node structure. This field is only initialized and kept valid inside the recovery daemon context and not inside the main ctdb daemon. change the GET_CAPABILITIES control to return the capabilities in outdata instead of in the res return variable. When performing a recovery inside the recovery daemon, read the capabilities from all connected nodes and update the ctdb->nodes list of nodes. when building the new vnnmap after the database rebuild in recovery, do not include any nodes which lack the LMASTER capability in the new vnnmap. Unless there are no available connected node that sports the LMASTER capability in which case we let the local node (recmaster) take on the lmaster role temporarily (i.e. become a member of the vnnmap list) (This used to be ctdb commit 0f1883c69c689b28b0c04148774840b2c4081df6) 2008-05-06 09:42:59 +04:00
ctdb-recoverd: Use capabilities API Simplify update_capabilities() using the capabilities API and store the capabilities in new field rec->caps rather than scattered around ctdb->nodes. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2014-07-31 09:26:03 +04:00			`tmp_ctx = talloc_new(rec);`
Expand the client async framework so that it can take a callback function. This allows us to use the async framework also for controls that return outdata. Add a "capabilities" field to the ctdb_node structure. This field is only initialized and kept valid inside the recovery daemon context and not inside the main ctdb daemon. change the GET_CAPABILITIES control to return the capabilities in outdata instead of in the res return variable. When performing a recovery inside the recovery daemon, read the capabilities from all connected nodes and update the ctdb->nodes list of nodes. when building the new vnnmap after the database rebuild in recovery, do not include any nodes which lack the LMASTER capability in the new vnnmap. Unless there are no available connected node that sports the LMASTER capability in which case we let the local node (recmaster) take on the lmaster role temporarily (i.e. become a member of the vnnmap list) (This used to be ctdb commit 0f1883c69c689b28b0c04148774840b2c4081df6) 2008-05-06 09:42:59 +04:00			`CTDB_NO_MEMORY(ctdb, tmp_ctx);`

ctdb-recoverd: Use capabilities API Simplify update_capabilities() using the capabilities API and store the capabilities in new field rec->caps rather than scattered around ctdb->nodes. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2014-07-31 09:26:03 +04:00			`caps = ctdb_get_capabilities(ctdb, tmp_ctx,`
			`CONTROL_TIMEOUT(), nodemap);`

			`if (caps == NULL) {`
			`DEBUG(DEBUG_ERR,`
			`(__location__ " Failed to get node capabilities\n"));`
Expand the client async framework so that it can take a callback function. This allows us to use the async framework also for controls that return outdata. Add a "capabilities" field to the ctdb_node structure. This field is only initialized and kept valid inside the recovery daemon context and not inside the main ctdb daemon. change the GET_CAPABILITIES control to return the capabilities in outdata instead of in the res return variable. When performing a recovery inside the recovery daemon, read the capabilities from all connected nodes and update the ctdb->nodes list of nodes. when building the new vnnmap after the database rebuild in recovery, do not include any nodes which lack the LMASTER capability in the new vnnmap. Unless there are no available connected node that sports the LMASTER capability in which case we let the local node (recmaster) take on the lmaster role temporarily (i.e. become a member of the vnnmap list) (This used to be ctdb commit 0f1883c69c689b28b0c04148774840b2c4081df6) 2008-05-06 09:42:59 +04:00			`talloc_free(tmp_ctx);`
			`return -1;`
			`}`

ctdb-recoverd: Use capabilities API Simplify update_capabilities() using the capabilities API and store the capabilities in new field rec->caps rather than scattered around ctdb->nodes. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2014-07-31 09:26:03 +04:00			`capp = ctdb_get_node_capabilities(caps, ctdb_get_pnn(ctdb));`
			`if (capp == NULL) {`
			`DEBUG(DEBUG_ERR,`
			`(__location__`
			`" Capabilities don't include current node.\n"));`
			`talloc_free(tmp_ctx);`
			`return -1;`
			`}`
			`ctdb->capabilities = *capp;`

			`TALLOC_FREE(rec->caps);`
			`rec->caps = talloc_steal(rec, caps);`

Expand the client async framework so that it can take a callback function. This allows us to use the async framework also for controls that return outdata. Add a "capabilities" field to the ctdb_node structure. This field is only initialized and kept valid inside the recovery daemon context and not inside the main ctdb daemon. change the GET_CAPABILITIES control to return the capabilities in outdata instead of in the res return variable. When performing a recovery inside the recovery daemon, read the capabilities from all connected nodes and update the ctdb->nodes list of nodes. when building the new vnnmap after the database rebuild in recovery, do not include any nodes which lack the LMASTER capability in the new vnnmap. Unless there are no available connected node that sports the LMASTER capability in which case we let the local node (recmaster) take on the lmaster role temporarily (i.e. become a member of the vnnmap list) (This used to be ctdb commit 0f1883c69c689b28b0c04148774840b2c4081df6) 2008-05-06 09:42:59 +04:00			`talloc_free(tmp_ctx);`
			`return 0;`
			`}`
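update_capabilities() fetches capabilities for the whole node map. It refuses to continue if the local node's entry is missing, and otherwise caches the set in `rec->caps`, where later steps can check per-node bits such as LMASTER when building the vnnmap. The sketch below models a per-PNN lookup that returns NULL for nodes that did not answer. The struct layout, capability bit values, and function names are hypothetical and are not the ctdb capabilities API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bits; ctdb defines its own CTDB_CAP_* values. */
#define EX_CAP_RECMASTER 0x1u
#define EX_CAP_LMASTER   0x2u

/* One entry per PNN, loosely modeled on struct ctdb_node_capabilities. */
struct node_caps {
	bool retrieved;          /* did the node answer the capabilities query? */
	uint32_t capabilities;
};

/* Return a pointer to pnn's capabilities, or NULL if unknown or out of range. */
static uint32_t *node_caps_lookup(struct node_caps *caps, size_t num, uint32_t pnn)
{
	if (caps == NULL || pnn >= num || !caps[pnn].retrieved) {
		return NULL;
	}
	return &caps[pnn].capabilities;
}

/* True only if the node's capabilities are known and include `cap`. */
static bool node_has_cap(struct node_caps *caps, size_t num, uint32_t pnn, uint32_t cap)
{
	const uint32_t *capp = node_caps_lookup(caps, num, pnn);

	return capp != NULL && (*capp & cap) != 0;
}
```

Returning a pointer rather than a value lets the caller tell "no capabilities" (0) apart from "capabilities unknown" (NULL), which is the distinction update_capabilities() checks for the local node.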

if a node fails to become frozen during recovery, mark it up with as a culprit so it will soon get banned (This used to be ctdb commit f72d33ac73ebb1af802bacdfb30279df3cd8b8f9) 2009-10-08 09:45:25 +04:00			`static void set_recmode_fail_callback(struct ctdb_context *ctdb, uint32_t node_pnn, int32_t res, TDB_DATA outdata, void *callback_data)`
			`{`
			`struct ctdb_recoverd *rec = talloc_get_type(callback_data, struct ctdb_recoverd);`

			`DEBUG(DEBUG_ERR,("Failed to freeze node %u during recovery. Set it as ban culprit for %u credits\n", node_pnn, rec->nodemap->num));`
			`ctdb_set_culprit_count(rec, node_pnn, rec->nodemap->num);`
			`}`

add a new control for explicitely cancelling recovery transactions, i.e. the transactions we start across all tdb databased during the recovery. this allows us to properly clean up and delete these tdb transactions on a recovery failure. (This used to be ctdb commit b2ce8b900a7d00944c84e0574fea5b371064a06d) 2009-10-12 09:48:05 +04:00			`static void transaction_start_fail_callback(struct ctdb_context *ctdb, uint32_t node_pnn, int32_t res, TDB_DATA outdata, void *callback_data)`
			`{`
			`struct ctdb_recoverd *rec = talloc_get_type(callback_data, struct ctdb_recoverd);`

			`DEBUG(DEBUG_ERR,("Failed to start recovery transaction on node %u. Set it as ban culprit for %u credits\n", node_pnn, rec->nodemap->num));`
			`ctdb_set_culprit_count(rec, node_pnn, rec->nodemap->num);`
			`}`
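The two callbacks above charge a failing node `rec->nodemap->num` credits at once, while eventscript failures go through ctdb_set_culprit() and charge a single credit. The banning threshold itself is not shown in this section. The arithmetic sketch below *assumes* a hypothetical threshold of `2 * num_nodes` credits and counts how many failures, all within the grace period, it would take to reach it.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of failures (within one grace period) needed to reach a
 * hypothetical ban threshold of 2 * num_nodes credits, when each
 * failure costs `credits_per_failure` credits.
 */
static uint32_t failures_until_ban(uint32_t num_nodes, uint32_t credits_per_failure)
{
	uint32_t threshold = 2 * num_nodes;	/* assumed threshold */

	assert(credits_per_failure > 0);
	/* ceiling division: smallest k with k * credits_per_failure >= threshold */
	return (threshold + credits_per_failure - 1) / credits_per_failure;
}
```

Under that assumption, on a 4-node cluster a node would need 8 eventscript failures but only 2 freeze or transaction-start failures. The larger charge makes nodes that block database recovery outright get banned quickly.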

implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`/*`
			`change recovery mode on all nodes`
			`*/`
ctdb-recoverd: Do not freeze databases for election If election occurs during SMB activity, then trying to freeze all the databases can cause samba/ctdb deadlock which parallel database recovery is trying to avoid. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-06 03:52:06 +03:00			`static int set_recovery_mode(struct ctdb_context *ctdb,`
			`struct ctdb_recoverd *rec,`
ctdb-daemon: Rename struct ctdb_node_map to ctdb_node_map_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:22:48 +03:00			`struct ctdb_node_map_old *nodemap,`
ctdb-recoverd: Do not freeze databases for election If election occurs during SMB activity, then trying to freeze all the databases can cause samba/ctdb deadlock which parallel database recovery is trying to avoid. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-06 03:52:06 +03:00			`uint32_t rec_mode, bool freeze)`
break out the setting/clearing of recovery mode into a dedicated helper function (This used to be ctdb commit dba4e4f8aa4f2fde1e9f8d93bdf3a33f7de8ce18) 2007-05-06 03:53:12 +04:00			`{`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`TDB_DATA data;`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`uint32_t *nodes;`
			`TALLOC_CTX *tmp_ctx;`

			`tmp_ctx = talloc_new(ctdb);`
			`CTDB_NO_MEMORY(ctdb, tmp_ctx);`

add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`nodes = list_of_active_nodes(ctdb, nodemap, tmp_ctx, true);`
ctdb-recoverd: Set recovery mode before freezing databases Setting recovery mode to active is the only correct way to inform recovery daemon to run database recovery. Only freezing databases without setting recovery mode should not trigger database recovery, as this mechanism is used in tool to implement wipedb/restoredb commands. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2014-05-06 08:24:52 +04:00
			`data.dsize = sizeof(uint32_t);`
			`data.dptr = (unsigned char *)&rec_mode;`

			`if (ctdb_client_async_control(ctdb, CTDB_CONTROL_SET_RECMODE,`
			`nodes, 0,`
			`CONTROL_TIMEOUT(),`
			`false, data,`
			`NULL, NULL,`
			`NULL) != 0) {`
			`DEBUG(DEBUG_ERR, (__location__ " Unable to set recovery mode. Recovery failed.\n"));`
			`talloc_free(tmp_ctx);`
			`return -1;`
			`}`

			`/* freeze all nodes */`
ctdb-recoverd: Do not freeze databases for election If election occurs during SMB activity, then trying to freeze all the databases can cause samba/ctdb deadlock which parallel database recovery is trying to avoid. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-06 03:52:06 +03:00			`if (freeze && rec_mode == CTDB_RECOVERY_ACTIVE) {`
initial attempt at freezing databases in priority order (This used to be ctdb commit e8d692590da1070c87a4144031e3306d190ebed2) 2009-10-12 05:08:39 +04:00			`int i;`

			`for (i=1; i<=NUM_DB_PRIORITIES; i++) {`
			`if (ctdb_client_async_control(ctdb, CTDB_CONTROL_FREEZE,`
			`nodes, i,`
			`CONTROL_TIMEOUT(),`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`false, tdb_null,`
if a node fails to become frozen during recovery, mark it up with as a culprit so it will soon get banned (This used to be ctdb commit f72d33ac73ebb1af802bacdfb30279df3cd8b8f9) 2009-10-08 09:45:25 +04:00			`NULL,`
			`set_recmode_fail_callback,`
			`rec) != 0) {`
initial attempt at freezing databases in priority order (This used to be ctdb commit e8d692590da1070c87a4144031e3306d190ebed2) 2009-10-12 05:08:39 +04:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to freeze nodes. Recovery failed.\n"));`
			`talloc_free(tmp_ctx);`
			`return -1;`
			`}`
add async versions of the freeze node control and freeze all nodes in parallell (This used to be ctdb commit f34e89f54d9f4380e76eb1b5b2385a4d8500b505) 2007-08-27 04:31:22 +04:00			`}`
			`}`

merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`talloc_free(tmp_ctx);`
break out the setting/clearing of recovery mode into a dedicated helper function (This used to be ctdb commit dba4e4f8aa4f2fde1e9f8d93bdf3a33f7de8ce18) 2007-05-06 03:53:12 +04:00			`return 0;`
			`}`
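set_recovery_mode() does two things. It sends the new mode as a 4-byte payload whose `dptr` points straight at the `rec_mode` argument. Then, only when `freeze` is true and the mode is ACTIVE, it freezes databases one priority level at a time, from 1 to NUM_DB_PRIORITIES, and aborts at the first level that fails. Elections skip the freeze, per the 2015 commit above. The sketch below reproduces both steps with stand-in types. `struct blob`, `blob_from_u32()`, `freeze_in_priority_order()`, and the demo freeze function are hypothetical and are not ctdb or tdb APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for TDB_DATA. */
struct blob {
	unsigned char *dptr;
	size_t dsize;
};

/* Wrap one uint32_t as a control payload. The blob aliases *value,
 * so *value must stay alive for as long as the blob is used. */
static struct blob blob_from_u32(uint32_t *value)
{
	struct blob b;

	assert(value != NULL);
	b.dptr = (unsigned char *)value;
	b.dsize = sizeof(*value);
	return b;
}

/* Hypothetical per-priority freeze operation: 0 on success. */
typedef int (*freeze_fn_t)(uint32_t priority, void *private_data);

/* Freeze priorities 1..num_priorities in ascending order, stopping at the
 * first failure. Returns the failing priority, or 0 if all succeeded. */
static uint32_t freeze_in_priority_order(uint32_t num_priorities,
					 freeze_fn_t fn, void *private_data)
{
	uint32_t prio;

	for (prio = 1; prio <= num_priorities; prio++) {
		if (fn(prio, private_data) != 0) {
			return prio;
		}
	}
	return 0;
}

/* Demo freeze operation: fails for every priority >= *(uint32_t *)private_data. */
static int fail_from_priority(uint32_t priority, void *private_data)
{
	const uint32_t *limit = private_data;

	return priority >= *limit ? -1 : 0;
}
```

Aliasing the argument is safe in the original because `rec_mode` outlives the synchronous-looking ctdb_client_async_control() call that consumes the payload.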

implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`/*`
			`change recovery master on all nodes`
			`*/`
ctdb-daemon: Rename struct ctdb_node_map to ctdb_node_map_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:22:48 +03:00			`static int set_recovery_master(struct ctdb_context *ctdb, struct ctdb_node_map_old *nodemap, uint32_t pnn)`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`{`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`TDB_DATA data;`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`TALLOC_CTX *tmp_ctx;`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`uint32_t *nodes;`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00
			`tmp_ctx = talloc_new(ctdb);`
			`CTDB_NO_MEMORY(ctdb, tmp_ctx);`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`data.dsize = sizeof(uint32_t);`
			`data.dptr = (unsigned char *)&pnn;`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`nodes = list_of_active_nodes(ctdb, nodemap, tmp_ctx, true);`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`if (ctdb_client_async_control(ctdb, CTDB_CONTROL_SET_RECMASTER,`
initial attempt at freezing databases in priority order (This used to be ctdb commit e8d692590da1070c87a4144031e3306d190ebed2) 2009-10-12 05:08:39 +04:00			`nodes, 0,`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`CONTROL_TIMEOUT(), false, data,`
			`NULL, NULL,`
			`NULL) != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to set recmaster. Recovery failed.\n"));`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`talloc_free(tmp_ctx);`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`return -1;`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`}`

merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`talloc_free(tmp_ctx);`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`return 0;`
			`}`

during recovery, update all remote nodes so they use the same priorities for the databases as this node. (This used to be ctdb commit 465dc95fef0ff6651ff49fa94e4cf2ebd1036ac4) 2009-10-10 09:28:20 +04:00			`/* update all remote nodes to use the same db priority that we have`
			`this can fail if the remote node has not yet been upgraded to`
			`support this function, so we always return success and never fail`
			`a recovery if this call fails.`
			`*/`
			`static int update_db_priority_on_remote_nodes(struct ctdb_context *ctdb,`
ctdb-daemon: Rename struct ctdb_node_map to ctdb_node_map_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:22:48 +03:00			`struct ctdb_node_map_old *nodemap,`
ctdb-daemon: Rename struct ctdb_dbid_map to ctdb_dbid_map_old Match struct ctdb_dbid as per protocol/protocol.h Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:46:05 +03:00			`uint32_t pnn, struct ctdb_dbid_map_old *dbmap, TALLOC_CTX *mem_ctx)`
during recovery, update all remote nodes so they use the same priorities for the databases as this node. (This used to be ctdb commit 465dc95fef0ff6651ff49fa94e4cf2ebd1036ac4) 2009-10-10 09:28:20 +04:00			`{`
			`int db;`

			`/* step through all local databases */`
			`for (db=0; db<dbmap->num;db++) {`
			`struct ctdb_db_priority db_prio;`
			`int ret;`

ctdb-daemon: Rename struct ctdb_dbid_map to ctdb_dbid_map_old Match struct ctdb_dbid as per protocol/protocol.h Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:46:05 +03:00			`db_prio.db_id = dbmap->dbs[db].db_id;`
			`ret = ctdb_ctrl_get_db_priority(ctdb, CONTROL_TIMEOUT(), CTDB_CURRENT_NODE, dbmap->dbs[db].db_id, &db_prio.priority);`
during recovery, update all remote nodes so they use the same priorities for the databases as this node. (This used to be ctdb commit 465dc95fef0ff6651ff49fa94e4cf2ebd1036ac4) 2009-10-10 09:28:20 +04:00			`if (ret != 0) {`
ctdb-daemon: Rename struct ctdb_dbid_map to ctdb_dbid_map_old Match struct ctdb_dbid as per protocol/protocol.h Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:46:05 +03:00			`DEBUG(DEBUG_ERR,(__location__ " Failed to read database priority from local node for db 0x%08x\n", dbmap->dbs[db].db_id));`
during recovery, update all remote nodes so they use the same priorities for the databases as this node. (This used to be ctdb commit 465dc95fef0ff6651ff49fa94e4cf2ebd1036ac4) 2009-10-10 09:28:20 +04:00			`continue;`
			`}`

ctdb-daemon: Rename struct ctdb_dbid_map to ctdb_dbid_map_old Match struct ctdb_dbid as per protocol/protocol.h Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:46:05 +03:00			`DEBUG(DEBUG_INFO,("Update DB priority for db 0x%08x to %u\n", dbmap->dbs[db].db_id, db_prio.priority));`
during recovery, update all remote nodes so they use the same priorities for the databases as this node. (This used to be ctdb commit 465dc95fef0ff6651ff49fa94e4cf2ebd1036ac4) 2009-10-10 09:28:20 +04:00
ctdb-daemon: Always update database priority cluster wide Database priority is a global property and all the nodes should have the priority set for the databases. Just setting priority on one node can lead to problems in the recovery as a database can be frozen at wrong priority and then freezing database would not succeed. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: David Disseldorp <ddiss@samba.org> Autobuild-User(master): David Disseldorp <ddiss@samba.org> Autobuild-Date(master): Mon Apr 7 14:06:26 CEST 2014 on sn-devel-104 2014-04-02 10:17:47 +04:00			`ret = ctdb_ctrl_set_db_priority(ctdb, CONTROL_TIMEOUT(),`
			`CTDB_CURRENT_NODE, &db_prio);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR,(__location__ " Failed to set DB priority for 0x%08x\n",`
			`db_prio.db_id));`
during recovery, update all remote nodes so they use the same priorities for the databases as this node. (This used to be ctdb commit 465dc95fef0ff6651ff49fa94e4cf2ebd1036ac4) 2009-10-10 09:28:20 +04:00			`}`
			`}`

			`return 0;`
			`}`
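The function above treats priority propagation as best effort: a failure to read or set the priority of one database is logged and skipped, and the function still returns 0 so that a peer running older code cannot abort the recovery. The following is a minimal standalone sketch of that pattern, not ctdb code. `struct toy_db`, `set_prio_fn`, `push_priorities()` and `set_reject_odd()` are hypothetical names, and the setter merely simulates a peer that rejects some databases.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for one entry of the local dbmap. */
struct toy_db {
	uint32_t db_id;
	uint32_t priority; /* priority as read from the local node */
	int readable;      /* 0 simulates a failed get_db_priority call */
};

/* Hypothetical setter: returns 0 on success, -1 if the peer rejects it. */
typedef int (*set_prio_fn)(uint32_t db_id, uint32_t priority);

/* Example setter that simulates a peer rejecting odd database ids. */
static int set_reject_odd(uint32_t db_id, uint32_t priority)
{
	(void)priority;
	return (db_id & 1) ? -1 : 0;
}

/* Push every readable local priority; failures are logged and skipped.
 * Returns how many databases were updated. A failure never aborts the
 * loop, mirroring the "never fail a recovery" rule described above. */
static int push_priorities(const struct toy_db *dbs, int num, set_prio_fn set)
{
	int db, updated = 0;

	for (db = 0; db < num; db++) {
		if (!dbs[db].readable) {
			fprintf(stderr, "Failed to read priority for db 0x%08x\n",
				(unsigned)dbs[db].db_id);
			continue;
		}
		if (set(dbs[db].db_id, dbs[db].priority) != 0) {
			fprintf(stderr, "Failed to set priority for db 0x%08x\n",
				(unsigned)dbs[db].db_id);
			continue;
		}
		updated++;
	}
	return updated;
}
```

Returning a count instead of an error code lets a caller report how many databases were updated while still proceeding, which is the same trade-off the original makes by ignoring individual failures.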
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00
			`/*`
			`ensure all other nodes have attached to any databases that we have`
			`*/`
ctdb-daemon: Rename struct ctdb_node_map to ctdb_node_map_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:22:48 +03:00			`static int create_missing_remote_databases(struct ctdb_context *ctdb, struct ctdb_node_map_old *nodemap,`
ctdb-daemon: Rename struct ctdb_dbid_map to ctdb_dbid_map_old Match struct ctdb_dbid as per protocol/protocol.h Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:46:05 +03:00			`uint32_t pnn, struct ctdb_dbid_map_old *dbmap, TALLOC_CTX *mem_ctx)`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`{`
recovery daemon this program is a client to the local ctdb daemon every second it pulls all vnnmap and nodemaps from all nodes that are available and checks if a recovery is required a recovery is required if : * all nodes do NOT have an identical vnnmap and generation * all nodes do NOT have an identical nodemap * there are active nodes that are NOT in the nodemap * there are nodes in the nodemap that are NOT active During recovery, the recovery tool will also make sure that all nodes know about and have created all databases. (This used to be ctdb commit 2f2650467bac7e8954de7c17cb34f46b0bdbcd26) 2007-05-04 09:21:40 +04:00			`int i, j, db, ret;`
ctdb-daemon: Rename struct ctdb_dbid_map to ctdb_dbid_map_old Match struct ctdb_dbid as per protocol/protocol.h Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:46:05 +03:00			`struct ctdb_dbid_map_old *remote_dbmap;`
recovery daemon this program is a client to the local ctdb daemon every second it pulls all vnnmap and nodemaps from all nodes that are available and checks if a recovery is required a recovery is required if : * all nodes do NOT have an identical vnnmap and generation * all nodes do NOT have an identical nodemap * there are active nodes that are NOT in the nodemap * there are nodes in the nodemap that are NOT active During recovery, the recovery tool will also make sure that all nodes know about and have created all databases. (This used to be ctdb commit 2f2650467bac7e8954de7c17cb34f46b0bdbcd26) 2007-05-04 09:21:40 +04:00
update to rhe recovery daemon ctdb_ctrl_ calls are timedout due to nodes arriving or leaving the cluster it crashes the recovery daemon afterwards with a SEGV but no useful stack backtrace (This used to be ctdb commit cd3abc7349e86555ccd87cd47a1dcc2adad2f46c) 2007-05-06 00:58:01 +04:00			`/* verify that all other nodes have all our databases */`
			`for (j=0; j<nodemap->num; j++) {`
Fix various spelling errors Reviewed-by: Andrew Bartlett <abartlet@samba.org> Reviewed-by: Michael Adam <obnox@samba.org> Autobuild-User(master): Andrew Bartlett <abartlet@samba.org> Autobuild-Date(master): Fri Nov 6 13:43:45 CET 2015 on sn-devel-104 2015-07-27 00:02:57 +03:00			`/* we don't need to check ourselves */`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`if (nodemap->nodes[j].pnn == pnn) {`
update to rhe recovery daemon ctdb_ctrl_ calls are timedout due to nodes arriving or leaving the cluster it crashes the recovery daemon afterwards with a SEGV but no useful stack backtrace (This used to be ctdb commit cd3abc7349e86555ccd87cd47a1dcc2adad2f46c) 2007-05-06 00:58:01 +04:00			`continue;`
			`}`
Fix various spelling errors Reviewed-by: Andrew Bartlett <abartlet@samba.org> Reviewed-by: Michael Adam <obnox@samba.org> Autobuild-User(master): Andrew Bartlett <abartlet@samba.org> Autobuild-Date(master): Fri Nov 6 13:43:45 CET 2015 on sn-devel-104 2015-07-27 00:02:57 +03:00			`/* don't check nodes that are unavailable */`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`if (nodemap->nodes[j].flags & NODE_FLAGS_INACTIVE) {`
update to rhe recovery daemon ctdb_ctrl_ calls are timedout due to nodes arriving or leaving the cluster it crashes the recovery daemon afterwards with a SEGV but no useful stack backtrace (This used to be ctdb commit cd3abc7349e86555ccd87cd47a1dcc2adad2f46c) 2007-05-06 00:58:01 +04:00			`continue;`
			`}`

change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`ret = ctdb_ctrl_getdbmap(ctdb, CONTROL_TIMEOUT(), nodemap->nodes[j].pnn,`
formatting fixes (This used to be ctdb commit ed63a2057698aed3931762605b2ea2368681af2b) 2007-06-07 12:39:37 +04:00			`mem_ctx, &remote_dbmap);`
update to rhe recovery daemon ctdb_ctrl_ calls are timedout due to nodes arriving or leaving the cluster it crashes the recovery daemon afterwards with a SEGV but no useful stack backtrace (This used to be ctdb commit cd3abc7349e86555ccd87cd47a1dcc2adad2f46c) 2007-05-06 00:58:01 +04:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to get dbids from node %u\n", nodemap->nodes[j].pnn));`
update to rhe recovery daemon ctdb_ctrl_ calls are timedout due to nodes arriving or leaving the cluster it crashes the recovery daemon afterwards with a SEGV but no useful stack backtrace (This used to be ctdb commit cd3abc7349e86555ccd87cd47a1dcc2adad2f46c) 2007-05-06 00:58:01 +04:00			`return -1;`
			`}`

			`/* step through all local databases */`
			`for (db=0; db<dbmap->num;db++) {`
			`const char *name;`


			`for (i=0;i<remote_dbmap->num;i++) {`
ctdb-daemon: Rename struct ctdb_dbid_map to ctdb_dbid_map_old Match struct ctdb_dbid as per protocol/protocol.h Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:46:05 +03:00			`if (dbmap->dbs[db].db_id == remote_dbmap->dbs[i].db_id) {`
update to rhe recovery daemon ctdb_ctrl_ calls are timedout due to nodes arriving or leaving the cluster it crashes the recovery daemon afterwards with a SEGV but no useful stack backtrace (This used to be ctdb commit cd3abc7349e86555ccd87cd47a1dcc2adad2f46c) 2007-05-06 00:58:01 +04:00			`break;`
			`}`
			`}`
			`/* the remote node already has this database */`
			`if (i!=remote_dbmap->num) {`
			`continue;`
			`}`
			`/* ok so we need to create this database */`
ctdb-server: Coverity fixes Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Michael Adam <obnox@samba.org> 2013-11-11 05:39:27 +04:00			`ret = ctdb_ctrl_getdbname(ctdb, CONTROL_TIMEOUT(), pnn,`
ctdb-daemon: Rename struct ctdb_dbid_map to ctdb_dbid_map_old Match struct ctdb_dbid as per protocol/protocol.h Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:46:05 +03:00			`dbmap->dbs[db].db_id, mem_ctx,`
ctdb-server: Coverity fixes Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Michael Adam <obnox@samba.org> 2013-11-11 05:39:27 +04:00			`&name);`
update to rhe recovery daemon ctdb_ctrl_ calls are timedout due to nodes arriving or leaving the cluster it crashes the recovery daemon afterwards with a SEGV but no useful stack backtrace (This used to be ctdb commit cd3abc7349e86555ccd87cd47a1dcc2adad2f46c) 2007-05-06 00:58:01 +04:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to get dbname from node %u\n", pnn));`
update to rhe recovery daemon ctdb_ctrl_ calls are timedout due to nodes arriving or leaving the cluster it crashes the recovery daemon afterwards with a SEGV but no useful stack backtrace (This used to be ctdb commit cd3abc7349e86555ccd87cd47a1dcc2adad2f46c) 2007-05-06 00:58:01 +04:00			`return -1;`
			`}`
ctdb-server: Coverity fixes Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Michael Adam <obnox@samba.org> 2013-11-11 05:39:27 +04:00			`ret = ctdb_ctrl_createdb(ctdb, CONTROL_TIMEOUT(),`
			`nodemap->nodes[j].pnn,`
			`mem_ctx, name,`
			`dbmap->dbs[db].flags & CTDB_DB_FLAGS_PERSISTENT);`
update to rhe recovery daemon ctdb_ctrl_ calls are timedout due to nodes arriving or leaving the cluster it crashes the recovery daemon afterwards with a SEGV but no useful stack backtrace (This used to be ctdb commit cd3abc7349e86555ccd87cd47a1dcc2adad2f46c) 2007-05-06 00:58:01 +04:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to create remote db:%s\n", name));`
update to rhe recovery daemon ctdb_ctrl_ calls are timedout due to nodes arriving or leaving the cluster it crashes the recovery daemon afterwards with a SEGV but no useful stack backtrace (This used to be ctdb commit cd3abc7349e86555ccd87cd47a1dcc2adad2f46c) 2007-05-06 00:58:01 +04:00			`return -1;`
			`}`
			`}`
			`}`
recovery daemon this program is a client to the local ctdb daemon every second it pulls all vnnmap and nodemaps from all nodes that are available and checks if a recovery is required a recovery is required if : * all nodes do NOT have an identical vnnmap and generation * all nodes do NOT have an identical nodemap * there are active nodes that are NOT in the nodemap * there are nodes in the nodemap that are NOT active During recovery, the recovery tool will also make sure that all nodes know about and have created all databases. (This used to be ctdb commit 2f2650467bac7e8954de7c17cb34f46b0bdbcd26) 2007-05-04 09:21:40 +04:00
add a helper function to create all missing remote databases detected during recovery (This used to be ctdb commit 04758c6f7d8f61260be6d2472380cb7904984427) 2007-05-06 04:04:37 +04:00			`return 0;`
			`}`
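At its core, create_missing_remote_databases() computes, for each active peer, the set difference between the local dbmap and that peer's dbmap, then issues one createdb control per missing entry. A minimal sketch of just the difference step, assuming database ids are held in plain `uint32_t` arrays; `has_db()` and `find_missing_on_remote()` are hypothetical helpers, not part of ctdb.

```c
#include <stdint.h>

/* Return 1 if id appears in ids[0..n-1], else 0. */
static int has_db(const uint32_t *ids, int n, uint32_t id)
{
	int i;

	for (i = 0; i < n; i++) {
		if (ids[i] == id) {
			return 1;
		}
	}
	return 0;
}

/* Store in missing[] every id from local[] that remote[] lacks and
 * return the count. missing[] must have room for nlocal entries. */
static int find_missing_on_remote(const uint32_t *local, int nlocal,
				  const uint32_t *remote, int nremote,
				  uint32_t *missing)
{
	int db, count = 0;

	for (db = 0; db < nlocal; db++) {
		if (!has_db(remote, nremote, local[db])) {
			missing[count++] = local[db];
		}
	}
	return count;
}
```

The nested linear scan is O(n*m), which is presumably adequate because a node attaches relatively few databases; a sorted array or hash set would only pay off for much larger maps.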


implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`/*`
			`ensure we are attached to any databases that anyone else is attached to`
			`*/`
ctdb-daemon: Rename struct ctdb_node_map to ctdb_node_map_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:22:48 +03:00			`static int create_missing_local_databases(struct ctdb_context *ctdb, struct ctdb_node_map_old *nodemap,`
ctdb-daemon: Rename struct ctdb_dbid_map to ctdb_dbid_map_old Match struct ctdb_dbid as per protocol/protocol.h Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:46:05 +03:00			`uint32_t pnn, struct ctdb_dbid_map_old **dbmap, TALLOC_CTX *mem_ctx)`
create a helper function to make sure the local node that does recovery has all the databases that exist on any other remote node (This used to be ctdb commit 0f436e3d40fea6e6a146019b0c664e80e81e88b4) 2007-05-06 04:12:42 +04:00			`{`
			`int i, j, db, ret;`
ctdb-daemon: Rename struct ctdb_dbid_map to ctdb_dbid_map_old Match struct ctdb_dbid as per protocol/protocol.h Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:46:05 +03:00			`struct ctdb_dbid_map_old *remote_dbmap;`
create a helper function to make sure the local node that does recovery has all the databases that exist on any other remote node (This used to be ctdb commit 0f436e3d40fea6e6a146019b0c664e80e81e88b4) 2007-05-06 04:12:42 +04:00
			`/* verify that we have all databases any other node has */`
			`for (j=0; j<nodemap->num; j++) {`
Fix various spelling errors Reviewed-by: Andrew Bartlett <abartlet@samba.org> Reviewed-by: Michael Adam <obnox@samba.org> Autobuild-User(master): Andrew Bartlett <abartlet@samba.org> Autobuild-Date(master): Fri Nov 6 13:43:45 CET 2015 on sn-devel-104 2015-07-27 00:02:57 +03:00			`/* we don't need to check ourselves */`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`if (nodemap->nodes[j].pnn == pnn) {`
create a helper function to make sure the local node that does recovery has all the databases that exist on any other remote node (This used to be ctdb commit 0f436e3d40fea6e6a146019b0c664e80e81e88b4) 2007-05-06 04:12:42 +04:00			`continue;`
			`}`
Fix various spelling errors Reviewed-by: Andrew Bartlett <abartlet@samba.org> Reviewed-by: Michael Adam <obnox@samba.org> Autobuild-User(master): Andrew Bartlett <abartlet@samba.org> Autobuild-Date(master): Fri Nov 6 13:43:45 CET 2015 on sn-devel-104 2015-07-27 00:02:57 +03:00			`/* don't check nodes that are unavailable */`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`if (nodemap->nodes[j].flags & NODE_FLAGS_INACTIVE) {`
create a helper function to make sure the local node that does recovery has all the databases that exist on any other remote node (This used to be ctdb commit 0f436e3d40fea6e6a146019b0c664e80e81e88b4) 2007-05-06 04:12:42 +04:00			`continue;`
			`}`

change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`ret = ctdb_ctrl_getdbmap(ctdb, CONTROL_TIMEOUT(), nodemap->nodes[j].pnn,`
formatting fixes (This used to be ctdb commit ed63a2057698aed3931762605b2ea2368681af2b) 2007-06-07 12:39:37 +04:00			`mem_ctx, &remote_dbmap);`
create a helper function to make sure the local node that does recovery has all the databases that exist on any other remote node (This used to be ctdb commit 0f436e3d40fea6e6a146019b0c664e80e81e88b4) 2007-05-06 04:12:42 +04:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to get dbids from node %u\n", nodemap->nodes[j].pnn));`
create a helper function to make sure the local node that does recovery has all the databases that exist on any other remote node (This used to be ctdb commit 0f436e3d40fea6e6a146019b0c664e80e81e88b4) 2007-05-06 04:12:42 +04:00			`return -1;`
			`}`

			`/* step through all databases on the remote node */`
			`for (db=0; db<remote_dbmap->num;db++) {`
			`const char *name;`

			`for (i=0;i<(*dbmap)->num;i++) {`
ctdb-daemon: Rename struct ctdb_dbid_map to ctdb_dbid_map_old Match struct ctdb_dbid as per protocol/protocol.h Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:46:05 +03:00			`if (remote_dbmap->dbs[db].db_id == (*dbmap)->dbs[i].db_id) {`
create a helper function to make sure the local node that does recovery has all the databases that exist on any other remote node (This used to be ctdb commit 0f436e3d40fea6e6a146019b0c664e80e81e88b4) 2007-05-06 04:12:42 +04:00			`break;`
			`}`
			`}`
			`/* we already have this db locally */`
			`if (i!=(*dbmap)->num) {`
			`continue;`
			`}`
			`/* ok so we need to create this database and`
			`rebuild dbmap`
			`*/`
change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`ret = ctdb_ctrl_getdbname(ctdb, CONTROL_TIMEOUT(), nodemap->nodes[j].pnn,`
ctdb-daemon: Rename struct ctdb_dbid_map to ctdb_dbid_map_old Match struct ctdb_dbid as per protocol/protocol.h Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:46:05 +03:00			`remote_dbmap->dbs[db].db_id, mem_ctx, &name);`
create a helper function to make sure the local node that does recovery has all the databases that exist on any other remote node (This used to be ctdb commit 0f436e3d40fea6e6a146019b0c664e80e81e88b4) 2007-05-06 04:12:42 +04:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to get dbname from node %u\n",`
change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`nodemap->nodes[j].pnn));`
create a helper function to make sure the local node that does recovery has all the databases that exist on any other remote node (This used to be ctdb commit 0f436e3d40fea6e6a146019b0c664e80e81e88b4) 2007-05-06 04:12:42 +04:00			`return -1;`
			`}`
added support for persistent databases in ctdbd (This used to be ctdb commit 3115090a0d882beca9d70761130b74bb0821f201) 2007-09-21 06:24:02 +04:00			`ret = ctdb_ctrl_createdb(ctdb, CONTROL_TIMEOUT(), pnn, mem_ctx, name,`
Revert "recoverd: Use correct tdb flags when creating missing databases" This reverts commit 10a057d8e15c8c18e540598a940d3548c731b0b4. This approach would not work when creating local databases since currently there is no control to receive TDB flags for remote databases. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit ca61eb776ab862bd269e45ee0f9f96e7e1e0e001) 2013-08-13 07:55:47 +04:00			`remote_dbmap->dbs[db].flags & CTDB_DB_FLAGS_PERSISTENT);`
create a helper function to make sure the local node that does recovery has all the databases that exist on any other remote node (This used to be ctdb commit 0f436e3d40fea6e6a146019b0c664e80e81e88b4) 2007-05-06 04:12:42 +04:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to create local db:%s\n", name));`
create a helper function to make sure the local node that does recovery has all the databases that exist on any other remote node (This used to be ctdb commit 0f436e3d40fea6e6a146019b0c664e80e81e88b4) 2007-05-06 04:12:42 +04:00			`return -1;`
			`}`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`ret = ctdb_ctrl_getdbmap(ctdb, CONTROL_TIMEOUT(), pnn, mem_ctx, dbmap);`
create a helper function to make sure the local node that does recovery has all the databases that exist on any other remote node (This used to be ctdb commit 0f436e3d40fea6e6a146019b0c664e80e81e88b4) 2007-05-06 04:12:42 +04:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to reread dbmap on node %u\n", pnn));`
create a helper function to make sure the local node that does recovery has all the databases that exist on any other remote node (This used to be ctdb commit 0f436e3d40fea6e6a146019b0c664e80e81e88b4) 2007-05-06 04:12:42 +04:00			`return -1;`
			`}`
			`}`
			`}`

			`return 0;`
			`}`
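create_missing_local_databases() works in the opposite direction: it folds every peer's databases into the local map, and because the local map grows as databases are created, it re-reads dbmap after each createdb so that later comparisons see the new entries. The sketch below models that union with a fixed-capacity array updated in place. `struct toy_dbmap` and `merge_remote_ids()` are hypothetical, and the in-place append stands in for the control round trip that re-reads the map in the original.

```c
#include <stdint.h>

/* Hypothetical local map; a fixed capacity keeps the sketch allocation-free. */
struct toy_dbmap {
	int num;
	uint32_t ids[16];
};

/* Append every id from remote[] that map does not yet contain.
 * Returns the number appended, or -1 if the capacity would be exceeded.
 * map is updated in place, so ids merged from one peer are already
 * visible when the next peer (or a later entry) is compared. */
static int merge_remote_ids(struct toy_dbmap *map,
			    const uint32_t *remote, int nremote)
{
	const int cap = (int)(sizeof(map->ids) / sizeof(map->ids[0]));
	int db, i, added = 0;

	for (db = 0; db < nremote; db++) {
		for (i = 0; i < map->num; i++) {
			if (map->ids[i] == remote[db]) {
				break;
			}
		}
		if (i != map->num) {
			continue; /* already present locally */
		}
		if (map->num >= cap) {
			return -1;
		}
		map->ids[map->num++] = remote[db];
		added++;
	}
	return added;
}
```

Passing the map by pointer is what makes the growth visible to the caller, which is also why the original takes `struct ctdb_dbid_map_old **dbmap` rather than a plain pointer.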

create a helper function for recovery that pulls and merges all remote databases onto the local node (This used to be ctdb commit 5cecc47449c369f91e83389a94b987ac32b1e3f4) 2007-05-06 04:16:48 +04:00
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`/*`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`pull the remote database contents from one node into the recdb`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`*/`
ctdb-recoverd: Replace unnecessary use of ctdb->recovery_master Databases are only pulled by the recovery master, so it can compare with current node PNN. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-03-31 06:03:43 +03:00			`static int pull_one_remote_database(struct ctdb_context *ctdb, uint32_t srcnode,`
Recover Persistent database DB by DB and not record by record Add a new tunable that changes the mode how persistent databases are recovered. RecoveryPDBBySeqNum When set to 1, persistent databases will be recovered in whole from the node which has the highest "__db_sequence_number__" record. This record is managed by samba for those databases where we do persistent writes and have inter-record relations. For these databases we do not want the usual "blend records from all nodes based on individual record RSN" but instead a mode where we pick one instance of the persistent database. If no node was found with a "__db_sequence_number__" record at all, we fail back to the original "recover records independently based on record RSN". Some persistent databases do not contain record interrelations and as such does not contain this special record at all. (This used to be ctdb commit 502150c764298a9fa8c4d8aa445bf7d85d4ee9dc) 2011-11-28 06:56:30 +04:00			`struct tdb_wrap *recdb, uint32_t dbid)`
create a helper function for recovery that pulls and merges all remote databases onto the local node (This used to be ctdb commit 5cecc47449c369f91e83389a94b987ac32b1e3f4) 2007-05-06 04:16:48 +04:00			`{`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`int ret;`
			`TDB_DATA outdata;`
renamed the pulldb structure to a ctdb_marshall_buffer (This used to be ctdb commit bad53b2d342bb9760497e6f4a61e64ca50d6e771) 2008-07-30 13:59:18 +04:00			`struct ctdb_marshall_buffer *reply;`
ctdb-daemon: Rename struct ctdb_rec_data to ctdb_rec_data_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:30:30 +03:00			`struct ctdb_rec_data_old *recdata;`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`int i;`
			`TALLOC_CTX *tmp_ctx = talloc_new(recdb);`
create a helper function for recovery that pulls and merges all remote databases onto the local node (This used to be ctdb commit 5cecc47449c369f91e83389a94b987ac32b1e3f4) 2007-05-06 04:16:48 +04:00
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`ret = ctdb_ctrl_pulldb(ctdb, srcnode, dbid, CTDB_LMASTER_ANY, tmp_ctx,`
			`CONTROL_TIMEOUT(), &outdata);`
			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,(__location__ " Unable to copy db from node %u\n", srcnode));`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`talloc_free(tmp_ctx);`
			`return -1;`
			`}`

rename the structure we use for marshalling multiple records (This used to be ctdb commit 4d205476d286570a6e1f52b59af42858ce051106) 2008-07-30 08:24:56 +04:00			`reply = (struct ctdb_marshall_buffer *)outdata.dptr;`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00
rename the structure we use for marshalling multiple records (This used to be ctdb commit 4d205476d286570a6e1f52b59af42858ce051106) 2008-07-30 08:24:56 +04:00			`if (outdata.dsize < offsetof(struct ctdb_marshall_buffer, data)) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,(__location__ " invalid data in pulldb reply\n"));`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`talloc_free(tmp_ctx);`
			`return -1;`
			`}`
ctdb-recoverd: Rename some local variables to avoid conflict with convention rec is always a (struct ctdb_recoverd *) Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-03-29 11:20:55 +03:00
ctdb-daemon: Rename struct ctdb_rec_data to ctdb_rec_data_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:30:30 +03:00			`recdata = (struct ctdb_rec_data_old *)&reply->data[0];`
ctdb-recoverd: Rename some local variables to avoid conflict with convention rec is always a (struct ctdb_recoverd *) Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-03-29 11:20:55 +03:00
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`for (i=0;`
			`i<reply->count;`
ctdb-daemon: Rename struct ctdb_rec_data to ctdb_rec_data_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:30:30 +03:00			`recdata = (struct ctdb_rec_data_old *)(recdata->length + (uint8_t *)recdata), i++) {`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`TDB_DATA key, data;`
			`struct ctdb_ltdb_header *hdr;`
			`TDB_DATA existing;`
ctdb-recoverd: Rename some local variables to avoid conflict with convention rec is always a (struct ctdb_recoverd *) Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-03-29 11:20:55 +03:00
			`key.dptr = &recdata->data[0];`
			`key.dsize = recdata->keylen;`
			`data.dptr = &recdata->data[key.dsize];`
			`data.dsize = recdata->datalen;`

new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`hdr = (struct ctdb_ltdb_header *)data.dptr;`

			`if (data.dsize < sizeof(struct ctdb_ltdb_header)) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_CRIT,(__location__ " bad ltdb record\n"));`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`talloc_free(tmp_ctx);`
			`return -1;`
			`}`

			`/* fetch the existing record, if any */`
			`existing = tdb_fetch(recdb->tdb, key);`
ctdb-recoverd: Replace unnecessary use of ctdb->recovery_master Databases are only pulled by the recovery master, so it can compare with current node PNN. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-03-31 06:03:43 +03:00
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`if (existing.dptr != NULL) {`
			`struct ctdb_ltdb_header header;`
			`if (existing.dsize < sizeof(struct ctdb_ltdb_header)) {`
ctdb-recoverd: Replace unnecessary use of ctdb->recovery_master Databases are only pulled by the recovery master, so it can compare with current node PNN. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-03-31 06:03:43 +03:00			`DEBUG(DEBUG_CRIT,(__location__ " Bad record size %u from node %u\n",`
catch internal traversal errors (This used to be ctdb commit 8caa85ad71be5d20a8d6f0cb3d52aff6905657a4) 2008-01-07 06:08:25 +03:00			`(unsigned)existing.dsize, srcnode));`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`free(existing.dptr);`
			`talloc_free(tmp_ctx);`
			`return -1;`
create a helper function for recovery that pulls and merges all remote databases onto the local node (This used to be ctdb commit 5cecc47449c369f91e83389a94b987ac32b1e3f4) 2007-05-06 04:16:48 +04:00			`}`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`header = *(struct ctdb_ltdb_header *)existing.dptr;`
			`free(existing.dptr);`
Revert "recovery: add special pull-logic for persistent databases" This reverts commit 8aef46d2aab3efb322dda51eaa202653cefd5222. This special recovery logic is wrong now with the transaction rewrite. The treatment of persistent databases will later be rewritten to use the database sequence number. Michael (This used to be ctdb commit c5a0aef668a63f927d6184612b13ce316eb4a0be) 2009-12-11 19:05:30 +03:00			`if (!(header.rsn < hdr->rsn ||`
ctdb-recoverd: Replace unnecessary use of ctdb->recovery_master Databases are only pulled by the recovery master, so it can compare with current node PNN. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-03-31 06:03:43 +03:00			`(header.dmaster != ctdb_get_pnn(ctdb) &&`
			`header.rsn == hdr->rsn))) {`
Revert "recovery: add special pull-logic for persistent databases" This reverts commit 8aef46d2aab3efb322dda51eaa202653cefd5222. This special recovery logic is wrong now with the transaction rewrite. The treatment of persistent databases will later be rewritten to use the database sequence number. Michael (This used to be ctdb commit c5a0aef668a63f927d6184612b13ce316eb4a0be) 2009-12-11 19:05:30 +03:00			`continue;`
recovery: add special pull-logic for persistent databases The decision mechanism which records of a persistent db are to be pulled into the recdb during recovery is now as follows: * Usually a record with the higher rsn than that already stored is taken. (Just as for normal tdbs.) * If a transaction is running on some node, then those nodes copies of all records are taken and are not overwritten later by other nodes' copies. In order to keep track of whether a record's copy was obtained from a node with a transaction running, the recovery mechanism misuses the ctdb tdb header field 'lacount' in the recdb. It is cleared later when pushing out the recdb database to the other nodes. This way, an incomplete transaction is not spoiled when a recovery interrupts and the replay should usually succeed (possibly after a few retries). Michael (This used to be ctdb commit 8aef46d2aab3efb322dda51eaa202653cefd5222) 2009-12-04 13:21:29 +03:00			`}`
			`}`
ctdb-recoverd: Replace unnecessary use of ctdb->recovery_master Databases are only pulled by the recovery master, so it can compare with current node PNN. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-03-31 06:03:43 +03:00
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`if (tdb_store(recdb->tdb, key, data, TDB_REPLACE) != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_CRIT,(__location__ " Failed to store record\n"));`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`talloc_free(tmp_ctx);`
ctdb-recoverd: Replace unnecessary use of ctdb->recovery_master Databases are only pulled by the recovery master, so it can compare with current node PNN. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-03-31 06:03:43 +03:00			`return -1;`
create a helper function for recovery that pulls and merges all remote databases onto the local node (This used to be ctdb commit 5cecc47449c369f91e83389a94b987ac32b1e3f4) 2007-05-06 04:16:48 +04:00			`}`
			`}`

new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`talloc_free(tmp_ctx);`

create a helper function for recovery that pulls and merges all remote databases onto the local node (This used to be ctdb commit 5cecc47449c369f91e83389a94b987ac32b1e3f4) 2007-05-06 04:16:48 +04:00			`return 0;`
			`}`
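The merge in pull_one_remote_database() keeps an incoming copy only when it beats what is already in the recdb. A minimal sketch of that decision, assuming a hypothetical `struct demo_header` that carries only the two fields the condition reads (the real `struct ctdb_ltdb_header` has more):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct ctdb_ltdb_header: only the fields
 * that the merge condition inspects. */
struct demo_header {
	uint64_t rsn;     /* record sequence number */
	uint32_t dmaster; /* node that currently holds the record */
};

/*
 * Return true if the incoming copy should overwrite the stored one.
 * Mirrors the condition above: a higher RSN always wins; on an RSN tie
 * the incoming copy wins unless the stored copy's dmaster is this node.
 * existing == NULL models "no record stored yet".
 */
static bool demo_incoming_wins(const struct demo_header *existing,
			       const struct demo_header *incoming,
			       uint32_t my_pnn)
{
	assert(incoming != NULL);
	if (existing == NULL) {
		return true;
	}
	return existing->rsn < incoming->rsn ||
	       (existing->dmaster != my_pnn && existing->rsn == incoming->rsn);
}
```

Because every node's copy passes through the same rule, the order in which nodes are pulled only matters for RSN ties.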

Recover Persistent database DB by DB and not record by record Add a new tunable that changes the mode how persistent databases are recovered. RecoveryPDBBySeqNum When set to 1, persistent databases will be recovered in whole from the node which has the highest "__db_sequence_number__" record. This record is managed by samba for those databases where we do persistent writes and have inter-record relations. For these databases we do not want the usual "blend records from all nodes based on individual record RSN" but instead a mode where we pick one instance of the persistent database. If no node was found with a "__db_sequence_number__" record at all, we fail back to the original "recover records independently based on record RSN". Some persistent databases do not contain record interrelations and as such does not contain this special record at all. (This used to be ctdb commit 502150c764298a9fa8c4d8aa445bf7d85d4ee9dc) 2011-11-28 06:56:30 +04:00
			`struct pull_seqnum_cbdata {`
			`int failed;`
			`uint32_t pnn;`
			`uint64_t seqnum;`
			`};`

			`static void pull_seqnum_cb(struct ctdb_context *ctdb, uint32_t node_pnn, int32_t res, TDB_DATA outdata, void *callback_data)`
			`{`
			`struct pull_seqnum_cbdata *cb_data = talloc_get_type(callback_data, struct pull_seqnum_cbdata);`
			`uint64_t seqnum;`

			`if (cb_data->failed != 0) {`
			`DEBUG(DEBUG_ERR, ("Got seqnum from node %d but we have already failed the entire operation\n", node_pnn));`
			`return;`
			`}`

			`if (res != 0) {`
			`DEBUG(DEBUG_ERR, ("Error when pulling seqnum from node %d\n", node_pnn));`
			`cb_data->failed = 1;`
			`return;`
			`}`

			`if (outdata.dsize != sizeof(uint64_t)) {`
			`DEBUG(DEBUG_ERR, ("Error when reading pull seqnum from node %d, got %d bytes but expected %d\n", node_pnn, (int)outdata.dsize, (int)sizeof(uint64_t)));`
			`cb_data->failed = -1;`
			`return;`
			`}`

			`seqnum = *((uint64_t *)outdata.dptr);`

ctdb-recoverd: For persistent databases a sequence number of 0 is valid Otherwise recovery ends up done by RSN when it is unnecessary. Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Michael Adam <obnox@samba.org> 2013-11-15 08:20:40 +04:00			`if (seqnum > cb_data->seqnum ||`
			`(cb_data->pnn == -1 && seqnum == 0)) {`
Recover Persistent database DB by DB and not record by record Add a new tunable that changes the mode how persistent databases are recovered. RecoveryPDBBySeqNum When set to 1, persistent databases will be recovered in whole from the node which has the highest "__db_sequence_number__" record. This record is managed by samba for those databases where we do persistent writes and have inter-record relations. For these databases we do not want the usual "blend records from all nodes based on individual record RSN" but instead a mode where we pick one instance of the persistent database. If no node was found with a "__db_sequence_number__" record at all, we fail back to the original "recover records independently based on record RSN". Some persistent databases do not contain record interrelations and as such does not contain this special record at all. (This used to be ctdb commit 502150c764298a9fa8c4d8aa445bf7d85d4ee9dc) 2011-11-28 06:56:30 +04:00			`cb_data->seqnum = seqnum;`
			`cb_data->pnn = node_pnn;`
			`}`
			`}`
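pull_seqnum_cb() tracks the node with the highest database sequence number, using `pnn == -1` as "no node chosen yet" so that a sequence number of 0 still counts as a valid answer. A minimal sketch of that selection, assuming hypothetical names (`struct demo_seqnum_pick`, `DEMO_NO_NODE`) and copying the payload with memcpy() rather than casting the pointer:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* (uint32_t)-1: plays the role of cb_data->pnn == -1 above. */
#define DEMO_NO_NODE UINT32_MAX

struct demo_seqnum_pick {
	int failed;      /* non-zero once any node has failed */
	uint32_t pnn;    /* node with the highest seqnum so far */
	uint64_t seqnum; /* highest seqnum seen so far */
};

static void demo_pick_init(struct demo_seqnum_pick *p)
{
	p->failed = 0;
	p->pnn = DEMO_NO_NODE;
	p->seqnum = 0;
}

/* Feed one node's reply: res is the control status, buf/len the payload. */
static void demo_pick_reply(struct demo_seqnum_pick *p, uint32_t pnn,
			    int32_t res, const void *buf, size_t len)
{
	uint64_t seqnum;

	if (p->failed != 0) {
		return; /* the whole scan has already failed */
	}
	if (res != 0 || len != sizeof(uint64_t)) {
		p->failed = 1;
		return;
	}
	memcpy(&seqnum, buf, sizeof(seqnum)); /* avoids unaligned access */
	/* A first reply of 0 is accepted, so seqnum 0 is a valid winner. */
	if (seqnum > p->seqnum || (p->pnn == DEMO_NO_NODE && seqnum == 0)) {
		p->seqnum = seqnum;
		p->pnn = pnn;
	}
}
```

As in the original, a single failed node marks the whole scan as failed, and the caller then falls back to the per-record RSN merge.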

			`static void pull_seqnum_fail_cb(struct ctdb_context *ctdb, uint32_t node_pnn, int32_t res, TDB_DATA outdata, void *callback_data)`
			`{`
			`struct pull_seqnum_cbdata *cb_data = talloc_get_type(callback_data, struct pull_seqnum_cbdata);`

			`DEBUG(DEBUG_ERR, ("Failed to pull db seqnum from node %d\n", node_pnn));`
			`cb_data->failed = 1;`
			`}`

			`static int pull_highest_seqnum_pdb(struct ctdb_context *ctdb,`
			`struct ctdb_recoverd *rec,`
ctdb-daemon: Rename struct ctdb_node_map to ctdb_node_map_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:22:48 +03:00			`struct ctdb_node_map_old *nodemap,`
Recover Persistent database DB by DB and not record by record Add a new tunable that changes the mode how persistent databases are recovered. RecoveryPDBBySeqNum When set to 1, persistent databases will be recovered in whole from the node which has the highest "__db_sequence_number__" record. This record is managed by samba for those databases where we do persistent writes and have inter-record relations. For these databases we do not want the usual "blend records from all nodes based on individual record RSN" but instead a mode where we pick one instance of the persistent database. If no node was found with a "__db_sequence_number__" record at all, we fail back to the original "recover records independently based on record RSN". Some persistent databases do not contain record interrelations and as such does not contain this special record at all. (This used to be ctdb commit 502150c764298a9fa8c4d8aa445bf7d85d4ee9dc) 2011-11-28 06:56:30 +04:00			`struct tdb_wrap *recdb, uint32_t dbid)`
			`{`
			`TALLOC_CTX *tmp_ctx = talloc_new(NULL);`
			`uint32_t *nodes;`
			`TDB_DATA data;`
			`uint32_t outdata[2];`
			`struct pull_seqnum_cbdata *cb_data;`

			`DEBUG(DEBUG_NOTICE, ("Scan for highest seqnum pdb for db:0x%08x\n", dbid));`

			`outdata[0] = dbid;`
			`outdata[1] = 0;`

			`data.dsize = sizeof(outdata);`
			`data.dptr = (uint8_t *)&outdata[0];`

			`cb_data = talloc(tmp_ctx, struct pull_seqnum_cbdata);`
			`if (cb_data == NULL) {`
			`DEBUG(DEBUG_ERR, ("Failed to allocate pull highest seqnum cb_data structure\n"));`
			`talloc_free(tmp_ctx);`
			`return -1;`
			`}`

			`cb_data->failed = 0;`
			`cb_data->pnn = -1;`
			`cb_data->seqnum = 0;`

			`nodes = list_of_active_nodes(ctdb, nodemap, tmp_ctx, true);`
			`if (ctdb_client_async_control(ctdb, CTDB_CONTROL_GET_DB_SEQNUM,`
			`nodes, 0,`
			`CONTROL_TIMEOUT(), false, data,`
			`pull_seqnum_cb,`
			`pull_seqnum_fail_cb,`
			`cb_data) != 0) {`
			`DEBUG(DEBUG_ERR, (__location__ " Failed to run async GET_DB_SEQNUM\n"));`

			`talloc_free(tmp_ctx);`
			`return -1;`
			`}`

			`if (cb_data->failed != 0) {`
			`DEBUG(DEBUG_NOTICE, ("Failed to pull sequence numbers for DB 0x%08x\n", dbid));`
			`talloc_free(tmp_ctx);`
			`return -1;`
			`}`

ctdb-recoverd: For persistent databases a sequence number of 0 is valid Otherwise recovery ends up done by RSN when it is unnecessary. Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Michael Adam <obnox@samba.org> 2013-11-15 08:20:40 +04:00			`if (cb_data->pnn == -1) {`
Recover Persistent database DB by DB and not record by record Add a new tunable that changes the mode how persistent databases are recovered. RecoveryPDBBySeqNum When set to 1, persistent databases will be recovered in whole from the node which has the highest "__db_sequence_number__" record. This record is managed by samba for those databases where we do persistent writes and have inter-record relations. For these databases we do not want the usual "blend records from all nodes based on individual record RSN" but instead a mode where we pick one instance of the persistent database. If no node was found with a "__db_sequence_number__" record at all, we fail back to the original "recover records independently based on record RSN". Some persistent databases do not contain record interrelations and as such does not contain this special record at all. (This used to be ctdb commit 502150c764298a9fa8c4d8aa445bf7d85d4ee9dc) 2011-11-28 06:56:30 +04:00			`DEBUG(DEBUG_NOTICE, ("Failed to find a node with highest sequence numbers for DB 0x%08x\n", dbid));`
			`talloc_free(tmp_ctx);`
			`return -1;`
			`}`

			`DEBUG(DEBUG_NOTICE, ("Pull persistent db:0x%08x from node %d with highest seqnum:%lld\n", dbid, cb_data->pnn, (long long)cb_data->seqnum));`

			`if (pull_one_remote_database(ctdb, cb_data->pnn, recdb, dbid) != 0) {`
			`DEBUG(DEBUG_ERR, ("Failed to pull highest seqnum database 0x%08x from node %d\n", dbid, cb_data->pnn));`
			`talloc_free(tmp_ctx);`
			`return -1;`
			`}`

			`talloc_free(tmp_ctx);`
			`return 0;`
			`}`


implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`/*`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`pull all the remote database contents into the recdb`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`*/`
If we can not pull a database from a node during recovery, mark this node as a "culprit" so that it will eventually become banned. (This used to be ctdb commit 69dc3bf60b86d8df6dc5c7c6ebf303e847fb2ba9) 2009-04-24 07:58:32 +04:00			`static int pull_remote_database(struct ctdb_context *ctdb,`
			`struct ctdb_recoverd *rec,`
ctdb-daemon: Rename struct ctdb_node_map to ctdb_node_map_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:22:48 +03:00			`struct ctdb_node_map_old *nodemap,`
recovery: pass the persistent flag to recover_database() and further down to pull_remote_database(), pull_one_remote_database(), and push_recdb_database(). This is in preparation of special handling of persistent databases during recoveries. Michael (This used to be ctdb commit 90abc4ac7c16e854cf6e8f96b60a77bc92e35e07) 2009-11-29 13:14:31 +03:00			`struct tdb_wrap *recdb, uint32_t dbid,`
			`bool persistent)`
break the code that repoints dmaster for all local and remote records into a separate helper function (This used to be ctdb commit d5ab30d0ac21e736eb34eaa19bccfee5f0ce7cfb) 2007-05-06 04:22:13 +04:00			`{`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`int j;`
break the code that repoints dmaster for all local and remote records into a separate helper function (This used to be ctdb commit d5ab30d0ac21e736eb34eaa19bccfee5f0ce7cfb) 2007-05-06 04:22:13 +04:00
Recover Persistent database DB by DB and not record by record Add a new tunable that changes the mode how persistent databases are recovered. RecoveryPDBBySeqNum When set to 1, persistent databases will be recovered in whole from the node which has the highest "__db_sequence_number__" record. This record is managed by samba for those databases where we do persistent writes and have inter-record relations. For these databases we do not want the usual "blend records from all nodes based on individual record RSN" but instead a mode where we pick one instance of the persistent database. If no node was found with a "__db_sequence_number__" record at all, we fail back to the original "recover records independently based on record RSN". Some persistent databases do not contain record interrelations and as such does not contain this special record at all. (This used to be ctdb commit 502150c764298a9fa8c4d8aa445bf7d85d4ee9dc) 2011-11-28 06:56:30 +04:00			`if (persistent && ctdb->tunable.recover_pdb_by_seqnum != 0) {`
			`int ret;`
			`ret = pull_highest_seqnum_pdb(ctdb, rec, nodemap, recdb, dbid);`
			`if (ret == 0) {`
			`return 0;`
			`}`
			`}`

new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`/* pull all records from all other nodes across onto this node`
			`(this merges based on rsn)`
			`*/`
			`for (j=0; j<nodemap->num; j++) {`
Fix various spelling errors Reviewed-by: Andrew Bartlett <abartlet@samba.org> Reviewed-by: Michael Adam <obnox@samba.org> Autobuild-User(master): Andrew Bartlett <abartlet@samba.org> Autobuild-Date(master): Fri Nov 6 13:43:45 CET 2015 on sn-devel-104 2015-07-27 00:02:57 +03:00			`/* don't merge from nodes that are unavailable */`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`if (nodemap->nodes[j].flags & NODE_FLAGS_INACTIVE) {`
			`continue;`
			`}`
Recover Persistent database DB by DB and not record by record Add a new tunable that changes the mode how persistent databases are recovered. RecoveryPDBBySeqNum When set to 1, persistent databases will be recovered in whole from the node which has the highest "__db_sequence_number__" record. This record is managed by samba for those databases where we do persistent writes and have inter-record relations. For these databases we do not want the usual "blend records from all nodes based on individual record RSN" but instead a mode where we pick one instance of the persistent database. If no node was found with a "__db_sequence_number__" record at all, we fail back to the original "recover records independently based on record RSN". Some persistent databases do not contain record interrelations and as such does not contain this special record at all. (This used to be ctdb commit 502150c764298a9fa8c4d8aa445bf7d85d4ee9dc) 2011-11-28 06:56:30 +04:00			`if (pull_one_remote_database(ctdb, nodemap->nodes[j].pnn, recdb, dbid) != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,(__location__ " Failed to pull remote database from node %u\n",`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`nodemap->nodes[j].pnn));`
If we can not pull a database from a node during recovery, mark this node as a "culprit" so that it will eventually become banned. (This used to be ctdb commit 69dc3bf60b86d8df6dc5c7c6ebf303e847fb2ba9) 2009-04-24 07:58:32 +04:00			`ctdb_set_culprit_count(rec, nodemap->nodes[j].pnn, nodemap->num);`
more optimisations to recovery (This used to be ctdb commit 9a41ad0a842cd4f3792d6e84b5c809b7ff6f342e) 2008-01-02 14:44:46 +03:00			`return -1;`
break the code that repoints dmaster for all local and remote records into a separate helper function (This used to be ctdb commit d5ab30d0ac21e736eb34eaa19bccfee5f0ce7cfb) 2007-05-06 04:22:13 +04:00			`}`
			`}`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00
break the code that repoints dmaster for all local and remote records into a separate helper function (This used to be ctdb commit d5ab30d0ac21e736eb34eaa19bccfee5f0ce7cfb) 2007-05-06 04:22:13 +04:00			`return 0;`
			`}`
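pull_remote_database() first tries a whole-database copy from the highest-seqnum node (persistent databases with the RecoveryPDBBySeqNum tunable set) and otherwise merges from every node that is not inactive. A minimal sketch of which nodes end up being pulled, assuming a hypothetical single-bit `DEMO_FLAG_INACTIVE` in place of the real `NODE_FLAGS_INACTIVE` mask and a precomputed `seqnum_pnn` result:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NO_NODE UINT32_MAX      /* seqnum scan failed or was not run */
#define DEMO_FLAG_INACTIVE 0x1u      /* hypothetical stand-in for NODE_FLAGS_INACTIVE */

struct demo_node {
	uint32_t pnn;
	uint32_t flags;
};

/*
 * Write the pnns to pull from into out[] (sized for num entries) and
 * return how many. seqnum_pnn is the node chosen by the seqnum scan,
 * or DEMO_NO_NODE if that scan failed.
 */
static unsigned demo_pull_plan(const struct demo_node *nodes, unsigned num,
			       bool persistent, bool by_seqnum,
			       uint32_t seqnum_pnn, uint32_t *out)
{
	unsigned n = 0;

	if (persistent && by_seqnum && seqnum_pnn != DEMO_NO_NODE) {
		out[0] = seqnum_pnn; /* copy one node's database whole */
		return 1;
	}
	/* Fallback: merge by RSN from all nodes that are not inactive. */
	for (unsigned j = 0; j < num; j++) {
		if (nodes[j].flags & DEMO_FLAG_INACTIVE) {
			continue;
		}
		out[n++] = nodes[j].pnn;
	}
	return n;
}
```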

implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00
			`/*`
			`update flags on all active nodes`
			`*/`
ctdb-daemon: Rename struct ctdb_node_map to ctdb_node_map_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:22:48 +03:00			`static int update_flags_on_all_nodes(struct ctdb_context *ctdb, struct ctdb_node_map_old *nodemap, uint32_t pnn, uint32_t flags)`
verify that the recmaster has the correct flags for us and if not tell the recmaster what the flags should be (This used to be ctdb commit 3387597926ad71e4140cc504b828486d99a3ec8e) 2008-06-26 05:08:09 +04:00			`{`
reqrite the handling of flag updates across the cluster to eliminate a race between the ctdb tool and the recovery daemon both at once trying to push flag changes across the cluster. (This used to be ctdb commit a9a1156ea4e10483a4bf4265b8e9203f0af033aa) 2008-11-19 06:43:46 +03:00			`int ret;`
verify that the recmaster has the correct flags for us and if not tell the recmaster what the flags should be (This used to be ctdb commit 3387597926ad71e4140cc504b828486d99a3ec8e) 2008-06-26 05:08:09 +04:00
redo and update how we synchronize flags across the cluster. this simplifies the code and should close a race condition between the local recovery daemon and a remote node when flags are changing. (This used to be ctdb commit 32d460b8469eb53145f04161a5d01166f9b5f09e) 2008-12-05 08:32:30 +03:00			`ret = ctdb_ctrl_modflags(ctdb, CONTROL_TIMEOUT(), pnn, flags, ~flags);`
			`if (ret != 0) {`
reqrite the handling of flag updates across the cluster to eliminate a race between the ctdb tool and the recovery daemon both at once trying to push flag changes across the cluster. (This used to be ctdb commit a9a1156ea4e10483a4bf4265b8e9203f0af033aa) 2008-11-19 06:43:46 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to update nodeflags on remote nodes\n"));`
			`return -1;`
			`}`
verify that the recmaster has the correct flags for us and if not tell the recmaster what the flags should be (This used to be ctdb commit 3387597926ad71e4140cc504b828486d99a3ec8e) 2008-06-26 05:08:09 +04:00
			`return 0;`
			`}`
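update_flags_on_all_nodes() passes `flags` as the bits to set and `~flags` as the bits to clear. Assuming the modify-flags request computes `(old | set) & ~clear` (an assumption about ctdb_ctrl_modflags() semantics, not taken from its source), the result is exactly `flags` whatever the old value was. A minimal worked sketch with a hypothetical `demo_modflags()`:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a set/clear flag update. */
static uint32_t demo_modflags(uint32_t old, uint32_t set, uint32_t clear)
{
	return (old | set) & ~clear;
}

/*
 * With set = flags and clear = ~flags:
 *   (old | flags) & flags == flags
 * so every node ends up with exactly the requested flags.
 */
```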
create a helper function for recovery to push all local databases out onto the remote nodes (This used to be ctdb commit 1ba76d374652cfa29e56fb77c7190349e42d3bcc) 2007-05-06 04:38:44 +04:00
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`/*`
			`ensure all nodes have the same vnnmap we do`
			`*/`
ctdb-daemon: Rename struct ctdb_node_map to ctdb_node_map_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:22:48 +03:00			`static int update_vnnmap_on_all_nodes(struct ctdb_context *ctdb, struct ctdb_node_map_old *nodemap,`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`uint32_t pnn, struct ctdb_vnn_map *vnnmap, TALLOC_CTX *mem_ctx)`
break out the code to update all nodes to the new vnnmap into a helper function (This used to be ctdb commit 81d39177949b54715710907d14ddc888dc09b064) 2007-05-06 04:42:18 +04:00			`{`
			`int j, ret;`

			`/* push the new vnn map out to all the nodes */`
			`for (j=0; j<nodemap->num; j++) {`
Fix various spelling errors Reviewed-by: Andrew Bartlett <abartlet@samba.org> Reviewed-by: Michael Adam <obnox@samba.org> Autobuild-User(master): Andrew Bartlett <abartlet@samba.org> Autobuild-Date(master): Fri Nov 6 13:43:45 CET 2015 on sn-devel-104 2015-07-27 00:02:57 +03:00			`/* don't push to nodes that are unavailable */`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`if (nodemap->nodes[j].flags & NODE_FLAGS_INACTIVE) {`
break out the code to update all nodes to the new vnnmap into a helper function (This used to be ctdb commit 81d39177949b54715710907d14ddc888dc09b064) 2007-05-06 04:42:18 +04:00			`continue;`
			`}`

change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`ret = ctdb_ctrl_setvnnmap(ctdb, CONTROL_TIMEOUT(), nodemap->nodes[j].pnn, mem_ctx, vnnmap);`
break out the code to update all nodes to the new vnnmap into a helper function (This used to be ctdb commit 81d39177949b54715710907d14ddc888dc09b064) 2007-05-06 04:42:18 +04:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to set vnnmap for node %u\n", nodemap->nodes[j].pnn));`
break out the code to update all nodes to the new vnnmap into a helper function (This used to be ctdb commit 81d39177949b54715710907d14ddc888dc09b064) 2007-05-06 04:42:18 +04:00			`return -1;`
			`}`
			`}`

			`return 0;`
			`}`

implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 09:23:27 +03:00			`/*`
ensure the recovery daemon is not clagged up by vacuum calls (This used to be ctdb commit ff7e80e247bf5a86adda0ef850d901478449675b) 2008-01-08 13:28:42 +03:00			`called when a vacuum fetch has completed - just free it and do the next one`
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 09:23:27 +03:00			`*/`
			`static void vacuum_fetch_callback(struct ctdb_client_call_state *state)`
			`{`
			`talloc_free(state);`
ensure the recovery daemon is not clagged up by vacuum calls (This used to be ctdb commit ff7e80e247bf5a86adda0ef850d901478449675b) 2008-01-08 13:28:42 +03:00			`}`


ctdb-recoverd/vacuum: factor vacuum_fetch_process_one out of vacuum_fetch_loop Signed-off-by: Michael Adam <obnox@samba.org> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-06-02 22:39:00 +03:00			`/**`
			`* Process one element of the vacuum fetch list:`
			`* Migrate it over to us with the special flag`
			`* CTDB_CALL_FLAG_VACUUM_MIGRATION.`
			`*/`
			`static bool vacuum_fetch_process_one(struct ctdb_db_context *ctdb_db,`
			`uint32_t pnn,`
ctdb-daemon: Rename struct ctdb_rec_data to ctdb_rec_data_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:30:30 +03:00			`struct ctdb_rec_data_old *r)`
ctdb-recoverd/vacuum: factor vacuum_fetch_process_one out of vacuum_fetch_loop Signed-off-by: Michael Adam <obnox@samba.org> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-06-02 22:39:00 +03:00			`{`
			`struct ctdb_client_call_state *state;`
			`TDB_DATA data;`
			`struct ctdb_ltdb_header *hdr;`
			`struct ctdb_call call;`

			`ZERO_STRUCT(call);`
			`call.call_id = CTDB_NULL_FUNC;`
			`call.flags = CTDB_IMMEDIATE_MIGRATION;`
			`call.flags |= CTDB_CALL_FLAG_VACUUM_MIGRATION;`

			`call.key.dptr = &r->data[0];`
			`call.key.dsize = r->keylen;`

			`/* ensure we don't block this daemon - just skip a record if we can't get`
			`the chainlock */`
			`if (tdb_chainlock_nonblock(ctdb_db->ltdb->tdb, call.key) != 0) {`
			`return true;`
			`}`

			`data = tdb_fetch(ctdb_db->ltdb->tdb, call.key);`
			`if (data.dptr == NULL) {`
			`tdb_chainunlock(ctdb_db->ltdb->tdb, call.key);`
			`return true;`
			`}`

			`if (data.dsize < sizeof(struct ctdb_ltdb_header)) {`
			`free(data.dptr);`
			`tdb_chainunlock(ctdb_db->ltdb->tdb, call.key);`
			`return true;`
			`}`

			`hdr = (struct ctdb_ltdb_header *)data.dptr;`
			`if (hdr->dmaster == pnn) {`
			`/* it's already local */`
			`free(data.dptr);`
			`tdb_chainunlock(ctdb_db->ltdb->tdb, call.key);`
			`return true;`
			`}`

			`free(data.dptr);`

			`state = ctdb_call_send(ctdb_db, &call);`
			`tdb_chainunlock(ctdb_db->ltdb->tdb, call.key);`
			`if (state == NULL) {`
			`DEBUG(DEBUG_ERR,(__location__ " Failed to setup vacuum fetch call\n"));`
			`return false;`
			`}`
			`state->async.fn = vacuum_fetch_callback;`
			`state->async.private_data = NULL;`

			`return true;`
			`}`
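vacuum_fetch_process_one never waits for a chain lock: if `tdb_chainlock_nonblock` fails, it skips the record and still returns `true`, so a busy chain cannot stall the recovery daemon. The sketch below shows the same try-lock-and-skip pattern in isolation, using a C11 `atomic_flag` as a stand-in for the tdb chain lock. `struct item` and `try_process` are hypothetical names, and the flag is only an illustration, not how tdb implements its locks.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical work item guarded by a non-blocking "lock". */
struct item {
	atomic_flag busy;   /* set while some holder owns this item */
	int processed;      /* how many times the item was handled */
};

/*
 * Handle 'it' only if its lock can be taken without waiting.
 * Returns true if the item was processed, false if it was skipped
 * because someone else holds the lock. The caller moves on either way.
 */
static bool try_process(struct item *it)
{
	/* test_and_set returns the previous state: true means already held */
	if (atomic_flag_test_and_set(&it->busy)) {
		return false;
	}
	it->processed++;
	atomic_flag_clear(&it->busy);
	return true;
}
```

Skipping instead of blocking trades completeness for responsiveness: a record missed in this pass can be picked up by a later vacuum run.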

ensure the recovery daemon is not clagged up by vacuum calls (This used to be ctdb commit ff7e80e247bf5a86adda0ef850d901478449675b) 2008-01-08 13:28:42 +03:00
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 09:23:27 +03:00			`/*`
			`handler for vacuum fetch`
			`*/`
ctdb-daemon: Replace ctdb_message with srvid abstraction Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-04-08 07:38:26 +03:00			`static void vacuum_fetch_handler(uint64_t srvid, TDB_DATA data,`
			`void *private_data)`
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 09:23:27 +03:00			`{`
ctdb-daemon: Replace ctdb_message with srvid abstraction Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-04-08 07:38:26 +03:00			`struct ctdb_recoverd *rec = talloc_get_type(`
			`private_data, struct ctdb_recoverd);`
			`struct ctdb_context *ctdb = rec->ctdb;`
rename the structure we use for marshalling multiple records (This used to be ctdb commit 4d205476d286570a6e1f52b59af42858ce051106) 2008-07-30 08:24:56 +04:00			`struct ctdb_marshall_buffer *recs;`
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 09:23:27 +03:00			`int ret, i;`
			`TALLOC_CTX *tmp_ctx = talloc_new(ctdb);`
			`const char *name;`
ctdb-daemon: Rename struct ctdb_dbid_map to ctdb_dbid_map_old Match struct ctdb_dbid as per protocol/protocol.h Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:46:05 +03:00			`struct ctdb_dbid_map_old *dbmap=NULL;`
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 09:23:27 +03:00			`bool persistent = false;`
			`struct ctdb_db_context *ctdb_db;`
ctdb-daemon: Rename struct ctdb_rec_data to ctdb_rec_data_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:30:30 +03:00			`struct ctdb_rec_data_old *r;`
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 09:23:27 +03:00
rename the structure we use for marshalling multiple records (This used to be ctdb commit 4d205476d286570a6e1f52b59af42858ce051106) 2008-07-30 08:24:56 +04:00			`recs = (struct ctdb_marshall_buffer *)data.dptr;`
ensure the recovery daemon is not clagged up by vacuum calls (This used to be ctdb commit ff7e80e247bf5a86adda0ef850d901478449675b) 2008-01-08 13:28:42 +03:00
			`if (recs->count == 0) {`
ctdb-recoverd/vacuum: add common exit point to vacuum_fetch_handler Signed-off-by: Michael Adam <obnox@samba.org> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-06-02 22:57:54 +03:00			`goto done;`
ensure the recovery daemon is not clagged up by vacuum calls (This used to be ctdb commit ff7e80e247bf5a86adda0ef850d901478449675b) 2008-01-08 13:28:42 +03:00			`}`

added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 09:23:27 +03:00			`/* work out if the database is persistent */`
			`ret = ctdb_ctrl_getdbmap(ctdb, CONTROL_TIMEOUT(), CTDB_CURRENT_NODE, tmp_ctx, &dbmap);`
			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to get dbids from local node\n"));`
ctdb-recoverd/vacuum: add common exit point to vacuum_fetch_handler Signed-off-by: Michael Adam <obnox@samba.org> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-06-02 22:57:54 +03:00			`goto done;`
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 09:23:27 +03:00			`}`

			`for (i=0;i<dbmap->num;i++) {`
ctdb-daemon: Rename struct ctdb_dbid_map to ctdb_dbid_map_old Match struct ctdb_dbid as per protocol/protocol.h Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:46:05 +03:00			`if (dbmap->dbs[i].db_id == recs->db_id) {`
ReadOnly: Change the ctdb_db structure to keep a uint8_t for flags instead of a boolean for the persistent flag. This is the same size as the original boolean but allows ut to add additional flags for the database (This used to be ctdb commit 7462761638d25880ad46024ad4ef21667eb99a98) 2011-09-01 04:21:55 +04:00			`persistent = dbmap->dbs[i].flags & CTDB_DB_FLAGS_PERSISTENT;`
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 09:23:27 +03:00			`break;`
			`}`
			`}`
			`if (i == dbmap->num) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to find db_id 0x%x on local node\n", recs->db_id));`
ctdb-recoverd/vacuum: add common exit point to vacuum_fetch_handler Signed-off-by: Michael Adam <obnox@samba.org> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-06-02 22:57:54 +03:00			`goto done;`
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 09:23:27 +03:00			`}`

			`/* find the name of this database */`
			`if (ctdb_ctrl_getdbname(ctdb, CONTROL_TIMEOUT(), CTDB_CURRENT_NODE, recs->db_id, tmp_ctx, &name) != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,(__location__ " Failed to get name of db 0x%x\n", recs->db_id));`
ctdb-recoverd/vacuum: add common exit point to vacuum_fetch_handler Signed-off-by: Michael Adam <obnox@samba.org> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-06-02 22:57:54 +03:00			`goto done;`
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 09:23:27 +03:00			`}`

			`/* attach to it */`
client: add timeout argument to ctdb_attach Rather than using a fixed 2 second CTDB_CONTROL_GETDBPATH timeout. (This used to be ctdb commit 9e178671560cb95121e11d718a76b05380ecd6c5) 2011-08-08 18:35:56 +04:00			`ctdb_db = ctdb_attach(ctdb, CONTROL_TIMEOUT(), name, persistent, 0);`
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 09:23:27 +03:00			`if (ctdb_db == NULL) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,(__location__ " Failed to attach to database '%s'\n", name));`
ctdb-recoverd/vacuum: add common exit point to vacuum_fetch_handler Signed-off-by: Michael Adam <obnox@samba.org> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-06-02 22:57:54 +03:00			`goto done;`
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 09:23:27 +03:00			`}`

ctdb-daemon: Rename struct ctdb_rec_data to ctdb_rec_data_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:30:30 +03:00			`r = (struct ctdb_rec_data_old *)&recs->data[0];`
ctdb-recoverd/vacuum: Remove vacuum_info structure For all the records listed in VACUUM_FETCH, migration requests are sent immediately without waiting. This means there can only be a single VACUUM_FETCH processing active at a time. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Michael Adam <obnox@samba.org> 2015-06-05 09:35:48 +03:00			`while (recs->count) {`
ctdb-recoverd/vacuum: move fetch loop back into fetch handler. With the processing of one element factored out, it is more natural to have the actual loop inside the handler function. This also makes the talloc/free bracked around the loop more obvious. Signed-off-by: Michael Adam <obnox@samba.org> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-06-02 23:17:03 +03:00			`bool ok;`

ctdb-recoverd/vacuum: Remove vacuum_info structure For all the records listed in VACUUM_FETCH, migration requests are sent immediately without waiting. This means there can only be a single VACUUM_FETCH processing active at a time. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Michael Adam <obnox@samba.org> 2015-06-05 09:35:48 +03:00			`ok = vacuum_fetch_process_one(ctdb_db, rec->ctdb->pnn, r);`
ctdb-recoverd/vacuum: move fetch loop back into fetch handler. With the processing of one element factored out, it is more natural to have the actual loop inside the handler function. This also makes the talloc/free bracked around the loop more obvious. Signed-off-by: Michael Adam <obnox@samba.org> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-06-02 23:17:03 +03:00			`if (!ok) {`
			`break;`
			`}`

ctdb-daemon: Rename struct ctdb_rec_data to ctdb_rec_data_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:30:30 +03:00			`r = (struct ctdb_rec_data_old *)(r->length + (uint8_t *)r);`
ctdb-recoverd/vacuum: Remove vacuum_info structure For all the records listed in VACUUM_FETCH, migration requests are sent immediately without waiting. This means there can only be a single VACUUM_FETCH processing active at a time. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Michael Adam <obnox@samba.org> 2015-06-05 09:35:48 +03:00			`recs->count--;`
ctdb-recoverd/vacuum: move fetch loop back into fetch handler. With the processing of one element factored out, it is more natural to have the actual loop inside the handler function. This also makes the talloc/free bracked around the loop more obvious. Signed-off-by: Michael Adam <obnox@samba.org> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-06-02 23:17:03 +03:00			`}`

ctdb-recoverd/vacuum: add common exit point to vacuum_fetch_handler Signed-off-by: Michael Adam <obnox@samba.org> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-06-02 22:57:54 +03:00			`done:`
fix some slow memory leaks in the vacuuming handler in the recovery daemon (This used to be ctdb commit 95bf36559d62f29e6f538f3a173b504ef3258341) 2008-09-16 01:55:57 +04:00			`talloc_free(tmp_ctx);`
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 09:23:27 +03:00			`}`
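The loop in vacuum_fetch_handler steps through the marshalled records by adding `r->length` to the current record's address, so every record carries its own total size. The sketch below isolates that walk over a plain byte buffer. `struct rec_hdr` and `sum_keylens` are hypothetical and do not reproduce the real `ctdb_rec_data_old` layout; the bounds checks are additions a standalone parser of untrusted input would want.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record header, NOT ctdb's real wire format: total record
 * length (header plus payload) followed by the key length. */
struct rec_hdr {
	uint32_t length;
	uint32_t keylen;
};

/*
 * Walk 'count' packed records in 'buf' ('bufsize' bytes) and return the sum
 * of their key lengths, or -1 if any record is malformed or would run past
 * the end of the buffer. Each step advances by the record's own length.
 */
static long sum_keylens(const uint8_t *buf, size_t bufsize, uint32_t count)
{
	size_t off = 0;
	long total = 0;

	while (count > 0) {
		struct rec_hdr r;

		if (bufsize - off < sizeof(r)) {
			return -1;                 /* no room for a header */
		}
		memcpy(&r, buf + off, sizeof(r)); /* avoids unaligned casts */
		if (r.length < sizeof(r) || r.length > bufsize - off ||
		    r.keylen > r.length - sizeof(r)) {
			return -1;                 /* inconsistent length fields */
		}
		total += r.keylen;
		off += r.length;                   /* step to the next record */
		count--;
	}
	return total;
}
```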

added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00
ctdb-recoverd: Detach database from recovery daemon As part of vacuuming, recoverd attaches to databases to migrate records. When detaching a database from main daemon, it should be removed from recovery daemon also. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Michael Adam <obnox@samba.org> Autobuild-User(master): Michael Adam <obnox@samba.org> Autobuild-Date(master): Wed Apr 23 17:05:45 CEST 2014 on sn-devel-104 2014-04-22 09:24:49 +04:00			`/*`
			`* handler for database detach`
			`*/`
ctdb-daemon: Replace ctdb_message with srvid abstraction Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-04-08 07:38:26 +03:00			`static void detach_database_handler(uint64_t srvid, TDB_DATA data,`
			`void *private_data)`
ctdb-recoverd: Detach database from recovery daemon As part of vacuuming, recoverd attaches to databases to migrate records. When detaching a database from main daemon, it should be removed from recovery daemon also. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Michael Adam <obnox@samba.org> Autobuild-User(master): Michael Adam <obnox@samba.org> Autobuild-Date(master): Wed Apr 23 17:05:45 CEST 2014 on sn-devel-104 2014-04-22 09:24:49 +04:00			`{`
ctdb-daemon: Replace ctdb_message with srvid abstraction Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-04-08 07:38:26 +03:00			`struct ctdb_recoverd *rec = talloc_get_type(`
			`private_data, struct ctdb_recoverd);`
			`struct ctdb_context *ctdb = rec->ctdb;`
ctdb-recoverd: Detach database from recovery daemon As part of vacuuming, recoverd attaches to databases to migrate records. When detaching a database from main daemon, it should be removed from recovery daemon also. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Michael Adam <obnox@samba.org> Autobuild-User(master): Michael Adam <obnox@samba.org> Autobuild-Date(master): Wed Apr 23 17:05:45 CEST 2014 on sn-devel-104 2014-04-22 09:24:49 +04:00			`uint32_t db_id;`
			`struct ctdb_db_context *ctdb_db;`

			`if (data.dsize != sizeof(db_id)) {`
			`return;`
			`}`
			`db_id = *(uint32_t *)data.dptr;`

			`ctdb_db = find_ctdb_db(ctdb, db_id);`
			`if (ctdb_db == NULL) {`
			`/* database is not attached */`
			`return;`
			`}`

			`DLIST_REMOVE(ctdb->db_list, ctdb_db);`

			`DEBUG(DEBUG_NOTICE, ("Detached from database '%s'\n",`
			`ctdb_db->db_name));`
			`talloc_free(ctdb_db);`
			`}`
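detach_database_handler accepts only a payload of exactly `sizeof(uint32_t)` bytes and then reads the database id from it. One portable way to perform that read is `memcpy` into a local variable, which avoids depending on the alignment of the payload pointer. `decode_db_id` below is a hypothetical helper that takes a raw pointer and length instead of a `TDB_DATA`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: extract a host-order uint32_t database id from a
 * message payload. Returns false, leaving *db_id untouched, if the payload
 * is missing or not exactly four bytes, mirroring the handler's size check.
 */
static bool decode_db_id(const uint8_t *dptr, size_t dsize, uint32_t *db_id)
{
	if (dptr == NULL || dsize != sizeof(*db_id)) {
		return false;
	}
	memcpy(db_id, dptr, sizeof(*db_id));
	return true;
}
```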

add a tuneable to control how long we wait after a successful recovery before we alow another recovery to be initiated (This used to be ctdb commit f3b43519423b7a73e6a2dd986bdf11203b8653cf) 2007-07-04 02:36:59 +04:00			`/*`
			`called when ctdb_wait_timeout should finish`
			`*/`
ctdb-daemon: Stop using tevent compatibility definitions Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-26 08:50:09 +03:00			`static void ctdb_wait_handler(struct tevent_context *ev,`
			`struct tevent_timer *te,`
add a tuneable to control how long we wait after a successful recovery before we alow another recovery to be initiated (This used to be ctdb commit f3b43519423b7a73e6a2dd986bdf11203b8653cf) 2007-07-04 02:36:59 +04:00			`struct timeval yt, void *p)`
			`{`
			`uint32_t *timed_out = (uint32_t *)p;`
			`(*timed_out) = 1;`
			`}`

			`/*`
			`wait for a given number of seconds`
			`*/`
speed startup: don't wait a full recovery interval if we've already waited We currently sleep for one second, whether or not we've already slept. Change this to sleep for the remainder of the second, if at all. Seconds between ctdbd first log message and node healthy: BEFORE: 18.09 AFTER: 17.08 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit fde760b5f39c77172308a583da4c2443b71541c9) 2010-06-22 17:20:35 +04:00			`static void ctdb_wait_timeout(struct ctdb_context *ctdb, double secs)`
add a tuneable to control how long we wait after a successful recovery before we alow another recovery to be initiated (This used to be ctdb commit f3b43519423b7a73e6a2dd986bdf11203b8653cf) 2007-07-04 02:36:59 +04:00			`{`
			`uint32_t timed_out = 0;`
speed startup: don't wait a full recovery interval if we've already waited We currently sleep for one second, whether or not we've already slept. Change this to sleep for the remainder of the second, if at all. Seconds between ctdbd first log message and node healthy: BEFORE: 18.09 AFTER: 17.08 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit fde760b5f39c77172308a583da4c2443b71541c9) 2010-06-22 17:20:35 +04:00			`time_t usecs = (secs - (time_t)secs) * 1000000;`
ctdb-daemon: Stop using tevent compatibility definitions Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-26 08:50:09 +03:00			`tevent_add_timer(ctdb->ev, ctdb, timeval_current_ofs(secs, usecs),`
			`ctdb_wait_handler, &timed_out);`
add a tuneable to control how long we wait after a successful recovery before we alow another recovery to be initiated (This used to be ctdb commit f3b43519423b7a73e6a2dd986bdf11203b8653cf) 2007-07-04 02:36:59 +04:00			`while (!timed_out) {`
ctdb-daemon: Stop using tevent compatibility definitions Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-26 08:50:09 +03:00			`tevent_loop_once(ctdb->ev);`
add a tuneable to control how long we wait after a successful recovery before we alow another recovery to be initiated (This used to be ctdb commit f3b43519423b7a73e6a2dd986bdf11203b8653cf) 2007-07-04 02:36:59 +04:00			`}`
			`}`
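ctdb_wait_timeout takes a fractional number of seconds and splits it into whole seconds and a microsecond remainder before arming the timer. The sketch below performs the same split into a POSIX `struct timeval`; `secs_to_timeval` is a hypothetical name, it assumes a non-negative input, and it truncates rather than rounds, matching the casts in the original.

```c
#include <sys/time.h>

/* Hypothetical helper: convert a non-negative duration in seconds into a
 * struct timeval. The fractional part is truncated to whole microseconds. */
static struct timeval secs_to_timeval(double secs)
{
	struct timeval tv;

	tv.tv_sec = (time_t)secs;
	tv.tv_usec = (suseconds_t)((secs - (double)tv.tv_sec) * 1000000.0);
	return tv;
}
```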

make election handling much more scalable (This used to be ctdb commit 05938d462b92bd9ecb8e35f53651bded47c48675) 2007-11-13 02:27:44 +03:00			`/*`
			`called when an election times out (ends)`
			`*/`
ctdb-daemon: Stop using tevent compatibility definitions Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-26 08:50:09 +03:00			`static void ctdb_election_timeout(struct tevent_context *ev,`
			`struct tevent_timer *te,`
make election handling much more scalable (This used to be ctdb commit 05938d462b92bd9ecb8e35f53651bded47c48675) 2007-11-13 02:27:44 +03:00			`struct timeval t, void *p)`
			`{`
			`struct ctdb_recoverd *rec = talloc_get_type(p, struct ctdb_recoverd);`
			`rec->election_timeout = NULL;`
speed startup: with --sloppy-start, cut initial election timeout to 1/2 second. Seconds between ctdbd first log message and node healthy: BEFORE: 4.03 AFTER: 2.02 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 8f17731dea4287d4f9b21dc58c1bdf26c8a0e628) 2010-06-22 17:25:20 +04:00			`fast_start = false;`
When we create new election data to send during elections, we must re-read the node flags from the main daemon to catch when the STOPPED flag is changed. (This used to be ctdb commit ca4982c40d81db528fe915d5ecc01fcf7df0b522) 2009-07-17 05:37:03 +04:00
ctdb-recoverd: Don't say "Election timed out" That makes people think there's a problem (and report bugs) so say something a bit less scary instead... Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2014-06-20 07:36:25 +04:00			`DEBUG(DEBUG_WARNING,("Election period ended\n"));`
make election handling much more scalable (This used to be ctdb commit 05938d462b92bd9ecb8e35f53651bded47c48675) 2007-11-13 02:27:44 +03:00			`}`


			`/*`
			`wait for an election to finish. It finishes election_timeout seconds after`
			`the last election packet is received.`
			`*/`
			`static void ctdb_wait_election(struct ctdb_recoverd *rec)`
			`{`
			`struct ctdb_context *ctdb = rec->ctdb;`
			`while (rec->election_timeout) {`
ctdb-daemon: Stop using tevent compatibility definitions Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-26 08:50:09 +03:00			`tevent_loop_once(ctdb->ev);`
make election handling much more scalable (This used to be ctdb commit 05938d462b92bd9ecb8e35f53651bded47c48675) 2007-11-13 02:27:44 +03:00			`}`
			`}`

sync flags between nodes in monitor loop in recmaster (This used to be ctdb commit 6eef86e06388fc53a1212f1e2783ae174c6cd210) 2007-10-15 08:28:51 +04:00			`/*`
when we as the recovery daemon on the recovery master detects that the flags differ between the local ctdb daemon and the remote node we can force a flags update on all nodes and not just the local daemon (This used to be ctdb commit a924eb89c966ecbae029ca137e06cffd40cc70fd) 2007-11-23 03:31:42 +03:00			`Update our local flags from all remote connected nodes.`
			`This is only run when we are, or believe we are, the recovery master.`
sync flags between nodes in monitor loop in recmaster (This used to be ctdb commit 6eef86e06388fc53a1212f1e2783ae174c6cd210) 2007-10-15 08:28:51 +04:00			`*/`
ctdb-daemon: Rename struct ctdb_node_map to ctdb_node_map_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:22:48 +03:00			`static int update_local_flags(struct ctdb_recoverd *rec, struct ctdb_node_map_old *nodemap)`
sync flags between nodes in monitor loop in recmaster (This used to be ctdb commit 6eef86e06388fc53a1212f1e2783ae174c6cd210) 2007-10-15 08:28:51 +04:00			`{`
			`int j;`
dont manipulate ctdb->monitoring_mode directly from the SET_MON_MODE control, instead call ctdb_start/stop_monitoring() ctdb_stop_monitoring() dont allocate a new monitoring context, leave it NULL. Also set the monitoring_mode in this function so that ctdb_stop/start_monitoring() and ->monitoring_mode are kept in sync. Add a debug message to log that we have stopped monitoring. ctdb_start_monitoring() check whether monitoring is already active and make the function idempotent. Create the monitoring context when monitoring is started. Update ->monitoring_mode once the monitoring has been started. Add a debug message to log that we have started monitoring. When we temporarily stop monitoring while running an event script, restart monitoring after the event script wrapper returns instead of in the event script callback. Let monitoring_mode start out as DISABLED and let it be enabled once we call ctdb_start_monitoring. dont check for MONITORING_DISABLED in check_fore_dead_nodes(). If monitoring is disabled, this event handler will not be called. (This used to be ctdb commit 3a93ae8bdcffb1adbd6243844f3058fc742f76aa) 2007-11-30 00:44:34 +03:00			`struct ctdb_context *ctdb = rec->ctdb;`
sync flags between nodes in monitor loop in recmaster (This used to be ctdb commit 6eef86e06388fc53a1212f1e2783ae174c6cd210) 2007-10-15 08:28:51 +04:00			`TALLOC_CTX *mem_ctx = talloc_new(ctdb);`

			`/* get the nodemap for all active remote nodes and verify`
			`they are the same as for this node`
			`*/`
			`for (j=0; j<nodemap->num; j++) {`
ctdb-daemon: Rename struct ctdb_node_map to ctdb_node_map_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:22:48 +03:00			`struct ctdb_node_map_old *remote_nodemap=NULL;`
sync flags between nodes in monitor loop in recmaster (This used to be ctdb commit 6eef86e06388fc53a1212f1e2783ae174c6cd210) 2007-10-15 08:28:51 +04:00			`int ret;`

			`if (nodemap->nodes[j].flags & NODE_FLAGS_DISCONNECTED) {`
			`continue;`
			`}`
			`if (nodemap->nodes[j].pnn == ctdb->pnn) {`
			`continue;`
			`}`

			`ret = ctdb_ctrl_getnodemap(ctdb, CONTROL_TIMEOUT(), nodemap->nodes[j].pnn,`
			`mem_ctx, &remote_nodemap);`
			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to get nodemap from remote node %u\n",`
sync flags between nodes in monitor loop in recmaster (This used to be ctdb commit 6eef86e06388fc53a1212f1e2783ae174c6cd210) 2007-10-15 08:28:51 +04:00			`nodemap->nodes[j].pnn));`
move ctdb_set_culprit higher up in the file when we are the recmaster and we update the local flags for all the nodes, if one of the nodes fail to respond and give us his flags, set that node as a "culprit" as one of the first things to do in the monitor_cluster loop, check if the current culprit has caused too many (20) failures and if so ban that node. this is for the situation where a remote node may still be CONNECTED but it fails to respond to the getnodemap control causing the recovery master to loop in monitor_cluster aborting the monitoring when the node fails to respond but before anything will trigger a call to do_recovery(). If one or more of the databases or nodes are frozen at this stage, this would lead to smbd being blocked for potentially a longish time. (This used to be ctdb commit 83b0261f2cb453195b86f547d360400103a8b795) 2007-11-28 07:04:20 +03:00			`ctdb_set_culprit(rec, nodemap->nodes[j].pnn);`
sync flags between nodes in monitor loop in recmaster (This used to be ctdb commit 6eef86e06388fc53a1212f1e2783ae174c6cd210) 2007-10-15 08:28:51 +04:00			`talloc_free(mem_ctx);`
If update_local_flags() finds that a node has changed its BANNED status so it differs from what the local ctdb daemon on the recovery master thinks it should be we should call for a re-election (This used to be ctdb commit 21ad6039c31ef5cc0e40a35a41220f91943947cb) 2007-11-23 03:53:06 +03:00			`return MONITOR_FAILED;`
sync flags between nodes in monitor loop in recmaster (This used to be ctdb commit 6eef86e06388fc53a1212f1e2783ae174c6cd210) 2007-10-15 08:28:51 +04:00			`}`
			`if (nodemap->nodes[j].flags != remote_nodemap->nodes[j].flags) {`
If update_local_flags() finds that a node has changed its BANNED status so it differs from what the local ctdb daemon on the recovery master thinks it should be we should call for a re-election (This used to be ctdb commit 21ad6039c31ef5cc0e40a35a41220f91943947cb) 2007-11-23 03:53:06 +03:00			`/* We should tell our daemon about this so it`
add an extra log if we get a modflags control but it doesnt change any flags in update_local_flags() (this is only called if we are or we belive we are the recmaster) when we detect that the flags of a remote node is different from what our local node thinks the flags should be for that remote node we should send a node-flag-changed message to the local daemon so that it updates the flags for that node. (This used to be ctdb commit 36225e4e271f7a4065398253747fb20054f99a53) 2007-11-23 02:52:29 +03:00			`updates its flags or else we will log the same`
			`message again in the next iteration of recovery.`
when we as the recovery daemon on the recovery master detects that the flags differ between the local ctdb daemon and the remote node we can force a flags update on all nodes and not just the local daemon (This used to be ctdb commit a924eb89c966ecbae029ca137e06cffd40cc70fd) 2007-11-23 03:31:42 +03:00			`Since we are the recovery master we can just as`
			`well update the flags on all nodes.`
add an extra log if we get a modflags control but it doesnt change any flags in update_local_flags() (this is only called if we are or we belive we are the recmaster) when we detect that the flags of a remote node is different from what our local node thinks the flags should be for that remote node we should send a node-flag-changed message to the local daemon so that it updates the flags for that node. (This used to be ctdb commit 36225e4e271f7a4065398253747fb20054f99a53) 2007-11-23 02:52:29 +03:00			`*/`
recoverd: When updating flags on nodes, send updated flags and not old flags This was broken by commit a9a1156ea4e10483a4bf4265b8e9203f0af033aa. Instead of a SRVID_SET_NODE_FLAGS message to recovery daemon, a control was sent to the local daemon which in turn informed the recovery daemon. And while doing this change old flags were sent via CONTROL_MODIFY_FLAGS. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 7eb2f89979360b6cc98ca9b17c48310277fa89fc) 2013-06-26 09:22:46 +04:00			`ret = ctdb_ctrl_modflags(ctdb, CONTROL_TIMEOUT(), nodemap->nodes[j].pnn, remote_nodemap->nodes[j].flags, ~remote_nodemap->nodes[j].flags);`
reqrite the handling of flag updates across the cluster to eliminate a race between the ctdb tool and the recovery daemon both at once trying to push flag changes across the cluster. (This used to be ctdb commit a9a1156ea4e10483a4bf4265b8e9203f0af033aa) 2008-11-19 06:43:46 +03:00			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR, (__location__ " Unable to update nodeflags on remote nodes\n"));`
			`talloc_free(mem_ctx);`
			`return -1;`
			`}`
add an extra log if we get a modflags control but it doesnt change any flags in update_local_flags() (this is only called if we are or we belive we are the recmaster) when we detect that the flags of a remote node is different from what our local node thinks the flags should be for that remote node we should send a node-flag-changed message to the local daemon so that it updates the flags for that node. (This used to be ctdb commit 36225e4e271f7a4065398253747fb20054f99a53) 2007-11-23 02:52:29 +03:00
If update_local_flags() finds that a node has changed its BANNED status so it differs from what the local ctdb daemon on the recovery master thinks it should be we should call for a re-election (This used to be ctdb commit 21ad6039c31ef5cc0e40a35a41220f91943947cb) 2007-11-23 03:53:06 +03:00			`/* Update our local copy of the flags in the recovery`
			`daemon.`
			`*/`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_NOTICE,("Remote node %u had flags 0x%x, local had 0x%x - updating local\n",`
If update_local_flags() finds that a node has changed its BANNED status so it differs from what the local ctdb daemon on the recovery master thinks it should be we should call for a re-election (This used to be ctdb commit 21ad6039c31ef5cc0e40a35a41220f91943947cb) 2007-11-23 03:53:06 +03:00			`nodemap->nodes[j].pnn, remote_nodemap->nodes[j].flags,`
			`nodemap->nodes[j].flags));`
sync flags between nodes in monitor loop in recmaster (This used to be ctdb commit 6eef86e06388fc53a1212f1e2783ae174c6cd210) 2007-10-15 08:28:51 +04:00			`nodemap->nodes[j].flags = remote_nodemap->nodes[j].flags;`
			`}`
			`talloc_free(remote_nodemap);`
			`}`
			`talloc_free(mem_ctx);`
If update_local_flags() finds that a node has changed its BANNED status so it differs from what the local ctdb daemon on the recovery master thinks it should be we should call for a re-election (This used to be ctdb commit 21ad6039c31ef5cc0e40a35a41220f91943947cb) 2007-11-23 03:53:06 +03:00			`return MONITOR_OK;`
sync flags between nodes in monitor loop in recmaster (This used to be ctdb commit 6eef86e06388fc53a1212f1e2783ae174c6cd210) 2007-10-15 08:28:51 +04:00			`}`
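The tail of this function reconciles node flags. When a remote node reports flags that differ from the recovery daemon's copy, the difference is logged and the local copy is overwritten.

Below is a minimal, self-contained sketch of that reconciliation step only. `struct node_entry`, `sync_local_flags()` and the use of `printf()` in place of `DEBUG()` are hypothetical. Fetching each remote node map and calling for a re-election on a BANNED change are left out.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for one entry of a ctdb node map. */
struct node_entry {
	uint32_t pnn;
	uint32_t flags;
};

/*
 * self_flags[j] is the flag word that node j reports for itself.
 * That report is treated as authoritative: wherever the local copy
 * disagrees, log the difference and overwrite the local copy.
 * Returns the number of entries that were updated.
 */
static unsigned int sync_local_flags(struct node_entry *local,
				     const uint32_t *self_flags,
				     unsigned int num_nodes)
{
	unsigned int updated = 0;
	unsigned int j;

	assert(local != NULL && self_flags != NULL);

	for (j = 0; j < num_nodes; j++) {
		if (local[j].flags == self_flags[j]) {
			continue;
		}
		printf("Remote node %" PRIu32 " had flags 0x%" PRIx32
		       ", local had 0x%" PRIx32 " - updating local\n",
		       local[j].pnn, self_flags[j], local[j].flags);
		local[j].flags = self_flags[j];
		updated++;
	}
	return updated;
}
```

A second call with the same reports returns 0, because the local copy already matches.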


ctdbd: Fix a typo Signed-off-by: Volker Lendecke <vl@samba.org> Reviewed-by: Jeremy Allison <jra@samba.org> 2015-10-12 17:52:49 +03:00			`/* Create a new random generation id.`
create a define to represent the 'invalid' generation id we used in two places. create a new helper function to generate new generation id values that know about the invalid id and avoids generating it. update the ctdb status tool to know about the invalid generation id and print the string INVALID instead (This used to be ctdb commit 4fbcd189543cb8a92227fdcd3d158472e558ccda) 2007-08-22 06:38:31 +04:00			`The generation id can not be the INVALID_GENERATION id`
			`*/`
			`static uint32_t new_generation(void)`
			`{`
			`uint32_t generation;`

			`while (1) {`
			`generation = random();`

			`if (generation != INVALID_GENERATION) {`
			`break;`
			`}`
			`}`

			`return generation;`
			`}`
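`new_generation()` is a form of rejection sampling: it keeps drawing until the value is not the reserved sentinel. The sketch below makes one stated assumption. `SKETCH_INVALID_GENERATION` is given the value 1 here purely for illustration, since the real `INVALID_GENERATION` definition is outside this section. The loop uses `do`/`while`, which is equivalent to the original `while (1)` with `break`.

```c
#define _XOPEN_SOURCE 700 /* random()/srandom() are XSI functions */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed sentinel value, for illustration only. */
#define SKETCH_INVALID_GENERATION 1u

/* Draw random ids until one differs from the reserved sentinel. */
static uint32_t sketch_new_generation(void)
{
	uint32_t generation;

	do {
		generation = (uint32_t)random();
	} while (generation == SKETCH_INVALID_GENERATION);

	return generation;
}
```

`random()` yields about 2^31 distinct values and only one is rejected, so the loop almost always exits after a single draw.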
we are the culprit if we can't get the reclock (This used to be ctdb commit 1d320e113c6134ff6822b985a47131d8204af35a) 2007-10-05 06:01:40 +04:00
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00
			`/*`
			`create a temporary working database`
			`*/`
			`static struct tdb_wrap *create_recdb(struct ctdb_context *ctdb, TALLOC_CTX *mem_ctx)`
			`{`
			`char *name;`
			`struct tdb_wrap *recdb;`
don't use mmap in tdb if --nosetsched is set. That makes valgrind happier (it doesn't like the mmap/msync calls in tdb) (This used to be ctdb commit f3a729998ce67f5d2e3b2ad41d96e8f04c0d18d8) 2008-07-04 11:32:21 +04:00			`unsigned tdb_flags;`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00
			`/* open up the temporary recovery database */`
server: create recdb.tdb.X in /var/ctdb/state/ metze (This used to be ctdb commit 92e05282d6c4f16e55d914cc3bde3738ea2d44ad) 2009-11-23 17:36:45 +03:00			`name = talloc_asprintf(mem_ctx, "%s/recdb.tdb.%u",`
			`ctdb->db_directory_state,`
			`ctdb->pnn);`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`if (name == NULL) {`
			`return NULL;`
			`}`
			`unlink(name);`
don't use mmap in tdb if --nosetsched is set. That makes valgrind happier (it doesn't like the mmap/msync calls in tdb) (This used to be ctdb commit f3a729998ce67f5d2e3b2ad41d96e8f04c0d18d8) 2008-07-04 11:32:21 +04:00
			`tdb_flags = TDB_NOLOCK;`
Add --valgringing flag instead of --nosetsched The do_setsched was being tested for whether to mmap tdbs: let's make it explicit. We can also happily move the kill-child eventscript hack under this flag. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 2ee86cc1f311d7b7504c7b14d142b9c4f6f4b469) 2009-12-16 13:29:15 +03:00			`if (ctdb->valgrinding) {`
don't use mmap in tdb if --nosetsched is set. That makes valgrind happier (it doesn't like the mmap/msync calls in tdb) (This used to be ctdb commit f3a729998ce67f5d2e3b2ad41d96e8f04c0d18d8) 2008-07-04 11:32:21 +04:00			`tdb_flags |= TDB_NOMMAP;`
			`}`
recoverd: Make sure to use jenkins hash for recovery databases Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 32c83e209823e9a4d6306bb7fd63d4500f3e2668) 2013-07-29 07:50:44 +04:00			`tdb_flags |= (TDB_INCOMPATIBLE_HASH | TDB_DISALLOW_NESTING);`
don't use mmap in tdb if --nosetsched is set. That makes valgrind happier (it doesn't like the mmap/msync calls in tdb) (This used to be ctdb commit f3a729998ce67f5d2e3b2ad41d96e8f04c0d18d8) 2008-07-04 11:32:21 +04:00
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`recdb = tdb_wrap_open(mem_ctx, name, ctdb->tunable.database_hash_size,`
don't use mmap in tdb if --nosetsched is set. That makes valgrind happier (it doesn't like the mmap/msync calls in tdb) (This used to be ctdb commit f3a729998ce67f5d2e3b2ad41d96e8f04c0d18d8) 2008-07-04 11:32:21 +04:00			`tdb_flags, O_RDWR\|O_CREAT\|O_EXCL, 0600);`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`if (recdb == NULL) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_CRIT,(__location__ " Failed to create temp recovery database '%s'\n", name));`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`}`

			`talloc_free(name);`

			`return recdb;`
			`}`
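`create_recdb()` first calls `unlink()` on the per-node path, then opens it with `O_CREAT|O_EXCL`. The unlink clears a stale file left by an earlier, interrupted recovery. `O_EXCL` then guarantees the daemon works on a file it has just created.

Below is a minimal POSIX sketch of that pattern, using plain `open()` instead of `tdb_wrap_open()`. The helper name and the path in the test are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Open a fresh, private scratch file: remove any stale copy left by an
 * earlier run, then insist on creating the file ourselves (O_EXCL).
 * Returns a file descriptor, or -1 with errno set. */
static int open_fresh_scratch(const char *path)
{
	assert(path != NULL);
	(void)unlink(path); /* ENOENT is fine: nothing stale to remove */
	return open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
}
```

Without the preceding `unlink()`, a leftover file would make the exclusive open fail with `EEXIST`.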


			`/*`
recoverd: fix a comment typo Signed-off-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit 909269a4a3690e1245117ca1af935401455785e6) 2012-11-19 20:20:11 +04:00			`a traverse function for pulling all relevant records from recdb`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`*/`
			`struct recdb_data {`
			`struct ctdb_context *ctdb;`
rename the structure we use for marshalling multiple records (This used to be ctdb commit 4d205476d286570a6e1f52b59af42858ce051106) 2008-07-30 08:24:56 +04:00			`struct ctdb_marshall_buffer *recdata;`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`uint32_t len;`
RECOVER: When we pull databases during recovery, we used to reallocate the databuffer for each entry added. This would normally not be an issue, but for cases where memory is fragmented, this could start to cost significant cpu if we need to reallocate and move to a different region. Change this to instead preallocate , by default, 10MByte chunks to the data buffer. This significantly reduces the number of potential reallocate and move operations that may be required. Create a tunable to override/change how much preallocation should be used. (This used to be ctdb commit 1f262deaad0818f159f9c68330f7fec121679023) 2012-05-25 06:27:59 +04:00			`uint32_t allocated_len;`
catch internal traversal errors (This used to be ctdb commit 8caa85ad71be5d20a8d6f0cb3d52aff6905657a4) 2008-01-07 06:08:25 +03:00			`bool failed;`
recovery: pass the persistent flag to recover_database() and further down to pull_remote_database(), pull_one_remote_database(), and push_recdb_database(). This is in preparation of special handling of persistent databases during recoveries. Michael (This used to be ctdb commit 90abc4ac7c16e854cf6e8f96b60a77bc92e35e07) 2009-11-29 13:14:31 +03:00			`bool persistent;`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`};`

			`static int traverse_recdb(struct tdb_context *tdb, TDB_DATA key, TDB_DATA data, void *p)`
			`{`
			`struct recdb_data *params = (struct recdb_data *)p;`
ctdb-daemon: Rename struct ctdb_rec_data to ctdb_rec_data_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:30:30 +03:00			`struct ctdb_rec_data_old *recdata;`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`struct ctdb_ltdb_header *hdr;`

recovery: data corruption of persistent DBs after recoveries: don't delete emtpy records The record-by-record mode of recovery deletes empty records. For persistent databases, this can lead to data corruption by deleting records that should be there: - Assume the cluster has been running for a while. - A record R in a persistent database has been created and deleted a couple of times, the last operation being deletion, leaving an empty record with a high RSN, say 10. - Now a node N is turned off. - This leaves the local database copy of D on N with the empty copy of R and RSN 10. On all other nodes, the recovery has deleted the copy of record R. - Now the record is created again while node N is turned off. This creates R with RSN = 1 on all nodes except for N. - Now node N is turned on again. The following recovery will chose the older empty copy of R due to RSN 10 > RSN 1. ==> Hence the record is gone after the recovery. On databases like Samba's registry, this can damage the higher-level data structures built from the various tdb-level records. This patch fixes that problem by not deleting empty records in recoveries for persistent databases. Signed-off-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit 6860c79aea416f56cfd7a6af790bbdf495dbc54e) 2012-11-19 20:28:03 +04:00			`/*`
			`* skip empty records - but NOT for persistent databases:`
			`*`
			`* The record-by-record mode of recovery deletes empty records.`
			`* For persistent databases, this can lead to data corruption`
			`* by deleting records that should be there:`
			`*`
			`* - Assume the cluster has been running for a while.`
			`*`
			`* - A record R in a persistent database D has been created and`
			`* deleted a couple of times, the last operation being deletion,`
			`* leaving an empty record with a high RSN, say 10.`
			`*`
			`* - Now a node N is turned off.`
			`*`
			`* - This leaves the local database copy of D on N with the empty`
			`* copy of R and RSN 10. On all other nodes, the recovery has deleted`
			`* the copy of record R.`
			`*`
			`* - Now the record is created again while node N is turned off.`
			`* This creates R with RSN = 1 on all nodes except for N.`
			`*`
			`* - Now node N is turned on again. The following recovery will choose`
			`* the older empty copy of R due to RSN 10 > RSN 1.`
			`*`
			`* ==> Hence the record is gone after the recovery.`
			`*`
			`* On databases like Samba's registry, this can damage the higher-level`
			`* data structures built from the various tdb-level records.`
			`*/`
			`if (!params->persistent && data.dsize <= sizeof(struct ctdb_ltdb_header)) {`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`return 0;`
			`}`

			`/* update the dmaster field to point to us */`
			`hdr = (struct ctdb_ltdb_header *)data.dptr;`
recovery: for persistent db's don't set the dmaster to the recmaster node number It is important to keep track of the dmaster (i.e. the node that last committed a transaction containing changes to this node). Michael (This used to be ctdb commit fe68972eb9cf3aa1f16ba1aacf57ade5d66e647c) 2009-11-29 13:17:18 +03:00			`if (!params->persistent) {`
			`hdr->dmaster = params->ctdb->pnn;`
recoverd: in a recovery, set the MIGRATED_WITH_DATA flag on all records Those records that are kept after recovery, are non-empty, and stored identically on all nodes. So this is as if they had been migrated with data. Pair-Programmed-With: Stefan Metzmacher <metze@samba.org> (This used to be ctdb commit 101be642e492a3a54231e2e3e6553a59380fe702) 2010-12-03 17:24:06 +03:00			`hdr->flags \|= CTDB_REC_FLAG_MIGRATED_WITH_DATA;`
recovery: for persistent db's don't set the dmaster to the recmaster node number It is important to keep track of the dmaster (i.e. the node that last committed a transaction containing changes to this node). Michael (This used to be ctdb commit fe68972eb9cf3aa1f16ba1aacf57ade5d66e647c) 2009-11-29 13:17:18 +03:00			`}`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00
			`/* add the record to the blob ready to send to the nodes */`
ctdb-recoverd: Rename some local variables to avoid conflict with convention rec is always a (struct ctdb_recoverd *) Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-03-29 11:20:55 +03:00			`recdata = ctdb_marshall_record(params->recdata, 0, key, NULL, data);`
			`if (recdata == NULL) {`
catch internal traversal errors (This used to be ctdb commit 8caa85ad71be5d20a8d6f0cb3d52aff6905657a4) 2008-01-07 06:08:25 +03:00			`params->failed = true;`
			`return -1;`
			`}`
ctdb-recoverd: Rename some local variables to avoid conflict with convention rec is always a (struct ctdb_recoverd *) Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-03-29 11:20:55 +03:00			`if (params->len + recdata->length >= params->allocated_len) {`
			`params->allocated_len = recdata->length + params->len + params->ctdb->tunable.pulldb_preallocation_size;`
RECOVER: When we pull databases during recovery, we used to reallocate the databuffer for each entry added. This would normally not be an issue, but for cases where memory is fragmented, this could start to cost significant cpu if we need to reallocate and move to a different region. Change this to instead preallocate , by default, 10MByte chunks to the data buffer. This significantly reduces the number of potential reallocate and move operations that may be required. Create a tunable to override/change how much preallocation should be used. (This used to be ctdb commit 1f262deaad0818f159f9c68330f7fec121679023) 2012-05-25 06:27:59 +04:00			`params->recdata = talloc_realloc_size(NULL, params->recdata, params->allocated_len);`
			`}`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`if (params->recdata == NULL) {`
Fixes for various issues found by Coverity Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 05bfdbbd0d4abdfbcf28e3930086723508b35952) 2011-08-10 19:53:56 +04:00			`DEBUG(DEBUG_CRIT,(__location__ " Failed to expand recdata to %u\n",`
ctdb-recoverd: Rename some local variables to avoid conflict with convention rec is always a (struct ctdb_recoverd *) Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-03-29 11:20:55 +03:00			`recdata->length + params->len));`
catch internal traversal errors (This used to be ctdb commit 8caa85ad71be5d20a8d6f0cb3d52aff6905657a4) 2008-01-07 06:08:25 +03:00			`params->failed = true;`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`return -1;`
			`}`
			`params->recdata->count++;`
ctdb-recoverd: Rename some local variables to avoid conflict with convention rec is always a (struct ctdb_recoverd *) Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-03-29 11:20:55 +03:00			`memcpy(params->len+(uint8_t *)params->recdata, recdata, recdata->length);`
			`params->len += recdata->length;`
			`talloc_free(recdata);`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00
			`return 0;`
			`}`
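The comment in `traverse_recdb()` explains why deleting empty records corrupts persistent databases. The sketch below replays that scenario under two assumptions taken from the comment itself:

- The copy with the highest RSN wins during a recovery merge.
- Re-creating a record yields RSN 1 on a node with no copy, or one more than the existing copy's RSN otherwise.

`struct rec_copy`, `pick_winner()`, `next_rsn()` and `keep_in_push()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one node's copy of a record. */
struct rec_copy {
	uint64_t rsn;    /* record sequence number */
	size_t data_len; /* payload bytes after the header; 0 = empty record */
};

/* Assumed merge rule: the copy with the highest RSN wins
 * (tie-breaking is ignored in this sketch). */
static const struct rec_copy *pick_winner(const struct rec_copy *copies,
					  size_t n)
{
	assert(n > 0);
	const struct rec_copy *best = &copies[0];

	for (size_t i = 1; i < n; i++) {
		if (copies[i].rsn > best->rsn) {
			best = &copies[i];
		}
	}
	return best;
}

/* Assumed: (re)creating a record bumps the RSN of the node's existing
 * copy, or starts at 1 when the node holds no copy at all. */
static uint64_t next_rsn(const struct rec_copy *existing)
{
	return existing == NULL ? 1 : existing->rsn + 1;
}

/* Mirrors the skip test in traverse_recdb(): empty records are left
 * out of the pushed data only for non-persistent databases. */
static bool keep_in_push(const struct rec_copy *rec, bool persistent)
{
	return persistent || rec->data_len > 0;
}
```

In the first scenario, the other nodes have deleted their empty copies. Node N's stale empty copy (RSN 10) then beats the re-created record (RSN 1). In the second scenario, the empties are kept, so the re-created record gets RSN 11 and wins.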

			`/*`
			`push the recdb database out to all nodes`
			`*/`
			`static int push_recdb_database(struct ctdb_context *ctdb, uint32_t dbid,`
recovery: pass the persistent flag to recover_database() and further down to pull_remote_database(), pull_one_remote_database(), and push_recdb_database(). This is in preparation of special handling of persistent databases during recoveries. Michael (This used to be ctdb commit 90abc4ac7c16e854cf6e8f96b60a77bc92e35e07) 2009-11-29 13:14:31 +03:00			`bool persistent,`
ctdb-daemon: Rename struct ctdb_node_map to ctdb_node_map_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:22:48 +03:00			`struct tdb_wrap *recdb, struct ctdb_node_map_old *nodemap)`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`{`
			`struct recdb_data params;`
rename the structure we use for marshalling multiple records (This used to be ctdb commit 4d205476d286570a6e1f52b59af42858ce051106) 2008-07-30 08:24:56 +04:00			`struct ctdb_marshall_buffer *recdata;`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`TDB_DATA outdata;`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`TALLOC_CTX *tmp_ctx;`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`uint32_t *nodes;`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00
			`tmp_ctx = talloc_new(ctdb);`
			`CTDB_NO_MEMORY(ctdb, tmp_ctx);`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00
rename the structure we use for marshalling multiple records (This used to be ctdb commit 4d205476d286570a6e1f52b59af42858ce051106) 2008-07-30 08:24:56 +04:00			`recdata = talloc_zero(recdb, struct ctdb_marshall_buffer);`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`CTDB_NO_MEMORY(ctdb, recdata);`

			`recdata->db_id = dbid;`

			`params.ctdb = ctdb;`
			`params.recdata = recdata;`
rename the structure we use for marshalling multiple records (This used to be ctdb commit 4d205476d286570a6e1f52b59af42858ce051106) 2008-07-30 08:24:56 +04:00			`params.len = offsetof(struct ctdb_marshall_buffer, data);`
RECOVER: When we pull databases during recovery, we used to reallocate the databuffer for each entry added. This would normally not be an issue, but for cases where memory is fragmented, this could start to cost significant cpu if we need to reallocate and move to a different region. Change this to instead preallocate , by default, 10MByte chunks to the data buffer. This significantly reduces the number of potential reallocate and move operations that may be required. Create a tunable to override/change how much preallocation should be used. (This used to be ctdb commit 1f262deaad0818f159f9c68330f7fec121679023) 2012-05-25 06:27:59 +04:00			`params.allocated_len = params.len;`
catch internal traversal errors (This used to be ctdb commit 8caa85ad71be5d20a8d6f0cb3d52aff6905657a4) 2008-01-07 06:08:25 +03:00			`params.failed = false;`
recovery: pass the persistent flag to recover_database() and further down to pull_remote_database(), pull_one_remote_database(), and push_recdb_database(). This is in preparation of special handling of persistent databases during recoveries. Michael (This used to be ctdb commit 90abc4ac7c16e854cf6e8f96b60a77bc92e35e07) 2009-11-29 13:14:31 +03:00			`params.persistent = persistent;`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00
			`if (tdb_traverse_read(recdb->tdb, traverse_recdb, &params) == -1) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,(__location__ " Failed to traverse recdb database\n"));`
catch internal traversal errors (This used to be ctdb commit 8caa85ad71be5d20a8d6f0cb3d52aff6905657a4) 2008-01-07 06:08:25 +03:00			`talloc_free(params.recdata);`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`talloc_free(tmp_ctx);`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`return -1;`
			`}`

catch internal traversal errors (This used to be ctdb commit 8caa85ad71be5d20a8d6f0cb3d52aff6905657a4) 2008-01-07 06:08:25 +03:00			`if (params.failed) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,(__location__ " Failed to traverse recdb database\n"));`
catch internal traversal errors (This used to be ctdb commit 8caa85ad71be5d20a8d6f0cb3d52aff6905657a4) 2008-01-07 06:08:25 +03:00			`talloc_free(params.recdata);`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`talloc_free(tmp_ctx);`
catch internal traversal errors (This used to be ctdb commit 8caa85ad71be5d20a8d6f0cb3d52aff6905657a4) 2008-01-07 06:08:25 +03:00			`return -1;`
			`}`

new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`recdata = params.recdata;`

			`outdata.dptr = (void *)recdata;`
			`outdata.dsize = params.len;`

add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`nodes = list_of_active_nodes(ctdb, nodemap, tmp_ctx, true);`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`if (ctdb_client_async_control(ctdb, CTDB_CONTROL_PUSH_DB,`
initial attempt at freezing databases in priority order (This used to be ctdb commit e8d692590da1070c87a4144031e3306d190ebed2) 2009-10-12 05:08:39 +04:00			`nodes, 0,`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`CONTROL_TIMEOUT(), false, outdata,`
			`NULL, NULL,`
			`NULL) != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,(__location__ " Failed to push recdb records to nodes for db 0x%x\n", dbid));`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`talloc_free(recdata);`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`talloc_free(tmp_ctx);`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`return -1;`
			`}`

added debug constants to allow for better mapping to syslog levels (This used to be ctdb commit 7ba8f1dde318eab03f4257e5a89fd23e7281e502) 2008-02-04 09:44:24 +03:00			`DEBUG(DEBUG_NOTICE, (__location__ " Recovery - pushed remote database 0x%x of size %u\n",`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`dbid, recdata->count));`

			`talloc_free(recdata);`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`talloc_free(tmp_ctx);`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00
			`return 0;`
			`}`
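`push_recdb_database()` initialises `params.allocated_len` to the marshall header size. Whenever the next record would not fit, `traverse_recdb()` grows the buffer by the shortfall plus `pulldb_preallocation_size`, instead of reallocating for every record.

The sketch below shows that growth policy with plain `realloc()` in place of `talloc_realloc_size()`. `struct blob` and `blob_append()` are hypothetical. The sketch assigns the `realloc()` result to a temporary, the usual idiom that keeps the old buffer reachable if allocation fails.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable blob mirroring recdb_data's len/allocated_len. */
struct blob {
	unsigned char *buf;
	size_t len;        /* bytes in use */
	size_t allocated;  /* bytes reserved */
	unsigned reallocs; /* how many times realloc() was needed */
};

/* Append n bytes. When the buffer is full, grow it by the shortfall
 * plus a fixed chunk, using the same >= test as traverse_recdb().
 * Returns 0 on success, -1 if memory could not be obtained. */
static int blob_append(struct blob *b, const void *data, size_t n,
		       size_t chunk)
{
	assert(chunk > 0);
	if (b->len + n >= b->allocated) {
		size_t want = b->len + n + chunk;
		unsigned char *p = realloc(b->buf, want);

		if (p == NULL) {
			return -1; /* old buffer is still valid and owned by b */
		}
		b->buf = p;
		b->allocated = want;
		b->reallocs++;
	}
	memcpy(b->buf + b->len, data, n);
	b->len += n;
	return 0;
}
```

With 100-byte records and a 10000-byte chunk, appending 1000 records takes 10 reallocations rather than 1000.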


			`/*`
			`go through a full recovery on one database`
			`*/`
			`static int recover_database(struct ctdb_recoverd *rec,`
			`TALLOC_CTX *mem_ctx,`
			`uint32_t dbid,`
recovery: pass the persistent flag to recover_database() and further down to pull_remote_database(), pull_one_remote_database(), and push_recdb_database(). This is in preparation of special handling of persistent databases during recoveries. Michael (This used to be ctdb commit 90abc4ac7c16e854cf6e8f96b60a77bc92e35e07) 2009-11-29 13:14:31 +03:00			`bool persistent,`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`uint32_t pnn,`
ctdb-daemon: Rename struct ctdb_node_map to ctdb_node_map_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:22:48 +03:00			`struct ctdb_node_map_old *nodemap,`
added paranoid transaction ids (This used to be ctdb commit afc1da53873cdbd31fcc8c6b22fae262e344cf6e) 2008-01-06 05:24:55 +03:00			`uint32_t transaction_id)`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`{`
			`struct tdb_wrap *recdb;`
			`int ret;`
			`struct ctdb_context *ctdb = rec->ctdb;`
			`TDB_DATA data;`
ctdb-daemon: Rename struct ctdb_control_transdb to ctdb_transdb Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-28 11:22:23 +03:00			`struct ctdb_transdb w;`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`uint32_t *nodes;`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00
			`recdb = create_recdb(ctdb, mem_ctx);`
			`if (recdb == NULL) {`
			`return -1;`
			`}`

			`/* pull all remote databases onto the recdb */`
recovery: pass the persistent flag to recover_database() and further down to pull_remote_database(), pull_one_remote_database(), and push_recdb_database(). This is in preparation of special handling of persistent databases during recoveries. Michael (This used to be ctdb commit 90abc4ac7c16e854cf6e8f96b60a77bc92e35e07) 2009-11-29 13:14:31 +03:00			`ret = pull_remote_database(ctdb, rec, nodemap, recdb, dbid, persistent);`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to pull remote database 0x%x\n", dbid));`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`return -1;`
			`}`

added debug constants to allow for better mapping to syslog levels (This used to be ctdb commit 7ba8f1dde318eab03f4257e5a89fd23e7281e502) 2008-02-04 09:44:24 +03:00			`DEBUG(DEBUG_NOTICE, (__location__ " Recovery - pulled remote database 0x%x\n", dbid));`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00
			`/* wipe all the remote databases. This is safe as we are in a transaction */`
added paranoid transaction ids (This used to be ctdb commit afc1da53873cdbd31fcc8c6b22fae262e344cf6e) 2008-01-06 05:24:55 +03:00			`w.db_id = dbid;`
ctdb-daemon: Rename struct ctdb_control_transdb to ctdb_transdb Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-28 11:22:23 +03:00			`w.tid = transaction_id;`
added paranoid transaction ids (This used to be ctdb commit afc1da53873cdbd31fcc8c6b22fae262e344cf6e) 2008-01-06 05:24:55 +03:00
			`data.dptr = (void *)&w;`
			`data.dsize = sizeof(w);`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`nodes = list_of_active_nodes(ctdb, nodemap, recdb, true);`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`if (ctdb_client_async_control(ctdb, CTDB_CONTROL_WIPE_DATABASE,`
initial attempt at freezing databases in priority order (This used to be ctdb commit e8d692590da1070c87a4144031e3306d190ebed2) 2009-10-12 05:08:39 +04:00			`nodes, 0,`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`CONTROL_TIMEOUT(), false, data,`
			`NULL, NULL,`
			`NULL) != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to wipe database. Recovery failed.\n"));`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`talloc_free(recdb);`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`return -1;`
			`}`

			`/* push out the correct database. For non-persistent databases this`
			`also sets the dmaster and skips the empty records */`
recovery: pass the persistent flag to recover_database() and further down to pull_remote_database(), pull_one_remote_database(), and push_recdb_database(). This is in preparation of special handling of persistent databases during recoveries. Michael (This used to be ctdb commit 90abc4ac7c16e854cf6e8f96b60a77bc92e35e07) 2009-11-29 13:14:31 +03:00			`ret = push_recdb_database(ctdb, dbid, persistent, recdb, nodemap);`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`if (ret != 0) {`
			`talloc_free(recdb);`
			`return -1;`
			`}`

			`/* all done with this database */`
			`talloc_free(recdb);`

			`return 0;`
			`}`
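`recover_database()` runs three steps per database:

1. Pull every node's copy into the temporary recdb.
2. Wipe the database on all active nodes.
3. Push the merged result back.

The in-memory model below is a minimal sketch of that sequence. It assumes, as the comment in `traverse_recdb()` describes, that the highest-RSN copy wins the merge. Real recovery also sets the dmaster, runs inside a transaction and reaches nodes through controls; all of that is omitted. The types, the fixed `NKEYS` key space and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NKEYS 4 /* hypothetical, tiny key space: slot k holds key k */

struct rec {
	bool present;   /* does this node hold a copy of the key? */
	uint64_t rsn;   /* record sequence number */
	uint32_t value; /* 0 stands for an empty (deleted) record */
};

struct db {
	struct rec r[NKEYS];
};

/* Step 1: merge every node's copy into recdb; the highest RSN wins. */
static void pull(struct db *recdb, const struct db *nodes, int nnodes)
{
	memset(recdb, 0, sizeof(*recdb));
	for (int n = 0; n < nnodes; n++) {
		for (int k = 0; k < NKEYS; k++) {
			const struct rec *c = &nodes[n].r[k];

			if (c->present &&
			    (!recdb->r[k].present || c->rsn > recdb->r[k].rsn)) {
				recdb->r[k] = *c;
			}
		}
	}
}

/* Step 2: wipe the database on every node. */
static void wipe(struct db *nodes, int nnodes)
{
	memset(nodes, 0, sizeof(*nodes) * (size_t)nnodes);
}

/* Step 3: push recdb to every node; empty records are skipped
 * unless the database is persistent. */
static void push(struct db *nodes, int nnodes, const struct db *recdb,
		 bool persistent)
{
	for (int n = 0; n < nnodes; n++) {
		for (int k = 0; k < NKEYS; k++) {
			const struct rec *c = &recdb->r[k];

			if (c->present && (persistent || c->value != 0)) {
				nodes[n].r[k] = *c;
			}
		}
	}
}

static void recover(struct db *nodes, int nnodes, bool persistent)
{
	struct db recdb;

	assert(nnodes > 0);
	pull(&recdb, nodes, nnodes);
	wipe(nodes, nnodes);
	push(nodes, nnodes, &recdb, persistent);
}
```

Wiping before pushing means every node ends up with exactly the merged contents, so no stale local copy survives the recovery.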

when performing a recovery, ensure that all nodes use the same reclock file setting as the recovery master (This used to be ctdb commit 26793ad42b77c2328a00ac9a12bca813c7425245) 2010-05-06 03:33:08 +04:00			`/* when we start a recovery, make sure all nodes use the same reclock file`
			`setting`
			`*/`
			`static int sync_recovery_lock_file_across_cluster(struct ctdb_recoverd *rec)`
			`{`
			`struct ctdb_context *ctdb = rec->ctdb;`
			`TALLOC_CTX *tmp_ctx = talloc_new(NULL);`
			`TDB_DATA data;`
			`uint32_t *nodes;`

			`if (ctdb->recovery_lock_file == NULL) {`
			`data.dptr = NULL;`
			`data.dsize = 0;`
			`} else {`
			`data.dsize = strlen(ctdb->recovery_lock_file) + 1;`
			`data.dptr = (uint8_t *)ctdb->recovery_lock_file;`
			`}`

			`nodes = list_of_active_nodes(ctdb, rec->nodemap, tmp_ctx, true);`
			`if (ctdb_client_async_control(ctdb, CTDB_CONTROL_SET_RECLOCK_FILE,`
			`nodes, 0,`
			`CONTROL_TIMEOUT(),`
			`false, data,`
			`NULL, NULL,`
			`rec) != 0) {`
			`DEBUG(DEBUG_ERR, (__location__ " Failed to sync reclock file settings\n"));`
			`talloc_free(tmp_ctx);`
			`return -1;`
			`}`

			`talloc_free(tmp_ctx);`
			`return 0;`
			`}`
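The reclock setting is sent as a NUL-terminated string, so `dsize` counts the terminator, and an unset reclock file is sent as an empty payload. Below is a minimal sketch of that encoding. It assumes a stand-in `struct blob` that mirrors the two `TDB_DATA` fields. Both `struct blob` and `reclock_payload()` are hypothetical and not part of ctdb.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for TDB_DATA: a pointer and a byte count. */
struct blob {
	uint8_t *dptr;
	size_t dsize;
};

/* Encode an optional path the way sync_recovery_lock_file_across_cluster()
 * does: NULL becomes an empty payload, otherwise the string is sent
 * including its terminating NUL so the receiver can treat it as a C string. */
static struct blob reclock_payload(const char *path)
{
	struct blob b;

	if (path == NULL) {
		b.dptr = NULL;
		b.dsize = 0;
	} else {
		b.dsize = strlen(path) + 1;
		b.dptr = (uint8_t *)path;
	}
	return b;
}
```

Including the terminator means the receiving node can use the buffer directly as a path without copying it into a larger buffer first.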


recoverd: Track the nodes that fail takeover run and set culprit count If any of the nodes fail takeover run (either due to timeout or failure to complete within takeover_timeout interval) from main loop, recovery master will give up trying takeover run with following message: "Unable to setup public takeover addresses. Try again later" And as a side-effect the monitoring is disabled on all the nodes. Before ctdb_takeover_run() is called from main loop, monitoring get disabled via startrecovery event. Since ctdb_takeover_run() fails, it never runs recovered event and monitoring does not get re-enabled. In main_loop, ctdb_takeover_run() is called with a takeover_fail_callback. This callback will get called if any of the nodes fail in handling takeip/releaseip/ipreallocated events in ctdb_takeover_run(). Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit a5c6bb1fffb8dc3960af113957a1fd080cc7c245) 2012-10-23 09:23:12 +04:00			`/*`
			`* This callback is called for every node that fails during ctdb_takeover_run().`
			`* It logs the failure and, when callback_data is non-NULL, sets the node as a culprit.`
			`*/`
			`static void takeover_fail_callback(struct ctdb_context *ctdb, uint32_t node_pnn, int32_t res, TDB_DATA outdata, void *callback_data)`
			`{`
recoverd: Log node that causes takoever run to fail Extend takeover_fail_callback() to just log (and not do any ban processing) when the callback data is NULL. Always call ctdb_takeover_run() with the callback so that useful errors are always logged. Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit c429394afbabaee09f9216dc743419adddf523ea) 2013-05-31 08:55:07 +04:00			`DEBUG(DEBUG_ERR, ("Node %u failed the takeover run\n", node_pnn));`
recoverd: Track the nodes that fail takeover run and set culprit count If any of the nodes fail takeover run (either due to timeout or failure to complete within takeover_timeout interval) from main loop, recovery master will give up trying takeover run with following message: "Unable to setup public takeover addresses. Try again later" And as a side-effect the monitoring is disabled on all the nodes. Before ctdb_takeover_run() is called from main loop, monitoring get disabled via startrecovery event. Since ctdb_takeover_run() fails, it never runs recovered event and monitoring does not get re-enabled. In main_loop, ctdb_takeover_run() is called with a takeover_fail_callback. This callback will get called if any of the nodes fail in handling takeip/releaseip/ipreallocated events in ctdb_takeover_run(). Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit a5c6bb1fffb8dc3960af113957a1fd080cc7c245) 2012-10-23 09:23:12 +04:00
recoverd: Log node that causes takoever run to fail Extend takeover_fail_callback() to just log (and not do any ban processing) when the callback data is NULL. Always call ctdb_takeover_run() with the callback so that useful errors are always logged. Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit c429394afbabaee09f9216dc743419adddf523ea) 2013-05-31 08:55:07 +04:00			`if (callback_data != NULL) {`
			`struct ctdb_recoverd *rec = talloc_get_type(callback_data, struct ctdb_recoverd);`
recoverd: Track the nodes that fail takeover run and set culprit count If any of the nodes fail takeover run (either due to timeout or failure to complete within takeover_timeout interval) from main loop, recovery master will give up trying takeover run with following message: "Unable to setup public takeover addresses. Try again later" And as a side-effect the monitoring is disabled on all the nodes. Before ctdb_takeover_run() is called from main loop, monitoring get disabled via startrecovery event. Since ctdb_takeover_run() fails, it never runs recovered event and monitoring does not get re-enabled. In main_loop, ctdb_takeover_run() is called with a takeover_fail_callback. This callback will get called if any of the nodes fail in handling takeip/releaseip/ipreallocated events in ctdb_takeover_run(). Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit a5c6bb1fffb8dc3960af113957a1fd080cc7c245) 2012-10-23 09:23:12 +04:00
recoverd: Log node that causes takoever run to fail Extend takeover_fail_callback() to just log (and not do any ban processing) when the callback data is NULL. Always call ctdb_takeover_run() with the callback so that useful errors are always logged. Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit c429394afbabaee09f9216dc743419adddf523ea) 2013-05-31 08:55:07 +04:00			`DEBUG(DEBUG_ERR, ("Setting node %u as recovery fail culprit\n", node_pnn));`

			`ctdb_set_culprit(rec, node_pnn);`
			`}`
recoverd: Track the nodes that fail takeover run and set culprit count If any of the nodes fail takeover run (either due to timeout or failure to complete within takeover_timeout interval) from main loop, recovery master will give up trying takeover run with following message: "Unable to setup public takeover addresses. Try again later" And as a side-effect the monitoring is disabled on all the nodes. Before ctdb_takeover_run() is called from main loop, monitoring get disabled via startrecovery event. Since ctdb_takeover_run() fails, it never runs recovered event and monitoring does not get re-enabled. In main_loop, ctdb_takeover_run() is called with a takeover_fail_callback. This callback will get called if any of the nodes fail in handling takeip/releaseip/ipreallocated events in ctdb_takeover_run(). Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit a5c6bb1fffb8dc3960af113957a1fd080cc7c245) 2012-10-23 09:23:12 +04:00			`}`
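The callback always logs the failing node. It assigns blame only when it is given a recovery daemon context, and `do_takeover_run()` passes `NULL` when `banning_credits_on_fail` is false. Below is a simplified sketch of that split. It assumes a hypothetical `struct rec_sketch` that holds a per-node counter in place of `ctdb_set_culprit()` and the real banning state.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_NODES 8

/* Hypothetical, simplified recovery daemon context: one culprit
 * counter per node instead of ctdb's banning state. */
struct rec_sketch {
	unsigned int culprit_count[MAX_NODES];
};

/* Mirrors the shape of takeover_fail_callback(): log unconditionally,
 * and count a culprit only when a context was supplied. */
static void fail_callback(uint32_t node_pnn, void *callback_data)
{
	fprintf(stderr, "Node %u failed the takeover run\n",
		(unsigned int)node_pnn);

	if (callback_data != NULL) {
		struct rec_sketch *rec = callback_data;

		if (node_pnn < MAX_NODES) {
			rec->culprit_count[node_pnn]++;
		}
	}
}
```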


recoverd: Don't continue if the current node gets banned Can not continue with recovery or monitoring cluster. Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 14399de1dd0bd8dabf1f48b1457e3ccb37589d8a) 2013-06-28 10:31:07 +04:00			`static void ban_misbehaving_nodes(struct ctdb_recoverd *rec, bool *self_ban)`
recoverd: Refactor code to ban misbehaving nodes Since we have nodemap information, there is no need to hardcode the limit of 20. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Pair-Programmed-With: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit aea12dce83ef385e9fb3bc03ac7ace0874a0e3fe) 2013-06-28 08:31:02 +04:00			`{`
			`struct ctdb_context *ctdb = rec->ctdb;`
			`int i;`
			`struct ctdb_banning_state *ban_state;`

recoverd: Don't continue if the current node gets banned Can not continue with recovery or monitoring cluster. Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 14399de1dd0bd8dabf1f48b1457e3ccb37589d8a) 2013-06-28 10:31:07 +04:00			`*self_ban = false;`
recoverd: Refactor code to ban misbehaving nodes Since we have nodemap information, there is no need to hardcode the limit of 20. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Pair-Programmed-With: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit aea12dce83ef385e9fb3bc03ac7ace0874a0e3fe) 2013-06-28 08:31:02 +04:00			`for (i=0; i<ctdb->num_nodes; i++) {`
			`if (ctdb->nodes[i]->ban_state == NULL) {`
			`continue;`
			`}`
			`ban_state = (struct ctdb_banning_state *)ctdb->nodes[i]->ban_state;`
			`if (ban_state->count < 2*ctdb->num_nodes) {`
			`continue;`
			`}`

			`DEBUG(DEBUG_NOTICE,("Node %u reached %u banning credits - banning it for %u seconds\n",`
			`ctdb->nodes[i]->pnn, ban_state->count,`
			`ctdb->tunable.recovery_ban_period));`
			`ctdb_ban_node(rec, ctdb->nodes[i]->pnn, ctdb->tunable.recovery_ban_period);`
			`ban_state->count = 0;`
recoverd: Don't continue if the current node gets banned Can not continue with recovery or monitoring cluster. Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 14399de1dd0bd8dabf1f48b1457e3ccb37589d8a) 2013-06-28 10:31:07 +04:00
			`/* Banning ourself? */`
			`if (ctdb->nodes[i]->pnn == rec->ctdb->pnn) {`
			`*self_ban = true;`
			`}`
recoverd: Refactor code to ban misbehaving nodes Since we have nodemap information, there is no need to hardcode the limit of 20. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Pair-Programmed-With: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit aea12dce83ef385e9fb3bc03ac7ace0874a0e3fe) 2013-06-28 08:31:02 +04:00			`}`
			`}`
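A node is banned once its banning credits reach twice the number of nodes. Its counter is then reset, and the caller is told if the local node banned itself so it can stop recovering. Below is a standalone sketch of that rule. It assumes plain arrays (`credits`, `pnns`) in place of `ctdb->nodes[i]->ban_state`, and it returns a count instead of calling `ctdb_ban_node()`. All of these names are hypothetical. Unlike the original, the sketch has no "no ban state" case to skip.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Apply the threshold used by ban_misbehaving_nodes(): any node whose
 * credits reach 2 * num_nodes is banned and has its credits reset.
 * Returns how many nodes would be banned; *self_ban is set if my_pnn
 * is among them. */
static unsigned int apply_ban_threshold(unsigned int *credits,
					const uint32_t *pnns,
					unsigned int num_nodes,
					uint32_t my_pnn, bool *self_ban)
{
	unsigned int i, banned = 0;

	*self_ban = false;
	for (i = 0; i < num_nodes; i++) {
		if (credits[i] < 2 * num_nodes) {
			continue;
		}
		/* The real code calls ctdb_ban_node() at this point. */
		credits[i] = 0;
		banned++;
		if (pnns[i] == my_pnn) {
			*self_ban = true;
		}
	}
	return banned;
}
```

Scaling the threshold with `num_nodes` follows the commit note above: now that the nodemap is available, the earlier hardcoded limit of 20 is no longer needed.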

recoverd: New function do_takeover_run() Factor the calling sequence for ctdb_takeover_run() into a new function and call it instead. This changes rec->need_takeover_run to false for each successful takeover run and that seems to be the right thing to do. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 9a3f0c0e61ca5c17e020c6e0463d73c7cf4f7c09) 2013-08-27 06:14:34 +04:00			`static bool do_takeover_run(struct ctdb_recoverd *rec,`
ctdb-daemon: Rename struct ctdb_node_map to ctdb_node_map_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:22:48 +03:00			`struct ctdb_node_map_old *nodemap,`
recoverd: New function do_takeover_run() Factor the calling sequence for ctdb_takeover_run() into a new function and call it instead. This changes rec->need_takeover_run to false for each successful takeover run and that seems to be the right thing to do. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 9a3f0c0e61ca5c17e020c6e0463d73c7cf4f7c09) 2013-08-27 06:14:34 +04:00			`bool banning_credits_on_fail)`
			`{`
recoverd: New SRVID message CTDB_SRVID_DISABLE_TAKEOVER_RUNS This implements a superset of CTDB_SRVID_DISABLE_IP_CHECK. It stops the IP checks but also causes any attempted takeover runs to fail and be rescheduled. This is meant to completely stop IP movements. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 00db4de53a0d86013e79e6577e7e6cf3ef864e56) 2013-08-27 09:04:40 +04:00			`uint32_t *nodes = NULL;`
ctdb-daemon: Rename struct srvid_request_data to ctdb_disable_message Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-28 10:23:13 +03:00			`struct ctdb_disable_message dtr;`
recoverd: Move disabling of IP checks into do_takeover_run() Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 48b603fbf16311daa47b01e7a33d477ed51da56d) 2013-09-03 05:21:09 +04:00			`TDB_DATA data;`
recoverd: New SRVID message CTDB_SRVID_DISABLE_TAKEOVER_RUNS This implements a superset of CTDB_SRVID_DISABLE_IP_CHECK. It stops the IP checks but also causes any attempted takeover runs to fail and be rescheduled. This is meant to completely stop IP movements. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 00db4de53a0d86013e79e6577e7e6cf3ef864e56) 2013-08-27 09:04:40 +04:00			`int i;`
recoverd: Be careful about freeing the list of IP rebalance target nodes It can change during a takeover run. If it does then don't free it. There are potentially fancier solutions (e.g. check what PNNs are new to the list) to this issue but this is the simplest. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit e81589b7084c661adf617e166cc2c25b4939f841) 2013-09-06 05:23:07 +04:00			`uint32_t *rebalance_nodes = rec->force_rebalance_nodes;`
recoverd: New function do_takeover_run() Factor the calling sequence for ctdb_takeover_run() into a new function and call it instead. This changes rec->need_takeover_run to false for each successful takeover run and that seems to be the right thing to do. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 9a3f0c0e61ca5c17e020c6e0463d73c7cf4f7c09) 2013-08-27 06:14:34 +04:00			`int ret;`
			`bool ok;`

recoverd: Improve logging for takeover runs Takeover runs are currently silent when they succeed. However, they are important, so log something by default. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit b39aa2e401fbb581207d986bac93778e9c01acdc) 2013-09-18 11:06:16 +04:00			`DEBUG(DEBUG_NOTICE, ("Takeover run starting\n"));`

ctdb-recoverd: Reimplement disabling takeover runs using ctdb_op_disable() Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-02-08 12:52:12 +03:00			`if (ctdb_op_is_in_progress(rec->takeover_run)) {`
recoverd: do_takeover_run() should mark when a takeover run is in progress Nested takeover runs should never happens so they should fail. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 8ed29c60c0a7dd29f2a6efdf694d38e94281e1c4) 2013-09-03 05:20:01 +04:00			`DEBUG(DEBUG_ERR, (__location__`
			`" takeover run already in progress \n"));`
			`ok = false;`
			`goto done;`
			`}`

ctdb-recoverd: Reimplement disabling takeover runs using ctdb_op_disable() Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-02-08 12:52:12 +03:00			`if (!ctdb_op_begin(rec->takeover_run)) {`
recoverd: New SRVID message CTDB_SRVID_DISABLE_TAKEOVER_RUNS This implements a superset of CTDB_SRVID_DISABLE_IP_CHECK. It stops the IP checks but also causes any attempted takeover runs to fail and be rescheduled. This is meant to completely stop IP movements. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 00db4de53a0d86013e79e6577e7e6cf3ef864e56) 2013-08-27 09:04:40 +04:00			`ok = false;`
			`goto done;`
recoverd: Move disabling of IP checks into do_takeover_run() Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 48b603fbf16311daa47b01e7a33d477ed51da56d) 2013-09-03 05:21:09 +04:00			`}`

recoverd: New SRVID message CTDB_SRVID_DISABLE_TAKEOVER_RUNS This implements a superset of CTDB_SRVID_DISABLE_IP_CHECK. It stops the IP checks but also causes any attempted takeover runs to fail and be rescheduled. This is meant to completely stop IP movements. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 00db4de53a0d86013e79e6577e7e6cf3ef864e56) 2013-08-27 09:04:40 +04:00			`/* Disable IP checks (takeover runs, really) on other nodes`
			`* while doing this takeover run. This will stop those other`
			`* nodes from triggering takeover runs when they think they should`
			`* be hosting an IP but it isn't yet on an interface. Don't`
			`* wait for replies since a failure here might cause some`
			`* noise in the logs but will not actually cause a problem.`
			`*/`
			`dtr.srvid = 0; /* No reply */`
			`dtr.pnn = -1;`

			`data.dptr = (uint8_t*)&dtr;`
			`data.dsize = sizeof(dtr);`

			`nodes = list_of_connected_nodes(rec->ctdb, nodemap, rec, false);`

Revert "recoverd: Disable takeover runs on other nodes for 5 minutes" 5 minutes is too long to leave the cluster in limbo if the recovery daemon dies during a takeover run, even though this is quite unlikely. We need a new recover master to be able to do takeover runs fairly quickly. This reverts commit 71080676bb4acbd0d9b595a30cf7fe6dddbf426f. (This used to be ctdb commit 3e41170c78fc7a2bf526129c9b7db3739b61c6bf) 2013-10-24 04:13:16 +04:00			`/* Disable for 60 seconds. This can be a tunable later if`
recoverd: New SRVID message CTDB_SRVID_DISABLE_TAKEOVER_RUNS This implements a superset of CTDB_SRVID_DISABLE_IP_CHECK. It stops the IP checks but also causes any attempted takeover runs to fail and be rescheduled. This is meant to completely stop IP movements. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 00db4de53a0d86013e79e6577e7e6cf3ef864e56) 2013-08-27 09:04:40 +04:00			`* necessary.`
			`*/`
ctdb-daemon: Rename struct srvid_request_data to ctdb_disable_message Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-28 10:23:13 +03:00			`dtr.timeout = 60;`
recoverd: New SRVID message CTDB_SRVID_DISABLE_TAKEOVER_RUNS This implements a superset of CTDB_SRVID_DISABLE_IP_CHECK. It stops the IP checks but also causes any attempted takeover runs to fail and be rescheduled. This is meant to completely stop IP movements. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 00db4de53a0d86013e79e6577e7e6cf3ef864e56) 2013-08-27 09:04:40 +04:00			`for (i = 0; i < talloc_array_length(nodes); i++) {`
			`if (ctdb_client_send_message(rec->ctdb, nodes[i],`
			`CTDB_SRVID_DISABLE_TAKEOVER_RUNS,`
			`data) != 0) {`
			`DEBUG(DEBUG_INFO,("Failed to disable takeover runs\n"));`
			`}`
			`}`
recoverd: do_takeover_run() should mark when a takeover run is in progress Nested takeover runs should never happens so they should fail. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 8ed29c60c0a7dd29f2a6efdf694d38e94281e1c4) 2013-09-03 05:20:01 +04:00
recoverd: Fix the implementation of CTDB_SRVID_REBALANCE_NODE The current implementation has a few flaws: * A takeover run is called unconditionally when the timer goes even if the recovery master role has moved. This means a node other than the recovery master can incorrectly do a takeover run. * The rebalancing target nodes are cleared in the setup for a takeover run, regardless of whether the takeover run succeeds. * The timer to force a rebalance isn't cleared if another takeover run occurs before the deadline. Any forced rebalancing will happen in the first takeover run and when the timer expires some time later then an unnecessary takeover run will occur. * If the recovery master role moves then the rebalancing data will stay on the original node and affect the next takeover run to occur if the recovery master role should come back to the original node. Instead, store an array of rebalance target nodes in the recovery master context. This is passed as an extra argument to ctdb_takeover_run() each time it is called and is cleared when a takeover run succeeds. The timer hangs off the array of rebalance target nodes, which is cleared if the node isn't the recovery master. This means that it is possible to lose rebalance data if the recovery master role moves. However, that's a difficult problem to solve. The best way of approaching it is probably to try to stop the recovery master role from jumping around unnecesarily when inactive nodes join the cluster. The long term solution is to avoid this nonsense completely. The IP allocation algorithm needs to cache state between runs so that it knows which nodes have just become healthy. This also needs recovery master stability. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit c51c1efe5fc7fa668597f2acd435dee16e410fc9) 2013-09-04 08:30:04 +04:00			`ret = ctdb_takeover_run(rec->ctdb, nodemap,`
			`rec->force_rebalance_nodes,`
			`takeover_fail_callback,`
recoverd: New function do_takeover_run() Factor the calling sequence for ctdb_takeover_run() into a new function and call it instead. This changes rec->need_takeover_run to false for each successful takeover run and that seems to be the right thing to do. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 9a3f0c0e61ca5c17e020c6e0463d73c7cf4f7c09) 2013-08-27 06:14:34 +04:00			`banning_credits_on_fail ? rec : NULL);`
recoverd: Move disabling of IP checks into do_takeover_run() Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 48b603fbf16311daa47b01e7a33d477ed51da56d) 2013-09-03 05:21:09 +04:00
recoverd: New SRVID message CTDB_SRVID_DISABLE_TAKEOVER_RUNS This implements a superset of CTDB_SRVID_DISABLE_IP_CHECK. It stops the IP checks but also causes any attempted takeover runs to fail and be rescheduled. This is meant to completely stop IP movements. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 00db4de53a0d86013e79e6577e7e6cf3ef864e56) 2013-08-27 09:04:40 +04:00			`/* Reenable takeover runs and IP checks on other nodes */`
ctdb-daemon: Rename struct srvid_request_data to ctdb_disable_message Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-28 10:23:13 +03:00			`dtr.timeout = 0;`
recoverd: New SRVID message CTDB_SRVID_DISABLE_TAKEOVER_RUNS This implements a superset of CTDB_SRVID_DISABLE_IP_CHECK. It stops the IP checks but also causes any attempted takeover runs to fail and be rescheduled. This is meant to completely stop IP movements. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 00db4de53a0d86013e79e6577e7e6cf3ef864e56) 2013-08-27 09:04:40 +04:00			`for (i = 0; i < talloc_array_length(nodes); i++) {`
			`if (ctdb_client_send_message(rec->ctdb, nodes[i],`
			`CTDB_SRVID_DISABLE_TAKEOVER_RUNS,`
			`data) != 0) {`
Fix various spelling errors Reviewed-by: Andrew Bartlett <abartlet@samba.org> Reviewed-by: Michael Adam <obnox@samba.org> Autobuild-User(master): Andrew Bartlett <abartlet@samba.org> Autobuild-Date(master): Fri Nov 6 13:43:45 CET 2015 on sn-devel-104 2015-07-27 00:02:57 +03:00			`DEBUG(DEBUG_INFO,("Failed to re-enable takeover runs\n"));`
recoverd: New SRVID message CTDB_SRVID_DISABLE_TAKEOVER_RUNS This implements a superset of CTDB_SRVID_DISABLE_IP_CHECK. It stops the IP checks but also causes any attempted takeover runs to fail and be rescheduled. This is meant to completely stop IP movements. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 00db4de53a0d86013e79e6577e7e6cf3ef864e56) 2013-08-27 09:04:40 +04:00			`}`
recoverd: Move disabling of IP checks into do_takeover_run() Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 48b603fbf16311daa47b01e7a33d477ed51da56d) 2013-09-03 05:21:09 +04:00			`}`

recoverd: New function do_takeover_run() Factor the calling sequence for ctdb_takeover_run() into a new function and call it instead. This changes rec->need_takeover_run to false for each successful takeover run and that seems to be the right thing to do. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 9a3f0c0e61ca5c17e020c6e0463d73c7cf4f7c09) 2013-08-27 06:14:34 +04:00			`if (ret != 0) {`
recoverd: Improve logging for takeover runs Takeover runs are currently silent when they succeed. However, they are important, so log something by default. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit b39aa2e401fbb581207d986bac93778e9c01acdc) 2013-09-18 11:06:16 +04:00			`DEBUG(DEBUG_ERR, ("ctdb_takeover_run() failed\n"));`
recoverd: New function do_takeover_run() Factor the calling sequence for ctdb_takeover_run() into a new function and call it instead. This changes rec->need_takeover_run to false for each successful takeover run and that seems to be the right thing to do. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 9a3f0c0e61ca5c17e020c6e0463d73c7cf4f7c09) 2013-08-27 06:14:34 +04:00			`ok = false;`
			`goto done;`
			`}`

			`ok = true;`
recoverd: Fix the implementation of CTDB_SRVID_REBALANCE_NODE The current implementation has a few flaws: * A takeover run is called unconditionally when the timer goes even if the recovery master role has moved. This means a node other than the recovery master can incorrectly do a takeover run. * The rebalancing target nodes are cleared in the setup for a takeover run, regardless of whether the takeover run succeeds. * The timer to force a rebalance isn't cleared if another takeover run occurs before the deadline. Any forced rebalancing will happen in the first takeover run and when the timer expires some time later then an unnecessary takeover run will occur. * If the recovery master role moves then the rebalancing data will stay on the original node and affect the next takeover run to occur if the recovery master role should come back to the original node. Instead, store an array of rebalance target nodes in the recovery master context. This is passed as an extra argument to ctdb_takeover_run() each time it is called and is cleared when a takeover run succeeds. The timer hangs off the array of rebalance target nodes, which is cleared if the node isn't the recovery master. This means that it is possible to lose rebalance data if the recovery master role moves. However, that's a difficult problem to solve. The best way of approaching it is probably to try to stop the recovery master role from jumping around unnecesarily when inactive nodes join the cluster. The long term solution is to avoid this nonsense completely. The IP allocation algorithm needs to cache state between runs so that it knows which nodes have just become healthy. This also needs recovery master stability. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit c51c1efe5fc7fa668597f2acd435dee16e410fc9) 2013-09-04 08:30:04 +04:00			`/* Takeover run was successful so clear force rebalance targets */`
recoverd: Be careful about freeing the list of IP rebalance target nodes It can change during a takeover run. If it does then don't free it. There are potentially fancier solutions (e.g. check what PNNs are new to the list) to this issue but this is the simplest. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit e81589b7084c661adf617e166cc2c25b4939f841) 2013-09-06 05:23:07 +04:00			`if (rebalance_nodes == rec->force_rebalance_nodes) {`
			`TALLOC_FREE(rec->force_rebalance_nodes);`
			`} else {`
			`DEBUG(DEBUG_WARNING,`
			`("Rebalance target nodes changed during takeover run - not clearing\n"));`
			`}`
recoverd: New function do_takeover_run() Factor the calling sequence for ctdb_takeover_run() into a new function and call it instead. This changes rec->need_takeover_run to false for each successful takeover run and that seems to be the right thing to do. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 9a3f0c0e61ca5c17e020c6e0463d73c7cf4f7c09) 2013-08-27 06:14:34 +04:00			`done:`
			`rec->need_takeover_run = !ok;`
recoverd: New SRVID message CTDB_SRVID_DISABLE_TAKEOVER_RUNS This implements a superset of CTDB_SRVID_DISABLE_IP_CHECK. It stops the IP checks but also causes any attempted takeover runs to fail and be rescheduled. This is meant to completely stop IP movements. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 00db4de53a0d86013e79e6577e7e6cf3ef864e56) 2013-08-27 09:04:40 +04:00			`talloc_free(nodes);`
ctdb-recoverd: Reimplement disabling takeover runs using ctdb_op_disable() Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-02-08 12:52:12 +03:00			`ctdb_op_end(rec->takeover_run);`
recoverd: Improve logging for takeover runs Takeover runs are currently silent when they succeed. However, they are important, so log something by default. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit b39aa2e401fbb581207d986bac93778e9c01acdc) 2013-09-18 11:06:16 +04:00
			`DEBUG(DEBUG_NOTICE, ("Takeover run %s\n", ok ? "completed successfully" : "unsuccessful"));`
recoverd: New function do_takeover_run() Factor the calling sequence for ctdb_takeover_run() into a new function and call it instead. This changes rec->need_takeover_run to false for each successful takeover run and that seems to be the right thing to do. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 9a3f0c0e61ca5c17e020c6e0463d73c7cf4f7c09) 2013-08-27 06:14:34 +04:00			`return ok;`
			`}`
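`do_takeover_run()` snapshots `rec->force_rebalance_nodes` before the run. After a successful run it frees the list only if the pointer is unchanged, because the list can be replaced during the run. It also sets `need_takeover_run` to the inverse of the outcome so that a failed run is retried. Below is a minimal sketch of that bookkeeping. It assumes `malloc()`/`free()` in place of talloc and a hypothetical `struct takeover_state`. The `run` callbacks stand in for `ctdb_takeover_run()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical subset of struct ctdb_recoverd. */
struct takeover_state {
	uint32_t *force_rebalance_nodes;
	bool need_takeover_run;
};

/* Stand-in for ctdb_takeover_run(): returns 0 on success and may
 * replace st->force_rebalance_nodes while it executes. */
typedef int (*run_fn)(struct takeover_state *st);

static bool takeover_sketch(struct takeover_state *st, run_fn run)
{
	uint32_t *snapshot = st->force_rebalance_nodes;
	bool ok = (run(st) == 0);

	/* Only clear targets that this run actually used. */
	if (ok && snapshot == st->force_rebalance_nodes) {
		free(st->force_rebalance_nodes);
		st->force_rebalance_nodes = NULL;
	}
	/* Flag a failed run so the main loop tries again. */
	st->need_takeover_run = !ok;
	return ok;
}

/* Example run() implementations for demonstration only. */
static int run_ok(struct takeover_state *st) { (void)st; return 0; }
static int run_fail(struct takeover_state *st) { (void)st; return -1; }

static uint32_t replacement_nodes[1] = { 3 };

/* Simulates a new rebalance request arriving mid-run. In this sketch
 * the previous list stays owned by whoever allocated it. */
static int run_ok_replacing(struct takeover_state *st)
{
	st->force_rebalance_nodes = replacement_nodes;
	return 0;
}
```

Comparing pointers is the simplest check. As the commit note above says, a more precise version could compare which PNNs are new to the list.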

ctdb-recoverd: Add code for parallel database recovery Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-09-17 09:22:38 +03:00			`struct recovery_helper_state {`
			`int fd[2];`
			`pid_t pid;`
			`int result;`
			`bool done;`
			`};`

			`static void ctdb_recovery_handler(struct tevent_context *ev,`
			`struct tevent_fd *fde,`
			`uint16_t flags, void *private_data)`
			`{`
			`struct recovery_helper_state *state = talloc_get_type_abort(`
			`private_data, struct recovery_helper_state);`
			`int ret;`

			`ret = sys_read(state->fd[0], &state->result, sizeof(state->result));`
			`if (ret != sizeof(state->result)) {`
			`state->result = EPIPE;`
			`}`

			`state->done = true;`
			`}`
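The handler treats anything other than a complete `int` from the helper's pipe as a failure and reports `EPIPE`. This also covers a helper that exits without writing its status, because the read then hits end-of-file. Below is a standalone sketch of that rule. It uses plain `read(2)` with an `EINTR` retry instead of ctdb's `sys_read()` wrapper, and `read_helper_result()` is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Read one int status from fd. A short read or EOF maps to EPIPE,
 * as in ctdb_recovery_handler(). */
static int read_helper_result(int fd)
{
	int result;
	ssize_t n;

	do {
		n = read(fd, &result, sizeof(result));
	} while (n == -1 && errno == EINTR);

	if (n != (ssize_t)sizeof(result)) {
		return EPIPE;
	}
	return result;
}
```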


			`static int db_recovery_parallel(struct ctdb_recoverd *rec, TALLOC_CTX *mem_ctx)`
			`{`
			`static char prog[PATH_MAX+1] = "";`
			`const char **args;`
			`struct recovery_helper_state *state;`
			`struct tevent_fd *fde;`
			`int nargs, ret;`

			`if (!ctdb_set_helper("recovery_helper", prog, sizeof(prog),`
			`"CTDB_RECOVERY_HELPER", CTDB_HELPER_BINDIR,`
			`"ctdb_recovery_helper")) {`
			`ctdb_die(rec->ctdb, "Unable to set recovery helper\n");`
			`}`

			`state = talloc_zero(mem_ctx, struct recovery_helper_state);`
			`if (state == NULL) {`
			`DEBUG(DEBUG_ERR, (__location__ " memory error\n"));`
			`return -1;`
			`}`

			`state->pid = -1;`

			`ret = pipe(state->fd);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR,`
			`("Failed to create pipe for recovery helper\n"));`
			`goto fail;`
			`}`

			`set_close_on_exec(state->fd[0]);`

			`nargs = 4;`
			`args = talloc_array(state, const char *, nargs);`
			`if (args == NULL) {`
			`DEBUG(DEBUG_ERR, (__location__ " memory error\n"));`
			`goto fail;`
			`}`

			`args[0] = talloc_asprintf(args, "%d", state->fd[1]);`
			`args[1] = rec->ctdb->daemon.name;`
			`args[2] = talloc_asprintf(args, "%u", new_generation());`
			`args[3] = NULL;`

			`if (args[0] == NULL || args[2] == NULL) {`
			`DEBUG(DEBUG_ERR, (__location__ " memory error\n"));`
			`goto fail;`
			`}`

			`if (!ctdb_vfork_with_logging(state, rec->ctdb, "recovery", prog, nargs,`
			`args, NULL, NULL, &state->pid)) {`
			`DEBUG(DEBUG_ERR,`
			`("Failed to create child for recovery helper\n"));`
			`goto fail;`
			`}`

			`close(state->fd[1]);`
			`state->fd[1] = -1;`

			`state->done = false;`

			`fde = tevent_add_fd(rec->ctdb->ev, rec->ctdb, state->fd[0],`
			`TEVENT_FD_READ, ctdb_recovery_handler, state);`
			`if (fde == NULL) {`
			`goto fail;`
			`}`
			`tevent_fd_set_auto_close(fde);`

			`while (!state->done) {`
			`tevent_loop_once(rec->ctdb->ev);`
			`}`

			`close(state->fd[0]);`
			`state->fd[0] = -1;`

			`if (state->result != 0) {`
			`goto fail;`
			`}`

			`ctdb_kill(rec->ctdb, state->pid, SIGKILL);`
			`talloc_free(state);`
			`return 0;`

			`fail:`
			`if (state->fd[0] != -1) {`
			`close(state->fd[0]);`
			`}`
			`if (state->fd[1] != -1) {`
			`close(state->fd[1]);`
			`}`
			`if (state->pid != -1) {`
			`ctdb_kill(rec->ctdb, state->pid, SIGKILL);`
			`}`
			`talloc_free(state);`
			`return -1;`
			`}`
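`db_recovery_parallel()` passes the write end of a pipe to the helper as a file-descriptor number on its command line and closes its own copy. It then runs the event loop until the helper writes an `int` status. Below is a reduced sketch of the same parent/child protocol. It assumes `fork()` in place of `ctdb_vfork_with_logging()` and a blocking read in place of tevent. A hypothetical `helper_body()` stands in for the `ctdb_recovery_helper` program.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for the recovery helper: report a status by writing one int
 * to the pipe, then exit without returning to the caller. */
static void helper_body(int write_fd, int status)
{
	ssize_t n = write(write_fd, &status, sizeof(status));
	_exit(n == (ssize_t)sizeof(status) ? 0 : 1);
}

/* Run the helper in a child and return its reported status, EPIPE if it
 * exits without reporting one, or -1 if setup fails. */
static int run_helper(int status)
{
	int fd[2], result;
	ssize_t n;
	pid_t pid;

	if (pipe(fd) != 0) {
		return -1;
	}
	pid = fork();
	if (pid == -1) {
		close(fd[0]);
		close(fd[1]);
		return -1;
	}
	if (pid == 0) {
		close(fd[0]);
		helper_body(fd[1], status); /* never returns */
	}

	/* Parent: drop the write end so EOF is seen if the child dies. */
	close(fd[1]);
	do {
		n = read(fd[0], &result, sizeof(result));
	} while (n == -1 && errno == EINTR);
	close(fd[0]);
	waitpid(pid, NULL, 0);

	return (n == (ssize_t)sizeof(result)) ? result : EPIPE;
}
```

Closing the parent's copy of the write end is what makes a crashed helper show up as EOF rather than a hang. The sketch reaps the child with `waitpid()`, whereas the original sends the helper `SIGKILL` through `ctdb_kill()` once the result arrives.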

ctdb-recovery: Factor out existing database recovery code Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-09-17 09:00:47 +03:00			`static int db_recovery_serial(struct ctdb_recoverd *rec, TALLOC_CTX *mem_ctx,`
ctdb-daemon: Rename struct ctdb_node_map to ctdb_node_map_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:22:48 +03:00			`uint32_t pnn, struct ctdb_node_map_old *nodemap,`
ctdb-recovery: Factor out existing database recovery code Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-09-17 09:00:47 +03:00			`struct ctdb_vnn_map *vnnmap,`
ctdb-daemon: Rename struct ctdb_dbid_map to ctdb_dbid_map_old Match struct ctdb_dbid as per protocol/protocol.h Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:46:05 +03:00			`struct ctdb_dbid_map_old *dbmap)`
add a helper function to create all missing remote databases detected during recovery (This used to be ctdb commit 04758c6f7d8f61260be6d2472380cb7904984427) 2007-05-06 04:04:37 +04:00			`{`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`struct ctdb_context *ctdb = rec->ctdb;`
add a helper function to create all missing remote databases detected during recovery (This used to be ctdb commit 04758c6f7d8f61260be6d2472380cb7904984427) 2007-05-06 04:04:37 +04:00			`uint32_t generation;`
added paranoid transaction ids (This used to be ctdb commit afc1da53873cdbd31fcc8c6b22fae262e344cf6e) 2008-01-06 05:24:55 +03:00			`TDB_DATA data;`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`uint32_t *nodes;`
ctdb-recovery: Factor out existing database recovery code Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-09-17 09:00:47 +03:00			`int ret, i, j;`
when performing a recovery, ensure that all nodes use the same reclock file setting as the recovery master (This used to be ctdb commit 26793ad42b77c2328a00ac9a12bca813c7425245) 2010-05-06 03:33:08 +04:00
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`/* set recovery mode to active on all nodes */`
ctdb-recoverd: Do not freeze databases for election If election occurs during SMB activity, then trying to freeze all the databases can cause samba/ctdb deadlock which parallel database recovery is trying to avoid. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-06 03:52:06 +03:00			`ret = set_recovery_mode(ctdb, rec, nodemap, CTDB_RECOVERY_ACTIVE, true);`
in the destructor for the lock-wait child, make sure that we cancel any pending transactions. (This used to be ctdb commit 45b6ff64f6ddf037b810c4e5f8b9f04d71067b98) 2008-07-07 02:50:12 +04:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to set recovery mode to active on cluster\n"));`
ctdb-recovery: Factor out existing database recovery code Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-09-17 09:00:47 +03:00			`return -1;`
recovery daemon this program is a client to the local ctdb daemon every second it pulls all vnnmap and nodemaps from all nodes that are available and checks if a recovery is required a recovery is required if : * all nodes do NOT have an identical vnnmap and generation * all nodes do NOT have an identical nodemap * there are active nodes that are NOT in the nodemap * there are nodes in the nodemap that are NOT active During recovery, the recovery tool will also make sure that all nodes know about and have created all databases. (This used to be ctdb commit 2f2650467bac7e8954de7c17cb34f46b0bdbcd26) 2007-05-04 09:21:40 +04:00			`}`

merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`/* execute the "startrecovery" event script on all nodes */`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`ret = run_startrecovery_eventscript(rec, nodemap);`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`if (ret!=0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to run the 'startrecovery' event on cluster\n"));`
ctdb-recovery: Factor out existing database recovery code Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-09-17 09:00:47 +03:00			`return -1;`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`}`

new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`/* pick a new generation number */`
			`generation = new_generation();`
break the code that repoints dmaster for all local and remote records into a separate helper function (This used to be ctdb commit d5ab30d0ac21e736eb34eaa19bccfee5f0ce7cfb) 2007-05-06 04:22:13 +04:00
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`/* change the vnnmap on this node to use the new generation`
			`number but not on any other nodes.`
			`this guarantees that if we abort the recovery prematurely`
			`for some reason (a node stops responding?)`
			`that we can just return immediately and we will re-enter`
			`recovery shortly again.`
			`I.e. we deliberately leave the cluster with an inconsistent`
			`generation id to allow us to abort recovery at any stage and`
			`just restart it from scratch.`
more optimisations to recovery (This used to be ctdb commit 9a41ad0a842cd4f3792d6e84b5c809b7ff6f342e) 2008-01-02 14:44:46 +03:00			`*/`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`vnnmap->generation = generation;`
			`ret = ctdb_ctrl_setvnnmap(ctdb, CONTROL_TIMEOUT(), pnn, mem_ctx, vnnmap);`
more optimisations to recovery (This used to be ctdb commit 9a41ad0a842cd4f3792d6e84b5c809b7ff6f342e) 2008-01-02 14:44:46 +03:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to set vnnmap for node %u\n", pnn));`
ctdb-recovery: Factor out existing database recovery code Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-09-17 09:00:47 +03:00			`return -1;`
more optimisations to recovery (This used to be ctdb commit 9a41ad0a842cd4f3792d6e84b5c809b7ff6f342e) 2008-01-02 14:44:46 +03:00			`}`

ctdb-daemon: Introduce per database generation The database generation for each database is updated only during recovery. After recovery is complete the database generation would be the same as the global generation. The database generation is required for parallel database recovery. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-09-11 07:20:44 +03:00			`/* Database generations are updated when the transaction is committed to`
			`* the databases. So make sure to use the final generation as the`
			`* transaction id`
			`*/`
			`generation = new_generation();`

added paranoid transaction ids (This used to be ctdb commit afc1da53873cdbd31fcc8c6b22fae262e344cf6e) 2008-01-06 05:24:55 +03:00			`data.dptr = (void *)&generation;`
			`data.dsize = sizeof(uint32_t);`

add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`nodes = list_of_active_nodes(ctdb, nodemap, mem_ctx, true);`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`if (ctdb_client_async_control(ctdb, CTDB_CONTROL_TRANSACTION_START,`
initial attempt at freezing databases in priority order (This used to be ctdb commit e8d692590da1070c87a4144031e3306d190ebed2) 2009-10-12 05:08:39 +04:00			`nodes, 0,`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`CONTROL_TIMEOUT(), false, data,`
add a new control for explicitely cancelling recovery transactions, i.e. the transactions we start across all tdb databased during the recovery. this allows us to properly clean up and delete these tdb transactions on a recovery failure. (This used to be ctdb commit b2ce8b900a7d00944c84e0574fea5b371064a06d) 2009-10-12 09:48:05 +04:00			`NULL,`
			`transaction_start_fail_callback,`
			`rec) != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to start transactions. Recovery failed.\n"));`
add a new control for explicitely cancelling recovery transactions, i.e. the transactions we start across all tdb databased during the recovery. this allows us to properly clean up and delete these tdb transactions on a recovery failure. (This used to be ctdb commit b2ce8b900a7d00944c84e0574fea5b371064a06d) 2009-10-12 09:48:05 +04:00			`if (ctdb_client_async_control(ctdb, CTDB_CONTROL_TRANSACTION_CANCEL,`
			`nodes, 0,`
			`CONTROL_TIMEOUT(), false, tdb_null,`
			`NULL,`
			`NULL,`
			`NULL) != 0) {`
			`DEBUG(DEBUG_ERR,("Failed to cancel recovery transaction\n"));`
			`}`
ctdb-recovery: Factor out existing database recovery code Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-09-17 09:00:47 +03:00			`return -1;`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`}`
more optimisations to recovery (This used to be ctdb commit 9a41ad0a842cd4f3792d6e84b5c809b7ff6f342e) 2008-01-02 14:44:46 +03:00
added debug constants to allow for better mapping to syslog levels (This used to be ctdb commit 7ba8f1dde318eab03f4257e5a89fd23e7281e502) 2008-02-04 09:44:24 +03:00			`DEBUG(DEBUG_NOTICE,(__location__ " started transactions on all nodes\n"));`
more optimisations to recovery (This used to be ctdb commit 9a41ad0a842cd4f3792d6e84b5c809b7ff6f342e) 2008-01-02 14:44:46 +03:00
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`for (i=0;i<dbmap->num;i++) {`
recovery: pass the persistent flag to recover_database() and further down to pull_remote_database(), pull_one_remote_database(), and push_recdb_database(). This is in preparation of special handling of persistent databases during recoveries. Michael (This used to be ctdb commit 90abc4ac7c16e854cf6e8f96b60a77bc92e35e07) 2009-11-29 13:14:31 +03:00			`ret = recover_database(rec, mem_ctx,`
ctdb-daemon: Rename struct ctdb_dbid_map to ctdb_dbid_map_old Match struct ctdb_dbid as per protocol/protocol.h Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:46:05 +03:00			`dbmap->dbs[i].db_id,`
ReadOnly: Change the ctdb_db structure to keep a uint8_t for flags instead of a boolean for the persistent flag. This is the same size as the original boolean but allows ut to add additional flags for the database (This used to be ctdb commit 7462761638d25880ad46024ad4ef21667eb99a98) 2011-09-01 04:21:55 +04:00			`dbmap->dbs[i].flags & CTDB_DB_FLAGS_PERSISTENT,`
recovery: pass the persistent flag to recover_database() and further down to pull_remote_database(), pull_one_remote_database(), and push_recdb_database(). This is in preparation of special handling of persistent databases during recoveries. Michael (This used to be ctdb commit 90abc4ac7c16e854cf6e8f96b60a77bc92e35e07) 2009-11-29 13:14:31 +03:00			`pnn, nodemap, generation);`
			`if (ret != 0) {`
ctdb-daemon: Rename struct ctdb_dbid_map to ctdb_dbid_map_old Match struct ctdb_dbid as per protocol/protocol.h Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:46:05 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Failed to recover database 0x%x\n", dbmap->dbs[i].db_id));`
ctdb-recovery: Factor out existing database recovery code Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-09-17 09:00:47 +03:00			`return -1;`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00			`}`
			`}`

added debug constants to allow for better mapping to syslog levels (This used to be ctdb commit 7ba8f1dde318eab03f4257e5a89fd23e7281e502) 2008-02-04 09:44:24 +03:00			`DEBUG(DEBUG_NOTICE, (__location__ " Recovery - starting database commits\n"));`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 04:38:01 +03:00
			`/* commit all the changes */`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`if (ctdb_client_async_control(ctdb, CTDB_CONTROL_TRANSACTION_COMMIT,`
initial attempt at freezing databases in priority order (This used to be ctdb commit e8d692590da1070c87a4144031e3306d190ebed2) 2009-10-12 05:08:39 +04:00			`nodes, 0,`
add a callback for failed nodes to the async control helper. this callback is called for every node where the control failed (or timed out) when we issue the start recovery control from recovery master, set any node that fails as a culprit so it will eventually be banned (This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2) 2008-06-12 10:53:36 +04:00			`CONTROL_TIMEOUT(), false, data,`
			`NULL, NULL,`
			`NULL) != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to commit recovery changes. Recovery failed.\n"));`
ctdb-recovery: Factor out existing database recovery code Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-09-17 09:00:47 +03:00			`return -1;`
recovery daemon this program is a client to the local ctdb daemon every second it pulls all vnnmap and nodemaps from all nodes that are available and checks if a recovery is required a recovery is required if : * all nodes do NOT have an identical vnnmap and generation * all nodes do NOT have an identical nodemap * there are active nodes that are NOT in the nodemap * there are nodes in the nodemap that are NOT active During recovery, the recovery tool will also make sure that all nodes know about and have created all databases. (This used to be ctdb commit 2f2650467bac7e8954de7c17cb34f46b0bdbcd26) 2007-05-04 09:21:40 +04:00			`}`

added debug constants to allow for better mapping to syslog levels (This used to be ctdb commit 7ba8f1dde318eab03f4257e5a89fd23e7281e502) 2008-02-04 09:44:24 +03:00			`DEBUG(DEBUG_NOTICE, (__location__ " Recovery - committed databases\n"));`
create a helper function for recovery to push all local databases out onto the remote nodes (This used to be ctdb commit 1ba76d374652cfa29e56fb77c7190349e42d3bcc) 2007-05-06 04:38:44 +04:00
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`/* build a new vnn map with all the currently active and`
			`unbanned nodes */`
remove old s3 recovery code fixed vnnmap wire format in recover daemon (This used to be ctdb commit e03fab7bfe0cf43f40c49a3d63e75dc44001d8d8) 2007-05-10 02:49:57 +04:00			`vnnmap = talloc(mem_ctx, struct ctdb_vnn_map);`
			`CTDB_NO_MEMORY(ctdb, vnnmap);`
recovery daemon this program is a client to the local ctdb daemon every second it pulls all vnnmap and nodemaps from all nodes that are available and checks if a recovery is required a recovery is required if : * all nodes do NOT have an identical vnnmap and generation * all nodes do NOT have an identical nodemap * there are active nodes that are NOT in the nodemap * there are nodes in the nodemap that are NOT active During recovery, the recovery tool will also make sure that all nodes know about and have created all databases. (This used to be ctdb commit 2f2650467bac7e8954de7c17cb34f46b0bdbcd26) 2007-05-04 09:21:40 +04:00			`vnnmap->generation = generation;`
Expand the client async framework so that it can take a callback function. This allows us to use the async framework also for controls that return outdata. Add a "capabilities" field to the ctdb_node structure. This field is only initialized and kept valid inside the recovery daemon context and not inside the main ctdb daemon. change the GET_CAPABILITIES control to return the capabilities in outdata instead of in the res return variable. When performing a recovery inside the recovery daemon, read the capabilities from all connected nodes and update the ctdb->nodes list of nodes. when building the new vnnmap after the database rebuild in recovery, do not include any nodes which lack the LMASTER capability in the new vnnmap. Unless there are no available connected node that sports the LMASTER capability in which case we let the local node (recmaster) take on the lmaster role temporarily (i.e. become a member of the vnnmap list) (This used to be ctdb commit 0f1883c69c689b28b0c04148774840b2c4081df6) 2008-05-06 09:42:59 +04:00			`vnnmap->size = 0;`
added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00			`vnnmap->map = talloc_zero_array(vnnmap, uint32_t, vnnmap->size);`
Expand the client async framework so that it can take a callback function. This allows us to use the async framework also for controls that return outdata. Add a "capabilities" field to the ctdb_node structure. This field is only initialized and kept valid inside the recovery daemon context and not inside the main ctdb daemon. change the GET_CAPABILITIES control to return the capabilities in outdata instead of in the res return variable. When performing a recovery inside the recovery daemon, read the capabilities from all connected nodes and update the ctdb->nodes list of nodes. when building the new vnnmap after the database rebuild in recovery, do not include any nodes which lack the LMASTER capability in the new vnnmap. Unless there are no available connected node that sports the LMASTER capability in which case we let the local node (recmaster) take on the lmaster role temporarily (i.e. become a member of the vnnmap list) (This used to be ctdb commit 0f1883c69c689b28b0c04148774840b2c4081df6) 2008-05-06 09:42:59 +04:00			`CTDB_NO_MEMORY(ctdb, vnnmap->map);`
recovery daemon this program is a client to the local ctdb daemon every second it pulls all vnnmap and nodemaps from all nodes that are available and checks if a recovery is required a recovery is required if : * all nodes do NOT have an identical vnnmap and generation * all nodes do NOT have an identical nodemap * there are active nodes that are NOT in the nodemap * there are nodes in the nodemap that are NOT active During recovery, the recovery tool will also make sure that all nodes know about and have created all databases. (This used to be ctdb commit 2f2650467bac7e8954de7c17cb34f46b0bdbcd26) 2007-05-04 09:21:40 +04:00			`for (i=j=0;i<nodemap->num;i++) {`
Expand the client async framework so that it can take a callback function. This allows us to use the async framework also for controls that return outdata. Add a "capabilities" field to the ctdb_node structure. This field is only initialized and kept valid inside the recovery daemon context and not inside the main ctdb daemon. change the GET_CAPABILITIES control to return the capabilities in outdata instead of in the res return variable. When performing a recovery inside the recovery daemon, read the capabilities from all connected nodes and update the ctdb->nodes list of nodes. when building the new vnnmap after the database rebuild in recovery, do not include any nodes which lack the LMASTER capability in the new vnnmap. Unless there are no available connected node that sports the LMASTER capability in which case we let the local node (recmaster) take on the lmaster role temporarily (i.e. become a member of the vnnmap list) (This used to be ctdb commit 0f1883c69c689b28b0c04148774840b2c4081df6) 2008-05-06 09:42:59 +04:00			`if (nodemap->nodes[i].flags & NODE_FLAGS_INACTIVE) {`
			`continue;`
recovery daemon this program is a client to the local ctdb daemon every second it pulls all vnnmap and nodemaps from all nodes that are available and checks if a recovery is required a recovery is required if : * all nodes do NOT have an identical vnnmap and generation * all nodes do NOT have an identical nodemap * there are active nodes that are NOT in the nodemap * there are nodes in the nodemap that are NOT active During recovery, the recovery tool will also make sure that all nodes know about and have created all databases. (This used to be ctdb commit 2f2650467bac7e8954de7c17cb34f46b0bdbcd26) 2007-05-04 09:21:40 +04:00			`}`
ctdb-recoverd: Use capabilities API Simplify update_capabilities() using the capabilities API and store the capabilities in new field rec->caps rather than scattered around ctdb->nodes. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2014-07-31 09:26:03 +04:00			`if (!ctdb_node_has_capabilities(rec->caps,`
			`ctdb->nodes[i]->pnn,`
			`CTDB_CAP_LMASTER)) {`
Expand the client async framework so that it can take a callback function. This allows us to use the async framework also for controls that return outdata. Add a "capabilities" field to the ctdb_node structure. This field is only initialized and kept valid inside the recovery daemon context and not inside the main ctdb daemon. change the GET_CAPABILITIES control to return the capabilities in outdata instead of in the res return variable. When performing a recovery inside the recovery daemon, read the capabilities from all connected nodes and update the ctdb->nodes list of nodes. when building the new vnnmap after the database rebuild in recovery, do not include any nodes which lack the LMASTER capability in the new vnnmap. Unless there are no available connected node that sports the LMASTER capability in which case we let the local node (recmaster) take on the lmaster role temporarily (i.e. become a member of the vnnmap list) (This used to be ctdb commit 0f1883c69c689b28b0c04148774840b2c4081df6) 2008-05-06 09:42:59 +04:00			`/* this node can not be an lmaster */`
			`DEBUG(DEBUG_DEBUG, ("Node %d can't be an LMASTER, skipping it\n", i));`
			`continue;`
			`}`

			`vnnmap->size++;`
fixed realloc bug Should always use type safe talloc functions when possible. In this case we were allocating bytes instead of uint32_t (This used to be ctdb commit cb14ee57dd0a589242da1ac2830bb7939df460a5) 2008-05-08 13:59:24 +04:00			`vnnmap->map = talloc_realloc(vnnmap, vnnmap->map, uint32_t, vnnmap->size);`
Expand the client async framework so that it can take a callback function. This allows us to use the async framework also for controls that return outdata. Add a "capabilities" field to the ctdb_node structure. This field is only initialized and kept valid inside the recovery daemon context and not inside the main ctdb daemon. change the GET_CAPABILITIES control to return the capabilities in outdata instead of in the res return variable. When performing a recovery inside the recovery daemon, read the capabilities from all connected nodes and update the ctdb->nodes list of nodes. when building the new vnnmap after the database rebuild in recovery, do not include any nodes which lack the LMASTER capability in the new vnnmap. Unless there are no available connected node that sports the LMASTER capability in which case we let the local node (recmaster) take on the lmaster role temporarily (i.e. become a member of the vnnmap list) (This used to be ctdb commit 0f1883c69c689b28b0c04148774840b2c4081df6) 2008-05-06 09:42:59 +04:00			`CTDB_NO_MEMORY(ctdb, vnnmap->map);`
			`vnnmap->map[j++] = nodemap->nodes[i].pnn;`

recovery daemon this program is a client to the local ctdb daemon every second it pulls all vnnmap and nodemaps from all nodes that are available and checks if a recovery is required a recovery is required if : * all nodes do NOT have an identical vnnmap and generation * all nodes do NOT have an identical nodemap * there are active nodes that are NOT in the nodemap * there are nodes in the nodemap that are NOT active During recovery, the recovery tool will also make sure that all nodes know about and have created all databases. (This used to be ctdb commit 2f2650467bac7e8954de7c17cb34f46b0bdbcd26) 2007-05-04 09:21:40 +04:00			`}`
Expand the client async framework so that it can take a callback function. This allows us to use the async framework also for controls that return outdata. Add a "capabilities" field to the ctdb_node structure. This field is only initialized and kept valid inside the recovery daemon context and not inside the main ctdb daemon. change the GET_CAPABILITIES control to return the capabilities in outdata instead of in the res return variable. When performing a recovery inside the recovery daemon, read the capabilities from all connected nodes and update the ctdb->nodes list of nodes. when building the new vnnmap after the database rebuild in recovery, do not include any nodes which lack the LMASTER capability in the new vnnmap. Unless there are no available connected node that sports the LMASTER capability in which case we let the local node (recmaster) take on the lmaster role temporarily (i.e. become a member of the vnnmap list) (This used to be ctdb commit 0f1883c69c689b28b0c04148774840b2c4081df6) 2008-05-06 09:42:59 +04:00			`if (vnnmap->size == 0) {`
			`DEBUG(DEBUG_NOTICE, ("No suitable lmasters found. Adding local node (recmaster) anyway.\n"));`
			`vnnmap->size++;`
fixed realloc bug Should always use type safe talloc functions when possible. In this case we were allocating bytes instead of uint32_t (This used to be ctdb commit cb14ee57dd0a589242da1ac2830bb7939df460a5) 2008-05-08 13:59:24 +04:00			`vnnmap->map = talloc_realloc(vnnmap, vnnmap->map, uint32_t, vnnmap->size);`
Expand the client async framework so that it can take a callback function. This allows us to use the async framework also for controls that return outdata. Add a "capabilities" field to the ctdb_node structure. This field is only initialized and kept valid inside the recovery daemon context and not inside the main ctdb daemon. change the GET_CAPABILITIES control to return the capabilities in outdata instead of in the res return variable. When performing a recovery inside the recovery daemon, read the capabilities from all connected nodes and update the ctdb->nodes list of nodes. when building the new vnnmap after the database rebuild in recovery, do not include any nodes which lack the LMASTER capability in the new vnnmap. Unless there are no available connected node that sports the LMASTER capability in which case we let the local node (recmaster) take on the lmaster role temporarily (i.e. become a member of the vnnmap list) (This used to be ctdb commit 0f1883c69c689b28b0c04148774840b2c4081df6) 2008-05-06 09:42:59 +04:00			`CTDB_NO_MEMORY(ctdb, vnnmap->map);`
			`vnnmap->map[0] = pnn;`
ctdb-recovery: Factor out existing database recovery code Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-09-17 09:00:47 +03:00			`}`
recovery daemon this program is a client to the local ctdb daemon every second it pulls all vnnmap and nodemaps from all nodes that are available and checks if a recovery is required a recovery is required if : * all nodes do NOT have an identical vnnmap and generation * all nodes do NOT have an identical nodemap * there are active nodes that are NOT in the nodemap * there are nodes in the nodemap that are NOT active During recovery, the recovery tool will also make sure that all nodes know about and have created all databases. (This used to be ctdb commit 2f2650467bac7e8954de7c17cb34f46b0bdbcd26) 2007-05-04 09:21:40 +04:00
break out the code to update all nodes to the new vnnmap into a helper function (This used to be ctdb commit 81d39177949b54715710907d14ddc888dc09b064) 2007-05-06 04:42:18 +04:00			`/* update to the new vnnmap on all nodes */`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`ret = update_vnnmap_on_all_nodes(ctdb, nodemap, pnn, vnnmap, mem_ctx);`
break out the code to update all nodes to the new vnnmap into a helper function (This used to be ctdb commit 81d39177949b54715710907d14ddc888dc09b064) 2007-05-06 04:42:18 +04:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to update vnnmap on all nodes\n"));`
ctdb-recovery: Factor out existing database recovery code Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-09-17 09:00:47 +03:00			`return -1;`
recovery daemon this program is a client to the local ctdb daemon every second it pulls all vnnmap and nodemaps from all nodes that are available and checks if a recovery is required a recovery is required if : * all nodes do NOT have an identical vnnmap and generation * all nodes do NOT have an identical nodemap * there are active nodes that are NOT in the nodemap * there are nodes in the nodemap that are NOT active During recovery, the recovery tool will also make sure that all nodes know about and have created all databases. (This used to be ctdb commit 2f2650467bac7e8954de7c17cb34f46b0bdbcd26) 2007-05-04 09:21:40 +04:00			`}`

added debug constants to allow for better mapping to syslog levels (This used to be ctdb commit 7ba8f1dde318eab03f4257e5a89fd23e7281e502) 2008-02-04 09:44:24 +03:00			`DEBUG(DEBUG_NOTICE, (__location__ " Recovery - updated vnnmap\n"));`
recovery daemon this program is a client to the local ctdb daemon every second it pulls all vnnmap and nodemaps from all nodes that are available and checks if a recovery is required a recovery is required if : * all nodes do NOT have an identical vnnmap and generation * all nodes do NOT have an identical nodemap * there are active nodes that are NOT in the nodemap * there are nodes in the nodemap that are NOT active During recovery, the recovery tool will also make sure that all nodes know about and have created all databases. (This used to be ctdb commit 2f2650467bac7e8954de7c17cb34f46b0bdbcd26) 2007-05-04 09:21:40 +04:00
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`/* update recmaster to point to us for all nodes */`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`ret = set_recovery_master(ctdb, nodemap, pnn);`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`if (ret!=0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to set recovery master\n"));`
ctdb-recovery: Factor out existing database recovery code Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-09-17 09:00:47 +03:00			`return -1;`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`}`

added debug constants to allow for better mapping to syslog levels (This used to be ctdb commit 7ba8f1dde318eab03f4257e5a89fd23e7281e502) 2008-02-04 09:44:24 +03:00			`DEBUG(DEBUG_NOTICE, (__location__ " Recovery - updated recmaster\n"));`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00
Fix the chicken and egg problem with ctdb/samba and a registry smb.conf This attempts to fix the problem of ctdb event scripts blocking due to attempted access to the ctdb databases during recovery. The changes are: - now only the 'shutdown' and 'startrecovery' events can be called with the databases locked in recovery. The event scripts must ensure that for these two events no database access is attempted - the recovered, takeip and releaseip events could previously be called inside a recovery. The code now ensures that this doesn't happen, delaying the events till after recovery has finished - the 50.samba event script now avoids using testparm unless it is really needed This needs extensive testing. (This used to be ctdb commit e3cdb8f2be6a44ec877efcd75c7297edb008a80b) 2008-05-14 14:57:04 +04:00			`/* disable recovery mode */`
ctdb-recoverd: Do not freeze databases for election If election occurs during SMB activity, then trying to freeze all the databases can cause samba/ctdb deadlock which parallel database recovery is trying to avoid. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-06 03:52:06 +03:00			`ret = set_recovery_mode(ctdb, rec, nodemap, CTDB_RECOVERY_NORMAL, false);`
in the destructor for the lock-wait child, make sure that we cancel any pending transactions. (This used to be ctdb commit 45b6ff64f6ddf037b810c4e5f8b9f04d71067b98) 2008-07-07 02:50:12 +04:00			`if (ret != 0) {`
Fix the chicken and egg problem with ctdb/samba and a registry smb.conf This attempts to fix the problem of ctdb event scripts blocking due to attempted access to the ctdb databases during recovery. The changes are: - now only the 'shutdown' and 'startrecovery' events can be called with the databases locked in recovery. The event scripts must ensure that for these two events no database access is attempted - the recovered, takeip and releaseip events could previously be called inside a recovery. The code now ensures that this doesn't happen, delaying the events till after recovery has finished - the 50.samba event script now avoids using testparm unless it is really needed This needs extensive testing. (This used to be ctdb commit e3cdb8f2be6a44ec877efcd75c7297edb008a80b) 2008-05-14 14:57:04 +04:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to set recovery mode to normal on cluster\n"));`
ctdb-recovery: Factor out existing database recovery code Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-09-17 09:00:47 +03:00			`return -1;`
Fix the chicken and egg problem with ctdb/samba and a registry smb.conf This attempts to fix the problem of ctdb event scripts blocking due to attempted access to the ctdb databases during recovery. The changes are: - now only the 'shutdown' and 'startrecovery' events can be called with the databases locked in recovery. The event scripts must ensure that for these two events no database access is attempted - the recovered, takeip and releaseip events could previously be called inside a recovery. The code now ensures that this doesn't happen, delaying the events till after recovery has finished - the 50.samba event script now avoids using testparm unless it is really needed This needs extensive testing. (This used to be ctdb commit e3cdb8f2be6a44ec877efcd75c7297edb008a80b) 2008-05-14 14:57:04 +04:00			`}`

			`DEBUG(DEBUG_NOTICE, (__location__ " Recovery - disabled recovery mode\n"));`

ctdb-recovery: Factor out existing database recovery code Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-09-17 09:00:47 +03:00			`return 0;`
			`}`

			`/*`
			`we are the recmaster, and recovery is needed - start a recovery run`
			`*/`
			`static int do_recovery(struct ctdb_recoverd *rec,`
			`TALLOC_CTX *mem_ctx, uint32_t pnn,`
ctdb-daemon: Rename struct ctdb_node_map to ctdb_node_map_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:22:48 +03:00			`struct ctdb_node_map_old *nodemap, struct ctdb_vnn_map *vnnmap)`
ctdb-recovery: Factor out existing database recovery code Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-09-17 09:00:47 +03:00			`{`
			`struct ctdb_context *ctdb = rec->ctdb;`
			`int i, ret;`
ctdb-daemon: Rename struct ctdb_dbid_map to ctdb_dbid_map_old Match struct ctdb_dbid as per protocol/protocol.h Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:46:05 +03:00			`struct ctdb_dbid_map_old *dbmap;`
ctdb-recovery: Factor out existing database recovery code Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-09-17 09:00:47 +03:00			`struct timeval start_time;`
			`bool self_ban;`
ctdb-recoverd: Add code for parallel database recovery Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-09-17 09:22:38 +03:00			`bool par_recovery;`
ctdb-recovery: Factor out existing database recovery code Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-09-17 09:00:47 +03:00
			`DEBUG(DEBUG_NOTICE, (__location__ " Starting do_recovery\n"));`

ctdb-recoverd: Always check for recmaster before doing recovery Recovery daemon checks if it is the recovery master before performing certain checks. During those checks it's possible that re-election can change the recmaster. In such a case, the recovery daemon should never do a database recovery. This is not complete fix since the recovery master can still change while the recovery is going on. The correct fix is to abort recovery if the recovery master changes. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> Autobuild-User(master): Amitay Isaacs <amitay@samba.org> Autobuild-Date(master): Wed Oct 7 17:55:05 CEST 2015 on sn-devel-104 2015-10-06 09:31:41 +03:00			`/* Check if the current node is still the recmaster. It's possible that`
			`* re-election has changed the recmaster, but we have not yet updated`
			`* that information.`
			`*/`
			`ret = ctdb_ctrl_getrecmaster(ctdb, mem_ctx, CONTROL_TIMEOUT(),`
			`pnn, &ctdb->recovery_master);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR, (__location__ " Unable to get recmaster\n"));`
			`return -1;`
			`}`

			`if (pnn != ctdb->recovery_master) {`
			`DEBUG(DEBUG_NOTICE,`
			`("Recovery master changed to %u, aborting recovery\n",`
			`ctdb->recovery_master));`
			`return -1;`
			`}`

ctdb-recovery: Factor out existing database recovery code Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-09-17 09:00:47 +03:00			`/* if recovery fails, force it again */`
			`rec->need_recovery = true;`

			`if (!ctdb_op_begin(rec->recovery)) {`
			`return -1;`
			`}`

			`if (rec->election_timeout) {`
			`/* an election is in progress */`
			`DEBUG(DEBUG_ERR, ("do_recovery called while election in progress - try again later\n"));`
			`goto fail;`
			`}`

			`ban_misbehaving_nodes(rec, &self_ban);`
			`if (self_ban) {`
			`DEBUG(DEBUG_NOTICE, ("This node was banned, aborting recovery\n"));`
			`goto fail;`
			`}`

			`if (ctdb->recovery_lock_file != NULL) {`
			`if (ctdb_recovery_have_lock(ctdb)) {`
			`DEBUG(DEBUG_NOTICE, ("Already holding recovery lock\n"));`
			`} else {`
			`start_time = timeval_current();`
			`DEBUG(DEBUG_NOTICE, ("Attempting to take recovery lock (%s)\n",`
			`ctdb->recovery_lock_file));`
			`if (!ctdb_recovery_lock(ctdb)) {`
			`if (ctdb->runstate == CTDB_RUNSTATE_FIRST_RECOVERY) {`
			`/* If ctdb is trying first recovery, it's`
			`* possible that current node does not know`
			`* yet who the recmaster is.`
			`*/`
			`DEBUG(DEBUG_ERR, ("Unable to get recovery lock"`
			`" - retrying recovery\n"));`
			`goto fail;`
			`}`

			`DEBUG(DEBUG_ERR,("Unable to get recovery lock - aborting recovery "`
			`"and ban ourselves for %u seconds\n",`
			`ctdb->tunable.recovery_ban_period));`
			`ctdb_ban_node(rec, pnn, ctdb->tunable.recovery_ban_period);`
			`goto fail;`
			`}`
			`ctdb_ctrl_report_recd_lock_latency(ctdb,`
			`CONTROL_TIMEOUT(),`
			`timeval_elapsed(&start_time));`
			`DEBUG(DEBUG_NOTICE,`
			`("Recovery lock taken successfully by recovery daemon\n"));`
			`}`
			`}`

			`DEBUG(DEBUG_NOTICE, (__location__ " Recovery initiated due to problem with node %u\n", rec->last_culprit_node));`

			`/* get a list of all databases */`
			`ret = ctdb_ctrl_getdbmap(ctdb, CONTROL_TIMEOUT(), pnn, mem_ctx, &dbmap);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR, (__location__ " Unable to get dbids from node :%u\n", pnn));`
			`goto fail;`
			`}`

			`/* we do the db creation before we set the recovery mode, so the freeze happens`
			`on all databases we will be dealing with. */`

			`/* verify that we have all the databases any other node has */`
			`ret = create_missing_local_databases(ctdb, nodemap, pnn, &dbmap, mem_ctx);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR, (__location__ " Unable to create missing local databases\n"));`
			`goto fail;`
			`}`

			`/* verify that all other nodes have all our databases */`
			`ret = create_missing_remote_databases(ctdb, nodemap, pnn, dbmap, mem_ctx);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR, (__location__ " Unable to create missing remote databases\n"));`
			`goto fail;`
			`}`
			`DEBUG(DEBUG_NOTICE, (__location__ " Recovery - created remote databases\n"));`

			`/* update the database priority for all remote databases */`
			`ret = update_db_priority_on_remote_nodes(ctdb, nodemap, pnn, dbmap, mem_ctx);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR, (__location__ " Unable to set db priority on remote nodes\n"));`
			`}`
			`DEBUG(DEBUG_NOTICE, (__location__ " Recovery - updated db priority for all databases\n"));`


			`/* update all other nodes to use the same setting for reclock files`
			`as the local recovery master.`
			`*/`
			`sync_recovery_lock_file_across_cluster(rec);`

ctdb-recoverd: Update capabilities before the database recovery Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-09-17 09:07:37 +03:00			`/* update the capabilities for all nodes */`
			`ret = update_capabilities(rec, nodemap);`
			`if (ret!=0) {`
			`DEBUG(DEBUG_ERR, (__location__ " Unable to update node capabilities.\n"));`
			`return -1;`
			`}`

ctdb-recoverd: Update flags on all nodes before database recovery Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-09-17 10:10:15 +03:00			`/*`
			`update all nodes to have the same flags that we have`
			`*/`
			`for (i=0;i<nodemap->num;i++) {`
			`if (nodemap->nodes[i].flags & NODE_FLAGS_DISCONNECTED) {`
			`continue;`
			`}`

			`ret = update_flags_on_all_nodes(ctdb, nodemap, i, nodemap->nodes[i].flags);`
			`if (ret != 0) {`
			`if (nodemap->nodes[i].flags & NODE_FLAGS_INACTIVE) {`
			`DEBUG(DEBUG_WARNING, (__location__ " Unable to update flags on inactive node %d\n", i));`
			`} else {`
			`DEBUG(DEBUG_ERR, (__location__ " Unable to update flags on all nodes for node %d\n", i));`
			`return -1;`
			`}`
			`}`
			`}`

			`DEBUG(DEBUG_NOTICE, (__location__ " Recovery - updated flags\n"));`

ctdb-recoverd: Add code for parallel database recovery Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-09-17 09:22:38 +03:00			`/* Check if all participating nodes have parallel recovery capability */`
			`par_recovery = true;`
			`for (i=0; i<nodemap->num; i++) {`
			`if (nodemap->nodes[i].flags & NODE_FLAGS_INACTIVE) {`
			`continue;`
			`}`

			`if (!(rec->caps[i].capabilities &`
			`CTDB_CAP_PARALLEL_RECOVERY)) {`
			`par_recovery = false;`
			`break;`
			`}`
			`}`

			`if (par_recovery) {`
			`ret = db_recovery_parallel(rec, mem_ctx);`
			`} else {`
			`ret = db_recovery_serial(rec, mem_ctx, pnn, nodemap, vnnmap,`
			`dbmap);`
			`}`

ctdb-recovery: Factor out existing database recovery code Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-09-17 09:00:47 +03:00			`if (ret != 0) {`
			`goto fail;`
			`}`

recoverd: Fix an incorrect comment Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 9f6cd8b0bea619991c9f3bf35188c5950dabf8f4) 2013-06-30 11:53:37 +04:00			`/* Fetch known/available public IPs from each active node */`
ctdb-recoverd: Remote IP validation can't cause a takeover run Remote IP validation is only called when a takeover run is about to happen anyway, so don't bother flagging one. Given that a takeover run isn't being triggered, also drop the test that checks if takeover runs are disabled. These are the only uses of the rec argument, so drop it. One possible further simplification would be to remove this function because it doesn't accomplish anything. However, it is worth leaving it as a reminder that remote IP validation should be done properly at some time in the future. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-10-28 12:33:29 +03:00			`ret = ctdb_reload_remote_public_ips(ctdb, nodemap);`
server: reload the public addresses before doing a takeover run metze (This used to be ctdb commit 0e41a2204fa8a1e77dc83c0d4b253ab272b5c72d) 2010-01-19 10:42:48 +03:00			`if (ret != 0) {`
recoverd: avoid triggering a full recovery if just some ip allocation has failed. We dont need to rebuild the databases in this situation, we just need to try again to sort out the ip address allocations. (This used to be ctdb commit 044c398ffea23d36ee033c8ddf07d11028197346) 2011-01-10 08:51:56 +03:00			`rec->need_takeover_run = true;`
ctdb-recoverd: Use a goto for do_recovery() failures This will allow extra things to be done on failure. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-02-06 06:32:08 +03:00			`goto fail;`
server: reload the public addresses before doing a takeover run metze (This used to be ctdb commit 0e41a2204fa8a1e77dc83c0d4b253ab272b5c72d) 2010-01-19 10:42:48 +03:00			`}`
recoverd: New function do_takeover_run() Factor the calling sequence for ctdb_takeover_run() into a new function and call it instead. This changes rec->need_takeover_run to false for each successful takeover run and that seems to be the right thing to do. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 9a3f0c0e61ca5c17e020c6e0463d73c7cf4f7c09) 2013-08-27 06:14:34 +04:00
			`do_takeover_run(rec, nodemap, false);`
read the current debuglevel in each loop in the recovery daemon so that we pick up when they change in the parent daemon (This used to be ctdb commit 792d5471ff0c2947b6e66183925860de27f30eaf) 2008-02-18 11:38:04 +03:00
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`/* execute the "recovered" event script on all nodes */`
recoverd: Track failure of "recovered" event, banning culprits Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 9550c497e6d6ef5ee44826c4bd9ed5ad65174263) 2012-09-24 08:32:04 +04:00			`ret = run_recovered_eventscript(rec, nodemap, "do_recovery");`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`if (ret!=0) {`
Update some debug statements. Dont say that recovery failed if the failed function was invoked from outside of recovery (This used to be ctdb commit 3038d0b74895b51af4f85f2f304508ed16d245f4) 2008-05-15 06:28:52 +04:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to run the 'recovered' event on cluster. Recovery process failed.\n"));`
ctdb-recoverd: Use a goto for do_recovery() failures This will allow extra things to be done on failure. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-02-06 06:32:08 +03:00			`goto fail;`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 05:59:28 +03:00			`}`

read the current debuglevel in each loop in the recovery daemon so that we pick up when they change in the parent daemon (This used to be ctdb commit 792d5471ff0c2947b6e66183925860de27f30eaf) 2008-02-18 11:38:04 +03:00			`DEBUG(DEBUG_NOTICE, (__location__ " Recovery - finished the recovered event\n"));`

send a message to clients when an IP has been released (This used to be ctdb commit 8b7ab0b00253462593d368052c2cb10a385b4e63) 2007-05-25 18:05:30 +04:00			`/* send a message to all clients telling them that the cluster`
			`has been reconfigured */`
ctdb-server: Coverity fixes Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Michael Adam <obnox@samba.org> 2013-11-11 05:39:27 +04:00			`ret = ctdb_client_send_message(ctdb, CTDB_BROADCAST_CONNECTED,`
			`CTDB_SRVID_RECONFIGURE, tdb_null);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR, (__location__ " Failed to send reconfigure message\n"));`
ctdb-recoverd: Use a goto for do_recovery() failures This will allow extra things to be done on failure. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-02-06 06:32:08 +03:00			`goto fail;`
ctdb-server: Coverity fixes Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Michael Adam <obnox@samba.org> 2013-11-11 05:39:27 +04:00			`}`
recovery daemon this program is a client to the local ctdb daemon every second it pulls all vnnmap and nodemaps from all nodes that are available and checks if a recovery is required a recovery is required if : * all nodes do NOT have an identical vnnmap and generation * all nodes do NOT have an identical nodemap * there are active nodes that are NOT in the nodemap * there are nodes in the nodemap that are NOT active During recovery, the recovery tool will also make sure that all nodes know about and have created all databases. (This used to be ctdb commit 2f2650467bac7e8954de7c17cb34f46b0bdbcd26) 2007-05-04 09:21:40 +04:00
added debug constants to allow for better mapping to syslog levels (This used to be ctdb commit 7ba8f1dde318eab03f4257e5a89fd23e7281e502) 2008-02-04 09:44:24 +03:00			`DEBUG(DEBUG_NOTICE, (__location__ " Recovery complete\n"));`
add a tuneable to control how long we wait after a successful recovery before we alow another recovery to be initiated (This used to be ctdb commit f3b43519423b7a73e6a2dd986bdf11203b8653cf) 2007-07-04 02:36:59 +04:00
- merge from ronnie - add a flag to check that recovery completed correctly. If not, re-trigger it in monitoring (This used to be ctdb commit d5ed941d9bab4af30d8b5f9b77bdf43d9218d69b) 2007-09-14 03:49:12 +04:00			`rec->need_recovery = false;`
ctdb-recoverd: Reimplement ReRecoveryTimeout using ctdb_op_disable() Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-02-06 06:47:33 +03:00			`ctdb_op_end(rec->recovery);`
- merge from ronnie - add a flag to check that recovery completed correctly. If not, re-trigger it in monitoring (This used to be ctdb commit d5ed941d9bab4af30d8b5f9b77bdf43d9218d69b) 2007-09-14 03:49:12 +04:00
with the new banning logic with one struct for each node we no longer "forget" the other culprits as often as we used to do, which means that things like "ctdb recover" can now actually lead to a node becomming banned if we perform too many recoveries too frequently. change this to provide absolution to all nodes once they have participated in a recovery session. (This used to be ctdb commit f66d17fb2e81a35d5adb3754e1cc902f76b4590a) 2009-09-25 07:14:53 +04:00			`/* we managed to complete a full recovery, make sure to forgive`
			`any past sins by the nodes that could now participate in the`
			`recovery.`
			`*/`
			`DEBUG(DEBUG_ERR,("Resetting ban count to 0 for all nodes\n"));`
			`for (i=0;i<nodemap->num;i++) {`
			`struct ctdb_banning_state *ban_state;`

			`if (nodemap->nodes[i].flags & NODE_FLAGS_DISCONNECTED) {`
			`continue;`
			`}`

			`ban_state = (struct ctdb_banning_state *)ctdb->nodes[nodemap->nodes[i].pnn]->ban_state;`
			`if (ban_state == NULL) {`
			`continue;`
			`}`

			`ban_state->count = 0;`
			`}`

ctdb-recoverd: Reimplement ReRecoveryTimeout using ctdb_op_disable() Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-02-06 06:47:33 +03:00			`/* We just finished a recovery successfully.`
			`We now wait for rerecovery_timeout before we allow`
add a tuneable to control how long we wait after a successful recovery before we alow another recovery to be initiated (This used to be ctdb commit f3b43519423b7a73e6a2dd986bdf11203b8653cf) 2007-07-04 02:36:59 +04:00			`another recovery to take place.`
			`*/`
update/improve the log message related to rerecovery timeouts (This used to be ctdb commit 8b4d1df3abcae03cf7a339d8390c816682a43019) 2010-09-28 02:46:12 +04:00			`DEBUG(DEBUG_NOTICE, ("Just finished a recovery. New recoveries will now be suppressed for the rerecovery timeout (%d seconds)\n", ctdb->tunable.rerecovery_timeout));`
ctdb-recoverd: Reimplement ReRecoveryTimeout using ctdb_op_disable() Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-02-06 06:47:33 +03:00			`ctdb_op_disable(rec->recovery, ctdb->ev,`
			`ctdb->tunable.rerecovery_timeout);`
recovery daemon this program is a client to the local ctdb daemon every second it pulls all vnnmap and nodemaps from all nodes that are available and checks if a recovery is required a recovery is required if : * all nodes do NOT have an identical vnnmap and generation * all nodes do NOT have an identical nodemap * there are active nodes that are NOT in the nodemap * there are nodes in the nodemap that are NOT active During recovery, the recovery tool will also make sure that all nodes know about and have created all databases. (This used to be ctdb commit 2f2650467bac7e8954de7c17cb34f46b0bdbcd26) 2007-05-04 09:21:40 +04:00			`return 0;`
ctdb-recoverd: Use a goto for do_recovery() failures This will allow extra things to be done on failure. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-02-06 06:32:08 +03:00
			`fail:`
ctdb-recoverd: Reimplement ReRecoveryTimeout using ctdb_op_disable() Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-02-06 06:47:33 +03:00			`ctdb_op_end(rec->recovery);`
ctdb-recoverd: Use a goto for do_recovery() failures This will allow extra things to be done on failure. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-02-06 06:32:08 +03:00			`return -1;`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`}`
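do_recovery() above chooses between db_recovery_parallel() and db_recovery_serial() by checking that every node not flagged inactive advertises the parallel-recovery capability. The following is a minimal standalone sketch of that decision only. The flag values, the `sketch_` names and the merged node struct (the daemon keeps capabilities in a separate `rec->caps` array) are hypothetical stand-ins, not ctdb's definitions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values for illustration only; the real constants
 * live in ctdb's protocol headers. */
#define SKETCH_NODE_FLAGS_INACTIVE    0x01u
#define SKETCH_CAP_PARALLEL_RECOVERY  0x10u

struct sketch_node {
	uint32_t flags;        /* node state flags */
	uint32_t capabilities; /* capability bits reported by the node */
};

/* Parallel recovery is chosen only if every node that is not inactive
 * advertises the parallel-recovery capability. Inactive nodes are
 * skipped, so a single active node without the capability forces the
 * serial path. */
static bool sketch_use_parallel_recovery(const struct sketch_node *nodes,
					 size_t num)
{
	for (size_t i = 0; i < num; i++) {
		if (nodes[i].flags & SKETCH_NODE_FLAGS_INACTIVE) {
			continue;
		}
		if (!(nodes[i].capabilities & SKETCH_CAP_PARALLEL_RECOVERY)) {
			return false;
		}
	}
	return true;
}
```

Skipping inactive nodes mirrors the loop in do_recovery(): an older node that is banned or stopped does not hold back the rest of the cluster.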
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00
add a test in the function that checks whether the cluster needs recovery or not that all active nodes are in normal mode. If we discover that some node is still in recoverymode it may indicate that a previous recovery ended prematurely and thus we should start a new recovery (This used to be ctdb commit c15517872e6c98c8c425a8d47d2b348ecb0620b0) 2007-05-06 22:41:12 +04:00
choose the most connected node first (This used to be ctdb commit c7c17a79fa4f28509e34b6f635fa62517dc458c2) 2007-06-07 13:17:27 +04:00			`/*`
			`elections are won by first checking the number of connected nodes, then`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`the priority time, then the pnn`
choose the most connected node first (This used to be ctdb commit c7c17a79fa4f28509e34b6f635fa62517dc458c2) 2007-06-07 13:17:27 +04:00			`*/`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`struct election_message {`
choose the most connected node first (This used to be ctdb commit c7c17a79fa4f28509e34b6f635fa62517dc458c2) 2007-06-07 13:17:27 +04:00			`uint32_t num_connected;`
use a priority time for the election data, not just the vnn (This used to be ctdb commit a691f9c5cd77194005f0d98483da94b07a48d57d) 2007-06-07 12:37:27 +04:00			`struct timeval priority_time;`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`uint32_t pnn;`
- catch ESTALE in the recovery lock by trying a read() - priortise nodes that are unbanned and healthy in the election (This used to be ctdb commit 929feb475dfdf7283f0e99b50b179e1c91d3a39f) 2007-10-05 07:28:21 +04:00			`uint32_t node_flags;`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`};`
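The comment above struct election_message orders candidates by connected-node count, then priority time, then pnn. Below is a minimal sketch of that ordering over a simplified copy of the struct. It assumes that an earlier priority_time wins, which is consistent with ctdb_election_data() resetting a non-recmaster-capable node's priority_time to the current time so that it loses. It also assumes that the lowest pnn wins a full tie, as in the original "lowest vnn" election. It omits the capability and banned checks that ctdb_election_win() also performs. The `sketch_` names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/time.h>

/* Simplified stand-in for struct election_message (node_flags omitted). */
struct sketch_election {
	uint32_t num_connected;
	struct timeval priority_time;
	uint32_t pnn;
};

/* Return <0 if a is earlier than b, 0 if equal, >0 if later. */
static int sketch_timeval_cmp(const struct timeval *a, const struct timeval *b)
{
	if (a->tv_sec != b->tv_sec) {
		return a->tv_sec < b->tv_sec ? -1 : 1;
	}
	if (a->tv_usec != b->tv_usec) {
		return a->tv_usec < b->tv_usec ? -1 : 1;
	}
	return 0;
}

/* True if "mine" beats "theirs": more connected nodes wins; on a tie the
 * earlier priority_time (longer-running node) wins; on a further tie the
 * lower pnn wins. */
static bool sketch_election_beats(const struct sketch_election *mine,
				  const struct sketch_election *theirs)
{
	if (mine->num_connected != theirs->num_connected) {
		return mine->num_connected > theirs->num_connected;
	}
	int cmp = sketch_timeval_cmp(&mine->priority_time,
				     &theirs->priority_time);
	if (cmp != 0) {
		return cmp < 0;
	}
	return mine->pnn < theirs->pnn;
}
```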

choose the most connected node first (This used to be ctdb commit c7c17a79fa4f28509e34b6f635fa62517dc458c2) 2007-06-07 13:17:27 +04:00			`/*`
			`form this node's election data`
			`*/`
			`static void ctdb_election_data(struct ctdb_recoverd *rec, struct election_message *em)`
			`{`
			`int ret, i;`
ctdb-daemon: Rename struct ctdb_node_map to ctdb_node_map_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:22:48 +03:00			`struct ctdb_node_map_old *nodemap;`
choose the most connected node first (This used to be ctdb commit c7c17a79fa4f28509e34b6f635fa62517dc458c2) 2007-06-07 13:17:27 +04:00			`struct ctdb_context *ctdb = rec->ctdb;`

			`ZERO_STRUCTP(em);`

change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`em->pnn = rec->ctdb->pnn;`
choose the most connected node first (This used to be ctdb commit c7c17a79fa4f28509e34b6f635fa62517dc458c2) 2007-06-07 13:17:27 +04:00			`em->priority_time = rec->priority_time;`

			`ret = ctdb_ctrl_getnodemap(ctdb, CONTROL_TIMEOUT(), CTDB_CURRENT_NODE, rec, &nodemap);`
			`if (ret != 0) {`
recoverd: Improve an error message in the election code Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 275ed9ebe287e39d891888c13810c70f347af8ac) 2013-10-30 04:32:28 +04:00			`DEBUG(DEBUG_ERR,(__location__ " unable to get node map\n"));`
choose the most connected node first (This used to be ctdb commit c7c17a79fa4f28509e34b6f635fa62517dc458c2) 2007-06-07 13:17:27 +04:00			`return;`
			`}`

When we create new election data to send during elections, we must re-read the node flags from the main daemon to catch when the STOPPED flag is changed. (This used to be ctdb commit ca4982c40d81db528fe915d5ecc01fcf7df0b522) 2009-07-17 05:37:03 +04:00			`rec->node_flags = nodemap->nodes[ctdb->pnn].flags;`
			`em->node_flags = rec->node_flags;`

choose the most connected node first (This used to be ctdb commit c7c17a79fa4f28509e34b6f635fa62517dc458c2) 2007-06-07 13:17:27 +04:00			`for (i=0;i<nodemap->num;i++) {`
			`if (!(nodemap->nodes[i].flags & NODE_FLAGS_DISCONNECTED)) {`
			`em->num_connected++;`
			`}`
			`}`
make sure we lose all elections for recmaster role if we do not have the recmaster capability. (unless there are no other node at all available with this capability) (This used to be ctdb commit 8556e9dc897c6b9b9be0b52f391effb1f72fbd80) 2008-05-06 07:56:56 +04:00
			`/* we shouldn't try to win this election if we can't be a recmaster */`
			`if ((ctdb->capabilities & CTDB_CAP_RECMASTER) == 0) {`
			`em->num_connected = 0;`
			`em->priority_time = timeval_current();`
			`}`

choose the most connected node first (This used to be ctdb commit c7c17a79fa4f28509e34b6f635fa62517dc458c2) 2007-06-07 13:17:27 +04:00			`talloc_free(nodemap);`
			`}`
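A minimal sketch of what the election data above encodes. It assumes a simplified stand-in `struct election_msg` and a plain array of per-node flags indexed by pnn in place of `ctdb_ctrl_getnodemap()`. The value of `FLAG_DISCONNECTED` is illustrative and is not taken from ctdb's headers.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/time.h>

/* Illustrative flag bit; the real NODE_FLAGS_* values live in ctdb's headers. */
#define FLAG_DISCONNECTED 0x01u

/* Hypothetical stand-in for ctdb's struct election_message. */
struct election_msg {
	uint32_t num_connected;
	struct timeval priority_time;
	uint32_t pnn;
	uint32_t node_flags;
};

/* Fill *em from a snapshot of per-node flags indexed by pnn.
 * Assumes my_pnn < num_nodes, as the original assumes for ctdb->pnn. */
static void build_election_msg(struct election_msg *em,
			       const uint32_t *node_flags, uint32_t num_nodes,
			       uint32_t my_pnn, struct timeval my_priority_time,
			       bool have_recmaster_cap, struct timeval now)
{
	uint32_t i;

	memset(em, 0, sizeof(*em));	/* same effect as ZERO_STRUCTP() */
	em->pnn = my_pnn;
	em->priority_time = my_priority_time;
	em->node_flags = node_flags[my_pnn];

	/* Count every node that is not flagged as disconnected. */
	for (i = 0; i < num_nodes; i++) {
		if (!(node_flags[i] & FLAG_DISCONNECTED)) {
			em->num_connected++;
		}
	}

	/* Without the recmaster capability, advertise the weakest claim. */
	if (!have_recmaster_cap) {
		em->num_connected = 0;
		em->priority_time = now;
	}
}
```

A node without the recmaster capability still produces a message, but with zero connected nodes and the current time as its priority time, so it compares as poorly as possible.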

			`/*`
			`see if the given election data wins`
			`*/`
			`static bool ctdb_election_win(struct ctdb_recoverd *rec, struct election_message *em)`
			`{`
			`struct election_message myem;`
- catch ESTALE in the recovery lock by trying a read() - priortise nodes that are unbanned and healthy in the election (This used to be ctdb commit 929feb475dfdf7283f0e99b50b179e1c91d3a39f) 2007-10-05 07:28:21 +04:00			`int cmp = 0;`
choose the most connected node first (This used to be ctdb commit c7c17a79fa4f28509e34b6f635fa62517dc458c2) 2007-06-07 13:17:27 +04:00
			`ctdb_election_data(rec, &myem);`

Fix various spelling errors Reviewed-by: Andrew Bartlett <abartlet@samba.org> Reviewed-by: Michael Adam <obnox@samba.org> Autobuild-User(master): Andrew Bartlett <abartlet@samba.org> Autobuild-Date(master): Fri Nov 6 13:43:45 CET 2015 on sn-devel-104 2015-07-27 00:02:57 +03:00			`/* we can't win if we don't have the recmaster capability */`
make sure we lose all elections for recmaster role if we do not have the recmaster capability. (unless there are no other node at all available with this capability) (This used to be ctdb commit 8556e9dc897c6b9b9be0b52f391effb1f72fbd80) 2008-05-06 07:56:56 +04:00			`if ((rec->ctdb->capabilities & CTDB_CAP_RECMASTER) == 0) {`
			`return false;`
			`}`

simplify election handling make sure we read and update the flags from all remote nodes before we reach the first codepath that can call do_recovery() since during do_recovery() we need to know what the flags are. (This used to be ctdb commit e85f3806483ea420559d449e0e4d81bec996740f) 2007-10-11 00:16:36 +04:00			`/* we can't win if we are banned */`
			`if (rec->node_flags & NODE_FLAGS_BANNED) {`
merge from ronnie (This used to be ctdb commit d18712caba11855010be52f90bac656683076676) 2007-10-15 08:17:49 +04:00			`return false;`
recoverd: eliminate some trailing spaces from ctdb_election_win() Signed-off-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit df30c0a05ed908fc2a997c56ff5484736b23b70f) 2013-06-21 16:06:22 +04:00			`}`
- catch ESTALE in the recovery lock by trying a read() - priortise nodes that are unbanned and healthy in the election (This used to be ctdb commit 929feb475dfdf7283f0e99b50b179e1c91d3a39f) 2007-10-05 07:28:21 +04:00
stopped nodes can not win a recmaster election stopped nodes must yield the recmaster role (This used to be ctdb commit b75ac1185481060ab71bd743e1e48d333d716eba) 2009-07-09 08:44:03 +04:00			`/* we can't win if we are stopped */`
			`if (rec->node_flags & NODE_FLAGS_STOPPED) {`
			`return false;`
recoverd: eliminate some trailing spaces from ctdb_election_win() Signed-off-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit df30c0a05ed908fc2a997c56ff5484736b23b70f) 2013-06-21 16:06:22 +04:00			`}`
stopped nodes can not win a recmaster election stopped nodes must yield the recmaster role (This used to be ctdb commit b75ac1185481060ab71bd743e1e48d333d716eba) 2009-07-09 08:44:03 +04:00
simplify election handling make sure we read and update the flags from all remote nodes before we reach the first codepath that can call do_recovery() since during do_recovery() we need to know what the flags are. (This used to be ctdb commit e85f3806483ea420559d449e0e4d81bec996740f) 2007-10-11 00:16:36 +04:00			`/* we will automatically win if the other node is banned */`
			`if (em->node_flags & NODE_FLAGS_BANNED) {`
merge from ronnie (This used to be ctdb commit d18712caba11855010be52f90bac656683076676) 2007-10-15 08:17:49 +04:00			`return true;`
- catch ESTALE in the recovery lock by trying a read() - priortise nodes that are unbanned and healthy in the election (This used to be ctdb commit 929feb475dfdf7283f0e99b50b179e1c91d3a39f) 2007-10-05 07:28:21 +04:00			`}`

stopped nodes can not win a recmaster election stopped nodes must yield the recmaster role (This used to be ctdb commit b75ac1185481060ab71bd743e1e48d333d716eba) 2009-07-09 08:44:03 +04:00			`/* we will automatically win if the other node is stopped */`
			`if (em->node_flags & NODE_FLAGS_STOPPED) {`
			`return true;`
			`}`

choose the most connected node first (This used to be ctdb commit c7c17a79fa4f28509e34b6f635fa62517dc458c2) 2007-06-07 13:17:27 +04:00			`/* then the longest running node */`
			`if (cmp == 0) {`
later times are a lower priority, not a higher priority (This used to be ctdb commit e96424e7d366df29767c4eeaccdcc0cc975cb8ae) 2007-06-07 13:21:55 +04:00			`cmp = timeval_compare(&em->priority_time, &myem.priority_time);`
choose the most connected node first (This used to be ctdb commit c7c17a79fa4f28509e34b6f635fa62517dc458c2) 2007-06-07 13:17:27 +04:00			`}`

			`if (cmp == 0) {`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`cmp = (int)myem.pnn - (int)em->pnn;`
choose the most connected node first (This used to be ctdb commit c7c17a79fa4f28509e34b6f635fa62517dc458c2) 2007-06-07 13:17:27 +04:00			`}`

			`return cmp > 0;`
			`}`
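The decision order in `ctdb_election_win()` can be sketched as a standalone comparison. This sketch assumes a cut-down message struct, folds the separate banned and stopped checks into one mask, reads our own flags from the message rather than from `rec->node_flags`, and uses a local `tv_cmp()` in place of `timeval_compare()`. The flag values are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/time.h>

/* Illustrative flag bits, not necessarily ctdb's values. */
#define FLAG_BANNED  0x08u
#define FLAG_STOPPED 0x20u

/* Hypothetical cut-down election message. */
struct election_msg {
	struct timeval priority_time;
	uint32_t pnn;
	uint32_t node_flags;
};

/* Returns -1, 0 or 1, ordering the two times chronologically. */
static int tv_cmp(const struct timeval *a, const struct timeval *b)
{
	if (a->tv_sec != b->tv_sec) {
		return a->tv_sec > b->tv_sec ? 1 : -1;
	}
	if (a->tv_usec != b->tv_usec) {
		return a->tv_usec > b->tv_usec ? 1 : -1;
	}
	return 0;
}

/* Returns true if "mine" beats "theirs". */
static bool election_win(const struct election_msg *mine,
			 const struct election_msg *theirs,
			 bool have_recmaster_cap)
{
	int cmp;

	if (!have_recmaster_cap) {
		return false;
	}
	if (mine->node_flags & (FLAG_BANNED | FLAG_STOPPED)) {
		return false;	/* an inactive node never wins */
	}
	if (theirs->node_flags & (FLAG_BANNED | FLAG_STOPPED)) {
		return true;	/* an inactive rival always loses */
	}
	/* Positive when their time is later, i.e. we have run longer. */
	cmp = tv_cmp(&theirs->priority_time, &mine->priority_time);
	if (cmp == 0) {
		/* Tie-break on pnn: the higher pnn wins. */
		cmp = (int)mine->pnn - (int)theirs->pnn;
	}
	return cmp > 0;
}
```

Note that the pnn tie-break favours the higher pnn, because `cmp` is computed as mine minus theirs and the function returns `cmp > 0`.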
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00
			`/*`
			`send out an election request`
			`*/`
Revert "if a new node enters the cluster, that node will already be frozen at start" This is unnecessary due to 03e2e436db5cfd29a56d13f5d2101e42389bfc94. Furthermore, if a node doesn't force an election but wins it then it can fail to record that it is the new recovery master. This can lead to a reverse split brain where there is no recovery master. This reverts commit c5035657606283d2e35bea40992505e84ca8e7be. Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> Conflicts: server/ctdb_recoverd.c (This used to be ctdb commit c8b542e059a54b8d524bd430cad9d82e5edd864d) 2013-10-29 09:38:42 +04:00			`static int send_election_request(struct ctdb_recoverd *rec, uint32_t pnn)`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`{`
			`int ret;`
			`TDB_DATA election_data;`
			`struct election_message emsg;`
			`uint64_t srvid;`
use a priority time for the election data, not just the vnn (This used to be ctdb commit a691f9c5cd77194005f0d98483da94b07a48d57d) 2007-06-07 12:37:27 +04:00			`struct ctdb_context *ctdb = rec->ctdb;`
simplify election handling make sure we read and update the flags from all remote nodes before we reach the first codepath that can call do_recovery() since during do_recovery() we need to know what the flags are. (This used to be ctdb commit e85f3806483ea420559d449e0e4d81bec996740f) 2007-10-11 00:16:36 +04:00
ctdb-include: Use new protocol definitions This gets rid of the duplicate definitions from ctdb_protocol.h. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:51:52 +03:00			`srvid = CTDB_SRVID_ELECTION;`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00
choose the most connected node first (This used to be ctdb commit c7c17a79fa4f28509e34b6f635fa62517dc458c2) 2007-06-07 13:17:27 +04:00			`ctdb_election_data(rec, &emsg);`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00
			`election_data.dsize = sizeof(struct election_message);`
			`election_data.dptr = (unsigned char *)&emsg;`


Revert "if a new node enters the cluster, that node will already be frozen at start" This is unnecessary due to 03e2e436db5cfd29a56d13f5d2101e42389bfc94. Furthermore, if a node doesn't force an election but wins it then it can fail to record that it is the new recovery master. This can lead to a reverse split brain where there is no recovery master. This reverts commit c5035657606283d2e35bea40992505e84ca8e7be. Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> Conflicts: server/ctdb_recoverd.c (This used to be ctdb commit c8b542e059a54b8d524bd430cad9d82e5edd864d) 2013-10-29 09:38:42 +04:00			`/* first we assume we will win the election and set`
			`recovery master to be ourselves on the current node`
			`*/`
			`ret = ctdb_ctrl_setrecmaster(ctdb, CONTROL_TIMEOUT(), pnn, pnn);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR, (__location__ " failed to send recmaster election request\n"));`
			`return -1;`
			`}`


recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`/* send an election message to all active nodes */`
When we create new election data to send during elections, we must re-read the node flags from the main daemon to catch when the STOPPED flag is changed. (This used to be ctdb commit ca4982c40d81db528fe915d5ecc01fcf7df0b522) 2009-07-17 05:37:03 +04:00			`DEBUG(DEBUG_INFO,(__location__ " Send election request to all active nodes\n"));`
ctdb-server: Coverity fixes Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Michael Adam <obnox@samba.org> 2013-11-11 05:39:27 +04:00			`return ctdb_client_send_message(ctdb, CTDB_BROADCAST_ALL, srvid, election_data);`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`}`

unban all nodes when we release recmaster role or when we win an election (This used to be ctdb commit 48fb7483b3fe391e2d0b78718af29f69a641525e) 2007-06-09 14:11:51 +04:00			`/*`
			`this function will unban all connected nodes in the cluster`
			`*/`
			`static void unban_all_nodes(struct ctdb_context *ctdb)`
			`{`
			`int ret, i;`
ctdb-daemon: Rename struct ctdb_node_map to ctdb_node_map_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:22:48 +03:00			`struct ctdb_node_map_old *nodemap;`
unban all nodes when we release recmaster role or when we win an election (This used to be ctdb commit 48fb7483b3fe391e2d0b78718af29f69a641525e) 2007-06-09 14:11:51 +04:00			`TALLOC_CTX *tmp_ctx = talloc_new(ctdb);`

			`ret = ctdb_ctrl_getnodemap(ctdb, CONTROL_TIMEOUT(), CTDB_CURRENT_NODE, tmp_ctx, &nodemap);`
			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,(__location__ " failed to get nodemap to unban all nodes\n"));`
unban all nodes when we release recmaster role or when we win an election (This used to be ctdb commit 48fb7483b3fe391e2d0b78718af29f69a641525e) 2007-06-09 14:11:51 +04:00			`return;`
			`}`

			`for (i=0;i<nodemap->num;i++) {`
			`if ( (!(nodemap->nodes[i].flags & NODE_FLAGS_DISCONNECTED))`
			`&& (nodemap->nodes[i].flags & NODE_FLAGS_BANNED) ) {`
ctdb-server: Coverity fixes Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Michael Adam <obnox@samba.org> 2013-11-11 05:39:27 +04:00			`ret = ctdb_ctrl_modflags(ctdb, CONTROL_TIMEOUT(),`
			`nodemap->nodes[i].pnn, 0,`
			`NODE_FLAGS_BANNED);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR, (__location__ " failed to reset ban state\n"));`
			`}`
unban all nodes when we release recmaster role or when we win an election (This used to be ctdb commit 48fb7483b3fe391e2d0b78718af29f69a641525e) 2007-06-09 14:11:51 +04:00			`}`
			`}`

			`talloc_free(tmp_ctx);`
			`}`
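The unban loop reduces to clearing one bit on every connected, banned node. In this sketch, a local flags array (hypothetical) replaces the per-node `ctdb_ctrl_modflags()` calls, which set nothing and clear `NODE_FLAGS_BANNED`, and the flag values are illustrative. In the original, a failed update is logged and the loop continues; here updates cannot fail.

```c
#include <stdint.h>

/* Illustrative flag bits, not necessarily ctdb's values. */
#define FLAG_DISCONNECTED 0x01u
#define FLAG_BANNED       0x08u

/* Clear BANNED on every connected node and return how many were unbanned.
 * Disconnected nodes are skipped, as in unban_all_nodes(). */
static unsigned unban_connected(uint32_t *flags, unsigned num_nodes)
{
	unsigned i, count = 0;

	for (i = 0; i < num_nodes; i++) {
		if (!(flags[i] & FLAG_DISCONNECTED) && (flags[i] & FLAG_BANNED)) {
			flags[i] &= ~FLAG_BANNED;
			count++;
		}
	}
	return count;
}
```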
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00
make election handling much more scalable (This used to be ctdb commit 05938d462b92bd9ecb8e35f53651bded47c48675) 2007-11-13 02:27:44 +03:00
			`/*`
			`we think we are winning the election - send a broadcast election request`
			`*/`
ctdb-daemon: Stop using tevent compatibility definitions Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-26 08:50:09 +03:00			`static void election_send_request(struct tevent_context *ev,`
			`struct tevent_timer *te,`
			`struct timeval t, void *p)`
make election handling much more scalable (This used to be ctdb commit 05938d462b92bd9ecb8e35f53651bded47c48675) 2007-11-13 02:27:44 +03:00			`{`
			`struct ctdb_recoverd *rec = talloc_get_type(p, struct ctdb_recoverd);`
			`int ret;`

Revert "if a new node enters the cluster, that node will already be frozen at start" This is unnecessary due to 03e2e436db5cfd29a56d13f5d2101e42389bfc94. Furthermore, if a node doesn't force an election but wins it then it can fail to record that it is the new recovery master. This can lead to a reverse split brain where there is no recovery master. This reverts commit c5035657606283d2e35bea40992505e84ca8e7be. Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> Conflicts: server/ctdb_recoverd.c (This used to be ctdb commit c8b542e059a54b8d524bd430cad9d82e5edd864d) 2013-10-29 09:38:42 +04:00			`ret = send_election_request(rec, ctdb_get_pnn(rec->ctdb));`
make election handling much more scalable (This used to be ctdb commit 05938d462b92bd9ecb8e35f53651bded47c48675) 2007-11-13 02:27:44 +03:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,("Failed to send election request!\n"));`
make election handling much more scalable (This used to be ctdb commit 05938d462b92bd9ecb8e35f53651bded47c48675) 2007-11-13 02:27:44 +03:00			`}`

ctdb-recoverd: Simplify using TALLOC_FREE() Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-10-23 08:03:38 +03:00			`TALLOC_FREE(rec->send_election_te);`
make election handling much more scalable (This used to be ctdb commit 05938d462b92bd9ecb8e35f53651bded47c48675) 2007-11-13 02:27:44 +03:00			`}`

add improvements to tracking memory usage in ctdbd adn the recovery daemon and a ctdb command to pull the talloc memory map from a recovery daemon ctdb rddumpmemory (This used to be ctdb commit d23950be7406cf288f48b660c0f57a9b8d7bdd05) 2008-04-01 08:34:54 +04:00			`/*`
			`handler for memory dumps`
			`*/`
ctdb-daemon: Replace ctdb_message with srvid abstraction Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-04-08 07:38:26 +03:00			`static void mem_dump_handler(uint64_t srvid, TDB_DATA data, void *private_data)`
add improvements to tracking memory usage in ctdbd adn the recovery daemon and a ctdb command to pull the talloc memory map from a recovery daemon ctdb rddumpmemory (This used to be ctdb commit d23950be7406cf288f48b660c0f57a9b8d7bdd05) 2008-04-01 08:34:54 +04:00			`{`
ctdb-daemon: Replace ctdb_message with srvid abstraction Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-04-08 07:38:26 +03:00			`struct ctdb_recoverd *rec = talloc_get_type(`
			`private_data, struct ctdb_recoverd);`
			`struct ctdb_context *ctdb = rec->ctdb;`
add improvements to tracking memory usage in ctdbd adn the recovery daemon and a ctdb command to pull the talloc memory map from a recovery daemon ctdb rddumpmemory (This used to be ctdb commit d23950be7406cf288f48b660c0f57a9b8d7bdd05) 2008-04-01 08:34:54 +04:00			`TALLOC_CTX *tmp_ctx = talloc_new(ctdb);`
			`TDB_DATA *dump;`
			`int ret;`
ctdb-daemon: Rename struct srvid_request to ctdb_srvid_message Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 06:32:49 +03:00			`struct ctdb_srvid_message *rd;`
add improvements to tracking memory usage in ctdbd adn the recovery daemon and a ctdb command to pull the talloc memory map from a recovery daemon ctdb rddumpmemory (This used to be ctdb commit d23950be7406cf288f48b660c0f57a9b8d7bdd05) 2008-04-01 08:34:54 +04:00
ctdb-daemon: Rename struct srvid_request to ctdb_srvid_message Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 06:32:49 +03:00			`if (data.dsize != sizeof(struct ctdb_srvid_message)) {`
add improvements to tracking memory usage in ctdbd adn the recovery daemon and a ctdb command to pull the talloc memory map from a recovery daemon ctdb rddumpmemory (This used to be ctdb commit d23950be7406cf288f48b660c0f57a9b8d7bdd05) 2008-04-01 08:34:54 +04:00			`DEBUG(DEBUG_ERR, (__location__ " Wrong size of return address.\n"));`
fix a slow memory leak in the recovery daemon in the error paths for the memdump function (This used to be ctdb commit 5e641ef9d6cca286061138a9680dcf2495736e8b) 2008-09-16 03:00:48 +04:00			`talloc_free(tmp_ctx);`
add improvements to tracking memory usage in ctdbd adn the recovery daemon and a ctdb command to pull the talloc memory map from a recovery daemon ctdb rddumpmemory (This used to be ctdb commit d23950be7406cf288f48b660c0f57a9b8d7bdd05) 2008-04-01 08:34:54 +04:00			`return;`
			`}`
ctdb-daemon: Rename struct srvid_request to ctdb_srvid_message Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 06:32:49 +03:00			`rd = (struct ctdb_srvid_message *)data.dptr;`
add improvements to tracking memory usage in ctdbd adn the recovery daemon and a ctdb command to pull the talloc memory map from a recovery daemon ctdb rddumpmemory (This used to be ctdb commit d23950be7406cf288f48b660c0f57a9b8d7bdd05) 2008-04-01 08:34:54 +04:00
			`dump = talloc_zero(tmp_ctx, TDB_DATA);`
			`if (dump == NULL) {`
			`DEBUG(DEBUG_ERR, (__location__ " Failed to allocate memory for memdump\n"));`
			`talloc_free(tmp_ctx);`
			`return;`
			`}`
			`ret = ctdb_dump_memory(ctdb, dump);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR, (__location__ " ctdb_dump_memory() failed\n"));`
			`talloc_free(tmp_ctx);`
			`return;`
			`}`

			`DEBUG(DEBUG_ERR, ("recovery master memory dump\n"));`

rename ctdb_send_message to ctdb_client_send_message to resolve colission with the function of the same name in libctdb (This used to be ctdb commit ac3292c12832484a22715f1d46aa23f3b7c8a6f6) 2010-06-02 03:45:21 +04:00			`ret = ctdb_client_send_message(ctdb, rd->pnn, rd->srvid, *dump);`
add improvements to tracking memory usage in ctdbd adn the recovery daemon and a ctdb command to pull the talloc memory map from a recovery daemon ctdb rddumpmemory (This used to be ctdb commit d23950be7406cf288f48b660c0f57a9b8d7bdd05) 2008-04-01 08:34:54 +04:00			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR,("Failed to send rd memdump reply message\n"));`
fix a slow memory leak in the recovery daemon in the error paths for the memdump function (This used to be ctdb commit 5e641ef9d6cca286061138a9680dcf2495736e8b) 2008-09-16 03:00:48 +04:00			`talloc_free(tmp_ctx);`
add improvements to tracking memory usage in ctdbd adn the recovery daemon and a ctdb command to pull the talloc memory map from a recovery daemon ctdb rddumpmemory (This used to be ctdb commit d23950be7406cf288f48b660c0f57a9b8d7bdd05) 2008-04-01 08:34:54 +04:00			`return;`
			`}`

			`talloc_free(tmp_ctx);`
			`}`
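`mem_dump_handler()` treats the request payload as a reply address. It rejects anything whose size is not exactly `sizeof(struct ctdb_srvid_message)`, then sends the dump to `rd->pnn` on `rd->srvid`. Below is a minimal sketch of that validation. It assumes a hypothetical stand-in `struct reply_addr` holding the two fields the handler reads, and it copies with `memcpy()` rather than casting the pointer, which avoids any assumption about the buffer's alignment.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct ctdb_srvid_message. */
struct reply_addr {
	uint32_t pnn;	/* node that should receive the reply */
	uint64_t srvid;	/* message id the requester listens on */
};

/* Validate a raw request and extract the reply address. */
static bool parse_reply_addr(const unsigned char *buf, size_t len,
			     struct reply_addr *out)
{
	if (buf == NULL || len != sizeof(*out)) {
		return false;	/* "Wrong size of return address" */
	}
	memcpy(out, buf, sizeof(*out));
	return true;
}
```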

add a new node state : DELETED. This is used to mark nodes as being DELETED internally in ctdb so that nodes are not renumbered if / when they are removed from the nodes file. This is used to be able to do "ctdb reloadnodes" at runtime without causing nodes to be renumbered. To do this, instead of deleting a node from the nodes file, just comment it out like 1.0.0.1 #1.0.0.2 1.0.0.3 After removing 1.0.0.2 from the cluster, the remaining nodes retain their pnn's from prior to the deletion, namely 0 and 2 Any line in the nodes file that is commented out represents a DELETED pnn (This used to be ctdb commit 6a5e4fd7fa391206b463bb4e976502f3ac5bd343) 2009-06-01 08:18:34 +04:00			`/*`
			`handler for reload_nodes`
			`*/`
ctdb-daemon: Replace ctdb_message with srvid abstraction Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-04-08 07:38:26 +03:00			`static void reload_nodes_handler(uint64_t srvid, TDB_DATA data,`
			`void *private_data)`
add a new node state : DELETED. This is used to mark nodes as being DELETED internally in ctdb so that nodes are not renumbered if / when they are removed from the nodes file. This is used to be able to do "ctdb reloadnodes" at runtime without causing nodes to be renumbered. To do this, instead of deleting a node from the nodes file, just comment it out like 1.0.0.1 #1.0.0.2 1.0.0.3 After removing 1.0.0.2 from the cluster, the remaining nodes retain their pnn's from prior to the deletion, namely 0 and 2 Any line in the nodes file that is commented out represents a DELETED pnn (This used to be ctdb commit 6a5e4fd7fa391206b463bb4e976502f3ac5bd343) 2009-06-01 08:18:34 +04:00			`{`
ctdb-daemon: Replace ctdb_message with srvid abstraction Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-04-08 07:38:26 +03:00			`struct ctdb_recoverd *rec = talloc_get_type(`
			`private_data, struct ctdb_recoverd);`
add a new node state : DELETED. This is used to mark nodes as being DELETED internally in ctdb so that nodes are not renumbered if / when they are removed from the nodes file. This is used to be able to do "ctdb reloadnodes" at runtime without causing nodes to be renumbered. To do this, instead of deleting a node from the nodes file, just comment it out like 1.0.0.1 #1.0.0.2 1.0.0.3 After removing 1.0.0.2 from the cluster, the remaining nodes retain their pnn's from prior to the deletion, namely 0 and 2 Any line in the nodes file that is commented out represents a DELETED pnn (This used to be ctdb commit 6a5e4fd7fa391206b463bb4e976502f3ac5bd343) 2009-06-01 08:18:34 +04:00
			`DEBUG(DEBUG_ERR, (__location__ " Reload nodes file from recovery daemon\n"));`

recoverd: Remove function reload_nodes_file() It is a 1 line wrapper around ctdb_load_nodes_file(), so use that instead. We need less code... :-) Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 4a5d5935f4410a93a3343d85a24dbcddae2c4c20) 2013-10-14 06:54:39 +04:00			`ctdb_load_nodes_file(rec->ctdb);`
add a new node state : DELETED. This is used to mark nodes as being DELETED internally in ctdb so that nodes are not renumbered if / when they are removed from the nodes file. This is used to be able to do "ctdb reloadnodes" at runtime without causing nodes to be renumbered. To do this, instead of deleting a node from the nodes file, just comment it out like 1.0.0.1 #1.0.0.2 1.0.0.3 After removing 1.0.0.2 from the cluster, the remaining nodes retain their pnn's from prior to the deletion, namely 0 and 2 Any line in the nodes file that is commented out represents a DELETED pnn (This used to be ctdb commit 6a5e4fd7fa391206b463bb4e976502f3ac5bd343) 2009-06-01 08:18:34 +04:00			`}`

add a new message to ask the recovery daemon to temporarily disable checking ip address consistency. This is useful when we are moving addresses using moveip in the cluster since otherwise if we collide with the recovery daemons own check we could cause a recovery (This used to be ctdb commit 9c63858c0b22c81eaccb9865a414af0bbb2833d4) 2009-10-06 05:11:32 +04:00
ctdb-daemon: Stop using tevent compatibility definitions Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-26 08:50:09 +03:00			`static void ctdb_rebalance_timeout(struct tevent_context *ev,`
			`struct tevent_timer *te,`
recoverd: Fix the implementation of CTDB_SRVID_REBALANCE_NODE The current implementation has a few flaws: * A takeover run is called unconditionally when the timer goes even if the recovery master role has moved. This means a node other than the recovery master can incorrectly do a takeover run. * The rebalancing target nodes are cleared in the setup for a takeover run, regardless of whether the takeover run succeeds. * The timer to force a rebalance isn't cleared if another takeover run occurs before the deadline. Any forced rebalancing will happen in the first takeover run and when the timer expires some time later then an unnecessary takeover run will occur. * If the recovery master role moves then the rebalancing data will stay on the original node and affect the next takeover run to occur if the recovery master role should come back to the original node. Instead, store an array of rebalance target nodes in the recovery master context. This is passed as an extra argument to ctdb_takeover_run() each time it is called and is cleared when a takeover run succeeds. The timer hangs off the array of rebalance target nodes, which is cleared if the node isn't the recovery master. This means that it is possible to lose rebalance data if the recovery master role moves. However, that's a difficult problem to solve. The best way of approaching it is probably to try to stop the recovery master role from jumping around unnecesarily when inactive nodes join the cluster. The long term solution is to avoid this nonsense completely. The IP allocation algorithm needs to cache state between runs so that it knows which nodes have just become healthy. This also needs recovery master stability. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit c51c1efe5fc7fa668597f2acd435dee16e410fc9) 2013-09-04 08:30:04 +04:00			`struct timeval t, void *p)`
When adding ips to nodes, set up a deferred rebalance for the whole node to trigger after 60 seconds in case the normal ipreallocated is not sufficient to trigger rebalance. (This used to be ctdb commit 4340263b219d75c39f8de22abe3f6f1c1ee63ea2) 2012-02-27 23:56:04 +04:00			`{`
			`struct ctdb_recoverd *rec = talloc_get_type(p, struct ctdb_recoverd);`

recoverd: Fix the implementation of CTDB_SRVID_REBALANCE_NODE The current implementation has a few flaws: * A takeover run is called unconditionally when the timer goes even if the recovery master role has moved. This means a node other than the recovery master can incorrectly do a takeover run. * The rebalancing target nodes are cleared in the setup for a takeover run, regardless of whether the takeover run succeeds. * The timer to force a rebalance isn't cleared if another takeover run occurs before the deadline. Any forced rebalancing will happen in the first takeover run and when the timer expires some time later then an unnecessary takeover run will occur. * If the recovery master role moves then the rebalancing data will stay on the original node and affect the next takeover run to occur if the recovery master role should come back to the original node. Instead, store an array of rebalance target nodes in the recovery master context. This is passed as an extra argument to ctdb_takeover_run() each time it is called and is cleared when a takeover run succeeds. The timer hangs off the array of rebalance target nodes, which is cleared if the node isn't the recovery master. This means that it is possible to lose rebalance data if the recovery master role moves. However, that's a difficult problem to solve. The best way of approaching it is probably to try to stop the recovery master role from jumping around unnecesarily when inactive nodes join the cluster. The long term solution is to avoid this nonsense completely. The IP allocation algorithm needs to cache state between runs so that it knows which nodes have just become healthy. This also needs recovery master stability. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit c51c1efe5fc7fa668597f2acd435dee16e410fc9) 2013-09-04 08:30:04 +04:00			`if (rec->force_rebalance_nodes == NULL) {`
			`DEBUG(DEBUG_ERR,`
			`("Rebalance timeout occurred - no nodes to rebalance\n"));`
			`return;`
			`}`
When adding ips to nodes, set up a deferred rebalance for the whole node to trigger after 60 seconds in case the normal ipreallocated is not sufficient to trigger rebalance. (This used to be ctdb commit 4340263b219d75c39f8de22abe3f6f1c1ee63ea2) 2012-02-27 23:56:04 +04:00
recoverd: Fix the implementation of CTDB_SRVID_REBALANCE_NODE The current implementation has a few flaws: * A takeover run is called unconditionally when the timer goes even if the recovery master role has moved. This means a node other than the recovery master can incorrectly do a takeover run. * The rebalancing target nodes are cleared in the setup for a takeover run, regardless of whether the takeover run succeeds. * The timer to force a rebalance isn't cleared if another takeover run occurs before the deadline. Any forced rebalancing will happen in the first takeover run and when the timer expires some time later then an unnecessary takeover run will occur. * If the recovery master role moves then the rebalancing data will stay on the original node and affect the next takeover run to occur if the recovery master role should come back to the original node. Instead, store an array of rebalance target nodes in the recovery master context. This is passed as an extra argument to ctdb_takeover_run() each time it is called and is cleared when a takeover run succeeds. The timer hangs off the array of rebalance target nodes, which is cleared if the node isn't the recovery master. This means that it is possible to lose rebalance data if the recovery master role moves. However, that's a difficult problem to solve. The best way of approaching it is probably to try to stop the recovery master role from jumping around unnecesarily when inactive nodes join the cluster. The long term solution is to avoid this nonsense completely. The IP allocation algorithm needs to cache state between runs so that it knows which nodes have just become healthy. This also needs recovery master stability. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit c51c1efe5fc7fa668597f2acd435dee16e410fc9) 2013-09-04 08:30:04 +04:00			`DEBUG(DEBUG_NOTICE,`
ctdb-recoverd: Trigger takeover run after rebalance timeout No need to do it immediately. It will happen in less than a second. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-10-28 11:56:02 +03:00			`("Rebalance timeout occurred - trigger takeover run\n"));`
			`rec->need_takeover_run = true;`
When adding ips to nodes, set up a deferred rebalance for the whole node to trigger after 60 seconds in case the normal ipreallocated is not sufficient to trigger rebalance. (This used to be ctdb commit 4340263b219d75c39f8de22abe3f6f1c1ee63ea2) 2012-02-27 23:56:04 +04:00			`}`

ctdb-daemon: Replace ctdb_message with srvid abstraction Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-04-08 07:38:26 +03:00
			`static void recd_node_rebalance_handler(uint64_t srvid, TDB_DATA data,`
			`void *private_data)`
When adding ips to nodes, set up a deferred rebalance for the whole node to trigger after 60 seconds in case the normal ipreallocated is not sufficient to trigger rebalance. (This used to be ctdb commit 4340263b219d75c39f8de22abe3f6f1c1ee63ea2) 2012-02-27 23:56:04 +04:00			`{`
ctdb-daemon: Replace ctdb_message with srvid abstraction Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-04-08 07:38:26 +03:00			`struct ctdb_recoverd *rec = talloc_get_type(`
			`private_data, struct ctdb_recoverd);`
			`struct ctdb_context *ctdb = rec->ctdb;`
When adding ips to nodes, set up a deferred rebalance for the whole node to trigger after 60 seconds in case the normal ipreallocated is not sufficient to trigger rebalance. (This used to be ctdb commit 4340263b219d75c39f8de22abe3f6f1c1ee63ea2) 2012-02-27 23:56:04 +04:00			`uint32_t pnn;`
recoverd: Fix the implementation of CTDB_SRVID_REBALANCE_NODE The current implementation has a few flaws: * A takeover run is called unconditionally when the timer goes even if the recovery master role has moved. This means a node other than the recovery master can incorrectly do a takeover run. * The rebalancing target nodes are cleared in the setup for a takeover run, regardless of whether the takeover run succeeds. * The timer to force a rebalance isn't cleared if another takeover run occurs before the deadline. Any forced rebalancing will happen in the first takeover run and when the timer expires some time later then an unnecessary takeover run will occur. * If the recovery master role moves then the rebalancing data will stay on the original node and affect the next takeover run to occur if the recovery master role should come back to the original node. Instead, store an array of rebalance target nodes in the recovery master context. This is passed as an extra argument to ctdb_takeover_run() each time it is called and is cleared when a takeover run succeeds. The timer hangs off the array of rebalance target nodes, which is cleared if the node isn't the recovery master. This means that it is possible to lose rebalance data if the recovery master role moves. However, that's a difficult problem to solve. The best way of approaching it is probably to try to stop the recovery master role from jumping around unnecesarily when inactive nodes join the cluster. The long term solution is to avoid this nonsense completely. The IP allocation algorithm needs to cache state between runs so that it knows which nodes have just become healthy. This also needs recovery master stability. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit c51c1efe5fc7fa668597f2acd435dee16e410fc9) 2013-09-04 08:30:04 +04:00			`uint32_t *t;`
			`int len;`
recoverd: Rebalancing should be done regardless tunable Rebalance target nodes should be set even if a deferred rebalance is not configured. The user can explicitly cause a takeover run. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit afd9b51644af074752d74c412cb4e7ec2eba2c69) 2013-10-30 05:17:37 +04:00			`uint32_t deferred_rebalance;`
When adding ips to nodes, set up a deferred rebalance for the whole node to trigger after 60 seconds in case the normal ipreallocated is not sufficient to trigger rebalance. (This used to be ctdb commit 4340263b219d75c39f8de22abe3f6f1c1ee63ea2) 2012-02-27 23:56:04 +04:00
recoverd: Fix the implementation of CTDB_SRVID_REBALANCE_NODE The current implementation has a few flaws: * A takeover run is called unconditionally when the timer goes even if the recovery master role has moved. This means a node other than the recovery master can incorrectly do a takeover run. * The rebalancing target nodes are cleared in the setup for a takeover run, regardless of whether the takeover run succeeds. * The timer to force a rebalance isn't cleared if another takeover run occurs before the deadline. Any forced rebalancing will happen in the first takeover run and when the timer expires some time later then an unnecessary takeover run will occur. * If the recovery master role moves then the rebalancing data will stay on the original node and affect the next takeover run to occur if the recovery master role should come back to the original node. Instead, store an array of rebalance target nodes in the recovery master context. This is passed as an extra argument to ctdb_takeover_run() each time it is called and is cleared when a takeover run succeeds. The timer hangs off the array of rebalance target nodes, which is cleared if the node isn't the recovery master. This means that it is possible to lose rebalance data if the recovery master role moves. However, that's a difficult problem to solve. The best way of approaching it is probably to try to stop the recovery master role from jumping around unnecesarily when inactive nodes join the cluster. The long term solution is to avoid this nonsense completely. The IP allocation algorithm needs to cache state between runs so that it knows which nodes have just become healthy. This also needs recovery master stability. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit c51c1efe5fc7fa668597f2acd435dee16e410fc9) 2013-09-04 08:30:04 +04:00			`if (rec->recmaster != ctdb_get_pnn(ctdb)) {`
			`return;`
			`}`

When adding ips to nodes, set up a deferred rebalance for the whole node to trigger after 60 seconds in case the normal ipreallocated is not sufficient to trigger rebalance. (This used to be ctdb commit 4340263b219d75c39f8de22abe3f6f1c1ee63ea2) 2012-02-27 23:56:04 +04:00			`if (data.dsize != sizeof(uint32_t)) {`
			`DEBUG(DEBUG_ERR,(__location__ " Incorrect size of node rebalance message. Was %zu but expected %zu bytes\n", data.dsize, sizeof(uint32_t)));`
			`return;`
			`}`

			`pnn = *(uint32_t *)&data.dptr[0];`

recoverd: Fix the implementation of CTDB_SRVID_REBALANCE_NODE The current implementation has a few flaws: * A takeover run is called unconditionally when the timer goes even if the recovery master role has moved. This means a node other than the recovery master can incorrectly do a takeover run. * The rebalancing target nodes are cleared in the setup for a takeover run, regardless of whether the takeover run succeeds. * The timer to force a rebalance isn't cleared if another takeover run occurs before the deadline. Any forced rebalancing will happen in the first takeover run and when the timer expires some time later then an unnecessary takeover run will occur. * If the recovery master role moves then the rebalancing data will stay on the original node and affect the next takeover run to occur if the recovery master role should come back to the original node. Instead, store an array of rebalance target nodes in the recovery master context. This is passed as an extra argument to ctdb_takeover_run() each time it is called and is cleared when a takeover run succeeds. The timer hangs off the array of rebalance target nodes, which is cleared if the node isn't the recovery master. This means that it is possible to lose rebalance data if the recovery master role moves. However, that's a difficult problem to solve. The best way of approaching it is probably to try to stop the recovery master role from jumping around unnecesarily when inactive nodes join the cluster. The long term solution is to avoid this nonsense completely. The IP allocation algorithm needs to cache state between runs so that it knows which nodes have just become healthy. This also needs recovery master stability. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit c51c1efe5fc7fa668597f2acd435dee16e410fc9) 2013-09-04 08:30:04 +04:00			`DEBUG(DEBUG_NOTICE,("Setting up rebalance of IPs to node %u\n", pnn));`
When adding ips to nodes, set up a deferred rebalance for the whole node to trigger after 60 seconds in case the normal ipreallocated is not sufficient to trigger rebalance. (This used to be ctdb commit 4340263b219d75c39f8de22abe3f6f1c1ee63ea2) 2012-02-27 23:56:04 +04:00
recoverd: Fix the implementation of CTDB_SRVID_REBALANCE_NODE The current implementation has a few flaws: * A takeover run is called unconditionally when the timer goes even if the recovery master role has moved. This means a node other than the recovery master can incorrectly do a takeover run. * The rebalancing target nodes are cleared in the setup for a takeover run, regardless of whether the takeover run succeeds. * The timer to force a rebalance isn't cleared if another takeover run occurs before the deadline. Any forced rebalancing will happen in the first takeover run and when the timer expires some time later then an unnecessary takeover run will occur. * If the recovery master role moves then the rebalancing data will stay on the original node and affect the next takeover run to occur if the recovery master role should come back to the original node. Instead, store an array of rebalance target nodes in the recovery master context. This is passed as an extra argument to ctdb_takeover_run() each time it is called and is cleared when a takeover run succeeds. The timer hangs off the array of rebalance target nodes, which is cleared if the node isn't the recovery master. This means that it is possible to lose rebalance data if the recovery master role moves. However, that's a difficult problem to solve. The best way of approaching it is probably to try to stop the recovery master role from jumping around unnecesarily when inactive nodes join the cluster. The long term solution is to avoid this nonsense completely. The IP allocation algorithm needs to cache state between runs so that it knows which nodes have just become healthy. This also needs recovery master stability. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit c51c1efe5fc7fa668597f2acd435dee16e410fc9) 2013-09-04 08:30:04 +04:00			`/* Copy any existing list of nodes. There's probably some`
			`* sort of realloc variant that will do this but we need to`
			`* make sure that freeing the old array also cancels the timer`
			`* event for the timeout... not sure if realloc will do that.`
			`*/`
			`len = (rec->force_rebalance_nodes != NULL) ?`
			`talloc_array_length(rec->force_rebalance_nodes) :`
			`0;`

			`/* This allows duplicates to be added but they don't cause`
			`* harm. A call to add a duplicate PNN arguably means that`
			`* the timeout should be reset, so this is the simplest`
			`* solution.`
			`*/`
			`t = talloc_zero_array(rec, uint32_t, len+1);`
			`CTDB_NO_MEMORY_VOID(ctdb, t);`
			`if (len > 0) {`
			`memcpy(t, rec->force_rebalance_nodes, sizeof(uint32_t) * len);`
When adding ips to nodes, set up a deferred rebalance for the whole node to trigger after 60 seconds in case the normal ipreallocated is not sufficient to trigger rebalance. (This used to be ctdb commit 4340263b219d75c39f8de22abe3f6f1c1ee63ea2) 2012-02-27 23:56:04 +04:00			`}`
recoverd: Fix the implementation of CTDB_SRVID_REBALANCE_NODE The current implementation has a few flaws: * A takeover run is called unconditionally when the timer goes even if the recovery master role has moved. This means a node other than the recovery master can incorrectly do a takeover run. * The rebalancing target nodes are cleared in the setup for a takeover run, regardless of whether the takeover run succeeds. * The timer to force a rebalance isn't cleared if another takeover run occurs before the deadline. Any forced rebalancing will happen in the first takeover run and when the timer expires some time later then an unnecessary takeover run will occur. * If the recovery master role moves then the rebalancing data will stay on the original node and affect the next takeover run to occur if the recovery master role should come back to the original node. Instead, store an array of rebalance target nodes in the recovery master context. This is passed as an extra argument to ctdb_takeover_run() each time it is called and is cleared when a takeover run succeeds. The timer hangs off the array of rebalance target nodes, which is cleared if the node isn't the recovery master. This means that it is possible to lose rebalance data if the recovery master role moves. However, that's a difficult problem to solve. The best way of approaching it is probably to try to stop the recovery master role from jumping around unnecesarily when inactive nodes join the cluster. The long term solution is to avoid this nonsense completely. The IP allocation algorithm needs to cache state between runs so that it knows which nodes have just become healthy. This also needs recovery master stability. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit c51c1efe5fc7fa668597f2acd435dee16e410fc9) 2013-09-04 08:30:04 +04:00			`t[len] = pnn;`

			`talloc_free(rec->force_rebalance_nodes);`

			`rec->force_rebalance_nodes = t;`
recoverd: Rebalancing should be done regardless tunable Rebalance target nodes should be set even if a deferred rebalance is not configured. The user can explicitly cause a takeover run. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit afd9b51644af074752d74c412cb4e7ec2eba2c69) 2013-10-30 05:17:37 +04:00
			`/* If configured, set up a deferred takeover run to make sure`
			`* that certain nodes get IPs rebalanced to them. This will`
			`* be cancelled if a successful takeover run happens before`
			`* the timeout. Assign tunable value to variable for`
			`* readability.`
			`*/`
			`deferred_rebalance = ctdb->tunable.deferred_rebalance_on_node_add;`
			`if (deferred_rebalance != 0) {`
ctdb-daemon: Stop using tevent compatibility definitions Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-26 08:50:09 +03:00			`tevent_add_timer(ctdb->ev, rec->force_rebalance_nodes,`
			`timeval_current_ofs(deferred_rebalance, 0),`
			`ctdb_rebalance_timeout, rec);`
recoverd: Rebalancing should be done regardless tunable Rebalance target nodes should be set even if a deferred rebalance is not configured. The user can explicitly cause a takeover run. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit afd9b51644af074752d74c412cb4e7ec2eba2c69) 2013-10-30 05:17:37 +04:00			`}`
When adding ips to nodes, set up a deferred rebalance for the whole node to trigger after 60 seconds in case the normal ipreallocated is not sufficient to trigger rebalance. (This used to be ctdb commit 4340263b219d75c39f8de22abe3f6f1c1ee63ea2) 2012-02-27 23:56:04 +04:00			`}`
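The bookkeeping in recd_node_rebalance_handler() is a copy-and-append: allocate an array one slot larger, copy the existing PNNs, store the new PNN last, then free the old array. In the daemon the deferred timer is parented to the talloc array, so freeing the old array also cancels its timer and re-adding a node effectively resets the timeout. The sketch below is a minimal standalone illustration, assuming plain malloc()/free() in place of talloc and a hypothetical `pnn_list` type that stores its own length (talloc_array_length() provides this in the original); these names are not part of ctdb.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the talloc'd uint32_t array. talloc tracks
 * the element count itself; here it is stored explicitly. */
struct pnn_list {
	uint32_t *pnns;
	size_t len;
};

/* Append pnn (duplicates allowed) using the same copy-then-free pattern
 * as the handler. Returns 0 on success, -1 if allocation fails, in which
 * case the list is left unchanged. */
static int pnn_list_append(struct pnn_list *list, uint32_t pnn)
{
	uint32_t *t = calloc(list->len + 1, sizeof(uint32_t));

	if (t == NULL) {
		return -1;
	}
	if (list->len > 0) {
		memcpy(t, list->pnns, sizeof(uint32_t) * list->len);
	}
	t[list->len] = pnn;

	/* In ctdb, freeing the old talloc array would also free any
	 * timer event allocated under it. */
	free(list->pnns);
	list->pnns = t;
	list->len += 1;
	return 0;
}

/* Drop all rebalance targets, as happens after a successful takeover run. */
static void pnn_list_clear(struct pnn_list *list)
{
	free(list->pnns);
	list->pnns = NULL;
	list->len = 0;
}
```

Allocating a fresh array rather than calling realloc() keeps the "free the old array" step explicit, which is exactly the property the original comment relies on for cancelling the timer.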



ctdb-daemon: Replace ctdb_message with srvid abstraction Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-04-08 07:38:26 +03:00			`static void recd_update_ip_handler(uint64_t srvid, TDB_DATA data,`
			`void *private_data)`
The recent change to the recovery daemon to keep track of and verify that all nodes agree on the most recent ip address assignments broke "ctdb moveip ..." since that call would never trigger a full takeover run and thus would immediately trigger an inconsistency. Add a new message to the recovery daemon where we can tell the recovery daemon to update its assignments. BZ62782 (This used to be ctdb commit e7069082e5f0380dcddee247db8754218ce18cab) 2010-04-28 09:43:11 +04:00			`{`
ctdb-daemon: Replace ctdb_message with srvid abstraction Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-04-08 07:38:26 +03:00			`struct ctdb_recoverd *rec = talloc_get_type(`
			`private_data, struct ctdb_recoverd);`
The recent change to the recovery daemon to keep track of and verify that all nodes agree on the most recent ip address assignments broke "ctdb moveip ..." since that call would never trigger a full takeover run and thus would immediately trigger an inconsistency. Add a new message to the recovery daemon where we can tell the recovery daemon to update its assignments. BZ62782 (This used to be ctdb commit e7069082e5f0380dcddee247db8754218ce18cab) 2010-04-28 09:43:11 +04:00			`struct ctdb_public_ip *ip;`

			`if (rec->recmaster != rec->ctdb->pnn) {`
			`DEBUG(DEBUG_INFO,("Not recmaster, ignore update ip message\n"));`
			`return;`
			`}`

			`if (data.dsize != sizeof(struct ctdb_public_ip)) {`
			`DEBUG(DEBUG_ERR,(__location__ " Incorrect size of recd update ip message. Was %zu but expected %zu bytes\n", data.dsize, sizeof(struct ctdb_public_ip)));`
			`return;`
			`}`

			`ip = (struct ctdb_public_ip *)data.dptr;`

			`update_ip_assignment_tree(rec->ctdb, ip);`
			`}`
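The message handlers above share one pattern: reject a payload whose size does not exactly match the expected structure, then reinterpret `data.dptr`. Casting `dptr` directly assumes the buffer is suitably aligned for the target type; copying into a local with memcpy() avoids that assumption. A minimal sketch, where `struct payload` is a hypothetical stand-in for TDB_DATA (which has the same `dptr`/`dsize` members) and `struct example_ip_msg` is purely illustrative, not the real `struct ctdb_public_ip` layout:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for TDB_DATA */
struct payload {
	unsigned char *dptr;
	size_t dsize;
};

/* Illustrative fixed-size message, not a ctdb wire format */
struct example_ip_msg {
	uint32_t pnn;
	uint32_t addr;
};

/* Copy a fixed-size message out of a payload into caller storage.
 * Returns false if there is no data or the size does not match. */
static bool decode_fixed(struct payload data, void *out, size_t expected)
{
	if (data.dptr == NULL || data.dsize != expected) {
		return false;
	}
	memcpy(out, data.dptr, expected);
	return true;
}
```

The same helper covers the bare `uint32_t` payloads, for example `decode_fixed(data, &pnn, sizeof(pnn))`.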

ctdb-recoverd: Add slightly more abstraction for disabling takeover runs Factor out new function srvid_disable_and_reply(), which can be re-used. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-02-06 05:05:12 +03:00			`static void srvid_disable_and_reply(struct ctdb_context *ctdb,`
			`TDB_DATA data,`
			`struct ctdb_op_state *op_state)`
recoverd: New SRVID message CTDB_SRVID_DISABLE_TAKEOVER_RUNS This implements a superset of CTDB_SRVID_DISABLE_IP_CHECK. It stops the IP checks but also causes any attempted takeover runs to fail and be rescheduled. This is meant to completely stop IP movements. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 00db4de53a0d86013e79e6577e7e6cf3ef864e56) 2013-08-27 09:04:40 +04:00			`{`
ctdb-daemon: Rename struct srvid_request_data to ctdb_disable_message Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-28 10:23:13 +03:00			`struct ctdb_disable_message *r;`
recoverd: New SRVID message CTDB_SRVID_DISABLE_TAKEOVER_RUNS This implements a superset of CTDB_SRVID_DISABLE_IP_CHECK. It stops the IP checks but also causes any attempted takeover runs to fail and be rescheduled. This is meant to completely stop IP movements. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 00db4de53a0d86013e79e6577e7e6cf3ef864e56) 2013-08-27 09:04:40 +04:00			`uint32_t timeout;`
			`TDB_DATA result;`
			`int32_t ret = 0;`

			`/* Validate input data */`
ctdb-daemon: Rename struct srvid_request_data to ctdb_disable_message Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-28 10:23:13 +03:00			`if (data.dsize != sizeof(struct ctdb_disable_message)) {`
recoverd: New SRVID message CTDB_SRVID_DISABLE_TAKEOVER_RUNS This implements a superset of CTDB_SRVID_DISABLE_IP_CHECK. It stops the IP checks but also causes any attempted takeover runs to fail and be rescheduled. This is meant to completely stop IP movements. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 00db4de53a0d86013e79e6577e7e6cf3ef864e56) 2013-08-27 09:04:40 +04:00			`DEBUG(DEBUG_ERR,(__location__ " Wrong size for data :%lu "`
			`"expecting %lu\n", (long unsigned)data.dsize,`
ctdb-daemon: Rename struct srvid_request to ctdb_srvid_message Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 06:32:49 +03:00			`(long unsigned)sizeof(struct ctdb_disable_message)));`
ctdb-server: Coverity fixes Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Michael Adam <obnox@samba.org> 2013-11-11 05:39:27 +04:00			`return;`
recoverd: New SRVID message CTDB_SRVID_DISABLE_TAKEOVER_RUNS This implements a superset of CTDB_SRVID_DISABLE_IP_CHECK. It stops the IP checks but also causes any attempted takeover runs to fail and be rescheduled. This is meant to completely stop IP movements. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 00db4de53a0d86013e79e6577e7e6cf3ef864e56) 2013-08-27 09:04:40 +04:00			`}`
			`if (data.dptr == NULL) {`
			`DEBUG(DEBUG_ERR,(__location__ " No data received\n"));`
ctdb-server: Coverity fixes Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Michael Adam <obnox@samba.org> 2013-11-11 05:39:27 +04:00			`return;`
recoverd: New SRVID message CTDB_SRVID_DISABLE_TAKEOVER_RUNS This implements a superset of CTDB_SRVID_DISABLE_IP_CHECK. It stops the IP checks but also causes any attempted takeover runs to fail and be rescheduled. This is meant to completely stop IP movements. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 00db4de53a0d86013e79e6577e7e6cf3ef864e56) 2013-08-27 09:04:40 +04:00			`}`

ctdb-daemon: Rename struct srvid_request_data to ctdb_disable_message Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-28 10:23:13 +03:00			`r = (struct ctdb_disable_message *)data.dptr;`
			`timeout = r->timeout;`
recoverd: New SRVID message CTDB_SRVID_DISABLE_TAKEOVER_RUNS This implements a superset of CTDB_SRVID_DISABLE_IP_CHECK. It stops the IP checks but also causes any attempted takeover runs to fail and be rescheduled. This is meant to completely stop IP movements. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 00db4de53a0d86013e79e6577e7e6cf3ef864e56) 2013-08-27 09:04:40 +04:00
ctdb-recoverd: Add slightly more abstraction for disabling takeover runs Factor out new function srvid_disable_and_reply(), which can be re-used. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-02-06 05:05:12 +03:00			`ret = ctdb_op_disable(op_state, ctdb->ev, timeout);`
ctdb-recoverd: Reimplement disabling takeover runs using ctdb_op_disable() Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-02-08 12:52:12 +03:00			`if (ret != 0) {`
recoverd: New SRVID message CTDB_SRVID_DISABLE_TAKEOVER_RUNS This implements a superset of CTDB_SRVID_DISABLE_IP_CHECK. It stops the IP checks but also causes any attempted takeover runs to fail and be rescheduled. This is meant to completely stop IP movements. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 00db4de53a0d86013e79e6577e7e6cf3ef864e56) 2013-08-27 09:04:40 +04:00			`goto done;`
			`}`

			`/* Returning our PNN tells the caller that we succeeded */`
			`ret = ctdb_get_pnn(ctdb);`
			`done:`
			`result.dsize = sizeof(int32_t);`
			`result.dptr = (uint8_t *)&ret;`
ctdb-daemon: Rename struct srvid_request to ctdb_srvid_message Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 06:32:49 +03:00			`srvid_request_reply(ctdb, (struct ctdb_srvid_message *)r, result);`
recoverd: New SRVID message CTDB_SRVID_DISABLE_TAKEOVER_RUNS This implements a superset of CTDB_SRVID_DISABLE_IP_CHECK. It stops the IP checks but also causes any attempted takeover runs to fail and be rescheduled. This is meant to completely stop IP movements. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 00db4de53a0d86013e79e6577e7e6cf3ef864e56) 2013-08-27 09:04:40 +04:00			`}`
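srvid_disable_and_reply() always answers with a single `int32_t`: the replying node's PNN on success, otherwise the non-zero value returned by ctdb_op_disable(). A caller can only tell the two apart if failures are reported as negative values, which this sketch assumes (PNNs of live nodes are small non-negative numbers). The helper names are hypothetical, not part of the ctdb API:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Server side: choose the reply value the way srvid_disable_and_reply()
 * does. op_ret == 0 means success, so the node's PNN is sent instead. */
static int32_t make_disable_reply(int op_ret, uint32_t my_pnn)
{
	return (op_ret != 0) ? (int32_t)op_ret : (int32_t)my_pnn;
}

/* Client side: decode a reply buffer. Assumes errors are negative, so
 * any non-negative value is the PNN of the node that handled the request. */
static bool disable_reply_ok(const unsigned char *buf, size_t len,
			     uint32_t *pnn_out)
{
	int32_t v;

	if (buf == NULL || len != sizeof(v)) {
		return false;
	}
	memcpy(&v, buf, sizeof(v));
	if (v < 0) {
		return false;
	}
	*pnn_out = (uint32_t)v;
	return true;
}
```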

ctdb-daemon: Replace ctdb_message with srvid abstraction Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-04-08 07:38:26 +03:00			`static void disable_takeover_runs_handler(uint64_t srvid, TDB_DATA data,`
ctdb-recoverd: Add slightly more abstraction for disabling takeover runs Factor out new function srvid_disable_and_reply(), which can be re-used. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-02-06 05:05:12 +03:00			`void *private_data)`
			`{`
ctdb-daemon: Replace ctdb_message with srvid abstraction Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-04-08 07:38:26 +03:00			`struct ctdb_recoverd *rec = talloc_get_type(`
			`private_data, struct ctdb_recoverd);`
ctdb-recoverd: Add slightly more abstraction for disabling takeover runs Factor out new function srvid_disable_and_reply(), which can be re-used. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-02-06 05:05:12 +03:00
ctdb-daemon: Replace ctdb_message with srvid abstraction Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-04-08 07:38:26 +03:00			`srvid_disable_and_reply(rec->ctdb, data, rec->takeover_run);`
ctdb-recoverd: Add slightly more abstraction for disabling takeover runs Factor out new function srvid_disable_and_reply(), which can be re-used. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-02-06 05:05:12 +03:00			`}`

ctdb-recoverd: Simplify disable_ip_check_handler() using ctdb_op_disable() Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-02-06 07:03:03 +03:00			`/* Backward compatibility for this SRVID */`
ctdb-daemon: Replace ctdb_message with srvid abstraction Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-04-08 07:38:26 +03:00			`static void disable_ip_check_handler(uint64_t srvid, TDB_DATA data,`
			`void *private_data)`
recoverd: Reimplement CTDB_SRVID_DISABLE_IP_CHECK Use disable_takeover_runs_handler() instead of maintaining duplicate logic. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 0a51a85915486b2a8fded7ba6444b18c6c1ee8e8) 2013-08-28 05:32:54 +04:00			`{`
ctdb-daemon: Replace ctdb_message with srvid abstraction Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-04-08 07:38:26 +03:00			`struct ctdb_recoverd *rec = talloc_get_type(`
			`private_data, struct ctdb_recoverd);`
ctdb-recoverd: Simplify disable_ip_check_handler() using ctdb_op_disable() Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-02-06 07:03:03 +03:00			`uint32_t timeout;`
recoverd: Reimplement CTDB_SRVID_DISABLE_IP_CHECK Use disable_takeover_runs_handler() instead of maintaining duplicate logic. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 0a51a85915486b2a8fded7ba6444b18c6c1ee8e8) 2013-08-28 05:32:54 +04:00
			`if (data.dsize != sizeof(uint32_t)) {`
			`DEBUG(DEBUG_ERR,(__location__ " Wrong size for data :%lu "`
			`"expecting %lu\n", (long unsigned)data.dsize,`
			`(long unsigned)sizeof(uint32_t)));`
			`return;`
			`}`
			`if (data.dptr == NULL) {`
			`DEBUG(DEBUG_ERR,(__location__ " No data received\n"));`
			`return;`
			`}`

ctdb-recoverd: Simplify disable_ip_check_handler() using ctdb_op_disable() Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-02-06 07:03:03 +03:00			`timeout = *((uint32_t *)data.dptr);`
recoverd: Reimplement CTDB_SRVID_DISABLE_IP_CHECK Use disable_takeover_runs_handler() instead of maintaining duplicate logic. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 0a51a85915486b2a8fded7ba6444b18c6c1ee8e8) 2013-08-28 05:32:54 +04:00
ctdb-daemon: Replace ctdb_message with srvid abstraction Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-04-08 07:38:26 +03:00			`ctdb_op_disable(rec->takeover_run, rec->ctdb->ev, timeout);`
recoverd: Reimplement CTDB_SRVID_DISABLE_IP_CHECK Use disable_takeover_runs_handler() instead of maintaining duplicate logic. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 0a51a85915486b2a8fded7ba6444b18c6c1ee8e8) 2013-08-28 05:32:54 +04:00			`}`
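Both disable paths end in ctdb_op_disable(), which stops the operation (here, takeover runs) for `timeout` seconds; per the commit history above, attempted takeover runs then fail and are rescheduled. The daemon implements this with a tevent timer. The sketch below is a simplified, hypothetical model of the same idea that stores a deadline instead of arming a timer; `op_gate` and its functions are not ctdb names.

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical simplified model of a temporarily disableable operation */
struct op_gate {
	time_t disabled_until;	/* operation allowed once now >= this */
};

/* Disable the operation for timeout seconds starting at now. In this
 * model a timeout of 0 sets the deadline to now, so the operation is
 * allowed again immediately. */
static void op_gate_disable(struct op_gate *gate, time_t now,
			    uint32_t timeout)
{
	gate->disabled_until = now + (time_t)timeout;
}

/* Would an attempted run at time now proceed, or be deferred? */
static bool op_gate_allowed(const struct op_gate *gate, time_t now)
{
	return now >= gate->disabled_until;
}
```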
add a new message to ask the recovery daemon to temporarily disable checking ip address consistency. This is useful when we are moving addresses using moveip in the cluster since otherwise if we collide with the recovery daemons own check we could cause a recovery (This used to be ctdb commit 9c63858c0b22c81eaccb9865a414af0bbb2833d4) 2009-10-06 05:11:32 +04:00
ctdb-daemon: Replace ctdb_message with srvid abstraction Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-04-08 07:38:26 +03:00			`static void disable_recoveries_handler(uint64_t srvid, TDB_DATA data,`
ctdb-recoverd: New message ID CTDB_SRVID_DISABLE_RECOVERIES Also add test stub support. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-02-06 07:06:44 +03:00			`void *private_data)`
			`{`
ctdb-daemon: Replace ctdb_message with srvid abstraction Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-04-08 07:38:26 +03:00			`struct ctdb_recoverd *rec = talloc_get_type(`
			`private_data, struct ctdb_recoverd);`
ctdb-recoverd: New message ID CTDB_SRVID_DISABLE_RECOVERIES Also add test stub support. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-02-06 07:06:44 +03:00
ctdb-daemon: Replace ctdb_message with srvid abstraction Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-04-08 07:38:26 +03:00			`srvid_disable_and_reply(rec->ctdb, data, rec->recovery);`
ctdb-recoverd: New message ID CTDB_SRVID_DISABLE_RECOVERIES Also add test stub support. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-02-06 07:06:44 +03:00			`}`

add a new command "ctdb ipreallocate", this command will force the recovery master to perform a full ip reallocation process. the ctdb command will block until the ip reallocation has comleted (This used to be ctdb commit abad7b97fe0c066b33f6e75d0953bbed892a3216) 2009-07-02 07:00:26 +04:00			`/*`
recoverd: Make the SRVID request structure generic No need for a separate one for each SRVID. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit d9c22b04d5aa7938a3965bd3144568664eb772ce) 2013-08-16 14:10:10 +04:00			`handler for ip reallocate, just add it to the list of requests and`
add a new command "ctdb ipreallocate", this command will force the recovery master to perform a full ip reallocation process. the ctdb command will block until the ip reallocation has comleted (This used to be ctdb commit abad7b97fe0c066b33f6e75d0953bbed892a3216) 2009-07-02 07:00:26 +04:00			`handle this later in the monitor_cluster loop so we do not recurse`
recoverd: Make the SRVID request structure generic No need for a separate one for each SRVID. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit d9c22b04d5aa7938a3965bd3144568664eb772ce) 2013-08-16 14:10:10 +04:00			`with other requests to takeover_run()`
add a new command "ctdb ipreallocate", this command will force the recovery master to perform a full ip reallocation process. the ctdb command will block until the ip reallocation has comleted (This used to be ctdb commit abad7b97fe0c066b33f6e75d0953bbed892a3216) 2009-07-02 07:00:26 +04:00			`*/`
ctdb-daemon: Replace ctdb_message with srvid abstraction Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-04-08 07:38:26 +03:00			`static void ip_reallocate_handler(uint64_t srvid, TDB_DATA data,`
			`void *private_data)`
add a new command "ctdb ipreallocate", this command will force the recovery master to perform a full ip reallocation process. the ctdb command will block until the ip reallocation has comleted (This used to be ctdb commit abad7b97fe0c066b33f6e75d0953bbed892a3216) 2009-07-02 07:00:26 +04:00			`{`
ctdb-daemon: Rename struct srvid_request to ctdb_srvid_message Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 06:32:49 +03:00			`struct ctdb_srvid_message *request;`
ctdb-daemon: Replace ctdb_message with srvid abstraction Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-04-08 07:38:26 +03:00			`struct ctdb_recoverd *rec = talloc_get_type(`
			`private_data, struct ctdb_recoverd);`
add a new command "ctdb ipreallocate", this command will force the recovery master to perform a full ip reallocation process. the ctdb command will block until the ip reallocation has comleted (This used to be ctdb commit abad7b97fe0c066b33f6e75d0953bbed892a3216) 2009-07-02 07:00:26 +04:00
ctdb-daemon: Rename struct srvid_request to ctdb_srvid_message Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 06:32:49 +03:00			`if (data.dsize != sizeof(struct ctdb_srvid_message)) {`
add a new command "ctdb ipreallocate", this command will force the recovery master to perform a full ip reallocation process. the ctdb command will block until the ip reallocation has comleted (This used to be ctdb commit abad7b97fe0c066b33f6e75d0953bbed892a3216) 2009-07-02 07:00:26 +04:00			`DEBUG(DEBUG_ERR, (__location__ " Wrong size of return address.\n"));`
			`return;`
			`}`

ctdb-daemon: Rename struct srvid_request to ctdb_srvid_message Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 06:32:49 +03:00			`request = (struct ctdb_srvid_message *)data.dptr;`
add a new command "ctdb ipreallocate", this command will force the recovery master to perform a full ip reallocation process. the ctdb command will block until the ip reallocation has comleted (This used to be ctdb commit abad7b97fe0c066b33f6e75d0953bbed892a3216) 2009-07-02 07:00:26 +04:00
ctdb-daemon: Replace ctdb_message with srvid abstraction Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-04-08 07:38:26 +03:00			`srvid_request_add(rec->ctdb, &rec->reallocate_requests, request);`
add a new command "ctdb ipreallocate", this command will force the recovery master to perform a full ip reallocation process. the ctdb command will block until the ip reallocation has comleted (This used to be ctdb commit abad7b97fe0c066b33f6e75d0953bbed892a3216) 2009-07-02 07:00:26 +04:00			`}`

recoverd: Factor out the SRVID handling code The code that handles IP reallocate requests can be reused. This also changes the result back to a SRVID caller to the PNN on success or a negative error code on failure. None of the callers currently look at the result so this is harmless... but it will be useful later. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit e4eae6e3291baa299a1d0f733ab11b138ee699a3) 2013-08-16 14:02:34 +04:00			`static void process_ipreallocate_requests(struct ctdb_context *ctdb,`
			`struct ctdb_recoverd *rec)`
add a new command "ctdb ipreallocate", this command will force the recovery master to perform a full ip reallocation process. the ctdb command will block until the ip reallocation has comleted (This used to be ctdb commit abad7b97fe0c066b33f6e75d0953bbed892a3216) 2009-07-02 07:00:26 +04:00			`{`
			`TDB_DATA result;`
			`int32_t ret;`
ctdb-recoverd: Only respond to currently queued ipreallocated requests Otherwise new requests can come in during the latter parts of the takeover run when the IP allocation algorithm has already run, and the new requests will be dequeued even though they haven't really be processed. Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Michael Adam <obnox@samba.org> 2013-11-22 06:57:03 +04:00			`struct srvid_requests *current;`
add a new command "ctdb ipreallocate", this command will force the recovery master to perform a full ip reallocation process. the ctdb command will block until the ip reallocation has comleted (This used to be ctdb commit abad7b97fe0c066b33f6e75d0953bbed892a3216) 2009-07-02 07:00:26 +04:00
			`DEBUG(DEBUG_INFO, ("recovery master forced ip reallocation\n"));`
server: reload the public addresses before doing a takeover run metze (This used to be ctdb commit 0e41a2204fa8a1e77dc83c0d4b253ab272b5c72d) 2010-01-19 10:42:48 +03:00
ctdb-recoverd: Only respond to currently queued ipreallocated requests Otherwise new requests can come in during the latter parts of the takeover run when the IP allocation algorithm has already run, and the new requests will be dequeued even though they haven't really be processed. Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Michael Adam <obnox@samba.org> 2013-11-22 06:57:03 +04:00			`/* Only process requests that are currently pending. More`
			`* might come in while the takeover run is in progress and`
			`* they will need to be processed later since they might`
			`* be in response to flag changes.`
			`*/`
			`current = rec->reallocate_requests;`
			`rec->reallocate_requests = NULL;`

server: reload the public addresses before doing a takeover run metze (This used to be ctdb commit 0e41a2204fa8a1e77dc83c0d4b253ab272b5c72d) 2010-01-19 10:42:48 +03:00			`/* update the list of public ips that a node can handle for`
			`all connected nodes`
			`*/`
ctdb-recoverd: Remote IP validation can't cause a takeover run Remote IP validation is only called when a takeover run is about to happen anyway, so don't bother flagging one. Given that a takeover run isn't being triggered, also drop the test that checks if takeover runs are disabled. These are the only uses of the rec argument, so drop it. One possible further simplification would be to remove this function because it doesn't accomplish anything. However, it is worth leaving it as a reminder that remote IP validation should be done properly at some time in the future. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-10-28 12:33:29 +03:00			`ret = ctdb_reload_remote_public_ips(ctdb, rec->nodemap);`
server: reload the public addresses before doing a takeover run metze (This used to be ctdb commit 0e41a2204fa8a1e77dc83c0d4b253ab272b5c72d) 2010-01-19 10:42:48 +03:00			`if (ret != 0) {`
			`rec->need_takeover_run = true;`
			`}`
			`if (ret == 0) {`
recoverd: New function do_takeover_run() Factor the calling sequence for ctdb_takeover_run() into a new function and call it instead. This changes rec->need_takeover_run to false for each successful takeover run and that seems to be the right thing to do. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 9a3f0c0e61ca5c17e020c6e0463d73c7cf4f7c09) 2013-08-27 06:14:34 +04:00			`if (do_takeover_run(rec, rec->nodemap, false)) {`
recoverd: Factor out the SRVID handling code The code that handles IP reallocate requests can be reused. This also changes the result back to a SRVID caller to the PNN on success or a negative error code on failure. None of the callers currently look at the result so this is harmless... but it will be useful later. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit e4eae6e3291baa299a1d0f733ab11b138ee699a3) 2013-08-16 14:02:34 +04:00			`ret = ctdb_get_pnn(ctdb);`
recoverd: New function do_takeover_run() Factor the calling sequence for ctdb_takeover_run() into a new function and call it instead. This changes rec->need_takeover_run to false for each successful takeover run and that seems to be the right thing to do. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 9a3f0c0e61ca5c17e020c6e0463d73c7cf4f7c09) 2013-08-27 06:14:34 +04:00			`} else {`
			`ret = -1;`
server: reload the public addresses before doing a takeover run metze (This used to be ctdb commit 0e41a2204fa8a1e77dc83c0d4b253ab272b5c72d) 2010-01-19 10:42:48 +03:00			`}`
			`}`

add a new command "ctdb ipreallocate", this command will force the recovery master to perform a full ip reallocation process. the ctdb command will block until the ip reallocation has comleted (This used to be ctdb commit abad7b97fe0c066b33f6e75d0953bbed892a3216) 2009-07-02 07:00:26 +04:00			`result.dsize = sizeof(int32_t);`
			`result.dptr = (uint8_t *)&ret;`

ctdb-recoverd: Only respond to currently queued ipreallocated requests Otherwise new requests can come in during the latter parts of the takeover run when the IP allocation algorithm has already run, and the new requests will be dequeued even though they haven't really be processed. Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Michael Adam <obnox@samba.org> 2013-11-22 06:57:03 +04:00			`srvid_requests_reply(ctdb, &current, result);`
add a new command "ctdb ipreallocate", this command will force the recovery master to perform a full ip reallocation process. the ctdb command will block until the ip reallocation has comleted (This used to be ctdb commit abad7b97fe0c066b33f6e75d0953bbed892a3216) 2009-07-02 07:00:26 +04:00			`}`
add a new node state : DELETED. This is used to mark nodes as being DELETED internally in ctdb so that nodes are not renumbered if / when they are removed from the nodes file. This is used to be able to do "ctdb reloadnodes" at runtime without causing nodes to be renumbered. To do this, instead of deleting a node from the nodes file, just comment it out like 1.0.0.1 #1.0.0.2 1.0.0.3 After removing 1.0.0.2 from the cluster, the remaining nodes retain their pnn's from prior to the deletion, namely 0 and 2 Any line in the nodes file that is commented out represents a DELETED pnn (This used to be ctdb commit 6a5e4fd7fa391206b463bb4e976502f3ac5bd343) 2009-06-01 08:18:34 +04:00

recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`/*`
			`handler for recovery master elections`
			`*/`
ctdb-daemon: Replace ctdb_message with srvid abstraction Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-04-08 07:38:26 +03:00			`static void election_handler(uint64_t srvid, TDB_DATA data, void *private_data)`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`{`
ctdb-daemon: Replace ctdb_message with srvid abstraction Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-04-08 07:38:26 +03:00			`struct ctdb_recoverd *rec = talloc_get_type(`
			`private_data, struct ctdb_recoverd);`
			`struct ctdb_context *ctdb = rec->ctdb;`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`int ret;`
			`struct election_message *em = (struct election_message *)data.dptr;`

ctdb-recoverd: A node refuses to play against itself Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Michael Adam <obnox@samba.org> 2013-11-01 07:34:20 +04:00			`/* Ignore election packets from ourself */`
			`if (ctdb->pnn == em->pnn) {`
			`return;`
			`}`

make election handling much more scalable (This used to be ctdb commit 05938d462b92bd9ecb8e35f53651bded47c48675) 2007-11-13 02:27:44 +03:00			`/* we got an election packet - update the timeout for the election */`
			`talloc_free(rec->election_timeout);`
ctdb-daemon: Stop using tevent compatibility definitions Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-26 08:50:09 +03:00			`rec->election_timeout = tevent_add_timer(`
			`ctdb->ev, ctdb,`
			`fast_start ?`
			`timeval_current_ofs(0, 500000) :`
			`timeval_current_ofs(ctdb->tunable.election_timeout, 0),`
			`ctdb_election_timeout, rec);`
make election handling much more scalable (This used to be ctdb commit 05938d462b92bd9ecb8e35f53651bded47c48675) 2007-11-13 02:27:44 +03:00
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`/* someone called an election. check their election data`
			`and if we disagree and we would rather be the elected node,`
			`send a new election message to all other nodes`
			`*/`
choose the most connected node first (This used to be ctdb commit c7c17a79fa4f28509e34b6f635fa62517dc458c2) 2007-06-07 13:17:27 +04:00			`if (ctdb_election_win(rec, em)) {`
make election handling much more scalable (This used to be ctdb commit 05938d462b92bd9ecb8e35f53651bded47c48675) 2007-11-13 02:27:44 +03:00			`if (!rec->send_election_te) {`
ctdb-daemon: Stop using tevent compatibility definitions Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-26 08:50:09 +03:00			`rec->send_election_te = tevent_add_timer(`
			`ctdb->ev, rec,`
			`timeval_current_ofs(0, 500000),`
			`election_send_request, rec);`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`}`
should be sufficient to unban nodes when we unbecome recmaster (This used to be ctdb commit 8a6c4e675b4b877a9d0a7a3701973573ff0b71e8) 2007-06-09 14:13:25 +04:00			`/*unban_all_nodes(ctdb);*/`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`return;`
			`}`
ctdb-recoverd: New function ctdb_recovery_have_lock() True if this recovery daemon holds the lock. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2014-12-09 05:50:22 +03:00
make election handling much more scalable (This used to be ctdb commit 05938d462b92bd9ecb8e35f53651bded47c48675) 2007-11-13 02:27:44 +03:00			`/* we didn't win */`
ctdb-recoverd: Simplify using TALLOC_FREE() Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-03-31 05:59:02 +03:00			`TALLOC_FREE(rec->send_election_te);`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00
ctdb-recoverd: Remove redundant condition when checking recovery lock It isn't possible to hold the recovery lock without having a lock file set. This is part of a goal to generalise the recovery lock mechanism to just use a helper program, which may use a lock file or may use something else. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-03-31 05:59:49 +03:00			`/* Release the recovery lock file */`
			`if (ctdb_recovery_have_lock(ctdb)) {`
			`ctdb_recovery_unlock(ctdb);`
			`unban_all_nodes(ctdb);`
- startup frozen, and do an initial recovery - fixed a bug in traverse - get a lock on the node list file in the recmaster recovery daemon (This used to be ctdb commit 162a5647535ad1cb3e8e5d4042a2784365fb1913) 2007-05-23 08:35:19 +04:00			`}`

ctdb-recoverd: Clear IP assignment tree on election loss If a node was previously recovery master (say, 20 years ago) and it becomes recovery master again then, if IP assignments have changed, verify_remote_ip_allocation() can produce messages like the following when called during recovery: ctdbd: recoverd:Inconsistent IP allocation - node 0 thinks 10.1.1.1 is held by node 0 while it is assigned to node 1 When a node loses an election it should clear all data specific to it being the recovery master. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-06-11 08:49:25 +03:00			`clear_ip_assignment_tree(ctdb);`

recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`/* ok, let that guy become recmaster then */`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`ret = ctdb_ctrl_setrecmaster(ctdb, CONTROL_TIMEOUT(), ctdb_get_pnn(ctdb), em->pnn);`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " failed to set recmaster to node %u\n", em->pnn));`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`return;`
			`}`

			`return;`
			`}`


implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`/*`
			`force the start of the election process`
			`*/`
add a new tunable : reclockpingperiod once every such interval : * the recovery master on each node will uppdate the "connected" count in the reclock count file (ctdb getreclock) * if the node thinks it is a recovery master but it detects another node that is DISCONNECTED but which still holds a lock to the reclock count file this may mean that we have a split cluster. if that other node that is DISCONNECTED but still holds the lock on hte reclock pnn count file, is MORE connected than the local node, yield the recmaster role and let the other half of the lcuster take over this add a second, last chance mechanism to detect split clusters. IF the cluster is split but GPFS is not yet split, this mechanism makes the largest half of the cluster become the active half. (This used to be ctdb commit 07af425f444531942cce8abff112c1524228d287) 2008-03-03 01:19:30 +03:00			`static void force_election(struct ctdb_recoverd *rec, uint32_t pnn,`
ctdb-daemon: Rename struct ctdb_node_map to ctdb_node_map_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:22:48 +03:00			`struct ctdb_node_map_old *nodemap)`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`{`
			`int ret;`
use a priority time for the election data, not just the vnn (This used to be ctdb commit a691f9c5cd77194005f0d98483da94b07a48d57d) 2007-06-07 12:37:27 +04:00			`struct ctdb_context *ctdb = rec->ctdb;`
when starting a new election, also force all nodes into recovery mode so there is no internode traffic to interfere with our election (This used to be ctdb commit ccfb67a076c72a0e7f2b6dc5fce9c19f652ba2ad) 2007-05-10 03:48:14 +04:00
When we create new election data to send during elections, we must re-read the node flags from the main daemon to catch when the STOPPED flag is changed. (This used to be ctdb commit ca4982c40d81db528fe915d5ecc01fcf7df0b522) 2009-07-17 05:37:03 +04:00			`DEBUG(DEBUG_INFO,(__location__ " Force an election\n"));`

when starting a new election, also force all nodes into recovery mode so there is no internode traffic to interfere with our election (This used to be ctdb commit ccfb67a076c72a0e7f2b6dc5fce9c19f652ba2ad) 2007-05-10 03:48:14 +04:00			`/* set all nodes to recovery mode to stop all internode traffic */`
ctdb-recoverd: Do not freeze databases for election If election occurs during SMB activity, then trying to freeze all the databases can cause samba/ctdb deadlock which parallel database recovery is trying to avoid. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-06 03:52:06 +03:00			`ret = set_recovery_mode(ctdb, rec, nodemap, CTDB_RECOVERY_ACTIVE, false);`
in the destructor for the lock-wait child, make sure that we cancel any pending transactions. (This used to be ctdb commit 45b6ff64f6ddf037b810c4e5f8b9f04d71067b98) 2008-07-07 02:50:12 +04:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to set recovery mode to active on cluster\n"));`
when starting a new election, also force all nodes into recovery mode so there is no internode traffic to interfere with our election (This used to be ctdb commit ccfb67a076c72a0e7f2b6dc5fce9c19f652ba2ad) 2007-05-10 03:48:14 +04:00			`return;`
			`}`
make election handling much more scalable (This used to be ctdb commit 05938d462b92bd9ecb8e35f53651bded47c48675) 2007-11-13 02:27:44 +03:00
			`talloc_free(rec->election_timeout);`
ctdb-daemon: Stop using tevent compatibility definitions Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-26 08:50:09 +03:00			`rec->election_timeout = tevent_add_timer(`
			`ctdb->ev, ctdb,`
			`fast_start ?`
			`timeval_current_ofs(0, 500000) :`
			`timeval_current_ofs(ctdb->tunable.election_timeout, 0),`
			`ctdb_election_timeout, rec);`
make election handling much more scalable (This used to be ctdb commit 05938d462b92bd9ecb8e35f53651bded47c48675) 2007-11-13 02:27:44 +03:00
Revert "if a new node enters the cluster, that node will already be frozen at start" This is unnecessary due to 03e2e436db5cfd29a56d13f5d2101e42389bfc94. Furthermore, if a node doesn't force an election but wins it then it can fail to record that it is the new recovery master. This can lead to a reverse split brain where there is no recovery master. This reverts commit c5035657606283d2e35bea40992505e84ca8e7be. Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> Conflicts: server/ctdb_recoverd.c (This used to be ctdb commit c8b542e059a54b8d524bd430cad9d82e5edd864d) 2013-10-29 09:38:42 +04:00			`ret = send_election_request(rec, pnn);`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`if (ret!=0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " failed to initiate recmaster election\n"));`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`return;`
			`}`

moved system specific ip code to system.c (This used to be ctdb commit 9de9e4ccda9665108baac12a8716b189d26340b1) 2007-05-26 08:01:08 +04:00			`/* wait for a few seconds to collect all responses */`
make election handling much more scalable (This used to be ctdb commit 05938d462b92bd9ecb8e35f53651bded47c48675) 2007-11-13 02:27:44 +03:00			`ctdb_wait_election(rec);`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`}`



			`/*`
			`handler for when a node changes its flags`
			`*/`
ctdb-daemon: Replace ctdb_message with srvid abstraction Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-04-08 07:38:26 +03:00			`static void monitor_handler(uint64_t srvid, TDB_DATA data, void *private_data)`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`{`
ctdb-daemon: Replace ctdb_message with srvid abstraction Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-04-08 07:38:26 +03:00			`struct ctdb_recoverd *rec = talloc_get_type(`
			`private_data, struct ctdb_recoverd);`
			`struct ctdb_context *ctdb = rec->ctdb;`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`int ret;`
			`struct ctdb_node_flag_change *c = (struct ctdb_node_flag_change *)data.dptr;`
ctdb-daemon: Rename struct ctdb_node_map to ctdb_node_map_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:22:48 +03:00			`struct ctdb_node_map_old *nodemap=NULL;`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`TALLOC_CTX *tmp_ctx;`
			`int i;`
verify the DISABLED flag and compare with the previous flag we have registered for that node and not what the node says is the difference. this prevents a situation where the remove node may cause spurious ip reallocations. (This used to be ctdb commit dd122351efaeef5475cdec111eb900110d83ec35) 2009-10-10 06:55:11 +04:00			`int disabled_flag_changed;`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00
			`if (data.dsize != sizeof(*c)) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,(__location__ " Invalid data in ctdb_node_flag_change\n"));`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`return;`
			`}`

			`tmp_ctx = talloc_new(ctdb);`
			`CTDB_NO_MEMORY_VOID(ctdb, tmp_ctx);`

			`ret = ctdb_ctrl_getnodemap(ctdb, CONTROL_TIMEOUT(), CTDB_CURRENT_NODE, tmp_ctx, &nodemap);`
fixed segv on failed ctdb_ctrl_getnodemap (This used to be ctdb commit 5daf9a72f0e60a9af7cf32ae6d759be7d94857ec) 2007-12-27 02:07:01 +03:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,(__location__ " ctdb_ctrl_getnodemap failed in monitor_handler\n"));`
fixed segv on failed ctdb_ctrl_getnodemap (This used to be ctdb commit 5daf9a72f0e60a9af7cf32ae6d759be7d94857ec) 2007-12-27 02:07:01 +03:00			`talloc_free(tmp_ctx);`
			`return;`
			`}`

implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00
			`for (i=0;i<nodemap->num;i++) {`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`if (nodemap->nodes[i].pnn == c->pnn) break;`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`}`

			`if (i == nodemap->num) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_CRIT,(__location__ "Flag change for non-existent node %u\n", c->pnn));`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`talloc_free(tmp_ctx);`
			`return;`
			`}`

recoverd: Really fix bogus info in message about changed flags Commit 9119a568c2b4601318f7751f537dca2f92a7230b attempted to fix this. However, this was wrong because old_flags and new_flags were confused. The latter has since been fixed in commit 7eb2f89979360b6cc98ca9b17c48310277fa89fc so this can now be fixed properly. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 40f2825d6e818dc8c745b6385a545969dfb45fbc) 2013-07-11 07:01:13 +04:00			`if (c->old_flags != c->new_flags) {`
			`DEBUG(DEBUG_NOTICE,("Node %u has changed flags - now 0x%x was 0x%x\n", c->pnn, c->new_flags, c->old_flags));`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`}`

verify the DISABLED flag and compare with the previous flag we have registered for that node and not what the node says is the difference. this prevents a situation where the remove node may cause spurious ip reallocations. (This used to be ctdb commit dd122351efaeef5475cdec111eb900110d83ec35) 2009-10-10 06:55:11 +04:00			`disabled_flag_changed = (nodemap->nodes[i].flags ^ c->new_flags) & NODE_FLAGS_DISABLED;`

change the structure used for node flag change messages so that we can see both the old flags as well as the new flags (so we can tell which flags changed) send the CTDB_SRVID_RECONFIGURE messages to connected nodes only, not to every node, connected or not, in the cluster. in the handler inside the recovery daemon which is invoked for node flag change messages, only do a takeover_run() and redistribute the ip addresses IF it was the disabled or the unhealthy flags that changed. Also send out the cluster reconfigured message in this case. If any of the other flags changed we dont need to do the takeover_run(0 here since that will be done during recovery. (This used to be ctdb commit 5549b2058e2c148a8ca9d419123acf3247bb8829) 2007-08-21 11:25:15 +04:00			`nodemap->nodes[i].flags = c->new_flags;`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`ret = ctdb_ctrl_getrecmaster(ctdb, tmp_ctx, CONTROL_TIMEOUT(),`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`CTDB_CURRENT_NODE, &ctdb->recovery_master);`

			`if (ret == 0) {`
hang the ctdb_req_control structure off the ctdb_client_control_state struct so that if we timeout a control we can print debug info such as what opcode failed and to which node we dont need the *status parameter to ctdb_client_control_state create async versions of the getrecmaster control pass a memory context to getrecmaster (This used to be ctdb commit 558b680c82f830fba82c283c78c2de8a0b150b75) 2007-08-23 07:00:10 +04:00			`ret = ctdb_ctrl_getrecmode(ctdb, tmp_ctx, CONTROL_TIMEOUT(),`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`CTDB_CURRENT_NODE, &ctdb->recovery_mode);`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`}`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00
			`if (ret == 0 &&`
change ctdb->vnn to ctdb->pnn (This used to be ctdb commit 8c776e5707e503ec6586aae39ac6b3ea5a2fd2bc) 2007-09-04 04:06:36 +04:00			`ctdb->recovery_master == ctdb->pnn &&`
remove some unnessecary tests if ->vnn is null or not (This used to be ctdb commit f0169ac8166a19d65ce254496e21d095aed87c2f) 2008-05-15 07:28:19 +04:00			`ctdb->recovery_mode == CTDB_RECOVERY_NORMAL) {`
change the structure used for node flag change messages so that we can see both the old flags as well as the new flags (so we can tell which flags changed) send the CTDB_SRVID_RECONFIGURE messages to connected nodes only, not to every node, connected or not, in the cluster. in the handler inside the recovery daemon which is invoked for node flag change messages, only do a takeover_run() and redistribute the ip addresses IF it was the disabled or the unhealthy flags that changed. Also send out the cluster reconfigured message in this case. If any of the other flags changed we dont need to do the takeover_run(0 here since that will be done during recovery. (This used to be ctdb commit 5549b2058e2c148a8ca9d419123acf3247bb8829) 2007-08-21 11:25:15 +04:00			`/* Only do the takeover run if the perm disabled or unhealthy`
			`flags changed, since these cause an IP failover but not`
			`a recovery.`
			`If the node became disconnected or banned, this also`
			`leads to an IP address failover, but that is handled`
			`during recovery.`
			`*/`
verify the DISABLED flag and compare with the previous flag we have registered for that node and not what the node says is the difference. this prevents a situation where the remove node may cause spurious ip reallocations. (This used to be ctdb commit dd122351efaeef5475cdec111eb900110d83ec35) 2009-10-10 06:55:11 +04:00			`if (disabled_flag_changed) {`
prevent recursion in the calling of ctdb_takeover_run (This used to be ctdb commit 0fbdeb7c91b965d9bc5ecc7b24e31070378d8f1d) 2007-09-13 08:08:18 +04:00			`rec->need_takeover_run = true;`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`}`
			`}`

			`talloc_free(tmp_ctx);`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`}`
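The `disabled_flag_changed` test above compares the flags this daemon last recorded for the node with the new flags, not with the `old_flags` the sender reports; per the 2009 commit annotation, this prevents a remote node from causing spurious IP reallocations. XOR isolates every bit that differs and the mask keeps only the bits that should trigger a takeover run. The sketch below is illustrative only: the `EX_*` names and bit values are assumptions (check ctdb's protocol header for the real definitions), and it assumes, as the code comment suggests, that "disabled" means unhealthy or permanently disabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values (assumption); see ctdb's protocol header for
   the real definitions. */
#define EX_FLAG_DISCONNECTED         0x01u
#define EX_FLAG_UNHEALTHY            0x02u
#define EX_FLAG_PERMANENTLY_DISABLED 0x04u
#define EX_FLAG_BANNED               0x08u

/* Assumed: "disabled" covers both unhealthy and permanently disabled. */
#define EX_FLAG_DISABLED (EX_FLAG_UNHEALTHY | EX_FLAG_PERMANENTLY_DISABLED)

/*
 * Hypothetical helper: recorded_flags is what this daemon last stored for
 * the node, new_flags is what the flag-change message carries. XOR yields
 * every bit that differs; the mask keeps only the bits that should trigger
 * a takeover run. Disconnect/ban changes are left to the recovery path.
 */
static bool ex_disabled_flag_changed(uint32_t recorded_flags, uint32_t new_flags)
{
	return ((recorded_flags ^ new_flags) & EX_FLAG_DISABLED) != 0;
}
```

With these assumed values, a node going from healthy (0) to unhealthy returns true, while a node that only becomes banned returns false and its IP failover is left to recovery.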
add a test in the function that checks whether the cluster needs recovery or not that all active nodes are in normal mode. If we discover that some node is still in recoverymode it may indicate that a previous recovery ended prematurely and thus we should start a new recovery (This used to be ctdb commit c15517872e6c98c8c425a8d47d2b348ecb0620b0) 2007-05-06 22:41:12 +04:00
reqrite the handling of flag updates across the cluster to eliminate a race between the ctdb tool and the recovery daemon both at once trying to push flag changes across the cluster. (This used to be ctdb commit a9a1156ea4e10483a4bf4265b8e9203f0af033aa) 2008-11-19 06:43:46 +03:00			`/*`
			`handler for when we need to push out flag changes to all other nodes`
			`*/`
ctdb-daemon: Replace ctdb_message with srvid abstraction Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-04-08 07:38:26 +03:00			`static void push_flags_handler(uint64_t srvid, TDB_DATA data,`
			`void *private_data)`
reqrite the handling of flag updates across the cluster to eliminate a race between the ctdb tool and the recovery daemon both at once trying to push flag changes across the cluster. (This used to be ctdb commit a9a1156ea4e10483a4bf4265b8e9203f0af033aa) 2008-11-19 06:43:46 +03:00			`{`
ctdb-daemon: Replace ctdb_message with srvid abstraction Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-04-08 07:38:26 +03:00			`struct ctdb_recoverd *rec = talloc_get_type(`
			`private_data, struct ctdb_recoverd);`
			`struct ctdb_context *ctdb = rec->ctdb;`
reqrite the handling of flag updates across the cluster to eliminate a race between the ctdb tool and the recovery daemon both at once trying to push flag changes across the cluster. (This used to be ctdb commit a9a1156ea4e10483a4bf4265b8e9203f0af033aa) 2008-11-19 06:43:46 +03:00			`int ret;`
			`struct ctdb_node_flag_change *c = (struct ctdb_node_flag_change *)data.dptr;`
ctdb-daemon: Rename struct ctdb_node_map to ctdb_node_map_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:22:48 +03:00			`struct ctdb_node_map_old *nodemap=NULL;`
server: if takeover runs when the recovery master becomes unhealthy The problem was this: When the monitor event fails, the node->flags get updated, and an update (containing the old and new flags) is sent to the recovery master. If the recovery master sends the update to itself (the same process), it was compairing the node->flags variable with the received new flags. This check always found both flag values to be equal and never sets the rec->need_takeover_run variable to true. There were two problem, first the push_flags_handler() function didn't pass the received old flags. And the ctdb_control_modflags() function ignored the received old flags. metze (This used to be ctdb commit 8ec633b64a05a2d903c2b9639909f15f6375548f) 2009-10-09 17:47:49 +04:00			`TALLOC_CTX *tmp_ctx = talloc_new(ctdb);`
			`uint32_t recmaster;`
			`uint32_t *nodes;`
reqrite the handling of flag updates across the cluster to eliminate a race between the ctdb tool and the recovery daemon both at once trying to push flag changes across the cluster. (This used to be ctdb commit a9a1156ea4e10483a4bf4265b8e9203f0af033aa) 2008-11-19 06:43:46 +03:00
server: if takeover runs when the recovery master becomes unhealthy The problem was this: When the monitor event fails, the node->flags get updated, and an update (containing the old and new flags) is sent to the recovery master. If the recovery master sends the update to itself (the same process), it was compairing the node->flags variable with the received new flags. This check always found both flag values to be equal and never sets the rec->need_takeover_run variable to true. There were two problem, first the push_flags_handler() function didn't pass the received old flags. And the ctdb_control_modflags() function ignored the received old flags. metze (This used to be ctdb commit 8ec633b64a05a2d903c2b9639909f15f6375548f) 2009-10-09 17:47:49 +04:00			`/* find the recovery master */`
			`ret = ctdb_ctrl_getrecmaster(ctdb, tmp_ctx, CONTROL_TIMEOUT(), CTDB_CURRENT_NODE, &recmaster);`
reqrite the handling of flag updates across the cluster to eliminate a race between the ctdb tool and the recovery daemon both at once trying to push flag changes across the cluster. (This used to be ctdb commit a9a1156ea4e10483a4bf4265b8e9203f0af033aa) 2008-11-19 06:43:46 +03:00			`if (ret != 0) {`
server: if takeover runs when the recovery master becomes unhealthy The problem was this: When the monitor event fails, the node->flags get updated, and an update (containing the old and new flags) is sent to the recovery master. If the recovery master sends the update to itself (the same process), it was compairing the node->flags variable with the received new flags. This check always found both flag values to be equal and never sets the rec->need_takeover_run variable to true. There were two problem, first the push_flags_handler() function didn't pass the received old flags. And the ctdb_control_modflags() function ignored the received old flags. metze (This used to be ctdb commit 8ec633b64a05a2d903c2b9639909f15f6375548f) 2009-10-09 17:47:49 +04:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to get recmaster from local node\n"));`
			`talloc_free(tmp_ctx);`
			`return;`
			`}`

			`/* read the node flags from the recmaster */`
			`ret = ctdb_ctrl_getnodemap(ctdb, CONTROL_TIMEOUT(), recmaster, tmp_ctx, &nodemap);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR, (__location__ " Unable to get nodemap from node %u\n", recmaster));`
			`talloc_free(tmp_ctx);`
			`return;`
reqrite the handling of flag updates across the cluster to eliminate a race between the ctdb tool and the recovery daemon both at once trying to push flag changes across the cluster. (This used to be ctdb commit a9a1156ea4e10483a4bf4265b8e9203f0af033aa) 2008-11-19 06:43:46 +03:00			`}`
server: if takeover runs when the recovery master becomes unhealthy The problem was this: When the monitor event fails, the node->flags get updated, and an update (containing the old and new flags) is sent to the recovery master. If the recovery master sends the update to itself (the same process), it was compairing the node->flags variable with the received new flags. This check always found both flag values to be equal and never sets the rec->need_takeover_run variable to true. There were two problem, first the push_flags_handler() function didn't pass the received old flags. And the ctdb_control_modflags() function ignored the received old flags. metze (This used to be ctdb commit 8ec633b64a05a2d903c2b9639909f15f6375548f) 2009-10-09 17:47:49 +04:00			`if (c->pnn >= nodemap->num) {`
			`DEBUG(DEBUG_ERR,(__location__ " Nodemap from recmaster does not contain node %u\n", c->pnn));`
			`talloc_free(tmp_ctx);`
			`return;`
			`}`

			`/* send the flags update to all connected nodes */`
			`nodes = list_of_connected_nodes(ctdb, nodemap, tmp_ctx, true);`

			`if (ctdb_client_async_control(ctdb, CTDB_CONTROL_MODIFY_FLAGS,`
			`nodes, 0, CONTROL_TIMEOUT(),`
			`false, data,`
			`NULL, NULL,`
			`NULL) != 0) {`
			`DEBUG(DEBUG_ERR, (__location__ " ctdb_control to modify node flags failed\n"));`

			`talloc_free(tmp_ctx);`
			`return;`
			`}`

			`talloc_free(tmp_ctx);`
reqrite the handling of flag updates across the cluster to eliminate a race between the ctdb tool and the recovery daemon both at once trying to push flag changes across the cluster. (This used to be ctdb commit a9a1156ea4e10483a4bf4265b8e9203f0af033aa) 2008-11-19 06:43:46 +03:00			`}`
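push_flags_handler() rereads the nodemap from the recovery master and then sends CTDB_CONTROL_MODIFY_FLAGS to the nodes returned by list_of_connected_nodes(). As a rough standalone sketch of that target-selection step, assuming "connected" simply means the DISCONNECTED bit is clear and that the final `true` argument asks for the local node to be included, a filter might look like this. The `exc_` names and the flag value are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define EXC_FLAG_DISCONNECTED 0x01u /* illustrative value (assumption) */

/* Minimal stand-in for one nodemap entry (hypothetical). */
struct exc_node {
	uint32_t pnn;
	uint32_t flags;
};

/*
 * Hypothetical analogue of list_of_connected_nodes(): returns a malloc'd
 * array of the PNNs whose DISCONNECTED bit is clear, skipping my_pnn unless
 * include_self is set. *count receives the number of entries; the caller
 * frees the array. Returns NULL on allocation failure.
 */
static uint32_t *exc_connected_pnns(const struct exc_node *nodes, size_t num,
				    uint32_t my_pnn, bool include_self,
				    size_t *count)
{
	/* Allocate at least one slot so a zero-length map is still valid. */
	uint32_t *out = malloc((num > 0 ? num : 1) * sizeof(*out));
	size_t n = 0;

	*count = 0;
	if (out == NULL) {
		return NULL;
	}
	for (size_t i = 0; i < num; i++) {
		if (nodes[i].flags & EXC_FLAG_DISCONNECTED) {
			continue; /* skip nodes marked disconnected */
		}
		if (!include_self && nodes[i].pnn == my_pnn) {
			continue; /* optionally skip the local node */
		}
		out[n++] = nodes[i].pnn;
	}
	*count = n;
	return out;
}
```

The real helper allocates with talloc on the supplied memory context; plain malloc() is used here only to keep the sketch self-contained.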
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00
break checking that the recoverymode on all nodes are ok out into its own function (This used to be ctdb commit 813cf9a252af96da24122b80f24aabeed2911939) 2007-08-23 07:48:39 +04:00
change the monitoring of recmode in the recovery daemon to use a fully async eventdriven api for controls (This used to be ctdb commit 8d0e43428c507967a0d96e6a4c5c821ac269c546) 2007-08-27 03:40:10 +04:00			`struct verify_recmode_normal_data {`
			`uint32_t count;`
			`enum monitor_result status;`
			`};`

			`static void verify_recmode_normal_callback(struct ctdb_client_control_state *state)`
			`{`
change async.private to async.private_data since private is a reserved work in c++ (This used to be ctdb commit 79eb28f6cd5dcc30b04966d202a050eaf98a2552) 2007-09-26 08:25:32 +04:00			`struct verify_recmode_normal_data *rmdata = talloc_get_type(state->async.private_data, struct verify_recmode_normal_data);`
change the monitoring of recmode in the recovery daemon to use a fully async eventdriven api for controls (This used to be ctdb commit 8d0e43428c507967a0d96e6a4c5c821ac269c546) 2007-08-27 03:40:10 +04:00

			`/* one more node has responded with recmode data */`
			`rmdata->count--;`

			`/* if we failed to get the recmode, then return an error and let`
			`the main loop try again.`
			`*/`
			`if (state->state != CTDB_CONTROL_DONE) {`
			`if (rmdata->status == MONITOR_OK) {`
			`rmdata->status = MONITOR_FAILED;`
			`}`
			`return;`
			`}`

			`/* if we got a response, then the recmode will be stored in the`
			`status field`
			`*/`
			`if (state->status != CTDB_RECOVERY_NORMAL) {`
recoverd: Fix an unclear log message - "Restart recovery process" When the recovery master notices a node in recovery mode it starts the recovery process, it doesn't restart it. Update documentation to match. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 298c4d2c3b4ea3d900c91f5a0a5aca2952a13d61) 2013-06-30 11:57:33 +04:00			`DEBUG(DEBUG_NOTICE, ("Node:%u was in recovery mode. Start recovery process\n", state->c->hdr.destnode));`
change the monitoring of recmode in the recovery daemon to use a fully async eventdriven api for controls (This used to be ctdb commit 8d0e43428c507967a0d96e6a4c5c821ac269c546) 2007-08-27 03:40:10 +04:00			`rmdata->status = MONITOR_RECOVERY_NEEDED;`
			`}`

			`return;`
			`}`
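verify_recmode() sends one asynchronous getrecmode control per active node, increments `count` for each one sent, and runs the event loop until every callback has decremented it. The subtle part is the callback's precedence rule: a control that did not complete (the commit annotations note the callback also fires on timeout) only downgrades an OK verdict to FAILED, while any node reporting a non-normal recovery mode forces RECOVERY_NEEDED regardless of arrival order. The sketch below replays that logic over an array of simulated replies instead of a real tevent loop. The enum, `RM_RECMODE_NORMAL = 0`, and all `rm_`-prefixed names are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the monitor_result values used here. */
enum rm_result {
	RM_OK,
	RM_FAILED,
	RM_RECOVERY_NEEDED,
};

#define RM_RECMODE_NORMAL 0u /* assumed value of CTDB_RECOVERY_NORMAL */

/* Mirrors struct verify_recmode_normal_data. */
struct rm_data {
	uint32_t count;        /* controls still outstanding */
	enum rm_result status; /* aggregated verdict */
};

/* One simulated reply: did the control complete, and with which recmode? */
struct rm_reply {
	bool done;
	uint32_t recmode;
};

/* Same decisions as verify_recmode_normal_callback(). */
static void rm_callback(struct rm_data *rmdata, const struct rm_reply *reply)
{
	rmdata->count--;

	if (!reply->done) {
		/* A failure only downgrades OK; it never hides a
		   RECOVERY_NEEDED recorded by an earlier reply. */
		if (rmdata->status == RM_OK) {
			rmdata->status = RM_FAILED;
		}
		return;
	}
	if (reply->recmode != RM_RECMODE_NORMAL) {
		rmdata->status = RM_RECOVERY_NEEDED;
	}
}

/* Stand-in for the send loop plus the tevent_loop_once() wait:
   one outstanding control per reply, drained in arrival order. */
static enum rm_result rm_aggregate(const struct rm_reply *replies, uint32_t n)
{
	struct rm_data rmdata = { .count = n, .status = RM_OK };

	for (uint32_t i = 0; i < n && rmdata.count > 0; i++) {
		rm_callback(&rmdata, &replies[i]);
	}
	return rmdata.status;
}
```

Because RECOVERY_NEEDED is never overwritten by a later failure, the result for "one timeout plus one node in recovery" is the same whichever reply arrives first.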


			`/* verify that all nodes are in normal recovery mode */`
ctdb-daemon: Rename struct ctdb_node_map to ctdb_node_map_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:22:48 +03:00			`static enum monitor_result verify_recmode(struct ctdb_context *ctdb, struct ctdb_node_map_old *nodemap)`
break checking that the recoverymode on all nodes are ok out into its own function (This used to be ctdb commit 813cf9a252af96da24122b80f24aabeed2911939) 2007-08-23 07:48:39 +04:00			`{`
change the monitoring of recmode in the recovery daemon to use a fully async eventdriven api for controls (This used to be ctdb commit 8d0e43428c507967a0d96e6a4c5c821ac269c546) 2007-08-27 03:40:10 +04:00			`struct verify_recmode_normal_data *rmdata;`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`TALLOC_CTX *mem_ctx = talloc_new(ctdb);`
change the monitoring of recmode in the recovery daemon to use a fully async eventdriven api for controls (This used to be ctdb commit 8d0e43428c507967a0d96e6a4c5c821ac269c546) 2007-08-27 03:40:10 +04:00			`struct ctdb_client_control_state *state;`
			`enum monitor_result status;`
			`int j;`
break checking that the recoverymode on all nodes are ok out into its own function (This used to be ctdb commit 813cf9a252af96da24122b80f24aabeed2911939) 2007-08-23 07:48:39 +04:00
change the monitoring of recmode in the recovery daemon to use a fully async eventdriven api for controls (This used to be ctdb commit 8d0e43428c507967a0d96e6a4c5c821ac269c546) 2007-08-27 03:40:10 +04:00			`rmdata = talloc(mem_ctx, struct verify_recmode_normal_data);`
			`CTDB_NO_MEMORY_FATAL(ctdb, rmdata);`
			`rmdata->count = 0;`
			`rmdata->status = MONITOR_OK;`
break checking that the recoverymode on all nodes are ok out into its own function (This used to be ctdb commit 813cf9a252af96da24122b80f24aabeed2911939) 2007-08-23 07:48:39 +04:00
			`/* loop over all active nodes and send an async getrecmode call to`
			`them */`
			`for (j=0; j<nodemap->num; j++) {`
			`if (nodemap->nodes[j].flags & NODE_FLAGS_INACTIVE) {`
			`continue;`
			`}`
change the monitoring of recmode in the recovery daemon to use a fully async eventdriven api for controls (This used to be ctdb commit 8d0e43428c507967a0d96e6a4c5c821ac269c546) 2007-08-27 03:40:10 +04:00			`state = ctdb_ctrl_getrecmode_send(ctdb, mem_ctx,`
break checking that the recoverymode on all nodes are ok out into its own function (This used to be ctdb commit 813cf9a252af96da24122b80f24aabeed2911939) 2007-08-23 07:48:39 +04:00			`CONTROL_TIMEOUT(),`
change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`nodemap->nodes[j].pnn);`
change the monitoring of recmode in the recovery daemon to use a fully async eventdriven api for controls (This used to be ctdb commit 8d0e43428c507967a0d96e6a4c5c821ac269c546) 2007-08-27 03:40:10 +04:00			`if (state == NULL) {`
			`/* we failed to send the control, treat this as`
			`an error and try again next iteration`
			`*/`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,("Failed to call ctdb_ctrl_getrecmode_send during monitoring\n"));`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`talloc_free(mem_ctx);`
break checking that the recoverymode on all nodes are ok out into its own function (This used to be ctdb commit 813cf9a252af96da24122b80f24aabeed2911939) 2007-08-23 07:48:39 +04:00			`return MONITOR_FAILED;`
			`}`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00
change the monitoring of recmode in the recovery daemon to use a fully async eventdriven api for controls (This used to be ctdb commit 8d0e43428c507967a0d96e6a4c5c821ac269c546) 2007-08-27 03:40:10 +04:00			`/* set up the callback functions */`
			`state->async.fn = verify_recmode_normal_callback;`
change async.private to async.private_data since private is a reserved work in c++ (This used to be ctdb commit 79eb28f6cd5dcc30b04966d202a050eaf98a2552) 2007-09-26 08:25:32 +04:00			`state->async.private_data = rmdata;`
change the monitoring of recmode in the recovery daemon to use a fully async eventdriven api for controls (This used to be ctdb commit 8d0e43428c507967a0d96e6a4c5c821ac269c546) 2007-08-27 03:40:10 +04:00
			`/* one more control to wait for to complete */`
			`rmdata->count++;`
break checking that the recoverymode on all nodes are ok out into its own function (This used to be ctdb commit 813cf9a252af96da24122b80f24aabeed2911939) 2007-08-23 07:48:39 +04:00			`}`

change the monitoring of recmode in the recovery daemon to use a fully async eventdriven api for controls (This used to be ctdb commit 8d0e43428c507967a0d96e6a4c5c821ac269c546) 2007-08-27 03:40:10 +04:00
			`/* now wait for up to the maximum number of seconds allowed`
			`or until all nodes we expect a response from have replied`
			`*/`
			`while (rmdata->count > 0) {`
ctdb-daemon: Stop using tevent compatibility definitions Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-26 08:50:09 +03:00			`tevent_loop_once(ctdb->ev);`
change the monitoring of recmode in the recovery daemon to use a fully async eventdriven api for controls (This used to be ctdb commit 8d0e43428c507967a0d96e6a4c5c821ac269c546) 2007-08-27 03:40:10 +04:00			`}`

			`status = rmdata->status;`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`talloc_free(mem_ctx);`
change the monitoring of recmode in the recovery daemon to use a fully async eventdriven api for controls (This used to be ctdb commit 8d0e43428c507967a0d96e6a4c5c821ac269c546) 2007-08-27 03:40:10 +04:00			`return status;`
break checking that the recoverymode on all nodes are ok out into its own function (This used to be ctdb commit 813cf9a252af96da24122b80f24aabeed2911939) 2007-08-23 07:48:39 +04:00			`}`

change the monitoring of recmode in the recovery daemon to use a fully async eventdriven api for controls (This used to be ctdb commit 8d0e43428c507967a0d96e6a4c5c821ac269c546) 2007-08-27 03:40:10 +04:00
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`struct verify_recmaster_data {`
when a node disgrees with us re who is recmaster make it mark that node as a lcuprit so it eventually gets banned (This used to be ctdb commit eff3f326f8ce6070c9f3c430cd14d1b71a8db220) 2008-04-21 18:56:27 +04:00			`struct ctdb_recoverd *rec;`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`uint32_t count;`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`uint32_t pnn;`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`enum monitor_result status;`
			`};`

change the api for managing callbacks to controls so that isntead of passing it as a parameter we set the callback function explicitely from the caller if the ..._send() function returned a valid state pointer. (This used to be ctdb commit aa939570662786455f63299b62c99882cff29d42) 2007-08-24 04:42:06 +04:00			`static void verify_recmaster_callback(struct ctdb_client_control_state *state)`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`{`
change async.private to async.private_data since private is a reserved work in c++ (This used to be ctdb commit 79eb28f6cd5dcc30b04966d202a050eaf98a2552) 2007-09-26 08:25:32 +04:00			`struct verify_recmaster_data *rmdata = talloc_get_type(state->async.private_data, struct verify_recmaster_data);`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00

			`/* one more node has responded with recmaster data */`
			`rmdata->count--;`

			`/* if we failed to get the recmaster, then return an error and let`
			`the main loop try again.`
			`*/`
change the api for managing callbacks to controls so that isntead of passing it as a parameter we set the callback function explicitely from the caller if the ..._send() function returned a valid state pointer. (This used to be ctdb commit aa939570662786455f63299b62c99882cff29d42) 2007-08-24 04:42:06 +04:00			`if (state->state != CTDB_CONTROL_DONE) {`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`if (rmdata->status == MONITOR_OK) {`
			`rmdata->status = MONITOR_FAILED;`
			`}`
change the api for managing callbacks to controls so that isntead of passing it as a parameter we set the callback function explicitely from the caller if the ..._send() function returned a valid state pointer. (This used to be ctdb commit aa939570662786455f63299b62c99882cff29d42) 2007-08-24 04:42:06 +04:00			`return;`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`}`

			`/* if we got a response, then the recmaster will be stored in the`
			`status field`
			`*/`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`if (state->status != rmdata->pnn) {`
recoverd: Improve log message when nodes disagree on recmaster Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 7b7aa7b599536cd60ebb84d363607bb4e953248a) 2013-08-14 05:44:12 +04:00			`DEBUG(DEBUG_ERR,("Node %d thinks node %d is recmaster. Need a new recmaster election\n", state->c->hdr.destnode, state->status));`
when a node disgrees with us re who is recmaster make it mark that node as a lcuprit so it eventually gets banned (This used to be ctdb commit eff3f326f8ce6070c9f3c430cd14d1b71a8db220) 2008-04-21 18:56:27 +04:00			`ctdb_set_culprit(rmdata->rec, state->c->hdr.destnode);`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`rmdata->status = MONITOR_ELECTION_NEEDED;`
			`}`

change the api for managing callbacks to controls so that isntead of passing it as a parameter we set the callback function explicitely from the caller if the ..._send() function returned a valid state pointer. (This used to be ctdb commit aa939570662786455f63299b62c99882cff29d42) 2007-08-24 04:42:06 +04:00			`return;`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`}`


			`/* verify that all nodes agree that we are the recmaster */`
ctdb-daemon: Rename struct ctdb_node_map to ctdb_node_map_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:22:48 +03:00			`static enum monitor_result verify_recmaster(struct ctdb_recoverd *rec, struct ctdb_node_map_old *nodemap, uint32_t pnn)`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`{`
when a node disgrees with us re who is recmaster make it mark that node as a lcuprit so it eventually gets banned (This used to be ctdb commit eff3f326f8ce6070c9f3c430cd14d1b71a8db220) 2008-04-21 18:56:27 +04:00			`struct ctdb_context *ctdb = rec->ctdb;`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`struct verify_recmaster_data *rmdata;`
			`TALLOC_CTX *mem_ctx = talloc_new(ctdb);`
			`struct ctdb_client_control_state *state;`
			`enum monitor_result status;`
			`int j;`

			`rmdata = talloc(mem_ctx, struct verify_recmaster_data);`
			`CTDB_NO_MEMORY_FATAL(ctdb, rmdata);`
when a node disgrees with us re who is recmaster make it mark that node as a lcuprit so it eventually gets banned (This used to be ctdb commit eff3f326f8ce6070c9f3c430cd14d1b71a8db220) 2008-04-21 18:56:27 +04:00			`rmdata->rec = rec;`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`rmdata->count = 0;`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`rmdata->pnn = pnn;`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`rmdata->status = MONITOR_OK;`

			`/* loop over all active nodes and send an async getrecmaster call to`
			`them */`
			`for (j=0; j<nodemap->num; j++) {`
			`if (nodemap->nodes[j].flags & NODE_FLAGS_INACTIVE) {`
			`continue;`
			`}`
			`state = ctdb_ctrl_getrecmaster_send(ctdb, mem_ctx,`
get rid of the explicit global timeout used in the previous example and try this time by relying on the timeouts for the individual controls (This used to be ctdb commit 448a0eb4fd896dc545aa0b4bb2ba4628491578be) 2007-08-23 13:38:54 +04:00			`CONTROL_TIMEOUT(),`
change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`nodemap->nodes[j].pnn);`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`if (state == NULL) {`
			`/* we failed to send the control, treat this as`
			`an error and try again next iteration`
			`*/`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,("Failed to call ctdb_ctrl_getrecmaster_send during monitoring\n"));`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`talloc_free(mem_ctx);`
			`return MONITOR_FAILED;`
			`}`

change the api for managing callbacks to controls so that isntead of passing it as a parameter we set the callback function explicitely from the caller if the ..._send() function returned a valid state pointer. (This used to be ctdb commit aa939570662786455f63299b62c99882cff29d42) 2007-08-24 04:42:06 +04:00			`/* set up the callback functions */`
			`state->async.fn = verify_recmaster_callback;`
change async.private to async.private_data since private is a reserved work in c++ (This used to be ctdb commit 79eb28f6cd5dcc30b04966d202a050eaf98a2552) 2007-09-26 08:25:32 +04:00			`state->async.private_data = rmdata;`
change the api for managing callbacks to controls so that isntead of passing it as a parameter we set the callback function explicitely from the caller if the ..._send() function returned a valid state pointer. (This used to be ctdb commit aa939570662786455f63299b62c99882cff29d42) 2007-08-24 04:42:06 +04:00
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`/* one more control to wait for to complete */`
			`rmdata->count++;`
			`}`


			`/* now wait for up to the maximum number of seconds allowed`
			`or until all nodes we expect a response from have replied`
			`*/`
get rid of the explicit global timeout used in the previous example and try this time by relying on the timeouts for the individual controls (This used to be ctdb commit 448a0eb4fd896dc545aa0b4bb2ba4628491578be) 2007-08-23 13:38:54 +04:00			`while (rmdata->count > 0) {`
ctdb-daemon: Stop using tevent compatibility definitions Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-26 08:50:09 +03:00			`tevent_loop_once(ctdb->ev);`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`}`

			`status = rmdata->status;`
			`talloc_free(mem_ctx);`
			`return status;`
			`}`
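The callback and the `while (rmdata->count > 0)` loop above form a fan-out/count-down pattern: one control is sent per active node, each completion decrements a shared counter and may downgrade a shared status, and the caller pumps the event loop until the counter reaches zero. The sketch below reproduces that pattern in plain C under stated assumptions: `fanout_state`, `fake_reply`, `on_reply` and `verify_all` are hypothetical names, and replies are delivered from an array instead of by `tevent_loop_once()`, so no real CTDB or tevent API is involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes, loosely modeled on the MONITOR_* values. */
enum check_result { CHECK_OK, CHECK_FAILED, CHECK_ELECTION_NEEDED };

/* Shared state for one round of fan-out queries (cf. verify_recmaster_data). */
struct fanout_state {
	int pending;              /* replies still outstanding */
	enum check_result status; /* aggregated result so far */
	uint32_t expected;        /* the recmaster we believe in */
};

/* One simulated reply: did the control succeed, and what did the node report? */
struct fake_reply {
	int ok;
	uint32_t reported;
};

/* Completion callback: runs once per reply, whether it succeeded or not. */
static void on_reply(struct fanout_state *st, const struct fake_reply *r)
{
	st->pending--; /* one fewer reply outstanding */
	if (!r->ok) {
		/* failure or timeout: only downgrade a clean result */
		if (st->status == CHECK_OK) {
			st->status = CHECK_FAILED;
		}
		return;
	}
	if (r->reported != st->expected) {
		/* this node believes someone else is recmaster */
		st->status = CHECK_ELECTION_NEEDED;
	}
}

/* "Send" one query per node, then deliver replies one at a time until
 * none are outstanding (a stand-in for the tevent_loop_once() loop). */
static enum check_result verify_all(uint32_t expected,
				    const struct fake_reply *replies, int n)
{
	struct fanout_state st = { .pending = n, .status = CHECK_OK,
				   .expected = expected };
	int next = 0;

	while (st.pending > 0) {
		assert(next < n);
		on_reply(&st, &replies[next++]);
	}
	return st.status;
}
```

In this sketch a failed reply only downgrades an OK status, while a node that disagrees about the recmaster always forces `CHECK_ELECTION_NEEDED`, so an election request is never masked by an earlier timeout.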

recoverd: Interface reference count changes should not cause takeover runs At the moment a naive compare of the all the interface data is done. So, if any IPs move then the reference counts for the the relevant interfaces change, interfaces appear to have changed and another takeover run is initiated by each node that took/released IPs. This change stops the spurious takeover runs by changing the interface comparison to ignore the reference counts. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 0b7257642f62ebd83c05b6e2922f0dc2737f175c) 2013-02-21 03:43:35 +04:00			`static bool interfaces_have_changed(struct ctdb_context *ctdb,`
			`struct ctdb_recoverd *rec)`
			`{`
ctdb-daemon: Rename struct ctdb_control_get_ifaces to ctdb_iface_list_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-28 11:43:48 +03:00			`struct ctdb_iface_list_old *ifaces = NULL;`
recoverd: Interface reference count changes should not cause takeover runs At the moment a naive compare of the all the interface data is done. So, if any IPs move then the reference counts for the the relevant interfaces change, interfaces appear to have changed and another takeover run is initiated by each node that took/released IPs. This change stops the spurious takeover runs by changing the interface comparison to ignore the reference counts. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 0b7257642f62ebd83c05b6e2922f0dc2737f175c) 2013-02-21 03:43:35 +04:00			`TALLOC_CTX *mem_ctx;`
			`bool ret = false;`

			`mem_ctx = talloc_new(NULL);`

			`/* Read the interfaces from the local node */`
			`if (ctdb_ctrl_get_ifaces(ctdb, CONTROL_TIMEOUT(),`
			`CTDB_CURRENT_NODE, mem_ctx, &ifaces) != 0) {`
			`DEBUG(DEBUG_ERR, ("Unable to get interfaces from local node %u\n", ctdb->pnn));`
			`/* We could return an error. However, this will be`
			`* rare so we'll decide that the interfaces have`
			`* actually changed, just in case.`
			`*/`
			`talloc_free(mem_ctx);`
			`return true;`
			`}`

			`if (!rec->ifaces) {`
			`/* We haven't been here before so things have changed */`
recoverd: Log more information when interfaces change Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 3ef93a1a3e60cdf5d8954e7a16a988ea6126916b) 2013-08-15 11:04:01 +04:00			`DEBUG(DEBUG_NOTICE, ("Initial interface fetched\n"));`
recoverd: Interface reference count changes should not cause takeover runs At the moment a naive compare of the all the interface data is done. So, if any IPs move then the reference counts for the the relevant interfaces change, interfaces appear to have changed and another takeover run is initiated by each node that took/released IPs. This change stops the spurious takeover runs by changing the interface comparison to ignore the reference counts. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 0b7257642f62ebd83c05b6e2922f0dc2737f175c) 2013-02-21 03:43:35 +04:00			`ret = true;`
			`} else if (rec->ifaces->num != ifaces->num) {`
			`/* Number of interfaces has changed */`
recoverd: Log more information when interfaces change Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 3ef93a1a3e60cdf5d8954e7a16a988ea6126916b) 2013-08-15 11:04:01 +04:00			`DEBUG(DEBUG_NOTICE, ("Interface count changed from %d to %d\n",`
			`rec->ifaces->num, ifaces->num));`
recoverd: Interface reference count changes should not cause takeover runs At the moment a naive compare of the all the interface data is done. So, if any IPs move then the reference counts for the the relevant interfaces change, interfaces appear to have changed and another takeover run is initiated by each node that took/released IPs. This change stops the spurious takeover runs by changing the interface comparison to ignore the reference counts. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 0b7257642f62ebd83c05b6e2922f0dc2737f175c) 2013-02-21 03:43:35 +04:00			`ret = true;`
			`} else {`
			`/* See if interface names or link states have changed */`
			`int i;`
			`for (i = 0; i < rec->ifaces->num; i++) {`
ctdb-daemon: Rename struct ctdb_control_iface_info to ctdb_iface Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-28 11:37:17 +03:00			`struct ctdb_iface *iface = &rec->ifaces->ifaces[i];`
recoverd: Log more information when interfaces change Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 3ef93a1a3e60cdf5d8954e7a16a988ea6126916b) 2013-08-15 11:04:01 +04:00			`if (strcmp(iface->name, ifaces->ifaces[i].name) != 0) {`
			`DEBUG(DEBUG_NOTICE,`
			`("Interface in slot %d changed: %s => %s\n",`
			`i, iface->name, ifaces->ifaces[i].name));`
			`ret = true;`
			`break;`
			`}`
			`if (iface->link_state != ifaces->ifaces[i].link_state) {`
			`DEBUG(DEBUG_NOTICE,`
			`("Interface %s changed state: %d => %d\n",`
			`iface->name, iface->link_state,`
			`ifaces->ifaces[i].link_state));`
recoverd: Interface reference count changes should not cause takeover runs At the moment a naive compare of the all the interface data is done. So, if any IPs move then the reference counts for the the relevant interfaces change, interfaces appear to have changed and another takeover run is initiated by each node that took/released IPs. This change stops the spurious takeover runs by changing the interface comparison to ignore the reference counts. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 0b7257642f62ebd83c05b6e2922f0dc2737f175c) 2013-02-21 03:43:35 +04:00			`ret = true;`
			`break;`
			`}`
			`}`
			`}`

			`talloc_free(rec->ifaces);`
			`rec->ifaces = talloc_steal(rec, ifaces);`

			`talloc_free(mem_ctx);`
			`return ret;`
			`}`
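`interfaces_have_changed()` compares only the interface count, names and link states, so reference-count churn caused by IPs moving between nodes does not trigger another takeover run. A minimal standalone sketch of that comparison follows; `struct iface_info`, its `references` field and `iface_lists_differ()` are simplified, hypothetical stand-ins for the real `struct ctdb_iface` handling, not its actual layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified interface record. */
struct iface_info {
	char name[16];
	int link_state;      /* e.g. 1 = up, 0 = down */
	unsigned references; /* IPs currently on this interface (ignored) */
};

/* Returns true if the lists differ in length, in any name, or in any
 * link state. The references field is deliberately not compared. */
static bool iface_lists_differ(const struct iface_info *old_list, int old_num,
			       const struct iface_info *new_list, int new_num)
{
	int i;

	assert(old_num >= 0 && new_num >= 0);
	if (old_num != new_num) {
		return true;
	}
	for (i = 0; i < old_num; i++) {
		if (strcmp(old_list[i].name, new_list[i].name) != 0) {
			return true; /* a different interface in this slot */
		}
		if (old_list[i].link_state != new_list[i].link_state) {
			return true; /* same interface, link went up or down */
		}
	}
	return false;
}
```

Comparing slot by slot assumes both lists come back in the same order, which is the same assumption the daemon's loop makes.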
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00
In the recovery daemon, keep track of which node we have assigned public ip addresses and verify that the remote nodes have/keep a consistent view of assigned addresses. If a remote node has an inconsistent view of addresses visavi the recovery master this will trigger a full ip reallocation. (This used to be ctdb commit f3bf2ab61f8dbbc806ec23a68a87aaedd458e712) 2010-04-08 08:07:57 +04:00			`/* called to check that the local allocation of public ip addresses is ok.`
track both when we last started and ended a recovery. make ctdb uptime print how long the recovery took in the recovery daemon when we check that the public ip address allocation on the local node is correct (we have the ips we should have and we dont have any we shouldnt have) use ctdb uptime and check the recovery start/stop times and make sure we dont check for ip allocation inconsistencies during a recovery where the ip address allocation is in flux. (This used to be ctdb commit f86551580349b7f662f9a07e4eb0c1189e38e429) 2008-07-02 07:55:59 +04:00			`*/`
ctdb-daemon: Rename struct ctdb_node_map to ctdb_node_map_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:22:48 +03:00			`static int verify_local_ip_allocation(struct ctdb_context *ctdb, struct ctdb_recoverd *rec, uint32_t pnn, struct ctdb_node_map_old *nodemap)`
track both when we last started and ended a recovery. make ctdb uptime print how long the recovery took in the recovery daemon when we check that the public ip address allocation on the local node is correct (we have the ips we should have and we dont have any we shouldnt have) use ctdb uptime and check the recovery start/stop times and make sure we dont check for ip allocation inconsistencies during a recovery where the ip address allocation is in flux. (This used to be ctdb commit f86551580349b7f662f9a07e4eb0c1189e38e429) 2008-07-02 07:55:59 +04:00			`{`
			`TALLOC_CTX *mem_ctx = talloc_new(NULL);`
			`int ret, j;`
server: only trigger one takeover run in verify_ip_allocation() metze (This used to be ctdb commit 10bc087d0280057962177721bdd6d4f28743b311) 2009-12-22 17:21:08 +03:00			`bool need_takeover_run = false;`
track both when we last started and ended a recovery. make ctdb uptime print how long the recovery took in the recovery daemon when we check that the public ip address allocation on the local node is correct (we have the ips we should have and we dont have any we shouldnt have) use ctdb uptime and check the recovery start/stop times and make sure we dont check for ip allocation inconsistencies during a recovery where the ip address allocation is in flux. (This used to be ctdb commit f86551580349b7f662f9a07e4eb0c1189e38e429) 2008-07-02 07:55:59 +04:00
recoverd: Interface reference count changes should not cause takeover runs At the moment a naive compare of the all the interface data is done. So, if any IPs move then the reference counts for the the relevant interfaces change, interfaces appear to have changed and another takeover run is initiated by each node that took/released IPs. This change stops the spurious takeover runs by changing the interface comparison to ignore the reference counts. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 0b7257642f62ebd83c05b6e2922f0dc2737f175c) 2013-02-21 03:43:35 +04:00			`if (interfaces_have_changed(ctdb, rec)) {`
server: monitor interfaces in verify_ip_allocation() metze (This used to be ctdb commit 965a65520693e3731b5b0250127b04c777087808) 2009-12-22 17:21:08 +03:00			`DEBUG(DEBUG_NOTICE, ("The interfaces status has changed on "`
			`"local node %u - force takeover run\n",`
			`pnn));`
			`need_takeover_run = true;`
			`}`

track both when we last started and ended a recovery. make ctdb uptime print how long the recovery took in the recovery daemon when we check that the public ip address allocation on the local node is correct (we have the ips we should have and we dont have any we shouldnt have) use ctdb uptime and check the recovery start/stop times and make sure we dont check for ip allocation inconsistencies during a recovery where the ip address allocation is in flux. (This used to be ctdb commit f86551580349b7f662f9a07e4eb0c1189e38e429) 2008-07-02 07:55:59 +04:00			`/* verify that we have the ip addresses we should have`
Fix various spelling errors Reviewed-by: Andrew Bartlett <abartlet@samba.org> Reviewed-by: Michael Adam <obnox@samba.org> Autobuild-User(master): Andrew Bartlett <abartlet@samba.org> Autobuild-Date(master): Fri Nov 6 13:43:45 CET 2015 on sn-devel-104 2015-07-27 00:02:57 +03:00			`and we don't have ones we shouldn't have.`
track both when we last started and ended a recovery. make ctdb uptime print how long the recovery took in the recovery daemon when we check that the public ip address allocation on the local node is correct (we have the ips we should have and we dont have any we shouldnt have) use ctdb uptime and check the recovery start/stop times and make sure we dont check for ip allocation inconsistencies during a recovery where the ip address allocation is in flux. (This used to be ctdb commit f86551580349b7f662f9a07e4eb0c1189e38e429) 2008-07-02 07:55:59 +04:00			`if we find an inconsistency we set recmode to`
			`active on the local node and wait for the recmaster`
during ip allocation, there are failure modes where a node might hold a ip address but thinks it is still unassigned (-1). add code to the recovery daemon to detect this case and trigger a reallocation so that the ip gets covered and change the takeip code to allow for this condition, taking on an ip address that is already hosted. cq s1021073 (This used to be ctdb commit 9020baf27cab7821c9094cda185206fb7af0fee7) 2010-12-03 05:28:35 +03:00			`to do a full blown recovery.`
			`also if the pnn is -1 and we are healthy and can host the ip,`
			`we request an ip reallocation.`
track both when we last started and ended a recovery. make ctdb uptime print how long the recovery took in the recovery daemon when we check that the public ip address allocation on the local node is correct (we have the ips we should have and we dont have any we shouldnt have) use ctdb uptime and check the recovery start/stop times and make sure we dont check for ip allocation inconsistencies during a recovery where the ip address allocation is in flux. (This used to be ctdb commit f86551580349b7f662f9a07e4eb0c1189e38e429) 2008-07-02 07:55:59 +04:00			`*/`
dont check the public ip assignment or if even we are hosting them and shouldnt when public ips have been disabled (This used to be ctdb commit 7d07a74dc7f907ac757d20626f68e257d7ba16be) 2010-11-10 04:06:05 +03:00			`if (ctdb->tunable.disable_ip_failover == 0) {`
ctdb-daemon: Rename struct ctdb_all_public_ips to ctdb_public_ip_list_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-28 09:16:24 +03:00			`struct ctdb_public_ip_list_old *ips = NULL;`
recoverd: Verifying local IPs should only check for unhosted available IPs Currently it checks for unhosted IPs among the known IPs rather than available IPs. This means that a takeover run can be flagged even when that takeover run will be unable to assign a known, unhosted IP. Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 3cc878bc97fdac764a60ed805f64d649eaab06e8) 2012-10-11 08:17:54 +04:00
			`/* read the available IPs from the local node */`
			`ret = ctdb_ctrl_get_public_ips_flags(ctdb, CONTROL_TIMEOUT(), CTDB_CURRENT_NODE, mem_ctx, CTDB_PUBLIC_IP_FLAGS_ONLY_AVAILABLE, &ips);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR, ("Unable to get available public IPs from local node %u\n", pnn));`
			`talloc_free(mem_ctx);`
			`return -1;`
			`}`

dont check the public ip assignment or if even we are hosting them and shouldnt when public ips have been disabled (This used to be ctdb commit 7d07a74dc7f907ac757d20626f68e257d7ba16be) 2010-11-10 04:06:05 +03:00			`for (j=0; j<ips->num; j++) {`
recoverd: Verifying local IPs should only check for unhosted available IPs Currently it checks for unhosted IPs among the known IPs rather than available IPs. This means that a takeover run can be flagged even when that takeover run will be unable to assign a known, unhosted IP. Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 3cc878bc97fdac764a60ed805f64d649eaab06e8) 2012-10-11 08:17:54 +04:00			`if (ips->ips[j].pnn == -1 &&`
			`nodemap->nodes[pnn].flags == 0) {`
			`DEBUG(DEBUG_CRIT,("Public IP '%s' is not assigned and we could serve it\n",`
			`ctdb_addr_to_str(&ips->ips[j].addr)));`
during ip allocation, there are failure modes where a node might hold a ip address but thinks it is still unassigned (-1). add code to the recovery daemon to detect this case and trigger a reallocation so that the ip gets covered and change the takeip code to allow for this condition, taking on an ip address that is already hosted. cq s1021073 (This used to be ctdb commit 9020baf27cab7821c9094cda185206fb7af0fee7) 2010-12-03 05:28:35 +03:00			`need_takeover_run = true;`
recoverd: Verifying local IPs should only check for unhosted available IPs Currently it checks for unhosted IPs among the known IPs rather than available IPs. This means that a takeover run can be flagged even when that takeover run will be unable to assign a known, unhosted IP. Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 3cc878bc97fdac764a60ed805f64d649eaab06e8) 2012-10-11 08:17:54 +04:00			`}`
			`}`

			`talloc_free(ips);`

			`/* read the known IPs from the local node */`
			`ret = ctdb_ctrl_get_public_ips_flags(ctdb, CONTROL_TIMEOUT(), CTDB_CURRENT_NODE, mem_ctx, 0, &ips);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR, ("Unable to get known public IPs from local node %u\n", pnn));`
			`talloc_free(mem_ctx);`
			`return -1;`
			`}`

			`for (j=0; j<ips->num; j++) {`
			`if (ips->ips[j].pnn == pnn) {`
recoverd: Fix spurious warnings when running with --nopublicipcheck Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 7f8096f56d8274151705ac822b582d972078f8fe) 2012-04-04 08:42:23 +04:00			`if (ctdb->do_checkpublicip && !ctdb_sys_have_ip(&ips->ips[j].addr)) {`
recoverd: Verifying local IPs should only check for unhosted available IPs Currently it checks for unhosted IPs among the known IPs rather than available IPs. This means that a takeover run can be flagged even when that takeover run will be unable to assign a known, unhosted IP. Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 3cc878bc97fdac764a60ed805f64d649eaab06e8) 2012-10-11 08:17:54 +04:00			`DEBUG(DEBUG_CRIT,("Public IP '%s' is assigned to us but not on an interface\n",`
dont check the public ip assignment or if even we are hosting them and shouldnt when public ips have been disabled (This used to be ctdb commit 7d07a74dc7f907ac757d20626f68e257d7ba16be) 2010-11-10 04:06:05 +03:00			`ctdb_addr_to_str(&ips->ips[j].addr)));`
			`need_takeover_run = true;`
			`}`
			`} else {`
recoverd: Verifying local IPs should only check for unhosted available IPs Currently it checks for unhosted IPs among the known IPs rather than available IPs. This means that a takeover run can be flagged even when that takeover run will be unable to assign a known, unhosted IP. Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 3cc878bc97fdac764a60ed805f64d649eaab06e8) 2012-10-11 08:17:54 +04:00			`if (ctdb->do_checkpublicip &&`
			`ctdb_sys_have_ip(&ips->ips[j].addr)) {`
When we find an ip we shouldnt host, just release it Dont call a full blown clusterwide ipreallocation, just release it locally (This used to be ctdb commit 9a806dec8687e2ec08a308853b61af6aed5e5d1e) 2012-06-20 09:10:05 +04:00
recoverd: Verifying local IPs should only check for unhosted available IPs Currently it checks for unhosted IPs among the known IPs rather than available IPs. This means that a takeover run can be flagged even when that takeover run will be unable to assign a known, unhosted IP. Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 3cc878bc97fdac764a60ed805f64d649eaab06e8) 2012-10-11 08:17:54 +04:00			`DEBUG(DEBUG_CRIT,("We are still serving a public IP '%s' that we should not be serving. Removing it\n",`
dont check the public ip assignment or if even we are hosting them and shouldnt when public ips have been disabled (This used to be ctdb commit 7d07a74dc7f907ac757d20626f68e257d7ba16be) 2010-11-10 04:06:05 +03:00			`ctdb_addr_to_str(&ips->ips[j].addr)));`
When we find an ip we shouldnt host, just release it Dont call a full blown clusterwide ipreallocation, just release it locally (This used to be ctdb commit 9a806dec8687e2ec08a308853b61af6aed5e5d1e) 2012-06-20 09:10:05 +04:00
			`if (ctdb_ctrl_release_ip(ctdb, CONTROL_TIMEOUT(), CTDB_CURRENT_NODE, &ips->ips[j]) != 0) {`
recoverd: Verifying local IPs should only check for unhosted available IPs Currently it checks for unhosted IPs among the known IPs rather than available IPs. This means that a takeover run can be flagged even when that takeover run will be unable to assign a known, unhosted IP. Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 3cc878bc97fdac764a60ed805f64d649eaab06e8) 2012-10-11 08:17:54 +04:00			`DEBUG(DEBUG_ERR,("Failed to release local IP address\n"));`
When we find an ip we shouldnt host, just release it Dont call a full blown clusterwide ipreallocation, just release it locally (This used to be ctdb commit 9a806dec8687e2ec08a308853b61af6aed5e5d1e) 2012-06-20 09:10:05 +04:00			`}`
dont check the public ip assignment or if even we are hosting them and shouldnt when public ips have been disabled (This used to be ctdb commit 7d07a74dc7f907ac757d20626f68e257d7ba16be) 2010-11-10 04:06:05 +03:00			`}`
track both when we last started and ended a recovery. make ctdb uptime print how long the recovery took in the recovery daemon when we check that the public ip address allocation on the local node is correct (we have the ips we should have and we dont have any we shouldnt have) use ctdb uptime and check the recovery start/stop times and make sure we dont check for ip allocation inconsistencies during a recovery where the ip address allocation is in flux. (This used to be ctdb commit f86551580349b7f662f9a07e4eb0c1189e38e429) 2008-07-02 07:55:59 +04:00			`}`
			`}`
			`}`

server: only trigger one takeover run in verify_ip_allocation() metze (This used to be ctdb commit 10bc087d0280057962177721bdd6d4f28743b311) 2009-12-22 17:21:08 +03:00			`if (need_takeover_run) {`
ctdb-daemon: Rename struct srvid_request to ctdb_srvid_message Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 06:32:49 +03:00			`struct ctdb_srvid_message rd;`
server: only trigger one takeover run in verify_ip_allocation() metze (This used to be ctdb commit 10bc087d0280057962177721bdd6d4f28743b311) 2009-12-22 17:21:08 +03:00			`TDB_DATA data;`

			`DEBUG(DEBUG_CRIT,("Trigger takeoverrun\n"));`

			`rd.pnn = ctdb->pnn;`
			`rd.srvid = 0;`
			`data.dptr = (uint8_t *)&rd;`
			`data.dsize = sizeof(rd);`

rename ctdb_send_message to ctdb_client_send_message to resolve colission with the function of the same name in libctdb (This used to be ctdb commit ac3292c12832484a22715f1d46aa23f3b7c8a6f6) 2010-06-02 03:45:21 +04:00			`ret = ctdb_client_send_message(ctdb, rec->recmaster, CTDB_SRVID_TAKEOVER_RUN, data);`
server: only trigger one takeover run in verify_ip_allocation() metze (This used to be ctdb commit 10bc087d0280057962177721bdd6d4f28743b311) 2009-12-22 17:21:08 +03:00			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR,(__location__ " Failed to send ipreallocate to recmaster :%d\n", (int)rec->recmaster));`
			`}`
			`}`
track both when we last started and ended a recovery. make ctdb uptime print how long the recovery took in the recovery daemon when we check that the public ip address allocation on the local node is correct (we have the ips we should have and we dont have any we shouldnt have) use ctdb uptime and check the recovery start/stop times and make sure we dont check for ip allocation inconsistencies during a recovery where the ip address allocation is in flux. (This used to be ctdb commit f86551580349b7f662f9a07e4eb0c1189e38e429) 2008-07-02 07:55:59 +04:00			`talloc_free(mem_ctx);`
			`return 0;`
			`}`

redo and update how we synchronize flags across the cluster. this simplifies the code and should close a race condition between the local recovery daemon and a remote node when flags are changing. (This used to be ctdb commit 32d460b8469eb53145f04161a5d01166f9b5f09e) 2008-12-05 08:32:30 +03:00
			`static void async_getnodemap_callback(struct ctdb_context *ctdb, uint32_t node_pnn, int32_t res, TDB_DATA outdata, void *callback_data)`
			`{`
ctdb-daemon: Rename struct ctdb_node_map to ctdb_node_map_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:22:48 +03:00			`struct ctdb_node_map_old **remote_nodemaps = callback_data;`
redo and update how we synchronize flags across the cluster. this simplifies the code and should close a race condition between the local recovery daemon and a remote node when flags are changing. (This used to be ctdb commit 32d460b8469eb53145f04161a5d01166f9b5f09e) 2008-12-05 08:32:30 +03:00
			`if (node_pnn >= ctdb->num_nodes) {`
			`DEBUG(DEBUG_ERR,(__location__ " pnn from invalid node\n"));`
			`return;`
			`}`

ctdb-daemon: Rename struct ctdb_node_map to ctdb_node_map_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:22:48 +03:00			`remote_nodemaps[node_pnn] = (struct ctdb_node_map_old *)talloc_steal(remote_nodemaps, outdata.dptr);`
redo and update how we synchronize flags across the cluster. this simplifies the code and should close a race condition between the local recovery daemon and a remote node when flags are changing. (This used to be ctdb commit 32d460b8469eb53145f04161a5d01166f9b5f09e) 2008-12-05 08:32:30 +03:00
			`}`
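The callback above depends on the PNN bounds check to prevent a misbehaving or misconfigured node from writing past the end of `remote_nodemaps`. Below is a minimal standalone sketch of the same store-by-PNN pattern. `struct sketch_ctx`, `struct sketch_nodemap` and `sketch_nodemap_callback()` are hypothetical and not part of ctdb. Plain `malloc()`/`free()` stand in for talloc ownership.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct ctdb_context and
 * struct ctdb_node_map_old; only the fields this sketch needs. */
struct sketch_ctx {
	uint32_t num_nodes;
};

struct sketch_nodemap {
	uint32_t num;
};

/* Per-node reply handler: store the reply in the slot indexed by the
 * replying node's PNN. A reply whose PNN is outside the configured node
 * range is logged and freed instead of being written out of bounds. */
static void sketch_nodemap_callback(const struct sketch_ctx *ctx,
				    uint32_t node_pnn,
				    struct sketch_nodemap *reply,
				    struct sketch_nodemap **slots)
{
	assert(ctx != NULL && slots != NULL);

	if (node_pnn >= ctx->num_nodes) {
		fprintf(stderr, "pnn %" PRIu32 " from invalid node\n", node_pnn);
		free(reply);
		return;
	}

	/* The slots array now owns the reply (talloc_steal() in ctdb). */
	slots[node_pnn] = reply;
}
```

A reply from PNN 7 in a 3-node cluster leaves every slot untouched. A reply from PNN 1 lands in `slots[1]`.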

			`static int get_remote_nodemaps(struct ctdb_context *ctdb, TALLOC_CTX *mem_ctx,`
			`struct ctdb_node_map_old *nodemap,`
			`struct ctdb_node_map_old **remote_nodemaps)`
			`{`
			`uint32_t *nodes;`

			`nodes = list_of_active_nodes(ctdb, nodemap, mem_ctx, true);`
			`if (ctdb_client_async_control(ctdb, CTDB_CONTROL_GET_NODEMAP,`
			`nodes, 0,`
			`CONTROL_TIMEOUT(), false, tdb_null,`
			`async_getnodemap_callback,`
			`NULL,`
			`remote_nodemaps) != 0) {`
			`DEBUG(DEBUG_ERR, (__location__ " Unable to pull all remote nodemaps\n"));`

			`return -1;`
			`}`

			`return 0;`
			`}`
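`get_remote_nodemaps()` asks every active node for its nodemap. Judging by its error message, it treats the whole operation as failed if any node does not answer. The sketch below models that fan-out under simplifying assumptions:

- The flag bit, the request callback type and all `sketch_*` names are hypothetical.
- Requests are issued one at a time rather than asynchronously as `ctdb_client_async_control()` does.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical flag bit standing in for ctdb's "node is inactive" flags. */
#define SKETCH_FLAG_INACTIVE 0x1u

/* Hypothetical per-node request: returns 0 and fills *reply on success. */
typedef int (*sketch_request_fn)(uint32_t pnn, int *reply);

/* Send the request to every active node and collect each reply in
 * results[pnn]. Inactive nodes are skipped and their slots left untouched.
 * Returns -1 if any active node fails, 0 otherwise. */
static int sketch_fan_out(const uint32_t *flags, uint32_t num_nodes,
			  sketch_request_fn request, int *results)
{
	unsigned failures = 0;

	assert(flags != NULL && results != NULL);

	for (uint32_t pnn = 0; pnn < num_nodes; pnn++) {
		int reply;

		if (flags[pnn] & SKETCH_FLAG_INACTIVE) {
			continue;
		}
		if (request(pnn, &reply) != 0) {
			failures++;
			continue;
		}
		results[pnn] = reply;
	}

	if (failures > 0) {
		fprintf(stderr, "Unable to pull all remote nodemaps\n");
		return -1;
	}
	return 0;
}

/* Example requests for exercising the sketch. */
static int sketch_always_ok(uint32_t pnn, int *reply)
{
	*reply = (int)pnn * 10;
	return 0;
}

static int sketch_fails_on_pnn2(uint32_t pnn, int *reply)
{
	if (pnn == 2) {
		return -1;
	}
	*reply = (int)pnn * 10;
	return 0;
}
```

Replies that did arrive are still recorded when another node fails. The caller decides, from the -1 return, that the collected set is incomplete and must not be trusted.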

			`static int update_recovery_lock_file(struct ctdb_context *ctdb)`
			`{`
			`TALLOC_CTX *tmp_ctx = talloc_new(NULL);`
			`const char *reclockfile;`

			`if (ctdb_ctrl_getreclock(ctdb, CONTROL_TIMEOUT(), CTDB_CURRENT_NODE, tmp_ctx, &reclockfile) != 0) {`
			`DEBUG(DEBUG_ERR,("Failed to read reclock file from daemon\n"));`
			`talloc_free(tmp_ctx);`
			`return -1;`
			`}`

			`if (reclockfile == NULL) {`
			`if (ctdb->recovery_lock_file != NULL) {`
			`DEBUG(DEBUG_NOTICE,("Recovery lock file disabled\n"));`
			`talloc_free(ctdb->recovery_lock_file);`
			`ctdb->recovery_lock_file = NULL;`
			`ctdb_recovery_unlock(ctdb);`
			`}`
			`talloc_free(tmp_ctx);`
			`return 0;`
			`}`

			`if (ctdb->recovery_lock_file == NULL) {`
			`DEBUG(DEBUG_NOTICE,`
			`("Recovery lock file enabled (%s)\n", reclockfile));`
			`ctdb->recovery_lock_file = talloc_strdup(ctdb, reclockfile);`
			`ctdb_recovery_unlock(ctdb);`
			`talloc_free(tmp_ctx);`
			`return 0;`
			`}`


			`if (!strcmp(reclockfile, ctdb->recovery_lock_file)) {`
			`talloc_free(tmp_ctx);`
			`return 0;`
			`}`

			`DEBUG(DEBUG_NOTICE,`
			`("Recovery lock file changed (now %s)\n", reclockfile));`
			`talloc_free(ctdb->recovery_lock_file);`
			`ctdb->recovery_lock_file = talloc_strdup(ctdb, reclockfile);`
			`ctdb_recovery_unlock(ctdb);`

			`talloc_free(tmp_ctx);`
			`return 0;`
			`}`
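`update_recovery_lock_file()` compares the cached lock path with the one read from the main daemon and distinguishes four cases: disabled, enabled, changed and unchanged. The sketch below isolates that decision as a pure function so each case can be checked on its own. The enum and function names are hypothetical and not part of ctdb.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names for the four outcomes of comparing the cached
 * recovery lock path with the one just read from the main daemon. */
enum sketch_reclock_action {
	SKETCH_RECLOCK_UNCHANGED,
	SKETCH_RECLOCK_DISABLED,
	SKETCH_RECLOCK_ENABLED,
	SKETCH_RECLOCK_CHANGED,
};

static enum sketch_reclock_action
sketch_classify_reclock(const char *cached, const char *fetched)
{
	if (fetched == NULL) {
		/* No lock file configured now; only a change if one was cached. */
		return cached != NULL ? SKETCH_RECLOCK_DISABLED
				      : SKETCH_RECLOCK_UNCHANGED;
	}
	if (cached == NULL) {
		return SKETCH_RECLOCK_ENABLED;
	}
	return strcmp(fetched, cached) == 0 ? SKETCH_RECLOCK_UNCHANGED
					    : SKETCH_RECLOCK_CHANGED;
}

/* Every outcome except "unchanged" drops any lock currently held,
 * matching the ctdb_recovery_unlock() calls in the function above. */
static bool sketch_reclock_needs_unlock(enum sketch_reclock_action action)
{
	return action != SKETCH_RECLOCK_UNCHANGED;
}
```

Separating "what changed" from "what to do about it" keeps the unlock rule in one place instead of repeating it in each branch.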

			`static void main_loop(struct ctdb_context *ctdb, struct ctdb_recoverd *rec,`
			`TALLOC_CTX *mem_ctx)`
			`{`
			`uint32_t pnn;`
			`struct ctdb_node_map_old *nodemap=NULL;`
			`struct ctdb_node_map_old *recmaster_nodemap=NULL;`
			`struct ctdb_node_map_old **remote_nodemaps=NULL;`
			`struct ctdb_vnn_map *vnnmap=NULL;`
			`struct ctdb_vnn_map *remote_vnnmap=NULL;`
			`uint32_t num_lmasters;`
			`int32_t debug_level;`
			`int i, j, ret;`
			`bool self_ban;`


			`/* verify that the main daemon is still running */`
			`if (ctdb_kill(ctdb, ctdb->ctdbd_pid, 0) != 0) {`
			`DEBUG(DEBUG_CRIT,("CTDB daemon is no longer available. Shutting down recovery daemon\n"));`
			`exit(-1);`
			`}`

			`/* ping the local daemon to tell it we are alive */`
			`ctdb_ctrl_recd_ping(ctdb);`

			`if (rec->election_timeout) {`
			`/* an election is in progress */`
			`return;`
			`}`

			`/* read the debug level from the parent and update locally */`
			`ret = ctdb_ctrl_get_debuglevel(ctdb, CTDB_CURRENT_NODE, &debug_level);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR, (__location__ " Failed to read debuglevel from parent\n"));`
			`return;`
			`}`
			`DEBUGLEVEL = debug_level;`

			`/* get relevant tunables */`
			`ret = ctdb_ctrl_get_all_tunables(ctdb, CONTROL_TIMEOUT(), CTDB_CURRENT_NODE, &ctdb->tunable);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR,("Failed to get tunables - retrying\n"));`
			`return;`
			`}`

			`/* get runstate */`
			`ret = ctdb_ctrl_get_runstate(ctdb, CONTROL_TIMEOUT(),`
			`CTDB_CURRENT_NODE, &ctdb->runstate);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR, ("Failed to get runstate - retrying\n"));`
			`return;`
			`}`

			`/* get the current recovery lock file from the main daemon */`
			`if (update_recovery_lock_file(ctdb) != 0) {`
			`DEBUG(DEBUG_ERR,("Failed to update the recovery lock file\n"));`
			`return;`
			`}`

			`/* If the recovery lock has been disabled (no lock file is`
			`configured), make sure any lock we still hold is released`
			`*/`
			`if (ctdb->recovery_lock_file == NULL) {`
			`ctdb_recovery_unlock(ctdb);`
			`}`

			`pnn = ctdb_get_pnn(ctdb);`

			`/* get the vnnmap */`
			`ret = ctdb_ctrl_getvnnmap(ctdb, CONTROL_TIMEOUT(), pnn, mem_ctx, &vnnmap);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR, (__location__ " Unable to get vnnmap from node %u\n", pnn));`
			`return;`
			`}`


			`/* get the nodemap from the local node */`
			`if (rec->nodemap) {`
			`talloc_free(rec->nodemap);`
			`rec->nodemap = NULL;`
			`nodemap=NULL;`
			`}`
			`ret = ctdb_ctrl_getnodemap(ctdb, CONTROL_TIMEOUT(), pnn, rec, &rec->nodemap);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR, (__location__ " Unable to get nodemap from node %u\n", pnn));`
			`return;`
			`}`
			`nodemap = rec->nodemap;`

			`/* remember our own node flags */`
			`rec->node_flags = nodemap->nodes[pnn].flags;`

			`ban_misbehaving_nodes(rec, &self_ban);`
			`if (self_ban) {`
			`DEBUG(DEBUG_NOTICE, ("This node was banned, restart main_loop\n"));`
			`return;`
			`}`

			`/* if the local daemon is STOPPED or BANNED, we verify that the databases are`
			`also frozen and that the recmode is set to active.`
			`*/`
			`if (rec->node_flags & (NODE_FLAGS_STOPPED | NODE_FLAGS_BANNED)) {`
			`/* If this node has become inactive then we want to`
			`* reduce the chances of it taking over the recovery`
			`* master role when it becomes active again. This`
			`* helps to stabilise the recovery master role so that`
			`* it stays on the most stable node.`
			`*/`
			`rec->priority_time = timeval_current();`

			`ret = ctdb_ctrl_getrecmode(ctdb, mem_ctx, CONTROL_TIMEOUT(), CTDB_CURRENT_NODE, &ctdb->recovery_mode);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR,(__location__ " Failed to read recmode from local node\n"));`
			`}`
			`if (ctdb->recovery_mode == CTDB_RECOVERY_NORMAL) {`
			`DEBUG(DEBUG_ERR,("Node is stopped or banned but recovery mode is not active. Activate recovery mode and lock databases\n"));`

			`ret = ctdb_ctrl_setrecmode(ctdb, CONTROL_TIMEOUT(), CTDB_CURRENT_NODE, CTDB_RECOVERY_ACTIVE);`
			`if (ret != 0) {`
recoverd: Also check if current node is in recovery when it is banned Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 6a9dbb8fb0f1f6e8c206189cdc2d33bb371ea2a8) 2013-06-28 08:02:44 +04:00			`DEBUG(DEBUG_ERR,(__location__ " Failed to activate recovery mode in STOPPED or BANNED state\n"));`
recovery daemon needs to monitor when the local ctdb daemon is stopped and ensure that the databases gets frozen and the node enters recovery mode (This used to be ctdb commit 99f239f8b96c8c0a06ac8ca8b8083be96265865a) 2009-07-09 08:19:32 +04:00
speed startup: alter recovery loop We do a recovery on startup. But the code does: Sleep for ctdb->tunable.recover_interval. Check for recovery. We want to do it in the other order. This is best done by extracting the loop into a separate "main_loop" function. Seconds between ctdbd first log message and node healthy: BEFORE: 24.09 AFTER: 23.58 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 097046025176b9fcb670839d1a9f100f890e7ed2) 2010-06-22 17:20:23 +04:00			`return;`
recovery daemon needs to monitor when the local ctdb daemon is stopped and ensure that the databases gets frozen and the node enters recovery mode (This used to be ctdb commit 99f239f8b96c8c0a06ac8ca8b8083be96265865a) 2009-07-09 08:19:32 +04:00			`}`
ctdb-recoverd: Set recovery mode before freezing databases Setting recovery mode to active is the only correct way to inform recovery daemon to run database recovery. Only freezing databases without setting recovery mode should not trigger database recovery, as this mechanism is used in tool to implement wipedb/restoredb commands. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2014-05-06 08:24:52 +04:00			`ret = ctdb_ctrl_freeze(ctdb, CONTROL_TIMEOUT(), CTDB_CURRENT_NODE);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR,(__location__ " Failed to freeze node in STOPPED or BANNED state\n"));`
			`return;`
			`}`
recovery daemon needs to monitor when the local ctdb daemon is stopped and ensure that the databases gets frozen and the node enters recovery mode (This used to be ctdb commit 99f239f8b96c8c0a06ac8ca8b8083be96265865a) 2009-07-09 08:19:32 +04:00			`}`
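The 2014 commit message above says that setting recovery mode to active is the only correct way to tell the recovery daemon to run database recovery. Freezing alone, which the tool uses for wipedb/restoredb, must not trigger one. That is why this code sets recovery mode and then freezes.

The sketch below models that sequence. It is a minimal sketch: `struct fake_node` and its helpers are hypothetical in-memory stand-ins for the real `ctdb_ctrl_setrecmode()` and `ctdb_ctrl_freeze()` controls.

```c
#include <assert.h>
#include <stdbool.h>

enum recmode { RECMODE_NORMAL, RECMODE_ACTIVE };

/* Hypothetical stand-in for a node reached via controls; not ctdb's API. */
struct fake_node {
	enum recmode mode;
	bool frozen;
	int calls;             /* number of controls issued */
	bool fail_setrecmode;  /* simulate a failing setrecmode control */
};

static int set_recmode_active(struct fake_node *n)
{
	n->calls++;
	if (n->fail_setrecmode) {
		return -1;
	}
	n->mode = RECMODE_ACTIVE;
	return 0;
}

static int freeze(struct fake_node *n)
{
	n->calls++;
	/* This fake rejects a freeze issued before recovery mode is
	 * active, purely to make the required ordering visible. */
	if (n->mode != RECMODE_ACTIVE) {
		return -1;
	}
	n->frozen = true;
	return 0;
}

/* Mirrors the STOPPED/BANNED branch: only when recovery mode is still
 * NORMAL, activate it first, then freeze; stop at the first error. */
static int quiesce_inactive_node(struct fake_node *n)
{
	if (n->mode != RECMODE_NORMAL) {
		return 0; /* already in recovery: nothing to do */
	}
	if (set_recmode_active(n) != 0) {
		return -1;
	}
	return freeze(n);
}
```

If setting recovery mode fails, no freeze is attempted. This matches the early `return` in the original branch.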
recoverd: Always do an early exit from main_loop if node is stopped or banned A stopped or banned node cannot do anything useful. So do not participate in any cluster activity and do not cause any unnecessary network traffic. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 2396981c4bcf30530aeb7f4395093cc202105b50) 2013-06-27 09:39:15 +04:00
			`/* If this node is stopped or banned then it is not the recovery`
			`* master, so don't do anything. This prevents a stopped or banned`
			`* node from starting an election and sending unnecessary controls.`
			`*/`
			`return;`
recovery daemon needs to monitor when the local ctdb daemon is stopped and ensure that the databases gets frozen and the node enters recovery mode (This used to be ctdb commit 99f239f8b96c8c0a06ac8ca8b8083be96265865a) 2009-07-09 08:19:32 +04:00			`}`
recoverd: Always do an early exit from main_loop if node is stopped or banned A stopped or banned node cannot do anything useful. So do not participate in any cluster activity and do not cause any unnecessary network traffic. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 2396981c4bcf30530aeb7f4395093cc202105b50) 2013-06-27 09:39:15 +04:00
recoverd: Delay the initial election if node is started in stopped state Since there is an early exit if a node is stopped or banned, we can wait till the node becomes active to start initial election. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 593a17678fbd3109e118154b034d43b852659518) 2013-06-27 09:44:27 +04:00			`/* check which node is the recovery master */`
			`ret = ctdb_ctrl_getrecmaster(ctdb, mem_ctx, CONTROL_TIMEOUT(), pnn, &rec->recmaster);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR, (__location__ " Unable to get recmaster from node %u\n", pnn));`
			`return;`
			`}`

recoverd: Fix the implementation of CTDB_SRVID_REBALANCE_NODE The current implementation has a few flaws: * A takeover run is called unconditionally when the timer goes even if the recovery master role has moved. This means a node other than the recovery master can incorrectly do a takeover run. * The rebalancing target nodes are cleared in the setup for a takeover run, regardless of whether the takeover run succeeds. * The timer to force a rebalance isn't cleared if another takeover run occurs before the deadline. Any forced rebalancing will happen in the first takeover run and when the timer expires some time later then an unnecessary takeover run will occur. * If the recovery master role moves then the rebalancing data will stay on the original node and affect the next takeover run to occur if the recovery master role should come back to the original node. Instead, store an array of rebalance target nodes in the recovery master context. This is passed as an extra argument to ctdb_takeover_run() each time it is called and is cleared when a takeover run succeeds. The timer hangs off the array of rebalance target nodes, which is cleared if the node isn't the recovery master. This means that it is possible to lose rebalance data if the recovery master role moves. However, that's a difficult problem to solve. The best way of approaching it is probably to try to stop the recovery master role from jumping around unnecesarily when inactive nodes join the cluster. The long term solution is to avoid this nonsense completely. The IP allocation algorithm needs to cache state between runs so that it knows which nodes have just become healthy. This also needs recovery master stability. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit c51c1efe5fc7fa668597f2acd435dee16e410fc9) 2013-09-04 08:30:04 +04:00			`/* If we are not the recmaster then do some housekeeping */`
recoverd: Delay the initial election if node is started in stopped state Since there is an early exit if a node is stopped or banned, we can wait till the node becomes active to start initial election. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 593a17678fbd3109e118154b034d43b852659518) 2013-06-27 09:44:27 +04:00			`if (rec->recmaster != pnn) {`
recoverd: Fix the implementation of CTDB_SRVID_REBALANCE_NODE The current implementation has a few flaws: * A takeover run is called unconditionally when the timer goes even if the recovery master role has moved. This means a node other than the recovery master can incorrectly do a takeover run. * The rebalancing target nodes are cleared in the setup for a takeover run, regardless of whether the takeover run succeeds. * The timer to force a rebalance isn't cleared if another takeover run occurs before the deadline. Any forced rebalancing will happen in the first takeover run and when the timer expires some time later then an unnecessary takeover run will occur. * If the recovery master role moves then the rebalancing data will stay on the original node and affect the next takeover run to occur if the recovery master role should come back to the original node. Instead, store an array of rebalance target nodes in the recovery master context. This is passed as an extra argument to ctdb_takeover_run() each time it is called and is cleared when a takeover run succeeds. The timer hangs off the array of rebalance target nodes, which is cleared if the node isn't the recovery master. This means that it is possible to lose rebalance data if the recovery master role moves. However, that's a difficult problem to solve. The best way of approaching it is probably to try to stop the recovery master role from jumping around unnecesarily when inactive nodes join the cluster. The long term solution is to avoid this nonsense completely. The IP allocation algorithm needs to cache state between runs so that it knows which nodes have just become healthy. This also needs recovery master stability. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit c51c1efe5fc7fa668597f2acd435dee16e410fc9) 2013-09-04 08:30:04 +04:00			`/* Ignore any IP reallocate requests - only recmaster`
			`* processes them`
			`*/`
recoverd: Factor out the SRVID handling code The code that handles IP reallocate requests can be reused. This also changes the result back to a SRVID caller to the PNN on success or a negative error code on failure. None of the callers currently look at the result so this is harmless... but it will be useful later. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit e4eae6e3291baa299a1d0f733ab11b138ee699a3) 2013-08-16 14:02:34 +04:00			`TALLOC_FREE(rec->reallocate_requests);`
recoverd: Fix the implementation of CTDB_SRVID_REBALANCE_NODE The current implementation has a few flaws: * A takeover run is called unconditionally when the timer goes even if the recovery master role has moved. This means a node other than the recovery master can incorrectly do a takeover run. * The rebalancing target nodes are cleared in the setup for a takeover run, regardless of whether the takeover run succeeds. * The timer to force a rebalance isn't cleared if another takeover run occurs before the deadline. Any forced rebalancing will happen in the first takeover run and when the timer expires some time later then an unnecessary takeover run will occur. * If the recovery master role moves then the rebalancing data will stay on the original node and affect the next takeover run to occur if the recovery master role should come back to the original node. Instead, store an array of rebalance target nodes in the recovery master context. This is passed as an extra argument to ctdb_takeover_run() each time it is called and is cleared when a takeover run succeeds. The timer hangs off the array of rebalance target nodes, which is cleared if the node isn't the recovery master. This means that it is possible to lose rebalance data if the recovery master role moves. However, that's a difficult problem to solve. The best way of approaching it is probably to try to stop the recovery master role from jumping around unnecesarily when inactive nodes join the cluster. The long term solution is to avoid this nonsense completely. The IP allocation algorithm needs to cache state between runs so that it knows which nodes have just become healthy. This also needs recovery master stability. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit c51c1efe5fc7fa668597f2acd435dee16e410fc9) 2013-09-04 08:30:04 +04:00			`/* Clear any nodes that should be force rebalanced in`
			`* the next takeover run. If the recovery master role`
			`* has moved then we don't want to process these some`
			`* time in the future.`
			`*/`
			`TALLOC_FREE(rec->force_rebalance_nodes);`
recoverd: Delay the initial election if node is started in stopped state Since there is an early exit if a node is stopped or banned, we can wait till the node becomes active to start initial election. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 593a17678fbd3109e118154b034d43b852659518) 2013-06-27 09:44:27 +04:00			`}`
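A node that is not the recovery master drops any queued IP reallocate requests and force-rebalance targets. This keeps stale work from being acted on later if the role moves back to it.

`TALLOC_FREE()` frees a talloc pointer and sets it to NULL. The sketch below imitates that with `malloc()`/`free()` and a hypothetical `struct rec_state`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Free and NULL the pointer, in the spirit of TALLOC_FREE(). */
#define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

/* Hypothetical subset of the recovery daemon state. */
struct rec_state {
	uint32_t recmaster;
	void *reallocate_requests;
	uint32_t *force_rebalance_nodes;
};

/* Non-recmaster housekeeping: discard work only the recmaster handles. */
static void non_recmaster_housekeeping(struct rec_state *rec, uint32_t pnn)
{
	if (rec->recmaster == pnn) {
		return;
	}
	FREE_AND_NULL(rec->reallocate_requests);
	FREE_AND_NULL(rec->force_rebalance_nodes);
}
```

Setting the pointers to NULL makes the housekeeping safe to repeat on every loop iteration, because `free(NULL)` is a no-op.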

			`/* This is a special case. When the recovery daemon is started, recmaster`
			`* is set to -1. If the node was not started in the stopped state, then`
			`* start an election to decide the recovery master.`
			`*/`
			`if (rec->recmaster == (uint32_t)-1) {`
			`DEBUG(DEBUG_NOTICE,(__location__ " Initial recovery master set - forcing election\n"));`
			`force_election(rec, pnn, nodemap);`
			`return;`
			`}`
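The `(uint32_t)-1` sentinel relies on C's conversion rules. Converting -1 to an unsigned type yields that type's maximum value, so the sentinel equals `UINT32_MAX`.

This test sits after the stopped/banned early return. A node started in the stopped state therefore forces the initial election only once it becomes active.

The sketch below uses a hypothetical `RECMASTER_UNKNOWN` name for the sentinel.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name for "no recovery master known yet". */
#define RECMASTER_UNKNOWN ((uint32_t)-1)

enum initial_action { KEEP_RECMASTER, FORCE_INITIAL_ELECTION };

static enum initial_action initial_recmaster_action(uint32_t recmaster)
{
	return recmaster == RECMASTER_UNKNOWN ? FORCE_INITIAL_ELECTION
					      : KEEP_RECMASTER;
}
```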

recoverd: Update capabilities only if the current node is active Since we do an early return if a node is stopped or banned, move update capabilities code below the early return and just before we check the capabilities of current recovery master. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 93bcb6617e1024f810533e12390a572f51703ca0) 2013-06-27 09:33:49 +04:00			`/* update the capabilities for all nodes */`
ctdb-recoverd: Use capabilities API Simplify update_capabilities() using the capabilities API and store the capabilities in new field rec->caps rather than scattered around ctdb->nodes. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2014-07-31 09:26:03 +04:00			`ret = update_capabilities(rec, nodemap);`
recoverd: Update capabilities only if the current node is active Since we do an early return if a node is stopped or banned, move update capabilities code below the early return and just before we check the capabilities of current recovery master. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 93bcb6617e1024f810533e12390a572f51703ca0) 2013-06-27 09:33:49 +04:00			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR, (__location__ " Unable to update node capabilities.\n"));`
			`return;`
			`}`

recoverd: try to become the recovery master if we have the capability, but the current master doesn't metze (cherry picked from commit 6ba8af28f8a8f79db65120a97d7157dcc5c7e083) Signed-off-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit ccd67cf7f26713e695000d89d9ce8cfa78bfe00f) 2011-06-21 17:49:30 +04:00			`/*`
recoverd: fix a comment in main_loop Signed-off-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit ac06c46e4a80c635f6094b5ac6f0bf3e3a02db95) 2013-06-21 19:57:37 +04:00			`* If the current recmaster does not have CTDB_CAP_RECMASTER,`
			`* but we have, then force an election and try to become the new`
			`* recmaster.`
recoverd: try to become the recovery master if we have the capability, but the current master doesn't metze (cherry picked from commit 6ba8af28f8a8f79db65120a97d7157dcc5c7e083) Signed-off-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit ccd67cf7f26713e695000d89d9ce8cfa78bfe00f) 2011-06-21 17:49:30 +04:00			`*/`
ctdb-recoverd: Use capabilities API Simplify update_capabilities() using the capabilities API and store the capabilities in new field rec->caps rather than scattered around ctdb->nodes. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2014-07-31 09:26:03 +04:00			`if (!ctdb_node_has_capabilities(rec->caps,`
			`rec->recmaster,`
			`CTDB_CAP_RECMASTER) &&`
recoverd: try to become the recovery master if we have the capability, but the current master doesn't metze (cherry picked from commit 6ba8af28f8a8f79db65120a97d7157dcc5c7e083) Signed-off-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit ccd67cf7f26713e695000d89d9ce8cfa78bfe00f) 2011-06-21 17:49:30 +04:00			`(rec->ctdb->capabilities & CTDB_CAP_RECMASTER) &&`
ctdb-recoverd: Use capabilities API Simplify update_capabilities() using the capabilities API and store the capabilities in new field rec->caps rather than scattered around ctdb->nodes. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2014-07-31 09:26:03 +04:00			`!(nodemap->nodes[pnn].flags & NODE_FLAGS_INACTIVE)) {`
recoverd: try to become the recovery master if we have the capability, but the current master doesn't metze (cherry picked from commit 6ba8af28f8a8f79db65120a97d7157dcc5c7e083) Signed-off-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit ccd67cf7f26713e695000d89d9ce8cfa78bfe00f) 2011-06-21 17:49:30 +04:00			`DEBUG(DEBUG_ERR, (__location__ " Current recmaster node %u does not have CAP_RECMASTER,"`
			`" but we (node %u) have - force an election\n",`
			`rec->recmaster, pnn));`
			`force_election(rec, pnn, nodemap);`
			`return;`
			`}`
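The capability check combines three conditions:
- The current recmaster lacks `CTDB_CAP_RECMASTER`.
- This node has it.
- This node is not inactive.

The sketch below restates that predicate. The flag values, `struct sk_caps` and `sk_node_has_caps()` are hypothetical stand-ins for `rec->caps` and `ctdb_node_has_capabilities()`. Treating an unknown PNN as lacking the capability is also an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only, not taken from ctdb's headers. */
#define SK_CAP_RECMASTER  0x1u
#define SK_FLAGS_INACTIVE 0x2u

/* Hypothetical per-node capability table indexed by PNN. */
struct sk_caps {
	uint32_t num;
	const uint32_t *caps;
};

static bool sk_node_has_caps(const struct sk_caps *t, uint32_t pnn,
			     uint32_t mask)
{
	if (pnn >= t->num) {
		return false; /* unknown node: assume capability missing */
	}
	return (t->caps[pnn] & mask) == mask;
}

/* Force an election when the recmaster can't do the job but we can. */
static bool should_challenge_recmaster(const struct sk_caps *t,
				       uint32_t recmaster,
				       uint32_t my_caps, uint32_t my_flags)
{
	return !sk_node_has_caps(t, recmaster, SK_CAP_RECMASTER) &&
	       (my_caps & SK_CAP_RECMASTER) != 0 &&
	       (my_flags & SK_FLAGS_INACTIVE) == 0;
}
```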

recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`/* verify that the recmaster node is still active */`
			`for (j=0; j<nodemap->num; j++) {`
change recmaster from being a local variable in monitor_cluster() to be a member of the ctdb_recoverd structure (This used to be ctdb commit b7f955338f50c92374b4f559268fb3a1a516aefa) 2008-03-02 23:53:46 +03:00			`if (nodemap->nodes[j].pnn==rec->recmaster) {`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`break;`
			`}`
setup the random number generator a bit better (This used to be ctdb commit 708585eb0ed31b0df6543a1d7a20b82e751877c2) 2007-05-10 07:10:23 +04:00			`}`
- startup frozen, and do an initial recovery - fixed a bug in traverse - get a lock on the node list file in the recmaster recovery daemon (This used to be ctdb commit 162a5647535ad1cb3e8e5d4042a2784365fb1913) 2007-05-23 08:35:19 +04:00
			`if (j == nodemap->num) {`
change recmaster from being a local variable in monitor_cluster() to be a member of the ctdb_recoverd structure (This used to be ctdb commit b7f955338f50c92374b4f559268fb3a1a516aefa) 2008-03-02 23:53:46 +03:00			`DEBUG(DEBUG_ERR, ("Recmaster node %u not in list. Force reelection\n", rec->recmaster));`
add a new tunable : reclockpingperiod once every such interval : * the recovery master on each node will uppdate the "connected" count in the reclock count file (ctdb getreclock) * if the node thinks it is a recovery master but it detects another node that is DISCONNECTED but which still holds a lock to the reclock count file this may mean that we have a split cluster. if that other node that is DISCONNECTED but still holds the lock on hte reclock pnn count file, is MORE connected than the local node, yield the recmaster role and let the other half of the lcuster take over this add a second, last chance mechanism to detect split clusters. IF the cluster is split but GPFS is not yet split, this mechanism makes the largest half of the cluster become the active half. (This used to be ctdb commit 07af425f444531942cce8abff112c1524228d287) 2008-03-03 01:19:30 +03:00			`force_election(rec, pnn, nodemap);`
speed startup: alter recovery loop We do a recovery on startup. But the code does: Sleep for ctdb->tunable.recover_interval. Check for recovery. We want to do it in the other order. This is best done by extracting the loop into a separate "main_loop" function. Seconds between ctdbd first log message and node healthy: BEFORE: 24.09 AFTER: 23.58 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 097046025176b9fcb670839d1a9f100f890e7ed2) 2010-06-22 17:20:23 +04:00			`return;`
- startup frozen, and do an initial recovery - fixed a bug in traverse - get a lock on the node list file in the recmaster recovery daemon (This used to be ctdb commit 162a5647535ad1cb3e8e5d4042a2784365fb1913) 2007-05-23 08:35:19 +04:00			`}`

first check that recovery master is connected (we know this from our own flags) then pull the flags off recovery master before checking if it is banned (This used to be ctdb commit 94c1d234e57a40eda2d8b892dd9fbe1ffc4b3433) 2007-10-11 01:10:17 +04:00			`/* if recovery master is disconnected we must elect a new recmaster */`
			`if (nodemap->nodes[j].flags & NODE_FLAGS_DISCONNECTED) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_NOTICE, ("Recmaster node %u is disconnected. Force reelection\n", nodemap->nodes[j].pnn));`
add a new tunable : reclockpingperiod once every such interval : * the recovery master on each node will uppdate the "connected" count in the reclock count file (ctdb getreclock) * if the node thinks it is a recovery master but it detects another node that is DISCONNECTED but which still holds a lock to the reclock count file this may mean that we have a split cluster. if that other node that is DISCONNECTED but still holds the lock on hte reclock pnn count file, is MORE connected than the local node, yield the recmaster role and let the other half of the lcuster take over this add a second, last chance mechanism to detect split clusters. IF the cluster is split but GPFS is not yet split, this mechanism makes the largest half of the cluster become the active half. (This used to be ctdb commit 07af425f444531942cce8abff112c1524228d287) 2008-03-03 01:19:30 +03:00			`force_election(rec, pnn, nodemap);`
speed startup: alter recovery loop We do a recovery on startup. But the code does: Sleep for ctdb->tunable.recover_interval. Check for recovery. We want to do it in the other order. This is best done by extracting the loop into a separate "main_loop" function. Seconds between ctdbd first log message and node healthy: BEFORE: 24.09 AFTER: 23.58 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 097046025176b9fcb670839d1a9f100f890e7ed2) 2010-06-22 17:20:23 +04:00			`return;`
first check that recovery master is connected (we know this from our own flags) then pull the flags off recovery master before checking if it is banned (This used to be ctdb commit 94c1d234e57a40eda2d8b892dd9fbe1ffc4b3433) 2007-10-11 01:10:17 +04:00			`}`
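The two checks above share one scan. The loop leaves `j == nodemap->num` when the recmaster's PNN is absent. Otherwise `j` indexes the recmaster's entry, so its flags can be tested for `NODE_FLAGS_DISCONNECTED`. Either failure forces a re-election.

The sketch below is self-contained; its types are hypothetical and its flag value is illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define SK_FLAGS_DISCONNECTED 0x1u /* illustrative value */

/* Hypothetical nodemap types. */
struct sk_node { uint32_t pnn; uint32_t flags; };
struct sk_nodemap { uint32_t num; const struct sk_node *nodes; };

enum recmaster_check { RECMASTER_OK, RECMASTER_MISSING, RECMASTER_DISCONNECTED };

/* Scan for the recmaster's PNN; running off the end means it is not
 * in the nodemap. On a hit, *index receives the entry's position. */
static enum recmaster_check check_recmaster_present(const struct sk_nodemap *m,
						    uint32_t recmaster,
						    uint32_t *index)
{
	uint32_t j;

	for (j = 0; j < m->num; j++) {
		if (m->nodes[j].pnn == recmaster) {
			break;
		}
	}
	if (j == m->num) {
		return RECMASTER_MISSING;
	}
	*index = j;
	if (m->nodes[j].flags & SK_FLAGS_DISCONNECTED) {
		return RECMASTER_DISCONNECTED;
	}
	return RECMASTER_OK;
}
```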

recoverd: An inactive node should not force recovery master elections An inactive node can't become the recovery master. So if an inactive node notices that the recovery master is inactive, it shouldn't force an election for recovery master and nominate itself as a candidate. This can cause the recovery master to flip-flop between nodes when all nodes are inactive. If there is actually an active node then it will trigger the election. This is fairly cosmetic but is a step along the way towards ironing out weirdness when all nodes are stopped. Also, fix a related comment. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit e7dc10da3ced54ea9d719ad167ee42dcca8dce75) 2012-07-06 14:36:48 +04:00			`/* get nodemap from the recovery master to check if it is inactive */`
simplify election handling make sure we read and update the flags from all remote nodes before we reach the first codepath that can call do_recovery() since during do_recovery() we need to know what the flags are. (This used to be ctdb commit e85f3806483ea420559d449e0e4d81bec996740f) 2007-10-11 00:16:36 +04:00			`ret = ctdb_ctrl_getnodemap(ctdb, CONTROL_TIMEOUT(), nodemap->nodes[j].pnn,`
redo and update how we synchronize flags across the cluster. this simplifies the code and should close a race condition between the local recovery daemon and a remote node when flags are changing. (This used to be ctdb commit 32d460b8469eb53145f04161a5d01166f9b5f09e) 2008-12-05 08:32:30 +03:00			`mem_ctx, &recmaster_nodemap);`
simplify election handling make sure we read and update the flags from all remote nodes before we reach the first codepath that can call do_recovery() since during do_recovery() we need to know what the flags are. (This used to be ctdb commit e85f3806483ea420559d449e0e4d81bec996740f) 2007-10-11 00:16:36 +04:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to get nodemap from recovery master %u\n",`
simplify election handling make sure we read and update the flags from all remote nodes before we reach the first codepath that can call do_recovery() since during do_recovery() we need to know what the flags are. (This used to be ctdb commit e85f3806483ea420559d449e0e4d81bec996740f) 2007-10-11 00:16:36 +04:00			`nodemap->nodes[j].pnn));`
speed startup: alter recovery loop We do a recovery on startup. But the code does: Sleep for ctdb->tunable.recover_interval. Check for recovery. We want to do it in the other order. This is best done by extracting the loop into a separate "main_loop" function. Seconds between ctdbd first log message and node healthy: BEFORE: 24.09 AFTER: 23.58 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 097046025176b9fcb670839d1a9f100f890e7ed2) 2010-06-22 17:20:23 +04:00			`return;`
simplify election handling make sure we read and update the flags from all remote nodes before we reach the first codepath that can call do_recovery() since during do_recovery() we need to know what the flags are. (This used to be ctdb commit e85f3806483ea420559d449e0e4d81bec996740f) 2007-10-11 00:16:36 +04:00			`}`


recoverd: An inactive node should not force recovery master elections An inactive node can't become the recovery master. So if an inactive node notices that the recovery master is inactive, it shouldn't force an election for recovery master and nominate itself as a candidate. This can cause the recovery master to flip-flop between nodes when all nodes are inactive. If there is actually an active node then it will trigger the election. This is fairly cosmetic but is a step along the way towards ironing out weirdness when all nodes are stopped. Also, fix a related comment. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit e7dc10da3ced54ea9d719ad167ee42dcca8dce75) 2012-07-06 14:36:48 +04:00			`if ((recmaster_nodemap->nodes[j].flags & NODE_FLAGS_INACTIVE) &&`
			`(rec->node_flags & NODE_FLAGS_INACTIVE) == 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_NOTICE, ("Recmaster node %u no longer available. Force reelection\n", nodemap->nodes[j].pnn));`
recoverd: when the recmaster is banned, use that information when forcing an election When we trigger an election because the recmaster considers itself inactive, update our local nodemap with the recmaster's flags before calling force_election(). This way, we don't send the inactive node freeze commands (e.g.) that may fail and then lead to ourselves getting banned. The theory is that this should help avoiding banning loops. Signed-off-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit 932360992b08a5483d90c0590218ba0fd756119e) 2013-06-26 11:23:22 +04:00			`/*`
			`* update our nodemap to carry the recmaster's notion of`
			`* its own flags, so that we don't keep freezing the`
			`* inactive recmaster node...`
			`*/`
			`nodemap->nodes[j].flags = recmaster_nodemap->nodes[j].flags;`
add a new tunable : reclockpingperiod once every such interval : * the recovery master on each node will uppdate the "connected" count in the reclock count file (ctdb getreclock) * if the node thinks it is a recovery master but it detects another node that is DISCONNECTED but which still holds a lock to the reclock count file this may mean that we have a split cluster. if that other node that is DISCONNECTED but still holds the lock on hte reclock pnn count file, is MORE connected than the local node, yield the recmaster role and let the other half of the lcuster take over this add a second, last chance mechanism to detect split clusters. IF the cluster is split but GPFS is not yet split, this mechanism makes the largest half of the cluster become the active half. (This used to be ctdb commit 07af425f444531942cce8abff112c1524228d287) 2008-03-03 01:19:30 +03:00			`force_election(rec, pnn, nodemap);`
speed startup: alter recovery loop We do a recovery on startup. But the code does: Sleep for ctdb->tunable.recover_interval. Check for recovery. We want to do it in the other order. This is best done by extracting the loop into a separate "main_loop" function. Seconds between ctdbd first log message and node healthy: BEFORE: 24.09 AFTER: 23.58 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 097046025176b9fcb670839d1a9f100f890e7ed2) 2010-06-22 17:20:23 +04:00			`return;`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`}`
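This check asks the recmaster for its own nodemap. It forces a re-election only if the recmaster reports itself inactive and the local node is active. An inactive node cannot become recovery master, and letting it call elections could make the role flip-flop when all nodes are inactive.

Before the election, the recmaster's flags are copied into the local nodemap. This keeps the election from sending freeze controls to that node; such controls could fail and get this node banned.

The sketch below uses hypothetical types and an illustrative flag value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_FLAGS_INACTIVE 0x2u /* illustrative value */

struct sk_node { uint32_t pnn; uint32_t flags; };

/* Returns true if a re-election should be forced because the recmaster
 * reports itself inactive. Only an active local node may force it.
 * On "true", local_entry takes the recmaster's own view of its flags. */
static bool reelect_if_recmaster_inactive(struct sk_node *local_entry,
					  const struct sk_node *remote_view,
					  uint32_t my_flags)
{
	if ((remote_view->flags & SK_FLAGS_INACTIVE) == 0) {
		return false; /* recmaster considers itself active */
	}
	if ((my_flags & SK_FLAGS_INACTIVE) != 0) {
		return false; /* we can't be a candidate: don't trigger */
	}
	local_entry->flags = remote_view->flags;
	return true;
}
```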
let each node verify that they have a correct assignment of public ip addresses (i.e. htey hold those they should hold and they dont hold any of those they shouldnt hold) if an inconsistency is found, mark the local node as recovery mode active and wait for the recovery master to trigger a full blown recovery (This used to be ctdb commit 55a5bfc8244c5b9cdda3f11992f384f00566b5dc) 2007-09-14 04:16:36 +04:00
track both when we last started and ended a recovery. make ctdb uptime print how long the recovery took in the recovery daemon when we check that the public ip address allocation on the local node is correct (we have the ips we should have and we dont have any we shouldnt have) use ctdb uptime and check the recovery start/stop times and make sure we dont check for ip allocation inconsistencies during a recovery where the ip address allocation is in flux. (This used to be ctdb commit f86551580349b7f662f9a07e4eb0c1189e38e429) 2008-07-02 07:55:59 +04:00			`/* verify that we have all ip addresses we should have and we don't`
			`* have addresses we shouldn't have.`
			`*/`
recoverd: New SRVID message CTDB_SRVID_DISABLE_TAKEOVER_RUNS This implements a superset of CTDB_SRVID_DISABLE_IP_CHECK. It stops the IP checks but also causes any attempted takeover runs to fail and be rescheduled. This is meant to completely stop IP movements. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 00db4de53a0d86013e79e6577e7e6cf3ef864e56) 2013-08-27 09:04:40 +04:00			`if (ctdb->tunable.disable_ip_failover == 0 &&`
ctdb-recoverd: Reimplement disabling takeover runs using ctdb_op_disable() Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-02-08 12:52:12 +03:00			`!ctdb_op_is_disabled(rec->takeover_run)) {`
recoverd: New SRVID message CTDB_SRVID_DISABLE_TAKEOVER_RUNS This implements a superset of CTDB_SRVID_DISABLE_IP_CHECK. It stops the IP checks but also causes any attempted takeover runs to fail and be rescheduled. This is meant to completely stop IP movements. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 00db4de53a0d86013e79e6577e7e6cf3ef864e56) 2013-08-27 09:04:40 +04:00			`if (verify_local_ip_allocation(ctdb, rec, pnn, nodemap) != 0) {`
			`DEBUG(DEBUG_ERR, (__location__ " Public IPs were inconsistent.\n"));`
inew version 1.0.66 ddwq (This used to be ctdb commit 499a01fece2a5f24f1b2943cf3dc6e9a3a8ca3b5) 2008-11-24 11:06:02 +03:00			`}`
remove some unnessecary tests if ->vnn is null or not (This used to be ctdb commit f0169ac8166a19d65ce254496e21d095aed87c2f) 2008-05-15 07:28:19 +04:00			`}`
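The local IP consistency check runs only when two conditions hold. IP failover must not be disabled by the `disable_ip_failover` tunable. Takeover runs must not currently be disabled; `CTDB_SRVID_DISABLE_TAKEOVER_RUNS` disables them to stop IP movement entirely. In this excerpt, a failed verification is only logged.

The gate reduces to a two-input predicate. The function name below is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical gate: tunable value plus whether takeover runs are
 * currently disabled (what ctdb_op_is_disabled() reports here). */
static bool should_verify_local_ips(uint32_t disable_ip_failover,
				    bool takeover_runs_disabled)
{
	return disable_ip_failover == 0 && !takeover_runs_disabled;
}
```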
track both when we last started and ended a recovery. make ctdb uptime print how long the recovery took in the recovery daemon when we check that the public ip address allocation on the local node is correct (we have the ips we should have and we dont have any we shouldnt have) use ctdb uptime and check the recovery start/stop times and make sure we dont check for ip allocation inconsistencies during a recovery where the ip address allocation is in flux. (This used to be ctdb commit f86551580349b7f662f9a07e4eb0c1189e38e429) 2008-07-02 07:55:59 +04:00
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00
			`/* if we are not the recmaster then we do not need to check`
			`if recovery is needed`
			`*/`
change recmaster from being a local variable in monitor_cluster() to be a member of the ctdb_recoverd structure (This used to be ctdb commit b7f955338f50c92374b4f559268fb3a1a516aefa) 2008-03-02 23:53:46 +03:00			`if (pnn != rec->recmaster) {`
speed startup: alter recovery loop We do a recovery on startup. But the code does: Sleep for ctdb->tunable.recover_interval. Check for recovery. We want to do it in the other order. This is best done by extracting the loop into a separate "main_loop" function. Seconds between ctdbd first log message and node healthy: BEFORE: 24.09 AFTER: 23.58 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 097046025176b9fcb670839d1a9f100f890e7ed2) 2010-06-22 17:20:23 +04:00			`return;`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`}`

simplify election handling make sure we read and update the flags from all remote nodes before we reach the first codepath that can call do_recovery() since during do_recovery() we need to know what the flags are. (This used to be ctdb commit e85f3806483ea420559d449e0e4d81bec996740f) 2007-10-11 00:16:36 +04:00
sync flags between nodes in monitor loop in recmaster (This used to be ctdb commit 6eef86e06388fc53a1212f1e2783ae174c6cd210) 2007-10-15 08:28:51 +04:00			`/* ensure our local copies of flags are right */`
dont manipulate ctdb->monitoring_mode directly from the SET_MON_MODE control, instead call ctdb_start/stop_monitoring() ctdb_stop_monitoring() dont allocate a new monitoring context, leave it NULL. Also set the monitoring_mode in this function so that ctdb_stop/start_monitoring() and ->monitoring_mode are kept in sync. Add a debug message to log that we have stopped monitoring. ctdb_start_monitoring() check whether monitoring is already active and make the function idempotent. Create the monitoring context when monitoring is started. Update ->monitoring_mode once the monitoring has been started. Add a debug message to log that we have started monitoring. When we temporarily stop monitoring while running an event script, restart monitoring after the event script wrapper returns instead of in the event script callback. Let monitoring_mode start out as DISABLED and let it be enabled once we call ctdb_start_monitoring. dont check for MONITORING_DISABLED in check_fore_dead_nodes(). If monitoring is disabled, this event handler will not be called. (This used to be ctdb commit 3a93ae8bdcffb1adbd6243844f3058fc742f76aa) 2007-11-30 00:44:34 +03:00			`ret = update_local_flags(rec, nodemap);`
If update_local_flags() finds that a node has changed its BANNED status so it differs from what the local ctdb daemon on the recovery master thinks it should be we should call for a re-election (This used to be ctdb commit 21ad6039c31ef5cc0e40a35a41220f91943947cb) 2007-11-23 03:53:06 +03:00			`if (ret == MONITOR_ELECTION_NEEDED) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_NOTICE,("update_local_flags() called for a re-election.\n"));`
add a new tunable : reclockpingperiod once every such interval : * the recovery master on each node will uppdate the "connected" count in the reclock count file (ctdb getreclock) * if the node thinks it is a recovery master but it detects another node that is DISCONNECTED but which still holds a lock to the reclock count file this may mean that we have a split cluster. if that other node that is DISCONNECTED but still holds the lock on hte reclock pnn count file, is MORE connected than the local node, yield the recmaster role and let the other half of the lcuster take over this add a second, last chance mechanism to detect split clusters. IF the cluster is split but GPFS is not yet split, this mechanism makes the largest half of the cluster become the active half. (This used to be ctdb commit 07af425f444531942cce8abff112c1524228d287) 2008-03-03 01:19:30 +03:00			`force_election(rec, pnn, nodemap);`
speed startup: alter recovery loop We do a recovery on startup. But the code does: Sleep for ctdb->tunable.recover_interval. Check for recovery. We want to do it in the other order. This is best done by extracting the loop into a separate "main_loop" function. Seconds between ctdbd first log message and node healthy: BEFORE: 24.09 AFTER: 23.58 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 097046025176b9fcb670839d1a9f100f890e7ed2) 2010-06-22 17:20:23 +04:00			`return;`
If update_local_flags() finds that a node has changed its BANNED status so it differs from what the local ctdb daemon on the recovery master thinks it should be we should call for a re-election (This used to be ctdb commit 21ad6039c31ef5cc0e40a35a41220f91943947cb) 2007-11-23 03:53:06 +03:00			`}`
			`if (ret != MONITOR_OK) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,("Unable to update local flags\n"));`
speed startup: alter recovery loop We do a recovery on startup. But the code does: Sleep for ctdb->tunable.recover_interval. Check for recovery. We want to do it in the other order. This is best done by extracting the loop into a separate "main_loop" function. Seconds between ctdbd first log message and node healthy: BEFORE: 24.09 AFTER: 23.58 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 097046025176b9fcb670839d1a9f100f890e7ed2) 2010-06-22 17:20:23 +04:00			`return;`
simplify election handling make sure we read and update the flags from all remote nodes before we reach the first codepath that can call do_recovery() since during do_recovery() we need to know what the flags are. (This used to be ctdb commit e85f3806483ea420559d449e0e4d81bec996740f) 2007-10-11 00:16:36 +04:00			`}`
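The flag-sync step above comes down to a small decision. An election request forces an election and ends the pass. Any other non-OK result is logged and also ends the pass. Only MONITOR_OK lets the loop continue to the next check. The sketch below models that mapping. It assumes a local copy of the result codes named in the loop, and `enum loop_action` and `flags_step_action()` are hypothetical names for illustration, not ctdb definitions:

```c
/* Illustrative copy of the monitor result codes used in the loop above. */
enum monitor_result {
	MONITOR_OK,
	MONITOR_RECOVERY_NEEDED,
	MONITOR_ELECTION_NEEDED,
	MONITOR_FAILED
};

/* Hypothetical actions the main loop can take after a check. */
enum loop_action {
	ACTION_CONTINUE,	/* fall through to the next check */
	ACTION_FORCE_ELECTION,	/* call force_election(), end this pass */
	ACTION_ABORT_PASS	/* log an error, end this pass, retry later */
};

/* Mirrors the update_local_flags() handling: an election request wins,
 * and any other non-OK result ends the current pass. */
static enum loop_action flags_step_action(enum monitor_result ret)
{
	if (ret == MONITOR_ELECTION_NEEDED) {
		return ACTION_FORCE_ELECTION;
	}
	if (ret != MONITOR_OK) {
		return ACTION_ABORT_PASS;
	}
	return ACTION_CONTINUE;
}
```

Each early `return` in the real loop only ends the current pass. The recovery daemon calls the loop again on its next interval, so an aborted pass is retried rather than lost.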

when we reload the nodes file, we may need to reload the nodes file inside the recovery daemon as well. (This used to be ctdb commit 82fd2b6b5cd8e988c38fa6b74121a048757bdeef) 2008-10-17 14:18:06 +04:00			`if (ctdb->num_nodes != nodemap->num) {`
			`DEBUG(DEBUG_ERR, (__location__ " ctdb->num_nodes (%d) != nodemap->num (%d) reloading nodes file\n", ctdb->num_nodes, nodemap->num));`
recoverd: Remove function reload_nodes_file() It is a 1 line wrapper around ctdb_load_nodes_file(), so use that instead. We need less code... :-) Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 4a5d5935f4410a93a3343d85a24dbcddae2c4c20) 2013-10-14 06:54:39 +04:00			`ctdb_load_nodes_file(ctdb);`
speed startup: alter recovery loop We do a recovery on startup. But the code does: Sleep for ctdb->tunable.recover_interval. Check for recovery. We want to do it in the other order. This is best done by extracting the loop into a separate "main_loop" function. Seconds between ctdbd first log message and node healthy: BEFORE: 24.09 AFTER: 23.58 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 097046025176b9fcb670839d1a9f100f890e7ed2) 2010-06-22 17:20:23 +04:00			`return;`
when we reload the nodes file, we may need to reload the nodes file inside the recovery daemon as well. (This used to be ctdb commit 82fd2b6b5cd8e988c38fa6b74121a048757bdeef) 2008-10-17 14:18:06 +04:00			`}`
allow different nodes in the cluster to use different public_addresses files so that we can partition the cluster into different subsets of nodes which each serve a different subset of the public addresses (This used to be ctdb commit 889e0fe69e4c88c6166282b12843b8d9727552d6) 2007-09-04 17:15:23 +04:00
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`/* verify that all active nodes agree that we are the recmaster */`
when a node disgrees with us re who is recmaster make it mark that node as a lcuprit so it eventually gets banned (This used to be ctdb commit eff3f326f8ce6070c9f3c430cd14d1b71a8db220) 2008-04-21 18:56:27 +04:00			`switch (verify_recmaster(rec, nodemap, pnn)) {`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`case MONITOR_RECOVERY_NEEDED:`
			`/* can not happen */`
speed startup: alter recovery loop We do a recovery on startup. But the code does: Sleep for ctdb->tunable.recover_interval. Check for recovery. We want to do it in the other order. This is best done by extracting the loop into a separate "main_loop" function. Seconds between ctdbd first log message and node healthy: BEFORE: 24.09 AFTER: 23.58 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 097046025176b9fcb670839d1a9f100f890e7ed2) 2010-06-22 17:20:23 +04:00			`return;`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`case MONITOR_ELECTION_NEEDED:`
add a new tunable : reclockpingperiod once every such interval : * the recovery master on each node will uppdate the "connected" count in the reclock count file (ctdb getreclock) * if the node thinks it is a recovery master but it detects another node that is DISCONNECTED but which still holds a lock to the reclock count file this may mean that we have a split cluster. if that other node that is DISCONNECTED but still holds the lock on hte reclock pnn count file, is MORE connected than the local node, yield the recmaster role and let the other half of the lcuster take over this add a second, last chance mechanism to detect split clusters. IF the cluster is split but GPFS is not yet split, this mechanism makes the largest half of the cluster become the active half. (This used to be ctdb commit 07af425f444531942cce8abff112c1524228d287) 2008-03-03 01:19:30 +03:00			`force_election(rec, pnn, nodemap);`
speed startup: alter recovery loop We do a recovery on startup. But the code does: Sleep for ctdb->tunable.recover_interval. Check for recovery. We want to do it in the other order. This is best done by extracting the loop into a separate "main_loop" function. Seconds between ctdbd first log message and node healthy: BEFORE: 24.09 AFTER: 23.58 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 097046025176b9fcb670839d1a9f100f890e7ed2) 2010-06-22 17:20:23 +04:00			`return;`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`case MONITOR_OK:`
			`break;`
			`case MONITOR_FAILED:`
speed startup: alter recovery loop We do a recovery on startup. But the code does: Sleep for ctdb->tunable.recover_interval. Check for recovery. We want to do it in the other order. This is best done by extracting the loop into a separate "main_loop" function. Seconds between ctdbd first log message and node healthy: BEFORE: 24.09 AFTER: 23.58 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 097046025176b9fcb670839d1a9f100f890e7ed2) 2010-06-22 17:20:23 +04:00			`return;`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00			`}`
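The recmaster check above asks every active node which PNN it believes is the recovery master. Any disagreement triggers an election, and according to the commit notes the disagreeing node is marked as a culprit so that it is eventually banned. The sketch below is a synchronous model that works on answers already collected. The real verify_recmaster() sends asynchronous controls and handles timeouts. `struct node_view` and `recmaster_election_needed()` are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-node view: whether the node is active, and which PNN
 * it reported as the recovery master. */
struct node_view {
	bool active;
	uint32_t reported_recmaster;
};

/* Returns true if an election is needed, that is, if some active node does
 * not agree that my_pnn is the recovery master. When culprit is non-NULL,
 * it receives the index of the first disagreeing node. */
static bool recmaster_election_needed(const struct node_view *nodes,
				      size_t num, uint32_t my_pnn,
				      size_t *culprit)
{
	size_t i;

	for (i = 0; i < num; i++) {
		if (!nodes[i].active) {
			continue;	/* inactive nodes are not consulted */
		}
		if (nodes[i].reported_recmaster != my_pnn) {
			if (culprit != NULL) {
				*culprit = i;
			}
			return true;
		}
	}
	return false;
}
```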


- merge from ronnie - add a flag to check that recovery completed correctly. If not, re-trigger it in monitoring (This used to be ctdb commit d5ed941d9bab4af30d8b5f9b77bdf43d9218d69b) 2007-09-14 03:49:12 +04:00			`if (rec->need_recovery) {`
			`/* a previous recovery didn't finish */`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`do_recovery(rec, mem_ctx, pnn, nodemap, vnnmap);`
speed startup: alter recovery loop We do a recovery on startup. But the code does: Sleep for ctdb->tunable.recover_interval. Check for recovery. We want to do it in the other order. This is best done by extracting the loop into a separate "main_loop" function. Seconds between ctdbd first log message and node healthy: BEFORE: 24.09 AFTER: 23.58 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 097046025176b9fcb670839d1a9f100f890e7ed2) 2010-06-22 17:20:23 +04:00			`return;`
- merge from ronnie - add a flag to check that recovery completed correctly. If not, re-trigger it in monitoring (This used to be ctdb commit d5ed941d9bab4af30d8b5f9b77bdf43d9218d69b) 2007-09-14 03:49:12 +04:00			`}`
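The `need_recovery` check above is a retry latch. The latch is set when a recovery starts and cleared only when the recovery completes successfully, so a recovery that fails partway is started again on the next pass. The sketch below assumes this set-at-start, clear-at-end pattern, which is how the commit message describes it. `struct rec_state`, `fake_do_recovery()` and `retry_if_unfinished()` are hypothetical stand-ins, not ctdb code:

```c
#include <stdbool.h>

/* Hypothetical minimal state for the retry mechanism. */
struct rec_state {
	bool need_recovery;
	int recoveries_started;
};

/* Stand-in for do_recovery(): mark recovery as pending and clear the mark
 * only if every step succeeds. 'steps_ok' simulates the outcome. */
static void fake_do_recovery(struct rec_state *rec, bool steps_ok)
{
	rec->need_recovery = true;
	rec->recoveries_started++;
	if (!steps_ok) {
		return;		/* leave the latch set so the loop retries */
	}
	rec->need_recovery = false;
}

/* One pass of the main-loop check: rerun an unfinished recovery.
 * Returns true if a recovery was started. */
static bool retry_if_unfinished(struct rec_state *rec, bool steps_ok)
{
	if (!rec->need_recovery) {
		return false;
	}
	fake_do_recovery(rec, steps_ok);
	return true;
}
```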

add a test in the function that checks whether the cluster needs recovery or not that all active nodes are in normal mode. If we discover that some node is still in recoverymode it may indicate that a previous recovery ended prematurely and thus we should start a new recovery (This used to be ctdb commit c15517872e6c98c8c425a8d47d2b348ecb0620b0) 2007-05-06 22:41:12 +04:00			`/* verify that all active nodes are in normal mode`
			`and not in recovery mode`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`*/`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`switch (verify_recmode(ctdb, nodemap)) {`
break checking that the recoverymode on all nodes are ok out into its own function (This used to be ctdb commit 813cf9a252af96da24122b80f24aabeed2911939) 2007-08-23 07:48:39 +04:00			`case MONITOR_RECOVERY_NEEDED:`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`do_recovery(rec, mem_ctx, pnn, nodemap, vnnmap);`
speed startup: alter recovery loop We do a recovery on startup. But the code does: Sleep for ctdb->tunable.recover_interval. Check for recovery. We want to do it in the other order. This is best done by extracting the loop into a separate "main_loop" function. Seconds between ctdbd first log message and node healthy: BEFORE: 24.09 AFTER: 23.58 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 097046025176b9fcb670839d1a9f100f890e7ed2) 2010-06-22 17:20:23 +04:00			`return;`
break checking that the recoverymode on all nodes are ok out into its own function (This used to be ctdb commit 813cf9a252af96da24122b80f24aabeed2911939) 2007-08-23 07:48:39 +04:00			`case MONITOR_FAILED:`
speed startup: alter recovery loop We do a recovery on startup. But the code does: Sleep for ctdb->tunable.recover_interval. Check for recovery. We want to do it in the other order. This is best done by extracting the loop into a separate "main_loop" function. Seconds between ctdbd first log message and node healthy: BEFORE: 24.09 AFTER: 23.58 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 097046025176b9fcb670839d1a9f100f890e7ed2) 2010-06-22 17:20:23 +04:00			`return;`
try out a slightly different api for controls where you provide a callback function which is called upon completion (or timeout) of the control. modify scanning of recmaster in the monitoring_cluster code to try the api out (This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c) 2007-08-23 13:27:09 +04:00			`case MONITOR_ELECTION_NEEDED:`
			`/* can not happen */`
break checking that the recoverymode on all nodes are ok out into its own function (This used to be ctdb commit 813cf9a252af96da24122b80f24aabeed2911939) 2007-08-23 07:48:39 +04:00			`case MONITOR_OK:`
			`break;`
add a test in the function that checks whether the cluster needs recovery or not that all active nodes are in normal mode. If we discover that some node is still in recoverymode it may indicate that a previous recovery ended prematurely and thus we should start a new recovery (This used to be ctdb commit c15517872e6c98c8c425a8d47d2b348ecb0620b0) 2007-05-06 22:41:12 +04:00			`}`
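verify_recmode() handles the case where an active node is still in recovery mode even though this node believes no recovery is running. That state suggests a previous recovery ended early, so a new recovery is started. The sketch below checks answers already collected instead of sending controls. In this sketch a failed query takes precedence over a node found in recovery mode, and the real function's ordering may differ. All type and function names are hypothetical. ctdb's own mode constants are CTDB_RECOVERY_NORMAL and CTDB_RECOVERY_ACTIVE.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative recovery modes. */
enum recmode { RECMODE_NORMAL, RECMODE_ACTIVE };

enum recmode_check {
	RECMODE_ALL_NORMAL,	/* corresponds to MONITOR_OK */
	RECMODE_NEEDS_RECOVERY,	/* corresponds to MONITOR_RECOVERY_NEEDED */
	RECMODE_QUERY_FAILED	/* corresponds to MONITOR_FAILED */
};

struct node_recmode {
	bool active;		/* inactive nodes are skipped */
	bool query_ok;		/* did asking the node for its mode succeed? */
	enum recmode mode;
};

static enum recmode_check check_recmodes(const struct node_recmode *nodes,
					 size_t num)
{
	size_t i;
	bool needs_recovery = false;

	for (i = 0; i < num; i++) {
		if (!nodes[i].active) {
			continue;
		}
		if (!nodes[i].query_ok) {
			return RECMODE_QUERY_FAILED;
		}
		if (nodes[i].mode != RECMODE_NORMAL) {
			needs_recovery = true;
		}
	}
	return needs_recovery ? RECMODE_NEEDS_RECOVERY : RECMODE_ALL_NORMAL;
}
```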


ctdb-daemon: Mark tunable VerifyRecoveryLock as obsolete It is pointless having a recovery lock but not sanity checking that it is working. Also, the logic that uses this tunable is confusing. In some places the recovery lock is released unnecessarily because the tunable isn't set. Simplify the logic by assuming that if a recovery lock is specified then it should be verified. Update documentation that references this tunable. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2014-12-09 05:47:42 +03:00			`if (ctdb->recovery_lock_file != NULL) {`
ctdb-recoverd: Remove check_recovery_lock() This has not done anything useful since commit b9d8bb23af8abefb2d967e9b4e9d6e60c4a3b520. Instead, just check that the lock is held. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2014-12-09 06:45:08 +03:00			`/* We must already hold the recovery lock */`
			`if (!ctdb_recovery_have_lock(ctdb)) {`
			`DEBUG(DEBUG_ERR,("Failed recovery lock sanity check. Force a recovery\n"));`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`ctdb_set_culprit(rec, ctdb->pnn);`
			`do_recovery(rec, mem_ctx, pnn, nodemap, vnnmap);`
speed startup: alter recovery loop We do a recovery on startup. But the code does: Sleep for ctdb->tunable.recover_interval. Check for recovery. We want to do it in the other order. This is best done by extracting the loop into a separate "main_loop" function. Seconds between ctdbd first log message and node healthy: BEFORE: 24.09 AFTER: 23.58 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 097046025176b9fcb670839d1a9f100f890e7ed2) 2010-06-22 17:20:23 +04:00			`return;`
Dont access the reclock file at all if VerifyRecoveryLock is zero and also make sure the reclock file is closed if the variable is cleared at runtime (This used to be ctdb commit a25f4888689a0725971606163d87c39a41669292) 2009-06-25 05:41:18 +04:00			`}`
- catch ESTALE in the recovery lock by trying a read() - priortise nodes that are unbanned and healthy in the election (This used to be ctdb commit 929feb475dfdf7283f0e99b50b179e1c91d3a39f) 2007-10-05 07:28:21 +04:00			`}`
break checking that the recoverymode on all nodes are ok out into its own function (This used to be ctdb commit 813cf9a252af96da24122b80f24aabeed2911939) 2007-08-23 07:48:39 +04:00
Add new control to reload the public ip address file on a node Also add a method to use the recovery master/daemon to reload the public ips on all nodes in the cluster. Reloading the public ips on all node sin the cluster is only suported if all nodes in the cluster are available and healthy. (This used to be ctdb commit 05603e914f8c12618d7e06943c0f7df207f645b0) 2012-04-30 09:50:44 +04:00
server/recoverd: do takeover_run after verifying the reclock file metze (This used to be ctdb commit 93df096773c89f21f77b3bcf9aa90bf28881b852) 2010-08-31 10:42:32 +04:00			`/* if there are takeover requests, perform them and notify the waiters */`
ctdb-recoverd: Reimplement disabling takeover runs using ctdb_op_disable() Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-02-08 12:52:12 +03:00			`if (!ctdb_op_is_disabled(rec->takeover_run) &&`
recoverd: Defer ipreallocated requests when takeover runs are disabled The takeover run will fail anyway but deferring seems like a cleaner option. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 428f800bcdf3dbfe19de8bb36099fbf01ebeaab4) 2013-08-28 05:50:23 +04:00			`rec->reallocate_requests) {`
server/recoverd: do takeover_run after verifying the reclock file metze (This used to be ctdb commit 93df096773c89f21f77b3bcf9aa90bf28881b852) 2010-08-31 10:42:32 +04:00			`process_ipreallocate_requests(ctdb, rec);`
			`}`
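Pending IP reallocation requests are processed only while takeover runs are enabled. When takeover runs are disabled, the requests stay queued until a later pass. In ctdb this state is managed through ctdb_op_disable() and ctdb_op_is_disabled(), which re-enable the operation with a timer. The sketch below swaps that timer for a simple "disabled until" timestamp. `struct op_state`, `op_disable()`, `op_is_disabled()` and `maybe_process_requests()` are hypothetical:

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for a disable-able operation: disabled until
 * 'disabled_until' (0 means enabled). */
struct op_state {
	time_t disabled_until;
};

static void op_disable(struct op_state *op, time_t now, unsigned timeout)
{
	op->disabled_until = now + (time_t)timeout;
}

static bool op_is_disabled(const struct op_state *op, time_t now)
{
	return op->disabled_until != 0 && now < op->disabled_until;
}

/* Handle queued reallocation requests only while takeover runs are
 * enabled; otherwise leave them queued. Returns the number handled. */
static int maybe_process_requests(const struct op_state *takeover,
				  int *pending, time_t now)
{
	int handled;

	if (op_is_disabled(takeover, now) || *pending == 0) {
		return 0;	/* deferred, not dropped */
	}
	handled = *pending;
	*pending = 0;
	return handled;
}
```

The commit notes say requests are deferred rather than failed while takeover runs are disabled, because the takeover run would fail anyway. Keeping the counter intact models that deferral.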

ctdb-recoverd: Avoid nodemap-related checks when recoveries are disabled The potential resulting recovery won't run anyway. Also recoveries may have been disabled by "reloadnodes" and if the nodemaps are inconsistent between nodes then avoid triggering an unnecessary recovery. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-02-06 12:59:11 +03:00			`/* If recoveries are disabled then there is no use doing any`
			`* nodemap or flags checks. Recoveries might be disabled due`
			`* to "reloadnodes", so doing these checks might cause an`
			`* unnecessary recovery. */`
			`if (ctdb_op_is_disabled(rec->recovery)) {`
			`return;`
			`}`

redo and update how we synchronize flags across the cluster. this simplifies the code and should close a race condition between the local recovery daemon and a remote node when flags are changing. (This used to be ctdb commit 32d460b8469eb53145f04161a5d01166f9b5f09e) 2008-12-05 08:32:30 +03:00			`/* get the nodemap for all active remote nodes`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`*/`
ctdb-daemon: Rename struct ctdb_node_map to ctdb_node_map_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:22:48 +03:00			`remote_nodemaps = talloc_array(mem_ctx, struct ctdb_node_map_old *, nodemap->num);`
update to the flags handling make sure to abort the monitoring and restart if we failed to get the nodemap from a remote node (This used to be ctdb commit 4eac0214e732e6c2f867d66ec71d4406680dbb94) 2008-12-09 02:45:14 +03:00			`if (remote_nodemaps == NULL) {`
			`DEBUG(DEBUG_ERR, (__location__ " failed to allocate remote nodemap array\n"));`
speed startup: alter recovery loop We do a recovery on startup. But the code does: Sleep for ctdb->tunable.recover_interval. Check for recovery. We want to do it in the other order. This is best done by extracting the loop into a separate "main_loop" function. Seconds between ctdbd first log message and node healthy: BEFORE: 24.09 AFTER: 23.58 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 097046025176b9fcb670839d1a9f100f890e7ed2) 2010-06-22 17:20:23 +04:00			`return;`
update to the flags handling make sure to abort the monitoring and restart if we failed to get the nodemap from a remote node (This used to be ctdb commit 4eac0214e732e6c2f867d66ec71d4406680dbb94) 2008-12-09 02:45:14 +03:00			`}`
			`for(i=0; i<nodemap->num; i++) {`
			`remote_nodemaps[i] = NULL;`
			`}`
			`if (get_remote_nodemaps(ctdb, mem_ctx, nodemap, remote_nodemaps) != 0) {`
redo and update how we synchronize flags across the cluster. this simplifies the code and should close a race condition between the local recovery daemon and a remote node when flags are changing. (This used to be ctdb commit 32d460b8469eb53145f04161a5d01166f9b5f09e) 2008-12-05 08:32:30 +03:00			`DEBUG(DEBUG_ERR,(__location__ " Failed to read remote nodemaps\n"));`
speed startup: alter recovery loop We do a recovery on startup. But the code does: Sleep for ctdb->tunable.recover_interval. Check for recovery. We want to do it in the other order. This is best done by extracting the loop into a separate "main_loop" function. Seconds between ctdbd first log message and node healthy: BEFORE: 24.09 AFTER: 23.58 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 097046025176b9fcb670839d1a9f100f890e7ed2) 2010-06-22 17:20:23 +04:00			`return;`
redo and update how we synchronize flags across the cluster. this simplifies the code and should close a race condition between the local recovery daemon and a remote node when flags are changing. (This used to be ctdb commit 32d460b8469eb53145f04161a5d01166f9b5f09e) 2008-12-05 08:32:30 +03:00			`}`
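After the fetch, the nodemap of each active remote node is compared with the local one. The node count is compared first, then the PNN in each slot. Any mismatch marks that node as a culprit and starts a recovery. Nodes flagged inactive are skipped, and a missing remote map (a NULL slot) aborts the pass. The sketch below shows only the two comparisons. It assumes a hypothetical `struct mini_nodemap` that carries PNNs only, whereas ctdb's `struct ctdb_node_map_old` also carries each node's flags and address.

```c
#include <stdint.h>

/* Hypothetical minimal nodemap: number of slots and the PNN in each. */
struct mini_nodemap {
	uint32_t num;
	const uint32_t *pnn;
};

/* Compare a remote nodemap with the local one. Returns 0 if they agree,
 * 1 if the node counts differ, and 2 if some slot holds a different PNN. */
static int compare_nodemaps(const struct mini_nodemap *local,
			    const struct mini_nodemap *remote)
{
	uint32_t i;

	if (remote->num != local->num) {
		return 1;
	}
	for (i = 0; i < local->num; i++) {
		if (remote->pnn[i] != local->pnn[i]) {
			return 2;
		}
	}
	return 0;
}
```

The count is checked before the per-slot loop, so the loop never indexes past the end of a shorter remote map.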

			`/* verify that all other nodes have the same nodemap as we have`
			`*/`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`for (j=0; j<nodemap->num; j++) {`
We dont need to verify the nodemap on remote nodes that are banned (This used to be ctdb commit 7f8f9385deee6eff2b7303147bc6412bbdc122df) 2009-04-06 06:00:22 +04:00			`if (nodemap->nodes[j].flags & NODE_FLAGS_INACTIVE) {`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`continue;`
			`}`

update to the flags handling make sure to abort the monitoring and restart if we failed to get the nodemap from a remote node (This used to be ctdb commit 4eac0214e732e6c2f867d66ec71d4406680dbb94) 2008-12-09 02:45:14 +03:00			`if (remote_nodemaps[j] == NULL) {`
			`DEBUG(DEBUG_ERR,(__location__ " Did not get a remote nodemap for node %d, restarting monitoring\n", j));`
if we cant pull the remote nodemap off a node we should mark it as a culprit so it eventually becomes banned. (This used to be ctdb commit 0889ae3c237bdb3bd72d45f2f64f5e5d8420870c) 2009-04-02 07:50:43 +04:00			`ctdb_set_culprit(rec, j);`

speed startup: alter recovery loop We do a recovery on startup. But the code does: Sleep for ctdb->tunable.recover_interval. Check for recovery. We want to do it in the other order. This is best done by extracting the loop into a separate "main_loop" function. Seconds between ctdbd first log message and node healthy: BEFORE: 24.09 AFTER: 23.58 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 097046025176b9fcb670839d1a9f100f890e7ed2) 2010-06-22 17:20:23 +04:00			`return;`
update to the flags handling make sure to abort the monitoring and restart if we failed to get the nodemap from a remote node (This used to be ctdb commit 4eac0214e732e6c2f867d66ec71d4406680dbb94) 2008-12-09 02:45:14 +03:00			`}`

redo and update how we synchronize flags across the cluster. this simplifies the code and should close a race condition between the local recovery daemon and a remote node when flags are changing. (This used to be ctdb commit 32d460b8469eb53145f04161a5d01166f9b5f09e) 2008-12-05 08:32:30 +03:00			`/* if the nodes disagree on how many nodes there are`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`then this is a good reason to try recovery`
			`*/`
redo and update how we synchronize flags across the cluster. this simplifies the code and should close a race condition between the local recovery daemon and a remote node when flags are changing. (This used to be ctdb commit 32d460b8469eb53145f04161a5d01166f9b5f09e) 2008-12-05 08:32:30 +03:00			`if (remote_nodemaps[j]->num != nodemap->num) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Remote node:%u has different node count. %u vs %u of the local node\n",`
redo and update how we synchronize flags across the cluster. this simplifies the code and should close a race condition between the local recovery daemon and a remote node when flags are changing. (This used to be ctdb commit 32d460b8469eb53145f04161a5d01166f9b5f09e) 2008-12-05 08:32:30 +03:00			`nodemap->nodes[j].pnn, remote_nodemaps[j]->num, nodemap->num));`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`ctdb_set_culprit(rec, nodemap->nodes[j].pnn);`
			`do_recovery(rec, mem_ctx, pnn, nodemap, vnnmap);`
speed startup: alter recovery loop We do a recovery on startup. But the code does: Sleep for ctdb->tunable.recover_interval. Check for recovery. We want to do it in the other order. This is best done by extracting the loop into a separate "main_loop" function. Seconds between ctdbd first log message and node healthy: BEFORE: 24.09 AFTER: 23.58 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 097046025176b9fcb670839d1a9f100f890e7ed2) 2010-06-22 17:20:23 +04:00			`return;`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`}`

			`/* if the nodes disagree on which nodes exist and are`
			`active, then that is also a good reason to do recovery`
			`*/`
			`for (i=0;i<nodemap->num;i++) {`
redo and update how we synchronize flags across the cluster. this simplifies the code and should close a race condition between the local recovery daemon and a remote node when flags are changing. (This used to be ctdb commit 32d460b8469eb53145f04161a5d01166f9b5f09e) 2008-12-05 08:32:30 +03:00			`if (remote_nodemaps[j]->nodes[i].pnn != nodemap->nodes[i].pnn) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Remote node:%u has different nodemap pnn for %d (%u vs %u).\n",`
change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`nodemap->nodes[j].pnn, i,`
redo and update how we synchronize flags across the cluster. this simplifies the code and should close a race condition between the local recovery daemon and a remote node when flags are changing. (This used to be ctdb commit 32d460b8469eb53145f04161a5d01166f9b5f09e) 2008-12-05 08:32:30 +03:00			`remote_nodemaps[j]->nodes[i].pnn, nodemap->nodes[i].pnn));`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`ctdb_set_culprit(rec, nodemap->nodes[j].pnn);`
store the num_active variable (number of connected/active nodes) inside the rec structure and avoid passing this as an extra parameter to do_recovery() (This used to be ctdb commit 8bb229aa3b4bd41e48d4e4e2e148d8680c8ba436) 2008-02-29 04:55:20 +03:00			`do_recovery(rec, mem_ctx, pnn, nodemap,`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`vnnmap);`
speed startup: alter recovery loop We do a recovery on startup. But the code does: Sleep for ctdb->tunable.recover_interval. Check for recovery. We want to do it in the other order. This is best done by extracting the loop into a separate "main_loop" function. Seconds between ctdbd first log message and node healthy: BEFORE: 24.09 AFTER: 23.58 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 097046025176b9fcb670839d1a9f100f890e7ed2) 2010-06-22 17:20:23 +04:00			`return;`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`}`
			`}`
recoverd: Assemble up-to-date node flags information from remote nodes Currently nodemap used by recovery master is the one obtained from the local node. This information may have been updated while processing main loop. Before comparing node flags on all the nodes, create up-to-date node flags information based on the information received from all the nodes. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit fcf77dec5af973a0e32f3999bc012053a6f47a96) 2013-07-22 11:26:28 +04:00			`}`
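The two checks above (same node count, same pnn in every slot) can be summarised in a small standalone sketch. It assumes a hypothetical, simplified `struct demo_nodemap` that carries only the pnn of each slot; it is not ctdb's real nodemap type. A `false` result stands in for the point where the recovery daemon would call `ctdb_set_culprit()` and `do_recovery()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for a nodemap: only the pnn of
 * each slot matters for this check. */
struct demo_nodemap {
	uint32_t num;
	const uint32_t *pnns;
};

/* Return true when the remote view agrees with the local one on how many
 * nodes exist and on which pnn occupies each slot. false means the
 * caller should treat the remote node as a culprit and recover. */
static bool demo_nodemaps_agree(const struct demo_nodemap *local,
				const struct demo_nodemap *remote)
{
	uint32_t i;

	assert(local != NULL && remote != NULL);

	if (remote->num != local->num) {
		printf("different node count: %u vs %u\n",
		       (unsigned)remote->num, (unsigned)local->num);
		return false;
	}
	for (i = 0; i < local->num; i++) {
		if (remote->pnns[i] != local->pnns[i]) {
			printf("different pnn for slot %u: %u vs %u\n",
			       (unsigned)i, (unsigned)remote->pnns[i],
			       (unsigned)local->pnns[i]);
			return false;
		}
	}
	return true;
}
```

Comparing the counts first matters: it guarantees the per-slot loop never reads past the end of the shorter array.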

			`/*`
			`* Update node flags obtained from each active node. This ensures we have`
			`* up-to-date information for all the nodes.`
			`*/`
			`for (j=0; j<nodemap->num; j++) {`
			`if (nodemap->nodes[j].flags & NODE_FLAGS_INACTIVE) {`
			`continue;`
			`}`
			`nodemap->nodes[j].flags = remote_nodemaps[j]->nodes[j].flags;`
			`}`
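The refresh loop above treats each active node as the authority on its own flags: slot `j` of the nodemap returned by node `j` overwrites the local copy. A minimal sketch of that idea follows. The `demo_` types and the flag values are assumptions made for illustration and are not claimed to match ctdb's `NODE_FLAGS_*` constants.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_NODES 8
/* Illustrative flag values, assumed for this sketch only. */
#define DEMO_FLAG_DISCONNECTED 0x01u
#define DEMO_FLAG_BANNED       0x08u
#define DEMO_FLAG_INACTIVE     (DEMO_FLAG_DISCONNECTED | DEMO_FLAG_BANNED)

/* Hypothetical simplified nodemap holding only per-node flags. */
struct demo_nodemap {
	uint32_t num;
	uint32_t flags[DEMO_MAX_NODES];
};

/* For every locally active node j, trust node j's own report of its
 * flags, i.e. slot j of the nodemap that node j sent back. Inactive
 * nodes are skipped, so their remote entry may be NULL. */
static void demo_refresh_own_flags(struct demo_nodemap *local,
				   const struct demo_nodemap *remote[])
{
	uint32_t j;

	assert(local->num <= DEMO_MAX_NODES);
	for (j = 0; j < local->num; j++) {
		if (local->flags[j] & DEMO_FLAG_INACTIVE) {
			continue;
		}
		local->flags[j] = remote[j]->flags[j];
	}
}
```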

			`for (j=0; j<nodemap->num; j++) {`
			`if (nodemap->nodes[j].flags & NODE_FLAGS_INACTIVE) {`
			`continue;`
			`}`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00
redo and update how we synchronize flags across the cluster. this simplifies the code and should close a race condition between the local recovery daemon and a remote node when flags are changing. (This used to be ctdb commit 32d460b8469eb53145f04161a5d01166f9b5f09e) 2008-12-05 08:32:30 +03:00			`/* verify the flags are consistent`
			`*/`
			`for (i=0; i<nodemap->num; i++) {`
			`if (nodemap->nodes[i].flags & NODE_FLAGS_DISCONNECTED) {`
			`continue;`
			`}`

			`if (nodemap->nodes[i].flags != remote_nodemaps[j]->nodes[i].flags) {`
			`DEBUG(DEBUG_ERR, (__location__ " Remote node:%u has different flags for node %u. It has 0x%02x vs our 0x%02x\n",`
			`nodemap->nodes[j].pnn,`
			`nodemap->nodes[i].pnn,`
			`remote_nodemaps[j]->nodes[i].flags,`
recoverd: Fix printing of node flags from local information Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 124e2a471aeda9c900fd898178a30522d7d74221) 2013-01-23 07:35:47 +04:00			`nodemap->nodes[i].flags));`
redo and update how we synchronize flags across the cluster. this simplifies the code and should close a race condition between the local recovery daemon and a remote node when flags are changing. (This used to be ctdb commit 32d460b8469eb53145f04161a5d01166f9b5f09e) 2008-12-05 08:32:30 +03:00			`if (i == j) {`
			`DEBUG(DEBUG_ERR,("Use flags 0x%02x from remote node %d for cluster update of its own flags\n", remote_nodemaps[j]->nodes[i].flags, j));`
			`update_flags_on_all_nodes(ctdb, nodemap, nodemap->nodes[i].pnn, remote_nodemaps[j]->nodes[i].flags);`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`ctdb_set_culprit(rec, nodemap->nodes[j].pnn);`
redo and update how we synchronize flags across the cluster. this simplifies the code and should close a race condition between the local recovery daemon and a remote node when flags are changing. (This used to be ctdb commit 32d460b8469eb53145f04161a5d01166f9b5f09e) 2008-12-05 08:32:30 +03:00			`do_recovery(rec, mem_ctx, pnn, nodemap,`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`vnnmap);`
speed startup: alter recovery loop We do a recovery on startup. But the code does: Sleep for ctdb->tunable.recover_interval. Check for recovery. We want to do it in the other order. This is best done by extracting the loop into a separate "main_loop" function. Seconds between ctdbd first log message and node healthy: BEFORE: 24.09 AFTER: 23.58 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 097046025176b9fcb670839d1a9f100f890e7ed2) 2010-06-22 17:20:23 +04:00			`return;`
redo and update how we synchronize flags across the cluster. this simplifies the code and should close a race condition between the local recovery daemon and a remote node when flags are changing. (This used to be ctdb commit 32d460b8469eb53145f04161a5d01166f9b5f09e) 2008-12-05 08:32:30 +03:00			`} else {`
			`DEBUG(DEBUG_ERR,("Use flags 0x%02x from local recmaster node for cluster update of node %d flags\n", nodemap->nodes[i].flags, i));`
			`update_flags_on_all_nodes(ctdb, nodemap, nodemap->nodes[i].pnn, nodemap->nodes[i].flags);`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`ctdb_set_culprit(rec, nodemap->nodes[j].pnn);`
redo and update how we synchronize flags across the cluster. this simplifies the code and should close a race condition between the local recovery daemon and a remote node when flags are changing. (This used to be ctdb commit 32d460b8469eb53145f04161a5d01166f9b5f09e) 2008-12-05 08:32:30 +03:00			`do_recovery(rec, mem_ctx, pnn, nodemap,`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`vnnmap);`
speed startup: alter recovery loop We do a recovery on startup. But the code does: Sleep for ctdb->tunable.recover_interval. Check for recovery. We want to do it in the other order. This is best done by extracting the loop into a separate "main_loop" function. Seconds between ctdbd first log message and node healthy: BEFORE: 24.09 AFTER: 23.58 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 097046025176b9fcb670839d1a9f100f890e7ed2) 2010-06-22 17:20:23 +04:00			`return;`
redo and update how we synchronize flags across the cluster. this simplifies the code and should close a race condition between the local recovery daemon and a remote node when flags are changing. (This used to be ctdb commit 32d460b8469eb53145f04161a5d01166f9b5f09e) 2008-12-05 08:32:30 +03:00			`}`
			`}`
			`}`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`}`

ctdb_recoverd: Move num_lmasters calculation to near where it is used Unless this node is the recovery master then this is not needed. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-03-29 12:00:17 +03:00
			`/* count how many active nodes have the lmaster capability */`
			`num_lmasters = 0;`
			`for (i=0; i<nodemap->num; i++) {`
			`if (!(nodemap->nodes[i].flags & NODE_FLAGS_INACTIVE)) {`
			`if (ctdb_node_has_capabilities(rec->caps,`
			`ctdb->nodes[i]->pnn,`
			`CTDB_CAP_LMASTER)) {`
			`num_lmasters++;`
			`}`
			`}`
			`}`

update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00
recoverd: Fix the VNN lmaster consistency check It does cope with node that don't have the lmaster capability. Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 588172bcb6bf267339e2bd09e23d2c4904a27a41) 2013-09-26 07:11:04 +04:00			`/* There must be the same number of lmasters in the vnn map as`
			`* there are active nodes with the lmaster capability... or`
			`* do a recovery.`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`*/`
ctdb-recoverd: Make num_lmasters a local variable It isn't used anywhere else and is always re-initialised to 0. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-03-29 09:49:02 +03:00			`if (vnnmap->size != num_lmasters) {`
recoverd: Fix the VNN lmaster consistency check It does cope with node that don't have the lmaster capability. Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 588172bcb6bf267339e2bd09e23d2c4904a27a41) 2013-09-26 07:11:04 +04:00			`DEBUG(DEBUG_ERR, (__location__ " The vnnmap count is different from the number of active lmaster nodes: %u vs %u\n",`
ctdb-recoverd: Make num_lmasters a local variable It isn't used anywhere else and is always re-initialised to 0. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-03-29 09:49:02 +03:00			`vnnmap->size, num_lmasters));`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`ctdb_set_culprit(rec, ctdb->pnn);`
			`do_recovery(rec, mem_ctx, pnn, nodemap, vnnmap);`
speed startup: alter recovery loop We do a recovery on startup. But the code does: Sleep for ctdb->tunable.recover_interval. Check for recovery. We want to do it in the other order. This is best done by extracting the loop into a separate "main_loop" function. Seconds between ctdbd first log message and node healthy: BEFORE: 24.09 AFTER: 23.58 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 097046025176b9fcb670839d1a9f100f890e7ed2) 2010-06-22 17:20:23 +04:00			`return;`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`}`

			`/* verify that all active nodes in the nodemap also exist in`
			`the vnnmap.`
			`*/`
			`for (j=0; j<nodemap->num; j++) {`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`if (nodemap->nodes[j].flags & NODE_FLAGS_INACTIVE) {`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`continue;`
			`}`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`if (nodemap->nodes[j].pnn == pnn) {`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`continue;`
			`}`

			`for (i=0; i<vnnmap->size; i++) {`
change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`if (vnnmap->map[i] == nodemap->nodes[j].pnn) {`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`break;`
			`}`
			`}`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`if (i == vnnmap->size) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Node %u is active in the nodemap but did not exist in the vnnmap\n",`
change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`nodemap->nodes[j].pnn));`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`ctdb_set_culprit(rec, nodemap->nodes[j].pnn);`
			`do_recovery(rec, mem_ctx, pnn, nodemap, vnnmap);`
speed startup: alter recovery loop We do a recovery on startup. But the code does: Sleep for ctdb->tunable.recover_interval. Check for recovery. We want to do it in the other order. This is best done by extracting the loop into a separate "main_loop" function. Seconds between ctdbd first log message and node healthy: BEFORE: 24.09 AFTER: 23.58 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 097046025176b9fcb670839d1a9f100f890e7ed2) 2010-06-22 17:20:23 +04:00			`return;`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`}`
			`}`


also verify that the generation id is the same on all the nodes and if not, trigger a recovery (This used to be ctdb commit 46b8a66ee70419c153acf45eeec88c1fc8f230ce) 2007-05-04 05:57:45 +04:00			`/* verify that all other nodes have the same vnnmap`
			`and are from the same generation`
			`*/`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`for (j=0; j<nodemap->num; j++) {`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`if (nodemap->nodes[j].flags & NODE_FLAGS_INACTIVE) {`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`continue;`
			`}`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`if (nodemap->nodes[j].pnn == pnn) {`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`continue;`
			`}`

change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`ret = ctdb_ctrl_getvnnmap(ctdb, CONTROL_TIMEOUT(), nodemap->nodes[j].pnn,`
formatting fixes (This used to be ctdb commit ed63a2057698aed3931762605b2ea2368681af2b) 2007-06-07 12:39:37 +04:00			`mem_ctx, &remote_vnnmap);`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Unable to get vnnmap from remote node %u\n",`
change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`nodemap->nodes[j].pnn));`
speed startup: alter recovery loop We do a recovery on startup. But the code does: Sleep for ctdb->tunable.recover_interval. Check for recovery. We want to do it in the other order. This is best done by extracting the loop into a separate "main_loop" function. Seconds between ctdbd first log message and node healthy: BEFORE: 24.09 AFTER: 23.58 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 097046025176b9fcb670839d1a9f100f890e7ed2) 2010-06-22 17:20:23 +04:00			`return;`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`}`

also verify that the generation id is the same on all the nodes and if not, trigger a recovery (This used to be ctdb commit 46b8a66ee70419c153acf45eeec88c1fc8f230ce) 2007-05-04 05:57:45 +04:00			`/* verify the vnnmap generation is the same */`
			`if (vnnmap->generation != remote_vnnmap->generation) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Remote node %u has different generation of vnnmap. %u vs %u (ours)\n",`
change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`nodemap->nodes[j].pnn, remote_vnnmap->generation, vnnmap->generation));`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`ctdb_set_culprit(rec, nodemap->nodes[j].pnn);`
			`do_recovery(rec, mem_ctx, pnn, nodemap, vnnmap);`
speed startup: alter recovery loop We do a recovery on startup. But the code does: Sleep for ctdb->tunable.recover_interval. Check for recovery. We want to do it in the other order. This is best done by extracting the loop into a separate "main_loop" function. Seconds between ctdbd first log message and node healthy: BEFORE: 24.09 AFTER: 23.58 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 097046025176b9fcb670839d1a9f100f890e7ed2) 2010-06-22 17:20:23 +04:00			`return;`
also verify that the generation id is the same on all the nodes and if not, trigger a recovery (This used to be ctdb commit 46b8a66ee70419c153acf45eeec88c1fc8f230ce) 2007-05-04 05:57:45 +04:00			`}`

update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`/* verify the vnnmap size is the same */`
			`if (vnnmap->size != remote_vnnmap->size) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Remote node %u has different size of vnnmap. %u vs %u (ours)\n",`
change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`nodemap->nodes[j].pnn, remote_vnnmap->size, vnnmap->size));`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`ctdb_set_culprit(rec, nodemap->nodes[j].pnn);`
			`do_recovery(rec, mem_ctx, pnn, nodemap, vnnmap);`
speed startup: alter recovery loop We do a recovery on startup. But the code does: Sleep for ctdb->tunable.recover_interval. Check for recovery. We want to do it in the other order. This is best done by extracting the loop into a separate "main_loop" function. Seconds between ctdbd first log message and node healthy: BEFORE: 24.09 AFTER: 23.58 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 097046025176b9fcb670839d1a9f100f890e7ed2) 2010-06-22 17:20:23 +04:00			`return;`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`}`

			`/* verify the vnnmap is the same */`
			`for (i=0;i<vnnmap->size;i++) {`
			`if (remote_vnnmap->map[i] != vnnmap->map[i]) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR, (__location__ " Remote node %u has different vnnmap.\n",`
change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`nodemap->nodes[j].pnn));`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`ctdb_set_culprit(rec, nodemap->nodes[j].pnn);`
store the num_active variable (number of connected/active nodes) inside the rec structure and avoid passing this as an extra parameter to do_recovery() (This used to be ctdb commit 8bb229aa3b4bd41e48d4e4e2e148d8680c8ba436) 2008-02-29 04:55:20 +03:00			`do_recovery(rec, mem_ctx, pnn, nodemap,`
new prototype banning code (This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a) 2009-09-03 20:20:39 +04:00			`vnnmap);`
speed startup: alter recovery loop We do a recovery on startup. But the code does: Sleep for ctdb->tunable.recover_interval. Check for recovery. We want to do it in the other order. This is best done by extracting the loop into a separate "main_loop" function. Seconds between ctdbd first log message and node healthy: BEFORE: 24.09 AFTER: 23.58 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 097046025176b9fcb670839d1a9f100f890e7ed2) 2010-06-22 17:20:23 +04:00			`return;`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00			`}`
			`}`
			`}`

added health monitoring logic to ctdb, so a node loses its public IP address if one of the sybsystem event scripts reports a problem (This used to be ctdb commit c7a089256d86cec21097453bce5acbccee87413f) 2007-06-06 04:25:46 +04:00			`/* we might need to change who has what IP assigned */`
prevent recursion in the calling of ctdb_takeover_run (This used to be ctdb commit 0fbdeb7c91b965d9bc5ecc7b24e31070378d8f1d) 2007-09-13 08:08:18 +04:00			`if (rec->need_takeover_run) {`
server: reload the public addresses before doing a takeover run metze (This used to be ctdb commit 0e41a2204fa8a1e77dc83c0d4b253ab272b5c72d) 2010-01-19 10:42:48 +03:00			`/* update the list of public ips that a node can handle for`
			`all connected nodes`
			`*/`
ctdb-recoverd: Remote IP validation can't cause a takeover run Remote IP validation is only called when a takeover run is about to happen anyway, so don't bother flagging one. Given that a takeover run isn't being triggered, also drop the test that checks if takeover runs are disabled. These are the only uses of the rec argument, so drop it. One possible further simplification would be to remove this function because it doesn't accomplish anything. However, it is worth leaving it as a reminder that remote IP validation should be done properly at some time in the future. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-10-28 12:33:29 +03:00			`ret = ctdb_reload_remote_public_ips(ctdb, nodemap);`
server: reload the public addresses before doing a takeover run metze (This used to be ctdb commit 0e41a2204fa8a1e77dc83c0d4b253ab272b5c72d) 2010-01-19 10:42:48 +03:00			`if (ret != 0) {`
speed startup: alter recovery loop We do a recovery on startup. But the code does: Sleep for ctdb->tunable.recover_interval. Check for recovery. We want to do it in the other order. This is best done by extracting the loop into a separate "main_loop" function. Seconds between ctdbd first log message and node healthy: BEFORE: 24.09 AFTER: 23.58 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 097046025176b9fcb670839d1a9f100f890e7ed2) 2010-06-22 17:20:23 +04:00			`return;`
server: reload the public addresses before doing a takeover run metze (This used to be ctdb commit 0e41a2204fa8a1e77dc83c0d4b253ab272b5c72d) 2010-01-19 10:42:48 +03:00			`}`

recoverd: Track the nodes that fail takeover run and set culprit count If any of the nodes fail takeover run (either due to timeout or failure to complete within takeover_timeout interval) from main loop, recovery master will give up trying takeover run with following message: "Unable to setup public takeover addresses. Try again later" And as a side-effect the monitoring is disabled on all the nodes. Before ctdb_takeover_run() is called from main loop, monitoring get disabled via startrecovery event. Since ctdb_takeover_run() fails, it never runs recovered event and monitoring does not get re-enabled. In main_loop, ctdb_takeover_run() is called with a takeover_fail_callback. This callback will get called if any of the nodes fail in handling takeip/releaseip/ipreallocated events in ctdb_takeover_run(). Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit a5c6bb1fffb8dc3960af113957a1fd080cc7c245) 2012-10-23 09:23:12 +04:00			`/* If takeover run fails, then the offending nodes are`
			`* assigned ban culprit counts. And we re-try takeover.`
			`* If takeover run fails repeatedly, the node would get`
			`* banned.`
			`*/`
ctdb-recoverd: Do not run recovery-related events around IP takeover This is not a recovery, so do not run "startrecovery and "recovered" events. There are other IP takeover runs where these are not run. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-10-28 11:47:03 +03:00			`do_takeover_run(rec, nodemap, true);`
added health monitoring logic to ctdb, so a node loses its public IP address if one of the sybsystem event scripts reports a problem (This used to be ctdb commit c7a089256d86cec21097453bce5acbccee87413f) 2007-06-06 04:25:46 +04:00			`}`
speed startup: alter recovery loop We do a recovery on startup. But the code does: Sleep for ctdb->tunable.recover_interval. Check for recovery. We want to do it in the other order. This is best done by extracting the loop into a separate "main_loop" function. Seconds between ctdbd first log message and node healthy: BEFORE: 24.09 AFTER: 23.58 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 097046025176b9fcb670839d1a9f100f890e7ed2) 2010-06-22 17:20:23 +04:00			`}`
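The comment above describes the retry policy. When a takeover run fails, the offending nodes get ban culprit counts and the takeover is retried. If a node keeps failing, it is banned. That policy can be sketched as a simple per-node counter. This is a simplified illustration, not CTDB's implementation. `struct node_culprit`, `record_culprit()` and `CULPRIT_BAN_THRESHOLD` are hypothetical names. The real threshold, and any decay of the counts over time, are not shown in this section.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: number of failures a node may cause before it is banned. */
#define CULPRIT_BAN_THRESHOLD 3u

/* Hypothetical per-node bookkeeping for takeover-run failures. */
struct node_culprit {
	unsigned int pnn;           /* node number */
	unsigned int culprit_count; /* failed takeover runs blamed on it */
	bool banned;
};

/*
 * Blame node `pnn` for a failed takeover run. Returns true only on the
 * call that pushes the node over the threshold, i.e. when the caller
 * should ban it; otherwise the caller simply retries the takeover.
 */
static bool record_culprit(struct node_culprit *nodes, size_t num_nodes,
			   unsigned int pnn)
{
	for (size_t i = 0; i < num_nodes; i++) {
		if (nodes[i].pnn != pnn) {
			continue;
		}
		nodes[i].culprit_count++;
		if (!nodes[i].banned &&
		    nodes[i].culprit_count >= CULPRIT_BAN_THRESHOLD) {
			nodes[i].banned = true;
			return true;
		}
		return false;
	}
	return false; /* unknown node: nothing to record */
}
```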

			`/*`
			`the main monitoring loop`
			`*/`
			`static void monitor_cluster(struct ctdb_context *ctdb)`
			`{`
			`struct ctdb_recoverd *rec;`

			`DEBUG(DEBUG_NOTICE,("monitor_cluster starting\n"));`

			`rec = talloc_zero(ctdb, struct ctdb_recoverd);`
			`CTDB_NO_MEMORY_FATAL(ctdb, rec);`

			`rec->ctdb = ctdb;`
added health monitoring logic to ctdb, so a node loses its public IP address if one of the sybsystem event scripts reports a problem (This used to be ctdb commit c7a089256d86cec21097453bce5acbccee87413f) 2007-06-06 04:25:46 +04:00
ctdb-recoverd: Reimplement disabling takeover runs using ctdb_op_disable() Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-02-08 12:52:12 +03:00			`rec->takeover_run = ctdb_op_init(rec, "takeover runs");`
			`CTDB_NO_MEMORY_FATAL(ctdb, rec->takeover_run);`
recoverd: do_takeover_run() should mark when a takeover run is in progress Nested takeover runs should never happens so they should fail. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 8ed29c60c0a7dd29f2a6efdf694d38e94281e1c4) 2013-09-03 05:20:01 +04:00
ctdb-recoverd: Reimplement ReRecoveryTimeout using ctdb_op_disable() Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-02-06 06:47:33 +03:00			`rec->recovery = ctdb_op_init(rec, "recoveries");`
			`CTDB_NO_MEMORY_FATAL(ctdb, rec->recovery);`

speed startup: alter recovery loop We do a recovery on startup. But the code does: Sleep for ctdb->tunable.recover_interval. Check for recovery. We want to do it in the other order. This is best done by extracting the loop into a separate "main_loop" function. Seconds between ctdbd first log message and node healthy: BEFORE: 24.09 AFTER: 23.58 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 097046025176b9fcb670839d1a9f100f890e7ed2) 2010-06-22 17:20:23 +04:00			`rec->priority_time = timeval_current();`
force an update of the flags from the recmaster after each monitoring run (This used to be ctdb commit 251aeadc8b16a9c27a4bae78c97ad6e93e6cfdf4) 2008-06-26 07:08:37 +04:00
speed startup: alter recovery loop We do a recovery on startup. But the code does: Sleep for ctdb->tunable.recover_interval. Check for recovery. We want to do it in the other order. This is best done by extracting the loop into a separate "main_loop" function. Seconds between ctdbd first log message and node healthy: BEFORE: 24.09 AFTER: 23.58 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 097046025176b9fcb670839d1a9f100f890e7ed2) 2010-06-22 17:20:23 +04:00			`/* register a message port for sending memory dumps */`
			`ctdb_client_set_message_handler(ctdb, CTDB_SRVID_MEM_DUMP, mem_dump_handler, rec);`
update getvnnmap control to take a timeout parameter dont explicitely free the vnnmap pointer in the getvnnmap control this is freed by the mem_ctx instead add code to the recoverd to detect when/if recovery is required veiry that the number of active nodes, the nodemap and the vnn map is consistent across the entire cluster and if not trigger a recovery (which right now just prints "we need to do recovery" to the screen. (This used to be ctdb commit 2b0a207a3748bdb3394dc9fd0d1c344ee1bb0bb5) 2007-05-04 03:45:53 +04:00
speed startup: alter recovery loop We do a recovery on startup. But the code does: Sleep for ctdb->tunable.recover_interval. Check for recovery. We want to do it in the other order. This is best done by extracting the loop into a separate "main_loop" function. Seconds between ctdbd first log message and node healthy: BEFORE: 24.09 AFTER: 23.58 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 097046025176b9fcb670839d1a9f100f890e7ed2) 2010-06-22 17:20:23 +04:00			`/* register a message port for recovery elections */`
ctdb-include: Use new protocol definitions This gets rid of the duplicate definitions from ctdb_protocol.h. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 09:51:52 +03:00			`ctdb_client_set_message_handler(ctdb, CTDB_SRVID_ELECTION, election_handler, rec);`
speed startup: alter recovery loop We do a recovery on startup. But the code does: Sleep for ctdb->tunable.recover_interval. Check for recovery. We want to do it in the other order. This is best done by extracting the loop into a separate "main_loop" function. Seconds between ctdbd first log message and node healthy: BEFORE: 24.09 AFTER: 23.58 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 097046025176b9fcb670839d1a9f100f890e7ed2) 2010-06-22 17:20:23 +04:00
			`/* when nodes are disabled/enabled */`
			`ctdb_client_set_message_handler(ctdb, CTDB_SRVID_SET_NODE_FLAGS, monitor_handler, rec);`

			`/* when we are asked to push out a flag change */`
			`ctdb_client_set_message_handler(ctdb, CTDB_SRVID_PUSH_NODE_FLAGS, push_flags_handler, rec);`

			`/* register a message port for vacuum fetch */`
			`ctdb_client_set_message_handler(ctdb, CTDB_SRVID_VACUUM_FETCH, vacuum_fetch_handler, rec);`

			`/* register a message port for reloadnodes */`
			`ctdb_client_set_message_handler(ctdb, CTDB_SRVID_RELOAD_NODES, reload_nodes_handler, rec);`

			`/* register a message port for performing a takeover run */`
			`ctdb_client_set_message_handler(ctdb, CTDB_SRVID_TAKEOVER_RUN, ip_reallocate_handler, rec);`

			`/* register a message port for disabling the ip check for a short while */`
			`ctdb_client_set_message_handler(ctdb, CTDB_SRVID_DISABLE_IP_CHECK, disable_ip_check_handler, rec);`

			`/* register a message port for updating the recovery daemon's node assignment for an IP */`
			`ctdb_client_set_message_handler(ctdb, CTDB_SRVID_RECD_UPDATE_IP, recd_update_ip_handler, rec);`

When adding ips to nodes, set up a deferred rebalance for the whole node to trigger after 60 seconds in case the normal ipreallocated is not sufficient to trigger rebalance. (This used to be ctdb commit 4340263b219d75c39f8de22abe3f6f1c1ee63ea2) 2012-02-27 23:56:04 +04:00			`/* register a message port for forcing a rebalance of a node at the next`
			`reallocation */`
			`ctdb_client_set_message_handler(ctdb, CTDB_SRVID_REBALANCE_NODE, recd_node_rebalance_handler, rec);`

recoverd: New SRVID message CTDB_SRVID_DISABLE_TAKEOVER_RUNS This implements a superset of CTDB_SRVID_DISABLE_IP_CHECK. It stops the IP checks but also causes any attempted takeover runs to fail and be rescheduled. This is meant to completely stop IP movements. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 00db4de53a0d86013e79e6577e7e6cf3ef864e56) 2013-08-27 09:04:40 +04:00			`/* Register a message port for disabling takeover runs */`
			`ctdb_client_set_message_handler(ctdb,`
			`CTDB_SRVID_DISABLE_TAKEOVER_RUNS,`
			`disable_takeover_runs_handler, rec);`

ctdb-recoverd: New message ID CTDB_SRVID_DISABLE_RECOVERIES Also add test stub support. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-02-06 07:06:44 +03:00			`/* Register a message port for disabling recoveries */`
			`ctdb_client_set_message_handler(ctdb,`
			`CTDB_SRVID_DISABLE_RECOVERIES,`
			`disable_recoveries_handler, rec);`

ctdb-recoverd: Detach database from recovery daemon As part of vacuuming, recoverd attaches to databases to migrate records. When detaching a database from main daemon, it should be removed from recovery daemon also. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Michael Adam <obnox@samba.org> Autobuild-User(master): Michael Adam <obnox@samba.org> Autobuild-Date(master): Wed Apr 23 17:05:45 CEST 2014 on sn-devel-104 2014-04-22 09:24:49 +04:00			`/* register a message port for detaching database */`
			`ctdb_client_set_message_handler(ctdb,`
			`CTDB_SRVID_DETACH_DATABASE,`
			`detach_database_handler, rec);`

speed startup: alter recovery loop We do a recovery on startup. But the code does: Sleep for ctdb->tunable.recover_interval. Check for recovery. We want to do it in the other order. This is best done by extracting the loop into a separate "main_loop" function. Seconds between ctdbd first log message and node healthy: BEFORE: 24.09 AFTER: 23.58 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 097046025176b9fcb670839d1a9f100f890e7ed2) 2010-06-22 17:20:23 +04:00			`for (;;) {`
			`TALLOC_CTX *mem_ctx = talloc_new(ctdb);`
speed startup: don't wait a full recovery interval if we've already waited We currently sleep for one second, whether or not we've already slept. Change this to sleep for the remainder of the second, if at all. Seconds between ctdbd first log message and node healthy: BEFORE: 18.09 AFTER: 17.08 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit fde760b5f39c77172308a583da4c2443b71541c9) 2010-06-22 17:20:35 +04:00			`struct timeval start;`
			`double elapsed;`

speed startup: alter recovery loop We do a recovery on startup. But the code does: Sleep for ctdb->tunable.recover_interval. Check for recovery. We want to do it in the other order. This is best done by extracting the loop into a separate "main_loop" function. Seconds between ctdbd first log message and node healthy: BEFORE: 24.09 AFTER: 23.58 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 097046025176b9fcb670839d1a9f100f890e7ed2) 2010-06-22 17:20:23 +04:00			`if (!mem_ctx) {`
			`DEBUG(DEBUG_CRIT,(__location__`
			`" Failed to create temp context\n"));`
			`exit(-1);`
			`}`

speed startup: don't wait a full recovery interval if we've already waited We currently sleep for one second, whether or not we've already slept. Change this to sleep for the remainder of the second, if at all. Seconds between ctdbd first log message and node healthy: BEFORE: 18.09 AFTER: 17.08 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit fde760b5f39c77172308a583da4c2443b71541c9) 2010-06-22 17:20:35 +04:00			`start = timeval_current();`
speed startup: alter recovery loop We do a recovery on startup. But the code does: Sleep for ctdb->tunable.recover_interval. Check for recovery. We want to do it in the other order. This is best done by extracting the loop into a separate "main_loop" function. Seconds between ctdbd first log message and node healthy: BEFORE: 24.09 AFTER: 23.58 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 097046025176b9fcb670839d1a9f100f890e7ed2) 2010-06-22 17:20:23 +04:00			`main_loop(ctdb, rec, mem_ctx);`
			`talloc_free(mem_ctx);`

			`/* check for recovery at most once every recover_interval seconds */`
speed startup: don't wait a full recovery interval if we've already waited We currently sleep for one second, whether or not we've already slept. Change this to sleep for the remainder of the second, if at all. Seconds between ctdbd first log message and node healthy: BEFORE: 18.09 AFTER: 17.08 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit fde760b5f39c77172308a583da4c2443b71541c9) 2010-06-22 17:20:35 +04:00			`elapsed = timeval_elapsed(&start);`
			`if (elapsed < ctdb->tunable.recover_interval) {`
			`ctdb_wait_timeout(ctdb, ctdb->tunable.recover_interval`
			`- elapsed);`
			`}`
speed startup: alter recovery loop We do a recovery on startup. But the code does: Sleep for ctdb->tunable.recover_interval. Check for recovery. We want to do it in the other order. This is best done by extracting the loop into a separate "main_loop" function. Seconds between ctdbd first log message and node healthy: BEFORE: 24.09 AFTER: 23.58 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 097046025176b9fcb670839d1a9f100f890e7ed2) 2010-06-22 17:20:23 +04:00			`}`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00			`}`
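The loop in `monitor_cluster()` times each `main_loop()` pass. It then waits only for whatever is left of `ctdb->tunable.recover_interval`, so a slow pass is not followed by a full extra interval. The sketch below shows the same pattern. It assumes plain POSIX `clock_gettime(CLOCK_MONOTONIC)` and `nanosleep()` in place of `timeval_current()`, `timeval_elapsed()` and `ctdb_wait_timeout()`. Note that `nanosleep()` blocks the whole process, whereas `ctdb_wait_timeout()` (as far as we can tell) keeps servicing the event loop while it waits. The helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* Seconds between two CLOCK_MONOTONIC readings. */
static double elapsed_seconds(const struct timespec *start,
			      const struct timespec *now)
{
	return (double)(now->tv_sec - start->tv_sec) +
	       (double)(now->tv_nsec - start->tv_nsec) / 1e9;
}

/* Time left to wait so that one pass takes at least `interval` seconds. */
static double remaining_wait(double interval, double elapsed)
{
	return elapsed < interval ? interval - elapsed : 0.0;
}

/* Blocking sleep that resumes after signal interruptions. */
static void sleep_seconds(double secs)
{
	struct timespec req, rem;

	if (secs <= 0.0) {
		return;
	}
	req.tv_sec = (time_t)secs;
	req.tv_nsec = (long)((secs - (double)req.tv_sec) * 1e9);
	if (req.tv_nsec > 999999999L) {
		req.tv_nsec = 999999999L; /* guard against rounding up */
	}
	while (nanosleep(&req, &rem) == -1 && errno == EINTR) {
		req = rem; /* continue with the time that was left */
	}
}

/* One rate-limited pass: do the work (if any), then sleep off the rest. */
static void run_once_rate_limited(void (*work)(void), double interval)
{
	struct timespec start, now;

	clock_gettime(CLOCK_MONOTONIC, &start);
	if (work != NULL) {
		work();
	}
	clock_gettime(CLOCK_MONOTONIC, &now);
	sleep_seconds(remaining_wait(interval, elapsed_seconds(&start, &now)));
}
```

A caller would wrap `run_once_rate_limited()` in `for (;;)`, as `monitor_cluster()` does with `main_loop()`.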

added health monitoring logic to ctdb, so a node loses its public IP address if one of the sybsystem event scripts reports a problem (This used to be ctdb commit c7a089256d86cec21097453bce5acbccee87413f) 2007-06-06 04:25:46 +04:00			`/*`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`event handler for when the main ctdbd dies`
			`*/`
ctdb-daemon: Stop using tevent compatibility definitions Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-26 08:50:09 +03:00			`static void ctdb_recoverd_parent(struct tevent_context *ev,`
			`struct tevent_fd *fde,`
moved the recovery daemon into the main ctdbd and enable it by default (This used to be ctdb commit 2a7d42124731f43d013cb76a798525eab4cc1ee0) 2007-05-15 09:13:36 +04:00			`uint16_t flags, void *private_data)`
			`{`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ALERT,("recovery daemon parent died - exiting\n"));`
moved the recovery daemon into the main ctdbd and enable it by default (This used to be ctdb commit 2a7d42124731f43d013cb76a798525eab4cc1ee0) 2007-05-15 09:13:36 +04:00			`_exit(1);`
			`}`
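`ctdb_recoverd_parent()` is an fd event handler. After `ctdb_start_recoverd()` forks, the parent closes `fd[0]` and the child closes `fd[1]`. The child's read end therefore reports EOF only once the write end held by ctdbd is closed, typically because ctdbd has exited. The sketch below shows the same parent-death check using a non-blocking `poll()` instead of a tevent fd event. `parent_has_died()` is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Non-blocking check: returns true if the pipe's read end is at EOF,
 * meaning every write end (held by the parent) has been closed.
 */
static bool parent_has_died(int read_fd)
{
	struct pollfd pfd = { .fd = read_fd, .events = POLLIN };
	int ret;

	do {
		ret = poll(&pfd, 1, 0); /* timeout 0: check, don't wait */
	} while (ret == -1 && errno == EINTR);

	if (ret <= 0) {
		return false; /* nothing pending, or poll() failed */
	}
	if (pfd.revents & (POLLIN | POLLHUP)) {
		char c;
		ssize_t n = read(read_fd, &c, 1);
		return n == 0; /* 0 bytes read: EOF, no writers left */
	}
	return false;
}
```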

Monitor that the recovery daemon is still running from the main ctdb daemon and if it has terminated, then we shut down the main daemon as well (This used to be ctdb commit 7e587acaf8006254e89ff9b4bf48454821c85863) 2008-05-06 05:19:17 +04:00			`/*`
			`called regularly to verify that the recovery daemon is still running`
			`*/`
ctdb-daemon: Stop using tevent compatibility definitions Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-26 08:50:09 +03:00			`static void ctdb_check_recd(struct tevent_context *ev,`
			`struct tevent_timer *te,`
			`struct timeval yt, void *p)`
Monitor that the recovery daemon is still running from the main ctdb daemon and if it has terminated, then we shut down the main daemon as well (This used to be ctdb commit 7e587acaf8006254e89ff9b4bf48454821c85863) 2008-05-06 05:19:17 +04:00			`{`
			`struct ctdb_context *ctdb = talloc_get_type(p, struct ctdb_context);`

Track all child process so we never send a signal to an unrelated process (our child died and kernel wrapped the pid-space and reused the pid for a different process Wrap all creation of child processes inside ctdb_fork() which is used to track all processes we have spawned. Capture SIGCHLD to track also which child processes have terminated. Wrap kill() inside ctdb_kill() and make sure that we never send a !0 signal to a child process pid that has already terminated (and might have been replaced with a (This used to be ctdb commit f73a4b1495830bcdd094a93732a89dd53b3c2f78) 2012-05-03 05:42:41 +04:00			`if (ctdb_kill(ctdb, ctdb->recoverd_pid, 0) != 0) {`
If/when the recovery daemon terminates unexpectedly, try to restart it again from the main daemon instead of just shutting down the main deamon too. While it does not address the reason for recovery daemon shutting down, it reduces the impact of such issues and makes the system more robust. (This used to be ctdb commit 0566ef3d6cef809bda204877c493c80ff9eb2c40) 2011-03-01 04:09:42 +03:00			`DEBUG(DEBUG_ERR,("Recovery daemon (pid:%d) is no longer running. Trying to restart recovery daemon.\n", (int)ctdb->recoverd_pid));`
Monitor that the recovery daemon is still running from the main ctdb daemon and if it has terminated, then we shut down the main daemon as well (This used to be ctdb commit 7e587acaf8006254e89ff9b4bf48454821c85863) 2008-05-06 05:19:17 +04:00
ctdb-daemon: Stop using tevent compatibility definitions Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-26 08:50:09 +03:00			`tevent_add_timer(ctdb->ev, ctdb, timeval_zero(),`
			`ctdb_restart_recd, ctdb);`
Monitor that the recovery daemon is still running from the main ctdb daemon and if it has terminated, then we shut down the main daemon as well (This used to be ctdb commit 7e587acaf8006254e89ff9b4bf48454821c85863) 2008-05-06 05:19:17 +04:00
If/when the recovery daemon terminates unexpectedly, try to restart it again from the main daemon instead of just shutting down the main deamon too. While it does not address the reason for recovery daemon shutting down, it reduces the impact of such issues and makes the system more robust. (This used to be ctdb commit 0566ef3d6cef809bda204877c493c80ff9eb2c40) 2011-03-01 04:09:42 +03:00			`return;`
Monitor that the recovery daemon is still running from the main ctdb daemon and if it has terminated, then we shut down the main daemon as well (This used to be ctdb commit 7e587acaf8006254e89ff9b4bf48454821c85863) 2008-05-06 05:19:17 +04:00			`}`

ctdb-daemon: Stop using tevent compatibility definitions Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-26 08:50:09 +03:00			`tevent_add_timer(ctdb->ev, ctdb->recd_ctx,`
			`timeval_current_ofs(30, 0),`
			`ctdb_check_recd, ctdb);`
Monitor that the recovery daemon is still running from the main ctdb daemon and if it has terminated, then we shut down the main daemon as well (This used to be ctdb commit 7e587acaf8006254e89ff9b4bf48454821c85863) 2008-05-06 05:19:17 +04:00			`}`
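`ctdb_check_recd()` probes the recovery daemon with signal 0. This runs `kill()`'s existence and permission checks without delivering a signal. If the probe fails, the function schedules `ctdb_restart_recd()`. Otherwise it re-arms itself 30 seconds later. The sketch below shows the probe using plain `kill()` in a hypothetical `process_exists()`. Unlike `ctdb_kill()`, it does not guard against the PID having been reused by an unrelated process, which is the problem described in the blame note for `ctdb_kill()`. A terminated child that has not yet been reaped also still counts as existing.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Probe whether `pid` exists using signal 0: kill() performs its usual
 * checks but sends nothing. Caveats: a PID can be reused by an unrelated
 * process, and an exited-but-unreaped child still counts as existing.
 */
static bool process_exists(pid_t pid)
{
	if (pid <= 0) {
		return false; /* 0 and negative values target process groups */
	}
	if (kill(pid, 0) == 0) {
		return true;
	}
	return errno == EPERM; /* it exists, we just may not signal it */
}
```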

ctdb-daemon: Stop using tevent compatibility definitions Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-26 08:50:09 +03:00			`static void recd_sig_child_handler(struct tevent_context *ev,`
			`struct tevent_signal *se, int signum,`
			`int count, void *dont_care,`
			`void *private_data)`
proper waitpid() fix. remove all waitpid() calls and use the event system to trap sigchld (This used to be ctdb commit 77458b2b6b51b2970c12b0e5b097088d3fb9d358) 2008-07-09 08:02:54 +04:00			`{`
			`// struct ctdb_context *ctdb = talloc_get_type(private_data, struct ctdb_context);`
			`int status;`
			`pid_t pid = -1;`

			`while (pid != 0) {`
			`pid = waitpid(-1, &status, WNOHANG);`
			`if (pid == -1) {`
dont log an error if waitpid returns -1 and errno is ECHILD (This used to be ctdb commit fdf50f3e774e3980af81c0b6f4ff81d085f4f697) 2009-06-19 09:55:13 +04:00			`if (errno != ECHILD) {`
			`DEBUG(DEBUG_ERR, (__location__ " waitpid() returned error. errno:%s(%d)\n", strerror(errno),errno));`
			`}`
proper waitpid() fix. remove all waitpid() calls and use the event system to trap sigchld (This used to be ctdb commit 77458b2b6b51b2970c12b0e5b097088d3fb9d358) 2008-07-09 08:02:54 +04:00			`return;`
			`}`
			`if (pid > 0) {`
			`DEBUG(DEBUG_DEBUG, ("RECD SIGCHLD from %d\n", (int)pid));`
			`}`
			`}`
			`}`
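`recd_sig_child_handler()` loops on `waitpid(-1, &status, WNOHANG)` because several children may exit before the handler runs. Ordinary signals such as `SIGCHLD` are not queued, so one notification can stand for many exits. With `WNOHANG`, `waitpid()` returns 0 when children remain but none have exited yet. It returns `-1` with `ECHILD` when there are no children at all, which is why that case is not logged as an error. Below is a standalone sketch of the loop. The hypothetical `reap_exited_children()` also retries on `EINTR`.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Reap every child that has already exited, without blocking.
 * Returns the number of children reaped. */
static int reap_exited_children(void)
{
	int reaped = 0;

	for (;;) {
		int status;
		pid_t pid = waitpid(-1, &status, WNOHANG);

		if (pid > 0) {
			reaped++; /* collected one child; look for more */
			continue;
		}
		if (pid == -1 && errno == EINTR) {
			continue; /* interrupted by a signal: retry */
		}
		/* pid == 0: children exist but none have exited yet.
		 * pid == -1 with ECHILD: no children at all (not an error). */
		break;
	}
	return reaped;
}
```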

implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`/*`
			`startup the recovery daemon as a child of the main ctdb daemon`
			`*/`
moved the recovery daemon into the main ctdbd and enable it by default (This used to be ctdb commit 2a7d42124731f43d013cb76a798525eab4cc1ee0) 2007-05-15 09:13:36 +04:00			`int ctdb_start_recoverd(struct ctdb_context *ctdb)`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00			`{`
moved the recovery daemon into the main ctdbd and enable it by default (This used to be ctdb commit 2a7d42124731f43d013cb76a798525eab4cc1ee0) 2007-05-15 09:13:36 +04:00			`int fd[2];`
ctdb-daemon: Stop using tevent compatibility definitions Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-26 08:50:09 +03:00			`struct tevent_signal *se;`
event: Update events to latest Samba version 0.9.8 In Samba this is now called "tevent", and while we use the backwards compatibility wrappers they don't offer EVENT_FD_AUTOCLOSE: that is now a separate tevent_fd_set_auto_close() function. This is based on Samba version 7f29f817fa939ef1bbb740584f09e76e2ecd5b06. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 85e5e760cc91eb3157d3a88996ce474491646726) 2010-08-18 03:46:31 +04:00			`struct tevent_fd *fde;`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00
moved the recovery daemon into the main ctdbd and enable it by default (This used to be ctdb commit 2a7d42124731f43d013cb76a798525eab4cc1ee0) 2007-05-15 09:13:36 +04:00			`if (pipe(fd) != 0) {`
			`return -1;`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00			`}`

ctdb-logging: Remove log ringbuffer As far as we know, nobody uses this and it just complicates the logging subsystem. Remove all ringbuffer code and documentation. Update the local daemons startup code correspondingly. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Volker Lendecke <vl@samba.org> 2014-08-08 06:51:03 +04:00			`ctdb->recoverd_pid = ctdb_fork(ctdb);`
when we are shutting down, we should first shut down the recovery daemon (This used to be ctdb commit 39ade6b329adcd3234124d6a8daaa6181abf739b) 2007-10-22 06:34:08 +04:00			`if (ctdb->recoverd_pid == -1) {`
moved the recovery daemon into the main ctdbd and enable it by default (This used to be ctdb commit 2a7d42124731f43d013cb76a798525eab4cc1ee0) 2007-05-15 09:13:36 +04:00			`return -1;`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00			`}`
daemon: On shutdown, destroy timed events that check if recoverd is active When CTDB is shutting down, recovery daemon is stopped, but the event that checks if recovery daemon is still alive is not destroyed. So recovery master is restarted during shutdown if CTDB daemon takes longer to shutdown. There are two processes that check if recovery daemon is working. 1. ctdb_check_recd() - which checks every 30 seconds if the recovery daemon process exists. 2. ctdb_recd_ping_timeout() - which is triggered when recovery daemon fails to ping CTDB daemon. Both the events are periodic and need to be destroyed when shutting down. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 746168df2e691058e601016110fae818c6a265c3) 2012-12-04 08:05:44 +04:00
when we are shutting down, we should first shut down the recovery daemon (This used to be ctdb commit 39ade6b329adcd3234124d6a8daaa6181abf739b) 2007-10-22 06:34:08 +04:00			`if (ctdb->recoverd_pid != 0) {`
daemon: On shutdown, destroy timed events that check if recoverd is active When CTDB is shutting down, recovery daemon is stopped, but the event that checks if recovery daemon is still alive is not destroyed. So recovery master is restarted during shutdown if CTDB daemon takes longer to shutdown. There are two processes that check if recovery daemon is working. 1. ctdb_check_recd() - which checks every 30 seconds if the recovery daemon process exists. 2. ctdb_recd_ping_timeout() - which is triggered when recovery daemon fails to ping CTDB daemon. Both the events are periodic and need to be destroyed when shutting down. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 746168df2e691058e601016110fae818c6a265c3) 2012-12-04 08:05:44 +04:00			`talloc_free(ctdb->recd_ctx);`
			`ctdb->recd_ctx = talloc_new(ctdb);`
			`CTDB_NO_MEMORY(ctdb, ctdb->recd_ctx);`

moved the recovery daemon into the main ctdbd and enable it by default (This used to be ctdb commit 2a7d42124731f43d013cb76a798525eab4cc1ee0) 2007-05-15 09:13:36 +04:00			`close(fd[0]);`
ctdb-daemon: Stop using tevent compatibility definitions Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-26 08:50:09 +03:00			`tevent_add_timer(ctdb->ev, ctdb->recd_ctx,`
			`timeval_current_ofs(30, 0),`
			`ctdb_check_recd, ctdb);`
moved the recovery daemon into the main ctdbd and enable it by default (This used to be ctdb commit 2a7d42124731f43d013cb76a798525eab4cc1ee0) 2007-05-15 09:13:36 +04:00			`return 0;`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00			`}`

moved the recovery daemon into the main ctdbd and enable it by default (This used to be ctdb commit 2a7d42124731f43d013cb76a798525eab4cc1ee0) 2007-05-15 09:13:36 +04:00			`close(fd[1]);`

			`srandom(getpid() ^ time(NULL));`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00
ctdbd: Set process names for child processes This helps distinguish processes in process list in top, perf, etc. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 2493f57ce268d6fe7e4c40a87852c347fd60d29e) 2013-07-09 06:32:53 +04:00			`ctdb_set_process_name("ctdb_recovered");`
logging: give a unique logging name to each forked child. This means we can distinguish which child is logging, esp. via syslog where we have no pid. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 68b3761a0874429b90731741f0531f76dcfbb081) 2010-07-19 13:59:09 +04:00			`if (switch_from_server_to_client(ctdb, "recoverd") != 0) {`
create a helper function that converts a ctdb instance in daemon mode to become a ctdb client instance. use this from the recovery daemon child process to switch to client mode and connect back to the main daemon (This used to be ctdb commit 16f31786a031255ab5b3099a0a3c745de973347a) 2009-03-23 04:37:30 +03:00			`DEBUG(DEBUG_CRIT, (__location__ "ERROR: failed to switch recovery daemon into client mode. shutting down.\n"));`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00			`exit(1);`
			`}`

Drop the debug level for logging fd creation to DEBUG_DEBUG (This used to be ctdb commit eae1d4f9e52e73b4d8769868fffdafa590d03784) 2010-02-03 22:37:41 +03:00			`DEBUG(DEBUG_DEBUG, (__location__ " Created PIPE FD:%d to recovery daemon\n", fd[0]));`
add logging everytime we create a filedescriptor in the main ctdb daemon so we can spot if there are leaks. plug two leaks for filedescriptors related to when sending ARP fail and one leak when we can not parse the local address during tcp connection establish (This used to be ctdb commit ddd089810a14efe4be6e1ff3eccaa604e4913c9e) 2009-10-15 04:24:54 +04:00
ctdb-daemon: Stop using tevent compatibility definitions Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-26 08:50:09 +03:00			`fde = tevent_add_fd(ctdb->ev, ctdb, fd[0], TEVENT_FD_READ,`
			`ctdb_recoverd_parent, &fd[0]);`
event: Update events to latest Samba version 0.9.8 In Samba this is now called "tevent", and while we use the backwards compatibility wrappers they don't offer EVENT_FD_AUTOCLOSE: that is now a separate tevent_fd_set_auto_close() function. This is based on Samba version 7f29f817fa939ef1bbb740584f09e76e2ecd5b06. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 85e5e760cc91eb3157d3a88996ce474491646726) 2010-08-18 03:46:31 +04:00			`tevent_fd_set_auto_close(fde);`
create a helper function that converts a ctdb instance in daemon mode to become a ctdb client instance. use this from the recovery daemon child process to switch to client mode and connect back to the main daemon (This used to be ctdb commit 16f31786a031255ab5b3099a0a3c745de973347a) 2009-03-23 04:37:30 +03:00
proper waitpid() fix. remove all waitpid() calls and use the event system to trap sigchld (This used to be ctdb commit 77458b2b6b51b2970c12b0e5b097088d3fb9d358) 2008-07-09 08:02:54 +04:00			`/* set up a handler to pick up sigchld */`
ctdb-daemon: Stop using tevent compatibility definitions Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-26 08:50:09 +03:00			`se = tevent_add_signal(ctdb->ev, ctdb, SIGCHLD, 0,`
			`recd_sig_child_handler, ctdb);`
proper waitpid() fix. remove all waitpid() calls and use the event system to trap sigchld (This used to be ctdb commit 77458b2b6b51b2970c12b0e5b097088d3fb9d358) 2008-07-09 08:02:54 +04:00			`if (se == NULL) {`
			`DEBUG(DEBUG_CRIT,("Failed to set up signal handler for SIGCHLD in recovery daemon\n"));`
			`exit(1);`
			`}`

moved the recovery daemon into the main ctdbd and enable it by default (This used to be ctdb commit 2a7d42124731f43d013cb76a798525eab4cc1ee0) 2007-05-15 09:13:36 +04:00			`monitor_cluster(ctdb);`
recovery daemon with recovery master election election is primitive, it elects the lowest vnn as the recovery master two new controls, to get/set recovery master for a node to use recovery daemon, start one ./bin/recoverd --socket=ctdb.socket* for each ctdb daemon it has been briefly tested by deleting and adding nodes to a 4 node cluster but needs more testing (This used to be ctdb commit 541d1cc49d46d44042a31a8404d521412ef2fdb3) 2007-05-07 00:51:58 +04:00
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ALERT,("ERROR: ctdb_recoverd finished!?\n"));`
moved the recovery daemon into the main ctdbd and enable it by default (This used to be ctdb commit 2a7d42124731f43d013cb76a798525eab4cc1ee0) 2007-05-15 09:13:36 +04:00			`return -1;`
start working on a recovery daemon change ctdb_control so it takes a timeval pointer as argument. this is the timeout. if the node has not responded within hte timeout ctdb_control will return an error instead of hanging. if the timeval pointer is NULL then the call will block indefinitely if there is no response. this is used for now in the createdb control but all the helpers ctdb_ctrl_* should probably be updated to take a timeout parameter as well. (This used to be ctdb commit 1fe64b04869b17dbf123851b0fe09df8d28a6211) 2007-05-04 02:30:18 +04:00			`}`
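In ctdb_start_recoverd() the parent keeps only the write end of the pipe (close(fd[0])) and the child keeps only the read end (close(fd[1])), then registers a read event on fd[0] via ctdb_recoverd_parent. Under normal pipe semantics, that read end reports EOF once the parent's write end is closed, which happens when the parent exits. The sketch below shows the same trick standalone. It assumes plain poll(2) in place of tevent. pipe_peer_closed() and run_parent_death_demo() are hypothetical names used only for illustration, not ctdb functions, and the demo closes the parent's write end explicitly to stand in for the parent dying.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 once every copy of the pipe's write end is
 * closed (read() sees EOF), 0 if the writer is still there after timeout_ms,
 * -1 on poll error. */
static int pipe_peer_closed(int read_fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = read_fd, .events = POLLIN };
	char c;
	int n = poll(&pfd, 1, timeout_ms);

	if (n < 0) {
		return -1;
	}
	if (n == 0) {
		return 0;	/* timed out: writer still alive */
	}
	/* readable or hung up: a 0-byte read means EOF; stray data is ignored */
	return read(read_fd, &c, 1) == 0 ? 1 : 0;
}

/* Hypothetical demo: forks a child that, like the recovery daemon, keeps only
 * the read end and exits with status 42 when the parent's write end goes
 * away. Returns the child's exit status, or -1 on error. */
static int run_parent_death_demo(void)
{
	int fd[2];
	int status;
	pid_t pid;

	if (pipe(fd) != 0) {
		return -1;
	}
	pid = fork();
	if (pid == -1) {
		close(fd[0]);
		close(fd[1]);
		return -1;
	}
	if (pid == 0) {
		int r;

		close(fd[1]);	/* child must drop the write end or EOF never comes */
		do {
			r = pipe_peer_closed(fd[0], 1000);
		} while (r == 0);
		_exit(r == 1 ? 42 : 1);
	}
	close(fd[0]);	/* parent keeps only the write end */
	close(fd[1]);	/* stand-in for the parent exiting */
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)) {
		return -1;
	}
	return WEXITSTATUS(status);
}
```

The detail that matters is the child's close(fd[1]): if the child kept its own copy of the write end, the pipe would never reach EOF and the child could not notice the parent going away.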
when we are shutting down, we should first shut down the recovery daemon (This used to be ctdb commit 39ade6b329adcd3234124d6a8daaa6181abf739b) 2007-10-22 06:34:08 +04:00
			`/*`
			`shutdown the recovery daemon`
			`*/`
			`void ctdb_stop_recoverd(struct ctdb_context *ctdb)`
			`{`
			`if (ctdb->recoverd_pid == 0) {`
			`return;`
			`}`

merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_NOTICE,("Shutting down recovery daemon\n"));`
Track all child process so we never send a signal to an unrelated process (our child died and kernel wrapped the pid-space and reused the pid for a different process Wrap all creation of child processes inside ctdb_fork() which is used to track all processes we have spawned. Capture SIGCHLD to track also which child processes have terminated. Wrap kill() inside ctdb_kill() and make sure that we never send a !0 signal to a child process pid that has already terminated (and might have been replaced with a (This used to be ctdb commit f73a4b1495830bcdd094a93732a89dd53b3c2f78) 2012-05-03 05:42:41 +04:00			`ctdb_kill(ctdb, ctdb->recoverd_pid, SIGTERM);`
daemon: On shutdown, destroy timed events that check if recoverd is active When CTDB is shutting down, recovery daemon is stopped, but the event that checks if recovery daemon is still alive is not destroyed. So recovery master is restarted during shutdown if CTDB daemon takes longer to shutdown. There are two processes that check if recovery daemon is working. 1. ctdb_check_recd() - which checks every 30 seconds if the recovery daemon process exists. 2. ctdb_recd_ping_timeout() - which is triggered when recovery daemon fails to ping CTDB daemon. Both the events are periodic and need to be destroyed when shutting down. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 746168df2e691058e601016110fae818c6a265c3) 2012-12-04 08:05:44 +04:00
			`TALLOC_FREE(ctdb->recd_ctx);`
			`TALLOC_FREE(ctdb->recd_ping_count);`
when we are shutting down, we should first shut down the recovery daemon (This used to be ctdb commit 39ade6b329adcd3234124d6a8daaa6181abf739b) 2007-10-22 06:34:08 +04:00			`}`
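ctdb_stop_recoverd() signals the daemon through ctdb_kill() rather than kill(). According to the commit message above, this is so that a real signal is never sent to a pid that has already terminated and may have been reused by an unrelated process. Below is a minimal sketch of that idea. It assumes a fixed-size table of forked pids, and tracked_fork(), tracked_reap(), tracked_kill() and the other helpers are hypothetical, not ctdb's implementation. The check is safe because a child's pid cannot be reused until its parent reaps it, so the pid is forgotten at reap time.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CHILDREN 64	/* hypothetical fixed limit for the sketch */

/* pids we forked and have not yet reaped; 0 marks a free slot */
static pid_t live_children[MAX_CHILDREN];

static bool child_is_live(pid_t pid)
{
	if (pid <= 0) {
		return false;	/* never match free slots or process groups */
	}
	for (int i = 0; i < MAX_CHILDREN; i++) {
		if (live_children[i] == pid) {
			return true;
		}
	}
	return false;
}

static void forget_child(pid_t pid)
{
	for (int i = 0; i < MAX_CHILDREN; i++) {
		if (live_children[i] == pid) {
			live_children[i] = 0;
		}
	}
}

/* fork() that records the child's pid; if the table is full the child is
 * simply untracked, so tracked_kill() will refuse to signal it */
static pid_t tracked_fork(void)
{
	pid_t pid = fork();

	if (pid > 0) {
		for (int i = 0; i < MAX_CHILDREN; i++) {
			if (live_children[i] == 0) {
				live_children[i] = pid;
				break;
			}
		}
	}
	return pid;
}

/* Reap exited children (what a SIGCHLD handler would trigger) and forget
 * their pids. Only the first waitpid() uses first_options (pass 0 to block
 * until one child exits); later calls use WNOHANG. */
static void tracked_reap(int first_options)
{
	int options = first_options;
	pid_t pid;

	while ((pid = waitpid(-1, NULL, options)) > 0) {
		forget_child(pid);
		options = WNOHANG;
	}
}

/* kill() that refuses to send a real signal to anything but a live,
 * unreaped child; signal 0 (existence probe) is passed straight through */
static int tracked_kill(pid_t pid, int sig)
{
	if (sig != 0 && !child_is_live(pid)) {
		errno = ESRCH;
		return -1;
	}
	return kill(pid, sig);
}
```

Signal 0 is let through because it only probes for existence and cannot harm a process that happens to have reused the pid.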
If/when the recovery daemon terminates unexpectedly, try to restart it again from the main daemon instead of just shutting down the main deamon too. While it does not address the reason for recovery daemon shutting down, it reduces the impact of such issues and makes the system more robust. (This used to be ctdb commit 0566ef3d6cef809bda204877c493c80ff9eb2c40) 2011-03-01 04:09:42 +03:00
ctdb-daemon: Stop using tevent compatibility definitions Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-26 08:50:09 +03:00			`static void ctdb_restart_recd(struct tevent_context *ev,`
			`struct tevent_timer *te,`
			`struct timeval t, void *private_data)`
If/when the recovery daemon terminates unexpectedly, try to restart it again from the main daemon instead of just shutting down the main deamon too. While it does not address the reason for recovery daemon shutting down, it reduces the impact of such issues and makes the system more robust. (This used to be ctdb commit 0566ef3d6cef809bda204877c493c80ff9eb2c40) 2011-03-01 04:09:42 +03:00			`{`
			`struct ctdb_context *ctdb = talloc_get_type(private_data, struct ctdb_context);`

			`DEBUG(DEBUG_ERR,("Restarting recovery daemon\n"));`
			`ctdb_stop_recoverd(ctdb);`
			`ctdb_start_recoverd(ctdb);`
			`}`
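ctdb_restart_recd() runs as a tevent timer callback that stops the recovery daemon and starts it again, so an unexpected exit of the daemon no longer takes the main daemon down with it. The sketch below expresses the same restart policy with a blocking waitpid(2) loop instead of tevent timers and SIGCHLD handling. supervise(), always_fails() and exits_cleanly() are hypothetical. Treating exit status 0 as a clean shutdown that should not be restarted, and capping the number of restarts, are assumptions of the sketch rather than ctdb behaviour.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical body of the supervised daemon: returns its exit status. */
typedef int (*daemon_main_fn)(void);

/* Example daemon bodies, for illustration only. */
static int always_fails(void) { return 1; }
static int exits_cleanly(void) { return 0; }

/* Run daemon_main in a child; whenever it dies abnormally (killed by a
 * signal or non-zero exit), start a fresh child, at most max_restarts
 * times. Returns the number of restarts performed, or -1 on fork/waitpid
 * failure. */
static int supervise(daemon_main_fn daemon_main, int max_restarts)
{
	int restarts = 0;

	for (;;) {
		int status;
		pid_t pid = fork();

		if (pid == -1) {
			return -1;
		}
		if (pid == 0) {
			_exit(daemon_main());	/* child: run the daemon body */
		}
		if (waitpid(pid, &status, 0) != pid) {
			return -1;
		}
		if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
			return restarts;	/* clean shutdown: do not restart */
		}
		if (restarts == max_restarts) {
			return restarts;	/* give up */
		}
		restarts++;	/* unexpected exit: loop and start a new child */
	}
}
```

For example, supervise(always_fails, 3) runs the body four times (the first start plus three restarts) and returns 3. ctdb's event-driven version avoids blocking the main daemon while it waits, which a loop like this cannot do.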
