samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2025-02-04 17:47:26 +03:00



break set/get vnn map out from ctdb_control and put it in ctdb_recover.c for the time being remove all the [de]marshalling and just pass a structure around instead (This used to be ctdb commit b1169555ab7015976c0135ff51121cc238f5887c) 2007-05-03 11:06:24 +10:00			`/*`
separate out the freeze/thaw handling from recovery (This used to be ctdb commit 0b0640bd8b8334961f240e0cf276ac112cd6e616) 2007-05-12 15:15:27 +10:00			`ctdb recovery code`
break set/get vnn map out from ctdb_control and put it in ctdb_recover.c for the time being remove all the [de]marshalling and just pass a structure around instead (This used to be ctdb commit b1169555ab7015976c0135ff51121cc238f5887c) 2007-05-03 11:06:24 +10:00
			`Copyright (C) Andrew Tridgell 2007`
			`Copyright (C) Ronnie Sahlberg 2007`

ctdb is GPL not LGPL (This used to be ctdb commit 8624378010d1c2a1438e1e701339dfba7276f960) 2007-05-31 13:50:53 +10:00			`This program is free software; you can redistribute it and/or modify`
			`it under the terms of the GNU General Public License as published by`
update lib/replace from samba4 (This used to be ctdb commit f0555484105668c01c21f56322992e752e831109) 2007-07-10 15:29:31 +10:00			`the Free Software Foundation; either version 3 of the License, or`
ctdb is GPL not LGPL (This used to be ctdb commit 8624378010d1c2a1438e1e701339dfba7276f960) 2007-05-31 13:50:53 +10:00			`(at your option) any later version.`

			`This program is distributed in the hope that it will be useful,`
break set/get vnn map out from ctdb_control and put it in ctdb_recover.c for the time being remove all the [de]marshalling and just pass a structure around instead (This used to be ctdb commit b1169555ab7015976c0135ff51121cc238f5887c) 2007-05-03 11:06:24 +10:00			`but WITHOUT ANY WARRANTY; without even the implied warranty of`
ctdb is GPL not LGPL (This used to be ctdb commit 8624378010d1c2a1438e1e701339dfba7276f960) 2007-05-31 13:50:53 +10:00			`MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the`
			`GNU General Public License for more details.`

			`You should have received a copy of the GNU General Public License`
update lib/replace from samba4 (This used to be ctdb commit f0555484105668c01c21f56322992e752e831109) 2007-07-10 15:29:31 +10:00			`along with this program; if not, see <http://www.gnu.org/licenses/>.`
break set/get vnn map out from ctdb_control and put it in ctdb_recover.c for the time being remove all the [de]marshalling and just pass a structure around instead (This used to be ctdb commit b1169555ab7015976c0135ff51121cc238f5887c) 2007-05-03 11:06:24 +10:00			`*/`
ctdb-daemon: Remove dependency on includes.h Instead of includes.h, include the required header files explicitly. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-26 16:50:46 +11:00			`#include "replace.h"`
add a ctdb uptime command that prints when ctdb was started and when the last recovery occured (This used to be ctdb commit b86e8ccbdac044bb949c4fc2ebb27635126272a9) 2008-01-17 11:33:23 +11:00			`#include "system/time.h"`
break set/get vnn map out from ctdb_control and put it in ctdb_recover.c for the time being remove all the [de]marshalling and just pass a structure around instead (This used to be ctdb commit b1169555ab7015976c0135ff51121cc238f5887c) 2007-05-03 11:06:24 +10:00			`#include "system/network.h"`
			`#include "system/filesys.h"`
			`#include "system/wait.h"`
ctdb-daemon: Remove dependency on includes.h Instead of includes.h, include the required header files explicitly. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-26 16:50:46 +11:00
			`#include <talloc.h>`
			`#include <tevent.h>`
			`#include <tdb.h>`

ctdb-util: Rename db_wrap to tdb_wrap and make it a build subsystem This makes it consistent with Samba, to ease transition. Update unit test code to link to with tdb_wrap instead of including db_wrap.c. There are some potential whitespace fixes in this commit that have been ignored. CTDB's lib/tdb_wrap will be deleted after the transition to Samba's lib/tdb_wrap, so there's no point polishing it too much. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2014-08-15 15:46:33 +10:00			`#include "lib/tdb_wrap/tdb_wrap.h"`
ctdb-daemon: Remove dependency on includes.h Instead of includes.h, include the required header files explicitly. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-26 16:50:46 +11:00			`#include "lib/util/dlinklist.h"`
			`#include "lib/util/debug.h"`
ctdb-recovery: Include lib/util/time.h instead of samba_util.h Less is more... Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2016-02-16 13:41:21 +11:00			`#include "lib/util/time.h"`
ctdb: Use prctl_set_comment from lib/util Signed-off-by: Christof Schmitt <cs@samba.org> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-09-23 16:10:59 -07:00			`#include "lib/util/util_process.h"`
ctdb-daemon: Remove dependency on includes.h Instead of includes.h, include the required header files explicitly. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-26 16:50:46 +11:00
			`#include "ctdb_private.h"`
			`#include "ctdb_client.h"`

ctdb-daemon: Separate prototypes for system specific functions This groups function prototypes for system specific functions in common/system.h and removes them from ctdb_private.h. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-23 14:11:53 +11:00			`#include "common/system.h"`
ctdb-daemon: Separate prototypes for common client/server functions This groups function prototypes for common client/server functions in common/common.h and removes them from ctdb_private.h. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-23 14:17:34 +11:00			`#include "common/common.h"`
ctdb-server: Replace ctdb_logging.h with common/logging.h Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Michael Adam <obnox@samba.org> 2015-11-11 15:41:10 +11:00			`#include "common/logging.h"`
more robust freeze/thaw logic (This used to be ctdb commit 51c1e51aeb7dfac1683584df7ef1bef98c092f76) 2007-05-12 15:29:06 +10:00
break set/get vnn map out from ctdb_control and put it in ctdb_recover.c for the time being remove all the [de]marshalling and just pass a structure around instead (This used to be ctdb commit b1169555ab7015976c0135ff51121cc238f5887c) 2007-05-03 11:06:24 +10:00			`int`
			`ctdb_control_getvnnmap(struct ctdb_context *ctdb, uint32_t opcode, TDB_DATA indata, TDB_DATA *outdata)`
			`{`
separate the wire format and internal format for the vnn_map (This used to be ctdb commit 9a71718d87c5162f1423d85c2e86a01f6771925e) 2007-05-10 08:13:19 +10:00			`struct ctdb_vnn_map_wire *map;`
			`size_t len;`
break set/get vnn map out from ctdb_control and put it in ctdb_recover.c for the time being remove all the [de]marshalling and just pass a structure around instead (This used to be ctdb commit b1169555ab7015976c0135ff51121cc238f5887c) 2007-05-03 11:06:24 +10:00
ctdb: Fix some "declarations after code" problems Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2014-09-04 11:21:24 +10:00			`CHECK_CONTROL_DATA_SIZE(0);`

separate the wire format and internal format for the vnn_map (This used to be ctdb commit 9a71718d87c5162f1423d85c2e86a01f6771925e) 2007-05-10 08:13:19 +10:00			`len = offsetof(struct ctdb_vnn_map_wire, map) + sizeof(uint32_t)*ctdb->vnn_map->size;`
			`map = talloc_size(outdata, len);`
fixed some incorrect CTDB_NO_MEMORY*() calls found after fixing the _VOID varient (This used to be ctdb commit 07c9133aedecaee3607ad3b6fa94e5c56417a9de) 2008-07-04 17:04:26 +10:00			`CTDB_NO_MEMORY(ctdb, map);`
separate the wire format and internal format for the vnn_map (This used to be ctdb commit 9a71718d87c5162f1423d85c2e86a01f6771925e) 2007-05-10 08:13:19 +10:00
			`map->generation = ctdb->vnn_map->generation;`
			`map->size = ctdb->vnn_map->size;`
			`memcpy(map->map, ctdb->vnn_map->map, sizeof(uint32_t)*map->size);`

			`outdata->dsize = len;`
			`outdata->dptr = (uint8_t *)map;`
break set/get vnn map out from ctdb_control and put it in ctdb_recover.c for the time being remove all the [de]marshalling and just pass a structure around instead (This used to be ctdb commit b1169555ab7015976c0135ff51121cc238f5887c) 2007-05-03 11:06:24 +10:00
			`return 0;`
			`}`
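
The length computation in `ctdb_control_getvnnmap()` above uses the common C idiom for a fixed header followed by a variable-length array: `offsetof()` gives the header size, and the element count scales the tail. The following is a minimal standalone sketch of that idiom, not the CTDB implementation. It uses plain `malloc()` instead of talloc, and `struct vnn_map_wire_example`, `vnn_map_wire_len()` and `vnn_map_pack()` are hypothetical names whose layout is only assumed to resemble `struct ctdb_vnn_map_wire`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wire layout: fixed header plus a flexible array member. */
struct vnn_map_wire_example {
	uint32_t generation;
	uint32_t size;
	uint32_t map[];
};

/* Bytes needed to carry `size` map entries in one buffer. */
static size_t vnn_map_wire_len(uint32_t size)
{
	return offsetof(struct vnn_map_wire_example, map) +
	       sizeof(uint32_t) * (size_t)size;
}

/*
 * Pack an in-memory map into a freshly allocated wire buffer.
 * Returns NULL on allocation failure; the caller frees the result.
 */
static struct vnn_map_wire_example *
vnn_map_pack(uint32_t generation, const uint32_t *map, uint32_t size,
	     size_t *len_out)
{
	size_t len = vnn_map_wire_len(size);
	struct vnn_map_wire_example *w;

	assert(map != NULL || size == 0);

	w = malloc(len);
	if (w == NULL) {
		return NULL;
	}
	w->generation = generation;
	w->size = size;
	if (size > 0) {
		memcpy(w->map, map, sizeof(uint32_t) * size);
	}
	*len_out = len;
	return w;
}
```

Computing the length once and using it for both the allocation and the reported size keeps the two from drifting apart, which appears to be why the function above stores `len` before filling `outdata`.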

ctdb-daemon: Remove freeze requirement for updating vnnmap In the parallel database recovery model, all the database will not remain frozen at the same time. So relax the condition to check if recovery is active. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-09-14 13:49:05 +10:00			`int`
break set/get vnn map out from ctdb_control and put it in ctdb_recover.c for the time being remove all the [de]marshalling and just pass a structure around instead (This used to be ctdb commit b1169555ab7015976c0135ff51121cc238f5887c) 2007-05-03 11:06:24 +10:00			`ctdb_control_setvnnmap(struct ctdb_context *ctdb, uint32_t opcode, TDB_DATA indata, TDB_DATA *outdata)`
			`{`
fixed setvnnmap to use wire structures too (This used to be ctdb commit 1208e4219d220b80e2f74974cac8ed2b8956d3ef) 2007-05-10 08:22:26 +10:00			`struct ctdb_vnn_map_wire *map = (struct ctdb_vnn_map_wire *)indata.dptr;`

ctdb-daemon: Remove freeze requirement for updating vnnmap In the parallel database recovery model, all the database will not remain frozen at the same time. So relax the condition to check if recovery is active. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-09-14 13:49:05 +10:00			`if (ctdb->recovery_mode != CTDB_RECOVERY_ACTIVE) {`
			`DEBUG(DEBUG_ERR, ("Attempt to set vnnmap when not in recovery\n"));`
ctdb-daemon: Avoid the use of ctdb->freeze_mode variable Use ctdb->freeze_mode only in ctdb_freeze.c and use the functions to check if databases are frozen everywhere else. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2014-08-21 12:32:02 +10:00			`return -1;`
don't allow setvnnmap while not frozen (This used to be ctdb commit a73f47f565894cc7e346177d87f2e6813837e1c6) 2007-05-14 13:48:40 +10:00			`}`

fixed setvnnmap to use wire structures too (This used to be ctdb commit 1208e4219d220b80e2f74974cac8ed2b8956d3ef) 2007-05-10 08:22:26 +10:00			`talloc_free(ctdb->vnn_map);`

			`ctdb->vnn_map = talloc(ctdb, struct ctdb_vnn_map);`
			`CTDB_NO_MEMORY(ctdb, ctdb->vnn_map);`

			`ctdb->vnn_map->generation = map->generation;`
			`ctdb->vnn_map->size = map->size;`
			`ctdb->vnn_map->map = talloc_array(ctdb->vnn_map, uint32_t, map->size);`
			`CTDB_NO_MEMORY(ctdb, ctdb->vnn_map->map);`
break set/get vnn map out from ctdb_control and put it in ctdb_recover.c for the time being remove all the [de]marshalling and just pass a structure around instead (This used to be ctdb commit b1169555ab7015976c0135ff51121cc238f5887c) 2007-05-03 11:06:24 +10:00
fixed setvnnmap to use wire structures too (This used to be ctdb commit 1208e4219d220b80e2f74974cac8ed2b8956d3ef) 2007-05-10 08:22:26 +10:00			`memcpy(ctdb->vnn_map->map, map->map, sizeof(uint32_t)*map->size);`
break set/get vnn map out from ctdb_control and put it in ctdb_recover.c for the time being remove all the [de]marshalling and just pass a structure around instead (This used to be ctdb commit b1169555ab7015976c0135ff51121cc238f5887c) 2007-05-03 11:06:24 +10:00
			`return 0;`
			`}`

fixup getdbmap control so it looks a bit nicer (This used to be ctdb commit 78a4d61cb78da20af5210488e685c91bc3023e90) 2007-05-03 13:07:34 +10:00			`int`
			`ctdb_control_getdbmap(struct ctdb_context *ctdb, uint32_t opcode, TDB_DATA indata, TDB_DATA *outdata)`
			`{`
			`uint32_t i, len;`
			`struct ctdb_db_context *ctdb_db;`
ctdb-daemon: Rename struct ctdb_dbid_map to ctdb_dbid_map_old Match struct ctdb_dbid as per protocol/protocol.h Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 17:46:05 +11:00			`struct ctdb_dbid_map_old *dbid_map;`
fixup getdbmap control so it looks a bit nicer (This used to be ctdb commit 78a4d61cb78da20af5210488e685c91bc3023e90) 2007-05-03 13:07:34 +10:00
			`CHECK_CONTROL_DATA_SIZE(0);`

			`len = 0;`
			`for(ctdb_db=ctdb->db_list;ctdb_db;ctdb_db=ctdb_db->next){`
			`len++;`
			`}`


ctdb-daemon: Rename struct ctdb_dbid_map to ctdb_dbid_map_old Match struct ctdb_dbid as per protocol/protocol.h Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 17:46:05 +11:00			`outdata->dsize = offsetof(struct ctdb_dbid_map_old, dbs) + sizeof(dbid_map->dbs[0])*len;`
fixup getdbmap control so it looks a bit nicer (This used to be ctdb commit 78a4d61cb78da20af5210488e685c91bc3023e90) 2007-05-03 13:07:34 +10:00			`outdata->dptr = (unsigned char *)talloc_zero_size(outdata, outdata->dsize);`
			`if (!outdata->dptr) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 20:07:15 +11:00			`DEBUG(DEBUG_ALERT, (__location__ " Failed to allocate dbmap array\n"));`
fixup getdbmap control so it looks a bit nicer (This used to be ctdb commit 78a4d61cb78da20af5210488e685c91bc3023e90) 2007-05-03 13:07:34 +10:00			`exit(1);`
			`}`

ctdb-daemon: Rename struct ctdb_dbid_map to ctdb_dbid_map_old Match struct ctdb_dbid as per protocol/protocol.h Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 17:46:05 +11:00			`dbid_map = (struct ctdb_dbid_map_old *)outdata->dptr;`
fixup getdbmap control so it looks a bit nicer (This used to be ctdb commit 78a4d61cb78da20af5210488e685c91bc3023e90) 2007-05-03 13:07:34 +10:00			`dbid_map->num = len;`
added support for persistent databases in ctdbd (This used to be ctdb commit 3115090a0d882beca9d70761130b74bb0821f201) 2007-09-21 12:24:02 +10:00			`for (i=0,ctdb_db=ctdb->db_list;ctdb_db;i++,ctdb_db=ctdb_db->next){`
ctdb-daemon: Rename struct ctdb_dbid_map to ctdb_dbid_map_old Match struct ctdb_dbid as per protocol/protocol.h Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 17:46:05 +11:00			`dbid_map->dbs[i].db_id = ctdb_db->db_id;`
ReadOnly: Change the ctdb_db structure to keep a uint8_t for flags instead of a boolean for the persistent flag. This is the same size as the original boolean but allows ut to add additional flags for the database (This used to be ctdb commit 7462761638d25880ad46024ad4ef21667eb99a98) 2011-09-01 10:21:55 +10:00			`if (ctdb_db->persistent != 0) {`
			`dbid_map->dbs[i].flags |= CTDB_DB_FLAGS_PERSISTENT;`
			`}`
ReadOnly: add a readonly flag to the getdbmap control and show the readonly setting in ctdb getdbmap output (This used to be ctdb commit 4cac9ad7d9c9ca657a247a6c215476399c7d2210) 2011-09-01 10:28:15 +10:00			`if (ctdb_db->readonly != 0) {`
			`dbid_map->dbs[i].flags |= CTDB_DB_FLAGS_READONLY;`
			`}`
STICKY: add prototype code to make records stick to a node to "calm" down if they are found to be very hot and accessed by a lot of clients. This can improve performance and stop clients from having to chase a rapidly migrating/bouncing record (This used to be ctdb commit d0d98f7e45e5084b81335b004d50bddc80cdc219) 2012-03-20 16:58:35 +11:00			`if (ctdb_db->sticky != 0) {`
			`dbid_map->dbs[i].flags |= CTDB_DB_FLAGS_STICKY;`
			`}`
fixup getdbmap control so it looks a bit nicer (This used to be ctdb commit 78a4d61cb78da20af5210488e685c91bc3023e90) 2007-05-03 13:07:34 +10:00			`}`

			`return 0;`
			`}`
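
`ctdb_control_getdbmap()` above walks the database list twice: once to count entries so the reply can be allocated in one piece, and once to fill in each ID and its flags. The sketch below shows that count-then-fill pattern in isolation. It is an illustration under assumptions, not CTDB code: the `ex_` types and the `EX_DB_FLAGS_*` values are hypothetical, and `calloc()` stands in for `talloc_zero_size()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical flag values, chosen only for this example. */
#define EX_DB_FLAGS_PERSISTENT 0x01
#define EX_DB_FLAGS_READONLY   0x02
#define EX_DB_FLAGS_STICKY     0x04

/* Hypothetical stand-in for one entry of the daemon's database list. */
struct ex_db {
	struct ex_db *next;
	uint32_t db_id;
	int persistent;
	int readonly;
	int sticky;
};

struct ex_dbid {
	uint32_t db_id;
	uint8_t flags;
};

struct ex_dbid_map {
	uint32_t num;
	struct ex_dbid dbs[];
};

/* Build a flat map from a linked list; the caller frees the result. */
static struct ex_dbid_map *ex_build_dbid_map(const struct ex_db *list)
{
	const struct ex_db *db;
	struct ex_dbid_map *m;
	uint32_t i, num = 0;

	/* Pass 1: count entries so one allocation is enough. */
	for (db = list; db != NULL; db = db->next) {
		num++;
	}

	/* Zeroed memory matters: flags are OR-ed in below. */
	m = calloc(1, offsetof(struct ex_dbid_map, dbs) +
		      sizeof(m->dbs[0]) * num);
	if (m == NULL) {
		return NULL;
	}
	m->num = num;

	/* Pass 2: copy IDs and translate booleans into flag bits. */
	for (i = 0, db = list; db != NULL; i++, db = db->next) {
		m->dbs[i].db_id = db->db_id;
		if (db->persistent != 0) {
			m->dbs[i].flags |= EX_DB_FLAGS_PERSISTENT;
		}
		if (db->readonly != 0) {
			m->dbs[i].flags |= EX_DB_FLAGS_READONLY;
		}
		if (db->sticky != 0) {
			m->dbs[i].flags |= EX_DB_FLAGS_STICKY;
		}
	}
	return m;
}
```

Because the flag bits are set with `|=`, the buffer has to start out zeroed, which is consistent with the original using a zeroing allocator for `outdata->dptr`.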
cleanup getnodemap (This used to be ctdb commit 3867ccf71a167fb82dbc5a3f03f968a325a0c70b) 2007-05-03 13:30:38 +10:00
ctdb-daemon: Factor out new function ctdb_node_list_to_map() Change ctdb_control_getnodemap() to use this. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-02-20 12:31:37 +11:00			`int`
			`ctdb_control_getnodemap(struct ctdb_context *ctdb, uint32_t opcode, TDB_DATA indata, TDB_DATA *outdata)`
			`{`
			`CHECK_CONTROL_DATA_SIZE(0);`

			`outdata->dptr = (unsigned char *)ctdb_node_list_to_map(ctdb->nodes,`
			`ctdb->num_nodes,`
			`outdata);`
			`if (outdata->dptr == NULL) {`
			`return -1;`
			`}`

			`outdata->dsize = talloc_get_size(outdata->dptr);`

update TAKEIP/RELEASEIP/GETPUBLICIP/GETNODEMAP controls so we retain an older ipv4-only version of these controls. We need this so that we are backwardcompatible with old versions of ctdb and so that we can interoperate with a ipv4-only recmaster during a rolling upgrade. (This used to be ctdb commit 6b76c520f97127099bd9fbaa0fa7af1c61947fb7) 2008-10-14 10:40:29 +11:00			`return 0;`
			`}`

ctdb-daemon: Don't delay reloading the nodes file Presumably this was done to minimise the chance of a recovery occurring while the nodemaps are inconsistent across nodes. Another potential theory is that the forced recovery in the ctdb.c:control_reload_nodes_file() stops another recovery occurring for ReRecoveryTimeout seconds, so this delay causes the reloads to occur during that period. This is no longer necessary because recoveries are now explicitly disabled while node files are reloaded. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-02-10 15:43:03 +11:00			`/*`
			`reload the nodes file`
			`*/`
			`int`
			`ctdb_control_reload_nodes_file(struct ctdb_context *ctdb, uint32_t opcode)`
to make it easier/less disruptive to add nodes to a running cluster add a new control that causes the node to drop the current nodes list and reread it from the nodes file. During this operation, the node will also drop the tcp layer and restart it. When we drop the tcp layer, by talloc_free()ing the ctcp structure add a destructor to ctcp so that we also can clean up and remove the references in the ctdb structure to the transport layer add two new commands for the ctdb tool. one to list all nodes in the nodesfile and the second a command to trigger a node to drop the transport and reinitialize it with the nde nodes file (This used to be ctdb commit 4bc20ac73e9fa94ffd43cccb6eeb438eeff9963c) 2008-02-19 14:44:48 +11:00			`{`
redesign how reloadnodes is implemented. modify the transport methods to allow to restart individual connections and set up destructors properly. only tear down/set-up tcp connections to nodes removed from the cluster or nodes added to the cluster. Leave tcp connections to unchanged nodes connected. make "ctdb reloadnodes" explicitely cause a recovery of the cluster once the files have been realoaded (This used to be ctdb commit d1057ed6de7de9f2a64d8fa012c52647e89b515b) 2008-12-02 13:26:30 +11:00			`int i, num_nodes;`
			`TALLOC_CTX *tmp_ctx;`
ctdb-daemon: Don't delay reloading the nodes file Presumably this was done to minimise the chance of a recovery occurring while the nodemaps are inconsistent across nodes. Another potential theory is that the forced recovery in the ctdb.c:control_reload_nodes_file() stops another recovery occurring for ReRecoveryTimeout seconds, so this delay causes the reloads to occur during that period. This is no longer necessary because recoveries are now explicitly disabled while node files are reloaded. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-02-10 15:43:03 +11:00			`struct ctdb_node **nodes;`
redesign how reloadnodes is implemented. modify the transport methods to allow to restart individual connections and set up destructors properly. only tear down/set-up tcp connections to nodes removed from the cluster or nodes added to the cluster. Leave tcp connections to unchanged nodes connected. make "ctdb reloadnodes" explicitely cause a recovery of the cluster once the files have been realoaded (This used to be ctdb commit d1057ed6de7de9f2a64d8fa012c52647e89b515b) 2008-12-02 13:26:30 +11:00
			`tmp_ctx = talloc_new(ctdb);`

			`/* steal the old nodes file for a while */`
			`talloc_steal(tmp_ctx, ctdb->nodes);`
			`nodes = ctdb->nodes;`
			`ctdb->nodes = NULL;`
			`num_nodes = ctdb->num_nodes;`
			`ctdb->num_nodes = 0;`
to make it easier/less disruptive to add nodes to a running cluster add a new control that causes the node to drop the current nodes list and reread it from the nodes file. During this operation, the node will also drop the tcp layer and restart it. When we drop the tcp layer, by talloc_free()ing the ctcp structure add a destructor to ctcp so that we also can clean up and remove the references in the ctdb structure to the transport layer add two new commands for the ctdb tool. one to list all nodes in the nodesfile and the second a command to trigger a node to drop the transport and reinitialize it with the nde nodes file (This used to be ctdb commit 4bc20ac73e9fa94ffd43cccb6eeb438eeff9963c) 2008-02-19 14:44:48 +11:00
redesign how reloadnodes is implemented. modify the transport methods to allow to restart individual connections and set up destructors properly. only tear down/set-up tcp connections to nodes removed from the cluster or nodes added to the cluster. Leave tcp connections to unchanged nodes connected. make "ctdb reloadnodes" explicitely cause a recovery of the cluster once the files have been realoaded (This used to be ctdb commit d1057ed6de7de9f2a64d8fa012c52647e89b515b) 2008-12-02 13:26:30 +11:00			`/* load the new nodes file */`
to make it easier/less disruptive to add nodes to a running cluster add a new control that causes the node to drop the current nodes list and reread it from the nodes file. During this operation, the node will also drop the tcp layer and restart it. When we drop the tcp layer, by talloc_free()ing the ctcp structure add a destructor to ctcp so that we also can clean up and remove the references in the ctdb structure to the transport layer add two new commands for the ctdb tool. one to list all nodes in the nodesfile and the second a command to trigger a node to drop the transport and reinitialize it with the nde nodes file (This used to be ctdb commit 4bc20ac73e9fa94ffd43cccb6eeb438eeff9963c) 2008-02-19 14:44:48 +11:00			`ctdb_load_nodes_file(ctdb);`
ctdb->methods becomes NULL when we shutdown the transport. If we shutdown the transport and CTDB later decides to send a command out for queueing, the call to ctdb->methods->allocate_pkt() will SEGV. This could trigger for example when we are in the process of shuttind down CTDBD and have already shutdown the transport but we are still waiting for the "shutdown" eventscripts to finish. If the event scripts now take much much longer to execute for some reason, this race condition becomes much more probable. Decorate all dereferencing of ctdb->methods-> with a check that ctdb->menthods is non-NULL (This used to be ctdb commit c4c2c53918da6fb566d6e9cbd6b02e61ae2921e7) 2008-05-11 14:28:33 +10:00
When we reload the nodes file instead of shutting down/restarting the entire tcp layer just bounce all outgoing connections and reconnect (This used to be ctdb commit e701a531868149f16561011e65794a4a46ee6596) 2008-10-07 18:12:54 +11:00			`for (i=0; i<ctdb->num_nodes; i++) {`
redesign how reloadnodes is implemented. modify the transport methods to allow to restart individual connections and set up destructors properly. only tear down/set-up tcp connections to nodes removed from the cluster or nodes added to the cluster. Leave tcp connections to unchanged nodes connected. make "ctdb reloadnodes" explicitely cause a recovery of the cluster once the files have been realoaded (This used to be ctdb commit d1057ed6de7de9f2a64d8fa012c52647e89b515b) 2008-12-02 13:26:30 +11:00			`/* keep any identical pre-existing nodes and connections */`
			`if ((i < num_nodes) && ctdb_same_address(&ctdb->nodes[i]->address, &nodes[i]->address)) {`
			`talloc_free(ctdb->nodes[i]);`
			`ctdb->nodes[i] = talloc_steal(ctdb->nodes, nodes[i]);`
			`continue;`
			`}`

add a new node state : DELETED. This is used to mark nodes as being DELETED internally in ctdb so that nodes are not renumbered if / when they are removed from the nodes file. This is used to be able to do "ctdb reloadnodes" at runtime without causing nodes to be renumbered. To do this, instead of deleting a node from the nodes file, just comment it out like 1.0.0.1 #1.0.0.2 1.0.0.3 After removing 1.0.0.2 from the cluster, the remaining nodes retain their pnn's from prior to the deletion, namely 0 and 2 Any line in the nodes file that is commented out represents a DELETED pnn (This used to be ctdb commit 6a5e4fd7fa391206b463bb4e976502f3ac5bd343) 2009-06-01 14:18:34 +10:00			`if (ctdb->nodes[i]->flags & NODE_FLAGS_DELETED) {`
			`continue;`
			`}`

redesign how reloadnodes is implemented. modify the transport methods to allow to restart individual connections and set up destructors properly. only tear down/set-up tcp connections to nodes removed from the cluster or nodes added to the cluster. Leave tcp connections to unchanged nodes connected. make "ctdb reloadnodes" explicitely cause a recovery of the cluster once the files have been realoaded (This used to be ctdb commit d1057ed6de7de9f2a64d8fa012c52647e89b515b) 2008-12-02 13:26:30 +11:00			`/* any new or different nodes must be added */`
When we reload the nodes file instead of shutting down/restarting the entire tcp layer just bounce all outgoing connections and reconnect (This used to be ctdb commit e701a531868149f16561011e65794a4a46ee6596) 2008-10-07 18:12:54 +11:00			`if (ctdb->methods->add_node(ctdb->nodes[i]) != 0) {`
			`DEBUG(DEBUG_CRIT, (__location__ " methods->add_node failed at %d\n", i));`
			`ctdb_fatal(ctdb, "failed to add node. shutting down\n");`
			`}`
redesign how reloadnodes is implemented. modify the transport methods to allow to restart individual connections and set up destructors properly. only tear down/set-up tcp connections to nodes removed from the cluster or nodes added to the cluster. Leave tcp connections to unchanged nodes connected. make "ctdb reloadnodes" explicitely cause a recovery of the cluster once the files have been realoaded (This used to be ctdb commit d1057ed6de7de9f2a64d8fa012c52647e89b515b) 2008-12-02 13:26:30 +11:00			`if (ctdb->methods->connect_node(ctdb->nodes[i]) != 0) {`
			`DEBUG(DEBUG_CRIT, (__location__ " methods->add_connect failed at %d\n", i));`
			`ctdb_fatal(ctdb, "failed to connect to node. shutting down\n");`
			`}`
ctdb->methods becomes NULL when we shutdown the transport. If we shutdown the transport and CTDB later decides to send a command out for queueing, the call to ctdb->methods->allocate_pkt() will SEGV. This could trigger for example when we are in the process of shuttind down CTDBD and have already shutdown the transport but we are still waiting for the "shutdown" eventscripts to finish. If the event scripts now take much much longer to execute for some reason, this race condition becomes much more probable. Decorate all dereferencing of ctdb->methods-> with a check that ctdb->menthods is non-NULL (This used to be ctdb commit c4c2c53918da6fb566d6e9cbd6b02e61ae2921e7) 2008-05-11 14:28:33 +10:00			`}`
to make it easier/less disruptive to add nodes to a running cluster add a new control that causes the node to drop the current nodes list and reread it from the nodes file. During this operation, the node will also drop the tcp layer and restart it. When we drop the tcp layer, by talloc_free()ing the ctcp structure add a destructor to ctcp so that we also can clean up and remove the references in the ctdb structure to the transport layer add two new commands for the ctdb tool. one to list all nodes in the nodesfile and the second a command to trigger a node to drop the transport and reinitialize it with the nde nodes file (This used to be ctdb commit 4bc20ac73e9fa94ffd43cccb6eeb438eeff9963c) 2008-02-19 14:44:48 +11:00
add a new node state : DELETED. This is used to mark nodes as being DELETED internally in ctdb so that nodes are not renumbered if / when they are removed from the nodes file. This is used to be able to do "ctdb reloadnodes" at runtime without causing nodes to be renumbered. To do this, instead of deleting a node from the nodes file, just comment it out like 1.0.0.1 #1.0.0.2 1.0.0.3 After removing 1.0.0.2 from the cluster, the remaining nodes retain their pnn's from prior to the deletion, namely 0 and 2 Any line in the nodes file that is commented out represents a DELETED pnn (This used to be ctdb commit 6a5e4fd7fa391206b463bb4e976502f3ac5bd343) 2009-06-01 14:18:34 +10:00			`/* tell the recovery daemon to reload the nodes file too */`
			`ctdb_daemon_send_message(ctdb, ctdb->pnn, CTDB_SRVID_RELOAD_NODES, tdb_null);`

redesign how reloadnodes is implemented. modify the transport methods to allow to restart individual connections and set up destructors properly. only tear down/set-up tcp connections to nodes removed from the cluster or nodes added to the cluster. Leave tcp connections to unchanged nodes connected. make "ctdb reloadnodes" explicitely cause a recovery of the cluster once the files have been realoaded (This used to be ctdb commit d1057ed6de7de9f2a64d8fa012c52647e89b515b) 2008-12-02 13:26:30 +11:00			`talloc_free(tmp_ctx);`
to make it easier/less disruptive to add nodes to a running cluster add a new control that causes the node to drop the current nodes list and reread it from the nodes file. During this operation, the node will also drop the tcp layer and restart it. When we drop the tcp layer, by talloc_free()ing the ctcp structure add a destructor to ctcp so that we also can clean up and remove the references in the ctdb structure to the transport layer add two new commands for the ctdb tool. one to list all nodes in the nodesfile and the second a command to trigger a node to drop the transport and reinitialize it with the nde nodes file (This used to be ctdb commit 4bc20ac73e9fa94ffd43cccb6eeb438eeff9963c) 2008-02-19 14:44:48 +11:00
cleanup getnodemap (This used to be ctdb commit 3867ccf71a167fb82dbc5a3f03f968a325a0c70b) 2007-05-03 13:30:38 +10:00			`return 0;`
			`}`
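
The loop in `ctdb_control_reload_nodes_file()` above compares each slot of the freshly loaded node list with the same slot of the old list. An entry whose address is unchanged is swapped back to the old node so its connection survives. Anything else is added and connected, unless it is flagged as deleted. The sketch below models only that decision order. `struct ex_node` and `ex_reload_nodes()` are hypothetical, addresses are plain strings compared with `strcmp()`, and setting `connected` stands in for the transport's `add_node`/`connect_node` calls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified node record for this example only. */
struct ex_node {
	const char *address;
	int connected;
	int deleted;
};

/*
 * Merge a freshly loaded node list with the previous one, slot by slot.
 * Returns the number of nodes that had to be newly connected.
 */
static size_t ex_reload_nodes(struct ex_node *fresh, size_t num_fresh,
			      const struct ex_node *old, size_t num_old)
{
	size_t i, added = 0;

	assert(fresh != NULL || num_fresh == 0);

	for (i = 0; i < num_fresh; i++) {
		/* Keep identical pre-existing nodes and their connections. */
		if (i < num_old &&
		    strcmp(fresh[i].address, old[i].address) == 0) {
			fresh[i] = old[i];
			continue;
		}

		/* Deleted slots keep their index but are never connected. */
		if (fresh[i].deleted) {
			continue;
		}

		/* Any new or different node must be added and connected. */
		fresh[i].connected = 1;
		added++;
	}
	return added;
}
```

Comparing by slot index rather than searching the whole old list matches the source's assumption that node numbers stay stable, which is what the DELETED state in the commit notes is meant to preserve.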
cleanup the control "write record" (This used to be ctdb commit 4dd5c26a21a5dc2b2f76eb23cfeb4df82ba4e956) 2007-05-03 16:18:03 +10:00
more efficient traversal in pulldb control (This used to be ctdb commit fe614b10868e63b70e081b5bbfb74bf16fdf5716) 2008-01-07 14:07:01 +11:00			`/*`
			`a traverse function for pulling all relevant records from pulldb`
			`*/`
			`struct pulldb_data {`
- got rid of the complex hand marshalling in the recovery controls - fixed the re-send of ctdb calls after a generation change - fixed a reqid idr leak in controls - removed the write_record test code - use the new nonblock lockall code to prevent ctdbd from ever doing a blocking lock that could deadlock with smbd - moved more of the recovery controls into ctdb_recover.c (This used to be ctdb commit 565a21aa4f1e842309986ab97d6244801153deec) 2007-05-10 17:43:45 +10:00			`struct ctdb_context *ctdb;`
DEBUG: Add checks for and print debug messages when 1) a database contains very many records, 2) when a database is very big, 3) when a single record is very big. Add tunables to control when to log these instances and allow it to be completely turned off by setting the threshold to 0 (This used to be ctdb commit 9ed58fef4991725f75509433496f4d5ffae0ae87) 2012-05-21 13:11:38 +10:00			`struct ctdb_db_context *ctdb_db;`
renamed the pulldb structure to a ctdb_marshall_buffer (This used to be ctdb commit bad53b2d342bb9760497e6f4a61e64ca50d6e771) 2008-07-30 19:59:18 +10:00			`struct ctdb_marshall_buffer *pulldata;`
more efficient traversal in pulldb control (This used to be ctdb commit fe614b10868e63b70e081b5bbfb74bf16fdf5716) 2008-01-07 14:07:01 +11:00			`uint32_t len;`
RECOVER: When we pull databases during recovery, we used to reallocate the databuffer for each entry added. This would normally not be an issue, but for cases where memory is fragmented, this could start to cost significant cpu if we need to reallocate and move to a different region. Change this to instead preallocate , by default, 10MByte chunks to the data buffer. This significantly reduces the number of potential reallocate and move operations that may be required. Create a tunable to override/change how much preallocation should be used. (This used to be ctdb commit 1f262deaad0818f159f9c68330f7fec121679023) 2012-05-25 12:27:59 +10:00			`uint32_t allocated_len;`
more efficient traversal in pulldb control (This used to be ctdb commit fe614b10868e63b70e081b5bbfb74bf16fdf5716) 2008-01-07 14:07:01 +11:00			`bool failed;`
- got rid of the complex hand marshalling in the recovery controls - fixed the re-send of ctdb calls after a generation change - fixed a reqid idr leak in controls - removed the write_record test code - use the new nonblock lockall code to prevent ctdbd from ever doing a blocking lock that could deadlock with smbd - moved more of the recovery controls into ctdb_recover.c (This used to be ctdb commit 565a21aa4f1e842309986ab97d6244801153deec) 2007-05-10 17:43:45 +10:00			`};`

more efficient traversal in pulldb control (This used to be ctdb commit fe614b10868e63b70e081b5bbfb74bf16fdf5716) 2008-01-07 14:07:01 +11:00			`static int traverse_pulldb(struct tdb_context *tdb, TDB_DATA key, TDB_DATA data, void *p)`
- got rid of the complex hand marshalling in the recovery controls - fixed the re-send of ctdb calls after a generation change - fixed a reqid idr leak in controls - removed the write_record test code - use the new nonblock lockall code to prevent ctdbd from ever doing a blocking lock that could deadlock with smbd - moved more of the recovery controls into ctdb_recover.c (This used to be ctdb commit 565a21aa4f1e842309986ab97d6244801153deec) 2007-05-10 17:43:45 +10:00			`{`
more efficient traversal in pulldb control (This used to be ctdb commit fe614b10868e63b70e081b5bbfb74bf16fdf5716) 2008-01-07 14:07:01 +11:00			`struct pulldb_data *params = (struct pulldb_data *)p;`
ctdb-daemon: Rename struct ctdb_rec_data to ctdb_rec_data_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 17:30:30 +11:00			`struct ctdb_rec_data_old *rec;`
DEBUG: Add checks for and print debug messages when 1) a database contains very many records, 2) when a database is very big, 3) when a single record is very big. Add tunables to control when to log these instances and allow it to be completely turned off by setting the threshold to 0 (This used to be ctdb commit 9ed58fef4991725f75509433496f4d5ffae0ae87) 2012-05-21 13:11:38 +10:00			`struct ctdb_context *ctdb = params->ctdb;`
			`struct ctdb_db_context *ctdb_db = params->ctdb_db;`
- got rid of the complex hand marshalling in the recovery controls - fixed the re-send of ctdb calls after a generation change - fixed a reqid idr leak in controls - removed the write_record test code - use the new nonblock lockall code to prevent ctdbd from ever doing a blocking lock that could deadlock with smbd - moved more of the recovery controls into ctdb_recover.c (This used to be ctdb commit 565a21aa4f1e842309986ab97d6244801153deec) 2007-05-10 17:43:45 +10:00
more efficient traversal in pulldb control (This used to be ctdb commit fe614b10868e63b70e081b5bbfb74bf16fdf5716) 2008-01-07 14:07:01 +11:00			`/* add the record to the blob */`
			`rec = ctdb_marshall_record(params->pulldata, 0, key, NULL, data);`
			`if (rec == NULL) {`
			`params->failed = true;`
			`return -1;`
- got rid of the complex hand marshalling in the recovery controls - fixed the re-send of ctdb calls after a generation change - fixed a reqid idr leak in controls - removed the write_record test code - use the new nonblock lockall code to prevent ctdbd from ever doing a blocking lock that could deadlock with smbd - moved more of the recovery controls into ctdb_recover.c (This used to be ctdb commit 565a21aa4f1e842309986ab97d6244801153deec) 2007-05-10 17:43:45 +10:00			`}`
RECOVER: When we pull databases during recovery, we used to reallocate the databuffer for each entry added. This would normally not be an issue, but for cases where memory is fragmented, this could start to cost significant cpu if we need to reallocate and move to a different region. Change this to instead preallocate , by default, 10MByte chunks to the data buffer. This significantly reduces the number of potential reallocate and move operations that may be required. Create a tunable to override/change how much preallocation should be used. (This used to be ctdb commit 1f262deaad0818f159f9c68330f7fec121679023) 2012-05-25 12:27:59 +10:00			`if (params->len + rec->length >= params->allocated_len) {`
			`params->allocated_len = rec->length + params->len + ctdb->tunable.pulldb_preallocation_size;`
			`params->pulldata = talloc_realloc_size(NULL, params->pulldata, params->allocated_len);`
			`}`
more efficient traversal in pulldb control (This used to be ctdb commit fe614b10868e63b70e081b5bbfb74bf16fdf5716) 2008-01-07 14:07:01 +11:00			`if (params->pulldata == NULL) {`
When memory allocations for recovery fails, dont dereference a null pointer while trying to print the log message for the failure. also shutdown ctdb with ctdb_fatal() (This used to be ctdb commit f8642d0438c6bbb34a72c25d6a904b626e247410) 2010-09-03 11:58:27 +10:00			`DEBUG(DEBUG_CRIT,(__location__ " Failed to expand pulldb_data to %u\n", rec->length + params->len));`
			`ctdb_fatal(params->ctdb, "failed to allocate memory for recovery. shutting down\n");`
more efficient traversal in pulldb control (This used to be ctdb commit fe614b10868e63b70e081b5bbfb74bf16fdf5716) 2008-01-07 14:07:01 +11:00			`}`
			`params->pulldata->count++;`
			`memcpy(params->len+(uint8_t *)params->pulldata, rec, rec->length);`
			`params->len += rec->length;`
DEBUG: Add checks for and print debug messages when 1) a database contains very many records, 2) when a database is very big, 3) when a single record is very big. Add tunables to control when to log these instances and allow it to be completely turned off by setting the threshold to 0 (This used to be ctdb commit 9ed58fef4991725f75509433496f4d5ffae0ae87) 2012-05-21 13:11:38 +10:00
			`if (ctdb->tunable.db_record_size_warn != 0 && rec->length > ctdb->tunable.db_record_size_warn) {`
			`DEBUG(DEBUG_ERR,("Data record in %s is big. Record size is %d bytes\n", ctdb_db->db_name, (int)rec->length));`
			`}`

more efficient traversal in pulldb control (This used to be ctdb commit fe614b10868e63b70e081b5bbfb74bf16fdf5716) 2008-01-07 14:07:01 +11:00			`talloc_free(rec);`
- got rid of the complex hand marshalling in the recovery controls - fixed the re-send of ctdb calls after a generation change - fixed a reqid idr leak in controls - removed the write_record test code - use the new nonblock lockall code to prevent ctdbd from ever doing a blocking lock that could deadlock with smbd - moved more of the recovery controls into ctdb_recover.c (This used to be ctdb commit 565a21aa4f1e842309986ab97d6244801153deec) 2007-05-10 17:43:45 +10:00
			`return 0;`
			`}`
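The growth policy in `traverse_pulldb` (expand by the record length plus a preallocation chunk whenever the buffer would fill) can be sketched in isolation. This is a minimal illustration, not the ctdb implementation: `struct chunk_buf`, `chunk_buf_append` and the `reallocs` counter are hypothetical names, and plain `realloc` stands in for `talloc_realloc_size`. Unlike the original, the sketch returns an error on allocation failure instead of calling `ctdb_fatal()`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for pulldb_data: a byte buffer grown in chunks. */
struct chunk_buf {
	uint8_t *data;
	size_t len;           /* bytes in use */
	size_t allocated_len; /* bytes allocated */
	unsigned reallocs;    /* number of times the buffer was grown */
};

/*
 * Append rec_len bytes. When the record would not fit, grow the buffer to
 * len + rec_len + prealloc so that the following appends need no realloc.
 */
static int chunk_buf_append(struct chunk_buf *b, const void *rec,
			    size_t rec_len, size_t prealloc)
{
	if (b->len + rec_len >= b->allocated_len) {
		size_t new_len = b->len + rec_len + prealloc;
		uint8_t *p = realloc(b->data, new_len);
		if (p == NULL) {
			return -1; /* b->data is still valid and owned by the caller */
		}
		b->data = p;
		b->allocated_len = new_len;
		b->reallocs++;
	}
	memcpy(b->data + b->len, rec, rec_len);
	b->len += rec_len;
	return 0;
}
```

With 100-byte records and a 1000-byte chunk, 50 appends trigger only a handful of reallocations rather than one per record, which is the effect the preallocation tunable is meant to achieve.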

			`/*`
ctdb:recover: fix a comment typo Signed-off-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit 5067392d2e06795559f25828b65c129608b65c0b) 2012-11-20 11:20:34 +01:00			`pull a bunch of records from a ltdb, filtering by lmaster`
- got rid of the complex hand marshalling in the recovery controls - fixed the re-send of ctdb calls after a generation change - fixed a reqid idr leak in controls - removed the write_record test code - use the new nonblock lockall code to prevent ctdbd from ever doing a blocking lock that could deadlock with smbd - moved more of the recovery controls into ctdb_recover.c (This used to be ctdb commit 565a21aa4f1e842309986ab97d6244801153deec) 2007-05-10 17:43:45 +10:00			`*/`
			`int32_t ctdb_control_pull_db(struct ctdb_context *ctdb, TDB_DATA indata, TDB_DATA *outdata)`
cleanup the control "write record" (This used to be ctdb commit 4dd5c26a21a5dc2b2f76eb23cfeb4df82ba4e956) 2007-05-03 16:18:03 +10:00			`{`
ctdb-daemon: Rename struct ctdb_control_pulldb to ctdb_pulldb Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-28 19:10:53 +11:00			`struct ctdb_pulldb *pull;`
cleanup the control "write record" (This used to be ctdb commit 4dd5c26a21a5dc2b2f76eb23cfeb4df82ba4e956) 2007-05-03 16:18:03 +10:00			`struct ctdb_db_context *ctdb_db;`
more efficient traversal in pulldb control (This used to be ctdb commit fe614b10868e63b70e081b5bbfb74bf16fdf5716) 2008-01-07 14:07:01 +11:00			`struct pulldb_data params;`
rename the structure we use for marshalling multiple records (This used to be ctdb commit 4d205476d286570a6e1f52b59af42858ce051106) 2008-07-30 14:24:56 +10:00			`struct ctdb_marshall_buffer *reply;`
- got rid of the complex hand marshalling in the recovery controls - fixed the re-send of ctdb calls after a generation change - fixed a reqid idr leak in controls - removed the write_record test code - use the new nonblock lockall code to prevent ctdbd from ever doing a blocking lock that could deadlock with smbd - moved more of the recovery controls into ctdb_recover.c (This used to be ctdb commit 565a21aa4f1e842309986ab97d6244801153deec) 2007-05-10 17:43:45 +10:00
ctdb-daemon: Rename struct ctdb_control_pulldb to ctdb_pulldb Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-28 19:10:53 +11:00			`pull = (struct ctdb_pulldb *)indata.dptr;`
ctdb-daemon: Use database specific mark/unmark routines Instead of marking all the databases with priority, mark only the database which is currently being processed. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-09-14 14:53:45 +10:00
- got rid of the complex hand marshalling in the recovery controls - fixed the re-send of ctdb calls after a generation change - fixed a reqid idr leak in controls - removed the write_record test code - use the new nonblock lockall code to prevent ctdbd from ever doing a blocking lock that could deadlock with smbd - moved more of the recovery controls into ctdb_recover.c (This used to be ctdb commit 565a21aa4f1e842309986ab97d6244801153deec) 2007-05-10 17:43:45 +10:00			`ctdb_db = find_ctdb_db(ctdb, pull->db_id);`
			`if (!ctdb_db) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 20:07:15 +11:00			`DEBUG(DEBUG_ERR,(__location__ " Unknown db 0x%08x\n", pull->db_id));`
- got rid of the complex hand marshalling in the recovery controls - fixed the re-send of ctdb calls after a generation change - fixed a reqid idr leak in controls - removed the write_record test code - use the new nonblock lockall code to prevent ctdbd from ever doing a blocking lock that could deadlock with smbd - moved more of the recovery controls into ctdb_recover.c (This used to be ctdb commit 565a21aa4f1e842309986ab97d6244801153deec) 2007-05-10 17:43:45 +10:00			`return -1;`
			`}`

ctdb-daemon: Use database specific freeze check routine Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-09-15 14:01:49 +10:00			`if (!ctdb_db_frozen(ctdb_db)) {`
ctdb-daemon: Use database specific mark/unmark routines Instead of marking all the databases with priority, mark only the database which is currently being processed. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-09-14 14:53:45 +10:00			`DEBUG(DEBUG_ERR,`
			`("rejecting ctdb_control_pull_db when not frozen\n"));`
initial attempt at freezing databases in priority order (This used to be ctdb commit e8d692590da1070c87a4144031e3306d190ebed2) 2009-10-12 12:08:39 +11:00			`return -1;`
			`}`

rename the structure we use for marshalling multiple records (This used to be ctdb commit 4d205476d286570a6e1f52b59af42858ce051106) 2008-07-30 14:24:56 +10:00			`reply = talloc_zero(outdata, struct ctdb_marshall_buffer);`
more efficient traversal in pulldb control (This used to be ctdb commit fe614b10868e63b70e081b5bbfb74bf16fdf5716) 2008-01-07 14:07:01 +11:00			`CTDB_NO_MEMORY(ctdb, reply);`

			`reply->db_id = pull->db_id;`
prioritise the dmaster in case of matching rsn (This used to be ctdb commit 4996a12174aa0d215a5b14cb970bdf83eed34a39) 2007-05-12 19:57:12 +10:00
more efficient traversal in pulldb control (This used to be ctdb commit fe614b10868e63b70e081b5bbfb74bf16fdf5716) 2008-01-07 14:07:01 +11:00			`params.ctdb = ctdb;`
DEBUG: Add checks for and print debug messages when 1) a database contains very many records, 2) when a database is very big, 3) when a single record is very big. Add tunables to control when to log these instances and allow it to be completely turned off by setting the threshold to 0 (This used to be ctdb commit 9ed58fef4991725f75509433496f4d5ffae0ae87) 2012-05-21 13:11:38 +10:00			`params.ctdb_db = ctdb_db;`
more efficient traversal in pulldb control (This used to be ctdb commit fe614b10868e63b70e081b5bbfb74bf16fdf5716) 2008-01-07 14:07:01 +11:00			`params.pulldata = reply;`
rename the structure we use for marshalling multiple records (This used to be ctdb commit 4d205476d286570a6e1f52b59af42858ce051106) 2008-07-30 14:24:56 +10:00			`params.len = offsetof(struct ctdb_marshall_buffer, data);`
RECOVER: When we pull databases during recovery, we used to reallocate the databuffer for each entry added. This would normally not be an issue, but for cases where memory is fragmented, this could start to cost significant cpu if we need to reallocate and move to a different region. Change this to instead preallocate , by default, 10MByte chunks to the data buffer. This significantly reduces the number of potential reallocate and move operations that may be required. Create a tunable to override/change how much preallocation should be used. (This used to be ctdb commit 1f262deaad0818f159f9c68330f7fec121679023) 2012-05-25 12:27:59 +10:00			`params.allocated_len = params.len;`
more efficient traversal in pulldb control (This used to be ctdb commit fe614b10868e63b70e081b5bbfb74bf16fdf5716) 2008-01-07 14:07:01 +11:00			`params.failed = false;`
- got rid of the complex hand marshalling in the recovery controls - fixed the re-send of ctdb calls after a generation change - fixed a reqid idr leak in controls - removed the write_record test code - use the new nonblock lockall code to prevent ctdbd from ever doing a blocking lock that could deadlock with smbd - moved more of the recovery controls into ctdb_recover.c (This used to be ctdb commit 565a21aa4f1e842309986ab97d6244801153deec) 2007-05-10 17:43:45 +10:00
server: Use tdb_check to verify persistent tdbs on startup Depending on --max-persistent-check-errors we allow ctdb to start with unhealthy persistent databases. The default is 0 which means to reject a startup with unhealthy dbs. The health of the persistent databases is checked after each recovery. Node monitoring and the "startup" is deferred until all persistent databases are healthy. Databases can become healthy automaticly by a completely HEALTHY node joining the cluster. Or by an administrator with "ctdb backupdb/restoredb" or "ctdb wipedb". metze (This used to be ctdb commit 15f133d5150ed1badb4fef7d644f10cd08a25cb5) 2009-12-07 13:28:11 +01:00			`if (ctdb_db->unhealthy_reason) {`
			`/* this is just a warning, as the tdb should be empty anyway */`
			`DEBUG(DEBUG_WARNING,("db(%s) unhealthy in ctdb_control_pull_db: %s\n",`
			`ctdb_db->db_name, ctdb_db->unhealthy_reason));`
			`}`

ctdb-daemon: Use database specific mark/unmark routines Instead of marking all the databases with priority, mark only the database which is currently being processed. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-09-14 14:53:45 +10:00			`if (ctdb_lockdb_mark(ctdb_db) != 0) {`
			`DEBUG(DEBUG_ERR,(__location__ " Failed to get lock on entire db - failing\n"));`
- got rid of the complex hand marshalling in the recovery controls - fixed the re-send of ctdb calls after a generation change - fixed a reqid idr leak in controls - removed the write_record test code - use the new nonblock lockall code to prevent ctdbd from ever doing a blocking lock that could deadlock with smbd - moved more of the recovery controls into ctdb_recover.c (This used to be ctdb commit 565a21aa4f1e842309986ab97d6244801153deec) 2007-05-10 17:43:45 +10:00			`return -1;`
			`}`

more efficient traversal in pulldb control (This used to be ctdb commit fe614b10868e63b70e081b5bbfb74bf16fdf5716) 2008-01-07 14:07:01 +11:00			`if (tdb_traverse_read(ctdb_db->ltdb->tdb, traverse_pulldb, &params) == -1) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 20:07:15 +11:00			`DEBUG(DEBUG_ERR,(__location__ " Failed to traverse db '%s'\n", ctdb_db->db_name));`
ctdb-daemon: Use database specific mark/unmark routines Instead of marking all the databases with priority, mark only the database which is currently being processed. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-09-14 14:53:45 +10:00			`ctdb_lockdb_unmark(ctdb_db);`
more efficient traversal in pulldb control (This used to be ctdb commit fe614b10868e63b70e081b5bbfb74bf16fdf5716) 2008-01-07 14:07:01 +11:00			`talloc_free(params.pulldata);`
			`return -1;`
- got rid of the complex hand marshalling in the recovery controls - fixed the re-send of ctdb calls after a generation change - fixed a reqid idr leak in controls - removed the write_record test code - use the new nonblock lockall code to prevent ctdbd from ever doing a blocking lock that could deadlock with smbd - moved more of the recovery controls into ctdb_recover.c (This used to be ctdb commit 565a21aa4f1e842309986ab97d6244801153deec) 2007-05-10 17:43:45 +10:00			`}`

ctdb-daemon: Use database specific mark/unmark routines Instead of marking all the databases with priority, mark only the database which is currently being processed. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-09-14 14:53:45 +10:00			`ctdb_lockdb_unmark(ctdb_db);`
- got rid of the complex hand marshalling in the recovery controls - fixed the re-send of ctdb calls after a generation change - fixed a reqid idr leak in controls - removed the write_record test code - use the new nonblock lockall code to prevent ctdbd from ever doing a blocking lock that could deadlock with smbd - moved more of the recovery controls into ctdb_recover.c (This used to be ctdb commit 565a21aa4f1e842309986ab97d6244801153deec) 2007-05-10 17:43:45 +10:00
more efficient traversal in pulldb control (This used to be ctdb commit fe614b10868e63b70e081b5bbfb74bf16fdf5716) 2008-01-07 14:07:01 +11:00			`outdata->dptr = (uint8_t *)params.pulldata;`
			`outdata->dsize = params.len;`
- got rid of the complex hand marshalling in the recovery controls - fixed the re-send of ctdb calls after a generation change - fixed a reqid idr leak in controls - removed the write_record test code - use the new nonblock lockall code to prevent ctdbd from ever doing a blocking lock that could deadlock with smbd - moved more of the recovery controls into ctdb_recover.c (This used to be ctdb commit 565a21aa4f1e842309986ab97d6244801153deec) 2007-05-10 17:43:45 +10:00
DEBUG: Add checks for and print debug messages when 1) a database contains very many records, 2) when a database is very big, 3) when a single record is very big. Add tunables to control when to log these instances and allow it to be completely turned off by setting the threshold to 0 (This used to be ctdb commit 9ed58fef4991725f75509433496f4d5ffae0ae87) 2012-05-21 13:11:38 +10:00			`if (ctdb->tunable.db_record_count_warn != 0 && params.pulldata->count > ctdb->tunable.db_record_count_warn) {`
			`DEBUG(DEBUG_ERR,("Database %s is big. Contains %d records\n", ctdb_db->db_name, params.pulldata->count));`
			`}`
			`if (ctdb->tunable.db_size_warn != 0 && outdata->dsize > ctdb->tunable.db_size_warn) {`
			`DEBUG(DEBUG_ERR,("Database %s is big. Contains %d bytes\n", ctdb_db->db_name, (int)outdata->dsize));`
			`}`


- got rid of the complex hand marshalling in the recovery controls - fixed the re-send of ctdb calls after a generation change - fixed a reqid idr leak in controls - removed the write_record test code - use the new nonblock lockall code to prevent ctdbd from ever doing a blocking lock that could deadlock with smbd - moved more of the recovery controls into ctdb_recover.c (This used to be ctdb commit 565a21aa4f1e842309986ab97d6244801153deec) 2007-05-10 17:43:45 +10:00			`return 0;`
			`}`
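`ctdb_control_pull_db` starts `params.len` at `offsetof(struct ctdb_marshall_buffer, data)`, so the first record is written directly after the fixed header. A minimal sketch of that idea follows. The struct below is a hypothetical stand-in: the field names `db_id`, `count` and `data` appear in the code above, but their types and order are assumptions, since the real definition lives in another file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout; the real struct ctdb_marshall_buffer is defined elsewhere. */
struct marshall_buf_sketch {
	uint32_t db_id;
	uint32_t count;
	uint8_t data[1]; /* packed records start here */
};

/* Size of the fixed header, i.e. the offset at which the first record is written. */
static size_t marshall_header_len(void)
{
	return offsetof(struct marshall_buf_sketch, data);
}

/* Total bytes needed for the header plus payload_len bytes of packed records. */
static size_t marshall_total_len(size_t payload_len)
{
	return marshall_header_len() + payload_len;
}
```

Using `offsetof` rather than `sizeof` avoids counting the placeholder `data[1]` element and any trailing padding as part of the header.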

ctdb-daemon: Implement new controls DB_PULL and DB_PUSH_START/DB_PUSH_CONFIRM Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2016-02-19 17:32:09 +11:00			`struct db_pull_state {`
			`struct ctdb_context *ctdb;`
			`struct ctdb_db_context *ctdb_db;`
			`struct ctdb_marshall_buffer *recs;`
			`uint32_t pnn;`
			`uint64_t srvid;`
			`uint32_t num_records;`
			`};`

			`static int traverse_db_pull(struct tdb_context *tdb, TDB_DATA key,`
			`TDB_DATA data, void *private_data)`
			`{`
			`struct db_pull_state *state = (struct db_pull_state *)private_data;`
			`struct ctdb_marshall_buffer *recs;`

			`recs = ctdb_marshall_add(state->ctdb, state->recs,`
			`state->ctdb_db->db_id, 0, key, NULL, data);`
			`if (recs == NULL) {`
			`TALLOC_FREE(state->recs);`
			`return -1;`
			`}`
			`state->recs = recs;`

			`if (talloc_get_size(state->recs) >=`
			`state->ctdb->tunable.rec_buffer_size_limit) {`
			`TDB_DATA buffer;`
			`int ret;`

			`buffer = ctdb_marshall_finish(state->recs);`
			`ret = ctdb_daemon_send_message(state->ctdb, state->pnn,`
			`state->srvid, buffer);`
			`if (ret != 0) {`
			`TALLOC_FREE(state->recs);`
			`return -1;`
			`}`

			`state->num_records += state->recs->count;`
			`TALLOC_FREE(state->recs);`
			`}`

			`return 0;`
			`}`
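`traverse_db_pull` follows a batch-and-flush pattern: records accumulate until the buffer reaches a size limit, the batch is sent, the record count is credited, and the buffer is reset. The sketch below shows that pattern on its own. All names (`struct batcher`, `batcher_add`, `batcher_flush`, `count_sends`) are hypothetical, a fixed-size array replaces the talloc-managed marshall buffer, and a callback replaces `ctdb_daemon_send_message`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BATCH_CAPACITY 256

typedef int (*send_fn)(const uint8_t *buf, size_t len, void *private_data);

struct batcher {
	uint8_t buf[BATCH_CAPACITY];
	size_t used;
	size_t limit;         /* flush threshold, at most BATCH_CAPACITY */
	uint32_t pending;     /* records in the current batch */
	uint32_t num_records; /* records sent successfully so far */
	send_fn send;
	void *private_data;
};

/* Send the current batch, if any, and reset the buffer either way. */
static int batcher_flush(struct batcher *b)
{
	int ret = 0;

	if (b->pending == 0) {
		return 0;
	}
	if (b->send(b->buf, b->used, b->private_data) != 0) {
		ret = -1; /* batch is dropped, as the original frees recs on failure */
	} else {
		b->num_records += b->pending;
	}
	b->used = 0;
	b->pending = 0;
	return ret;
}

/* Add one record and flush once the batch reaches the size limit. */
static int batcher_add(struct batcher *b, const void *rec, size_t len)
{
	if (len > BATCH_CAPACITY - b->used) {
		return -1; /* sketch only: the fixed buffer cannot grow */
	}
	memcpy(b->buf + b->used, rec, len);
	b->used += len;
	b->pending++;
	if (b->used >= b->limit) {
		return batcher_flush(b);
	}
	return 0;
}

/* Example callback: counts how many batches were sent. */
static int count_sends(const uint8_t *buf, size_t len, void *private_data)
{
	(void)buf;
	(void)len;
	(*(unsigned *)private_data)++;
	return 0;
}
```

The caller must issue one final `batcher_flush()` after the traversal, which mirrors the "Last few records" step in `ctdb_control_db_pull`.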

			`int32_t ctdb_control_db_pull(struct ctdb_context *ctdb,`
			`struct ctdb_req_control_old *c,`
			`TDB_DATA indata, TDB_DATA *outdata)`
			`{`
			`struct ctdb_pulldb_ext *pulldb_ext;`
			`struct ctdb_db_context *ctdb_db;`
			`struct db_pull_state state;`
			`int ret;`

			`pulldb_ext = (struct ctdb_pulldb_ext *)indata.dptr;`

			`ctdb_db = find_ctdb_db(ctdb, pulldb_ext->db_id);`
			`if (ctdb_db == NULL) {`
			`DEBUG(DEBUG_ERR,(__location__ " Unknown db 0x%08x\n",`
			`pulldb_ext->db_id));`
			`return -1;`
			`}`

			`if (!ctdb_db_frozen(ctdb_db)) {`
			`DEBUG(DEBUG_ERR,`
			`("rejecting ctdb_control_db_pull when not frozen\n"));`
			`return -1;`
			`}`

			`if (ctdb_db->unhealthy_reason) {`
			`/* this is just a warning, as the tdb should be empty anyway */`
			`DEBUG(DEBUG_WARNING,`
			`("db(%s) unhealthy in ctdb_control_db_pull: %s\n",`
			`ctdb_db->db_name, ctdb_db->unhealthy_reason));`
			`}`

			`state.ctdb = ctdb;`
			`state.ctdb_db = ctdb_db;`
			`state.recs = NULL;`
			`state.pnn = c->hdr.srcnode;`
			`state.srvid = pulldb_ext->srvid;`
			`state.num_records = 0;`

			`if (ctdb_lockdb_mark(ctdb_db) != 0) {`
			`DEBUG(DEBUG_ERR,`
			`(__location__ " Failed to get lock on entire db - failing\n"));`
			`return -1;`
			`}`

			`ret = tdb_traverse_read(ctdb_db->ltdb->tdb, traverse_db_pull, &state);`
			`if (ret == -1) {`
			`DEBUG(DEBUG_ERR,`
			`(__location__ " Failed to traverse db '%s'\n",`
			`ctdb_db->db_name));`
			`ctdb_lockdb_unmark(ctdb_db);`
			`return -1;`
			`}`

			`/* Last few records */`
			`if (state.recs != NULL) {`
			`TDB_DATA buffer;`

			`buffer = ctdb_marshall_finish(state.recs);`
			`ret = ctdb_daemon_send_message(state.ctdb, state.pnn,`
			`state.srvid, buffer);`
			`if (ret != 0) {`
			`TALLOC_FREE(state.recs);`
			`ctdb_lockdb_unmark(ctdb_db);`
			`return -1;`
			`}`

			`state.num_records += state.recs->count;`
			`TALLOC_FREE(state.recs);`
			`}`

			`ctdb_lockdb_unmark(ctdb_db);`

			`outdata->dptr = talloc_size(outdata, sizeof(uint32_t));`
			`if (outdata->dptr == NULL) {`
			`DEBUG(DEBUG_ERR, (__location__ " Memory allocation error\n"));`
			`return -1;`
			`}`

			`memcpy(outdata->dptr, (uint8_t *)&state.num_records, sizeof(uint32_t));`
			`outdata->dsize = sizeof(uint32_t);`

			`return 0;`
			`}`

- got rid of the complex hand marshalling in the recovery controls - fixed the re-send of ctdb calls after a generation change - fixed a reqid idr leak in controls - removed the write_record test code - use the new nonblock lockall code to prevent ctdbd from ever doing a blocking lock that could deadlock with smbd - moved more of the recovery controls into ctdb_recover.c (This used to be ctdb commit 565a21aa4f1e842309986ab97d6244801153deec) 2007-05-10 17:43:45 +10:00			`/*`
			`push a bunch of records into a ltdb, filtering by rsn`
			`*/`
			`int32_t ctdb_control_push_db(struct ctdb_context *ctdb, TDB_DATA indata)`
			`{`
rename the structure we use for marshalling multiple records (This used to be ctdb commit 4d205476d286570a6e1f52b59af42858ce051106) 2008-07-30 14:24:56 +10:00			`struct ctdb_marshall_buffer *reply = (struct ctdb_marshall_buffer *)indata.dptr;`
- got rid of the complex hand marshalling in the recovery controls - fixed the re-send of ctdb calls after a generation change - fixed a reqid idr leak in controls - removed the write_record test code - use the new nonblock lockall code to prevent ctdbd from ever doing a blocking lock that could deadlock with smbd - moved more of the recovery controls into ctdb_recover.c (This used to be ctdb commit 565a21aa4f1e842309986ab97d6244801153deec) 2007-05-10 17:43:45 +10:00			`struct ctdb_db_context *ctdb_db;`
			`int i, ret;`
ctdb-daemon: Rename struct ctdb_rec_data to ctdb_rec_data_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 17:30:30 +11:00			`struct ctdb_rec_data_old *rec;`
- got rid of the complex hand marshalling in the recovery controls - fixed the re-send of ctdb calls after a generation change - fixed a reqid idr leak in controls - removed the write_record test code - use the new nonblock lockall code to prevent ctdbd from ever doing a blocking lock that could deadlock with smbd - moved more of the recovery controls into ctdb_recover.c (This used to be ctdb commit 565a21aa4f1e842309986ab97d6244801153deec) 2007-05-10 17:43:45 +10:00
rename the structure we use for marshalling multiple records (This used to be ctdb commit 4d205476d286570a6e1f52b59af42858ce051106) 2008-07-30 14:24:56 +10:00			`if (indata.dsize < offsetof(struct ctdb_marshall_buffer, data)) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 20:07:15 +11:00			`DEBUG(DEBUG_ERR,(__location__ " invalid data in pulldb reply\n"));`
- got rid of the complex hand marshalling in the recovery controls - fixed the re-send of ctdb calls after a generation change - fixed a reqid idr leak in controls - removed the write_record test code - use the new nonblock lockall code to prevent ctdbd from ever doing a blocking lock that could deadlock with smbd - moved more of the recovery controls into ctdb_recover.c (This used to be ctdb commit 565a21aa4f1e842309986ab97d6244801153deec) 2007-05-10 17:43:45 +10:00			`return -1;`
			`}`

			`ctdb_db = find_ctdb_db(ctdb, reply->db_id);`
cleanup the control "write record" (This used to be ctdb commit 4dd5c26a21a5dc2b2f76eb23cfeb4df82ba4e956) 2007-05-03 16:18:03 +10:00			`if (!ctdb_db) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 20:07:15 +11:00			`DEBUG(DEBUG_ERR,(__location__ " Unknown db 0x%08x\n", reply->db_id));`
cleanup the control "write record" (This used to be ctdb commit 4dd5c26a21a5dc2b2f76eb23cfeb4df82ba4e956) 2007-05-03 16:18:03 +10:00			`return -1;`
			`}`

ctdb-daemon: Use database specific freeze check routine Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-09-15 14:01:49 +10:00			`if (!ctdb_db_frozen(ctdb_db)) {`
ctdb-daemon: Use database specific mark/unmark routines Instead of marking all the databases with priority, mark only the database which is currently being processed. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-09-14 14:53:45 +10:00			`DEBUG(DEBUG_ERR,`
			`("rejecting ctdb_control_push_db when not frozen\n"));`
initial attempt at freezing databases in priority order (This used to be ctdb commit e8d692590da1070c87a4144031e3306d190ebed2) 2009-10-12 12:08:39 +11:00			`return -1;`
			`}`

ctdb-daemon: Use database specific mark/unmark routines Instead of marking all the databases with priority, mark only the database which is currently being processed. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-09-14 14:53:45 +10:00			`if (ctdb_lockdb_mark(ctdb_db) != 0) {`
			`DEBUG(DEBUG_ERR,(__location__ " Failed to get lock on entire db - failing\n"));`
- got rid of the complex hand marshalling in the recovery controls - fixed the re-send of ctdb calls after a generation change - fixed a reqid idr leak in controls - removed the write_record test code - use the new nonblock lockall code to prevent ctdbd from ever doing a blocking lock that could deadlock with smbd - moved more of the recovery controls into ctdb_recover.c (This used to be ctdb commit 565a21aa4f1e842309986ab97d6244801153deec) 2007-05-10 17:43:45 +10:00			`return -1;`
			`}`
cleanup the control "write record" (This used to be ctdb commit 4dd5c26a21a5dc2b2f76eb23cfeb4df82ba4e956) 2007-05-03 16:18:03 +10:00
ctdb-daemon: Rename struct ctdb_rec_data to ctdb_rec_data_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 17:30:30 +11:00			`rec = (struct ctdb_rec_data_old *)&reply->data[0];`
- got rid of the complex hand marshalling in the recovery controls - fixed the re-send of ctdb calls after a generation change - fixed a reqid idr leak in controls - removed the write_record test code - use the new nonblock lockall code to prevent ctdbd from ever doing a blocking lock that could deadlock with smbd - moved more of the recovery controls into ctdb_recover.c (This used to be ctdb commit 565a21aa4f1e842309986ab97d6244801153deec) 2007-05-10 17:43:45 +10:00
added debug constants to allow for better mapping to syslog levels (This used to be ctdb commit 7ba8f1dde318eab03f4257e5a89fd23e7281e502) 2008-02-04 17:44:24 +11:00			`DEBUG(DEBUG_INFO,("starting push of %u records for dbid 0x%x\n",`
- merged ctdb_store test from ronnie - added DatabaseHashSize tunable - added logging of events inside recovery (for timing) (This used to be ctdb commit 3593cdb928b91e217faf1b3c537fa28dc82cdace) 2007-06-17 23:31:44 +10:00			`reply->count, reply->db_id));`

- got rid of the complex hand marshalling in the recovery controls - fixed the re-send of ctdb calls after a generation change - fixed a reqid idr leak in controls - removed the write_record test code - use the new nonblock lockall code to prevent ctdbd from ever doing a blocking lock that could deadlock with smbd - moved more of the recovery controls into ctdb_recover.c (This used to be ctdb commit 565a21aa4f1e842309986ab97d6244801153deec) 2007-05-10 17:43:45 +10:00			`for (i=0;i<reply->count;i++) {`
			`TDB_DATA key, data;`
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 12:38:01 +11:00			`struct ctdb_ltdb_header *hdr;`
- got rid of the complex hand marshalling in the recovery controls - fixed the re-send of ctdb calls after a generation change - fixed a reqid idr leak in controls - removed the write_record test code - use the new nonblock lockall code to prevent ctdbd from ever doing a blocking lock that could deadlock with smbd - moved more of the recovery controls into ctdb_recover.c (This used to be ctdb commit 565a21aa4f1e842309986ab97d6244801153deec) 2007-05-10 17:43:45 +10:00
			`key.dptr = &rec->data[0];`
			`key.dsize = rec->keylen;`
			`data.dptr = &rec->data[key.dsize];`
			`data.dsize = rec->datalen;`

			`if (data.dsize < sizeof(struct ctdb_ltdb_header)) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 20:07:15 +11:00			`DEBUG(DEBUG_CRIT,(__location__ " bad ltdb record\n"));`
more robust freeze/thaw logic (This used to be ctdb commit 51c1e51aeb7dfac1683584df7ef1bef98c092f76) 2007-05-12 15:29:06 +10:00			`goto failed;`
- got rid of the complex hand marshalling in the recovery controls - fixed the re-send of ctdb calls after a generation change - fixed a reqid idr leak in controls - removed the write_record test code - use the new nonblock lockall code to prevent ctdbd from ever doing a blocking lock that could deadlock with smbd - moved more of the recovery controls into ctdb_recover.c (This used to be ctdb commit 565a21aa4f1e842309986ab97d6244801153deec) 2007-05-10 17:43:45 +10:00			`}`
			`hdr = (struct ctdb_ltdb_header *)data.dptr;`
ReadOnly: After performing a recovery, clear out all flags related to readonly delegations and revoke (This used to be ctdb commit 9985a97e11688f3f688bb84e1180fd57c42077f4) 2011-07-20 13:08:21 +10:00			`/* strip off any read only record flags. All readonly records`
			`are revoked implicitly by a recovery`
			`*/`
recover: use CTDB_REC_RO_FLAGS where appropriate Signed-off-by: Michael Adam <obnox@samba.org> Reviewed-By: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit b5a8791268e938d7e017056e0e2bd2cbec1fa690) 2013-04-19 16:24:32 +02:00			`hdr->flags &= ~CTDB_REC_RO_FLAGS;`
ReadOnly: After performing a recovery, clear out all flags related to readonly delegations and revoke (This used to be ctdb commit 9985a97e11688f3f688bb84e1180fd57c42077f4) 2011-07-20 13:08:21 +10:00
- got rid of the complex hand marshalling in the recovery controls - fixed the re-send of ctdb calls after a generation change - fixed a reqid idr leak in controls - removed the write_record test code - use the new nonblock lockall code to prevent ctdbd from ever doing a blocking lock that could deadlock with smbd - moved more of the recovery controls into ctdb_recover.c (This used to be ctdb commit 565a21aa4f1e842309986ab97d6244801153deec) 2007-05-10 17:43:45 +10:00			`data.dptr += sizeof(*hdr);`
			`data.dsize -= sizeof(*hdr);`

new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 12:38:01 +11:00			`ret = ctdb_ltdb_store(ctdb_db, key, hdr, data);`
- got rid of the complex hand marshalling in the recovery controls - fixed the re-send of ctdb calls after a generation change - fixed a reqid idr leak in controls - removed the write_record test code - use the new nonblock lockall code to prevent ctdbd from ever doing a blocking lock that could deadlock with smbd - moved more of the recovery controls into ctdb_recover.c (This used to be ctdb commit 565a21aa4f1e842309986ab97d6244801153deec) 2007-05-10 17:43:45 +10:00			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 20:07:15 +11:00			`DEBUG(DEBUG_CRIT, (__location__ " Unable to store record\n"));`
more robust freeze/thaw logic (This used to be ctdb commit 51c1e51aeb7dfac1683584df7ef1bef98c092f76) 2007-05-12 15:29:06 +10:00			`goto failed;`
- got rid of the complex hand marshalling in the recovery controls - fixed the re-send of ctdb calls after a generation change - fixed a reqid idr leak in controls - removed the write_record test code - use the new nonblock lockall code to prevent ctdbd from ever doing a blocking lock that could deadlock with smbd - moved more of the recovery controls into ctdb_recover.c (This used to be ctdb commit 565a21aa4f1e842309986ab97d6244801153deec) 2007-05-10 17:43:45 +10:00			`}`
this fixes the non-dmaster bug that has plagued us for months (This used to be ctdb commit 2acf6c6201862debfca054a09262f75c066d2deb) 2008-01-05 09:34:47 +11:00
ctdb-daemon: Rename struct ctdb_rec_data to ctdb_rec_data_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 17:30:30 +11:00			`rec = (struct ctdb_rec_data_old *)(rec->length + (uint8_t *)rec);`
- got rid of the complex hand marshalling in the recovery controls - fixed the re-send of ctdb calls after a generation change - fixed a reqid idr leak in controls - removed the write_record test code - use the new nonblock lockall code to prevent ctdbd from ever doing a blocking lock that could deadlock with smbd - moved more of the recovery controls into ctdb_recover.c (This used to be ctdb commit 565a21aa4f1e842309986ab97d6244801153deec) 2007-05-10 17:43:45 +10:00			`}`

added debug constants to allow for better mapping to syslog levels (This used to be ctdb commit 7ba8f1dde318eab03f4257e5a89fd23e7281e502) 2008-02-04 17:44:24 +11:00			`DEBUG(DEBUG_DEBUG,("finished push of %u records for dbid 0x%x\n",`
- merged ctdb_store test from ronnie - added DatabaseHashSize tunable - added logging of events inside recovery (for timing) (This used to be ctdb commit 3593cdb928b91e217faf1b3c537fa28dc82cdace) 2007-06-17 23:31:44 +10:00			`reply->count, reply->db_id));`

ReadOnly: After recovering all databases, make sure to clear out the tracking database used to track delegations and revoke. This is because the recovery will implicitely result in a revoke of all delegations. (This used to be ctdb commit b5520933b9922d6af6f59f535824e1cdacb9f774) 2011-07-20 13:20:32 +10:00			`if (ctdb_db->readonly) {`
			`DEBUG(DEBUG_CRIT,("Clearing the tracking database for dbid 0x%x\n",`
			`ctdb_db->db_id));`
			`if (tdb_wipe_all(ctdb_db->rottdb) != 0) {`
			`DEBUG(DEBUG_ERR,("Failed to wipe tracking database for 0x%x. Dropping read-only delegation support\n", ctdb_db->db_id));`
			`ctdb_db->readonly = false;`
			`tdb_close(ctdb_db->rottdb);`
			`ctdb_db->rottdb = NULL;`
ReadOnly: Check the readonly flag instead of whether the tdb pointer is NULL or not (This used to be ctdb commit 01314c2cb3a480917d6a632b83c39f0a48bba0e7) 2011-08-23 10:41:52 +10:00			`ctdb_db->readonly = false;`
ReadOnly: After recovering all databases, make sure to clear out the tracking database used to track delegations and revoke. This is because the recovery will implicitely result in a revoke of all delegations. (This used to be ctdb commit b5520933b9922d6af6f59f535824e1cdacb9f774) 2011-07-20 13:20:32 +10:00			`}`
ReadOnly: Once recovery has finished, make sure to free all revoke child processes and trigger the destructors for all deferred calls to re-queue the original packets to the input packet processing function (This used to be ctdb commit 530a78aa05910beeca0867c4dbe226d4ce73f946) 2011-07-20 14:25:29 +10:00			`while (ctdb_db->revokechild_active != NULL) {`
			`talloc_free(ctdb_db->revokechild_active);`
			`}`
ReadOnly: After recovering all databases, make sure to clear out the tracking database used to track delegations and revoke. This is because the recovery will implicitely result in a revoke of all delegations. (This used to be ctdb commit b5520933b9922d6af6f59f535824e1cdacb9f774) 2011-07-20 13:20:32 +10:00			`}`

ctdb-daemon: Use database specific mark/unmark routines Instead of marking all the databases with priority, mark only the database which is currently being processed. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-09-14 14:53:45 +10:00			`ctdb_lockdb_unmark(ctdb_db);`
- got rid of the complex hand marshalling in the recovery controls - fixed the re-send of ctdb calls after a generation change - fixed a reqid idr leak in controls - removed the write_record test code - use the new nonblock lockall code to prevent ctdbd from ever doing a blocking lock that could deadlock with smbd - moved more of the recovery controls into ctdb_recover.c (This used to be ctdb commit 565a21aa4f1e842309986ab97d6244801153deec) 2007-05-10 17:43:45 +10:00			`return 0;`
more robust freeze/thaw logic (This used to be ctdb commit 51c1e51aeb7dfac1683584df7ef1bef98c092f76) 2007-05-12 15:29:06 +10:00
			`failed:`
ctdb-daemon: Use database specific mark/unmark routines Instead of marking all the databases with priority, mark only the database which is currently being processed. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-09-14 14:53:45 +10:00			`ctdb_lockdb_unmark(ctdb_db);`
more robust freeze/thaw logic (This used to be ctdb commit 51c1e51aeb7dfac1683584df7ef1bef98c092f76) 2007-05-12 15:29:06 +10:00			`return -1;`
- got rid of the complex hand marshalling in the recovery controls - fixed the re-send of ctdb calls after a generation change - fixed a reqid idr leak in controls - removed the write_record test code - use the new nonblock lockall code to prevent ctdbd from ever doing a blocking lock that could deadlock with smbd - moved more of the recovery controls into ctdb_recover.c (This used to be ctdb commit 565a21aa4f1e842309986ab97d6244801153deec) 2007-05-10 17:43:45 +10:00			`}`
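
The push loop above steps through a packed buffer by adding each record's `length` to the current record pointer. The following is a minimal sketch of that traversal with explicit bounds checks. It assumes a simplified, hypothetical header (`struct demo_rec_hdr`) rather than the real `struct ctdb_rec_data_old`, and the helpers `demo_pack()` and `demo_walk()` are illustrative names that do not exist in ctdb.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified record header; NOT the real ctdb_rec_data_old. */
struct demo_rec_hdr {
	uint32_t length;   /* total bytes in this record, header included */
	uint32_t keylen;   /* bytes of key following the header */
	uint32_t datalen;  /* bytes of data following the key */
};

/* Append one record at byte offset off; returns the offset just past it.
 * The caller must provide a buffer large enough for the record. */
static size_t demo_pack(uint8_t *buf, size_t off,
			const void *key, uint32_t keylen,
			const void *data, uint32_t datalen)
{
	struct demo_rec_hdr h;

	h.keylen = keylen;
	h.datalen = datalen;
	h.length = (uint32_t)sizeof(h) + keylen + datalen;

	memcpy(buf + off, &h, sizeof(h));
	memcpy(buf + off + sizeof(h), key, keylen);
	memcpy(buf + off + sizeof(h) + keylen, data, datalen);
	return off + h.length;
}

/* Walk count records. Returns 0 and stores the summed data bytes in
 * *payload_bytes, or -1 if a record would run past the buffer. */
static int demo_walk(const uint8_t *buf, size_t buflen, uint32_t count,
		     size_t *payload_bytes)
{
	size_t off = 0;
	uint32_t i;

	assert(payload_bytes != NULL);
	*payload_bytes = 0;

	for (i = 0; i < count; i++) {
		struct demo_rec_hdr h;

		if (buflen - off < sizeof(h)) {
			return -1;
		}
		/* Copy the header out to avoid unaligned access. */
		memcpy(&h, buf + off, sizeof(h));
		if (h.length < sizeof(h) ||
		    h.length > buflen - off ||
		    h.length - sizeof(h) != (size_t)h.keylen + h.datalen) {
			return -1;
		}
		*payload_bytes += h.datalen;
		/* Same step as advancing rec by rec->length bytes. */
		off += h.length;
	}
	return 0;
}
```

Unlike the loop in the source, this sketch validates every `length` against the remaining buffer before advancing. Whether the daemon needs such checks depends on how far it trusts the sender, which the source does not state.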

ctdb-daemon: Implement new controls DB_PULL and DB_PUSH_START/DB_PUSH_CONFIRM Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2016-02-19 17:32:09 +11:00			`struct db_push_state {`
			`struct ctdb_context *ctdb;`
			`struct ctdb_db_context *ctdb_db;`
			`uint64_t srvid;`
			`uint32_t num_records;`
			`bool failed;`
			`};`

			`static void db_push_msg_handler(uint64_t srvid, TDB_DATA indata,`
			`void *private_data)`
			`{`
			`struct db_push_state *state = talloc_get_type(`
			`private_data, struct db_push_state);`
			`struct ctdb_marshall_buffer *recs;`
			`struct ctdb_rec_data_old *rec;`
			`int i, ret;`

			`if (state->failed) {`
			`return;`
			`}`

			`recs = (struct ctdb_marshall_buffer *)indata.dptr;`
			`rec = (struct ctdb_rec_data_old *)&recs->data[0];`

			`DEBUG(DEBUG_INFO, ("starting push of %u records for dbid 0x%x\n",`
			`recs->count, recs->db_id));`

			`for (i=0; i<recs->count; i++) {`
			`TDB_DATA key, data;`
			`struct ctdb_ltdb_header *hdr;`

			`key.dptr = &rec->data[0];`
			`key.dsize = rec->keylen;`
			`data.dptr = &rec->data[key.dsize];`
			`data.dsize = rec->datalen;`

			`if (data.dsize < sizeof(struct ctdb_ltdb_header)) {`
			`DEBUG(DEBUG_CRIT,(__location__ " bad ltdb record\n"));`
			`goto failed;`
			`}`

			`hdr = (struct ctdb_ltdb_header *)data.dptr;`
			`/* Strip off any read only record flags.`
			`* All readonly records are revoked implicitly by a recovery.`
			`*/`
			`hdr->flags &= ~CTDB_REC_RO_FLAGS;`

			`data.dptr += sizeof(*hdr);`
			`data.dsize -= sizeof(*hdr);`

			`ret = ctdb_ltdb_store(state->ctdb_db, key, hdr, data);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR,`
			`(__location__ " Unable to store record\n"));`
			`goto failed;`
			`}`

			`rec = (struct ctdb_rec_data_old *)(rec->length + (uint8_t *)rec);`
			`}`

			`DEBUG(DEBUG_DEBUG, ("finished push of %u records for dbid 0x%x\n",`
			`recs->count, recs->db_id));`

			`state->num_records += recs->count;`
			`return;`

			`failed:`
			`state->failed = true;`
			`}`

			`int32_t ctdb_control_db_push_start(struct ctdb_context *ctdb, TDB_DATA indata)`
			`{`
			`struct ctdb_pulldb_ext *pulldb_ext;`
			`struct ctdb_db_context *ctdb_db;`
			`struct db_push_state *state;`
			`int ret;`

			`pulldb_ext = (struct ctdb_pulldb_ext *)indata.dptr;`

			`ctdb_db = find_ctdb_db(ctdb, pulldb_ext->db_id);`
			`if (ctdb_db == NULL) {`
			`DEBUG(DEBUG_ERR,`
			`(__location__ " Unknown db 0x%08x\n", pulldb_ext->db_id));`
			`return -1;`
			`}`

			`if (!ctdb_db_frozen(ctdb_db)) {`
			`DEBUG(DEBUG_ERR,`
			`("rejecting ctdb_control_db_push_start when not frozen\n"));`
			`return -1;`
			`}`

			`if (ctdb_db->push_started) {`
			`DEBUG(DEBUG_WARNING,`
			`(__location__ " DB push already started for %s\n",`
			`ctdb_db->db_name));`

			`/* De-register old state */`
			`state = (struct db_push_state *)ctdb_db->push_state;`
			`if (state != NULL) {`
			`srvid_deregister(ctdb->srv, state->srvid, state);`
			`talloc_free(state);`
			`ctdb_db->push_state = NULL;`
			`}`
			`}`

			`state = talloc_zero(ctdb_db, struct db_push_state);`
			`if (state == NULL) {`
			`DEBUG(DEBUG_ERR, (__location__ " Memory allocation error\n"));`
			`return -1;`
			`}`

			`state->ctdb = ctdb;`
			`state->ctdb_db = ctdb_db;`
			`state->srvid = pulldb_ext->srvid;`
			`state->failed = false;`

			`ret = srvid_register(ctdb->srv, state, state->srvid,`
			`db_push_msg_handler, state);`
			`if (ret != 0) {`
			`DEBUG(DEBUG_ERR,`
			`(__location__ " Failed to register srvid for db push\n"));`
			`talloc_free(state);`
			`return -1;`
			`}`

			`if (ctdb_lockdb_mark(ctdb_db) != 0) {`
			`DEBUG(DEBUG_ERR,`
			`(__location__ " Failed to get lock on entire db - failing\n"));`
			`srvid_deregister(ctdb->srv, state->srvid, state);`
			`talloc_free(state);`
			`return -1;`
			`}`

			`ctdb_db->push_started = true;`
			`ctdb_db->push_state = state;`

			`return 0;`
			`}`

			`int32_t ctdb_control_db_push_confirm(struct ctdb_context *ctdb,`
			`TDB_DATA indata, TDB_DATA *outdata)`
			`{`
			`uint32_t db_id;`
			`struct ctdb_db_context *ctdb_db;`
			`struct db_push_state *state;`

			`db_id = *(uint32_t *)indata.dptr;`

			`ctdb_db = find_ctdb_db(ctdb, db_id);`
			`if (ctdb_db == NULL) {`
			`DEBUG(DEBUG_ERR,(__location__ " Unknown db 0x%08x\n", db_id));`
			`return -1;`
			`}`

			`if (!ctdb_db_frozen(ctdb_db)) {`
			`DEBUG(DEBUG_ERR,`
			`("rejecting ctdb_control_db_push_confirm when not frozen\n"));`
			`return -1;`
			`}`

			`if (!ctdb_db->push_started) {`
			`DEBUG(DEBUG_ERR, (__location__ " DB push not started\n"));`
			`return -1;`
			`}`

			`if (ctdb_db->readonly) {`
			`DEBUG(DEBUG_ERR,`
			`("Clearing the tracking database for dbid 0x%x\n",`
			`ctdb_db->db_id));`
			`if (tdb_wipe_all(ctdb_db->rottdb) != 0) {`
			`DEBUG(DEBUG_ERR,`
			`("Failed to wipe tracking database for 0x%x."`
			`" Dropping read-only delegation support\n",`
			`ctdb_db->db_id));`
			`ctdb_db->readonly = false;`
			`tdb_close(ctdb_db->rottdb);`
			`ctdb_db->rottdb = NULL;`
			`ctdb_db->readonly = false;`
			`}`

			`while (ctdb_db->revokechild_active != NULL) {`
			`talloc_free(ctdb_db->revokechild_active);`
			`}`
			`}`

			`ctdb_lockdb_unmark(ctdb_db);`

			`state = (struct db_push_state *)ctdb_db->push_state;`
			`if (state == NULL) {`
			`DEBUG(DEBUG_ERR, (__location__ " Missing push db state\n"));`
			`return -1;`
			`}`

			`srvid_deregister(ctdb->srv, state->srvid, state);`

			`outdata->dptr = talloc_size(outdata, sizeof(uint32_t));`
			`if (outdata->dptr == NULL) {`
			`DEBUG(DEBUG_ERR, (__location__ " Memory allocation error\n"));`
			`talloc_free(state);`
			`ctdb_db->push_state = NULL;`
			`return -1;`
			`}`

			`memcpy(outdata->dptr, (uint8_t *)&state->num_records, sizeof(uint32_t));`
			`outdata->dsize = sizeof(uint32_t);`

			`talloc_free(state);`
			`ctdb_db->push_state = NULL;`

			`return 0;`
			`}`

- make calling of recovered event script async - shutdown sockets before calling shutdown script (This used to be ctdb commit c5e099feef94a014a77742b6cc1d0afe78ef9da9) 2007-06-02 08:41:19 +10:00			`struct ctdb_set_recmode_state {`
add back the test inside the daemon that if someone asks us to drop recovery mode back to NORMAL that we can not lock the reclock file since at this stage it MUST be locked by the recovery daemon. in order to avoid a non-blocking fnctl() lock from blocking and cause "issues" we move the 'test that we can not lock reclock file' into a child process. (This used to be ctdb commit 3af994641ec2234e37da1fa1f693441586471a7e) 2007-10-16 15:27:07 +10:00			`struct ctdb_context *ctdb;`
ctdb-daemon: Rename struct ctdb_req_control to ctdb_req_control_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 16:42:05 +11:00			`struct ctdb_req_control_old *c;`
add back the test inside the daemon that if someone asks us to drop recovery mode back to NORMAL that we can not lock the reclock file since at this stage it MUST be locked by the recovery daemon. in order to avoid a non-blocking fnctl() lock from blocking and cause "issues" we move the 'test that we can not lock reclock file' into a child process. (This used to be ctdb commit 3af994641ec2234e37da1fa1f693441586471a7e) 2007-10-16 15:27:07 +10:00			`int fd[2];`
ctdb-daemon: Stop using tevent compatibility definitions Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-26 16:50:09 +11:00			`struct tevent_timer *te;`
			`struct tevent_fd *fde;`
add back the test inside the daemon that if someone asks us to drop recovery mode back to NORMAL that we can not lock the reclock file since at this stage it MUST be locked by the recovery daemon. in order to avoid a non-blocking fnctl() lock from blocking and cause "issues" we move the 'test that we can not lock reclock file' into a child process. (This used to be ctdb commit 3af994641ec2234e37da1fa1f693441586471a7e) 2007-10-16 15:27:07 +10:00			`pid_t child;`
Track how long it takes to take out the recovery lock from both the main dameon and also from the recovery daemon. Log this in "ctdb statistics". Also add a varaible "RecLockLatencyMs" that will log an error everytime it takes longer than this to access the reclock file. (This used to be ctdb commit 042377ed803bb8f7ca9d6ea1a387427b7b8ba45a) 2009-05-14 10:33:25 +10:00			`struct timeval start_time;`
- make calling of recovered event script async - shutdown sockets before calling shutdown script (This used to be ctdb commit c5e099feef94a014a77742b6cc1d0afe78ef9da9) 2007-06-02 08:41:19 +10:00			`};`

add back the test inside the daemon that if someone asks us to drop recovery mode back to NORMAL that we can not lock the reclock file since at this stage it MUST be locked by the recovery daemon. in order to avoid a non-blocking fnctl() lock from blocking and cause "issues" we move the 'test that we can not lock reclock file' into a child process. (This used to be ctdb commit 3af994641ec2234e37da1fa1f693441586471a7e) 2007-10-16 15:27:07 +10:00			`/*`
			`called if our set_recmode child times out. this would happen if`
			`ctdb_recovery_lock() would block.`
			`*/`
ctdb-daemon: Stop using tevent compatibility definitions Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-26 16:50:09 +11:00			`static void ctdb_set_recmode_timeout(struct tevent_context *ev,`
			`struct tevent_timer *te,`
			`struct timeval t, void *private_data)`
add back the test inside the daemon that if someone asks us to drop recovery mode back to NORMAL that we can not lock the reclock file since at this stage it MUST be locked by the recovery daemon. in order to avoid a non-blocking fnctl() lock from blocking and cause "issues" we move the 'test that we can not lock reclock file' into a child process. (This used to be ctdb commit 3af994641ec2234e37da1fa1f693441586471a7e) 2007-10-16 15:27:07 +10:00			`{`
			`struct ctdb_set_recmode_state *state = talloc_get_type(private_data,`
			`struct ctdb_set_recmode_state);`

fixed problem with looping ctdb recoveries After a node failure, GPFS can get into a state where non-blocking fcntl() locks can take a long time. This means to the ctdb set_recmode test timing out, which leads to a recovery failure, and a new recovery. The recovery loop can last a long time. The fix is to consider a fcntl timeout as a success of this test. The test is to see that we can't lock the shared reclock file, so a timeout is fine for a success. (This used to be ctdb commit 6579a6a2a7161214adedf0f67dce62f4a4ad1afe) 2008-11-21 08:05:59 +11:00			`/* we consider this a success, not a failure, as we failed to`
			`set the recovery lock which is what we wanted. This can be`
			`caused by the cluster filesystem being very slow to`
			`arbitrate locks immediately after a node failure.`
			`*/`
Add a new variable VerifyRecoveryLock which can be used to disable the test that the recovery daemon holds the lock properly when performing a recovery (This used to be ctdb commit 329df9e47e6ca8ab5143985a999e68f37c6d88a5) 2009-05-01 01:18:27 +10:00			`DEBUG(DEBUG_ERR,(__location__ " set_recmode child process hung/timedout CFS slow to grant locks? (allowing recmode set anyway)\n"));`
ctdb-recovery: Don't store recmode in recovery mode state The callbacks that use this value are only ever called if recovery mode is being set to NORMAL. So do not check if recmode is NORMAL either. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2016-01-11 13:58:54 +11:00			`state->ctdb->recovery_mode = CTDB_RECOVERY_NORMAL;`
fixed problem with looping ctdb recoveries After a node failure, GPFS can get into a state where non-blocking fcntl() locks can take a long time. This means to the ctdb set_recmode test timing out, which leads to a recovery failure, and a new recovery. The recovery loop can last a long time. The fix is to consider a fcntl timeout as a success of this test. The test is to see that we can't lock the shared reclock file, so a timeout is fine for a success. (This used to be ctdb commit 6579a6a2a7161214adedf0f67dce62f4a4ad1afe) 2008-11-21 08:05:59 +11:00			`ctdb_request_control_reply(state->ctdb, state->c, NULL, 0, NULL);`
add back the test inside the daemon that if someone asks us to drop recovery mode back to NORMAL that we can not lock the reclock file since at this stage it MUST be locked by the recovery daemon. in order to avoid a non-blocking fnctl() lock from blocking and cause "issues" we move the 'test that we can not lock reclock file' into a child process. (This used to be ctdb commit 3af994641ec2234e37da1fa1f693441586471a7e) 2007-10-16 15:27:07 +10:00			`talloc_free(state);`
			`}`


			`/* when we free the recmode state we must kill any child process.`
			`*/`
			`static int set_recmode_destructor(struct ctdb_set_recmode_state *state)`
			`{`
dont leak file descriptors when set recmdoe timesout (This used to be ctdb commit fc8a364eb095ec11ca01246a583bf1dc53510141) 2009-06-19 14:58:06 +10:00			`if (state->fd[0] != -1) {`
			`state->fd[0] = -1;`
			`}`
Track all child process so we never send a signal to an unrelated process (our child died and kernel wrapped the pid-space and reused the pid for a different process Wrap all creation of child processes inside ctdb_fork() which is used to track all processes we have spawned. Capture SIGCHLD to track also which child processes have terminated. Wrap kill() inside ctdb_kill() and make sure that we never send a !0 signal to a child process pid that has already terminated (and might have been replaced with a (This used to be ctdb commit f73a4b1495830bcdd094a93732a89dd53b3c2f78) 2012-05-03 11:42:41 +10:00			`ctdb_kill(state->ctdb, state->child, SIGKILL);`
add back the test inside the daemon that if someone asks us to drop recovery mode back to NORMAL that we can not lock the reclock file since at this stage it MUST be locked by the recovery daemon. in order to avoid a non-blocking fnctl() lock from blocking and cause "issues" we move the 'test that we can not lock reclock file' into a child process. (This used to be ctdb commit 3af994641ec2234e37da1fa1f693441586471a7e) 2007-10-16 15:27:07 +10:00			`return 0;`
			`}`

			`/* this is called when the child process has completed ctdb_recovery_lock()`
			`and has written data back to us through the pipe.`
			`*/`
ctdb-daemon: Stop using tevent compatibility definitions Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-26 16:50:09 +11:00			`static void set_recmode_handler(struct tevent_context *ev,`
			`struct tevent_fd *fde,`
			`uint16_t flags, void *private_data)`
add back the test inside the daemon that if someone asks us to drop recovery mode back to NORMAL that we can not lock the reclock file since at this stage it MUST be locked by the recovery daemon. in order to avoid a non-blocking fnctl() lock from blocking and cause "issues" we move the 'test that we can not lock reclock file' into a child process. (This used to be ctdb commit 3af994641ec2234e37da1fa1f693441586471a7e) 2007-10-16 15:27:07 +10:00			`{`
			`struct ctdb_set_recmode_state *state= talloc_get_type(private_data,`
			`struct ctdb_set_recmode_state);`
merge from ronnie (This used to be ctdb commit 75d4b386293e186a6bb8532515585ab72670d663) 2007-10-18 15:44:02 +10:00			`char c = 0;`
add back the test inside the daemon that if someone asks us to drop recovery mode back to NORMAL that we can not lock the reclock file since at this stage it MUST be locked by the recovery daemon. in order to avoid a non-blocking fnctl() lock from blocking and cause "issues" we move the 'test that we can not lock reclock file' into a child process. (This used to be ctdb commit 3af994641ec2234e37da1fa1f693441586471a7e) 2007-10-16 15:27:07 +10:00			`int ret;`
ctdb-recovery: Clean up status handling from recmode child This currently returns an incorrect error when the expected number of bytes are not read. Separate out the different cases to clarify the logic and avoid reporting the wrong error. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2016-01-28 14:59:18 +11:00			`int status = 0;`
			`const char *err = NULL;`
add back the test inside the daemon that if someone asks us to drop recovery mode back to NORMAL that we can not lock the reclock file since at this stage it MUST be locked by the recovery daemon. in order to avoid a non-blocking fnctl() lock from blocking and cause "issues" we move the 'test that we can not lock reclock file' into a child process. (This used to be ctdb commit 3af994641ec2234e37da1fa1f693441586471a7e) 2007-10-16 15:27:07 +10:00
			`/* we got a response from our child process so we can abort the`
			`timeout.`
			`*/`
			`talloc_free(state->te);`
			`state->te = NULL;`

ctdb: Use sys_read() and sys_write() to ensure correct signal interaction ... and avoid compiler warnings in some cases. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2014-07-30 21:03:53 +10:00			`ret = sys_read(state->fd[0], &c, 1);`
ctdb-recovery: Clean up status handling from recmode child This currently returns an incorrect error when the expected number of bytes are not read. Separate out the different cases to clarify the logic and avoid reporting the wrong error. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2016-01-28 14:59:18 +11:00			`if (ret == 1) {`
ctdb-recovery: Negate the status when checking the recovery lock Have 0 indicate that the lock was taken. This allows non-zero values to be used to indicate why the lock could not be taken. EACCES means lock contention. For now use just EACCES to cover all failures, since ctdb_recovery_lock() returns a bool and details of other errors will be lost. ctdb_recovery_lock() will undergo some big changes, so don't try to fix this now. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2016-01-28 15:07:30 +11:00			`/* Child wrote status. EACCES indicates that it was unable`
ctdb-recovery: Clean up status handling from recmode child This currently returns an incorrect error when the expected number of bytes are not read. Separate out the different cases to clarify the logic and avoid reporting the wrong error. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2016-01-28 14:59:18 +11:00			`* to take the lock, which is the expected outcome.`
ctdb-recovery: Negate the status when checking the recovery lock Have 0 indicate that the lock was taken. This allows non-zero values to be used to indicate why the lock could not be taken. EACCES means lock contention. For now use just EACCES to cover all failures, since ctdb_recovery_lock() returns a bool and details of other errors will be lost. ctdb_recovery_lock() will undergo some big changes, so don't try to fix this now. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2016-01-28 15:07:30 +11:00			`* 0 indicates that it was able to take the`
ctdb-recovery: Clean up status handling from recmode child This currently returns an incorrect error when the expected number of bytes are not read. Separate out the different cases to clarify the logic and avoid reporting the wrong error. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2016-01-28 14:59:18 +11:00			`* lock, which is an error because the recovery daemon`
			`* should be holding the lock. */`
ctdb-recovery: Limit scope of reclock latency statistics It does not make sense to update this statistic for the timeout case, since this could skew the statistic. To keep it simple, just update it for the usual case where there is lock contention, since this is the usual case. So the daemon statistic measures time to test the lock and the corresponding recovery daemon statistic measures time to take the lock. Additionally, the recovery daemon will eventually use this code to take the lock, and the method of updating the latency statistic will need to be pushed further out to a configurable handler that depends on the calling context. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> Autobuild-User(master): Amitay Isaacs <amitay@samba.org> Autobuild-Date(master): Tue Feb 23 10:32:06 CET 2016 on sn-devel-144 2016-02-01 11:46:05 +11:00			`double l = timeval_elapsed(&state->start_time);`

ctdb-recovery: Negate the status when checking the recovery lock Have 0 indicate that the lock was taken. This allows non-zero values to be used to indicate why the lock could not be taken. EACCES means lock contention. For now use just EACCES to cover all failures, since ctdb_recovery_lock() returns a bool and details of other errors will be lost. ctdb_recovery_lock() will undergo some big changes, so don't try to fix this now. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2016-01-28 15:07:30 +11:00			`if (c == EACCES) {`
ctdb-recovery: Clean up status handling from recmode child This currently returns an incorrect error when the expected number of bytes are not read. Separate out the different cases to clarify the logic and avoid reporting the wrong error. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2016-01-28 14:59:18 +11:00			`status = 0;`
			`err = NULL;`

			`state->ctdb->recovery_mode = CTDB_RECOVERY_NORMAL;`

			`/* release any deferred attach calls from clients */`
			`ctdb_process_deferred_attach(state->ctdb);`
ctdb-recovery: Limit scope of reclock latency statistics It does not make sense to update this statistic for the timeout case, since this could skew the statistic. To keep it simple, just update it for the usual case where there is lock contention, since this is the usual case. So the daemon statistic measures time to test the lock and the corresponding recovery daemon statistic measures time to take the lock. Additionally, the recovery daemon will eventually use this code to take the lock, and the method of updating the latency statistic will need to be pushed further out to a configurable handler that depends on the calling context. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> Autobuild-User(master): Amitay Isaacs <amitay@samba.org> Autobuild-Date(master): Tue Feb 23 10:32:06 CET 2016 on sn-devel-144 2016-02-01 11:46:05 +11:00
			`CTDB_UPDATE_RECLOCK_LATENCY(state->ctdb, "daemon reclock", reclock.ctdbd, l);`
ctdb-recovery: Clean up status handling from recmode child This currently returns an incorrect error when the expected number of bytes are not read. Separate out the different cases to clarify the logic and avoid reporting the wrong error. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2016-01-28 14:59:18 +11:00			`} else {`
			`status = -1;`
			`err = "Took recovery lock from daemon during recovery - probably a cluster filesystem lock coherence problem";`
			`}`
			`} else {`
			`/* Child did not write status. Unexpected error.`
			`* Child may have received a signal. */`
			`status = -1;`
			`err = "Unexpected error when testing recovery lock";`
add back the test inside the daemon that if someone asks us to drop recovery mode back to NORMAL that we can not lock the reclock file since at this stage it MUST be locked by the recovery daemon. in order to avoid a non-blocking fnctl() lock from blocking and cause "issues" we move the 'test that we can not lock reclock file' into a child process. (This used to be ctdb commit 3af994641ec2234e37da1fa1f693441586471a7e) 2007-10-16 15:27:07 +10:00			`}`

ctdb-recovery: Clean up status handling from recmode child This currently returns an incorrect error when the expected number of bytes are not read. Separate out the different cases to clarify the logic and avoid reporting the wrong error. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2016-01-28 14:59:18 +11:00			`ctdb_request_control_reply(state->ctdb, state->c, NULL, status, err);`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 13:59:28 +11:00			`talloc_free(state);`
add back the test inside the daemon that if someone asks us to drop recovery mode back to NORMAL that we can not lock the reclock file since at this stage it MUST be locked by the recovery daemon. in order to avoid a non-blocking fnctl() lock from blocking and cause "issues" we move the 'test that we can not lock reclock file' into a child process. (This used to be ctdb commit 3af994641ec2234e37da1fa1f693441586471a7e) 2007-10-16 15:27:07 +10:00			`}`

add a context and a timed event so that once we have been in recovery mode for too long we drop all public ip addresses (This used to be ctdb commit 403c68f96e1380dd07217c688de2730464f77ea0) 2008-10-22 11:04:41 +11:00			`static void`
ctdb-daemon: Stop using tevent compatibility definitions Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-26 16:50:09 +11:00			`ctdb_drop_all_ips_event(struct tevent_context *ev, struct tevent_timer *te,`
			`struct timeval t, void *private_data)`
add a context and a timed event so that once we have been in recovery mode for too long we drop all public ip addresses (This used to be ctdb commit 403c68f96e1380dd07217c688de2730464f77ea0) 2008-10-22 11:04:41 +11:00			`{`
			`struct ctdb_context *ctdb = talloc_get_type(private_data, struct ctdb_context);`

increase the loglevel for the message we print when we automatically release all ips when we have been in recovery for too long (This used to be ctdb commit 7af060ded5113a49832f6a08a942523a202586b3) 2009-04-24 18:09:51 +10:00			`DEBUG(DEBUG_ERR,(__location__ " Been in recovery mode for too long. Dropping all IPS\n"));`
add a context and a timed event so that once we have been in recovery mode for too long we drop all public ip addresses (This used to be ctdb commit 403c68f96e1380dd07217c688de2730464f77ea0) 2008-10-22 11:04:41 +11:00			`talloc_free(ctdb->release_ips_ctx);`
			`ctdb->release_ips_ctx = NULL;`

			`ctdb_release_all_ips(ctdb);`
			`}`

Add a new tunable : DisableIPFailover that when set to non 0 will stopp any ip reallocations at all from happening. (This used to be ctdb commit d8d37493478a26c5f1809a5f3df89ffd6e149281) 2010-11-09 15:19:06 +11:00			`/*`
			`* Set up an event to drop all public ips if we remain in recovery for too`
			`* long`
			`*/`
			`int ctdb_deferred_drop_all_ips(struct ctdb_context *ctdb)`
			`{`
			`if (ctdb->release_ips_ctx != NULL) {`
			`talloc_free(ctdb->release_ips_ctx);`
			`}`
			`ctdb->release_ips_ctx = talloc_new(ctdb);`
			`CTDB_NO_MEMORY(ctdb, ctdb->release_ips_ctx);`

ctdb-daemon: Stop using tevent compatibility definitions Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-26 16:50:09 +11:00			`tevent_add_timer(ctdb->ev, ctdb->release_ips_ctx,`
			`timeval_current_ofs(ctdb->tunable.recovery_drop_all_ips, 0),`
			`ctdb_drop_all_ips_event, ctdb);`
Add a new tunable : DisableIPFailover that when set to non 0 will stopp any ip reallocations at all from happening. (This used to be ctdb commit d8d37493478a26c5f1809a5f3df89ffd6e149281) 2010-11-09 15:19:06 +11:00			`return 0;`
			`}`

added lockwait child code for entering recovery mode. A child processes holds lockall locks for the entire recovery process (This used to be ctdb commit f892f30def75b0d964c35eae38c4cf675597dd28) 2007-05-12 14:34:21 +10:00			`/*`
			`set the recovery mode`
			`*/`
- make calling of recovered event script async - shutdown sockets before calling shutdown script (This used to be ctdb commit c5e099feef94a014a77742b6cc1d0afe78ef9da9) 2007-06-02 08:41:19 +10:00			`int32_t ctdb_control_set_recmode(struct ctdb_context *ctdb,`
ctdb-daemon: Rename struct ctdb_req_control to ctdb_req_control_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 16:42:05 +11:00			`struct ctdb_req_control_old *c,`
- make calling of recovered event script async - shutdown sockets before calling shutdown script (This used to be ctdb commit c5e099feef94a014a77742b6cc1d0afe78ef9da9) 2007-06-02 08:41:19 +10:00			`TDB_DATA indata, bool *async_reply,`
added error messages in ctdb_control replies (This used to be ctdb commit bd848f5b760e6b2a73ebfc67fd8adb3c31479fb5) 2007-05-12 21:25:26 +10:00			`const char **errormsg)`
added lockwait child code for entering recovery mode. A child processes holds lockall locks for the entire recovery process (This used to be ctdb commit f892f30def75b0d964c35eae38c4cf675597dd28) 2007-05-12 14:34:21 +10:00			`{`
separate out the freeze/thaw handling from recovery (This used to be ctdb commit 0b0640bd8b8334961f240e0cf276ac112cd6e616) 2007-05-12 15:15:27 +10:00			`uint32_t recmode = *(uint32_t *)indata.dptr;`
initial attempt at freezing databases in priority order (This used to be ctdb commit e8d692590da1070c87a4144031e3306d190ebed2) 2009-10-12 12:08:39 +11:00			`int i, ret;`
- make specification of a recovery lock file compulsory - die if someone other than the recmaster can get the recovery lock (This used to be ctdb commit a827d0d0e430ca8ad5d521367e45097185492869) 2007-06-02 11:36:42 +10:00			`struct ctdb_set_recmode_state *state;`
add back the test inside the daemon that if someone asks us to drop recovery mode back to NORMAL that we can not lock the reclock file since at this stage it MUST be locked by the recovery daemon. in order to avoid a non-blocking fnctl() lock from blocking and cause "issues" we move the 'test that we can not lock reclock file' into a child process. (This used to be ctdb commit 3af994641ec2234e37da1fa1f693441586471a7e) 2007-10-16 15:27:07 +10:00			`pid_t parent = getpid();`
ctdb-daemon: Add a check for database generation consistency Before setting recovery mode to normal, confirm that all the databases are recovered by matching the database generation with the global generation. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-09-11 16:14:12 +10:00			`struct ctdb_db_context *ctdb_db;`
- make specification of a recovery lock file compulsory - die if someone other than the recmaster can get the recovery lock (This used to be ctdb commit a827d0d0e430ca8ad5d521367e45097185492869) 2007-06-02 11:36:42 +10:00
add a context and a timed event so that once we have been in recovery mode for too long we drop all public ip addresses (This used to be ctdb commit 403c68f96e1380dd07217c688de2730464f77ea0) 2008-10-22 11:04:41 +11:00			`/* if we enter recovery but stay in recovery for too long`
			`we will eventually drop all our ip addresses`
			`*/`
			`if (recmode == CTDB_RECOVERY_NORMAL) {`
			`talloc_free(ctdb->release_ips_ctx);`
			`ctdb->release_ips_ctx = NULL;`
			`} else {`
Add a new tunable : DisableIPFailover that when set to non 0 will stopp any ip reallocations at all from happening. (This used to be ctdb commit d8d37493478a26c5f1809a5f3df89ffd6e149281) 2010-11-09 15:19:06 +11:00			`if (ctdb_deferred_drop_all_ips(ctdb) != 0) {`
			`DEBUG(DEBUG_ERR,("Failed to set up deferred drop all ips\n"));`
			`}`
add a context and a timed event so that once we have been in recovery mode for too long we drop all public ip addresses (This used to be ctdb commit 403c68f96e1380dd07217c688de2730464f77ea0) 2008-10-22 11:04:41 +11:00			`}`

show start/stop time of recovery on all nodes (This used to be ctdb commit 9f7662279c367eb3e8a58e6f4aeca521e6f1f1d0) 2008-01-08 09:30:11 +11:00			`if (recmode != ctdb->recovery_mode) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 20:07:15 +11:00			`DEBUG(DEBUG_NOTICE,(__location__ " Recovery mode set to %s\n",`
show start/stop time of recovery on all nodes (This used to be ctdb commit 9f7662279c367eb3e8a58e6f4aeca521e6f1f1d0) 2008-01-08 09:30:11 +11:00			`recmode==CTDB_RECOVERY_NORMAL?"NORMAL":"ACTIVE"));`
			`}`

- make specification of a recovery lock file compulsory - die if someone other than the recmaster can get the recovery lock (This used to be ctdb commit a827d0d0e430ca8ad5d521367e45097185492869) 2007-06-02 11:36:42 +10:00			`if (recmode != CTDB_RECOVERY_NORMAL ||`
			`ctdb->recovery_mode != CTDB_RECOVERY_ACTIVE) {`
			`ctdb->recovery_mode = recmode;`
			`return 0;`
- make calling of recovered event script async - shutdown sockets before calling shutdown script (This used to be ctdb commit c5e099feef94a014a77742b6cc1d0afe78ef9da9) 2007-06-02 08:41:19 +10:00			`}`
- make specification of a recovery lock file compulsory - die if someone other than the recmaster can get the recovery lock (This used to be ctdb commit a827d0d0e430ca8ad5d521367e45097185492869) 2007-06-02 11:36:42 +10:00
ctdb-recovery: Don't store recmode in recovery mode state The callbacks that use this value are only ever called if recovery mode is being set to NORMAL. So do not check if recmode is NORMAL either. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2016-01-11 13:58:54 +11:00			`/* From this point: recmode == CTDB_RECOVERY_NORMAL`
			`*`
			`* Therefore, what follows is special handling when setting`
			`* recovery mode back to normal */`
test (This used to be ctdb commit 4f2d722cf29175c3c207e6ebb6d4f9e370767249) 2008-06-26 14:14:37 +10:00
ctdb-daemon: Add a check for database generation consistency Before setting recovery mode to normal, confirm that all the databases are recovered by matching the database generation with the global generation. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-09-11 16:14:12 +10:00			`for (ctdb_db = ctdb->db_list; ctdb_db != NULL; ctdb_db = ctdb_db->next) {`
			`if (ctdb_db->generation != ctdb->vnn_map->generation) {`
			`DEBUG(DEBUG_ERR,`
			`("Inconsistent DB generation %u for %s\n",`
			`ctdb_db->generation, ctdb_db->db_name));`
			`DEBUG(DEBUG_ERR, ("Recovery mode set to ACTIVE\n"));`
			`return -1;`
			`}`
			`}`

initial attempt at freezing databases in priority order (This used to be ctdb commit e8d692590da1070c87a4144031e3306d190ebed2) 2009-10-12 12:08:39 +11:00			`/* force the databases to thaw */`
			`for (i=1; i<=NUM_DB_PRIORITIES; i++) {`
ctdb-daemon: Avoid the use of ctdb->freeze_handle variable These variables are used for state information related to freezing databases. Instead use the API functions to check if the databases are frozen. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-09-15 12:22:17 +10:00			`if (ctdb_db_prio_frozen(ctdb, i)) {`
ctdb-daemon: Do not thaw databases if recovery is active This prevents ctdb tool from thawing databases prematurely in thaw/wipedb/restoredb commands if recovery is active. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2014-05-06 14:20:44 +10:00			`ctdb_control_thaw(ctdb, i, false);`
initial attempt at freezing databases in priority order (This used to be ctdb commit e8d692590da1070c87a4144031e3306d190ebed2) 2009-10-12 12:08:39 +11:00			`}`
test (This used to be ctdb commit 4f2d722cf29175c3c207e6ebb6d4f9e370767249) 2008-06-26 14:14:37 +10:00			`}`

Deferred attach : at early startup, defer any db attach calls until we are out of recovery. (This used to be ctdb commit eeaabd579841f60ab2c5b004cbbb1f5de2bfe685) 2011-02-23 15:46:36 +11:00			`/* release any deferred attach calls from clients */`
			`if (recmode == CTDB_RECOVERY_NORMAL) {`
			`ctdb_process_deferred_attach(ctdb);`
			`}`

ctdb-daemon: Mark tunable VerifyRecoveryLock as obsolete It is pointless having a recovery lock but not sanity checking that it is working. Also, the logic that uses this tunable is confusing. In some places the recovery lock is released unnecessarily because the tunable isn't set. Simplify the logic by assuming that if a recovery lock is specified then it should be verified. Update documentation that references this tunable. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2014-12-09 13:47:42 +11:00			`if (ctdb->recovery_lock_file == NULL) {`
			`/* Not using recovery lock file */`
Add a new variable VerifyRecoveryLock which can be used to disable the test that the recovery daemon holds the lock properly when performing a recovery (This used to be ctdb commit 329df9e47e6ca8ab5143985a999e68f37c6d88a5) 2009-05-01 01:18:27 +10:00			`ctdb->recovery_mode = recmode;`
			`return 0;`
			`}`

ctdb-daemon: Don't leak memory if not using recovery lock Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Michael Adam <obnox@samba.org> 2016-01-11 13:41:30 +11:00			`state = talloc(ctdb, struct ctdb_set_recmode_state);`
			`CTDB_NO_MEMORY(ctdb, state);`

			`state->start_time = timeval_current();`
			`state->fd[0] = -1;`
			`state->fd[1] = -1;`

add back the test inside the daemon that if someone asks us to drop recovery mode back to NORMAL that we can not lock the reclock file since at this stage it MUST be locked by the recovery daemon. in order to avoid a non-blocking fnctl() lock from blocking and cause "issues" we move the 'test that we can not lock reclock file' into a child process. (This used to be ctdb commit 3af994641ec2234e37da1fa1f693441586471a7e) 2007-10-16 15:27:07 +10:00			`/* The rest of the work is done in a child process, since`
			`the call to ctdb_recovery_lock() can block if the cluster`
			`filesystem is in the process of recovery.`
			`*/`
			`ret = pipe(state->fd);`
- make specification of a recovery lock file compulsory - die if someone other than the recmaster can get the recovery lock (This used to be ctdb commit a827d0d0e430ca8ad5d521367e45097185492869) 2007-06-02 11:36:42 +10:00			`if (ret != 0) {`
add back the test inside the daemon that if someone asks us to drop recovery mode back to NORMAL that we can not lock the reclock file since at this stage it MUST be locked by the recovery daemon. in order to avoid a non-blocking fnctl() lock from blocking and cause "issues" we move the 'test that we can not lock reclock file' into a child process. (This used to be ctdb commit 3af994641ec2234e37da1fa1f693441586471a7e) 2007-10-16 15:27:07 +10:00			`talloc_free(state);`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 20:07:15 +11:00			`DEBUG(DEBUG_CRIT,(__location__ " Failed to open pipe for set_recmode child\n"));`
- make specification of a recovery lock file compulsory - die if someone other than the recmaster can get the recovery lock (This used to be ctdb commit a827d0d0e430ca8ad5d521367e45097185492869) 2007-06-02 11:36:42 +10:00			`return -1;`
add back the test inside the daemon that if someone asks us to drop recovery mode back to NORMAL that we can not lock the reclock file since at this stage it MUST be locked by the recovery daemon. in order to avoid a non-blocking fnctl() lock from blocking and cause "issues" we move the 'test that we can not lock reclock file' into a child process. (This used to be ctdb commit 3af994641ec2234e37da1fa1f693441586471a7e) 2007-10-16 15:27:07 +10:00			`}`
- make specification of a recovery lock file compulsory - die if someone other than the recmaster can get the recovery lock (This used to be ctdb commit a827d0d0e430ca8ad5d521367e45097185492869) 2007-06-02 11:36:42 +10:00
Track all child process so we never send a signal to an unrelated process (our child died and kernel wrapped the pid-space and reused the pid for a different process Wrap all creation of child processes inside ctdb_fork() which is used to track all processes we have spawned. Capture SIGCHLD to track also which child processes have terminated. Wrap kill() inside ctdb_kill() and make sure that we never send a !0 signal to a child process pid that has already terminated (and might have been replaced with a (This used to be ctdb commit f73a4b1495830bcdd094a93732a89dd53b3c2f78) 2012-05-03 11:42:41 +10:00			`state->child = ctdb_fork(ctdb);`
add back the test inside the daemon that if someone asks us to drop recovery mode back to NORMAL that we can not lock the reclock file since at this stage it MUST be locked by the recovery daemon. in order to avoid a non-blocking fnctl() lock from blocking and cause "issues" we move the 'test that we can not lock reclock file' into a child process. (This used to be ctdb commit 3af994641ec2234e37da1fa1f693441586471a7e) 2007-10-16 15:27:07 +10:00			`if (state->child == (pid_t)-1) {`
			`close(state->fd[0]);`
			`close(state->fd[1]);`
			`talloc_free(state);`
			`return -1;`
- make specification of a recovery lock file compulsory - die if someone other than the recmaster can get the recovery lock (This used to be ctdb commit a827d0d0e430ca8ad5d521367e45097185492869) 2007-06-02 11:36:42 +10:00			`}`
added timeouts in all event scripts (This used to be ctdb commit d986c91a607ed7c7d4869ea786b5cdf80e7862f1) 2007-06-06 13:45:12 +10:00
add back the test inside the daemon that if someone asks us to drop recovery mode back to NORMAL that we can not lock the reclock file since at this stage it MUST be locked by the recovery daemon. in order to avoid a non-blocking fnctl() lock from blocking and cause "issues" we move the 'test that we can not lock reclock file' into a child process. (This used to be ctdb commit 3af994641ec2234e37da1fa1f693441586471a7e) 2007-10-16 15:27:07 +10:00			`if (state->child == 0) {`
ctdb-recovery: Negate the status when checking the recovery lock Have 0 indicate that the lock was taken. This allows non-zero values to be used to indicate why the lock could not be taken. EACCES means lock contention. For now use just EACCES to cover all failures, since ctdb_recovery_lock() returns a bool and details of other errors will be lost. ctdb_recovery_lock() will undergo some big changes, so don't try to fix this now. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2016-01-28 15:07:30 +11:00			`char cc = EACCES;`
add back the test inside the daemon that if someone asks us to drop recovery mode back to NORMAL that we can not lock the reclock file since at this stage it MUST be locked by the recovery daemon. in order to avoid a non-blocking fnctl() lock from blocking and cause "issues" we move the 'test that we can not lock reclock file' into a child process. (This used to be ctdb commit 3af994641ec2234e37da1fa1f693441586471a7e) 2007-10-16 15:27:07 +10:00			`close(state->fd[0]);`

ctdb: Use prctl_set_comment from lib/util Signed-off-by: Christof Schmitt <cs@samba.org> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-09-23 16:10:59 -07:00			`prctl_set_comment("ctdb_recmode");`
logging: give a unique logging name to each forked child. This means we can distinguish which child is logging, esp. via syslog where we have no pid. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 68b3761a0874429b90731741f0531f76dcfbb081) 2010-07-19 19:29:09 +09:30			`debug_extra = talloc_asprintf(NULL, "set_recmode:");`
ctdb-recoverd: Improve error messages on recovery lock coherence fail When the daemon is able to take the recovery lock during recovery we might as well guess that the cluster filesystem has a lock coherence problem and print a more useful message. This will be more helpful to those trying out cluster filesystems that don't have lock coherence or that are difficult to setup. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2014-12-17 20:33:19 +11:00			`/* Daemon should not be able to get the recovery lock,`
			`* as it should be held by the recovery master */`
ctdb-recoverd: Simplify ctdb_recovery_lock() Have it just silently take or fail to take the lock, except on an unexpected failure (where it should log an error). This means that when it is called we need to keep the old behaviour and explicitly release the lock. In do_recovery() the lock is released and a message is printed before attempting to take the lock. In the daemon sanity check the lock must be released in the error path if it is actually taken. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2014-12-09 14:50:38 +11:00			`if (ctdb_recovery_lock(ctdb)) {`
ctdb-recoverd: Improve error messages on recovery lock coherence fail When the daemon is able to take the recovery lock during recovery we might as well guess that the cluster filesystem has a lock coherence problem and print a more useful message. This will be more helpful to those trying out cluster filesystems that don't have lock coherence or that are difficult to setup. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2014-12-17 20:33:19 +11:00			`DEBUG(DEBUG_ERR,`
			`("ERROR: Daemon able to take recovery lock on \"%s\" during recovery\n",`
			`ctdb->recovery_lock_file));`
ctdb-recoverd: Simplify ctdb_recovery_lock() Have it just silently take or fail to take the lock, except on an unexpected failure (where it should log an error). This means that when it is called we need to keep the old behaviour and explicitly release the lock. In do_recovery() the lock is released and a message is printed before attempting to take the lock. In the daemon sanity check the lock must be released in the error path if it is actually taken. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2014-12-09 14:50:38 +11:00			`ctdb_recovery_unlock(ctdb);`
ctdb-recovery: Negate the status when checking the recovery lock Have 0 indicate that the lock was taken. This allows non-zero values to be used to indicate why the lock could not be taken. EACCES means lock contention. For now use just EACCES to cover all failures, since ctdb_recovery_lock() returns a bool and details of other errors will be lost. ctdb_recovery_lock() will undergo some big changes, so don't try to fix this now. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2016-01-28 15:07:30 +11:00			`cc = 0;`
add back the test inside the daemon that if someone asks us to drop recovery mode back to NORMAL that we can not lock the reclock file since at this stage it MUST be locked by the recovery daemon. in order to avoid a non-blocking fnctl() lock from blocking and cause "issues" we move the 'test that we can not lock reclock file' into a child process. (This used to be ctdb commit 3af994641ec2234e37da1fa1f693441586471a7e) 2007-10-16 15:27:07 +10:00			`}`

ctdb: Use sys_read() and sys_write() to ensure correct signal interaction ... and avoid compiler warnings in some cases. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2014-07-30 21:03:53 +10:00			`sys_write(state->fd[1], &cc, 1);`
ctdb: Use ctdb_wait_for_process_to_exit() Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2015-12-08 14:20:59 +11:00			`ctdb_wait_for_process_to_exit(parent);`
add back the test inside the daemon that if someone asks us to drop recovery mode back to NORMAL that we can not lock the reclock file since at this stage it MUST be locked by the recovery daemon. in order to avoid a non-blocking fnctl() lock from blocking and cause "issues" we move the 'test that we can not lock reclock file' into a child process. (This used to be ctdb commit 3af994641ec2234e37da1fa1f693441586471a7e) 2007-10-16 15:27:07 +10:00			`_exit(0);`
			`}`
			`close(state->fd[1]);`
add logging everytime we create a filedescriptor in the main ctdb daemon so we can spot if there are leaks. plug two leaks for filedescriptors related to when sending ARP fail and one leak when we can not parse the local address during tcp connection establish (This used to be ctdb commit ddd089810a14efe4be6e1ff3eccaa604e4913c9e) 2009-10-15 11:24:54 +11:00			`set_close_on_exec(state->fd[0]);`

dont leak file descriptors when set recmdoe timesout (This used to be ctdb commit fc8a364eb095ec11ca01246a583bf1dc53510141) 2009-06-19 14:58:06 +10:00			`state->fd[1] = -1;`
add back the test inside the daemon that if someone asks us to drop recovery mode back to NORMAL that we can not lock the reclock file since at this stage it MUST be locked by the recovery daemon. in order to avoid a non-blocking fnctl() lock from blocking and cause "issues" we move the 'test that we can not lock reclock file' into a child process. (This used to be ctdb commit 3af994641ec2234e37da1fa1f693441586471a7e) 2007-10-16 15:27:07 +10:00
			`talloc_set_destructor(state, set_recmode_destructor);`

Drop the debug level for logging fd creation to DEBUG_DEBUG (This used to be ctdb commit eae1d4f9e52e73b4d8769868fffdafa590d03784) 2010-02-04 06:37:41 +11:00			`DEBUG(DEBUG_DEBUG, (__location__ " Created PIPE FD:%d for setrecmode\n", state->fd[0]));`
add logging everytime we create a filedescriptor in the main ctdb daemon so we can spot if there are leaks. plug two leaks for filedescriptors related to when sending ARP fail and one leak when we can not parse the local address during tcp connection establish (This used to be ctdb commit ddd089810a14efe4be6e1ff3eccaa604e4913c9e) 2009-10-15 11:24:54 +11:00
ctdb-daemon: Stop using tevent compatibility definitions Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-26 16:50:09 +11:00			`state->te = tevent_add_timer(ctdb->ev, state, timeval_current_ofs(5, 0),`
			`ctdb_set_recmode_timeout, state);`
add back the test inside the daemon that if someone asks us to drop recovery mode back to NORMAL that we can not lock the reclock file since at this stage it MUST be locked by the recovery daemon. in order to avoid a non-blocking fnctl() lock from blocking and cause "issues" we move the 'test that we can not lock reclock file' into a child process. (This used to be ctdb commit 3af994641ec2234e37da1fa1f693441586471a7e) 2007-10-16 15:27:07 +10:00
ctdb-daemon: Stop using tevent compatibility definitions Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-26 16:50:09 +11:00			`state->fde = tevent_add_fd(ctdb->ev, state, state->fd[0], TEVENT_FD_READ,`
			`set_recmode_handler, (void *)state);`
reduce the timeout we wait for the reclock child process to finish to 5 seconds before we log an error and abort (This used to be ctdb commit 6d1e4321b63973c2e53c63d386e8cc0bd9605cae) 2009-06-19 13:09:11 +10:00
add back the test inside the daemon that if someone asks us to drop recovery mode back to NORMAL that we can not lock the reclock file since at this stage it MUST be locked by the recovery daemon. in order to avoid a non-blocking fnctl() lock from blocking and cause "issues" we move the 'test that we can not lock reclock file' into a child process. (This used to be ctdb commit 3af994641ec2234e37da1fa1f693441586471a7e) 2007-10-16 15:27:07 +10:00			`if (state->fde == NULL) {`
			`talloc_free(state);`
			`return -1;`
- make specification of a recovery lock file compulsory - die if someone other than the recmaster can get the recovery lock (This used to be ctdb commit a827d0d0e430ca8ad5d521367e45097185492869) 2007-06-02 11:36:42 +10:00			`}`
event: Update events to latest Samba version 0.9.8 In Samba this is now called "tevent", and while we use the backwards compatibility wrappers they don't offer EVENT_FD_AUTOCLOSE: that is now a separate tevent_fd_set_auto_close() function. This is based on Samba version 7f29f817fa939ef1bbb740584f09e76e2ecd5b06. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 85e5e760cc91eb3157d3a88996ce474491646726) 2010-08-18 09:16:31 +09:30			`tevent_fd_set_auto_close(state->fde);`
add back the test inside the daemon that if someone asks us to drop recovery mode back to NORMAL that we can not lock the reclock file since at this stage it MUST be locked by the recovery daemon. in order to avoid a non-blocking fnctl() lock from blocking and cause "issues" we move the 'test that we can not lock reclock file' into a child process. (This used to be ctdb commit 3af994641ec2234e37da1fa1f693441586471a7e) 2007-10-16 15:27:07 +10:00
			`state->ctdb = ctdb;`
			`state->c = talloc_steal(state, c);`

- make specification of a recovery lock file compulsory - die if someone other than the recmaster can get the recovery lock (This used to be ctdb commit a827d0d0e430ca8ad5d521367e45097185492869) 2007-06-02 11:36:42 +10:00			`*async_reply = true;`

separate out the freeze/thaw handling from recovery (This used to be ctdb commit 0b0640bd8b8334961f240e0cf276ac112cd6e616) 2007-05-12 15:15:27 +10:00			`return 0;`
added lockwait child code for entering recovery mode. A child processes holds lockall locks for the entire recovery process (This used to be ctdb commit f892f30def75b0d964c35eae38c4cf675597dd28) 2007-05-12 14:34:21 +10:00			`}`
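
The child-process check in `ctdb_control_set_recmode()` above can be illustrated in isolation. The following is a minimal sketch, not the daemon's code: `check_lock_in_child()`, `try_lock_fn`, `lock_unavailable()` and `lock_available()` are hypothetical names introduced for illustration. It keeps the same one-byte status convention (0 means the lock was taken, `EACCES` means it could not be taken), but the parent simply blocks in `read()` instead of using a tevent fd handler with a 5 second timeout, and the child exits immediately rather than waiting for the parent.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical callback: returns true if the lock could be taken. */
typedef bool (*try_lock_fn)(void *private_data);

/* Hypothetical stand-ins for a real lock attempt. */
bool lock_unavailable(void *private_data) { (void)private_data; return false; }
bool lock_available(void *private_data) { (void)private_data; return true; }

/*
 * Run try_lock() in a forked child and return the status byte it wrote:
 *   0       the child took the lock (unexpected during recovery)
 *   EACCES  the child could not take the lock (the expected case)
 *   -1      the child wrote nothing, or pipe()/fork() failed
 */
int check_lock_in_child(try_lock_fn try_lock, void *private_data)
{
	int fd[2];
	pid_t child;
	char c = 0;
	ssize_t n;

	assert(try_lock != NULL);

	if (pipe(fd) != 0) {
		return -1;
	}

	child = fork();
	if (child == (pid_t)-1) {
		close(fd[0]);
		close(fd[1]);
		return -1;
	}

	if (child == 0) {
		/* Child: report a single status byte, then exit. */
		char cc = EACCES;

		close(fd[0]);
		if (try_lock(private_data)) {
			cc = 0;
		}
		if (write(fd[1], &cc, 1) != 1) {
			_exit(1);
		}
		_exit(0);
	}

	/* Parent: close the write end so read() sees EOF if the child dies. */
	close(fd[1]);
	do {
		n = read(fd[0], &c, 1);
	} while (n == -1 && errno == EINTR);
	close(fd[0]);
	waitpid(child, NULL, 0);

	return (n == 1) ? c : -1;
}
```

Calling `check_lock_in_child(lock_unavailable, NULL)` returns `EACCES`, which corresponds to the case where the daemon allows the switch back to NORMAL. A return of 0 corresponds to the lock coherence error path.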
merge from tridge (This used to be ctdb commit 7bca79ad6357149fd7c6b28ce4b05de3d223a7de) 2007-05-14 06:25:15 +10:00
- moved cmdline options that are only relevant to ctdbd into ctdbd.c - fixed a valgrind error on failing to send a control - don't mark node dead when already disconnected - moved node list lock code into common code (This used to be ctdb commit bcc0432d0fea7ef223f82ccee81cf35c18144b1b) 2007-06-02 10:03:28 +10:00
ctdb-recoverd: New function ctdb_recovery_have_lock() True if this recovery daemon holds the lock. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2014-12-09 13:50:22 +11:00			`bool ctdb_recovery_have_lock(struct ctdb_context *ctdb)`
			`{`
			`return ctdb->recovery_lock_fd != -1;`
			`}`
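
The recovery lock is taken on a file in shared storage. As a rough illustration of how such a lock can be attempted without blocking, the sketch below opens a file with the same flags used in this file (`O_RDWR|O_CREAT`, mode `0600`) and requests a write lock with `fcntl(F_SETLK)`. The helper name `try_exclusive_lock()` and the choice to lock only the first byte are assumptions made for the example. They are not taken from the code shown here.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: open path and try to take an exclusive,
 * non-blocking lock on its first byte.
 * Returns the open fd with the lock held, or -1 on any failure.
 */
int try_exclusive_lock(const char *path)
{
	struct flock lock = {
		.l_type = F_WRLCK,
		.l_whence = SEEK_SET,
		.l_start = 0,
		.l_len = 1,
	};
	int fd;

	fd = open(path, O_RDWR | O_CREAT, 0600);
	if (fd == -1) {
		return -1;
	}

	if (fcntl(fd, F_SETLK, &lock) != 0) {
		/* EACCES or EAGAIN here means another process holds
		 * a conflicting lock. */
		close(fd);
		return -1;
	}

	return fd;
}
```

Closing the returned descriptor releases the lock. POSIX record locks belong to the process, so a second call from the same process succeeds. Contention is only visible from another process.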

- moved cmdline options that are only relevant to ctdbd into ctdbd.c - fixed a valgrind error on failing to send a control - don't mark node dead when already disconnected - moved node list lock code into common code (This used to be ctdb commit bcc0432d0fea7ef223f82ccee81cf35c18144b1b) 2007-06-02 10:03:28 +10:00			`/*`
- make specification of a recovery lock file compulsory - die if someone other than the recmaster can get the recovery lock (This used to be ctdb commit a827d0d0e430ca8ad5d521367e45097185492869) 2007-06-02 11:36:42 +10:00			`try and get the recovery lock in shared storage - should only work`
			`on the recovery master recovery daemon. Anywhere else is a bug`
- moved cmdline options that are only relevant to ctdbd into ctdbd.c - fixed a valgrind error on failing to send a control - don't mark node dead when already disconnected - moved node list lock code into common code (This used to be ctdb commit bcc0432d0fea7ef223f82ccee81cf35c18144b1b) 2007-06-02 10:03:28 +10:00			`*/`
ctdb-recoverd: Simplify ctdb_recovery_lock() Have it just silently take or fail to take the lock, except on an unexpected failure (where it should log an error). This means that when it is called we need to keep the old behaviour and explicitly release the lock. In do_recovery() the lock is released and a message is printed before attempting to take the lock. In the daemon sanity check the lock must be released in the error path if it is actually taken. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2014-12-09 14:50:38 +11:00			`bool ctdb_recovery_lock(struct ctdb_context *ctdb)`
- moved cmdline options that are only relevant to ctdbd into ctdbd.c - fixed a valgrind error on failing to send a control - don't mark node dead when already disconnected - moved node list lock code into common code (This used to be ctdb commit bcc0432d0fea7ef223f82ccee81cf35c18144b1b) 2007-06-02 10:03:28 +10:00			`{`
			`struct flock lock;`

ctdb-recoverd: Simplify ctdb_recovery_lock() Have it just silently take or fail to take the lock, except on an unexpected failure (where it should log an error). This means that when it is called we need to keep the old behaviour and explicitly release the lock. In do_recovery() the lock is released and a message is printed before attempting to take the lock. In the daemon sanity check the lock must be released in the error path if it is actually taken. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2014-12-09 14:50:38 +11:00			`ctdb->recovery_lock_fd = open(ctdb->recovery_lock_file,`
			`O_RDWR|O_CREAT, 0600);`
- make specification of a recovery lock file compulsory - die if someone other than the recmaster can get the recovery lock (This used to be ctdb commit a827d0d0e430ca8ad5d521367e45097185492869) 2007-06-02 11:36:42 +10:00			`if (ctdb->recovery_lock_fd == -1) {`
ctdb-recoverd: Simplify ctdb_recovery_lock() Have it just silently take or fail to take the lock, except on an unexpected failure (where it should log an error). This means that when it is called we need to keep the old behaviour and explicitly release the lock. In do_recovery() the lock is released and a message is printed before attempting to take the lock. In the daemon sanity check the lock must be released in the error path if it is actually taken. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2014-12-09 14:50:38 +11:00			`DEBUG(DEBUG_ERR,`
			`("ctdb_recovery_lock: Unable to open %s - (%s)\n",`
			`ctdb->recovery_lock_file, strerror(errno)));`
- moved cmdline options that are only relevant to ctdbd into ctdbd.c - fixed a valgrind error on failing to send a control - don't mark node dead when already disconnected - moved node list lock code into common code (This used to be ctdb commit bcc0432d0fea7ef223f82ccee81cf35c18144b1b) 2007-06-02 10:03:28 +10:00			`return false;`
			`}`

make sure we set close on exec on any possibly inherited fds (This used to be ctdb commit d9dec82076f14a348e7b67b4350180681ff86f32) 2007-09-19 11:46:37 +10:00			`set_close_on_exec(ctdb->recovery_lock_fd);`

- moved cmdline options that are only relevant to ctdbd into ctdbd.c - fixed a valgrind error on failing to send a control - don't mark node dead when already disconnected - moved node list lock code into common code (This used to be ctdb commit bcc0432d0fea7ef223f82ccee81cf35c18144b1b) 2007-06-02 10:03:28 +10:00			`lock.l_type = F_WRLCK;`
			`lock.l_whence = SEEK_SET;`
			`lock.l_start = 0;`
			`lock.l_len = 1;`
			`lock.l_pid = 0;`

- make specification of a recovery lock file compulsory - die if someone other than the recmaster can get the recovery lock (This used to be ctdb commit a827d0d0e430ca8ad5d521367e45097185492869) 2007-06-02 11:36:42 +10:00			`if (fcntl(ctdb->recovery_lock_fd, F_SETLK, &lock) != 0) {`
ctdb: improve helpfulness of debug message when taking reclock fails Print out the errno if the fcntl call. Signed-off-by: Michael Adam <obnox@samba.org> Reviewed-by: Richard Sharpe <rsharpe@samba.org> Autobuild-User(master): Michael Adam <obnox@samba.org> Autobuild-Date(master): Fri Jan 9 04:25:02 CET 2015 on sn-devel-104 2015-01-09 00:10:37 +01:00			`int saved_errno = errno;`
fixed a fd leak on the recovery lock (This used to be ctdb commit 186f35c42ed4fcc9ed44390b0dd036ece475d45e) 2007-09-24 10:19:07 +10:00			`close(ctdb->recovery_lock_fd);`
			`ctdb->recovery_lock_fd = -1;`
ctdb-recoverd: Simplify ctdb_recovery_lock() Have it just silently take or fail to take the lock, except on an unexpected failure (where it should log an error). This means that when it is called we need to keep the old behaviour and explicitly release the lock. In do_recovery() the lock is released and a message is printed before attempting to take the lock. In the daemon sanity check the lock must be released in the error path if it is actually taken. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2014-12-09 14:50:38 +11:00			`/* Fail silently on these errors, since they indicate`
			`* lock contention, but log an error for any other`
			`* failure. */`
			`if (saved_errno != EACCES &&`
			`saved_errno != EAGAIN) {`
			`DEBUG(DEBUG_ERR,("ctdb_recovery_lock: Failed to get "`
			`"recovery lock on '%s' - (%s)\n",`
			`ctdb->recovery_lock_file,`
			`strerror(saved_errno)));`
added some debug lines to help track down a problem (This used to be ctdb commit 2ca31e9de179f76e392a26cc8305e2473357c760) 2007-10-18 16:27:36 +10:00			`}`
- moved cmdline options that are only relevant to ctdbd into ctdbd.c - fixed a valgrind error on failing to send a control - don't mark node dead when already disconnected - moved node list lock code into common code (This used to be ctdb commit bcc0432d0fea7ef223f82ccee81cf35c18144b1b) 2007-06-02 10:03:28 +10:00			`return false;`
			`}`

			`return true;`
			`}`
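The take-and-release pattern used by `ctdb_recovery_lock()` and `ctdb_recovery_unlock()` above can be sketched as a standalone pair of helpers. This is only an illustration under stated assumptions: `try_byte_lock()` and `release_byte_lock()` are hypothetical names that do not exist in ctdb, and logging goes to `stderr` rather than through `DEBUG()`.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, not part of ctdb: try to take an exclusive,
 * non-blocking lock on byte 0 of `path`. Returns the fd that holds
 * the lock, or -1 if the lock could not be taken. */
int try_byte_lock(const char *path)
{
	struct flock lock;
	int fd;

	assert(path != NULL);

	fd = open(path, O_RDWR | O_CREAT, 0600);
	if (fd == -1) {
		fprintf(stderr, "open(%s) failed: %s\n", path, strerror(errno));
		return -1;
	}

	memset(&lock, 0, sizeof(lock));
	lock.l_type = F_WRLCK;    /* exclusive (write) lock */
	lock.l_whence = SEEK_SET; /* offsets are relative to start of file */
	lock.l_start = 0;
	lock.l_len = 1;           /* lock a single byte */

	if (fcntl(fd, F_SETLK, &lock) != 0) {
		int saved_errno = errno; /* close() may clobber errno */

		close(fd);
		/* EACCES/EAGAIN indicate that another process holds the
		 * lock, so stay quiet. Anything else is unexpected. */
		if (saved_errno != EACCES && saved_errno != EAGAIN) {
			fprintf(stderr, "lock on %s failed: %s\n",
				path, strerror(saved_errno));
		}
		return -1;
	}

	return fd;
}

/* Hypothetical helper: closing the fd drops the lock, and -1 marks
 * "lock not held", mirroring the recovery_lock_fd convention. */
void release_byte_lock(int *fd)
{
	if (*fd != -1) {
		close(*fd);
		*fd = -1;
	}
}
```

Note that `fcntl()` record locks belong to the process, not to the file descriptor, so contention only appears between different processes; a second attempt from the same process succeeds.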
new simpler and much faster recovery code based on tdb transactions (This used to be ctdb commit 9ef2268a1674b01f60c58fed72af8ac982fe77a3) 2008-01-06 12:38:01 +11:00
ctdb-recoverd: New function ctdb_recovery_unlock() Unlock the recovery lock file. This way knowledge of the file descriptor isn't sprinkled throughout the code. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> 2014-12-09 14:07:20 +11:00			`void ctdb_recovery_unlock(struct ctdb_context *ctdb)`
			`{`
			`if (ctdb->recovery_lock_fd != -1) {`
			`DEBUG(DEBUG_NOTICE, ("Releasing recovery lock\n"));`
			`close(ctdb->recovery_lock_fd);`
			`ctdb->recovery_lock_fd = -1;`
			`}`
			`}`

added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 17:23:27 +11:00			`/*`
			`delete a record as part of the vacuum process`
			`only delete if we are not lmaster or dmaster, and our rsn is <= the provided rsn`
			`use non-blocking locks`
Redo the vacukming process to mkake it scalable. Vacumming used to delete one record at a time on all nodes, that was m*n behaviour and would require a huge storm of ctdb->ctdb controls and just wouldnt scale at all. The new vacuming process collects all records to be deleted locally and then only sends 1 control to the other nodes. This control contains a list of all records to be deleted. (This used to be ctdb commit 9e625ece19a91f362c9539fa73b6b2108f0d9c53) 2008-03-13 07:53:29 +11:00
			`return 0 if the record was successfully deleted (i.e. it does not exist`
			`when the function returns)`
			`or !0 if the record still exists in the tdb after returning.`
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 17:23:27 +11:00			`*/`
ctdb-daemon: Rename struct ctdb_rec_data to ctdb_rec_data_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 17:30:30 +11:00			`static int delete_tdb_record(struct ctdb_context *ctdb, struct ctdb_db_context *ctdb_db, struct ctdb_rec_data_old *rec)`
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 17:23:27 +11:00			`{`
vacuuming: Fix vacuuming bug where requests keep bouncing between nodes (part 2) This is caused by corruption of a record header such that the records on two nodes point to each other as dmaster. This makes a request for that record bounce between nodes endlessly. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit f0853013655ac3bedf1b793de128fb679c6db6c6) 2013-08-12 15:50:30 +10:00			`TDB_DATA key, data, data2;`
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 17:23:27 +11:00			`struct ctdb_ltdb_header *hdr, *hdr2;`
ensure the main daemon doesn't use a blocking lock on the freelist (This used to be ctdb commit 73f8257906b09e6516f675883d8e7a3c455ad869) 2008-01-08 22:31:48 +11:00
			`/* these are really internal tdb functions - but we need them here for`
			`non-blocking lock of the freelist */`
			`int tdb_lock_nonblock(struct tdb_context *tdb, int list, int ltype);`
			`int tdb_unlock(struct tdb_context *tdb, int list, int ltype);`
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 17:23:27 +11:00

			`key.dsize = rec->keylen;`
			`key.dptr = &rec->data[0];`
			`data.dsize = rec->datalen;`
			`data.dptr = &rec->data[rec->keylen];`

			`if (ctdb_lmaster(ctdb, &key) == ctdb->pnn) {`
added debug constants to allow for better mapping to syslog levels (This used to be ctdb commit 7ba8f1dde318eab03f4257e5a89fd23e7281e502) 2008-02-04 17:44:24 +11:00			`DEBUG(DEBUG_INFO,(__location__ " Called delete on record where we are lmaster\n"));`
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 17:23:27 +11:00			`return -1;`
			`}`

			`if (data.dsize != sizeof(struct ctdb_ltdb_header)) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 20:07:15 +11:00			`DEBUG(DEBUG_ERR,(__location__ " Bad record size\n"));`
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 17:23:27 +11:00			`return -1;`
			`}`

			`hdr = (struct ctdb_ltdb_header *)data.dptr;`

			`/* use a non-blocking lock */`
			`if (tdb_chainlock_nonblock(ctdb_db->ltdb->tdb, key) != 0) {`
			`return -1;`
			`}`

vacuuming: Fix vacuuming bug where requests keep bouncing between nodes (part 2) This is caused by corruption of a record header such that the records on two nodes point to each other as dmaster. This makes a request for that record bounce between nodes endlessly. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit f0853013655ac3bedf1b793de128fb679c6db6c6) 2013-08-12 15:50:30 +10:00			`data2 = tdb_fetch(ctdb_db->ltdb->tdb, key);`
			`if (data2.dptr == NULL) {`
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 17:23:27 +11:00			`tdb_chainunlock(ctdb_db->ltdb->tdb, key);`
			`return 0;`
			`}`

vacuuming: Fix vacuuming bug where requests keep bouncing between nodes (part 2) This is caused by corruption of a record header such that the records on two nodes point to each other as dmaster. This makes a request for that record bounce between nodes endlessly. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit f0853013655ac3bedf1b793de128fb679c6db6c6) 2013-08-12 15:50:30 +10:00			`if (data2.dsize < sizeof(struct ctdb_ltdb_header)) {`
ensure the main daemon doesn't use a blocking lock on the freelist (This used to be ctdb commit 73f8257906b09e6516f675883d8e7a3c455ad869) 2008-01-08 22:31:48 +11:00			`if (tdb_lock_nonblock(ctdb_db->ltdb->tdb, -1, F_WRLCK) == 0) {`
Check return value of tdb_delete() Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 5cdcc3d45d358ddbcd7e864898eed9cbd9935429) 2012-11-19 11:20:31 +01:00			`if (tdb_delete(ctdb_db->ltdb->tdb, key) != 0) {`
			`DEBUG(DEBUG_CRIT,(__location__ " Failed to delete corrupt record\n"));`
			`}`
ensure the main daemon doesn't use a blocking lock on the freelist (This used to be ctdb commit 73f8257906b09e6516f675883d8e7a3c455ad869) 2008-01-08 22:31:48 +11:00			`tdb_unlock(ctdb_db->ltdb->tdb, -1, F_WRLCK);`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 20:07:15 +11:00			`DEBUG(DEBUG_CRIT,(__location__ " Deleted corrupt record\n"));`
ensure the main daemon doesn't use a blocking lock on the freelist (This used to be ctdb commit 73f8257906b09e6516f675883d8e7a3c455ad869) 2008-01-08 22:31:48 +11:00			`}`
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 17:23:27 +11:00			`tdb_chainunlock(ctdb_db->ltdb->tdb, key);`
vacuuming: Fix vacuuming bug where requests keep bouncing between nodes (part 2) This is caused by corruption of a record header such that the records on two nodes point to each other as dmaster. This makes a request for that record bounce between nodes endlessly. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit f0853013655ac3bedf1b793de128fb679c6db6c6) 2013-08-12 15:50:30 +10:00			`free(data2.dptr);`
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 17:23:27 +11:00			`return 0;`
			`}`

vacuuming: Fix vacuuming bug where requests keep bouncing between nodes (part 2) This is caused by corruption of a record header such that the records on two nodes point to each other as dmaster. This makes a request for that record bounce between nodes endlessly. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit f0853013655ac3bedf1b793de128fb679c6db6c6) 2013-08-12 15:50:30 +10:00			`hdr2 = (struct ctdb_ltdb_header *)data2.dptr;`
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 17:23:27 +11:00
			`if (hdr2->rsn > hdr->rsn) {`
			`tdb_chainunlock(ctdb_db->ltdb->tdb, key);`
added debug constants to allow for better mapping to syslog levels (This used to be ctdb commit 7ba8f1dde318eab03f4257e5a89fd23e7281e502) 2008-02-04 17:44:24 +11:00			`DEBUG(DEBUG_INFO,(__location__ " Skipping record with rsn=%llu - called with rsn=%llu\n",`
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 17:23:27 +11:00			`(unsigned long long)hdr2->rsn, (unsigned long long)hdr->rsn));`
vacuuming: Fix vacuuming bug where requests keep bouncing between nodes (part 2) This is caused by corruption of a record header such that the records on two nodes point to each other as dmaster. This makes a request for that record bounce between nodes endlessly. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit f0853013655ac3bedf1b793de128fb679c6db6c6) 2013-08-12 15:50:30 +10:00			`free(data2.dptr);`
			`return -1;`
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 17:23:27 +11:00			`}`

READONLY: skip vacuuming or deleting records with readonly delegations. they are hot. wait until they have been revoked before we recall them. (This used to be ctdb commit 7417d994c2a159f71d27d4bcd2f53684862eece3) 2012-02-29 16:09:24 +11:00			`/* do not allow deleting records that have readonly flags set. */`
recover: use CTDB_REC_RO_FLAGS where appropriate Signed-off-by: Michael Adam <obnox@samba.org> Reviewed-By: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit b5a8791268e938d7e017056e0e2bd2cbec1fa690) 2013-04-19 16:24:32 +02:00			`if (hdr->flags & CTDB_REC_RO_FLAGS) {`
READONLY: skip vacuuming or deleting records with readonly delegations. they are hot. wait until they have been revoked before we recall them. (This used to be ctdb commit 7417d994c2a159f71d27d4bcd2f53684862eece3) 2012-02-29 16:09:24 +11:00			`tdb_chainunlock(ctdb_db->ltdb->tdb, key);`
			`DEBUG(DEBUG_INFO,(__location__ " Skipping record with readonly flags set\n"));`
vacuuming: Fix vacuuming bug where requests keep bouncing between nodes (part 2) This is caused by corruption of a record header such that the records on two nodes point to each other as dmaster. This makes a request for that record bounce between nodes endlessly. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit f0853013655ac3bedf1b793de128fb679c6db6c6) 2013-08-12 15:50:30 +10:00			`free(data2.dptr);`
			`return -1;`
READONLY: skip vacuuming or deleting records with readonly delegations. they are hot. wait until they have been revoked before we recall them. (This used to be ctdb commit 7417d994c2a159f71d27d4bcd2f53684862eece3) 2012-02-29 16:09:24 +11:00			`}`
recover: use CTDB_REC_RO_FLAGS where appropriate Signed-off-by: Michael Adam <obnox@samba.org> Reviewed-By: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit b5a8791268e938d7e017056e0e2bd2cbec1fa690) 2013-04-19 16:24:32 +02:00			`if (hdr2->flags & CTDB_REC_RO_FLAGS) {`
READONLY: skip vacuuming or deleting records with readonly delegations. they are hot. wait until they have been revoked before we recall them. (This used to be ctdb commit 7417d994c2a159f71d27d4bcd2f53684862eece3) 2012-02-29 16:09:24 +11:00			`tdb_chainunlock(ctdb_db->ltdb->tdb, key);`
			`DEBUG(DEBUG_INFO,(__location__ " Skipping record with readonly flags set\n"));`
vacuuming: Fix vacuuming bug where requests keep bouncing between nodes (part 2) This is caused by corruption of a record header such that the records on two nodes point to each other as dmaster. This makes a request for that record bounce between nodes endlessly. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit f0853013655ac3bedf1b793de128fb679c6db6c6) 2013-08-12 15:50:30 +10:00			`free(data2.dptr);`
			`return -1;`
READONLY: skip vacuuming or deleting records with readonly delegations. they are hot. wait until they have been revoked before we recall them. (This used to be ctdb commit 7417d994c2a159f71d27d4bcd2f53684862eece3) 2012-02-29 16:09:24 +11:00			`}`

added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 17:23:27 +11:00			`if (hdr2->dmaster == ctdb->pnn) {`
			`tdb_chainunlock(ctdb_db->ltdb->tdb, key);`
added debug constants to allow for better mapping to syslog levels (This used to be ctdb commit 7ba8f1dde318eab03f4257e5a89fd23e7281e502) 2008-02-04 17:44:24 +11:00			`DEBUG(DEBUG_INFO,(__location__ " Attempted delete record where we are the dmaster\n"));`
vacuuming: Fix vacuuming bug where requests keep bouncing between nodes (part 2) This is caused by corruption of a record header such that the records on two nodes point to each other as dmaster. This makes a request for that record bounce between nodes endlessly. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit f0853013655ac3bedf1b793de128fb679c6db6c6) 2013-08-12 15:50:30 +10:00			`free(data2.dptr);`
			`return -1;`
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 17:23:27 +11:00			`}`

ensure the main daemon doesn't use a blocking lock on the freelist (This used to be ctdb commit 73f8257906b09e6516f675883d8e7a3c455ad869) 2008-01-08 22:31:48 +11:00			`if (tdb_lock_nonblock(ctdb_db->ltdb->tdb, -1, F_WRLCK) != 0) {`
			`tdb_chainunlock(ctdb_db->ltdb->tdb, key);`
vacuuming: Fix vacuuming bug where requests keep bouncing between nodes (part 2) This is caused by corruption of a record header such that the records on two nodes point to each other as dmaster. This makes a request for that record bounce between nodes endlessly. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit f0853013655ac3bedf1b793de128fb679c6db6c6) 2013-08-12 15:50:30 +10:00			`free(data2.dptr);`
			`return -1;`
ensure the main daemon doesn't use a blocking lock on the freelist (This used to be ctdb commit 73f8257906b09e6516f675883d8e7a3c455ad869) 2008-01-08 22:31:48 +11:00			`}`

added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 17:23:27 +11:00			`if (tdb_delete(ctdb_db->ltdb->tdb, key) != 0) {`
ensure the main daemon doesn't use a blocking lock on the freelist (This used to be ctdb commit 73f8257906b09e6516f675883d8e7a3c455ad869) 2008-01-08 22:31:48 +11:00			`tdb_unlock(ctdb_db->ltdb->tdb, -1, F_WRLCK);`
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 17:23:27 +11:00			`tdb_chainunlock(ctdb_db->ltdb->tdb, key);`
added debug constants to allow for better mapping to syslog levels (This used to be ctdb commit 7ba8f1dde318eab03f4257e5a89fd23e7281e502) 2008-02-04 17:44:24 +11:00			`DEBUG(DEBUG_INFO,(__location__ " Failed to delete record\n"));`
vacuuming: Fix vacuuming bug where requests keep bouncing between nodes (part 2) This is caused by corruption of a record header such that the records on two nodes point to each other as dmaster. This makes a request for that record bounce between nodes endlessly. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit f0853013655ac3bedf1b793de128fb679c6db6c6) 2013-08-12 15:50:30 +10:00			`free(data2.dptr);`
			`return -1;`
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 17:23:27 +11:00			`}`

ensure the main daemon doesn't use a blocking lock on the freelist (This used to be ctdb commit 73f8257906b09e6516f675883d8e7a3c455ad869) 2008-01-08 22:31:48 +11:00			`tdb_unlock(ctdb_db->ltdb->tdb, -1, F_WRLCK);`
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 17:23:27 +11:00			`tdb_chainunlock(ctdb_db->ltdb->tdb, key);`
vacuuming: Fix vacuuming bug where requests keep bouncing between nodes (part 2) This is caused by corruption of a record header such that the records on two nodes point to each other as dmaster. This makes a request for that record bounce between nodes endlessly. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit f0853013655ac3bedf1b793de128fb679c6db6c6) 2013-08-12 15:50:30 +10:00			`free(data2.dptr);`
			`return 0;`
added two new ctdb commands: ctdb vacuum : vacuums all the databases, deleting any zero length ctdb records ctdb repack : repacks all the databases, resulting in a perfectly packed database with no freelist entries (This used to be ctdb commit 3532119c84ab3247051ed6ba21ba3243ae2f6bf4) 2008-01-08 17:23:27 +11:00			`}`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 13:59:28 +11:00

Redo the vacukming process to mkake it scalable. Vacumming used to delete one record at a time on all nodes, that was m*n behaviour and would require a huge storm of ctdb->ctdb controls and just wouldnt scale at all. The new vacuming process collects all records to be deleted locally and then only sends 1 control to the other nodes. This control contains a list of all records to be deleted. (This used to be ctdb commit 9e625ece19a91f362c9539fa73b6b2108f0d9c53) 2008-03-13 07:53:29 +11:00
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 13:59:28 +11:00			`struct recovery_callback_state {`
ctdb-daemon: Rename struct ctdb_req_control to ctdb_req_control_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 16:42:05 +11:00			`struct ctdb_req_control_old *c;`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 13:59:28 +11:00			`};`


			`/*`
			`called when the 'recovered' event script has finished`
			`*/`
			`static void ctdb_end_recovery_callback(struct ctdb_context *ctdb, int status, void *p)`
			`{`
			`struct recovery_callback_state *state = talloc_get_type(p, struct recovery_callback_state);`

			`ctdb_enable_monitoring(ctdb);`
Create macros to update the statistics counters and use these macros everywhere instead of manipulating the coutenrs directly. (This used to be ctdb commit 2e648df890e5713bc575965d87937827b068d0d7) 2010-09-29 10:38:41 +10:00			`CTDB_INCREMENT_STAT(ctdb, num_recoveries);`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 13:59:28 +11:00
			`if (status != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 20:07:15 +11:00			`DEBUG(DEBUG_ERR,(__location__ " recovered event script failed (status %d)\n", status));`
eventscript: handle banning within the callbacks Currently the timeout handler in eventscript.c does the banning if a timeout happens. However, because monitor events are different, it has to special case them. As we call the callback anyway in this case, we should make that handle -ETIME as it sees fit: for everyone but the monitor event, we simply ban ourselves. The more complicated monitor event banning logic is now in ctdb_monitor.c where it belongs. Note: I wrapped the other bans in "if (status == -ETIME)", though they should probably ban themselves on any error. This change should be a noop. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 9ecee127e19a9e7cae114a66f3514ee7a75276c5) 2009-12-07 23:48:57 +10:30			`if (status == -ETIME) {`
			`ctdb_ban_self(ctdb);`
			`}`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 13:59:28 +11:00			`}`

			`ctdb_request_control_reply(ctdb, state->c, NULL, status, NULL);`
			`talloc_free(state);`

track both when we last started and ended a recovery. make ctdb uptime print how long the recovery took in the recovery daemon when we check that the public ip address allocation on the local node is correct (we have the ips we should have and we dont have any we shouldnt have) use ctdb uptime and check the recovery start/stop times and make sure we dont check for ip allocation inconsistencies during a recovery where the ip address allocation is in flux. (This used to be ctdb commit f86551580349b7f662f9a07e4eb0c1189e38e429) 2008-07-02 13:55:59 +10:00			`gettimeofday(&ctdb->last_recovery_finished, NULL);`
ctdbd: Add new runstate CTDB_RUNSTATE_FIRST_RECOVERY This adds more serialisation to the startup, ensuring that the "startup" event runs after everything to do with the first recovery (including the "recovered" event). Given that it now takes longer to get to the "startup" state, the initscript needs to wait until ctdbd gets to "first_recovery". Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit ed6814ff0a59ddbb1c1b3128b505380f60d7aeb7) 2013-04-18 20:30:14 +10:00
			`if (ctdb->runstate == CTDB_RUNSTATE_FIRST_RECOVERY) {`
			`ctdb_set_runstate(ctdb, CTDB_RUNSTATE_STARTUP);`
			`}`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 13:59:28 +11:00			`}`

			`/*`
			`recovery has finished`
			`*/`
			`int32_t ctdb_control_end_recovery(struct ctdb_context *ctdb,`
ctdb-daemon: Rename struct ctdb_req_control to ctdb_req_control_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 16:42:05 +11:00			`struct ctdb_req_control_old *c,`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 13:59:28 +11:00			`bool *async_reply)`
			`{`
			`int ret;`
			`struct recovery_callback_state *state;`

merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 20:07:15 +11:00			`DEBUG(DEBUG_NOTICE,("Recovery has finished\n"));`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 13:59:28 +11:00
recover: finish pending trans3 commits when a recovery is finished. When the end_recovery control is received, pending trans3 commits are finished. During the recovery, all the actions like persistent_callback and persistent_store_timeout had been disabled to let the recovery do its job. After the recover is completed, send the reply to the waiting clients. (This used to be ctdb commit f7dfeb7143f574c2434f7dd16917380dfd1f4f64) 2011-02-23 17:39:57 +01:00			`ctdb_persistent_finish_trans3_commits(ctdb);`

merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 13:59:28 +11:00			`state = talloc(ctdb, struct recovery_callback_state);`
			`CTDB_NO_MEMORY(ctdb, state);`

In ctdb_control_end_recovery, We used to talloc_steal c (the command packet) and make it a child of the "event script state context". If we failed to create a eventscript child context for some reason, this would have talloc freed state, but at the same time it would also implicitely have freed c. Once ctdb_control_end_recovery() returns the error back to the caller, the caller would dereference both c, and also outdata which is a child of c and we would either read garbage data or segv. Change the ordering so we only talloc_steal c as a child of state IFF we have successfully created a child context for the script. BZ61068 (This used to be ctdb commit 259054c3632e42bbaa614ee7e888e6e850733d60) 2010-02-23 12:43:49 +11:00			`state->c = c;`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 13:59:28 +11:00
			`ctdb_disable_monitoring(ctdb);`

eventscript: put timeout inside ctdb_event_script_callback_v Everyone uses the same timeout value, so just remove it from the API. If we ever need variable timeouts, that might as well be central too. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 533c3e053293941d2a9484b495e78d45f478bb08) 2009-11-24 11:09:46 +10:30			`ret = ctdb_event_script_callback(ctdb, state,`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 13:59:28 +11:00			`ctdb_end_recovery_callback,`
Add flag to ctdb_event_script_callback indicating when called by client. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit a1d654a982ca56fade82552f4e6b5586236d3233) 2009-11-26 15:49:49 +11:00			`state,`
			`CTDB_EVENT_RECOVERED, "%s", "");`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 13:59:28 +11:00
			`if (ret != 0) {`
			`ctdb_enable_monitoring(ctdb);`

merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 20:07:15 +11:00			`DEBUG(DEBUG_ERR,(__location__ " Failed to end recovery\n"));`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 13:59:28 +11:00			`talloc_free(state);`
			`return -1;`
			`}`

			`/* tell the control that we will reply asynchronously */`
In ctdb_control_end_recovery, We used to talloc_steal c (the command packet) and make it a child of the "event script state context". If we failed to create a eventscript child context for some reason, this would have talloc freed state, but at the same time it would also implicitely have freed c. Once ctdb_control_end_recovery() returns the error back to the caller, the caller would dereference both c, and also outdata which is a child of c and we would either read garbage data or segv. Change the ordering so we only talloc_steal c as a child of state IFF we have successfully created a child context for the script. BZ61068 (This used to be ctdb commit 259054c3632e42bbaa614ee7e888e6e850733d60) 2010-02-23 12:43:49 +11:00			`state->c = talloc_steal(state, c);`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 13:59:28 +11:00			`*async_reply = true;`
			`return 0;`
			`}`

			`/*`
			`called when the 'startrecovery' event script has finished`
			`*/`
			`static void ctdb_start_recovery_callback(struct ctdb_context *ctdb, int status, void *p)`
			`{`
			`struct recovery_callback_state *state = talloc_get_type(p, struct recovery_callback_state);`

			`if (status != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 20:07:15 +11:00			`DEBUG(DEBUG_ERR,(__location__ " startrecovery event script failed (status %d)\n", status));`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 13:59:28 +11:00			`}`

			`ctdb_request_control_reply(ctdb, state->c, NULL, status, NULL);`
			`talloc_free(state);`
			`}`

			`/*`
track both when we last started and ended a recovery. make ctdb uptime print how long the recovery took in the recovery daemon when we check that the public ip address allocation on the local node is correct (we have the ips we should have and we dont have any we shouldnt have) use ctdb uptime and check the recovery start/stop times and make sure we dont check for ip allocation inconsistencies during a recovery where the ip address allocation is in flux. (This used to be ctdb commit f86551580349b7f662f9a07e4eb0c1189e38e429) 2008-07-02 13:55:59 +10:00			`run the startrecovery eventscript`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 13:59:28 +11:00			`*/`
			`int32_t ctdb_control_start_recovery(struct ctdb_context *ctdb,`
ctdb-daemon: Rename struct ctdb_req_control to ctdb_req_control_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 16:42:05 +11:00			`struct ctdb_req_control_old *c,`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 13:59:28 +11:00			`bool *async_reply)`
			`{`
			`int ret;`
			`struct recovery_callback_state *state;`

update a comment to reflect that this is not always a real recovery it can also be printed when we just do an ip reallocation (This used to be ctdb commit e4c9e511fc5e15e0638ebb9117cb4a65ca8fda4b) 2008-07-02 12:01:19 +10:00			`DEBUG(DEBUG_NOTICE,(__location__ " startrecovery eventscript has been invoked\n"));`
track both when we last started and ended a recovery. make ctdb uptime print how long the recovery took in the recovery daemon when we check that the public ip address allocation on the local node is correct (we have the ips we should have and we dont have any we shouldnt have) use ctdb uptime and check the recovery start/stop times and make sure we dont check for ip allocation inconsistencies during a recovery where the ip address allocation is in flux. (This used to be ctdb commit f86551580349b7f662f9a07e4eb0c1189e38e429) 2008-07-02 13:55:59 +10:00			`gettimeofday(&ctdb->last_recovery_started, NULL);`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 13:59:28 +11:00
			`state = talloc(ctdb, struct recovery_callback_state);`
			`CTDB_NO_MEMORY(ctdb, state);`

			`state->c = talloc_steal(state, c);`

			`ctdb_disable_monitoring(ctdb);`

eventscript: put timeout inside ctdb_event_script_callback_v Everyone uses the same timeout value, so just remove it from the API. If we ever need variable timeouts, that might as well be central too. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 533c3e053293941d2a9484b495e78d45f478bb08) 2009-11-24 11:09:46 +10:30			`ret = ctdb_event_script_callback(ctdb, state,`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 13:59:28 +11:00			`ctdb_start_recovery_callback,`
ctdb-daemon: No need to call event scripts with CTDB_CALLED_BY_USER This was added to support external monitoring using CTDB event scripts. However, it was never used. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2013-12-16 15:57:42 +11:00			`state,`
Add flag to ctdb_event_script_callback indicating when called by client. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit a1d654a982ca56fade82552f4e6b5586236d3233) 2009-11-26 15:49:49 +11:00			`CTDB_EVENT_START_RECOVERY,`
eventscript: introduce enum for different event script calls. Rather than doing strcmp everywhere, pass an explicit enum around. This also subtly documents what options are available. The "options" arg is now used for extra arguments only. Unfortunately, gcc complains on empty format strings, so we make ctdb_event_script() take no varargs, and add ctdb_event_script_args(). We leave ctdb_event_script_callback() taking varargs, which means callers have to do "%s", "". For the moment, we have CTDB_EVENT_UNKNOWN for handling forced scripts from the ctdb tool. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 8001488be4f2beb25e943fe01b2afc2e8779930d) 2009-11-24 11:16:49 +10:30			`"%s", "");`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 13:59:28 +11:00
			`if (ret != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 20:07:15 +11:00			`DEBUG(DEBUG_ERR,(__location__ " Failed to start recovery\n"));`
merge async recovery changes from Ronnie (This used to be ctdb commit 576e317640d25f8059114f15c6f1ebcee5e5b6e2) 2008-01-29 13:59:28 +11:00			`talloc_free(state);`
			`return -1;`
			`}`

			`/* tell the control that we will reply asynchronously */`
			`*async_reply = true;`
			`return 0;`
			`}`

Redo the vacukming process to mkake it scalable. Vacumming used to delete one record at a time on all nodes, that was m*n behaviour and would require a huge storm of ctdb->ctdb controls and just wouldnt scale at all. The new vacuming process collects all records to be deleted locally and then only sends 1 control to the other nodes. This control contains a list of all records to be deleted. (This used to be ctdb commit 9e625ece19a91f362c9539fa73b6b2108f0d9c53) 2008-03-13 07:53:29 +11:00			`/*`
			`try to delete all these records as part of the vacuuming process`
			`and return the records we failed to delete`
			`*/`
			`int32_t ctdb_control_try_delete_records(struct ctdb_context *ctdb, TDB_DATA indata, TDB_DATA *outdata)`
			`{`
rename the structure we use for marshalling multiple records (This used to be ctdb commit 4d205476d286570a6e1f52b59af42858ce051106) 2008-07-30 14:24:56 +10:00			`struct ctdb_marshall_buffer *reply = (struct ctdb_marshall_buffer *)indata.dptr;`
Redo the vacukming process to mkake it scalable. Vacumming used to delete one record at a time on all nodes, that was m*n behaviour and would require a huge storm of ctdb->ctdb controls and just wouldnt scale at all. The new vacuming process collects all records to be deleted locally and then only sends 1 control to the other nodes. This control contains a list of all records to be deleted. (This used to be ctdb commit 9e625ece19a91f362c9539fa73b6b2108f0d9c53) 2008-03-13 07:53:29 +11:00			`struct ctdb_db_context *ctdb_db;`
			`int i;`
ctdb-daemon: Rename struct ctdb_rec_data to ctdb_rec_data_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 17:30:30 +11:00			`struct ctdb_rec_data_old *rec;`
rename the structure we use for marshalling multiple records (This used to be ctdb commit 4d205476d286570a6e1f52b59af42858ce051106) 2008-07-30 14:24:56 +10:00			`struct ctdb_marshall_buffer *records;`
Redo the vacukming process to mkake it scalable. Vacumming used to delete one record at a time on all nodes, that was m*n behaviour and would require a huge storm of ctdb->ctdb controls and just wouldnt scale at all. The new vacuming process collects all records to be deleted locally and then only sends 1 control to the other nodes. This control contains a list of all records to be deleted. (This used to be ctdb commit 9e625ece19a91f362c9539fa73b6b2108f0d9c53) 2008-03-13 07:53:29 +11:00
rename the structure we use for marshalling multiple records (This used to be ctdb commit 4d205476d286570a6e1f52b59af42858ce051106) 2008-07-30 14:24:56 +10:00			`if (indata.dsize < offsetof(struct ctdb_marshall_buffer, data)) {`
Redo the vacukming process to mkake it scalable. Vacumming used to delete one record at a time on all nodes, that was m*n behaviour and would require a huge storm of ctdb->ctdb controls and just wouldnt scale at all. The new vacuming process collects all records to be deleted locally and then only sends 1 control to the other nodes. This control contains a list of all records to be deleted. (This used to be ctdb commit 9e625ece19a91f362c9539fa73b6b2108f0d9c53) 2008-03-13 07:53:29 +11:00			`DEBUG(DEBUG_ERR,(__location__ " invalid data in try_delete_records\n"));`
			`return -1;`
			`}`

			`ctdb_db = find_ctdb_db(ctdb, reply->db_id);`
			`if (!ctdb_db) {`
			`DEBUG(DEBUG_ERR,(__location__ " Unknown db 0x%08x\n", reply->db_id));`
			`return -1;`
			`}`


			`DEBUG(DEBUG_DEBUG,("starting try_delete_records of %u records for dbid 0x%x\n",`
			`reply->count, reply->db_id));`


			`/* create a blob to send back the records we could not delete */`
rename the structure we use for marshalling multiple records (This used to be ctdb commit 4d205476d286570a6e1f52b59af42858ce051106) 2008-07-30 14:24:56 +10:00			`records = (struct ctdb_marshall_buffer *)`
Redo the vacukming process to mkake it scalable. Vacumming used to delete one record at a time on all nodes, that was m*n behaviour and would require a huge storm of ctdb->ctdb controls and just wouldnt scale at all. The new vacuming process collects all records to be deleted locally and then only sends 1 control to the other nodes. This control contains a list of all records to be deleted. (This used to be ctdb commit 9e625ece19a91f362c9539fa73b6b2108f0d9c53) 2008-03-13 07:53:29 +11:00			`talloc_zero_size(outdata,`
rename the structure we use for marshalling multiple records (This used to be ctdb commit 4d205476d286570a6e1f52b59af42858ce051106) 2008-07-30 14:24:56 +10:00			`offsetof(struct ctdb_marshall_buffer, data));`
Redo the vacukming process to mkake it scalable. Vacumming used to delete one record at a time on all nodes, that was m*n behaviour and would require a huge storm of ctdb->ctdb controls and just wouldnt scale at all. The new vacuming process collects all records to be deleted locally and then only sends 1 control to the other nodes. This control contains a list of all records to be deleted. (This used to be ctdb commit 9e625ece19a91f362c9539fa73b6b2108f0d9c53) 2008-03-13 07:53:29 +11:00			`if (records == NULL) {`
			`DEBUG(DEBUG_ERR,(__location__ " Out of memory\n"));`
			`return -1;`
			`}`
			`records->db_id = ctdb_db->db_id;`


ctdb-daemon: Rename struct ctdb_rec_data to ctdb_rec_data_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 17:30:30 +11:00			`rec = (struct ctdb_rec_data_old *)&reply->data[0];`
Redo the vacukming process to mkake it scalable. Vacumming used to delete one record at a time on all nodes, that was m*n behaviour and would require a huge storm of ctdb->ctdb controls and just wouldnt scale at all. The new vacuming process collects all records to be deleted locally and then only sends 1 control to the other nodes. This control contains a list of all records to be deleted. (This used to be ctdb commit 9e625ece19a91f362c9539fa73b6b2108f0d9c53) 2008-03-13 07:53:29 +11:00			`for (i=0;i<reply->count;i++) {`
			`TDB_DATA key, data;`

			`key.dptr = &rec->data[0];`
			`key.dsize = rec->keylen;`
			`data.dptr = &rec->data[key.dsize];`
			`data.dsize = rec->datalen;`

			`if (data.dsize < sizeof(struct ctdb_ltdb_header)) {`
			`DEBUG(DEBUG_CRIT,(__location__ " bad ltdb record in indata\n"));`
			`return -1;`
			`}`

			`/* If we cannot delete the record we must add it to the reply`
			`so the lmaster knows it may not purge this record`
			`*/`
			`if (delete_tdb_record(ctdb, ctdb_db, rec) != 0) {`
			`size_t old_size;`
			`struct ctdb_ltdb_header *hdr;`

			`hdr = (struct ctdb_ltdb_header *)data.dptr;`
			`data.dptr += sizeof(*hdr);`
			`data.dsize -= sizeof(*hdr);`

			`DEBUG(DEBUG_INFO, (__location__ " Failed to vacuum delete record with hash 0x%08x\n", ctdb_hash(&key)));`

			`old_size = talloc_get_size(records);`
			`records = talloc_realloc_size(outdata, records, old_size + rec->length);`
			`if (records == NULL) {`
			`DEBUG(DEBUG_ERR,(__location__ " Failed to expand\n"));`
			`return -1;`
			`}`
			`records->count++;`
			`memcpy(old_size+(uint8_t *)records, rec, rec->length);`
			`}`

ctdb-daemon: Rename struct ctdb_rec_data to ctdb_rec_data_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 17:30:30 +11:00			`rec = (struct ctdb_rec_data_old *)(rec->length + (uint8_t *)rec);`
Redo the vacukming process to mkake it scalable. Vacumming used to delete one record at a time on all nodes, that was m*n behaviour and would require a huge storm of ctdb->ctdb controls and just wouldnt scale at all. The new vacuming process collects all records to be deleted locally and then only sends 1 control to the other nodes. This control contains a list of all records to be deleted. (This used to be ctdb commit 9e625ece19a91f362c9539fa73b6b2108f0d9c53) 2008-03-13 07:53:29 +11:00			`}`


ctdb-vacuum: Use existing function ctdb_marshall_finish Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Volker Lendecke <vl@samba.org> Autobuild-User(master): Amitay Isaacs <amitay@samba.org> Autobuild-Date(master): Wed Jul 23 09:44:00 CEST 2014 on sn-devel-104 2014-05-06 18:52:54 +10:00			`*outdata = ctdb_marshall_finish(records);`
Redo the vacukming process to mkake it scalable. Vacumming used to delete one record at a time on all nodes, that was m*n behaviour and would require a huge storm of ctdb->ctdb controls and just wouldnt scale at all. The new vacuming process collects all records to be deleted locally and then only sends 1 control to the other nodes. This control contains a list of all records to be deleted. (This used to be ctdb commit 9e625ece19a91f362c9539fa73b6b2108f0d9c53) 2008-03-13 07:53:29 +11:00
			`return 0;`
			`}`
Expand the client async framework so that it can take a callback function. This allows us to use the async framework also for controls that return outdata. Add a "capabilities" field to the ctdb_node structure. This field is only initialized and kept valid inside the recovery daemon context and not inside the main ctdb daemon. change the GET_CAPABILITIES control to return the capabilities in outdata instead of in the res return variable. When performing a recovery inside the recovery daemon, read the capabilities from all connected nodes and update the ctdb->nodes list of nodes. when building the new vnnmap after the database rebuild in recovery, do not include any nodes which lack the LMASTER capability in the new vnnmap. Unless there are no available connected node that sports the LMASTER capability in which case we let the local node (recmaster) take on the lmaster role temporarily (i.e. become a member of the vnnmap list) (This used to be ctdb commit 0f1883c69c689b28b0c04148774840b2c4081df6) 2008-05-06 15:42:59 +10:00
vacuum: introduce the RECEIVE_RECORDS control This in preparation of turning the vacuming on the lmaster into into a two phase process: - First the node sends the list of records to be vacuumed to all other nodes with this new RECEIVE_RECORDS control. The remote nodes should store the lmaster's empty current copy. - Only those records that could be stored on all other nodes are processed further. They are send to all other nodes with the TRY_DELETE_RECORDS control as before for deletion. Signed-off-by: Michael Adam <obnox@samba.org> Reviewed-By: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit e397702e271af38204fd99733bbeba7c1db3a999) 2012-12-21 00:24:47 +01:00			`/**`
			`* Store a record as part of the vacuum process:`
			`* This is called from the RECEIVE_RECORDS control which`
			`* the lmaster uses to send the current empty copy`
			`* to all nodes for storing, before it lets the other`
			`* nodes delete the records in the second phase with`
			`* the TRY_DELETE_RECORDS control.`
			`*`
			`* Only store if we are not lmaster or dmaster, and our`
			`* rsn is <= the provided rsn. Use non-blocking locks.`
			`*`
			`* return 0 if the record was successfully stored.`
			`* return !0 if the record could not be stored.`
			`*/`
			`static int store_tdb_record(struct ctdb_context *ctdb,`
			`struct ctdb_db_context *ctdb_db,`
ctdb-daemon: Rename struct ctdb_rec_data to ctdb_rec_data_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 17:30:30 +11:00			`struct ctdb_rec_data_old *rec)`
vacuum: introduce the RECEIVE_RECORDS control This in preparation of turning the vacuming on the lmaster into into a two phase process: - First the node sends the list of records to be vacuumed to all other nodes with this new RECEIVE_RECORDS control. The remote nodes should store the lmaster's empty current copy. - Only those records that could be stored on all other nodes are processed further. They are send to all other nodes with the TRY_DELETE_RECORDS control as before for deletion. Signed-off-by: Michael Adam <obnox@samba.org> Reviewed-By: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit e397702e271af38204fd99733bbeba7c1db3a999) 2012-12-21 00:24:47 +01:00			`{`
			`TDB_DATA key, data, data2;`
			`struct ctdb_ltdb_header *hdr, *hdr2;`
			`int ret;`

			`key.dsize = rec->keylen;`
			`key.dptr = &rec->data[0];`
			`data.dsize = rec->datalen;`
			`data.dptr = &rec->data[rec->keylen];`

			`if (ctdb_lmaster(ctdb, &key) == ctdb->pnn) {`
			`DEBUG(DEBUG_INFO, (__location__ " Called store_tdb_record "`
			`"where we are lmaster\n"));`
			`return -1;`
			`}`

			`if (data.dsize != sizeof(struct ctdb_ltdb_header)) {`
			`DEBUG(DEBUG_ERR, (__location__ " Bad record size\n"));`
			`return -1;`
			`}`

			`hdr = (struct ctdb_ltdb_header *)data.dptr;`

			`/* use a non-blocking lock */`
			`if (tdb_chainlock_nonblock(ctdb_db->ltdb->tdb, key) != 0) {`
vacuum: Reduce the priority of non-critical error Since the complete database is not locked when the receive_records control is received, it's possible that we may not be able to obtain lock on a chain. We will try again to store this record. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit 32723c9efdad1c6ca4aa53f308ccd9bef1aadfff) 2013-05-24 18:07:39 +10:00			`DEBUG(DEBUG_INFO, (__location__ " Failed to lock chain in non-blocking mode\n"));`
vacuum: introduce the RECEIVE_RECORDS control This in preparation of turning the vacuming on the lmaster into into a two phase process: - First the node sends the list of records to be vacuumed to all other nodes with this new RECEIVE_RECORDS control. The remote nodes should store the lmaster's empty current copy. - Only those records that could be stored on all other nodes are processed further. They are send to all other nodes with the TRY_DELETE_RECORDS control as before for deletion. Signed-off-by: Michael Adam <obnox@samba.org> Reviewed-By: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit e397702e271af38204fd99733bbeba7c1db3a999) 2012-12-21 00:24:47 +01:00			`return -1;`
			`}`

			`data2 = tdb_fetch(ctdb_db->ltdb->tdb, key);`
			`if (data2.dptr == NULL || data2.dsize < sizeof(struct ctdb_ltdb_header)) {`
ctdb-server: Coverity fixes Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Michael Adam <obnox@samba.org> 2013-11-11 12:39:27 +11:00			`if (tdb_store(ctdb_db->ltdb->tdb, key, data, 0) == -1) {`
			`DEBUG(DEBUG_ERR, (__location__ " Failed to store record\n"));`
			`ret = -1;`
			`goto done;`
			`}`
vacuum: introduce the RECEIVE_RECORDS control This in preparation of turning the vacuming on the lmaster into into a two phase process: - First the node sends the list of records to be vacuumed to all other nodes with this new RECEIVE_RECORDS control. The remote nodes should store the lmaster's empty current copy. - Only those records that could be stored on all other nodes are processed further. They are send to all other nodes with the TRY_DELETE_RECORDS control as before for deletion. Signed-off-by: Michael Adam <obnox@samba.org> Reviewed-By: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit e397702e271af38204fd99733bbeba7c1db3a999) 2012-12-21 00:24:47 +01:00			`DEBUG(DEBUG_INFO, (__location__ " Stored record\n"));`
			`ret = 0;`
			`goto done;`
			`}`

vacuuming: Fix vacuuming bug where requests keep bouncing between nodes (part 1) This is caused by corruption of a record header such that the records on two nodes point to each other as dmaster. This makes a request for that record bounce between nodes endlessly. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit a610bc351f0754c84c78c27d02f9a695e60c5b0f) 2013-08-12 15:51:00 +10:00			`hdr2 = (struct ctdb_ltdb_header *)data2.dptr;`
vacuum: introduce the RECEIVE_RECORDS control This in preparation of turning the vacuming on the lmaster into into a two phase process: - First the node sends the list of records to be vacuumed to all other nodes with this new RECEIVE_RECORDS control. The remote nodes should store the lmaster's empty current copy. - Only those records that could be stored on all other nodes are processed further. They are send to all other nodes with the TRY_DELETE_RECORDS control as before for deletion. Signed-off-by: Michael Adam <obnox@samba.org> Reviewed-By: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit e397702e271af38204fd99733bbeba7c1db3a999) 2012-12-21 00:24:47 +01:00
			`if (hdr2->rsn > hdr->rsn) {`
			`DEBUG(DEBUG_INFO, (__location__ " Skipping record with "`
			`"rsn=%llu - called with rsn=%llu\n",`
			`(unsigned long long)hdr2->rsn,`
			`(unsigned long long)hdr->rsn));`
			`ret = -1;`
			`goto done;`
			`}`

			`/* do not allow vacuuming of records that have readonly flags set. */`
recover: use CTDB_REC_RO_FLAGS where appropriate Signed-off-by: Michael Adam <obnox@samba.org> Reviewed-By: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit b5a8791268e938d7e017056e0e2bd2cbec1fa690) 2013-04-19 16:24:32 +02:00			`if (hdr->flags & CTDB_REC_RO_FLAGS) {`
vacuum: introduce the RECEIVE_RECORDS control This in preparation of turning the vacuming on the lmaster into into a two phase process: - First the node sends the list of records to be vacuumed to all other nodes with this new RECEIVE_RECORDS control. The remote nodes should store the lmaster's empty current copy. - Only those records that could be stored on all other nodes are processed further. They are send to all other nodes with the TRY_DELETE_RECORDS control as before for deletion. Signed-off-by: Michael Adam <obnox@samba.org> Reviewed-By: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit e397702e271af38204fd99733bbeba7c1db3a999) 2012-12-21 00:24:47 +01:00			`DEBUG(DEBUG_INFO,(__location__ " Skipping record with readonly "`
			`"flags set\n"));`
			`ret = -1;`
			`goto done;`
			`}`
recover: use CTDB_REC_RO_FLAGS where appropriate Signed-off-by: Michael Adam <obnox@samba.org> Reviewed-By: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit b5a8791268e938d7e017056e0e2bd2cbec1fa690) 2013-04-19 16:24:32 +02:00			`if (hdr2->flags & CTDB_REC_RO_FLAGS) {`
vacuum: introduce the RECEIVE_RECORDS control This in preparation of turning the vacuming on the lmaster into into a two phase process: - First the node sends the list of records to be vacuumed to all other nodes with this new RECEIVE_RECORDS control. The remote nodes should store the lmaster's empty current copy. - Only those records that could be stored on all other nodes are processed further. They are send to all other nodes with the TRY_DELETE_RECORDS control as before for deletion. Signed-off-by: Michael Adam <obnox@samba.org> Reviewed-By: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit e397702e271af38204fd99733bbeba7c1db3a999) 2012-12-21 00:24:47 +01:00			`DEBUG(DEBUG_INFO,(__location__ " Skipping record with readonly "`
			`"flags set\n"));`
			`ret = -1;`
			`goto done;`
			`}`

			`if (hdr2->dmaster == ctdb->pnn) {`
			`DEBUG(DEBUG_INFO, (__location__ " Attempted to store record "`
			`"where we are the dmaster\n"));`
			`ret = -1;`
			`goto done;`
			`}`

			`if (tdb_store(ctdb_db->ltdb->tdb, key, data, 0) != 0) {`
			`DEBUG(DEBUG_INFO,(__location__ " Failed to store record\n"));`
			`ret = -1;`
			`goto done;`
			`}`

			`ret = 0;`

			`done:`
			`tdb_chainunlock(ctdb_db->ltdb->tdb, key);`
			`free(data2.dptr);`
			`return ret;`
			`}`
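
The checks made by store_tdb_record() can be read as a pure decision function. What follows is a minimal sketch under stated assumptions, not the daemon's code: `struct demo_hdr`, `DEMO_RO_FLAGS` and `demo_should_store()` are hypothetical stand-ins for `struct ctdb_ltdb_header`, `CTDB_REC_RO_FLAGS` and the tdb calls, and the payload-size check and the non-blocking chain lock are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for CTDB_REC_RO_FLAGS; the real value is not assumed. */
#define DEMO_RO_FLAGS 0x1u

/* Hypothetical, reduced record header: only the fields the checks look at. */
struct demo_hdr {
	uint64_t rsn;
	uint32_t dmaster;
	uint32_t flags;
};

enum demo_verdict { DEMO_STORE = 0, DEMO_SKIP = -1 };

/*
 * Decide whether an incoming empty record may overwrite the local copy.
 * "local" is NULL when the key has no usable local copy.
 * The order of the checks mirrors store_tdb_record().
 */
static enum demo_verdict demo_should_store(uint32_t my_pnn, uint32_t lmaster,
					   const struct demo_hdr *incoming,
					   const struct demo_hdr *local)
{
	assert(incoming != NULL);

	if (lmaster == my_pnn) {
		return DEMO_SKIP;	/* the lmaster never stores its own copy */
	}
	if (local == NULL) {
		return DEMO_STORE;	/* nothing to lose locally */
	}
	if (local->rsn > incoming->rsn) {
		return DEMO_SKIP;	/* the local copy is newer */
	}
	if ((incoming->flags | local->flags) & DEMO_RO_FLAGS) {
		return DEMO_SKIP;	/* readonly records are not vacuumed */
	}
	if (local->dmaster == my_pnn) {
		return DEMO_SKIP;	/* we hold the authoritative copy */
	}
	return DEMO_STORE;
}
```

A nonzero verdict corresponds to the record being returned to the lmaster in the reply blob, so that it is not purged in the second phase.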



			`/**`
			`* Try to store all these records as part of the vacuuming process`
			`* and return the records we failed to store.`
			`*/`
			`int32_t ctdb_control_receive_records(struct ctdb_context *ctdb,`
			`TDB_DATA indata, TDB_DATA *outdata)`
			`{`
			`struct ctdb_marshall_buffer *reply = (struct ctdb_marshall_buffer *)indata.dptr;`
			`struct ctdb_db_context *ctdb_db;`
			`int i;`
ctdb-daemon: Rename struct ctdb_rec_data to ctdb_rec_data_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 17:30:30 +11:00			`struct ctdb_rec_data_old *rec;`
vacuum: introduce the RECEIVE_RECORDS control This in preparation of turning the vacuming on the lmaster into into a two phase process: - First the node sends the list of records to be vacuumed to all other nodes with this new RECEIVE_RECORDS control. The remote nodes should store the lmaster's empty current copy. - Only those records that could be stored on all other nodes are processed further. They are send to all other nodes with the TRY_DELETE_RECORDS control as before for deletion. Signed-off-by: Michael Adam <obnox@samba.org> Reviewed-By: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit e397702e271af38204fd99733bbeba7c1db3a999) 2012-12-21 00:24:47 +01:00			`struct ctdb_marshall_buffer *records;`

			`if (indata.dsize < offsetof(struct ctdb_marshall_buffer, data)) {`
			`DEBUG(DEBUG_ERR,`
			`(__location__ " invalid data in receive_records\n"));`
			`return -1;`
			`}`

			`ctdb_db = find_ctdb_db(ctdb, reply->db_id);`
			`if (!ctdb_db) {`
			`DEBUG(DEBUG_ERR, (__location__ " Unknown db 0x%08x\n",`
			`reply->db_id));`
			`return -1;`
			`}`

			`DEBUG(DEBUG_DEBUG, ("starting receive_records of %u records for "`
			`"dbid 0x%x\n", reply->count, reply->db_id));`

			`/* create a blob to send back the records we could not store */`
			`records = (struct ctdb_marshall_buffer *)`
			`talloc_zero_size(outdata,`
			`offsetof(struct ctdb_marshall_buffer, data));`
			`if (records == NULL) {`
			`DEBUG(DEBUG_ERR, (__location__ " Out of memory\n"));`
			`return -1;`
			`}`
			`records->db_id = ctdb_db->db_id;`

ctdb-daemon: Rename struct ctdb_rec_data to ctdb_rec_data_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 17:30:30 +11:00			`rec = (struct ctdb_rec_data_old *)&reply->data[0];`
vacuum: introduce the RECEIVE_RECORDS control This in preparation of turning the vacuming on the lmaster into into a two phase process: - First the node sends the list of records to be vacuumed to all other nodes with this new RECEIVE_RECORDS control. The remote nodes should store the lmaster's empty current copy. - Only those records that could be stored on all other nodes are processed further. They are send to all other nodes with the TRY_DELETE_RECORDS control as before for deletion. Signed-off-by: Michael Adam <obnox@samba.org> Reviewed-By: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit e397702e271af38204fd99733bbeba7c1db3a999) 2012-12-21 00:24:47 +01:00			`for (i=0; i<reply->count; i++) {`
			`TDB_DATA key, data;`

			`key.dptr = &rec->data[0];`
			`key.dsize = rec->keylen;`
			`data.dptr = &rec->data[key.dsize];`
			`data.dsize = rec->datalen;`

			`if (data.dsize < sizeof(struct ctdb_ltdb_header)) {`
			`DEBUG(DEBUG_CRIT, (__location__ " bad ltdb record "`
			`"in indata\n"));`
			`return -1;`
			`}`

			`/*`
			`* If we can not store the record we must add it to the reply`
			`* so the lmaster knows it may not purge this record.`
			`*/`
			`if (store_tdb_record(ctdb, ctdb_db, rec) != 0) {`
			`size_t old_size;`
			`struct ctdb_ltdb_header *hdr;`

			`hdr = (struct ctdb_ltdb_header *)data.dptr;`
			`data.dptr += sizeof(*hdr);`
			`data.dsize -= sizeof(*hdr);`

			`DEBUG(DEBUG_INFO, (__location__ " Failed to store "`
			`"record with hash 0x%08x in vacuum "`
			`"via RECEIVE_RECORDS\n",`
			`ctdb_hash(&key)));`

			`old_size = talloc_get_size(records);`
			`records = talloc_realloc_size(outdata, records,`
			`old_size + rec->length);`
			`if (records == NULL) {`
			`DEBUG(DEBUG_ERR, (__location__ " Failed to "`
			`"expand\n"));`
			`return -1;`
			`}`
			`records->count++;`
			`memcpy(old_size+(uint8_t *)records, rec, rec->length);`
			`}`

ctdb-daemon: Rename struct ctdb_rec_data to ctdb_rec_data_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-29 17:30:30 +11:00			`rec = (struct ctdb_rec_data_old *)(rec->length + (uint8_t *)rec);`
vacuum: introduce the RECEIVE_RECORDS control This in preparation of turning the vacuming on the lmaster into into a two phase process: - First the node sends the list of records to be vacuumed to all other nodes with this new RECEIVE_RECORDS control. The remote nodes should store the lmaster's empty current copy. - Only those records that could be stored on all other nodes are processed further. They are send to all other nodes with the TRY_DELETE_RECORDS control as before for deletion. Signed-off-by: Michael Adam <obnox@samba.org> Reviewed-By: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit e397702e271af38204fd99733bbeba7c1db3a999) 2012-12-21 00:24:47 +01:00			`}`

ctdb-vacuum: Use existing function ctdb_marshall_finish Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Volker Lendecke <vl@samba.org> Autobuild-User(master): Amitay Isaacs <amitay@samba.org> Autobuild-Date(master): Wed Jul 23 09:44:00 CEST 2014 on sn-devel-104 2014-05-06 18:52:54 +10:00			`*outdata = ctdb_marshall_finish(records);`
vacuum: introduce the RECEIVE_RECORDS control This in preparation of turning the vacuming on the lmaster into into a two phase process: - First the node sends the list of records to be vacuumed to all other nodes with this new RECEIVE_RECORDS control. The remote nodes should store the lmaster's empty current copy. - Only those records that could be stored on all other nodes are processed further. They are send to all other nodes with the TRY_DELETE_RECORDS control as before for deletion. Signed-off-by: Michael Adam <obnox@samba.org> Reviewed-By: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit e397702e271af38204fd99733bbeba7c1db3a999) 2012-12-21 00:24:47 +01:00
			`return 0;`
			`}`
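
The loop above steps through a packed buffer of variable-length records by adding each record's `length` to the current position. A minimal sketch of that traversal follows, assuming a simplified record layout. The names `demo_rec_hdr`, `demo_pack`, and `demo_walk` are hypothetical and are not part of ctdb. The bounds checks in `demo_walk` are an addition for the sketch and are not claimed to match what the daemon does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed header preceding each packed record; the key bytes
 * and then the data bytes follow it directly. */
struct demo_rec_hdr {
	uint32_t length;   /* total size of this record, header included */
	uint32_t keylen;
	uint32_t datalen;
};

/* Append one record to buf at offset off; returns the new offset.
 * The caller must provide a buffer large enough for the record. */
static size_t demo_pack(uint8_t *buf, size_t off,
			const void *key, uint32_t keylen,
			const void *data, uint32_t datalen)
{
	struct demo_rec_hdr h;

	assert(buf != NULL);
	h.length = (uint32_t)sizeof(h) + keylen + datalen;
	h.keylen = keylen;
	h.datalen = datalen;
	memcpy(buf + off, &h, sizeof(h));
	memcpy(buf + off + sizeof(h), key, keylen);
	memcpy(buf + off + sizeof(h) + keylen, data, datalen);
	return off + h.length;
}

/*
 * Walk count records and return how many have a data part of at least
 * min_data bytes, or -1 if a record would run past the end of the buffer.
 */
static int demo_walk(const uint8_t *buf, size_t buflen,
		     uint32_t count, uint32_t min_data)
{
	size_t off = 0;
	int ok = 0;
	uint32_t i;

	for (i = 0; i < count; i++) {
		struct demo_rec_hdr h;

		if (buflen - off < sizeof(h)) {
			return -1;
		}
		/* Copy the header out instead of casting, to avoid
		 * unaligned access at arbitrary offsets. */
		memcpy(&h, buf + off, sizeof(h));
		if (h.length < sizeof(h) || h.length > buflen - off ||
		    (uint64_t)h.keylen + h.datalen > h.length - sizeof(h)) {
			return -1;
		}
		if (h.datalen >= min_data) {
			ok++;
		}
		/* Same step as: rec = rec->length + (uint8_t *)rec */
		off += h.length;
	}
	return ok;
}
```

Packing two records and walking them with a minimum data size mirrors the `data.dsize < sizeof(struct ctdb_ltdb_header)` check, except that the sketch counts short records instead of failing on them.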


Expand the client async framework so that it can take a callback function. This allows us to use the async framework also for controls that return outdata. Add a "capabilities" field to the ctdb_node structure. This field is only initialized and kept valid inside the recovery daemon context and not inside the main ctdb daemon. change the GET_CAPABILITIES control to return the capabilities in outdata instead of in the res return variable. When performing a recovery inside the recovery daemon, read the capabilities from all connected nodes and update the ctdb->nodes list of nodes. when building the new vnnmap after the database rebuild in recovery, do not include any nodes which lack the LMASTER capability in the new vnnmap. Unless there are no available connected node that sports the LMASTER capability in which case we let the local node (recmaster) take on the lmaster role temporarily (i.e. become a member of the vnnmap list) (This used to be ctdb commit 0f1883c69c689b28b0c04148774840b2c4081df6) 2008-05-06 15:42:59 +10:00			`/*`
			`report capabilities`
			`*/`
			`int32_t ctdb_control_get_capabilities(struct ctdb_context *ctdb, TDB_DATA *outdata)`
			`{`
			`uint32_t *capabilities = NULL;`

			`capabilities = talloc(outdata, uint32_t);`
			`CTDB_NO_MEMORY(ctdb, capabilities);`
			`*capabilities = ctdb->capabilities;`

			`outdata->dsize = sizeof(uint32_t);`
			`outdata->dptr = (uint8_t *)capabilities;`

			`return 0;`
			`}`

daemon: On shutdown, destroy timed events that check if recoverd is active When CTDB is shutting down, recovery daemon is stopped, but the event that checks if recovery daemon is still alive is not destroyed. So recovery master is restarted during shutdown if CTDB daemon takes longer to shutdown. There are two processes that check if recovery daemon is working. 1. ctdb_check_recd() - which checks every 30 seconds if the recovery daemon process exists. 2. ctdb_recd_ping_timeout() - which is triggered when recovery daemon fails to ping CTDB daemon. Both the events are periodic and need to be destroyed when shutting down. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 746168df2e691058e601016110fae818c6a265c3) 2012-12-04 15:05:44 +11:00			`/* The recovery daemon will ping us at regular intervals.`
			`If we haven't been pinged for a while we assume the recovery`
			`daemon is inoperable and we restart.`
			`*/`
ctdb-daemon: Stop using tevent compatibility definitions Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-26 16:50:09 +11:00			`static void ctdb_recd_ping_timeout(struct tevent_context *ev,`
			`struct tevent_timer *te,`
			`struct timeval t, void *p)`
additional monitoring between the two daemons. we currently only monitor that the dameons are running by kill(0, pid) and verifying the the domain socket between them is ok. this is not sufficient since we can have a situation where the recovery daemon is hung. this new code monitors that the recovery daemon is operating. if the recovery hangs, we log this and shut down the main daemon (This used to be ctdb commit cd69d292292eaab3aac0e9d9fc57cb621597c63c) 2008-09-09 13:44:46 +10:00			`{`
			`struct ctdb_context *ctdb = talloc_get_type(p, struct ctdb_context);`
The ctdb daemon keeps track of whether the recovery process is running correctly by measuring how long it was since the last successful communication with the recovery daemon was recorded. After a certain timeout the ctdb daemon would deem the recovery daemon as inoperable and shut down. If the system clock is suddenly changed forward by many (60 or more) seconds this could cause the timeout to trigger prematurely/immediately where ctdb would incorrectly think that more than 60 seconds had passed since last successful communications and thus abort. Instead of cehcking for one timeout occuring, only deem the recovery daemon to be "down" and trigger a shutdown if communications have timedout for three intervals in a row. (This used to be ctdb commit 196968c552e6ebcb57389d769a4b25f42fa8bc5d) 2008-09-17 14:17:41 +10:00			`uint32_t *count = talloc_get_type(ctdb->recd_ping_count, uint32_t);`
additional monitoring between the two daemons. we currently only monitor that the dameons are running by kill(0, pid) and verifying the the domain socket between them is ok. this is not sufficient since we can have a situation where the recovery daemon is hung. this new code monitors that the recovery daemon is operating. if the recovery hangs, we log this and shut down the main daemon (This used to be ctdb commit cd69d292292eaab3aac0e9d9fc57cb621597c63c) 2008-09-09 13:44:46 +10:00
add extra debug statements to the log to make it easier to see when a recovery dameon has hung due to the underlying filesystem hanging. (This used to be ctdb commit 5b0067a4e335cbbf6e606646e612d4bfcfdb7441) 2009-05-12 18:39:34 +10:00			`DEBUG(DEBUG_ERR, ("Recovery daemon ping timeout. Count : %u\n", *count));`
The ctdb daemon keeps track of whether the recovery process is running correctly by measuring how long it was since the last successful communication with the recovery daemon was recorded. After a certain timeout the ctdb daemon would deem the recovery daemon as inoperable and shut down. If the system clock is suddenly changed forward by many (60 or more) seconds this could cause the timeout to trigger prematurely/immediately where ctdb would incorrectly think that more than 60 seconds had passed since last successful communications and thus abort. Instead of cehcking for one timeout occuring, only deem the recovery daemon to be "down" and trigger a shutdown if communications have timedout for three intervals in a row. (This used to be ctdb commit 196968c552e6ebcb57389d769a4b25f42fa8bc5d) 2008-09-17 14:17:41 +10:00
use the correct tunable failcount not timeout (This used to be ctdb commit 475cfada33b4c13aaaca773d5485bbe26bffbf46) 2008-09-17 14:24:12 +10:00			`if (*count < ctdb->tunable.recd_ping_failcount) {`
The ctdb daemon keeps track of whether the recovery process is running correctly by measuring how long it was since the last successful communication with the recovery daemon was recorded. After a certain timeout the ctdb daemon would deem the recovery daemon as inoperable and shut down. If the system clock is suddenly changed forward by many (60 or more) seconds this could cause the timeout to trigger prematurely/immediately where ctdb would incorrectly think that more than 60 seconds had passed since last successful communications and thus abort. Instead of cehcking for one timeout occuring, only deem the recovery daemon to be "down" and trigger a shutdown if communications have timedout for three intervals in a row. (This used to be ctdb commit 196968c552e6ebcb57389d769a4b25f42fa8bc5d) 2008-09-17 14:17:41 +10:00			`(*count)++;`
ctdb-daemon: Stop using tevent compatibility definitions Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-26 16:50:09 +11:00			`tevent_add_timer(ctdb->ev, ctdb->recd_ping_count,`
			`timeval_current_ofs(ctdb->tunable.recd_ping_timeout, 0),`
			`ctdb_recd_ping_timeout, ctdb);`
The ctdb daemon keeps track of whether the recovery process is running correctly by measuring how long it was since the last successful communication with the recovery daemon was recorded. After a certain timeout the ctdb daemon would deem the recovery daemon as inoperable and shut down. If the system clock is suddenly changed forward by many (60 or more) seconds this could cause the timeout to trigger prematurely/immediately where ctdb would incorrectly think that more than 60 seconds had passed since last successful communications and thus abort. Instead of cehcking for one timeout occuring, only deem the recovery daemon to be "down" and trigger a shutdown if communications have timedout for three intervals in a row. (This used to be ctdb commit 196968c552e6ebcb57389d769a4b25f42fa8bc5d) 2008-09-17 14:17:41 +10:00			`return;`
			`}`

Restart recovery dameon if it looks like it hung. Dont shutdown ctdbd completely, that only makes the problem worse. (This used to be ctdb commit 221ecc2509f6d267d1854c1042ff945a620510bb) 2011-03-04 06:55:24 +11:00			`DEBUG(DEBUG_ERR, ("Final timeout for recovery daemon ping. Restarting recovery daemon. (This can be caused if the cluster filesystem has hung)\n"));`
additional monitoring between the two daemons. we currently only monitor that the dameons are running by kill(0, pid) and verifying the the domain socket between them is ok. this is not sufficient since we can have a situation where the recovery daemon is hung. this new code monitors that the recovery daemon is operating. if the recovery hangs, we log this and shut down the main daemon (This used to be ctdb commit cd69d292292eaab3aac0e9d9fc57cb621597c63c) 2008-09-09 13:44:46 +10:00
			`ctdb_stop_recoverd(ctdb);`
Restart recovery dameon if it looks like it hung. Dont shutdown ctdbd completely, that only makes the problem worse. (This used to be ctdb commit 221ecc2509f6d267d1854c1042ff945a620510bb) 2011-03-04 06:55:24 +11:00			`ctdb_start_recoverd(ctdb);`
additional monitoring between the two daemons. we currently only monitor that the dameons are running by kill(0, pid) and verifying the the domain socket between them is ok. this is not sufficient since we can have a situation where the recovery daemon is hung. this new code monitors that the recovery daemon is operating. if the recovery hangs, we log this and shut down the main daemon (This used to be ctdb commit cd69d292292eaab3aac0e9d9fc57cb621597c63c) 2008-09-09 13:44:46 +10:00			`}`

			`int32_t ctdb_control_recd_ping(struct ctdb_context *ctdb)`
			`{`
The ctdb daemon keeps track of whether the recovery process is running correctly by measuring how long it was since the last successful communication with the recovery daemon was recorded. After a certain timeout the ctdb daemon would deem the recovery daemon as inoperable and shut down. If the system clock is suddenly changed forward by many (60 or more) seconds this could cause the timeout to trigger prematurely/immediately where ctdb would incorrectly think that more than 60 seconds had passed since last successful communications and thus abort. Instead of cehcking for one timeout occuring, only deem the recovery daemon to be "down" and trigger a shutdown if communications have timedout for three intervals in a row. (This used to be ctdb commit 196968c552e6ebcb57389d769a4b25f42fa8bc5d) 2008-09-17 14:17:41 +10:00			`talloc_free(ctdb->recd_ping_count);`
additional monitoring between the two daemons. we currently only monitor that the dameons are running by kill(0, pid) and verifying the the domain socket between them is ok. this is not sufficient since we can have a situation where the recovery daemon is hung. this new code monitors that the recovery daemon is operating. if the recovery hangs, we log this and shut down the main daemon (This used to be ctdb commit cd69d292292eaab3aac0e9d9fc57cb621597c63c) 2008-09-09 13:44:46 +10:00
The ctdb daemon keeps track of whether the recovery process is running correctly by measuring how long it was since the last successful communication with the recovery daemon was recorded. After a certain timeout the ctdb daemon would deem the recovery daemon as inoperable and shut down. If the system clock is suddenly changed forward by many (60 or more) seconds this could cause the timeout to trigger prematurely/immediately where ctdb would incorrectly think that more than 60 seconds had passed since last successful communications and thus abort. Instead of cehcking for one timeout occuring, only deem the recovery daemon to be "down" and trigger a shutdown if communications have timedout for three intervals in a row. (This used to be ctdb commit 196968c552e6ebcb57389d769a4b25f42fa8bc5d) 2008-09-17 14:17:41 +10:00			`ctdb->recd_ping_count = talloc_zero(ctdb, uint32_t);`
			`CTDB_NO_MEMORY(ctdb, ctdb->recd_ping_count);`
additional monitoring between the two daemons. we currently only monitor that the dameons are running by kill(0, pid) and verifying the the domain socket between them is ok. this is not sufficient since we can have a situation where the recovery daemon is hung. this new code monitors that the recovery daemon is operating. if the recovery hangs, we log this and shut down the main daemon (This used to be ctdb commit cd69d292292eaab3aac0e9d9fc57cb621597c63c) 2008-09-09 13:44:46 +10:00
			`if (ctdb->tunable.recd_ping_timeout != 0) {`
ctdb-daemon: Stop using tevent compatibility definitions Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> 2015-10-26 16:50:09 +11:00			`tevent_add_timer(ctdb->ev, ctdb->recd_ping_count,`
			`timeval_current_ofs(ctdb->tunable.recd_ping_timeout, 0),`
			`ctdb_recd_ping_timeout, ctdb);`
additional monitoring between the two daemons. we currently only monitor that the dameons are running by kill(0, pid) and verifying the the domain socket between them is ok. this is not sufficient since we can have a situation where the recovery daemon is hung. this new code monitors that the recovery daemon is operating. if the recovery hangs, we log this and shut down the main daemon (This used to be ctdb commit cd69d292292eaab3aac0e9d9fc57cb621597c63c) 2008-09-09 13:44:46 +10:00			`}`

			`return 0;`
			`}`

add a context and a timed event so that once we have been in recovery mode for too long we drop all public ip addresses (This used to be ctdb commit 403c68f96e1380dd07217c688de2730464f77ea0) 2008-10-22 11:04:41 +11:00

			`int32_t ctdb_control_set_recmaster(struct ctdb_context *ctdb, uint32_t opcode, TDB_DATA indata)`
			`{`
ctdbd: Log a message when recovery master changes Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-Programmed-With: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 1f96ea08f9a39dfe537c9b957ac512c84dc76f91) 2013-05-14 16:20:32 +10:00			`uint32_t new_recmaster;`

add a context and a timed event so that once we have been in recovery mode for too long we drop all public ip addresses (This used to be ctdb commit 403c68f96e1380dd07217c688de2730464f77ea0) 2008-10-22 11:04:41 +11:00			`CHECK_CONTROL_DATA_SIZE(sizeof(uint32_t));`
ctdbd: Log a message when recovery master changes Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-Programmed-With: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 1f96ea08f9a39dfe537c9b957ac512c84dc76f91) 2013-05-14 16:20:32 +10:00			`new_recmaster = ((uint32_t *)(&indata.dptr[0]))[0];`

			`if (ctdb->pnn != new_recmaster && ctdb->recovery_master == ctdb->pnn) {`
			`DEBUG(DEBUG_NOTICE,`
			`("This node (%u) is no longer the recovery master\n", ctdb->pnn));`
			`}`

			`if (ctdb->pnn == new_recmaster && ctdb->recovery_master != new_recmaster) {`
			`DEBUG(DEBUG_NOTICE,`
			`("This node (%u) is now the recovery master\n", ctdb->pnn));`
			`}`
allow to change the recmaster even the database is not frozen (This used to be ctdb commit 03e2e436db5cfd29a56d13f5d2101e42389bfc94) 2008-11-21 16:24:12 +11:00
ctdbd: Log a message when recovery master changes Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-Programmed-With: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 1f96ea08f9a39dfe537c9b957ac512c84dc76f91) 2013-05-14 16:20:32 +10:00			`ctdb->recovery_master = new_recmaster;`
add a context and a timed event so that once we have been in recovery mode for too long we drop all public ip addresses (This used to be ctdb commit 403c68f96e1380dd07217c688de2730464f77ea0) 2008-10-22 11:04:41 +11:00			`return 0;`
			`}`
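
The two conditions in `ctdb_control_set_recmaster` separate "this node lost the role" from "this node gained the role" before the new value is stored. A minimal sketch of that classification as a pure function follows. The enum and the function name `demo_classify` are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

enum demo_change { DEMO_NONE, DEMO_LOST, DEMO_GAINED };

/* Classify a recovery master change from the point of view of node pnn. */
static enum demo_change demo_classify(uint32_t pnn, uint32_t old_master,
				      uint32_t new_master)
{
	if (pnn != new_master && old_master == pnn) {
		return DEMO_LOST;    /* we were the master and no longer are */
	}
	if (pnn == new_master && old_master != new_master) {
		return DEMO_GAINED;  /* we were not the master and now are */
	}
	return DEMO_NONE;            /* nothing changed for this node */
}
```

Re-sending the current master, or a change between two other nodes, yields `DEMO_NONE`, which corresponds to neither message being logged.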
add two new controls, CTOP_NODE and CONTINUE_NODE that are used to stop/continue a node instead of using modflags messages (This used to be ctdb commit 54b4a02053a0f98f8c424e7f658890254023d39a) 2009-07-09 12:22:46 +10:00
create a new event : stopped. This event is called when a node is stopped and is used by eventscripts that need to do certain cleanup and removal of configuration or ip addresses or routing ... Note that a STOPPED node is considered "inactive" and as such will not be running the "recovered" event when the rest of the cluster has recovered. (This used to be ctdb commit 65e9309564611bf937ded3c74a79abff895d7c59) 2009-07-17 12:26:16 +10:00
ctdbd: Remove the "stopped" event It isn't used, superceded by "ipreallocated". Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit c2bb8596a8af6406ef50e53953884df9d6246a96) 2013-02-21 14:28:13 +11:00			`int32_t ctdb_control_stop_node(struct ctdb_context *ctdb)`
add two new controls, CTOP_NODE and CONTINUE_NODE that are used to stop/continue a node instead of using modflags messages (This used to be ctdb commit 54b4a02053a0f98f8c424e7f658890254023d39a) 2009-07-09 12:22:46 +10:00			`{`
ctdbd: Log node state transitions at higher debug level Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit db31dc48bd3135e9242af08bb79b67a17a2b1668) 2013-05-29 12:11:49 +10:00			`DEBUG(DEBUG_NOTICE, ("Stopping node\n"));`
create a new event : stopped. This event is called when a node is stopped and is used by eventscripts that need to do certain cleanup and removal of configuration or ip addresses or routing ... Note that a STOPPED node is considered "inactive" and as such will not be running the "recovered" event when the rest of the cluster has recovered. (This used to be ctdb commit 65e9309564611bf937ded3c74a79abff895d7c59) 2009-07-17 12:26:16 +10:00			`ctdb_disable_monitoring(ctdb);`
add two new controls, CTOP_NODE and CONTINUE_NODE that are used to stop/continue a node instead of using modflags messages (This used to be ctdb commit 54b4a02053a0f98f8c424e7f658890254023d39a) 2009-07-09 12:22:46 +10:00			`ctdb->nodes[ctdb->pnn]->flags |= NODE_FLAGS_STOPPED;`

			`return 0;`
			`}`

			`int32_t ctdb_control_continue_node(struct ctdb_context *ctdb)`
			`{`
ctdbd: Log node state transitions at higher debug level Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit db31dc48bd3135e9242af08bb79b67a17a2b1668) 2013-05-29 12:11:49 +10:00			`DEBUG(DEBUG_NOTICE, ("Continue node\n"));`
add two new controls, CTOP_NODE and CONTINUE_NODE that are used to stop/continue a node instead of using modflags messages (This used to be ctdb commit 54b4a02053a0f98f8c424e7f658890254023d39a) 2009-07-09 12:22:46 +10:00			`ctdb->nodes[ctdb->pnn]->flags &= ~NODE_FLAGS_STOPPED;`

			`return 0;`
			`}`
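
Stopping and continuing a node come down to setting and clearing one bit in the node's flags word while leaving the other bits alone. A minimal sketch of the two operations follows. `DEMO_FLAG_STOPPED` and its value are assumptions made for the example and are not taken from the ctdb headers.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_FLAG_STOPPED 0x20u  /* hypothetical value, for illustration only */

/* Set the stopped bit, keeping all other flags. */
static uint32_t demo_stop(uint32_t flags)
{
	return flags | DEMO_FLAG_STOPPED;
}

/* Clear the stopped bit, keeping all other flags. */
static uint32_t demo_continue(uint32_t flags)
{
	return flags & ~DEMO_FLAG_STOPPED;
}
```

Both operations are idempotent, so stopping an already stopped node or continuing a running one leaves the flags unchanged.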
ReadOnly: add a new control to activate readonly lock capability for a database. let all databases default to not support this until enabled through this control (This used to be ctdb commit 908a07c42e5135a3ba30a625fc4f4e4916de197a) 2011-09-01 11:08:18 +10:00
