samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2024-12-23 17:34:34 +03:00

281 lines

7.6 KiB

C


add a missing file :-) (This used to be ctdb commit 29cf1b927f2cebfdc43e22d32a270e956716e2c5) 2007-05-18 14:06:29 +04:00			`/*`
			`monitoring links to all other nodes to detect dead nodes`


			`Copyright (C) Ronnie Sahlberg 2007`

ctdb is GPL not LGPL (This used to be ctdb commit 8624378010d1c2a1438e1e701339dfba7276f960) 2007-05-31 07:50:53 +04:00			`This program is free software; you can redistribute it and/or modify`
			`it under the terms of the GNU General Public License as published by`
update lib/replace from samba4 (This used to be ctdb commit f0555484105668c01c21f56322992e752e831109) 2007-07-10 09:29:31 +04:00			`the Free Software Foundation; either version 3 of the License, or`
ctdb is GPL not LGPL (This used to be ctdb commit 8624378010d1c2a1438e1e701339dfba7276f960) 2007-05-31 07:50:53 +04:00			`(at your option) any later version.`

			`This program is distributed in the hope that it will be useful,`
add a missing file :-) (This used to be ctdb commit 29cf1b927f2cebfdc43e22d32a270e956716e2c5) 2007-05-18 14:06:29 +04:00			`but WITHOUT ANY WARRANTY; without even the implied warranty of`
ctdb is GPL not LGPL (This used to be ctdb commit 8624378010d1c2a1438e1e701339dfba7276f960) 2007-05-31 07:50:53 +04:00			`MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the`
			`GNU General Public License for more details.`

			`You should have received a copy of the GNU General Public License`
update lib/replace from samba4 (This used to be ctdb commit f0555484105668c01c21f56322992e752e831109) 2007-07-10 09:29:31 +04:00			`along with this program; if not, see <http://www.gnu.org/licenses/>.`
add a missing file :-) (This used to be ctdb commit 29cf1b927f2cebfdc43e22d32a270e956716e2c5) 2007-05-18 14:06:29 +04:00			`*/`

			`#include "includes.h"`
			`#include "lib/events/events.h"`
			`#include "system/filesys.h"`
			`#include "system/wait.h"`
			`#include "../include/ctdb_private.h"`

			`/*`
- up rx_cnt on all packet types - notice when a node becomes available again (This used to be ctdb commit e05110dd6112e81f224937dfd7370d963ce9531a) 2007-05-18 17:23:36 +04:00			`see if any nodes are dead`
			`*/`
add a missing file :-) (This used to be ctdb commit 29cf1b927f2cebfdc43e22d32a270e956716e2c5) 2007-05-18 14:06:29 +04:00			`static void ctdb_check_for_dead_nodes(struct event_context *ev, struct timed_event *te,`
added timeouts in all event scripts (This used to be ctdb commit d986c91a607ed7c7d4869ea786b5cdf80e7862f1) 2007-06-06 07:45:12 +04:00			`struct timeval t, void *private_data)`
add a missing file :-) (This used to be ctdb commit 29cf1b927f2cebfdc43e22d32a270e956716e2c5) 2007-05-18 14:06:29 +04:00			`{`
			`struct ctdb_context *ctdb = talloc_get_type(private_data, struct ctdb_context);`
			`int i;`

added health monitoring logic to ctdb, so a node loses its public IP address if one of the sybsystem event scripts reports a problem (This used to be ctdb commit c7a089256d86cec21097453bce5acbccee87413f) 2007-06-06 04:25:46 +04:00			`if (ctdb->monitoring_mode == CTDB_MONITORING_DISABLED) {`
added timeouts in all event scripts (This used to be ctdb commit d986c91a607ed7c7d4869ea786b5cdf80e7862f1) 2007-06-06 07:45:12 +04:00			`event_add_timed(ctdb->ev, ctdb->monitor_context,`
added health monitoring logic to ctdb, so a node loses its public IP address if one of the sybsystem event scripts reports a problem (This used to be ctdb commit c7a089256d86cec21097453bce5acbccee87413f) 2007-06-06 04:25:46 +04:00			`timeval_current_ofs(ctdb->tunable.keepalive_interval, 0),`
add controls to enable/disable the monitoring of dead nodes (This used to be ctdb commit 79d29c39bb81feb069db3fc6d3d392c1e75a4d13) 2007-05-21 03:24:34 +04:00			`ctdb_check_for_dead_nodes, ctdb);`
			`return;`
			`}`

add a missing file :-) (This used to be ctdb commit 29cf1b927f2cebfdc43e22d32a270e956716e2c5) 2007-05-18 14:06:29 +04:00			`/* send a keepalive to all other nodes, unless */`
			`for (i=0;i<ctdb->num_nodes;i++) {`
- up rx_cnt on all packet types - notice when a node becomes available again (This used to be ctdb commit e05110dd6112e81f224937dfd7370d963ce9531a) 2007-05-18 17:23:36 +04:00			`struct ctdb_node *node = ctdb->nodes[i];`
change ctdb->vnn to ctdb->pnn (This used to be ctdb commit 8c776e5707e503ec6586aae39ac6b3ea5a2fd2bc) 2007-09-04 04:06:36 +04:00			`if (node->pnn == ctdb->pnn) {`
add a missing file :-) (This used to be ctdb commit 29cf1b927f2cebfdc43e22d32a270e956716e2c5) 2007-05-18 14:06:29 +04:00			`continue;`
			`}`
- up rx_cnt on all packet types - notice when a node becomes available again (This used to be ctdb commit e05110dd6112e81f224937dfd7370d963ce9531a) 2007-05-18 17:23:36 +04:00
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`if (node->flags & NODE_FLAGS_DISCONNECTED) {`
- moved cmdline options that are only relevant to ctdbd into ctdbd.c - fixed a valgrind error on failing to send a control - don't mark node dead when already disconnected - moved node list lock code into common code (This used to be ctdb commit bcc0432d0fea7ef223f82ccee81cf35c18144b1b) 2007-06-02 04:03:28 +04:00			`/* it might have come alive again */`
			`if (node->rx_cnt != 0) {`
			`ctdb_node_connected(node);`
			`}`
merge from ronnie (This used to be ctdb commit 985d718e03510398b9a5cfdf6a4d559a90738a11) 2007-05-19 11:21:58 +04:00			`continue;`
add a missing file :-) (This used to be ctdb commit 29cf1b927f2cebfdc43e22d32a270e956716e2c5) 2007-05-18 14:06:29 +04:00			`}`

- moved cmdline options that are only relevant to ctdbd into ctdbd.c - fixed a valgrind error on failing to send a control - don't mark node dead when already disconnected - moved node list lock code into common code (This used to be ctdb commit bcc0432d0fea7ef223f82ccee81cf35c18144b1b) 2007-06-02 04:03:28 +04:00
- up rx_cnt on all packet types - notice when a node becomes available again (This used to be ctdb commit e05110dd6112e81f224937dfd7370d963ce9531a) 2007-05-18 17:23:36 +04:00			`if (node->rx_cnt == 0) {`
			`node->dead_count++;`
add a missing file :-) (This used to be ctdb commit 29cf1b927f2cebfdc43e22d32a270e956716e2c5) 2007-05-18 14:06:29 +04:00			`} else {`
- up rx_cnt on all packet types - notice when a node becomes available again (This used to be ctdb commit e05110dd6112e81f224937dfd7370d963ce9531a) 2007-05-18 17:23:36 +04:00			`node->dead_count = 0;`
add a missing file :-) (This used to be ctdb commit 29cf1b927f2cebfdc43e22d32a270e956716e2c5) 2007-05-18 14:06:29 +04:00			`}`

- up rx_cnt on all packet types - notice when a node becomes available again (This used to be ctdb commit e05110dd6112e81f224937dfd7370d963ce9531a) 2007-05-18 17:23:36 +04:00			`node->rx_cnt = 0;`

added health monitoring logic to ctdb, so a node loses its public IP address if one of the sybsystem event scripts reports a problem (This used to be ctdb commit c7a089256d86cec21097453bce5acbccee87413f) 2007-06-06 04:25:46 +04:00			`if (node->dead_count >= ctdb->tunable.keepalive_limit) {`
change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`DEBUG(0,("dead count reached for node %u\n", node->pnn));`
use ctdb_dead_node() instead of reimplementing the same code again this leaves only one single function where a node is marked as dead instead of two places (This used to be ctdb commit aa764ea26cc26d5c1ae188105236da603576f45b) 2007-05-19 10:59:10 +04:00			`ctdb_node_dead(node);`
change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`ctdb_send_keepalive(ctdb, node->pnn);`
- up rx_cnt on all packet types - notice when a node becomes available again (This used to be ctdb commit e05110dd6112e81f224937dfd7370d963ce9531a) 2007-05-18 17:23:36 +04:00			`/* maybe tell the transport layer to kill the`
			`sockets as well?`
add a missing file :-) (This used to be ctdb commit 29cf1b927f2cebfdc43e22d32a270e956716e2c5) 2007-05-18 14:06:29 +04:00			`*/`
			`continue;`
			`}`
add a node->tx_cnt counter only send keepalive packets if the count is zero (This used to be ctdb commit 2cbd424231caccf0a531cf6501761115efe68f3e) 2007-05-19 04:20:19 +04:00
- moved cmdline options that are only relevant to ctdbd into ctdbd.c - fixed a valgrind error on failing to send a control - don't mark node dead when already disconnected - moved node list lock code into common code (This used to be ctdb commit bcc0432d0fea7ef223f82ccee81cf35c18144b1b) 2007-06-02 04:03:28 +04:00			`if (node->tx_cnt == 0) {`
change how we do public addresses and takeover so that we can have multiple public addresses spread across multiple interfaces on each node. this is a massive patch since we have previously made the assumtion that we only have one public address per node. get rid of the public_interface argument. the public addresses file now explicitely lists which interface the address belongs to (This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8) 2007-09-04 03:50:07 +04:00			`DEBUG(5,("sending keepalive to %u\n", node->pnn));`
			`ctdb_send_keepalive(ctdb, node->pnn);`
add a node->tx_cnt counter only send keepalive packets if the count is zero (This used to be ctdb commit 2cbd424231caccf0a531cf6501761115efe68f3e) 2007-05-19 04:20:19 +04:00			`}`
add a missing file :-) (This used to be ctdb commit 29cf1b927f2cebfdc43e22d32a270e956716e2c5) 2007-05-18 14:06:29 +04:00
add a node->tx_cnt counter only send keepalive packets if the count is zero (This used to be ctdb commit 2cbd424231caccf0a531cf6501761115efe68f3e) 2007-05-19 04:20:19 +04:00			`node->tx_cnt = 0;`
add a missing file :-) (This used to be ctdb commit 29cf1b927f2cebfdc43e22d32a270e956716e2c5) 2007-05-18 14:06:29 +04:00			`}`

added timeouts in all event scripts (This used to be ctdb commit d986c91a607ed7c7d4869ea786b5cdf80e7862f1) 2007-06-06 07:45:12 +04:00			`event_add_timed(ctdb->ev, ctdb->monitor_context,`
added health monitoring logic to ctdb, so a node loses its public IP address if one of the sybsystem event scripts reports a problem (This used to be ctdb commit c7a089256d86cec21097453bce5acbccee87413f) 2007-06-06 04:25:46 +04:00			`timeval_current_ofs(ctdb->tunable.keepalive_interval, 0),`
add a missing file :-) (This used to be ctdb commit 29cf1b927f2cebfdc43e22d32a270e956716e2c5) 2007-05-18 14:06:29 +04:00			`ctdb_check_for_dead_nodes, ctdb);`
			`}`
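
The per-node bookkeeping in `ctdb_check_for_dead_nodes` can be read in isolation as a small counter state machine. The following is a minimal sketch, not the ctdb implementation: `struct sketch_node` and `sketch_tick` are hypothetical names introduced here, and the sketch assumes the node is already connected and leaves out keepalive sending, the disconnected-node path, and event rescheduling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the counters kept per node. */
struct sketch_node {
	uint32_t rx_cnt;     /* packets received since the previous tick */
	uint32_t dead_count; /* consecutive ticks without any traffic */
};

/*
 * Run one monitoring tick for a single connected node.
 * Returns true when dead_count has reached keepalive_limit,
 * i.e. when the caller would declare the node dead.
 */
static bool sketch_tick(struct sketch_node *node, uint32_t keepalive_limit)
{
	assert(node != NULL);

	if (node->rx_cnt == 0) {
		/* nothing heard during this interval */
		node->dead_count++;
	} else {
		/* any traffic at all resets the counter */
		node->dead_count = 0;
	}

	/* start counting afresh for the next interval */
	node->rx_cnt = 0;

	return node->dead_count >= keepalive_limit;
}
```

With a limit of 3, a node is reported dead only after three consecutive silent ticks; a single received packet in between restarts the count.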

added health monitoring logic to ctdb, so a node loses its public IP address if one of the sybsystem event scripts reports a problem (This used to be ctdb commit c7a089256d86cec21097453bce5acbccee87413f) 2007-06-06 04:25:46 +04:00			`static void ctdb_check_health(struct event_context *ev, struct timed_event *te,`
			`struct timeval t, void *private_data);`

			`/*`
			`called when a health monitoring event script finishes`
			`*/`
			`static void ctdb_health_callback(struct ctdb_context *ctdb, int status, void *p)`
			`{`
change ctdb->vnn to ctdb->pnn (This used to be ctdb commit 8c776e5707e503ec6586aae39ac6b3ea5a2fd2bc) 2007-09-04 04:06:36 +04:00			`struct ctdb_node *node = ctdb->nodes[ctdb->pnn];`
added health monitoring logic to ctdb, so a node loses its public IP address if one of the sybsystem event scripts reports a problem (This used to be ctdb commit c7a089256d86cec21097453bce5acbccee87413f) 2007-06-06 04:25:46 +04:00			`TDB_DATA data;`
			`struct ctdb_node_flag_change c;`
run monitoring more quickly when unhealthy and at startup (This used to be ctdb commit ff1c205928e3ef5bcc6bf4e4b2122a19fa38d8f4) 2007-09-24 04:12:18 +04:00			`uint32_t next_interval;`
added health monitoring logic to ctdb, so a node loses its public IP address if one of the sybsystem event scripts reports a problem (This used to be ctdb commit c7a089256d86cec21097453bce5acbccee87413f) 2007-06-06 04:25:46 +04:00
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`c.pnn = ctdb->pnn;`
change the structure used for node flag change messages so that we can see both the old flags as well as the new flags (so we can tell which flags changed) send the CTDB_SRVID_RECONFIGURE messages to connected nodes only, not to every node, connected or not, in the cluster. in the handler inside the recovery daemon which is invoked for node flag change messages, only do a takeover_run() and redistribute the ip addresses IF it was the disabled or the unhealthy flags that changed. Also send out the cluster reconfigured message in this case. If any of the other flags changed we dont need to do the takeover_run(0 here since that will be done during recovery. (This used to be ctdb commit 5549b2058e2c148a8ca9d419123acf3247bb8829) 2007-08-21 11:25:15 +04:00			`c.old_flags = node->flags;`

merged admin enable/disable change from ronnie (This used to be ctdb commit df17b69dfd83a98f9c711994c7dd51ad2cc0ab8a) 2007-06-07 05:15:22 +04:00			`if (status != 0 && !(node->flags & NODE_FLAGS_UNHEALTHY)) {`
added health monitoring logic to ctdb, so a node loses its public IP address if one of the sybsystem event scripts reports a problem (This used to be ctdb commit c7a089256d86cec21097453bce5acbccee87413f) 2007-06-06 04:25:46 +04:00			`DEBUG(0,("monitor event failed - disabling node\n"));`
merged admin enable/disable change from ronnie (This used to be ctdb commit df17b69dfd83a98f9c711994c7dd51ad2cc0ab8a) 2007-06-07 05:15:22 +04:00			`node->flags |= NODE_FLAGS_UNHEALTHY;`
			`} else if (status == 0 && (node->flags & NODE_FLAGS_UNHEALTHY)) {`
added health monitoring logic to ctdb, so a node loses its public IP address if one of the sybsystem event scripts reports a problem (This used to be ctdb commit c7a089256d86cec21097453bce5acbccee87413f) 2007-06-06 04:25:46 +04:00			`DEBUG(0,("monitor event OK - node re-enabled\n"));`
run monitoring more quickly when unhealthy and at startup (This used to be ctdb commit ff1c205928e3ef5bcc6bf4e4b2122a19fa38d8f4) 2007-09-24 04:12:18 +04:00			`node->flags &= ~NODE_FLAGS_UNHEALTHY;`
			`}`

			`if (node->flags & NODE_FLAGS_UNHEALTHY) {`
			`next_interval = ctdb->tunable.monitor_retry;`
added health monitoring logic to ctdb, so a node loses its public IP address if one of the sybsystem event scripts reports a problem (This used to be ctdb commit c7a089256d86cec21097453bce5acbccee87413f) 2007-06-06 04:25:46 +04:00			`} else {`
run monitoring more quickly when unhealthy and at startup (This used to be ctdb commit ff1c205928e3ef5bcc6bf4e4b2122a19fa38d8f4) 2007-09-24 04:12:18 +04:00			`next_interval = ctdb->tunable.monitor_interval;`
			`}`

			`event_add_timed(ctdb->ev, ctdb->monitor_context,`
			`timeval_current_ofs(next_interval, 0),`
			`ctdb_check_health, ctdb);`

			`if (c.old_flags == node->flags) {`
added health monitoring logic to ctdb, so a node loses its public IP address if one of the sybsystem event scripts reports a problem (This used to be ctdb commit c7a089256d86cec21097453bce5acbccee87413f) 2007-06-06 04:25:46 +04:00			`return;`
			`}`

change the structure used for node flag change messages so that we can see both the old flags as well as the new flags (so we can tell which flags changed) send the CTDB_SRVID_RECONFIGURE messages to connected nodes only, not to every node, connected or not, in the cluster. in the handler inside the recovery daemon which is invoked for node flag change messages, only do a takeover_run() and redistribute the ip addresses IF it was the disabled or the unhealthy flags that changed. Also send out the cluster reconfigured message in this case. If any of the other flags changed we dont need to do the takeover_run(0 here since that will be done during recovery. (This used to be ctdb commit 5549b2058e2c148a8ca9d419123acf3247bb8829) 2007-08-21 11:25:15 +04:00			`c.new_flags = node->flags;`
added health monitoring logic to ctdb, so a node loses its public IP address if one of the sybsystem event scripts reports a problem (This used to be ctdb commit c7a089256d86cec21097453bce5acbccee87413f) 2007-06-06 04:25:46 +04:00
			`data.dptr = (uint8_t *)&c;`
			`data.dsize = sizeof(c);`

ensure all nodes display disabled nodes correctly (This used to be ctdb commit 959f82cfe926994658f5826007caccb0409003e1) 2007-06-06 15:27:09 +04:00			`/* tell the other nodes that something has changed */`
propogate flag changes to all connected nodes (This used to be ctdb commit 711d1f7e20f1e98caaf08a57df0b1825ff6e97a0) 2007-06-09 15:58:50 +04:00			`ctdb_daemon_send_message(ctdb, CTDB_BROADCAST_CONNECTED,`
ensure all nodes display disabled nodes correctly (This used to be ctdb commit 959f82cfe926994658f5826007caccb0409003e1) 2007-06-06 15:27:09 +04:00			`CTDB_SRVID_NODE_FLAGS_CHANGED, data);`
merged admin enable/disable change from ronnie (This used to be ctdb commit df17b69dfd83a98f9c711994c7dd51ad2cc0ab8a) 2007-06-07 05:15:22 +04:00
added health monitoring logic to ctdb, so a node loses its public IP address if one of the sybsystem event scripts reports a problem (This used to be ctdb commit c7a089256d86cec21097453bce5acbccee87413f) 2007-06-06 04:25:46 +04:00			`}`


prevent a deadly embrace between smbd and ctdbd by moving the calling of the startup event scripts after the point where recovery has started and the node is in normal operation This makes the 'startup' script just a special type of the 'monitor' script which is called first (This used to be ctdb commit 7424c30a5fd04aea0137c466b4318c3f185280d8) 2007-11-12 02:53:11 +03:00			`/*`
			`called when the startup event script finishes`
			`*/`
			`static void ctdb_startup_callback(struct ctdb_context *ctdb, int status, void *p)`
			`{`
			`if (status != 0) {`
			`DEBUG(0,("startup event failed\n"));`
			`} else if (status == 0) {`
			`DEBUG(0,("startup event OK - enabling monitoring\n"));`
			`ctdb->done_startup = true;`
			`}`

			`if (ctdb->done_startup) {`
			`event_add_timed(ctdb->ev, ctdb->monitor_context,`
			`timeval_zero(),`
			`ctdb_check_health, ctdb);`
			`} else {`
			`event_add_timed(ctdb->ev, ctdb->monitor_context,`
			`timeval_current_ofs(ctdb->tunable.monitor_interval, 0),`
			`ctdb_check_health, ctdb);`
			`}`

			`}`


added health monitoring logic to ctdb, so a node loses its public IP address if one of the sybsystem event scripts reports a problem (This used to be ctdb commit c7a089256d86cec21097453bce5acbccee87413f) 2007-06-06 04:25:46 +04:00			`/*`
			`see if the event scripts think we are healthy`
			`*/`
			`static void ctdb_check_health(struct event_context *ev, struct timed_event *te,`
			`struct timeval t, void *private_data)`
			`{`
			`struct ctdb_context *ctdb = talloc_get_type(private_data, struct ctdb_context);`
			`int ret;`

don't do the first startup event until we are out of recovery (This used to be ctdb commit 689940eb6e23f16ee063331caf3986613a8963ea) 2007-11-12 05:10:15 +03:00			`if (ctdb->recovery_mode != CTDB_RECOVERY_NORMAL ||`
			`(ctdb->monitoring_mode == CTDB_MONITORING_DISABLED && ctdb->done_startup)) {`
added timeouts in all event scripts (This used to be ctdb commit d986c91a607ed7c7d4869ea786b5cdf80e7862f1) 2007-06-06 07:45:12 +04:00			`event_add_timed(ctdb->ev, ctdb->monitor_context,`
added health monitoring logic to ctdb, so a node loses its public IP address if one of the sybsystem event scripts reports a problem (This used to be ctdb commit c7a089256d86cec21097453bce5acbccee87413f) 2007-06-06 04:25:46 +04:00			`timeval_current_ofs(ctdb->tunable.monitor_interval, 0),`
			`ctdb_check_health, ctdb);`
			`return;`
			`}`

prevent a deadly embrace between smbd and ctdbd by moving the calling of the startup event scripts after the point where recovery has started and the node is in normal operation This makes the 'startup' script just a special type of the 'monitor' script which is called first (This used to be ctdb commit 7424c30a5fd04aea0137c466b4318c3f185280d8) 2007-11-12 02:53:11 +03:00			`if (!ctdb->done_startup) {`
			`ret = ctdb_event_script_callback(ctdb,`
			`timeval_current_ofs(ctdb->tunable.script_timeout, 0),`
			`ctdb->monitor_context, ctdb_startup_callback,`
			`ctdb, "startup");`
			`} else {`
			`ret = ctdb_event_script_callback(ctdb,`
			`timeval_current_ofs(ctdb->tunable.script_timeout, 0),`
			`ctdb->monitor_context, ctdb_health_callback,`
			`ctdb, "monitor");`
			`}`

added health monitoring logic to ctdb, so a node loses its public IP address if one of the sybsystem event scripts reports a problem (This used to be ctdb commit c7a089256d86cec21097453bce5acbccee87413f) 2007-06-06 04:25:46 +04:00			`if (ret != 0) {`
			`DEBUG(0,("Unable to launch monitor event script\n"));`
added timeouts in all event scripts (This used to be ctdb commit d986c91a607ed7c7d4869ea786b5cdf80e7862f1) 2007-06-06 07:45:12 +04:00			`event_add_timed(ctdb->ev, ctdb->monitor_context,`
run monitoring more quickly when unhealthy and at startup (This used to be ctdb commit ff1c205928e3ef5bcc6bf4e4b2122a19fa38d8f4) 2007-09-24 04:12:18 +04:00			`timeval_current_ofs(ctdb->tunable.monitor_retry, 0),`
added health monitoring logic to ctdb, so a node loses its public IP address if one of the sybsystem event scripts reports a problem (This used to be ctdb commit c7a089256d86cec21097453bce5acbccee87413f) 2007-06-06 04:25:46 +04:00			`ctdb_check_health, ctdb);`
			`}`
			`}`

added timeouts in all event scripts (This used to be ctdb commit d986c91a607ed7c7d4869ea786b5cdf80e7862f1) 2007-06-06 07:45:12 +04:00			`/* stop any monitoring */`
			`void ctdb_stop_monitoring(struct ctdb_context *ctdb)`
			`{`
			`talloc_free(ctdb->monitor_context);`
			`ctdb->monitor_context = talloc_new(ctdb);`
			`CTDB_NO_MEMORY_FATAL(ctdb, ctdb->monitor_context);`
			`}`
added health monitoring logic to ctdb, so a node loses its public IP address if one of the sybsystem event scripts reports a problem (This used to be ctdb commit c7a089256d86cec21097453bce5acbccee87413f) 2007-06-06 04:25:46 +04:00
- up rx_cnt on all packet types - notice when a node becomes available again (This used to be ctdb commit e05110dd6112e81f224937dfd7370d963ce9531a) 2007-05-18 17:23:36 +04:00			`/*`
			`start watching for nodes that might be dead`
			`*/`
added timeouts in all event scripts (This used to be ctdb commit d986c91a607ed7c7d4869ea786b5cdf80e7862f1) 2007-06-06 07:45:12 +04:00			`void ctdb_start_monitoring(struct ctdb_context *ctdb)`
add a missing file :-) (This used to be ctdb commit 29cf1b927f2cebfdc43e22d32a270e956716e2c5) 2007-05-18 14:06:29 +04:00			`{`
added timeouts in all event scripts (This used to be ctdb commit d986c91a607ed7c7d4869ea786b5cdf80e7862f1) 2007-06-06 07:45:12 +04:00			`struct timed_event *te;`

			`ctdb_stop_monitoring(ctdb);`

			`te = event_add_timed(ctdb->ev, ctdb->monitor_context,`
			`timeval_current_ofs(ctdb->tunable.keepalive_interval, 0),`
			`ctdb_check_for_dead_nodes, ctdb);`
			`CTDB_NO_MEMORY_FATAL(ctdb, te);`

			`te = event_add_timed(ctdb->ev, ctdb->monitor_context,`
run monitoring more quickly when unhealthy and at startup (This used to be ctdb commit ff1c205928e3ef5bcc6bf4e4b2122a19fa38d8f4) 2007-09-24 04:12:18 +04:00			`timeval_current_ofs(ctdb->tunable.monitor_retry, 0),`
added timeouts in all event scripts (This used to be ctdb commit d986c91a607ed7c7d4869ea786b5cdf80e7862f1) 2007-06-06 07:45:12 +04:00			`ctdb_check_health, ctdb);`
			`CTDB_NO_MEMORY_FATAL(ctdb, te);`
add a missing file :-) (This used to be ctdb commit 29cf1b927f2cebfdc43e22d32a270e956716e2c5) 2007-05-18 14:06:29 +04:00			`}`
merged admin enable/disable change from ronnie (This used to be ctdb commit df17b69dfd83a98f9c711994c7dd51ad2cc0ab8a) 2007-06-07 05:15:22 +04:00

			`/*`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`modify flags on a node`
merged admin enable/disable change from ronnie (This used to be ctdb commit df17b69dfd83a98f9c711994c7dd51ad2cc0ab8a) 2007-06-07 05:15:22 +04:00			`*/`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`int32_t ctdb_control_modflags(struct ctdb_context *ctdb, TDB_DATA indata)`
merged admin enable/disable change from ronnie (This used to be ctdb commit df17b69dfd83a98f9c711994c7dd51ad2cc0ab8a) 2007-06-07 05:15:22 +04:00			`{`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`struct ctdb_node_modflags *m = (struct ctdb_node_modflags *)indata.dptr;`
merged admin enable/disable change from ronnie (This used to be ctdb commit df17b69dfd83a98f9c711994c7dd51ad2cc0ab8a) 2007-06-07 05:15:22 +04:00			`TDB_DATA data;`
			`struct ctdb_node_flag_change c;`
change ctdb->vnn to ctdb->pnn (This used to be ctdb commit 8c776e5707e503ec6586aae39ac6b3ea5a2fd2bc) 2007-09-04 04:06:36 +04:00			`struct ctdb_node *node = ctdb->nodes[ctdb->pnn];`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`uint32_t old_flags = node->flags;`
merged admin enable/disable change from ronnie (This used to be ctdb commit df17b69dfd83a98f9c711994c7dd51ad2cc0ab8a) 2007-06-07 05:15:22 +04:00
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`node->flags |= m->set;`
			`node->flags &= ~m->clear;`

			`if (node->flags == old_flags) {`
if we get a modflag control but the flags remain unchanged, log this (This used to be ctdb commit 5a0cd9b37b21665054bd35facd87f0a6ff4dcd55) 2007-11-23 02:31:51 +03:00			`DEBUG(0, ("Control modflags on node %u - Unchanged - flags 0x%x\n", ctdb->pnn, node->flags));`
implement a scheme where nodes are banned if they continuously caused the cluster to start a recovery session. The node is banned from the cluster for the RecoveryBanPeriod (default of 5 minutes) (This used to be ctdb commit 4ad43dd07f526b6002477177fbf55483246c2c0c) 2007-06-07 09:18:55 +04:00			`return 0;`
merged admin enable/disable change from ronnie (This used to be ctdb commit df17b69dfd83a98f9c711994c7dd51ad2cc0ab8a) 2007-06-07 05:15:22 +04:00			`}`

change ctdb->vnn to ctdb->pnn (This used to be ctdb commit 8c776e5707e503ec6586aae39ac6b3ea5a2fd2bc) 2007-09-04 04:06:36 +04:00			`DEBUG(0, ("Control modflags on node %u - flags now 0x%x\n", ctdb->pnn, node->flags));`
added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00
			`/* if we have been banned, go into recovery mode */`
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn change ctdb_ban_info.vnn to ctdb_ban_info.pnn (This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a) 2007-09-04 04:33:10 +04:00			`c.pnn = ctdb->pnn;`
change the structure used for node flag change messages so that we can see both the old flags as well as the new flags (so we can tell which flags changed) send the CTDB_SRVID_RECONFIGURE messages to connected nodes only, not to every node, connected or not, in the cluster. in the handler inside the recovery daemon which is invoked for node flag change messages, only do a takeover_run() and redistribute the ip addresses IF it was the disabled or the unhealthy flags that changed. Also send out the cluster reconfigured message in this case. If any of the other flags changed we dont need to do the takeover_run(0 here since that will be done during recovery. (This used to be ctdb commit 5549b2058e2c148a8ca9d419123acf3247bb8829) 2007-08-21 11:25:15 +04:00			`c.old_flags = old_flags;`
			`c.new_flags = node->flags;`
merged admin enable/disable change from ronnie (This used to be ctdb commit df17b69dfd83a98f9c711994c7dd51ad2cc0ab8a) 2007-06-07 05:15:22 +04:00
			`data.dptr = (uint8_t *)&c;`
			`data.dsize = sizeof(c);`

			`/* tell the other nodes that something has changed */`
propogate flag changes to all connected nodes (This used to be ctdb commit 711d1f7e20f1e98caaf08a57df0b1825ff6e97a0) 2007-06-09 15:58:50 +04:00			`ctdb_daemon_send_message(ctdb, CTDB_BROADCAST_CONNECTED,`
merged admin enable/disable change from ronnie (This used to be ctdb commit df17b69dfd83a98f9c711994c7dd51ad2cc0ab8a) 2007-06-07 05:15:22 +04:00			`CTDB_SRVID_NODE_FLAGS_CHANGED, data);`
added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00
			`if ((node->flags & NODE_FLAGS_BANNED) && !(old_flags & NODE_FLAGS_BANNED)) {`
			`/* make sure we are frozen */`
			`DEBUG(0,("This node has been banned - forcing freeze and recovery\n"));`
when a node becomes banned its databases are no longer part of ctdb and it should thus no longer serve any database access calls until it has been reintroduced into the cluster. when becoming banned, reset the local generation id to 1 to prevent any further database access calls from other nodes from being processed. (This used to be ctdb commit b531021db43ebaa5f5d0ace28c59913d359bd8a8) 2007-08-22 04:38:35 +04:00			`/* Reset the generation id to 1 to make us ignore any`
			`REQ/REPLY CALL/DMASTER someone sends to us.`
			`We are now banned so we shouldn't service database calls`
			`anymore.`
			`*/`
create a define to represent the 'invalid' generation id we used in two places. create a new helper function to generate new generation id values that know about the invalid id and avoids generating it. update the ctdb status tool to know about the invalid generation id and print the string INVALID instead (This used to be ctdb commit 4fbcd189543cb8a92227fdcd3d158472e558ccda) 2007-08-22 06:38:31 +04:00			`ctdb->vnn_map->generation = INVALID_GENERATION;`
when a node becomes banned its databases are no longer part of ctdb and it should thus no longer serve any database access calls until it has been reintroduced into the cluster. when becoming banned, reset the local generation id to 1 to prevent any further database access calls from other nodes from being processed. (This used to be ctdb commit b531021db43ebaa5f5d0ace28c59913d359bd8a8) 2007-08-22 04:38:35 +04:00
- send tcp info to all connected nodes, not just vnnmap nodes - use a non-blocking freeze when banned - release all IPs when banned (This used to be ctdb commit 070e85e532b33b792f85c3e72eee205d906aaf85) 2007-06-10 02:46:33 +04:00			`ctdb_start_freeze(ctdb);`
			`ctdb_release_all_ips(ctdb);`
added admin commands to ban/unban nodes (This used to be ctdb commit 4dad04172e7e4955b5bf6444a85b19901c9683ad) 2007-06-07 10:34:33 +04:00			`ctdb->recovery_mode = CTDB_RECOVERY_ACTIVE;`
			`}`
merged admin enable/disable change from ronnie (This used to be ctdb commit df17b69dfd83a98f9c711994c7dd51ad2cc0ab8a) 2007-06-07 05:15:22 +04:00
			`return 0;`
			`}`
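
The flag arithmetic in `ctdb_control_modflags` (set bits first, then clear bits, then compare against the old value) can be sketched on its own. This is an illustrative sketch only: `struct sketch_modflags`, `sketch_apply_modflags`, `sketch_newly_banned`, and the value chosen for `SKETCH_FLAG_BANNED` are assumptions made for the example and are not taken from the ctdb headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit value, chosen only for this example. */
#define SKETCH_FLAG_BANNED 0x8u

/* Hypothetical mirror of a set/clear request. */
struct sketch_modflags {
	uint32_t set;   /* bits to turn on */
	uint32_t clear; /* bits to turn off */
};

/*
 * Apply a request in the same order as the listing above:
 * set first, then clear. A bit present in both masks ends up cleared.
 */
static uint32_t sketch_apply_modflags(uint32_t flags,
				      const struct sketch_modflags *m)
{
	assert(m != NULL);
	flags |= m->set;
	flags &= ~m->clear;
	return flags;
}

/* True only on the transition from not banned to banned. */
static bool sketch_newly_banned(uint32_t old_flags, uint32_t new_flags)
{
	return (new_flags & SKETCH_FLAG_BANNED) &&
	       !(old_flags & SKETCH_FLAG_BANNED);
}
```

Checking the transition rather than the current state means a repeated ban request on an already banned node would not trigger the freeze and IP release a second time.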
