samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2024-12-25 23:21:54 +03:00

690 lines

18 KiB

C

added support for persistent databases in ctdbd (This used to be ctdb commit 3115090a0d882beca9d70761130b74bb0821f201) 2007-09-21 06:24:02 +04:00			`/*`
			`persistent store logic`

			`Copyright (C) Andrew Tridgell 2007`
			`Copyright (C) Ronnie Sahlberg 2007`

			`This program is free software; you can redistribute it and/or modify`
			`it under the terms of the GNU General Public License as published by`
			`the Free Software Foundation; either version 3 of the License, or`
			`(at your option) any later version.`

			`This program is distributed in the hope that it will be useful,`
			`but WITHOUT ANY WARRANTY; without even the implied warranty of`
			`MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the`
			`GNU General Public License for more details.`

			`You should have received a copy of the GNU General Public License`
			`along with this program; if not, see <http://www.gnu.org/licenses/>.`
			`*/`

			`#include "includes.h"`
			`#include "system/filesys.h"`
			`#include "system/wait.h"`
			`#include "db_wrap.h"`
			`#include "lib/tdb/include/tdb.h"`
			`#include "../include/ctdb_private.h"`

			`struct ctdb_persistent_state {`
			`struct ctdb_context *ctdb;`
persistent: add a ctdb_db context to the ctdb_persistent_state struct. (This used to be ctdb commit a14917c983c3b9bbbf38f5ddeecdbbe5bde32364) 2011-02-23 02:23:18 +03:00			`struct ctdb_db_context *ctdb_db; /* used by trans3_commit */`
persistent: add a client context to the persistent_stat and track the db_id The db_id is tracked in the client context as an indication that a transaction commit is in progress. This is cleared in the persistent_state talloc destructor. This is in order to properly treat running trans3_commits if the client disconnects. (This used to be ctdb commit e886ff24f4e3e250944289db95916b948893d26c) 2011-02-23 19:35:27 +03:00			`struct ctdb_client *client; /* used by trans3_commit */`
added support for persistent databases in ctdbd (This used to be ctdb commit 3115090a0d882beca9d70761130b74bb0821f201) 2007-09-21 06:24:02 +04:00			`struct ctdb_req_control *c;`
			`const char *errormsg;`
			`uint32_t num_pending;`
			`int32_t status;`
return a more detailed error code from a trans2 commit error (This used to be ctdb commit 6915661a460cd589b441ac7cd8695f35c4e83113) 2008-08-08 03:58:49 +04:00			`uint32_t num_failed, num_sent;`
added support for persistent databases in ctdbd (This used to be ctdb commit 3115090a0d882beca9d70761130b74bb0821f201) 2007-09-21 06:24:02 +04:00			`};`

return a more detailed error code from a trans2 commit error (This used to be ctdb commit 6915661a460cd589b441ac7cd8695f35c4e83113) 2008-08-08 03:58:49 +04:00			`/*`
			`1) all nodes fail, and all nodes reply`
			`2) some nodes fail, all nodes reply`
			`3) some nodes timeout`
			`4) all nodes succeed`
			`*/`

added support for persistent databases in ctdbd (This used to be ctdb commit 3115090a0d882beca9d70761130b74bb0821f201) 2007-09-21 06:24:02 +04:00			`/*`
			`called when a node has acknowledged a ctdb_control_update_record call`
			`*/`
			`static void ctdb_persistent_callback(struct ctdb_context *ctdb,`
			`int32_t status, TDB_DATA data,`
			`const char *errormsg,`
			`void *private_data)`
			`{`
			`struct ctdb_persistent_state *state = talloc_get_type(private_data,`
			`struct ctdb_persistent_state);`
persistent: reduce indentation for the finishing moves in ctdb_persistent_callback (This used to be ctdb commit 2c2d1646eb753ea9561f085bcb101153267b052b) 2011-02-23 00:47:30 +03:00			`enum ctdb_trans2_commit_error etype;`
in ctdb_control_persistent_store() we must talloc_steal() the pointer to c to prevent it from being immediately freed (and our persistent store state with it) if we need to wait asynchronously for other nodes before we can reply back to the client (This used to be ctdb commit fa5915280933e4d2e7d4d07199829c9c2b87a335) 2007-09-21 09:19:33 +04:00
persistent_callback: ignore the update-recordreturn code of remote node in recovery If a recovery was started, then all further processing of the update_record controls sent by the trans3_commit control is disabled. The recovery should trigger sending the reply for the update record control when finished. (This used to be ctdb commit 12cf0619255b12230843cd8bb49cbfdea376ca2f) 2011-02-23 00:24:50 +03:00			`if (ctdb->recovery_mode != CTDB_RECOVERY_NORMAL) {`
			`DEBUG(DEBUG_INFO, ("ctdb_persistent_callback: ignoring reply "`
			`"during recovery\n"));`
			`return;`
			`}`

added support for persistent databases in ctdbd (This used to be ctdb commit 3115090a0d882beca9d70761130b74bb0821f201) 2007-09-21 06:24:02 +04:00			`if (status != 0) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,("ctdb_persistent_callback failed with status %d (%s)\n",`
persistent_callback: print "no error message given" instead of "(null)" (This used to be ctdb commit d871a38978219e004833608c11aae98fe47614b9) 2011-02-23 00:49:52 +03:00			`status, errormsg?errormsg:"no error message given"));`
added support for persistent databases in ctdbd (This used to be ctdb commit 3115090a0d882beca9d70761130b74bb0821f201) 2007-09-21 06:24:02 +04:00			`state->status = status;`
			`state->errormsg = errormsg;`
return a more detailed error code from a trans2 commit error (This used to be ctdb commit 6915661a460cd589b441ac7cd8695f35c4e83113) 2008-08-08 03:58:49 +04:00			`state->num_failed++;`
persistent: if a node failed to update_record, trigger a recovery and stop processing of the update_record replies in order to let the recovery finish the trans3_commit control. (This used to be ctdb commit cab95570dc1eefb08abbac5ae411c29f699b51cc) 2011-02-23 00:44:16 +03:00
			`/*`
			`* If a node failed to complete the update_record control,`
			`* then either a recovery is already running or something`
			`* bad is going on. So trigger a recovery and let the`
			`* recovery finish the transaction, sending back the reply`
			`* for the trans3_commit control to the client.`
			`*/`
			`ctdb->recovery_mode = CTDB_RECOVERY_ACTIVE;`
			`return;`
added support for persistent databases in ctdbd (This used to be ctdb commit 3115090a0d882beca9d70761130b74bb0821f201) 2007-09-21 06:24:02 +04:00			`}`
persistent: if a node failed to update_record, trigger a recovery and stop processing of the update_record replies in order to let the recovery finish the trans3_commit control. (This used to be ctdb commit cab95570dc1eefb08abbac5ae411c29f699b51cc) 2011-02-23 00:44:16 +03:00
added support for persistent databases in ctdbd (This used to be ctdb commit 3115090a0d882beca9d70761130b74bb0821f201) 2007-09-21 06:24:02 +04:00			`state->num_pending--;`
persistent: reduce indentation for the finishing moves in ctdb_persistent_callback (This used to be ctdb commit 2c2d1646eb753ea9561f085bcb101153267b052b) 2011-02-23 00:47:30 +03:00
			`if (state->num_pending != 0) {`
			`return;`
added support for persistent databases in ctdbd (This used to be ctdb commit 3115090a0d882beca9d70761130b74bb0821f201) 2007-09-21 06:24:02 +04:00			`}`
persistent: reduce indentation for the finishing moves in ctdb_persistent_callback (This used to be ctdb commit 2c2d1646eb753ea9561f085bcb101153267b052b) 2011-02-23 00:47:30 +03:00
			`if (state->num_failed == state->num_sent) {`
			`etype = CTDB_TRANS2_COMMIT_ALLFAIL;`
			`} else if (state->num_failed != 0) {`
			`etype = CTDB_TRANS2_COMMIT_SOMEFAIL;`
			`} else {`
			`etype = CTDB_TRANS2_COMMIT_SUCCESS;`
			`}`

			`ctdb_request_control_reply(state->ctdb, state->c, NULL, etype, state->errormsg);`
			`talloc_free(state);`
added support for persistent databases in ctdbd (This used to be ctdb commit 3115090a0d882beca9d70761130b74bb0821f201) 2007-09-21 06:24:02 +04:00			`}`
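
The final branch of `ctdb_persistent_callback` above maps the `num_sent` and `num_failed` counters to a commit result code. A minimal standalone sketch of that mapping follows. The enum and the function name `classify_commit` are hypothetical stand-ins for illustration only; the real constants are the `CTDB_TRANS2_COMMIT_*` values used above, and the timeout case is handled separately in `ctdb_persistent_store_timeout`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for CTDB_TRANS2_COMMIT_SUCCESS / _SOMEFAIL / _ALLFAIL. */
enum commit_outcome {
	COMMIT_SUCCESS,
	COMMIT_SOMEFAIL,
	COMMIT_ALLFAIL
};

/*
 * Mirror the if/else chain at the end of ctdb_persistent_callback:
 * the "all failed" test comes first, so num_sent == 0 with
 * num_failed == 0 is also reported as COMMIT_ALLFAIL.
 */
static enum commit_outcome classify_commit(uint32_t num_sent,
					   uint32_t num_failed)
{
	/* A node can only fail if a request was sent to it. */
	assert(num_failed <= num_sent);

	if (num_failed == num_sent) {
		return COMMIT_ALLFAIL;
	} else if (num_failed != 0) {
		return COMMIT_SOMEFAIL;
	}
	return COMMIT_SUCCESS;
}
```

With three requests sent, `classify_commit(3, 0)` yields `COMMIT_SUCCESS`, `classify_commit(3, 1)` yields `COMMIT_SOMEFAIL`, and `classify_commit(3, 3)` yields `COMMIT_ALLFAIL`.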

in ctdb_control_persistent_store() we must talloc_steal() the pointer to c to prevent it from being immediately freed (and our persistent store state with it) if we need to wait asynchronously for other nodes before we can reply back to the client (This used to be ctdb commit fa5915280933e4d2e7d4d07199829c9c2b87a335) 2007-09-21 09:19:33 +04:00			`/*`
			`called if persistent store times out`
			`*/`
			`static void ctdb_persistent_store_timeout(struct event_context *ev, struct timed_event *te,`
			`struct timeval t, void *private_data)`
			`{`
			`struct ctdb_persistent_state *state = talloc_get_type(private_data, struct ctdb_persistent_state);`
persistent_store_timout: do not really time out the trans3_commit control in recovery If a recovery was started, then all further processing of the update_record controls sent by the trans3_commit control and timing them out is disabled. The recovery should trigger sending the reply for the update record control when finished. (This used to be ctdb commit 983c1ca2e18ecd60fca69bfe9e116125cc695857) 2011-02-23 00:24:50 +03:00
			`if (state->ctdb->recovery_mode != CTDB_RECOVERY_NORMAL) {`
			`DEBUG(DEBUG_INFO, ("ctdb_persistent_store_timeout: ignoring "`
			`"timeout during recovery\n"));`
			`return;`
			`}`

return a more detailed error code from a trans2 commit error (This used to be ctdb commit 6915661a460cd589b441ac7cd8695f35c4e83113) 2008-08-08 03:58:49 +04:00			`ctdb_request_control_reply(state->ctdb, state->c, NULL, CTDB_TRANS2_COMMIT_TIMEOUT,`
			`"timeout in ctdb_persistent_state");`
in ctdb_control_persistent_store() we must talloc_steal() the pointer to c to prevent it from being immediately freed (and our persistent store state with it) if we need to wait asynchronously for other nodes before we can reply back to the client (This used to be ctdb commit fa5915280933e4d2e7d4d07199829c9c2b87a335) 2007-09-21 09:19:33 +04:00
			`talloc_free(state);`
			`}`
added support for persistent databases in ctdbd (This used to be ctdb commit 3115090a0d882beca9d70761130b74bb0821f201) 2007-09-21 06:24:02 +04:00
persistent: add ctdb_persistent_finish_trans3_commits(). This function walks all databases and checks for running trans3 commits. It sends replies to all of them (with error code) and ends them. To be called when a recovery finishes. (This used to be ctdb commit 70ba153b532528bdccea70c5ea28972257f384c1) 2011-02-23 19:38:40 +03:00			`/**`
			`* Finish pending trans3 commit controls, i.e. send`
			`* reply to the client. This is called by the end-recovery`
			`* control to fix the situation when a recovery interrupts`
server:persistent: fix a comment typo. Signed-off-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit 6455ce5e4980a63d56ed30f7059869c8356c12ea) 2013-02-22 14:36:00 +04:00			`* the usual progress of a transaction.`
persistent: add ctdb_persistent_finish_trans3_commits(). This function walks all databases and checks for running trans3 commits. It sends replies to all of them (with error code) and ends them. To be called when a recovery finishes. (This used to be ctdb commit 70ba153b532528bdccea70c5ea28972257f384c1) 2011-02-23 19:38:40 +03:00			`*/`
			`void ctdb_persistent_finish_trans3_commits(struct ctdb_context *ctdb)`
			`{`
			`struct ctdb_db_context *ctdb_db;`

			`if (ctdb->recovery_mode != CTDB_RECOVERY_NORMAL) {`
server:persistent: fix a debug message (copy'n'paste error) Signed-off-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit 87c89b7c2a14e2ee79a3efc7e8125842bc04bf23) 2013-02-22 15:42:10 +04:00			`DEBUG(DEBUG_INFO, ("ctdb_persistent_finish_trans3_commits: "`
			`"skipping execution when recovery is "`
			`"active\n"));`
persistent: add ctdb_persistent_finish_trans3_commits(). This function walks all databases and checks for running trans3 commits. It sends replies to all of them (with error code) and ends them. To be called when a recovery finishes. (This used to be ctdb commit 70ba153b532528bdccea70c5ea28972257f384c1) 2011-02-23 19:38:40 +03:00			`return;`
			`}`

			`for (ctdb_db = ctdb->db_list; ctdb_db; ctdb_db = ctdb_db->next) {`
			`struct ctdb_persistent_state *state;`

			`if (ctdb_db->persistent_state == NULL) {`
			`continue;`
			`}`

			`state = ctdb_db->persistent_state;`

			`ctdb_request_control_reply(ctdb, state->c, NULL,`
			`CTDB_TRANS2_COMMIT_SOMEFAIL,`
			`"trans3 commit ended by recovery");`

			`/* The destructor sets ctdb_db->persistent_state to NULL. */`
			`talloc_free(state);`
			`}`
			`}`

added support for persistent databases in ctdbd (This used to be ctdb commit 3115090a0d882beca9d70761130b74bb0821f201) 2007-09-21 06:24:02 +04:00			`/*`
added new multi-record transaction commit code (This used to be ctdb commit 9ff3380099fe6f4d39de126db0826971a10ee692) 2008-07-30 13:57:00 +04:00			`store a set of persistent records - called from a ctdb client when it has updated`
			`some records in a persistent database. The client will have the record`
added support for persistent databases in ctdbd (This used to be ctdb commit 3115090a0d882beca9d70761130b74bb0821f201) 2007-09-21 06:24:02 +04:00			`locked for the duration of this call. The client is the dmaster when`
			`this call is made`
			`*/`
added new multi-record transaction commit code (This used to be ctdb commit 9ff3380099fe6f4d39de126db0826971a10ee692) 2008-07-30 13:57:00 +04:00			`int32_t ctdb_control_trans2_commit(struct ctdb_context *ctdb,`
			`struct ctdb_req_control *c,`
			`TDB_DATA recdata, bool *async_reply)`
added support for persistent databases in ctdbd (This used to be ctdb commit 3115090a0d882beca9d70761130b74bb0821f201) 2007-09-21 06:24:02 +04:00			`{`
Add two new controls to start and cancel a persistent update. This allows ctdb to automatically start a new full blown recovery if a client has started updating the local tdb for a persistent database but is kill -9ed before it has ensured the update is distributed clusterwide. (This used to be ctdb commit 1ffccb3e0b3b5bd376c5302304029af393709518) 2008-07-17 07:50:55 +04:00			`struct ctdb_client *client = ctdb_reqid_find(ctdb, c->client_id, struct ctdb_client);`
added support for persistent databases in ctdbd (This used to be ctdb commit 3115090a0d882beca9d70761130b74bb0821f201) 2007-09-21 06:24:02 +04:00			`struct ctdb_persistent_state *state;`
			`int i;`
Fix persistent transaction commit race condition. In ctdb_client.c:ctdb_transaction_commit(), after a failed TRANS2_COMMIT control call (for instance due to the 1-second being exceeded waiting for a busy node's reply), there is a 1-second gap between the transaction_cancel() and replay_transaction() calls in which there is no lock on the persistent db. And due to the lack of global state indicating that a transaction is in progress in ctdbd, other nodes may succeed to start transactions on the db in this gap and even worse work on top of the possibly already pushed changes. So the data diverges on the several nodes. This change fixes this by introducing global state for a transaction commit being active in the ctdb_db_context struct and in a db_id field in the client so that a client keeps track of _which_ tdb it as transaction commit running on. These data are set by ctdb upon entering the trans2_commit control and they are cleared in the trans2_error or trans2_finished controls. This makes it impossible to start a nother transaction or migrate a record to a different node while a transaction is active on a persistent tdb, including the retry loop. This approach is dead lock free and still allows recovery process to be started in the retry-gap between cancel and replay. Also note, that this solution does not require any change in the client side. This was debugged and developed together with Stefan Metzmacher <metze@samba.org> - thanks! Michael (This used to be ctdb commit f88103516e5ad723062fb95fcb07a128f1069d69) 2009-07-21 13:30:38 +04:00			`struct ctdb_marshall_buffer *m = (struct ctdb_marshall_buffer *)recdata.dptr;`
			`struct ctdb_db_context *ctdb_db;`

			`ctdb_db = find_ctdb_db(ctdb, m->db_id);`
			`if (ctdb_db == NULL) {`
			`DEBUG(DEBUG_ERR,(__location__ " ctdb_control_trans2_commit: "`
server: uniformly log db and client ids as 8-digit hex numbers in trans2_commit Michael (This used to be ctdb commit 2febdd23f754a2d4699bed36b941442ab362a376) 2009-10-29 15:44:39 +03:00			`"Unknown database db_id[0x%08x]\n", m->db_id));`
Fix persistent transaction commit race condition. In ctdb_client.c:ctdb_transaction_commit(), after a failed TRANS2_COMMIT control call (for instance due to the 1-second being exceeded waiting for a busy node's reply), there is a 1-second gap between the transaction_cancel() and replay_transaction() calls in which there is no lock on the persistent db. And due to the lack of global state indicating that a transaction is in progress in ctdbd, other nodes may succeed to start transactions on the db in this gap and even worse work on top of the possibly already pushed changes. So the data diverges on the several nodes. This change fixes this by introducing global state for a transaction commit being active in the ctdb_db_context struct and in a db_id field in the client so that a client keeps track of _which_ tdb it as transaction commit running on. These data are set by ctdb upon entering the trans2_commit control and they are cleared in the trans2_error or trans2_finished controls. This makes it impossible to start a nother transaction or migrate a record to a different node while a transaction is active on a persistent tdb, including the retry loop. This approach is dead lock free and still allows recovery process to be started in the retry-gap between cancel and replay. Also note, that this solution does not require any change in the client side. This was debugged and developed together with Stefan Metzmacher <metze@samba.org> - thanks! Michael (This used to be ctdb commit f88103516e5ad723062fb95fcb07a128f1069d69) 2009-07-21 13:30:38 +04:00			`return -1;`
			`}`
added support for persistent databases in ctdbd (This used to be ctdb commit 3115090a0d882beca9d70761130b74bb0821f201) 2007-09-21 06:24:02 +04:00
Add two new controls to start and cancel a persistent update. This allows ctdb to automatically start a new full blown recovery if a client has started updating the local tdb for a persistent database but is kill -9ed before it has ensured the update is distributed clusterwide. (This used to be ctdb commit 1ffccb3e0b3b5bd376c5302304029af393709518) 2008-07-17 07:50:55 +04:00			`if (client == NULL) {`
			`DEBUG(DEBUG_ERR,(__location__ " can not match persistent_store to a client. Returning error\n"));`
			`return -1;`
			`}`
added new multi-record transaction commit code (This used to be ctdb commit 9ff3380099fe6f4d39de126db0826971a10ee692) 2008-07-30 13:57:00 +04:00
server: Use tdb_check to verify persistent tdbs on startup Depending on --max-persistent-check-errors we allow ctdb to start with unhealthy persistent databases. The default is 0 which means to reject a startup with unhealthy dbs. The health of the persistent databases is checked after each recovery. Node monitoring and the "startup" is deferred until all persistent databases are healthy. Databases can become healthy automaticly by a completely HEALTHY node joining the cluster. Or by an administrator with "ctdb backupdb/restoredb" or "ctdb wipedb". metze (This used to be ctdb commit 15f133d5150ed1badb4fef7d644f10cd08a25cb5) 2009-12-07 15:28:11 +03:00			`if (ctdb_db->unhealthy_reason) {`
			`DEBUG(DEBUG_ERR,("db(%s) unhealthy in ctdb_control_trans2_commit: %s\n",`
			`ctdb_db->db_name, ctdb_db->unhealthy_reason));`
			`return -1;`
			`}`

added new multi-record transaction commit code (This used to be ctdb commit 9ff3380099fe6f4d39de126db0826971a10ee692) 2008-07-30 13:57:00 +04:00			`/* handling num_persistent_updates is a bit strange -`
			`there are 3 cases`
			`1) very old clients, which never called CTDB_CONTROL_START_PERSISTENT_UPDATE`
			`They don't expect num_persistent_updates to be used at all`

			`2) less old clients, which use CTDB_CONTROL_START_PERSISTENT_UPDATE, and expect`
			`this commit to then decrement it`

			`3) new clients which use TRANS2 commit functions, and`
			`expect this function to increment the counter, and`
			`then have it decremented in ctdb_control_trans2_error`
			`or ctdb_control_trans2_finished`
			`*/`
added a new control CTDB_CONTROL_TRANS2_COMMIT_RETRY so we can tell the difference between a initial commit attempt and a retry, which allows us to get the persistent updates counter right for retries (This used to be ctdb commit 7f29c50ccbc7789bfbc20bcb4b65758af9ebe6c5) 2008-08-08 07:11:28 +04:00			`switch (c->opcode) {`
			`case CTDB_CONTROL_PERSISTENT_STORE:`
ctdbd: refuse PERSISTENT_STORE if transaction is running. Michael (This used to be ctdb commit c07d6d90f7afd19213ad44624c3e2b9c85f4eea8) 2009-07-20 18:33:53 +04:00			`if (ctdb_db->transaction_active) {`
server: fix debug message in trans2_commit (refusing persistent store during transaction) log the right db_id also log the client_id Michael (This used to be ctdb commit 48ac5c77698ab7a28d24629cc8a6985011c5d14d) 2009-10-29 15:48:36 +03:00			`DEBUG(DEBUG_ERR, (__location__ " trans2_commit: a "`
			`"transaction is active on database "`
			`"db_id[0x%08x] - refusing persistent "`
			`" store for client id[0x%08x]\n",`
			`ctdb_db->db_id, client->client_id));`
ctdbd: refuse PERSISTENT_STORE if transaction is running. Michael (This used to be ctdb commit c07d6d90f7afd19213ad44624c3e2b9c85f4eea8) 2009-07-20 18:33:53 +04:00			`return -1;`
			`}`
added new multi-record transaction commit code (This used to be ctdb commit 9ff3380099fe6f4d39de126db0826971a10ee692) 2008-07-30 13:57:00 +04:00			`if (client->num_persistent_updates > 0) {`
			`client->num_persistent_updates--;`
added a new control CTDB_CONTROL_TRANS2_COMMIT_RETRY so we can tell the difference between a initial commit attempt and a retry, which allows us to get the persistent updates counter right for retries (This used to be ctdb commit 7f29c50ccbc7789bfbc20bcb4b65758af9ebe6c5) 2008-08-08 07:11:28 +04:00			`}`
			`break;`
			`case CTDB_CONTROL_TRANS2_COMMIT:`
Fix persistent transaction commit race condition. In ctdb_client.c:ctdb_transaction_commit(), after a failed TRANS2_COMMIT control call (for instance due to the 1-second being exceeded waiting for a busy node's reply), there is a 1-second gap between the transaction_cancel() and replay_transaction() calls in which there is no lock on the persistent db. And due to the lack of global state indicating that a transaction is in progress in ctdbd, other nodes may succeed to start transactions on the db in this gap and even worse work on top of the possibly already pushed changes. So the data diverges on the several nodes. This change fixes this by introducing global state for a transaction commit being active in the ctdb_db_context struct and in a db_id field in the client so that a client keeps track of _which_ tdb it as transaction commit running on. These data are set by ctdb upon entering the trans2_commit control and they are cleared in the trans2_error or trans2_finished controls. This makes it impossible to start a nother transaction or migrate a record to a different node while a transaction is active on a persistent tdb, including the retry loop. This approach is dead lock free and still allows recovery process to be started in the retry-gap between cancel and replay. Also note, that this solution does not require any change in the client side. This was debugged and developed together with Stefan Metzmacher <metze@samba.org> - thanks! Michael (This used to be ctdb commit f88103516e5ad723062fb95fcb07a128f1069d69) 2009-07-21 13:30:38 +04:00			`if (ctdb_db->transaction_active) {`
server: fix a debug message in trans2_commit - log the correct db_id Michael (This used to be ctdb commit ab9657b5a66d5665e6c5fd1bf8eb4074a3bffeec) 2009-10-29 15:24:19 +03:00			`DEBUG(DEBUG_ERR,(__location__ " trans2_commit: there is"`
			`" already a transaction commit "`
server: uniformly log db and client ids as 8-digit hex numbers in trans2_commit Michael (This used to be ctdb commit 2febdd23f754a2d4699bed36b941442ab362a376) 2009-10-29 15:44:39 +03:00			`"active on db_id[0x%08x] - forbidding "`
			`"client_id[0x%08x] to commit\n",`
server: output client_id in some debug messages in trans2_commit Michael (This used to be ctdb commit 11fefd02e6c9531ffb28b9e6acaf42ba39757d87) 2009-10-29 15:27:47 +03:00			`ctdb_db->db_id, client->client_id));`
Fix persistent transaction commit race condition. In ctdb_client.c:ctdb_transaction_commit(), after a failed TRANS2_COMMIT control call (for instance due to the 1-second being exceeded waiting for a busy node's reply), there is a 1-second gap between the transaction_cancel() and replay_transaction() calls in which there is no lock on the persistent db. And due to the lack of global state indicating that a transaction is in progress in ctdbd, other nodes may succeed to start transactions on the db in this gap and even worse work on top of the possibly already pushed changes. So the data diverges on the several nodes. This change fixes this by introducing global state for a transaction commit being active in the ctdb_db_context struct and in a db_id field in the client so that a client keeps track of _which_ tdb it as transaction commit running on. These data are set by ctdb upon entering the trans2_commit control and they are cleared in the trans2_error or trans2_finished controls. This makes it impossible to start a nother transaction or migrate a record to a different node while a transaction is active on a persistent tdb, including the retry loop. This approach is dead lock free and still allows recovery process to be started in the retry-gap between cancel and replay. Also note, that this solution does not require any change in the client side. This was debugged and developed together with Stefan Metzmacher <metze@samba.org> - thanks! Michael (This used to be ctdb commit f88103516e5ad723062fb95fcb07a128f1069d69) 2009-07-21 13:30:38 +04:00			`return -1;`
			`}`
			`if (client->db_id != 0) {`
			`DEBUG(DEBUG_ERR,(__location__ " ERROR: trans2_commit: "`
server: uniformly log db and client ids as 8-digit hex numbers in trans2_commit Michael (This used to be ctdb commit 2febdd23f754a2d4699bed36b941442ab362a376) 2009-10-29 15:44:39 +03:00			`"client-db_id[0x%08x] != 0 "`
			`"(client_id[0x%08x])\n",`
server: output client_id in some debug messages in trans2_commit Michael (This used to be ctdb commit 11fefd02e6c9531ffb28b9e6acaf42ba39757d87) 2009-10-29 15:27:47 +03:00			`client->db_id, client->client_id));`
Fix persistent transaction commit race condition. In ctdb_client.c:ctdb_transaction_commit(), after a failed TRANS2_COMMIT control call (for instance due to the 1-second being exceeded waiting for a busy node's reply), there is a 1-second gap between the transaction_cancel() and replay_transaction() calls in which there is no lock on the persistent db. And due to the lack of global state indicating that a transaction is in progress in ctdbd, other nodes may succeed to start transactions on the db in this gap and even worse work on top of the possibly already pushed changes. So the data diverges on the several nodes. This change fixes this by introducing global state for a transaction commit being active in the ctdb_db_context struct and in a db_id field in the client so that a client keeps track of _which_ tdb it as transaction commit running on. These data are set by ctdb upon entering the trans2_commit control and they are cleared in the trans2_error or trans2_finished controls. This makes it impossible to start a nother transaction or migrate a record to a different node while a transaction is active on a persistent tdb, including the retry loop. This approach is dead lock free and still allows recovery process to be started in the retry-gap between cancel and replay. Also note, that this solution does not require any change in the client side. This was debugged and developed together with Stefan Metzmacher <metze@samba.org> - thanks! Michael (This used to be ctdb commit f88103516e5ad723062fb95fcb07a128f1069d69) 2009-07-21 13:30:38 +04:00			`return -1;`
			`}`
added new multi-record transaction commit code (This used to be ctdb commit 9ff3380099fe6f4d39de126db0826971a10ee692) 2008-07-30 13:57:00 +04:00			`client->num_persistent_updates++;`
Fix persistent transaction commit race condition. In ctdb_client.c:ctdb_transaction_commit(), after a failed TRANS2_COMMIT control call (for instance due to the 1-second being exceeded waiting for a busy node's reply), there is a 1-second gap between the transaction_cancel() and replay_transaction() calls in which there is no lock on the persistent db. And due to the lack of global state indicating that a transaction is in progress in ctdbd, other nodes may succeed to start transactions on the db in this gap and even worse work on top of the possibly already pushed changes. So the data diverges on the several nodes. This change fixes this by introducing global state for a transaction commit being active in the ctdb_db_context struct and in a db_id field in the client so that a client keeps track of _which_ tdb it as transaction commit running on. These data are set by ctdb upon entering the trans2_commit control and they are cleared in the trans2_error or trans2_finished controls. This makes it impossible to start a nother transaction or migrate a record to a different node while a transaction is active on a persistent tdb, including the retry loop. This approach is dead lock free and still allows recovery process to be started in the retry-gap between cancel and replay. Also note, that this solution does not require any change in the client side. This was debugged and developed together with Stefan Metzmacher <metze@samba.org> - thanks! Michael (This used to be ctdb commit f88103516e5ad723062fb95fcb07a128f1069d69) 2009-07-21 13:30:38 +04:00			`ctdb_db->transaction_active = true;`
			`client->db_id = m->db_id;`
server: add positive debug statements to trans2_commit and trans2_finished When the operation completed / started successfully. Michael (This used to be ctdb commit 0df012d58eb83195ea0365be19e0566dbc394a66) 2009-10-29 15:53:44 +03:00			`DEBUG(DEBUG_DEBUG, (__location__ " client id[0x%08x] started to"`
			`" commit transaction on db id[0x%08x]\n",`
			`client->client_id, client->db_id));`
added a new control CTDB_CONTROL_TRANS2_COMMIT_RETRY so we can tell the difference between a initial commit attempt and a retry, which allows us to get the persistent updates counter right for retries (This used to be ctdb commit 7f29c50ccbc7789bfbc20bcb4b65758af9ebe6c5) 2008-08-08 07:11:28 +04:00			`break;`
			`case CTDB_CONTROL_TRANS2_COMMIT_RETRY:`
			`/* already updated from the first commit */`
Fix persistent transaction commit race condition. In ctdb_client.c:ctdb_transaction_commit(), after a failed TRANS2_COMMIT control call (for instance due to the 1-second being exceeded waiting for a busy node's reply), there is a 1-second gap between the transaction_cancel() and replay_transaction() calls in which there is no lock on the persistent db. And due to the lack of global state indicating that a transaction is in progress in ctdbd, other nodes may succeed to start transactions on the db in this gap and even worse work on top of the possibly already pushed changes. So the data diverges on the several nodes. This change fixes this by introducing global state for a transaction commit being active in the ctdb_db_context struct and in a db_id field in the client so that a client keeps track of _which_ tdb it as transaction commit running on. These data are set by ctdb upon entering the trans2_commit control and they are cleared in the trans2_error or trans2_finished controls. This makes it impossible to start a nother transaction or migrate a record to a different node while a transaction is active on a persistent tdb, including the retry loop. This approach is dead lock free and still allows recovery process to be started in the retry-gap between cancel and replay. Also note, that this solution does not require any change in the client side. This was debugged and developed together with Stefan Metzmacher <metze@samba.org> - thanks! Michael (This used to be ctdb commit f88103516e5ad723062fb95fcb07a128f1069d69) 2009-07-21 13:30:38 +04:00			`if (client->db_id != m->db_id) {`
			`DEBUG(DEBUG_ERR,(__location__ " ERROR: trans2_commit "`
server: uniformly log db and client ids as 8-digit hex numbers in trans2_commit Michael (This used to be ctdb commit 2febdd23f754a2d4699bed36b941442ab362a376) 2009-10-29 15:44:39 +03:00			`"retry: client-db_id[0x%08x] != "`
			`"db_id[0x%08x] (client_id[0x%08x])\n",`
			`client->db_id,`
server: output client_id in some debug messages in trans2_commit Michael (This used to be ctdb commit 11fefd02e6c9531ffb28b9e6acaf42ba39757d87) 2009-10-29 15:27:47 +03:00			`m->db_id, client->client_id));`
Fix persistent transaction commit race condition. In ctdb_client.c:ctdb_transaction_commit(), after a failed TRANS2_COMMIT control call (for instance due to the 1-second being exceeded waiting for a busy node's reply), there is a 1-second gap between the transaction_cancel() and replay_transaction() calls in which there is no lock on the persistent db. And due to the lack of global state indicating that a transaction is in progress in ctdbd, other nodes may succeed to start transactions on the db in this gap and even worse work on top of the possibly already pushed changes. So the data diverges on the several nodes. This change fixes this by introducing global state for a transaction commit being active in the ctdb_db_context struct and in a db_id field in the client so that a client keeps track of _which_ tdb it as transaction commit running on. These data are set by ctdb upon entering the trans2_commit control and they are cleared in the trans2_error or trans2_finished controls. This makes it impossible to start a nother transaction or migrate a record to a different node while a transaction is active on a persistent tdb, including the retry loop. This approach is dead lock free and still allows recovery process to be started in the retry-gap between cancel and replay. Also note, that this solution does not require any change in the client side. This was debugged and developed together with Stefan Metzmacher <metze@samba.org> - thanks! Michael (This used to be ctdb commit f88103516e5ad723062fb95fcb07a128f1069d69) 2009-07-21 13:30:38 +04:00			`return -1;`
			`}`
server: add positive debug statements to trans2_commit and trans2_finished When the operation completed / started successfully. Michael (This used to be ctdb commit 0df012d58eb83195ea0365be19e0566dbc394a66) 2009-10-29 15:53:44 +03:00			`DEBUG(DEBUG_DEBUG, (__location__ " client id[0x%08x] started "`
			`"transaction commit retry on "`
			`"db_id[0x%08x]\n",`
			`client->client_id, client->db_id));`
added a new control CTDB_CONTROL_TRANS2_COMMIT_RETRY so we can tell the difference between a initial commit attempt and a retry, which allows us to get the persistent updates counter right for retries (This used to be ctdb commit 7f29c50ccbc7789bfbc20bcb4b65758af9ebe6c5) 2008-08-08 07:11:28 +04:00			`break;`
Only decrement the "number of persistent writes in flight" If/when it is >0 or we will break if used against an unpatched samba server (This used to be ctdb commit 52a38487f981fd5981c02a7a063ad2c598591c10) 2008-07-17 12:47:20 +04:00			`}`
Add two new controls to start and cancel a persistent update. This allows ctdb to automatically start a new full blown recovery if a client has started updating the local tdb for a persistent database but is kill -9ed before it has ensured the update is distributed clusterwide. (This used to be ctdb commit 1ffccb3e0b3b5bd376c5302304029af393709518) 2008-07-17 07:50:55 +04:00
server:trans2_commit: move the check for active recovery down. This needs to be done after the control-dispatcher: In the TRANS2_COMMIT control, the client->db_id needs to be set before bailing out, since otherwise the next TRANS2_COMMIT_RETRY will fail... Michael (This used to be ctdb commit 59faf3f923a5989b5ee94ef02a12827412775bae) 2009-12-04 02:06:34 +03:00			`if (ctdb->recovery_mode != CTDB_RECOVERY_NORMAL) {`
			`DEBUG(DEBUG_INFO,("rejecting ctdb_control_trans2_commit when recovery active\n"));`
			`return -1;`
			`}`

fix some memory hierarchy bugs in allocation of the state structure for persistent writes. since these two controls (UPDATE_RECORD and PERSISTENT_STORE) can respond asynchronously to the control, we can not allocate the state variable as a child off ctdb_req_control instead we must allocate state as a child off ctdb itself and steal ctdb_req_control so it becomes a child of state. othervise both ctdb_req_control and also state will be released immediately after we have finished setting up the async reply and returned. (This used to be ctdb commit 6f6de0becd179be9eb9a6bf70562b090205ce196) 2008-05-22 10:29:46 +04:00			`state = talloc_zero(ctdb, struct ctdb_persistent_state);`
added support for persistent databases in ctdbd (This used to be ctdb commit 3115090a0d882beca9d70761130b74bb0821f201) 2007-09-21 06:24:02 +04:00			`CTDB_NO_MEMORY(ctdb, state);`

			`state->ctdb = ctdb;`
fixed a valgrind error, and some warnings (This used to be ctdb commit c0f52dbb385fa0748680adb7c40755c92e577551) 2007-09-24 03:57:14 +04:00			`state->c = c;`
added support for persistent databases in ctdbd (This used to be ctdb commit 3115090a0d882beca9d70761130b74bb0821f201) 2007-09-21 06:24:02 +04:00
avoid using connected nodes that aren't in the vnn map yet (This used to be ctdb commit 2b5ae133f5f6fa9ad1a8896fe4b4c542d4ca462d) 2007-09-21 09:44:13 +04:00			`for (i=0;i<ctdb->vnn_map->size;i++) {`
			`struct ctdb_node *node = ctdb->nodes[ctdb->vnn_map->map[i]];`
added support for persistent databases in ctdbd (This used to be ctdb commit 3115090a0d882beca9d70761130b74bb0821f201) 2007-09-21 06:24:02 +04:00			`int ret;`

			`/* only send to active nodes */`
			`if (node->flags & NODE_FLAGS_INACTIVE) {`
			`continue;`
			`}`

			`/* don't send to ourselves */`
			`if (node->pnn == ctdb->pnn) {`
			`continue;`
			`}`

			`ret = ctdb_daemon_send_control(ctdb, node->pnn, 0, CTDB_CONTROL_UPDATE_RECORD,`
			`c->client_id, 0, recdata,`
			`ctdb_persistent_callback, state);`
			`if (ret == -1) {`
merge from ronnie (This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c) 2008-02-04 12:07:15 +03:00			`DEBUG(DEBUG_ERR,("Unable to send CTDB_CONTROL_UPDATE_RECORD to pnn %u\n", node->pnn));`
added support for persistent databases in ctdbd (This used to be ctdb commit 3115090a0d882beca9d70761130b74bb0821f201) 2007-09-21 06:24:02 +04:00			`talloc_free(state);`
			`return -1;`
			`}`

			`state->num_pending++;`
return a more detailed error code from a trans2 commit error (This used to be ctdb commit 6915661a460cd589b441ac7cd8695f35c4e83113) 2008-08-08 03:58:49 +04:00			`state->num_sent++;`
added support for persistent databases in ctdbd (This used to be ctdb commit 3115090a0d882beca9d70761130b74bb0821f201) 2007-09-21 06:24:02 +04:00			`}`

			`if (state->num_pending == 0) {`
			`talloc_free(state);`
			`return 0;`
			`}`

			`/* we need to wait for the replies */`
			`*async_reply = true;`
in ctdb_control_persistent_store() we must talloc_steal() the pointer to c to prevent it from being immediately freed (and our persistent store state with it) if we need to wait asynchronously for other nodes before we can reply back to the client (This used to be ctdb commit fa5915280933e4d2e7d4d07199829c9c2b87a335) 2007-09-21 09:19:33 +04:00
fixed a valgrind error, and some warnings (This used to be ctdb commit c0f52dbb385fa0748680adb7c40755c92e577551) 2007-09-24 03:57:14 +04:00			`/* need to keep the control structure around */`
			`talloc_steal(state, c);`

			`/* but we won't wait forever */`
in ctdb_control_persistent_store() we must talloc_steal() the pointer to c to prevent it from being immediately freed (and our persistent store state with it) if we need to wait asynchronously for other nodes before we can reply back to the client (This used to be ctdb commit fa5915280933e4d2e7d4d07199829c9c2b87a335) 2007-09-21 09:19:33 +04:00			`event_add_timed(ctdb->ev, state,`
			`timeval_current_ofs(ctdb->tunable.control_timeout, 0),`
			`ctdb_persistent_store_timeout, state);`

added support for persistent databases in ctdbd (This used to be ctdb commit 3115090a0d882beca9d70761130b74bb0821f201) 2007-09-21 06:24:02 +04:00			`return 0;`
			`}`

persistent: allocate the persistent state in the ctdb_db struct in trans3_commit Make sure that ctdb_db->persistent_state is correctly NULL-ed when the state is freed. This way, we can use ctdb_db->persistent_state as an indication for whether a transaction commit is currently running. (This used to be ctdb commit 761cb235193564a0f337d0308f0a9e6de0ef2710) 2011-02-23 02:01:13 +03:00			`static int ctdb_persistent_state_destructor(struct ctdb_persistent_state *state)`
			`{`
persistent: add a client context to the persistent_stat and track the db_id The db_id is tracked in the client context as an indication that a transaction commit is in progress. This is cleared in the persistent_state talloc destructor. This is in order to properly treat running trans3_commits if the client disconnects. (This used to be ctdb commit e886ff24f4e3e250944289db95916b948893d26c) 2011-02-23 19:35:27 +03:00			`if (state->client != NULL) {`
			`state->client->db_id = 0;`
			`}`

persistent: allocate the persistent state in the ctdb_db struct in trans3_commit Make sure that ctdb_db->persistent_state is correctly NULL-ed when the state is freed. This way, we can use ctdb_db->persistent_state as an indication for whether a transaction commit is currently running. (This used to be ctdb commit 761cb235193564a0f337d0308f0a9e6de0ef2710) 2011-02-23 02:01:13 +03:00			`if (state->ctdb_db != NULL) {`
			`state->ctdb_db->persistent_state = NULL;`
			`}`

			`return 0;`
			`}`
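The destructor above clears both "commit in progress" markers whenever the state is freed, whatever the reason for the free. A minimal standalone sketch of that idea follows. It uses plain `calloc`/`free` instead of talloc, and all `demo_*` names are hypothetical stand-ins, not part of ctdb:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct demo_state;

/* Hypothetical stand-in for struct ctdb_client. */
struct demo_client {
	uint32_t db_id;                      /* non-zero while a commit is in flight */
};

/* Hypothetical stand-in for struct ctdb_db_context. */
struct demo_db {
	struct demo_state *persistent_state; /* non-NULL while a commit is in flight */
};

struct demo_state {
	struct demo_client *client;
	struct demo_db *db;
};

/* Mirrors the destructor: reset the markers that refer to the state, then free it. */
static void demo_state_free(struct demo_state *state)
{
	if (state == NULL) {
		return;
	}
	if (state->client != NULL) {
		state->client->db_id = 0;
	}
	if (state->db != NULL) {
		state->db->persistent_state = NULL;
	}
	free(state);
}

/* Start a commit. Returns NULL if one is already active or allocation fails. */
static struct demo_state *demo_state_begin(struct demo_client *client,
					   struct demo_db *db, uint32_t db_id)
{
	struct demo_state *state;

	if (client->db_id != 0 || db->persistent_state != NULL) {
		return NULL;
	}
	state = calloc(1, sizeof(*state));
	if (state == NULL) {
		return NULL;
	}
	state->client = client;
	state->db = db;
	client->db_id = db_id;
	db->persistent_state = state;
	return state;
}
```

Because the markers are reset in one place, every exit path (send failure, no pending replies, timeout, client disconnect) only has to free the state.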
added support for persistent databases in ctdbd (This used to be ctdb commit 3115090a0d882beca9d70761130b74bb0821f201) 2007-09-21 06:24:02 +04:00
server: add a new control CTDB_CONTROL_TRANS3_COMMIT This is a simplified version of the trans2 commit control: It just rolls out the marshall buffer to all active nodes. It is the main ctdbd part of the re-implementation of the persistent transactions. The client code is changed to take a global lock to start a transactions and store into the marshal buffer instead of writing to the local tdb under a local transaction. The old transaction implementation is going to be removed in a later commit. Michael (This used to be ctdb commit f66428f9d2013080a414404c1ba6117888352fd6) 2009-12-03 19:59:49 +03:00			`/*`
			`* Store a set of persistent records.`
			`* This is used to roll out a transaction to all nodes.`
			`*/`
			`int32_t ctdb_control_trans3_commit(struct ctdb_context *ctdb,`
			`struct ctdb_req_control *c,`
			`TDB_DATA recdata, bool *async_reply)`
			`{`
			`struct ctdb_client *client;`
			`struct ctdb_persistent_state *state;`
			`int i;`
			`struct ctdb_marshall_buffer *m = (struct ctdb_marshall_buffer *)recdata.dptr;`
			`struct ctdb_db_context *ctdb_db;`

			`if (ctdb->recovery_mode != CTDB_RECOVERY_NORMAL) {`
			`DEBUG(DEBUG_INFO,("rejecting ctdb_control_trans3_commit when recovery active\n"));`
			`return -1;`
			`}`

persistent: add a client context to the persistent_stat and track the db_id The db_id is tracked in the client context as an indication that a transaction commit is in progress. This is cleared in the persistent_state talloc destructor. This is in order to properly treat running trans3_commits if the client disconnects. (This used to be ctdb commit e886ff24f4e3e250944289db95916b948893d26c) 2011-02-23 19:35:27 +03:00			`client = ctdb_reqid_find(ctdb, c->client_id, struct ctdb_client);`
			`if (client == NULL) {`
			`DEBUG(DEBUG_ERR,(__location__ " can not match trans3_commit "`
			`"to a client. Returning error\n"));`
			`return -1;`
			`}`

			`if (client->db_id != 0) {`
			`DEBUG(DEBUG_ERR,(__location__ " ERROR: trans3_commit: "`
			`"client-db_id[0x%08x] != 0 "`
			`"(client_id[0x%08x]): trans3_commit active?\n",`
			`client->db_id, client->client_id));`
			`return -1;`
			`}`

server: add a new control CTDB_CONTROL_TRANS3_COMMIT This is a simplified version of the trans2 commit control: It just rolls out the marshall buffer to all active nodes. It is the main ctdbd part of the re-implementation of the persistent transactions. The client code is changed to take a global lock to start a transactions and store into the marshal buffer instead of writing to the local tdb under a local transaction. The old transaction implementation is going to be removed in a later commit. Michael (This used to be ctdb commit f66428f9d2013080a414404c1ba6117888352fd6) 2009-12-03 19:59:49 +03:00			`ctdb_db = find_ctdb_db(ctdb, m->db_id);`
			`if (ctdb_db == NULL) {`
			`DEBUG(DEBUG_ERR,(__location__ " ctdb_control_trans3_commit: "`
			`"Unknown database db_id[0x%08x]\n", m->db_id));`
			`return -1;`
			`}`

persistent: reject trans3_control when a commit is already active. This should actually never happen. (This used to be ctdb commit f416e76838fe2adf629d4356d1cc87054b1af164) 2011-02-23 02:03:07 +03:00			`if (ctdb_db->persistent_state != NULL) {`
			`DEBUG(DEBUG_ERR, (__location__ " Error: "`
			`"ctdb_control_trans3_commit "`
			`"called while a transaction commit is "`
			`"active. db_id[0x%08x]\n", m->db_id));`
			`return -1;`
			`}`

persistent: allocate the persistent state in the ctdb_db struct in trans3_commit Make sure that ctdb_db->persistent_state is correctly NULL-ed when the state is freed. This way, we can use ctdb_db->persistent_state as an indication for whether a transaction commit is currently running. (This used to be ctdb commit 761cb235193564a0f337d0308f0a9e6de0ef2710) 2011-02-23 02:01:13 +03:00			`ctdb_db->persistent_state = talloc_zero(ctdb_db,`
			`struct ctdb_persistent_state);`
			`CTDB_NO_MEMORY(ctdb, ctdb_db->persistent_state);`
server: add a new control CTDB_CONTROL_TRANS3_COMMIT This is a simplified version of the trans2 commit control: It just rolls out the marshall buffer to all active nodes. It is the main ctdbd part of the re-implementation of the persistent transactions. The client code is changed to take a global lock to start a transactions and store into the marshal buffer instead of writing to the local tdb under a local transaction. The old transaction implementation is going to be removed in a later commit. Michael (This used to be ctdb commit f66428f9d2013080a414404c1ba6117888352fd6) 2009-12-03 19:59:49 +03:00
persistent: add a client context to the persistent_stat and track the db_id The db_id is tracked in the client context as an indication that a transaction commit is in progress. This is cleared in the persistent_state talloc destructor. This is in order to properly treat running trans3_commits if the client disconnects. (This used to be ctdb commit e886ff24f4e3e250944289db95916b948893d26c) 2011-02-23 19:35:27 +03:00			`client->db_id = m->db_id;`

persistent: allocate the persistent state in the ctdb_db struct in trans3_commit Make sure that ctdb_db->persistent_state is correctly NULL-ed when the state is freed. This way, we can use ctdb_db->persistent_state as an indication for whether a transaction commit is currently running. (This used to be ctdb commit 761cb235193564a0f337d0308f0a9e6de0ef2710) 2011-02-23 02:01:13 +03:00			`state = ctdb_db->persistent_state;`
server: add a new control CTDB_CONTROL_TRANS3_COMMIT This is a simplified version of the trans2 commit control: It just rolls out the marshall buffer to all active nodes. It is the main ctdbd part of the re-implementation of the persistent transactions. The client code is changed to take a global lock to start a transactions and store into the marshal buffer instead of writing to the local tdb under a local transaction. The old transaction implementation is going to be removed in a later commit. Michael (This used to be ctdb commit f66428f9d2013080a414404c1ba6117888352fd6) 2009-12-03 19:59:49 +03:00			`state->ctdb = ctdb;`
persistent: allocate the persistent state in the ctdb_db struct in trans3_commit Make sure that ctdb_db->persistent_state is correctly NULL-ed when the state is freed. This way, we can use ctdb_db->persistent_state as an indication for whether a transaction commit is currently running. (This used to be ctdb commit 761cb235193564a0f337d0308f0a9e6de0ef2710) 2011-02-23 02:01:13 +03:00			`state->ctdb_db = ctdb_db;`
server: add a new control CTDB_CONTROL_TRANS3_COMMIT This is a simplified version of the trans2 commit control: It just rolls out the marshall buffer to all active nodes. It is the main ctdbd part of the re-implementation of the persistent transactions. The client code is changed to take a global lock to start a transactions and store into the marshal buffer instead of writing to the local tdb under a local transaction. The old transaction implementation is going to be removed in a later commit. Michael (This used to be ctdb commit f66428f9d2013080a414404c1ba6117888352fd6) 2009-12-03 19:59:49 +03:00			`state->c = c;`
persistent: add a client context to the persistent_stat and track the db_id The db_id is tracked in the client context as an indication that a transaction commit is in progress. This is cleared in the persistent_state talloc destructor. This is in order to properly treat running trans3_commits if the client disconnects. (This used to be ctdb commit e886ff24f4e3e250944289db95916b948893d26c) 2011-02-23 19:35:27 +03:00			`state->client = client;`

persistent: allocate the persistent state in the ctdb_db struct in trans3_commit Make sure that ctdb_db->persistent_state is correctly NULL-ed when the state is freed. This way, we can use ctdb_db->persistent_state as an indication for whether a transaction commit is currently running. (This used to be ctdb commit 761cb235193564a0f337d0308f0a9e6de0ef2710) 2011-02-23 02:01:13 +03:00			`talloc_set_destructor(state, ctdb_persistent_state_destructor);`
server: add a new control CTDB_CONTROL_TRANS3_COMMIT This is a simplified version of the trans2 commit control: It just rolls out the marshall buffer to all active nodes. It is the main ctdbd part of the re-implementation of the persistent transactions. The client code is changed to take a global lock to start a transactions and store into the marshal buffer instead of writing to the local tdb under a local transaction. The old transaction implementation is going to be removed in a later commit. Michael (This used to be ctdb commit f66428f9d2013080a414404c1ba6117888352fd6) 2009-12-03 19:59:49 +03:00
			`for (i = 0; i < ctdb->vnn_map->size; i++) {`
			`struct ctdb_node *node = ctdb->nodes[ctdb->vnn_map->map[i]];`
			`int ret;`

			`/* only send to active nodes */`
			`if (node->flags & NODE_FLAGS_INACTIVE) {`
			`continue;`
			`}`

			`ret = ctdb_daemon_send_control(ctdb, node->pnn, 0,`
			`CTDB_CONTROL_UPDATE_RECORD,`
			`c->client_id, 0, recdata,`
			`ctdb_persistent_callback,`
			`state);`
			`if (ret == -1) {`
			`DEBUG(DEBUG_ERR,("Unable to send "`
			`"CTDB_CONTROL_UPDATE_RECORD "`
			`"to pnn %u\n", node->pnn));`
			`talloc_free(state);`
			`return -1;`
			`}`

			`state->num_pending++;`
			`state->num_sent++;`
			`}`

			`if (state->num_pending == 0) {`
			`talloc_free(state);`
			`return 0;`
			`}`

			`/* we need to wait for the replies */`
			`*async_reply = true;`

			`/* need to keep the control structure around */`
			`talloc_steal(state, c);`

			`/* but we won't wait forever */`
			`event_add_timed(ctdb->ev, state,`
			`timeval_current_ofs(ctdb->tunable.control_timeout, 0),`
			`ctdb_persistent_store_timeout, state);`

			`return 0;`
			`}`
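The fan-out in ctdb_control_trans3_commit() counts one pending reply per control sent and only defers the client reply when at least one control is outstanding. The sketch below models that bookkeeping in isolation. The reply handler is an assumption, since ctdb_persistent_callback() is not shown in this section, and all `demo_*` names and `DEMO_NODE_INACTIVE` are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NODE_INACTIVE 0x1 /* hypothetical, stands in for NODE_FLAGS_INACTIVE */

struct demo_node {
	unsigned pnn;
	unsigned flags;
};

struct demo_commit {
	unsigned num_pending; /* replies still outstanding */
	unsigned num_sent;    /* controls sent in total */
	unsigned num_failed;  /* replies that carried a non-zero status */
	bool replied;         /* has the client been answered yet? */
	int reply_status;     /* status handed to the client */
};

/* Fan out to every active node. Returns true if the caller must wait for replies. */
static bool demo_commit_send(struct demo_commit *st,
			     const struct demo_node *nodes, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (nodes[i].flags & DEMO_NODE_INACTIVE) {
			continue; /* only send to active nodes */
		}
		st->num_pending++;
		st->num_sent++;
	}

	if (st->num_pending == 0) {
		/* nothing to wait for: answer synchronously */
		st->replied = true;
		st->reply_status = 0;
		return false;
	}
	return true;
}

/* Assumed reply handling: answer the client once the last reply is in. */
static void demo_commit_reply(struct demo_commit *st, int status)
{
	if (st->replied || st->num_pending == 0) {
		return;
	}
	if (status != 0) {
		st->num_failed++;
	}
	st->num_pending--;
	if (st->num_pending == 0) {
		st->replied = true;
		st->reply_status = (st->num_failed == 0) ? 0 : -1;
	}
}
```

The timeout path is left out of the sketch. In the code above it is covered by the timed event that event_add_timed() registers on the state.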


added new multi-record transaction commit code (This used to be ctdb commit 9ff3380099fe6f4d39de126db0826971a10ee692) 2008-07-30 13:57:00 +04:00			`/*`
			`called when a client has finished a local commit in a transaction to`
			`a persistent database`
			`*/`
			`int32_t ctdb_control_trans2_finished(struct ctdb_context *ctdb,`
			`struct ctdb_req_control *c)`
			`{`
			`struct ctdb_client *client = ctdb_reqid_find(ctdb, c->client_id, struct ctdb_client);`
Fix persistent transaction commit race condition. In ctdb_client.c:ctdb_transaction_commit(), after a failed TRANS2_COMMIT control call (for instance due to the 1-second being exceeded waiting for a busy node's reply), there is a 1-second gap between the transaction_cancel() and replay_transaction() calls in which there is no lock on the persistent db. And due to the lack of global state indicating that a transaction is in progress in ctdbd, other nodes may succeed to start transactions on the db in this gap and even worse work on top of the possibly already pushed changes. So the data diverges on the several nodes. This change fixes this by introducing global state for a transaction commit being active in the ctdb_db_context struct and in a db_id field in the client so that a client keeps track of _which_ tdb it as transaction commit running on. These data are set by ctdb upon entering the trans2_commit control and they are cleared in the trans2_error or trans2_finished controls. This makes it impossible to start a nother transaction or migrate a record to a different node while a transaction is active on a persistent tdb, including the retry loop. This approach is dead lock free and still allows recovery process to be started in the retry-gap between cancel and replay. Also note, that this solution does not require any change in the client side. This was debugged and developed together with Stefan Metzmacher <metze@samba.org> - thanks! Michael (This used to be ctdb commit f88103516e5ad723062fb95fcb07a128f1069d69) 2009-07-21 13:30:38 +04:00			`struct ctdb_db_context *ctdb_db;`

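			`if (client == NULL) {`
			`DEBUG(DEBUG_ERR,(__location__ " can not match trans2_finished to a client. Returning error\n"));`
			`return -1;`
			`}`

			`ctdb_db = find_ctdb_db(ctdb, client->db_id);`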
			`if (ctdb_db == NULL) {`
			`DEBUG(DEBUG_ERR,(__location__ " ctdb_control_trans2_finished: "`
			`"Unknown database 0x%08x\n", client->db_id));`
			`return -1;`
			`}`
			`if (!ctdb_db->transaction_active) {`
			`DEBUG(DEBUG_ERR,(__location__ " ctdb_control_trans2_finished: "`
			`"Database 0x%08x has no transaction commit "`
			`"started\n", client->db_id));`
			`return -1;`
			`}`

			`ctdb_db->transaction_active = false;`
			`client->db_id = 0;`
added new multi-record transaction commit code (This used to be ctdb commit 9ff3380099fe6f4d39de126db0826971a10ee692) 2008-07-30 13:57:00 +04:00
			`if (client->num_persistent_updates == 0) {`
			`DEBUG(DEBUG_ERR, (__location__ " ERROR: num_persistent_updates == 0\n"));`
cover some corner cases where the persistent database could become inconsistent (This used to be ctdb commit c76c214be401cb116265ed17ffe6c77c979ded82) 2008-08-07 07:34:18 +04:00			`DEBUG(DEBUG_ERR,(__location__ " Forcing recovery\n"));`
			`client->ctdb->recovery_mode = CTDB_RECOVERY_ACTIVE;`
added new multi-record transaction commit code (This used to be ctdb commit 9ff3380099fe6f4d39de126db0826971a10ee692) 2008-07-30 13:57:00 +04:00			`return -1;`
			`}`
			`client->num_persistent_updates--;`

server: add positive debug statements to trans2_commit and trans2_finished When the operation completed / started successfully. Michael (This used to be ctdb commit 0df012d58eb83195ea0365be19e0566dbc394a66) 2009-10-29 15:53:44 +03:00			`DEBUG(DEBUG_DEBUG, (__location__ " client id[0x%08x] finished "`
			`"transaction commit db_id[0x%08x]\n",`
			`client->client_id, ctdb_db->db_id));`

added new multi-record transaction commit code (This used to be ctdb commit 9ff3380099fe6f4d39de126db0826971a10ee692) 2008-07-30 13:57:00 +04:00			`return 0;`
			`}`

			`/*`
			`called when a client gets an error committing its database`
			`during a transaction commit`
			`*/`
			`int32_t ctdb_control_trans2_error(struct ctdb_context *ctdb,`
			`struct ctdb_req_control *c)`
			`{`
			`struct ctdb_client *client = ctdb_reqid_find(ctdb, c->client_id, struct ctdb_client);`
Fix persistent transaction commit race condition. In ctdb_client.c:ctdb_transaction_commit(), after a failed TRANS2_COMMIT control call (for instance due to the 1-second being exceeded waiting for a busy node's reply), there is a 1-second gap between the transaction_cancel() and replay_transaction() calls in which there is no lock on the persistent db. And due to the lack of global state indicating that a transaction is in progress in ctdbd, other nodes may succeed to start transactions on the db in this gap and even worse work on top of the possibly already pushed changes. So the data diverges on the several nodes. This change fixes this by introducing global state for a transaction commit being active in the ctdb_db_context struct and in a db_id field in the client so that a client keeps track of _which_ tdb it as transaction commit running on. These data are set by ctdb upon entering the trans2_commit control and they are cleared in the trans2_error or trans2_finished controls. This makes it impossible to start a nother transaction or migrate a record to a different node while a transaction is active on a persistent tdb, including the retry loop. This approach is dead lock free and still allows recovery process to be started in the retry-gap between cancel and replay. Also note, that this solution does not require any change in the client side. This was debugged and developed together with Stefan Metzmacher <metze@samba.org> - thanks! Michael (This used to be ctdb commit f88103516e5ad723062fb95fcb07a128f1069d69) 2009-07-21 13:30:38 +04:00			`struct ctdb_db_context *ctdb_db;`

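			`if (client == NULL) {`
			`DEBUG(DEBUG_ERR,(__location__ " can not match trans2_error to a client. Returning error\n"));`
			`return -1;`
			`}`

			`ctdb_db = find_ctdb_db(ctdb, client->db_id);`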
			`if (ctdb_db == NULL) {`
			`DEBUG(DEBUG_ERR,(__location__ " ctdb_control_trans2_error: "`
			`"Unknown database 0x%08x\n", client->db_id));`
			`return -1;`
			`}`
			`if (!ctdb_db->transaction_active) {`
			`DEBUG(DEBUG_ERR,(__location__ " ctdb_control_trans2_error: "`
			`"Database 0x%08x has no transaction commit "`
			`"started\n", client->db_id));`
			`return -1;`
			`}`

			`ctdb_db->transaction_active = false;`
			`client->db_id = 0;`

added new multi-record transaction commit code (This used to be ctdb commit 9ff3380099fe6f4d39de126db0826971a10ee692) 2008-07-30 13:57:00 +04:00			`if (client->num_persistent_updates == 0) {`
			`DEBUG(DEBUG_ERR, (__location__ " ERROR: num_persistent_updates == 0\n"));`
cover some corner cases where the persistent database could become inconsistent (This used to be ctdb commit c76c214be401cb116265ed17ffe6c77c979ded82) 2008-08-07 07:34:18 +04:00			`} else {`
			`client->num_persistent_updates--;`
added new multi-record transaction commit code (This used to be ctdb commit 9ff3380099fe6f4d39de126db0826971a10ee692) 2008-07-30 13:57:00 +04:00			`}`

server: extend a debug message in ctdb_control_trans2_error() Michael (This used to be ctdb commit 0fb9573d1c838b436ab9be83e197b68f35f94acb) 2009-10-29 15:54:55 +03:00			`DEBUG(DEBUG_ERR,(__location__ " An error occurred during transaction on"`
			`" db_id[0x%08x] - forcing recovery\n",`
			`ctdb_db->db_id));`
added new multi-record transaction commit code (This used to be ctdb commit 9ff3380099fe6f4d39de126db0826971a10ee692) 2008-07-30 13:57:00 +04:00			`client->ctdb->recovery_mode = CTDB_RECOVERY_ACTIVE;`

			`return 0;`
			`}`

Revert "update the "uptime" command to indicate the "time since last" is the time since the last recovery OR failover." This reverts commit 3b0d44497800a16400d05a30bdaf6e6c285d4b36. (This used to be ctdb commit cb36bbb5418290e8e5b770d2d836285b15da2a6f) 2009-10-29 02:49:00 +03:00			`/**`
			`* Tell whether a transaction is active on this node on the given DB.`
			`*/`
			`int32_t ctdb_control_trans2_active(struct ctdb_context *ctdb,`
server: trans2_active: don't report a transaction active on the node that performs the transaction Otherwise a node can lock itself out, e.g. when a commit control times out... Michael (This used to be ctdb commit cb432e30351d5e5a41e98da3c7b1c2a4d400a3a2) 2009-10-29 19:08:37 +03:00			`struct ctdb_req_control *c,`
Revert "update the "uptime" command to indicate the "time since last" is the time since the last recovery OR failover." This reverts commit 3b0d44497800a16400d05a30bdaf6e6c285d4b36. (This used to be ctdb commit cb36bbb5418290e8e5b770d2d836285b15da2a6f) 2009-10-29 02:49:00 +03:00			`uint32_t db_id)`
			`{`
			`struct ctdb_db_context *ctdb_db;`
server: trans2_active: don't report a transaction active on the node that performs the transaction Otherwise a node can lock itself out, e.g. when a commit control times out... Michael (This used to be ctdb commit cb432e30351d5e5a41e98da3c7b1c2a4d400a3a2) 2009-10-29 19:08:37 +03:00			`struct ctdb_client *client = ctdb_reqid_find(ctdb, c->client_id, struct ctdb_client);`
Revert "update the "uptime" command to indicate the "time since last" is the time since the last recovery OR failover." This reverts commit 3b0d44497800a16400d05a30bdaf6e6c285d4b36. (This used to be ctdb commit cb36bbb5418290e8e5b770d2d836285b15da2a6f) 2009-10-29 02:49:00 +03:00
			`ctdb_db = find_ctdb_db(ctdb, db_id);`
			`if (!ctdb_db) {`
			`DEBUG(DEBUG_ERR,(__location__ " Unknown db 0x%08x\n", db_id));`
			`return -1;`
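			`}`

			`if (client == NULL) {`
			`DEBUG(DEBUG_ERR,(__location__ " can not match trans2_active to a client. Returning error\n"));`
			`return -1;`
			`}`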

server: trans2_active: don't report a transaction active on the node that performs the transaction Otherwise a node can lock itself out, e.g. when a commit control times out... Michael (This used to be ctdb commit cb432e30351d5e5a41e98da3c7b1c2a4d400a3a2) 2009-10-29 19:08:37 +03:00			`if (client->db_id == db_id) {`
			`return 0;`
			`}`

Revert "update the "uptime" command to indicate the "time since last" is the time since the last recovery OR failover." This reverts commit 3b0d44497800a16400d05a30bdaf6e6c285d4b36. (This used to be ctdb commit cb36bbb5418290e8e5b770d2d836285b15da2a6f) 2009-10-29 02:49:00 +03:00			`if (ctdb_db->transaction_active) {`
			`return 1;`
			`} else {`
			`return 0;`
			`}`
			`}`
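The decision in ctdb_control_trans2_active() reduces to two inputs: whether the calling client itself owns the commit on this database, and whether the database has a commit marked active. A minimal sketch of that decision as a pure function, where `demo_trans2_active` is a hypothetical helper and not a ctdb function:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the checks above.
 * Returns 1 if another client has a commit active on db_id, 0 otherwise.
 */
static int demo_trans2_active(uint32_t caller_db_id, uint32_t db_id,
			      bool transaction_active)
{
	/* The client that performs the commit must not lock itself out. */
	if (caller_db_id == db_id) {
		return 0;
	}
	return transaction_active ? 1 : 0;
}
```

For example, a client whose db_id matches the queried database gets 0 even while the commit is still marked active, so it can retry after a timed-out commit control.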
Add two new controls to start and cancel a persistent update. This allows ctdb to automatically start a new full blown recovery if a client has started updating the local tdb for a persistent database but is kill -9ed before it has ensured the update is distributed clusterwide. (This used to be ctdb commit 1ffccb3e0b3b5bd376c5302304029af393709518) 2008-07-17 07:50:55 +04:00
			`/*`
added new multi-record transaction commit code (This used to be ctdb commit 9ff3380099fe6f4d39de126db0826971a10ee692) 2008-07-30 13:57:00 +04:00			`backwards compatibility:`

Add two new controls to start and cancel a persistent update. This allows ctdb to automatically start a new full blown recovery if a client has started updating the local tdb for a persistent database but is kill -9ed before it has ensured the update is distributed clusterwide. (This used to be ctdb commit 1ffccb3e0b3b5bd376c5302304029af393709518) 2008-07-17 07:50:55 +04:00			`start a persistent store operation, passing the key, header and`
			`data to the daemon. If the client disconnects before it has issued`
			`a persistent_update call to the daemon we trigger a full recovery`
			`to ensure the databases are brought back in sync.`
			`for now we ignore the recdata that the client has passed to us.`
			`*/`
			`int32_t ctdb_control_start_persistent_update(struct ctdb_context *ctdb,`
			`struct ctdb_req_control *c,`
			`TDB_DATA recdata)`
			`{`
			`struct ctdb_client *client = ctdb_reqid_find(ctdb, c->client_id, struct ctdb_client);`

			`if (client == NULL) {`
			`DEBUG(DEBUG_ERR,(__location__ " can not match start_persistent_update to a client. Returning error\n"));`
			`return -1;`
			`}`

			`client->num_persistent_updates++;`

			`return 0;`
			`}`

added new multi-record transaction commit code (This used to be ctdb commit 9ff3380099fe6f4d39de126db0826971a10ee692) 2008-07-30 13:57:00 +04:00			`/*`
			`backwards compatibility:`

			`called by a client to tell ctdbd that the client is no longer doing a persistent update`
			`*/`
Add two new controls to start and cancel a persistent update. This allows ctdb to automatically start a new full blown recovery if a client has started updating the local tdb for a persistent database but is kill -9ed before it has ensured the update is distributed clusterwide. (This used to be ctdb commit 1ffccb3e0b3b5bd376c5302304029af393709518) 2008-07-17 07:50:55 +04:00			`int32_t ctdb_control_cancel_persistent_update(struct ctdb_context *ctdb,`
added new multi-record transaction commit code (This used to be ctdb commit 9ff3380099fe6f4d39de126db0826971a10ee692) 2008-07-30 13:57:00 +04:00			`struct ctdb_req_control *c,`
			`TDB_DATA recdata)`
Add two new controls to start and cancel a persistent update. This allows ctdb to automatically start a new full blown recovery if a client has started updating the local tdb for a persistent database but is kill -9ed before it has ensured the update is distributed clusterwide. (This used to be ctdb commit 1ffccb3e0b3b5bd376c5302304029af393709518) 2008-07-17 07:50:55 +04:00			`{`
			`struct ctdb_client *client = ctdb_reqid_find(ctdb, c->client_id, struct ctdb_client);`

			`if (client == NULL) {`
			`DEBUG(DEBUG_ERR,(__location__ " can not match cancel_persistent_update to a client. Returning error\n"));`
			`return -1;`
			`}`

Only decrement the "number of persistent writes in flight" If/when it is >0 or we will break if used against an unpatched samba server (This used to be ctdb commit 52a38487f981fd5981c02a7a063ad2c598591c10) 2008-07-17 12:47:20 +04:00			`if (client->num_persistent_updates > 0) {`
			`client->num_persistent_updates--;`
			`}`
Add two new controls to start and cancel a persistent update. This allows ctdb to automatically start a new full blown recovery if a client has started updating the local tdb for a persistent database but is kill -9ed before it has ensured the update is distributed clusterwide. (This used to be ctdb commit 1ffccb3e0b3b5bd376c5302304029af393709518) 2008-07-17 07:50:55 +04:00
			`return 0;`
			`}`
added new multi-record transaction commit code (This used to be ctdb commit 9ff3380099fe6f4d39de126db0826971a10ee692) 2008-07-30 13:57:00 +04:00

			`/*`
			`backwards compatibility:`

			`single record variant of ctdb_control_trans2_commit for older clients`
			`*/`
			`int32_t ctdb_control_persistent_store(struct ctdb_context *ctdb,`
			`struct ctdb_req_control *c,`
			`TDB_DATA recdata, bool *async_reply)`
			`{`
			`struct ctdb_marshall_buffer *m;`
			`struct ctdb_rec_data *rec = (struct ctdb_rec_data *)recdata.dptr;`
			`TDB_DATA key, data;`

			`if (recdata.dsize != offsetof(struct ctdb_rec_data, data) +`
			`rec->keylen + rec->datalen) {`
			`DEBUG(DEBUG_ERR, (__location__ " Bad data size in recdata\n"));`
			`return -1;`
			`}`

			`key.dptr = &rec->data[0];`
			`key.dsize = rec->keylen;`
			`data.dptr = &rec->data[rec->keylen];`
			`data.dsize = rec->datalen;`

			`m = ctdb_marshall_add(c, NULL, rec->reqid, rec->reqid, key, NULL, data);`
			`CTDB_NO_MEMORY(ctdb, m);`

			`return ctdb_control_trans2_commit(ctdb, c, ctdb_marshall_finish(m), async_reply);`
			`}`
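
The size check and key/data split in ctdb_control_persistent_store can be illustrated in isolation. The following is a minimal sketch, not the ctdb implementation: `struct demo_rec`, `struct demo_blob` and `demo_split_record` are hypothetical stand-ins, and the header layout is only assumed to resemble struct ctdb_rec_data (a fixed header followed by keylen bytes of key and datalen bytes of data). Unlike the function above, the sketch also rejects buffers shorter than the header and does the length sum in 64 bits so it cannot wrap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a packed record: fixed header, then key, then data. */
struct demo_rec {
	uint32_t length;
	uint32_t reqid;
	uint32_t keylen;
	uint32_t datalen;
	uint8_t  data[1];
};

/* Hypothetical pointer/length pair, similar in spirit to TDB_DATA. */
struct demo_blob {
	const uint8_t *dptr;
	size_t dsize;
};

/*
 * Validate that bufsize is exactly header + keylen + datalen, then point
 * key and data at the two payload regions inside buf.
 * Returns 0 on success, -1 if the sizes do not match.
 */
static int demo_split_record(const uint8_t *buf, size_t bufsize,
			     struct demo_blob *key, struct demo_blob *data)
{
	struct demo_rec hdr;
	const size_t hdrsize = offsetof(struct demo_rec, data);
	uint64_t expected;

	/* The header fields must be present before they can be read. */
	if (bufsize < hdrsize) {
		return -1;
	}
	memcpy(&hdr, buf, hdrsize);

	/* 64-bit sum: two 32-bit lengths cannot wrap around here. */
	expected = (uint64_t)hdrsize + hdr.keylen + hdr.datalen;
	if ((uint64_t)bufsize != expected) {
		return -1;
	}

	key->dptr = buf + hdrsize;
	key->dsize = hdr.keylen;
	data->dptr = buf + hdrsize + hdr.keylen;
	data->dsize = hdr.datalen;
	return 0;
}
```

A caller would pass the raw control payload and its length, and only touch key and data when the function returns 0.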

Add a new control CTDB_GET_DB_SEQNUM - fetch a persistent db's sequence number. Michael (This used to be ctdb commit a7e3b5fac6b3f5d74473f26eb86c067b35647996) 2009-12-11 17:31:02 +03:00			`static int32_t ctdb_get_db_seqnum(struct ctdb_context *ctdb,`
			`uint32_t db_id,`
			`uint64_t *seqnum)`
			`{`
			`int32_t ret;`
			`struct ctdb_db_context *ctdb_db;`
			`const char *keyname = CTDB_DB_SEQNUM_KEY;`
			`TDB_DATA key;`
			`TDB_DATA data;`
			`TALLOC_CTX *mem_ctx = talloc_new(ctdb);`
DB Seqnum: must provide a ctdb_ltdb_header when calling ctdb_ltdb_fetch() (This used to be ctdb commit 1fea9ef55a6a9d201ad1b49583451ac3e6b1c66d) 2011-11-28 03:41:17 +04:00			`struct ctdb_ltdb_header header;`
Add a new control CTDB_GET_DB_SEQNUM - fetch a persistent db's sequence number. Michael (This used to be ctdb commit a7e3b5fac6b3f5d74473f26eb86c067b35647996) 2009-12-11 17:31:02 +03:00
			`ctdb_db = find_ctdb_db(ctdb, db_id);`
			`if (!ctdb_db) {`
			`DEBUG(DEBUG_ERR,(__location__ " Unknown db 0x%08x\n", db_id));`
			`ret = -1;`
			`goto done;`
			`}`

			`key.dptr = (uint8_t *)discard_const(keyname);`
			`key.dsize = strlen(keyname) + 1;`

DB Seqnum: must provide a ctdb_ltdb_header when calling ctdb_ltdb_fetch() (This used to be ctdb commit 1fea9ef55a6a9d201ad1b49583451ac3e6b1c66d) 2011-11-28 03:41:17 +04:00			`ret = (int32_t)ctdb_ltdb_fetch(ctdb_db, key, &header, mem_ctx, &data);`
Add a new control CTDB_GET_DB_SEQNUM - fetch a persistent db's sequence number. Michael (This used to be ctdb commit a7e3b5fac6b3f5d74473f26eb86c067b35647996) 2009-12-11 17:31:02 +03:00			`if (ret != 0) {`
			`goto done;`
			`}`

			`if (data.dsize != sizeof(uint64_t)) {`
			`*seqnum = 0;`
			`goto done;`
			`}`

			`*seqnum = *(uint64_t *)data.dptr;`
added new multi-record transaction commit code (This used to be ctdb commit 9ff3380099fe6f4d39de126db0826971a10ee692) 2008-07-30 13:57:00 +04:00
Add a new control CTDB_GET_DB_SEQNUM - fetch a persistent db's sequence number. Michael (This used to be ctdb commit a7e3b5fac6b3f5d74473f26eb86c067b35647996) 2009-12-11 17:31:02 +03:00			`done:`
			`talloc_free(mem_ctx);`
			`return ret;`
			`}`

			`/**`
			`* Get the sequence number of a persistent database.`
			`*/`
			`int32_t ctdb_control_get_db_seqnum(struct ctdb_context *ctdb,`
			`TDB_DATA indata,`
			`TDB_DATA *outdata)`
			`{`
			`uint32_t db_id;`
			`int32_t ret;`
			`uint64_t seqnum;`

			`db_id = *(uint32_t *)indata.dptr;`
			`ret = ctdb_get_db_seqnum(ctdb, db_id, &seqnum);`
			`if (ret != 0) {`
			`goto done;`
			`}`

			`outdata->dsize = sizeof(uint64_t);`
			`outdata->dptr = (uint8_t *)talloc_zero(outdata, uint64_t);`
			`if (outdata->dptr == NULL) {`
			`ret = -1;`
			`goto done;`
			`}`

			`*(uint64_t *)(outdata->dptr) = seqnum;`

			`done:`
			`return ret;`
			`}`
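
The get_db_seqnum control hands the sequence number back as an eight-byte blob behind a uint8_t pointer. The following is a minimal sketch of packing and unpacking such a value with memcpy, which copies all eight bytes and makes no alignment assumptions. `demo_put_seqnum` and `demo_get_seqnum` are hypothetical helpers, not ctdb functions, and the value stays in host byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy all eight bytes of seqnum into buf (host byte order). */
static void demo_put_seqnum(uint8_t *buf, uint64_t seqnum)
{
	memcpy(buf, &seqnum, sizeof(seqnum));
}

/* Read an eight-byte sequence number back out of buf. */
static uint64_t demo_get_seqnum(const uint8_t *buf)
{
	uint64_t seqnum;

	memcpy(&seqnum, buf, sizeof(seqnum));
	return seqnum;
}
```

Assigning through a plain uint8_t pointer would store a single byte only, which is why the copy is done over sizeof(uint64_t) bytes.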
