samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2024-12-28 07:21:54 +03:00

Author	SHA1	Message	Date
Michael Adam	2bd04f0ff8	persistent: add ctdb_persistent_finish_trans3_commits(). This function walks all databases and checks for running trans3 commits. It sends replies to all of them (with error code) and ends them. To be called when a recovery finishes. (This used to be ctdb commit 70ba153b532528bdccea70c5ea28972257f384c1)	2011-02-24 10:35:26 +01:00
Michael Adam	0b3d8d28f6	persistent: add a client context to the persistent_state and track the db_id The db_id is tracked in the client context as an indication that a transaction commit is in progress. This is cleared in the persistent_state talloc destructor. This is in order to properly treat running trans3_commits if the client disconnects. (This used to be ctdb commit e886ff24f4e3e250944289db95916b948893d26c)	2011-02-24 10:35:25 +01:00
Michael Adam	65f7a44987	persistent: reject trans3_control when a commit is already active. This should actually never happen. (This used to be ctdb commit f416e76838fe2adf629d4356d1cc87054b1af164)	2011-02-24 10:35:25 +01:00
Michael Adam	01c2c0c262	persistent: allocate the persistent state in the ctdb_db struct in trans3_commit Make sure that ctdb_db->persistent_state is correctly NULL-ed when the state is freed. This way, we can use ctdb_db->persistent_state as an indication for whether a transaction commit is currently running. (This used to be ctdb commit 761cb235193564a0f337d0308f0a9e6de0ef2710)	2011-02-24 10:35:25 +01:00
Michael Adam	503b647319	persistent: add a ctdb_db context to the ctdb_persistent_state struct. (This used to be ctdb commit a14917c983c3b9bbbf38f5ddeecdbbe5bde32364)	2011-02-24 10:35:25 +01:00
Michael Adam	76acf72bc5	persistent_callback: print "no error message given" instead of "(null)" (This used to be ctdb commit d871a38978219e004833608c11aae98fe47614b9)	2011-02-24 10:35:25 +01:00
Michael Adam	e050266690	persistent: reduce indentation for the finishing moves in ctdb_persistent_callback (This used to be ctdb commit 2c2d1646eb753ea9561f085bcb101153267b052b)	2011-02-24 10:35:24 +01:00
Michael Adam	033ba0b466	persistent: if a node failed to update_record, trigger a recovery and stop processing of the update_record replies in order to let the recovery finish the trans3_commit control. (This used to be ctdb commit cab95570dc1eefb08abbac5ae411c29f699b51cc)	2011-02-24 10:35:24 +01:00
Michael Adam	0c93a2932c	persistent_store_timeout: do not really time out the trans3_commit control in recovery If a recovery was started, then all further processing of the update_record controls sent by the trans3_commit control and timing them out is disabled. The recovery should trigger sending the reply for the update record control when finished. (This used to be ctdb commit 983c1ca2e18ecd60fca69bfe9e116125cc695857)	2011-02-24 10:35:24 +01:00
Michael Adam	c9df23ae1d	persistent_callback: ignore the update-record return code of remote node in recovery If a recovery was started, then all further processing of the update_record controls sent by the trans3_commit control is disabled. The recovery should trigger sending the reply for the update record control when finished. (This used to be ctdb commit 12cf0619255b12230843cd8bb49cbfdea376ca2f)	2011-02-24 10:35:24 +01:00
Ronnie Sahlberg	c4006ce844	Add ctdb_fork() which will fork a child process and drop the real-time scheduler for the child. Use ctdb_fork() from callers where we don't want the child to be running at real-time privilege. (This used to be ctdb commit 58795a4c9e0624e20fa3e0023b65127053edd103)	2011-01-11 07:40:41 +11:00
Ronnie Sahlberg	5ef29f9f25	Update latency counters to show min/max and average (This used to be ctdb commit 1919e949af4641ffe919123e44b02fb87c13ab9f)	2010-10-11 15:12:24 +11:00
Ronnie Sahlberg	39c367a68f	Create macros to update the statistics counters and use these macros everywhere instead of manipulating the counters directly. (This used to be ctdb commit 2e648df890e5713bc575965d87937827b068d0d7)	2010-09-29 12:14:24 +10:00
Ronnie Sahlberg	2e8aac6689	Merge commit 'rusty/ports-from-1.0.112' into foo (This used to be ctdb commit 13e58d92f5f1723e850a82ae030d0ca57e89b1ee)	2010-08-19 13:17:56 +10:00
Rusty Russell	9fbb191b78	logging: give a unique logging name to each forked child. This means we can distinguish which child is logging, esp. via syslog where we have no pid. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 68b3761a0874429b90731741f0531f76dcfbb081)	2010-08-18 11:46:32 +09:30
Rusty Russell	f93440c4b7	event: Update events to latest Samba version 0.9.8 In Samba this is now called "tevent", and while we use the backwards compatibility wrappers they don't offer EVENT_FD_AUTOCLOSE: that is now a separate tevent_fd_set_auto_close() function. This is based on Samba version `7f29f817fa`. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 85e5e760cc91eb3157d3a88996ce474491646726)	2010-08-18 09:16:31 +09:30
Ronnie Sahlberg	d7c00d8d7e	Drop the debug level for logging fd creation to DEBUG_DEBUG (This used to be ctdb commit eae1d4f9e52e73b4d8769868fffdafa590d03784)	2010-02-04 06:37:41 +11:00
Stefan Metzmacher	94bc40307a	server: Use tdb_check to verify persistent tdbs on startup Depending on --max-persistent-check-errors we allow ctdb to start with unhealthy persistent databases. The default is 0 which means to reject a startup with unhealthy dbs. The health of the persistent databases is checked after each recovery. Node monitoring and the "startup" is deferred until all persistent databases are healthy. Databases can become healthy automatically by a completely HEALTHY node joining the cluster. Or by an administrator with "ctdb backupdb/restoredb" or "ctdb wipedb". metze (This used to be ctdb commit 15f133d5150ed1badb4fef7d644f10cd08a25cb5)	2009-12-16 08:06:10 +01:00
Michael Adam	46de365e78	Add a new control CTDB_GET_DB_SEQNUM - fetch a persistent db's sequence number. Michael (This used to be ctdb commit a7e3b5fac6b3f5d74473f26eb86c067b35647996)	2009-12-12 00:45:39 +01:00
Michael Adam	faacd5ca79	server: add a new control CTDB_CONTROL_TRANS3_COMMIT This is a simplified version of the trans2 commit control: It just rolls out the marshall buffer to all active nodes. It is the main ctdbd part of the re-implementation of the persistent transactions. The client code is changed to take a global lock to start a transactions and store into the marshal buffer instead of writing to the local tdb under a local transaction. The old transaction implementation is going to be removed in a later commit. Michael (This used to be ctdb commit f66428f9d2013080a414404c1ba6117888352fd6)	2009-12-12 00:43:26 +01:00
Michael Adam	c1039fba0e	server:trans2_commit: move the check for active recovery down. This needs to be done after the control-dispatcher: In the TRANS2_COMMIT control, the client->db_id needs to be set before bailing out, since otherwise the next TRANS2_COMMIT_RETRY will fail... Michael (This used to be ctdb commit 59faf3f923a5989b5ee94ef02a12827412775bae)	2009-12-04 15:03:21 +01:00
Michael Adam	673a8588b1	server: fix debug message in trans2_commit (refusing persistent store during transaction) log the right db_id also log the client_id Michael (This used to be ctdb commit 48ac5c77698ab7a28d24629cc8a6985011c5d14d)	2009-10-30 09:29:25 +11:00
Michael Adam	1de0c6f807	server: uniformly log db and client ids as 8-digit hex numbers in trans2_commit Michael (This used to be ctdb commit 2febdd23f754a2d4699bed36b941442ab362a376)	2009-10-30 09:28:06 +11:00
Michael Adam	7384dfe4a9	server: line-wrap a debug statement in trans2_commit Michael (This used to be ctdb commit 3be446434adb0f3095ac0ef4b7c4a6258780b863)	2009-10-30 09:27:33 +11:00
Michael Adam	7bfa959a86	server: output client_id in some debug messages in trans2_commit Michael (This used to be ctdb commit 11fefd02e6c9531ffb28b9e6acaf42ba39757d87)	2009-10-30 09:26:51 +11:00
Michael Adam	4d073bd779	server: fix a debug message in trans2_commit - log the correct db_id Michael (This used to be ctdb commit ab9657b5a66d5665e6c5fd1bf8eb4074a3bffeec)	2009-10-30 09:26:16 +11:00
Michael Adam	dca16d5f64	server: extend a debug message in ctdb_control_trans2_error() Michael (This used to be ctdb commit 0fb9573d1c838b436ab9be83e197b68f35f94acb)	2009-10-30 09:24:17 +11:00
Michael Adam	2187e6c379	server: add positive debug statements to trans2_commit and trans2_finished When the operation completed / started successfully. Michael (This used to be ctdb commit 0df012d58eb83195ea0365be19e0566dbc394a66)	2009-10-30 09:23:29 +11:00
Michael Adam	0113744fec	server: trans2_active: don't report a transaction active on the node that performs the transaction Otherwise a node can lock itself out, e.g. when a commit control times out... Michael (This used to be ctdb commit cb432e30351d5e5a41e98da3c7b1c2a4d400a3a2)	2009-10-30 09:22:18 +11:00
Ronnie Sahlberg	023d09cd38	Revert "update the "uptime" command to indicate the "time since last" is the time since the last recovery OR failover." This reverts commit 3b0d44497800a16400d05a30bdaf6e6c285d4b36. (This used to be ctdb commit cb36bbb5418290e8e5b770d2d836285b15da2a6f)	2009-10-29 10:49:00 +11:00
Ronnie Sahlberg	279b7ca564	update the "uptime" command to indicate the "time since last" is the time since the last recovery OR failover. (This used to be ctdb commit 3b0d44497800a16400d05a30bdaf6e6c285d4b36)	2009-10-29 10:37:10 +11:00
Michael Adam	abac42ca34	server: add a new ctdb control CTDB_TRANS2_ACTIVE This asks the daemon whether a transaction is currently active on a given DB on that node. More precisely this asks for the transaction_active flag in the ctdb_db_context that is set in the CTDB_TRANS2_COMMIT control and cleared in the CTDB_TRANS2_ERROR or CTDB_TRANS2_FINISHED controls. This will be useful for fixing race conditions in the transaction code. Michael (This used to be ctdb commit 8d430ae6968dfe566614379436fc3c56003fcd88)	2009-10-29 10:14:30 +11:00
Michael Adam	769a36c048	In ctdb_ltdb_store(), add a missing transaction_cancel when local store failed. Spotted by Volker. Michael (This used to be ctdb commit 0a4d409baabf242a87c06293789d589c896b104c)	2009-10-21 12:49:59 +11:00
Ronnie Sahlberg	9de3652380	add logging every time we create a file descriptor in the main ctdb daemon so we can spot if there are leaks. plug two leaks for file descriptors related to when sending ARP fails and one leak when we can not parse the local address during tcp connection establishment (This used to be ctdb commit ddd089810a14efe4be6e1ff3eccaa604e4913c9e)	2009-10-15 11:24:54 +11:00
Michael Adam	a6cf23362f	ctdbd: refuse PERSISTENT_STORE if transaction is running. Michael (This used to be ctdb commit c07d6d90f7afd19213ad44624c3e2b9c85f4eea8)	2009-07-29 11:13:38 +10:00
Michael Adam	4cd06a330e	Fix persistent transaction commit race condition. In ctdb_client.c:ctdb_transaction_commit(), after a failed TRANS2_COMMIT control call (for instance due to the 1-second timeout being exceeded waiting for a busy node's reply), there is a 1-second gap between the transaction_cancel() and replay_transaction() calls in which there is no lock on the persistent db. And due to the lack of global state indicating that a transaction is in progress in ctdbd, other nodes may succeed in starting transactions on the db in this gap and, even worse, work on top of the possibly already pushed changes. So the data diverges across the nodes. This change fixes this by introducing global state for a transaction commit being active in the ctdb_db_context struct and in a db_id field in the client so that a client keeps track of _which_ tdb it has a transaction commit running on. These data are set by ctdb upon entering the trans2_commit control and they are cleared in the trans2_error or trans2_finished controls. This makes it impossible to start another transaction or migrate a record to a different node while a transaction is active on a persistent tdb, including the retry loop. This approach is deadlock-free and still allows the recovery process to be started in the retry-gap between cancel and replay. Also note that this solution does not require any change in the client side. This was debugged and developed together with Stefan Metzmacher <metze@samba.org> - thanks! Michael (This used to be ctdb commit f88103516e5ad723062fb95fcb07a128f1069d69)	2009-07-29 11:12:39 +10:00
Ronnie Sahlberg	e1b0cea427	add control and logging of very high latencies. log the type of operation and the database name for all latencies higher than a threshold (This used to be ctdb commit 1d581dcd507e8e13d7ae085ff4d6a9f3e2aaeba5)	2008-10-30 12:49:53 +11:00
Andrew Tridgell	aa1bc0abba	added a new control CTDB_CONTROL_TRANS2_COMMIT_RETRY so we can tell the difference between a initial commit attempt and a retry, which allows us to get the persistent updates counter right for retries (This used to be ctdb commit 7f29c50ccbc7789bfbc20bcb4b65758af9ebe6c5)	2008-08-08 13:11:28 +10:00
Andrew Tridgell	5a0249d34c	return a more detailed error code from a trans2 commit error (This used to be ctdb commit 6915661a460cd589b441ac7cd8695f35c4e83113)	2008-08-08 09:58:49 +10:00
Andrew Tridgell	5ee51ae84e	fixed a looping error bug with the new transactions code (This used to be ctdb commit 0592ba2a4fbd1b3b7a6bd0780eadbd6d449baaad)	2008-08-08 00:44:33 +10:00
Andrew Tridgell	bbedba23c7	cover some corner cases where the persistent database could become inconsistent (This used to be ctdb commit c76c214be401cb116265ed17ffe6c77c979ded82)	2008-08-07 13:34:18 +10:00
Andrew Tridgell	78acc59784	implemented replayable transactions in ctdb to prevent deadlock (This used to be ctdb commit b6d9a0396fb4b325778d3810dc656f719f31b9f1)	2008-08-04 14:51:51 +10:00
Andrew Tridgell	98502135e7	added new multi-record transaction commit code (This used to be ctdb commit 9ff3380099fe6f4d39de126db0826971a10ee692)	2008-07-30 19:57:00 +10:00
Ronnie Sahlberg	90ff67dc74	Only decrement the "number of persistent writes in flight" If/when it is >0 or we will break if used against an unpatched samba server (This used to be ctdb commit 52a38487f981fd5981c02a7a063ad2c598591c10)	2008-07-17 18:47:20 +10:00
Ronnie Sahlberg	6eb4e46fe1	Add two new controls to start and cancel a persistent update. This allows ctdb to automatically start a new full blown recovery if a client has started updating the local tdb for a persistent database but is kill -9ed before it has ensured the update is distributed clusterwide. (This used to be ctdb commit 1ffccb3e0b3b5bd376c5302304029af393709518)	2008-07-17 13:50:55 +10:00
Ronnie Sahlberg	334db8ccba	proper waitpid() fix. remove all waitpid() calls and use the event system to trap sigchld (This used to be ctdb commit 77458b2b6b51b2970c12b0e5b097088d3fb9d358)	2008-07-09 14:02:54 +10:00
Ronnie Sahlberg	522830dea8	Revert "waitpid() can block if it takes a long time before the child terminates" This reverts commit bfba5c7249eff8a10a43b53c1b89dd44b625fd10. revert the waitpid changes. we need to waitpid for some children so should refactor the approach completely (This used to be ctdb commit 702ced6c2fe569c01fe96c60d0f35a7e61506a96)	2008-07-08 17:41:31 +10:00
Ronnie Sahlberg	d67de4a7d2	waitpid() can block if it takes a long time before the child terminates so we should not call it from the main daemon. 1, set SIGCHLD to SIG_DFL to make sure we ignore this signal 2, get rid of all waitpid() calls 3, change reporting of event script status code from _exit()/waitpid() to write()/read() one byte across the pipe. (This used to be ctdb commit bfba5c7249eff8a10a43b53c1b89dd44b625fd10)	2008-07-08 03:48:11 +10:00
Ronnie Sahlberg	60a3fb926d	don't bother casting to a void* private_data pointer, just pass it as 'state' structure (This used to be ctdb commit 1d7c3eb454e33cd17c74606c4ea011fd79959c80)	2008-05-28 13:40:12 +10:00
Ronnie Sahlberg	0b0f5bc5e6	remove another field we don't need in the childwrite_handle structure (This used to be ctdb commit 70085523f4c35a20786023c489325554e2a6f9c1)	2008-05-28 13:31:58 +10:00


61 Commits