mirror of https://github.com/samba-team/samba.git synced 2024-12-28 07:21:54 +03:00

61 Commits

Author SHA1 Message Date
Michael Adam
2bd04f0ff8 persistent: add ctdb_persistent_finish_trans3_commits().
This function walks all databases and checks for running trans3 commits.
It sends replies to all of them (with error code) and ends them.
To be called when a recovery finishes.
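A minimal sketch of the loop this commit describes: walk all databases and answer every running trans3 commit with an error. It assumes a simplified singly linked list of databases and a hypothetical send_error_reply() hook. The struct and function names are illustrative and are not ctdbd's actual API.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for ctdbd's structures. */
struct persistent_state {
    int reqid;                                 /* identifies the pending control */
};

struct db_context {
    struct db_context *next;
    struct persistent_state *persistent_state; /* non-NULL while a commit runs */
};

/* Hypothetical reply hook; the counter exists only for illustration. */
static int replies_sent = 0;

static void send_error_reply(struct persistent_state *state, int err)
{
    fprintf(stderr, "trans3_commit reqid %d ended with error %d\n",
            state->reqid, err);
    replies_sent++;
}

/* Walk all databases; reply to and end every running trans3 commit. */
static void finish_trans3_commits(struct db_context *dbs, int err)
{
    for (struct db_context *db = dbs; db != NULL; db = db->next) {
        struct persistent_state *state = db->persistent_state;
        if (state == NULL) {
            continue;                  /* no commit running on this db */
        }
        send_error_reply(state, err);
        db->persistent_state = NULL;   /* mark the commit as ended */
        free(state);
    }
}
```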

(This used to be ctdb commit 70ba153b532528bdccea70c5ea28972257f384c1)
2011-02-24 10:35:26 +01:00
Michael Adam
0b3d8d28f6 persistent: add a client context to the persistent_state and track the db_id
The db_id is tracked in the client context as an indication that a
transaction commit is in progress. This is cleared in the persistent_state
talloc destructor.

This is in order to properly treat running trans3_commits if the client
disconnects.

(This used to be ctdb commit e886ff24f4e3e250944289db95916b948893d26c)
2011-02-24 10:35:25 +01:00
Michael Adam
65f7a44987 persistent: reject trans3_control when a commit is already active.
This should actually never happen.

(This used to be ctdb commit f416e76838fe2adf629d4356d1cc87054b1af164)
2011-02-24 10:35:25 +01:00
Michael Adam
01c2c0c262 persistent: allocate the persistent state in the ctdb_db struct in trans3_commit
Make sure that ctdb_db->persistent_state is correctly NULL-ed when
the state is freed. This way, we can use ctdb_db->persistent_state
as an indication for whether a transaction commit is currently
running.
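Combined with rejecting a second trans3 control while one is active (65f7a44987), NULL-ing ctdb_db->persistent_state on free lets the pointer itself act as the "commit running" flag. A rough sketch using plain malloc/free: free_commit() plays the role of the talloc destructor (which ctdbd would register with talloc_set_destructor()), and start_commit() and free_commit() are hypothetical names.

```c
#include <stdlib.h>

struct db_context;

/* Hypothetical per-commit state with a back-pointer to its database. */
struct persistent_state {
    struct db_context *db;
};

struct db_context {
    struct persistent_state *persistent_state; /* NULL: no commit running */
};

/* Start a commit; refuse if one is already active on this db. */
static struct persistent_state *start_commit(struct db_context *db)
{
    struct persistent_state *state;

    if (db->persistent_state != NULL) {
        return NULL;                  /* a commit is already running */
    }
    state = calloc(1, sizeof(*state));
    if (state == NULL) {
        return NULL;
    }
    state->db = db;
    db->persistent_state = state;
    return state;
}

/* Stand-in for the destructor: clear the db's pointer before freeing,
 * so it never dangles and remains a reliable "commit active" indicator. */
static void free_commit(struct persistent_state *state)
{
    if (state == NULL) {
        return;
    }
    if (state->db != NULL && state->db->persistent_state == state) {
        state->db->persistent_state = NULL;
    }
    free(state);
}
```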

(This used to be ctdb commit 761cb235193564a0f337d0308f0a9e6de0ef2710)
2011-02-24 10:35:25 +01:00
Michael Adam
503b647319 persistent: add a ctdb_db context to the ctdb_persistent_state struct.
(This used to be ctdb commit a14917c983c3b9bbbf38f5ddeecdbbe5bde32364)
2011-02-24 10:35:25 +01:00
Michael Adam
76acf72bc5 persistent_callback: print "no error message given" instead of "(null)"
(This used to be ctdb commit d871a38978219e004833608c11aae98fe47614b9)
2011-02-24 10:35:25 +01:00
Michael Adam
e050266690 persistent: reduce indentation for the finishing moves in ctdb_persistent_callback
(This used to be ctdb commit 2c2d1646eb753ea9561f085bcb101153267b052b)
2011-02-24 10:35:24 +01:00
Michael Adam
033ba0b466 persistent: if a node failed to update_record, trigger a recovery
and stop processing of the update_record replies in order to let
the recovery finish the trans3_commit control.

(This used to be ctdb commit cab95570dc1eefb08abbac5ae411c29f699b51cc)
2011-02-24 10:35:24 +01:00
Michael Adam
0c93a2932c persistent_store_timout: do not really time out the trans3_commit control in recovery
If a recovery was started, then all further processing of the update_record
controls sent by the trans3_commit control, including timing them out, is disabled.
The recovery should trigger sending the reply for the update_record control
when finished.

(This used to be ctdb commit 983c1ca2e18ecd60fca69bfe9e116125cc695857)
2011-02-24 10:35:24 +01:00
Michael Adam
c9df23ae1d persistent_callback: ignore the update_record return code of remote nodes during recovery
If a recovery was started, then all further processing of the update_record
controls sent by the trans3_commit control is disabled. The recovery should
trigger sending the reply for the update record control when finished.
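The three recovery-related commits 033ba0b466, 0c93a2932c and c9df23ae1d amount to one rule: once a recovery has started, update_record replies and timeouts are no longer processed, and the recovery answers the trans3_commit control instead. A hypothetical sketch of that rule, where setting in_recovery merely stands in for actually triggering a recovery:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical node state; only the recovery flag matters here. */
struct node_state {
    bool in_recovery;
    int failed_updates;
};

enum reply_action { REPLY_PROCESSED, REPLY_IGNORED };

/* Callback for one update_record reply from a remote node. */
static enum reply_action update_record_reply(struct node_state *node, int status)
{
    if (node->in_recovery) {
        /* The recovery will send the trans3_commit reply when it ends. */
        return REPLY_IGNORED;
    }
    if (status != 0) {
        node->failed_updates++;
        node->in_recovery = true;   /* stand-in for "trigger a recovery" */
    }
    return REPLY_PROCESSED;
}

/* Timeout handler for the outstanding update_record controls. */
static enum reply_action update_record_timeout(struct node_state *node)
{
    if (node->in_recovery) {
        return REPLY_IGNORED;       /* do not time out during recovery */
    }
    fprintf(stderr, "update_record timed out\n");
    return REPLY_PROCESSED;
}
```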

(This used to be ctdb commit 12cf0619255b12230843cd8bb49cbfdea376ca2f)
2011-02-24 10:35:24 +01:00
Ronnie Sahlberg
c4006ce844 Add ctdb_fork() which will fork a child process and drop the real-time
scheduler for the child.

Use ctdb_fork() from callers where we don't want the child to be running
at real-time privilege.
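A sketch of what dropping the real-time scheduler in a forked child can look like on a POSIX system. fork_without_rt() is a hypothetical name, and switching unconditionally to SCHED_OTHER is an assumption; the real ctdb_fork() may restore a previously saved policy instead.

```c
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>   /* waitpid(), used by callers to reap the child */
#include <unistd.h>

/* Hypothetical helper: fork, and in the child switch back to the normal
 * time-sharing policy so it does not inherit real-time priority. */
static pid_t fork_without_rt(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        /* SCHED_OTHER only accepts a static priority of 0. */
        struct sched_param param = { .sched_priority = 0 };
        if (sched_setscheduler(0, SCHED_OTHER, &param) == -1) {
            perror("sched_setscheduler");
        }
    }
    return pid;   /* same contract as fork(): child pid, 0, or -1 */
}
```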

(This used to be ctdb commit 58795a4c9e0624e20fa3e0023b65127053edd103)
2011-01-11 07:40:41 +11:00
Ronnie Sahlberg
5ef29f9f25 Update latency counters to show min/max and average
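One way to keep min/max/average latency counters is to store the minimum, the maximum, a running total and a sample count, and derive the average on demand. The struct and function names below are illustrative, not taken from ctdb.

```c
/* Hypothetical latency counter. */
struct latency_counter {
    int num;        /* number of samples */
    double min;     /* smallest latency seen, in seconds */
    double max;     /* largest latency seen, in seconds */
    double total;   /* sum of all samples, used for the average */
};

static void latency_update(struct latency_counter *c, double latency)
{
    if (c->num == 0 || latency < c->min) {
        c->min = latency;
    }
    if (c->num == 0 || latency > c->max) {
        c->max = latency;
    }
    c->total += latency;
    c->num++;
}

static double latency_average(const struct latency_counter *c)
{
    return c->num > 0 ? c->total / c->num : 0.0;
}
```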
(This used to be ctdb commit 1919e949af4641ffe919123e44b02fb87c13ab9f)
2010-10-11 15:12:24 +11:00
Ronnie Sahlberg
39c367a68f Create macros to update the statistics counters and use these macros
everywhere instead of manipulating the counters directly.
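A sketch of why routing every update through a macro helps: if the daemon keeps a second copy of the counters (here a hypothetical per-interval statistics_current next to the cumulative statistics), only the macro has to change. The field names, macro names and two-copy layout are assumptions for illustration.

```c
/* Hypothetical statistics block with just two counters. */
struct statistics {
    unsigned long num_calls;
    unsigned long pending_calls;
};

struct daemon_ctx {
    struct statistics statistics;          /* cumulative since startup */
    struct statistics statistics_current;  /* since the last reset */
};

/* Every update goes through one macro, so both copies stay in step. */
#define INCREMENT_STAT(ctx, field)              \
    do {                                        \
        (ctx)->statistics.field++;              \
        (ctx)->statistics_current.field++;      \
    } while (0)

#define DECREMENT_STAT(ctx, field)              \
    do {                                        \
        (ctx)->statistics.field--;              \
        (ctx)->statistics_current.field--;      \
    } while (0)
```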

(This used to be ctdb commit 2e648df890e5713bc575965d87937827b068d0d7)
2010-09-29 12:14:24 +10:00
Ronnie Sahlberg
2e8aac6689 Merge commit 'rusty/ports-from-1.0.112' into foo
(This used to be ctdb commit 13e58d92f5f1723e850a82ae030d0ca57e89b1ee)
2010-08-19 13:17:56 +10:00
Rusty Russell
9fbb191b78 logging: give a unique logging name to each forked child.
This means we can distinguish which child is logging, esp. via syslog where we have no pid.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit 68b3761a0874429b90731741f0531f76dcfbb081)
2010-08-18 11:46:32 +09:30
Rusty Russell
f93440c4b7 event: Update events to latest Samba version 0.9.8
In Samba this is now called "tevent", and while we use the backwards
compatibility wrappers they don't offer EVENT_FD_AUTOCLOSE: that is now
a separate tevent_fd_set_auto_close() function.

This is based on Samba version 7f29f817fa.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 85e5e760cc91eb3157d3a88996ce474491646726)
2010-08-18 09:16:31 +09:30
Ronnie Sahlberg
d7c00d8d7e Drop the debug level for logging fd creation to DEBUG_DEBUG
(This used to be ctdb commit eae1d4f9e52e73b4d8769868fffdafa590d03784)
2010-02-04 06:37:41 +11:00
Stefan Metzmacher
94bc40307a server: Use tdb_check to verify persistent tdbs on startup
Depending on --max-persistent-check-errors we allow ctdb
to start with unhealthy persistent databases.

The default is 0 which means to reject a startup with
unhealthy dbs.

The health of the persistent databases is checked after each
recovery. Node monitoring and the "startup" is deferred
until all persistent databases are healthy.

Databases can become healthy automatically when a completely
HEALTHY node joins the cluster, or through an administrator
running "ctdb backupdb/restoredb" or "ctdb wipedb".

metze

(This used to be ctdb commit 15f133d5150ed1badb4fef7d644f10cd08a25cb5)
2009-12-16 08:06:10 +01:00
Michael Adam
46de365e78 Add a new control CTDB_GET_DB_SEQNUM - fetch a persistent db's sequence number.
Michael

(This used to be ctdb commit a7e3b5fac6b3f5d74473f26eb86c067b35647996)
2009-12-12 00:45:39 +01:00
Michael Adam
faacd5ca79 server: add a new control CTDB_CONTROL_TRANS3_COMMIT
This is a simplified version of the trans2 commit control:
It just rolls out the marshall buffer to all active nodes.

It is the main ctdbd part of the re-implementation of the
persistent transactions. The client code is changed to
take a global lock to start a transaction and to store into
the marshall buffer instead of writing to the local tdb
under a local transaction.
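To illustrate accumulating changes in a marshall buffer that ctdbd then rolls out, here is a minimal append-only buffer. The record layout (two 32-bit lengths in host byte order, followed by key and data bytes) and all names are assumptions made for the sketch, not ctdb's actual wire format.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer: records laid out as
 * [uint32 keylen][uint32 datalen][key bytes][data bytes]. */
struct marshall_buffer {
    uint32_t count;   /* number of records */
    size_t size;      /* bytes used in data */
    uint8_t *data;
};

/* Append one key/value record; returns 0 on success, -1 on allocation failure. */
static int marshall_add(struct marshall_buffer *m,
                        const void *key, uint32_t keylen,
                        const void *val, uint32_t vallen)
{
    size_t hdr = 2 * sizeof(uint32_t);
    size_t rec = hdr + keylen + vallen;
    uint8_t *p = realloc(m->data, m->size + rec);

    if (p == NULL) {
        return -1;
    }
    m->data = p;
    p += m->size;                      /* start of the new record */
    memcpy(p, &keylen, sizeof(keylen));
    memcpy(p + sizeof(keylen), &vallen, sizeof(vallen));
    memcpy(p + hdr, key, keylen);
    memcpy(p + hdr + keylen, val, vallen);
    m->size += rec;
    m->count++;
    return 0;
}
```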

The old transaction implementation is going to be
removed in a later commit.

Michael

(This used to be ctdb commit f66428f9d2013080a414404c1ba6117888352fd6)
2009-12-12 00:43:26 +01:00
Michael Adam
c1039fba0e server:trans2_commit: move the check for active recovery down.
This needs to be done after the control-dispatcher:
In the TRANS2_COMMIT control, the client->db_id needs
to be set before bailing out, since otherwise the
next TRANS2_COMMIT_RETRY will fail...

Michael

(This used to be ctdb commit 59faf3f923a5989b5ee94ef02a12827412775bae)
2009-12-04 15:03:21 +01:00
Michael Adam
673a8588b1 server: fix debug message in trans2_commit (refusing persistent store during transaction)
log the right db_id
also log the client_id

Michael

(This used to be ctdb commit 48ac5c77698ab7a28d24629cc8a6985011c5d14d)
2009-10-30 09:29:25 +11:00
Michael Adam
1de0c6f807 server: uniformly log db and client ids as 8-digit hex numbers in trans2_commit
Michael

(This used to be ctdb commit 2febdd23f754a2d4699bed36b941442ab362a376)
2009-10-30 09:28:06 +11:00
Michael Adam
7384dfe4a9 server: line-wrap a debug statement in trans2_commit
Michael

(This used to be ctdb commit 3be446434adb0f3095ac0ef4b7c4a6258780b863)
2009-10-30 09:27:33 +11:00
Michael Adam
7bfa959a86 server: output client_id in some debug messages in trans2_commit
Michael

(This used to be ctdb commit 11fefd02e6c9531ffb28b9e6acaf42ba39757d87)
2009-10-30 09:26:51 +11:00
Michael Adam
4d073bd779 server: fix a debug message in trans2_commit - log the correct db_id
Michael

(This used to be ctdb commit ab9657b5a66d5665e6c5fd1bf8eb4074a3bffeec)
2009-10-30 09:26:16 +11:00
Michael Adam
dca16d5f64 server: extend a debug message in ctdb_control_trans2_error()
Michael

(This used to be ctdb commit 0fb9573d1c838b436ab9be83e197b68f35f94acb)
2009-10-30 09:24:17 +11:00
Michael Adam
2187e6c379 server: add positive debug statements to trans2_commit and trans2_finished
When the operation completed / started successfully.

Michael

(This used to be ctdb commit 0df012d58eb83195ea0365be19e0566dbc394a66)
2009-10-30 09:23:29 +11:00
Michael Adam
0113744fec server: trans2_active: don't report a transaction active on the node that performs the transaction
Otherwise a node can lock itself out, e.g. when a commit control times out...

Michael

(This used to be ctdb commit cb432e30351d5e5a41e98da3c7b1c2a4d400a3a2)
2009-10-30 09:22:18 +11:00
Ronnie Sahlberg
023d09cd38 Revert "update the "uptime" command to indicate the "time since last" is the time since the last recovery OR failover."
This reverts commit 3b0d44497800a16400d05a30bdaf6e6c285d4b36.

(This used to be ctdb commit cb36bbb5418290e8e5b770d2d836285b15da2a6f)
2009-10-29 10:49:00 +11:00
Ronnie Sahlberg
279b7ca564 update the "uptime" command to indicate the "time since last" is the time since the last recovery OR failover.
(This used to be ctdb commit 3b0d44497800a16400d05a30bdaf6e6c285d4b36)
2009-10-29 10:37:10 +11:00
Michael Adam
abac42ca34 server: add a new ctdb control CTDB_TRANS2_ACTIVE
This asks the daemon whether a transaction is currently active on a
given DB on that node. More precisely, this asks for the transaction_active
flag in the ctdb_db_context that is set in the CTDB_TRANS2_COMMIT
control and cleared in the CTDB_TRANS2_ERROR or CTDB_TRANS2_FINISHED controls.

This will be useful for fixing race conditions in the transaction code.

Michael

(This used to be ctdb commit 8d430ae6968dfe566614379436fc3c56003fcd88)
2009-10-29 10:14:30 +11:00
Michael Adam
769a36c048 In ctdb_ltdb_store(), add a missing transaction_cancel when local store failed.
Spotted by Volker.

Michael

(This used to be ctdb commit 0a4d409baabf242a87c06293789d589c896b104c)
2009-10-21 12:49:59 +11:00
Ronnie Sahlberg
9de3652380 add logging every time we create a file descriptor in the main ctdb daemon
so we can spot if there are leaks.

plug two file descriptor leaks in the paths where sending ARP fails,
and one leak when we cannot parse the local address during TCP connection establishment

(This used to be ctdb commit ddd089810a14efe4be6e1ff3eccaa604e4913c9e)
2009-10-15 11:24:54 +11:00
Michael Adam
a6cf23362f ctdbd: refuse PERSISTENT_STORE if transaction is running.
Michael

(This used to be ctdb commit c07d6d90f7afd19213ad44624c3e2b9c85f4eea8)
2009-07-29 11:13:38 +10:00
Michael Adam
4cd06a330e Fix persistent transaction commit race condition.
In ctdb_client.c:ctdb_transaction_commit(), after a failed
TRANS2_COMMIT control call (for instance due to the 1-second timeout
being exceeded while waiting for a busy node's reply), there is a
1-second gap between the transaction_cancel() and
replay_transaction() calls in which there is no lock on the
persistent db. And due to the lack of global state
indicating that a transaction is in progress in ctdbd, other nodes
may succeed in starting transactions on the db in this gap and,
even worse, work on top of the possibly already pushed changes.
So the data diverges across the nodes.

This change fixes this by introducing global state for a transaction
commit being active in the ctdb_db_context struct, and a db_id field
in the client so that a client keeps track of _which_ tdb it has a
transaction commit running on. These data are set by ctdb upon
entering the trans2_commit control and they are cleared in the
trans2_error or trans2_finished controls. This makes it impossible
to start another transaction or migrate a record to a different
node while a transaction is active on a persistent tdb, including
the retry loop.

This approach is deadlock-free and still allows a recovery process
to be started in the retry gap between cancel and replay.
Also note that this solution does not require any change on the
client side.
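The global state described in this commit can be sketched as a flag on the database plus a db_id on the client. All names are hypothetical; letting the client that already owns the commit pass again (so its retry is not refused) and using db_id 0 for "none" are assumptions of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced versions of the structures the commit touches. */
struct db_context {
    uint32_t db_id;
    bool transaction_active;   /* set while a trans2 commit is in progress */
};

struct client {
    uint32_t db_id;            /* db with a commit running, 0 if none */
};

/* TRANS2_COMMIT: refuse if another client's commit is active on this db. */
static int trans2_commit(struct client *c, struct db_context *db)
{
    if (db->transaction_active && c->db_id != db->db_id) {
        return -1;
    }
    db->transaction_active = true;
    c->db_id = db->db_id;
    return 0;
}

/* TRANS2_FINISHED / TRANS2_ERROR: clear the global state again. */
static void trans2_done(struct client *c, struct db_context *db)
{
    db->transaction_active = false;
    c->db_id = 0;
}

/* Other persistent stores or record migrations must wait while active. */
static bool may_modify(const struct db_context *db)
{
    return !db->transaction_active;
}
```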

This was debugged and developed together with
Stefan Metzmacher <metze@samba.org> - thanks!

Michael

(This used to be ctdb commit f88103516e5ad723062fb95fcb07a128f1069d69)
2009-07-29 11:12:39 +10:00
Ronnie Sahlberg
e1b0cea427 add control and logging of very high latencies.
log the type of operation and the database name for all latencies higher
than a threshold
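A minimal sketch of threshold-based latency logging that reports the operation type and the database name. The function name, the message format and the rule that a threshold of 0 disables logging are assumptions.

```c
#include <stdio.h>

/* Log operations slower than threshold seconds; returns 1 if logged. */
static int log_if_slow(const char *op, const char *db_name,
                       double latency, double threshold)
{
    /* Assumption: a threshold of 0 (or less) disables this logging. */
    if (threshold <= 0.0 || latency <= threshold) {
        return 0;
    }
    fprintf(stderr, "High latency %.6fs for operation %s on database %s\n",
            latency, op, db_name);
    return 1;
}
```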

(This used to be ctdb commit 1d581dcd507e8e13d7ae085ff4d6a9f3e2aaeba5)
2008-10-30 12:49:53 +11:00
Andrew Tridgell
aa1bc0abba added a new control CTDB_CONTROL_TRANS2_COMMIT_RETRY so we can tell
the difference between an initial commit attempt and a retry, which
allows us to get the persistent updates counter right for retries

(This used to be ctdb commit 7f29c50ccbc7789bfbc20bcb4b65758af9ebe6c5)
2008-08-08 13:11:28 +10:00
Andrew Tridgell
5a0249d34c return a more detailed error code from a trans2 commit error
(This used to be ctdb commit 6915661a460cd589b441ac7cd8695f35c4e83113)
2008-08-08 09:58:49 +10:00
Andrew Tridgell
5ee51ae84e fixed a looping error bug with the new transactions code
(This used to be ctdb commit 0592ba2a4fbd1b3b7a6bd0780eadbd6d449baaad)
2008-08-08 00:44:33 +10:00
Andrew Tridgell
bbedba23c7 cover some corner cases where the persistent database could become
inconsistent

(This used to be ctdb commit c76c214be401cb116265ed17ffe6c77c979ded82)
2008-08-07 13:34:18 +10:00
Andrew Tridgell
78acc59784 implemented replayable transactions in ctdb to prevent deadlock
(This used to be ctdb commit b6d9a0396fb4b325778d3810dc656f719f31b9f1)
2008-08-04 14:51:51 +10:00
Andrew Tridgell
98502135e7 added new multi-record transaction commit code
(This used to be ctdb commit 9ff3380099fe6f4d39de126db0826971a10ee692)
2008-07-30 19:57:00 +10:00
Ronnie Sahlberg
90ff67dc74 Only decrement the "number of persistent writes in flight" if/when
it is > 0, or we will break if used against an unpatched samba server
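The guard can be sketched as below. The counter name is hypothetical, and the counter is unsigned here so that an unguarded decrement at 0 would wrap around. The idea that an unpatched server never sends the matching "start" control (added in 6eb4e46fe1) is an inference, not something this commit states.

```c
/* Hypothetical counter of persistent writes in flight. */
struct persistent_counters {
    unsigned int writes_in_flight;
};

static void persistent_write_done(struct persistent_counters *c)
{
    /* Without a matching increment the counter may already be 0;
     * only decrement when there is something to decrement. */
    if (c->writes_in_flight > 0) {
        c->writes_in_flight--;
    }
}
```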

(This used to be ctdb commit 52a38487f981fd5981c02a7a063ad2c598591c10)
2008-07-17 18:47:20 +10:00
Ronnie Sahlberg
6eb4e46fe1 Add two new controls to start and cancel a persistent update.
This allows ctdb to automatically start a new full blown recovery
if a client has started updating the local tdb for a persistent database
but is killed with kill -9 before it has ensured the update is distributed clusterwide.

(This used to be ctdb commit 1ffccb3e0b3b5bd376c5302304029af393709518)
2008-07-17 13:50:55 +10:00
Ronnie Sahlberg
334db8ccba proper waitpid() fix.
remove all waitpid() calls and use the event system to trap sigchld

(This used to be ctdb commit 77458b2b6b51b2970c12b0e5b097088d3fb9d358)
2008-07-09 14:02:54 +10:00
Ronnie Sahlberg
522830dea8 Revert "waitpid() can block if it takes a long time before the child terminates"
This reverts commit bfba5c7249eff8a10a43b53c1b89dd44b625fd10.

revert the waitpid changes. We need to waitpid() for some children, so we should
refactor the approach completely.

(This used to be ctdb commit 702ced6c2fe569c01fe96c60d0f35a7e61506a96)
2008-07-08 17:41:31 +10:00
Ronnie Sahlberg
d67de4a7d2 waitpid() can block if it takes a long time before the child terminates
so we should not call it from the main daemon.

1. Set SIGCHLD to SIG_DFL to make sure we ignore this signal.

2. Get rid of all waitpid() calls.

3. Change reporting of the event script status code from _exit()/waitpid() to write()/read() one byte across the pipe.
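Point 3 can be sketched as follows: the child writes its status as a single byte on a pipe and the parent reads it, instead of decoding an exit status via waitpid(). The function names and the example task are hypothetical. A read that returns 0 (EOF) means the child exited without reporting.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for an event script; its return value is the status. */
static int example_event_script(void)
{
    return 3;
}

/* Fork a child that runs task() and reports the result as one byte on a
 * pipe. Returns the read end for the parent, or -1 on error. */
static int start_child_with_status_pipe(int (*task)(void), pid_t *child)
{
    int fds[2];
    pid_t pid;

    if (pipe(fds) == -1) {
        return -1;
    }
    pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Status is truncated to one byte, like an exit status. */
        unsigned char status = (unsigned char)task();
        close(fds[0]);
        _exit(write(fds[1], &status, 1) == 1 ? 0 : 1);
    }
    close(fds[1]);
    *child = pid;
    return fds[0];
}

/* Parent side; in a daemon this would run when the event loop reports the
 * fd readable. Returns 0..255, or -1 if the child did not report. */
static int read_child_status(int fd)
{
    unsigned char status;
    ssize_t n;

    do {
        n = read(fd, &status, 1);
    } while (n == -1 && errno == EINTR);
    close(fd);
    return n == 1 ? status : -1;
}
```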

(This used to be ctdb commit bfba5c7249eff8a10a43b53c1b89dd44b625fd10)
2008-07-08 03:48:11 +10:00
Ronnie Sahlberg
60a3fb926d don't bother casting to a void* private_data pointer,
just pass it as the 'state' structure

(This used to be ctdb commit 1d7c3eb454e33cd17c74606c4ea011fd79959c80)
2008-05-28 13:40:12 +10:00
Ronnie Sahlberg
0b0f5bc5e6 remove another field we don't need in the childwrite_handle structure
(This used to be ctdb commit 70085523f4c35a20786023c489325554e2a6f9c1)
2008-05-28 13:31:58 +10:00