mirror of https://github.com/samba-team/samba.git synced 2025-01-26 10:04:02 +03:00

29 Commits

Author SHA1 Message Date
Ronnie Sahlberg
7730facc62 fix a debug message
(This used to be ctdb commit 856bd6de6218d9b70baed0e6443be4253ea31afe)
2010-06-09 16:22:44 +10:00
Ronnie Sahlberg
d9a3e1d0c0 idr can timeout and wrap/be reused quite quickly.
If a remote node hangs for an extended period, we might have
a DMASTER request in flight to that node for record A.
Eventually we will reuse the idr, possibly for a DMASTER request to a different node for a different record B.

If the first node un-hangs and responds while the request for B is in flight,
we would receive a dmaster reply for the wrong record.

This would cause a record to become perpetually locked: inside the daemon we would tdb_chainlock(dmaster_reply->pdu->key), but once the migration completed we would chainunlock idr->state->call->key.

Add code to verify that, when we receive a dmaster reply packet, its key exactly matches the key held in the state variable for the idr in flight.

(This used to be ctdb commit 2f6a870d7ff02ceb61fde242f752dccbfcb4cb37)
2010-06-09 16:19:29 +10:00
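The check described in commit d9a3e1d0c0 can be sketched roughly as follows. This is a minimal illustration, not the actual ctdb code: the `sketch_*` structures and functions are hypothetical stand-ins, and only the idea (compare the key carried in the reply with the key stored for the in-flight request id before touching any locks) comes from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the real ctdb structures. */
struct sketch_key {
    const uint8_t *data;
    size_t len;
};

struct sketch_call_state {
    uint32_t reqid;         /* request id (idr) assigned when the request was sent */
    struct sketch_key key;  /* key of the record this request is migrating */
};

struct sketch_dmaster_reply {
    uint32_t reqid;         /* request id echoed back by the remote node */
    struct sketch_key key;  /* key of the record carried in the reply */
};

/* Return true only if both keys have the same length and the same bytes. */
bool sketch_key_equal(const struct sketch_key *a, const struct sketch_key *b)
{
    if (a->len != b->len) {
        return false;
    }
    return a->len == 0 || memcmp(a->data, b->data, a->len) == 0;
}

/*
 * Accept a reply only if it matches the in-flight state on both the
 * request id and the record key. A stale reply that arrives after the
 * id has been reused for another record fails the key comparison.
 */
bool sketch_reply_matches(const struct sketch_call_state *state,
                          const struct sketch_dmaster_reply *reply)
{
    if (state == NULL || state->reqid != reply->reqid) {
        return false;
    }
    return sketch_key_equal(&state->key, &reply->key);
}
```

A caller would drop (and presumably log) any reply for which `sketch_reply_matches()` returns false, so the lock and unlock always operate on the same key.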
Ronnie Sahlberg
75f3ef154c add extra logging for failed ctdb_ltdb_unlock() for a few more places
it is called from

(This used to be ctdb commit 5c0fea90c6474a51992a9c4aeb6af7dfeb213ee0)
2010-06-09 14:37:24 +10:00
Ronnie Sahlberg
fa618aa66a add additional logging when tdb_chainunlock() fails
so we can see where it was called from when it fails

(This used to be ctdb commit 0c091b3db6bdefd371787d87bc749593ea8e3c76)
2010-06-09 14:37:16 +10:00
Michael Adam
b72ccfc39a server:ctdb_send_dmaster_reply: fix a message typo.
Michael

(This used to be ctdb commit aa63f728152c37e31cecf2258efcdc8cf5ac0092)
2010-02-23 21:07:54 +11:00
Ronnie Sahlberg
06fdfddf27 Reducing the log level for a debug message
DEBUG(DEBUG_DEBUG,("pnn %u starting migration of %08x t\

(This used to be ctdb commit 6ce4b21b00cce1530aff022584bf695c257a5d55)
2010-02-16 11:02:01 +11:00
Ronnie Sahlberg
ce9d57bc36 Reduce the log level for two debug messages
DEBUG(DEBUG_DEBUG,("pnn %u dmaster response %08x\n", ctdb->pnn, ctdb_has
       DEBUG(DEBUG_DEBUG,("pnn %u dmaster request on %08x for %u from %u\n",

(This used to be ctdb commit a3473e7a445b14520a49585c460429dfbfe1fce0)
2010-02-16 11:01:52 +11:00
Michael Adam
ea65e80223 call: lower the debug message "refusing migration while transaction" to lvl INFO
This gets just too noisy on a busy system.
And it is purely informational anyway...

Michael

(This used to be ctdb commit 7f64a00c76203fdf6673c3f862a4bfd17fb848d7)
2009-12-09 21:56:59 +01:00
Ronnie Sahlberg
f5e90ec3b5 Revert "From Wolfgang M."
This reverts commit 5b70fa8cfd5916d3c212823ad5cc1b251ae175ed.

(This used to be ctdb commit 363e7e939ad46b3f75c83c30d4163d63876c2456)
2009-10-29 13:44:12 +11:00
Ronnie Sahlberg
831f9e05a6 From Wolfgang M.
With the new vacuuming code, don't treat an invalid dmaster as fatal. Let it update to the new value instead.

(This used to be ctdb commit 5b70fa8cfd5916d3c212823ad5cc1b251ae175ed)
2009-10-22 07:58:44 +11:00
Michael Adam
4cd06a330e Fix persistent transaction commit race condition.
In ctdb_client.c:ctdb_transaction_commit(), after a failed
TRANS2_COMMIT control call (for instance due to the 1-second
timeout being exceeded waiting for a busy node's reply), there is a
1-second gap between the transaction_cancel() and
replay_transaction() calls in which there is no lock on the
persistent db. And due to the lack of global state
indicating that a transaction is in progress in ctdbd, other nodes
may succeed in starting transactions on the db in this gap and,
even worse, work on top of the possibly already pushed changes.
So the data diverges across the nodes.

This change fixes this by introducing global state for a transaction
commit being active in the ctdb_db_context struct and in a db_id field
in the client, so that a client keeps track of _which_ tdb it has a
transaction commit running on. These data are set by ctdb upon
entering the trans2_commit control and they are cleared in the
trans2_error or trans2_finished controls. This makes it impossible
to start another transaction or migrate a record to a different
node while a transaction is active on a persistent tdb, including
the retry loop.

This approach is deadlock-free and still allows the recovery process
to be started in the retry gap between cancel and replay.
Also note that this solution does not require any change on the
client side.

This was debugged and developed together with
Stefan Metzmacher <metze@samba.org> - thanks!

Michael

(This used to be ctdb commit f88103516e5ad723062fb95fcb07a128f1069d69)
2009-07-29 11:12:39 +10:00
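The state tracking described in commit 4cd06a330e might look roughly like the sketch below. All names (`sketch_db`, `sketch_client`, the `sketch_trans2_*` functions) are hypothetical, as is the convention that a `db_id` of 0 means "no commit running". Letting the same client re-enter the commit during its retry loop is likewise an assumption of this sketch, not something the commit message spells out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-database context in the daemon. */
struct sketch_db {
    uint32_t db_id;
    bool transaction_active;  /* global flag: a commit is in progress on this db */
};

/* Hypothetical stand-in for the per-client state in the daemon. */
struct sketch_client {
    uint32_t db_id;  /* db this client has a commit running on; 0 = none (assumed) */
};

/* Called on entering the commit control: mark the db as busy for this client. */
bool sketch_trans2_commit(struct sketch_db *db, struct sketch_client *client)
{
    if (db->transaction_active && client->db_id != db->db_id) {
        return false;  /* a different client already has a commit in progress */
    }
    db->transaction_active = true;
    client->db_id = db->db_id;
    return true;
}

/* Called from the error or finished controls: clear the state again. */
void sketch_trans2_done(struct sketch_db *db, struct sketch_client *client)
{
    if (client->db_id == db->db_id) {
        db->transaction_active = false;
        client->db_id = 0;
    }
}

/* Record migration is refused for as long as a commit is active on the db. */
bool sketch_may_migrate(const struct sketch_db *db)
{
    return !db->transaction_active;
}
```

Because the flag stays set from the first commit attempt until the error or finished control arrives, the gap between cancel and replay is covered without taking an additional lock.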
Ronnie Sahlberg
e6e1ff32a5 don't try sending a keepalive if the transport is down
(This used to be ctdb commit 5cdc04669db8c2ddbbff5af82307a16e8d807b83)
2009-06-30 12:17:05 +10:00
Ronnie Sahlberg
6450ae533a Don't even try allocating and sending a CALL packet if the transport is down
(This used to be ctdb commit cb8dd896914d4e44ad7b8bb000176a7c78f394ae)
2009-06-30 12:16:13 +10:00
Ronnie Sahlberg
127754e192 failing a dmaster send due to the transport being down is fatal
(This used to be ctdb commit c17dafc79bec25bbb796478c33f503503d382a20)
2009-06-30 12:14:58 +10:00
Ronnie Sahlberg
757ba01ddc if we fail a dmaster migration due to the transport being down, then that is a fatal condition.
(This used to be ctdb commit 75dea671f68ac6649095357c36b3697a927721e9)
2009-06-30 12:13:15 +10:00
Ronnie Sahlberg
dd1774cd85 don't try to send error packets if the transport is down
(This used to be ctdb commit 65b94d280731df3245b26d69f39acfaf5bccf0d8)
2009-06-30 12:10:27 +10:00
Ronnie Sahlberg
22fb69d337 don't even try to allocate a packet if the transport is down since it will fail
(This used to be ctdb commit a73f316cb9cec877dc0bc3f7baa21be1b1454273)
2009-06-30 11:55:42 +10:00
Ronnie Sahlberg
26ec64a571 fix a memory leak
allocate the memory to the 'call' context and not off the 'ctdb' context

(This used to be ctdb commit be89005bd5d13409e377d425db2aad1c0d5b3826)
2008-03-25 11:11:13 +11:00
Ronnie Sahlberg
d53424731f in ctdb_call_local() we can not talloc_steal() the returned data and hang it off ctdb.
This can cause a memory leak if the call is terminated before we have managed to respond to the client:
the call is talloc_free()d, but the data is still hanging off ctdb.

Instead, we must talloc_steal() the data and hang it off the call structure to avoid the memory leak.

In order to do this, we must also change the call structure that is passed into ctdb_call_local() to be allocated through talloc().

This structure was previously either a static variable or an element of a larger talloc()ed structure (ctdb_call_state or ctdb_client_call_state), so
we must change all creations of a ctdb_call into explicitly creating it through talloc().

(This used to be ctdb commit 4becf32aea088a25686e8bc330eb47d85ae0ef8f)
2008-03-19 13:54:17 +11:00
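The ownership problem behind commit d53424731f can be illustrated with a toy parent/child allocator. This is deliberately not talloc: the `sketch_*` functions below are a hypothetical plain-C model that only mimics the hierarchical idea (freeing a parent frees its children, and "stealing" reparents a block), so that the difference between hanging data off the long-lived context and hanging it off the call is visible.

```c
#include <stdlib.h>

/* Toy hierarchical context: every block knows its parent and its children. */
struct sketch_ctx {
    struct sketch_ctx *parent;
    struct sketch_ctx *child;  /* first child */
    struct sketch_ctx *next;   /* next sibling */
};

/* Allocate a block, optionally as a child of parent. */
struct sketch_ctx *sketch_new(struct sketch_ctx *parent)
{
    struct sketch_ctx *c = calloc(1, sizeof(*c));
    if (c != NULL && parent != NULL) {
        c->parent = parent;
        c->next = parent->child;
        parent->child = c;
    }
    return c;
}

/* Remove c from its parent's child list, if it has a parent. */
static void sketch_detach(struct sketch_ctx *c)
{
    struct sketch_ctx **p;
    if (c->parent == NULL) {
        return;
    }
    p = &c->parent->child;
    while (*p != c) {
        p = &(*p)->next;
    }
    *p = c->next;
    c->parent = NULL;
    c->next = NULL;
}

/* Reparent c under new_parent (similar in spirit to talloc_steal()). */
void sketch_steal(struct sketch_ctx *new_parent, struct sketch_ctx *c)
{
    sketch_detach(c);
    c->parent = new_parent;
    c->next = new_parent->child;
    new_parent->child = c;
}

/* Free c and all of its descendants; return the number of blocks released. */
int sketch_free(struct sketch_ctx *c)
{
    int released = 1;
    sketch_detach(c);
    while (c->child != NULL) {
        released += sketch_free(c->child);
    }
    free(c);
    return released;
}

/* Count all descendants still hanging off c. */
int sketch_count(const struct sketch_ctx *c)
{
    int n = 0;
    const struct sketch_ctx *k;
    for (k = c->child; k != NULL; k = k->next) {
        n += 1 + sketch_count(k);
    }
    return n;
}
```

In this model, stealing the returned data onto the long-lived root means that freeing the call releases only the call itself, and the data lingers until the root is freed. Stealing it onto the call releases both together, which is the behaviour the commit message asks for.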
Andrew Tridgell
f6e53f433b merge from ronnie
(This used to be ctdb commit e7b57d38cf7255be823a223cf15b7526285b4f1c)
2008-02-04 20:07:15 +11:00
Andrew Tridgell
9d6ac0cf55 added debug constants to allow for better mapping to syslog levels
(This used to be ctdb commit 7ba8f1dde318eab03f4257e5a89fd23e7281e502)
2008-02-04 17:44:24 +11:00
Andrew Tridgell
fc21f78231 make some specific cases of the non-dmaster bug non-fatal
(This used to be ctdb commit 7b516ab06c7ba7ffe9ecf3f76720df5360176b2c)
2008-01-05 09:32:29 +11:00
Ronnie Sahlberg
f69321edc8 change debug output from vnn to pnn
(This used to be ctdb commit 93a7cf759ae3f9af6671b9f8589e1399a669b46f)
2007-09-04 10:47:02 +10:00
Ronnie Sahlberg
eb4cf6a686 change ctdb->vnn to ctdb->pnn
(This used to be ctdb commit 8c776e5707e503ec6586aae39ac6b3ea5a2fd2bc)
2007-09-04 10:06:36 +10:00
Ronnie Sahlberg
135a964220 pass the header to ctdb_become_dmaster instead of just the reqid
this allows us to print which node invalid or dropped orphan
become-dmaster packets came from

(This used to be ctdb commit 88efd1bf4c796cd2b184156b72296587bc38bb40)
2007-07-11 09:44:52 +10:00
Andrew Tridgell
32de198fd3 update lib/replace from samba4
(This used to be ctdb commit f0555484105668c01c21f56322992e752e831109)
2007-07-10 15:29:31 +10:00
Andrew Tridgell
a55c03b31b log the generation numbers to give a hint about this bug
(This used to be ctdb commit 12018494baa33c5f6c52e6eae94ac77a56d3e5a0)
2007-07-08 19:36:55 +10:00
Andrew Tridgell
06a71762a4 some #include cleanups
(This used to be ctdb commit 1a07d87122d51a40cd8ad5fe13533298c26857cb)
2007-06-07 22:26:27 +10:00
Andrew Tridgell
ae3d54094b start splitting the code into separate client and server pieces
(This used to be ctdb commit 603cd77988c181525946cd5eb0f4d0d646b58059)
2007-06-07 22:06:19 +10:00