If a remote node hangs for an extended period, we may have
a DMASTER request for record A in flight to that node.
Eventually we will reuse the idr, possibly for a DMASTER request to a different node for a different record B.
If the first node un-hangs and responds while the request for B is still in flight,
we would receive a dmaster reply for the wrong record.
This would cause a record to become perpetually locked: inside the daemon we would tdb_chainlock(dmaster_reply->pdu->key), but once the migration completed we would chainunlock idr->state->call->key.
Add code to verify that a received dmaster reply packet matches the exact key held in the state variable for the idr in flight.
(This used to be ctdb commit 2f6a870d7ff02ceb61fde242f752dccbfcb4cb37)
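The key check described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the ctdb code: `key_blob`, `inflight_request` and `reply_matches_request()` are hypothetical names, and the real daemon works with TDB keys and its own request state.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a TDB key: a pointer plus a length. */
struct key_blob {
    const unsigned char *data;
    size_t len;
};

/* Hypothetical state remembered for a request id that is in flight. */
struct inflight_request {
    unsigned int reqid;
    struct key_blob key; /* key the request was issued for */
};

/* Return true only if the reply carries exactly the key we asked for. */
static bool reply_matches_request(const struct inflight_request *req,
                                  const struct key_blob *reply_key)
{
    if (req == NULL || reply_key == NULL) {
        return false;
    }
    if (req->key.len != reply_key->len) {
        return false;
    }
    if (req->key.len == 0) {
        return true; /* two empty keys are equal */
    }
    return memcmp(req->key.data, reply_key->data, req->key.len) == 0;
}
```

A stale reply for record A that arrives while the reused id tracks record B fails this comparison, so it can be dropped before any chain lock is taken on the wrong key.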
DEBUG(DEBUG_DEBUG,("pnn %u dmaster response %08x\n", ctdb->pnn, ctdb_has
DEBUG(DEBUG_DEBUG,("pnn %u dmaster request on %08x for %u from %u\n",
(This used to be ctdb commit a3473e7a445b14520a49585c460429dfbfe1fce0)
This gets just too noisy on a busy system.
And it is purely informational anyway...
Michael
(This used to be ctdb commit 7f64a00c76203fdf6673c3f862a4bfd17fb848d7)
With the new vacuuming code, don't treat an invalid dmaster as fatal. Let it update to the new value instead.
(This used to be ctdb commit 5b70fa8cfd5916d3c212823ad5cc1b251ae175ed)
In ctdb_client.c:ctdb_transaction_commit(), after a failed
TRANS2_COMMIT control call (for instance due to the 1-second
timeout being exceeded while waiting for a busy node's reply),
there is a 1-second gap between the transaction_cancel() and
replay_transaction() calls in which there is no lock on the
persistent db. Because ctdbd has no global state indicating
that a transaction is in progress, other nodes may succeed in
starting transactions on the db in this gap and, even worse,
work on top of the possibly already pushed changes.
So the data diverges across the nodes.
This change fixes this by introducing global state for a transaction
commit being active in the ctdb_db_context struct and in a db_id field
in the client, so that a client keeps track of _which_ tdb it has a
transaction commit running on. These data are set by ctdb upon
entering the trans2_commit control and they are cleared in the
trans2_error or trans2_finished controls. This makes it impossible
to start another transaction or migrate a record to a different
node while a transaction is active on a persistent tdb, including
the retry loop.
This approach is deadlock-free and still allows the recovery process
to be started in the retry gap between cancel and replay.
Also note that this solution does not require any change on the
client side.
This was debugged and developed together with
Stefan Metzmacher <metze@samba.org> - thanks!
Michael
(This used to be ctdb commit f88103516e5ad723062fb95fcb07a128f1069d69)
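The bookkeeping described in the commit above might look roughly like the sketch below. All names here (`db_state`, `client_state`, `commit_begin()`, `commit_end()`, `may_start_transaction_or_migrate()`) are hypothetical and do not reproduce the actual ctdb structures or control handlers. The sketch also assumes that a db id of 0 means "no commit running" and that the owning client may re-enter during its retry loop.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified per-database state kept by the daemon. */
struct db_state {
    uint32_t db_id;     /* assumed to be nonzero */
    bool commit_active; /* set while a transaction commit is in progress */
};

/* Hypothetical per-client state: the db this client is committing on. */
struct client_state {
    uint32_t commit_db_id; /* 0 means "no commit running" (assumption) */
};

/* Called on entering the commit control. */
static bool commit_begin(struct db_state *db, struct client_state *cl)
{
    if (db->commit_active) {
        /* Only the client that owns the commit may re-enter (retry). */
        return cl->commit_db_id == db->db_id;
    }
    db->commit_active = true;
    cl->commit_db_id = db->db_id;
    return true;
}

/* Called from the error or finished controls; non-owners are ignored. */
static void commit_end(struct db_state *db, struct client_state *cl)
{
    if (cl->commit_db_id != db->db_id) {
        return;
    }
    db->commit_active = false;
    cl->commit_db_id = 0;
}

/* New transactions and record migration are refused while a commit runs. */
static bool may_start_transaction_or_migrate(const struct db_state *db)
{
    return !db->commit_active;
}
```

Because the flag stays set from the first commit attempt until the error or finished path clears it, the gap between cancel and replay is covered without holding a lock across it.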
This can cause a memory leak if the call is terminated before we have managed to respond to the client
(the call is talloc_free()d but the data is still hanging off ctdb).
Instead we must talloc_steal() the data and hang it off the call structure to avoid the memory leak.
In order to do this we must also change the call structure that is passed into ctdb_call_local() to be allocated through talloc().
This structure was previously either a static variable or an element of a larger talloc()ed structure (ctdb_call_state or ctdb_client_call_state), so
we must change all creations of a ctdb_call to explicitly create it through talloc().
(This used to be ctdb commit 4becf32aea088a25686e8bc330eb47d85ae0ef8f)
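The ownership problem in the commit above can be illustrated with a toy hierarchical allocator. This is deliberately not talloc: `node_new()`, `node_steal()` and `node_free()` are hypothetical helpers that only mimic the idea that freeing a parent frees its children, and that reparenting data under the call lets it be released together with the call.

```c
#include <stdlib.h>

/* Toy hierarchical allocation: freeing a node frees all of its children. */
struct node {
    struct node *parent;
    struct node *first_child;
    struct node *next_sibling;
};

static int live_nodes = 0; /* number of nodes currently allocated */

/* Detach n from its parent's child list, if it has a parent. */
static void node_unlink(struct node *n)
{
    if (n->parent == NULL) {
        return;
    }
    struct node **pp = &n->parent->first_child;
    while (*pp != n) {
        pp = &(*pp)->next_sibling;
    }
    *pp = n->next_sibling;
    n->parent = NULL;
    n->next_sibling = NULL;
}

/* Attach n as a child of parent (parent may be NULL for a root). */
static void node_link(struct node *parent, struct node *n)
{
    n->parent = parent;
    if (parent != NULL) {
        n->next_sibling = parent->first_child;
        parent->first_child = n;
    }
}

static struct node *node_new(struct node *parent)
{
    struct node *n = calloc(1, sizeof(*n));
    if (n == NULL) {
        return NULL;
    }
    node_link(parent, n);
    live_nodes++;
    return n;
}

/* Reparent n under new_parent, loosely analogous to stealing ownership. */
static struct node *node_steal(struct node *new_parent, struct node *n)
{
    node_unlink(n);
    node_link(new_parent, n);
    return n;
}

/* Free n and, recursively, everything still hanging off it. */
static void node_free(struct node *n)
{
    node_unlink(n);
    while (n->first_child != NULL) {
        node_free(n->first_child);
    }
    free(n);
    live_nodes--;
}
```

In this model, data created under a long-lived root survives when the call is freed, which is the leak; data that was first moved under the call with `node_steal()` goes away with it.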
This allows us to print which node Invalid or Dropped orphan become
dmaster packets came from.
(This used to be ctdb commit 88efd1bf4c796cd2b184156b72296587bc38bb40)