1) when all nodes write the same value to the record, or when writing
a value that is already there, we can skip the write and save
ourselves a network transaction (see the sketch after this list)
2) when all remote nodes fail an update, and we then fail a replay, we
don't need to trigger a recovery. This solves a corner case where
we could get into a recovery loop
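A minimal sketch of the check behind point 1, assuming TDB_DATA-style
buffers; value_unchanged() is a hypothetical helper, not the actual
dbwrap_ctdb code:

    #include <stdbool.h>
    #include <string.h>
    #include <tdb.h>        /* TDB_DATA */

    /* Hypothetical helper: true if the new value is byte-identical
     * to what is already stored, so the network write (and with it
     * one round trip to ctdbd) can be skipped. */
    static bool value_unchanged(TDB_DATA oldval, TDB_DATA newval)
    {
        if (oldval.dsize != newval.dsize) {
            return false;
        }
        return (oldval.dsize == 0) ||
               (memcmp(oldval.dptr, newval.dptr, oldval.dsize) == 0);
    }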
(This used to be commit 2481bfce4307274806584b0d8e295cc7f638e184)
This is because ctdbd can fail to perform the persistent_store due to
race conditions, and such a failure does not mean it can't succeed
the next time.
To avoid looping infinitely, this makes use of a new parametric option,
"dbwrap ctdb:max store retries" (integer), which defaults to 5 and
sets the upper limit on the number of repeats of the fetch/store cycle.
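A hedged sketch of the resulting retry loop; lp_parm_int() is Samba's
parametric-option accessor, while try_fetch_and_store() merely stands
in for the real fetch/store cycle in dbwrap_ctdb.c:

    /* Retry the fetch/store cycle at most "dbwrap ctdb:max store
     * retries" times (default 5); try_fetch_and_store() is an
     * illustrative placeholder. */
    int max_retries = lp_parm_int(-1, "dbwrap ctdb",
                                  "max store retries", 5);
    NTSTATUS status = NT_STATUS_UNSUCCESSFUL;
    int i;

    for (i = 0; i < max_retries; i++) {
        status = try_fetch_and_store(db, key, data);
        if (NT_STATUS_IS_OK(status)) {
            break;  /* stored successfully */
        }
        /* a failure can be a transient race with another node,
         * so the next attempt may well succeed */
    }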
Michael
(This used to be commit 2bcc9e6ecef876030e552a607d92597f60203db2)
in the persistent db_ctdb_store operation.
This is to prevent deadlocks in db_ctdb_persistent_store().
There is a tradeoff: usually, the record is still locked after the
db->store operation. This lock is usually released via the talloc
destructor when the record is TALLOC_FREE'd. So we have two choices:
- Either re-lock the record after the call to persistent_store or
  cancel_persistent_update, thereby not changing any assumptions
  callers may have about the state, but possibly introducing new
  race conditions.
- Or don't lock the record again but just remove the talloc
  destructor. This is less racy, but it assumes that the lock is
  always released via TALLOC_FREE of the record.
I chose the first variant for now, since it seems the safer choice:
we can't guarantee that we succeed in re-taking the lock anyway, and
the only real danger is a caller performing multiple store operations
after a fetch_locked(), which is currently not the case.
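A sketch of the chosen variant, assuming the record lock is a tdb
chainlock that the record's talloc destructor normally drops; crec and
wtdb are illustrative names, not the real variables:

    /* Variant 1: after the persistent store (or its cancellation),
     * re-take the chainlock so the caller's assumption -- the record
     * stays locked until TALLOC_FREE(rec) runs the destructor --
     * still holds. */
    status = db_ctdb_store_persistent(crec);    /* illustrative */
    if (tdb_chainlock(wtdb->tdb, key) != 0) {
        /* we cannot guarantee that re-locking succeeds; see above */
        DEBUG(0, ("failed to re-lock record after store\n"));
    }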
Michael
(This used to be commit d004c9a7281d2577c3ba2012c8f790cc198ea700)
Only filled in for tdb so far; for rbt it's pointless, and ctdb itself
needs to be extended.
(This used to be commit 0a55e018dd68af06d84332d54148bbfb0b510b22)
The lockup could happen when packet_read_sync() gets two packets in a row, the
first one being an async message, and the second one being the response to a
ctdb request.
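The idea of the fix, sketched with hypothetical helper names: keep
consuming buffered packets, dispatching async messages instead of
stopping at them, until the awaited reply shows up:

    /* Drain everything already buffered: an async message followed
     * by our reply must not leave the reply stuck in the queue. */
    for (;;) {
        struct ctdb_req_header *hdr = next_buffered_packet(conn);
        if (hdr == NULL) {
            wait_for_more_data(conn);           /* illustrative */
            continue;
        }
        if (hdr->operation == CTDB_REQ_MESSAGE) {
            dispatch_async_message(conn, hdr);  /* illustrative */
            continue;
        }
        return hdr;  /* the response to our ctdb request */
    }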
Also add some debug messages to ctdbd_conn.c, and cut the "locking key"
messages down to dumping only 20 hex chars at debug level 10; levels
above 10 dump everything.
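A sketch of the truncation, assuming Samba's DEBUGLEVEL; 20 hex chars
correspond to the first 10 key bytes, and hex_dump_key() is
hypothetical:

    size_t len = key.dsize;
    if (DEBUGLEVEL <= 10 && len > 10) {
        len = 10;   /* 10 bytes -> 20 hex chars */
    }
    hex_dump_key(key.dptr, len);    /* hypothetical helper */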
(This used to be commit 0a55880a240b619810371a19144dd0a75208adfe)
I'm 100% certain I've forgotten to merge something, but the main code
should be in. It's mainly in dbwrap_ctdb.c, ctdbd_conn.c and
messages_ctdbd.c.
There should be no changes to the non-cluster case; it survives make
test on my laptop.
It survives some very basic tests with ctdbd enabled; I did not run
the full test suite for clusters yet.
Phew...
Volker
(This used to be commit 15553d6327a3aecdd2b0b94a3656d04bf4106323)