1) when all nodes write the same value to the record, or when writing
a value that is already there, we can skip the write and save
ourselves a network transaction
2) when all remote nodes fail an update, and we then fail a replay, we
don't need to trigger a recovery. This solves a corner case where
we could get into a recovery loop
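
The first point can be sketched in a few lines of C. This is only an illustration under stated assumptions: `struct blob` and `write_needed()` are hypothetical names chosen for this sketch and are not taken from the ctdb or Samba sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a record value (not the real ctdb/tdb type). */
struct blob {
	const unsigned char *ptr;
	size_t len;
};

/*
 * Return true if new_val differs from old_val, i.e. the write (and the
 * network transaction it implies) is actually needed. Return false if
 * the value that would be written is already there.
 */
static bool write_needed(struct blob old_val, struct blob new_val)
{
	/* A non-empty value must come with a valid pointer. */
	assert(old_val.len == 0 || old_val.ptr != NULL);
	assert(new_val.len == 0 || new_val.ptr != NULL);

	if (old_val.len != new_val.len) {
		return true;
	}
	if (old_val.len == 0) {
		/* Both values are empty: nothing to write. */
		return false;
	}
	return memcmp(old_val.ptr, new_val.ptr, old_val.len) != 0;
}
```

A caller would check `write_needed()` before sending the update and return success immediately when it yields false.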
(This used to be commit 2481bfce43)
This is because ctdbd can fail in performing the persistent_store
due to race conditions, and this does not mean it can't succeed
the next time.
To not loop infinitely, this makes use of a new parametric option:
"dbwrap ctdb:max store retries" (integer) which defaults to 5
and sets the upper limit for the number of repeats of the
fetch/store cycle.
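
A minimal sketch of such a bounded retry loop follows. The names `store_attempt_fn`, `store_with_retries()` and `flaky_attempt()` are hypothetical and exist only for this illustration. The sketch also assumes that the limit counts the total number of fetch/store cycles, which may differ from how the real option is interpreted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback: runs one fetch/store cycle, true on success. */
typedef bool (*store_attempt_fn)(void *private_data);

/*
 * Repeat the fetch/store cycle until it succeeds or max_retries cycles
 * have been tried. *attempts_made reports how many cycles were run.
 */
static bool store_with_retries(store_attempt_fn attempt, void *private_data,
			       int max_retries, int *attempts_made)
{
	int i;

	assert(attempt != NULL);
	assert(attempts_made != NULL);

	*attempts_made = 0;
	for (i = 0; i < max_retries; i++) {
		(*attempts_made)++;
		if (attempt(private_data)) {
			return true;
		}
		/* A failure may be a transient race, so try again. */
	}
	return false;
}

/*
 * Example attempt used only for demonstration: private_data points to an
 * int holding the number of failures left before the store succeeds.
 */
static bool flaky_attempt(void *private_data)
{
	int *remaining_failures = private_data;

	if (*remaining_failures > 0) {
		(*remaining_failures)--;
		return false;
	}
	return true;
}
```

With a limit of 5, an operation that fails twice succeeds on the third cycle, while one that keeps failing is given up after the fifth cycle instead of looping forever.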
Michael
(This used to be commit 2bcc9e6ece)
in the persistent db_ctdb_store operation.
This is to prevent deadlocks in db_ctdb_persistent_store().
There is a tradeoff: Usually, the record is still locked
after db->store operation. This lock is usually released
via the talloc destructor with the TALLOC_FREE to
the record. So we have two choices:
- Either re-lock the record after the call to persistent_store
or cancel_persistent update. This does not change any
assumptions callers may have about the state, but possibly
introduces new race conditions.
- Or don't lock the record again but just remove the
talloc_destructor. This is less racy but assumes that
the lock is always released via TALLOC_FREE of the record.
I choose the first variant for now since it seems less racy.
We can't guarantee that we succeed in getting the lock
anyways. The only real danger here is that a caller
performs multiple store operations after a fetch_locked()
which is currently not the case.
Michael
(This used to be commit d004c9a728)
The lockup could happen when packet_read_sync() gets two packets in a row, the
first one being an async message, and the second one being the response to a
ctdb request.
Also add some debug msg to ctdb_conn.c, and cut off the "locking key" messages
to only dump 20 hex chars at debug level 10. >10 will dump everything.
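
The truncation rule for the "locking key" messages could look roughly like the sketch below. `format_key_hex()` and `KEY_HEX_LIMIT` are hypothetical names for this illustration and are not the helpers used in the actual change. The sketch assumes that the current debug level is passed in as a plain integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Number of hex characters kept at debug level 10 (from the text above). */
#define KEY_HEX_LIMIT 20

/*
 * Format key as lowercase hex into out. At debug_level <= 10 the output
 * is cut off after KEY_HEX_LIMIT hex characters; above 10 the whole key
 * is dumped, as far as out_size allows. Returns the number of hex
 * characters written, not counting the terminating NUL.
 */
static size_t format_key_hex(const unsigned char *key, size_t key_len,
			     int debug_level, char *out, size_t out_size)
{
	size_t max_chars = key_len * 2;
	size_t written = 0;
	size_t i;

	assert(out != NULL && out_size > 0);
	assert(key_len == 0 || key != NULL);

	if (debug_level <= 10 && max_chars > KEY_HEX_LIMIT) {
		max_chars = KEY_HEX_LIMIT;
	}

	/* Each byte needs two hex characters plus room for the NUL. */
	for (i = 0; i < key_len &&
		    written + 2 <= max_chars &&
		    written + 3 <= out_size; i++) {
		snprintf(out + written, 3, "%02x", key[i]);
		written += 2;
	}
	out[written] = '\0';
	return written;
}
```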
(This used to be commit 0a55880a24)
I'm 100% certain I've forgotten to merge something, but the main code
should be in. It's mainly in dbwrap_ctdb.c, ctdbd_conn.c and
messages_ctdbd.c.
There should be no changes to the non-cluster case; it does survive make
test on my laptop.
It survives some very basic tests with ctdbd enabled; I did not do the
full test suite for clusters yet.
Phew...
Volker
(This used to be commit 15553d6327)