IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Otherwise a node can lock itself out, e.g. when a commit control times out...
Michael
(This used to be ctdb commit cb432e30351d5e5a41e98da3c7b1c2a4d400a3a2)
In ctdb_transaction_commit(), when the trans2_commit control fails, there
is a race condition in the 1 second sleep between the local transaction_cancel
and the call to ctdb_replay_transaction(): The database is not locked, and
neither is the transaction_lock record. So another client can start and possibly
complete a new transaction in this gap, but only on the same node: The locking
of the transaction_lock record on a different node which involves migration of
the record to the other node has been disabled by introduction of the
transaction_active flag on the db which closes precisely this gap from the start
of the commit until the call to TRANS2_FINISH or TRANS2_ERROR.
But this mechanism does not cover the case where a process on the same node
tries to start a transaction: There is no obstacle to locking the transaction_lock
record because the record does not need to be migrated.
This commit closes this race condition in ctdb_transaction_fetch_start()
by using the new ctdb_ctrl_transaction_active() call to ask the local
ctdb daemon whether it has a transaction running on the database.
If so, the check is repeated until the running transaction is done.
This does introduce an additional call to the local ctdbd when starting
transactions, but it does close the (hopefully) last race condition.
Michael
(This used to be ctdb commit 02ee9dfd3c6b09f5c5172a7e38738c20b7f0aecd)
This aske the daemon wheter a transaction is currently active on a
given DB on that node. More precisely this asks for the transaction_active
flag in the ctdb_db_context that is set in the CTDB_TRANS2_COMMIT
control and cleared in the CTDB_TRANS2_ERROR or CTDB_TRANS2_FINISHED controls.
This will be useful for fixing race conditions in the transaction code.
Michael
(This used to be ctdb commit 8d430ae6968dfe566614379436fc3c56003fcd88)
Commit 25e82a8a667a54c6921ef076c63fdd738dd75d19 changed wait_until()
to protect the command it runs from "set -e" by running it in a
subshell. This breaks uses where the command is expected to set
global variables. For example, wait_until_get_src_socket lost the
value of $out from its call to get_src_socket().
The fix is to not be lazy and use a sub-shell!
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 39642e745254d93d74dde907787503854fe6ca4a)
When a single script is finished, also log the name of the script, the duration it took and the return status.
In the loop where we signal back to the main daemon that the script finished, do this once every 100ms instead of once every 1 second
(This used to be ctdb commit 6a1f7a7b1b3a0b8f89998db8fdad83bbb4e9b5a5)
add a global variable holding the pid of the main daemon.
change the tracking of time() in the event loop to only check/warn when called from the main daemon
(This used to be ctdb commit a10fc51f4c30e85ada6d4b7347b0f9a8ebc76637)
All event scripts use only the relative path, so we should
here.
Also PATH includes /sbin and /usr/sbin...
metze
(This used to be ctdb commit 20678e1506db1f96b58c326ee91339e797c07c22)
The problem was this:
When the monitor event fails, the node->flags get updated,
and an update (containing the old and new flags) is sent to
the recovery master.
If the recovery master sends the update to itself (the same process),
it was compairing the node->flags variable with the received new flags.
This check always found both flag values to be equal
and never sets the rec->need_takeover_run variable to true.
There were two problem, first the push_flags_handler() function
didn't pass the received old flags.
And the ctdb_control_modflags() function ignored the received old flags.
metze
(This used to be ctdb commit 8ec633b64a05a2d903c2b9639909f15f6375548f)
log an error if the clock jumps backwards
also log an error if the clock jumps >5 seconds forward (we assume here we will get at least one event every 5 seconds)
(This used to be ctdb commit 11193e1e192bee6f579bdf1303153571a82711d7)
make ctdb_queue_length() cheaper by using a counter variable instead of counting the number of packets each time.
(This used to be ctdb commit 331c6e3afd96d8b5e191153a631efdbdabb6ea33)
The way to use this is from a client to :
1, first create a message handle and bind it to a SRVID
A special prefix for the srvid space has been set aside for samba :
Only samba is allowed to use srvid's with the top 32 bits set like this.
The lower 32 bits are for samba to use internally.
2, register a "notification" using the new control :
CTDB_CONTROL_REGISTER_NOTIFY = 114,
This control takes as indata a structure like this :
struct ctdb_client_notify_register {
uint64_t srvid;
uint32_t len;
uint8_t notify_data[1];
};
srvid is the srvid used in the space set aside above.
len and notify_data is an arbitrary blob.
When notifications are later sent out to all clients, this is the payload of that notification message.
If a client has registered with control 114 and then disconnects from ctdbd, ctdbd will broadcast a message to that srvid to all nodes/listeners in the cluster.
A client can resister itself with as many different srvid's it want, but this is handled through a linked list from the client structure so it mainly designed for "few notifications per client".
3, a client that no longer wants to have a notification set up can deregister using control
CTDB_CONTROL_DEREGISTER_NOTIFY = 115,
which takes this as arguments :
struct ctdb_client_notify_deregister {
uint64_t srvid;
};
When a client deregisters, there will no longer be sent a message to all other clients when this client disconnects from ctdbd.
(This used to be ctdb commit f1b6ee4a55cdca60f93d992f0431d91bf301af2c)
The waiters reference the locakwait handle in order to remove itself from the li
nked list which caused a SEGV.
We dont actually need to remove ourselves from this list here since
if the parent freeze_handle holding the list is freed, then all waiters are rele
ased as well, and the only place we actually need to relink the waiter is in ctd
b_freeze_lock_handler, where we want to respond back to the clients and release
the waiters but we still want to keep the freeze_handle hanging around.
(This used to be ctdb commit e01ab46bafad09a5e320d420734db129d35863bc)
With the new vacuuming code, dont treat an invalid dmaster as fatal. Let it update to the new value insetad.
(This used to be ctdb commit 5b70fa8cfd5916d3c212823ad5cc1b251ae175ed)
The timeout for waiting for state changes isn't very predictable. It
is "about" MonitorInterval seconds... but can be longer given the
duration of eventscript runs and other things. So, we change the
timeout to MonitorInterval + EventScriptTimeout, hoping it never takes
that long.
Move the eventscript installation/removal from the old fake-tests into
a function in the functions file. Implement supporting functions to
create/remove/check-for various files that it handles. Also add a
function that uses all of this that waits for the next monitor event
(but only if all other monitor events pass).
The final check in the skip share check tests uses the above and waits
for a monitor event, and then checks that the node is still healthy.
Also enhance the wait_until function to handle a command starting with
'!' (as a separate word) to make it easy to wait for a file not to
exist.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 25e82a8a667a54c6921ef076c63fdd738dd75d19)
This can race with teh eventascript handling that does a :
list all scripts, sort them, then execute them
so trap status code 127 which means the script could not be executed (or /bin/sh does not exist) and treat it as not to cause the node to become unhealthy
(This used to be ctdb commit befabc917edb036ca81f5216f65a6d62b26ee83e)