IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
This is clearer than using the MODFLAGS control directly.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This makes fields such as recmaster and nodemap easily available if
required.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
An unused argument needlessly extends the length of function calls. A
subsequent change will allow rec->nodemap to be used if necessary.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Inside the nested event loop in ctdb_ctrl_getnodemap(), various
asynchronous handlers may dereference rec->nodemap, which will be
NULL.
One example is lost_reclock_handler(), which causes rec->nodemap to be
unconditionally dereferenced in list_of_nodes() via this call chain:
list_of_nodes()
list_of_active_nodes()
set_recovery_mode()
force_election()
lost_reclock_handler()
Instead of attempting to trace all of the cases, just avoid leaving
rec->nodemap set to NULL. Attempting to use an old value is generally
harmless, especially since it will be the same as the new value in
most cases.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14324
Reported-by: Volker Lendecke <vl@samba.org>
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Tue Mar 24 01:22:45 UTC 2020 on sn-devel-184
The only reason for recoverd attaching to databases was to migrate
records to the local node as part of vacuuming. Recovery daemon does
not take part in database vacuuming any more.
The actual database recovery is handled via the recovery_helper and
recovery daemon should not need to attach to the databases any more.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
This is now implemented in the ctdb daemon using VACUMM_FETCH control.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Only do this if the recovery lock is unset. Log every minute for the
first 10 minutes, then every 10 minutes, then every hour.
This is useful for determining whether a split brain occurred. It is
particularly useful if logging failed or was throttled at startup, so
there is no evidence of the split brain when it began.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Simple cases where variables need to be declared as an unsigned type
instead of an int.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
state is always freed before exiting this function, so allocate fde
off it instead of long-lived ctdb context.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13943
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Currently this will wait forever. It really needs a timeout in case
the cluster filesystem (or other lock mechanism) is completely wedged.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13800
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
We really shouldn't see unknown errors. They probably represent a
misconfigured recovery lock or similar.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13800
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Add an explicit case for a timeout and clean up the other messages.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13800
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
If nested events occur while the file descriptor handler is still
active then chaos can ensue. For example, if a node is banned and the
lock is explicitly cancelled (e.g. due to election loss) then
double-talloc-free()s abound.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13800
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The lock may have been lost due to a failure in the underlying locking
mechanism. This could be due to quorum loss or similar. It is best
to call an election to confirm that this node should still be master.
At worst, the node will reelect itself, fail to take the lock and then
ban itself. This is a suitable outcome for a node that has been
partitioned from others in the cluster.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Andreas Schneider <asn@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
Autobuild-User(master): Andreas Schneider <asn@cryptomilk.org>
Autobuild-Date(master): Thu Nov 8 11:03:11 CET 2018 on sn-devel-144
This allows the attempt to be cancelled if an election is lost and an
unlock is done before the attempt is completed.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Tue Sep 18 02:18:30 CEST 2018 on sn-devel-144
If the recovery lock is in the process of being taken then free the
cluster mutex handle but leave the recovery lock handle in place.
This allows ctdb_recovery_lock() to fail.
Note that this isn't yet live because rec->recovery_lock_handle is
still only set at the completion of the attempt to take the lock.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This makes upcoming changes simpler.
Update to modern debug macro while touching relevant line.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
... not just cluster mutex handle.
This makes the recovery lock handle long-lived and with allow the
releasing code to cancel an in-progress attempt to take the recovery
lock.
The cluster mutex handle is now allocated off the recovery lock
handle.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
At the moment this is still local and is freed after the mutex is
successfully taken.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
If the master changed while trying to take the lock then fail gracefully.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Use the "failover:disabled" option instead.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13589
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
system_socket.[ch] will contain all the raw socket code and other
functions that use ctdb_sock_addr. system.[ch] will contain other
platform dependent functions.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Recovery and takeover are run via helper from recovery daemon. While the
helpers are running, it's possible for the current node to lose election.
If that happens, abort the currently running recovery/takeover helper.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
BUG: https://bugzilla.samba.org/show_bug.cgi?id=12978
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Fri Aug 25 13:32:58 CEST 2017 on sn-devel-144
BUG: https://bugzilla.samba.org/show_bug.cgi?id=12857
If we drop public IPs because CTDB is in recovery for too long, then
avoid spamming logs "Trigger takeoverrun" every second.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
BUG: https://bugzilla.samba.org/show_bug.cgi?id=12857
This can be used later in the main_loop to avoid the local ip check.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Signed-off-by: Chris Lamb <chris@chris-lamb.co.uk>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
Reviewed-by: Garming Sam <garming@catalyst.net.nz>
The recovery helper does it's own logging, so there is no need to
pass logfd.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Mon Dec 5 11:59:42 CET 2016 on sn-devel-144