IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
The recovery lock helper must exit when it notices its parent is gone.
However, that can take a few seconds.
The usual way of terminating the recovery daemon is for the main ctdbd
to send it a SIGTERM. Installing a handler is nice and simple.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
If the process holding the recovery lock terminates unexpectedly then
the recovery daemon needs to know that the lock is no longer held.
While here, rename hold_reclock_handler() to take_reclock_handler() so
there is a clear difference between the two handler names.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This makes the API more general. If they are needed in a handler then
they can be in the private data.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
It won't be called more than once by the cluster mutex code.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
It isn't necessarily a file.
Don't bother changing the control, since it doesn't pervade the code.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Support for updating the recovery lock is being removed because it
isn't possible to recover from failure.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
If the node becomes stopped or banned after recovery is marked
active, then it will never freeze the databases, and hence the
node will keep banning itself indefinitely, until ctdbd is restarted.
This is a regression from 4.3, introduced with
b4357a79d9
and
d8f3b490bb
BUG: https://bugzilla.samba.org/show_bug.cgi?id=11945
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Michael Adam <obnox@samba.org>
Autobuild-Date(master): Wed Jun 1 17:36:12 CEST 2016 on sn-devel-144
If a recovery is going to be done then this will be followed by a
takeover run anyway. So, there's no use doing the takeover run
checks, potentially doing a takeover run and then doing a recovery.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The recovery daemon should be less involved in the service monitoring
logic.
The cases handled here are already handled elsewhere:
* When a node becomes unhealthy/healthy the monitoring code will
trigger a takeover run
* When a node is disabled/enabled the ctdb CLI tool will trigger a
takeover run
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Banning is now handled by the takeover code sending banning credit
messages.
This commit makes a change in behaviour quite obvious. Takeover runs
were initiated from several locations in the code but banning was only
done from one of these locations. Now banning can be done from any
failed takeover run.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Now all the IP takeover code for non-master node is in this function.
The function can always be renamed to something more suitable.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Fri May 6 15:10:59 CEST 2016 on sn-devel-144
Update log levels and messages, comments and wrapping of long lines.
No functional changes.
Note that interfaces_have_changed() already does adequate logging.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
When public IP checking is disabled, verify_local_ip_allocation()
still retrieves known IP addresses and runs through a loop that does
nothing.
Instead, completely skip the retrieval and checking loop.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This makes verify_local_ip_allocation() self-contained and simplifies
main_loop().
Due to indentation changes, this commit is most easily read when
ignoring whitespace.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
There is no need to return one of several states and then trigger an
election for one of those return states. Have the recovery master
validation trigger the election directly and just return whether
monitoring should continue.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Change this to return just 0 or -1. It isn't monitoring anything.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
update_local_flags() never returns MONITOR_ELECTION_NEEDED, so drop
this entire if-statement.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The cluster mutex code already passes the latency and expects the
handler to update the statistics.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
ctdb_recovery_have_lock(), ctdb_recovery_lock(),
ctdb_recovery_unlock() are only used by recovery daemon, so move them
there.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This will be called from recovery helper to assign banning credits to
misbehaving node.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
If set, this was used to setup an IP takeover run on a timer after
certain updates to the public IP address configuration (e.g. "ctdb
addip").
However, "ctdb reloadips" completely manages public IP reconfiguration
and avoids the anomalies that DeferredRebalanceOnNodeAdd was
introduced to work around.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This matches the behaviour during serial database recovery.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu Feb 11 08:01:14 CET 2016 on sn-devel-144
The first element of these structures is a 32-bit PNN. On 64-bit
systems this field can be followed by 32-bits of padding. When the
structures are copied this can cause uninitialised memory to be
copied.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
It hasn't worked since commit cda5f02c7c
in 2009, which reworked the banning code. Since then
ctdb_control_modflags() has contained a comment saying:
/* we don't let other nodes modify our BANNED status */
Unbanning all nodes originally occurred here when the recovery master
role moved to a new node. The logic could have been meant for the
case when the old recovery master was malfunctioning, so got banned.
If any other nodes had been banned by this recovery master then they
would be unbanned. However, this would also unban the old recovery
master, which is probably suboptimal. The logic would also trigger if
a node was banned for a good reason and then the recovery master was
stopped. So, apart from doing nothing, the logic is too simplistic so
might as well be removed.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Starting to untangle cluster management, database recovery and public
IP allocation. This is a non-trivial subset of the cluster management
code that runs in the recovery daemon on all nodes.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Mon Nov 16 11:47:45 CET 2015 on sn-devel-104
Capabilities are used when computing an election result so having them
up-to-date seems like a good idea.
Also update several instances of an ambiguous comment.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The VNN map is only needed on the recovery master, so no need for all
recovery daemons to retrieve it.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This is already handled in update_recovery_lock(), which is called
immediately before.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The only non-obvious part here is dropping the setting of the nodemap
local variable to NULL. If the following control succeeds then it is
set, otherwise return and it doesn't matter.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
That is, using CTDB_CURRENT_NODE makes this more obvious.
Also fix incorrect error messages.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Each recovery daemon knows who the recmaster is and is in sync with
its local daemon. The recovery master is running this check so do not
bother checking with its local daemon - both agree that it is the
recovery master.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The recovery daemon already knows which node is the master. This
relies on rec->recmaster being correctly initialised and correctly set
during elections.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Recovery should not do cluster management functions. Setting the
recovery master should only be done via an election.
Main loop will determine if recovery master is inconsistent across the
cluster and force an election if necessary.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The recovery daemon pushes knowledge of recovery master election
progress/result to local daemon. It then retrieves that information
again.
Instead, have the recovery daemon reliably track election
progress/result in rec->recmaster so it doesn't need to be retrieved.
Be careful to maintain consistency by only doing this when the local
daemon has been updated.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
There can be no holes in the nodemap. Even if a node has been deleted
it will take a slot in the nodemap. The only exception is that the
nodemap shrinks if nodes are deleted from the end. That should never
include the master because a node should be shutdown before being
deleted, and an election should already have take place.
To avoid walking off the end of the nodemap nodes array just confirm
that the master node's PNN is a valid index into the array. No need
to walk through the nodemap.
After this, in this section of the code j is now invalid. So use the
master's PNN to index into the nodemap. This is safe.
In the process, clean up some log messages to avoid saying "Force
reelection". It's just an "election".
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This is currently done before each IP takeover run, so just factor it
in.
ctdb_reload_remote_public_ips() becomes static.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Thu Nov 12 09:28:45 CET 2015 on sn-devel-104