This is used to reply to the recmode control for all the different
cases. The callers can later be generalised to use a pointer, which
can then be used for recovery lock handling in different contexts.
Note that the handle is now freed in set_recmode_handler() rather than
the callbacks.
There is one difference in behaviour. Deferred attach calls are now
processed in the timeout case, where they weren't before. That's a
bug fix!
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
'0' = Child took the mutex
'1' = Unable to take mutex - contention
'2' = Unable to take mutex - timeout
'3' = Unable to take mutex - error
This is a straightforward API. When the child is generalised to an
external helper then this makes it easier for a helper to be, for
example, a simple script.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
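The single-character status codes listed in the commit above can be decoded on the parent side roughly as follows. This is a minimal sketch: the enum and function names are hypothetical and are not taken from the CTDB source; only the four character values come from the commit message.

```c
/* Hypothetical names; only the '0'..'3' values come from the commit message. */
enum mutex_status {
	MUTEX_TAKEN = 0,      /* '0' = child took the mutex */
	MUTEX_CONTENTION = 1, /* '1' = unable to take mutex - contention */
	MUTEX_TIMEOUT = 2,    /* '2' = unable to take mutex - timeout */
	MUTEX_ERROR = 3,      /* '3' = unable to take mutex - error */
};

/* Map the status character written by the child to an enum value.
 * Any unrecognised character is treated as an error (an assumption). */
static enum mutex_status parse_mutex_status(char c)
{
	switch (c) {
	case '0':
		return MUTEX_TAKEN;
	case '1':
		return MUTEX_CONTENTION;
	case '2':
		return MUTEX_TIMEOUT;
	default:
		return MUTEX_ERROR;
	}
}
```

Because the protocol is a single printable character, a helper written as a shell script could report its result with a plain printf.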
Use the more general name "cluster mutex", since we are likely to end
up with more than one cluster-wide lock. There will probably be a
dedicated recovery lock, held only during recovery, and also a second
lock that is held by the master node. Currently one lock is used for
both purposes.
At the moment the struct and functions are involved with setting the
recovery mode. However, they'll be abstracted out to more generally
deal with the cluster mutexes, so "recmode" -> "cluster_mutex". Drop
"set" from names, since this is used to test the lock. Also drop
"ctdb" prefix from functions, since they are local to this file. The
struct will eventually be a long-lived handle that will release the
mutex when freed, so name it accordingly.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
It does not make sense to update this statistic for the timeout case,
since this could skew the statistic. To keep it simple, update it
only for the usual case, where there is lock contention. So the
daemon statistic measures time to test the
lock and the corresponding recovery daemon statistic measures time to
take the lock.
Additionally, the recovery daemon will eventually use this code to
take the lock, and the method of updating the latency statistic will
need to be pushed further out to a configurable handler that depends
on the calling context.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Feb 23 10:32:06 CET 2016 on sn-devel-144
Have 0 indicate that the lock was taken. This allows non-zero values
to be used to indicate why the lock could not be taken. EACCES means
lock contention.
For now use just EACCES to cover all failures, since
ctdb_recovery_lock() returns a bool and details of other errors will
be lost. ctdb_recovery_lock() will undergo some big changes, so don't
try to fix this now.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
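The return convention described in the commit above (0 when the lock was taken, EACCES for contention) might look like this in isolation. The function names are hypothetical; the sketch only illustrates collapsing a bool result into the 0/EACCES convention and interpreting it afterwards.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper: convert a bool "lock taken" result into the
 * status convention: 0 means taken, EACCES covers every failure. */
static int lock_result_to_status(bool taken)
{
	return taken ? 0 : EACCES;
}

/* Hypothetical helper: describe a status value for logging. */
static const char *describe_lock_status(int status)
{
	if (status == 0) {
		return "lock taken";
	}
	if (status == EACCES) {
		return "lock contention";
	}
	return "unexpected error";
}
```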
This currently returns an incorrect error when the expected number of
bytes are not read. Separate out the different cases to clarify the
logic and avoid reporting the wrong error.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
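Separating the read outcomes, as the commit above describes, could be sketched like this. The function name and the choice of EPIPE and EIO for end-of-file and short reads are assumptions made for illustration; the commit message does not say which errors the real code reports.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read a fixed-size status from fd.
 * Returns 0 on success, otherwise an errno-style value. */
static int read_status(int fd, void *buf, size_t expected)
{
	ssize_t n;

	/* Retry if interrupted by a signal. */
	do {
		n = read(fd, buf, expected);
	} while (n == -1 && errno == EINTR);

	if (n == -1) {
		/* read() failed: errno is meaningful only in this case. */
		return errno;
	}
	if (n == 0) {
		/* End of file: the writer closed the pipe (assumed EPIPE). */
		return EPIPE;
	}
	if ((size_t)n != expected) {
		/* Short read: errno is not set, so pick an explicit error
		 * (assumed EIO). */
		return EIO;
	}
	return 0;
}
```

The point of the split is that errno is only valid when read() returns -1, so the end-of-file and short-read cases need their own explicit error values.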
This is already done before the destructor is assigned.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The callbacks that use this value are only ever called if recovery
mode is being set to NORMAL. So do not check if recmode is NORMAL
either.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The child process writes the status into the pipe before looping to
wait.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This groups function prototypes for common client/server functions in
common/common.h and removes them from ctdb_private.h.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Instead of includes.h, include the required header files explicitly.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
This groups function prototypes for system specific functions in
common/system.h and removes them from ctdb_private.h.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
In the parallel database recovery model, not all databases will remain
frozen at the same time. So relax the condition used to check if
recovery is active.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Before setting recovery mode to normal, confirm that all the databases are
recovered by matching the database generation with the global generation.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
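The generation check described in the commit above can be illustrated with a small sketch. The struct and function names are hypothetical and much simpler than the real CTDB database context; the sketch only shows the comparison of each database generation against the global generation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a database context. */
struct db_info {
	uint32_t generation;
};

/* Return true only if every database carries the global generation,
 * i.e. every database has been recovered. */
static bool all_databases_recovered(const struct db_info *dbs,
				    size_t num_dbs,
				    uint32_t global_generation)
{
	for (size_t i = 0; i < num_dbs; i++) {
		if (dbs[i].generation != global_generation) {
			return false;
		}
	}
	return true;
}
```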
Instead of marking all the databases with priority, mark only the database
which is currently being processed.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
These variables are used for state information related to freezing
databases. Instead use the API functions to check if the databases
are frozen.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Use ctdb->freeze_mode only in ctdb_freeze.c and use the functions to
check if databases are frozen everywhere else.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
This reverts commit 39d2fd330a60ea590d76213f8cb406a42fa8d680.
An election can occur in the middle of a recovery. During the
election the recovery master can change. When a node loses a round of
the election and stops being the recovery master it releases the
recovery lock. Then at the end of the ongoing recovery all nodes are
able to take the recovery lock so they will all abort.
The most likely cause for a change in recovery master is that several
(all?) nodes are starting up and the "connected-ness" of each node is
a primary factor in winning the election. In this situation the
recovery master can bounce around the cluster.
The simplest solution is to revert this patch so that the recovery
will fail. The new recovery master will then start a new recovery.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Mon May 4 10:40:36 CEST 2015 on sn-devel-104
Presumably this was done to minimise the chance of a recovery
occurring while the nodemaps are inconsistent across nodes.
Another potential theory is that the forced recovery in the
ctdb.c:control_reload_nodes_file() stops another recovery occurring
for ReRecoveryTimeout seconds, so this delay causes the reloads to
occur during that period.
This is no longer necessary because recoveries are now explicitly
disabled while node files are reloaded.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Every time a nodemap is constructed the node IP addresses all need to
be parsed. This isn't a very productive use of CPU.
Instead, parse each string once when the nodes file is loaded. This
results in much simpler code.
This code also removes the use of ctdb_address. Duplicating the port
is pointless without an abstraction layer around ctdb_address. If
CTDB gets an incompatible transport in the future then add an
abstraction layer.
Note that the infiniband code is not updated. Compilation of the
infiniband code is already broken. Fixing it will be a separate,
properly tested effort.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
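Parsing each address string once at load time, as the commit above describes, might be sketched as follows. The struct and function names are hypothetical and do not reflect CTDB's actual address type; inet_pton() is the standard POSIX call.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical parsed form of one line from the nodes file. */
struct node_addr {
	int family; /* AF_INET or AF_INET6 */
	union {
		struct in_addr v4;
		struct in6_addr v6;
	} ip;
};

/* Parse an address string once, trying IPv4 first and then IPv6.
 * The result can be stored with the node and reused whenever a
 * nodemap is built. */
static bool parse_node_addr(const char *s, struct node_addr *out)
{
	memset(out, 0, sizeof(*out));

	if (inet_pton(AF_INET, s, &out->ip.v4) == 1) {
		out->family = AF_INET;
		return true;
	}
	if (inet_pton(AF_INET6, s, &out->ip.v6) == 1) {
		out->family = AF_INET6;
		return true;
	}
	return false;
}
```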
When the daemon is able to take the recovery lock during recovery we
might as well guess that the cluster filesystem has a lock coherence
problem and print a more useful message. This will be more helpful to
those trying out cluster filesystems that don't have lock coherence or
that are difficult to set up.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Have it just silently take or fail to take the lock, except on an
unexpected failure (where it should log an error).
This means that when it is called we need to keep the old behaviour
and explicitly release the lock. In do_recovery() the lock is
released and a message is printed before attempting to take the lock.
In the daemon sanity check the lock must be released in the error path
if it is actually taken.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Unlock the recovery lock file. This way knowledge of the file
descriptor isn't sprinkled throughout the code.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
It is pointless having a recovery lock but not sanity checking that it
is working. Also, the logic that uses this tunable is confusing. In
some places the recovery lock is released unnecessarily because the
tunable isn't set.
Simplify the logic by assuming that if a recovery lock is specified
then it should be verified.
Update documentation that references this tunable.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Print out the errno if the fcntl call fails.
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Richard Sharpe <rsharpe@samba.org>
Autobuild-User(master): Michael Adam <obnox@samba.org>
Autobuild-Date(master): Fri Jan 9 04:25:02 CET 2015 on sn-devel-104
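Reporting errno when an fcntl() lock call fails, as in the commit above, might look like this. The function name, message text, and locked byte range are assumptions for illustration; fcntl(), F_SETLK and struct flock are standard POSIX.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: try to take a write lock on the first byte of
 * fd without blocking. Returns 0 on success or the errno value from
 * fcntl(), which is also logged. */
static int try_write_lock(int fd)
{
	struct flock lock = {
		.l_type = F_WRLCK,
		.l_whence = SEEK_SET,
		.l_start = 0,
		.l_len = 1,
	};

	if (fcntl(fd, F_SETLK, &lock) != 0) {
		/* Save errno first: fprintf() may overwrite it. */
		int err = errno;
		fprintf(stderr, "fcntl(F_SETLK) failed: %s (errno %d)\n",
			strerror(err), err);
		return err;
	}
	return 0;
}
```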
This makes it consistent with Samba, to ease transition.
Update unit test code to link with tdb_wrap instead of including
db_wrap.c.
There are some potential whitespace fixes in this commit that have
been ignored. CTDB's lib/tdb_wrap will be deleted after the
transition to Samba's lib/tdb_wrap, so there's no point polishing it
too much.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This prevents ctdb tool from thawing databases prematurely in
thaw/wipedb/restoredb commands if recovery is active.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
This was added to support external monitoring using CTDB event scripts.
However, it was never used.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>