Presumably this was done to minimise the chance of a recovery
occurring while the nodemaps are inconsistent across nodes.
Another potential theory is that the forced recovery in
ctdb.c:control_reload_nodes_file() stops another recovery from
occurring for ReRecoveryTimeout seconds, so this delay causes the
reloads to occur during that period.
This is no longer necessary because recoveries are now explicitly
disabled while node files are reloaded.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The potential resulting recovery won't run anyway. Also, recoveries
may have been disabled by "reloadnodes", and if the nodemaps are
inconsistent between nodes this avoids triggering an unnecessary
recovery.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Factor out new function srvid_disable_and_reply(), which can be
re-used.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This can be used to disable and re-enable an operation, and do all the
relevant sanity checking.
Most of this is from existing functions
disable_takeover_runs_handler(), clear_takeover_runs_disable() and
reenable_takeover_runs().
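As a rough illustration of that pattern (with hypothetical names and
fields, not the real CTDB structures), a disable request with a
positive timeout records when the operation may run again, a timeout
of 0 re-enables it, and negative values are rejected:

    /* Hedged sketch of a disable/re-enable helper with sanity checks;
     * names and fields are illustrative, not the real CTDB API. */
    #include <stdio.h>
    #include <time.h>

    struct op_state {
        const char *name;
        time_t disabled_until;      /* 0 means the operation is enabled */
    };

    /* timeout > 0: disable for that many seconds
     * timeout == 0: re-enable immediately
     * timeout < 0: invalid, reject */
    int op_disable(struct op_state *op, int timeout)
    {
        if (timeout < 0) {
            fprintf(stderr, "Invalid timeout %d for %s\n", timeout, op->name);
            return -1;
        }
        if (timeout == 0) {
            if (op->disabled_until == 0) {
                fprintf(stderr, "%s is not disabled\n", op->name);
            }
            op->disabled_until = 0;
            return 0;
        }
        op->disabled_until = time(NULL) + timeout;
        return 0;
    }

    int op_is_disabled(const struct op_state *op)
    {
        return op->disabled_until != 0 && time(NULL) < op->disabled_until;
    }

    int main(void)
    {
        struct op_state takeover = { .name = "takeover runs" };

        op_disable(&takeover, 60);  /* disable for 60 seconds */
        printf("disabled: %d\n", op_is_disabled(&takeover));
        op_disable(&takeover, 0);   /* re-enable */
        printf("disabled: %d\n", op_is_disabled(&takeover));
        return 0;
    }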
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This is like CTDB_CONTROL_GET_NODEMAP but it loads from the nodes file
instead of the daemon.
Also add a new client function, ctdb_ctrl_getnodesfile().
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
New function ctdb_read_nodes_file() reads a nodes file into a node
map, which is a useful intermediate format. This function should
replace the node reading code in the ctdb CLI tool. It will also be
useful for sanity checking of nodes files across the cluster.
New function convert_node_map_to_list() converts a node map to a node
array (and associated node count). This fills in the details that
aren't present in the node map. This may also be useful as a separate
function later if node list reloading stages the data after a sanity
check - the approach is not yet finalised.
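A minimal sketch of that split, assuming the conventional nodes file
layout of one address per line with a leading '#' marking a deleted
node that keeps its PNN; the types and names below are simplified
stand-ins, not the actual ctdb_read_nodes_file() and
convert_node_map_to_list() interfaces:

    /* Rough sketch: read a nodes file into a node map, then convert the
     * map into a node array.  Types, flag and limit are illustrative. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define NODE_FLAG_DELETED 0x1
    #define MAX_NODES 256

    struct node_entry {
        unsigned pnn;
        char addr[64];
        unsigned flags;
    };

    struct node_map {
        unsigned num;
        struct node_entry node[MAX_NODES];
    };

    /* One address per line; a leading '#' keeps the line's PNN but
     * marks the node as deleted. */
    int read_nodes_file(const char *path, struct node_map *map)
    {
        char line[256];
        FILE *f = fopen(path, "r");

        if (f == NULL) {
            return -1;
        }
        map->num = 0;
        while (fgets(line, sizeof(line), f) != NULL && map->num < MAX_NODES) {
            struct node_entry *n = &map->node[map->num];
            const char *addr = line;

            line[strcspn(line, "\r\n")] = '\0';
            n->pnn = map->num;
            n->flags = 0;
            if (line[0] == '#') {
                n->flags |= NODE_FLAG_DELETED;
                addr = line + 1;
            }
            snprintf(n->addr, sizeof(n->addr), "%s", addr);
            map->num++;
        }
        fclose(f);
        return 0;
    }

    /* Convert the map into a separately allocated array plus a count,
     * where any extra per-node details would be filled in. */
    struct node_entry *node_map_to_list(const struct node_map *map,
                                        unsigned *count)
    {
        struct node_entry *list = calloc(map->num, sizeof(*list));

        if (list == NULL) {
            return NULL;
        }
        memcpy(list, map->node, map->num * sizeof(*list));
        *count = map->num;
        return list;
    }

    int main(int argc, char **argv)
    {
        struct node_map map;
        struct node_entry *list;
        unsigned i, count;

        if (argc < 2 || read_nodes_file(argv[1], &map) != 0) {
            return 1;
        }
        list = node_map_to_list(&map, &count);
        for (i = 0; list != NULL && i < count; i++) {
            printf("pnn %u %s%s\n", list[i].pnn, list[i].addr,
                   (list[i].flags & NODE_FLAG_DELETED) ? " (deleted)" : "");
        }
        free(list);
        return 0;
    }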
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Every time a nodemap is constructed the node IP addresses all need to
be parsed. This isn't a very productive use of CPU.
Instead, parse each string once when the nodes file is loaded. This
results in much simpler code.
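For illustration, one way to cache the result of parsing each address
up front, so later code reuses the stored binary form instead of
re-parsing the string (simplified, not the actual CTDB node
structure):

    /* Sketch: parse a node address once at load time and cache it. */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>

    struct node_addr {
        char str[INET6_ADDRSTRLEN];     /* original text form */
        int family;                     /* AF_INET or AF_INET6 */
        union {
            struct in_addr v4;
            struct in6_addr v6;
        } ip;                           /* parsed once, reused everywhere */
    };

    int node_addr_parse(const char *s, struct node_addr *a)
    {
        snprintf(a->str, sizeof(a->str), "%s", s);

        if (inet_pton(AF_INET, s, &a->ip.v4) == 1) {
            a->family = AF_INET;
            return 0;
        }
        if (inet_pton(AF_INET6, s, &a->ip.v6) == 1) {
            a->family = AF_INET6;
            return 0;
        }
        return -1;      /* not a valid address */
    }

    int main(int argc, char **argv)
    {
        struct node_addr a;

        if (argc < 2 || node_addr_parse(argv[1], &a) != 0) {
            return 1;
        }
        printf("%s is an %s address\n", a.str,
               a.family == AF_INET ? "IPv4" : "IPv6");
        return 0;
    }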
This code also removes the use of ctdb_address. Duplicating the port
is pointless without an abstraction layer around ctdb_address. If
CTDB gets an incompatible transport in the future then add an
abstraction layer.
Note that the infiniband code is not updated. Compilation of the
infiniband code is already broken. Fixing it will be a separate,
properly tested effort.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Having it require a CTDB context stops ctdb_parse_address() from being
used in more generic code. Just use the existing talloc context for
memory allocations.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Just add a flags parameter to ctdb_add_nodes() and use the same code.
Less is more.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This is currently set in 2 places. One of them makes the node loading
code difficult to refactor. Also, when the surrounding code in either
place is touched, it risks being broken.
This only needs to be done once at startup, not on every reload. So
do it once in a very obvious way, sacrificing a few CPU cycles for
some added clarity.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Each node reload unnecessarily and incorrectly resets the VNN map,
causing a potentially unnecessary recovery. When nodes are reloaded
any newly deleted nodes should already be disconnected and any newly
added nodes should also be disconnected. This means that reloading
the nodes file should not cause a change in the VNN map.
The current implementation also leaks memory every time the nodes are
reloaded.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Michael Adam <obnox@samba.org>
Autobuild-User(master): Michael Adam <obnox@samba.org>
Autobuild-Date(master): Fri Mar 13 19:14:20 CET 2015 on sn-devel-104
scandir() allocates every name individually; see the example code in
SUSv4 or man scandir.
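For reference, the usual cleanup pattern: each name returned by
scandir() is allocated individually and must be freed before the
array itself is freed.

    #include <dirent.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        struct dirent **namelist;
        int i, n;

        n = scandir(".", &namelist, NULL, alphasort);
        if (n < 0) {
            perror("scandir");
            return 1;
        }

        for (i = 0; i < n; i++) {
            printf("%s\n", namelist[i]->d_name);
            free(namelist[i]);  /* each name is allocated individually */
        }
        free(namelist);         /* then free the array itself */
        return 0;
    }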
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Michael Adam <obnox@samba.org>
The use of talloc with a static variable is somewhat confusing.
Statically allocate an array and use ctdb_set_helper() instead.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Volker Lendecke <vl@samba.org>
If ctdb_lock_helper cannot get a lock within 10 seconds, the ctdb
daemon logs a message and invokes an external debug script. This is
repeated every 10 seconds.
In case of contention or on a loaded system, there can be multiple
ctdb_lock_helper processes waiting to get a lock on record(s). For
each lock request that takes longer, the ctdb daemon will flood the
log every 10 seconds. Instead of logging aggressively every 10
seconds, relax the logging to every 100s and 1000s once the elapsed
time has exceeded 100s and 1000s respectively.
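A small sketch of that back-off (the helper name is made up): keep
the 10 second interval at first, then only log every 100 and 1000
seconds as the wait grows.

    #include <stdbool.h>
    #include <stdio.h>

    /* Return true if a "still waiting for lock" message should be
     * logged after 'elapsed' seconds.  Back off as the wait grows. */
    bool should_log_lock_wait(unsigned elapsed)
    {
        unsigned interval;

        if (elapsed < 100) {
            interval = 10;
        } else if (elapsed < 1000) {
            interval = 100;
        } else {
            interval = 1000;
        }

        return (elapsed > 0) && (elapsed % interval == 0);
    }

    int main(void)
    {
        unsigned t;

        for (t = 10; t <= 2000; t += 10) {
            if (should_log_lock_wait(t)) {
                printf("would log at %u seconds\n", t);
            }
        }
        return 0;
    }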
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
Autobuild-User(master): Michael Adam <obnox@samba.org>
Autobuild-Date(master): Thu Mar 5 12:06:44 CET 2015 on sn-devel-104
When the daemon is able to take the recovery lock during recovery we
might as well guess that the cluster filesystem has a lock coherence
problem and print a more useful message. This will be more helpful to
those trying out cluster filesystems that don't have lock coherence
or that are difficult to set up.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Just continue to hold it, otherwise a broken node might win an
election and grab the lock.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Have it just silently take or fail to take the lock, except on an
unexpected failure (where it should log an error).
This means that when it is called we need to keep the old behaviour
and explicitly release the lock. In do_recovery() the lock is
released and a message is printed before attempting to take the lock.
In the daemon sanity check the lock must be released in the error path
if it is actually taken.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This has not done anything useful since commit
b9d8bb23af. Instead, just check that
the lock is held.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Unlock the recovery lock file. This way knowledge of the file
descriptor isn't sprinkled throughout the code.
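A minimal sketch of the idea, with hypothetical names: a single
unlock helper owns the file descriptor, and closing it drops the
fcntl() lock, so callers never touch the fd directly.

    #include <stdbool.h>
    #include <unistd.h>

    /* Hypothetical context holding the recovery lock fd. */
    struct recovery_ctx {
        int lock_fd;    /* -1 when the lock is not held */
    };

    /* The only place that knows about the fd: closing it releases the
     * fcntl() lock as a side effect. */
    bool recovery_unlock(struct recovery_ctx *ctx)
    {
        bool ret = true;

        if (ctx->lock_fd != -1) {
            if (close(ctx->lock_fd) == -1) {
                ret = false;
            }
            ctx->lock_fd = -1;
        }
        return ret;
    }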
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
It is pointless having a recovery lock but not sanity checking that it
is working. Also, the logic that uses this tunable is confusing. In
some places the recovery lock is released unnecessarily because the
tunable isn't set.
Simplify the logic by assuming that if a recovery lock is specified
then it should be verified.
Update documentation that references this tunable.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
If the recovery lock file is unset then this dereferences a NULL
pointer. The regression is due to commit
6f1ac7af0f.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Print out the errno if the fcntl call fails.
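For illustration, a POSIX fcntl() locking attempt that reports the
errno on failure; the helper name is hypothetical.

    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>

    /* Try to take an exclusive lock on the whole file; report errno
     * when the fcntl call fails. */
    int try_lock_fd(int fd)
    {
        struct flock lock = {
            .l_type = F_WRLCK,
            .l_whence = SEEK_SET,
            .l_start = 0,
            .l_len = 0,     /* 0 means "to end of file" */
        };

        if (fcntl(fd, F_SETLK, &lock) == -1) {
            fprintf(stderr, "Failed to get lock on fd %d: %s\n",
                    fd, strerror(errno));
            return -1;
        }
        return 0;
    }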
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Richard Sharpe <rsharpe@samba.org>
Autobuild-User(master): Michael Adam <obnox@samba.org>
Autobuild-Date(master): Fri Jan 9 04:25:02 CET 2015 on sn-devel-104
Log a message when the reclock file actually changes and avoid a
memory allocation when it doesn't change.
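Roughly the following idea, sketched with strdup() instead of talloc:
only log and allocate when the stored reclock path actually differs
from the new one.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Update the stored reclock path only if it changed. */
    int set_reclock_file(char **stored, const char *new_path)
    {
        char *copy;

        if (*stored != NULL && strcmp(*stored, new_path) == 0) {
            return 0;   /* unchanged: no log message, no allocation */
        }

        copy = strdup(new_path);
        if (copy == NULL) {
            return -1;
        }

        printf("Recovery lock file changed to %s\n", new_path);
        free(*stored);
        *stored = copy;
        return 0;
    }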
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
ctdb_sys_find_ifname() doesn't work for IPv6 addresses so don't use
it.
Trust the eventscript to do sanity checking on the interface. Current
warnings are replaced with equivalents generated by the eventscript.
The unlikely message:
Public IP %s is hosted on interface %s but we have no VNN
will be replaced by:
WARNING: Public IP %s hosted on interface %s but VNN says __none__
which is clear enough.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Processing one migration request at a time is very slow and processing
a batch of records can take longer than VacuumInterval. This causes
subsequent vacuum fetch requests to be dropped. The dropped records
can accumulate quickly and will cause the vacuum database traverse to
be quite expensive.
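Schematically, the batching approach drains the whole queue of
pending requests in one pass instead of handling one request per
wakeup (the list type below is hypothetical):

    #include <stdlib.h>

    /* Hypothetical singly linked queue of pending vacuum fetch requests. */
    struct fetch_req {
        struct fetch_req *next;
        /* ... record key etc. ... */
    };

    /* Handle one request (stub for the sketch). */
    void process_one(struct fetch_req *req)
    {
        (void)req;
    }

    /* Process every queued request in one pass rather than one per
     * wakeup, so a backlog is cleared as quickly as possible. */
    void process_all(struct fetch_req **queue)
    {
        struct fetch_req *req = *queue;

        *queue = NULL;
        while (req != NULL) {
            struct fetch_req *next = req->next;

            process_one(req);
            free(req);
            req = next;
        }
    }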
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Dec 5 17:06:58 CET 2014 on sn-devel-104
Such records should be processed by the local vacuuming daemon to ensure
that all the remote copies have been deleted first.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
This avoids vacuuming getting in the way of the ctdb daemon
processing record requests.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
This avoids vacuuming getting in the way of the ctdb daemon
processing record requests.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
This prevents multiple child processes being forked at the same time
for vacuuming TDBs.
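A tiny sketch of such a guard (illustrative, not the real vacuuming
code): remember the running child's pid and refuse to fork another
vacuuming child while it is still alive.

    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static pid_t vacuum_child = -1;     /* -1 means no child running */

    /* Fork a vacuuming child only if none is already running. */
    int start_vacuum_child(void)
    {
        if (vacuum_child != -1 &&
            waitpid(vacuum_child, NULL, WNOHANG) == 0) {
            return -1;  /* previous vacuuming child is still running */
        }

        vacuum_child = fork();
        if (vacuum_child == 0) {
            /* child: do the vacuuming work here, then exit */
            _exit(0);
        }
        return (vacuum_child == -1) ? -1 : 0;
    }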
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>