IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Sun May 10 06:10:21 CEST 2015 on sn-devel-104
Don't initialise it after ctdb_event_script_callback_v() may have
short-circuited. This can stop ctdb_event_script_args() from ever
terminating.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
status.done is never set to true unless event_script_callback() is
invoked. The short-circuit in ctdb_event_script_callback_v() means
that this doesn't happen. CTDB can't work very well without 00.ctdb
(for tunable initialisation and the like) but it shouldn't get stuck.
So call the callback when there are no scripts in
event_script_callback().
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This is done by 10.interace where the monitor event fails when there
is a missing interface. The in-daemon interface checking adds no
value.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
If all nodes are still in, say, FIRST_RECOVERY runstate, then the logs
contain unfortunate noise like:
recoverd:Failed to find node to cover ip 10.0.2.131
This avoids that by adding an early exit that avoids running
takeover_run_core() when there are no nodes in the
CTDB_RUNSTATE_RUNNING.
To support this add the runstate to the ipflags structure. There are
clearly other ways of hacking this but this seems the simplest.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
It isn't possible to hold the recovery lock without having a lock file
set.
This is part of a goal to generalise the recovery lock mechanism to
just use a helper program, which may use a lock file or may use
something else.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Election packets from the current node are ignored at the beginning of
the function, so this does not need to be checked.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
No need to just send it to the recovery master.
This reduces the need for main daemon code to know which node is the
recovery master. The end goal is for the main daemon to not need to
know which node is the recovery master - this information would be
stored in the recovery daemon (and subsequently a separate cluster
management daemon).
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Databases are only pulled by the recovery master, so it can compare
with current node PNN.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Unless this node is the recovery master then this is not needed.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
It isn't used anywhere else and is always re-initialised to 0.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
They are initialised and updated but the values are never used.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Simplify update_capabilities() using the capabilities API and store
the capabilities in new field rec->caps rather than scattered around
ctdb->nodes.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This reverts commit 39d2fd330a.
An election can occur in the middle of a recovery. During the
election the recovery master can change. When a node loses a round of
the election and stops being the recovery master it releases the
recovery lock. Then at the end of the ongoing recovery all nodes are
able to take the recovery lock so they will all abort.
The most likely cause for a change in recovery master is that several
(all?) nodes are starting up and the "connected-ness" of each node is
a primary factor in winning the election. In this situation the
recovery master can bounce around the cluster.
The simplest solution is to revert this patch so that the recovery
will fail. The new recovery master will then start a new recovery.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Mon May 4 10:40:36 CEST 2015 on sn-devel-104
Due to usage of CTDB_NO_MEMORY macro,
some of the resources are not freed in failure cases.
Signed-off-by: Rajesh Joseph <rjoseph@redhat.com>
Reviewed-by: Guenther Deschner <gd@samba.org>
Reviewed-by: Michael Adam <obnox@samba.org>
Autobuild-User(master): Günther Deschner <gd@samba.org>
Autobuild-Date(master): Fri Apr 17 16:49:05 CEST 2015 on sn-devel-104
Presumably this was done to minimise the chance of a recovery
occurring while the nodemaps are inconsistent across nodes.
Another potential theory is that the forced recovery in the
ctdb.c:control_reload_nodes_file() stops another recovery occurring
for ReRecoveryTimeout seconds, so this delay causes the reloads to
occur during that period.
This is no longer necessary because recoveries are now explicitly
disabled while node files are reloaded.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The potential resulting recovery won't run anyway. Also recoveries
may have been disabled by "reloadnodes" and if the nodemaps are
inconsistent between nodes then avoid triggering an unnecessary
recovery.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Factor out new function srvid_disable_and_reply(), which can be
re-used.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This can be used to disable and re-enable an operation, and do all the
relevant sanity checking.
Most of this is from existing functions
disable_takeover_runs_handler(), clear_takeover_runs_disable() and
reenable_takeover_runs().
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This is like CTDB_CONTROL_GET_NODEMAP but it loads from the nodes file
instead of the daemon.
Also new client function ctdb_ctrl_getnodesfile()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
New function ctdb_read_nodes_file() reads a nodes file into a node
map, which is a useful intermediate format. This function should
replace the node reading code in the ctdb CLI tool. It will also be
useful for sanity checking of nodes files across the cluster.
New function convert_node_map_to_list() converts a node map to a node
array (and associated node count). This fills in the details that
aren't present in the node map. This may also useful as a separate
function later if node list reloading stages the data after a sanity
check - the approach is not yet finalised.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Every time a nodemap is contructed the node IP addresses all need to
be parsed. This isn't very productive use of CPU.
Instead, parse each string once when the nodes file is loaded. This
results in much simpler code.
This code also removes the use of ctdb_address. Duplicating the port
is pointless without an abstraction layer around ctdb_address. If
CTDB gets an incompatible transport in the future then add an
abstraction layer.
Note that the infiniband code is not updated. Compilation of the
infiniband code is already broken. Fixing it will be a separate,
properly tested effort.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Having it require a CTDB context stops ctdb_parse_address() from being
used in more generic code. Just use the existing talloc context for
memory allocations.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Just add a flags parameter to ctdb_add_nodes() and use the same code.
Less is more.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This is currently set in 2 places. One of them makes the node loading
code difficult to refactor. Also, when the surrounding code in either
place is touched then it might get broken.
This only needs to be done once at startup, not on every reload. So
do it once in a very obvious way, sacrificing a few CPU cycles for
some added clarity.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Each node reload unnecessarily and incorrectly resets the VNN map,
causing a potentially unnecessary recovery. When nodes are reloaded
any newly deleted nodes should already be disconnected and any newly
added nodes should also be disconnected. This means that reloading
the nodes file should not cause a change in the VNN map.
The current implementation also leaks memory every time the nodes are
reloaded.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Michael Adam <obnox@samba.org>
Autobuild-User(master): Michael Adam <obnox@samba.org>
Autobuild-Date(master): Fri Mar 13 19:14:20 CET 2015 on sn-devel-104
scandir allocates every name individually, see example code in susv4 or man
scandir
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Michael Adam <obnox@samba.org>
The use of talloc with a static variable is somewhat confusing.
Statically allocate an array and use ctdb_set_helper() instead.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Volker Lendecke <vl@samba.org>
If ctdb_lock_helper cannot get a lock within 10 seconds, ctdb daemon
logs a message and invokes an external debug script. This is repeated
every 10 seconds.
In case of a contention or on a loaded system, there can be multiple
ctdb_lock_helper processes waiting to get lock on record(s). For each
lock request taking longer, ctdb daemon will flood the log every
10 seconds. Instead of logging aggressively every 10 seconds, relax
logging to every 100s and 1000s if the elapsed time has exceeded 100s
and 1000s respectively.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
Autobuild-User(master): Michael Adam <obnox@samba.org>
Autobuild-Date(master): Thu Mar 5 12:06:44 CET 2015 on sn-devel-104