IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
This is undocumented and is not needed. It was a workaround for
trying to ensure public IP addresses are properly rebalanced after
running "ctdb addip" on multiple nodes. "ctdb reloadips" is a better
solution.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
These have been scattered around the code so that
tevent_loop_allow_nesting() can be called. However, only the main
daemon and some tests currently use nested event loops.
TEVENT_DEPRECATED is already defined in the places where it is needed.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Feb 26 07:11:29 CET 2016 on sn-devel-144
The "natgwlist" command is no longer marked "auto all" and is also
marked "without daemon". That latter is not strictly true because
ctdb_natgw needs the daemon so a subsequent invocation of "ctdb
nodestatus" will work. However, "without daemon" is used here because
the top-level "ctdb natgwlist" does not need to open a connection to
the daemon. It just needs to invoke ctdb_natgw.
Update tests to suit.
It would make sense to make "ctdb natgw" generally call out to
ctdb_natgw, passing all argument. However, that can be done later.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This is intended to replace the use of "ctdb natgwlist" in 11.natgw
and provide different views of the NAT gateway status.
It replaces the use of CTDB_NATGW_SLAVE_ONLY=yes with a "slave-only"
keyword in the NAT gateway nodes file. This means the nodes file must
be consistent on all nodes in a NAT gateway group.
Note that this script is not yet integrated, so there are no behaviour
or documentation changes.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This is to avoid clash with samba structure server_id.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
This groups function prototypes for common client/server functions in
common/common.h and removes them from ctdb_private.h.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
When building standalone ctdb from git repo, samba_version_file correctly
includes git sha in VERSION string. When building standalone ctdb from
tarball, samba_version_file puts UNKNOWN in the VERSION string.
Use the packaged include/ctdb_version.h file to set the correct git sha.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Instead of includes.h, include the required header files explicitly.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
This groups function prototypes for system specific functions in
common/system.h and removes them from ctdb_private.h.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
The databases are repacked automatically during vacuuming when the
freelist size grows beyond configured threshold (RepackLimit).
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
The same structure is required in new controls for database transactions.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
In this case: ctdbd_wrapper, onnode, ctdb_diagnostics, ctdb.sudoers.
Set sensible defaults from configure options.
Update documentation to match, trying to fix up anything that has been
missed before.
The onnode unit tests need a symlink to the functions file.
The simple integration tests need to set CTDB_BASE and also
need symlinks to functions/nodes.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Jose A. Rivera <jarrpa@samba.org>
Reviewed-by: Michael Adam <obnox@samba.org>
fixup
Signed-off-by: Martin Schwenke <martin@meltin.net>
This hasn't existed for a long time.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Jose A. Rivera <jarrpa@samba.org>
Reviewed-by: Michael Adam <obnox@samba.org>
Just use ctdb_tcp_connection. It is the same. There are no external
users.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Volker Lendecke <vl@samba.org>
The timed out error is ignored for certain events (start_recovery,
recoverd, takeip, releaseip). If these events time out, then the debug
hung script outputs the following:
3 scripts were executed last releaseip cycle
00.ctdb Status:OK Duration:4.381 Thu Jul 16 23:45:24 2015
01.reclock Status:OK Duration:13.422 Thu Jul 16 23:45:28 2015
10.external Status:DISABLED
10.interface Status:OK Duration:-1437083142.208 Thu Jul 16 23:45:42 2015
The endtime for timed out scripts is not set. Since the status is not
returned as -ETIME for some events, ctdb scriptstatus prints -ve duration.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
This follows the same pattern as the tstore command, and it allows
specifying key strings with a trailing \0 character.
Signed-off-by: Christof Schmitt <cs@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
Autobuild-User(master): Jeremy Allison <jra@samba.org>
Autobuild-Date(master): Mon Jul 6 23:23:22 CEST 2015 on sn-devel-104
Memory allocated by ctdb_sys_find_ifname is not
freed by the caller.
Signed-off-by: Rajesh Joseph <rjoseph@redhat.com>
Reviewed-by: Michael Adam <obnox@samba.org>
A recovery is not required: when deleting a node it should already be
disconnected and when adding a node it will also be disconnected. The
new sanity checks in "reloadnodes" ensure that these assumptions are
met.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
If a recovery occurs when some nodes have reloaded and others haven't
then the nodemaps with be inconsistent so bad things will happen.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The code was too "clever". The 4 different cases should be separate.
The "node remains deleted" case doesn't need the IP address comparison
(always 0.0.0.0) or the disconnected check.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
There is no reason to serialise these or even handle remote nodes
first. Using a broadcast is more efficient and is less code.
Update expected test results to reflect changed order of messages.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Mon Mar 23 15:04:00 CET 2015 on sn-devel-104
"ctdb reloadnodes" currently does no sanity checking of the nodes
file. This can cause chaos if a line is deleted from the nodes file
rather than commented out. It also repeatedly produces a spurious
warning for each deleted node, even if the node was deleted a long
time ago.
Instead compare the nodemap with the contents of the local nodes file
to sanity check before attempting any reloads. Note that this is
still imperfect if the nodes files are inconsistent across nodes but
it is better. Also ensure that any nodes that are to be deleted are
already disconnected. Avoid trying to talk to deleted nodes.
The current implementation is a bit unfortunate when it comes to
deleting nodes. The most obvious alternative to the above complexity
would be to reloadnodes on the specified node first, then fetch the
node map (in which newly deleted nodes would be marked as such) and
then handle the remote nodes. However, the implementation of
reloadnodes is asynchronous and it only actions the reload after 1
second. This is presumably to avoid the recovery master noticing the
inconsistency between nodemaps and triggering a recovery before all
nodes have had their nodemaps updated.
Note that this recovery can still occur if the check is done at an
inconvenient time. A better long term approach might be to quiesce
the recovery master checks while reloadnodes is in progress.
Update a unit test to reflect the change.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This compares the nodes file on the current node with that on all
nodes. If any are different then do not reload nodes.
If any nodes files can't be fetched then do not reload nodes. This
could be because some nodes are running an older version without this
feature. This is unsupported: why make a major cluster
reconfiguration while a cluster is half upgraded?
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
It should not be possible to specify "-n <othernode>", unless
<othernode> is the current node. To support this, add new function
assert_current_node_only().
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
In the CTDB CLI tool source code and the documentation example.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
To support this, update printm() to replace ':' in format string with
options.machineseparator, which is a string but must contain a single
character.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
printm() is a printf(3) replacement and must be used to printing any
machine readable output. It currently just calls vprintf(3). Later
it will change the field delimiter.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Put declarations into ctdb_logging.h, factor out some common code,
clean up #includes.
Remove the check so see if the 1st character of the debug level is
'-'. This is wrong, since it is trying to check for a negative
numeric debug level (which is no longer supported) and would need to
be handled in the else anyway.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Found by address sanitizer.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Fri Oct 17 12:56:02 CEST 2014 on sn-devel-104
As far as we know, nobody uses this and it just complicates the
logging subsystem.
Remove all ringbuffer code and documentation. Update the local
daemons startup code correspondingly.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Volker Lendecke <vl@samba.org>
This makes it consistent with Samba, to ease transition.
Update unit test code to link to with tdb_wrap instead of including
db_wrap.c.
There are some potential whitespace fixes in this commit that have
been ignored. CTDB's lib/tdb_wrap will be deleted after the
transition to Samba's lib/tdb_wrap, so there's no point polishing it
too much.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This function is only used in this file. Samba's lib/util doesn't
have timeval_delta(), so staging a clean transition.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This is part of a migration to Samba's lib/util. CTDB always passes 0
(i.e. no max_size) so use a simple assert() to enforce this, rather
than changing a lot of code that will be discarded anyway.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This effectively reverts commit 442953c540.
The correct way of telling recovery daemon to trigger a database recovery is
by setting recovery mode to active. There is no need to freeze databases as
recovery master will do that across the cluster anyway.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Recent changes have caused these commands to attempt to get
capabilities from all nodes before doing further filtering. This
means that capabilities are unnecessarily fetched from nodes that are
unlikely to be the master. If such a node does not answer the control
then many nodes can fail to calculate the master node. In the case of
natgwlist this will cause "monitor" events to fail resulting in
unhealthy nodes.
Restore the behaviour where capabilities are only fetched for a node
that will be the master if it has the desired flags.
Although this masks a problem where a connected node is not replying,
it can help to avoid an outage in some cases.
Add supporting tests and infrastructure. Infrastructure just lets a
timeout be faked - just for ctdb_ctrl_getcapabilities_stub() so far.
First test checks that this infrastructure works if the first node
times out in natgwlist. Second test checks the case worked around by
the above fix - that is, no failure when a node with PNN beyond the
NATGW master can time out.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Thu May 29 05:59:37 CEST 2014 on sn-devel-104
script_status->num_scripts is used as the count in this message:
"%d scripts were executed last %s cycle\n"
However, script_status->num_scripts includes disabled scripts, which
are never actually executed.
Instead, count the number of scripts that aren't disabled and make the
message print that.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Wed May 28 02:27:48 CEST 2014 on sn-devel-104
Now freeing ctdb_db context will close the tdb database. So make sure
all the locks are released (by freeing record handles or memory context
from which record handles are allocated) before freeing ctdb_db context.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
This makes sure that AllowClientDBAttach is set to 0 before detaching any
databases.
If someone enables the tunable between checking of tunable and actual
detaching of databases, then they deserve what they get. :-)
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
Commit ba69742ccd missed the point of
filtering disconnected nodes while limiting the nodemap to those in
the NAT gateway group. It was really to avoid trying to fetch
capabilities from disconnected nodes. This should be explicitly done
in filter_nodemap_by_capabilities(), otherwise "ctdb natgwlist" simply
fails when there is a disconnected node.
Note that the alternate solution where filter_nodemap_by_flags() is
called before filter_nodemap_by_capabilities() would not be not
correct. Filtering on flags first can produce a "healthier" set of
nodes where none of them have the NAT gateway capability.
Also extend stub for ctdb_ctrl_getcapabilities() to fail when trying
to get capabilities from a disconnected node and add a corresponding
test to confirm that "ctdb natgwlist" is no longer broken.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This way this logic is centralised. It also means that the IP address
comparisons in the NAT gateway code are IPv6 safe.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
There is a check for empty lines in the loop just below.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Factor it from read_nodes_file(). Use it there and in
read_natgw_nodes_file().
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Add another filter function, like the ones for capabilities and flags
to, for filtering by NAT gateway nodes. This makes the main
natgw_list function more readable.
Note that this drops the early filtering of disconnected nodes, so
they will now be listed in a NAT gateway group. This makes sense.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The index of the nodes array in nodemap isn't the PNN.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Instead of just finding the first node that doesn't have any flags in
flag_mask set, change it into a function that filters a nodemap to
exclude nodes with the given flags.
This makes the NATGW code simpler but also provides a function that
can be used in other code.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Check capabilities once to build a filtered node list instead of
repeatedly checking capabilities
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
It looks like the original without_daemon code still tried to
establish a client connection to the daemon. Closing stderr looks to
be a cheap way of hiding the errors when this failed.
However, later cleanups avoid the client connection altogether, so do
not close stderr. Now debug output from without_daemon commands
actually appears.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Fri Jan 31 07:52:46 CET 2014 on sn-devel-104
If a node isn't numeric then it is silently converted to 0.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
This fixes an alignment discrepancy on 32-bit vs 64-bit platforms.
sizeof(struct ctdb_ltdb_header) = 20 (32-bit)
= 24 (64-bit)
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
This is naive and assumes no performance problems when updating
persistent DBs. It also does no error handling.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
Also add test.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
This can be useful for piping data to onnode in certain circumstances.
There are now also enough command-line options that they should
definitely be alphabetically ordered.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
When running a mixed version cluster, compatibility with older
versions was was broken during recent refactorisation.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
Ban time of 0 is not supported.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit c072eb1f6488f94f83a6d3a81d88bf29ad866943)
This bit-rotted a long time ago when the "ThisNode" column was added
to "ctdb -Y status" output. The fake "ctdb -Y status" output in the
test was never updated to reflect this change.
Instead of making sure that all columns are "0", just check that
they're not "1". This implicitly ignores "Y" and "N" in this
"ThisNode" column without having to do anything else clever.
Also update associated tests. The main "ctdb ok" test had a duplicate
opening line for a here document, which was tickled by this change.
This fixes samba bz#8122.
Signed-off-by: Martin Schwenke <martin@meltin.net>
onnode test fixup
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 01a46205c3a3d6609dc0b0324319b89667dffa32)
Ensure that environment variable CTDB_BASE is set.
Update defaults for nodes and natgw_nodes to use CTDB_BASE.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 2b6dc0d2799f3563b767622b6f9246450aa4036b)
This command was added to test persistent database recovery with sequence
numbers. With the new persistent transaction code, sequence numbers get
updated automatically, so there is no need for this command.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 14bfd22fad1a5fd27eede1be7fccbaed9466e13e)
ret is initialised too early and is clobbered by the call to
ctdb_ctrl_getcapabilities(). Initialising it later means that the
function returns -1 when no LVS master is found.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 3296559c43e70f755fcf2c06677891e0319c8142)
Apparently it used to mean a permanent ban but it is unclear if this
was ever supported.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit c8a6e5ce579e2fe320c40268e7e9ddfe68b8cd30)
This means that takeover runs will be disabled for about as long as the
reloadips control can take to complete.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 6d44657a5e5b0df22bab2d487a503dd1c5ba79b4)
There's no reason why specifying a node should be compulsory. This is
a cluster-wide operation because it is implemented by the recovery
master so multiple nodes should not be specified using -n. However,
the command should be able to specify multiple nodes so let it have
its own nodestring argument.
This change should be backward compatible with the old requirement of
specifying a single node via -n.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 0846c00597adb66bba8c9dbf63443d0c2f91a7d1)
Use a broadcast instead of trying to win the race of determining the
recovery master and then sending the message before the recovery
master changes.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit ac946ee4ad01b1e5cd1006930b9f8a190a0a58ba)
This implementation disables takeover runs on all nodes before trying
to reload IPs. It also takes "all" or the list of PNNs as an argument
to the command instead of to -n. -n can still be specified with a
single node indicating that node should be considered the current node
- that might be confusing so could be removed.
This implementation does not use CTDB_SRVID_RELOAD_ALL_IPS, so it can
be removed.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit d66a072d9b120c78c47e726e9f29a3c1cfdd87ce)
This will be useful for other SRVIDs.
The error checking in the handler depends on the SRVID responding with
a uint32_t where <0 indicates an error and >=0 is a PNN that
succeeded.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 52050e1c75b21961dafe2bc410268b44240ab24e)
Instead of the current global variable. This is in anticipation of
abstracting the code.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit c58ee0eddf7ae3283e3ca8bd25575e6e677e1b17)
No need for a separate one for each SRVID.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit d9c22b04d5aa7938a3965bd3144568664eb772ce)
list_of_active_nodes_except_pnn() is only used here and can be removed
if we remove this call. Less is more...
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit d4e206fb818048b7fab4797c877b854bdbb1ab70)
The useful cases are either CTDB_CURRENT_NODE, in which case
ctdb_get_pnn() does the job, or a PNN, which is... ummm... a PNN! :-)
This works because parse_nodestring() validates PNNs.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 7b3f7eea2465efb099a2faf3e42174bc97b13a16)
* ipreallocate is cluster-wide so should not be auto-all
* enablescript, disablescript, getreclock, setreclock, natgwlist can
all be auto-all without issues
* xpnn, ipiface a local-only so don't work with -n, so might as well
not be auto-all
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 123a4677528cb46bee1c6dad8a5162eba9880bc1)
This has the side effect of making these commands more resilient to
control timeouts.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 0fe79662e20e347d9e1cb12a42cd356e33572402)
Now we will only have one set of bugs. :-)
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 444521c852749558f39dc6131acce9e47eefd489)
Having other functions call control_ipreallocate() suggests that the
it might look at the argv/argv arguments that are passed. This is not
the case. Change the callers so they call the new ipreallocate()
function instead.
Broadcast CTDB_SRVID_TAKEOVER_RUN to all connected nodes. Inactive
nodes will ignore it. This is safe since we only want 1 reply. If we
didn't get a response, we don't actually care if there's no active
recovery master - just fire, wait, retry, ...
Ignore some failures on the basis that they might be transient, so it
is probably worth retrying.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 4bf0b1c9d21986eecb7682f935bd6154c65533cc)
This has already been stored at connect time and can't fail.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit d8eb2e7fdd7645719370dad4f2faa5c3fffa8249)
The current 3 second timeout is arbitrary and users trip over it
sometimes.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit b49c4f39666d5b1596213bf41bcdc47ed3c327ae)
This will allows eventscripts to send information about multiple tcp
connections to a single "ctdb killtcp" command, saving the overhead of
setting up a client connection per tcp connection.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit af5aa369c266430fe912df0c26116b68bac3572e)
At the moment there is no easy way to force a recovery when attempting
to reproduce certain classes of bugs. This option is added without
documentation because it is dangerous until the bugs are fixed! :-)
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 4f87925a287f612a6ab3b5da1a387a31c7bea28f)
This avoids premature exits from "ctdb stop" and "ctdb continue" due to
intermittent control (e.g. getpnn, getnodemap) timeouts.
This needs a proper fix to distinguish between timeout and failure
conditions and take appropriate action.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit c48583fd238496a81ddc46a21892f0b49559036a)
Otherwise callers can't tell the difference between some other failure
(e.g. memory allocation failure) and an unknown tunable.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 03fd90d41f9cd9b8c42dc6b8b8d46ae19101a544)
This adds more serialisation to the startup, ensuring that the
"startup" event runs after everything to do with the first recovery
(including the "recovered" event).
Given that it now takes longer to get to the "startup" state, the
initscript needs to wait until ctdbd gets to "first_recovery".
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit ed6814ff0a59ddbb1c1b3128b505380f60d7aeb7)
If one or more run states are specified then "ctdb runstate" succeeds
only if ctdbd is in one of those run states.
At the moment, if the "setup" event fails then the initscript succeeds
but ctdbd exits almost immediately. This behaviour isn't very
friendly.
The initscript now waits until ctdbd is in "startup" or "running" run
state via the use of "ctdb runstate startup running", meaning that ctdbd
has successfully passed the "setup" event.
The "setup" event code in 00.ctdb now waits until ctdbd is in the
"setup" run state before proceeding via the use of "ctdb runstate setup".
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 4a2effcc455be67ff4a779a59ca81ba584312cd6)
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit bf20c3ab090f75f59097b36186347cedb1c445d4)
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 9e7b7cd04adc5e66e2ffa4edf463a682aaea379b)
This code tried to find the recovery master and send an ipreallocate
request to that node. When a node is stopped, this code asked the
stopped node for recovery master. Stopped node does not have up-to-date
information on the current recovery master. So ipreallocate requests
were sent to the wrong node and ignored by that node which is not the
recovery master.
Send ipreallocate request to all active nodes. That way we guarantee
that the current recovery master will see it and respond to it.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-Programmed-With: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 0577ce3c68e4febf49a1ef5093e918db9d5ec636)
This avoids clash with version.h from Samba tree.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit d18fcfff674e876abde8d51afec92d9c4a090d2f)
Also, include description of -e option in usage.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 35264e42ade4676468cf7713fa339c784e932953)
Moving the IP is an optimisation so should not cause failure.
Refactor and simplify the retry-move-IP into new function
try_moveip().
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 5402f85dde045576cbaf64e01c68e28ed52204e8)
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit d1ec06d30148e6fd344625a2fbf1c22391bd908a)
Most of the commands related to database operations can now use the
common code (db_exists()) to refer to database with either name or id.
In addition to return db_id for db_name, the function returns all the
flags set for the database.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit ca6e7eccc90f2869c220231666bf284798342bce)
This fixes the wrong code where same variable 'ret' is used to track the pnn
and the return value of a function call.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 718233c445cd6627ab3962b6565c2655f1f8efd0)
We don't need extra commands for these.
Also, allow a default value of NOTICE for the getlog level.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 7197e600f46f2d1638f6c45c0149f109ea25a47c)
This adds commands rdgetlog and rdclearlog
These are analogous to getlog and clearlog but operate on the logs for
the recovery daemon.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit ef55e06192819d840c09b65741bab737223ac34c)
* Factor out repeated code into new function find_natgw()
* Support both machine and human readable output
* Use libctdb
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit a56ec75edd1705b0539513d396d311f0e80a3bf5)
control_getcapabilities(), control_lvs(), control_lvsmaster() updated
to use ctdb_getcapabilities(), ctdb_getnodemap() as appropriate.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit c30ec02615183ecf9b412ad415bf1abd859aec45)
This used to catch trailing blank lines. However, these are caught
just as effectively by the whitespace filtering in the loop below.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 7b75a3bb722dc86139b1a07a0100d08c34620b91)
The first line is currently human readable and the rest is machine
readable. This doesn't make sense. Do one or the other...
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit b29d5bbaa7048291c4b3a39bf12e04f0436f67da)
It is already in 2 places and we might use it in another.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 12a0a7a208d1c8fa8991894200d1dc133f3a2d1a)
A list of files is given rather than a command. These files are
pushed to the specified nodes.
Quoting is fragile/broken so filenames with spaces won't work - you
win some, you lose some. :-)
All of the other onnode options should work together with this option.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit aed9b98ddbbf3e81de4f7257a10676565f7d7507)
Originally, "ctdb cattdb" attached explicitly as non-persistent, which
is now forbidden for persistent databases by the server.
Pair-Programmed-With: Gregor Beck <gbeck@sernet.de>
(This used to be ctdb commit 85a367005bd669309bb7e532b60d27621110180d)
Also add a method to use the recovery master/daemon to reload the public ips on all nodes in the cluster.
Reloading the public ips on all node sin the cluster is only suported if all nodes in the cluster are available and healthy.
(This used to be ctdb commit 05603e914f8c12618d7e06943c0f7df207f645b0)
This can improve performance and stop clients from having to chase a rapidly migrating/bouncing record
(This used to be ctdb commit d0d98f7e45e5084b81335b004d50bddc80cdc219)
If used with -n <nodes> the "current" node needs to change.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit a0340a50c2acd9ccc281faef032a364254f7f95a)
Everytime we give a delegation to another node we count this as one delegation.
If the same record is delegated to several nodes we count one for each node.
Everytime a record has all its delegations revoked we count this as one revoke.
(This used to be ctdb commit b098bcf8007be63889aaed640a951b0eeaa9d191)
An old, buggy version of this code was merged. This fixes it.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit bc4d5d5f0048487776f9f5d9f04a0af2e5d45aac)
parse_nodestring() checks what this used to check. parse_nodestring()
already has the nodemap.
It was a 50-50 decision to decide whether to update verify_node() to
check against a nodemap that is passed in or to delete it.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Conflicts:
tools/ctdb.c
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 2c1baf5ddd3c2cdd2a3f98b8e208d3a48530d1d1)
This is very much like "ctdb status" but actually returns the status
for use in scripts.
This can be used in 2 modes:
* An optional nodestring is passed directly to the command without
using -n. In this mode the current node is asked for the status of
all listed nodes.
* The nodestring is passed via -n. In this mode the designated nodes
will be asked for their status. This is like "ctdb status".
These modes can be mixed. For example:
ctdb nodestatus 1,2,3 -n 0
asks node 0 for the status of nodes 1, 2 and 3, returning the bitwise
OR of their statuses.
This version uses the auto_all functionality, so the output isn't
necessarily pretty. An improved version that does its own -n
processing may appear soon.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 355685a14be17bf4648788f8a72c54790fe03502)
Create 2 new functions: control_status_1_machine() and
control_status_1_human() that contain chunks of code from
control_status(). We're about to find another purpose for these
functions.
This should be a no-op.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit fade71539482e8276f57ba3c003fe004d8666ce7)
Centralise -n nodestring parsing and add the ability to pass a
comma-separated list of node numbers. Listing a node that is
disconnected or deleted results in failure, similar to the way passing
a single node currently works. All of the auto_all commands inherit
this functionality. For now, the non-auto_all commands do not inherit
this - they need to be individually tweaked. Therefore, we haven't
updated the documentation to advertise this feature.
Implemented via a new function parse_nodestring() that parses an
optional (pass NULL when not available to indicate "current node")
comma-separated list of node numbers or "all". parse_nodestring() can
be told to be non-fatal for disconnected/deleted nodes so it can also
be used in other contexts (yes, coming soon). main() is changed to
call this function.
A new magic PNN value CTDB_MULTICAST is added and along with a
corresponding option.nodes structure member (a talloc-ed array of
PNNs). This is also populated for "all" as well.
control_status() has new function pretty_print_flags() factored out so
pretty-printed flags can be used in error/debug messages. New
function is_partially_online() is also factored out - this simplifies
some of the logic.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 920e3a732eb9e09004edde6cfb3c7db8a004016f)
This puts the parsing and checking logic close together. This makes
it easy to change the parsing code. Changed parsing code can now
easily use both old client code and libctdb since both are guaranteed
to be setup.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 57fb074a65dc56168fc3813b79a5bab4b3727cf3)
It's the only one in the file.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit bf1174ef699b06485b36ee8ae70412be0759e142)
This allows all that logic to be hacked a little more easily.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit f93ffeee7b9e9ca5dd116655bdc7f89fc987ed8a)
Most of the action in main() happens inside a for loop and an if
statement. This causes 2 levels of extra indent for the code and
makes it harder to read.
Instead, the current body of the loop is put below the loop and its
corresponding failure check.
To see how small this change really is, view with "diff -w". ;-)
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 8527396b7290cfc8378779631e91d2ae09e2a106)
This didn't have auto_all set as true. However, there's no special
code to handle "-n all" and it just fails. If auto_all works for
status then it might as well work for scriptstatus.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 3084220e2aac3664511969f10cad206e505150a0)
This patch changes the callback signature for traversal
functions to allow a client to abort a traverse before it finishes.
Updates to all callers and examples as well as rb-test tool.
(This used to be ctdb commit 8ab0c63ad36cfbbb1e5fed46a1f4c47b1fdb581f)
This case was never tested and fakessh obviously won't handle the
extra arguments.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 02184bd5b9ab94cdf2b9ff92e56a509f92f9e4aa)
Current behaviour is for onnode to timeout (for about 20s) for each
attempted ssh to a down node. With 40 or 50 invocations of onnode
this takes a long time.
2 changes to work around this:
* If EXTRA_SSH_OPTS (which is passed to ssh by onnode) does not
contains a ConnectTimeout= setting then add a setting for a 5 second
timeout.
* Filter the nodes before starting any diagnosis, taking out any "bad
nodes" that are uncontactable via onnode.
In the nodes summary at the beginning of the output, print
information about any "bad nodes".
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 8c3b6427dbaade87e1a0f5590f0894c2e69b31a3)
Add option -e to get the old behaviour and process empty records too.
Signed-off-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit d9859540c2000864bc6c58be5afe19aa3b1064b2)
1/0 is unsuitable since it can be useful to check 'if a column is "1" there is something wrong with that node'
(This used to be ctdb commit b963f5e40b1e73a60363568da88557cad9e58a28)
Regular "ctdb status" output flags which node is the local node, do the
same for machine readable output.
(This used to be ctdb commit 3885141f37724b3dea61b45fbac38489ec356588)
Following connection to the local ctdbd, ctdb_cmdline_client() currently
issues a CTDB_CONTROL_GET_PNN request with a fixed 3 second timeout.
The ctdb cmd line client accepts a --timelimit argument for specifying
a per request timeout, pass this value through to ctdb_cmdline_client()
for use as a CTDB_CONTROL_GET_PNN request timeout.
(This used to be ctdb commit 0634d0305f42f17048b6830733767e8dc300e11c)
let all databases default to not support this until enabled through this control
(This used to be ctdb commit 908a07c42e5135a3ba30a625fc4f4e4916de197a)
the persistent flag.
This is the same size as the original boolean but allows ut to add additional flags for the database
(This used to be ctdb commit 7462761638d25880ad46024ad4ef21667eb99a98)
Lines in machine readable output always start with a colon (':').
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit a779d83a6213e2ba21621f7e090964428f89422d)
The API for this function has changed since the 1.2 branch where readonly locks are being merged from
(This used to be ctdb commit d01b9716d3e50f4c6d102e8411f0401b0f499699)