mirror of https://github.com/samba-team/samba.git synced 2025-03-03 12:58:35 +03:00

5573 Commits

Author SHA1 Message Date
Martin Schwenke
0858b11ff7 ctdb-tests: Use ctdb_node_list_to_map() in tool stubs
Drop copy of old ctdb_control_nodemap().

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Apr  7 10:20:41 CEST 2015 on sn-devel-104
2015-04-07 10:20:41 +02:00
Martin Schwenke
1ef1cfdc4d ctdb-common: Move ctdb_node_list_to_map() to utilities
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:13 +02:00
Martin Schwenke
dd52d82c73 ctdb-daemon: Factor out new function ctdb_node_list_to_map()
Change ctdb_control_getnodemap() to use this.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:13 +02:00
Martin Schwenke
ffbe0a6def ctdb-tools: Drop the recovery from "reloadnodes"
A recovery is not required: when deleting a node it should already be
disconnected and when adding a node it will also be disconnected.  The
new sanity checks in "reloadnodes" ensure that these assumptions are
met.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:13 +02:00
Martin Schwenke
d340f308e7 ctdb-daemon: Don't delay reloading the nodes file
Presumably this was done to minimise the chance of a recovery
occurring while the nodemaps are inconsistent across nodes.

Another potential theory is that the forced recovery in the
ctdb.c:control_reload_nodes_file() stops another recovery occurring
for ReRecoveryTimeout seconds, so this delay causes the reloads to
occur during that period.

This is no longer necessary because recoveries are now explicitly
disabled while node files are reloaded.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:13 +02:00
Martin Schwenke
85bd9a33eb ctdb-recoverd: Avoid nodemap-related checks when recoveries are disabled
The potential resulting recovery won't run anyway.  Also recoveries
may have been disabled by "reloadnodes" and if the nodemaps are
inconsistent between nodes then avoid triggering an unnecessary
recovery.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:13 +02:00
Martin Schwenke
13dc4a9842 ctdb-tool: Update "reloadnodes" to disable recoveries
If a recovery occurs when some nodes have reloaded and others haven't
then the nodemaps will be inconsistent so bad things will happen.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:13 +02:00
Martin Schwenke
ee9619c28b ctdb-recoverd: New message ID CTDB_SRVID_DISABLE_RECOVERIES
Also add test stub support.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:13 +02:00
Martin Schwenke
2ca484cd50 ctdb-recoverd: Simplify disable_ip_check_handler() using ctdb_op_disable()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:13 +02:00
Martin Schwenke
108db3396f ctdb-recoverd: Add slightly more abstraction for disabling takeover runs
Factor out new function srvid_disable_and_reply(), which can be
re-used.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:13 +02:00
Martin Schwenke
ec32d9bea8 ctdb-recoverd: Reimplement ReRecoveryTimeout using ctdb_op_disable()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:12 +02:00
Martin Schwenke
281f7e8152 ctdb-recoverd: Use a goto for do_recovery() failures
This will allow extra things to be done on failure.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:12 +02:00
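
The single-exit failure path described in the commit above can be sketched as follows. This is a minimal illustration, not the actual do_recovery() code: struct recovery_ctx, step() and do_recovery_sketch() are hypothetical names invented here.

```c
#include <stdbool.h>

/* Hypothetical recovery context; not the real ctdb structures. */
struct recovery_ctx {
    bool in_progress;
    int failures;
};

/* Hypothetical recovery step: returns 0 on success, -1 on failure. */
static int step(bool ok)
{
    return ok ? 0 : -1;
}

/* Run the steps in order.  Any failure jumps to one shared label, so
 * extra work on failure only has to be written once. */
int do_recovery_sketch(struct recovery_ctx *ctx, bool step1_ok, bool step2_ok)
{
    ctx->in_progress = true;

    if (step(step1_ok) != 0) {
        goto fail;
    }
    if (step(step2_ok) != 0) {
        goto fail;
    }

    ctx->in_progress = false;
    return 0;

fail:
    /* Shared failure handling lives here. */
    ctx->failures++;
    ctx->in_progress = false;
    return -1;
}
```

With every error branch funnelled through one label, a later change that needs to do something on failure touches a single place instead of each early return.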
Martin Schwenke
a2044c65bc ctdb-recoverd: Reimplement disabling takeover runs using ctdb_op_disable()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:12 +02:00
Martin Schwenke
55b246195b ctdb-recoverd: Add a new abstraction ctdb_op_disable()
This can be used to disable and re-enable an operation, and do all the
relevant sanity checking.

Most of this is from existing functions
disable_takeover_runs_handler(), clear_takeover_runs_disable() and
reenable_takeover_runs().

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:12 +02:00
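
A disable/re-enable abstraction of the kind this commit describes might look roughly like the sketch below. It is written under assumptions: struct op_state and the op_*() functions are hypothetical, time is passed in explicitly instead of using an event timer, and treating a timeout of 0 as "re-enable" is a choice made for this sketch, not something the commit message states.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-operation state (e.g. "takeover runs", "recoveries"). */
struct op_state {
    const char *name;
    bool in_progress;
    time_t disabled_until;   /* 0 means "not disabled" */
};

/* True while a disable request is still in effect. */
bool op_is_disabled(const struct op_state *s, time_t now)
{
    return s->disabled_until != 0 && now < s->disabled_until;
}

/* Disable the operation for 'timeout' seconds; a timeout of 0 re-enables.
 * Sanity check: refuse to disable an operation that is currently running.
 * Returns 0 on success, -1 if the request is refused. */
int op_disable(struct op_state *s, time_t now, unsigned int timeout)
{
    if (timeout == 0) {
        s->disabled_until = 0;
        return 0;
    }
    if (s->in_progress) {
        return -1;
    }
    s->disabled_until = now + (time_t)timeout;
    return 0;
}

/* Try to start the operation; fails if it is running or disabled. */
bool op_begin(struct op_state *s, time_t now)
{
    if (s->in_progress || op_is_disabled(s, now)) {
        return false;
    }
    s->in_progress = true;
    return true;
}

/* Mark the operation as finished. */
void op_end(struct op_state *s)
{
    s->in_progress = false;
}
```

Keeping the checks in one place means each operation that can be disabled only needs its own state instance, not its own copy of the handler logic.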
Martin Schwenke
ae9cd037ee ctdb-daemon: Pass on consistent flag information to recovery daemon
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:12 +02:00
Martin Schwenke
4b972bbdb3 ctdb-tests: Add "ctdb reloadnodes" test for "node remains deleted"
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:12 +02:00
Martin Schwenke
181658f5bb ctdb-tools: Fix spurious messages about deleted nodes being disconnected
The code was too "clever".  The 4 different cases should be separate.
The "node remains deleted" case doesn't need the IP address comparison
(always 0.0.0.0) or the disconnected check.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:12 +02:00
David Disseldorp
12309f8bfb ctdb: check for talloc_asprintf() failure
Signed-off-by: David Disseldorp <ddiss@samba.org>
Reviewed-by: Michael Adam <obnox@samba.org>

Autobuild-User(master): Michael Adam <obnox@samba.org>
Autobuild-Date(master): Wed Apr  1 15:36:03 CEST 2015 on sn-devel-104
2015-04-01 15:36:03 +02:00
Rajesh Joseph
801bdcde6a ctdb: Coverity fix for CID 1291643
CID 1291643: Resource leak: leaked_handle: Handle
variable lock_fd going out of scope leaks the handle.

Fix: release the lock_fd handle on the failure path.

Signed-off-by: Rajesh Joseph <rjoseph@redhat.com>
Reviewed-by: Michael Adam <obnox@samba.org>
Reviewed-by: David Disseldorp <ddiss@samba.org>
2015-04-01 12:54:11 +02:00
Amitay Isaacs
079575d80f ctdb-tests: Switch to tcp check in rpcinfo stub
Use -T tcp instead of deprecated options -u and -t.  Also, check for
localhost.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>

Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Fri Mar 27 09:16:50 CET 2015 on sn-devel-104
2015-03-27 09:16:50 +01:00
Amitay Isaacs
14886ed00c ctdb-scripts: Use tcp connection for checking RPC services
It's possible for an RPC service to register only for UDP and not TCP.
Since we assume all the NFS operations are over TCP, always check RPC
services over TCP.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-03-27 06:40:08 +01:00
Martin Schwenke
130202d635 ctdb-scripts: Respect $RPCMOUNTDOPTS when restarting rpc.mountd
$RPCMOUNTDOPTS is ignored when restarting rpc.mountd due to the service
being unresponsive.  This variable can be used to increase the number
of rpc.mountd threads when there are a lot of clients reattaching so
ignoring it can mean that only a single rpc.mountd thread is started.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-03-27 06:40:08 +01:00
Amitay Isaacs
62ba95a9f3 ctdb-daemon: Drop tunable that is no longer in use
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-03-27 06:40:08 +01:00
Amitay Isaacs
41ed26cbf7 ctdb-recoverd: Fix typo in comment
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-03-27 06:40:08 +01:00
Jelmer Vernooij
90ec37cf90 Move waf into third_party/.
Signed-Off-By: Jelmer Vernooij <jelmer@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
2015-03-26 22:47:22 +01:00
Volker Lendecke
508b45fca9 ctdb: Fix CID 1125615 Copy into fixed size buffer
Might be a "can't happen", but strcpy always looks fishy

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Michael Adam <obnox@samba.org>
2015-03-26 14:54:20 +01:00
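
One common way to replace an unchecked strcpy() into a fixed-size buffer is to check the length first and fail if the string does not fit. The sketch below shows that general pattern only; copy_bounded() is a hypothetical helper and is not taken from the actual fix.

```c
#include <stddef.h>
#include <string.h>

/* Copy 'src' into 'dst' (capacity 'dstlen' bytes, including the NUL).
 * Returns 0 on success, or -1 without touching 'dst' if 'src' would
 * not fit. */
int copy_bounded(char *dst, size_t dstlen, const char *src)
{
    size_t len = strlen(src);

    if (len >= dstlen) {
        return -1;
    }
    memcpy(dst, src, len + 1);   /* +1 copies the terminating NUL */
    return 0;
}
```

Returning an error, rather than silently truncating, lets the caller decide how to handle a string that "can't happen" but did.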
Volker Lendecke
93d4e80129 ctdb: Fix CID 1125634 Out-of-bounds write
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Michael Adam <obnox@samba.org>
2015-03-26 14:54:20 +01:00
Christof Schmitt
0509790ec3 build: Move systemd checks to lib/util
Only lib/util uses the systemd library, so it makes sense to have the
checks there. This also removes the need for the ctdb build script to
specify an empty tag for the systemd library.

Signed-off-by: Christof Schmitt <cs@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-03-23 21:22:11 +01:00
Martin Schwenke
c8918b70b9 ctdb-tools: Use a broadcast to connected nodes for "reloadnodes"
There is no reason to serialise these or even handle remote nodes
first.  Using a broadcast is more efficient and is less code.

Update expected test results to reflect changed order of messages.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Mon Mar 23 15:04:00 CET 2015 on sn-devel-104
2015-03-23 15:04:00 +01:00
Martin Schwenke
c99d2702ee ctdb-tests: Add unit tests for "reloadnodes" sanity checking
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-03-23 12:23:12 +01:00
Martin Schwenke
1cebd75f62 ctdb-tools: Sanity check changes before processing "reloadnodes"
"ctdb reloadnodes" currently does no sanity checking of the nodes
file.  This can cause chaos if a line is deleted from the nodes file
rather than commented out.  It also repeatedly produces a spurious
warning for each deleted node, even if the node was deleted a long
time ago.

Instead compare the nodemap with the contents of the local nodes file
to sanity check before attempting any reloads.  Note that this is
still imperfect if the nodes files are inconsistent across nodes but
it is better.  Also ensure that any nodes that are to be deleted are
already disconnected.  Avoid trying to talk to deleted nodes.

The current implementation is a bit unfortunate when it comes to
deleting nodes.  The most obvious alternative to the above complexity
would be to reloadnodes on the specified node first, then fetch the
node map (in which newly deleted nodes would be marked as such) and
then handle the remote nodes.  However, the implementation of
reloadnodes is asynchronous and it only actions the reload after 1
second.  This is presumably to avoid the recovery master noticing the
inconsistency between nodemaps and triggering a recovery before all
nodes have had their nodemaps updated.

Note that this recovery can still occur if the check is done at an
inconvenient time.  A better long term approach might be to quiesce
the recovery master checks while reloadnodes is in progress.

Update a unit test to reflect the change.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-03-23 12:23:12 +01:00
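
The comparison between the running nodemap and the local nodes file can be sketched as below. This is an illustration under assumptions, not the code in the ctdb tool: struct node_entry and sanity_check_sketch() are hypothetical, addresses are kept as strings, and the exact set of rejected cases is inferred from the commit messages in this log.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view of one node.  The index in the array is the PNN. */
struct node_entry {
    const char *addr;
    bool deleted;
    bool disconnected;   /* only meaningful for the running nodemap */
};

/* Compare the running nodemap ('cur') with the parsed nodes file
 * ('file').  Returns true if a reload looks safe. */
bool sanity_check_sketch(const struct node_entry *cur, size_t ncur,
                         const struct node_entry *file, size_t nfile)
{
    size_t i;

    /* A line was removed instead of commented out: PNNs would shift. */
    if (nfile < ncur) {
        return false;
    }

    for (i = 0; i < ncur; i++) {
        if (cur[i].deleted && file[i].deleted) {
            continue;   /* node remains deleted: nothing to check */
        }
        if (cur[i].deleted) {
            continue;   /* node is being re-added */
        }
        if (file[i].deleted) {
            /* Node is being deleted: it must already be disconnected. */
            if (!cur[i].disconnected) {
                return false;
            }
            continue;
        }
        /* Node is kept: its address must not change. */
        if (strcmp(cur[i].addr, file[i].addr) != 0) {
            return false;
        }
    }

    /* Entries beyond 'ncur' are newly added nodes. */
    return true;
}
```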
Martin Schwenke
2cb2aa58d0 ctdb-tests: Add "ctdb reloadnodes" unit tests
A basic test and some for cross-node consistency checking.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-03-23 12:23:12 +01:00
Martin Schwenke
16c79eb887 ctdb-tools: Add cross-node file comparison to "reloadnodes"
This compares the nodes file on the current node with that on all
nodes.  If any are different then do not reload nodes.

If any nodes files can't be fetched then do not reload nodes.  This
could be because some nodes are running an older version without this
feature.  This is unsupported: why make a major cluster
reconfiguration while a cluster is half upgraded?

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-03-23 12:23:12 +01:00
Martin Schwenke
5207d13152 ctdb-tests: Test stub for ctrl_getnodesfile()
Also stub support for CTDB_CONTROL_GET_NODES_FILE

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-03-23 12:23:12 +01:00
Martin Schwenke
81e526965c ctdb-daemon: New control CTDB_CONTROL_GET_NODES_FILE
This is like CTDB_CONTROL_GET_NODEMAP but it loads from the nodes file
instead of the daemon.

Also new client function ctdb_ctrl_getnodesfile()

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-03-23 12:23:12 +01:00
Martin Schwenke
8e12e112f8 ctdb-tools: "reloadnodes" should only run against current node
It should not be possible to specify "-n <othernode>", unless
<othernode> is the current node.  To support this, add new function
assert_current_node_only().

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-03-23 12:23:12 +01:00
Martin Schwenke
77e879253b ctdb-tools: Remove unused struct pnn_node and function read_pnn_node_file()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-03-23 12:23:12 +01:00
Martin Schwenke
3703e8aadd ctdb-tools: Reimplement read_natgw_nodes_file() using ctdb_read_nodes_file()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-03-23 12:23:12 +01:00
Martin Schwenke
c5538a464f ctdb-tools: Reimplement read_nodes_file() using ctdb_get_nodes_file()
Update the implementation of "ctdb xpnn" and "ctdb listnodes"
accordingly.  Update associated tests too.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-03-23 12:23:12 +01:00
Martin Schwenke
5148228f41 ctdb-daemon: Move ctdb_read_nodes_file() to utilities
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-03-23 12:23:12 +01:00
Martin Schwenke
1ada9c4ef7 ctdb-daemon: Factor out node parsing code
New function ctdb_read_nodes_file() reads a nodes file into a node
map, which is a useful intermediate format.  This function should
replace the node reading code in the ctdb CLI tool.  It will also be
useful for sanity checking of nodes files across the cluster.

New function convert_node_map_to_list() converts a node map to a node
array (and associated node count).  This fills in the details that
aren't present in the node map.  This may also be useful as a separate
function later if node list reloading stages the data after a sanity
check - the approach is not yet finalised.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-03-23 12:23:12 +01:00
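
Reading a nodes file into an intermediate node map could be sketched as follows. The sketch makes several assumptions that are not stated in the commit: it parses an in-memory string rather than a file, keeps addresses as text, skips blank lines, and treats a line starting with '#' as a deleted node with address 0.0.0.0 so that later nodes keep their PNNs. The names struct node_map_entry and parse_nodes_text() are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ADDR_LEN 64

/* Hypothetical node map entry. */
struct node_map_entry {
    unsigned int pnn;
    bool deleted;
    char addr[ADDR_LEN];
};

/* Parse nodes-file text, one node per line, into 'out' (capacity 'max').
 * Returns the number of entries, or -1 if there are too many lines or a
 * line is too long. */
int parse_nodes_text(const char *text, struct node_map_entry *out, size_t max)
{
    size_t n = 0;
    const char *p = text;

    while (*p != '\0') {
        const char *eol = strchr(p, '\n');
        size_t len = (eol != NULL) ? (size_t)(eol - p) : strlen(p);

        if (len > 0) {   /* blank lines are skipped in this sketch */
            if (n >= max || len >= ADDR_LEN) {
                return -1;
            }
            out[n].pnn = (unsigned int)n;
            if (p[0] == '#') {
                /* Commented-out line: keep the slot, mark it deleted. */
                out[n].deleted = true;
                memcpy(out[n].addr, "0.0.0.0", sizeof("0.0.0.0"));
            } else {
                out[n].deleted = false;
                memcpy(out[n].addr, p, len);
                out[n].addr[len] = '\0';
            }
            n++;
        }

        p += len;
        if (eol != NULL) {
            p++;   /* step over the newline */
        }
    }

    return (int)n;
}
```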
Martin Schwenke
a5be2c245d ctdb-daemon: Store node addresses as ctdb_sock_addr rather than strings
Every time a nodemap is constructed the node IP addresses all need to
be parsed.  This isn't a very productive use of CPU.

Instead, parse each string once when the nodes file is loaded.  This
results in much simpler code.

This code also removes the use of ctdb_address.  Duplicating the port
is pointless without an abstraction layer around ctdb_address.  If
CTDB gets an incompatible transport in the future then add an
abstraction layer.

Note that the infiniband code is not updated.  Compilation of the
infiniband code is already broken.  Fixing it will be a separate,
properly tested effort.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
2015-03-23 12:23:12 +01:00
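
The "parse once at load time" idea can be illustrated with the POSIX inet_pton() call. This is a simplified sketch: it handles IPv4 only and uses a hypothetical struct node with load_nodes() and node_same_addr() helpers, whereas the commit stores a ctdb_sock_addr.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical node record holding an already-parsed address. */
struct node {
    struct in_addr addr;
};

/* Parse each address string exactly once, when the list is loaded.
 * Returns 0 on success, or -1 on the first string that is not a valid
 * IPv4 address. */
int load_nodes(const char **addrs, size_t n, struct node *nodes)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (inet_pton(AF_INET, addrs[i], &nodes[i].addr) != 1) {
            return -1;
        }
    }
    return 0;
}

/* Later users compare the binary form; no string parsing is needed. */
bool node_same_addr(const struct node *a, const struct node *b)
{
    return a->addr.s_addr == b->addr.s_addr;
}
```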
Martin Schwenke
3cbeb17d0f ctdb-common: Drop ctdb context from ctdb_parse_address()
Having it require a CTDB context stops ctdb_parse_address() from being
used in more generic code.  Just use the existing talloc context for
memory allocations.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-03-23 12:23:12 +01:00
Martin Schwenke
a1e65d0c8d ctdb-daemon: Remove function ctdb_add_deleted_node()
Just add a flags parameter to ctdb_add_nodes() and use the same code.
Less is more.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-03-23 12:23:12 +01:00
Martin Schwenke
876529054a ctdb-daemon: Set node PNN in one place
This is currently set in 2 places.  One of them makes the node loading
code difficult to refactor.  Also, when the surrounding code in either
place is touched then it might get broken.

This only needs to be done once at startup, not on every reload.  So
do it once in a very obvious way, sacrificing a few CPU cycles for
some added clarity.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-03-23 12:23:12 +01:00
Martin Schwenke
db6385afe9 ctdb-daemon: Move VNN map initialisation out of node loading
Each node reload unnecessarily and incorrectly resets the VNN map,
causing a potentially unnecessary recovery.  When nodes are reloaded
any newly deleted nodes should already be disconnected and any newly
added nodes should also be disconnected.  This means that reloading
the nodes file should not cause a change in the VNN map.

The current implementation also leaks memory every time the nodes are
reloaded.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-03-23 12:23:12 +01:00
Martin Schwenke
ee073f60b1 ctdb-tests: Fix error return for ctdb_client_async_control_stub()
It should be -1 even without a failure callback registered.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-03-23 12:23:12 +01:00
Martin Schwenke
c0891339ec ctdb-tests: Add asserts to ensure that pointers are set
These can be unset if a NODEMAP, IFACES or VNNMAP section is missing.
Affected functions would then dereference a NULL pointer and the test
program would crash.  Adding some helpful messages makes the problem
easier to diagnose when writing tests.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-03-23 12:23:12 +01:00
Amitay Isaacs
1f523a628a ctdb-tests: Avoid early exits in scripts that appear on tail of a pipe
When a shell script runs "foo | bar" and "bar" terminates early,
"foo" can get an I/O error when writing to stdout.

The tdbtool stub did not wait to read anything from stdin when it was
expected to.  This would cause tests to fail randomly under load when
the tdbtool process exited early.

Similarly, the debug function read from stdin only under certain
conditions (higher debug level and when not reading from a tty).
Otherwise, it exited early.

Thanks to Andrew Bartlett for noticing the problem and Catalyst Cloud
(http://catalyst.net.nz/cloud) for providing resources to test fixes.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-Programmed-With: Martin Schwenke <martin@meltin.net>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>

Autobuild-User(master): Michael Adam <obnox@samba.org>
Autobuild-Date(master): Fri Mar 20 16:26:37 CET 2015 on sn-devel-104
2015-03-20 16:26:36 +01:00
Amitay Isaacs
4f82ef4b38 ctdb-scripts: Simplify 00.ctdb event script
Avoid extra which commands.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2015-03-20 13:49:26 +01:00