mirror of https://github.com/samba-team/samba.git synced 2025-01-13 13:18:06 +03:00

1637 Commits

Author SHA1 Message Date
Amitay Isaacs
9b6865475e ctdb-daemon: Remove obsolete IPv4 only controls
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Jeremy Allison <jra@samba.org>
2015-05-12 01:32:11 +02:00
Amitay Isaacs
4f4e6ebace ctdb-daemon: Remove older data structure that supports only IPv4 addresses
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Jeremy Allison <jra@samba.org>
2015-05-12 01:32:11 +02:00
Martin Schwenke
c75f297ac3 ctdb-daemon: Fix typo in debug message
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Sun May 10 06:10:21 CEST 2015 on sn-devel-104
2015-05-10 06:10:21 +02:00
Martin Schwenke
d30b529ccc ctdb-daemon: Initialise eventscript status earlier
Don't initialise it after ctdb_event_script_callback_v() may have
short-circuited.  This can stop ctdb_event_script_args() from ever
terminating.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-05-10 03:22:14 +02:00
Martin Schwenke
070964dbcf ctdb-daemon: Make ctdb_event_script_args() terminate if no scripts
status.done is never set to true unless event_script_callback() is
invoked.  The short-circuit in ctdb_event_script_callback_v() means
that this doesn't happen.  CTDB can't work very well without 00.ctdb
(for tunable initialisation and the like) but it shouldn't get stuck.

So call the callback when there are no scripts in
event_script_callback().

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-05-10 03:22:14 +02:00
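The two commits above describe a waiter that loops until `status.done` becomes true, and a runner that can short-circuit when there are no scripts. The sketch below illustrates both fixes: the status is initialised before the runner is called, and the callback fires even when there are no scripts. All names (`struct script_status`, `run_event_scripts()`, `run_and_wait()`) are hypothetical stand-ins, not CTDB's actual code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the status that
 * ctdb_event_script_args() waits on; not CTDB's real struct. */
struct script_status {
	bool done;
	int status;
};

typedef void (*script_callback_fn)(int status, void *private_data);

/* Completion callback: records the result and marks the wait as done. */
static void mark_done(int status, void *private_data)
{
	struct script_status *s = private_data;

	s->status = status;
	s->done = true;
}

/* Hypothetical runner.  With no scripts it short-circuits, but it still
 * invokes the callback so that the waiter can terminate. */
static int run_event_scripts(size_t num_scripts,
			     script_callback_fn callback, void *private_data)
{
	if (num_scripts == 0) {
		callback(0, private_data);
		return 0;
	}
	/* Real code would run the scripts asynchronously and invoke the
	 * callback when the last one finishes; this sketch calls it now. */
	callback(0, private_data);
	return 0;
}

/* Returns the script status, or -1 if the callback never fired. */
static int run_and_wait(size_t num_scripts)
{
	/* Initialise *before* starting: if this were done after the call,
	 * a short-circuit would have its done = true overwritten. */
	struct script_status s = { .done = false, .status = -1 };

	run_event_scripts(num_scripts, mark_done, &s);

	/* A real caller would process events until s.done is true. */
	return s.done ? s.status : -1;
}
```

With zero scripts, `run_and_wait(0)` still returns promptly instead of waiting forever.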
Martin Schwenke
6808b0aa6a ctdb-daemon: Drop interface monitoring
This is done by 10.interface, where the monitor event fails when an
interface is missing.  The in-daemon interface checking adds no
value.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-05-10 03:22:14 +02:00
Martin Schwenke
7ee57b8d7c ctdb-recoverd: Short circuit takeover run if no nodes are RUNNING
If all nodes are still in, say, FIRST_RECOVERY runstate, then the logs
contain unfortunate noise like:

  recoverd:Failed to find node to cover ip 10.0.2.131

This avoids that by adding an early exit that skips
takeover_run_core() when there are no nodes in the
CTDB_RUNSTATE_RUNNING runstate.

To support this add the runstate to the ipflags structure.  There are
clearly other ways of hacking this but this seems the simplest.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-05-10 03:22:13 +02:00
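The early exit described above amounts to checking the runstate stored alongside each node's IP flags before doing any allocation. Below is a minimal sketch assuming a simplified flags struct and a reduced runstate enum; the names are illustrative and not CTDB's exact definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of runstates; CTDB's real enum has more values. */
enum runstate {
	RUNSTATE_FIRST_RECOVERY,
	RUNSTATE_STARTUP,
	RUNSTATE_RUNNING,
};

/* Simplified per-node IP flags with the runstate added alongside. */
struct node_ipflags {
	bool noiphost;
	bool noiptakeover;
	enum runstate runstate;
};

static bool any_node_running(const struct node_ipflags *flags,
			     size_t num_nodes)
{
	for (size_t i = 0; i < num_nodes; i++) {
		if (flags[i].runstate == RUNSTATE_RUNNING) {
			return true;
		}
	}
	return false;
}

/* Returns true if the IP allocation core was (notionally) run. */
static bool takeover_run(const struct node_ipflags *flags, size_t num_nodes)
{
	if (!any_node_running(flags, num_nodes)) {
		/* Early exit: nothing can host IPs yet, so don't try and
		 * log "Failed to find node to cover ip ..." for each one. */
		return false;
	}
	/* The equivalent of takeover_run_core() would run here. */
	return true;
}
```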
Martin Schwenke
91f99ddfb3 ctdb-recoverd: Remove redundant condition when checking recovery lock
It isn't possible to hold the recovery lock without having a lock file
set.

This is part of a goal to generalise the recovery lock mechanism to
just use a helper program, which may use a lock file or may use
something else.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-05-10 03:22:13 +02:00
Martin Schwenke
a45ab7d1fe ctdb-recoverd: Simplify using TALLOC_FREE()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-05-10 03:22:13 +02:00
Martin Schwenke
2c72c9de48 ctdb-recoverd: Drop redundant condition in election handler
Election packets from the current node are ignored at the beginning of
the function, so this does not need to be checked.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-05-10 03:22:13 +02:00
Martin Schwenke
c75fdf208f ctdb-recoverd: Remove unused memory context variable
It is set and memory is allocated, but it is never used.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-05-10 03:22:13 +02:00
Martin Schwenke
e6f99fcba3 ctdb-daemon: Broadcast IP reallocation request from monitor code
No need to just send it to the recovery master.

This reduces the need for main daemon code to know which node is the
recovery master.  The end goal is for the main daemon to not need to
know which node is the recovery master - this information would be
stored in the recovery daemon (and subsequently a separate cluster
management daemon).

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-05-10 03:22:13 +02:00
Martin Schwenke
4b4ba77f4a ctdb-recoverd: Replace unnecessary use of ctdb->recovery_master
Databases are only pulled by the recovery master, so it can compare
with the current node's PNN.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-05-10 03:22:13 +02:00
Martin Schwenke
6415edfa26 ctdb-recoverd: Rename some local variables to avoid conflict with convention
By convention, rec is always a (struct ctdb_recoverd *).

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-05-10 03:22:13 +02:00
Martin Schwenke
36fc620898 ctdb_recoverd: Move num_lmasters calculation to near where it is used
This is not needed unless this node is the recovery master.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-05-10 03:22:13 +02:00
Martin Schwenke
1fd2d3886c ctdb-recoverd: Make num_lmasters a local variable
It isn't used anywhere else and is always re-initialised to 0.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-05-10 03:22:13 +02:00
Martin Schwenke
385e9326ea ctdb-recoverd: Remove unused struct members num_active and num_connected
They are initialised and updated but the values are never used.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-05-10 03:22:13 +02:00
Martin Schwenke
c3d6678dbc ctdb-recoverd: Use capabilities API
Simplify update_capabilities() using the capabilities API and store
the capabilities in new field rec->caps rather than scattered around
ctdb->nodes.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-05-10 03:22:13 +02:00
Martin Schwenke
20a7945a26 Revert "ctdb-recoverd: Abort when daemon can take recovery lock during recovery"
This reverts commit 39d2fd330a.

An election can occur in the middle of a recovery.  During the
election the recovery master can change.  When a node loses a round of
the election and stops being the recovery master it releases the
recovery lock.  Then at the end of the ongoing recovery all nodes are
able to take the recovery lock so they will all abort.

The most likely cause for a change in recovery master is that several
(all?) nodes are starting up and the "connected-ness" of each node is
a primary factor in winning the election.  In this situation the
recovery master can bounce around the cluster.

The simplest solution is to revert this patch so that the recovery
will fail.  The new recovery master will then start a new recovery.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Mon May  4 10:40:36 CEST 2015 on sn-devel-104
2015-05-04 10:40:36 +02:00
Rajesh Joseph
9b33732a57 ctdb: Coverity fix for CID 1125630
Due to the use of the CTDB_NO_MEMORY macro, some resources are not
freed in failure cases.

Signed-off-by: Rajesh Joseph <rjoseph@redhat.com>
Reviewed-by: Guenther Deschner <gd@samba.org>
Reviewed-by: Michael Adam <obnox@samba.org>

Autobuild-User(master): Günther Deschner <gd@samba.org>
Autobuild-Date(master): Fri Apr 17 16:49:05 CEST 2015 on sn-devel-104
2015-04-17 16:49:04 +02:00
Martin Schwenke
1ef1cfdc4d ctdb-common: Move ctdb_node_list_to_map() to utilities
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:13 +02:00
Martin Schwenke
dd52d82c73 ctdb-daemon: Factor out new function ctdb_node_list_to_map()
Change ctdb_control_getnodemap() to use this.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:13 +02:00
Martin Schwenke
d340f308e7 ctdb-daemon: Don't delay reloading the nodes file
Presumably this was done to minimise the chance of a recovery
occurring while the nodemaps are inconsistent across nodes.

Another potential theory is that the forced recovery in the
ctdb.c:control_reload_nodes_file() stops another recovery occurring
for ReRecoveryTimeout seconds, so this delay causes the reloads to
occur during that period.

This is no longer necessary because recoveries are now explicitly
disabled while node files are reloaded.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:13 +02:00
Martin Schwenke
85bd9a33eb ctdb-recoverd: Avoid nodemap-related checks when recoveries are disabled
The potential resulting recovery won't run anyway.  Also, recoveries
may have been disabled by "reloadnodes"; if the nodemaps are
inconsistent between nodes, this avoids triggering an unnecessary
recovery.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:13 +02:00
Martin Schwenke
ee9619c28b ctdb-recoverd: New message ID CTDB_SRVID_DISABLE_RECOVERIES
Also add test stub support.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:13 +02:00
Martin Schwenke
2ca484cd50 ctdb-recoverd: Simplify disable_ip_check_handler() using ctdb_op_disable()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:13 +02:00
Martin Schwenke
108db3396f ctdb-recoverd: Add slightly more abstraction for disabling takeover runs
Factor out new function srvid_disable_and_reply(), which can be
re-used.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:13 +02:00
Martin Schwenke
ec32d9bea8 ctdb-recoverd: Reimplement ReRecoveryTimeout using ctdb_op_disable()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:12 +02:00
Martin Schwenke
281f7e8152 ctdb-recoverd: Use a goto for do_recovery() failures
This will allow extra things to be done on failure.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:12 +02:00
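Using a goto for failures, as in the commit above, is the common C idiom of funnelling every error into one cleanup label, so that extra failure handling only has to be added in one place. The sketch below is generic and hypothetical: `recovery_step()` and the `recovery_failures` counter stand in for whatever do_recovery() actually does.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical counter standing in for "extra things done on failure". */
static int recovery_failures;

/* Hypothetical recovery step: returns 0 on success, -1 on failure. */
static int recovery_step(bool fail)
{
	return fail ? -1 : 0;
}

static int do_recovery_sketch(bool fail_second_step)
{
	char *scratch = malloc(64);

	if (scratch == NULL) {
		goto fail;
	}
	if (recovery_step(false) != 0) {
		goto fail;
	}
	if (recovery_step(fail_second_step) != 0) {
		goto fail;
	}

	free(scratch);
	return 0;

fail:
	/* Single failure path: new failure handling only needs adding here. */
	recovery_failures++;
	free(scratch); /* free(NULL) is a no-op */
	return -1;
}
```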
Martin Schwenke
a2044c65bc ctdb-recoverd: Reimplement disabling takeover runs using ctdb_op_disable()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:12 +02:00
Martin Schwenke
55b246195b ctdb-recoverd: Add a new abstraction ctdb_op_disable()
This can be used to disable and re-enable an operation, and do all the
relevant sanity checking.

Most of this is from existing functions
disable_takeover_runs_handler(), clear_takeover_runs_disable() and
reenable_takeover_runs().

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:12 +02:00
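The ctdb_op_disable() abstraction is described as disabling and re-enabling an operation with sanity checking. The following is a rough model of that idea under stated assumptions: a timeout of 0 re-enables, disabling is refused while the operation is in progress, and time is passed in explicitly rather than driven by event-loop timers. `struct op_state`, `op_disable()` and `op_is_disabled()` are hypothetical names, not the real API.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical model of an operation (e.g. takeover runs) that can be
 * disabled for a timeout.  A real daemon would use event-loop timers
 * rather than comparing against a caller-supplied "now". */
struct op_state {
	const char *name;
	bool in_progress;
	time_t disabled_until; /* 0 means enabled */
};

static bool op_is_disabled(const struct op_state *op, time_t now)
{
	return op->disabled_until != 0 && now < op->disabled_until;
}

/* timeout == 0 re-enables immediately.  Disabling while the operation is
 * in progress is refused (assumed sanity check).  Returns 0 on success. */
static int op_disable(struct op_state *op, unsigned int timeout, time_t now)
{
	if (timeout == 0) {
		op->disabled_until = 0;
		return 0;
	}
	if (op->in_progress) {
		return -1;
	}
	op->disabled_until = now + (time_t)timeout;
	return 0;
}
```

Keeping the state and checks in one place is what lets takeover runs, recoveries and ReRecoveryTimeout share the same logic in the later commits.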
Martin Schwenke
ae9cd037ee ctdb-daemon: Pass on consistent flag information to recovery daemon
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:12 +02:00
Amitay Isaacs
62ba95a9f3 ctdb-daemon: Drop tunable that is no longer in use
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-03-27 06:40:08 +01:00
Amitay Isaacs
41ed26cbf7 ctdb-recoverd: Fix typo in comment
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-03-27 06:40:08 +01:00
Martin Schwenke
81e526965c ctdb-daemon: New control CTDB_CONTROL_GET_NODES_FILE
This is like CTDB_CONTROL_GET_NODEMAP but it loads from the nodes file
instead of the daemon.

Also new client function ctdb_ctrl_getnodesfile()

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-03-23 12:23:12 +01:00
Martin Schwenke
5148228f41 ctdb-daemon: Move ctdb_read_nodes_file() to utilities
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-03-23 12:23:12 +01:00
Martin Schwenke
1ada9c4ef7 ctdb-daemon: Factor out node parsing code
New function ctdb_read_nodes_file() reads a nodes file into a node
map, which is a useful intermediate format.  This function should
replace the node reading code in the ctdb CLI tool.  It will also be
useful for sanity checking of nodes files across the cluster.

New function convert_node_map_to_list() converts a node map to a node
array (and associated node count).  This fills in the details that
aren't present in the node map.  This may also be useful as a separate
function later if node list reloading stages the data after a sanity
check - the approach is not yet finalised.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-03-23 12:23:12 +01:00
Martin Schwenke
a5be2c245d ctdb-daemon: Store node addresses as ctdb_sock_addr rather than strings
Every time a nodemap is constructed the node IP addresses all need to
be parsed.  This isn't a very productive use of CPU.

Instead, parse each string once when the nodes file is loaded.  This
results in much simpler code.

This code also removes the use of ctdb_address.  Duplicating the port
is pointless without an abstraction layer around ctdb_address.  If
CTDB gets an incompatible transport in the future then add an
abstraction layer.

Note that the infiniband code is not updated.  Compilation of the
infiniband code is already broken.  Fixing it will be a separate,
properly tested effort.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
2015-03-23 12:23:12 +01:00
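The commits above read the nodes file into a node map and parse each address once at load time instead of every time a nodemap is built. Here is a minimal sketch of that approach using POSIX inet_pton(). The struct names, the fixed MAX_NODES limit, and the treatment of `#` lines as deleted placeholders (so later nodes keep their position) are assumptions for illustration, not CTDB's actual implementation.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>

#define MAX_NODES 32

/* Hypothetical node map entry: the address is parsed once, at load time. */
struct node_entry {
	struct sockaddr_storage addr;
	bool deleted;
};

struct node_map {
	size_t num;
	struct node_entry nodes[MAX_NODES];
};

/* Parse an IPv4 or IPv6 literal into a zeroed sockaddr_storage. */
static bool parse_addr(const char *s, struct sockaddr_storage *out)
{
	struct sockaddr_in *in4 = (struct sockaddr_in *)out;
	struct sockaddr_in6 *in6 = (struct sockaddr_in6 *)out;

	memset(out, 0, sizeof(*out));
	if (inet_pton(AF_INET, s, &in4->sin_addr) == 1) {
		in4->sin_family = AF_INET;
		return true;
	}
	memset(out, 0, sizeof(*out));
	if (inet_pton(AF_INET6, s, &in6->sin6_addr) == 1) {
		in6->sin6_family = AF_INET6;
		return true;
	}
	return false;
}

/* Read one address per line.  Blank lines are skipped; a line starting
 * with '#' is kept as a deleted placeholder so that later nodes keep
 * their position (assumed convention).  Returns 0 on success. */
static int read_nodes(const char *text, struct node_map *map)
{
	char line[128];

	map->num = 0;
	while (*text != '\0') {
		size_t len = strcspn(text, "\n");

		if (len >= sizeof(line)) {
			return -1;
		}
		memcpy(line, text, len);
		line[len] = '\0';
		text += len;
		if (*text == '\n') {
			text++;
		}

		/* Trim leading and trailing whitespace. */
		char *p = line;
		while (*p == ' ' || *p == '\t') {
			p++;
		}
		size_t plen = strlen(p);
		while (plen > 0 && strchr(" \t\r", p[plen - 1]) != NULL) {
			p[--plen] = '\0';
		}
		if (plen == 0) {
			continue;
		}

		if (map->num == MAX_NODES) {
			return -1;
		}
		struct node_entry *n = &map->nodes[map->num];
		n->deleted = (p[0] == '#');
		if (n->deleted) {
			memset(&n->addr, 0, sizeof(n->addr));
		} else if (!parse_addr(p, &n->addr)) {
			return -1;
		}
		map->num++;
	}
	return 0;
}
```

Once loaded, building a nodemap only copies the stored `sockaddr_storage` values; no string parsing happens on that path.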
Martin Schwenke
3cbeb17d0f ctdb-common: Drop ctdb context from ctdb_parse_address()
Having it require a CTDB context stops ctdb_parse_address() from being
used in more generic code.  Just use the existing talloc context for
memory allocations.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-03-23 12:23:12 +01:00
Martin Schwenke
a1e65d0c8d ctdb-daemon: Remove function ctdb_add_deleted_node()
Just add a flags parameter to ctdb_add_nodes() and use the same code.
Less is more.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-03-23 12:23:12 +01:00
Martin Schwenke
876529054a ctdb-daemon: Set node PNN in one place
This is currently set in 2 places.  One of them makes the node loading
code difficult to refactor.  Also, when the surrounding code in either
place is touched then it might get broken.

This only needs to be done once at startup, not on every reload.  So
do it once in a very obvious way, sacrificing a few CPU cycles for
some added clarity.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-03-23 12:23:12 +01:00
Martin Schwenke
db6385afe9 ctdb-daemon: Move VNN map initialisation out of node loading
Each node reload unnecessarily and incorrectly resets the VNN map,
causing a potentially unnecessary recovery.  When nodes are reloaded
any newly deleted nodes should already be disconnected and any newly
added nodes should also be disconnected.  This means that reloading
the nodes file should not cause a change in the VNN map.

The current implementation also leaks memory every time the nodes are
reloaded.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-03-23 12:23:12 +01:00
Volker Lendecke
d171d2010a ctdb: Fix CID 1125613 Destination buffer too small
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Michael Adam <obnox@samba.org>

Autobuild-User(master): Michael Adam <obnox@samba.org>
Autobuild-Date(master): Fri Mar 13 19:14:20 CET 2015 on sn-devel-104
2015-03-13 19:14:20 +01:00
Volker Lendecke
8d9bb5c54a ctdb: Introduce a helper var in ctdb_get_script_list
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Michael Adam <obnox@samba.org>
2015-03-13 16:39:05 +01:00
Volker Lendecke
c1e8bfb186 ctdb: Fix memleak in ctdb_get_script_list
scandir() allocates every name individually; see the example code in
SUSv4 or man scandir.

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Michael Adam <obnox@samba.org>
2015-03-13 16:39:05 +01:00
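The memory leak fixed above comes from scandir()'s allocation pattern: it allocates each directory entry separately as well as the array that points to them, so the caller must free both. A small sketch of the correct pattern follows; `count_visible_entries()` is a hypothetical helper, not the CTDB function.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdlib.h>

/* Hypothetical helper: count the non-hidden entries in a directory,
 * freeing everything scandir() allocated. */
static int count_visible_entries(const char *dir)
{
	struct dirent **namelist;
	int n = scandir(dir, &namelist, NULL, alphasort);

	if (n < 0) {
		return -1;
	}

	int count = 0;
	for (int i = 0; i < n; i++) {
		if (namelist[i]->d_name[0] != '.') {
			count++;
		}
		free(namelist[i]); /* each entry is allocated individually */
	}
	free(namelist); /* ...and so is the array of pointers */
	return count;
}
```

Freeing only `namelist` would leak all `n` entries, which is the bug the commit describes.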
Volker Lendecke
a8cc495b96 ctdb: Make for-loop in ctdb_get_script_list more idiomatic
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Michael Adam <obnox@samba.org>
2015-03-13 16:39:05 +01:00
Volker Lendecke
b584bdebf9 ctdb: Fix whitespace
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Michael Adam <obnox@samba.org>
2015-03-13 16:39:05 +01:00
Volker Lendecke
f724bfb44a ctdb: Fix CID 1288201 Array compared against 0
"helper_prog" is now declared as a static array

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
2015-03-11 16:11:07 +01:00
Martin Schwenke
b7b508c765 ctdb-daemon: Use statically allocated arrays for helper paths
The use of talloc with a static variable is somewhat confusing.
Statically allocate an array and use ctdb_set_helper() instead.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Volker Lendecke <vl@samba.org>
2015-03-10 15:29:06 +01:00
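The two commits above move helper paths into statically allocated arrays. That change also explains the Coverity report "Array compared against 0": an array is never NULL, so "not set" has to be checked by looking at the first byte. A sketch under those assumptions follows; `set_helper_path()` and `helper_is_set()` are hypothetical and not the signature of ctdb_set_helper().

```c
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

/* Statically allocated helper path; an empty string means "not set". */
static char helper_prog[PATH_MAX];

/* Hypothetical setter: build "dir/name" into a caller-supplied buffer.
 * Returns false if the result would not fit. */
static bool set_helper_path(char *buf, size_t buflen,
			    const char *dir, const char *name)
{
	int n = snprintf(buf, buflen, "%s/%s", dir, name);

	return n >= 0 && (size_t)n < buflen;
}

/* An array is never NULL, so test its first byte instead; comparing the
 * array itself against 0 is the kind of check Coverity flags. */
static bool helper_is_set(void)
{
	return helper_prog[0] != '\0';
}
```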
Amitay Isaacs
3f97be6d0f ctdb-locking: Back-off from logging every 10 seconds
If ctdb_lock_helper cannot get a lock within 10 seconds, the ctdb
daemon logs a message and invokes an external debug script.  This is
repeated every 10 seconds.

In case of contention or on a loaded system, there can be multiple
ctdb_lock_helper processes waiting to get a lock on record(s).  For
each lock request taking longer than 10 seconds, the ctdb daemon
floods the log every 10 seconds.  Instead of logging aggressively
every 10 seconds, relax logging to every 100s once the elapsed time
exceeds 100s, and to every 1000s once it exceeds 1000s.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>

Autobuild-User(master): Michael Adam <obnox@samba.org>
Autobuild-Date(master): Thu Mar  5 12:06:44 CET 2015 on sn-devel-104
2015-03-05 12:06:44 +01:00
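The back-off rule above can be expressed as a function that picks the delay until the next log message from the time already spent waiting. This is a sketch assuming the first message appears at 10s; `next_log_delay()` and `count_log_events()` are hypothetical names.

```c
/* Hypothetical: seconds until the next "lock not yet obtained" log,
 * given how long the lock has been waited for. */
static unsigned int next_log_delay(unsigned int elapsed)
{
	if (elapsed < 100) {
		return 10;
	}
	if (elapsed < 1000) {
		return 100;
	}
	return 1000;
}

/* Count log events up to 'limit' seconds, assuming the first is at 10s. */
static unsigned int count_log_events(unsigned int limit)
{
	unsigned int count = 0;

	for (unsigned int t = 10; t <= limit; t += next_log_delay(t)) {
		count++;
	}
	return count;
}
```

For a lock stuck for 3000 seconds, this produces 21 messages (at 10, 20, ..., 100, then 200, ..., 1000, then 2000 and 3000) instead of 300 at a fixed 10-second interval.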