1
0
mirror of https://github.com/samba-team/samba.git synced 2024-12-28 07:21:54 +03:00
Commit Graph

1615 Commits

Author SHA1 Message Date
Martin Schwenke
d340f308e7 ctdb-daemon: Don't delay reloading the nodes file
Presumably this was done to minimise the chance of a recovery
occurring while the nodemaps are inconsistent across nodes.

Another potential theory is that the forced recovery in the
ctdb.c:control_reload_nodes_file() stops another recovery occurring
for ReRecoveryTimeout seconds, so this delay causes the reloads to
occur during that period.

This is no longer necessary because recoveries are now explicitly
disabled while node files are reloaded.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:13 +02:00
Martin Schwenke
85bd9a33eb ctdb-recoverd: Avoid nodemap-related checks when recoveries are disabled
The potential resulting recovery won't run anyway.  Also recoveries
may have been disabled by "reloadnodes" and if the nodemaps are
inconsistent between nodes then avoid triggering an unnecessary
recovery.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:13 +02:00
Martin Schwenke
ee9619c28b ctdb-recoverd: New message ID CTDB_SRVID_DISABLE_RECOVERIES
Also add test stub support.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:13 +02:00
Martin Schwenke
2ca484cd50 ctdb-recoverd: Simplify disable_ip_check_handler() using ctdb_op_disable()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:13 +02:00
Martin Schwenke
108db3396f ctdb-recoverd: Add slightly more abstraction for disabling takeover runs
Factor out new function srvid_disable_and_reply(), which can be
re-used.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:13 +02:00
Martin Schwenke
ec32d9bea8 ctdb-recoverd: Reimplement ReRecoveryTimeout using ctdb_op_disable()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:12 +02:00
Martin Schwenke
281f7e8152 ctdb-recoverd: Use a goto for do_recovery() failures
This will allow extra things to be done on failure.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:12 +02:00
Martin Schwenke
a2044c65bc ctdb-recoverd: Reimplement disabling takeover runs using ctdb_op_disable()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:12 +02:00
Martin Schwenke
55b246195b ctdb-recoverd: Add a new abstraction ctdb_op_disable()
This can be used to disable and re-enable an operation, and do all the
relevant sanity checking.

Most of this is from existing functions
disable_takeover_runs_handler(), clear_takeover_runs_disable() and
reenable_takeover_runs().

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:12 +02:00
Martin Schwenke
ae9cd037ee ctdb-daemon: Pass on consistent flag information to recovery daemon
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:12 +02:00
Amitay Isaacs
62ba95a9f3 ctdb-daemon: Drop tunable that is no longer in use
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-03-27 06:40:08 +01:00
Amitay Isaacs
41ed26cbf7 ctdb-recoverd: Fix typo in comment
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-03-27 06:40:08 +01:00
Martin Schwenke
81e526965c ctdb-daemon: New control CTDB_CONTROL_GET_NODES_FILE
This is like CTDB_CONTROL_GET_NODEMAP but it loads from the nodes file
instead of the daemon.

Also new client function ctdb_ctrl_getnodesfile()

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-03-23 12:23:12 +01:00
Martin Schwenke
5148228f41 ctdb-daemon: Move ctdb_read_nodes_file() to utilities
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-03-23 12:23:12 +01:00
Martin Schwenke
1ada9c4ef7 ctdb-daemon: Factor out node parsing code
New function ctdb_read_nodes_file() reads a nodes file into a node
map, which is a useful intermediate format.  This function should
replace the node reading code in the ctdb CLI tool.  It will also be
useful for sanity checking of nodes files across the cluster.

New function convert_node_map_to_list() converts a node map to a node
array (and associated node count).  This fills in the details that
aren't present in the node map.  This may also useful as a separate
function later if node list reloading stages the data after a sanity
check - the approach is not yet finalised.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-03-23 12:23:12 +01:00
Martin Schwenke
a5be2c245d ctdb-daemon: Store node addresses as ctdb_sock_addr rather than strings
Every time a nodemap is contructed the node IP addresses all need to
be parsed.  This isn't very productive use of CPU.

Instead, parse each string once when the nodes file is loaded.  This
results in much simpler code.

This code also removes the use of ctdb_address.  Duplicating the port
is pointless without an abstraction layer around ctdb_address.  If
CTDB gets an incompatible transport in the future then add an
abstraction layer.

Note that the infiniband code is not updated.  Compilation of the
infiniband code is already broken.  Fixing it will be a separate,
properly tested effort.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
2015-03-23 12:23:12 +01:00
Martin Schwenke
3cbeb17d0f ctdb-common: Drop ctdb context from ctdb_parse_address()
Having it require a CTDB context stops ctdb_parse_address() from being
used in more generic code.  Just use the existing talloc context for
memory allocations.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-03-23 12:23:12 +01:00
Martin Schwenke
a1e65d0c8d ctdb-daemon: Remove function ctdb_add_deleted_node()
Just add a flags parameter to ctdb_add_nodes() and use the same code.
Less is more.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-03-23 12:23:12 +01:00
Martin Schwenke
876529054a ctdb-daemon: Set node PNN in one place
This is currently set in 2 places.  One of them makes the node loading
code difficult to refactor.  Also, when the surrounding code in either
place is touched then it might get broken.

This only needs to be done once at startup, not on every reload.  So
do it once in a very obvious way, sacrificing a few CPU cycles for
some added clarity.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-03-23 12:23:12 +01:00
Martin Schwenke
db6385afe9 ctdb-daemon: Move VNN map initialisation out of node loading
Each node reload unnecessarily and incorrectly resets the VNN map,
causing a potentially unnecessary recovery.  When nodes are reloaded
any newly deleted nodes should already be disconnected and any newly
added nodes should also be disconnected.  This means that reloading
the nodes file should not cause a change in the VNN map.

The current implementation also leaks memory every time the nodes are
reloaded.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-03-23 12:23:12 +01:00
Volker Lendecke
d171d2010a ctdb: Fix CID 1125613 Destination buffer too small
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Michael Adam <obnox@samba.org>

Autobuild-User(master): Michael Adam <obnox@samba.org>
Autobuild-Date(master): Fri Mar 13 19:14:20 CET 2015 on sn-devel-104
2015-03-13 19:14:20 +01:00
Volker Lendecke
8d9bb5c54a ctdb: Introduce a helper var in ctdb_get_script_list
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Michael Adam <obnox@samba.org>
2015-03-13 16:39:05 +01:00
Volker Lendecke
c1e8bfb186 ctdb: Fix memleak in ctdb_get_script_list
scandir allocates every name individually, see example code in susv4 or man
scandir

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Michael Adam <obnox@samba.org>
2015-03-13 16:39:05 +01:00
Volker Lendecke
a8cc495b96 ctdb: Make for-loop in ctdb_get_script_list more idiomatic
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Michael Adam <obnox@samba.org>
2015-03-13 16:39:05 +01:00
Volker Lendecke
b584bdebf9 ctdb: Fix whitespace
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Michael Adam <obnox@samba.org>
2015-03-13 16:39:05 +01:00
Volker Lendecke
f724bfb44a ctdb: Fix CID 1288201 Array compared against 0
"helper_prog" is now declared as a static array

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
2015-03-11 16:11:07 +01:00
Martin Schwenke
b7b508c765 ctdb-daemon: Use statically allocated arrays for helper paths
The use of talloc with a static variable is somewhat confusing.
Statically allocate an array and use ctdb_set_helper() instead.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Volker Lendecke <vl@samba.org>
2015-03-10 15:29:06 +01:00
Amitay Isaacs
3f97be6d0f ctdb-locking: Back-off from logging every 10 seconds
If ctdb_lock_helper cannot get a lock within 10 seconds, ctdb daemon
logs a message and invokes an external debug script.  This is repeated
every 10 seconds.

In case of a contention or on a loaded system, there can be multiple
ctdb_lock_helper processes waiting to get lock on record(s).  For each
lock request taking longer, ctdb daemon will flood the log every
10 seconds.  Instead of logging aggressively every 10 seconds, relax
logging to every 100s and 1000s if the elapsed time has exceeded 100s
and 1000s respectively.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>

Autobuild-User(master): Michael Adam <obnox@samba.org>
Autobuild-Date(master): Thu Mar  5 12:06:44 CET 2015 on sn-devel-104
2015-03-05 12:06:44 +01:00
Martin Schwenke
54f0c39e5a ctdb-client: Return a value of 1 when setting obsolete tunable variable
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-02-18 05:34:06 +01:00
Martin Schwenke
39d2fd330a ctdb-recoverd: Abort when daemon can take recovery lock during recovery
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Feb 13 09:48:15 CET 2015 on sn-devel-104
2015-02-13 09:48:15 +01:00
Martin Schwenke
432d677489 ctdb-recoverd: Improve error messages on recovery lock coherence fail
When the daemon is able to take the recovery lock during recovery we
might as well guess that the cluster filesystem has a lock coherence
problem and print a more useful message.  This will be more helpful to
those trying out cluster filesystems that don't have lock coherence or
that are difficult to setup.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-02-13 07:19:07 +01:00
Martin Schwenke
48c91407ab ctdb-recoverd: Don't release and re-take the recovery lock
Just continue to hold it, otherwise a broken node might win an
election and grab the lock.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-02-13 07:19:07 +01:00
Martin Schwenke
1d6ed91f55 ctdb-recoverd: Simplify ctdb_recovery_lock()
Have it just silently take or fail to take the lock, except on an
unexpected failure (where it should log an error).

This means that when it is called we need to keep the old behaviour
and explicitly release the lock.  In do_recovery() the lock is
released and a message is printed before attempting to take the lock.
In the daemon sanity check the lock must be released in the error path
if it is actually taken.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-02-13 07:19:07 +01:00
Martin Schwenke
be19a17faf ctdb-recoverd: Remove check_recovery_lock()
This has not done anything useful since commit
b9d8bb23af.  Instead, just check that
the lock is held.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-02-13 07:19:07 +01:00
Martin Schwenke
668ed53662 ctdb-recoverd: Improve logging when recovery lock file is changed
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-02-13 07:19:07 +01:00
Martin Schwenke
db32a2bce5 ctdb-recoverd: New function ctdb_recovery_unlock()
Unlock the recovery lock file.  This way knowledge of the file
descriptor isn't sprinkled throughout the code.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-02-13 07:19:07 +01:00
Martin Schwenke
72701be663 ctdb-recoverd: New function ctdb_recovery_have_lock()
True if this recovery daemon holds the lock.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-02-13 07:19:07 +01:00
Martin Schwenke
4d3b52f1ce ctdb-daemon: Log a warning when setting obsolete tunables
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-02-13 07:19:07 +01:00
Martin Schwenke
d110fe2318 ctdb-daemon: Mark tunable VerifyRecoveryLock as obsolete
It is pointless having a recovery lock but not sanity checking that it
is working.  Also, the logic that uses this tunable is confusing.  In
some places the recovery lock is released unnecessarily because the
tunable isn't set.

Simplify the logic by assuming that if a recovery lock is specified
then it should be verified.

Update documentation that references this tunable.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-02-13 07:19:07 +01:00
Martin Schwenke
5e00673f2d ctdb-daemon: Fix SET_RECLOCK_FILE regression
If the recovery lock file is unset then this dereferences a NULL
pointer.  The regression is due to commit
6f1ac7af0f.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-02-04 03:14:07 +01:00
Michael Adam
a59fb322d6 ctdb: improve helpfulness of debug message when taking reclock fails
Print out the errno if the fcntl call.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Richard Sharpe <rsharpe@samba.org>

Autobuild-User(master): Michael Adam <obnox@samba.org>
Autobuild-Date(master): Fri Jan  9 04:25:02 CET 2015 on sn-devel-104
2015-01-09 04:25:02 +01:00
Martin Schwenke
6f1ac7af0f ctdb-daemon: Handle out-of-memory when setting recovery lock file
Log a message when the reclock file actually changes and avoid a
memory allocation when it doesn't change.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
2015-01-09 02:03:40 +01:00
Amitay Isaacs
e0bf5dd456 ctdb-daemon: Use correct tdb flags when enabling robust mutex support
BUG: https://bugzilla.samba.org/show_bug.cgi?id=11000

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
2014-12-19 13:15:12 +01:00
Stefan Metzmacher
6604b7bd8d ctdb/server: add format string checking to ctdb_tevent_logging()
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-12-17 09:26:07 +01:00
Martin Schwenke
108b1be0ee ctdb-daemon: Trust vnn->interface for an IP when releasing it
ctdb_sys_find_ifname() doesn't work for IPv6 addresses so don't use
it.

Trust the eventscript to do sanity checking on the interface.  Current
warnings are replaced with equivalents generated by the eventscript.
The unlikely message:

  Public IP %s is hosted on interface %s but we have no VNN

will be replaced by:

  WARNING: Public IP %s hosted on interface %s but VNN says __none__

which is clear enough.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-12-05 21:02:40 +01:00
Amitay Isaacs
959b9ea0ef ctdb-recoverd: Process all the records for vacuum fetch in a loop
Processing one migration request at a time is very slow and processing
a batch of records can take longer than VacuumInterval.  This causes
subsequent vacuum fetch requests to be dropped.  The dropped records
can accumulate quickly and will cause the vacuum database traverse to
be quite expensive.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Dec  5 17:06:58 CET 2014 on sn-devel-104
2014-12-05 17:06:58 +01:00
Amitay Isaacs
257311e337 ctdb-vacuum: Do not delete VACUUM MIGRATED records immediately
Such records should be processed by the local vacuuming daemon to ensure
that all the remote copies have been deleted first.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2014-12-05 14:43:07 +01:00
Amitay Isaacs
dbb1958284 ctdb-vacuum: Use non-blocking lock when traversing delete tree
This avoids vacuuming getting in the way of ctdb daemon to process
record requests.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2014-12-05 14:43:07 +01:00
Amitay Isaacs
d35f512cd9 ctdb-vacuum: Use non-blocking lock when traversing delete queue
This avoids vacuuming getting in the way of ctdb daemon to process
record requests.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2014-12-05 14:43:07 +01:00
Amitay Isaacs
e4597f8771 ctdb-vacuum: Stagger vacuuming child processes
This prevents multiple child processes being forked at the same time
for vacuuming TDBs.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2014-12-05 14:43:07 +01:00