mirror of https://github.com/samba-team/samba.git synced 2025-02-01 05:47:28 +03:00

228 Commits

Author SHA1 Message Date
Martin Schwenke
ab75f2a587 ctdb-recovery: Use a configurable handler when testing cluster mutex
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-04-28 09:39:16 +02:00
Martin Schwenke
419f57f378 ctdb-recovery: Factor out new function set_recmode_handler()
This is used to reply to the recmode control for all the different
cases.  The callers can later be generalised to use a pointer, which
can then be used for recovery lock handling in different contexts.

Note that the handle is now freed in set_recmode_handler() rather than
the callbacks.

There is one difference in behaviour.  Deferred attach calls are now
processed in the timeout case, where they weren't before.  That's a
bug fix!

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-04-28 09:39:16 +02:00
Martin Schwenke
14a2330692 ctdb-recovery: Use single char ASCII numbers for status from child
  '0' = Child took the mutex
  '1' = Unable to take mutex - contention
  '2' = Unable to take mutex - timeout
  '3' = Unable to take mutex - error

This is a straightforward API.  When the child is generalised to an
external helper then this makes it easier for a helper to be, for
example, a simple script.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-04-28 09:39:16 +02:00
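The single-character status protocol described in commit 14a2330692 could be decoded on the parent side roughly as follows. This is only a sketch: the enum, its labels, and the function name decode_child_status() are hypothetical and are not taken from the CTDB source. It also assumes that a read of anything other than exactly one byte is treated as an error.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative names only; these are not CTDB identifiers. */
enum mutex_status {
	MUTEX_TAKEN = 0,      /* '0' = child took the mutex */
	MUTEX_CONTENTION = 1, /* '1' = unable to take mutex, contention */
	MUTEX_TIMEOUT = 2,    /* '2' = unable to take mutex, timeout */
	MUTEX_ERROR = 3       /* '3' = unable to take mutex, error */
};

/*
 * Decode the bytes read from the child's pipe.  Exactly one byte in the
 * range '0'..'3' is expected; anything else is reported as an error.
 */
static enum mutex_status decode_child_status(const char *buf, size_t len)
{
	assert(buf != NULL);

	if (len != 1 || buf[0] < '0' || buf[0] > '3') {
		return MUTEX_ERROR;
	}
	return (enum mutex_status)(buf[0] - '0');
}
```

Because the status is a single printable character, a helper written as a shell script could report it with nothing more than a one-character write to its output.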
Martin Schwenke
4842b6bb91 ctdb-recovery: Rename recovery lock functions and struct
Use the more general name "cluster mutex", since we are likely to end
up with more than one cluster-wide lock.  There will probably be a
dedicated recovery lock, held only during recovery, and also a second
lock that is held by the master node.  Currently one lock is used for
both purposes.

At the moment the struct and functions are involved with setting the
recovery mode.  However, they'll be abstracted out to more generally
deal with the cluster mutexes, so "recmode" -> "cluster_mutex".  Drop
"set" from names, since this is used to test the lock.  Also drop
"ctdb" prefix from functions, since they are local to this file.  The
struct will eventually be a long-lived handle that will release the
mutex when freed, so name it accordingly.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-04-28 09:39:16 +02:00
Amitay Isaacs
95a15cde45 ctdb-daemon: Implement new controls DB_PULL and DB_PUSH_START/DB_PUSH_CONFIRM
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2016-03-25 03:26:15 +01:00
Martin Schwenke
46edef25df ctdb-recovery: Limit scope of reclock latency statistics
It does not make sense to update this statistic for the timeout case,
since this could skew the statistic.  To keep it simple, just update
it when there is lock contention, since this is the usual case.  So
the daemon statistic measures time to test the lock and the
corresponding recovery daemon statistic measures time to take the
lock.

Additionally, the recovery daemon will eventually use this code to
take the lock, and the method of updating the latency statistic will
need to be pushed further out to a configurable handler that depends
on the calling context.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Feb 23 10:32:06 CET 2016 on sn-devel-144
2016-02-23 10:32:06 +01:00
Martin Schwenke
188019b877 ctdb-recovery: Negate the status when checking the recovery lock
Have 0 indicate that the lock was taken.  This allows non-zero values
to be used to indicate why the lock could not be taken.  EACCES means
lock contention.

For now use just EACCES to cover all failures, since
ctdb_recovery_lock() returns a bool and details of other errors will
be lost.  ctdb_recovery_lock() will undergo some big changes, so don't
try to fix this now.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-02-23 07:23:18 +01:00
Martin Schwenke
fad3f367b7 ctdb-recovery: Clean up status handling from recmode child
This currently returns an incorrect error when the expected number of
bytes are not read.  Separate out the different cases to clarify the
logic and avoid reporting the wrong error.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-02-23 07:23:18 +01:00
Martin Schwenke
b6c3918457 ctdb-recovery: Don't bother ensuring file descriptor is -1
This is already done before the destructor is assigned.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-02-23 07:23:18 +01:00
Martin Schwenke
531e6724ba ctdb-recovery: Don't store recmode in recovery mode state
The callbacks that use this value are only ever called if recovery
mode is being set to NORMAL.  So do not check if recmode is NORMAL
either.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-02-23 07:23:18 +01:00
Martin Schwenke
6695fa50ae ctdb: Use ctdb_wait_for_process_to_exit()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-02-23 07:23:18 +01:00
Martin Schwenke
4d6ec81299 ctdb-recovery: Drop redundant status send when setting recovery mode
The child process writes the status into the pipe before looping to
wait.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-02-23 07:23:18 +01:00
Martin Schwenke
3e2f2169a4 ctdb-recovery: Include lib/util/time.h instead of samba_util.h
Less is more...

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-02-23 07:23:18 +01:00
Martin Schwenke
24160ee6a4 ctdb-daemon: Don't leak memory if not using recovery lock
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
2016-01-12 19:16:17 +01:00
Christof Schmitt
03b27bd139 ctdb: Use prctl_set_comment from lib/util
Signed-off-by: Christof Schmitt <cs@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-11-18 04:05:13 +01:00
Amitay Isaacs
f50db5cba5 ctdb-server: Replace ctdb_logging.h with common/logging.h
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
2015-11-16 00:46:15 +01:00
Amitay Isaacs
64d8bb626b ctdb-daemon: Rename struct ctdb_control_pulldb to ctdb_pulldb
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-11-04 00:47:15 +01:00
Amitay Isaacs
645cd43200 ctdb-daemon: Rename struct ctdb_dbid_map to ctdb_dbid_map_old
Match struct ctdb_dbid as per protocol/protocol.h

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-11-04 00:47:15 +01:00
Amitay Isaacs
b99436e425 ctdb-daemon: Rename struct ctdb_rec_data to ctdb_rec_data_old
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-11-04 00:47:14 +01:00
Amitay Isaacs
e1fed53e2a ctdb-daemon: Rename struct ctdb_req_control to ctdb_req_control_old
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-11-04 00:47:14 +01:00
Amitay Isaacs
4647787773 ctdb-daemon: Separate prototypes for common client/server functions
This groups function prototypes for common client/server functions in
common/common.h and removes them from ctdb_private.h.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-10-30 02:00:27 +01:00
Amitay Isaacs
01c6c90e98 ctdb-daemon: Remove dependency on includes.h
Instead of includes.h, include the required header files explicitly.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-10-30 02:00:27 +01:00
Amitay Isaacs
2fdb332fad ctdb-daemon: Stop using tevent compatibility definitions
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-10-30 02:00:27 +01:00
Amitay Isaacs
b900adc55c ctdb-daemon: Separate prototypes for system specific functions
This groups function prototypes for system specific functions in
common/system.h and removes them from ctdb_private.h.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-10-30 02:00:27 +01:00
Amitay Isaacs
42f7722151 ctdb-daemon: Remove freeze requirement for updating vnnmap
In the parallel database recovery model, the databases will not all
remain frozen at the same time.  So relax the condition to check if
recovery is active.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-10-07 14:53:27 +02:00
Amitay Isaacs
3cbd0409f3 ctdb-daemon: Add a check for database generation consistency
Before setting recovery mode to normal, confirm that all the databases are
recovered by matching the database generation with the global generation.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-10-07 14:53:27 +02:00
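The generation consistency check described in commit 3cbd0409f3 can be pictured as a simple loop over the attached databases. The sketch below is an assumption about the shape of such a check, not the actual CTDB code: struct db_info and all_dbs_recovered() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-database record; CTDB's real structures differ. */
struct db_info {
	uint32_t db_id;
	uint32_t generation;
};

/*
 * Return true only if every database carries the global generation,
 * i.e. every database has been recovered.
 */
static bool all_dbs_recovered(const struct db_info *dbs, size_t num_dbs,
			      uint32_t global_generation)
{
	size_t i;

	assert(dbs != NULL || num_dbs == 0);

	for (i = 0; i < num_dbs; i++) {
		if (dbs[i].generation != global_generation) {
			return false;
		}
	}
	return true;
}
```

A caller would refuse to set the recovery mode to normal whenever this returns false.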
Amitay Isaacs
66c7bcc777 ctdb-daemon: Use database specific mark/unmark routines
Instead of marking all the databases with priority, mark only the database
which is currently being processed.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-10-07 14:53:27 +02:00
Amitay Isaacs
e0fa182d93 ctdb-daemon: Use database specific freeze check routine
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-10-07 14:53:27 +02:00
Amitay Isaacs
7afabb1285 ctdb-daemon: Avoid the use of ctdb->freeze_handle variable
These variables are used for state information related to freezing
databases.  Instead use the API functions to check if the databases
are frozen.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-10-07 14:53:26 +02:00
Amitay Isaacs
8c58c7392f ctdb-daemon: Avoid the use of ctdb->freeze_mode variable
Use ctdb->freeze_mode only in ctdb_freeze.c and use the functions to
check if databases are frozen everywhere else.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-10-07 14:53:26 +02:00
Amitay Isaacs
9b6865475e ctdb-daemon: Remove obsolete IPv4 only controls
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Jeremy Allison <jra@samba.org>
2015-05-12 01:32:11 +02:00
Martin Schwenke
20a7945a26 Revert "ctdb-recoverd: Abort when daemon can take recovery lock during recovery"
This reverts commit 39d2fd330a60ea590d76213f8cb406a42fa8d680.

An election can occur in the middle of a recovery.  During the
election the recovery master can change.  When a node loses a round of
the election and stops being the recovery master it releases the
recovery lock.  Then at the end of the ongoing recovery all nodes are
able to take the recovery lock so they will all abort.

The most likely cause for a change in recovery master is that several
(all?) nodes are starting up and the "connected-ness" of each node is
a primary factor in winning the election.  In this situation the
recovery master can bounce around the cluster.

The simplest solution is to revert this patch so that the recovery
will fail.  The new recovery master will then start a new recovery.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Mon May  4 10:40:36 CEST 2015 on sn-devel-104
2015-05-04 10:40:36 +02:00
Martin Schwenke
1ef1cfdc4d ctdb-common: Move ctdb_node_list_to_map() to utilities
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:13 +02:00
Martin Schwenke
dd52d82c73 ctdb-daemon: Factor out new function ctdb_node_list_to_map()
Change ctdb_control_getnodemap() to use this.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:13 +02:00
Martin Schwenke
d340f308e7 ctdb-daemon: Don't delay reloading the nodes file
Presumably this was done to minimise the chance of a recovery
occurring while the nodemaps are inconsistent across nodes.

Another potential theory is that the forced recovery in the
ctdb.c:control_reload_nodes_file() stops another recovery occurring
for ReRecoveryTimeout seconds, so this delay causes the reloads to
occur during that period.

This is no longer necessary because recoveries are now explicitly
disabled while node files are reloaded.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:13 +02:00
Martin Schwenke
a5be2c245d ctdb-daemon: Store node addresses as ctdb_sock_addr rather than strings
Every time a nodemap is constructed the node IP addresses all need to
be parsed.  This isn't a very productive use of CPU.

Instead, parse each string once when the nodes file is loaded.  This
results in much simpler code.

This code also removes the use of ctdb_address.  Duplicating the port
is pointless without an abstraction layer around ctdb_address.  If
CTDB gets an incompatible transport in the future then add an
abstraction layer.

Note that the infiniband code is not updated.  Compilation of the
infiniband code is already broken.  Fixing it will be a separate,
properly tested effort.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
2015-03-23 12:23:12 +01:00
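The idea in commit a5be2c245d, parsing each node address once at load time and keeping the binary form, might look something like the sketch below. The union node_addr and parse_node_addr() are hypothetical stand-ins rather than CTDB's ctdb_sock_addr or its parsing routine; only the standard inet_pton() call is used.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical stand-in for a parsed node address (IPv4 or IPv6). */
union node_addr {
	struct sockaddr sa;
	struct sockaddr_in ip;
	struct sockaddr_in6 ip6;
};

/*
 * Parse a textual address once, e.g. when the nodes file is loaded.
 * Returns false if the string is neither a valid IPv4 nor IPv6 address.
 */
static bool parse_node_addr(const char *str, union node_addr *out)
{
	assert(str != NULL && out != NULL);

	memset(out, 0, sizeof(*out));

	if (inet_pton(AF_INET, str, &out->ip.sin_addr) == 1) {
		out->ip.sin_family = AF_INET;
		return true;
	}
	if (inet_pton(AF_INET6, str, &out->ip6.sin6_addr) == 1) {
		out->ip6.sin6_family = AF_INET6;
		return true;
	}
	return false;
}
```

Code that later builds a nodemap can then copy the stored binary address instead of re-parsing the string each time.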
Martin Schwenke
39d2fd330a ctdb-recoverd: Abort when daemon can take recovery lock during recovery
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Feb 13 09:48:15 CET 2015 on sn-devel-104
2015-02-13 09:48:15 +01:00
Martin Schwenke
432d677489 ctdb-recoverd: Improve error messages on recovery lock coherence fail
When the daemon is able to take the recovery lock during recovery we
might as well guess that the cluster filesystem has a lock coherence
problem and print a more useful message.  This will be more helpful to
those trying out cluster filesystems that don't have lock coherence or
that are difficult to set up.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-02-13 07:19:07 +01:00
Martin Schwenke
1d6ed91f55 ctdb-recoverd: Simplify ctdb_recovery_lock()
Have it just silently take or fail to take the lock, except on an
unexpected failure (where it should log an error).

This means that when it is called we need to keep the old behaviour
and explicitly release the lock.  In do_recovery() the lock is
released and a message is printed before attempting to take the lock.
In the daemon sanity check the lock must be released in the error path
if it is actually taken.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-02-13 07:19:07 +01:00
Martin Schwenke
db32a2bce5 ctdb-recoverd: New function ctdb_recovery_unlock()
Unlock the recovery lock file.  This way knowledge of the file
descriptor isn't sprinkled throughout the code.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-02-13 07:19:07 +01:00
Martin Schwenke
72701be663 ctdb-recoverd: New function ctdb_recovery_have_lock()
True if this recovery daemon holds the lock.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-02-13 07:19:07 +01:00
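Commits db32a2bce5 and 72701be663 hide the lock file descriptor behind two small functions. A minimal sketch of that encapsulation is shown below, assuming an fcntl-style lock that is released when its descriptor is closed. The struct reclock_ctx, its lock_fd field, and both function names are hypothetical and chosen for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical context; the field name is illustrative. */
struct reclock_ctx {
	int lock_fd; /* -1 when the lock is not held */
};

/* True if this context currently holds the lock. */
static bool reclock_have_lock(const struct reclock_ctx *ctx)
{
	assert(ctx != NULL);
	return ctx->lock_fd != -1;
}

/* Release the lock by closing its descriptor; safe to call twice. */
static void reclock_unlock(struct reclock_ctx *ctx)
{
	assert(ctx != NULL);
	if (ctx->lock_fd != -1) {
		close(ctx->lock_fd);
		ctx->lock_fd = -1;
	}
}
```

With accessors like these, callers ask whether the lock is held instead of comparing a descriptor against -1 themselves.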
Martin Schwenke
d110fe2318 ctdb-daemon: Mark tunable VerifyRecoveryLock as obsolete
It is pointless having a recovery lock but not sanity checking that it
is working.  Also, the logic that uses this tunable is confusing.  In
some places the recovery lock is released unnecessarily because the
tunable isn't set.

Simplify the logic by assuming that if a recovery lock is specified
then it should be verified.

Update documentation that references this tunable.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-02-13 07:19:07 +01:00
Michael Adam
a59fb322d6 ctdb: improve helpfulness of debug message when taking reclock fails
Print out the errno if the fcntl call fails.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Richard Sharpe <rsharpe@samba.org>

Autobuild-User(master): Michael Adam <obnox@samba.org>
Autobuild-Date(master): Fri Jan  9 04:25:02 CET 2015 on sn-devel-104
2015-01-09 04:25:02 +01:00
Martin Schwenke
acf26089f1 ctdb-util: Rename db_wrap to tdb_wrap and make it a build subsystem
This makes it consistent with Samba, to ease transition.

Update unit test code to link with tdb_wrap instead of including
db_wrap.c.

There are some potential whitespace fixes in this commit that have
been ignored.  CTDB's lib/tdb_wrap will be deleted after the
transition to Samba's lib/tdb_wrap, so there's no point polishing it
too much.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-09-10 01:36:15 +02:00
Martin Schwenke
b0f9d33058 ctdb: Fix some "declarations after code" problems
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-09-10 01:36:14 +02:00
Martin Schwenke
c1558adeaa ctdb: Use sys_read() and sys_write() to ensure correct signal interaction
... and avoid compiler warnings in some cases.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2014-08-21 04:46:13 +02:00
Amitay Isaacs
f87b7f664f ctdb-vacuum: Use existing function ctdb_marshall_finish
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Volker Lendecke <vl@samba.org>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Wed Jul 23 09:44:00 CEST 2014 on sn-devel-104
2014-07-23 09:44:00 +02:00
Amitay Isaacs
2855173dac ctdb-daemon: Do not thaw databases if recovery is active
This prevents ctdb tool from thawing databases prematurely in
thaw/wipedb/restoredb commands if recovery is active.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2014-07-07 13:29:50 +02:00
Amitay Isaacs
7aa20ccb5c ctdb-daemon: No need to call event scripts with CTDB_CALLED_BY_USER
This was added to support external monitoring using CTDB event scripts.
However, it was never used.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2014-01-16 11:41:12 +11:00
Amitay Isaacs
6d1b74f052 ctdb-server: Coverity fixes
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
2013-11-19 17:13:03 +01:00