mirror of https://github.com/samba-team/samba.git synced 2024-12-24 21:34:56 +03:00
Commit Graph

215 Commits

Author SHA1 Message Date
Martin Schwenke
4656b0816a ctdb-daemon: Don't explicitly disable monitoring around recovery
Monitoring can fail during recovery due to databases (e.g. registry)
being unavailable.  This has been avoided by explicitly disabling
monitoring around recovery via the START_RECOVERY and END_RECOVERY
controls.  Even with this approach, there is still a window between
enabling recovery mode and START_RECOVERY when monitoring could be
attempted.  However, explicitly disabling monitoring is unnecessary
because monitoring is not done when a node is in recovery.

So remove the explicit disable/enable of monitoring and rely on
monitoring being skipped when recovery mode is active.

The only possible change of behaviour with this change is that there
is now a window between setting recovery mode to normal and the
END_RECOVERY control where monitoring is enabled.  However, at this
point databases would be available and the "recovered" event will
cancel any in-progress monitoring.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2017-09-14 14:49:15 +02:00
Martin Schwenke
173aa683d5 ctdb-daemon: Don't explicitly disable monitoring when stopping a node
Monitoring is now avoided for inactive nodes anyway.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2017-09-14 14:49:15 +02:00
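The two commits above rely on monitoring being skipped implicitly rather than disabled explicitly. A minimal sketch of that idea follows. The struct, enum, and function names are hypothetical and chosen for illustration only; they are not the CTDB definitions.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not the CTDB definitions. */
enum example_recmode {
    EXAMPLE_RECMODE_NORMAL,
    EXAMPLE_RECMODE_ACTIVE
};

struct example_node {
    enum example_recmode recmode;
    bool inactive; /* e.g. the node has been stopped */
};

/*
 * Decide whether a monitoring round should run.  Because the check is
 * made every time, nothing has to disable or re-enable monitoring
 * around recovery or when a node is stopped.
 */
static bool example_should_monitor(const struct example_node *node)
{
    if (node->recmode == EXAMPLE_RECMODE_ACTIVE) {
        return false; /* databases may be unavailable during recovery */
    }
    if (node->inactive) {
        return false;
    }
    return true;
}
```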
Amitay Isaacs
027689a2cf ctdb-daemon: Increase priority of logs when recovery happens
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2017-07-04 13:11:16 +02:00
Amitay Isaacs
c6f2624287 ctdb-daemon: Increase priority of logs when node is stopped/continued
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2017-07-04 13:11:16 +02:00
Amitay Isaacs
1992404326 ctdb-daemon: Increase priority of logs for recmaster changes
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2017-07-04 13:11:16 +02:00
Amitay Isaacs
7c462b0df8 ctdb-daemon: Store db_flags instead of individual boolean flags
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2017-06-29 10:34:27 +02:00
Amitay Isaacs
4e43a344cc ctdb-daemon: Add accessors for CTDB_DB_FLAGS_STICKY flag
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2017-06-29 10:34:27 +02:00
Amitay Isaacs
d0fa710ea1 ctdb-daemon: Add accessors for CTDB_DB_FLAGS_READONLY flag
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2017-06-29 10:34:26 +02:00
Amitay Isaacs
94af277c48 ctdb-daemon: Add accessors for CTDB_DB_FLAGS_PERSISTENT flag
This makes it possible to differentiate between the two database models.

ctdb_db_persistent() - replicated and permanent
ctdb_db_volatile() - distributed and temporary

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2017-06-29 10:34:26 +02:00
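The db_flags commits above replace individual booleans with one flags field read through accessors. A rough sketch of that pattern is shown here. The flag values, the struct, and the `example_` names are assumptions made for illustration; the real definitions live in the CTDB sources and may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values; the real CTDB_DB_FLAGS_* values may differ. */
#define EXAMPLE_DB_FLAGS_PERSISTENT 0x01
#define EXAMPLE_DB_FLAGS_READONLY   0x02
#define EXAMPLE_DB_FLAGS_STICKY     0x04

struct example_db {
    uint8_t db_flags; /* one field instead of several booleans */
};

/* Replicated and permanent */
static bool example_db_persistent(const struct example_db *db)
{
    return (db->db_flags & EXAMPLE_DB_FLAGS_PERSISTENT) != 0;
}

/* Distributed and temporary: in this sketch, simply "not persistent" */
static bool example_db_volatile(const struct example_db *db)
{
    return !example_db_persistent(db);
}

static bool example_db_readonly(const struct example_db *db)
{
    return (db->db_flags & EXAMPLE_DB_FLAGS_READONLY) != 0;
}

static bool example_db_sticky(const struct example_db *db)
{
    return (db->db_flags & EXAMPLE_DB_FLAGS_STICKY) != 0;
}
```

Callers then ask `example_db_volatile(db)` instead of testing a raw field, so the storage format can change without touching them.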
Amitay Isaacs
f8200153b2 ctdb-recovery: Finish processing for recovery mode ACTIVE first
BUG: https://bugzilla.samba.org/show_bug.cgi?id=12857

This simplifies the code and avoids complicated conditions.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2017-06-24 10:28:21 +02:00
Amitay Isaacs
d74dadd7f2 ctdb-recovery: Simplify logging of recovery mode setting
BUG: https://bugzilla.samba.org/show_bug.cgi?id=12857

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2017-06-24 10:28:21 +02:00
Amitay Isaacs
f2771fcbf4 ctdb-recovery: Setting up of recmode should be idempotent
BUG: https://bugzilla.samba.org/show_bug.cgi?id=12857

If the recovery mode is already set to the expected value, there is
nothing to do.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2017-06-24 10:28:21 +02:00
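The idempotency described in f2771fcbf4 can be illustrated with a small sketch. All names here are hypothetical, and the `transitions` counter exists only to make the effect observable; it is not part of CTDB.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not the CTDB definitions. */
enum example_recmode {
    EXAMPLE_RECMODE_NORMAL,
    EXAMPLE_RECMODE_ACTIVE
};

struct example_state {
    enum example_recmode recmode;
    unsigned int transitions; /* counts real mode changes */
};

/*
 * Set the recovery mode.  Returns true if the mode changed, false if
 * it was already at the requested value and nothing was done.
 */
static bool example_set_recmode(struct example_state *st,
                                enum example_recmode mode)
{
    if (st->recmode == mode) {
        return false; /* already set: nothing to do */
    }
    st->recmode = mode;
    st->transitions++;
    return true;
}
```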
Martin Schwenke
c6a7f680ce ctdb-daemon: Fix CID 1363067 Resource leak (RESOURCE_LEAK)
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-08-03 05:29:24 +02:00
Martin Schwenke
74aca5f4c6 ctdb-daemon: Fix CID 1363233 Resource leak (RESOURCE_LEAK)
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-08-03 05:29:24 +02:00
Amitay Isaacs
79b6b4b621 ctdb-daemon: Drop priorities from freeze/thaw code
Parallel database recovery freezes databases in parallel and irrespective
of database priority.  So drop priority from freeze/thaw code.
Database priority will be dropped completely soon.

Now FREEZE and THAW controls operate on all the databases.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2016-07-25 21:29:42 +02:00
Amitay Isaacs
7c8c6ce74e ctdb-daemon: Improve log message
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2016-07-05 10:53:14 +02:00
Amitay Isaacs
e6818c8e3c ctdb-recoverd: Improve election win messages
Logging that node has lost election is less useful than knowing which
node has won the election.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2016-07-05 10:53:14 +02:00
Amitay Isaacs
c620bf5deb ctdb-daemon: Reset push_started flag once DB_PUSH_CONFIRM is done
Once DB_PUSH_START is processed as part of recovery, push_started
flag tracks if there are multiple attempts to send DB_PUSH_START.
In DB_PUSH_CONFIRM, once the record count is confirmed, all information
related to DB_PUSH should be reset.  However, the push_started flag was
not reset when the push_state was reset.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Wed Jun  8 14:31:52 CEST 2016 on sn-devel-144
2016-06-08 14:31:52 +02:00
Martin Schwenke
95a7920d22 ctdb-cluster-mutex: Register an extra handler for when mutex is lost
Pass NULL if not needed.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-06-08 00:51:29 +02:00
Martin Schwenke
4f0ca0107c ctdb-cluster-mutex: ctdb_cluster_mutex() registers handler and private data
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-06-08 00:51:29 +02:00
Martin Schwenke
145ddcbe37 ctdb-cluster-mutex: Drop cluster_mutex_handler() ctdb and handle arguments
This makes the API more general.  If they are needed in a handler then
they can be in the private data.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-06-08 00:51:29 +02:00
Martin Schwenke
8cf74f335e ctdb-recovery: Wrap private data for reclock test callback
This will allow a simplification of the cluster mutex API, so the
private data can be registered when calling ctdb_cluster_mutex().

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-06-08 00:51:29 +02:00
Martin Schwenke
5c4744e69d ctdb-cluster-mutex: Pass a talloc context to allocate the handle off
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-06-08 00:51:28 +02:00
Martin Schwenke
fdd214ce6a ctdb-daemon: Rename recovery lock file to just recovery lock
It isn't necessarily a file.

Don't bother changing the control, since it doesn't pervade the code.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-06-08 00:51:28 +02:00
Martin Schwenke
091d4d2dbb ctdb-recovery: Consistency check reclock in start recovery control
If the recovery lock setting is not consistent with that of the
recovery master then abort.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-06-08 00:51:28 +02:00
Martin Schwenke
3e272e081f ctdb-recover: Avoid duplicate deferred attach processing
Deferred attach processing is done unconditionally at this point.  It
is then done again if recovery lock checking is done and completes
successfully.  If the recovery lock checking fails then it should not
be done at all.

Move this processing so it is done with the early exit when the
recovery lock is not being used.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-05-06 11:39:09 +02:00
Martin Schwenke
bcb838ba1e ctdb-recovery: Move recovery lock functions to recovery daemon code
ctdb_recovery_have_lock(), ctdb_recovery_lock(),
ctdb_recovery_unlock() are only used by recovery daemon, so move them
there.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-04-28 09:39:17 +02:00
Martin Schwenke
df99d9e273 ctdb-cluster-mutex: Factor out cluster mutex code
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-04-28 09:39:17 +02:00
Martin Schwenke
ecc6751c6b ctdb-recovery: Factor out setting of cluster mutex handler
This means that the cluster mutex handle can now be treated as opaque.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-04-28 09:39:17 +02:00
Martin Schwenke
94fb2cf0ec ctdb_recovery: ctdb_cluster_mutex() now takes an argstring argument
All of the ctdb_cluster_mutex_* infrastructure can now handle an
arbitrary mutex.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-04-28 09:39:17 +02:00
Martin Schwenke
46684867b1 ctdb-recovery: Recovery lock setting can now include helper command
The underlying change is to allow the cluster mutex argstring to
optionally contain a helper command.  When the argument string starts
with '!' then the first word is the helper command to run.  This is
now the standard way of changing the helper from the default.

CTDB_CLUSTER_MUTEX_HELPER should now only be used to change the location
of the default helper when testing.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-04-28 09:39:17 +02:00
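The argstring convention from 46684867b1 (a leading '!' means the first word is the helper command) might be parsed along these lines. This is a sketch under stated assumptions, not the CTDB parser: the default helper path, the struct, the fixed buffer sizes, and whitespace-only splitting are all hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical default helper path, for illustration only */
#define EXAMPLE_DEFAULT_HELPER "/usr/libexec/example_mutex_helper"

struct example_mutex_cmd {
    char helper[256]; /* command to run */
    char args[256];   /* remaining arguments, unsplit */
};

/* Split an argstring into helper command and arguments. */
static bool example_parse_argstring(const char *argstring,
                                    struct example_mutex_cmd *out)
{
    const char *rest = argstring;

    if (argstring[0] == '!') {
        /* The first word after '!' is the helper command */
        const char *start = argstring + 1;
        size_t len = strcspn(start, " \t");

        if (len == 0 || len >= sizeof(out->helper)) {
            return false;
        }
        memcpy(out->helper, start, len);
        out->helper[len] = '\0';
        rest = start + len;
    } else {
        /* No '!': the whole string is arguments to the default helper */
        snprintf(out->helper, sizeof(out->helper), "%s",
                 EXAMPLE_DEFAULT_HELPER);
    }

    rest += strspn(rest, " \t"); /* skip separating whitespace */
    if (strlen(rest) >= sizeof(out->args)) {
        return false;
    }
    strcpy(out->args, rest);
    return true;
}
```

With this sketch, `"!/bin/my_helper foo bar"` yields the helper `/bin/my_helper` with arguments `foo bar`, while a plain lock file path is passed unchanged to the default helper.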
Martin Schwenke
918b0d9a9c ctdb-recovery: Parse recovery lock setting
This is currently just treated as the name of a lock file.  However,
it is really some arbitrary arguments to lock helper.

Therefore, it should be parsed and passed as separate arguments to the
lock helper.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-04-28 09:39:17 +02:00
Martin Schwenke
64d557200e ctdb-recovery: Reimplement ctdb_recovery_lock() using ctdb_cluster_mutex()
Replace the file descriptor for the recovery lock in the CTDB context
with the cluster mutex handle, where non-NULL means locked.
Attempting to take the recovery lock is now asynchronous and no longer
blocks the recovery daemon.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-04-28 09:39:16 +02:00
Martin Schwenke
0b0b954ff2 ctdb-recovery: Kill cluster mutex helper with a signal that can be caught
Unlike fcntl(2), some other helper might need to explicitly take
action to release a mutex.  This can be done by catching SIGTERM.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-04-28 09:39:16 +02:00
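As 0b0b954ff2 notes, a helper that must explicitly release its mutex can do so by catching SIGTERM. A minimal sketch of the helper side follows. The function names are hypothetical, and the standard C `signal()` call is used here only for brevity.

```c
#include <signal.h>
#include <stdbool.h>

/* Set from the signal handler, polled by the helper's main loop */
static volatile sig_atomic_t example_got_sigterm = 0;

static void example_sigterm_handler(int signum)
{
    (void)signum;
    example_got_sigterm = 1; /* only async-signal-safe work here */
}

/* Install the handler; returns false on failure. */
static bool example_install_sigterm_handler(void)
{
    return signal(SIGTERM, example_sigterm_handler) != SIG_ERR;
}

/* True once SIGTERM has arrived and the mutex should be released. */
static bool example_should_release(void)
{
    return example_got_sigterm != 0;
}
```

A helper would hold the mutex, loop until `example_should_release()` returns true, then release the mutex and exit. A signal that cannot be caught would give it no chance to do that.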
Martin Schwenke
e679a1731c ctdb-recovery: Switch ctdb_cluster_mutex() to use helper
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-04-28 09:39:16 +02:00
Martin Schwenke
978404ecde ctdb-recovery: Add optional timeout argument to ctdb_cluster_mutex()
Timeout in seconds, 0 means no timeout.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-04-28 09:39:16 +02:00
Martin Schwenke
43e9f58d6a ctdb-recovery: Factor out reclock testing into ctdb_cluster_mutex()
This is currently only used to check whether the recovery lock can be
taken.  However, name it more generally in anticipation of using it
for general cluster mutex taking and testing.

No functional changes.  A couple of debug message simplifications and
code rearrangements.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-04-28 09:39:16 +02:00
Martin Schwenke
ab75f2a587 ctdb-recovery: Use a configurable handler when testing cluster mutex
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-04-28 09:39:16 +02:00
Martin Schwenke
419f57f378 ctdb-recovery: Factor out new function set_recmode_handler()
This is used to reply to the recmode control for all the different
cases.  The callers can later be generalised to use a pointer, which
can then be used for recovery lock handling in different contexts.

Note that the handle is now freed in set_recmode_handler() rather than
the callbacks.

There is one difference in behaviour.  Deferred attach calls are now
processed in the timeout case, where they weren't before.  That's a
bug fix!

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-04-28 09:39:16 +02:00
Martin Schwenke
14a2330692 ctdb-recovery: Use single char ASCII numbers for status from child
'0' = Child took the mutex
'1' = Unable to take mutex - contention
'2' = Unable to take mutex - timeout
'3' = Unable to take mutex - error

This is a straightforward API.  When the child is generalised to an
external helper then this makes it easier for a helper to be, for
example, a simple script.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-04-28 09:39:16 +02:00
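The single-character status protocol listed in 14a2330692 could be decoded by the parent roughly as follows. The enum and function names are hypothetical, and mapping unknown bytes to the error case is an assumption of this sketch.

```c
/* Hypothetical names; the characters come from the commit message. */
enum example_mutex_status {
    EXAMPLE_MUTEX_TAKEN,
    EXAMPLE_MUTEX_CONTENTION,
    EXAMPLE_MUTEX_TIMEOUT,
    EXAMPLE_MUTEX_ERROR
};

/* Map the one byte read from the child to a status value. */
static enum example_mutex_status example_status_from_char(char c)
{
    switch (c) {
    case '0':
        return EXAMPLE_MUTEX_TAKEN;
    case '1':
        return EXAMPLE_MUTEX_CONTENTION;
    case '2':
        return EXAMPLE_MUTEX_TIMEOUT;
    case '3':
        return EXAMPLE_MUTEX_ERROR;
    default:
        /* Assumption: anything unexpected is treated as an error */
        return EXAMPLE_MUTEX_ERROR;
    }
}
```

Because the status is a printable character, a helper written as a shell script can report it with a plain `echo`.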
Martin Schwenke
4842b6bb91 ctdb-recovery: Rename recovery lock functions and struct
Use the more general name "cluster mutex", since we are likely to end
up with more than one cluster-wide lock.  There will probably be a
dedicated recovery lock, held only during recovery, and also a second
lock that is held by the master node.  Currently one lock is used for
both purposes.

At the moment the struct and functions are involved with setting the
recovery mode.  However, they'll be abstracted out to more generally
deal with the cluster mutexes, so "recmode" -> "cluster_mutex".  Drop
"set" from names, since this is used to test the lock.  Also drop
"ctdb" prefix from functions, since they are local to this file.  The
struct will eventually be a long-lived handle that will release the
mutex when freed, so name it accordingly.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-04-28 09:39:16 +02:00
Amitay Isaacs
95a15cde45 ctdb-daemon: Implement new controls DB_PULL and DB_PUSH_START/DB_PUSH_CONFIRM
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2016-03-25 03:26:15 +01:00
Martin Schwenke
46edef25df ctdb-recovery: Limit scope of reclock latency statistics
It does not make sense to update this statistic for the timeout case,
since this could skew the statistic.  To keep it simple, just update
it when there is lock contention, since this is the usual case.  So
the daemon statistic measures time to test the
lock and the corresponding recovery daemon statistic measures time to
take the lock.

Additionally, the recovery daemon will eventually use this code to
take the lock, and the method of updating the latency statistic will
need to be pushed further out to a configurable handler that depends
on the calling context.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Feb 23 10:32:06 CET 2016 on sn-devel-144
2016-02-23 10:32:06 +01:00
Martin Schwenke
188019b877 ctdb-recovery: Negate the status when checking the recovery lock
Have 0 indicate that the lock was taken.  This allows non-zero values
to be used to indicate why the lock could not be taken.  EACCES means
lock contention.

For now use just EACCES to cover all failures, since
ctdb_recovery_lock() returns a bool and details of other errors will
be lost.  ctdb_recovery_lock() will undergo some big changes, so don't
try to fix this now.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-02-23 07:23:18 +01:00
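The status convention from 188019b877 (0 when the lock was taken, EACCES for every failure for now) can be sketched as below. The function names and description strings are hypothetical; only the 0 and EACCES values come from the commit message.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Convert a boolean lock result into a status code.  All failures
 * collapse to EACCES because the boolean carries no further detail.
 */
static int example_lock_check_status(bool lock_taken)
{
    return lock_taken ? 0 : EACCES;
}

/* Describe a status code for logging (strings are illustrative). */
static const char *example_describe_status(int status)
{
    if (status == 0) {
        return "lock taken";
    }
    if (status == EACCES) {
        return "lock not taken (contention)";
    }
    return "unexpected status";
}
```

Keeping 0 for the "taken" case leaves every non-zero value free to explain why the lock could not be taken.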
Martin Schwenke
fad3f367b7 ctdb-recovery: Clean up status handling from recmode child
This currently returns an incorrect error when the expected number of
bytes is not read.  Separate out the different cases to clarify the
logic and avoid reporting the wrong error.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-02-23 07:23:18 +01:00
Martin Schwenke
b6c3918457 ctdb-recovery: Don't bother ensuring file descriptor is -1
This is already done before the destructor is assigned.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-02-23 07:23:18 +01:00
Martin Schwenke
531e6724ba ctdb-recovery: Don't store recmode in recovery mode state
The callbacks that use this value are only ever called if recovery
mode is being set to NORMAL.  So do not check if recmode is NORMAL
either.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-02-23 07:23:18 +01:00
Martin Schwenke
6695fa50ae ctdb: Use ctdb_wait_for_process_to_exit()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-02-23 07:23:18 +01:00
Martin Schwenke
4d6ec81299 ctdb-recovery: Drop redundant status send when setting recovery mode
The child process writes the status into the pipe before looping to
wait.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-02-23 07:23:18 +01:00
Martin Schwenke
3e2f2169a4 ctdb-recovery: Include lib/util/time.h instead of samba_util.h
Less is more...

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2016-02-23 07:23:18 +01:00