1
0
mirror of https://github.com/samba-team/samba.git synced 2025-03-12 20:58:37 +03:00

585 Commits

Author SHA1 Message Date
Amitay Isaacs
01c6c90e98 ctdb-daemon: Remove dependency on includes.h
Instead of includes.h, include the required header files explicitly.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-10-30 02:00:27 +01:00
Amitay Isaacs
2fdb332fad ctdb-daemon: Stop using tevent compatibility definitions
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-10-30 02:00:27 +01:00
Amitay Isaacs
7084cb92e2 ctdb-include: Move include/internal/cmdline.h to common/
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-10-30 02:00:27 +01:00
Amitay Isaacs
b900adc55c ctdb-daemon: Separate prototypes for system specific functions
This groups function prototypes for system specific functions in
common/system.h and removes them from ctdb_private.h.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-10-30 02:00:27 +01:00
Volker Lendecke
d527ab1094 ctdbd: Fix a typo
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
2015-10-14 02:19:14 +02:00
Amitay Isaacs
0101748287 ctdb-recoverd: Always check for recmaster before doing recovery
Recovery daemon checks if it is the recovery master before performing
certain checks.  During those checks it's possible that re-election can
change the recmaster.  In such a case, the recovery daemon should never
do a database recovery.

This is not complete fix since the recovery master can still change
while the recovery is going on.  The correct fix is to abort recovery
if the recovery master changes.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Wed Oct  7 17:55:05 CEST 2015 on sn-devel-104
2015-10-07 17:55:05 +02:00
Amitay Isaacs
3cf93d9136 ctdb-recoverd: Get rid of connected-ness comparison in election
The reason for favouring more connected node is to create a larger
cluster in case of a split brain.  In split brain condition, the nodes
are not communicating across partitions and each partition will run its
own election.  Among all the partitions, the node which holds the recovery
lock will eventually "win".  All the other nodes which won election but
could not grab recovery lock will end up banning themselves.

This also prevents the recovery master role from bouncing between nodes
during startup when the entire cluster is restarted.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-10-07 14:53:29 +02:00
Amitay Isaacs
fbd9c9fd2f ctdb-recoverd: Do not freeze databases for election
If election occurs during SMB activity, then trying to freeze all the
databases can cause samba/ctdb deadlock which parallel database recovery
is trying to avoid.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-10-07 14:53:29 +02:00
Amitay Isaacs
e6ff36506c ctdb-recoverd: Add code for parallel database recovery
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-10-07 14:53:29 +02:00
Amitay Isaacs
4b39a7706f ctdb-recoverd: Update flags on all nodes before database recovery
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-10-07 14:53:29 +02:00
Amitay Isaacs
9843363629 ctdb-recoverd: Update capabilities before the database recovery
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-10-07 14:53:29 +02:00
Amitay Isaacs
14cacd2925 ctdb-recovery: Factor out existing database recovery code
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-10-07 14:53:29 +02:00
Amitay Isaacs
62f1e2579a ctdb-daemon: Replace ctdb_message with srvid abstraction
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-10-07 14:53:28 +02:00
Amitay Isaacs
1df2594386 ctdb-daemon: Introduce per database generation
The database generation for each database is updated only during recovery.
After recovery is complete the database generation would be the same as
the global generation.

The database generation is required for parallel database recovery.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-10-07 14:53:27 +02:00
Amitay Isaacs
4f155e77a8 ctdb-daemon: Rename ctdb_control_wipe_database to ctdb_control_transdb
The same structure is required in new controls for database transactions.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2015-10-07 14:53:27 +02:00
Martin Schwenke
b234ae0a90 ctdb-recoverd: Clear IP assignment tree on election loss
If a node was previously recovery master (say, 20 years ago) and it
becomes recovery master again then, if IP assignments have changed,
verify_remote_ip_allocation() can produce messages like the following
when called during recovery:

  ctdbd: recoverd:Inconsistent IP allocation - node 0 thinks 10.1.1.1 is held by node 0 while it is assigned to node 1

When a node loses an election it should clear all data specific to it
being the recovery master.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-07-01 04:18:28 +02:00
Amitay Isaacs
941669ae36 ctdb-recovered: Drop unused variable
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
2015-06-05 11:28:23 +02:00
Amitay Isaacs
2e2dba8d13 ctdb-recoverd/vacuum: Remove vacuum_info structure
For all the records listed in VACUUM_FETCH, migration requests are sent
immediately without waiting.  This means there can only be a single
VACUUM_FETCH processing active at a time.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
2015-06-05 11:28:23 +02:00
Michael Adam
92d1486b87 ctdb-recoverd/vacuum: move fetch loop back into fetch handler.
With the processing of one element factored out,
it is more natural to have the actual loop inside the
handler function. This also makes the talloc/free
bracked around the loop more obvious.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-06-05 11:28:23 +02:00
Michael Adam
4103463ad2 ctdb-recoverd/vacuum: slightly reorder the vacuum fetch loop
Reads more naturally this way, imho.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-06-05 11:28:23 +02:00
Michael Adam
a1c941be6f ctdb-recoverd/vacuum: add common exit point to vacuum_fetch_handler
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-06-05 11:28:23 +02:00
Michael Adam
9092617888 ctdb-recoverd/vacuum: factor vacuum_fetch_process_one out of vacuum_fetch_loop
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-06-05 11:28:23 +02:00
Michael Adam
84ab6d003a ctdb-recoverd/vacuum: move two variables into scope.
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-06-05 11:28:23 +02:00
Michael Adam
9e5cf6fd5c ctdb-recoverd/vacuum: remove unneeded prototype.
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-06-05 11:28:23 +02:00
Martin Schwenke
91f99ddfb3 ctdb-recoverd: Remove redundant condition when checking recovery lock
It isn't possible to hold the recovery lock without having a lock file
set.

This is part of a goal to generalise the recovery lock mechanism to
just use a helper program, which may use a lock file or may use
something else.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-05-10 03:22:13 +02:00
Martin Schwenke
a45ab7d1fe ctdb-recoverd: Simplify using TALLOC_FREE()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-05-10 03:22:13 +02:00
Martin Schwenke
2c72c9de48 ctdb-recoverd: Drop redundant condition in election handler
Election packets from the current node are ignored at the beginning of
the function, so this does not need to be checked.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-05-10 03:22:13 +02:00
Martin Schwenke
c75fdf208f ctdb-recoverd: Remove unused memory context variable
It is set, memory is allocated but it is never used.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-05-10 03:22:13 +02:00
Martin Schwenke
4b4ba77f4a ctdb-recoverd: Replace unnecessary use of ctdb->recovery_master
Databases are only pulled by the recovery master, so it can compare
with current node PNN.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-05-10 03:22:13 +02:00
Martin Schwenke
6415edfa26 ctdb-recoverd: Rename some local variables to avoid conflict with convention
rec is always a (struct ctdb_recoverd *)

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-05-10 03:22:13 +02:00
Martin Schwenke
36fc620898 ctdb_recoverd: Move num_lmasters calculation to near where it is used
Unless this node is the recovery master then this is not needed.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-05-10 03:22:13 +02:00
Martin Schwenke
1fd2d3886c ctdb-recoverd: Make num_lmasters a local variable
It isn't used anywhere else and is always re-initialised to 0.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-05-10 03:22:13 +02:00
Martin Schwenke
385e9326ea ctdb-recoverd: Remove unused struct members num_active and num_connected
They are initialised and updated but the values are never used.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-05-10 03:22:13 +02:00
Martin Schwenke
c3d6678dbc ctdb-recoverd: Use capabilities API
Simplify update_capabilities() using the capabilities API and store
the capabilities in new field rec->caps rather than scattered around
ctdb->nodes.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-05-10 03:22:13 +02:00
Martin Schwenke
85bd9a33eb ctdb-recoverd: Avoid nodemap-related checks when recoveries are disabled
The potential resulting recovery won't run anyway.  Also recoveries
may have been disabled by "reloadnodes" and if the nodemaps are
inconsistent between nodes then avoid triggering an unnecessary
recovery.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:13 +02:00
Martin Schwenke
ee9619c28b ctdb-recoverd: New message ID CTDB_SRVID_DISABLE_RECOVERIES
Also add test stub support.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:13 +02:00
Martin Schwenke
2ca484cd50 ctdb-recoverd: Simplify disable_ip_check_handler() using ctdb_op_disable()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:13 +02:00
Martin Schwenke
108db3396f ctdb-recoverd: Add slightly more abstraction for disabling takeover runs
Factor out new function srvid_disable_and_reply(), which can be
re-used.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:13 +02:00
Martin Schwenke
ec32d9bea8 ctdb-recoverd: Reimplement ReRecoveryTimeout using ctdb_op_disable()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:12 +02:00
Martin Schwenke
281f7e8152 ctdb-recoverd: Use a goto for do_recovery() failures
This will allow extra things to be done on failure.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:12 +02:00
Martin Schwenke
a2044c65bc ctdb-recoverd: Reimplement disabling takeover runs using ctdb_op_disable()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:12 +02:00
Martin Schwenke
55b246195b ctdb-recoverd: Add a new abstraction ctdb_op_disable()
This can be used to disable and re-enable an operation, and do all the
relevant sanity checking.

Most of this is from existing functions
disable_takeover_runs_handler(), clear_takeover_runs_disable() and
reenable_takeover_runs().

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-04-07 07:43:12 +02:00
Martin Schwenke
48c91407ab ctdb-recoverd: Don't release and re-take the recovery lock
Just continue to hold it, otherwise a broken node might win an
election and grab the lock.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-02-13 07:19:07 +01:00
Martin Schwenke
1d6ed91f55 ctdb-recoverd: Simplify ctdb_recovery_lock()
Have it just silently take or fail to take the lock, except on an
unexpected failure (where it should log an error).

This means that when it is called we need to keep the old behaviour
and explicitly release the lock.  In do_recovery() the lock is
released and a message is printed before attempting to take the lock.
In the daemon sanity check the lock must be released in the error path
if it is actually taken.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-02-13 07:19:07 +01:00
Martin Schwenke
be19a17faf ctdb-recoverd: Remove check_recovery_lock()
This has not done anything useful since commit
b9d8bb23af8abefb2d967e9b4e9d6e60c4a3b520.  Instead, just check that
the lock is held.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-02-13 07:19:07 +01:00
Martin Schwenke
668ed53662 ctdb-recoverd: Improve logging when recovery lock file is changed
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-02-13 07:19:07 +01:00
Martin Schwenke
db32a2bce5 ctdb-recoverd: New function ctdb_recovery_unlock()
Unlock the recovery lock file.  This way knowledge of the file
descriptor isn't sprinkled throughout the code.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-02-13 07:19:07 +01:00
Martin Schwenke
72701be663 ctdb-recoverd: New function ctdb_recovery_have_lock()
True if this recovery daemon holds the lock.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-02-13 07:19:07 +01:00
Martin Schwenke
d110fe2318 ctdb-daemon: Mark tunable VerifyRecoveryLock as obsolete
It is pointless having a recovery lock but not sanity checking that it
is working.  Also, the logic that uses this tunable is confusing.  In
some places the recovery lock is released unnecessarily because the
tunable isn't set.

Simplify the logic by assuming that if a recovery lock is specified
then it should be verified.

Update documentation that references this tunable.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2015-02-13 07:19:07 +01:00
Amitay Isaacs
959b9ea0ef ctdb-recoverd: Process all the records for vacuum fetch in a loop
Processing one migration request at a time is very slow and processing
a batch of records can take longer than VacuumInterval.  This causes
subsequent vacuum fetch requests to be dropped.  The dropped records
can accumulate quickly and will cause the vacuum database traverse to
be quite expensive.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Dec  5 17:06:58 CET 2014 on sn-devel-104
2014-12-05 17:06:58 +01:00