1
0
mirror of https://github.com/samba-team/samba.git synced 2024-12-23 17:34:34 +03:00
Commit Graph

8074 Commits

Author SHA1 Message Date
Martin Schwenke
80f3f7c188 ctdb-tests: Improve counting of database records
Record counts are sometimes incomplete for large databases when
relevant tests are run on a real cluster.

This probably has something to do with ssh, pipes and buffering, so
move the filtering and counting to the remote end.  This means that
only the count comes across the pipe, instead of all the record data.

Instead of explicitly excluding the key for persistent database
sequence numbers, just exclude any key starting with '_'.  Such keys
are not used in tests.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Mon Oct  8 05:36:11 CEST 2018 on sn-devel-144
2018-10-08 05:36:11 +02:00
Martin Schwenke
52dcecbc92 ctdb-tests: Add extra debug to large database recovery test
This test sometimes fails, probably because the test is flakey.
Either the records aren't being added correctly or the counting of
records loses records.  Try to debug both possibilities.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-10-08 02:46:23 +02:00
Martin Schwenke
d67d8ed44a ctdb-tests: Shut down transaction_loop clients more cleanly
A transaction_loop client can exit with a transaction active when its
time limit expires.  This causes a recovery and causes problems with
the test cleanup, which detects unwanted recoveries and fails.

Set a flag when the time limit expires and exit cleanly before the
next transaction is started.

Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-10-08 02:46:23 +02:00
Martin Schwenke
2aa006a311 ctdb-tools: Have onnode pass -n option even when regular ssh not in use
ONNODE_SSH is really a test hook, so it doesn't need to support
completely random values.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-10-08 02:46:23 +02:00
Martin Schwenke
6ac5124b01 ctdb-tests: Support closing of stdin in local daemons ssh stub
Not sure this is needed but this makes it behave the same as ssh.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-10-08 02:46:23 +02:00
Martin Schwenke
0dfb3c87b5 ctdb-tests: Be more careful when building public IP addresses
The goal is to allow more local daemons by expanding the address range
rather than generating invalid addresses.

For IPv6, use a separate address space instead of an offset for the
2nd address.

For IPv4, use the last 2 octets with addresses starting at
192.168.100.1 and 192.168.200.1.  Avoid addresses with 0 and 255 in
the last octet by using a maximum of 100 addresses per "subnet"
starting at .1.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-10-08 02:46:23 +02:00
Martin Schwenke
36eb738877 ctdb-tests: Be more careful when building node addresses
The goal is to allow more local daemons by expanding the address range
rather than generating invalid addresses.

For IPv6, use all 4 trailing hex digits.

For IPv4, use the last 2 octets.  Although 127.0.0.0 is a /8 network,
avoid unexpected issues due to 0 and 255 in the last octet.  Use a
maximum of 100 addresses per "subnet" starting at .1.  Keep the first
group of addresses in 127.0.0.0/24 to continue to allow a reasonable
number of nodes to be tested with socket-wrapper.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-10-08 02:46:23 +02:00
Martin Schwenke
03dddc37b5 ctdb-tests: Don't format IPv4 octets as hex digits
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-10-08 02:46:22 +02:00
Martin Schwenke
0eabac5295 ctdb-tests: Be more efficient about starting/stopping local daemons
Don't loop, just use onnode all.

For shutting down, use onnode -p all.  This results in a significant
time saving for stopping many deamons because "ctdb shutdown" is now
synchronous.

onnode -p all can be used to start daemons directly because they
daemonize.  However, this does not work under valgrind because the
valgrind process does not exit, so onnode will wait forever for it.
In this case, use onnode without the -p option.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-10-08 02:46:22 +02:00
Martin Schwenke
a9ac33015b ctdb-tests: Do not use ctdbd_wrapper in local daemon tests
Run the daemon directly and shut it down using ctdb shutdown.

The wrapper waits for ctdbd to reach >=FIRST_RECOVERY runstate within
a timeout period and shuts ctdbd down if that doesn't happen.  This is
only really used to ensure that ctdbd doesn't exit early after an
apparently successful start.  There are no known cases where ctdbd
will continue running but fail to reach >=FIRST_RECOVERY runstate.

When ctdbd is started in tests, the test code will wait until ctdbd is
in a healthy state on all nodes before proceeding, so there is
effectively no change in behaviour.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-10-08 02:46:22 +02:00
Martin Schwenke
8bde6fa09c ctdb-tests: Don't remove non-existent test database directory
This directory is no longer used.  Lack of removal doesn't seem to
cause a problem.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-10-08 02:46:22 +02:00
Martin Schwenke
f2e4a5e9fa ctdb-tests: Drop unused function maybe_stop_ctdb()
There are too many functions to start/stop daemons.  Simplify this.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-10-08 02:46:22 +02:00
Martin Schwenke
2cd6a00399 ctdb-tests: Explicitly check for local daemons when shutting down
This is clearer if the logic is explicit...  and...

There are too many functions to start/stop daemons.  Simplify this.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-10-08 02:46:22 +02:00
Martin Schwenke
90f6b0a1ed ctdb-tests: Drop functions daemons_start(), daemons_stop()
There are too many functions to start/stop daemons.  Simplify this.

Inline the functionality into ctdb_start_all() and ctdb_stop_all().

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-10-08 02:46:22 +02:00
Martin Schwenke
f1ede41adf ctdb-tests: Don't used daemons_start()/daemons_stop() directly in tests
There are too many functions to start/stop daemons.  Simplify this.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-10-08 02:46:22 +02:00
Martin Schwenke
4642a347d0 ctdb-tests: Rename _ctdb_start_all() -> ctdb_start_all()
There are too many functions to start/stop daemons.  Simplify this.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-10-08 02:46:22 +02:00
Martin Schwenke
f57e5bbde7 ctdb-tests: Rename ctdb_start_all() -> ctdb_init()
There are too many functions to start/stop daemons.  Simplify this.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-10-08 02:46:21 +02:00
Martin Schwenke
a66a96934a ctdb-tests: Drop ps_ctdbd()
This was used for debugging tests by ensuring that the arguments to
ctdbd were as expected.  It no longer outputs anything useful because
ctdbd is now started without arguments.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-10-08 02:46:21 +02:00
Amitay Isaacs
83b3c5670d ctdb-tests: Drop code for RECEIVE_RECORDS control
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13641

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2018-10-08 02:46:21 +02:00
Amitay Isaacs
2f89bd96fb ctdb-protocol: Drop marshalling code for RECEIVE_RECORDS control
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13641

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2018-10-08 02:46:21 +02:00
Amitay Isaacs
81dae71fa7 ctdb-protocol: Mark RECEIVE_RECORDS control obsolete
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13641

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2018-10-08 02:46:21 +02:00
Amitay Isaacs
d18385ea2a ctdb-daemon: Drop implementation of RECEIVE_RECORDS control
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13641

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2018-10-08 02:46:21 +02:00
Amitay Isaacs
e15cdc652d ctdb-vacuum: Remove unnecessary check for zero records in delete list
Since no records are deleted from RB tree during step 1, there is no
need for the check.  Run step 2 unconditionally.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13641

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2018-10-08 02:46:21 +02:00
Amitay Isaacs
ef05239717 ctdb-vacuum: Fix the incorrect counting of remote errors
If a node fails to delete a record in TRY_DELETE_RECORDS control during
vacuuming, then it's possible that other nodes also may fail to delete a
record.  So instead of deleting the record from RB tree on first failure,
keep track of the remote failures.

Update delete_list.remote_error and delete_list.left statistics only
once per record during the delete_record_traverse.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13641

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2018-10-08 02:46:21 +02:00
Amitay Isaacs
202b9027ba ctdb-vacuum: Simplify the deletion of vacuumed records
The 3-phase deletion of vacuumed records was introduced to overcome
the problem of record(s) resurrection during recovery.  This problem
is now handled by avoiding the records from recently INACTIVE nodes in
the recovery process.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13641

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2018-10-08 02:46:20 +02:00
Martin Schwenke
dcc9935995 ctdb-tests: Add recovery record resurrection test for volatile databases
Ensure that deleted records and vacuumed records are not resurrected
from recently inactive nodes.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13641

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-10-08 02:46:20 +02:00
Amitay Isaacs
c4ec99b1d3 ctdb-daemon: Invalidate records if a node becomes INACTIVE
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13641

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2018-10-08 02:46:20 +02:00
Amitay Isaacs
040401ca3a ctdb-daemon: Don't pull any records if records are invalidated
This avoids unnecessary work during recovery to pull records from nodes
that were INACTIVE just before the recovery.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13641

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2018-10-08 02:46:20 +02:00
Amitay Isaacs
71896fddf1 ctdb-daemon: Add invalid_records flag to ctdb_db_context
If a node becomes INACTIVE, then all the records in volatile databases
are invalidated.  This avoids the need to include records from such
nodes during subsequent recovery after the node comes out INACTIVE state.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13641

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2018-10-08 02:46:20 +02:00
Noel Power
2e59a3343f PY3: make sure print stmt is enclosed by '(' & ')'
Signed-off-by: Noel Power <noel.power@suse.com>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2018-09-19 22:25:05 +02:00
Martin Schwenke
486022ef8f ctdb-recoverd: Set recovery lock handle at start of attempt
This allows the attempt to be cancelled if an election is lost and an
unlock is done before the attempt is completed.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Tue Sep 18 02:18:30 CEST 2018 on sn-devel-144
2018-09-18 02:18:30 +02:00
Martin Schwenke
b1dc568784 ctdb-recoverd: Handle cancellation when releasing recovery lock
If the recovery lock is in the process of being taken then free the
cluster mutex handle but leave the recovery lock handle in place.
This allows ctdb_recovery_lock() to fail.

Note that this isn't yet live because rec->recovery_lock_handle is
still only set at the completion of the attempt to take the lock.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-09-17 22:58:20 +02:00
Martin Schwenke
a755d060c1 ctdb-recoverd: Return early when the recovery lock is not held
This makes upcoming changes simpler.

Update to modern debug macro while touching relevant line.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-09-17 22:58:20 +02:00
Martin Schwenke
c52216740b ctdb-recoverd: Store recovery lock handle
... not just cluster mutex handle.

This makes the recovery lock handle long-lived and with allow the
releasing code to cancel an in-progress attempt to take the recovery
lock.

The cluster mutex handle is now allocated off the recovery lock
handle.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-09-17 22:58:20 +02:00
Martin Schwenke
a53b264aee ctdb-recoverd: Use talloc() to allocate recovery lock handle
At the moment this is still local and is freed after the mutex is
successfully taken.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-09-17 22:58:20 +02:00
Martin Schwenke
af22f03dbe ctdb-recoverd: Rename hold_reclock_state to ctdb_recovery_lock_handle
This will be a longer lived structure.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-09-17 22:58:20 +02:00
Martin Schwenke
c516e58ce9 ctdb-recoverd: Re-check master on failure to take recovery lock
If the master changed while trying to take the lock then fail gracefully.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-09-17 22:58:20 +02:00
Martin Schwenke
59fc01646c ctdb-recoverd: Clean up taking of recovery lock
No functional changes, just coding style cleanups and debug message
tweaks.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-09-17 22:58:20 +02:00
Martin Schwenke
e789d0da57 ctdb-cluster-mutex: Block signals around fork
If SIGTERM is received and the tevent signal handler setup in the
recovery daemon is still enabled then the signal is handled and a
corresponding event is queued.  The child never runs an event loop so
the signal is effectively ignored.

Resetting the SIGTERM handler isn't enough.  A signal can arrive
before that.

Block SIGTERM before forking and then immediately unblock it in the
parent.

In the child, unblock SIGTERM after the signal handler is reset.  An
explicit unblock is needed because according to sigprocmask(2) "the
signal mask is preserved across execve(2)".

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-09-17 22:58:20 +02:00
Martin Schwenke
5a6b139884 ctdb-cluster-mutex: Reset SIGTERM handler in cluster mutex child
If SIGTERM is received and the tevent signal handler setup in the
recovery daemon is still enabled then the signal is handled and a
corresponding event is queued.  The child never runs an event loop so
the signal is effectively ignored.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-09-17 22:58:19 +02:00
Noel Power
3cc284b2af PY3: fix some octal literals
Signed-off-by: Noel Power <noel.power@suse.com>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2018-09-16 06:16:19 +02:00
Ralph Wuerthner
e52abc8a44 ctdb-doc: Remove PIDFILE option from ctdbd_wrapper man page
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13610

Signed-off-by: Ralph Wuerthner <ralph.wuerthner@de.ibm.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Ralph Böhme <slow@samba.org>

Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Wed Sep 12 21:50:57 CEST 2018 on sn-devel-144
2018-09-12 21:50:57 +02:00
Martin Schwenke
3903f6c365 ctdb-build: Fix version handling when building tarball
Split get_version() into 2 functions, so that .distversion file can be
created.

Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Wed Sep 12 05:50:46 CEST 2018 on sn-devel-144
2018-09-12 05:50:46 +02:00
Martin Schwenke
a122428a58 ctdb-build: Use wafsamba's INSTALL_DIR()
install_dir() doesn't work (doesn't respect $DESTDIR) and is only
available for backward compatibilty.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Alexander Bokovoy <ab@samba.org>
2018-09-11 06:59:11 +02:00
Martin Schwenke
b487979f89 ctdb-tests: Fix CTDB -O3 --picky-developer build on CentOS 7
gcc 4.8.5 complains:

[319/381] Compiling ctdb/tests/src/system_socket_test.c
../tests/src/system_socket_test.c: In function ‘test_tcp’:
../tests/src/system_socket_test.c:196:20: error: ‘rst_out’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
  assert((rst != 0) == (rst_out != 0));
                    ^
cc1: all warnings being treated as errors

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Andreas Schneider <asn@samba.org>
2018-09-07 17:26:18 +02:00
Martin Schwenke
bc62182ff4 ctdb-tests: Check result of write() in ARP and TCP tests
CTDB -O3 --picky-developer build is failing.  Not sure how this
slipped through.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Thu Sep  6 08:33:59 CEST 2018 on sn-devel-144
2018-09-06 08:33:59 +02:00
Alexander Bokovoy
0a9d98ba15 ctdb/wscript: rework how version number is retrieved
Using default context functions before waf initialization occured
is prone to error. Postpone calling samba_version.* code until we
got default context initialized.

Signed-off-by: Alexander Bokovoy <ab@samba.org>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2018-09-05 06:37:27 +02:00
Alexander Bokovoy
5c3d31eb14 cdtb/wscript: use top and out for waf 2.0
Signed-off-by: Alexander Bokovoy <ab@samba.org>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2018-09-05 06:37:26 +02:00
Alexander Bokovoy
175be9377e ctdb/wscript: adopt to waf-2.0
Signed-off-by: Alexander Bokovoy <ab@samba.org>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2018-09-05 06:37:26 +02:00
Alexander Bokovoy
65074d8901 ctdb/wscript: update to handle waf 2.0.4
Signed-off-by: Alexander Bokovoy <ab@samba.org>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2018-09-05 06:37:22 +02:00