IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
The startup_fd should not be propagated to the child processes created
from a daemon. It should only be used in the daemon code to return the
status of the startup. Another use of startup_fd is to notify the
parent if the daemon process has exited.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13659
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
ctdbd enters a broken state if eventd goes away. A clean shutdown is
not possible because that involves running events. Restarting eventd
is possible but this might mask a serious problem and it is possible
that eventd might keep on disappearing. Just exit.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13659
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Record counts are sometimes incomplete for large databases when
relevant tests are run on a real cluster.
This probably has something to do with ssh, pipes and buffering, so
move the filtering and counting to the remote end. This means that
only the count comes across the pipe, instead of all the record data.
Instead of explicitly excluding the key for persistent database
sequence numbers, just exclude any key starting with '_'. Such keys
are not used in tests.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Mon Oct 8 05:36:11 CEST 2018 on sn-devel-144
This test sometimes fails, probably because the test is flakey.
Either the records aren't being added correctly or the counting of
records loses records. Try to debug both possibilities.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
A transaction_loop client can exit with a transaction active when its
time limit expires. This causes a recovery and causes problems with
the test cleanup, which detects unwanted recoveries and fails.
Set a flag when the time limit expires and exit cleanly before the
next transaction is started.
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
ONNODE_SSH is really a test hook, so it doesn't need to support
completely random values.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Not sure this is needed but this makes it behave the same as ssh.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The goal is to allow more local daemons by expanding the address range
rather than generating invalid addresses.
For IPv6, use a separate address space instead of an offset for the
2nd address.
For IPv4, use the last 2 octets with addresses starting at
192.168.100.1 and 192.168.200.1. Avoid addresses with 0 and 255 in
the last octet by using a maximum of 100 addresses per "subnet"
starting at .1.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The goal is to allow more local daemons by expanding the address range
rather than generating invalid addresses.
For IPv6, use all 4 trailing hex digits.
For IPv4, use the last 2 octets. Although 127.0.0.0 is a /8 network,
avoid unexpected issues due to 0 and 255 in the last octet. Use a
maximum of 100 addresses per "subnet" starting at .1. Keep the first
group of addresses in 127.0.0.0/24 to continue to allow a reasonable
number of nodes to be tested with socket-wrapper.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Don't loop, just use onnode all.
For shutting down, use onnode -p all. This results in a significant
time saving for stopping many deamons because "ctdb shutdown" is now
synchronous.
onnode -p all can be used to start daemons directly because they
daemonize. However, this does not work under valgrind because the
valgrind process does not exit, so onnode will wait forever for it.
In this case, use onnode without the -p option.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Run the daemon directly and shut it down using ctdb shutdown.
The wrapper waits for ctdbd to reach >=FIRST_RECOVERY runstate within
a timeout period and shuts ctdbd down if that doesn't happen. This is
only really used to ensure that ctdbd doesn't exit early after an
apparently successful start. There are no known cases where ctdbd
will continue running but fail to reach >=FIRST_RECOVERY runstate.
When ctdbd is started in tests, the test code will wait until ctdbd is
in a healthy state on all nodes before proceeding, so there is
effectively no change in behaviour.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This directory is no longer used. Lack of removal doesn't seem to
cause a problem.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
There are too many functions to start/stop daemons. Simplify this.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This is clearer if the logic is explicit... and...
There are too many functions to start/stop daemons. Simplify this.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
There are too many functions to start/stop daemons. Simplify this.
Inline the functionality into ctdb_start_all() and ctdb_stop_all().
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
There are too many functions to start/stop daemons. Simplify this.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
There are too many functions to start/stop daemons. Simplify this.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
There are too many functions to start/stop daemons. Simplify this.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This was used for debugging tests by ensuring that the arguments to
ctdbd were as expected. It no longer outputs anything useful because
ctdbd is now started without arguments.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Since no records are deleted from RB tree during step 1, there is no
need for the check. Run step 2 unconditionally.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13641
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
If a node fails to delete a record in TRY_DELETE_RECORDS control during
vacuuming, then it's possible that other nodes also may fail to delete a
record. So instead of deleting the record from RB tree on first failure,
keep track of the remote failures.
Update delete_list.remote_error and delete_list.left statistics only
once per record during the delete_record_traverse.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13641
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
The 3-phase deletion of vacuumed records was introduced to overcome
the problem of record(s) resurrection during recovery. This problem
is now handled by avoiding the records from recently INACTIVE nodes in
the recovery process.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13641
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Ensure that deleted records and vacuumed records are not resurrected
from recently inactive nodes.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13641
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This avoids unnecessary work during recovery to pull records from nodes
that were INACTIVE just before the recovery.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13641
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
If a node becomes INACTIVE, then all the records in volatile databases
are invalidated. This avoids the need to include records from such
nodes during subsequent recovery after the node comes out INACTIVE state.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13641
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
This allows the attempt to be cancelled if an election is lost and an
unlock is done before the attempt is completed.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Tue Sep 18 02:18:30 CEST 2018 on sn-devel-144
If the recovery lock is in the process of being taken then free the
cluster mutex handle but leave the recovery lock handle in place.
This allows ctdb_recovery_lock() to fail.
Note that this isn't yet live because rec->recovery_lock_handle is
still only set at the completion of the attempt to take the lock.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This makes upcoming changes simpler.
Update to modern debug macro while touching relevant line.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
... not just cluster mutex handle.
This makes the recovery lock handle long-lived and with allow the
releasing code to cancel an in-progress attempt to take the recovery
lock.
The cluster mutex handle is now allocated off the recovery lock
handle.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
At the moment this is still local and is freed after the mutex is
successfully taken.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
If the master changed while trying to take the lock then fail gracefully.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
If SIGTERM is received and the tevent signal handler setup in the
recovery daemon is still enabled then the signal is handled and a
corresponding event is queued. The child never runs an event loop so
the signal is effectively ignored.
Resetting the SIGTERM handler isn't enough. A signal can arrive
before that.
Block SIGTERM before forking and then immediately unblock it in the
parent.
In the child, unblock SIGTERM after the signal handler is reset. An
explicit unblock is needed because according to sigprocmask(2) "the
signal mask is preserved across execve(2)".
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
If SIGTERM is received and the tevent signal handler setup in the
recovery daemon is still enabled then the signal is handled and a
corresponding event is queued. The child never runs an event loop so
the signal is effectively ignored.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13610
Signed-off-by: Ralph Wuerthner <ralph.wuerthner@de.ibm.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Ralph Böhme <slow@samba.org>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Wed Sep 12 21:50:57 CEST 2018 on sn-devel-144
Split get_version() into 2 functions, so that .distversion file can be
created.
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Wed Sep 12 05:50:46 CEST 2018 on sn-devel-144
install_dir() doesn't work (doesn't respect $DESTDIR) and is only
available for backward compatibilty.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Alexander Bokovoy <ab@samba.org>
gcc 4.8.5 complains:
[319/381] Compiling ctdb/tests/src/system_socket_test.c
../tests/src/system_socket_test.c: In function ‘test_tcp’:
../tests/src/system_socket_test.c:196:20: error: ‘rst_out’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
assert((rst != 0) == (rst_out != 0));
^
cc1: all warnings being treated as errors
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Andreas Schneider <asn@samba.org>
CTDB -O3 --picky-developer build is failing. Not sure how this
slipped through.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Thu Sep 6 08:33:59 CEST 2018 on sn-devel-144