mirror of https://github.com/samba-team/samba.git synced 2025-01-11 05:18:09 +03:00
Commit Graph

8204 Commits

Author SHA1 Message Date
Martin Schwenke
520568051c ctdb-tests: Drop use of confusing testfailures variable
Exit on first test failure instead of setting a variable.  The bizarre
logic in ctdb_test_exit() makes this worth dropping.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-11-06 07:16:16 +01:00
Martin Schwenke
9ebcebe519 ctdb-tests: Drop useless "ctdb version" test
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-11-06 07:16:16 +01:00
Martin Schwenke
43c26e1e64 ctdb-tests: Rationalise tunable simple tests
These 3 tests duplicate various checks and can easily be handled as a
single test.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-11-06 07:16:15 +01:00
Martin Schwenke
ba86eacb66 ctdb-tests: Rationalise ctdb stop/continue/disable/enable simple tests
The "continue" and "enable" tests are just extensions of the "stop"
and "disable" tests, so drop the latter 2.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-11-06 07:16:15 +01:00
Martin Schwenke
5fdac517fa ctdb-tests: Use wait_until_node_has_no_ips() in some tests
This strengthens those tests to ensure that released IPs aren't
replaced with others.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-11-06 07:16:15 +01:00
Martin Schwenke
eda1296d67 ctdb-tests: Add function wait_until_node_has_no_ips()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-11-06 07:16:15 +01:00
Martin Schwenke
44019b5577 ctdb-event: Only run talloc report if CTDB_INTERACTIVE is set
This is only really wanted for interactive testing when logging to
stderr.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-11-06 07:16:15 +01:00
Martin Schwenke
e952f0316b ctdb-event: Never fork to become daemon in eventd
This stops ctdbd from being able to shut down eventd, since the PID it
records will be invalid.  There's no need for eventd to fork.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-11-06 07:16:15 +01:00
Martin Schwenke
4e6bd42493 ctdb-daemon: Improve documentation for -i option
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-11-06 07:16:15 +01:00
Martin Schwenke
9c41481f21 ctdb-daemon: Don't set log_to_stdout for become_daemon()
ctdbd logs to stderr in interactive mode, not stdout.  This way stdout
is always closed.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-11-06 07:16:15 +01:00
Martin Schwenke
c84254d23d ctdb-daemon: Avoid unnecessarily spamming the logs when in test mode
Logging the logging location to syslog can be useful on production
systems when the configuration goes unexpectedly missing.  However, in
test mode this just adds noise to the logs on the test system.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-11-06 07:16:14 +01:00
Martin Schwenke
1bbc4fad43 ctdb-tools: Detect unknown node number
If there aren't enough addresses in the list then the shift will
silently fail and the printed address will be the unshifted value of
$1, which is incorrect/unexpected.  So, sanity check the node number.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-11-06 07:16:14 +01:00
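
The failure mode described in 1bbc4fad43 can be illustrated with a short sketch. This is not the code from the commit: nth_address is a hypothetical helper that takes a 0-based node number followed by a list of addresses. It assumes that the fix amounts to checking the node number against the number of remaining addresses before shifting, which is what the commit message suggests.

```sh
# Hypothetical helper, not taken from the ctdb tools: print the address
# of 0-based node number $1, chosen from the addresses that follow it.
nth_address ()
{
	[ $# -ge 1 ] || return 1
	_n="$1"
	shift

	# Reject anything that is not a non-negative integer
	case "$_n" in
	''|*[!0-9]*)
		echo "ERROR: invalid node number \"${_n}\"" >&2
		return 1
		;;
	esac

	# Sanity check: there must be more than $_n addresses left,
	# otherwise the shift below would fail and $1 would still be
	# the first (unshifted) address
	if [ "$_n" -ge $# ] ; then
		echo "ERROR: unknown node number ${_n}" >&2
		return 1
	fi

	shift "$_n"
	echo "$1"
}
```

For example, nth_address 1 10.0.0.1 10.0.0.2 10.0.0.3 prints 10.0.0.2, while nth_address 3 10.0.0.1 10.0.0.2 fails instead of printing 10.0.0.1.
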
Martin Schwenke
08469408c3 ctdb-tests: README updates
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-11-06 07:16:14 +01:00
Martin Schwenke
4f4a835c34 ctdb-tests: Remove export of CTDB_SOCKET
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-11-06 07:16:14 +01:00
Martin Schwenke
d246b1dadf ctdb-tests: Use path_socket() in dummy client
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-11-06 07:16:14 +01:00
Martin Schwenke
3b1e5977d8 ctdb-tests: Drop incorrect comment, unused function
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-11-06 07:16:14 +01:00
Martin Schwenke
82e589e388 ctdb-tests: Drop setting of CTDB_SOCKET and CTDB_PIDFILE
The local daemons ssh stub doesn't need to do this because the ctdbd
and the ctdb tool now only need CTDB_TEST_MODE and CTDB_BASE for local
daemon tests.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-11-06 07:16:14 +01:00
Martin Schwenke
d75fa2c3fd ctdb-daemon: Drop unused function ctdb_set_socketname()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-11-06 07:16:14 +01:00
Martin Schwenke
5f478b7c5f ctdb-daemon: Use path functions for socket and PID file
Drop the use of ctdb_set_socketname() because it complicates the memory
allocation and this is the only place it is used.  Just assign to the
relevant pointer.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-11-06 07:16:14 +01:00
Martin Schwenke
cd021596da ctdb-tests: Use path_socket() in test client tools
Just leak the memory allocated by path_socket().  This is only used in
short-lived test programs, so it isn't worth the hassle of plumbing a
talloc context through several layers to get here.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-11-06 07:16:14 +01:00
Martin Schwenke
cc3aedd307 ctdb-tools: Use path_socket() in ctdb tool
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-11-06 07:16:14 +01:00
Martin Schwenke
38566780d2 ctdb-tests: Use ctdb-path for fake_ctdbd directory setup
This needs to be done before any of the code changes are made,
including updating the ctdb tool.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-11-06 07:16:13 +01:00
Martin Schwenke
d4a1f897af ctdb-tests: Use ctdb-path-like values for local daemons socket and PID file
However, don't use ctdb-path itself because some tests use nested
instances of onnode.  The outermost instance would set CTDB_SOCKET and
any inner instance would pick up that value, regardless of CTDB_BASE.

This is a temporary measure to avoid breaking testing while use of the
path functions is added to ctdbd and the ctdb tool.  When this is
complete these variables can be removed altogether because the code
will just depend on CTDB_TEST_MODE and CTDB_BASE.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-11-06 07:16:13 +01:00
Martin Schwenke
32c2ec8fa2 ctdb-common: Allow path_socket() to use $CTDB_SOCKET
Use of CTDB_SOCKET is being generally removed.  However, this override
is being added to allow test code outside of ctdb/ to be able to
specify the socket, if desired.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-11-06 07:16:13 +01:00
Martin Schwenke
27df4f002a ctdb-recovery: Ban a node that causes recovery failure
... instead of applying banning credits.

There have been a couple of cases where recovery repeatedly takes just
over 2 minutes to fail.  Therefore, banning credits expire between
failures and a continuously problematic node is never banned,
resulting in endless recoveries.  This is because it takes 2
applications of banning credits before a node is banned, which
generally involves 2 recovery failures.

The recovery helper makes up to 3 attempts to recover each database
during a single run.  If a node causes 3 failures then this is really
equivalent to 3 recovery failures in the model that existed before the
recovery helper added retries.  In that case the node would have been
banned after 2 failures.

So, instead of applying banning credits to the "most failing" node,
simply ban it directly from the recovery helper.

If multiple nodes are causing recovery failures then this can cause a
node to be banned more quickly than it might otherwise have been, even
pre-recovery-helper.  However, 90 seconds (i.e. 3 failures) is a long
time to be in recovery, so banning earlier seems like the best
approach.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13670

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Mon Nov  5 06:52:33 CET 2018 on sn-devel-144
2018-11-05 06:52:33 +01:00
Martin Schwenke
fbea9d3699 ctdb-daemon: Fix valgrind hit in event code
==25741== Syscall param write(buf) points to uninitialised byte(s)
==25741==    at 0x4939291: write (write.c:27)
==25741==    by 0x4868285: sys_write (sys_rw.c:68)
==25741==    by 0x13915D: sock_queue_trigger (sock_io.c:316)
==25741==    by 0x4DE6478: tevent_common_invoke_immediate_handler (in /usr/lib/x86_64-linux-gnu/libtevent.so.0.9.37)
==25741==    by 0x4DE64A2: tevent_common_loop_immediate (in /usr/lib/x86_64-linux-gnu/libtevent.so.0.9.37)
==25741==    by 0x4DEBE5A: ??? (in /usr/lib/x86_64-linux-gnu/libtevent.so.0.9.37)
==25741==    by 0x4DEA2D6: ??? (in /usr/lib/x86_64-linux-gnu/libtevent.so.0.9.37)
==25741==    by 0x4DE57E3: _tevent_loop_once (in /usr/lib/x86_64-linux-gnu/libtevent.so.0.9.37)
==25741==    by 0x15D1BA: ctdb_event_script_args (eventscript.c:821)
==25741==    by 0x13B437: ctdb_start_daemon (ctdb_daemon.c:1315)
==25741==    by 0x110642: main (ctdbd.c:393)
==25741==  Address 0x57888a4 is 100 bytes inside a block of size 144 alloc'd
==25741==    at 0x48357BF: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==25741==    by 0x4B9B7C0: talloc_named_const (in /usr/lib/x86_64-linux-gnu/libtalloc.so.2.1.14)
==25741==    by 0x15CCC6: eventd_client_write (eventscript.c:430)
==25741==    by 0x15CCC6: eventd_client_run (eventscript.c:556)
==25741==    by 0x15CCC6: ctdb_event_script_run (eventscript.c:649)
==25741==    by 0x15D198: ctdb_event_script_args (eventscript.c:812)
==25741==    by 0x13B437: ctdb_start_daemon (ctdb_daemon.c:1315)
==25741==    by 0x110642: main (ctdbd.c:393)
==25741==

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13659

Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Mon Oct 22 09:27:15 CEST 2018 on sn-devel-144
2018-10-22 09:27:15 +02:00
Amitay Isaacs
a190960380 ctdb-event: Check the return status of sock_daemon_set_startup_fd
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13659

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2018-10-22 06:04:20 +02:00
Amitay Isaacs
80549927bc ctdb-common: Set close-on-exec for startup fd
The startup_fd should not be propagated to the child processes created
from a daemon.  It should only be used in the daemon code to return the
status of the startup.  Another use of startup_fd is to notify the
parent if the daemon process has exited.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13659

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2018-10-22 06:04:20 +02:00
Martin Schwenke
c9e1603a5d ctdb-daemon: Exit if eventd goes away
ctdbd enters a broken state if eventd goes away.  A clean shutdown is
not possible because that involves running events.  Restarting eventd
is possible but this might mask a serious problem and it is possible
that eventd might keep on disappearing.  Just exit.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13659

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-10-22 06:04:20 +02:00
Martin Schwenke
a3d12252fa ctdb-daemon: Return early when refusing to run an event script
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13659

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-10-22 06:04:20 +02:00
Martin Schwenke
80f3f7c188 ctdb-tests: Improve counting of database records
Record counts are sometimes incomplete for large databases when
relevant tests are run on a real cluster.

This probably has something to do with ssh, pipes and buffering, so
move the filtering and counting to the remote end.  This means that
only the count comes across the pipe, instead of all the record data.

Instead of explicitly excluding the key for persistent database
sequence numbers, just exclude any key starting with '_'.  Such keys
are not used in tests.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Mon Oct  8 05:36:11 CEST 2018 on sn-devel-144
2018-10-08 05:36:11 +02:00
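
The filtering described in 80f3f7c188 might look roughly like the sketch below. count_records is a hypothetical helper, and the input format is an assumption: each record is taken to contribute exactly one line of the form key(LENGTH) = "KEY". Run at the remote end of the ssh connection, a filter like this means only a single number crosses the pipe.

```sh
# Hypothetical helper: count records in a database dump read from stdin.
# Assumes each record contributes exactly one line of the form
#   key(LENGTH) = "KEY"
# Keys starting with '_' are treated as internal and are not counted.
count_records ()
{
	# grep -c prints the number of matching lines (0 if there are none)
	grep -c '^key([0-9]*) = "[^_]'
}
```

Note that grep exits with a non-zero status when the count is 0, so a caller running under "set -e" would need to allow for that.
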
Martin Schwenke
52dcecbc92 ctdb-tests: Add extra debug to large database recovery test
This test sometimes fails, probably because the test is flaky.
Either the records aren't being added correctly or the counting of
records loses records.  Try to debug both possibilities.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-10-08 02:46:23 +02:00
Martin Schwenke
d67d8ed44a ctdb-tests: Shut down transaction_loop clients more cleanly
A transaction_loop client can exit with a transaction active when its
time limit expires.  This causes a recovery and causes problems with
the test cleanup, which detects unwanted recoveries and fails.

Set a flag when the time limit expires and exit cleanly before the
next transaction is started.

Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-10-08 02:46:23 +02:00
Martin Schwenke
2aa006a311 ctdb-tools: Have onnode pass -n option even when regular ssh not in use
ONNODE_SSH is really a test hook, so it doesn't need to support
completely random values.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-10-08 02:46:23 +02:00
Martin Schwenke
6ac5124b01 ctdb-tests: Support closing of stdin in local daemons ssh stub
Not sure this is needed but this makes it behave the same as ssh.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-10-08 02:46:23 +02:00
Martin Schwenke
0dfb3c87b5 ctdb-tests: Be more careful when building public IP addresses
The goal is to allow more local daemons by expanding the address range
rather than generating invalid addresses.

For IPv6, use a separate address space instead of an offset for the
2nd address.

For IPv4, use the last 2 octets with addresses starting at
192.168.100.1 and 192.168.200.1.  Avoid addresses with 0 and 255 in
the last octet by using a maximum of 100 addresses per "subnet"
starting at .1.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-10-08 02:46:23 +02:00
Martin Schwenke
36eb738877 ctdb-tests: Be more careful when building node addresses
The goal is to allow more local daemons by expanding the address range
rather than generating invalid addresses.

For IPv6, use all 4 trailing hex digits.

For IPv4, use the last 2 octets.  Although 127.0.0.0 is a /8 network,
avoid unexpected issues due to 0 and 255 in the last octet.  Use a
maximum of 100 addresses per "subnet" starting at .1.  Keep the first
group of addresses in 127.0.0.0/24 to continue to allow a reasonable
number of nodes to be tested with socket-wrapper.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-10-08 02:46:23 +02:00
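
The IPv4 scheme described in 36eb738877 can be sketched as follows. node_ipv4 is a hypothetical helper, and the exact mapping is an assumption based on the commit message (at most 100 addresses per "subnet", starting at .1, with the first group in 127.0.0.0/24). The code in the commit may differ in detail.

```sh
# Hypothetical helper: IPv4 address for 0-based local daemon number $1.
# Uses at most 100 addresses per /24, numbered .1 to .100, so the last
# octet is never 0 or 255.  Nodes 0-99 stay in 127.0.0.0/24.
node_ipv4 ()
{
	_n="$1"
	_subnet=$((_n / 100))
	_host=$((_n % 100 + 1))

	# The third octet must also remain a valid octet
	if [ "$_subnet" -gt 255 ] ; then
		echo "ERROR: node number ${_n} is too large" >&2
		return 1
	fi

	echo "127.0.${_subnet}.${_host}"
}
```

With this mapping, node 0 gets 127.0.0.1, node 99 gets 127.0.0.100 and node 100 moves to the next group as 127.0.1.1.
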
Martin Schwenke
03dddc37b5 ctdb-tests: Don't format IPv4 octets as hex digits
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-10-08 02:46:22 +02:00
Martin Schwenke
0eabac5295 ctdb-tests: Be more efficient about starting/stopping local daemons
Don't loop, just use onnode all.

For shutting down, use onnode -p all.  This results in a significant
time saving for stopping many daemons because "ctdb shutdown" is now
synchronous.

onnode -p all can be used to start daemons directly because they
daemonize.  However, this does not work under valgrind because the
valgrind process does not exit, so onnode will wait forever for it.
In this case, use onnode without the -p option.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-10-08 02:46:22 +02:00
Martin Schwenke
a9ac33015b ctdb-tests: Do not use ctdbd_wrapper in local daemon tests
Run the daemon directly and shut it down using ctdb shutdown.

The wrapper waits for ctdbd to reach >=FIRST_RECOVERY runstate within
a timeout period and shuts ctdbd down if that doesn't happen.  This is
only really used to ensure that ctdbd doesn't exit early after an
apparently successful start.  There are no known cases where ctdbd
will continue running but fail to reach >=FIRST_RECOVERY runstate.

When ctdbd is started in tests, the test code will wait until ctdbd is
in a healthy state on all nodes before proceeding, so there is
effectively no change in behaviour.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-10-08 02:46:22 +02:00
Martin Schwenke
8bde6fa09c ctdb-tests: Don't remove non-existent test database directory
This directory is no longer used.  Lack of removal doesn't seem to
cause a problem.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-10-08 02:46:22 +02:00
Martin Schwenke
f2e4a5e9fa ctdb-tests: Drop unused function maybe_stop_ctdb()
There are too many functions to start/stop daemons.  Simplify this.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-10-08 02:46:22 +02:00
Martin Schwenke
2cd6a00399 ctdb-tests: Explicitly check for local daemons when shutting down
This is clearer if the logic is explicit...  and...

There are too many functions to start/stop daemons.  Simplify this.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-10-08 02:46:22 +02:00
Martin Schwenke
90f6b0a1ed ctdb-tests: Drop functions daemons_start(), daemons_stop()
There are too many functions to start/stop daemons.  Simplify this.

Inline the functionality into ctdb_start_all() and ctdb_stop_all().

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-10-08 02:46:22 +02:00
Martin Schwenke
f1ede41adf ctdb-tests: Don't use daemons_start()/daemons_stop() directly in tests
There are too many functions to start/stop daemons.  Simplify this.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-10-08 02:46:22 +02:00
Martin Schwenke
4642a347d0 ctdb-tests: Rename _ctdb_start_all() -> ctdb_start_all()
There are too many functions to start/stop daemons.  Simplify this.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-10-08 02:46:22 +02:00
Martin Schwenke
f57e5bbde7 ctdb-tests: Rename ctdb_start_all() -> ctdb_init()
There are too many functions to start/stop daemons.  Simplify this.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-10-08 02:46:21 +02:00
Martin Schwenke
a66a96934a ctdb-tests: Drop ps_ctdbd()
This was used for debugging tests by ensuring that the arguments to
ctdbd were as expected.  It no longer outputs anything useful because
ctdbd is now started without arguments.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-10-08 02:46:21 +02:00
Amitay Isaacs
83b3c5670d ctdb-tests: Drop code for RECEIVE_RECORDS control
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13641

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2018-10-08 02:46:21 +02:00
Amitay Isaacs
2f89bd96fb ctdb-protocol: Drop marshalling code for RECEIVE_RECORDS control
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13641

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2018-10-08 02:46:21 +02:00
Amitay Isaacs
81dae71fa7 ctdb-protocol: Mark RECEIVE_RECORDS control obsolete
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13641

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2018-10-08 02:46:21 +02:00
Amitay Isaacs
d18385ea2a ctdb-daemon: Drop implementation of RECEIVE_RECORDS control
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13641

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2018-10-08 02:46:21 +02:00
Amitay Isaacs
e15cdc652d ctdb-vacuum: Remove unnecessary check for zero records in delete list
Since no records are deleted from RB tree during step 1, there is no
need for the check.  Run step 2 unconditionally.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13641

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2018-10-08 02:46:21 +02:00
Amitay Isaacs
ef05239717 ctdb-vacuum: Fix the incorrect counting of remote errors
If a node fails to delete a record in TRY_DELETE_RECORDS control during
vacuuming, then it's possible that other nodes also may fail to delete a
record.  So instead of deleting the record from RB tree on first failure,
keep track of the remote failures.

Update delete_list.remote_error and delete_list.left statistics only
once per record during the delete_record_traverse.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13641

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2018-10-08 02:46:21 +02:00
Amitay Isaacs
202b9027ba ctdb-vacuum: Simplify the deletion of vacuumed records
The 3-phase deletion of vacuumed records was introduced to overcome
the problem of record resurrection during recovery.  This problem
is now handled by excluding records from recently INACTIVE nodes from
the recovery process.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13641

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2018-10-08 02:46:20 +02:00
Martin Schwenke
dcc9935995 ctdb-tests: Add recovery record resurrection test for volatile databases
Ensure that deleted records and vacuumed records are not resurrected
from recently inactive nodes.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13641

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-10-08 02:46:20 +02:00
Amitay Isaacs
c4ec99b1d3 ctdb-daemon: Invalidate records if a node becomes INACTIVE
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13641

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2018-10-08 02:46:20 +02:00
Amitay Isaacs
040401ca3a ctdb-daemon: Don't pull any records if records are invalidated
This avoids unnecessary work during recovery to pull records from nodes
that were INACTIVE just before the recovery.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13641

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2018-10-08 02:46:20 +02:00
Amitay Isaacs
71896fddf1 ctdb-daemon: Add invalid_records flag to ctdb_db_context
If a node becomes INACTIVE, then all the records in volatile databases
are invalidated.  This avoids the need to include records from such
nodes during subsequent recovery after the node comes out of INACTIVE state.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13641

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2018-10-08 02:46:20 +02:00
Noel Power
2e59a3343f PY3: make sure print stmt is enclosed by '(' & ')'
Signed-off-by: Noel Power <noel.power@suse.com>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2018-09-19 22:25:05 +02:00
Martin Schwenke
486022ef8f ctdb-recoverd: Set recovery lock handle at start of attempt
This allows the attempt to be cancelled if an election is lost and an
unlock is done before the attempt is completed.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Tue Sep 18 02:18:30 CEST 2018 on sn-devel-144
2018-09-18 02:18:30 +02:00
Martin Schwenke
b1dc568784 ctdb-recoverd: Handle cancellation when releasing recovery lock
If the recovery lock is in the process of being taken then free the
cluster mutex handle but leave the recovery lock handle in place.
This allows ctdb_recovery_lock() to fail.

Note that this isn't yet live because rec->recovery_lock_handle is
still only set at the completion of the attempt to take the lock.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-09-17 22:58:20 +02:00
Martin Schwenke
a755d060c1 ctdb-recoverd: Return early when the recovery lock is not held
This makes upcoming changes simpler.

Update to modern debug macro while touching relevant line.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-09-17 22:58:20 +02:00
Martin Schwenke
c52216740b ctdb-recoverd: Store recovery lock handle
... not just cluster mutex handle.

This makes the recovery lock handle long-lived and will allow the
releasing code to cancel an in-progress attempt to take the recovery
lock.

The cluster mutex handle is now allocated off the recovery lock
handle.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-09-17 22:58:20 +02:00
Martin Schwenke
a53b264aee ctdb-recoverd: Use talloc() to allocate recovery lock handle
At the moment this is still local and is freed after the mutex is
successfully taken.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-09-17 22:58:20 +02:00
Martin Schwenke
af22f03dbe ctdb-recoverd: Rename hold_reclock_state to ctdb_recovery_lock_handle
This will be a longer lived structure.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-09-17 22:58:20 +02:00
Martin Schwenke
c516e58ce9 ctdb-recoverd: Re-check master on failure to take recovery lock
If the master changed while trying to take the lock then fail gracefully.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-09-17 22:58:20 +02:00
Martin Schwenke
59fc01646c ctdb-recoverd: Clean up taking of recovery lock
No functional changes, just coding style cleanups and debug message
tweaks.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-09-17 22:58:20 +02:00
Martin Schwenke
e789d0da57 ctdb-cluster-mutex: Block signals around fork
If SIGTERM is received and the tevent signal handler set up in the
recovery daemon is still enabled then the signal is handled and a
corresponding event is queued.  The child never runs an event loop so
the signal is effectively ignored.

Resetting the SIGTERM handler isn't enough.  A signal can arrive
before that.

Block SIGTERM before forking and then immediately unblock it in the
parent.

In the child, unblock SIGTERM after the signal handler is reset.  An
explicit unblock is needed because according to sigprocmask(2) "the
signal mask is preserved across execve(2)".

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-09-17 22:58:20 +02:00
Martin Schwenke
5a6b139884 ctdb-cluster-mutex: Reset SIGTERM handler in cluster mutex child
If SIGTERM is received and the tevent signal handler set up in the
recovery daemon is still enabled then the signal is handled and a
corresponding event is queued.  The child never runs an event loop so
the signal is effectively ignored.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-09-17 22:58:19 +02:00
Noel Power
3cc284b2af PY3: fix some octal literals
Signed-off-by: Noel Power <noel.power@suse.com>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2018-09-16 06:16:19 +02:00
Ralph Wuerthner
e52abc8a44 ctdb-doc: Remove PIDFILE option from ctdbd_wrapper man page
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13610

Signed-off-by: Ralph Wuerthner <ralph.wuerthner@de.ibm.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Ralph Böhme <slow@samba.org>

Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Wed Sep 12 21:50:57 CEST 2018 on sn-devel-144
2018-09-12 21:50:57 +02:00
Martin Schwenke
3903f6c365 ctdb-build: Fix version handling when building tarball
Split get_version() into 2 functions, so that .distversion file can be
created.

Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Wed Sep 12 05:50:46 CEST 2018 on sn-devel-144
2018-09-12 05:50:46 +02:00
Martin Schwenke
a122428a58 ctdb-build: Use wafsamba's INSTALL_DIR()
install_dir() doesn't work (doesn't respect $DESTDIR) and is only
available for backward compatibility.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Alexander Bokovoy <ab@samba.org>
2018-09-11 06:59:11 +02:00
Martin Schwenke
b487979f89 ctdb-tests: Fix CTDB -O3 --picky-developer build on CentOS 7
gcc 4.8.5 complains:

[319/381] Compiling ctdb/tests/src/system_socket_test.c
../tests/src/system_socket_test.c: In function ‘test_tcp’:
../tests/src/system_socket_test.c:196:20: error: ‘rst_out’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
  assert((rst != 0) == (rst_out != 0));
                    ^
cc1: all warnings being treated as errors

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Andreas Schneider <asn@samba.org>
2018-09-07 17:26:18 +02:00
Martin Schwenke
bc62182ff4 ctdb-tests: Check result of write() in ARP and TCP tests
CTDB -O3 --picky-developer build is failing.  Not sure how this
slipped through.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Thu Sep  6 08:33:59 CEST 2018 on sn-devel-144
2018-09-06 08:33:59 +02:00
Alexander Bokovoy
0a9d98ba15 ctdb/wscript: rework how version number is retrieved
Using default context functions before waf initialization has occurred
is error-prone. Postpone calling samba_version.* code until the
default context is initialized.

Signed-off-by: Alexander Bokovoy <ab@samba.org>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2018-09-05 06:37:27 +02:00
Alexander Bokovoy
5c3d31eb14 ctdb/wscript: use top and out for waf 2.0
Signed-off-by: Alexander Bokovoy <ab@samba.org>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2018-09-05 06:37:26 +02:00
Alexander Bokovoy
175be9377e ctdb/wscript: adapt to waf-2.0
Signed-off-by: Alexander Bokovoy <ab@samba.org>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2018-09-05 06:37:26 +02:00
Alexander Bokovoy
65074d8901 ctdb/wscript: update to handle waf 2.0.4
Signed-off-by: Alexander Bokovoy <ab@samba.org>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2018-09-05 06:37:22 +02:00
Martin Schwenke
30eb28818d ctdb-tests: Don't run valgrind or other tracing in simple_test_command()
This function is used to run an extra command to check a result.  This
command is usually a script (often a stub) or an external command, so
no need to trace it with valgrind or whatever else might be specified.
In the worst case the command being run is a shell function, which
valgrind won't be able to find.

There is little use running the event script tests under valgrind.
However, when the whole test suite is being run under valgrind then it
should work.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Mon Sep  3 14:04:00 CEST 2018 on sn-devel-144
2018-09-03 14:04:00 +02:00
Martin Schwenke
8aacde3c5d ctdb-tests: Use known install paths in local daemon tests
The in-tree local daemons tests don't work from a top-level Samba
compile.  The simple test suite was the first test suite and things
have generally worked, so it has been slow to adopt general test
infrastructure changes.

Instead of re-calculating script and helper locations, use the paths
from script_install_paths.sh.  The bin/ directory is already added to
PATH in common.sh, so don't add it here.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-09-03 10:52:12 +02:00
Martin Schwenke
eed738a37a ctdb-tests: If bin/ isn't in ctdb/ then look one level higher
CTDB's test suite doesn't work from a top-level compile.  The first
step to fixing this is to correctly locate the bin/ directory.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-09-03 10:52:11 +02:00
Martin Schwenke
4f1727fe0b ctdb-common: Process the whole config file even if an error occurs
At the moment multiple errors will be encountered one at a time, on
each load or validate.  Instead, allow all configuration errors to
be printed in a single pass.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13589

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-09-03 10:52:11 +02:00
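The single-pass behaviour described in commit 4f1727fe0b above could look roughly like the sketch below. It is only an illustration: `struct conf_entry`, `value_is_valid()` and `conf_validate_all()` are hypothetical names, not CTDB's actual conf API, and the validity rule is invented for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical option record, not CTDB's real conf structures. */
struct conf_entry {
	const char *key;
	const char *value;
};

/* Illustrative rule only: accept "true" and "false". */
static bool value_is_valid(const char *value)
{
	return strcmp(value, "true") == 0 || strcmp(value, "false") == 0;
}

/*
 * Validate every entry, logging each problem, and return the number
 * of invalid entries instead of stopping at the first one.
 */
static size_t conf_validate_all(const struct conf_entry *entries, size_t n)
{
	size_t errors = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!value_is_valid(entries[i].value)) {
			fprintf(stderr, "invalid value [%s] for \"%s\"\n",
				entries[i].value, entries[i].key);
			/* Count the error but keep processing later entries. */
			errors++;
		}
	}

	return errors;
}
```

A caller would treat a non-zero return as failure, after the user has already seen every message in one run.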
Martin Schwenke
920ed66ba7 ctdb-common: Avoid ENOENT for unknown conf options
Only use ENOENT for missing configuration file.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13589

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-09-03 10:52:11 +02:00
Martin Schwenke
f108440038 ctdb-common: Avoid ENOENT for unknown conf type tags
Only use ENOENT for missing configuration file.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13589

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-09-03 10:52:11 +02:00
Martin Schwenke
a017d3181a ctdb-common: Log a message when an invalid conf value is encountered
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13589

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-09-03 10:52:11 +02:00
Martin Schwenke
ebb28c57a1 ctdb-common: Log a message for unknown conf option
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13589

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-09-03 10:52:11 +02:00
Martin Schwenke
421d828f6c ctdb-common: Fix log message for conf option with unknown section
This covers both options that appear before a section and options in
unknown sections.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13589

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-09-03 10:52:11 +02:00
Martin Schwenke
b5453bc27a ctdb-daemon: Drop incorrect log message
The message is incorrect because the actual failure was loading the
config file.  Instead of fixing the message, drop it because
ctdb_config_load() already logs the failure.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13589

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-09-03 10:52:11 +02:00
Martin Schwenke
6d3d9a85e5 ctdb-daemon: Log complete eventd startup command
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13592

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-09-03 10:52:10 +02:00
Martin Schwenke
58b8f2a31e ctdb-common: Clean up comments in TCP packet parsing
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Thu Aug 30 07:50:04 CEST 2018 on sn-devel-144
2018-08-30 07:50:04 +02:00
Martin Schwenke
53ceac9694 ctdb-common: Check the version field in IPv6 packets
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-30 04:48:59 +02:00
Martin Schwenke
924a655b2a ctdb-common: Improve TCP packet size and offset calculations
The IPv4 check for short packets was strange.  It appeared to ensure
that the capture included everything up to and including the window
size.  The checksum field immediately follows the window size field,
so just ensure that the packet is large enough to contain everything
up to the start of the checksum.

Add a similar check for IPv6 packets.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-30 04:48:59 +02:00
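The short-packet check described in commit 924a655b2a above can be sketched as follows. This is a minimal sketch under stated assumptions: `struct tcp_hdr_sketch` and `tcp_capture_long_enough()` are hypothetical names that mirror the on-wire TCP header layout, not the structures CTDB actually uses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed part of a TCP header as laid out on the wire (hypothetical name). */
struct tcp_hdr_sketch {
	uint16_t src_port;
	uint16_t dst_port;
	uint32_t seq;
	uint32_t ack;
	uint8_t  data_offset; /* upper 4 bits: header length in 32-bit words */
	uint8_t  flags;
	uint16_t window;
	uint16_t checksum;
	uint16_t urgent;
};

/*
 * A capture is long enough if it contains everything up to the start
 * of the checksum field, that is, up to and including the window size.
 * tcp_offset is the offset of the TCP header within the capture.
 */
static bool tcp_capture_long_enough(size_t caplen, size_t tcp_offset)
{
	size_t needed = tcp_offset + offsetof(struct tcp_hdr_sketch, checksum);

	return caplen >= needed;
}
```

With a 20-byte IPv4 header and no link-layer header, the TCP header starts at offset 20, so at least 36 bytes must be captured.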
Martin Schwenke
43a2022596 ctdb-tests: Extend TCP packet test to also do packet extraction
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-30 04:48:59 +02:00
Martin Schwenke
e2ac36867d ctdb-common: Factor out TCP packet parsing code
This can be tested separately.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-30 04:48:59 +02:00
Martin Schwenke
028fdc12e7 ctdb-common: Clean up types/declarations in TCP socket reading
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-30 04:48:59 +02:00
Martin Schwenke
cb4848e359 ctdb-common: Fix error handling when parsing TCP packets
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-30 04:48:58 +02:00
Martin Schwenke
f3a1f1e1fa ctdb-common: Fix a bug in non-Linux (PCAP) TCP packet capturing
Captured packets include a link-layer header, which is considered in
the Linux code but not the PCAP code.  Also, the actual captured
length is in caplen, not len.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-30 04:48:58 +02:00
Martin Schwenke
0beb16f34e ctdb-common: Don't modify a const argument
The current code might be slightly more efficient but
intentionally (although temporarily) modifying a const argument just
seems wrong.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-30 04:48:58 +02:00
Martin Schwenke
8fcf1af559 ctdb-common: Avoid magic numbers when building TCP packets
Most packet sizes and offsets are multiples of 32-bit words.  The IPv6
payload length is in octets.  The IPv6 version is the top 4 bits of
the relevant field.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-30 04:48:58 +02:00
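The field conventions listed in commit 8fcf1af559 above (lengths in 32-bit words, IPv6 payload length in octets, version in the top 4 bits) can be illustrated with a few helpers. The function names and the `IP6_VERSION_SKETCH` constant are hypothetical and chosen for this example only.

```c
#include <stddef.h>
#include <stdint.h>

#define IP6_VERSION_SKETCH 6

/* Build the first byte of an IPv6 header: version in the top 4 bits. */
static uint8_t ipv6_first_byte(uint8_t traffic_class_high)
{
	return (uint8_t)((IP6_VERSION_SKETCH << 4) | (traffic_class_high & 0x0f));
}

/* The version is the top 4 bits of the first byte of an IP header. */
static unsigned int ip_version(const uint8_t *pkt)
{
	return pkt[0] >> 4;
}

/* IPv4 header length: low 4 bits of the first byte, in 32-bit words. */
static size_t ipv4_header_bytes(const uint8_t *pkt)
{
	return (size_t)(pkt[0] & 0x0f) * sizeof(uint32_t);
}

/* IPv6 payload length: bytes 4 and 5, network byte order, in octets. */
static size_t ipv6_payload_bytes(const uint8_t *pkt)
{
	return ((size_t)pkt[4] << 8) | pkt[5];
}
```

Naming the shift and the word size in one place is what removes the magic numbers from the call sites.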
Martin Schwenke
a02cba1c8a ctdb-tests: Add tests for TCP packet marshalling
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-30 04:48:58 +02:00
Martin Schwenke
d7d23e78ed ctdb-common: Factor out TCP packet marshalling code
This can be tested separately.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-30 04:48:58 +02:00
Martin Schwenke
a67899573a ctdb-common: Avoid single line multi-assignment
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-30 04:48:58 +02:00
Martin Schwenke
af5a42bf02 ctdb-common: Set version more obviously in IPv6 NA packet
Version is the top 4 bits of this field.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-30 04:48:58 +02:00
Martin Schwenke
ca0db67df9 ctdb-common: Clarify offset and packet length calculations
Calculate each offset from the beginning of the buffer and explicitly
use the sizes of structures.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-30 04:48:58 +02:00
Martin Schwenke
6b1e9a43dc ctdb-common: Use struct ether_arp to avoid manual offset calculations
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-30 04:48:58 +02:00
Martin Schwenke
e2a00feca3 ctdb-common: Be more careful with packet sizes
Ethernet packets must be at least 64 bytes.

For ARP the packet size was limited to 64 bytes.  This is probably OK
but the code might as well be a little more general.

For IPv6 NA there was no guarantee that the packet is at least 64
bytes.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-30 04:48:57 +02:00
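The size rule from commit e2a00feca3 above amounts to rounding short frames up to the minimum. The sketch below uses the 64-byte figure from the commit message; `ETH_MIN_FRAME_SKETCH` and `frame_buffer_len()` are hypothetical names, not CTDB identifiers.

```c
#include <stddef.h>

/* Minimum frame size, taken from the commit message. */
#define ETH_MIN_FRAME_SKETCH 64

/*
 * Return the buffer length to use for a frame whose headers and
 * payload occupy "used" bytes.  Short frames are rounded up to the
 * minimum; the caller is expected to zero the padding.
 */
static size_t frame_buffer_len(size_t used)
{
	if (used < ETH_MIN_FRAME_SKETCH) {
		return ETH_MIN_FRAME_SKETCH;
	}
	return used;
}
```

An ARP frame (14-byte Ethernet header plus 28-byte ARP payload) would be padded from 42 to 64 bytes, while a longer frame keeps its natural size.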
Martin Schwenke
87088af6e4 ctdb-tests: Add tests for ARP and IPv6 NA marshalling
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-30 04:48:57 +02:00
Martin Schwenke
39cfd51143 ctdb-common: Separate ARP and IPv6 NA marshalling code
This can be tested separately.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-30 04:48:57 +02:00
Martin Schwenke
50a6d15256 ctdb-common: Fix error handling when sending ARPs
There are numerous places in the code where errno can be lost causing
the wrong error to be printed by a caller.  Change ctdb_sys_send_arp()
to always return a useful errno on error instead of returning -1 and
sometimes having errno set correctly.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-30 04:48:57 +02:00
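The convention adopted in commit 50a6d15256 above, returning an errno value rather than -1, relies on saving errno before any cleanup call can overwrite it. The following is a generic sketch of that pattern; `read_first_byte()` is a hypothetical helper and has nothing to do with the real ctdb_sys_send_arp() code.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 0 on success or an errno value on failure, never -1. */
static int read_first_byte(const char *path, char *out)
{
	ssize_t n;
	int fd;
	int ret;

	fd = open(path, O_RDONLY);
	if (fd == -1) {
		return errno;
	}

	n = read(fd, out, 1);
	if (n == -1) {
		/* Save errno first: close() below may change it. */
		ret = errno;
		close(fd);
		return ret;
	}

	close(fd);

	/* An empty file is reported as an I/O error in this sketch. */
	return (n == 1) ? 0 : EIO;
}
```

Callers can then pass the return value straight to strerror() without worrying about what happened to errno in between.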
Martin Schwenke
2ebb25dfc8 ctdb-common: Factor out common ARP code
Finding the interface and the MAC address are obvious.  Might as well
set up the common parts of the destination address structure.

Continue to open the socket and find the MAC address first.  This
might seem odd because marshalling and other subsequent steps may
fail.  However, in the future this code might be optimised to open a
single socket to send ARPs for a list of addresses on each interface,
so don't change the logic.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-30 04:48:57 +02:00
Martin Schwenke
172b87cb1b ctdb-common: Initialise structures when declared
Instead of using ZERO_STRUCT().

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-30 04:48:57 +02:00
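As an illustration of the style change in commit 172b87cb1b above, a structure can be initialised at its declaration with a designated initialiser, which zero-initialises every member that is not named. `make_any_addr()` is a hypothetical function used only to show the idiom.

```c
#include <netinet/in.h>
#include <sys/socket.h>

static struct sockaddr_in make_any_addr(void)
{
	/* Members not named here are zero-initialised by the compiler. */
	struct sockaddr_in sin = { .sin_family = AF_INET };

	return sin;
}
```

This keeps the declaration and its initial state together, so there is no window in which the structure holds indeterminate values.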
Martin Schwenke
0927b38226 ctdb-tests: Add basic test to sanity check types in socket marshalling
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-30 04:48:57 +02:00
Martin Schwenke
7c361f4866 ctdb-common: Restore dropped copyright attributions
Commit fa94a49dbb accidentally dropped
some copyright attributions.  The original version of system_socket.c
was based on system_linux.c but many parts have been taken from
system_freebsd.c, which had these additional copyright attributions.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-30 04:48:57 +02:00
Martin Schwenke
032593487f ctdb-common: Fix CID 1414745 - Out-of-bounds access
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-30 04:48:56 +02:00
Martin Schwenke
b430a1ace6 ctdb-daemon: Do not retry connection to eventd
Confirmation is now received from eventd that it is accepting
connections, so this is no longer needed.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13592

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-30 04:48:56 +02:00
Martin Schwenke
62ec1ab147 ctdb-daemon: Wait for eventd to be ready before connecting
The current method of retrying the connection to eventd means that
messages get logged for each failure.

Instead, pass a pipe file descriptor to eventd and wait for it to
write 0 to the pipe to indicate that it is ready to accept client
connections.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13592

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-30 04:48:56 +02:00
Martin Schwenke
c446ae5e13 ctdb-daemon: Open eventd pipe earlier
The pipe will soon be needed earlier, so initialise it earlier.
Ensure the file descriptors are closed on error.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13592

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-30 04:48:56 +02:00
Martin Schwenke
e357b62fe5 ctdb-daemon: Improve error handling consistency
Other errors free argv, so do it here too.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13592

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-30 04:48:56 +02:00
Martin Schwenke
11ee92d1bf ctdb-event: Add support to eventd for the startup notification FD
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13592

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-30 04:48:56 +02:00
Martin Schwenke
dc6040c121 ctdb-common: Add support for sock daemon to notify of successful startup
The daemon writes 0 into the specified file descriptor when it is up
and listening.  This can be used to avoid loops in clients that
attempt to connect until they succeed.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13592

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-30 04:48:56 +02:00
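The startup notification described in commit dc6040c121 above (the daemon writes 0 to a file descriptor once it is listening) can be sketched with a pipe and fork(). This is a simplified stand-in, not the sock daemon code: `start_and_wait_ready()` is a hypothetical function, and the child only pretends to start listening.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a stand-in "daemon" that writes 0 to the pipe once it is
 * ready.  Returns 0 if the parent saw the notification, -1 otherwise.
 */
static int start_and_wait_ready(void)
{
	int fds[2];
	int status = -1;
	ssize_t n;
	pid_t pid;

	if (pipe(fds) != 0) {
		return -1;
	}

	pid = fork();
	if (pid == -1) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	if (pid == 0) {
		/* Child: a real daemon would bind and listen here. */
		int ready = 0;

		close(fds[0]);
		n = write(fds[1], &ready, sizeof(ready));
		_exit(n == (ssize_t)sizeof(ready) ? 0 : 1);
	}

	/* Parent: block until the child reports in, with no retry loop. */
	close(fds[1]);
	n = read(fds[0], &status, sizeof(status));
	close(fds[0]);
	waitpid(pid, NULL, 0);

	return (n == (ssize_t)sizeof(status) && status == 0) ? 0 : -1;
}
```

If the child exits before writing, the parent's read() returns 0 at end of file, so a failed startup is detected without polling the socket.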
Martin Schwenke
6fb80cbffb ctdb-tests: Check that no IPs are assigned when failover is disabled
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13589

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Aug 24 14:13:12 CEST 2018 on sn-devel-144
2018-08-24 14:13:12 +02:00
Martin Schwenke
55893bf8d2 ctdb-tests: Add an extra conf loading test case
This shows that config file loading continues in spite of unknown keys
if ignore_unknown is true.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13589

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-24 10:59:22 +02:00
Martin Schwenke
78aad7623e ctdb-doc: Switch tunable DisableIPFailover to a config option
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13589

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-24 10:59:21 +02:00
Martin Schwenke
929634126a ctdb-config: Switch tunable DisableIPFailover to a config option
Use the "failover:disabled" option instead.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13589

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-24 10:59:21 +02:00
Martin Schwenke
d003a41a9c ctdb-config: Integrate failover options into conf-tool
Update and add tests accordingly.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13589

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-24 10:59:21 +02:00
Martin Schwenke
893dd623df ctdb-failover: Add failover configuration options
Only a "disabled" option for now.  Not documented because it isn't
used yet.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13589

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-24 10:59:21 +02:00
Martin Schwenke
8e160d331a ctdb-tests: Drop DisableIPFailover simple test
This is about to become a config file option that can't be dynamically
changed at run-time, so drop this test for now.  This test will be added
back once the tunable becomes a config file option.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13589

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-24 10:59:21 +02:00
Martin Schwenke
914e9f22d8 ctdb-daemon: Pass DisableIPFailover tunable via environment variable
Preparation for obsoleting this tunable.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13589

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-24 10:59:21 +02:00
Martin Schwenke
21de59ab7f ctdb-common: Allow boolean configuration values to have yes/no values
This makes the new configuration style more consistent with the old one.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13589

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-24 10:59:21 +02:00
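A parser that accepts both spellings mentioned in commit 21de59ab7f above might look like the sketch below. The exact strings CTDB accepts and whether matching is case-insensitive are assumptions here, and `parse_bool_sketch()` is a hypothetical name.

```c
#include <stdbool.h>
#include <strings.h>

/*
 * Parse a boolean configuration value.  Returns true and sets *out if
 * the string is recognised, false otherwise.  Case-insensitive
 * matching is an assumption of this sketch.
 */
static bool parse_bool_sketch(const char *s, bool *out)
{
	if (strcasecmp(s, "true") == 0 || strcasecmp(s, "yes") == 0) {
		*out = true;
		return true;
	}
	if (strcasecmp(s, "false") == 0 || strcasecmp(s, "no") == 0) {
		*out = false;
		return true;
	}
	return false;
}
```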
Martin Schwenke
a9758f413d ctdb-doc: Switch tunable TDBMutexEnabled to a config option
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13589

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-24 10:59:21 +02:00
Martin Schwenke
f42486e891 ctdb-config: Switch tunable TDBMutexEnabled to a config option
Use the "database:tdb mutexes" option instead.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13589

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-24 10:59:21 +02:00
Martin Schwenke
8ddfc26d79 ctdb-doc: Add support for migrating tunables to ctdb.conf options
This will become common, so will be useful to have support for.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13589

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-24 10:59:20 +02:00
Martin Schwenke
43adcd717c ctdb-doc: Change option "no realtime" to "realtime scheduling"
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13589

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-24 10:59:20 +02:00
Martin Schwenke
17068e756b ctdb-config: Change option "no realtime" to "realtime scheduling"
Negative options can be confusing, so switch to a positive option.

This was supposed to be done months ago but was forgotten.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13589

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-24 10:59:20 +02:00
Martin Schwenke
64d4a7ae5a ctdb-doc: Handle boolean options in config migration more carefully
Values for ctdb.conf options are now returned by
get_ctdb_conf_option().  The main goal is to allow old boolean options
to be replaced by new logically negated options.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13589

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-24 10:59:20 +02:00
Martin Schwenke
d4afb60a24 ctdb-doc: Make config migration script notice removed CTDB_BASE option
This should never have been a user-level option, but some people used
it.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13589

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-24 10:59:20 +02:00
Martin Schwenke
48335725de ctdb-common: Fix aliasing issue in IPv6 checksum
Since commit 9c51b278b1 the compiler has
been able to inline the affected call to uint16_checksum().  Given
that the data (phdr) is being accessed by an incompatible
pointer (data) there is an aliasing problem when the call is inlined.
This results in incorrect behaviour with -O2/-O3 when compiling with
at least GCC 6, 7, and 8.

Fix this by making the types compatible.

Also fixes CID 1437604 (Reliance on integer endianness).  This is a
false positive because the uint16_checksum doesn't depend on the order
of the input uint16_t items.

https://bugzilla.samba.org/show_bug.cgi?id=13588

Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-24 10:59:20 +02:00
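Two points from commit 48335725de above can be illustrated in isolation: reading 16-bit words without going through an incompatible pointer type, and the fact that the sum does not depend on the order of the words. Note that this sketch uses memcpy() to avoid the aliasing problem, which is a different technique from the fix in the commit (making the types compatible). `sum16()` and `fold16()` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Sum a buffer as 16-bit words.  memcpy() is used so that the buffer
 * is never dereferenced through an incompatible pointer type.
 */
static uint32_t sum16(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint32_t sum = 0;
	uint16_t word;

	while (len >= sizeof(word)) {
		memcpy(&word, p, sizeof(word));
		sum += word;
		p += sizeof(word);
		len -= sizeof(word);
	}
	if (len == 1) {
		/* Treat a trailing odd byte as a zero-padded word. */
		word = 0;
		memcpy(&word, p, 1);
		sum += word;
	}

	return sum;
}

/* Fold the carries back in to get a 16-bit one's-complement sum. */
static uint16_t fold16(uint32_t sum)
{
	while (sum >> 16) {
		sum = (sum & 0xffff) + (sum >> 16);
	}
	return (uint16_t)sum;
}
```

Because addition is commutative, feeding the same words in a different order gives the same folded result, which is why the endianness report was considered a false positive.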
Swen Schillig
d0ed4a536e ctdb: calculate queue input buffer size correctly
The queue's input buffer size is calculated iteratively.
This can result in some jumping back and forth and
several memory allocation and free cycles.
This is time-consuming and unnecessary, because the required
memory size can be calculated right away.

Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Jeremy Allison <jra@samba.org>

Autobuild-User(master): Jeremy Allison <jra@samba.org>
Autobuild-Date(master): Sat Aug 18 04:58:05 CEST 2018 on sn-devel-144
2018-08-18 04:58:05 +02:00
Swen Schillig
8262efbc96 ctdb: Replace calculation of bytes to read from socket by MIN() macro
The number of bytes to read from the socket can be calculated more
simply by using the common MIN() macro.

Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Jeremy Allison <jra@samba.org>
2018-08-18 02:01:26 +02:00
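The simplification in commit 8262efbc96 above boils down to clamping one size by another. A minimal sketch, assuming a conventional MIN() definition and a hypothetical `bytes_to_read()` helper (the real queue code is not shown in this log):

```c
#include <stddef.h>

#ifndef MIN
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#endif

/*
 * Never read more than is available on the socket, and never more
 * than fits in the remaining buffer space.
 */
static size_t bytes_to_read(size_t available, size_t buffer_space)
{
	return MIN(available, buffer_space);
}
```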
David Disseldorp
4abf348ec4 ctdb: add expiry test for ctdb_mutex_ceph_rados_helper
Kill the ctdb_mutex_ceph_rados_helper with SIGKILL and then confirm
that the lock is automatically released following expiry.

Signed-off-by: David Disseldorp <ddiss@samba.org>
Reviewed-by: Samuel Cabrero <scabrero@suse.de>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): David Disseldorp <ddiss@samba.org>
Autobuild-Date(master): Thu Aug  9 16:26:36 CEST 2018 on sn-devel-144
2018-08-09 16:26:36 +02:00
David Disseldorp
ce289e89e5 ctdb_mutex_ceph_rados_helper: fix deadlock via lock renewals
RADOS locks without expiry persist indefinitely. This results in CTDB
deadlock during failover if the recovery master dies unexpectedly, as
subsequently elected recovery master nodes can't obtain the recovery
lock.
Avoid deadlock by using a lock expiration time (10s by default), and
renewing it periodically.

Bug: https://bugzilla.samba.org/show_bug.cgi?id=13540

Signed-off-by: David Disseldorp <ddiss@samba.org>
Reviewed-by: Samuel Cabrero <scabrero@suse.de>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-09 13:29:15 +02:00
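The expiry-and-renewal idea from commit ce289e89e5 above can be modelled without RADOS at all. The sketch below is a toy in-memory lease with caller-supplied timestamps; `struct lease_sketch` and its functions are hypothetical and do not use the librados API.

```c
#include <stdbool.h>
#include <time.h>

/* Toy lease: held until "expires" unless it is renewed first. */
struct lease_sketch {
	bool held;
	time_t expires;
};

static bool lease_active(const struct lease_sketch *l, time_t now)
{
	return l->held && now < l->expires;
}

/* Take the lease if nobody holds an unexpired one. */
static bool lease_acquire(struct lease_sketch *l, time_t now, time_t duration)
{
	if (lease_active(l, now)) {
		return false;
	}
	l->held = true;
	l->expires = now + duration;
	return true;
}

/* The holder calls this periodically to push the expiry forward. */
static void lease_renew(struct lease_sketch *l, time_t now, time_t duration)
{
	l->expires = now + duration;
}
```

A holder that dies stops renewing, so the lease lapses after the expiry time and another node can acquire it. Without an expiry, the second acquire would fail forever.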
David Disseldorp
91a89c1464 ctdb_mutex_ceph_rados_helper: rename timer_ev to ppid_timer_ev
In preparation for adding a lock refresh timer.

Signed-off-by: David Disseldorp <ddiss@samba.org>
Reviewed-by: Samuel Cabrero <scabrero@suse.de>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-09 13:29:15 +02:00
David Disseldorp
8d30fd5916 ctdb_mutex_ceph_rados_helper: use talloc destructor for cleanup
Signed-off-by: David Disseldorp <ddiss@samba.org>
Reviewed-by: Samuel Cabrero <scabrero@suse.de>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-09 13:29:15 +02:00
Samuel Cabrero
85706bd275 ctdb_mutex_ceph_rados_helper: Set SIGINT signal handler
Set a handler for SIGINT to release the lock.

Signed-off-by: Samuel Cabrero <scabrero@suse.de>
Reviewed-by: David Disseldorp <ddiss@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-09 13:29:15 +02:00
David Disseldorp
bd64af6b88 ctdb/build: link ctdb_mutex_ceph_rados_helper against ceph-common
ceph-common linkage is needed with new versions of Ceph.
Also respect the --libcephfs_dir=<path> parameter when provided.

Signed-off-by: David Disseldorp <ddiss@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-09 13:29:15 +02:00
Swen Schillig
c6b95c3672 ctdb: remove queue destructor as it isn't needed anymore
After

commit e097b7f8ff
Author: David Disseldorp <ddiss@suse.de>
Date:   Sun Jul 31 03:14:54 2011 +0200

    io: Make queue_io_read() safe for reentry

the destructor has no purpose anymore, therefore, remove it.

Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: David Disseldorp <ddiss@samba.org>

Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Mon Aug  6 11:37:32 CEST 2018 on sn-devel-144
2018-08-06 11:37:32 +02:00
Amitay Isaacs
f7b2e5eec5 ctdb-eventd: Fix CID 1438155
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13554

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Volker Lendecke <vl@samba.org>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Aug  3 11:14:01 CEST 2018 on sn-devel-144
2018-08-03 11:14:01 +02:00
Volker Lendecke
33d012c3ce ctdb: Fix a cut&paste error
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13554

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2018-08-03 08:24:06 +02:00