mirror of https://github.com/samba-team/samba.git synced 2025-01-20 14:03:59 +03:00

8088 Commits

Author SHA1 Message Date
Volker Lendecke
92b73cf0bf ctdb-tcp: Close inflight connecting TCP sockets after fork
Commit c68b6f96f26 changed the talloc hierarchy such that outgoing TCP sockets
while sitting in the async connect() syscall are not freed via
ctdb_tcp_shutdown() anymore, they are hanging off a longer-running structure.
Free this structure as well.

If an outgoing TCP socket leaks into a long-running child process (possibly the
recovery daemon), the destination node will never see this connection close.
With recent changes, incoming connections are not accepted while an existing
incoming connection is alive. So, as long as the recovery daemon keeps the
leaked socket open, the destination will discard all further connection
attempts and we will never again be able to connect successfully to the node
that is affected by this leak.
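
A minimal sketch of the idea, not the actual CTDB code: a forked child drops
any socket it inherited while connect() was still in flight, so only the parent
keeps the connection alive. The struct and function names are hypothetical, and
a plain close() stands in for the talloc-based cleanup that the real fix uses.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical stand-in for per-node state holding an outgoing socket. */
struct sketch_out_conn {
    int out_fd; /* -1 when no outgoing socket exists */
};

/*
 * Meant to be called in a child process right after fork(): close the
 * inherited descriptor and mark it as gone.
 */
void sketch_close_after_fork(struct sketch_out_conn *conn)
{
    assert(conn != NULL);
    if (conn->out_fd != -1) {
        close(conn->out_fd);
        conn->out_fd = -1;
    }
}
```

Once the child has closed its copy, the destination sees the connection go away
as soon as the parent closes its own descriptor.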

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14175
RN: Avoid communication breakdown on node reconnect

Signed-off-by: Martin Schwenke <martin@meltin.net>
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit a6d99d9e5c5bc58e6d56be7a6c1dbc7c8d1a882f)

Autobuild-User(v4-9-test): Karolin Seeger <kseeger@samba.org>
Autobuild-Date(v4-9-test): Wed Nov 20 14:58:33 UTC 2019 on sn-devel-144
2019-11-20 14:58:32 +00:00
Martin Schwenke
0dcb2efb8f ctdb-tcp: Drop tracking of file descriptor for incoming connections
This file descriptor is owned by the incoming queue.  It will be
closed when the queue is torn down.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14175

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit bf47bc18bb8a94231870ef821c0352b7a15c2e28)
2019-11-20 11:15:25 +00:00
Martin Schwenke
14406d123a ctdb-tcp: Avoid orphaning the TCP incoming queue
CTDB's incoming queue handling does not check whether an existing
queue exists, so can overwrite the pointer to the queue.  This used to
be harmless until commit c68b6f96f26664459187ab2fbd56767fb31767e0
changed the read callback to use a parent structure as the callback
data.  Instead of cleaning up an orphaned queue on disconnect, as
before, this will now free the new queue.

At first glance it doesn't seem possible that 2 incoming connections
from the same node could be processed before the intervening
disconnect.  However, the incoming connections and disconnect occur on
different file descriptors.  The queue can become orphaned on node A
when the following sequence occurs:

1. Node A comes up
2. Node A accepts an incoming connection from node B
3. Node B processes a timeout before noticing that the outgoing queue is writable
4. Node B tears down the outgoing connection to node A
5. Node B initiates a new connection to node A
6. Node A accepts an incoming connection from node B

Node A then processes the disconnect of the old incoming connection
from (2) but tears down the new incoming connection from (6).  This
then occurs until the originally affected node is restarted.

However, due to the number of outgoing connection attempts and
associated teardowns, this induces the same behaviour on the
corresponding incoming queue on all nodes that node A attempts to
connect to.  Therefore, other nodes become affected and need to be
restarted too.

As a result, the whole cluster probably needs to be restarted to
recover from this situation.

The problem can occur any time CTDB is started on a node.

The fix is to avoid accepting new incoming connections when a queue
for incoming connections is already present.  The connecting node will
simply retry establishing its outgoing connection.
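
The guard described above could look roughly like the following sketch. All
names here are hypothetical and the types are reduced to the bare minimum. The
sketch only illustrates "refuse a second incoming connection while a queue
already exists" and is not the code from the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified incoming queue: just remembers its fd. */
struct sketch_queue {
    int fd;
};

/* Hypothetical per-node transport state. */
struct sketch_tcp_node {
    struct sketch_queue *in_queue; /* NULL while no incoming connection is up */
};

/*
 * Install new_queue as the incoming queue unless one is already present.
 * Returns false when the connection is refused. The caller is then
 * expected to close the newly accepted socket, and the peer retries.
 */
bool sketch_try_accept(struct sketch_tcp_node *tnode,
                       struct sketch_queue *new_queue)
{
    assert(tnode != NULL && new_queue != NULL);
    if (tnode->in_queue != NULL) {
        return false; /* never overwrite (and so orphan) the existing queue */
    }
    tnode->in_queue = new_queue;
    return true;
}
```

Because the existing pointer is never overwritten, a later disconnect of the
old connection can only tear down the queue that belongs to it.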

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14175

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit d0baad257e511280ff3e5c7372c38c43df841070)
2019-11-20 11:15:25 +00:00
Martin Schwenke
20b823fc25 ctdb-tcp: Check incoming queue to see if incoming connection is up
This makes it consistent with the reverse case.  Also, in_fd will soon
be removed.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14175

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit e62b3a05a874db13a848573d2e2fb1c157393b9c)
2019-11-20 11:15:25 +00:00
Amitay Isaacs
6024163e17 ctdb-vacuum: Process all records not deleted on a remote node
This currently skips the last record.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14147
RN: Avoid potential data loss during recovery after vacuuming error

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(cherry picked from commit 33f1c9d9654fbdcb99c23f9d23c4bbe2cc596b98)
2019-10-16 12:16:21 +00:00
Martin Schwenke
9a5bdc6c9e ctdb-tools: Stop deleted nodes from influencing ctdb nodestatus exit code
Deleted nodes should simply be ignored.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14129
RN: Stop deleted nodes from influencing ctdb nodestatus exit code

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit 32b5ceb31936ec5447362236c1809db003561d29)

Autobuild-User(v4-9-test): Karolin Seeger <kseeger@samba.org>
Autobuild-Date(v4-9-test): Fri Sep 20 14:09:11 UTC 2019 on sn-devel-144
2019-09-20 14:09:11 +00:00
Ralph Boehme
b9f1be5cf4 ctdb: fix compilation on systems with glibc robust mutexes
On older systems like SLES 11 without POSIX robust mutexes, but with glibc robust
mutexes where all the functions are available but have a "_np" suffix,
compilation fails in:

ctdb/tests/src/test_mutex_raw.c.239.o: In function `worker':
/root/samba-4.10.6/bin/default/../../ctdb/tests/src/test_mutex_raw.c:129: undefined reference to `pthread_mutex_consistent'
ctdb/tests/src/test_mutex_raw.c.239.o: In function `main':
/root/samba-4.10.6/bin/default/../../ctdb/tests/src/test_mutex_raw.c:285: undefined reference to `pthread_mutex_consistent'
/root/samba-4.10.6/bin/default/../../ctdb/tests/src/test_mutex_raw.c:332: undefined reference to `pthread_mutexattr_setrobust'
/root/samba-4.10.6/bin/default/../../ctdb/tests/src/test_mutex_raw.c:363: undefined reference to `pthread_mutex_consistent'
collect2: ld returned 1 exit status

This could be fixed by using libreplace system/threads.h instead of pthread.h
directly, but as there has been a desire to keep test_mutex_raw.c standalone and
compilable without external dependencies other than libc and libpthread, make the
tool developer build only. This should get the average user over the cliff.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14038
RN: Fix compiling ctdb on older systems lacking POSIX robust mutexes

Signed-off-by: Ralph Boehme <slow@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(cherry picked from commit f5388f97792ac2d7962950dad91aaf8ad49bceaa)

Autobuild-User(v4-9-test): Karolin Seeger <kseeger@samba.org>
Autobuild-Date(v4-9-test): Thu Sep  5 16:12:34 UTC 2019 on sn-devel-144
2019-09-05 16:12:34 +00:00
Martin Schwenke
745052cb6b ctdb-recoverd: Fix typo in previous fix
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14085

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Aug 27 15:29:11 UTC 2019 on sn-devel-184

(cherry picked from commit 8190993d99284162bd8699780248bb2edfec2673)
2019-09-03 12:05:40 +00:00
Martin Schwenke
89b08e4fbc ctdb-tests: Clear deleted record via recovery instead of vacuuming
This test has been flapping because sometimes the record is not
vacuumed within the expected time period, perhaps even because the
check for the record can interfere with vacuuming.  However, instead
of waiting for vacuuming the record can be cleared by doing a
recovery.  This should be much more reliable.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14085
RN: Fix flapping CTDB tests

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Wed Aug 21 13:06:57 UTC 2019 on sn-devel-184

(backported from commit 71ad473ba805abe23bbe6c1a1290612e448e73f3)
Signed-off-by: Martin Schwenke <martin@meltin.net>
2019-09-03 12:05:40 +00:00
Martin Schwenke
4cbd3cd970 ctdb-tests: Strengthen volatile DB traverse test
Check the record count more often, from multiple nodes.  Add a case
with multiple records.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14085

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit ca4df06080709adf0cbebc95b0a70b4090dad5ba)
2019-09-03 12:05:40 +00:00
Martin Schwenke
3801c9582b ctdb-recoverd: Only check for LMASTER nodes in the VNN map
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14085

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit 5d655ac6f2ff82f8f1c89b06870d600a1a3c7a8a)
2019-09-03 12:05:39 +00:00
Martin Schwenke
68cc58437f ctdb-tests: Don't retrieve the VNN map from target node for notlmaster
Use the VNN map from the node running node_has_status().

This means that

  wait_until_node_has_status 1 notlmaster 10 0

will run "ctdb status" on node 0 and check (for up to 10 seconds) if
node 1 is in the VNN map.

If the LMASTER capability has been dropped on node 1 then the above
will wait for the VNN map to be updated on node 0.  This will happen
as part of the recovery that is triggered by the change of LMASTER
capability.  The next command will then only be able to attach to
$TESTDB after the recovery is complete thus guaranteeing a sane state
for the test to continue.

This stops simple/79_volatile_db_traverse.sh from going into recovery
during the traverse or at some other inconvenient time.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14085

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit 53daeb2f878af1634a26e05cb86d87e2faf20173)
2019-09-03 12:05:39 +00:00
Martin Schwenke
31066fde8c ctdb-tests: Handle special cases first and return
All the other cases involve matching bits.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14085

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit bff1a3a548a2cace997b767d78bb824438664cb7)
2019-09-03 12:05:39 +00:00
Martin Schwenke
c3f2c55320 ctdb-tests: Inline handling of recovered and notlmaster statuses
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14085

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit bb59073515ee5f7886b5d9a20d7b2805857c2708)
2019-09-03 12:05:38 +00:00
Martin Schwenke
cf39c0fc3b ctdb-tests: Drop unused node statuses frozen/unfrozen
Silently drop unused local variable mpat.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14085

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit 9b09a87326af28877301ad27bcec5bb13744e2b6)
2019-09-03 12:05:38 +00:00
Martin Schwenke
fd8a55bb3f ctdb-tests: Reformat node_has_status()
Re-indent and drop non-POSIX left-parenthesis from case labels.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14085

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit 52227d19735a3305ad633672c70385f443f222f0)
2019-09-03 12:05:37 +00:00
Martin Schwenke
fcf29cda0e ctdb-daemon: Make node inactive in the NODE_STOP control
Currently some of this is supported by a periodic check in the
recovery daemon's main_loop(), which notices the flag change, sets
recovery mode active and freezes databases.  If STOP_NODE returns
immediately then the associated recovery can complete and the node can
be continued before databases are actually frozen.

Instead, immediately do all of the things that make a node inactive.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14087
RN: Stop "ctdb stop" from completing before freezing databases

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Aug 20 08:32:27 UTC 2019 on sn-devel-184

(cherry picked from commit e9f2e205ee89f4f3d6302cc11b4d0eb2efaf0f53)

Autobuild-User(v4-9-test): Karolin Seeger <kseeger@samba.org>
Autobuild-Date(v4-9-test): Wed Aug 28 12:04:13 UTC 2019 on sn-devel-144
2019-08-28 12:04:13 +00:00
Martin Schwenke
fa705bc7de ctdb-daemon: Drop unused function ctdb_local_node_got_banned()
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14087

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit 91ac4c13d8472955d1f04bd775ec4b3ff8bf1b61)
2019-08-28 07:36:30 +00:00
Martin Schwenke
c2ee9bbeee ctdb-daemon: Switch banning code to use ctdb_node_become_inactive()
There's no reason to avoid immediately setting recovery mode to active
and initiating freeze of databases.

This effectively reverts the following commits:

  d8f3b490bbb691c9916eed0df5b980c1aef23c85
  b4357a79d916b1f8ade8fa78563fbef0ce670aa9

The latter is now implemented using a control, resulting in looser
coupling.

See also the following commit:

  f8141e91a693912ea1107a49320e83702a80757a

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14087

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit 0f5f7b7cf4e970f3f36c5e0b3d09e710fe90801a)
2019-08-28 07:36:30 +00:00
Martin Schwenke
13780a3ee0 ctdb-daemon: Factor out new function ctdb_node_become_inactive()
This is a superset of ctdb_local_node_got_banned() so will replace
that function, and will also be used in the NODE_STOP control.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14087

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit a42bcaabb63722411bee52b80cbfc795593defbc)
2019-08-28 07:36:30 +00:00
Martin Schwenke
f4442942fb ctdb-tcp: Mark node as disconnected if incoming connection goes away
To make it easy to pass the node data to the upcall, the private data
for ctdb_tcp_read_cb() needs to be changed from tnode to node.

RN: Avoid marking a node as connected before it can receive packets
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14084

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Fri Aug 16 22:50:35 UTC 2019 on sn-devel-184

(cherry picked from commit 73c850eda4209b688a169aeeb20c453b738cbb35)
2019-08-28 07:36:30 +00:00
Martin Schwenke
1e45ab3c23 ctdb-tcp: Only mark a node connected if both directions are up
Nodes are currently marked as up if the outgoing connection is
established.  However, if the incoming connection is not yet
established then this node could send a request where the replying
node can not queue its reply.  Wait until both directions are up
before marking a node as connected.
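
As a rough illustration, assume (hypothetically) that each direction is
represented by a queue pointer that stays NULL while that direction is down.
The names below are invented for the sketch and are not taken from the CTDB
sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_queue; /* opaque: only its presence matters here */

/* Hypothetical per-node transport state with one queue per direction. */
struct sketch_tcp_node {
    struct sketch_queue *out_queue; /* non-NULL once the outgoing side is up */
    struct sketch_queue *in_queue;  /* non-NULL once the incoming side is up */
};

/* A node counts as connected only when both directions are established. */
bool sketch_both_directions_up(const struct sketch_tcp_node *tnode)
{
    assert(tnode != NULL);
    return tnode->out_queue != NULL && tnode->in_queue != NULL;
}
```

A caller would run this check whenever either direction changes state and
only then report the node as connected.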

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14084

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit 8c98c10f242bc722beffc711e85c0e4f2e74cd57)
2019-08-28 07:36:30 +00:00
Martin Schwenke
9155ad23d4 ctdb-tcp: Create outbound queue when the connection becomes writable
Since commit ddd97553f0a8bfaada178ec4a7460d76fa21f079
ctdb_queue_send() doesn't queue a packet if the connection isn't yet
established (i.e. when fd == -1).  So, don't bother creating the
outbound queue during initialisation but create it when the connection
becomes writable.

Now the presence of the queue indicates that the outbound connection
is up.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14084

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit 7f4854d9643a096a6d8a354fcd27b7c6ed24a75e)
2019-08-28 07:36:30 +00:00
Martin Schwenke
f2ce6c745c ctdb-tcp: Use TALLOC_FREE()
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14084

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit d80d9edb4dc107b15a35a39e5c966a3eaed6453a)
2019-08-28 07:36:30 +00:00
Martin Schwenke
b21bc19bae ctdb-tcp: Move incoming fd and queue into struct ctdb_tcp_node
This makes it easy to track both incoming and outgoing connectivity
states.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14084

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit c68b6f96f26664459187ab2fbd56767fb31767e0)
2019-08-28 07:36:30 +00:00
Martin Schwenke
17f1a95203 ctdb-tcp: Rename fd -> out_fd
in_fd is coming soon.

Fix coding style violations in the affected and adjacent lines.
Modernise some debug macros and make them more consistent (e.g. drop
logging of errno when strerror(errno) is already logged).

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14084

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit c06620169fc178ea6db2631f03edf008285d8cf2)
2019-08-28 07:36:30 +00:00
Martin Schwenke
a8dd1a0577 ctdb-daemon: Add function ctdb_ip_to_node()
This is the core logic from ctdb_ip_to_pnn(), so re-implement that
function using ctdb_ip_to_node().

Something similar (ctdb_ip_to_nodeid()) was recently removed in commit
010c1d77cd7e192b1fff39b7b91fccbdbbf4a786 because it wasn't required.
Now there is a use case.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14084

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit 3acb8e9d1c854b577d6be282257269df83055d31)
2019-08-28 07:36:30 +00:00
Martin Schwenke
a309b862e8 ctdb-daemon: Replace function ctdb_ip_to_nodeid() with ctdb_ip_to_pnn()
Node ID is a poorly defined concept, indicating the slot in the node
map where the IP address was found.  This signed value also ends up
compared to num_nodes, which is unsigned, producing unwanted warnings.

Just return the PNN because this is what both callers really want.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit 010c1d77cd7e192b1fff39b7b91fccbdbbf4a786)
2019-08-28 07:36:30 +00:00
Rafael David Tinoco
de909ff886 ctdb-config: depend on /etc/ctdb/nodes file
CTDB should start as a disabled unit (systemd) in most
distributions and, when trying to enable it for the first time, the user
should get an "unconfigured", or similar, error.

Depending on the /etc/ctdb/nodes file gives the end user a clear indication
of what is needed to get the cluster up and running. It should
work like the previous ENABLED=NO variables in SysV-like initialization
scripts.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14017
RN: ctdb.service should only start if /etc/ctdb/nodes is not empty
Signed-off-by: Rafael David Tinoco <rafaeldtinoco@ubuntu.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit c5803507df7def388edcd5b6cbfee30cd217b536)
2019-08-08 07:32:21 +00:00
Rafael David Tinoco via samba-technical
44b5168845 ctdb-scripts: Fix tcp_tw_recycle existence check
net.ipv4.tcp_tw_recycle was removed in Linux 4.12 but it still
makes sense to check for its existence. Unfortunately, the current check does
not test for the existence of the procfs file. This commit fixes the issue.
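
The actual check lives in a shell script. Purely as an illustration of "test
that the procfs file exists before using the setting", here is the same idea
rendered in C. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <unistd.h>

/* Return true if the given procfs (or any other) path exists. */
bool sketch_sysctl_file_exists(const char *path)
{
    assert(path != NULL);
    return access(path, F_OK) == 0;
}
```

A caller would pass "/proc/sys/net/ipv4/tcp_tw_recycle" and skip any handling
of the setting when the function returns false.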

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13984

Signed-off-by: Rafael David Tinoco <rafaeldtinoco@ubuntu.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Andreas Schneider <asn@cryptomilk.org>
Autobuild-Date(master): Tue Jun  4 23:31:24 UTC 2019 on sn-devel-184

(cherry picked from commit 843fbb1207ee7ac84f3282974b66b9290d8da0ac)
2019-06-21 07:56:21 +00:00
Amitay Isaacs
8b52325985 ctdb-common: Fix memory leak in run_proc
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13943

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue May 14 08:59:03 UTC 2019 on sn-devel-184

(cherry picked from commit b1f4c86eea022999d5439e4a6ef3494fe41479b6)

Autobuild-User(v4-9-test): Karolin Seeger <kseeger@samba.org>
Autobuild-Date(v4-9-test): Fri May 17 10:56:19 UTC 2019 on sn-devel-144
2019-05-17 10:56:19 +00:00
Martin Schwenke
5419978537 ctdb-common: Fix memory leak
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13943

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit 30bc6e2529cdd444d4ec7902844c3a6fb0858090)
2019-05-17 07:18:32 +00:00
Martin Schwenke
76c7302105 ctdb-recoverd: Fix memory leak
state is always freed before exiting this function, so allocate fde
off it instead of long-lived ctdb context.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13943

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit 6a2941e2a9fd6ab2d5b8dbac042b61a7b1b0b914)
2019-05-17 07:18:32 +00:00
Andreas Schneider
925871f580 ctdb:common: Do not print NULL if we don't get a sockpath
sock_socket_start_recv() might not fill sockpath if we return early.

Found by GCC 9.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13937

Signed-off-by: Andreas Schneider <asn@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
(cherry picked from commit 830cb7e67568de5f3ce359cb6af3be8ab545c824)
2019-05-17 07:18:31 +00:00
Martin Schwenke
1c2c081f43 ctdb-daemon: Never use 0 as a client ID
ctdb_control_db_attach() and ctdb_control_db_detach() assume that any
control with client ID 0 comes from another daemon and treat it
specially.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13930

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit 8663e0a64fbdb9ea16babbfe87d6f5d7a7b72bbd)
2019-05-17 07:18:30 +00:00
Martin Schwenke
24d70220b2 ctdb-tests: Fix logic error in simple ctdb reloadips test
There is a chance that restoring IP addresses to the test node will
result in different IP addresses being assigned to that node.
Removing a single IP address may then fail (or be a no-op) if it is
done after the restore.

So, swap the single IP address removal to happen first, then restore,
then remove all IP addresses.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13924

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit dc89db8ca6aadd4a9f7e8a85843c53709d04587c)
2019-05-17 07:18:30 +00:00
Martin Schwenke
9f679ba14d ctdb-tests: Make ctdb reloadips tests more reliable
ctdb reloadips will fail if it can't disable takeover runs.  The most
likely reason for this is that there is already a takeover run in
progress.  We can't predict when this will happen, so retry if this
occurs.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13924

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit 8be4ee1a28d5c037955832b6f827d40f28f02796)
2019-05-17 07:18:30 +00:00
Martin Schwenke
0ffba5145c ctdb-tests: Capture output in $out on failure as well
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13924

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit cf00db40355b49443263187f9d97934f91287e51)
2019-05-17 07:18:30 +00:00
Martin Schwenke
1eb5d2e4fc ctdb-tests: Don't clean up test var directory in autotest target
If the directory is always cleaned up then it is not possible to look
at daemon logs to debug test failures.

This target is only really used by autobuild.py, which (optionally)
cleans up the parent directory anyway.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13924

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue May  7 06:56:01 UTC 2019 on sn-devel-184

(cherry picked from commit 5a9e338330fe136908a3a17a5df81c054c5cc5b0)
2019-05-17 07:18:30 +00:00
Martin Schwenke
15e5d62b3d ctdb-tests: Fix usage message
Since commit 0e9ead8f28fced3ebfa888786a1dc5bb59e734a3 daemons have
been shut down after each test, so this option no longer has anything
to do with killing daemons.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13924

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit a2ab6485e027ebb13871c7d83b7626ac5c9b98c0)
2019-05-17 07:18:30 +00:00
Martin Schwenke
814471f46e ctdb-tests: Wait to allow database attach/detach to take effect
Sometimes the detach test fails:

  Check detaching single test database detach_test1.tdb
  BAD: database detach_test1.tdb is still attached
  Number of databases:4
  dbid:0x5ae995ee name:detach_test4.tdb path:tests/var/simple/node.0/db/volatile/detach_test4.tdb.0
  dbid:0xd84cc13c name:detach_test3.tdb path:tests/var/simple/node.0/db/volatile/detach_test3.tdb.0
  dbid:0x8e8e8cef name:detach_test2.tdb path:tests/var/simple/node.0/db/volatile/detach_test2.tdb.0
  dbid:0xc62491f4 name:detach_test1.tdb path:tests/var/simple/node.0/db/volatile/detach_test1.tdb.0
  Number of databases:3
  dbid:0x5ae995ee name:detach_test4.tdb path:tests/var/simple/node.1/db/volatile/detach_test4.tdb.1
  dbid:0xd84cc13c name:detach_test3.tdb path:tests/var/simple/node.1/db/volatile/detach_test3.tdb.1
  dbid:0x8e8e8cef name:detach_test2.tdb path:tests/var/simple/node.1/db/volatile/detach_test2.tdb.1
  Number of databases:4
  dbid:0x5ae995ee name:detach_test4.tdb path:tests/var/simple/node.2/db/volatile/detach_test4.tdb.2
  dbid:0xd84cc13c name:detach_test3.tdb path:tests/var/simple/node.2/db/volatile/detach_test3.tdb.2
  dbid:0x8e8e8cef name:detach_test2.tdb path:tests/var/simple/node.2/db/volatile/detach_test2.tdb.2
  dbid:0xc62491f4 name:detach_test1.tdb path:tests/var/simple/node.2/db/volatile/detach_test1.tdb.2
  *** TEST COMPLETED (RC=1) AT 2019-04-27 03:35:40, CLEANING UP...

When issued from a client, the detach control re-broadcasts itself
asynchronously to all nodes and then returns success.  The controls to
some nodes to do the actual detach may still be in flight when success
is returned to the client.  Therefore, the test should wait for a few
seconds to allow the asynchronous controls to complete.

The same is true for the attach control, so work around the problem in
the attach test too.

An alternative is to make the attach and detach controls synchronous
by avoiding the broadcast and waiting for the results of the
individual controls sent to the nodes.  However, a simple
implementation would involve adding new nested event loops.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13924

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit 3cb53a7a05409925024d6a67bcfaeb962d896e0b)
2019-05-17 07:18:30 +00:00
Martin Schwenke
3f104bd0db ctdb-tests: Avoid bulk output in $out, prefer $outfile
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13924

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit 066cc5b0c561464ed08890d9aa1a1a55b545e9cc)
2019-05-17 07:18:30 +00:00
Martin Schwenke
b594f5161d ctdb-tests: Make try_command_on_node less error-prone
This sometimes fails, apparently due to a cat process in onnode
getting EAGAIN.  The conclusion is that tests that process large
amounts of output should not depend on a sub-shell delivering that
output into a shell variable.

Change try_command_on_node() to leave all of the output in file
$outfile and just put the first 1KB into $out.  $outfile is removed
after each test completes.

Change the implementation of sanity_check_output() to use $outfile
instead of $out.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13924

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit 9d02452a24625df5f62fd6d45a16effe2fa45fbe)
2019-05-17 07:18:29 +00:00
Martin Schwenke
7c97bc8328 ctdb-tests: Change sanity_check_output() to internally use $out
All callers currently pass $out.  Global variable $out is used
in many other places so use it here to simplify the interface and make
future changes simpler.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13924

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit 7c3819d1ac264acf998f426e0cef7f6211e0ddee)
2019-05-17 07:18:29 +00:00
Martin Schwenke
30b5d837d5 ctdb-tests: Extend test to cover ctdb rddumpmemory
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13923

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit 8108b3134c017c22d245fc5b2207a88d44ab0dd2)
2019-05-17 07:18:29 +00:00
Martin Schwenke
08e229df43 ctdb-tools: Fix ctdb dumpmemory to avoid printing trailing NUL
Fix ctdb rddumpmemory too.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13923

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit f78d9388fb459dc83fafb4da6e683e3137ad40e1)
2019-05-17 07:18:29 +00:00
Amitay Isaacs
945a41d384 ctdb-common: Avoid race between fd and signal events
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13895

In run_proc, there was an implicit assumption that when a process exits,
fd event (pipe between parent and child) would be processed first and
signal event (SIGCHLD for the child) would be processed later.

However, that is not the case.  SIGCHLD can be received asynchronously
any time even when the pipe data has not fully been read.  This causes
run_proc to miss some of the output from child process in tests.

When SIGCHLD is being processed, if the pipe between parent and child is
still open, then do an explicit read from the pipe to ensure we read any
data still in the pipe before closing the pipe.
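
A minimal sketch of such an explicit read follows. It is not the run_proc
code. The function name is hypothetical, and it assumes that the write end of
the pipe has already been closed (the child has exited) or that the read end
is non-blocking, so the loop cannot block indefinitely.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read whatever is still buffered in the pipe into buf, stopping at
 * EOF, at a full buffer, or at the first error other than EINTR
 * (including EAGAIN on a non-blocking fd). Returns the bytes read.
 */
ssize_t sketch_drain_pipe(int fd, char *buf, size_t buflen)
{
    size_t total = 0;

    assert(buf != NULL);
    while (total < buflen) {
        ssize_t n = read(fd, buf + total, buflen - total);
        if (n > 0) {
            total += (size_t)n;
            continue;
        }
        if (n == 0) {
            break; /* EOF: writer closed its end */
        }
        if (errno == EINTR) {
            continue; /* interrupted by a signal: retry */
        }
        break; /* EAGAIN or a real error: stop draining */
    }
    return (ssize_t)total;
}
```

A SIGCHLD handler path would call this before closing the read end, so that
output written just before the child exited is not lost.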

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Apr 12 08:19:29 UTC 2019 on sn-devel-144

(cherry picked from commit 289201277cd983b27cdfd5376c607eab112b4082)

Autobuild-User(v4-9-test): Karolin Seeger <kseeger@samba.org>
Autobuild-Date(v4-9-test): Mon Apr 15 12:55:46 UTC 2019 on sn-devel-144
2019-04-15 12:55:46 +00:00
Martin Schwenke
d9c47cb86e ctdb-daemon: Revert "We can not assume that just because we could complete a TCP handshake"
We also can not assume that nodes can be marked as connected via only
the keepalive mechanism.  Keepalives are not sent to disconnected
nodes so, in the absence of other packets (e.g. broadcasts), 2 nodes
may never become marked as connected to each other.

Revert to marking nodes as connected in the TCP transport code.  If a
connection is to a non(-operational) ctdbd then it will revert to
disconnected after a short while and may actually flap.  This should
be rare.

This reverts commit 66919db3d7ab1e091223faf515b183af8bfddc83.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13888

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(cherry picked from commit 38dc6d11a26c2e9a2cae7927321f2216ceb1c5ec)
2019-04-15 08:28:11 +00:00
Martin Schwenke
49fa08814e ctdb-scripts: Update statd-callout to try several configuration files
The alternative seems to be to try something via CTDB_NFS_CALLOUT.
That would be complicated and seems like overkill for something this
simple.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13860

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@samba.org>
(cherry picked from commit a2bd4085896804ee2da811e17f18c78a5bf4e658)
2019-04-12 07:57:11 +00:00
Martin Schwenke
dae0e8ec96 ctdb-scripts: Allow load_system_config() to take multiple alternatives
The situation for NFS config has got more complicated and is probably
broken in statd-callout on Debian-like systems at the moment.  Allow
several alternative configuration names to be tried.  Stop after the
first that is found and loaded.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13860

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@samba.org>
(cherry picked from commit 0d67ea5fcca766734ecc73ad6b0139f7c13a15c5)
2019-04-12 07:57:11 +00:00