IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
The current constant value doesn't respect CTDB_TEST_MODE/CTDB_BASE.
Instead use the path module to allow automatic listening in test mode
with local daemons.
A single node can be tested with local daemons, using something like:
$ tests/local_daemons.sh foo setup -n 1 -C "node address"
$ grep "node address" foo/node.0/ctdb.conf
# node address = 127.0.0.1
$ tests/local_daemons.sh foo start all
$ tests/local_daemons.sh foo print-log 0 | grep -i chose
... node.0 ctdbd[24546]: ctdb chose network address 127.0.0.1:4379
The trick is that commenting out the node address in ctdb.conf means
the chosen node address is the first one from the nodes file that
allows bind/listen. In this case it is the only line.
The following ensures that automatic listening works for a node that
isn't the first:
$ cat >mynodes
192.168.1.1
127.0.0.1
$ tests/local_daemons.sh foo setup -n 2 -N mynodes -C "node address"
$ grep "node address" foo/node.1/ctdb.conf
# node address = 127.0.0.1
$ tests/local_daemons.sh foo start 1
$ tests/local_daemons.sh foo print-log 1 | grep -i chose
[...] node.1 ctdbd[22787]: ctdb chose network address 127.0.0.1:4379
Note that the first address isn't local on this host, so will always
fail.
So, doing the above and starting both nodes yields...
...
$ tests/local_daemons.sh foo start 1
$ sleep 3; tests/local_daemons.sh foo start 0
$ tests/local_daemons.sh foo print-log all | grep -i 'chose\|bind'
[...] node.1 ctdbd[26351]: ctdb chose network address 127.0.0.1:4379
[...] node.0 ctdbd[26438]: ctdb_tcp_listen_addr: Failed to bind() to socket - Address already in use (98)
[...] node.0 ctdbd[26438]: Unable to bind to any node address - giving up
... as expected.
It would be nice to add tests for this, but we don't really have
infrastructure for that. At least manual testing shows, for the
obvious cases, the previous commits didn't break anything. :-)
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
The node name is already constructed when the nodes file is loaded, so
just copy the node name.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
The only place the outgoing connection needs to be stopped is when
there is a timeout when waiting for the connection to become writable.
Add a new function ctdb_tcp_node_connect_timeout() to handle this
case.
All of the other cases are attempts to establish a new outgoing
connection (initial attempt, retry after an error or disconnect, ...)
so drop stopping the connection in those cases.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14295
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu Mar 12 05:29:20 UTC 2020 on sn-devel-184
No change in behaviour. This makes the code self-documenting.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14295
Signed-off-by: Ralph Boehme <slow@samba.org>
Signed-off-by: Martin Schwenke <martin@meltin.net>
No change in behaviour. This makes the code self-documenting.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14295
Signed-off-by: Ralph Boehme <slow@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
The node dead upcall has already restarted the outgoing connection.
There's no need to repeat it.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14295
Signed-off-by: Ralph Boehme <slow@samba.org>
Signed-off-by: Martin Schwenke <martin@meltin.net>
ctdb_tcp_tnode_cb() is called when we receive data on the outgoing connection.
This can happen when we get an EOF on the connection because the other side as
closed. In this case data will be NULL.
It would also be called if we received data from the peer. In this case data
will not be NULL.
The latter case is a fatal error though and we already call
ctdb_tcp_stop_connection() for this case as well, which means even though the
node is not fully connected anymore, by not calling the node_dead() upcall
NODE_FLAGS_DISCONNECTED will not be set.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14295
Signed-off-by: Ralph Boehme <slow@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Since commit 77deaadca8, a nodeA which
had previously accepted a connection from nodeB (where nodeB dies
e.g. as as result of fencing) when nodeB attempts to connect again
after restarting is always rejected with
ctdb_listen_event: Incoming queue active, rejecting connection from w.x.y.z
messages.
Consolidate dead node handling in the TCP restart handling.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14295
Signed-off-by: Noel Power <noel.power@suse.com>
Reviewed-by: Ralph Boehme <slow@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
If we can't bind the local end of an outgoing connection then
something has gone wrong. Retrying is better than failing into a
zombie state. The interface might come back up and/or the address my
be reconfigured.
While here, do the same thing for the other (potentially transient)
failures.
The unknown address family failure is special but just handle it via a
retry. Technically it can't happen because the node address parsing
can only return values with address family AF_INET or AF_INET6.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14274
Reported-by: 耿纪超 <gengjichao@jd.com>
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The type-checking is superfluous and gets in the way of readability.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Thu Nov 14 03:45:44 UTC 2019 on sn-devel-184
Commit c68b6f96f2 changed the talloc hierarchy such that outgoing TCP sockets
while sitting in the async connect() syscall are not freed via
ctdb_tcp_shutdown() anymore, they are hanging off a longer-running structure.
Free this structure as well.
If an outgoing TCP socket leaks into a long-running child process (possibly the
recovery daemon), this connection will never be closed as seen by the
destination node. Because with recent changes incoming connections will not be
accepted as long as any incoming connection is alive, with that socket leak
into the recovery daemon we will never again be able to successfully connect to
the node that is affected by this leak. Further attempts to connect will be
discarded by the destination as long as the recovery daemon keeps this socket
alive.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14175
RN: Avoid communication breakdown on node reconnect
Signed-off-by: Martin Schwenke <martin@meltin.net>
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
We have a macro for NULLing out the pointer
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Nov 8 01:35:11 UTC 2019 on sn-devel-184
This file descriptor is owned by the incoming queue. It will be
closed when the queue is torn down.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14175
RN: Avoid communication breakdown on node reconnect
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
CTDB's incoming queue handling does not check whether an existing
queue exists, so can overwrite the pointer to the queue. This used to
be harmless until commit c68b6f96f2
changed the read callback to use a parent structure as the callback
data. Instead of cleaning up an orphaned queue on disconnect, as
before, this will now free the new queue.
At first glance it doesn't seem possible that 2 incoming connections
from the same node could be processed before the intervening
disconnect. However, the incoming connections and disconnect occur on
different file descriptors. The queue can become orphaned on node A
when the following sequence occurs:
1. Node A comes up
2. Node A accepts an incoming connection from node B
3. Node B processes a timeout before noticing that outgoing the queue is writable
4. Node B tears down the outgoing connection to node A
5. Node B initiates a new connection to node A
6. Node A accepts an incoming connection from node B
Node A processes then the disconnect of the old incoming connection
from (2) but tears down the new incoming connection from (6). This
then occurs until the originally affected node is restarted.
However, due to the number of outgoing connection attempts and
associated teardowns, this induces the same behaviour on the
corresponding incoming queue on all nodes that node A attempts to
connect to. Therefore, other nodes become affected and need to be
restarted too.
As a result, the whole cluster probably needs to be restarted to
recover from this situation.
The problem can occur any time CTDB is started on a node.
The fix is to avoid accepting new incoming connections when a queue
for incoming connections is already present. The connecting node will
simply retry establishing its outgoing connection.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14175
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This makes it consistent with the reverse case. Also, in_fd will soon
be removed.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14175
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
To make it easy to pass the node data to the upcall, the private data
for ctdb_tcp_read_cb() needs to be changed from tnode to node.
RN: Avoid marking a node as connected before it can receive packets
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14084
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Fri Aug 16 22:50:35 UTC 2019 on sn-devel-184
Nodes are currently marked as up if the outgoing connection is
established. However, if the incoming connection is not yet
established then this node could send a request where the replying
node can not queue its reply. Wait until both directions are up
before marking a node as connected.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14084
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Since commit ddd97553f0
ctdb_queue_send() doesn't queue a packet if the connection isn't yet
established (i.e. when fd == -1). So, don't bother creating the
outbound queue during initialisation but create it when the connection
becomes writable.
Now the presence of the queue indicates that the outbound connection
is up.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14084
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This makes it easy to track both incoming and outgoing connectivity
states.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14084
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
in_fd is coming soon.
Fix coding style violations in the affected and adjacent lines.
Modernise some debug macros and make them more consistent (e.g. drop
logging of errno when strerror(errno) is already logged.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14084
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Node ID is a poorly defined concept, indicating the slot in the node
map where the IP address was found. This signed value also ends up
compared to num_nodes, which is unsigned, producing unwanted warnings.
Just return the PNN because this what both callers really want.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
We also can not assume that nodes can be marked as connected via only
the keepalive mechanism. Keepalives are not sent to disconnected
nodes so, in the absence of other packets (e.g. broadcasts), 2 nodes
may never become marked as connected to each other.
Revert to marking nodes as connected in the TCP transport code. If a
connection is to a non(-operational) ctdbd then it will revert to
disconnected after a short while and may actually flap. This should
be rare.
This reverts commit 66919db3d7.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13888
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Within ctdb_tcp_read_cb the provided data is checked for sanity,
e.g. correct size and content. This is not required because it was
done already by the caller(queue_process).
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Jeremy Allison <jra@samba.org>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Wed Apr 4 09:31:04 CEST 2018 on sn-devel-144
In case of an error condition the further processing of the data is cancelled
and the callback returns. In such a scenario the data has to be free'd.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Jeremy Allison <jra@samba.org>
It is expected by the caller(queue_process) that the callback is
free'ing the memory referenced by the data pointer.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Jeremy Allison <jra@samba.org>
Set SOCKET_CLOEXEC on the sockets returned by accept. This ensures that
the socket is unavailable to any child process created by system().
Making it harder for malicious code to set up a command channel,
as seen in the exploit for CVE-2015-0240
Signed-off-by: Gary Lockyer <gary@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
This groups function prototypes for common client/server functions in
common/common.h and removes them from ctdb_private.h.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Instead of includes.h, include the required header files explicitly.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
This groups function prototypes for system specific functions in
common/system.h and removes them from ctdb_private.h.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
sockets are created in a loop until an unused address is found.
But the unused socket fds were not closed.
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Signed-off-by: David Disseldorp <ddiss@samba.org>
Reviewed-by: Michael Adam <obnox@samba.org>
Autobuild-User(master): Michael Adam <obnox@samba.org>
Autobuild-Date(master): Wed Apr 1 15:36:03 CEST 2015 on sn-devel-104
CID 1291643: Resource leak: leaked_handle: Handle
variable lock_fd going out of scope leaks the handle.
Fix: on failure case release handle variable lock_fd
Signed-off-by: Rajesh Joseph <rjoseph@redhat.com>
Reviewed-by: Michael Adam <obnox@samba.org>
Reviewed-by: David Disseldorp <ddiss@samba.org>
Every time a nodemap is contructed the node IP addresses all need to
be parsed. This isn't very productive use of CPU.
Instead, parse each string once when the nodes file is loaded. This
results in much simpler code.
This code also removes the use of ctdb_address. Duplicating the port
is pointless without an abstraction layer around ctdb_address. If
CTDB gets an incompatible transport in the future then add an
abstraction layer.
Note that the infiniband code is not updated. Compilation of the
infiniband code is already broken. Fixing it will be a separate,
properly tested effort.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
This is currently set in 2 places. One of them makes the node loading
code difficult to refactor. Also, when the surrounding code in either
place is touched then it might get broken.
This only needs to be done once at startup, not on every reload. So
do it once in a very obvious way, sacrificing a few CPU cycles for
some added clarity.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>