IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Since commit 77deaadca8, a nodeA which
had previously accepted a connection from nodeB (where nodeB dies
e.g. as as result of fencing) when nodeB attempts to connect again
after restarting is always rejected with
ctdb_listen_event: Incoming queue active, rejecting connection from w.x.y.z
messages.
Consolidate dead node handling in the TCP restart handling.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14295
Signed-off-by: Noel Power <noel.power@suse.com>
Reviewed-by: Ralph Boehme <slow@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
The type-checking is superfluous and gets in the way of readability.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Thu Nov 14 03:45:44 UTC 2019 on sn-devel-184
Commit c68b6f96f2 changed the talloc hierarchy such that outgoing TCP sockets
while sitting in the async connect() syscall are not freed via
ctdb_tcp_shutdown() anymore, they are hanging off a longer-running structure.
Free this structure as well.
If an outgoing TCP socket leaks into a long-running child process (possibly the
recovery daemon), this connection will never be closed as seen by the
destination node. Because with recent changes incoming connections will not be
accepted as long as any incoming connection is alive, with that socket leak
into the recovery daemon we will never again be able to successfully connect to
the node that is affected by this leak. Further attempts to connect will be
discarded by the destination as long as the recovery daemon keeps this socket
alive.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14175
RN: Avoid communication breakdown on node reconnect
Signed-off-by: Martin Schwenke <martin@meltin.net>
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This file descriptor is owned by the incoming queue. It will be
closed when the queue is torn down.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14175
RN: Avoid communication breakdown on node reconnect
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Since commit ddd97553f0
ctdb_queue_send() doesn't queue a packet if the connection isn't yet
established (i.e. when fd == -1). So, don't bother creating the
outbound queue during initialisation but create it when the connection
becomes writable.
Now the presence of the queue indicates that the outbound connection
is up.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14084
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This makes it easy to track both incoming and outgoing connectivity
states.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14084
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
in_fd is coming soon.
Fix coding style violations in the affected and adjacent lines.
Modernise some debug macros and make them more consistent (e.g. drop
logging of errno when strerror(errno) is already logged.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14084
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This groups function prototypes for common client/server functions in
common/common.h and removes them from ctdb_private.h.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Instead of includes.h, include the required header files explicitly.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Every time a nodemap is contructed the node IP addresses all need to
be parsed. This isn't very productive use of CPU.
Instead, parse each string once when the nodes file is loaded. This
results in much simpler code.
This code also removes the use of ctdb_address. Duplicating the port
is pointless without an abstraction layer around ctdb_address. If
CTDB gets an incompatible transport in the future then add an
abstraction layer.
Note that the infiniband code is not updated. Compilation of the
infiniband code is already broken. Fixing it will be a separate,
properly tested effort.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
In Samba this is now called "tevent", and while we use the backwards
compatibility wrappers they don't offer EVENT_FD_AUTOCLOSE: that is now
a separate tevent_fd_set_auto_close() function.
This is based on Samba version 7f29f817fa.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 85e5e760cc91eb3157d3a88996ce474491646726)
We've been seeing "Invalid packet of length 0" errors, but we don't know
what is sending them. Add a name for each queue, and print nread.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit e6cf0e8f14f4263fbd8b995418909199924827e9)
This is used to mark nodes as being DELETED internally in ctdb
so that nodes are not renumbered if / when they are removed from the nodes file.
This is used to be able to do "ctdb reloadnodes" at runtime without
causing nodes to be renumbered.
To do this, instead of deleting a node from the nodes file, just comment it out like
1.0.0.1
#1.0.0.2
1.0.0.3
After removing 1.0.0.2 from the cluster, the remaining nodes retain their
pnn's from prior to the deletion, namely 0 and 2
Any line in the nodes file that is commented out represents a DELETED pnn
(This used to be ctdb commit 6a5e4fd7fa391206b463bb4e976502f3ac5bd343)
modify the transport methods to allow to restart individual connections
and set up destructors properly.
only tear down/set-up tcp connections to nodes removed from the cluster
or nodes added to the cluster.
Leave tcp connections to unchanged nodes connected.
make "ctdb reloadnodes" explicitely cause a recovery of the cluster once
the files have been realoaded
(This used to be ctdb commit d1057ed6de7de9f2a64d8fa012c52647e89b515b)
instead of shutting down/restarting the entire tcp layer
just bounce all outgoing connections and reconnect
(This used to be ctdb commit e701a531868149f16561011e65794a4a46ee6596)
Closing it here just causes an epoll error, and may close a fd in use by
another structure to be closed. This caused a infinite recovery loop
(This used to be ctdb commit bc251ac7029c2689776a8c31b28ac1d233d52d4f)
add a new control that causes the node to drop the current nodes list
and reread it from the nodes file.
During this operation, the node will also drop the tcp layer and restart it.
When we drop the tcp layer, by talloc_free()ing the ctcp structure
add a destructor to ctcp so that we also can clean up and remove the references in the ctdb structure to the transport layer
add two new commands for the ctdb tool.
one to list all nodes in the nodesfile and the second a command to trigger a node to drop the transport and reinitialize it with the nde nodes file
(This used to be ctdb commit 4bc20ac73e9fa94ffd43cccb6eeb438eeff9963c)
shut down and restart the transport
othervise, if we use the tcp transport the tcp connection might try to
retransmit the queued data during the time the node is unavailable.
this together with the exponential backoff for tcp means that the tcp
connection quickly reaches the maximum backoff rto which is often 60 or
120 seconds. this would mean that it could take up to 60/120 seconds
before the tcp layer detects that the connection is dead and it has to
be reestablished.
(This used to be ctdb commit 0256db470879ce556b0f00070f7ebeaf37e529ab)
CTDB_FLAG_TORTURE, which forces some race conditions to be much more
likely. For example a 20% chance of not getting the lock on the
first try in the daemon
- abstraced the ctdb_ltdb_lock_fetch_requeue() code to allow it to
work with both inter-node packets and client->daemon packets
- fixed a bug left over in ctdb_call from when the client updated the
header on a call reply
- removed CTDB_FLAG_CONNECT_WAIT flag (not needed any more)
(This used to be ctdb commit 7559dcd184666c3853127e3c8f5baef4fea327c4)
us to put memory directly in the right context, avoiding quite a few
talloc_steal calls, and simplifying the code
- make the fetch lock code in the daemon fully async
(This used to be ctdb commit d98b4b4fcadad614861c0d44a3854d97b01d0f74)
added transport level packet allocator, allowing the transport to
enforce alignment or special memory rules
(This used to be ctdb commit 50304a5c4d8d640732678eeed793857334ca5ec1)
- added --self-connect option to ctdb_test, allowing testing when a
node connects to itself. not as efficient as local bypass, but very
useful for testing purposes (easier to work with 1 task in gdb than
2)
- split the ctdb_call() into an async triple, in the style of Samba4
async functions. So we now have ctdb_call_send(), ctdb_call_recv()
and ctdb_call().
- added the main ctdb_call protocol logic. No error checking yet, but
seems to work for simple cases
- ensure we initialise the length argument to getsockopt()
(This used to be ctdb commit 95fad717ef5ab93be3603aa11d2878876fe868d3)
- split up ctdb layer code into 3 modules
- added a simple test suite
- added packet structures for ctdb_call
- switched to an array for ctdb_node to make vnn lookup easy and fast
(This used to be ctdb commit 8a17460a816a5970f2df8244a06aec55d814f186)
- added basic IO handling for the tcp backend
- added a ctdb_node_dead upcall
- added packet queueing
- adding incoming packet handling
(This used to be ctdb commit 415497c952630e746e8cdcf8e1e2a7b2ac3e51fb)