IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
These were intentionally not static so they could be linked to in unit
test programs. However, using the CCAN-style unit tests where
relevant code is just included, this is no longer necessary.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit d0e9e8554614bd49ffb9ec3509feaa0e80d0f65d)
This patch changes the callback signature for traversal
functions to allow a client to abort a traverse before it finishes.
Updates to all callers and examples as well as rb-test tool.
(This used to be ctdb commit 8ab0c63ad36cfbbb1e5fed46a1f4c47b1fdb581f)
There's a bug in LCP2. Selecting the node with the highest imbalance
doesn't always work. Some nodes can have a high imbalance metric
because they have a lot of IPs. However, these nodes can be part of a
group that is perfectly balanced. Nodes in another group with less
IPs might actually be imbalanced.
Instead of just trying the source node with the highest imbalance this
tries them in descending order of imbalance until it finds one where
an IP can be moved to another node.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 574091d5aced5e87aefad52f8bc47aa75c25fbf6)
There's a bug in LCP2. Selecting the node with the highest imbalance
doesn't always work. Some nodes can have a high imbalance metric
because they have a lot of IPs. However, these nodes can be part of a
group that is perfectly balanced. Nodes in another group with less
IPs might actually be imbalanced.
Factor out the code from lcp2_failback() that actually takes a node
and decides which address should be moved to which node.
This is the first step in fixing the above bug.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 75718c5768b5bb5c0bcd7dd90e0327c6ed22a63d)
this triggered a check for "only run the eventscript if we host the address" to trigger and shortcir=cuit calling the eventscript.
An effect of this would be that 'ctdb delip' would remove the ip from ctdb, but fail to delete it from the interface.
S1028798
(This used to be ctdb commit b82524f240bf21769dd7624ca6026763d38b9396)
cant talloc off vnn since it is not yet initialized and might not always be NULL
(This used to be ctdb commit 3d37be3e2bfb61ede824028aeebaa18ba304faae)
This will make it much easier to root-cause problems such as
S1029023
when an external application deleted the interface while it is still is in use by ctdbd.
(This used to be ctdb commit 9abf9c919a7e6789695490e2c3de56c21b63fa57)
check that the actual interface exist, print error and fail startup if the interface does not exist.
(This used to be ctdb commit cd33bbe6454b7b0316bdfffbd06c67b29779e873)
sometimes we do want to try to set the linkstate for interfaces that are not in use by public addresses right now (but posisbly by other mechanisms) and these messages just spam the logs
S1026357
(This used to be ctdb commit f2fe0a090a9650910ebe49514b3ca01dc593bea3)
Move struct ctdb_public_ip_list to ctdb_private.h and put some
definitions for some functions from ctdb_takeover.c there. This
allows those functions to be called from unit tests.
Add ctdb_takeover_tests.c and the Makefile support to build it.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 9d34be0233edf3bc022345c0494c4b2a4d7f8480)
The current non-deterministic IP allocation algorithm balances IPs
across the whole cluster. It does not consider different
interfaces/VLANs/subnets, so these different groups of IPs aren't
generally well balanced.
This adds the LCP2 algorithm for IP allocation and allows it to be
enabled by setting the "LCP2PublicIPs" tunable to 1.
The LCP2 algorithm calculates the imbalance of a node by totalling the
squares of the distances between each IP on the node. The IP distance
is defined as the length longest common prefix (LCP) of bits that is
found when comparing 2 IPs. The imbalance of a cluster is the maximum
imbalance for any node. At each step the algorithm selects an
allocation to the IP/node combination that results in the choosing the
allocation that best reduces the imbalance of the cluster.
The implementation splits out the IP allocation part of
ctdb_takeover_run() into new function ctdb_takeover_run_core(), and
then extracts out the basic IP assignment code into new functions
basic_allocate_unassigned() and basic_failback(). 3 new functions
lcp2_init(), lcp2_allocate_unassigned() and lcp2_failback() implement
the LCP2 algorithm, and are hooked into ctdb_takeover_run_core().
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 61fc7fbd0235469df22deb6581c6bd47e30bc0be)
Dont talloc_free(vnn) immediately but postphone it until later when
the eventscript callback has completed.
CQ S1026664
(This used to be ctdb commit 0a99e8742a261b1d3a2c8830f5c19ea6c2c47cad)
nodes with many addresses to nodes with few addresses,
loop up to num_ips+5 times instead of only 5 times.
When we have very many public ips per node, we might need to loop more than
5 times or else we will exit without reaching optimal balance.
(This used to be ctdb commit aa8114a625a637277561a66c80bdece3c27e9e20)
Simplify the handling of setting the links in the 10.interface eventscript
and remove the optimization to only call setifacelink on state change
to make the code simpler to read.
If a take ip event fails, flag the node as unhealthy.
Add a check to the interface script to check if the interface exists
or if it has been deleted.
So that we can capture and become UNHELTHY if someone deletes an interface
we are using to host public addresses.
(This used to be ctdb commit 4ab63d2a7262aff30d5eced184c294c9c9dd4974)
by external services failing to start, or blocking CTDBD from finishing the startup phase,
we can encounter a situation where we have not yet fully initialized, but a
remote recovery master tries to release a certain ip clusterwide.
In this situation the node that is pinned down in init/startup phase
would fail to perform the release of the ip address since we are not yet fully operational and not yet host any valid interfaces.
In this situation, we just need to remain unhealthy, there is on need to
also ban the node.
Remove the autobanning for this condition and just let the node remain in
unhealthy mode.
Banning is overkill in this situation when the system is broken and just
draws attention to ctdbd instead of the root cause.
(This used to be ctdb commit d8af74e4c4961deb94c18dde8ba7fc07e944729c)
flag the interface as initially being "link ok"
so that we can add it and startup.
The eventscript can later drop the flag if required
(This used to be ctdb commit 720849b756c825fb8b285f09972a8c39f1888a99)
but thinks it is still unassigned (-1).
add code to the recovery daemon to detect this case and trigger a reallocation
so that the ip gets covered
and change the takeip code to allow for this condition, taking on an ip address that is
already hosted.
cq s1021073
(This used to be ctdb commit 9020baf27cab7821c9094cda185206fb7af0fee7)
since if they are the same for whatever reason this triggers the system
to go into an infinite loop and is unrobust
The scriptds have been changed instead to be able to cope with this
situation for enhanced robustness
During takeover_run and when merging all ip allocations across the cluster
try to kepe track of when and which node currently hosts an ip address
so that we avoid extra ip failovers between nodes
(This used to be ctdb commit cf778b5aaf6356401e3985acccc7df9e08ab6930)
This is called everytime a reallocation is performed.
While STARTRECOVERY/RECOVERED events are only called when
we do ipreallocation as part of a full database/cluster recovery,
this new event can be used to trigger on when we just do a light
failover due to a node becomming unhealthy.
I.e. situations where we do a failover but we do not perform a full
cluster recovery.
Use this to trigger for natgw so we select a new natgw master node
when failover happens and not just when cluster rebuilds happen.
(This used to be ctdb commit 7f4c591388adae20e98984001385cba26598ec67)
Add a new "ctdb deltickle" command to delete tickles from the database.
This can ONLY be used for tickles created by "ctdb addtickle".
Push any "addtickle/deltickle" updates to other nodes every TickleUpdateInterval seconds'
(This used to be ctdb commit acded034e2f0dcae4c2c9e54e16a001caf23caec)
After 5 attempts to send a RST to a client without any response, we free
"con"; this is done during a traverse. This frees the node we are walking
through (the node is made a child of "con" down in rb_tree.c's
trbt_create_node() (Valgrind would catch this, as Martin confirmed).
So, we create a temporary parent and reparent onto that; then we free
that parent after the traverse, thus deleting the unwanted nodes.
CQ:S1019041
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 08f7f85477610a4916c1ec866aa467b28f1bbec3)
In Samba this is now called "tevent", and while we use the backwards
compatibility wrappers they don't offer EVENT_FD_AUTOCLOSE: that is now
a separate tevent_fd_set_auto_close() function.
This is based on Samba version 7f29f817fa.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 85e5e760cc91eb3157d3a88996ce474491646726)
verify that all nodes agree on the most recent ip address assignments
broke "ctdb moveip ..." since that call would never trigger
a full takeover run and thus would immediately trigger an inconsistency.
Add a new message to the recovery daemon where we can tell the recovery daemon to update its assignments.
BZ62782
(This used to be ctdb commit e7069082e5f0380dcddee247db8754218ce18cab)
addresses and verify that the remote nodes have/keep a consistent view of
assigned addresses.
If a remote node has an inconsistent view of addresses visavi the recovery
master this will trigger a full ip reallocation.
(This used to be ctdb commit f3bf2ab61f8dbbc806ec23a68a87aaedd458e712)
This makes sure that we don't get public addresses assigned during the
initial recovery and remove them again in the startup event.
metze
(This used to be ctdb commit f872e8c63a2f8979e6a0d088630575bdd4d7b4f1)
This only marks the interface status and doesn't
generate any directly triggered action.
The actions is later taken by the recovery process
in verify_ip_allocation.
metze
(This used to be ctdb commit cff58b27c970e9252d131125941c372019fd6660)