print a full "pstree -p" to the log.
Example:
|-ctdbd(29826)-+-ctdbd(29862)
| `-ctdbd(31897)-+-00.ctdb(31898)---sleep(31908)
change the default timeout to 60 seconds for eventscripts
(This used to be ctdb commit a3406c10d70f89d332eab25d481083142dff987d)
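For reference, this timeout is an ordinary ctdb tunable, so it can be inspected and changed at runtime with the standard ctdb getvar/setvar commands. The tunable name EventScriptTimeout is assumed here:

    ctdb getvar EventScriptTimeout
    ctdb setvar EventScriptTimeout 60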
Remove the explicit vacuum/repack commands from the 00.ctdb eventscript
and implement this in the ctdb daemon.
Combine vacuuming and repacking into one cheap read traverse that enumerates all candidate records, and one write traverse that both repacks the database and deletes records locally where we are the lmaster and the records have already been deleted remotely.
This code also adds initial autotuning heuristics for the vacuum intervals and for how many records to delete in each iteration.
Minor stylistic changes made by Ronnie S.
(This used to be ctdb commit 95a3ee551241aa164967991fe5efe078e1714bde)
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Signed-off-by: Wolfgang Mueller-Friedt <wolfmuel@de.ibm.com>
(This used to be ctdb commit 30cdad97706a9e9bb210120699aa939f6b16e8ca)
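A minimal, self-contained sketch of the two-pass idea described above, using illustrative types rather than ctdb's real tdb API: one cheap read pass enumerates vacuum candidates, then a single write pass repacks the live records and drops the candidates where we are lmaster and the remote copies are already gone.

    /* Hypothetical sketch of the combined vacuum+repack scheme.
     * All types and names are illustrative, not ctdb's real code. */
    #include <stdbool.h>
    #include <stdio.h>

    struct record {
        const char *key;
        bool we_are_lmaster;   /* this node is location master for the key */
        bool deleted_remotely; /* remote copies already deleted */
        bool empty;            /* tombstone: candidate for vacuuming */
    };

    /* Pass 1: cheap read-only traverse, just enumerate candidates. */
    static size_t collect_candidates(const struct record *db, size_t n,
                                     size_t *cand, size_t max_cand)
    {
        size_t found = 0;
        for (size_t i = 0; i < n && found < max_cand; i++) {
            if (db[i].empty && db[i].we_are_lmaster && db[i].deleted_remotely)
                cand[found++] = i;
        }
        return found;
    }

    /* Pass 2: one write traverse that repacks the database (copies live
     * records into fresh storage) and skips the vacuum candidates, which
     * deletes them locally as a side effect of the rewrite. */
    static size_t repack_and_vacuum(const struct record *db, size_t n,
                                    const size_t *cand, size_t ncand,
                                    struct record *out)
    {
        size_t kept = 0;
        for (size_t i = 0; i < n; i++) {
            bool drop = false;
            for (size_t j = 0; j < ncand; j++)
                if (cand[j] == i) { drop = true; break; }
            if (!drop)
                out[kept++] = db[i];
        }
        return kept;
    }

    int main(void)
    {
        struct record db[] = {
            { "a", true,  true,  true  },   /* vacuumable */
            { "b", false, true,  true  },   /* not lmaster: keep */
            { "c", true,  false, false },   /* live: keep */
        };
        size_t cand[8];
        struct record out[8];
        size_t nc = collect_candidates(db, 3, cand, 8);
        size_t kept = repack_and_vacuum(db, 3, cand, nc, out);
        printf("candidates=%zu kept=%zu\n", nc, kept);
        return 0;
    }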
Log this in "ctdb statistics".
Also add a variable "RecLockLatencyMs" that will log an error every time it takes longer than this to access the reclock file.
(This used to be ctdb commit 042377ed803bb8f7ca9d6ea1a387427b7b8ba45a)
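A minimal sketch of the latency check, assuming a monotonic clock and an illustrative 1000 ms threshold standing in for the RecLockLatencyMs tunable:

    /* Time an operation and complain when it exceeds a millisecond
     * threshold. Names and the 1000 ms default are assumptions. */
    #include <stdio.h>
    #include <time.h>

    static double elapsed_ms(struct timespec a, struct timespec b)
    {
        return (b.tv_sec - a.tv_sec) * 1000.0 +
               (b.tv_nsec - a.tv_nsec) / 1e6;
    }

    int main(void)
    {
        const double rec_lock_latency_ms = 1000.0; /* tunable threshold */
        struct timespec t0, t1;
        struct timespec delay = { 1, 500000000 };  /* 1.5 s stand-in for
                                                      accessing the reclock
                                                      file */
        clock_gettime(CLOCK_MONOTONIC, &t0);
        nanosleep(&delay, NULL);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ms = elapsed_ms(t0, t1);
        if (ms > rec_lock_latency_ms)
            fprintf(stderr, "ERROR: reclock access took %.0f ms "
                            "(threshold %.0f ms)\n", ms, rec_lock_latency_ms);
        return 0;
    }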
This now defaults to 60 seconds.
This is useful if a split brain occurs due to network partitioning, since it will make sure that the "other half" of the cluster, the one that does not contain the recovery master, will eventually release all IPs, thus avoiding a duplicate-IP situation for the public addresses.
(This used to be ctdb commit 70f21428c9eec96bcc787be191e7478ad68956dc)
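A minimal sketch of the behaviour described above, assuming a 60 second limit: if a node has been stuck in recovery for longer than the limit, it releases all public addresses so the partition without the recovery master cannot keep serving duplicate IPs.

    /* Illustrative code; names are assumptions, not ctdb's real API. */
    #include <stdbool.h>
    #include <stdio.h>
    #include <time.h>

    static bool must_drop_all_ips(time_t entered_recovery, time_t now,
                                  int limit_seconds)
    {
        return now - entered_recovery > limit_seconds;
    }

    int main(void)
    {
        time_t now = time(NULL);
        if (must_drop_all_ips(now - 61, now, 60))
            printf("releasing all public addresses\n");
        return 0;
    }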
Rename the variable to SeqnumInterval because:
1, it is an interval and not a 1/interval unit
2, we then catch when people use the old variable name, and can update the sysconfig file instead of silently changing the semantics of this variable
This is a really dodgy variable.
(This used to be ctdb commit 68eac459e5d2b6b534f72821036675ffe5d7a350)
log the type of operation and the database name for all latencies higher
than a threshold
(This used to be ctdb commit 1d581dcd507e8e13d7ae085ff4d6a9f3e2aaeba5)
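A minimal sketch of the logging described above; the function name, message format, and the example operation/database names are illustrative assumptions:

    #include <stdio.h>

    /* Log only when a latency crosses the threshold, tagging the
     * operation type and the database name. */
    static void log_latency(const char *op, const char *db,
                            double latency_s, double threshold_s)
    {
        if (latency_s > threshold_s)
            fprintf(stderr, "High latency %.6fs for operation %s on db %s\n",
                    latency_s, op, db);
    }

    int main(void)
    {
        log_latency("lockwait", "locking.tdb", 0.25, 0.1); /* logged */
        log_latency("fetch",    "brlock.tdb",  0.01, 0.1); /* quiet  */
        return 0;
    }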
Monitor the recovery daemon correctly by measuring how long it has been since the last successful communication with the recovery daemon was recorded.
After a certain timeout the ctdb daemon would deem the recovery daemon inoperable and shut down.
If the system clock is suddenly changed forward by many (60 or more) seconds, this could cause the timeout to trigger prematurely/immediately, where ctdb would incorrectly think that more than 60 seconds had passed since the last successful communication and thus abort.
Instead of checking for one timeout occurring, only deem the recovery daemon to be "down" and trigger a shutdown if communications have timed out for three intervals in a row.
(This used to be ctdb commit 196968c552e6ebcb57389d769a4b25f42fa8bc5d)
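A minimal sketch of the three-strikes rule, with illustrative names: a single missed interval (for example, one caused by a clock jump) no longer shuts the daemon down; only three consecutive misses do.

    #include <stdbool.h>
    #include <stdio.h>

    #define MAX_MISSED 3

    struct recd_monitor {
        int missed;   /* consecutive timed-out intervals */
    };

    /* Called once per monitoring interval. Returns true if the main
     * daemon should shut down. */
    static bool recd_check(struct recd_monitor *m, bool ping_ok)
    {
        if (ping_ok) {
            m->missed = 0;          /* any success resets the counter */
            return false;
        }
        m->missed++;
        fprintf(stderr, "recovery daemon ping timed out (%d/%d)\n",
                m->missed, MAX_MISSED);
        return m->missed >= MAX_MISSED;
    }

    int main(void)
    {
        struct recd_monitor m = { 0 };
        bool pings[] = { false, false, true, false, false, false };
        for (int i = 0; i < 6; i++)
            if (recd_check(&m, pings[i]))
                fprintf(stderr, "shutting down main daemon\n");
        return 0;
    }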
We currently only monitor that the daemons are running, by kill(pid, 0) and by verifying that the domain socket between them is OK.
This is not sufficient, since we can have a situation where the recovery daemon is hung.
This new code monitors that the recovery daemon is operating; if the recovery daemon hangs, we log this and shut down the main daemon.
(This used to be ctdb commit cd69d292292eaab3aac0e9d9fc57cb621597c63c)
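For contrast, a minimal sketch of the old-style existence check: kill() with signal 0 only proves that the pid exists, not that the daemon is making progress, which is why the operational monitoring above is needed. The helper name is illustrative.

    #include <errno.h>
    #include <signal.h>
    #include <stdio.h>
    #include <unistd.h>

    static int pid_is_running(pid_t pid)
    {
        if (kill(pid, 0) == 0)
            return 1;               /* process exists (but may be hung!) */
        return errno != ESRCH;      /* EPERM still means it exists */
    }

    int main(void)
    {
        printf("self alive: %d\n", pid_is_running(getpid()));
        return 0;
    }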
If the event script that timed out was for the "monitor" event, then even though it timed out we still return SUCCESS back to the caller invoking the eventscript.
Only consider the eventscript for "monitor" to have failed if it actually terminated with an error, or if it timed out 5 times in a row and hung.
(This used to be ctdb commit 60f3c04bd8b20ecbe937ffed08875cdc6898b422)
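A minimal sketch of this policy, with illustrative names and a single global counter: a "monitor" timeout is tolerated and reported as success until it has happened 5 times in a row, while any other event fails on its first timeout.

    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    #define MONITOR_TIMEOUT_LIMIT 5

    static int monitor_timeouts; /* consecutive "monitor" timeouts */

    /* Returns 0 for success, -1 for failure. */
    static int eventscript_status(const char *event, bool timed_out,
                                  int exit_code)
    {
        if (!timed_out) {
            if (strcmp(event, "monitor") == 0)
                monitor_timeouts = 0;
            return exit_code == 0 ? 0 : -1;
        }
        if (strcmp(event, "monitor") != 0)
            return -1;                         /* non-monitor: hard fail */
        if (++monitor_timeouts >= MONITOR_TIMEOUT_LIMIT)
            return -1;                         /* hung: now report failure */
        return 0;                              /* tolerated timeout */
    }

    int main(void)
    {
        for (int i = 1; i <= 6; i++)
            printf("monitor timeout %d -> %d\n", i,
                   eventscript_status("monitor", true, 0));
        return 0;
    }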
Add a ctdb command to pull the talloc memory map from a recovery daemon:
ctdb rddumpmemory
(This used to be ctdb commit d23950be7406cf288f48b660c0f57a9b8d7bdd05)
When this tunable is set, IP addresses will only be failed over when a node fails, and only those IP addresses held by the failed node will be reallocated in the cluster.
When a node becomes active again, this will not lead to any failback of IP addresses.
This can reduce the number of "IP address movements" in the cluster, since we don't automatically fail an IP address back, but it can also lead to an unbalanced cluster, since we no longer attempt to spread the IP addresses out evenly across the active nodes.
This tunable can NOT be active at the same time as DeterministicIPs is used.
(This used to be ctdb commit d3b8a461b15bc584fa1785eb5922de6d49d8f6c4)
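A minimal sketch of the no-failback behaviour, with illustrative types; the real daemon's placement logic is more involved, but the two entry points show the asymmetry: failures move addresses, recoveries do not.

    #include <stdio.h>

    #define NUM_IPS 4

    struct ipalloc {
        int holder[NUM_IPS];  /* node currently hosting each address */
    };

    /* A node died: reassign only its addresses to a healthy node.
     * (Real placement would spread them; this is the minimal idea.) */
    static void node_failed(struct ipalloc *a, int dead, int healthy)
    {
        for (int i = 0; i < NUM_IPS; i++)
            if (a->holder[i] == dead)
                a->holder[i] = healthy;
    }

    /* A node came back: deliberately do nothing, i.e. no failback. */
    static void node_returned(struct ipalloc *a, int node)
    {
        (void)a;
        (void)node;
    }

    int main(void)
    {
        struct ipalloc a = { .holder = { 0, 0, 1, 1 } };
        node_failed(&a, 1, 0);      /* node 1 dies: its IPs move to node 0 */
        node_returned(&a, 1);       /* node 1 returns: nothing moves back */
        for (int i = 0; i < NUM_IPS; i++)
            printf("ip%d -> node %d\n", i, a.holder[i]);
        return 0;
    }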
Once every such interval:
* the recovery master on each node will update the "connected" count in the reclock count file (ctdb getreclock)
* if the node thinks it is a recovery master but it detects another node that is DISCONNECTED and which still holds a lock to the reclock count file, this may mean that we have a split cluster.
If that other node, the one that is DISCONNECTED but still holds the lock on the reclock pnn count file, is MORE connected than the local node, yield the recmaster role and let the other half of the cluster take over.
This adds a second, last-chance mechanism to detect split clusters.
If the cluster is split but GPFS is not yet split, this mechanism makes the largest half of the cluster become the active half.
(This used to be ctdb commit 07af425f444531942cce8abff112c1524228d287)
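A minimal sketch of the yield decision, with illustrative names; the "connected" counts come from the reclock count file as described above.

    #include <stdbool.h>
    #include <stdio.h>

    static bool should_yield_recmaster(bool other_disconnected,
                                       bool other_holds_reclock,
                                       int other_connected,
                                       int local_connected)
    {
        return other_disconnected &&
               other_holds_reclock &&
               other_connected > local_connected;
    }

    int main(void)
    {
        /* Our half sees 2 connected nodes, the other half sees 3: yield. */
        if (should_yield_recmaster(true, true, 3, 2))
            printf("yielding recmaster to the larger half\n");
        return 0;
    }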
Add a new tunable, DeterministicIPs, that makes the allocation of public addresses to nodes deterministic.
Activate it by adding CTDB_SET_DeterministicIPs=1 in /etc/sysconfig/ctdb
When this is set, the first entry in /etc/ctdb/public_addresses will
always be hosted by node 0 when that node is available, the second
entry by node 1, and so on.
This tunable allows the allocation of addresses to become very
unbalanced and is only for debugging/testing use.
Beware, this feature requires that /etc/ctdb/public_addresses are
identical on all the nodes in the cluster.
(This used to be ctdb commit f0ca221f235731542090d8a6c86f2b7cd2ce2f96)
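A minimal sketch of the deterministic mapping; the modulo wrap-around and the next-available fallback for a down node are assumptions for illustration, and they show why the result can become unbalanced.

    #include <stdbool.h>
    #include <stdio.h>

    /* Entry i in public_addresses prefers node i (modulo node count),
     * falling back to the next available node. */
    static int deterministic_node(int addr_index, const bool *available,
                                  int num_nodes)
    {
        for (int off = 0; off < num_nodes; off++) {
            int node = (addr_index + off) % num_nodes;
            if (available[node])
                return node;
        }
        return -1; /* no node available */
    }

    int main(void)
    {
        bool up[3] = { true, false, true };  /* node 1 is down */
        for (int i = 0; i < 4; i++)
            printf("address %d -> node %d\n", i,
                   deterministic_node(i, up, 3));
        return 0;
    }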
There is an array for each node/public address that contains TCP tickles.
We send a TCP_ADD as a broadcast to all nodes when a client is added.
If TCP tickles are removed, they are only removed immediately from the local node.
Once every 20 seconds a node will push/broadcast out the tickle list for all public addresses it manages. This will remove any deleted tickles from the remote nodes.
(This used to be ctdb commit e3c432a915222e1392d91835bc7a73a96ab61ac9)
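A minimal sketch of why the periodic push propagates deletions, with illustrative types: the receiver replaces its stored tickle list wholesale rather than merging, so anything the sender removed locally disappears remotely on the next push.

    #include <stdio.h>

    #define MAX_TICKLES 8

    struct tickle_list {
        int count;
        const char *conn[MAX_TICKLES]; /* "srcip:port dstip:port" strings */
    };

    /* Receiver side of the broadcast: replace, don't merge. */
    static void receive_tickle_update(struct tickle_list *stored,
                                      const struct tickle_list *pushed)
    {
        *stored = *pushed;
    }

    int main(void)
    {
        struct tickle_list remote = { 2, { "10.0.0.5:445", "10.0.0.6:445" } };
        struct tickle_list pushed = { 1, { "10.0.0.5:445" } }; /* .6 deleted */
        receive_tickle_update(&remote, &pushed);
        printf("remote now has %d tickle(s): %s\n",
               remote.count, remote.conn[0]);
        return 0;
    }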
- added DatabaseHashSize tunable
- added logging of events inside recovery (for timing)
(This used to be ctdb commit 3593cdb928b91e217faf1b3c537fa28dc82cdace)
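A hedged usage note: assuming DatabaseHashSize is exposed like the other tunables, it can be read and set with the standard commands; the value below is only an example.

    ctdb getvar DatabaseHashSize
    ctdb setvar DatabaseHashSize 100001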