IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Do not assume all nodes are members of LVS so always deciding the recmaster will be lvsmaster wont work.
Instead,
Create the set of active LVS nodes as those nodes that are LVS capable and
also HEALTHY.
Except if ALL LVS capable nodes are unhealthy in which case we allow the unhealthy
nodes to be part of the active set.
In the active set, pick one of the active nodes as being the lvsmaster
which will receive all incoming traffic and distribute it across
the active lvs nodes in the cluster.
(This used to be ctdb commit b2ccb891b81b041e2186e038b67bb4354b7892aa)
lvs: which shows which nodes are active LVS servers
lvsmaster: which shows which node is the lvs master multiplex node
pnn: which prints the pnn of the local node
(This used to be ctdb commit 00025eef662b867293829228c681df491cd6f371)
when monitoring the node health.
this might be useful to skip for environments with thousands of shares
(This used to be ctdb commit dd900d4ed8f07003c4f1db2d441cfc2ef2c89ef5)
nfs should never stop spontaneously so trying to restart it is
just counterproductive and at best a workaround to
hide real bugs.
(This used to be ctdb commit 90ab48bb8e17f59fcb27ddbff51de546c4447b64)
Kevin Collins noticed that RHEL5 grep-2.5.1-54.2.el5 built for
x86 does not handle \s while the exact same RHEL5 package for amd64
does!
[[:space:]] is more portable. Even across the same package version ( different architecture ) from the same vendor :-)
(This used to be ctdb commit fd7bb21c4f9289fc34a57f9d8cb7c13a02d06096)
This reverts commit bfba5c7249eff8a10a43b53c1b89dd44b625fd10.
revert the waitpid changes. we need to waitpid for some childredn so should
refactor the approach completely
(This used to be ctdb commit 702ced6c2fe569c01fe96c60d0f35a7e61506a96)
to "ping" the local nfs daemon.
Once it has failed more than 3 times in a row it will attempt to restart the nfs service.
(This used to be ctdb commit a4e89f57a8d733ea74df7b0de31eb977d6d37388)
so we should not call it from the main daemon.
1, set SIGCHLD to SIG_DFL to make sure we ignore this signal
2, get rid of all waitpid() calls
3, change reporting of event script status code from _exit()/waitpid() to write()/read() one byte across the pipe.
(This used to be ctdb commit bfba5c7249eff8a10a43b53c1b89dd44b625fd10)
If the event script that timed out was for the "monitor" event, then
even if it timed out we still return SUCCESS back to the guy invoking the eventscript.
Only consider the eventscript for "monitor" to have failed with an error
IFF it actually terminated with an error, or if it timed out 5 times in a row and hung.
(This used to be ctdb commit 60f3c04bd8b20ecbe937ffed08875cdc6898b422)
This is a hack to allow backtraces under valgrind to show what opcode
is getting uninitialised bytes
(This used to be ctdb commit 67bb12c8f0af5914efb44b76bc6ddbb11fc0fcdf)
Just add CTDB_VALGRIND=yes in /etc/sysconfig/ctdb, and look at the
logs in /var/log/ctdb_valgrind.*
(This used to be ctdb commit 9acd577c97059e8924582ac52e9ce5785903f120)
since this is already done implicitely when we changed recovery mode
back to normal
(This used to be ctdb commit af1f6cf7561fe9cb5c97f940d4458c83bdd8e2a0)
make ctdb uptime print how long the recovery took
in the recovery daemon when we check that the public ip address
allocation on the local node is correct (we have the ips we should have
and we dont have any we shouldnt have) use ctdb uptime and check the
recovery start/stop times and make sure we dont check for ip allocation
inconsistencies during a recovery where the ip address allocation is in flux.
(This used to be ctdb commit f86551580349b7f662f9a07e4eb0c1189e38e429)