IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
We want ctdb to shutdown first, as it manages many other
services. With the old level of 32 the NFS service would shutdown
first, and that would trigger ctdb to do a recovery. Then ctdb itself
would be shutdown a few seconds later, which causes a lot of error
messages in the other nodes logs
(This used to be ctdb commit 2f952af1a12e81a652ec9a4794db96f9593f2676)
This would allow a sysadmin to set up ctdb to send an email/snmptrap/... when the status of the node changes.
(This used to be ctdb commit ce534a83a05dbd40238e4eee0669d60ff396f935)
This can take very long if there are very many shares and is in that case better to implement in a separate cronjob than in ctdb eventscript
(This used to be ctdb commit 432604a1435cd2b5a7178fb5aedf1d4b61bffeb9)
this is a timeconsuming process and might not be feasible to perform if there are very many thousand shares
(This used to be ctdb commit 051ae5f3c13892b860818eac803d348f09845dc6)
The httpd service on suse and ubuntu/debian systems is usually
called "apache2" nowadays.
Note: There are older installs with Apache 1.3 out there, in which case
the service is called "apache". An extra check for these installs could
be useful as a sequel to this patch...
Michael
(This used to be ctdb commit b9e50e3416fecef6a881be3f1b91be977299293f)
The netstat test only grepped for the ipv4 wildcard address.
Now the ipv6 wildcard listener is correctly detected as well.
Michael
(This used to be ctdb commit 78e7928797e239e71f96eb001460a0dbf943e18f)
This fixes tcp port monitor events on systems, where netcat or nc
is not found in /usr/bin/, Debian, for instance.
The patch also separates the process of finding the binaries and
calling them, moving the detection outside of the loop over the
ports list.
Michael
(This used to be ctdb commit 3adf100e7f0c04aaf2da9ae4c6984cdb708c3b57)
for managing samba and winbind
This uses CTDB_INIT_STYLE as exported by ctdb.init.
suse systems usually have separate init scripts for
smb for smbd and nmb for nmbd, and the ubuntu/debian
start script for smbd and nmbd is called samba instead
of smb (on redhat).
Michael
(This used to be ctdb commit 5fe84f96f3f79baba1f44ba57ce217f501b3c1f8)
and export CTDB_INIT_STYLE, so that event scripts
as called by ctdbd can use it.
Michael
(This used to be ctdb commit 56a10594ea9e44e3f034ac11161fd06e5ae46544)
This prevents the monitor action of 50.samba from failing
on e.g. a typical [homes] service with "path = /home/%S" .
Michael
(This used to be ctdb commit 023d6c2e3017d323b5a70f987f3b4e0b8b8f0f7b)
When "service ctdb stop" is called and the ctdbd is not running,
don't print the "Failed to connect to daemon" error messages.
But print a warning and exit with status success instead.
Michael
(This used to be ctdb commit fac9ad26b2239818e6fc371fbfaa894fa64045be)
On some systems, the ethtool link detection is not successful when a
cable is plugged but the interface has not been brought up previously.
This improves the test by bringing the interface up (without checking
for success here) and trying the ethtool test again afterwards.
Michael
(This used to be ctdb commit 0c2a7bf18c65452ca1c2f0539bf692507d91e3c6)
Let the event complete successfully. the local recovery daemon will check that we have the address and reissue takip othervise.
There are several reasons why "ip addr add " can fail, one is a misconfiguration
anothe ris that for ipv6 the stack is a lot more picky than for ipv4. for examplke this WILL fail in ipv6 if there is a duplicate ip address on the network.
thus this check could cause rolling-recoveries which is why it has to go
(This used to be ctdb commit 12bc85c90a640a72ff538c003eb81da9dd1f2e3f)
If we use vlan tagging and bonding we must strip the vlan part off the name
so we can check the main bonde device for status.
I.e. check bond0 instead of bond0.<VLANTAG>
(This used to be ctdb commit 795c190b004d404b84dda053593139ed51d345e5)
CTDB_SAMBA_SKIP_CONF_CHECK and CTDB_SAMBA_CHECK_PORTS.
The first is used to tell ctdb to no longer monitoring if the smb.conf file is consistent or not.
The second specifies which ports to check that smb is listening on
instead of using testparm to figure this out.
Since the net, testparm and smbstatus may block indefinitely in some configurations
we must have a way to configure ctdb to NOT use any of these three commands
in the scripts. These commands should thus never be used in scripts.
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
(This used to be ctdb commit 2fe52c7979ecd28250ec4ac195d3c3999916e573)
Do not assume all nodes are members of LVS so always deciding the recmaster will be lvsmaster wont work.
Instead,
Create the set of active LVS nodes as those nodes that are LVS capable and
also HEALTHY.
Except if ALL LVS capable nodes are unhealthy in which case we allow the unhealthy
nodes to be part of the active set.
In the active set, pick one of the active nodes as being the lvsmaster
which will receive all incoming traffic and distribute it across
the active lvs nodes in the cluster.
(This used to be ctdb commit b2ccb891b81b041e2186e038b67bb4354b7892aa)
when monitoring the node health.
this might be useful to skip for environments with thousands of shares
(This used to be ctdb commit dd900d4ed8f07003c4f1db2d441cfc2ef2c89ef5)
nfs should never stop spontaneously so trying to restart it is
just counterproductive and at best a workaround to
hide real bugs.
(This used to be ctdb commit 90ab48bb8e17f59fcb27ddbff51de546c4447b64)
Kevin Collins noticed that RHEL5 grep-2.5.1-54.2.el5 built for
x86 does not handle \s while the exact same RHEL5 package for amd64
does!
[[:space:]] is more portable. Even across the same package version ( different architecture ) from the same vendor :-)
(This used to be ctdb commit fd7bb21c4f9289fc34a57f9d8cb7c13a02d06096)
to "ping" the local nfs daemon.
Once it has failed more than 3 times in a row it will attempt to restart the nfs service.
(This used to be ctdb commit a4e89f57a8d733ea74df7b0de31eb977d6d37388)
Just add CTDB_VALGRIND=yes in /etc/sysconfig/ctdb, and look at the
logs in /var/log/ctdb_valgrind.*
(This used to be ctdb commit 9acd577c97059e8924582ac52e9ce5785903f120)
stored yet.
Fix a cosmetic and annoying warning message when running "service ctdb start" and supress printing out that "warning your ls command to find the persistent databases didnt find any" ...
(This used to be ctdb commit d32b16a4e5ecc31563c6f2767e7d483f3d980284)
This attempts to fix the problem of ctdb event scripts blocking due to
attempted access to the ctdb databases during recovery. The changes are:
- now only the 'shutdown' and 'startrecovery' events can be called
with the databases locked in recovery. The event scripts must ensure
that for these two events no database access is attempted
- the recovered, takeip and releaseip events could previously be called
inside a recovery. The code now ensures that this doesn't happen, delaying
the events till after recovery has finished
- the 50.samba event script now avoids using testparm unless it is really
needed
This needs extensive testing.
(This used to be ctdb commit e3cdb8f2be6a44ec877efcd75c7297edb008a80b)
Use tdbdump to verify that all persistent database files are good
before we start the daemon.
(This used to be ctdb commit 13d3eb9a8bc7fad14fcd3e7e023c1336657424d6)
grep for lines starting with a '/' character since exportfs will sometimes
split a single export line into two lines of output like this :
[root@fscc-hs21-13 ~]# exportfs
/NFS4exports/tmp
<world>
/NFS4exports <world>
(This used to be ctdb commit 7c569720beb626617d800211faaf9029f0deb4cf)
Make the 60.nfs eventscript more forgiving when using non-us/english
characters in sharenames
(This used to be ctdb commit f4385712134ea783a0c79a687c5d4e6faa1cc4a7)
CTDB_START_AS_DISABLED="yes"
and command line argument
--start-as-disabled
When set, this makes the ctdb node to always start in DISABLED mode and will thus not host any public ip addresses.
The administrator must manually "ctdb enable" the node after it has started when the administrator wants the node to start hosting public ip addresses.
Using this option it is possible to start ctdb on a node without causing any reallocation of ip addresses when it is starting. The node will still merge with the cluster and there will still be a recovery phase but the ip address allocations will not change in the cluster.
(This used to be ctdb commit b93d29f43f5306c244c887b54a77bca8a061daf2)
by default ctdb does not monitor for OOM.
to enable this you need to uncomment the CTDB_MONITOR_FREE_MEMORY line in /etc/sysconfig/ctdb and specify the amount in MByte free that will trigger OOM and cause ctdb to shutdown the node
(This used to be ctdb commit 35627c7450a03f36a353c3dd7cce31ce3433a7ff)
IF lvs has been configured, check that the ipvsadm package has also
been installed since we depend on it.
If not, log an error and return 1
(This used to be ctdb commit 506174bbc47f1176122be2e55099149e3db27d57)
into a case and add an arm for ib*) (infiniband interfaces)
Dont try using ethtool on ib devices
(mii_tool doesnt work either)
IB does have a command ibv_devinfo which can tell whether a physical port
is up or not but it seems nontrivial to map this into a interface name such as ib0
(This used to be ctdb commit ab6bd25542946a732b4378f5476edfb466d6c000)
when monitoring that all nfs shares are available, allow both ' ' and
'\t' characters to separate the exported directory from the options
in /etc/exports
(This used to be ctdb commit ac6cfe9de0acdcf9461068684fa890504454aae4)
check if mountd is running during monitoring and if it is not, try to restart it
(This used to be ctdb commit 3d4b74669164b519398aeeacd59714f1e3884eff)
mainly useful for avoiding ack-storms when doing very rapid
failover/failback during testing but should not be required in
real-world.
this gets rid of a lof of annoying messages from the messages file
(This used to be ctdb commit 50d289dcce2caa7c7be9b6faa3b38b69c2237038)
used in single public ip address mode.
when using this argument, --public-interface must also be used.
add a vnn structure to the ctdb context to describe the single public ip
address
update the killtcp control in the daemon that if a socketpair that is to
be killed does not match a normal public address it checks if the
destination address maches the single public ip address and if so uses
that vnn structure from the ctdb context
this allows killtcp to kill also connections to the single public ip
instead of only normal public addresses
(This used to be ctdb commit 5661ba17b91f62821dec1c76056c78b99752a90b)
to have one single public ip address for the entire cluster.
this ip address is attached to lo on all nodes but only the recmaster
will respond to arp requests for this address.
the recmaster then runs an ipmux process that will pass any incoming
packets to this ip address onto the other node sin the cluster based on
the ip address of the client host
to use this feature one must
1, have one fixed ip address in the customers network attached
permanently attached to an interface
2, set CTDB_PUBLI_INTERFACE=
to specify on which interface the clients attach to the node
3, CTDB_SINGLE_PUBLI_IP=ip-address
to specify which ipaddress should be the "single public ip address"
to test with only one single client, attach several ip addresses to
the client and ping the public address from the client with different -I
options. look in network trace to see to which node the packet is
passed onto.
(This used to be ctdb commit 50d648c95e4e6d7c2867a034c2b550086d853320)
set the node initially unhealthy and let the status monitoring bring the node online.
This fixes a problem with winbindd, where it refused to start because secrets.tdb was not populated
but we could not populate ctdbd, because the net command would not run while ctdbd was still doing startup
and thus frozen
(This used to be ctdb commit 3a001b793dd76fb96addf1e2ccb74da326fbcfbc)
- put removed IPs on loopback with scope host
- check for nul strings in ethtool call
;
(This used to be ctdb commit e2df1d6d08e67a36ff05a590a34c56e900741287)
a bool that specifies whether the ip was held by a loopback adaptor or
not
the name of the interface where the ip was held
when we release an ip address from an interface, move the ip address
over to the loopback interface
when we release an ip address after we have move it onto loopback,
use 60.nfs to kill off the server side (the local part) of the tcp
connection so that the tcp connections dont survive a
failover/failback
61.nfstickle, since we kill hte tcp connections when we release an ip
address we no longer need to restart the nfs service in 61.nfstickle
update ctdb_takeover to use the new signature for ctdb_sys_have_ip
when we add a tcp connection to kill in ctdb_killtcp_add_connection()
check if either the srouce or destination address match a known public
address
(This used to be ctdb commit f9fd2a4719c50f6b8e01d0a1b3a74b76b52ecaf3)
10.interfaces startsup
this setting makes the system only respond to APR requests from the NIC
where the ip address is tied to and adds to the
"principle of least surprise" when using multihoming servers
(This used to be ctdb commit 39ddf347dc45f599964a4c17e67e71faed00e544)
we must always restart the lockmanager when the cluster has been
reconfigured and ip addresses has changed. This is to make sure we get a
clusterwide grace period for nfs locking.
if we dont do this and only restart locking on the nodes that were
direclty affected, a different client can take out a conflicting lock
from a different node before affected clients has had a chance to
reclaim all the locks lost during reconfigure.
grace period on rhel5 kernel has bene increased to 90 seconds!
statd-callout:
we must restart lockmanager to ensure a clusterwide grace period for
nfs. this makes locking "more correct" for nfs clients and prevents
other clients/nodes from taking out a conflicting lock while a different
client/node tries to reclaim lost locks.
This makes it "almost consistent" for NFS clients but there is still
the possibility that a cifs client can take out a conflicting lock
before an nfs client has had a chance to reclaim an existing lock.
This can not be solved with anything less than making the kernel nfs
lock manager "samba aware" and making samba aware of the internal state
of the kernel lock manager so that they can cooperate.
we can not just stop/start the lockmanager back to back in rhel5 since
if they are stopped/started too close to eachother then when the new
lockmanager upon starting up sends out statd notifications two things
can happen:
1, new lockmanager sends out notification BEFORE it has registered with
portmapper leading to
lockmanager starts
lockmanager sends notification to the client
client tries to recover the lock and tries to portmap the lockmanager
port on the server.
server is not (yet) registered with portmapper and server responds
"no such program" to hte clients request to discover where lockmanager
is.
client then just completely gives up reclaiming the lock and doesnt
even reattempt the portmapper call after some timeout.
==> lock reclaim failed.
2, if they are started back to back, and a client tries to reclaim the
lock the lockmanager sometimes sends two responses back to back
to the client. one with status NLM_GRANTED (==you got the lock
reclaimed) and one with status NLM_DENIED (==you could not get the lock
reclaimed)
This confuses the client and leads to the server thinking that the
client does have the lock and the client thinking it has not got the
lock and orphaned locks result.
We also send out additional notification messages of different formats
to allow more legacy clients to interoperate with locking.
(This used to be ctdb commit 13208c1aab2942e28dff87e38e6794bf0c026033)
multiple public addresses spread across multiple interfaces on each
node.
this is a massive patch since we have previously made the assumtion that
we only have one public address per node.
get rid of the public_interface argument. the public addresses file
now explicitely lists which interface the address belongs to
(This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8)
nfs-state directory actually exists (by creating it)
or else the lock manager will not start
(This used to be ctdb commit f2d15d04df842538c8d8331796a3c6fbe23463f2)
specific script /etc/ctdb/events.d/00.ctdb
get rid of CTDB_EVENTS_SCRIPT and --event-script
(This used to be ctdb commit 81ccfaf838e5772d4a58eb6a70224b7b39aba9f3)
instead for from /etc/ctdb/events so that we can get better debugging
output in the logs when something fails in the scripts
(This used to be ctdb commit 4ed96b768aea1611e8002f7095d3c4d12ccf77a3)
overkill since
1, we now kill the tcpconnections for lockd in 60.nfs
2, rpc.statd on linux sends out the notifications using the wrong
interface anyway which breaks a lot of clients including linux !
use our own smnotify tool instead of sm-notify
(This used to be ctdb commit 0163ad0ec01be6189a98ea91e5cec40f6750218f)
check for tcp ports
(the check for these tools should not really use hardcoded paths)
(This used to be ctdb commit 56d77082c07a519dd3804cc24cc7ba889b8469ff)
times to try the reset.
the reset retry attempt is now handled inside the daemon
update the 60.nfs script and remove this parameter that is no longer
used
(This used to be ctdb commit 30fb09b8b9a989e5cfe86b6daf2dcd2487013344)
the tcp connection
change the 60.nfs script to run ctdb killtcp in the foreground so we
dont get lots of these running in parallel when there are a lot of tcp
connections to rst
(This used to be ctdb commit d81616214752882242f2886e94681972a790db80)
- added DatabaseHashSize tunable
- added logging of events inside recovery (for timing)
(This used to be ctdb commit 3593cdb928b91e217faf1b3c537fa28dc82cdace)
- added monitoring of the ethernet link state
When monitoring detects an error, the node loses its public IP address
(This used to be ctdb commit 0af57aead8c983511d25774b4ffe09fa5ff26501)
use a dedicated variable CTDB_MANAGES_NFSLOCK since some might want to
use nfs but no lockmanager
(This used to be ctdb commit 1e8cec86617ffb188bd49c70f074a4b350d3fe3d)