IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
we must always restart the lockmanager when the cluster has been
reconfigured and ip addresses has changed. This is to make sure we get a
clusterwide grace period for nfs locking.
if we dont do this and only restart locking on the nodes that were
direclty affected, a different client can take out a conflicting lock
from a different node before affected clients has had a chance to
reclaim all the locks lost during reconfigure.
grace period on rhel5 kernel has bene increased to 90 seconds!
statd-callout:
we must restart lockmanager to ensure a clusterwide grace period for
nfs. this makes locking "more correct" for nfs clients and prevents
other clients/nodes from taking out a conflicting lock while a different
client/node tries to reclaim lost locks.
This makes it "almost consistent" for NFS clients but there is still
the possibility that a cifs client can take out a conflicting lock
before an nfs client has had a chance to reclaim an existing lock.
This can not be solved with anything less than making the kernel nfs
lock manager "samba aware" and making samba aware of the internal state
of the kernel lock manager so that they can cooperate.
we can not just stop/start the lockmanager back to back in rhel5 since
if they are stopped/started too close to eachother then when the new
lockmanager upon starting up sends out statd notifications two things
can happen:
1, new lockmanager sends out notification BEFORE it has registered with
portmapper leading to
lockmanager starts
lockmanager sends notification to the client
client tries to recover the lock and tries to portmap the lockmanager
port on the server.
server is not (yet) registered with portmapper and server responds
"no such program" to hte clients request to discover where lockmanager
is.
client then just completely gives up reclaiming the lock and doesnt
even reattempt the portmapper call after some timeout.
==> lock reclaim failed.
2, if they are started back to back, and a client tries to reclaim the
lock the lockmanager sometimes sends two responses back to back
to the client. one with status NLM_GRANTED (==you got the lock
reclaimed) and one with status NLM_DENIED (==you could not get the lock
reclaimed)
This confuses the client and leads to the server thinking that the
client does have the lock and the client thinking it has not got the
lock and orphaned locks result.
We also send out additional notification messages of different formats
to allow more legacy clients to interoperate with locking.
(This used to be ctdb commit 13208c1aab2942e28dff87e38e6794bf0c026033)
files
so that we can partition the cluster into different subsets of nodes
which each serve a different subset of the public addresses
(This used to be ctdb commit 889e0fe69e4c88c6166282b12843b8d9727552d6)
everytime we release an ip.
this context is used to hold all resources needed when sending out
gratious arps and tcp tickles during ip takeover.
we hang it off the vnn structure that manages that particular ip address
instead so that we can have multiple ones going in parallell
this bug (or the same bug in different shape) has probably been in ctdb
for very very long but is likely to be hard to trigger
(This used to be ctdb commit c58db1cadaba253b2659573673b28c235ef7db76)
multiple public addresses spread across multiple interfaces on each
node.
this is a massive patch since we have previously made the assumtion that
we only have one public address per node.
get rid of the public_interface argument. the public addresses file
now explicitely lists which interface the address belongs to
(This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8)
nfs-state directory actually exists (by creating it)
or else the lock manager will not start
(This used to be ctdb commit f2d15d04df842538c8d8331796a3c6fbe23463f2)
the shutdown command
and return success to the caller if the _send() was successful
(This used to be ctdb commit 6bacaf8c7a96044708a6eda10cc8576adb7f5f79)
controls to register/unregister/check a server id.
a server id consists of TYPE:VNN:ID where type is specific to the
application. VNN is the node where the serverid was registered and ID
might be a node unique identifier such as a pid or similar.
Clients can register a server id for themself at the local ctdb daemon.
When a client dissappears or when the domain socket connection for the
client drops then any and all server ids registered across that domain
socket will also be automatically removed from the store.
clients can register as many server_ids as they want at the same time
but each TYPE:VNN:ID must be globally unique.
Clients have the option of explicitely unregister a server id by using
the UNREGISTER control.
Registration and unregistration can only be done by clients to the local
daemon. clients can not register their server id to a remote node.
clients can check if a server id does exist on any ctdb node in the
network by using the check control
(This used to be ctdb commit d44798feec26147c5cc05922cb2186f0ef0307be)
passing it as a parameter we set the callback function explicitely from
the caller if the ..._send() function returned a valid state pointer.
(This used to be ctdb commit aa939570662786455f63299b62c99882cff29d42)
callback function which is called upon completion (or timeout) of the
control.
modify scanning of recmaster in the monitoring_cluster code to try the
api out
(This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c)
struct so that if we timeout a control we can print debug info such as
what opcode failed and to which node
we dont need the *status parameter to ctdb_client_control_state
create async versions of the getrecmaster control
pass a memory context to getrecmaster
(This used to be ctdb commit 558b680c82f830fba82c283c78c2de8a0b150b75)
ctdb_call() may pass a null pointer to _recv() and this would cause a
segfault.
fortunately there appears there are no critical users for this codepath
right now so the risk was more theoretical IF clients start using this
call it coult segfault.
change ctdb_control() to become fully async so we later can make
recovery daemon do the expensive controls to nodes in parallell instead
of in sequence
(This used to be ctdb commit 379789cda6ef049f389f10136aaa1b37a4d063a9)