IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
During this window we may still hit asynchronous events that will fail because we can not send/receive packets from other nodes.
These messages are logged as ... Transport is DOWN. To help indicate that they are benign messages related to the process of shutting down.
These messages spam the syslog during normal shutdown, so this patch will drop the loglevel of these messages to DEBUG, so that they will not appear in or spam the syslog.
(This used to be ctdb commit 8275d265d2ae19b765e30ecf18f6b6319b6e6453)
Add a new command "ctdb stats [num]" that prints the [num] most recent statistics intervals collected.
(This used to be ctdb commit e6e16fcd5a45ebd3739a8160c8fb5f44494edb9e)
Add a new "ctdb deltickle" command to delete tickles from the database.
This can ONLY be used for tickles created by "ctdb addtickle".
Push any "addtickle/deltickle" updates to other nodes every TickleUpdateInterval seconds'
(This used to be ctdb commit acded034e2f0dcae4c2c9e54e16a001caf23caec)
In Samba this is now called "tevent", and while we use the backwards
compatibility wrappers they don't offer EVENT_FD_AUTOCLOSE: that is now
a separate tevent_fd_set_auto_close() function.
This is based on Samba version 7f29f817fa939ef1bbb740584f09e76e2ecd5b06.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 85e5e760cc91eb3157d3a88996ce474491646726)
and print the time startistics was taken and for how long the statistics have been collected to the "ctdb statistics" output.
(This used to be ctdb commit 1bdfe0cd3370a335b960ce1ef97eade93b0cd2fa)
->recovery_mode was set to normal but database priorities leven2 or 3 was still set to frozen.
causing the recovery daemon to fail to detect that a recovery was needed to recover access to the database.
BZ63951
(This used to be ctdb commit 7411b2b577a16f85ad6913e1bfccce7ea260a613)
This patch improves the handling of the fetch_lock operation on non-persistent
databases that ctdb clients have to do very frequently.
The normal flow how this goes is the following:
1. Client does a local fetch_lock on the database
2. Client looks if the local node is dmaster.
If yes, everything is fine
If no, continue here
3. Client unlocks the local record
4. Client issues a "get me the record" call to ctdbd
5. ctdbd goes out and fetches the dmaster role
6. ctdbd tells the client to retry
7. Client starts over again
The problem is between step 6 and 7: Before the client has had the chance to
retry (i.e. catch the record with a fetch_locked), another node might have come
asking ctdbd to migrate away the record again. This is a real problem, I've
seen >20 loops of this kind in real workloads.
This patch does the following: Whenever ctdb receives a record as result of
step 5, it puts the key on a "holdback list". As long as a key is on this list,
a request to migrate away the dmaster is put on hold. It is the client's duty
to issue the "CTDB_CONTROL_GOTIT" control when it has successfully done step 2
after having asked ctdb to fetch the record. This will release the key from the
"holdback list" and re-issue all dmaster migration requests.
As a safeguard against malicious clients, once a second (default 1000msecs,
tunable "HoldbackCleanupInterval" in milliseconds) ctdbd goes over the list of
held back keys, deletes them and releases all held back migration requests.
(This used to be ctdb commit 5736e17c139c9a8049e235429aeae0c6c9d0e93d)
This is a simplified version of the trans2 commit control:
It just rolls out the marshall buffer to all active nodes.
It is the main ctdbd part of the re-implementation of the
persistent transactions. The client code is changed to
take a global lock to start a transactions and store into
the marshal buffer instead of writing to the local tdb
under a local transaction.
The old transaction implementation is going to be
removed in a later commit.
Michael
(This used to be ctdb commit f66428f9d2013080a414404c1ba6117888352fd6)
We also no longer return an error before scripts have been run; a special
zero-length data means we have never run the scripts.
"ctdb scriptstatus all" returns all event script results.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 9b90d671581e390e2892d3a68f3ca98d58bef4df)
The child no longer uses ctdb_ctrl_event_script_init or
ctdb_ctrl_event_script_finished, and the others are redundant: it
doesn't need to tell us it's starting a script when it only runs one.
We move start and stop calls to the parent, and eliminate the RPC
infrastructure altogether.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 391926a87a7af73840f10bb314c0a2f951a0854c)
This unifies code paths and simplifies things: we just hand -ENOEXEC to
ctdb_ctrl_event_script_stop().
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit eadf5e44ef97d7703a7d3bce0e7ea0f21cb11f14)
This controls is only used by samba when samba wants to check if a subrecord held by a <node-id>:<smbd-pid> is still valid or if it can be reclaimed.
If the node is banned or stopped, we kill the smbd process and return that the process does not exist to the caller. This allows us to recover subrecords from stopped/banned nodes where smbd is hung waiting for the databases to thaw.
bz58185
(This used to be ctdb commit 157807af72ed4f7314afbc9c19756f9787b92c15)
Rather than doing strcmp everywhere, pass an explicit enum around. This
also subtly documents what options are available. The "options" arg
is now used for extra arguments only.
Unfortunately, gcc complains on empty format strings, so we make
ctdb_event_script() take no varargs, and add ctdb_event_script_args(). We
leave ctdb_event_script_callback() taking varargs, which means callers
have to do "%s", "".
For the moment, we have CTDB_EVENT_UNKNOWN for handling forced scripts
from the ctdb tool.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 470822b329f9d3ca9bef518b56e9ce28d5fedda2)
Otherwise a node can lock itself out, e.g. when a commit control times out...
Michael
(This used to be ctdb commit cb432e30351d5e5a41e98da3c7b1c2a4d400a3a2)
This aske the daemon wheter a transaction is currently active on a
given DB on that node. More precisely this asks for the transaction_active
flag in the ctdb_db_context that is set in the CTDB_TRANS2_COMMIT
control and cleared in the CTDB_TRANS2_ERROR or CTDB_TRANS2_FINISHED controls.
This will be useful for fixing race conditions in the transaction code.
Michael
(This used to be ctdb commit 8d430ae6968dfe566614379436fc3c56003fcd88)
The way to use this is from a client to :
1, first create a message handle and bind it to a SRVID
A special prefix for the srvid space has been set aside for samba :
Only samba is allowed to use srvid's with the top 32 bits set like this.
The lower 32 bits are for samba to use internally.
2, register a "notification" using the new control :
CTDB_CONTROL_REGISTER_NOTIFY = 114,
This control takes as indata a structure like this :
struct ctdb_client_notify_register {
uint64_t srvid;
uint32_t len;
uint8_t notify_data[1];
};
srvid is the srvid used in the space set aside above.
len and notify_data is an arbitrary blob.
When notifications are later sent out to all clients, this is the payload of that notification message.
If a client has registered with control 114 and then disconnects from ctdbd, ctdbd will broadcast a message to that srvid to all nodes/listeners in the cluster.
A client can resister itself with as many different srvid's it want, but this is handled through a linked list from the client structure so it mainly designed for "few notifications per client".
3, a client that no longer wants to have a notification set up can deregister using control
CTDB_CONTROL_DEREGISTER_NOTIFY = 115,
which takes this as arguments :
struct ctdb_client_notify_deregister {
uint64_t srvid;
};
When a client deregisters, there will no longer be sent a message to all other clients when this client disconnects from ctdbd.
(This used to be ctdb commit f1b6ee4a55cdca60f93d992f0431d91bf301af2c)
database priorities will be used to control in which order databases are locked during recovery in.
(This used to be ctdb commit 67741c0ee01916d94cace8e9462ef02507e06078)
This event is called when a node is stopped and is used by eventscripts that need to do certain cleanup and removal of configuration or ip addresses or routing ...
Note that a STOPPED node is considered "inactive" and as such will not be running the "recovered" event when the rest of the cluster has recovered.
(This used to be ctdb commit 65e9309564611bf937ded3c74a79abff895d7c59)
Log this in "ctdb statistics".
Also add a varaible "RecLockLatencyMs" that will log an error everytime it takes longer than this to access the reclock file.
(This used to be ctdb commit 042377ed803bb8f7ca9d6ea1a387427b7b8ba45a)
When a client (such as smbstatus) is killed, it may have outstanding
traverse children on remote nodes. We need to catch the client
disconnect in ctdbd and send a control to all nodes telling them to
kill those outstanding traverse children.
(This used to be ctdb commit f2fb2df4619a14f7f6c11f9132ee7d793028042c)
this command shows which eventscripts were executed during the last monitoring cycle and the status from each eventscript.
If an eventscript timedout or returned an error we also
show the output from the eventscript.
Example :
[root@rcn1 ctdb-git]# ./bin/ctdb scriptstatus
6 scripts were executed last monitoring cycle
00.ctdb Status:OK Duration:0.021 Mon Mar 23 19:04:32 2009
10.interface Status:OK Duration:0.048 Mon Mar 23 19:04:32 2009
20.multipathd Status:OK Duration:0.011 Mon Mar 23 19:04:33 2009
40.vsftpd Status:OK Duration:0.011 Mon Mar 23 19:04:33 2009
41.httpd Status:OK Duration:0.011 Mon Mar 23 19:04:33 2009
50.samba Status:ERROR Duration:0.057 Mon Mar 23 19:04:33 2009
OUTPUT:ERROR: Samba tcp port 445 is not responding
Add a new helper function "switch_from_server_to_client()" which both
the recovery daemon can use as well as in the child process we start for running the actual eventscripts.
Create several new controls, both for the eventscript child process to inform the master daemon of the current status of the scripts as well as for the ctdb tool to extract this information from the runninc daemon.
(This used to be ctdb commit c98f90ad61c9b1e679116fbed948ddca4111968d)
allow clients to register either ipv4 or ipv6 client connections to the tickles list
(This used to be ctdb commit d9b44d7c3255b0fd7359b9afeb613e6ff4c4eaac)
race between the ctdb tool and the recovery daemon both at once
trying to push flag changes across the cluster.
(This used to be ctdb commit a9a1156ea4e10483a4bf4265b8e9203f0af033aa)
older ipv4-only version of these controls.
We need this so that we are backwardcompatible with old versions of ctdb
and so that we can interoperate with a ipv4-only recmaster during a
rolling upgrade.
(This used to be ctdb commit 6b76c520f97127099bd9fbaa0fa7af1c61947fb7)
we currently only monitor that the dameons are running by kill(0, pid)
and verifying the the domain socket between them is ok.
this is not sufficient since we can have a situation where the recovery
daemon is hung.
this new code monitors that the recovery daemon is operating.
if the recovery hangs, we log this and shut down the main daemon
(This used to be ctdb commit cd69d292292eaab3aac0e9d9fc57cb621597c63c)