IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
This reverts commit 401f421fa003d9515df15e759b50b56e0c67d69c.
Conflicts:
include/ctdb_private.h
server/ctdb_tunables.c
(This used to be ctdb commit b883d19a495a41a22db37f9c2cf6250fee529de0)
since we no longer ban nodes when dodgy scripts continue to hang.
We now only mark nodes as unhealthy if monitor events fail or timeout. Never ban.
(This used to be ctdb commit 5c8e56fc7a518e115bceac257867739283cf6a1e)
there is no rational need for a setting where we permanently mark nodes as disabled everytime an eventscript fails
(This used to be ctdb commit 68a8ee99b128a5ec883600735626bdb3bbc9c503)
This patch improves the handling of the fetch_lock operation on non-persistent
databases that ctdb clients have to do very frequently.
The normal flow how this goes is the following:
1. Client does a local fetch_lock on the database
2. Client looks if the local node is dmaster.
If yes, everything is fine
If no, continue here
3. Client unlocks the local record
4. Client issues a "get me the record" call to ctdbd
5. ctdbd goes out and fetches the dmaster role
6. ctdbd tells the client to retry
7. Client starts over again
The problem is between step 6 and 7: Before the client has had the chance to
retry (i.e. catch the record with a fetch_locked), another node might have come
asking ctdbd to migrate away the record again. This is a real problem, I've
seen >20 loops of this kind in real workloads.
This patch does the following: Whenever ctdb receives a record as result of
step 5, it puts the key on a "holdback list". As long as a key is on this list,
a request to migrate away the dmaster is put on hold. It is the client's duty
to issue the "CTDB_CONTROL_GOTIT" control when it has successfully done step 2
after having asked ctdb to fetch the record. This will release the key from the
"holdback list" and re-issue all dmaster migration requests.
As a safeguard against malicious clients, once a second (default 1000msecs,
tunable "HoldbackCleanupInterval" in milliseconds) ctdbd goes over the list of
held back keys, deletes them and releases all held back migration requests.
(This used to be ctdb commit 5736e17c139c9a8049e235429aeae0c6c9d0e93d)
This is a simplified version of the trans2 commit control:
It just rolls out the marshall buffer to all active nodes.
It is the main ctdbd part of the re-implementation of the
persistent transactions. The client code is changed to
take a global lock to start a transactions and store into
the marshal buffer instead of writing to the local tdb
under a local transaction.
The old transaction implementation is going to be
removed in a later commit.
Michael
(This used to be ctdb commit f66428f9d2013080a414404c1ba6117888352fd6)
Date: Wed, 9 Dec 2009 22:45:12 +0100
Subject: [PATCH] Revert an accidential commit
(This used to be ctdb commit af6656f2844d8fd72204a70358c9d589dbe1bd34)
This might be a bit less efficient, but experience in winbind has shown that
event callbacks can trigger changes in the socket state in very hard to
diagnose ways.
(This used to be ctdb commit a78b8ea7168e5fdb2d62379ad3112008b2748576)
We also no longer return an error before scripts have been run; a special
zero-length data means we have never run the scripts.
"ctdb scriptstatus all" returns all event script results.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 9b90d671581e390e2892d3a68f3ca98d58bef4df)
We're going to need this so ctdb can query non-monitor status.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 53bc5ca23ca55a3ac63a440051f16716944a2a51)
Rather than only tranferring to last_status for monitor events, do
it for every event (ctdb->last_status is now an array).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit c73ea56275d4be76f7ed983d7565b20237dbdce3)
We're going to allow fetching status of all script runs, so this
name is no longer appropriate.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit f5cb41ecf3fa986b8af243e8546eb3b985cd902a)
A new helper functions which sets up an event attached to the child's
stdout/stderr which gets routed to the logging callback after being
placed in the normal logs.
This is a generalization of the previous code which was hardcoded to
call ctdb_log_event_script_output.
The only subtlety is that we hang the child fds off the output buffer;
the destructor for that will flush, which means it has to be destroyed
before the output buffer is.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 32cfdc3aec34272612f43a3588e4cabed9c85b68)
The child no longer uses ctdb_ctrl_event_script_init or
ctdb_ctrl_event_script_finished, and the others are redundant: it
doesn't need to tell us it's starting a script when it only runs one.
We move start and stop calls to the parent, and eliminate the RPC
infrastructure altogether.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 391926a87a7af73840f10bb314c0a2f951a0854c)
We put a "scripts" member in ctdb_event_script_state, rather than using
a special struct for monitor events. This will fit better as we further
unify the different events, and holds the reports from the child process
running each monitor script.
Rather than making the monitor state a child of current_monitor_status_ctx,
we just point current_monitor directly at it. This means we need to reset
that pointer in the destructor for ctdb_event_script_state.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 9a2b4f6b17e54685f878d75bad27aa5090b4571f)
We have monitor_event_script_ctx and other_event_script_ctx, and
current_monitor_status_ctx in struct ctdb_context. This seems more
complex than it needs to be.
We use a single "event_script_ctx" as parent for all event script
state structures. Then we explicitly reparent monitor events under
current_monitor_status_ctx: this is freed every script invocation to
kill off any running scripts anyway.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 0d925e6f2767691fa561f15bbb857a2aec531143)
eventscript.c uses this now, but our next patch makes others use it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit a305cb7743c24386e464f6b2efab7e2108bb1e7e)
This unifies code paths and simplifies things: we just hand -ENOEXEC to
ctdb_ctrl_event_script_stop().
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit eadf5e44ef97d7703a7d3bce0e7ea0f21cb11f14)
This simplifies the code a little: last_status is now read to go
(it's only used by the scriptstatus command at the moment).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 6be931266a4e41fd0253f760936ad9707dd97c47)
This controls is only used by samba when samba wants to check if a subrecord held by a <node-id>:<smbd-pid> is still valid or if it can be reclaimed.
If the node is banned or stopped, we kill the smbd process and return that the process does not exist to the caller. This allows us to recover subrecords from stopped/banned nodes where smbd is hung waiting for the databases to thaw.
bz58185
(This used to be ctdb commit 157807af72ed4f7314afbc9c19756f9787b92c15)
Add the mapping to the list everytime we accept() a new client connection
and set it up to remove in the destructor when the client structure is freed.
(This used to be ctdb commit f75d379377f5d4abbff2576ddc5d58d91dc53bf4)
Now we're doing checking, we might as well make sure the commands from
"ctdb eventscripts" are valid.
This gets rid of the "UNKNOWN" event type.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 1d24a3869fe89fc9a109fd9e9b69df5fc665a5f6)
Rather than doing strcmp everywhere, pass an explicit enum around. This
also subtly documents what options are available. The "options" arg
is now used for extra arguments only.
Unfortunately, gcc complains on empty format strings, so we make
ctdb_event_script() take no varargs, and add ctdb_event_script_args(). We
leave ctdb_event_script_callback() taking varargs, which means callers
have to do "%s", "".
For the moment, we have CTDB_EVENT_UNKNOWN for handling forced scripts
from the ctdb tool.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 8001488be4f2beb25e943fe01b2afc2e8779930d)
Everyone uses the same timeout value, so just remove it from the API.
If we ever need variable timeouts, that might as well be central too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 533c3e053293941d2a9484b495e78d45f478bb08)
command.
Use the existing context used for non-monitor events
Multiple concurrent uses of "ctdb eventscript ..." could otherwise lead to a SEGV
(This used to be ctdb commit 80a8d728e9680040e00d24361dfc9367dd372a56)
This allows running the actual monitoring asynchronously from ctdbd
and only using "status" to pick up the actual results.
(This used to be ctdb commit 1908bac812650ca25151051f5d86815e0b8ed319)
use a udp socket on the ctdbd port to send messages to teh syslog child process for loggign.
we need this when syslog becomes "slow", like very slow, and on boxes where syslog is limited to 100 lines per second and starts to block after that
(This used to be ctdb commit 1446f4c247310e2ff2d522055bd8927d1a78d017)
Otherwise a node can lock itself out, e.g. when a commit control times out...
Michael
(This used to be ctdb commit cb432e30351d5e5a41e98da3c7b1c2a4d400a3a2)
This aske the daemon wheter a transaction is currently active on a
given DB on that node. More precisely this asks for the transaction_active
flag in the ctdb_db_context that is set in the CTDB_TRANS2_COMMIT
control and cleared in the CTDB_TRANS2_ERROR or CTDB_TRANS2_FINISHED controls.
This will be useful for fixing race conditions in the transaction code.
Michael
(This used to be ctdb commit 8d430ae6968dfe566614379436fc3c56003fcd88)
add a global variable holding the pid of the main daemon.
change the tracking of time() in the event loop to only check/warn when called from the main daemon
(This used to be ctdb commit a10fc51f4c30e85ada6d4b7347b0f9a8ebc76637)
The way to use this is from a client to :
1, first create a message handle and bind it to a SRVID
A special prefix for the srvid space has been set aside for samba :
Only samba is allowed to use srvid's with the top 32 bits set like this.
The lower 32 bits are for samba to use internally.
2, register a "notification" using the new control :
CTDB_CONTROL_REGISTER_NOTIFY = 114,
This control takes as indata a structure like this :
struct ctdb_client_notify_register {
uint64_t srvid;
uint32_t len;
uint8_t notify_data[1];
};
srvid is the srvid used in the space set aside above.
len and notify_data is an arbitrary blob.
When notifications are later sent out to all clients, this is the payload of that notification message.
If a client has registered with control 114 and then disconnects from ctdbd, ctdbd will broadcast a message to that srvid to all nodes/listeners in the cluster.
A client can resister itself with as many different srvid's it want, but this is handled through a linked list from the client structure so it mainly designed for "few notifications per client".
3, a client that no longer wants to have a notification set up can deregister using control
CTDB_CONTROL_DEREGISTER_NOTIFY = 115,
which takes this as arguments :
struct ctdb_client_notify_deregister {
uint64_t srvid;
};
When a client deregisters, there will no longer be sent a message to all other clients when this client disconnects from ctdbd.
(This used to be ctdb commit f1b6ee4a55cdca60f93d992f0431d91bf301af2c)
Add a new tunable to control the maximum queue size we allow to a blocked client before we start discarding REQ_MESSAGES instead of queueing them for delivery.
This avoids having queued up very very large number of MESSAGES that samba semds
between eachother to nodes that are blocked/banned/stopped for extended periods
.
(This used to be ctdb commit f76d6fed8f9630450263b9fa4b5fdf3493fb1e11)
Add a tuneable so that when scripts starts to hang/timeout, we can make the node unhealthy instead of banned
(This used to be ctdb commit 2e9fc6f0609833c6d8146196011ef780669d615d)
master to perform an explicit ip reallocation.
This is more reliable and faster than having the recovery dameon track these
changes, and since we now have an explicit method to ask the recovery daemon
to perform an explicit ip reallocation, we should use this.
(This used to be ctdb commit 3807681e74f4bfe92befdae6ed616ff5f1a99880)
transactions we start across all tdb databased during the recovery.
this allows us to properly clean up and delete these tdb transactions on a
recovery failure.
(This used to be ctdb commit b2ce8b900a7d00944c84e0574fea5b371064a06d)