IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
In include/ctdb.h, ctdb_callback_t and ctdb_rrl_callback_t were
defined with a void *private variable. The variable name was
changed to void *private_data to avoid issues encountered in
the Samba autoconf script.
Evan Kinney <evan.kinney@sas.com>
(This used to be ctdb commit 1f453aa4b5e749468c7788afac09c6f0900ea18f)
For non bash shells $_s_script might end with '/*'.
We do the workarround this way, because it makes sense to check
that a script is executable, before trying to execute it.
metze
[ This actually applies to any shell -- Rusty Russell ]
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit e665cfde03fc9ec2264e99512ed5470872a2fd04)
When doing a releaseip event, we do them in parallel for all the separate
IPs. This creates a problem for iptables, which isn't reentrant, giving
the strange message:
iptables encountered unknown error "18446744073709551615" while initializing table "filter"
The worst possible symptom of this is that releaseip won't remove the rule
which prevents us listening to clients during releaseip, and the node will be
healthy but non-responsive.
The simple workaround is to flock-wrap iptables. Better would be to rework
the code so we didn't need to use iptables in these paths.
CQ:S1018353
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 72d6914ee913272312d7b68f1be5ad05ad06587d)
Martin accidentally typed this instead of "ctdb scriptstatus releaseip"
and it crashes.
CQ:S1018859
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 70877b2e7f8fd0d46899bbeca2c6caad6e6e6820)
This unifies our RPM version handling, based on tags.
1) Tags are of form ctdb-<version>.
2) The first <version> starts with .1.
3) Devel versions end with .0.<patchnum>.<checksum>.devel to reliably
identify them.
This means that devel versions will correctly supersede releases and earlier
devels, but new releases will correctly supersede older devel RPMs.
Making a new release is as simple as creating a new git tag.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 44009e02a661d4a1e14246f650974fc4ed7a07c9)
We've been seeing "Invalid packet of length 0" errors, but we don't know
what is sending them. Add a name for each queue, and print nread.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit e6cf0e8f14f4263fbd8b995418909199924827e9)
When tdb throws an error, we didn't report the name of the tdb; we should.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit cfea357c9b2142c8cd8cac1ee712d40b188793e1)
We discovered that recent smbd locks the serverid tdb while
holding a lock on another tdb (locking.tdb):
7: POSIX ADVISORY WRITE smbd-2224318 locking.tdb.0 10600 10600
22: -> POSIX ADVISORY READ smbd-2224318 serverid.tdb.0 26580 26580
The result is a deadlock against the ctdb_freeze code called for
recovery. We extend the "notify" workaround to this case, too.
BZ:65158
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit dfdaa446cf256854ff6d267dceeb86fbee8bb188)
Seconds between ctdbd first log message and node healthy:
BEFORE: 4.03
AFTER: 2.02
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 8f17731dea4287d4f9b21dc58c1bdf26c8a0e628)
The extra recovery interval wait was introduced in 821333afb458 but no
explanation was provided in that message. Nonetheless, if starting
the entire cluster for the first time, it should be safe to skip this.
We use the commandline arg --sloppy-start which should discourage
people from using it outside testing.
Seconds between ctdbd first log message and node healthy:
BEFORE: 16.10
AFTER: 4.03
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 509e2e89ae233a0e91998d95267bf62f296a73cd)
Seconds between ctdbd first log message and node healthy:
BEFORE: 17.08
AFTER: 16.10
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 372201d418f041d69646793105f6898ab12a7d91)
We currently sleep for one second, whether or not we've already slept.
Change this to sleep for the remainder of the second, if at all.
Seconds between ctdbd first log message and node healthy:
BEFORE: 18.09
AFTER: 17.08
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit fde760b5f39c77172308a583da4c2443b71541c9)
Once we've done a startup, we need to run a monitor event successfully
to be marked as healthy. Rather than wait the usual 5 seconds, run it
immediately (which will then reset next_interval to 5 seconds).
Seconds between ctdbd first log message and node healthy:
BEFORE: 23.58
AFTER: 18.09
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit c8651494febcb1c9e558b2002e2a72c2bf547c06)
We do a recovery on startup. But the code does:
Sleep for ctdb->tunable.recover_interval.
Check for recovery.
We want to do it in the other order. This is best done by extracting
the loop into a separate "main_loop" function.
Seconds between ctdbd first log message and node healthy:
BEFORE: 24.09
AFTER: 23.58
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 097046025176b9fcb670839d1a9f100f890e7ed2)
We dont want it to wrap almost immediately so that basically all "ctdb ..."
commands log the "Reqid wrap" warning.
(This used to be ctdb commit f26b59d8b96a70baa80ab1bad406ee6a21330b68)
Ronnie and I tracked down a bug which seems to be caused by a node
running so slowly that we timed out the request and reused the request
id before it responded.
The result was that we unlocked the wrong record, leading to the
following:
ctdbd: tdb_unlock: count is 0
ctdbd: tdb_chainunlock failed
smbd[1630912]: [2010/06/08 15:32:28.251716, 0] lib/util_sock.c:1491(get_peer_addr_internal)
ctdbd: Could not find idr:43
ctdbd: server/ctdb_call.c:492 reqid 43 not found
This exact problem is now detected, but in general we want to delay
id reuse as long as possible to make our system more robust.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 9eb9c53ef29f4871ae2fe62fc5cb6145fca89eed)
Since idtree assigns sequentially, it rarely reaches high numbers.
But such numbers can be forced with idr_get_new_above(), and that
reveals two bugs:
1) Crash in sub_remove() caused by pa array being too short.
2) Shift by more than 32 in _idr_find(), which is undefined, causing
the "outside the current tree" optimization to misfire and return NULL.
Signed-off-by: Rusty Russell <rusty@rustorp.com.au>
(This used to be ctdb commit 32c04e11ebbcf8239e47016302c6ce802a8b0a6f)
If a noremote node hangs for an extended period, it is possible
that we might have a DMASTER request in flight for record A to that node.
Eventually we will reuse the idr, and may reuse it for a DMASTER request to a different node for a different record B.
If while the request for B is in flight, the first tnode un-hangs and responds back
we would receive a dmaster reply for the wrong record.
This would cause a record to become perpetually locked, since inside the daemon we would tdb_chainlock(dmaster_reply->pdu->key) but once the migration would complete we would chainunlock idr->state->call->key
Adding code to verify that when we receive a dmaster reply packet that it does in fact match the exact same key that the state variable we have for the idr in flight.
(This used to be ctdb commit 2f6a870d7ff02ceb61fde242f752dccbfcb4cb37)
In that case, when the main daemon is not running
the ctdb context will be initialized to NULL, since we can not connect.
Move the calls to read the ctdb socketname and connecting via libctdb to
only happen when we are executing a "ctdb ..." command that requires that we talk to the actual daemon.
Otherwise we will get an ugly SEGV for the "ctdb ..." commandline tool
when trying to run a command that is supposed to work also when the daemon is down.
(This used to be ctdb commit 18168da84a6aa8d69465e43402444c7ec979604a)
A simple connector function, made a bit more complex because TDB adds
a '\n' and we don't.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit ae5b89dca00ca080c70868430fa54ba07bd6f5f4)
The code on which this is based could alter the header: a normal client
can't. If we use this differently later we can change this. For the
moment it's a nice extra check.
We optimize out the record write altogether when the record hasn't
changed, rather than just suppressing the seqnum update.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 2638dbae7bf1a35ed37802e35e179e435a5d622a)
I missed some int->bool conversions previously, particularly the
return of ctdb_writerecord().
By always handing functions ctdb_connection or ctdb_db, we keep it
consistent with the rest of the API and can do extra lock consistency
checks.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 3f939956ddd693cba6ea5c655288f4f5ca95f768)
Now we have more messages, it seems to make sense to document their usage
and make them consistent.
In particular, LOG_CRIT for internal libctdb problems, LOG_ALERT for
API misuse.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit a6fed3f577c7ec51df38ed15ecb9db6ea2ae7c8f)
Rather than using a binary, we use a magic value for locking. We also
split out the "dont have the lock yet" from the "do have the lock"
paths for clarity and extra checking.
This should detect a superset of the previous case, even if they free
(and reuse) the lock memory.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit dc081d40051b9204bb38e4de7dfe8d78656593d0)
Verify that the lock is still held and refuse the write otherwise.
We have to guarantee that we dont write to an unlocked record.
If we write to a record after it has been released, the record may have
already migrated off the node, in which case we get a DMASTER split brain for this record. (These application bugs are incredibly hard to track down)
(This used to be ctdb commit f62c7e44dc303f274bbc1dd59fad2167e72a2af0)
This allows us to keep the datastructure valid after the lock has been released by the application and we can trap and warn when the application is accessing the lock after it has been released. I.e. application bugs.
(This used to be ctdb commit 463a266205f145cd9c4c36b9c59d3747eeef0e2e)
Full documentation for all the functions.
This looks longer than it is, because it sorts them into async and
sync parts, and also renames some formal parameters.
Added TODO to libctdb directory to track our plans.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 108e9c2450876a9f8821aa7efd5be971eee5afd3)
We're best off including ctdb_protocol.h to get these, even if we
document the important ones in ctdb.h.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit cdc19dc73032470d57f38bf825d8113b3a0c8cd1)
Return bool instead of -1/0; that's what the young kids are doing
these days!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit e285b5d5a9d4fbc4f75dbb237d2fcdbd84f2d605)
In particular, this stops them grabbing two (with wrappers so we can
enhance this logic once we support threads), and warns them if they
re-enter ctdb_service() holding a lock (you are not supposed to block!).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit c620cfbad3b5f0d6330ef47f572d4ade08e169e8)
This is based on Ronnie's work, merged with mine. That means
errors are all my fault.
Differences from Ronnie's:
1) use syslog's LOG_ levels directly.
2) typesafe arg to log function, and use it (eg stderr) in helper function.
3) store fn in ctdb context, and expose ctdb_log_level directly thru API.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 86259aa395555aaf7b2fae7326caa2ea62961092)
This is going to help for logging, since we want it there.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 0786152472bc43efae4c896f7c6c07c6e080b9b2)
Make the use of ctdb_release_lock() mandatory from the callback.
Split ctdb_release_lock() in two, release the tdb lock in the
ctdb_release_lock() function and move the freeing of the lock structure to ctdb_free_lock() which is private to libctdb.
When the callback returns, verify that the callback has actually released the lock and warn (FIXME) if not.
Update ctdb_writerecord to warn and fail (FIXME) if writing while the lock is not held.
(This used to be ctdb commit 87dc18a3a051da04685f14529c53c428d37c2912)