Don't dereference a NULL pointer while trying to print the log message for the failure.
Also shut down ctdb with ctdb_fatal().
(This used to be ctdb commit f8642d0438c6bbb34a72c25d6a904b626e247410)
This is called every time a reallocation is performed.
While STARTRECOVERY/RECOVERED events are only called when
we do ipreallocation as part of a full database/cluster recovery,
this new event can be used to trigger when we just do a light
failover due to a node becoming unhealthy,
i.e. situations where we do a failover but do not perform a full
cluster recovery.
Use this to trigger for natgw so we select a new natgw master node
when failover happens and not just when cluster rebuilds happen.
(This used to be ctdb commit 7f4c591388adae20e98984001385cba26598ec67)
This can cause ctdbd to spin at 100% in the event system,
creating a timed event that will immediately trigger again
and again.
On uniprocessors this causes the eventscript we are actually waiting for to
become CPU-starved and never complete.
(This used to be ctdb commit 92c8408fba957a8ded13f7e285da290502735234)
Add a new "ctdb deltickle" command to delete tickles from the database.
This can ONLY be used for tickles created by "ctdb addtickle".
Push any "addtickle/deltickle" updates to other nodes every TickleUpdateInterval seconds.
(This used to be ctdb commit acded034e2f0dcae4c2c9e54e16a001caf23caec)
This means we can distinguish which child is logging, esp. via syslog where we have no pid.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 68b3761a0874429b90731741f0531f76dcfbb081)
After 5 attempts to send a RST to a client without any response, we free
"con"; this is done during a traverse. This frees the node we are walking
through (the node is made a child of "con" down in rb_tree.c's
trbt_create_node()). Valgrind would catch this, as Martin confirmed.
So, we create a temporary parent and reparent onto that; then we free
that parent after the traverse, thus deleting the unwanted nodes.
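
The deferred-free idea above can be modelled outside of talloc. The
following is a minimal Python sketch, not the ctdb C code: the dict of
connections, the `sweep_connections` name and the exact threshold
comparison are assumptions made for illustration. The `doomed` list
plays the role of the temporary parent: entries are only collected
during the walk and are released after it has finished.

```python
MAX_RST_ATTEMPTS = 5  # from the commit message: give up after 5 unanswered RSTs


def sweep_connections(connections):
    """Count one more RST attempt per connection and drop exhausted ones.

    `connections` is a hypothetical dict mapping a connection id to the
    number of RSTs already sent without a response.
    """
    doomed = []  # stand-in for the temporary talloc parent
    for conn_id in connections:  # the "traverse"
        connections[conn_id] += 1
        if connections[conn_id] >= MAX_RST_ATTEMPTS:
            # Deleting here would change the container we are walking,
            # so only remember the entry for now.
            doomed.append(conn_id)
    for conn_id in doomed:  # "free the temporary parent" after the traverse
        del connections[conn_id]
    return doomed
```

Deleting inside the first loop would raise a RuntimeError in Python; in
C the equivalent mistake is a use-after-free, which is harder to notice.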
CQ:S1019041
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 08f7f85477610a4916c1ec866aa467b28f1bbec3)
We shouldn't even think about vacuuming when we've frozen the database
(which is earlier than when we set CTDB_RECOVERY_ACTIVE)
CQ:S1018154 & S1018349
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit d8df6835a931082af232c4b94f1dede6f16169f9)
Martin Schwenke discovered that 517f05e42f17766b1e8db8f1f4789cbad968e304
("freeze: abort vacuuming when we're going to freeze.") used ctdb_db for
a logging message which is in fact uninitialized, causing a crash (even
if it wasn't actually logged).
Initialize it properly. Also fix incorrect format in another logging
message introduced in that same change.
CQ:S1019093
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 8e518950ba281502318d6300f7a5ec6cdf6b5674)
There are some reports of freeze timeouts, and it looks like vacuuming might
be the culprit. So we add code to tell them to abort when a freeze is
going on.
(This is based on the 1.0.112 branch version 517f05e42f, but far
simpler since tdb is now robust against processes being killed during
transaction commit)
CQ:S1018154 & S1018349
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit f5d7dc679501e607c2c83a248a89d3cada9df146)
In Samba this is now called "tevent", and while we use the backwards
compatibility wrappers they don't offer EVENT_FD_AUTOCLOSE: that is now
a separate tevent_fd_set_auto_close() function.
This is based on Samba version 7f29f817fa.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 85e5e760cc91eb3157d3a88996ce474491646726)
We've been seeing "Invalid packet of length 0" errors, but we don't know
what is sending them. Add a name for each queue, and print nread.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit e6cf0e8f14f4263fbd8b995418909199924827e9)
We discovered that recent smbd locks the serverid tdb while
holding a lock on another tdb (locking.tdb):
7: POSIX ADVISORY WRITE smbd-2224318 locking.tdb.0 10600 10600
22: -> POSIX ADVISORY READ smbd-2224318 serverid.tdb.0 26580 26580
The result is a deadlock against the ctdb_freeze code called for
recovery. We extend the "notify" workaround to this case, too.
BZ:65158
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit dfdaa446cf256854ff6d267dceeb86fbee8bb188)
Seconds between ctdbd first log message and node healthy:
BEFORE: 4.03
AFTER: 2.02
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 8f17731dea4287d4f9b21dc58c1bdf26c8a0e628)
The extra recovery interval wait was introduced in 821333afb458 but no
explanation was provided in that message. Nonetheless, if starting
the entire cluster for the first time, it should be safe to skip this.
We use the commandline arg --sloppy-start which should discourage
people from using it outside testing.
Seconds between ctdbd first log message and node healthy:
BEFORE: 16.10
AFTER: 4.03
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 509e2e89ae233a0e91998d95267bf62f296a73cd)
Seconds between ctdbd first log message and node healthy:
BEFORE: 17.08
AFTER: 16.10
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 372201d418f041d69646793105f6898ab12a7d91)
We currently sleep for one second, whether or not we've already slept.
Change this to sleep for the remainder of the second, if at all.
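
As a rough illustration of "sleep for the remainder of the second", the
remaining wait can be computed from the time already spent. This is a
Python sketch under assumed names (`remaining_sleep`, `elapsed`,
`interval`), not the actual ctdb code.

```python
def remaining_sleep(elapsed, interval=1.0):
    """Return how long is left to sleep so that the total wait is `interval`.

    If `elapsed` already meets or exceeds the interval, no sleep is needed.
    """
    return max(0.0, interval - elapsed)
```

With this shape, a caller that has already spent 0.25 seconds sleeps a
further 0.75 seconds, and a caller that has spent more than a second
does not sleep at all.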
Seconds between ctdbd first log message and node healthy:
BEFORE: 18.09
AFTER: 17.08
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit fde760b5f39c77172308a583da4c2443b71541c9)
Once we've done a startup, we need to run a monitor event successfully
to be marked as healthy. Rather than wait the usual 5 seconds, run it
immediately (which will then reset next_interval to 5 seconds).
Seconds between ctdbd first log message and node healthy:
BEFORE: 23.58
AFTER: 18.09
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit c8651494febcb1c9e558b2002e2a72c2bf547c06)
We do a recovery on startup. But the code does:
  1. Sleep for ctdb->tunable.recover_interval.
  2. Check for recovery.
We want to do it in the other order. This is best done by extracting
the loop into a separate "main_loop" function.
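
The effect of the reordering can be shown with a small Python model. It
is only a sketch: `old_order` and `new_order` are hypothetical names,
and the real main_loop does far more than record one check. Each
function returns the sequence of steps taken.

```python
def old_order(rounds):
    """Previous behaviour: sleep first, then check for recovery."""
    trace = []
    for _ in range(rounds):
        trace.append("sleep")  # wait recover_interval before doing anything
        trace.append("check")  # only then notice that a recovery is needed
    return trace


def main_loop(trace):
    """One pass of the extracted loop body (modelled as a single check)."""
    trace.append("check")


def new_order(rounds):
    """New behaviour: run the loop body first, sleep afterwards."""
    trace = []
    for _ in range(rounds):
        main_loop(trace)
        trace.append("sleep")
    return trace
```

In the new order the first check happens immediately at startup, so the
initial recovery is not delayed by a full interval.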
Seconds between ctdbd first log message and node healthy:
BEFORE: 24.09
AFTER: 23.58
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 097046025176b9fcb670839d1a9f100f890e7ed2)
If a noremote node hangs for an extended period, it is possible
that we have a DMASTER request in flight for record A to that node.
Eventually we will reuse the idr, and may reuse it for a DMASTER request to a different node for a different record B.
If, while the request for B is in flight, the first node un-hangs and responds,
we would receive a dmaster reply for the wrong record.
This would cause a record to become perpetually locked: inside the daemon we would tdb_chainlock(dmaster_reply->pdu->key), but once the migration completed we would chainunlock idr->state->call->key.
Add code to verify, when we receive a dmaster reply packet, that its key matches the key held in the state variable for the idr in flight.
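
The check can be sketched with a small Python model of request-id
reuse. This is not the ctdb implementation: the `InFlight` class and its
methods are hypothetical, and a plain dict stands in for the idr tree.

```python
class InFlight:
    """Track outstanding requests by request id (stand-in for the idr)."""

    def __init__(self):
        self.key_by_reqid = {}

    def send(self, reqid, key):
        # Record which record key this request id currently refers to.
        self.key_by_reqid[reqid] = key

    def expire(self, reqid):
        # The request timed out locally, so the id becomes reusable.
        self.key_by_reqid.pop(reqid, None)

    def on_reply(self, reqid, reply_key):
        """Accept a reply only if its key matches the in-flight request."""
        expected = self.key_by_reqid.get(reqid)
        if expected is None or expected != reply_key:
            return False  # stale or mismatched reply: ignore it
        del self.key_by_reqid[reqid]
        return True
```

A late reply for record A that arrives after the id has been reused for
record B is rejected, and the request for B stays in flight until its
own reply arrives.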
(This used to be ctdb commit 2f6a870d7ff02ceb61fde242f752dccbfcb4cb37)
and print the time the statistics were taken, and for how long they have been collected, in the "ctdb statistics" output.
(This used to be ctdb commit 1bdfe0cd3370a335b960ce1ef97eade93b0cd2fa)
->recovery_mode was set to normal but database priority level 2 or 3 was still set to frozen,
causing the recovery daemon to fail to detect that a recovery was needed to restore access to the database.
BZ63951
(This used to be ctdb commit 7411b2b577a16f85ad6913e1bfccce7ea260a613)
ctdb_client.h is the existing internal client interface (which was mainly
in ctdb.h), and ctdb_protocol.h is the information needed for the wire
protocol only.
ctdb.h will be the new, shiny, libctdb API.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 4bba6b8cd47b352f98d41f9f06258d5ac3c9adef)
verify that all nodes agree on the most recent ip address assignments
broke "ctdb moveip ..." since that call would never trigger
a full takeover run and thus would immediately trigger an inconsistency.
Add a new message to the recovery daemon where we can tell the recovery daemon to update its assignments.
BZ62782
(This used to be ctdb commit e7069082e5f0380dcddee247db8754218ce18cab)