An inactive node can't become the recovery master. So if an inactive
node notices that the recovery master is inactive, it shouldn't force
an election for recovery master and nominate itself as a candidate.
This can cause the recovery master to flip-flop between nodes when all
nodes are inactive.
If there is actually an active node then it will trigger the election.
This is fairly cosmetic but is a step along the way towards ironing
out weirdness when all nodes are stopped.
Also, fix a related comment.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit e7dc10da3ced54ea9d719ad167ee42dcca8dce75)
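The rule above fits in a single predicate. This is a minimal sketch that assumes a hypothetical two-valued node state and a hypothetical function name. It is not the recovery daemon's actual code, which works with CTDB's node flags.

```c
#include <stdbool.h>

/* Hypothetical, simplified node states; CTDB's real node flags differ. */
enum node_state { NODE_ACTIVE, NODE_INACTIVE };

/*
 * Decide whether this node should force a recovery master election because
 * the current recovery master is inactive. An inactive node can never become
 * the recovery master, so it leaves the election to an active node, if any.
 */
bool should_force_election(enum node_state self, enum node_state recmaster)
{
	if (recmaster != NODE_INACTIVE) {
		return false;	/* the current recovery master is usable */
	}
	if (self == NODE_INACTIVE) {
		return false;	/* we could not win, so do not nominate ourselves */
	}
	return true;		/* an active node triggers the election */
}
```

When every node is inactive, the predicate returns false on every node. No node keeps nominating itself, so the recovery master stops flip-flopping.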
Doing these checks is pointless and potentially causes unnecessary log
messages.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit a0c30c820fd47d4f8620dc060c825be10754f5d1)
If CTDB starts in STOPPED state then it thinks it is in the middle of
a recovery. rec->ifaces is also NULL and an early exit further down
(that checks to see if a recovery is in process) means that it stays
that way.
However, each time this function is entered the need for a takeover
run is re-flagged. The takeover run never happens due to the
early exit, causing a couple of unneeded messages to be logged each
time.
This is avoided by moving the code that sets rec->ifaces so that it is
executed earlier and, in this case, in the middle of a recovery.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit f586e8a2911fc6e7f6698f516653145d8fd45dad)
Change this to preallocate the data buffer in chunks of, by default, 10 MByte.
This significantly reduces the number of reallocation and move operations that may be required.
Add a tunable to change how much is preallocated.
(This used to be ctdb commit 1f262deaad0818f159f9c68330f7fec121679023)
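Chunked preallocation can be sketched on a simple byte buffer. The struct and function are hypothetical. The prealloc field stands in for the tunable, whose real name is not given in the commit message.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; prealloc stands in for the tunable. */
struct data_buf {
	unsigned char *data;
	size_t length;		/* bytes in use */
	size_t allocated;	/* bytes reserved */
	size_t prealloc;	/* growth step, e.g. 10 MByte by default */
};

/*
 * Append len bytes. When the buffer is full, grow it to the next multiple
 * of prealloc so that most appends need no realloc() (and no data move).
 * Returns 0 on success, -1 on allocation failure.
 */
int data_buf_append(struct data_buf *b, const void *p, size_t len)
{
	size_t need = b->length + len;

	if (need > b->allocated) {
		size_t step = b->prealloc ? b->prealloc : 1;
		size_t newsize = ((need + step - 1) / step) * step;
		unsigned char *tmp = realloc(b->data, newsize);

		if (tmp == NULL) {
			return -1;	/* the old buffer is left untouched */
		}
		b->data = tmp;
		b->allocated = newsize;
	}
	memcpy(b->data + b->length, p, len);
	b->length = need;
	return 0;
}
```

Suppose prealloc is set to 10 * 1024 * 1024. Many small appends then fit into one allocation before a second realloc() is needed. Growing by exactly len would instead reallocate, and possibly move the data, on every append.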
Wrap all creation of child processes inside ctdb_fork() which is used to track all processes we have spawned.
Capture SIGCHLD to track also which child processes have terminated.
Wrap kill() inside ctdb_kill() and make sure that we never send a non-zero signal to a child process pid that has already terminated (and might have been replaced with a
(This used to be ctdb commit f73a4b1495830bcdd094a93732a89dd53b3c2f78)
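The tracking idea can be sketched with a fixed-size pid table. The names tracked_fork(), reap_children() and tracked_kill() are hypothetical. The real ctdb_fork() and ctdb_kill() are not reproduced here. The sketch relies on one POSIX guarantee: a child's pid cannot be reused until the parent has reaped it.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CHILDREN 64

/* Hypothetical table of children we spawned and have not reaped yet. */
static pid_t children[MAX_CHILDREN];

bool is_tracked(pid_t pid)
{
	if (pid <= 0) {
		return false;
	}
	for (int i = 0; i < MAX_CHILDREN; i++) {
		if (children[i] == pid) {
			return true;
		}
	}
	return false;
}

/* Replace the first slot holding old with new_pid (0 marks a free slot). */
static void set_slot(pid_t old, pid_t new_pid)
{
	for (int i = 0; i < MAX_CHILDREN; i++) {
		if (children[i] == old) {
			children[i] = new_pid;
			return;
		}
	}
}

/* fork() wrapper: the parent records every child it creates. */
pid_t tracked_fork(void)
{
	pid_t pid = fork();

	if (pid > 0) {
		set_slot(0, pid);	/* take a free slot (ignored if the table is full) */
	}
	return pid;
}

/* Reap exited children, e.g. once SIGCHLD has been noticed, and forget them. */
void reap_children(void)
{
	pid_t pid;

	while ((pid = waitpid(-1, NULL, WNOHANG)) > 0) {
		set_slot(pid, 0);
	}
}

/*
 * kill() wrapper: signal 0 (an existence check) is passed through, but a
 * real signal is only sent to a tracked, not yet reaped child, whose pid
 * cannot have been reused by an unrelated process.
 */
int tracked_kill(pid_t pid, int sig)
{
	if (sig != 0 && !is_tracked(pid)) {
		errno = ESRCH;
		return -1;
	}
	return kill(pid, sig);
}
```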
Also add a method to use the recovery master/daemon to reload the public ips on all nodes in the cluster.
Reloading the public ips on all nodes in the cluster is only supported if all nodes in the cluster are available and healthy.
(This used to be ctdb commit 05603e914f8c12618d7e06943c0f7df207f645b0)
The implementation of DisableIPFailover got intermingled with
--nopublicipcheck. This just looks wrong - Ronnie must have been
having a bad day. :-)
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 5083b266dd68b292c4275505f3d1b878dbf12f11)
Add a new tunable that changes how persistent databases are recovered.
RecoveryPDBBySeqNum
When set to 1, persistent databases will be recovered in whole from the node which
has the highest "__db_sequence_number__" record.
This record is managed by samba for those databases where we do persistent writes and have
inter-record relations.
For these databases we do not want the usual "blend records from all nodes based
on individual record RSN" but instead a mode where we pick one instance of the persistent database.
If no node was found with a "__db_sequence_number__" record at all, we fall back to the original "recover records independently based on record RSN" mode.
Some persistent databases do not contain record interrelations and as such do not
contain this special record at all.
(This used to be ctdb commit 502150c764298a9fa8c4d8aa445bf7d85d4ee9dc)
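Here is a minimal sketch of how the source node can be selected when RecoveryPDBBySeqNum is 1. It assumes that each node's lookup of the "__db_sequence_number__" record has already been collected into an array. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-node result of looking up "__db_sequence_number__". */
struct node_seqnum {
	bool has_seqnum;	/* is the record present on this node? */
	uint64_t seqnum;	/* its value, valid only if has_seqnum */
};

/*
 * Return the index of the node whose copy of the persistent database should
 * be recovered in whole, or -1 if no node has the sequence number record.
 * On ties, the first node found wins.
 */
int pick_pdb_source(const struct node_seqnum *nodes, int count)
{
	int best = -1;

	for (int i = 0; i < count; i++) {
		if (!nodes[i].has_seqnum) {
			continue;
		}
		if (best == -1 || nodes[i].seqnum > nodes[best].seqnum) {
			best = i;
		}
	}
	return best;
}
```

A return value of -1 lets the caller fall back to blending records from all nodes by their individual RSNs.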
metze
(cherry picked from commit 6ba8af28f8a8f79db65120a97d7157dcc5c7e083)
Signed-off-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit ccd67cf7f26713e695000d89d9ce8cfa78bfe00f)
compared to old 1.0 branches
This must have been mistakenly applied to master when you intended to push
for a different branch, I guess.
Revert "recoverd: try to become the recovery master if we have the capability, but the current master doesn't"
This reverts commit a97d417aba85e901540147a4dff4794249442939.
(This used to be ctdb commit c19cb751077b78cf4b6e28a1e3746d4ffedbfd68)
the persistent flag.
This is the same size as the original boolean but allows us to add additional flags for the database.
(This used to be ctdb commit 7462761638d25880ad46024ad4ef21667eb99a98)
Reduce an informational message about not performing ip reallocation
from NOTICE (the default) to INFO.
These messages are normal during startup, or when stopped/banned and
we will be in recovery mode for a while.
Remove a message in the loop waiting for initial startup to complete about
the generation being invalid. It is always invalid at this stage before we have
finished initial recovery.
Rate-limit the informational messages for CTDB_WAIT_UNTIL_RECOVERED
so that we only print them once per second for the first 60 seconds and after that only once per 10 minutes.
These messages are normal during startup, but we should not be logging them every second when we remain in recovery mode during startup for an extended period of time,
such as when suspended or permabanned.
CQ S1023302
(This used to be ctdb commit 3a0af8780dc595acbed880f288fcbc4f62c862fb)
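The rate limit for the CTDB_WAIT_UNTIL_RECOVERED messages can be expressed as one check. This is a sketch with a hypothetical function name, using whole seconds from time() and 0 to mean "never printed".

```c
#include <stdbool.h>
#include <time.h>

/*
 * Decide whether to print the "waiting until recovered" message.
 * start: when waiting began; last: when we last printed (0 = never).
 * Print at most once per second during the first 60 seconds, and at
 * most once every 10 minutes after that.
 */
bool should_log_wait(time_t start, time_t last, time_t now)
{
	time_t interval = (now - start < 60) ? 1 : 600;

	return last == 0 || now - last >= interval;
}
```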
This way, the records coming in via this handler can be treated appropriately.
Namely, they can be deleted instead of being stored when they meet the fast-path
vacuuming criteria (empty, never migrated with data...)
(This used to be ctdb commit fb5d832104970320359b3e474eb291ca3d629380)
Those records that are kept after recovery are non-empty and
stored identically on all nodes. So this is as if they had been
migrated with data.
Pair-Programmed-With: Stefan Metzmacher <metze@samba.org>
(This used to be ctdb commit 101be642e492a3a54231e2e3e6553a59380fe702)
While it does not address the reason for recovery daemon shutting down, it reduces the impact of such issues and makes the system more robust.
(This used to be ctdb commit 0566ef3d6cef809bda204877c493c80ff9eb2c40)
has failed.
We don't need to rebuild the databases in this situation; we just
need to try again to sort out the ip address allocations.
(This used to be ctdb commit 044c398ffea23d36ee033c8ddf07d11028197346)
scheduler for the child.
Use ctdb_fork() from callers where we don't want the child to be running
at real-time privilege.
(This used to be ctdb commit 58795a4c9e0624e20fa3e0023b65127053edd103)
but thinks it is still unassigned (-1).
add code to the recovery daemon to detect this case and trigger a reallocation
so that the ip gets covered
and change the takeip code to allow for this condition, taking on an ip address that is
already hosted.
cq s1021073
(This used to be ctdb commit 9020baf27cab7821c9094cda185206fb7af0fee7)
This means we can distinguish which child is logging, esp. via syslog where we have no pid.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 68b3761a0874429b90731741f0531f76dcfbb081)
In Samba this is now called "tevent", and while we use the backwards
compatibility wrappers they don't offer EVENT_FD_AUTOCLOSE: that is now
a separate tevent_fd_set_auto_close() function.
This is based on Samba version 7f29f817fa.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 85e5e760cc91eb3157d3a88996ce474491646726)
Seconds between ctdbd first log message and node healthy:
BEFORE: 4.03
AFTER: 2.02
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 8f17731dea4287d4f9b21dc58c1bdf26c8a0e628)
We currently sleep for one second, whether or not we've already slept.
Change this to sleep for the remainder of the second, if at all.
Seconds between ctdbd first log message and node healthy:
BEFORE: 18.09
AFTER: 17.08
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit fde760b5f39c77172308a583da4c2443b71541c9)
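"Sleep for the remainder of the second" reduces to a small time computation. This sketch uses a hypothetical helper. It assumes that both timestamps come from clock_gettime(CLOCK_MONOTONIC, ...).

```c
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/*
 * Hypothetical helper: given when the current iteration started and the
 * current time, return how long to sleep so that the iteration takes one
 * second in total. Returns {0, 0} when a full second has already passed.
 */
struct timespec remaining_of_second(struct timespec start, struct timespec now)
{
	struct timespec left = { 0, 0 };
	long long elapsed = (long long)(now.tv_sec - start.tv_sec) * NSEC_PER_SEC
			    + (now.tv_nsec - start.tv_nsec);

	if (elapsed < 0) {
		elapsed = 0;	/* defensive: treat a clock anomaly as no time spent */
	}
	if (elapsed < NSEC_PER_SEC) {
		long long ns = NSEC_PER_SEC - elapsed;

		left.tv_sec = (time_t)(ns / NSEC_PER_SEC);
		left.tv_nsec = (long)(ns % NSEC_PER_SEC);
	}
	return left;
}
```

A caller would pass the result to nanosleep() only when it is non-zero. An iteration that already took a second or longer therefore does not sleep at all.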
We do a recovery on startup. But the code does:
Sleep for ctdb->tunable.recover_interval.
Check for recovery.
We want to do it in the other order. This is best done by extracting
the loop into a separate "main_loop" function.
Seconds between ctdbd first log message and node healthy:
BEFORE: 24.09
AFTER: 23.58
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 097046025176b9fcb670839d1a9f100f890e7ed2)
ctdb_client.h is the existing internal client interface (which was mainly
in ctdb.h), and ctdb_protocol.h is the information needed for the wire
protocol only.
ctdb.h will be the new, shiny, libctdb API.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 4bba6b8cd47b352f98d41f9f06258d5ac3c9adef)
verify that all nodes agree on the most recent ip address assignments
broke "ctdb moveip ..." since that call would never trigger
a full takeover run and thus would immediately trigger an inconsistency.
Add a new message to the recovery daemon where we can tell the recovery daemon to update its assignments.
BZ62782
(This used to be ctdb commit e7069082e5f0380dcddee247db8754218ce18cab)
addresses and verify that the remote nodes have/keep a consistent view of
assigned addresses.
If a remote node has an inconsistent view of addresses vis-à-vis the recovery
master, this will trigger a full ip reallocation.
(This used to be ctdb commit f3bf2ab61f8dbbc806ec23a68a87aaedd458e712)
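The consistency check amounts to comparing two address-to-pnn maps. The sketch below uses a hypothetical struct with IPv4 addresses only and -1 for "unassigned". It is an illustration, not the recovery daemon's data structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one public address and the pnn hosting it (-1 = unassigned). */
struct ip_assignment {
	uint32_t addr;	/* IPv4 address in host byte order */
	int pnn;
};

/* Return the pnn recorded for addr in view v, or -2 if addr is absent. */
static int lookup_pnn(const struct ip_assignment *v, size_t n, uint32_t addr)
{
	for (size_t i = 0; i < n; i++) {
		if (v[i].addr == addr) {
			return v[i].pnn;
		}
	}
	return -2;
}

/*
 * True if a remote node's view of the address assignments disagrees with
 * the recovery master's view; the recovery master would then trigger a
 * full IP reallocation.
 */
bool views_inconsistent(const struct ip_assignment *master, size_t nm,
			const struct ip_assignment *remote, size_t nr)
{
	if (nm != nr) {
		return true;
	}
	for (size_t i = 0; i < nm; i++) {
		if (lookup_pnn(remote, nr, master[i].addr) != master[i].pnn) {
			return true;
		}
	}
	return false;
}
```

This also shows why "ctdb moveip" tripped the check. After a move, the recovery master's own copy of the assignments is stale unless it is told to update it.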
We now ask for the known and available interfaces.
This means a node gets a RELEASE_IP event for all interfaces
it "knows", but doesn't serve and a node only gets a TAKE_IP event
for "available" interfaces.
metze
(This used to be ctdb commit a695a38e49e7c3e15a9706392dc920eeab1f11ba)
The do_setsched was being tested for whether to mmap tdbs: let's make it
explicit. We can also happily move the kill-child eventscript hack under
this flag.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 2ee86cc1f311d7b7504c7b14d142b9c4f6f4b469)
This reverts commit 8aef46d2aab3efb322dda51eaa202653cefd5222.
This special recovery logic is wrong now with the transaction rewrite.
The treatment of persistent databases will later be rewritten to use the
database sequence number.
Michael
(This used to be ctdb commit c5a0aef668a63f927d6184612b13ce316eb4a0be)
This patch improves the handling of the fetch_lock operation on non-persistent
databases that ctdb clients have to do very frequently.
The normal flow is as follows:
1. Client does a local fetch_lock on the database
2. Client looks if the local node is dmaster.
If yes, everything is fine
If no, continue here
3. Client unlocks the local record
4. Client issues a "get me the record" call to ctdbd
5. ctdbd goes out and fetches the dmaster role
6. ctdbd tells the client to retry
7. Client starts over again
The problem is between steps 6 and 7: before the client has had the chance to
retry (i.e. catch the record with a fetch_lock), another node might have come
asking ctdbd to migrate away the record again. This is a real problem, I've
seen >20 loops of this kind in real workloads.
This patch does the following: Whenever ctdb receives a record as result of
step 5, it puts the key on a "holdback list". As long as a key is on this list,
a request to migrate away the dmaster is put on hold. It is the client's duty
to issue the "CTDB_CONTROL_GOTIT" control when it has successfully done step 2
after having asked ctdb to fetch the record. This will release the key from the
"holdback list" and re-issue all dmaster migration requests.
As a safeguard against malicious clients, once a second (default 1000msecs,
tunable "HoldbackCleanupInterval" in milliseconds) ctdbd goes over the list of
held back keys, deletes them and releases all held back migration requests.
(This used to be ctdb commit 5736e17c139c9a8049e235429aeae0c6c9d0e93d)
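The holdback list can be sketched as a small table of keys. Each key carries a count of deferred migration requests. The function names are hypothetical, and keys are plain C strings for simplicity, whereas real record keys are binary. CTDB_CONTROL_GOTIT and HoldbackCleanupInterval appear only in comments, to mark where they would apply.

```c
#include <stdbool.h>
#include <string.h>

#define HOLDBACK_MAX 32
#define KEY_MAX 64

/* Hypothetical entry: a key fetched for a client plus deferred migrations. */
struct holdback_entry {
	bool used;
	char key[KEY_MAX];
	unsigned deferred;
};

static struct holdback_entry holdback[HOLDBACK_MAX];

static struct holdback_entry *holdback_find(const char *key)
{
	for (int i = 0; i < HOLDBACK_MAX; i++) {
		if (holdback[i].used && strcmp(holdback[i].key, key) == 0) {
			return &holdback[i];
		}
	}
	return NULL;
}

/* Step 5 finished: the record arrived, so hold the key back for the client. */
bool holdback_add(const char *key)
{
	if (strlen(key) >= KEY_MAX) {
		return false;
	}
	if (holdback_find(key) != NULL) {
		return true;
	}
	for (int i = 0; i < HOLDBACK_MAX; i++) {
		if (!holdback[i].used) {
			holdback[i].used = true;
			holdback[i].deferred = 0;
			memcpy(holdback[i].key, key, strlen(key) + 1);
			return true;
		}
	}
	return false;	/* list full: do not hold this key back */
}

/* Another node wants the record migrated away: defer it if held back. */
bool holdback_defer_migration(const char *key)
{
	struct holdback_entry *e = holdback_find(key);

	if (e == NULL) {
		return false;	/* not held back: migrate as usual */
	}
	e->deferred++;
	return true;
}

/* The client sent CTDB_CONTROL_GOTIT: release the key and return how many
 * deferred migration requests must now be re-issued. */
unsigned holdback_release(const char *key)
{
	struct holdback_entry *e = holdback_find(key);
	unsigned n;

	if (e == NULL) {
		return 0;
	}
	n = e->deferred;
	e->used = false;
	return n;
}

/* Periodic safeguard (every HoldbackCleanupInterval ms): drop all keys and
 * return the total number of migration requests to re-issue. */
unsigned holdback_cleanup(void)
{
	unsigned total = 0;

	for (int i = 0; i < HOLDBACK_MAX; i++) {
		if (holdback[i].used) {
			total += holdback[i].deferred;
			holdback[i].used = false;
		}
	}
	return total;
}
```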
The decision mechanism which records of a persistent db
are to be pulled into the recdb during recovery is now
as follows:
* Usually a record with a higher rsn than the one already
stored is taken. (Just as for normal tdbs.)
* If a transaction is running on some node, then those
nodes' copies of all records are taken and are not
overwritten later by other nodes' copies.
In order to keep track of whether a record's copy was obtained
from a node with a transaction running, the recovery mechanism
misuses the ctdb tdb header field 'lacount' in the recdb.
It is cleared later when pushing out the recdb database to the
other nodes.
This way, an incomplete transaction is not spoiled when
a recovery interrupts and the replay should usually succeed
(possibly after a few retries).
Michael
(This used to be ctdb commit 8aef46d2aab3efb322dda51eaa202653cefd5222)
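The pull decision can be sketched as a merge function over a simplified record header. The boolean marker plays the role that the misused 'lacount' field plays in the recdb. The struct and function are hypothetical. If several nodes run transactions, the sketch keeps the first such copy seen. This is an assumption, because the message above does not say what happens in that case.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simplified record header in the recovery database. */
struct rec_copy {
	uint64_t rsn;
	bool from_transaction;	/* stands in for the misused 'lacount' field */
};

/*
 * Merge a copy pulled from one node into the recdb record (have_stored is
 * false if no copy has been stored yet). Returns true if the incoming copy
 * replaces the stored one.
 */
bool merge_pdb_record(struct rec_copy *stored, bool have_stored,
		      uint64_t rsn, bool node_in_transaction)
{
	bool take;

	if (!have_stored) {
		take = true;
	} else if (stored->from_transaction) {
		take = false;	/* a transaction node's copy is never overwritten */
	} else if (node_in_transaction) {
		take = true;	/* a transaction node's copy wins regardless of rsn */
	} else {
		take = rsn > stored->rsn;	/* usual rule, as for normal tdbs */
	}
	if (take) {
		stored->rsn = rsn;
		stored->from_transaction = node_in_transaction;
	}
	return take;
}
```

When the recdb is pushed out, the marker would be cleared again, as the message describes.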
It is important to keep track of the dmaster (i.e. the node that last committed
a transaction containing changes to this node).
Michael
(This used to be ctdb commit fe68972eb9cf3aa1f16ba1aacf57ade5d66e647c)
and further down to pull_remote_database(), pull_one_remote_database(),
and push_recdb_database().
This is in preparation for special handling of persistent databases
during recoveries.
Michael
(This used to be ctdb commit 90abc4ac7c16e854cf6e8f96b60a77bc92e35e07)
Rather than doing strcmp everywhere, pass an explicit enum around. This
also subtly documents what options are available. The "options" arg
is now used for extra arguments only.
Unfortunately, gcc complains on empty format strings, so we make
ctdb_event_script() take no varargs, and add ctdb_event_script_args(). We
leave ctdb_event_script_callback() taking varargs, which means callers
have to do "%s", "".
For the moment, we have CTDB_EVENT_UNKNOWN for handling forced scripts
from the ctdb tool.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 470822b329f9d3ca9bef518b56e9ce28d5fedda2)
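Passing an enum instead of strings can be sketched as follows. The text above names only CTDB_EVENT_UNKNOWN. The other enumerators, the enum tag and the helper functions are assumptions for this illustration.

```c
#include <string.h>

/* Illustrative subset of event types; names other than CTDB_EVENT_UNKNOWN
 * are assumptions for this sketch. */
enum event_call {
	CTDB_EVENT_STARTUP,
	CTDB_EVENT_MONITOR,
	CTDB_EVENT_TAKE_IP,
	CTDB_EVENT_RELEASE_IP,
	CTDB_EVENT_UNKNOWN	/* forced scripts run from the ctdb tool */
};

/* Event script argument for each event, indexed by the enum value. */
static const char *const event_names[] = {
	[CTDB_EVENT_STARTUP] = "startup",
	[CTDB_EVENT_MONITOR] = "monitor",
	[CTDB_EVENT_TAKE_IP] = "takeip",
	[CTDB_EVENT_RELEASE_IP] = "releaseip",
};

const char *event_name(enum event_call call)
{
	if ((unsigned)call >= CTDB_EVENT_UNKNOWN) {
		return "unknown";
	}
	return event_names[call];
}

/* Map a name to the enum once, at the edge, instead of strcmp() everywhere. */
enum event_call event_from_name(const char *name)
{
	for (unsigned i = 0; i < CTDB_EVENT_UNKNOWN; i++) {
		if (strcmp(event_names[i], name) == 0) {
			return (enum event_call)i;
		}
	}
	return CTDB_EVENT_UNKNOWN;
}
```

Inside the daemon, code then switches on the enum value. The string form is only needed when building the event script command line.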
The problem was this:
When the monitor event fails, the node->flags get updated,
and an update (containing the old and new flags) is sent to
the recovery master.
If the recovery master sends the update to itself (the same process),
it was comparing the node->flags variable with the received new flags.
This check always found both flag values to be equal
and never set the rec->need_takeover_run variable to true.
There were two problems: first, the push_flags_handler() function
didn't pass the received old flags,
and second, the ctdb_control_modflags() function ignored the received old flags.
metze
(This used to be ctdb commit 8ec633b64a05a2d903c2b9639909f15f6375548f)
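The difference between the buggy and the fixed comparison can be shown in isolation. The message struct and both functions are hypothetical simplifications of the flag-change handling described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag-change message carrying both old and new flags. */
struct flag_change {
	uint32_t old_flags;
	uint32_t new_flags;
};

/* Buggy variant: when the recovery master sends the update to itself,
 * node_flags has already been updated, so it always equals new_flags. */
bool flags_changed_buggy(uint32_t node_flags, const struct flag_change *c)
{
	return node_flags != c->new_flags;
}

/* Fixed variant: compare the old and new flags carried in the message. */
bool flags_changed(const struct flag_change *c)
{
	return c->old_flags != c->new_flags;
}
```

With the fixed comparison, a flag change that originated on the recovery master itself still causes rec->need_takeover_run to be set.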
we also skip this check if we are already in the process of performing an ip reallocation and not only when we are performing a full recovery.
(This used to be ctdb commit 1a09b02767f3928d3c5db0e0afc59bb938e4a445)
so we can spot if there are leaks.
plug two file descriptor leaks that occur when sending ARPs fails,
and one leak when we cannot parse the local address during TCP connection establishment
(This used to be ctdb commit ddd089810a14efe4be6e1ff3eccaa604e4913c9e)
transactions we start across all tdb databases during the recovery.
this allows us to properly clean up and delete these tdb transactions on a
recovery failure.
(This used to be ctdb commit b2ce8b900a7d00944c84e0574fea5b371064a06d)
this prevents a situation where the remote node may cause spurious ip reallocations.
(This used to be ctdb commit dd122351efaeef5475cdec111eb900110d83ec35)
This is useful when we are moving addresses using moveip in the cluster, since otherwise, if we collide with the recovery daemon's own check, we could cause a recovery.
(This used to be ctdb commit 9c63858c0b22c81eaccb9865a414af0bbb2833d4)
change this to provide absolution to all nodes once they have participated in a recovery session.
(This used to be ctdb commit f66d17fb2e81a35d5adb3754e1cc902f76b4590a)
This happens normally when someone explicitly triggers a recovery using "ctdb recover".
(This used to be ctdb commit 3085170be8460e59996a3eee4e29fec9ddbcf0f8)
This allows us to timeout the operation if the underlying filesystem has become temporarily unresponsive without causing a new recovery.
(This used to be ctdb commit d177b08f1dc79534491f27726b05405d47e12e20)
This is used to mark nodes as being DELETED internally in ctdb
so that nodes are not renumbered if / when they are removed from the nodes file.
This is used to be able to do "ctdb reloadnodes" at runtime without
causing nodes to be renumbered.
To do this, instead of deleting a node from the nodes file, just comment it out like
1.0.0.1
#1.0.0.2
1.0.0.3
After removing 1.0.0.2 from the cluster, the remaining nodes retain their
pnns from prior to the deletion, namely 0 and 2.
Any line in the nodes file that is commented out represents a DELETED pnn
(This used to be ctdb commit 6a5e4fd7fa391206b463bb4e976502f3ac5bd343)
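A minimal parser for the numbering rule, using the example file above. The struct and function are hypothetical. The sketch skips blank lines and truncates long addresses, and both behaviours are assumptions that the commit message does not state.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_NODES 16

/* Hypothetical parsed entry; the index in the output array is the pnn. */
struct node_entry {
	char addr[64];
	bool deleted;	/* commented-out line: pnn kept, node DELETED */
};

/*
 * Parse nodes file contents, one address per line. A line starting with '#'
 * still consumes a pnn but is marked DELETED, so the following nodes are not
 * renumbered. Blank lines are skipped (an assumption of this sketch).
 * Returns the number of pnns assigned.
 */
int parse_nodes(const char *text, struct node_entry *out, int max)
{
	int n = 0;
	const char *p = text;

	while (*p != '\0' && n < max) {
		const char *eol = strchr(p, '\n');
		const char *next = eol ? eol + 1 : p + strlen(p);
		size_t len = (size_t)((eol ? eol : next) - p);

		while (len > 0 && (*p == ' ' || *p == '\t')) {
			p++;	/* strip leading whitespace */
			len--;
		}
		if (len > 0) {
			struct node_entry *e = &out[n++];

			e->deleted = (*p == '#');
			if (e->deleted) {
				p++;	/* keep the old address for reference */
				len--;
			}
			if (len >= sizeof(e->addr)) {
				len = sizeof(e->addr) - 1;
			}
			memcpy(e->addr, p, len);
			e->addr[len] = '\0';
		}
		p = next;
	}
	return n;
}
```

For the example file, this yields pnn 0 for 1.0.0.1, a DELETED pnn 1, and pnn 2 for 1.0.0.3.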
Log this in "ctdb statistics".
Also add a variable "RecLockLatencyMs"; an error is logged every time accessing the reclock file takes longer than this.
(This used to be ctdb commit 042377ed803bb8f7ca9d6ea1a387427b7b8ba45a)
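Latency tracking against a RecLockLatencyMs-style threshold can be sketched as shown here. The statistics struct and the functions are hypothetical, and the threshold is passed in as a plain parameter. A real caller would take the two timestamps with clock_gettime() around the reclock access.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical statistics of the kind reported by "ctdb statistics". */
struct reclock_stats {
	double max_latency_ms;
	unsigned long slow_accesses;
};

/* Milliseconds between two timespec values. */
double elapsed_ms(struct timespec start, struct timespec end)
{
	return (double)(end.tv_sec - start.tv_sec) * 1000.0
	       + (double)(end.tv_nsec - start.tv_nsec) / 1e6;
}

/* Record one reclock access and log an error if it exceeded threshold_ms
 * (the role RecLockLatencyMs plays). Returns 1 if the access was slow. */
int record_reclock_latency(struct reclock_stats *st, double latency_ms,
			   unsigned threshold_ms)
{
	if (latency_ms > st->max_latency_ms) {
		st->max_latency_ms = latency_ms;
	}
	if (latency_ms > threshold_ms) {
		st->slow_accesses++;
		fprintf(stderr, "reclock access took %.1f ms (limit %u ms)\n",
			latency_ms, threshold_ms);
		return 1;
	}
	return 0;
}
```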