Don't update the statd settings that often.
When we have very many nodes and very many ips, this would generate
a lot of unnecessary load on the system.
(This used to be ctdb commit 0c030c9384500f340d8382c20e1e91b11aa377e9)
This concept didn't work out and it is really just as expensive as a full migration
anyway, without the benefit of caching the data for subsequent accesses.
Now, migrate the records immediately on first access.
This will be combined with a "cheap vacuum-lite" for special empty records to
prevent growth of databases.
Later extensions to mimic read-only behaviour of records will include proper shared read-only locking of database records, making the laccessor/lacount read-only access to the data obsolete anyway.
Removing this special case and the handling of lacount/laccessor makes the codepath where shared read-only locking will be implemented simpler, and frees up space in the ctdb_ltdb header for use by vacuuming flags as well as read-only locking flags.
(This used to be ctdb commit 155dd1f4885fe142c6f8bd09430f65daf8a17e51)
too much.
This means we can simplify the way we add ips significantly and stop
trying to move them.
We also check if the node already hosts the ip, in which case we used to return an error. Instead, just print an error string but return 0 (ok).
This makes it easier to script, and works around broken scripts.
CQ1021034
(This used to be ctdb commit 307e5e95548155a31682dfcb0956834d0c85838e)
Add a dlist to track all active lockwait child processes.
Every time a new lockwait handle is created, check if there is already an
active lockwait process for this database/key and if so,
send the new request straight to the overflow queue.
This means we will only have one active lockwait child process for a certain key,
even if there were thousands of fetch-lock requests for this key.
When the lockwait processing finishes for the original request, the processing in d_overflow() will automagically process all remaining keys as well.
Add back a --nosetsched argument to make it easier to run under gdb
(This used to be ctdb commit 3e9317a2e1f687b04bf51575d47fcd4faa6e6515)
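The per-key deduplication described in this commit can be modelled with a small sketch. This is a toy Python model, not the ctdb C code: the class `LockwaitTracker` and its method names are hypothetical, and a "child process" is represented only by a counter.

```python
from collections import defaultdict, deque


class LockwaitTracker:
    """Toy model: at most one active lockwait child per (db, key)."""

    def __init__(self):
        self.active = {}                    # (db, key) -> callback of the original request
        self.overflow = defaultdict(deque)  # (db, key) -> callbacks queued behind it
        self.children_started = 0           # stands in for forked child processes

    def request(self, db, key, callback):
        slot = (db, key)
        if slot in self.active:
            # A child is already waiting on this key: queue the request
            # instead of starting another child.
            self.overflow[slot].append(callback)
            return
        self.active[slot] = callback
        self.children_started += 1

    def lock_obtained(self, db, key):
        """The single child for this key got the lock: serve everyone waiting."""
        slot = (db, key)
        first = self.active.pop(slot)
        first(db, key)
        queued = self.overflow.pop(slot, deque())
        for callback in queued:
            callback(db, key)
        return 1 + len(queued)
```

With this model, a thousand requests for the same key start one child, and all of them are served when that child reports the lock.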
Once we have more than 200 children waiting on a particular db, don't create
any more. Just put them on an overflow queue, and when a child gets a lock
search that queue to see if others were after the same lock (they probably
were).
(This used to be ctdb commit 5e614e8cfd1e9a4b13035a0e400b7a60a745b510)
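A minimal sketch of the cap-and-overflow idea, assuming a simplified model in Python rather than the actual ctdb implementation. The class `DbLockwait` and its methods are hypothetical; only the limit of 200 comes from the commit message. The sketch does not model what happens to queued requests for other keys once a slot frees up.

```python
from collections import deque

MAX_CHILDREN = 200  # limit named in the commit message


class DbLockwait:
    """Toy model of lockwait children for one database."""

    def __init__(self, max_children=MAX_CHILDREN):
        self.max_children = max_children
        self.children = []       # one key per active child
        self.overflow = deque()  # keys queued without a child

    def request(self, key):
        """Return True if a child was started, False if the request was queued."""
        if len(self.children) >= self.max_children:
            self.overflow.append(key)
            return False
        self.children.append(key)
        return True

    def child_got_lock(self, key):
        """Search the overflow queue for requests after the same lock.

        Returns how many queued requests were satisfied by this lock.
        """
        self.children.remove(key)
        satisfied = sum(1 for k in self.overflow if k == key)
        self.overflow = deque(k for k in self.overflow if k != key)
        return satisfied
```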
Make the ctdb parent "mark" the transaction lock once the child process
has frozen/locked the entire database.
This stops the ctdb daemon from using blocking fcntl() locking on the tdb during the
read traverse during recovery.
CQ 1021388
(This used to be ctdb commit 52ee2b3ce822344d0f55ac040fe25f6ec5c0d7c2)
tdb_traverse_read() grabs the transaction lock. This can cause ctdbd
(which uses it) to block when it should not; expose mark and normal
variants of this lock, so ctdbd's child (the recovery daemon) can
acquire it and the ctdbd parent can mark it as held.
(This used to be ctdb commit d09fa845bd848d04507853809acf42e0471b44bf)
if we are the main ctdb daemon.
Other daemons/child processes are not guaranteed to get events on a regular basis
so those should not be checked.
(This used to be ctdb commit ac2afe9c25753b837d5f6396020e0f3c65ef3628)
the original "Time jumped" messages are too coarse to interpret
exactly what was going wrong inside of CTDB.
This patch removes the original logs and adds two other logs that
differentiate between the time it took to work on an event and
the time it took to get the next event.
(This used to be ctdb commit fd8d54292f10b35bc4960d64cfa6843ce9aba225)
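The distinction between the two measurements can be illustrated with a short sketch. It is a Python model under stated assumptions, not the ctdb code: the class `EventLoopTimer`, the 3-second threshold, and the log texts are all hypothetical.

```python
import time


class EventLoopTimer:
    """Separate 'time spent handling an event' from 'time waiting for the next one'."""

    def __init__(self, threshold=3.0, clock=time.monotonic, log=print):
        self.threshold = threshold      # assumed threshold, in seconds
        self.clock = clock
        self.log = log
        self.last_return = clock()      # when the previous handler finished
        self.handler_start = None

    def before_handler(self):
        now = self.clock()
        waited = now - self.last_return
        if waited > self.threshold:
            # The loop itself was starved: no event arrived for a long time.
            self.log(f"No event for {waited:.0f} seconds")
        self.handler_start = now

    def after_handler(self):
        now = self.clock()
        busy = now - self.handler_start
        if busy > self.threshold:
            # A single handler blocked the loop.
            self.log(f"Handling event took {busy:.0f} seconds")
        self.last_return = now
```

Calling `before_handler()` and `after_handler()` around each event handler yields one message for a slow handler and a different one for a long gap between events, which a single "Time jumped" message cannot tell apart.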
so we need a "ticker" in the main ctdbd daemon too to ensure we get at least one event to process every second.
This will improve the accuracy of "Time jumped" messages and remove false positives when the recovery daemon is "slow".
(This used to be ctdb commit 70154e5e19e219de086b2995d41e8f6e069ee20d)
Found during automatic regression testing.
We do not allow the takeip/releaseip events to be executed during a recovery.
All of "ctdb addip, ctdb delip, ctdb moveip" use and force these events to
trigger to perform the ip assignments required.
If these commands collide with a recovery, these commands could fail since we do
not allow takeip/releaseip events to trigger during the recovery.
While it is easy to just try running the command again, this is suboptimal for script use.
Change these commands to retry these operations a few times until either successful or until we give up.
This makes the commands much easier to use in scripts.
(This used to be ctdb commit 6954c9df67501183995f408cca358c8fdfb176ab)
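The retry behaviour can be sketched as a small helper. This is an illustrative Python sketch, not the ctdb tool's code: the function `retry`, the default of 5 attempts, and the 1-second delay are assumptions, since the commit only says "a few times".

```python
import time


def retry(operation, attempts=5, delay=1.0, sleep=time.sleep):
    """Call operation() until it returns 0 or the attempts run out.

    operation is any callable returning 0 on success and non-zero on
    failure, for example a wrapper around an ip assignment that may
    collide with a recovery. Returns the last status seen.
    """
    status = -1
    for attempt in range(1, attempts + 1):
        status = operation()
        if status == 0:
            return 0
        if attempt < attempts:
            sleep(delay)  # give a colliding recovery time to finish
    return status
```

A script calling such a helper sees a single success or failure result instead of having to loop itself.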
by external services failing to start, or blocking CTDBD from finishing the startup phase,
we can encounter a situation where we have not yet fully initialized, but a
remote recovery master tries to release a certain ip clusterwide.
In this situation the node that is pinned down in init/startup phase
would fail to perform the release of the ip address since we are not yet fully operational and do not yet host any valid interfaces.
In this situation, we just need to remain unhealthy, there is no need to
also ban the node.
Remove the autobanning for this condition and just let the node remain in
unhealthy mode.
Banning is overkill in this situation when the system is broken and just
draws attention to ctdbd instead of the root cause.
(This used to be ctdb commit d8af74e4c4961deb94c18dde8ba7fc07e944729c)
We were potentially leaving a node unable to serve requests for too
long.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 5be8610ffa33db49e33949560d0ef2fa5f3c0c73)
This was defaulting to just "service nfs restart", which doesn't have
the workarounds we need.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 0f462e9e9fe12b595f3c7452123db8e69548abd6)
Otherwise we might short-circuit events that are run only once and
actually need to do something.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit c4f9e8a43540bc049b2771e0a2d76d37b9d17331)
Otherwise there can be strange error messages from services
stopping/starting, without any context.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 8bcf7ab164429ddc0ae530133e114f186a8146dd)
"service nfs restart" can fail. To stop nfsd it sends a SIGINT and
nfsd might take a while to process it if the system is loaded.
Starting nfsd may then fail because resources are still in use.
This does some /proc magic to tell nfsd to do no more processing. It
then runs service stop, kills nfsd with SIGKILL, and then runs service
start. This is much less likely to fail.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit a9bf4f82852975b0b627f61ceb2d23401f630805)
From Michael Anderson,
initialize the inqueue element of the ctdb structure to NULL,
else it might be used uninitialized and cause a segv.
(This used to be ctdb commit 775d02180b825ae32d6536eaf2059884d5fed9f4)
has failed.
We don't need to rebuild the databases in this situation, we just
need to try again to sort out the ip address allocations.
(This used to be ctdb commit 044c398ffea23d36ee033c8ddf07d11028197346)
scheduler for the child.
Use ctdb_fork() from callers where we don't want the child to be running
at real-time privilege.
(This used to be ctdb commit 58795a4c9e0624e20fa3e0023b65127053edd103)
Revert this patch:
commit 482c302d46e2162d0cf552f8456bc49573ae729d
We may need to use real-time processes for the main daemon and the recovery daemon to handle the cases where systems come under very high loads.
(This used to be ctdb commit 08bef9dcab6e4da15fc783f8624e5ed09aa060b5)
availability at all (since we can't restart it, there is no point checking
if it is alive)
(This used to be ctdb commit 6075e85ba6c0f58fd1ab2ce3b09dd3d6ff491365)
Httpd can be very slow to start on some platforms,
wait 5 monitor intervals before we try to restart it if
it has not bound to port 80 yet.
After 10 failed intervals, flag the node as unhealthy.
(This used to be ctdb commit 6ec1993aa5f2778b8227ce5f6eca0d19e4ae9788)
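The two thresholds can be modelled as a failure counter. This is a hedged Python sketch, not the actual eventscript: the class `ServiceMonitor` and its return values are hypothetical, and it assumes the restart is attempted once, on the fifth consecutive failed interval.

```python
RESTART_AFTER = 5     # monitor intervals before a restart is attempted
UNHEALTHY_AFTER = 10  # failed intervals before the node is flagged unhealthy


class ServiceMonitor:
    """Count consecutive failed monitor intervals for one service."""

    def __init__(self, restart_after=RESTART_AFTER, unhealthy_after=UNHEALTHY_AFTER):
        self.restart_after = restart_after
        self.unhealthy_after = unhealthy_after
        self.failures = 0

    def check(self, is_listening):
        """Return 'ok', 'wait', 'restart' or 'unhealthy' for one interval."""
        if is_listening:
            self.failures = 0  # service bound its port: reset the counter
            return "ok"
        self.failures += 1
        if self.failures >= self.unhealthy_after:
            return "unhealthy"
        if self.failures == self.restart_after:
            return "restart"
        return "wait"
```

The same class with `restart_after=10` and `unhealthy_after=15` would model the LOCKD thresholds in the following commit.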
Try to restart LOCKD after 10 failures and
flag the node as unhealthy after 15 failures
(This used to be ctdb commit 5a67889c9166835aef3443051812d14af07dfca5)
Net serverid wipe can take a bit of time sometimes so background it.
Only perform auto start/stop of the managed service on the monitor event
(This used to be ctdb commit deba5cbbf7703a1a24ce88a06c73fca056e05521)
After finishing "ctdb addip" wait for an implicit "iptakeover" to complete
the assignment to a node.
This makes it more wasteful and time-consuming when adding multiple ips
at once, or the same ip to multiple nodes,
but makes it easier to script the use of this command.
(This used to be ctdb commit d86cbf3d7d426c558d110d67dc985634c754a522)
flag the interface as initially being "link ok"
so that we can add it and start up.
The eventscript can later drop the flag if required
(This used to be ctdb commit 720849b756c825fb8b285f09972a8c39f1888a99)
Add an input queue where we keep received pdus we have not yet processed.
This allows us to perform SYNC calls from an ASYNC callback.
(This used to be ctdb commit c111e98d3ad7bd3d09f4081e9bb1443d3722672f)
(Imported from SAMBA commit 09a6538969ac).
Chris Cowan tracked down a SEGV in sub_alloc: idp->level can actually
be equal to 7 (MAX_LEVEL) there, as it can be in sub_remove.
(We unfairly blamed a shift of a signed var for this crash in commit
2db1987f5a).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 73764104356d3738d9d20a9d06ce51535f74f475)
when we migrate a non-empty record onto the node
or a non-empty record off the node
When we migrate a record back to the lmaster and yield the dmaster role,
inspect this flag; if it is still not set, we can delete the record from
the local database as soon as we have migrated it back to the lmaster.
(This used to be ctdb commit a8cc35191df1cd4b866897df71d317ce5f198cb5)
ctdb readkey <dbid> <key>
ctdb writekey <dbid> <key> <value>
these are mainly intended for debugging of databases and dmaster migration issues
(This used to be ctdb commit 70c2e7dd04727371590fb94579ffd20318fbeb58)