Presumably this was done to minimise the chance of a recovery
occurring while the nodemaps are inconsistent across nodes.
Another possible explanation is that the forced recovery in
ctdb.c:control_reload_nodes_file() stops another recovery from
occurring for ReRecoveryTimeout seconds, so this delay causes the
reloads to occur during that period.
This is no longer necessary because recoveries are now explicitly
disabled while node files are reloaded.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Every time a nodemap is constructed the node IP addresses all need to
be parsed. This isn't a very productive use of CPU.
Instead, parse each string once when the nodes file is loaded. This
results in much simpler code.
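As a rough sketch only (the helper name and exact types here are
assumptions, not the actual CTDB code), parsing each address once at
load time might look something like this:

    #include <string.h>
    #include <sys/socket.h>
    #include <arpa/inet.h>
    #include <netinet/in.h>

    /* Parse one nodes-file line into a binary address exactly once,
     * at load time, so later nodemap construction can just copy it. */
    static int parse_node_address(const char *str,
                                  struct sockaddr_storage *out)
    {
            struct sockaddr_in *in4 = (struct sockaddr_in *)out;
            struct sockaddr_in6 *in6 = (struct sockaddr_in6 *)out;

            memset(out, 0, sizeof(*out));

            if (inet_pton(AF_INET, str, &in4->sin_addr) == 1) {
                    in4->sin_family = AF_INET;
                    return 0;
            }
            if (inet_pton(AF_INET6, str, &in6->sin6_addr) == 1) {
                    in6->sin6_family = AF_INET6;
                    return 0;
            }
            return -1;   /* not a valid IPv4 or IPv6 address */
    }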
This code also removes the use of ctdb_address. Duplicating the port
is pointless without an abstraction layer around ctdb_address. If
CTDB gets an incompatible transport in the future then add an
abstraction layer.
Note that the infiniband code is not updated. Compilation of the
infiniband code is already broken. Fixing it will be a separate,
properly tested effort.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
When the daemon is able to take the recovery lock during recovery, we
might as well guess that the cluster filesystem has a lock coherence
problem and print a more useful message. This will be more helpful to
those trying out cluster filesystems that don't have lock coherence or
that are difficult to set up.
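A minimal sketch of the idea, assuming the recovery lock is an fcntl
lock on a file (the function name here is hypothetical, not the real
CTDB code):

    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* During a recovery the recovery master should already hold the
     * lock, so being able to take it again suggests the cluster
     * filesystem is not enforcing lock coherence across nodes. */
    static void check_lock_coherence(const char *lockfile)
    {
            struct flock lock = {
                    .l_type = F_WRLCK,
                    .l_whence = SEEK_SET,
            };
            int fd = open(lockfile, O_RDWR | O_CREAT, 0600);

            if (fd == -1) {
                    fprintf(stderr, "unable to open %s - %s\n",
                            lockfile, strerror(errno));
                    return;
            }

            if (fcntl(fd, F_SETLK, &lock) == 0) {
                    fprintf(stderr,
                            "ERROR: took recovery lock on %s during recovery. "
                            "The cluster filesystem may not provide lock "
                            "coherence across nodes.\n", lockfile);
            }

            close(fd);
    }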
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Have it just silently take or fail to take the lock, except on an
unexpected failure (where it should log an error).
This means that when it is called we need to keep the old behaviour
and explicitly release the lock. In do_recovery() the lock is
released and a message is printed before attempting to take the lock.
In the daemon sanity check the lock must be released in the error path
if it is actually taken.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Unlock the recovery lock file. This way knowledge of the file
descriptor isn't sprinkled throughout the code.
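A sketch of the sort of wrapper this describes, assuming an fcntl-style
lock that is released by closing the descriptor (the struct and
function names are illustrative, not the real CTDB API):

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    struct recovery_lock {
            const char *file;
            int fd;            /* -1 when the lock is not held */
    };

    /* Release the recovery lock and forget the file descriptor, so no
     * other code needs to know how the lock is implemented. */
    static void recovery_lock_release(struct recovery_lock *rl)
    {
            if (rl->fd == -1) {
                    return;
            }
            if (close(rl->fd) == -1) {
                    fprintf(stderr,
                            "Failed to release recovery lock on %s - %s\n",
                            rl->file, strerror(errno));
            }
            rl->fd = -1;
    }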
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
It is pointless to have a recovery lock but not sanity-check that it
is working. Also, the logic that uses this tunable is confusing. In
some places the recovery lock is released unnecessarily because the
tunable isn't set.
Simplify the logic by assuming that if a recovery lock is specified
then it should be verified.
Update documentation that references this tunable.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Print out the errno if the fcntl call fails.
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Richard Sharpe <rsharpe@samba.org>
Autobuild-User(master): Michael Adam <obnox@samba.org>
Autobuild-Date(master): Fri Jan 9 04:25:02 CET 2015 on sn-devel-104
This makes it consistent with Samba, to ease the transition.
Update unit test code to link with tdb_wrap instead of including
db_wrap.c.
There are some potential whitespace fixes in this commit that have
been ignored. CTDB's lib/tdb_wrap will be deleted after the
transition to Samba's lib/tdb_wrap, so there's no point polishing it
too much.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This prevents the ctdb tool from thawing databases prematurely in the
thaw/wipedb/restoredb commands if recovery is active.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
This was added to support external monitoring using CTDB event scripts.
However, it was never used.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
This is caused by corruption of a record header such that the records
on two nodes point to each other as dmaster. This makes a request for
that record bounce between nodes endlessly.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit f0853013655ac3bedf1b793de128fb679c6db6c6)
This is caused by corruption of a record header such that the records
on two nodes point to each other as dmaster. This makes a request for
that record bounce between nodes endlessly.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit a610bc351f0754c84c78c27d02f9a695e60c5b0f)
This helps distinguish processes in the process list in top, perf, etc.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 2493f57ce268d6fe7e4c40a87852c347fd60d29e)
Since the complete database is not locked when the receive_records
control is received, it's possible that we may not be able to obtain
the lock on a chain. In that case we will try again to store the record.
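This is not the actual control handler, just a sketch of the
non-blocking store-and-retry idea using the public tdb API (the
wrapper function name is made up):

    #include <stdbool.h>
    #include <tdb.h>

    /* Try to store one pushed record without blocking on the chain
     * lock. Returns true if stored; false means the chain was busy
     * and the caller should retry this record later. */
    static bool store_record_nonblock(struct tdb_context *tdb,
                                      TDB_DATA key, TDB_DATA data)
    {
            if (tdb_chainlock_nonblock(tdb, key) != 0) {
                    /* Another process holds the chain lock - not an
                     * error, just defer this record. */
                    return false;
            }

            if (tdb_store(tdb, key, data, TDB_REPLACE) != 0) {
                    tdb_chainunlock(tdb, key);
                    return false;
            }

            tdb_chainunlock(tdb, key);
            return true;
    }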
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit 32723c9efdad1c6ca4aa53f308ccd9bef1aadfff)
This adds more serialisation to the startup, ensuring that the
"startup" event runs after everything to do with the first recovery
(including the "recovered" event).
Given that it now takes longer to get to the "startup" state, the
initscript needs to wait until ctdbd gets to "first_recovery".
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit ed6814ff0a59ddbb1c1b3128b505380f60d7aeb7)
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-Programmed-With: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 1f96ea08f9a39dfe537c9b957ac512c84dc76f91)
It isn't used; it was superseded by "ipreallocated".
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit c2bb8596a8af6406ef50e53953884df9d6246a96)
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-By: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit b5a8791268e938d7e017056e0e2bd2cbec1fa690)
This is in preparation for turning the vacuuming on the lmaster into
a two-phase process (see the sketch after this list):
- First, the node sends the list of records to be vacuumed
  to all other nodes with this new RECEIVE_RECORDS control.
  The remote nodes should store the lmaster's empty current copy.
- Only those records that could be stored on all other nodes
  are processed further. They are sent to all other nodes with
  the TRY_DELETE_RECORDS control, as before, for deletion.
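A purely illustrative sketch of that flow; the control names come from
this commit, while all of the types and helpers below are hypothetical:

    #include <stdbool.h>
    #include <stddef.h>

    enum vacuum_phase {
            VACUUM_RECEIVE_RECORDS,    /* phase 1: push empty copies */
            VACUUM_TRY_DELETE_RECORDS  /* phase 2: delete stored ones */
    };

    struct vacuum_record {
            unsigned int key;
            bool stored_everywhere;    /* every node stored it in phase 1 */
    };

    static void run_vacuum(struct vacuum_record *recs, size_t n,
                           bool (*send_control)(enum vacuum_phase,
                                                unsigned int key))
    {
            size_t i;

            /* Phase 1: RECEIVE_RECORDS - remote nodes store the
             * lmaster's empty current copy of each record. */
            for (i = 0; i < n; i++) {
                    recs[i].stored_everywhere =
                            send_control(VACUUM_RECEIVE_RECORDS,
                                         recs[i].key);
            }

            /* Phase 2: TRY_DELETE_RECORDS - only records accepted by
             * all nodes in phase 1 are deleted. */
            for (i = 0; i < n; i++) {
                    if (recs[i].stored_everywhere) {
                            send_control(VACUUM_TRY_DELETE_RECORDS,
                                         recs[i].key);
                    }
            }
    }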
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-By: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit e397702e271af38204fd99733bbeba7c1db3a999)
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 2e92deef5221ee651028ef87138b3113f1fece91)
When CTDB is shutting down, the recovery daemon is stopped, but the
events that check if the recovery daemon is still alive are not
destroyed. So the recovery master is restarted during shutdown if the
CTDB daemon takes longer to shut down.
There are two checks that verify the recovery daemon is working:
1. ctdb_check_recd() - checks every 30 seconds if the recovery
   daemon process exists.
2. ctdb_recd_ping_timeout() - triggered when the recovery daemon
   fails to ping the CTDB daemon.
Both events are periodic and need to be destroyed when shutting down.
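As a sketch of the fix (the struct and function names are invented,
though the tevent/talloc calls are real), tracking the two timers lets
shutdown destroy them before stopping the recovery daemon:

    #include <talloc.h>
    #include <tevent.h>

    struct recd_monitor {
            struct tevent_timer *check_recd_te;    /* periodic existence check */
            struct tevent_timer *ping_timeout_te;  /* recd ping timeout check */
    };

    static void recd_monitor_shutdown(struct recd_monitor *mon)
    {
            /* Freeing a tevent timer removes it from the event loop,
             * so neither check can fire (and restart the recovery
             * daemon) while the shutdown is in progress. */
            TALLOC_FREE(mon->check_recd_te);
            TALLOC_FREE(mon->ping_timeout_te);
    }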
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 746168df2e691058e601016110fae818c6a265c3)
Change this to instead preallocate, by default, 10 MByte chunks for the data buffer.
This significantly reduces the number of reallocate-and-move operations that may be required.
Create a tunable to override/change how much preallocation should be used.
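A rough illustration of chunked preallocation (the names and exact
growth policy are assumptions, not the CTDB code):

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    /* Illustrative default; the commit makes the real value a tunable. */
    #define BUFFER_CHUNK_SIZE (10 * 1024 * 1024)

    struct grow_buffer {
            uint8_t *data;
            size_t used;
            size_t allocated;
    };

    /* Grow in fixed chunks so that appending many small items causes
     * only an occasional realloc instead of one per append. */
    static int buffer_append(struct grow_buffer *buf,
                             const void *p, size_t len)
    {
            if (buf->used + len > buf->allocated) {
                    size_t newsize = buf->allocated;
                    uint8_t *newdata;

                    while (buf->used + len > newsize) {
                            newsize += BUFFER_CHUNK_SIZE;
                    }
                    newdata = realloc(buf->data, newsize);
                    if (newdata == NULL) {
                            return -1;
                    }
                    buf->data = newdata;
                    buf->allocated = newsize;
            }

            memcpy(buf->data + buf->used, p, len);
            buf->used += len;
            return 0;
    }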
(This used to be ctdb commit 1f262deaad0818f159f9c68330f7fec121679023)
Add tunables to control when to log these instances, and allow logging to be completely turned off by setting the threshold to 0.
(This used to be ctdb commit 9ed58fef4991725f75509433496f4d5ffae0ae87)
Wrap all creation of child processes inside ctdb_fork(), which is used to track all processes we have spawned.
Capture SIGCHLD to also track which child processes have terminated.
Wrap kill() inside ctdb_kill() and make sure that we never send a non-zero signal to a child process pid that has already terminated (and might have been replaced with a different process).
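A simplified sketch of the ctdb_kill() idea (the tracking table and
helper names are hypothetical, and registration from ctdb_fork() and
the SIGCHLD handler are omitted):

    #include <signal.h>
    #include <stdbool.h>
    #include <sys/types.h>

    #define MAX_TRACKED_CHILDREN 1024
    static pid_t tracked_children[MAX_TRACKED_CHILDREN];

    /* A pid is only safe to signal while it is still one of our
     * un-reaped children. */
    static bool is_tracked_child(pid_t pid)
    {
            int i;

            for (i = 0; i < MAX_TRACKED_CHILDREN; i++) {
                    if (tracked_children[i] == pid) {
                            return true;
                    }
            }
            return false;
    }

    /* Never send a real signal to a pid that has already terminated;
     * it may have been reused by an unrelated process. Signal 0 is
     * still allowed since it only probes for existence. */
    static int safe_kill(pid_t pid, int signum)
    {
            if (signum != 0 && !is_tracked_child(pid)) {
                    return 0;   /* quietly treat it as a no-op */
            }
            return kill(pid, signum);
    }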
(This used to be ctdb commit f73a4b1495830bcdd094a93732a89dd53b3c2f78)
This can improve performance and stop clients from having to chase a rapidly migrating/bouncing record
(This used to be ctdb commit d0d98f7e45e5084b81335b004d50bddc80cdc219)
Don't reset the pointer to NULL after deleting the first entry; loop, deleting one entry
at a time, until they are all gone, or we will leak some memory and possibly a process.
(This used to be ctdb commit 8a86ac72088ad9f64ca83218c704f84c9abe00b6)
Let all databases default to not supporting this until it is enabled through this control.
(This used to be ctdb commit 908a07c42e5135a3ba30a625fc4f4e4916de197a)
the persistent flag.
This is the same size as the original boolean but allows us to add additional flags for the database.
(This used to be ctdb commit 7462761638d25880ad46024ad4ef21667eb99a98)
When the end_recovery control is received, pending trans3 commits are
finished. During the recovery, all the actions like persistent_callback
and persistent_store_timeout were disabled to let the recovery do its
job. After the recovery is completed, send the reply to the waiting
clients.
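A hypothetical sketch of answering the waiting clients once recovery
ends (none of these structures are the real CTDB ones):

    #include <stdbool.h>
    #include <stddef.h>

    struct pending_commit {
            bool waiting;     /* client still waiting for a reply */
            void (*send_reply)(struct pending_commit *pc, int status);
    };

    /* Called when the end_recovery control arrives: the recovery
     * suppressed the usual persistent_callback / store timeout
     * handling, so the waiting clients are answered here instead. */
    static void finish_pending_trans3_commits(struct pending_commit *commits,
                                              size_t n, int status)
    {
            size_t i;

            for (i = 0; i < n; i++) {
                    if (commits[i].waiting) {
                            commits[i].send_reply(&commits[i], status);
                            commits[i].waiting = false;
                    }
            }
    }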
(This used to be ctdb commit f7dfeb7143f574c2434f7dd16917380dfd1f4f64)