IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
The only reason for recoverd attaching to databases was to migrate
records to the local node as part of vacuuming. Recovery daemon does
not take part in database vacuuming any more.
The actual database recovery is handled via the recovery_helper and
recovery daemon should not need to attach to the databases any more.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
This is now implemented in the ctdb daemon using VACUMM_FETCH control.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
This occurs rarely but can adversely impact performance, so it is
worth logging it more frequently.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This currently skips the last record.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14147
RN: Avoid potential data loss during recovery after vacuuming error
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Currently some of this is supported by a periodic check in the
recovery daemon's main_loop(), which notices the flag change, sets
recovery mode active and freezes databases. If STOP_NODE returns
immediately then the associated recovery can complete and the node can
be continued before databases are actually frozen.
Instead, immediately do all of the things that make a node inactive.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14087
RN: Stop "ctdb stop" from completing before freezing databases
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Aug 20 08:32:27 UTC 2019 on sn-devel-184
There's no reason to avoid immediately setting recovery mode to active
and initiating freeze of databases.
This effectively reverts the following commits:
d8f3b490bbb691c9916eed0df5b980c1aef23c85
b4357a79d916b1f8ade8fa78563fbef0ce670aa9
The latter is now implemented using a control, resulting in looser
coupling.
See also the following commit:
f8141e91a693912ea1107a49320e83702a80757a
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14087
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This is a superset of ctdb_local_node_got_banned() so will replace
that function, and will also be used in the NODE_STOP control.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14087
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This is the core logic from ctdb_ip_to_pnn(), so re-implement that
that function using ctdb_ip_to_node().
Something similar (ctdb_ip_to_nodeid()) was recently removed in commit
010c1d77cd7e192b1fff39b7b91fccbdbbf4a786 because it wasn't required.
Now there is a use case.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14084
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Compiling with -Wsign-compare complains:
1047 | && (call->call_id == CTDB_FETCH_WITH_HEADER_FUNC)) {
| ^~
struct ctdb_call is a protocol element, so we can't simply change it.
Found by csbuild.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Wed Aug 14 10:29:59 UTC 2019 on sn-devel-184
Compiling with -Wsign-compare complains:
ctdb/server/ctdb_call.c:831:12: warning: comparison of integer expressions of different signedness: ‘int’ and ‘uint32_t’ {aka ‘unsigned int’} [-Wsign-compare]
831 | if (count <= ctdb_db->statistics.hot_keys[0].count) {
| ^~
and
ctdb/server/ctdb_call.c:844:13: warning: comparison of integer expressions of different signedness: ‘int’ and ‘uint32_t’ {aka ‘unsigned int’} [-Wsign-compare]
844 | if (count <= ctdb_db->statistics.hot_keys[i].count) {
| ^~
Found by cs-build.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
If the lock file is inaccessible or the inode number changes then the
lock is lost, so exit. This allows the recovery daemon to trigger an
election. The ensuing recovery will re-take the lock.
By default the lock file is checked every 60 seconds. A lot can
happen in 60 seconds but being more aggressive and accessing the lock
too often could result in a performance issue for the cluster
filesystem.
An new optional 2nd argument is added, which is the lock file re-check
time in seconds.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This will allow more conditions to be waited on via additional
sub-requests. At the moment this just completes when the parent wait
completes.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Put the checking for the process being immediately re-parented into
the computation too. This will be very rare and doing it
consistently makes testing saner.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
There is no need to wait until the parent kills the helper. The
parent will get the initial response, indicating contention or
similar, and will then get a separate event indicating that the pipe
is gone.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This makes the code more explicit and makes testing easier due to less
dependencies.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
clang warns:
ctdb/server/ctdb_mutex_fcntl_helper.c:61:3: warning: Value stored to 'fd' is never read
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
One of these had a missing space, so this implicitly fixes it. It
also drops the need to unnecessarily include common.h, which comes
with some dependency baggage.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Only do this if the recovery lock is unset. Log every minute for the
first 10 minutes, then every 10 minutes, then every hour.
This is useful for determining whether a split brain occurred. It is
particularly useful if logging failed or was throttled at startup, so
there is no evidence of the split brain when it began.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This can never be NULL. It could probably be NULL in the past when
"all database" locks existed.
There are paths where is is checked for NULL and then later
dereferenced, causing static analysers to produce spurious warnings.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Indexing by PNN is wrong.
This also removes a signed/unsigned comparison because the PNN is not
compared to -1 anymore.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Node ID is a poorly defined concept, indicating the slot in the node
map where the IP address was found. This signed value also ends up
compared to num_nodes, which is unsigned, producing unwanted warnings.
Just return the PNN because this what both callers really want.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
... and does not just contain whitespace.
Otherwise NULL can be passed as the first argument to execv().
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Instead of taking exclude_pnn as a parameter, calculate it from an
include_self_parameter, which is passed through from the 2 calling
functions.
While doing this, fix a signed/unsigned comparison issue by declaring
the new exclude_pnn local variable as an unsigned type.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The next commit will change the type of this function, which is only
used in this file. So, make it static to isolate the change.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Yes, the other callers check the return value of ctdb_lockdb_mark().
However, this is called in a void function and ctdb_lockdb_mark() has
already printed any error message. All we can do is explicitly ignore
the return value.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
To make this much clearer, move the declaration into the scope where
it is used.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This fixes warnings about signed versus unsigned comparisons.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Simple cases where variables and function parameters need to be
declared as an unsigned type instead of an int.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Simple cases where variables need to be declared as an unsigned type
instead of an int.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Fixes
ctdb/server/ipalloc_lcp2.c:61: error: shiftTooManyBitsSigned: Shifting signed 32-bit value by 31 bits is undefined behaviour <--[cppcheck]
Signed-off-by: Noel Power <noel.power@suse.com>
Reviewed-by: Andreas Schneider <asn@samba.org>
Old war story completely from memory, I could not find the commit that
introduced TDB_SEQNUM so far...:
Back in the days when ctdb was initially developed, TDB_SEQNUM's only
user was the notify.tdb that held one huge record for all notify
records. With that use case in mind it made perfect sense to keep the
SEQNUM stable locally, sacrificing precision. By now notify.tdb is
long gone, an the only user of TDB_SEQNUM right now is brlock.tdb,
which contains special case code for the imprecise ctdb implementation
of TDB_SEQNUM.
With this commit, that special code can go: The TDB_SEQNUM will also
increment when just the DMASTER header field changes, indicating to
smbd that someone else might have changed the record. This will of
course increase the SEQNUM frequency, but it should not increase the
load on ctdb: If you look at the brlock.c workaround, it just does not
do the caching that is possible with precise TDB_SEQNUMs working.
How did I get here? I want to move brl_num_read_oplocks() from
brlock.tdb into locking.tdb, and for that I need precise TDB_SEQNUMs
for locking.tdb.
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
Autobuild-User(master): Jeremy Allison <jra@samba.org>
Autobuild-Date(master): Fri May 24 00:42:17 UTC 2019 on sn-devel-184