IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
This also updates remote flags so the name is misleading.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
update_local_flags() will be changed to use these nodemaps.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The nodemap has already been fetched from the local node and is
actually passed to this function. Care must be taken to avoid
referencing the "remote" nodemap for the recovery master. It also
isn't useful to do so, since it would be the same nodemap.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The plan here is to use the nodemaps retrieved by get_remote_nodes()
in update_local_flags(). This will improve efficiency, since
get_remote_nodes() fetches flags from nodes in parallel. It also
means that get_remote_nodes() can be used exactly once early on in
main_loop() to retrieve remote nodemaps. Retrieving nodemaps multiple
times is unnecessary and racy - a single monitoring iteration should
not fetch flags multiple times and compare them.
This introduces a temporary behaviour change but it will be of no
consequence when the above changes are made.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This array is indexed by the same index as nodemap, not the PNN.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Also drop error handling in main_loop() that is replaced by this
change.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This will allow an error callback to be added.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Change 1st argument to a rec context, since this will be needed later.
Drop the nodemap argument and access it via rec->nodemap instead.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The memory is allocated off the memory context used by the current
iteration of main loop. It is freed when main loop completes the fix
doesn't require backporting to stable branches. However, it is sloppy
so it is worth fixing.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Don't log an error on failure - let the caller can do this. Apart
from this: fix up coding style and modernise the remaining error
message.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Ralph Boehme <slow@samba.org>
Reviewed-by: Swen Schillig <swen@linux.ibm.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Autobuild-User(master): Jeremy Allison <jra@samba.org>
Autobuild-Date(master): Mon Aug 3 22:21:04 UTC 2020 on sn-devel-184
The logic currently in ctdb_ctrl_modflags() will be optimised so that
it no longer matches the pattern for a control function. So, remove
this function and squash its functionality into the only caller.
Although there are some superficial changes, the behaviour is
unchanged.
Flattening the 2 functions produces some seriously weird logic for
setting the new flags, to the point where using ctdb_ctrl_modflags()
for this purpose now looks very strange. The weirdness will be
cleaned up in a subsequent commit.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This is clearer than using the MODFLAGS control directly.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This makes fields such as recmaster and nodemap easily available if
required.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
An unused argument needlessly extends the length of function calls. A
subsequent change will allow rec->nodemap to be used if necessary.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The current code only ever swaps with slot 0. This will only ever
happen with slots 0 and 1, so probably never sorts.
Replace with qsort().
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
ctdbd currently only logs when a new hot key is added. If a key gets
hotter then nothing new is logged.
Log hot key updates when the number of migrations has doubled since
the last time that key was logged.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This message indicates that a hot key was added, so say that. After
all the hot key slots have been filled the id will always be 0, so
don't bother logging it.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
These should be unsigned but luck is currently on our side.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
There are 2 reasons for this. Sorting of hot keys is broken and will
be changed to an implementation that needs a named (i.e. not
anonymous) structure. Also, at least one non-protocol field will be
added to facilitate more useful logging.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Testing control: 4 bytes msec delay plus a blob, return the request after the
delay. This is an enhanced "ping" which can be used to test asynchronous
clients.
Doesn't have the full protocol implementation yet
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
The vacuuming integration tests set VacuumInterval to a very high
number to avoid vacuuming collisions. This is done after the cluster
is healthy, so Samba will have already been started and vacuuming will
already be scheduled *at the default interval* for databases attached
by Samba. This means that vacuuming controls used by vacuuming tests
can still collide with the scheduled vacuuming events.
Add some logic to reschedule a vacuuming event that has fired but
where VacuumInterval has increased since it was originally scheduled.
The increase in VacuumInterval is used as the time offset for
rescheduling the event.
Although this changes production behaviour for the convenience of
testing, the new behaviour is completely reasonable and obeys the
principle of least surprise.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Apr 7 03:04:57 UTC 2020 on sn-devel-184
No behaviour change. This is final staging to make the next change
completely obvious.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
No behaviour change. This just makes future changes clearer by
avoiding reformatting (or introducing local variables).
Clean up error handling while touching a relevant line.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Inside the nested event loop in ctdb_ctrl_getnodemap(), various
asynchronous handlers may dereference rec->nodemap, which will be
NULL.
One example is lost_reclock_handler(), which causes rec->nodemap to be
unconditionally dereferenced in list_of_nodes() via this call chain:
list_of_nodes()
list_of_active_nodes()
set_recovery_mode()
force_election()
lost_reclock_handler()
Instead of attempting to trace all of the cases, just avoid leaving
rec->nodemap set to NULL. Attempting to use an old value is generally
harmless, especially since it will be the same as the new value in
most cases.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14324
Reported-by: Volker Lendecke <vl@samba.org>
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Tue Mar 24 01:22:45 UTC 2020 on sn-devel-184
Neither the recovery daemon nor the recovery helper should attach
databases outside of the recovery process.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This builds a complete list of databases across the cluster so it can
be used to create databases on the nodes where they are missing.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This will be used to build a merged list of databases from all nodes,
allowing the recovery helper to create missing databases.
It would be possible to also include the db_name field in this
structure but that would cause a lot of churn. This field is used
locally in the recovery of each database so can continue to live in
the relevant state structure(s).
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This is currently only set by the recovery daemon when it attaches
missing databases, so there is no obvious behaviour change. However,
attaching missing databases can now be moved to the recovery helper as
long as it sets this flag.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
ctdb_ctrl_createdb() is only called by the recovery daemon, so this is
a safe, temporary change. This is temporary because
ctdb_ctrl_createdb(), create_missing_remote_databases() and
create_missing_local_databases() will all go away soon.
Note that this doesn't cause a change in behaviour. The main daemon
will still only defer attaches from non-recoverd processes during
recovery.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Commit 3cc230b5ee says:
Dont allow clients to connect to databases untile we are well past
and through the initial recovery phase
It is unclear what this commit was attempting to do. The commit
message implies that more attaches should be deferred but the code
change adds a conjunction that causes less attaches to be deferred.
In particular, no attaches will be deferred after startup is complete.
This seems wrong.
To implement what seems to be stated in the commit message an "or"
needs to be used so that non-recovery daemon attaches are deferred
either when in recovery or before startup is complete. Making this
change highlights that attaches need to be allowed during the
"startup" event because this is when smbd is started.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
If a node is marked for banning, confirm that it's not become inactive
during the recovery. If yes, then don't ban the node.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
It's possible to have a node stopped, but recovery master not yet
updated flags on the local ctdb daemon when recovery is started. So do
not trust the list of active nodes obtained from the local node. Query
the connected nodes to calculate the list of active nodes.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
New vnnmap is constructed using the information from all the connected
nodes. So there is no need to fetch the vnnmap from recovery master.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
If NODE_FLAGS_DISCONNECTED is set the node can be in half-connected state. With
this change we ensure to restart the transport for this case.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14295
Signed-off-by: Ralph Boehme <slow@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
There is no sane way of keeping stdin open when using the shell to
background ctdbd in local_daemons.sh. Instead, have ctdbd fork when
not interactive and when test mode is enabled. become_daemon() can't
be used for this: if it forks then it also closes stdin.
For the interactive case, become_daemon() wasn't doing anything
special, so do nothing instead.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
These don't need to depend on do_fork. Child logging should be set up
whenever the daemon is not interactive. The stdin handler should be
setup whenever test mode is enabled.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
No functional changes.
This is staging for a change that makes ctdbd fork when test mode is
enabled but interactive is not set.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This allows a test environment to simply close its end of a pipe to
cleanly shutdown ctdbd. Like in smbd, this is only done if stdin is a
pipe or a socket.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This avoids a crash if ctdb_shutdown_sequence() is called before
monitoring is initialised.
Switch to using TALLOC_FREE() while touching this function.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Testing against a commonly used cluster filesystem has shown no
performance impact, as expected.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This changes the behaviour for some failures from exiting to simply
attempting to schedule the next run.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
All the vacuum operations if required have an event loop to ensure
completion of pending operations. Once all the steps are complete,
there is no reason to process any more packets.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
The only reason for recoverd attaching to databases was to migrate
records to the local node as part of vacuuming. Recovery daemon does
not take part in database vacuuming any more.
The actual database recovery is handled via the recovery_helper and
recovery daemon should not need to attach to the databases any more.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
This is now implemented in the ctdb daemon using VACUMM_FETCH control.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
This occurs rarely but can adversely impact performance, so it is
worth logging it more frequently.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This currently skips the last record.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14147
RN: Avoid potential data loss during recovery after vacuuming error
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Currently some of this is supported by a periodic check in the
recovery daemon's main_loop(), which notices the flag change, sets
recovery mode active and freezes databases. If STOP_NODE returns
immediately then the associated recovery can complete and the node can
be continued before databases are actually frozen.
Instead, immediately do all of the things that make a node inactive.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14087
RN: Stop "ctdb stop" from completing before freezing databases
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Aug 20 08:32:27 UTC 2019 on sn-devel-184
There's no reason to avoid immediately setting recovery mode to active
and initiating freeze of databases.
This effectively reverts the following commits:
d8f3b490bbb4357a79d9
The latter is now implemented using a control, resulting in looser
coupling.
See also the following commit:
f8141e91a6
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14087
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This is a superset of ctdb_local_node_got_banned() so will replace
that function, and will also be used in the NODE_STOP control.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14087
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This is the core logic from ctdb_ip_to_pnn(), so re-implement that
that function using ctdb_ip_to_node().
Something similar (ctdb_ip_to_nodeid()) was recently removed in commit
010c1d77cd because it wasn't required.
Now there is a use case.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14084
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Compiling with -Wsign-compare complains:
1047 | && (call->call_id == CTDB_FETCH_WITH_HEADER_FUNC)) {
| ^~
struct ctdb_call is a protocol element, so we can't simply change it.
Found by csbuild.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Wed Aug 14 10:29:59 UTC 2019 on sn-devel-184
Compiling with -Wsign-compare complains:
ctdb/server/ctdb_call.c:831:12: warning: comparison of integer expressions of different signedness: ‘int’ and ‘uint32_t’ {aka ‘unsigned int’} [-Wsign-compare]
831 | if (count <= ctdb_db->statistics.hot_keys[0].count) {
| ^~
and
ctdb/server/ctdb_call.c:844:13: warning: comparison of integer expressions of different signedness: ‘int’ and ‘uint32_t’ {aka ‘unsigned int’} [-Wsign-compare]
844 | if (count <= ctdb_db->statistics.hot_keys[i].count) {
| ^~
Found by cs-build.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
If the lock file is inaccessible or the inode number changes then the
lock is lost, so exit. This allows the recovery daemon to trigger an
election. The ensuing recovery will re-take the lock.
By default the lock file is checked every 60 seconds. A lot can
happen in 60 seconds but being more aggressive and accessing the lock
too often could result in a performance issue for the cluster
filesystem.
An new optional 2nd argument is added, which is the lock file re-check
time in seconds.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This will allow more conditions to be waited on via additional
sub-requests. At the moment this just completes when the parent wait
completes.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Put the checking for the process being immediately re-parented into
the computation too. This will be very rare and doing it
consistently makes testing saner.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
There is no need to wait until the parent kills the helper. The
parent will get the initial response, indicating contention or
similar, and will then get a separate event indicating that the pipe
is gone.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This makes the code more explicit and makes testing easier due to less
dependencies.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
clang warns:
ctdb/server/ctdb_mutex_fcntl_helper.c:61:3: warning: Value stored to 'fd' is never read
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
One of these had a missing space, so this implicitly fixes it. It
also drops the need to unnecessarily include common.h, which comes
with some dependency baggage.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Only do this if the recovery lock is unset. Log every minute for the
first 10 minutes, then every 10 minutes, then every hour.
This is useful for determining whether a split brain occurred. It is
particularly useful if logging failed or was throttled at startup, so
there is no evidence of the split brain when it began.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This can never be NULL. It could probably be NULL in the past when
"all database" locks existed.
There are paths where is is checked for NULL and then later
dereferenced, causing static analysers to produce spurious warnings.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Indexing by PNN is wrong.
This also removes a signed/unsigned comparison because the PNN is not
compared to -1 anymore.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Node ID is a poorly defined concept, indicating the slot in the node
map where the IP address was found. This signed value also ends up
compared to num_nodes, which is unsigned, producing unwanted warnings.
Just return the PNN because this what both callers really want.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
... and does not just contain whitespace.
Otherwise NULL can be passed as the first argument to execv().
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Instead of taking exclude_pnn as a parameter, calculate it from an
include_self_parameter, which is passed through from the 2 calling
functions.
While doing this, fix a signed/unsigned comparison issue by declaring
the new exclude_pnn local variable as an unsigned type.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The next commit will change the type of this function, which is only
used in this file. So, make it static to isolate the change.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Yes, the other callers check the return value of ctdb_lockdb_mark().
However, this is called in a void function and ctdb_lockdb_mark() has
already printed any error message. All we can do is explicitly ignore
the return value.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
To make this much clearer, move the declaration into the scope where
it is used.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This fixes warnings about signed versus unsigned comparisons.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Simple cases where variables and function parameters need to be
declared as an unsigned type instead of an int.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Simple cases where variables need to be declared as an unsigned type
instead of an int.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Fixes
ctdb/server/ipalloc_lcp2.c:61: error: shiftTooManyBitsSigned: Shifting signed 32-bit value by 31 bits is undefined behaviour <--[cppcheck]
Signed-off-by: Noel Power <noel.power@suse.com>
Reviewed-by: Andreas Schneider <asn@samba.org>
Old war story completely from memory, I could not find the commit that
introduced TDB_SEQNUM so far...:
Back in the days when ctdb was initially developed, TDB_SEQNUM's only
user was the notify.tdb that held one huge record for all notify
records. With that use case in mind it made perfect sense to keep the
SEQNUM stable locally, sacrificing precision. By now notify.tdb is
long gone, an the only user of TDB_SEQNUM right now is brlock.tdb,
which contains special case code for the imprecise ctdb implementation
of TDB_SEQNUM.
With this commit, that special code can go: The TDB_SEQNUM will also
increment when just the DMASTER header field changes, indicating to
smbd that someone else might have changed the record. This will of
course increase the SEQNUM frequency, but it should not increase the
load on ctdb: If you look at the brlock.c workaround, it just does not
do the caching that is possible with precise TDB_SEQNUMs working.
How did I get here? I want to move brl_num_read_oplocks() from
brlock.tdb into locking.tdb, and for that I need precise TDB_SEQNUMs
for locking.tdb.
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
Autobuild-User(master): Jeremy Allison <jra@samba.org>
Autobuild-Date(master): Fri May 24 00:42:17 UTC 2019 on sn-devel-184
state is always freed before exiting this function, so allocate fde
off it instead of long-lived ctdb context.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13943
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
ctdb_control_db_attach() and ctdb_control_db_detach() assume that any
control with client ID 0 comes from another daemon and treat it
specially.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13930
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This is to help us notice when ctdbd is using the full capacity of a
CPU, so is saturated.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Sat Apr 6 11:51:55 UTC 2019 on sn-devel-144
There is no need to write SAMBA_VERSION_STRING as CTDB_VERSION_STRING.
Wherever required use SAMBA_VERSION_STRING directly.
Avoids the confusion with two version.h files.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13789
Signed-off-by: Amitay Isaacs <amitay@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Mar 15 06:31:50 UTC 2019 on sn-devel-144
This can be used to test the version checking logic. Cache the
version to avoid re-checking the environment variable each time.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@samba.org>
263/386] Compiling ctdb/server/ctdb_recovery_helper.c
In file included from ../../server/ctdb_recovery_helper.c:24:0:
../../server/ctdb_recovery_helper.c: In function ‘main’:
../../../lib/talloc/talloc.h:911:34: error: ‘mem_ctx’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
#define TALLOC_FREE(ctx) do { if (ctx != NULL) { talloc_free(ctx); ctx=NULL; } } while(0)
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Jeremy Allison <jra@samba.org>
In order to detect an value overflow error during
the string to integer conversion with strtoul/strtoull,
the errno variable must be set to zero before the execution and
checked after the conversion is performed. This is achieved by
using the wrapper function strtoul_err and strtoull_err.
Signed-off-by: Swen Schillig <swen@linux.ibm.com>
Reviewed-by: Ralph Böhme <slow@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
This code is difficult to read and there really is no common code
between the 2 cases. For example, there is no need to split a
filename into words. Separating each of the 2 cases into its own
function makes the logic much easier to understand.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13800
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Mon Feb 25 03:40:16 CET 2019 on sn-devel-144
Currently this will wait forever. It really needs a timeout in case
the cluster filesystem (or other lock mechanism) is completely wedged.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13800
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
We really shouldn't see unknown errors. They probably represent a
misconfigured recovery lock or similar.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13800
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Add an explicit case for a timeout and clean up the other messages.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13800
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
If nested events occur while the file descriptor handler is still
active then chaos can ensue. For example, if a node is banned and the
lock is explicitly cancelled (e.g. due to election loss) then
double-talloc-free()s abound.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13800
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
It isn't currently possible to distinguish these 2 cases.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
When the number of fast path vacuuming runs is 0 then a full vacuuming
run is done. This means the first one is a full run, which is almost
certainly not what is intended.
Combine the 2 conditionals to only flag a full vacuuming run when the
count exceeds the configured limit. This means that the
full_vacuum_run flag is set in both parent and child, but this is
harmless... and is better than getting it wrong.
Also tweak the comparison to be less-than-or-equal, since the zeroth
run needs to be counted.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The lock may have been lost due to a failure in the underlying locking
mechanism. This could be due to quorum loss or similar. It is best
to call an election to confirm that this node should still be master.
At worst, the node will reelect itself, fail to take the lock and then
ban itself. This is a suitable outcome for a node that has been
partitioned from others in the cluster.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This provides finer resolution while still maintaining a reasonable
maximum. In this case the top bucket contains any hop counts
>= 16384, compared to the current situation where the top bucket contains
hop counts >= 268435456.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Since 4.9.0, the log messages can be confusing if a required database
directory does not exist. Explicitly check for database directories,
logging a clear error and exiting if one is missing.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13696
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Mon Dec 3 06:56:41 CET 2018 on sn-devel-144
Signed-off-by: Andreas Schneider <asn@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
Autobuild-User(master): Andreas Schneider <asn@cryptomilk.org>
Autobuild-Date(master): Thu Nov 8 11:03:11 CET 2018 on sn-devel-144
Explicitly background ctdbd instead.
This has the advantage of leaving stdin open. ctdbd can then be
enhanced to exit when stdin closes, allowing better cleanup in a test
environment.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Nov 6 10:30:14 CET 2018 on sn-devel-144
popt uses an int in place of a bool, so declare an extra int and make
the conversion explicit.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
ctdbd logs to stderr in interactive mode, not stdout. This way stdout
is always closed.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Logging the logging location to syslog can be useful on production
systems when the configuration goes unexpectedly missing. However, in
test mode this just adds noise to the logs on the test system.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Drop the use of ctdb_set_sockname() because it complicates the memory
allocation and this is the only place it is used. Just assign to the
relevant pointer.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
... instead of applying banning credits.
There have been a couple of cases where recovery repeatedly takes just
over 2 minutes to fail. Therefore, banning credits expire between
failures and a continuously problematic node is never banned,
resulting in endless recoveries. This is because it takes 2
applications of banning credits before a node is banned, which
generally involves 2 recovery failures.
The recovery helper makes up to 3 attempts to recover each database
during a single run. If a node causes 3 failures then this is really
equivalent to 3 recovery failures in the model that existed before the
recovery helper added retries. In that case the node would have been
banned after 2 failures.
So, instead of applying banning credits to the "most failing" node,
simply ban it directly from the recovery helper.
If multiple nodes are causing recovery failures then this can cause a
node to be banned more quickly than it might otherwise have been, even
pre-recovery-helper. However, 90 seconds (i.e. 3 failures) is a long
time to be in recovery, so banning earlier seems like the best
approach.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13670
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Mon Nov 5 06:52:33 CET 2018 on sn-devel-144