IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Now all references to ctdb->recovery_lock are encapsulated in the
cluster lock code.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
It is no longer just a recovery lock but is always held by the cluster
leader.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
ctdb_test_init() doesn't actually pass arguments to local_daemons.sh.
This needs to be done using ctdb_nodes_start_custom().
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The introduction of the leader broadcast timeout provides an
alternative to the current leader validation. Using the leader
broadcast may not be as fast but it is more correct.
When the leader node is stopped or banned, the only way of triggering
an election is currently to fetch the leader's node map to check
whether the it is still active. This is because the leader will no
longer push the node map to other nodes. However, having all nodes
fetch the node map from an inactive leader may be unreliable.
Most of the other cases are also handled more reliably by the leader
broadcast timeout.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This no longer occurs at startup due to the leader broadcast timeout.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
If no leader broadcasts have been received from the leader for more
than 5s then trigger an election.
Apart from being sane behaviour, this avoids elected-before-connected
bugs at startup, where a node elects itself leader before it is
connected to other nodes.
When a node processes a leader broadcast timeout it sends an unknown
leader broadcast to all nodes. That causes cancellation of the leader
broadcast timeout across the cluster. This is particular important at
startup, since nodes may be started in a staggered fashion. Without
this cluster-wide cancellation, a node might notice the lack of
leader, win an election and complete a recovery before other nodes
notice the lack of leader. When the leader broadcast timeout finally
occurs on the other nodes then they'll put the cluster back into an
unnecessary recovery.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
These are triggered on 1 second timer, but are only sent if the node
is the current leader and there is no election underway.
If this node can not be the leader then ensure it releases the
recovery lock.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
CTDB_SRVID_LEADER will be regularly broadcast to all connected nodes
by the leader.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
An alternate election method will be added that doesn't use the
election timeout, so this provides a common way for recognising when
an election is in progress.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This makes the code self-documenting.
In ctdb_election_data() there is a slight behaviour change. An
inactive node will now try to lose an election. This case should not happen
because:
* An inactive node can't win an election round and then send a reply.
* Any inactive node should never start an election. There are
currently places where this happens and they will be fixed later.
There is an instance where this could be used in
validate_recovery_master() but this involves a more serious logic
change. Overhaul this function later.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
There are some remaining instances in this file but they will be
removed in subsequent commits.
Modernise debug macros as appropriate.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Recovery master is being renamed to leader. This follows clustering
best practice (e.g. RAFT).
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This is currently referenced in a number of inconsistent
ways, including:
* pnn
* rec->ctdb->pnn
* ctdb->pnn
* ctdb_get_pnn(ctdb)
* ctdb_get_pnn(rec->ctdb)
The first of these always requires some thought about the context - is
this the node PNN or some other PNN (e.g. argument to function)?
rec->pnn is now always used when referring to the recovery daemon's
PNN.
Doing this also reduces reliance on struct ctdb_context internals.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
ban_time argument is always ctdb->tunable.recovery_ban_period, so
build this in and make the calling code more readable.
ctdb_ban_node() already logs how long a node is banned for, so don't
repeatedly log this.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
All other arguments are available via rec, so simplify.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
pnn and nodemap are both available via the rec context, so simplify.
vnnmap is unused.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The pnn and nodemap arguments to force_election() and
send_election_request() are always effectively rec->pnn and
rec->nodemap, so simplify.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This is currently referenced in a number of inconsistent
ways, including:
* pnn
* rec->ctdb->pnn
* ctdb->pnn
* ctdb_get_pnn(ctdb)
* ctdb_get_pnn(rec->ctdb)
The first of these always requires some thought about the context - is
this the node PNN or some other PNN (e.g. argument to function)?
The intention is to always use rec->pnn when referring to the recovery
daemon's PNN.
Doing this also reduces reliance on struct ctdb_context internals.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Make the code self-documenting.
This preempts an upcoming change to terminology but doing it now saves
a lot of churn.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The recovery and takeover helpers can run for a while and generate
non-trivial logs, so have them reopen their logs to support log
rotation.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Mon Jan 17 04:36:30 UTC 2022 on sn-devel-184
Recovery and takeover helpers can run for a while and generate
non-trivial logs. They should support log reopening.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Pass on a SIGHUP to the recovery daemon, which will then reopen its
logs.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Now that CTDB uses Samba's file logging it is possible to reopen the
logs, so that log rotation can be supported.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
SIGHUP is for reopening logs, SIGUSR1 is for reconfigure.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This has support for log rotation (or re-opening).
The log format is updated to use an RFC5424 timestamp and to include a
hostname. The addition of the hostname allows trivial merging of log
files from multiple cluster nodes.
The hostname is faked from the CTDB_BASE environment variable during
testing, as per the comment in the code. It is currently faked in a
similar manner in local_daemons.sh when printing logs, so drop this.
Unit tests need updating because stderr logging no longer produces a
"PROGNAME[PID]: " header.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This can be overridden by DEBUG_FILE, whereas DEBUG_STDERR can not.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
RFC5952 says the existing style is not recommended and the [] style
should be employed.
There are more optimised ways of adding the square brackets but they
tend to be uglier.
Parsing IPv6 sockets without [] is now tested indirectly by parsing
examples in both styles and comparing the results.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Signed-off-by: Volker Lendecke <vl@samba.org>
Autobuild-User(master): Volker Lendecke <vl@samba.org>
Autobuild-Date(master): Thu Jan 13 17:02:21 UTC 2022 on sn-devel-184
Add tests to confirm that square brackets are handled and that
IPv4-mapped IPv6 addresses are parsed as expected.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Volker Lendecke <vl@samba.org>
Having this as a small static .text is simpler than having to create
this on the stack.
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
According to "man rindex" on debian bullseye rindex() was deprecated
in Posix.1-2001 and removed from Posix.1-2008.
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
This has been found by covscan and make analyzers happy.
Pair-programmed-with: Andreas Schneider <asn@samba.org>
Signed-off-by: Pavel Filipenský <pfilipen@redhat.com>
Signed-off-by: Andreas Schneider <asn@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Ralph Boehme <slow@samba.org>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Tue Oct 12 23:24:18 UTC 2021 on sn-devel-184
test stub code has been updated to handle this, so now let's put it
to work.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14826
RN: Correctly ignore comments in CTDB public addresses file
Signed-off-by: Ralph Boehme <slow@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Note that order of sed expressions matters: the expression to delete
comment lines must come first as the second expression would transform
# comment
to
comment
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14826
Signed-off-by: Ralph Boehme <slow@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Remote nodes are already initialised as UNHEALTHY when the node list
is initialised at startup (ctdb_load_nodes_file() calls
convert_node_map_to_list()) and when disconnected (ctdb_node_dead()).
So, drop this code.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Thu Sep 9 02:38:34 UTC 2021 on sn-devel-184
If this node is not connected to a node then we shouldn't know
anything about it. The state will be pushed later by the recovery
master.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Now that there are separate disable/enable controls used by the ctdb
tool this control can ignore any flag updates for the current nodes.
These only come from the recovery master, which depends on being able
to fetch flags for all nodes.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
CTDB_SRVID_SET_NODE_FLAGS is no longer sent so drop monitor_handler()
and replace with srvid_not_implemented(). Mark the SRVID obsolete in
its comment.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The code that handles this message is
ctdb_recoverd.c:monitor_handler(). Although it appears to do
something potentially useful, it only logs the flags changes. All
changes made are to local structures - there are no actual
side-effects.
It used to trigger a takeover run when the DISABLED flag changed.
This was dropped back in commit
662f06de9f.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
When flags change, promote the message to NOTICE level and switch the
message to the style that is currently generated by
ctdb-recoverd.c:monitor_handler(). This will allow monitor_handler()
to go away in future.
Drop logging when flags do not change. The recovery master now logs
when it pushes flags for a node, so the lack of a corresponding
"changed flags" message here indicates that no update was required.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Don't trust the old flags from the recovery master.
Surrounding code will change in future comments, including the use of
old-style debug macros, so just make this change clear.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Note that there a change from broadcast to a directed control here.
This is OK because the recovery master will push flags if any nodes
disagree with the canonical flags fetched from a node.
Static function ctdb_ctrl_modflags() is no longer used to drop it.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
DISABLED is UNHEALTHY | PERMANENTLY_DISABLED, which is not what is
intended here. Luckily, it doesn't do any harm because nodes are
marked unhealthy at startup anyway.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
These are CTDB_CONTROL_DISABLE_NODE and CTDB_CONTROL_ENABLE_NODE.
For consistency these match CTDB_CONTROL_STOP_NODE and
CTDB_CONTROL_CONTINUE_NODE. It would be possible to add a single
control but it would need to take data.
The aim is to finally fix races in flag handling. Previous fixes have
improved the situation but they have only narrowed the race window.
The problem is that the recovery daemon on the master node pushes
flags to nodes the same way that disable and enable are implemented.
So the following sequence is still racy:
1. Node A is disabled
2. Recovery master pulls flags from all nodes including A
3. Node A is enabled
4. Recovery master notices A is disabled and pushes a flag update to
all nodes including node A
5. Node A is erroneously marked disabled
Node A can not tell if the MODIFY_FLAGS control is from a "ctdb
disable" command or a flag update from the recovery master.
The solution is to use a different mechanism for disable/enable and
for a node to ignore MODIFY_FLAGS controls for their own flags.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This will usually happen if flags on the node in question change, so
keeping the code simple and pushing to all nodes won't hurt. When all
nodes come up there might be differences in connected nodes, causing
such "fix ups". Receiving nodes will ignore no-op pushes.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The resulting code structure looks a little weird. However, there is
another condition that requires the flags to be pushed that will be
inserted before the continue statement in a subsequent commit..
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
pylint warns:
Use lazy % formatting in logging functions
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: David Disseldorp <ddiss@samba.org>
Reviewed-by: Jose A. Rivera <jarrpa@samba.org>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Tue Jul 20 05:29:18 UTC 2021 on sn-devel-184
Don't bother with "as e" to avoid warning about unused variable.
Don't use bare "except:" (though pylint still complains about this
version).
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: David Disseldorp <ddiss@samba.org>
Reviewed-by: Jose A. Rivera <jarrpa@samba.org>
Removes an unnecessary level of indirection: defaults and help strings
are now where they are expected. Also removes some global variables.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: David Disseldorp <ddiss@samba.org>
Reviewed-by: Jose A. Rivera <jarrpa@samba.org>
Removes the need for the global variables currently associated with
this processing. Also removes unnecessarily double-handling the
defaults, which are assigned to the global variables and set via
add_argument().
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: David Disseldorp <ddiss@samba.org>
Reviewed-by: Jose A. Rivera <jarrpa@samba.org>
Avoids numerous pylint warnings.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: David Disseldorp <ddiss@samba.org>
Reviewed-by: Jose A. Rivera <jarrpa@samba.org>
Due to the number of flake8 and pylint warnings it is unclear if the
source has Python 3 incompatibilities. These will be cleaned up in
subsequent commits.
Signed-off-by: "L.P.H. van Belle" <belle@bazuin.nl>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: David Disseldorp <ddiss@samba.org>
Reviewed-by: Jose A. Rivera <jarrpa@samba.org>
In ShellCheck 0.7.2, POSIX compatibility warnings got their own SC3xxx
error codes, so now both the old and new codes need to be ignored.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Jun 25 10:06:48 UTC 2021 on sn-devel-184
Fedora 34 now has a shell function for the which command, which causes
these uses of which to return the enclosing function definition rather
than the executable file as expected.
The event script unit tests always expect the stub service command to
be used, so the conditional in these functions is unnecessary.
$CTDB_HELPER_BINDIR already conveniently points to the stub directory,
so use it here.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
The socket is set close-on-exec but that doesn't help for processes
that do not exec(). This should be done for all child processes.
This has been seen in testing where "ctdb shutdown" waits for the
socket to close before succeeding. It appears that lingering
vacuuming processes have not closed the socket when becoming clients
so they cause "ctdb shutdown" to hang even though the main daemon
process has exited. The cause of the lingering vacuuming processes
has been previously examined but still isn't understood.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The path of the TDB is known, so calculate the file ID (device number
+ inode number) from it and use this to directly filter /proc/locks to
find processes holding locks.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Don't use the arguments yet. They will be used in a simplified
version of the code.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
1. PID of lock helper waiting for lock
2. Scope of lock: "record" or "db"
3. Path to database that lock helper is trying to lock
4. Whether the database uses mutexes: "mutex" or "fcntl"
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
These were fine (though still lazy) when these tests were the only
user of this stub. However, the ps stub is about to be enhanced, so
fix these uses of it to represent the intended usage.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The main reason for this is to facilitate testing.
Avoid some /proc accesses entirely by using ps(1) (which can be
replaced by a stub when testing) because this script might as well be
more portable in case anyone wants to add lock debugging for a
non-Linux platform. While the "state" format specification isn't
POSIX-compliant, it works on both Linux and FreeBSD so it is a
reasonable improvement.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
If a script times out the caller can talloc_free() the script_list
output of run_event_recv, which talloc_free's proc->output from
run_proc.c as well. If the script generates further output after the
timeout and then exits after a while, the SIGCHLD handler in the
eventd tries to read into proc->output, which was already free'ed.
Fix this by not doing just a talloc_steal but a talloc_move. This way
proc_read_handler() called from run_proc_signal_handler() does not try
to realloc the stale reference to proc->output but gets a NULL
reference.
I don't really know how to do a knownfail in ctdb, so this commit
actually activates catching the signal by waiting long enough for
22.bar to exit and generate the SIGCHLD.
Bug: https://bugzilla.samba.org/show_bug.cgi?id=14475
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Ralph Boehme <slow@samba.org>
This will lead to a crash in run_event_test.c soon
Bug: https://bugzilla.samba.org/show_bug.cgi?id=14475
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Ralph Boehme <slow@samba.org>
Triggers a different code path in run_event_* and aligns it more what
the ctdb eventd really does.
Bug: https://bugzilla.samba.org/show_bug.cgi?id=14475
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Ralph Boehme <slow@samba.org>
Use F_GETLK to get the lock holder PID, this is more accurate than
reading the file contents: A conflicting process might not have
written its PID yet. Also, F_GETLK easily allows to do a retry if the
lock holder just died.
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
This test has been failing with:
Wait until record is migrated to lmaster node 0
<30|BAD: node 0 is not dmaster
dmaster: 1
rsn: 8
flags: 0x00010000 MIGRATED_WITH_DATA
data(6) = "value1"
*** TEST COMPLETED (RC=1) AT 2021-02-02 06:18:48, CLEANING UP...
This should never happen. If this really fails then the wait should
time out.
The problem is that wait_until() does:
"$@" || _rc=$?
and vacuum_test_key_dmaster() currently calls ctdb_test_fail() on
failure, which causes the shell to exit. Instead, pass a variant to
wait_until() that simply returns the correct status instead of
exiting.
An alternative would be to change the statement in wait_until() to do:
("$@") || _rc=$?
so it captures the exit. However, this is a global change and
requires more thought.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Jeremy Allison <jra@samba.org>
This is helpful if you are in a listening loop with the same receiver
for many sockets doing the same thing.
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
If run with UID wrapper and UID_WRAPPER_ROOT=1 then securing the
socket will fail.
Test mode means that local daemons are in use, so securing the socket
is not important.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Variable res is only used once and ret is re-used many times. Drop
res, use ret, which doesn't need to be initialised. Modernise debug
macro.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
When compiling with GCC 10.x and -O3 optimization, the IP checksum
calculation code generates wrong checksum. The function uint16_checksum
gets inlined during optimization and ip4pkt->tcp data gets wrongly
aliased.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14537
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Wed Oct 21 05:52:28 UTC 2020 on sn-devel-184
Check that the desired state is set on all nodes instead of just the
test node. This ensures that node flags have correctly propagated
across the cluster.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14513
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Oct 6 04:32:06 UTC 2020 on sn-devel-184
update_flags() has already updated the recovery master's canonical
node map, based on the flags from each remote node, and pushed out
these flags to all nodes.
If i == j then the node map has already been updated from this remote
node's flags, so simply drop this case.
Although update_flags() has updated flags for all nodes, it did not
update each node map in remote_nodemaps[] to reflect this. This means
that remote_nodemaps[] may contain inconsistent flags for some nodes
so it should not be used to check consistency when i != j.
Further, a meaningful difference in flags can only really occur if
update_flags() failed. In that case this code is never reached.
These observations combine to imply that this whole loop should be
dropped.
This leaves potential sub-second inconsistencies due to out-of-band
healthy/unhealthy flag changes pushed via CTDB_SRVID_PUSH_NODE_FLAGS.
These updates could be dropped (takeover run asks each node for
available IPs rather than making centralised decisions based on node
flags) but for now they will be fixed in the next iteration of
main_loop().
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14513
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This has already been done in update_flags().
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14513
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: David Disseldorp <ddiss@samba.org>
Reviewed-by: Samuel Cabrero <scabrero@samba.org>
Autobuild-User(master): David Disseldorp <ddiss@samba.org>
Autobuild-Date(master): Thu Sep 24 00:52:42 UTC 2020 on sn-devel-184
The Ceph Manager's service map is useful for tracking the status of
Ceph related services. By registering the CTDB recovery lock holder,
Ceph storage administrators can more easily identify where and when a
CTDB cluster is up and running.
Signed-off-by: David Disseldorp <ddiss@samba.org>
Reviewed-by: Samuel Cabrero <scabrero@samba.org>
This is no longer necessary because the capability new style database
pull is assumed to always be available.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Removes use of the old controls without cleaning up the code. Clean
up can be done later.
After this change the CTDB_CAP_FRAGMENTED_CONTROLS capability is no
longer checked. This capability can be removed along with the
controls.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The new waf detects a duplicate instance of
ctdb_mutex_ceph_rados_helper.7.xml, which is due
to manpages_extra being a pointer to
manpages_misc, therefore each call to build()
added duplicate entries to the manpages_misc
global entry.
Signed-off-by: David Mulder <dmulder@suse.com>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
This makes it consistent with the monitoring code. If the master has
changed then this means the master will always get the message.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Aug 18 06:24:11 UTC 2020 on sn-devel-184
This also updates remote flags so the name is misleading.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
update_local_flags() will be changed to use these nodemaps.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The nodemap has already been fetched from the local node and is
actually passed to this function. Care must be taken to avoid
referencing the "remote" nodemap for the recovery master. It also
isn't useful to do so, since it would be the same nodemap.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The plan here is to use the nodemaps retrieved by get_remote_nodes()
in update_local_flags(). This will improve efficiency, since
get_remote_nodes() fetches flags from nodes in parallel. It also
means that get_remote_nodes() can be used exactly once early on in
main_loop() to retrieve remote nodemaps. Retrieving nodemaps multiple
times is unnecessary and racy - a single monitoring iteration should
not fetch flags multiple times and compare them.
This introduces a temporary behaviour change but it will be of no
consequence when the above changes are made.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This array is indexed by the same index as nodemap, not the PNN.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Also drop error handling in main_loop() that is replaced by this
change.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This will allow an error callback to be added.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Change 1st argument to a rec context, since this will be needed later.
Drop the nodemap argument and access it via rec->nodemap instead.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The memory is allocated off the memory context used by the current
iteration of main loop. It is freed when main loop completes the fix
doesn't require backporting to stable branches. However, it is sloppy
so it is worth fixing.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Don't log an error on failure - let the caller can do this. Apart
from this: fix up coding style and modernise the remaining error
message.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14466
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This isn't used anywhere and can easily be checked via "ctdb pnn" and
"ctdb recmaster" commands.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Ralph Boehme <slow@samba.org>
Reviewed-by: Swen Schillig <swen@linux.ibm.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Autobuild-User(master): Jeremy Allison <jra@samba.org>
Autobuild-Date(master): Mon Aug 3 22:21:04 UTC 2020 on sn-devel-184
If nfsconf exists then use it as last resort to attempt to extract
[nfsd]:threads from /etc/nfs.conf.
Invocation of nfsconf requires "|| true" because this script uses "set
-e". Add a stub that always fails to at least test this much.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14444
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Mon Jul 27 07:06:58 UTC 2020 on sn-devel-184
If nfsconf exists then use it as last resort to attempt to extract
[statd]:name from /etc/nfs.conf.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14444
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Instead of master/slave.
Nearly all of these are simple textual substitutions, which preserve
the case of the original. A couple of minor cleanups were made in the
documentation (such as "LVSMASTER" -> "LVS leader").
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Instead of master/slave.
Nearly all of these are simple textual substitutions, which preserve
the case of the original.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The logic currently in ctdb_ctrl_modflags() will be optimised so that
it no longer matches the pattern for a control function. So, remove
this function and squash its functionality into the only caller.
Although there are some superficial changes, the behaviour is
unchanged.
Flattening the 2 functions produces some seriously weird logic for
setting the new flags, to the point where using ctdb_ctrl_modflags()
for this purpose now looks very strange. The weirdness will be
cleaned up in a subsequent commit.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This is clearer than using the MODFLAGS control directly.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This makes fields such as recmaster and nodemap easily available if
required.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
An unused argument needlessly extends the length of function calls. A
subsequent change will allow rec->nodemap to be used if necessary.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Avoid use of non-portable md5sum by constructing database names using
index. Improve indentation, use more modern commands, code
improvements (shellcheck).
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Wed Jul 22 09:14:35 UTC 2020 on sn-devel-184
Simplify code, use more modern commands, code improvements (shellcheck).
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
"wc -l" on some platforms (e.g. FreeBSD) contains leading spaces, so
strip them.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Select test node with IPs instead of using a fixed node. Remove
unnecessary code, use more modern commands, code
improvements (shellcheck).
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
"wc -l" on some platforms (e.g. FreeBSD) contains leading spaces and
stops "$num from being a number. Create a more portable solution and
put it in a function instead of repeating the logic.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
It would be nice to get rid of "onnode any". There's no use making
tests nondeterministic. If covering different cases matters then they
should be explicitly handled.
In most places "any" is replaced by "$test_node". In some cases,
where $test_node is not set, a fixed node that is already used
elsewhere can be reused.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Separate cluster startup from test initialisation for tests that start
the cluster with customised configuration. In these cases the result
of the cluster startup is actually the point of the test.
Additionally, pubips.013.failover_noop.sh claims to have completed
test initialisation twice, which just seems wrong.
The result is:
* ctdb_test_init() takes one option (-n) to indicate when it should
not configure/start the cluster
* New function ctdb_nodes_start_custom() accepts options for special
cluster configuration, only operates on local daemons and triggers a
test failure rather than a test error on failure.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The only caller calls ctdb_test_error() on failure and nesting this
calls can be confusing. A future change will make this even more
confusing.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Mostly avoidance of quoting warnings.
Silencing warnings about unquoted $CTDB_TEST_CAT_RESULTS_OPTS is
handled by passing '-' to cat when that variable's value is empty.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Apart from the non-constant sourcing of include files.
Mostly avoidance of quoting warnings.
One subtle change is to simply pass "120" to wait_until_ready() to
stop warnings that it expects arguments but none are passed (both
SC2119 and SC2120). There seems no way to indicate to structure
function argument handling so that shellcheck realises arguments are
optional. In later shellcheck versions, disabling SC2120 for a
function also silences complaints about its callers... but not all of
our testing uses "later" shellcheck versions.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
* Use "#!/usr/bin/env bash" for improved portability
* Drop test_info() definition and replace it with a comment
The use of test_info() is pointless.
* Drop call to cluster_is_healthy()
This is a holdover from when the previous test would restart daemons
to get things ready for a test. There was also a bug where going
into recovery during the restart would sometimes cause the cluster
to become unhealthy. If we really need something like this then we
can add it to ctdb_test_init().
* Make order of preamble consistent
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Avoid:
.../UNIT/shellcheck/scripts/local.sh: line 14: type: shellcheck: not found
The "type" command in dash prints the "not found" message to stdout
but the bash version prints to stderr, so redirect stderr too.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The output in a test failure appears to contain no pstree output
because "00\.test\.script,.*" does not match. However, this is just a
guess because the output is not shown.
Showing the output makes it easier to understand test failures.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This will allow local daemons to be used in more contexts, especially
in tests run by Jenkins where the directory names for some targets can
be very long.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The standalone build still includes tests, as does the top-level build
when --enable-selftest is used. The latter is consistent with the use
of --enable-selftest in the rest of the tree.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
In certain circumstance, which aren't obvious, cat(1) can fail when
attempting to write a lot of data. This is due to something (probably
write(2)) returning EAGAIN.
Given that the -v option should only really be used for test
debugging, ignore the failure instead of spending time debugging it.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14446
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
In certain circumstance, which aren't obvious, cat(1) can fail when
attempting to write a lot of data. This is due to something (probably
write(2)) returning EAGAIN.
Given that the -v option should only really be used for test
debugging, ignore the failure instead of spending time debugging it.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14446
Signed-off-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Wed Jul 22 04:10:47 UTC 2020 on sn-devel-184