1
0
mirror of https://github.com/samba-team/samba.git synced 2024-12-24 21:34:56 +03:00
Commit Graph

8796 Commits

Author SHA1 Message Date
Martin Schwenke
9e7d2d9794 ctdb-daemon: Don't mark a node as unhealthy when connecting to it
Remote nodes are already initialised as UNHEALTHY when the node list
is initialised at startup (ctdb_load_nodes_file() calls
convert_node_map_to_list()) and when disconnected (ctdb_node_dead()).
So, drop this code.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Thu Sep  9 02:38:34 UTC 2021 on sn-devel-184
2021-09-09 02:38:34 +00:00
Martin Schwenke
7f697b1938 ctdb-daemon: Ignore flag changes for disconnected nodes
If this node is not connected to a node then we shouldn't know
anything about it.  The state will be pushed later by the recovery
master.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
ae10a8a4b7 ctdb-daemon: Simplify ctdb_control_modflags()
Now that there are separate disable/enable controls used by the ctdb
tool this control can ignore any flag updates for the current nodes.
These only come from the recovery master, which depends on being able
to fetch flags for all nodes.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
916c5ee131 ctdb-recoverd: Mark CTDB_SRVID_SET_NODE_FLAGS obsolete
CTDB_SRVID_SET_NODE_FLAGS is no longer sent so drop monitor_handler()
and replace with srvid_not_implemented().  Mark the SRVID obsolete in
its comment.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
e75256767f ctdb-daemon: Don't bother sending CTDB_SRVID_SET_NODE_FLAGS
The code that handles this message is
ctdb_recoverd.c:monitor_handler().  Although it appears to do
something potentially useful, it only logs the flags changes.  All
changes made are to local structures - there are no actual
side-effects.

It used to trigger a takeover run when the DISABLED flag changed.
This was dropped back in commit
662f06de9f.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
0132bd5a22 ctdb-daemon: Modernise remaining debug macro in this function
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
b6d25d079e ctdb-daemon: Update logging for flag changes
When flags change, promote the message to NOTICE level and switch the
message to the style that is currently generated by
ctdb-recoverd.c:monitor_handler().  This will allow monitor_handler()
to go away in future.

Drop logging when flags do not change.  The recovery master now logs
when it pushes flags for a node, so the lack of a corresponding
"changed flags" message here indicates that no update was required.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
eec44e2862 ctdb-daemon: Correct the condition for logging unchanged flags
Don't trust the old flags from the recovery master.

Surrounding code will change in future comments, including the use of
old-style debug macros, so just make this change clear.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
5914054698 ctdb-tools: Use disable and enable controls in tool
Note that there a change from broadcast to a directed control here.
This is OK because the recovery master will push flags if any nodes
disagree with the canonical flags fetched from a node.

Static function ctdb_ctrl_modflags() is no longer used to drop it.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
6fe6a54e7f ctdb-client: Add client code for disable/enable controls
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
15a6489c28 ctdb_daemon: Implement controls DISABLE_NODE/ENABLE_NODE
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
60c1ef1465 ctdb-daemon: Start as disabled means PERMANENTLY_DISABLED
DISABLED is UNHEALTHY | PERMANENTLY_DISABLED, which is not what is
intended here.  Luckily, it doesn't do any harm because nodes are
marked unhealthy at startup anyway.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
1ac7bc7532 ctdb-daemon: Factor out a function to get node structure from PNN
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
e0a7b5a9e8 ctdb-daemon: Add a helper variable
Simplifies a subsequent change.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
6845dca87e ctdb-protocol: Add marshalling for controls DISABLE_NODE/ENABLE_NODE
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
49dc5d8cd2 ctdb-protocol: Add new controls to disable and enable nodes
These are CTDB_CONTROL_DISABLE_NODE and CTDB_CONTROL_ENABLE_NODE.

For consistency these match CTDB_CONTROL_STOP_NODE and
CTDB_CONTROL_CONTINUE_NODE.  It would be possible to add a single
control but it would need to take data.

The aim is to finally fix races in flag handling.  Previous fixes have
improved the situation but they have only narrowed the race window.
The problem is that the recovery daemon on the master node pushes
flags to nodes the same way that disable and enable are implemented.
So the following sequence is still racy:

1. Node A is disabled
2. Recovery master pulls flags from all nodes including A
3. Node A is enabled
4. Recovery master notices A is disabled and pushes a flag update to
   all nodes including node A
5. Node A is erroneously marked disabled

Node A can not tell if the MODIFY_FLAGS control is from a "ctdb
disable" command or a flag update from the recovery master.

The solution is to use a different mechanism for disable/enable and
for a node to ignore MODIFY_FLAGS controls for their own flags.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
8305f6a7f1 ctdb-recoverd: Push flags for a node if any remote node disagrees
This will usually happen if flags on the node in question change, so
keeping the code simple and pushing to all nodes won't hurt.  When all
nodes come up there might be differences in connected nodes, causing
such "fix ups".  Receiving nodes will ignore no-op pushes.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
620d078714 ctdb-recoverd: Update the local node map before pushing out flags
The resulting code structure looks a little weird.  However, there is
another condition that requires the flags to be pushed that will be
inserted before the continue statement in a subsequent commit..

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
82a075d4d7 ctdb-recoverd: Add a helper variable
Improves readability and simplifies subsequent changes.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-09-09 01:46:49 +00:00
Martin Schwenke
b724c1e6a6 utils: Avoid pylint warning
pylint warns:

  Use lazy % formatting in logging functions

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: David Disseldorp <ddiss@samba.org>
Reviewed-by: Jose A. Rivera <jarrpa@samba.org>

Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Tue Jul 20 05:29:18 UTC 2021 on sn-devel-184
2021-07-20 05:29:18 +00:00
Martin Schwenke
319e27343d utils: Reformat lines that are longer than 80 columns
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: David Disseldorp <ddiss@samba.org>
Reviewed-by: Jose A. Rivera <jarrpa@samba.org>
2021-07-20 04:43:37 +00:00
Martin Schwenke
98c7a38b71 utils: Tweak exception handling to stop flake8 complaining
Don't bother with "as e" to avoid warning about unused variable.
Don't use bare "except:" (though pylint still complains about this
version).

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: David Disseldorp <ddiss@samba.org>
Reviewed-by: Jose A. Rivera <jarrpa@samba.org>
2021-07-20 04:43:37 +00:00
Martin Schwenke
12d3e215a6 utils: Simplify log level logic, drop global variable
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: David Disseldorp <ddiss@samba.org>
Reviewed-by: Jose A. Rivera <jarrpa@samba.org>
2021-07-20 04:43:37 +00:00
Martin Schwenke
e323d16a9d utils: Inline defaults and help strings
Removes an unnecessary level of indirection: defaults and help strings
are now where they are expected.  Also removes some global variables.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: David Disseldorp <ddiss@samba.org>
Reviewed-by: Jose A. Rivera <jarrpa@samba.org>
2021-07-20 04:43:37 +00:00
Martin Schwenke
af5aecced1 utils: Move argument processing into function and call from main()
Removes the need for the global variables currently associated with
this processing.  Also removes unnecessarily double-handling the
defaults, which are assigned to the global variables and set via
add_argument().

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: David Disseldorp <ddiss@samba.org>
Reviewed-by: Jose A. Rivera <jarrpa@samba.org>
2021-07-20 04:43:37 +00:00
Martin Schwenke
e66637a079 utils: Reorder imports so that standard imports are first
Avoids numerous pylint warnings.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: David Disseldorp <ddiss@samba.org>
Reviewed-by: Jose A. Rivera <jarrpa@samba.org>
2021-07-20 04:43:37 +00:00
Martin Schwenke
bd0b2bb6ee utils: Clean up ctdb_etcd_lock using autopep8
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: David Disseldorp <ddiss@samba.org>
Reviewed-by: Jose A. Rivera <jarrpa@samba.org>
2021-07-20 04:43:37 +00:00
Martin Schwenke
939aed0498 utils: Use Python 3
Due to the number of flake8 and pylint warnings it is unclear if the
source has Python 3 incompatibilities.  These will be cleaned up in
subsequent commits.

Signed-off-by: "L.P.H. van Belle" <belle@bazuin.nl>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: David Disseldorp <ddiss@samba.org>
Reviewed-by: Jose A. Rivera <jarrpa@samba.org>
2021-07-20 04:43:37 +00:00
Martin Schwenke
466aa8b6f5 ctdb-scripts: Ignore ShellCheck SC3013 for test -nt
In ShellCheck 0.7.2, POSIX compatibility warnings got their own SC3xxx
error codes, so now both the old and new codes need to be ignored.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Jun 25 10:06:48 UTC 2021 on sn-devel-184
2021-06-25 10:06:48 +00:00
Martin Schwenke
fc0da6b0f8 ctdb-tests: Force stub version of service in eventscript tests
Fedora 34 now has a shell function for the which command, which causes
these uses of which to return the enclosing function definition rather
than the executable file as expected.

The event script unit tests always expect the stub service command to
be used, so the conditional in these functions is unnecessary.
$CTDB_HELPER_BINDIR already conveniently points to the stub directory,
so use it here.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
2021-06-25 09:16:31 +00:00
Martin Schwenke
23b2fab2c8 ctdb-common: Drop unused include of mkdir_p.h
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-06-25 09:16:31 +00:00
Martin Schwenke
e40d452722 ctdb-daemon: Close server socket when switching to client
The socket is set close-on-exec but that doesn't help for processes
that do not exec().  This should be done for all child processes.

This has been seen in testing where "ctdb shutdown" waits for the
socket to close before succeeding.  It appears that lingering
vacuuming processes have not closed the socket when becoming clients
so they cause "ctdb shutdown" to hang even though the main daemon
process has exited.  The cause of the lingering vacuuming processes
has been previously examined but still isn't understood.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-06-25 09:16:31 +00:00
Martin Schwenke
f7cf8132b0 ctdb-tests: Add debug_locks.sh tests for mutexes
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri May 28 07:34:23 UTC 2021 on sn-devel-184
2021-05-28 07:34:23 +00:00
Amitay Isaacs
99c3b49260 ctdb-scripts: Add lock debugging for tdb mutex locks
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Martin Schwenke <martin@meltin.net>
2021-05-28 06:46:29 +00:00
Amitay Isaacs
cb55b68b3e ctdb-utils: Add tdb_mutex_check utility
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2021-05-28 06:46:29 +00:00
Martin Schwenke
dd5972b699 ctdb-scripts: Simplify logic in debug_via_proc_locks()
The path of the TDB is known, so calculate the file ID (device number
+ inode number) from it and use this to directly filter /proc/locks to
find processes holding locks.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-05-28 06:46:29 +00:00
Martin Schwenke
e62ae53ef6 ctdb-scripts: Update debug_locks.sh to handle arguments
Don't use the  arguments yet.  They will be used in a simplified
version of the code.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-05-28 06:46:29 +00:00
Martin Schwenke
1dfff9751b ctdb-scripts: Move current lock debugging to a function
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-05-28 06:46:29 +00:00
Amitay Isaacs
d07875330a ctdb-locking: Pass additional arguments to debug locks script
1. PID of lock helper waiting for lock
2. Scope of lock: "record" or "db"
3. Path to database that lock helper is trying to lock
4. Whether the database uses mutexes: "mutex" or "fcntl"

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2021-05-28 06:46:29 +00:00
Martin Schwenke
2c7dbb043f ctdb-tests: Add debug_locks.sh testing
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-05-28 06:46:29 +00:00
Martin Schwenke
a3e7fd9c61 ctdb-tests: Fix nonsense arguments to ps stub
These were fine (though still lazy) when these tests were the only
user of this stub.  However, the ps stub is about to be enhanced, so
fix these uses of it to represent the intended usage.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-05-28 06:46:29 +00:00
Martin Schwenke
ffb56c9143 ctdb-scripts: Avoid direct /proc access
The main reason for this is to facilitate testing.

Avoid some /proc accesses entirely by using ps(1) (which can be
replaced by a stub when testing) because this script might as well be
more portable in case anyone wants to add lock debugging for a
non-Linux platform.  While the "state" format specification isn't
POSIX-compliant, it works on both Linux and FreeBSD so it is a
reasonable improvement.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-05-28 06:46:29 +00:00
Martin Schwenke
55d4b3438f ctdb-scripts: Factor out function dump_stacks()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2021-05-28 06:46:29 +00:00
Volker Lendecke
adef87a621 ctdb: Fix a crash in run_proc_signal_handler()
If a script times out the caller can talloc_free() the script_list
output of run_event_recv, which talloc_free's proc->output from
run_proc.c as well. If the script generates further output after the
timeout and then exits after a while, the SIGCHLD handler in the
eventd tries to read into proc->output, which was already free'ed.

Fix this by not doing just a talloc_steal but a talloc_move. This way
proc_read_handler() called from run_proc_signal_handler() does not try
to realloc the stale reference to proc->output but gets a NULL
reference.

I don't really know how to do a knownfail in ctdb, so this commit
actually activates catching the signal by waiting long enough for
22.bar to exit and generate the SIGCHLD.

Bug: https://bugzilla.samba.org/show_bug.cgi?id=14475
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Ralph Boehme <slow@samba.org>
2021-05-18 10:42:32 +00:00
Volker Lendecke
f320d1a7ab ctdb: Introduce output before and after the 10-second timeout
This will lead to a crash in run_event_test.c soon

Bug: https://bugzilla.samba.org/show_bug.cgi?id=14475
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Ralph Boehme <slow@samba.org>
2021-05-18 10:42:32 +00:00
Volker Lendecke
19290f10c7 ctdb: Wait for SIGCHLD if script timed out
Bug: https://bugzilla.samba.org/show_bug.cgi?id=14475
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Ralph Boehme <slow@samba.org>
2021-05-18 10:42:32 +00:00
Volker Lendecke
07ab9b7a71 ctdb: Introduce a helper variable in run_event_test.c
Bug: https://bugzilla.samba.org/show_bug.cgi?id=14475
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Ralph Boehme <slow@samba.org>
2021-05-18 10:42:32 +00:00
Volker Lendecke
9398d4b912 ctdb: Call run_event_recv() in a callback function
Triggers a different code path in run_event_* and aligns it more what
the ctdb eventd really does.

Bug: https://bugzilla.samba.org/show_bug.cgi?id=14475
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Ralph Boehme <slow@samba.org>
2021-05-18 10:42:32 +00:00
Volker Lendecke
f188c9d732 ctdb: fix typos
Bug: https://bugzilla.samba.org/show_bug.cgi?id=14475
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Ralph Boehme <slow@samba.org>
2021-05-18 10:42:32 +00:00
Volker Lendecke
cf43f331be lib: Make pidfile_path_create() return the existing PID on conflict
Use F_GETLK to get the lock holder PID, this is more accurate than
reading the file contents: A conflicting process might not have
written its PID yet. Also, F_GETLK easily allows to do a retry if the
lock holder just died.

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
2021-03-16 17:09:32 +00:00