IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Leave common/conf.[ch] where they are to make this commit
comprehensible.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Guenther Deschner <gd@samba.org>
Reviewed-by: Anoop C S <anoopcs@samba.org>
rpc.statd is single-threaded and runs its HA callout synchronously. If
it is too slow then latency accumulates and rpc.statd's backlog grows.
Running a pair of add-client/del-client events with the current code
averages ~0.030s in my test environment. This mean that 1000 clients
reclaiming locks after failover can easily cause 10s of latency. This
could cause rpc.statd to become unresponsive, resulting in a time out
for an rpcinfo-based health check of the status service.
Split the add-client/del-client events out to a standalone
statd_callout executable, written in C, to be used as the HA callout
for rpc.statd. All other functions move to statd_callout_helper.
Now, running a pair of add-client/del-client events in my test
environment averages only ~0.002s. This seems less likely to cause
latency problems.
The standalone statd_callout executable needs to read a configuration
file, which is generated by statd_callout_helper from the "startup"
event. It also needs access to a list of currently assigned public
IPs.
For backward compatibility, during installation a symlink is created
from $CTDB_BASE/statd-callout to the new statd_callout, which is
installed in the helper directory.
Testing this as part of the eventscript unit tests starts to become
even more of a hack than it used to be. However, the dependency on
stubs and the corresponding setup of fake state makes it hard to move
this elsewhere.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Tue Jun 25 04:24:57 UTC 2024 on atb-devel-224
There is a typo here, since there will be no process called "status".
Instead of fixing it, drop this because rpc.statd isn't the focus of
this monitoring check and when systemd is init rpc.statd isn't
restarted with nfs-ganesha. It stays running, so a confusing stack
trace for rpc.statd is always logged.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
If ganesha.nfsd is gone then a node can't provide an NFS service, so
should be marked unhealthy. A later restart may bring it back to
health.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This one does an rpcinfo check, along with statistics mitigation. It
can be used in combination with the existing 20.nfs_ganesha.check.
The equivalent kernel NFS file only restarts every 10 failures. This
one can be a little more proactive given that false positives are less
likely with the statistics mitigation.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Simplicity is preferred here over absolute correctness. If the
ganesha_stats command exits with an error or times out then no output
is produced so, implicitly, the statistics do not change. Also, the
statistics always change at startup. However, it is likely that the
statistics change when NFS makes progress and do not change when NFS
does not make progress.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
When monitoring an RPC service, the rpcinfo command might time out
even though the service is making progress. In this case, it is just
slow, so counting the timeout as a failure and potentially restarting
the service will not help. The problem is determining if a service is
making progress.
Add a new NFS checks service_stats_command. This command is intended
to run a statistics command. The output is naively compared using
cmp(1). If the output changes then rpcinfo failures are converted to
successes.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Document the new optional argument to specify the namespace to be
associated with RADOS objects in a pool.
Pair-Programmed-With: Anoop C S <anoopcs@samba.org>
Signed-off-by: Günther Deschner <gd@samba.org>
Reviewed-by: Günther Deschner <gd@samba.org>
Reviewed-by: David Disseldorp <ddiss@samba.org>
Autobuild-User(master): Anoop C S <anoopcs@samba.org>
Autobuild-Date(master): Fri Jun 14 07:42:25 UTC 2024 on atb-devel-224
RADOS objects within a pool can be associated to a namespace for
logical separation. librados already provides an API to configure
such a namespace with respect to a context. Make use of it as an
optional argument to the helper binary.
Pair-Programmed-With: Anoop C S <anoopcs@samba.org>
Signed-off-by: Günther Deschner <gd@samba.org>
Reviewed-by: Günther Deschner <gd@samba.org>
Reviewed-by: David Disseldorp <ddiss@samba.org>
While the PID check is worth it in relevant cases, NFS-Ganesha still
might go away after the check. Unfortunately, neither grace command
fails an indicative exit code, so invent one by checking error
messages. This can then be converted to success by the caller.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Autobuild-User(master): Volker Lendecke <vl@samba.org>
Autobuild-Date(master): Thu May 30 12:50:01 UTC 2024 on atb-devel-224
If monitoring has failed because it isn't running, then don't fail
"startipreallocate" or "relaseip" by trying to go into grace.
Don't check this for "takeip". In that case NFS-Ganesha had better be
running.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
No need to grovel around in /proc. ps will happily tell us the
command.
Factor out the actual check into a separate function that can be used
elsewhere.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Path values do not need to have quotes. The current code fails if
there aren't any.
Instead, implement a 2 stage parser using 2 sed commands. See
comments in the code for details.
Regexps are POSIX basic regular expressions, apart from \<WORD\> (used
to ensure WORD is on word boundaries, and the 'i' flag for case
insensitivity. The latter is supported in FreeBSD sed.
This code successfully parses Path values out of the following
monstrosity:
path = "/foo/bar1;a";
Path = /foo/bar2;
Something = false;
Pseudo = "/foo/bar3x" ; Path = "/foo/bar3; y" ; Access_type = RO;
Pseudo = "/foo/bar4x" ; path=/foo/bar4; Access_type = RO;
Pseudo = "/foo/barNONONO" ; not_Path=/foo/barNONONO; Access_type = RO;
Path = /foo/bar5
Pseudo = "/foo/bar6x Path=foo" ; Path=/foo/bar6; Access_type = RO
This is probably the best that can be done within a shell script.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Exports may be contained in an include file rather than the top-level
ganesha.conf.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
An IP address is passed to these actions.
Reported-by: Arnab Tah <atah@ddn.com>
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
This simplifies and removes a bad hack. Also, in my test environment,
it also drops the average time take to run an add-client/del-client
pair from ~0.055s to ~0.030s.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Take advantage of new function find_statd_sm_dir() when clearing the
local system statd state directory, so it uses the correct directory
when running on a non-RH distro.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
For add-client and del-client, statd-callout is called by rpc.statd,
which runs as rpcuser, statd or some other non-root system user. This
means that add-client and del-client can't write in the statd-callout
state directory if it is only writable by root. rpc.statd must be
able to write to its own local system statd state directory, so find
this directory and use it as a reference to set the ownership of
CTDB's statd-callout state directory.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
rpc.statd runs statd-callout as a non-root user, which is currently
hacked around using some sudo logic that fails to work in some
contexts (e.g. in a container).
Use $CTDB_MY_PUBLIC_IPS_CACHE to access the node's currently assigned
public IPs, for add-client/del-client. This avoids connecting to
ctdbd when called from rpc.statd.
Also, use $CTDB_MY_PUBLIC_IPS_CACHE in other places where it makes
sense.
Connections to ctdbd are still made in the "notify" action, but this
is always run as root.
In the test code, set the PNN after public addresses setup so that the
cache of assigned IPs correctly initialised.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
This is called in a couple of places without an argument, so give it a
default.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
This is way more complicated than I would like but, as per the
comment, this is due to complexities in the way public IPs work. The
main consumer will be statd-callout, which will then be able to run as
a non-root user.
Also generate the cache file in test code, whenever the PNN is set.
However, this can cause "ctdb ip" to generate a fake IP layout before
public IPs are setup. So, have the "ctdb ip" stub generate the IP
layout every time it is run to avoid it being stale.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Add new variables statd_callout_state_dir and statd_callout_queue_dir
- the latter is for files queued by add-client/del-client.
Use $statd_callout_queue_dir to avoid a global cd to the queue
directory near the top of the script.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
All of the other uses of ctdb.tdb are in statd-callout.
New variable statd_callout_db makes it easy to change the database
name in future, perhaps even allowing it to be configurable.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Tweak some lines to avoid overflowing 80 columns.
Best viewed with "git show -w".
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
This avoids a compilation error:
../../ctdb/protocol/protocol_util.c: In function ‘ctdb_connection_list_read’:
../../ctdb/protocol/protocol_util.c:787:9: error: ‘ret’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
787 | return ret;
| ^~~
Signed-off-by: Jo Sutton <josutton@catalyst.net.nz>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Using xargs with sort -u to de-duplicate this list was my idea and
causes a couple of things to go wrong. The use of xargs causes
double-quotes to be lost. The resulting $public_ifaces value also
contains newlines. The newlines could be removed with an additional
xargs at the end of the pipeline... but that would add an extra level
of quote stripping.
I have unsuccessfully tried to find an alternative, but still elegant,
command pipeline that de-duplicates the list, while maintaining
quoting.
So, just drop the de-duplication.
This might make interface_ifindex_exists_with_options() slightly less
efficient. However, that function walks the whole list, only
terminating early when a match is found on both interface and options,
so at least it will be correct.
Include an extra testcase.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu Apr 18 09:08:34 UTC 2024 on atb-devel-224
This was an implementation of getline(3), use that instead.
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Martin Schwenke <mschwenke@ddn.com>
This is the only user of common/line.[ch], which can go next.
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Martin Schwenke <mschwenke@ddn.com>
get_all_interfaces() functions gets all names for all public interfaces.
However name is misleading. Thus renamed it to get_public_ifaces() and
moved it under functions.
Signed-off-by: Vinit Agnihotri <vagnihotri@ddn.com>
Reviewed-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
We have the same function in tevent, no need to duplicate code.
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
nl->srvid is uint64_t, as is the srvid parameter of ctdb_daemon_send_message()
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Volker Lendecke <vl@samba.org>
Autobuild-Date(master): Wed Mar 13 08:43:16 UTC 2024 on atb-devel-224
These were generated by 06.nfs.script.
Signed-off-by: Vinit Agnihotri <vagnihotri@ddn.com>
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Autobuild-User(master): Volker Lendecke <vl@samba.org>
Autobuild-Date(master): Wed Mar 6 07:09:06 UTC 2024 on atb-devel-224
Event scripts run the "start_ipreallocate" hook in order to notice
that some ip addresses in the cluster potentially changed.
CTDB_SRVID_START_IPREALLOCATE gives C code a chance to get notified as well
once the event scripts are finished.
Signed-off-by: Vinit Agnihotri <vagnihotri@ddn.com>
Reviewed-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Trigger a "startipreallocate" event, but only if in RUNNING runstate.
"startipreallocate" is intended to allow an NFS server to be put into
grace on all nodes before any locks are released as part of releaseip
during failover. If node A is leader and initiates a takeover run
then node B may be connected/active but may not have completed
startup. In this case, the attempt to put NFS-Ganesha into grace on
node B will fail, startipreallocate will fail, and the node will be
banned.
Signed-off-by: Vinit Agnihotri <vagnihotri@ddn.com>
Reviewed-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
A new event is needed for NFS lock reclaim to ensure all nodes are in
grace before any locks are released. This event must take place before
releaseip.
Signed-off-by: Vinit Agnihotri <vagnihotri@ddn.com>
Reviewed-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
The canonical versions are in protocol utils.
These were unused apart from some stray forward declarations in
tools/ctdb.c and a single call in ctdb_set_runstate(), where
ctdb_runstate_to_string() can be used instead.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
ctdb_eventscript_call_names() will be dropped so the mapping between
events and strings is only maintained in one place.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
rb_test_001.sh runs for 60s even though rb_tree.c is almost never
modified. This generally extends test time by an unreasonable amount
of time.
Add an optional timeout (in seconds) argument to rb_test, defaulting
to 60, and pass 5 from rb_test_001.sh. If anyone ever significantly
updates rb_tree.c then they can run rb_test directly with its default
60s timeout... or for as long as they like.
Reported-by: Volker Lendecke <vl@samba.org>
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Autobuild-User(master): Volker Lendecke <vl@samba.org>
Autobuild-Date(master): Thu Feb 29 13:20:40 UTC 2024 on atb-devel-224
CTDB_CONTROL_TCP_CLIENT_DISCONNECTED and
CTDB_CONTROL_TCP_CLIENT_PASSED were added in commits
c6602b686b and
037e8e449d. They were missing test
support for the packet push/pull. While adding the testing (for
completeness, before adding another new control) I noticed that the
push functionality was absent. This adds that, along with the test
support.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=15580
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Autobuild-User(master): Volker Lendecke <vl@samba.org>
Autobuild-Date(master): Mon Feb 19 10:21:48 UTC 2024 on atb-devel-224
If someone wants to enable the witness service
samba-dcerpcd needs to be started as standalone service
BUG: https://bugzilla.samba.org/show_bug.cgi?id=15577
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Guenther Deschner <gd@samba.org>
We can easily monitor if the service is running at all,
that better than no monitoring at all...
BUG: https://bugzilla.samba.org/show_bug.cgi?id=15577
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Guenther Deschner <gd@samba.org>
"addip"/"delip" are different from "moveip" so they don't need to
call ipreallocate() nor send_ipreallocated_control_to_nodes().
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
This matches the behavior of takeover_send/recv() from
ctdb_takeover_helper.c.
It means we consistently call the ipreallocated event scripts
and also send CTDB_SRVID_IPREALLOCATED after moving ips.
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
That makes the behavior more consistent compared to a takeover run
started from the within ctdbd.
The behavior is the same but ctdb_message_disable_ip_check() used
a legacy code path and the next commits will also touch some
of the moveip logic...
The logic and comments are copied from control_reloadips().
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Event scripts run the "ipreallocated" hook in order to notice that some ip addresses
in the cluster potentially changed.
CTDB_SRVID_IPREALLOCATED gives C code a chance to get notified as well once the event
scripts are finished.
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Modernise debug while touching the code.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=15523
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
Autobuild-User(master): Stefan Metzmacher <metze@samba.org>
Autobuild-Date(master): Fri Dec 15 12:09:21 UTC 2023 on atb-devel-224
The one case that is no longer handled specially is when the
destination address is IPv4 loopback. This may previously have been
used to avoid flooding the logs when testing. However, that seems
unnecessary - if testing with 127.0.0.1 then make it a public address.
Modernise debug while touching the code.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=15523
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
With multichannel a tcp connection is registered first with
a temporary smbd process, that calls CTDB_CONTROL_TCP_CLIENT
first and then passes the tcp connection to the longterm smbd
that already handles all connections belonging to the specific
client_guid. That smbd process calls CTDB_CONTROL_TCP_CLIENT
again, but the 'tickle' information is already there.
When the temporary smbd process exists/disconnects from ctdb
or calls CTDB_CONTROL_TCP_CLIENT_DISCONNECTED, the 'tickle'
information is removed, while the longterm smbd process
still serves the tcp connection.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=15523
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
With multichannel a ctdb connection from smbd may hold multiple
tcp connections, which can be disconnected before the smbd
process terminates the whole ctdb connection, so we a
way to remove undo 'CTDB_CONTROL_TCP_CLIENT' again.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=15523
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
We could also remove the src_addr and dest_addr helper variables
completely, but that would be too much for this commit.
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
/etc/os-release is quite universal. It can be found on most Linux
distros and on FreeBSD.
Attempt to use /etc/os-release to detect Red Hat, SUSE and Debian
based distros. If /etc/os-release exists but distro is unknown then
$ID is printed as the detected distro, which will probably result in
sub-optimal behaviour, but when tracing it will at least indicate that
a new distro needs to be handled.
The only way to handle missing /etc/os-release is to set
CTDB_INIT_STYLE - see ctdb.sysconfig(5) for details.
The event script unit tests are updated to use /etc/os-release so
the new logic is exercised.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Mon Oct 30 09:19:11 UTC 2023 on atb-devel-224
With a file "home_nodes" next to "public_addresses" you can assign
public IPs to specific nodes when using the deterministic allocation
algorithm. Whenever the "home node" is up, the IP address will be
assigned to that node, independent of any other deterministic
calculation. The line
192.168.21.254 2
in the file "home_nodes" assigns the IP address to node 2. Only when
node 2 is not able to host IP addresses, 192.168.21.254 undergoes the
normal deterministic IP allocation algorithm.
Signed-off-by: Volker Lendecke <vl@samba.org>
add home_nodes
Reviewed-by: Ralph Boehme <slow@samba.org>
Autobuild-User(master): Volker Lendecke <vl@samba.org>
Autobuild-Date(master): Tue Oct 10 14:17:19 UTC 2023 on atb-devel-224
This effectively provides simple testing for the threshold-based
approach.
Add new script option CTDB_VSFTPD_MONITOR_THRESHOLDS.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Oct 3 04:53:38 UTC 2023 on atb-devel-224
This can be used for simple failure counting, without restarts, as
used in the 40.vsftpd event script. That case will subsequently be
converted and this functionality can also be used elsewhere.
Add documentation to ctdb-script.options(5) to allow parameters that
use this to be more easily described.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Uninitialised counters are treated as 0, but still produce an error.
The redirect to stderr needs to come before the redirect for a missing
counter file.
The seemingly saner alternative of moving it outside the subshell
works when dash is /bin/sh (e.g. on Debian) but does not work when
bash is /bin/sh (e.g. on Fedora).
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
A subsequent commit will add a new section, which looks out of place
without these new sections.
Best reviewed with "git show -w".
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This will allow Unicode characters to be used, resulting in more
readable source files.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Commit 19c82c19c0 changed the behaviour
of prctl_set_comment() so it now calls setproctitle(3bsd) by default.
In some Linux distributions (e.g. Rocky Linux 8.8), this results in
messages like this spamming the logs:
ctdbd: setproctitle not initialized, please either call setproctitle_init() or link against libbsd-ctor.
Most Samba daemons seem to call setproctitle_init(), so do it here.
In the longer term CTDB should also switch to using lib/util's
process_set_title(), like the rest of Samba, for more flexible process
names.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=15479
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Ralph Boehme <slow@samba.org>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu Sep 21 00:46:50 UTC 2023 on atb-devel-224
Fix a problem where ctdb_killtcp (almost always) fails to capture
packets with --enable-pcap and libpcap ≥ 1.9.1. The problem is due to
a gradual change in libpcap semantics when using
pcap_get_selectable_fd(3PCAP) to get a file descriptor and then using
that file descriptor in non-blocking mode.
pcap_set_immediate_mode(3PCAP) says:
pcap_set_immediate_mode() sets whether immediate mode should be set
on a capture handle when the handle is activated. In immediate
mode, packets are always delivered as soon as they arrive, with no
buffering.
and
On Linux, with previous releases of libpcap, capture devices are
always in immediate mode; however, in 1.5.0 and later, they are, by
default, not in immediate mode, so if pcap_set_immediate_mode() is
available, it should be used.
However, it wasn't until libpcap commit
2ade7676101366983bd4f86bc039ffd25da8c126 (before libpcap 1.9.1) that
it became a requirement to use pcap_set_immediate_mode(), even with a
timeout of 0.
More explanation in this libpcap issue comment:
https://github.com/the-tcpdump-group/libpcap/issues/860#issuecomment-541204548
Do a configure check for pcap_set_immediate_mode() even though it has
existed for 10 years. It is easy enough.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=15451
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Aug 15 10:53:52 UTC 2023 on atb-devel-224
A subsequent commit will insert an additional call before
pcap_activate().
This sequence of calls is taken from the source for pcap_open_live(),
so there should be no change in behaviour.
Given the defaults set by pcap_create_common(), it would be possible
to omit the calls to pcap_set_promisc() and pcap_set_timeout().
However, those defaults don't seem to be well documented, so continue
to explicitly set everything that was set before.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=15451
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Factor out a failure label, which will get more use in subsequent
commits, and only set private_data when success is certain.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=15451
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
>>> CID 1539212: Control flow issues (NO_EFFECT)
>>> This greater-than-or-equal-to-zero comparison of an unsigned value is always true. "p >= 0UL".
216 while (p >= 0 && output[p] == '\n') {
This is a real problem in the unlikely event that the output contains
only newlines.
Fix the issue by using a pointer and add a test to cover this case.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=15438
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Multi-line output currently prints like this:
OUTPUT: aaa
bbb
ccc
This is less beautiful than it could be.
Instead, print multi-line output with no inlining and each line
indented:
OUTPUT:
aaa
bbb
ccc
However, continue to inline single line output:
OUTPUT: foo
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
When event scripts succeed they generally produce no output. However,
when a script succeeds and produces output, such output almost
certainly contains warnings. So, always print script output.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Errors logged when testing statd-callout don't currently go anywhere.
This is because arguments to the hacked version of script_log() are
ignored.
Remove the hack and configure logging to stderr.
This could go in the local statd-callout.sh setup script. However,
make it available for other script tests.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Wed Jul 19 09:57:37 UTC 2023 on atb-devel-224
Logging in statd-callout tests is currently useless. This will
provide a way of seeing errors in those tests.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
usecs is going to be passed as a uint32_t. There is no need to
calculate it as a time_t.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
On some platforms, egrep prints a deprecation warning to stderr:
egrep: warning: egrep is obsolescent; using grep -E
Use grep -E instead.
This is nice and simple, so no use splitting this commit into 2
separate commits for each of tools and test.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Loading tunables is now done in ctdbd, so find another example for the
"setup" event.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
It will be in the git history if we ever decide to use SCSI persistent
reservations as a cluster lock.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This fixes a little thinko in commit
80de84d36e, where this was overlooked.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Mon Jul 10 15:15:06 UTC 2023 on atb-devel-224
DEBUG level logging in ctdb_killtcp is very noisy. The most important
messages when debugging are those for tickle ACKs and TCP RSTs. TCP
RSTs are already logged at INFO level, so promote tickle ACKs to INFO
level too.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
NOTICE level debug messages in common/run_event.c are not logged by
default.
Currently eventd ends up using ERROR, since this is specified as
LOGGING_LOG_LEVEL_DEFAULT. It doesn't inherit the debug level from
ctdbd and only uses NOTICE level when interactive.
Change the real logging default to NOTICE and use it everywhere.
Followups might be:
* Remove the default_log_level argument to logging_conf_init()
* Kick eventd to update debug level when "ctdb setdebug" is used
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Andreas Schneider <asn@samba.org>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Wed Jul 5 12:16:57 UTC 2023 on atb-devel-224
These are all trivial, so handle them in bulk.
* Change code to avoid (approximately sorted by frequency):
SC2004 $/${} is unnecessary on arithmetic variables.
SC2086 Double quote to prevent globbing and word splitting.
SC2162 read without -r will mangle backslashes.
SC2254 Quote expansions in case patterns to match literally rather than as a glob.
SC2154 (warning): <variable> is referenced but not assigned.
SC3037 (warning): In POSIX sh, echo flags are undefined.
SC2016 (info): Expressions don't expand in single quotes, use double quotes for that.
SC2069 (warning): To redirect stdout+stderr, 2>&1 must be last (or use '{ cmd > file; } 2>&1' to clarify).
SC2124 (warning): Assigning an array to a string! Assign as array, or use * instead of @ to concatenate.
SC2166 (warning): Prefer [ p ] && [ q ] as [ p -a q ] is not well defined.
SC2223 (info): This default assignment may cause DoS due to globbing. Quote it.
* Locally disable checks:
SC2034 (warning): <variable> appears unused. Verify use (or export if used externally).
SC2086 (info): Double quote to prevent globbing and word splitting. [once]
SC2120 (warning): <function> references arguments, but none are ever passed.
SC2317 (info): Command appears to be unreachable. Check usage (or ignore if invoked indirectly).
While touching reads for SC2162, switch unused variables to "_"
instead of "_x", which seems to be preferred by ShellCheck.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Andreas Schneider <asn@samba.org>
SC2059 (info): Don't use variables in the printf format string. Use printf '..%s..' "$foo".
Move the format string to the function and just parameterise the share
type.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Andreas Schneider <asn@samba.org>
In ./tests/UNIT/eventscripts/scripts/local.sh line 328:
echo $(ctdb ifaces -X | awk -F'|' 'FNR > 1 {print $2}')
^-- SC2046 (warning): Quote this to prevent word splitting.
^-- SC2005 (style): Useless echo? Instead of 'echo $(cmd)', just use 'cmd'.
Use xargs to get output on 1 line.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Andreas Schneider <asn@samba.org>
This generates ShellCheck warnings:
In ./tests/UNIT/eventscripts/scripts/60.nfs.sh line 412:
if [ -n "$service_check_cmd" ]; then
^----------------^ SC2031 (info): service_check_cmd was modified in a subshell. That change might be lost.
In ./tests/UNIT/eventscripts/scripts/60.nfs.sh line 413:
if eval "$service_check_cmd"; then
^----------------^ SC2031 (info): service_check_cmd was modified in a subshell. That change might be lost.
service_check_cmd will never be set here because it is only set in a
sub-shell in rpc_set_service_failure_response().
This reverts some of commit 713ec21750.
If testcases requiring use of service_check_cmd are later added then
this will need to be redone properly. This would probably start by
renaming this function nfs_iterate_rpc_test().
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Andreas Schneider <asn@samba.org>
This is unused since loading tunables was moved to ctdbd.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Andreas Schneider <asn@samba.org>
SC2086 Double quote to prevent globbing and word splitting.
Apparently ShellCheck is more picky about some of these than it used
to be.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Andreas Schneider <asn@samba.org>
New in ShellCheck 0.9.0:
SC2317 (info): Command appears to be unreachable. Check usage (or ignore if invoked indirectly).
Also:
SC2086 (info): Double quote to prevent globbing and word splitting.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Andreas Schneider <asn@samba.org>
New in ShellCheck 0.9.0:
SC2317 (info): Command appears to be unreachable. Check usage (or ignore if invoked indirectly).
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Andreas Schneider <asn@samba.org>
If this codepath is hit, ctdb aborts with:
ctdb/server/ctdb_recovery_helper.c:2687: Type mismatch: name[struct ban_node_state] expected[struct node_ban_state]")
at ../../lib/talloc/talloc.c:505
Fix this by using the correct type.
Signed-off-by: Christof Schmitt <cs@samba.org>
Reviewed-by: Volker Lendecke <vl@samba.org>
Autobuild-User(master): Volker Lendecke <vl@samba.org>
Autobuild-Date(master): Wed May 3 08:04:09 UTC 2023 on atb-devel-224
Best reviewed with: `git show --word-diff`
Signed-off-by: Andreas Schneider <asn@samba.org>
Reviewed-by: Martin Schwenke <mschwenke@ddn.com>
Autobuild-User(master): Andreas Schneider <asn@cryptomilk.org>
Autobuild-Date(master): Fri Mar 24 07:57:37 UTC 2023 on atb-devel-224
When testparm processes the output of "testparm -v" (which includes
default values) it appears to do global checks (or some other sort of
initialisation logic) for all specified values. This includes a DNS
lookup for the node's hostname, as a side-effect of a libldap
ldap_set_option() call when processing "ldap debug level". If DNS
servers are down then this can induce timeouts, possibly resulting in
monitor timeouts.
Avoid this by using sed to extract configuration values from the
testparm cache file.
This is already shown to work when retrieving share paths, where
testparm is basically used as cat. Update the sed pattern to avoid
matching empty values on the right-hand side of the equals ('=') -
this avoids the default empty path value (and "smb ports" never has an
empty value).
Corresponding test changes:
* 50.samba.monitor.111.sh no longer expects a failure from being
unable to set smb ports, since testparm is no longer used in that
code path.
* smb ports needs to be set in fake smb.conf so it is in the default
output and can be extracted using sed.
* Although testparm --parameter-name is no longer used in
50.samba.script, update the stub implementation (in case it is ever
used again) to extract from fake smb.conf, since "smb ports" is now
set there. The change from $parameter to $param allows a long line
to stay below 80 columns.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Autobuild-User(master): Volker Lendecke <vl@samba.org>
Autobuild-Date(master): Tue Feb 14 08:43:53 UTC 2023 on atb-devel-224
The list changed back to space-separated in commit
93448f4be9, so simplify the code a
little.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Removes:
* waf pydoctor
* waf wafdocs
* make pydoctor
There is no "make wafdocs" it only appears to be in wscript.
The reasoning being is these are broken and appear to not have been run for some time.
Signed-off-by: Rob van der Linde <rob@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
Autobuild-User(master): Jeremy Allison <jra@samba.org>
Autobuild-Date(master): Thu Feb 2 21:15:54 UTC 2023 on atb-devel-224
One of changes is somewhat interesting, it is "tfork waiter proces"
process title in tfork.c. I wonder why no one noticed this before.
There's another similar process title in there, "tfork waiter process(%d)".
Hopefully no one does grep for "proces$" (and there's no reason to).
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Rowland Penny <rpenny@samba.org>
Autobuild-User(master): Jeremy Allison <jra@samba.org>
Autobuild-Date(master): Thu Jan 26 20:46:11 UTC 2023 on atb-devel-224
"basename" is define in libgen.h included from system/dir.h
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
If you happen to talloc_free(run_ctx) before all the tevent_req's
hanging off it, you run into the following:
==495196== Invalid read of size 8
==495196== at 0x10D757: run_proc_state_destructor (run_proc.c:413)
==495196== by 0x488F736: _tc_free_internal (talloc.c:1158)
==495196== by 0x488FBDD: _talloc_free_internal (talloc.c:1248)
==495196== by 0x4890F41: _talloc_free (talloc.c:1792)
==495196== by 0x48538B1: tevent_req_received (tevent_req.c:293)
==495196== by 0x4853429: tevent_req_destructor (tevent_req.c:129)
==495196== by 0x488F736: _tc_free_internal (talloc.c:1158)
==495196== by 0x4890AF6: _tc_free_children_internal (talloc.c:1669)
==495196== by 0x488F967: _tc_free_internal (talloc.c:1184)
==495196== by 0x488FBDD: _talloc_free_internal (talloc.c:1248)
==495196== by 0x4890F41: _talloc_free (talloc.c:1792)
==495196== by 0x10DE62: main (run_proc_test.c:86)
==495196== Address 0x55b77f8 is 152 bytes inside a block of size 160 free'd
==495196== at 0x48399AB: free (vg_replace_malloc.c:538)
==495196== by 0x488FB25: _tc_free_internal (talloc.c:1222)
==495196== by 0x488FBDD: _talloc_free_internal (talloc.c:1248)
==495196== by 0x4890F41: _talloc_free (talloc.c:1792)
==495196== by 0x10D315: run_proc_context_destructor (run_proc.c:329)
==495196== by 0x488F736: _tc_free_internal (talloc.c:1158)
==495196== by 0x488FBDD: _talloc_free_internal (talloc.c:1248)
==495196== by 0x4890F41: _talloc_free (talloc.c:1792)
==495196== by 0x10DE62: main (run_proc_test.c:86)
==495196== Block was alloc'd at
==495196== at 0x483877F: malloc (vg_replace_malloc.c:307)
==495196== by 0x488EAD9: __talloc_with_prefix (talloc.c:783)
==495196== by 0x488EC73: __talloc (talloc.c:825)
==495196== by 0x488F0FC: _talloc_named_const (talloc.c:982)
==495196== by 0x48925B1: _talloc_zero (talloc.c:2421)
==495196== by 0x10C8F2: proc_new (run_proc.c:61)
==495196== by 0x10D4C9: run_proc_send (run_proc.c:381)
==495196== by 0x10DDF6: main (run_proc_test.c:79)
This happens because run_proc_context_destructor() directly does a
talloc_free() on the struct proc_context's and not the enclosing
tevent_req's. run_proc_kill() makes sure that we don't follow
proc->req, but it forgets the "state->proc", which is free()'ed, but
later dereferenced in run_proc_state_destructor().
This is an attempt at a quick fix, I believe we should convert
run_proc_context->plist into an array of tevent_req's, so that we can
properly TALLOC_FREE() according to the "natural" hierarchy and not
just pull an arbitrary thread out of that heap.
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Volker Lendecke <vl@samba.org>
Autobuild-Date(master): Thu Oct 6 15:10:20 UTC 2022 on sn-devel-184
Add simple support for IPoIB via DLT_LINUX_SLL and DLT_LINUX_SLL2.
This seems to work, even when an IB interface is specified.
If this is later found to be insufficient, support for DLT_IPOIB can
be implemented. See https://www.tcpdump.org/linktypes.html for a
starting point.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This uses Linux cooked capture link-layer headers. See:
https://www.tcpdump.org/linktypes/LINKTYPE_LINUX_SLL.htmlhttps://www.tcpdump.org/linktypes/LINKTYPE_LINUX_SLL2.html
The header type needs to be checked to ensure the protocol
type (i.e. ether type, for the protocols we might be interested in) is
meaningful. The size of the header needs to be known so it can be
skipped, allowing the IP header to be found and parsed.
It would be possible to define support for DLT_LINUX_SLL2 if it is
missing. However, if a platform is missing support in the header file
then it is almost certainly missing in the run-time library too.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The current code will almost certainly generate ENOMSG for
non-ethernet packets, even for ethernet packets when the "any"
interface is used.
pcap_datalink(3PCAP) says:
Do NOT assume that the packets for a given capture or ``savefile``
will have any given link-layer header type, such as DLT_EN10MB for
Ethernet. For example, the "any" device on Linux will have a
link-layer header type of DLT_LINUX_SLL or DLT_LINUX_SLL2 even if
all devices on the sys‐ tem at the time the "any" device is opened
have some other data link type, such as DLT_EN10MB for Ethernet.
So, pcap_datalink() must be used.
Detect pcap packet types that are supported (currently only ethernet)
in the open code. There is no use continuing if the read code can't
parse packets. The pattern of using switch statements supports future
addition of other packet types.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>