IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
CTDB_STATD_CALLOUT_SHARED_STORAGE is a new configuration variable
indicating where statd-callout should store its NFS client locking
data. See the update to ctdb-script.options(5) for details.
This adds back functionality that was removed in commit
12cc826231. The commit message doesn't
say why this was changed but it was most likely due to a cluster
filesystem hanging at inopportune times. Hence, this is re-added as a
non-default option. There are 2 justifications for re-adding it:
* The existing method (persistent_db) relies on dequeuing data during
the monitor event, which loses any queued data on node crash.
* NFS-Ganesha writes NFSv4 client locking data to a cluster
filesystem, by default. Something similar might as well exist for
NFSv3.
Note that this could create the files for sm-notify in add-client.
However, this would require an alternate implementation of
send_notifies() (or a change to the implementation for persistent_db
too). It seems better to leave add-client lightweight and do the work
in notify, since add-client is a more frequent operation.
Unconditionally create the state directory on startup. This is
currently implicitly created for persistent_db when the queue
directory is created. However, it isn't created anywhere else for
shared_dir, so do it in a common place.
In test mode, the shared storage location has a prefix added so files
are created within the test environment.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
SM_NOTIFYs are sent before client records are deleted. Theoretically,
this means new records resulting from lock reclaim can be deleted.
This doesn't actually happen at the moment because any new "records"
resulting from lock reclaim are entered into the queue directory and
only dequeued to the database during a later monitor event. Since a
monitor event can't collide with an ipreallocated event, no records
can be dequeeued into the database during the ipreallocated event, so
they can't be deleted by delete_records().
However, a subsequent commit will add direct writing of records into a
shared cluster filesystem directory. This means that add-client
events will cause records to be added directly to that directory so,
without a fix, the race will be able to occur.
So, delete records before sending SM_NOTIFYs. In theory, the script
could be killed before all SM_NOTIFYs are successfully sent, resulting
in loss of locks. However, given the overall lack of error checking,
there are other, more likely problems.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This captures all of the persistent database (currently ctdb.tdb)
implementation-specific details in functions. Alternate
implementations can now be easily added.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Drop the complexity associated with using awk to escape dots in IPv4
addresses to protect them from sed, and generate a grep -F filter
instead.
For listing, the pipeline is now longer, but the steps are now
clearer:
1. List DB records
2. Extract keys
3. Keep only keys machine hosted public IPs
4. Parse out server IP and client IP
5. Sort
Performance here isn't critical, so having clearer code is preferable.
Use temporary files to avoid command-line length limits.
Also, drop the cd to the queue directory during update.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Commits caad5dc38d and
f022df1d40 commented out these lines
back in 2007.
2 things are clear from the commit messages:
* These setting should not be required in the real world - they are:
mainly useful for avoiding ack-storms when doing very rapid
failover/failback during testing
* If they are needed, they are not specific to
statd_callout/statd_callout_helper
Let's remove these comments to avoid confusing people.
Reported-by: Ulrich Sibiller <ulrich.sibiller@eviden.com>
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The top comment in the file is no longer true.
The comment about notifications doesn't really apply anymore since
upstream sm-notify is used and it does "the right thing".
shfmt wants to remove a space before a semicolon, so do that too.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This was passed to CTDB's old smnotify. This has been replaced by use
of nfs-utils' sm-notify, which doesn't need this.
In test, a fake NFS_HOSTNAME is still needed. Real sm-notify will get
it from a reverse host lookup of the IP address.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
CTDB's smnotify does not support IPv6 and is difficult to maintain.
So, create directories of files and pass them to NFS util's sm-notify.
There is an implied change here, because NFS utils sm-notify stopped
sending IP addresses as mon_name back in 2010:
http://git.linux-nfs.org/?p=steved/nfs-utils.git;a=commitdiff;h=900df0e7c0b9006d72d8459b30dc2cd69ce495a5
This will change advice given in the wiki to use a hostname for the
cluster with round-robin DNS, since this is what is best supported.
Another behavioural change is that sm-notify only sends "up"
notifications with an odd state.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Similar to the recent changes to the ctdb server code, add the ability
to load the nodes from a subprocess stdout.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
rpc.statd is single-threaded and runs its HA callout synchronously. If
it is too slow then latency accumulates and rpc.statd's backlog grows.
Running a pair of add-client/del-client events with the current code
averages ~0.030s in my test environment. This mean that 1000 clients
reclaiming locks after failover can easily cause 10s of latency. This
could cause rpc.statd to become unresponsive, resulting in a time out
for an rpcinfo-based health check of the status service.
Split the add-client/del-client events out to a standalone
statd_callout executable, written in C, to be used as the HA callout
for rpc.statd. All other functions move to statd_callout_helper.
Now, running a pair of add-client/del-client events in my test
environment averages only ~0.002s. This seems less likely to cause
latency problems.
The standalone statd_callout executable needs to read a configuration
file, which is generated by statd_callout_helper from the "startup"
event. It also needs access to a list of currently assigned public
IPs.
For backward compatibility, during installation a symlink is created
from $CTDB_BASE/statd-callout to the new statd_callout, which is
installed in the helper directory.
Testing this as part of the eventscript unit tests starts to become
even more of a hack than it used to be. However, the dependency on
stubs and the corresponding setup of fake state makes it hard to move
this elsewhere.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Tue Jun 25 04:24:57 UTC 2024 on atb-devel-224
The canonical versions are in protocol utils.
These were unused apart from some stray forward declarations in
tools/ctdb.c and a single call in ctdb_set_runstate(), where
ctdb_runstate_to_string() can be used instead.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
"addip"/"delip" are different from "moveip" so they don't need to
call ipreallocate() nor send_ipreallocated_control_to_nodes().
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
This matches the behavior of takeover_send/recv() from
ctdb_takeover_helper.c.
It means we consistently call the ipreallocated event scripts
and also send CTDB_SRVID_IPREALLOCATED after moving ips.
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
That makes the behavior more consistent compared to a takeover run
started from the within ctdbd.
The behavior is the same but ctdb_message_disable_ip_check() used
a legacy code path and the next commits will also touch some
of the moveip logic...
The logic and comments are copied from control_reloadips().
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
On some platforms, egrep prints a deprecation warning to stderr:
egrep: warning: egrep is obsolescent; using grep -E
Use grep -E instead.
This is nice and simple, so no use splitting this commit into 2
separate commits for each of tools and test.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
DEBUG level logging in ctdb_killtcp is very noisy. The most important
messages when debugging are those for tickle ACKs and TCP RSTs. TCP
RSTs are already logged at INFO level, so promote tickle ACKs to INFO
level too.
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
New in ShellCheck 0.9.0:
SC2317 (info): Command appears to be unreachable. Check usage (or ignore if invoked indirectly).
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Andreas Schneider <asn@samba.org>
In particular, knowing the reason fetching the packet fails can help
with debugging unsupported protocols in the pcap code.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
For example:
In /home/martins/samba/samba/ctdb/tools/onnode line 304:
[ "$nodes" != "${nodes%[ ${nl}]*}" ] && verbose=true
^---^ SC2295 (info): Expansions inside ${..} need to be quoted separately, otherwise they match as patterns.
Did you mean:
[ "$nodes" != "${nodes%[ "${nl}"]*}" ] && verbose=true
For more information:
https://www.shellcheck.net/wiki/SC2295 -- Expansions inside ${..} need to b...
Who knew? Thanks ShellCheck!
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
When a node is starting, CTDB reports remote nodes as unhealthy by
default. This can be misleading.
To hide this, report an "UNKNOWN" pseudo state when a remote node is
not disconnected and the runstate is less than or equal to
"FIRST_RECOVERY".
Signed-off-by: Vinit Agnihotri <vagnihotri@ddn.com>
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The changes are made to replace the deprecated network commands
(ifconfig,netstat) with the new commands
(ip addr,ss) respectively
Signed-off-by: Archana Chidirala <archana.chidirala.chidirala@ibm.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Volker Lendecke <vl@samba.org>
Autobuild-Date(master): Tue Mar 8 12:30:53 UTC 2022 on sn-devel-184
The following command names are changed:
recmaster -> leader
setrecmasterrole -> setleaderrole
Command output changed for the following commands:
status
getcapabilities
Documentation and tests are updated to reflect these changes.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This seems pointless but it localises a subsequent change and also
starts a terminology change in the tool code.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Note that there a change from broadcast to a directed control here.
This is OK because the recovery master will push flags if any nodes
disagree with the canonical flags fetched from a node.
Static function ctdb_ctrl_modflags() is no longer used to drop it.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14784
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This isn't used anywhere and can easily be checked via "ctdb pnn" and
"ctdb recmaster" commands.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Ralph Boehme <slow@samba.org>
Reviewed-by: Swen Schillig <swen@linux.ibm.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Autobuild-User(master): Jeremy Allison <jra@samba.org>
Autobuild-Date(master): Mon Aug 3 22:21:04 UTC 2020 on sn-devel-184
Instead of master/slave.
Nearly all of these are simple textual substitutions, which preserve
the case of the original. A couple of minor cleanups were made in the
documentation (such as "LVSMASTER" -> "LVS leader").
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Instead of master/slave.
Nearly all of these are simple textual substitutions, which preserve
the case of the original.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Drop some unnecessary whitespace and re-indent push().
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Options can be set in ONNODE_SSH, so this variable is unnecessary.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
If ctdbd hangs when shutting down in post-test clean-up then killing
the process group can kill the test. When in test mode, create a
process group but only in the top-level ctdb tool - the natgw and lvs
helpers also run the ctdb tool.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This covers the following:
SC2166: Prefer [ p ] && [ q ] as [ p -a q ] is not well defined.
SC2166: Prefer [ p ] || [ q ] as [ p -o q ] is not well defined.
POSIX agrees that -a and -o should not be used:
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/test.html
Fixing these doesn't cause much churn.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Commit 90de5e0594 remove the processing
for this option but forgot to remove it from the getopts command.
Versions of ShellCheck >= 0.4.7 warn on this, so it is worth fixing.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14086
RN: Fix onnode test failure with ShellCheck >= 0.4.7
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Regression introduced by commit
2558f96da1. count should be signed
because list_of_connected_nodes() returns -1 on failure. Variable i
is used in both signed and unsigned contexts, so add new signed
variable j for use in signed context.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This has been broken for 10 years since commit
9616959bd6, which introduced the
separate filtering. This commit was missing a redirect of the output
of stderr_filter() to stderr.
Since nobody depends on the separate filtering (i.e. nobody reported a
bug), just return to combining stdout and stderr, and filtering them
together.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This filter no longer does anything useful in this context. By
default it adds a pipeline with trailing cat process. In many
contexts, stdout of the process being run is still open so the cat
process will stay around and will stop onnode from exiting.
The filters should all go away because they are simply an example of
code that is trying to be too clever while causing unfortunate corner
cases.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>