IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Also add and update tests for statd stack dumps. Update the existing
60.ganesha statd test to do more iterations. Duplicate the result as
a new test for 60.nfs.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
In the process, fix a bug where an extra trace would be printed.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Some implementations may not understand RC3164 format messages on the
UDP socket, so add support for RFC5424 message format.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This has most of the advantages of the old logd with none of the
complexity of the extra process. There are several good syslog
implementations that can listen on the UDP port.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Remove --logfile and --syslog daemon options and replace with
--logging.
Modularise and clean up logging initialisation code. The
initialisation API includes an app_name argument that is currently
unused - this will be used in extensions to the syslog backend.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
As far as we know, nobody uses this and it just complicates the
logging subsystem.
Remove all ringbuffer code and documentation. Update the local
daemons startup code correspondingly.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Volker Lendecke <vl@samba.org>
The major and minor device numbers are hexadecimal not decimal.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu Sep 25 07:19:59 CEST 2014 on sn-devel-104
Variables that are not set but exported, may return an empty string
for getenv(). Tested on freebsd.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Wed Sep 17 09:55:47 CEST 2014 on sn-devel-104
The current check is incorrect in 2 ways:
* Commit be71a84565 contained a thinko
that stops virtio_net interfaces from simply being marked up
* virtio_net interfaces can actually be down
virtio_net has supported ethtool since Linux 2.6.29, so just remove
the special case. This means that testing CTDB on very old virtual
machines is not supported.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Thu Jul 31 13:08:47 CEST 2014 on sn-devel-104
This was used to limit damage in the "recovered" event.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Jul 29 10:03:16 CEST 2014 on sn-devel-104
This event was introduced to handle misconfiguration. For example,
where all nodes where configured as NAT gateway slaves.
However, this event can fail when there are performance issues and
capabilities can't be retrieved from a remote node. The problem is
most likely with the remote node, so marking the local node UNHEALTHY
is probably a mistake.
Having a NAT gateway master node only matters in "ipreallocated", so
leave it to do the checking. Given that a node will run
"ipreallocated" as part of the first recovery, this should cause
misconfigurations to be detected nice and early.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Need to be able to recognise a RHEL system. Still use "system" to
start and stop service, since that still works and yields the smallest
change.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Debugging can still be running when a monitor event times out and
scriptstatus output changes.
When debugging a hung script to a log file, write to a temporary file
and move the temporary file over the log file when done. The test
then waits for the log file to appear.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Thu Jul 3 08:19:23 CEST 2014 on sn-devel-104
There shouldn't be an early exit for the "init" event. Just make the
"ctdb scriptstatus" call conditional.
While here, move the comment about only running a single instance to
be near locking code. The comment is more useful there.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Check that the $GANRECDIR symlink points to the location specified by
$CTDB_GANESHA_REC_SUBDIR and replace it if incorrect. This handles
reconfiguration and filesystem changes.
While touching this code:
* Create the $GANRECDIR link as a separate step if it doesn't exist.
This means there is only 1 place where the link is created.
* Change some variables names to the style used for local function
variables.
* Remove some "ln failed" error messages. ln failures will be logged
anyway.
* Add -v to various mkdir/rm/ln commands so that these actions are
logged when they actually do something.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Jun 20 05:40:16 CEST 2014 on sn-devel-104
Backup and restore of the cluster filesystem can upset the operation
of 60.ganesha by changing the contents of this subdirectory.
Allow this subdirectory to be configured to a subdirectory that is
ignored by backup and restore processes.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Wed Jun 11 09:29:22 CEST 2014 on sn-devel-104
The range
CTDB_PER_IP_ROUTING_TABLE_ID_LOW..CTDB_PER_IP_ROUTING_TABLE_ID_HIGH
should not include 253-255. Otherwise policy routing may overwrite
the default system routing tables.
Add some corresponding tests.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This is the loop variable. It can't be empty, especially given the
way the list is built. This must have survived from an earlier
version of the script.
Given that there are whitespace changes associated with the above,
clean-up the "virtio_net" avoidance check so that it reads less like
line-noise.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Commit 4ee4925d41 forgot about
CTDB_NATGW_SLAVE_ONLY so it introduces an incorrect failure when this
is set, and CTDB_NATGW_PUBLIC_IFACE or CTDB_NATGW_PUBLIC_IP is unset.
Relax the sanity check to see if CTDB_NATGW_SLAVE_ONLY is set.
Update the documentation to explicitly state that
CTDB_NATGW_PUBLIC_IFACE and CTDB_NATGW_PUBLIC_IP are optional and
unused if CTDB_NATGW_SLAVE_ONLY is set. It would be possible to
insist that CTDB_NATGW_PUBLIC_IFACE and CTDB_NATGW_PUBLIC_IFACE should
be unset in that case. However, it is more reasonable to allow
consistent configuration across nodes except with some nodes
configured slave-only.
Add tests, update infrastructure and fix a thinko in the stub's
"natgwlist" implementation.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Mon Apr 14 06:06:49 CEST 2014 on sn-devel-104
Extend CTDB_NATGW_STATIC_ROUTES so that each network can have an
optional gateway that overrides CTDB_NATGW_DEFAULT_GATEWAY.
Signed-off-by: Martin Schwenke <martin@meltin.net>
This has been implied since the command to add the route has had
errors redirected to /dev/null. If infrastucture (e.g. ADS, DNS) is
on the same network as CTDB_NATGW_PUBLIC_IP then no route is
necessary.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Although the dots in $CTDB_NATGW_PUBLIC_IP could probably only help
match an invalid public IP address, this is only executed once so do
as exact a check as possible.
Use CTDB_BASE instead of hardcoding /etc/ctdb.
Make the error message less redundant.
Signed-off-by: Martin Schwenke <martin@meltin.net>
delete_all() really needed renaming for clarity. While doing this,
might as well rename some of the others that don't start with
"natgw_".
Signed-off-by: Martin Schwenke <martin@meltin.net>
NAT gateway really can't operate unless most of the configuration
variables are set.
A check in delete_all() can be removed - strange that this isn't also
done in the add case.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Commit 176ae6c704 caused these functions
to exit on failure. This is incorrect and broke NAT gateway.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
"statd-callout notify" currently complains until an add-client or
del-client is done.
Given that we might use ctdb.tdb for something else in the future it
makes sense attach to it in the "startup" event. This could be done
in the background but it should be so lightweight that a timeout will
indicate serious problems.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This feature was added quite a while ago but was not enabled by
default. It is a useful feature so enable it to dump stack traces of
up to 5 stuck processes by default.
This can be disabled by setting:
CTDB_NFS_DUMP_STUCK_THREADS=0
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Feb 25 04:06:45 CET 2014 on sn-devel-104
This comment was true when 50.samba was spaghetti because it tried to
automatically manage both smbd (and nmbd) and winbind. It isn't true
anymore.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Wed Feb 19 04:07:12 CET 2014 on sn-devel-104
* Add stack dumps for "interesting" processes that sometimes get
stuck, so try to print stack traces for them if they appear in the
pstree output.
* Add new configuration variables CTDB_DEBUG_HUNG_SCRIPT_LOGFILE and
CTDB_DEBUG_HUNG_SCRIPT_STACKPAT. These are primarily for testing
but the latter may be useful for live debugging.
* Load CTDB configuration so that above configuration variables can be
set/changed without restarting ctdbd.
Add a test that tries to ensure that all of this is working.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
If a primary IP address is being deleted from an interface, the
secondaries are remembered and added back after the primary is
deleted. This is done under a lock shared by the add/del script code.
It is necessary because, by default, Linux deletes secondaries when
the corresponding primary is deleted.
There is a race here between ctdbd and the scripts, since ctdbd
doesn't know about the lock. If ctdbd receives a release IP control
and the IP address is not on an interface then it is regarded as a
"Redundant release of IP" so no "releaseip" event is generated. This
can occur if the IP address in question is a secondary that has been
temporarily dropped. It is more likely if the number of secondaries
is large.
Since Linux 2.6.12 (i.e. 2005) Linux has supported a
promote_secondaries option on interfaces. This option is currently
undocumented but that will change in Linux 3.14. With
promote_secondaries enabled the kernel will not drop secondaries but
will promote a corresponding secondary instead. The kernel does all
necessary locking.
Use promote_secondaries to simplify the code, avoid re-adding
secondaries, avoid re-adding routes and provide improved performance.
This could be done conditionally, with a fallback to legacy
secondary-re-adding code, but no supported Linux distribution is
running a pre-2.6.12 kernel so this is unnecessary.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This adds new files for Ganesha's recovery. myreleaseip_* are used by
the recovery thread on the node where IP is released. The releaseip_*
and tekeip_* files are used by recovery thread where IP is taken over.
Signed-off-by: Srikrishan Malik <srimalik@in.ibm.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Signed-off-by: Srikrishan Malik <srimalik@in.ibm.com>
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu Jan 30 11:18:19 CET 2014 on sn-devel-104
Services can be flagged for reconfigure when they release IPs at
shutdown. The flag is never removed and the service is prematurely
reconfigured during the first "ipreallocated" event, before any IPs
are hosted and before the "startup" event has actually started the
services.
$CTDB_VARDIR/state directly contained the service state subdirectories
and is already removed in the "init" event. Just push the service
state subdirectories down a level and put everything else in a
subdirectory.
This way all the eventscript state gets cleaned up every time CTDB
starts up.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Jan 17 09:58:26 CET 2014 on sn-devel-104
Currently the lock is held until the corresponding eventscript
completes, since the process still exists. If the regular part of an
eventscript hangs then the lock might unnecessarily be held for a long
time. The pathological case is when a monitor event gets stuck in
D-wait state and the script times out but can't be killed so the lock
is still held. This can cause an unwanted monitor replay.
Change this so that the lock is released immediately after the
reconfiguration is complete.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
"monitor" events can be cancelled. If a reconfigure action does a
service restart then the "monitor" event can be cancelled at the
inconvenient moment after the service is stopped. In this case the
service stays down and the node may become unhealthy (depending on
whether there are any repair actions in the monitor event).
A long time ago we did service reconfiguration in "monitor" events
following failovers. Service reconfiguration was then moved to the
"ipreallocated" event. However, reconfiguration in "monitor" events
has been kept as a last resort in case an "ipreallocate" event does
not occur. The only important case that this covers is "ctdb
deleteip", where "releaseip" events are generated without a
corresponding "ipreallocated". Therefore, IPs can be deleted without
running the required service reconfiguration.
The supported way of removing IP addresses is now via "ctdb
reloadips", which always causes a takeover run with a corresponding
"ipreallocate" event.
This means that service reconfiguration in "monitor" events is no
longer required and should be removed because it is unsafe.
Also update the associated tests. Make the first confirm that the
monitor event no longer does reconfiguration. Change the others to
test that monitor status is correctly replayed when something else is
doing a reconfigure and currently holds the reconfigure lock.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Dec 17 06:32:35 CET 2013 on sn-devel-104
If these configuration variables are not defined, then there should
a default fallback. This is a workaround till CTDB compile time
configuration can be accessed at runtime.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
If NFS RPC checks do restart Ganesha, then it's possible that share
check can fail prematurely.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
If $statd_state is empty then the loop will run once and print
spurious errors.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
This prevents spamming of logs if multiple lock requests are waiting
and keep timing out.
Also, improve the logging format with separators.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
This is naive and assumes no performance problems when updating
persistent DBs. It also does no error handling.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
That is, don't use fixed paths.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
The background update is never guaranteed to complete before the cache
is used, so don't bother trying it at the beginning. Instead, put a
timeout on a foreground update.
If the foreground update fails:
* If there's no available cache file then die.
* If there is a previous cache file then use it and log a warning.
* Do a background update at the end of the monitor event.
Also remove commas in the "smb ports" list before use, since (newer?)
testparm seem to insert commas into the default value. Update the
associated test to add a comma.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 8c6f511254ecb0381a609b37e3a0ee6e5ec5d562)
Elsewhere we're moving the socket to /var/run/ctdb. We might end up
with PID files and sockets for other daemons later, so let's call the
directory "ctdb" instead of "ctdbd".
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit b63f6fd2d295c8e18cbf3420ab05fce07b727f31)
Use /var/run/ctdb/ctdbd.socket because there might be other daemons
that need sockets in the future.
The local daemons test code to create a link for the default
convenience socket has to be removed because the link can't be created
as a regular user in the new location. This should be OK since all
calls to the ctdb tool in the test code should be wrapped in onnode.
When debugging tests, a developer will have to set CTDB_SOCKET by
hand.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-with: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit dc67a4e24af9d07aead2a1710eeaf5d6cc409201)
* It should run on "ipreallocated" instead of "recovered"
* Variable name NODE -> ip since that's what it is
* Simplify some logic
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 45e2bc66abf9fcfeadcc279a656ed7fd1838920a)
Routes only need to be updated when IPs have moved. IP takeover runs
will generate "ipreallocated", which is enough. "recovered" always
follows "ipreallocated" anyway, so avoid the redundancy.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 1152215fc69217e4292762e28d193b7ea0e06ee3)
Any time a node changes flags in any significant way there will be a
takeover run, which will generate an "ipreallocated" event. The
"recovered" event always happens straight after a takeover run so we
update the NAT gateway twice.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 542c70d6281d636ecd51502fbbf219f418bfac66)
There is no reconfigure code for these scripts so no need to check for
reconfiguration.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 41df1637c1d8a7b2f5a9974408db71b1f74cb2f2)
Nothing ever (or has ever) set the "needs reconfigure" flag, so this
code is unnecessary.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 5b77fd95bda5f1960aca952e1b759231890b56f3)
A generic framework is no longer needed now that the "ctdb" checker is
the only one left. Simplify the code.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 044d302b41a2040642355401e3236fcecc3a620a)
"ctdb checktcpport" is no longer experimental so the other checkers
are no longer required.
Remove tests related to the removed checkers.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 50e330d0679614bee2e7bab028436e929f74ca50)
The current setting is inconsistent with settings on most systems,
putting /bin before /sbin. Use of /usr/local/bin, which may be
required on some systems, is also overridden. This can make it
difficult to do interactive debugging of script problems.
Rely on the system PATH instead.
If system-specific changes need to be made then this can be done in a
configuration file.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit cfbff39e22e42f3997f637290748290833525714)
Reduce the complexity, including the depth of background processes.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 49f077c475b078889ff0492fe7d567a64d6cb87c)
Otherwise calls to "ctdb natgwlist" will not behave as expected if a
non-standard file is used, since that command will use the default
file location.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit e574b30257126679704b088c4334a8e7a53a9c3f)
The old logic was actually wrong. If CTDB_LOGFILE is unset then a
default is used, not syslog.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 79e2029f9bc078126e865aa715100a3870c7604b)
Allowing people to put random options in CTDB_OPTIONS complicates some
logic (particularly around use of syslog). If we're going to have
variables for options then let's make sure we have a variable for each
option and make people use them.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit e55f3a1577eff0182802b0341d865d961aeae1c7)
All CTDB configuration variables should start with CTDB_.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit f12658aff125996ae45eea23241d8c3d0567b893)
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 8f660d0dd52013e5876806be908e8e603aa6e968)
This uses potentially insecure temporary files and is not referenced
anywhere else.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 4b914d7e217202f3d11a8e95f9f74bc17869475b)
Anecdotal evidence suggests that most nfsd RPC check failures are due
to cluster filesystem or storage problem. Apparently these are rarely
helped by attempting to restart the NFS service because the restart
tends to hang.
Fail after 2 nfsd RPC check failures, instead of waiting for 6
failures. Restart on every 10th failure to try to bring the node back
to good health.
Update unit tests to match.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit e9ef93f7b6dad59eabaa32124df81f3e74c651ef)
It should print the actual number of consecutive failures rather than
the limit.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit ff5f0d1e29af2b293e30cdc54bed03a644be7038)
This makes the gaps in the logs more obvious.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 11fbf4789d783dd0bac22754b374dd9ea4b03bad)
Passing "localhost" to the rpcinfo command causes overheads, like
reading /etc/services multiple times.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 1d61988af9e4fa3621a3e2d06a859bcb53df2d67)
Also add it to the corresponding eventscript unit test infrastructure.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit f4ef83a256f59eeb00b9a5bc10c28347e1ad1031)
While doing this:
* Explicitly assign RPC program and version information in
_nfs_check_rpc_common(). This is more lines of code but is easier
to read.
* Don't print the options when starting a service. Trying to print it
makes the code messy for little benefit.
Update the eventscript unit testing code and a Ganesha test to
reflect this.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit e8b531405665885196c95fe1608db33a255bf761)
They're hard to maintain and provide very little benefit.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 1a1be43f8466d46913dcdfe6dcedb94316cd28ad)
That is, /dev/null the "stop" output. This is consistent with the way
CTDB generally deals with the output when stopping a service.
It also makes updating the eventscript unit tests easier.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit c7332526b1b488abefeb4be78a7cd3f2f9abc451)
CTDB daemon is not ready to accept clients in INIT runstate (init event).
CTDB daemon will start accepting connections in SETUP runstate (setup event)
and later.
Also, minor log formatting changes.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 81d7ce03b28d592a1337639e14d9ea141e20bfff)
On cluster where recovery lock file is not being used, asking CTDB daemon
is unnecessary overhead. And if CTDB is using recovery file, then changing
configuration without restarting is *stupid*.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-Programmed-With: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 44eb86e6042adb6efe75d2a5528b82a0f21d496d)
This ensures that any invocation of the ctdb tool (within the wrapper)
gets the desired value. This at least ensures that ctdbd will be
started.
If a non-standard value is set for CTDB_SOCKET then command-line users
will still need the variable in their environment.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 37ccc7c6cc43a80aaa92291aea7a438f4225488a)
This avoids issuing multiple "ctdb killtcp" commands to terminate tcp
connections, one per connection. This will considerably reduce the
time when there is a large number of tcp connections. This also makes
it possible to avoid calling "ctdb killtcp" when there are no connections.
Add a couple of unit tests for killtcp and update eventscript unit
test infrastructure to support.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit a20d94717d2e4ab866d8a002cdf39c0669b74c6a)
The timeout information printed by ctdbd is less than useful because
it refers to the cumulative time taken by the eventscripts run so far.
Adding scriptstatus output indicates where time was actually spent.
Since there is now quite a bit of output, serialise the calls to this
script using flock.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 1b016b2dfc5d7d3f2a42ce4dfe569608e90eb714)
A missing interface is at least as bad as an interface with a link
that is down so should have a similar effect.
This couldn't be done previously because orphaned interfaces used to
be listed for monitoring. This was worked around in 10.interface in
commit 49b2d1bd9554461ed8edbfc21e777c0eca9e1443 and fixed in ctdbd in
commit cc1a3ae911d3fee8b87fda5de5ab6d9499d7510a.
If $CTDB_PARTIALLY_ONLINE_INTERFACES="yes" then monitoring won't
actually fail but the interface is still marked as down.
While we're touching this code, use "ip link" instead of "ip addr".
It is marginally cheaper but not enough for a separate patch. ;-)
This effectively reverts d67955b42f7627be9dae995230c8fcbb8a948ec2.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 501f19b16fd6d67fbb754248868c38ee5bcf79ef)
This was previosuly changed because ctdbd didn't garbage collect
orphaned interfaces. This was fixed in commit
cc1a3ae911d3fee8b87fda5de5ab6d9499d7510a.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit c6ab0f9405d5fa5b0b1693bc92e59da0d555a9d7)
It can be very disconcerting when logging to syslog is expected but
nothing is being logged there.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 412bc0e20bef694d4e911dc9c984fd7716231f1f)
Based on an original patch by Sumit Bose <sbose@redhat.com>.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit e43a4b7b69a21c4cec2453dcac436b64bf5d7f06)
Currently the initscript is very complex. This makes it hard to read
and hard to add support for new init systems, such as systemd.
Create a wrapper called ctdbd_wrapper to be installed alongside ctdbd.
This is called by the initscript to start and stop ctdbd. It does the
ctdbd option construct and waits until ctdbd is properly initialised
before it exits.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit e3abc7eebab5cceddc4ce7817890dd5db9be3450)
This allows 60.ganesha to be unit tested, except for the core Ganesha
monitoring code.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit f606df4f2db754592e6d1a16c26e155cacb2beef)
Support for this was removed in commit
77302dbfd85754e02559eccb2dd6c090db0b6b9f and I overlooked its use in
60.ganesha.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 520914e7ee1b879c1080e5857fda18ed5b973fd6)
The "setup" event isn't called until ctdbd is in CTDB_RUNSTATE_SETUP
anyway...
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 9ea57af557028b1d2e5c560e7bcf4d014b9a8b1e)
This essentially reverts d4621277240721e6d130a930b0100506b64467ea.
This was added for testing but the test code was actually broken.
CTDB itself will only process public IPs if $CTDB_PUBLIC_ADDRESSES is
set, so no code should try to be more flexible than that!
The test code has been fixed instead.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 3b11b27f3e22e99947bc2d6c49c4427bd7a0e332)
It makes sense to do this in the "init" event and make the initscript
less complicated.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 3bc93f312b8464fbfa2b2c44fffedc591fe5a3e0)
It makes sense to do this in the "init" event and make the initscript
less complicated.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 0b77cceb49a30a181063adc7868d42d2851318e8)
Otherwise secondary addresses that aren't owned by CTDB could be
dropped.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 5ffce65a1ad659b198ddf647622b899bdde45c72)
Change all callers to maintain current behaviour.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 0b67397ef5419c781a35916575151da7b7e7cc27)
If some nfsd threads are still alive after a shutdown during a restart
then this indicates the maximum number of threads for which a stack
trace should be dumped. This can be useful for trying to determine
why nfsd is stuck.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 2503245db10d567af708a04edd3a3b488c24f401)
Consider the following example:
1. There are 256 nfsd threads configured.
2. 200 threads are "stuck" in system calls, perhaps waiting for the
underlying filesystem when an attempt is made to restart NFS.
3. 56 threads exit when NFS is stopped.
4. 56 new threads are started when NFS is started.
5. 200 "stuck" threads exit leaving only 56 threads running.
Setting this option to "yes" makes the 60.nfs monitor event look for
this situation and try to correct it.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 99b0d8b8ecc36dfc493775b9ebced54539c182d2)
60.nfs and 60.ganesha touch $statd_update_trigger every time they're
run. This stops the statd-callout updates from ever being called.
Make this logic self-contained and move it to new function
nfs_statd_update() in the functions file. Call this in 60.nfs and
60.ganesha with the appropriate update period as the only argument.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reported-by: Poornima Gupte <poornima.gupte@in.ibm.com>
(This used to be ctdb commit 1b5968f6be084590667f4f15ff3bef13ed9a2973)
Every time a node that wasn't the NAT gateway master gets reconfigured
something like this appears in the log:
ctdbd: 11.natgw: Failed to del 10.0.1.139 on dev eth1
Since this usually fails it is better to mute the error than to have
it pollute the log.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 0ca7a98ffef50cbd06849cfbf65fb4a3d668b7bd)
This is needed for AIX and possibly others.
Also provide a cheaper mktemp function is needed in the run_tests
script.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit b2b572e9049c7138bd223226475bef8fe3e01f10)
The current code calls "ctdb setnatgwstate ..." on every event.
However, calling the ctdb tool in the "init" event is not permitted.
Instead, update the capability when it is needed and at regular
intervals via the "monitor" event.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 39a43feae7c7de07ddaf2d6cb962f923d47d0c19)
This adds more serialisation to the startup, ensuring that the
"startup" event runs after everything to do with the first recovery
(including the "recovered" event).
Given that it now takes longer to get to the "startup" state, the
initscript needs to wait until ctdbd gets to "first_recovery".
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit ed6814ff0a59ddbb1c1b3128b505380f60d7aeb7)
If one or more run states are specified then "ctdb runstate" succeeds
only if ctdbd is in one of those run states.
At the moment, if the "setup" event fails then the initscript succeeds
but ctdbd exits almost immediately. This behaviour isn't very
friendly.
The initscript now waits until ctdbd is in "startup" or "running" run
state via the use of "ctdb runstate startup running", meaning that ctdbd
has successfully passed the "setup" event.
The "setup" event code in 00.ctdb now waits until ctdbd is in the
"setup" run state before proceeding via the use of "ctdb runstate setup".
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 4a2effcc455be67ff4a779a59ca81ba584312cd6)
This makes it easier to add notification handlers.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit d29e9a420b133088bf23a847c8d1dbce56c25eb0)
fff88940f71058e4eefd65f50a6701389c005c17 introduced a regression.
Without $service_name set by default, the CTDB configuration is no
longer loaded when loadconfig() is called without any arguments.
That's bad.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit f1619a36c1beba11533052dc5728fa3adaa08870)
No longer used, support removed from test infrastructure.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 0eb351ff4c7ee096de7c5e0a59561067091fa32e)
* New directory nfs-rpc-checks.d/ replaces hardcoded rules in 60.nfs
* Installation and packaging additions to handle nfs-rpc-checks.d/
* Unit test updates, including deleting 1 test that sanity checked
test infrastructure
* Test infrastructure changes to use nfs-rpc-checks.d/
Note that this removes support for $CTDB_NFS_SKIP_KNFSD_ALIVE_CHECK in
60.nfs. To get the equivalent behaviour, edit 20.nfsd.check and
remove/comment all lines.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 7e792d6768d9ca420ce3713cb122e63afd594b15)
Want nfs_check_rpc_services() to support filenames without the 'k'.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit d9775fcbd6e30eef8382bea68e2f9bad2309f2c1)
This is intended to replace nfs_check_rpc_service(), which builds
configuration into eventscripts.
nfs_check_rpc_services() uses a directory of configuration checks that
can be edited by an administrator. The files have one limit check and
a set of actions per line. The program name is extracted from the
file name.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 9bc8fbee6550ed2814fb35c70d57fab21ef1b8fd)
This creates new function _nfs_check_rpc_common().
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit cc3bb42e48bbdabd19187c231846b98589b4f4f3)
This is unused so doesn't need to be maintained. An attempt to use it
now will explicitly fail rather than implicitly fail via bitrot.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 887733dd7be53158bfe07b30ef31b611d0f8122f)
This reverts commit 92f74fd589467b46c758e116e97417edfe8773d7.
This change is unused and is just complicating the function.
Conflicts:
config/functions
(This used to be ctdb commit 77302dbfd85754e02559eccb2dd6c090db0b6b9f)
The code in 60.nfs is going to be genericised, so make all the checks
look the same.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 15b0f78cbf8d6ba481b7eba9e4fe3f4270214c72)
ctdb_check_counter_limit() can soon be removed...
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit bb2cdff77e8ec79e7d319159b9c9848ecfaaa0f1)
It is in the background but it still might cause the counter to be
reset before it is checked.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit ef2cf75e95ff382c65524a4d77eb00ab8411d2fc)
That way we don't even check the counter...
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 136abd4604dc68f7c696704bac708bae53cf1940)
This has 2 advantages:
1. It uses get_tcp_connections_for_ip() to check for leftover
connections, instead of custom code.
2. It checks for the timeout condition before sleeping. The current
code sleeps and then checks, so wastes a second.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 60a08eb96e1d97aab31e9bd4af01683c650541c2)
Uses new function get_tcp_connections_for_ip(). This avoids using a
temporary file and running netstat twice.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit a621622903c7ef17764b15293d6ea8df5a53c7e1)
... using kill_tcp_connections()
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 10e4db8f796d1e3259733180494db3b4bbad291a)
This change is a no-op. However, In a subsequent commit we'll merge
kill_tcp_connections_local_only() with this function.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 23c0f5f48e3e5a0c1a3254c582299f7893cf0d33)
Setting these variables spawns lots of unnecessary processes, which
would surely slow down these functions on a busy system.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 3eae161472e6352f7f656851c73dc056f95113eb)
* Command is now multiple arguments, preserving quoting
* $service_name no longer printed, no longer an argument
* Debug output from failed command
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 9e25fb261447a196de05937052779b36e75e7215)
The documentation comments are wrong... and remove option
$service_name argument.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit d9e6cb945c5edac9ca6405c9228bf647fab814f5)
Complicated argument handling was introduced to deal with multiple
services per eventscript. This was a failure and we split 50.samba.
This simplifies several functions to use global $service_name
unconditionally instead of having an optional argument.
$service_name is no automatically longer set in the functions file.
This means it needs to be explicitly set in 13.per_ip_routing because
this script uses ctdb_service_check_reconfigure().
Eventscript unit test infrastructure needs to set $service_name during
fake service setup, and policy routing tests need to be updated
accordingly.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 27aab8783898a50da8c4bc887b512d8f0c0d842c)
Complicated argument handling was introduced to deal with multiple
services per eventscript. This was a failure and we split 50.samba.
This simplifies several functions to use global $service_name
unconditionally instead of having an optional argument.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit b5802c4735e1c719a5cf9ce69489d5947bd5e8c5)
Complicated argument handling was introduced to deal with multiple
services per eventscript. This was a failure and we split 50.samba.
This simplifies several functions to use global $service_name
unconditionally instead of having an optional argument.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit e24baac0d2952e86d5ff31235901f06e2f2b2449)
Complicated argument handling was introduced to deal with multiple
services per eventscript. This was a failure and we split 50.samba.
This simplifies several functions to use global $service_name
unconditionally instead of having an optional argument.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit c2ea72ff565222f9edab408638bd45dbba6e8ff7)
5940a2494e9e43a83f2bca098bd04dfc1a8f2e93 makes script_log() always
pass a message to logger, so script_log() can no longer log stdin.
Put all the tag fu in the actual tag so the message argument is empty
if no message was passed.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 9dee4c84273633b9ad82e94dabbf0e6f86edbcef)
It isn't used, superceded by "ipreallocated".
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit c2bb8596a8af6406ef50e53953884df9d6246a96)
Use "ipreallocated" instead. The "stopped" event pre-dates the
"ipreallocated" event. The only way of stopping a node is via the
ctdb tool, which explicitly causes a takeover run to occur after the
node is stopped. The takeover run will generate an "ipreallocated"
event.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 978d4a0d6d8c9877b23f72e3a7b78c1245d16908)
Our practice is to search logs for "ctdbd:". We want to make sure we
find everything.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 5940a2494e9e43a83f2bca098bd04dfc1a8f2e93)
Previous commits stopped the top level of the script from creating
certain directories but some functions assume that required
directories exist.
Create those directories instead.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 0076cfc4666e5a96eb2c8affb59585b090840e00)
The current logic is horrible and creates an unnecessary file. Let's
make the script debug level independent of ctddb's debug level.
* Have debug() use $CTDB_SCRIPT_DEBUGLEVEL directly
* Remove ctdb_set_current_debuglevel()
* Remove the "getdebug" command from ctdb stub in eventscript unit
tests
* Update relevant eventscript unit tests to use
$CTDB_SCRIPT_DEBUGLEVEL
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 85efa446c7f5c5af1c3a960001aa777775ae562f)
Move the use of the service command below inclusion of functions file,
which sets $PATH.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit d254d03f69cbdc3e473202b759af6e1392cbb59c)
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit e7a4b7e35a1e4b826846e2494a3803abb57065ee)
"ctdb ping" can time out. How many times should we try?
Instead, depend on the initscript to implement something sane.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit 90cb337e5ccf397b69a64298559a428ff508f196)
Using "ctdb ping" and "ctdb status" is fraught with danger. These
commands can timeout when ctdbd is running, leading callers to believe
that ctdbd is not running. Timeouts could be increased but we would
still have to handle potential timeouts.
Everything else in the world implements the "status" option by
checking if the relevant process is running. This change makes CTDB
do the same thing and uses standard distro functions.
This change is backward compatible in sense that a missing
/var/run/ctdb/ directory means that we don't do a PID file check but
just depend on the distro's checking method. Therefore, if CTDB was
started with an older version of this script then "service ctdb
status" will still work.
This script does not support changing the value of CTDB_VALGRIND
between calls. If you start with CTDB_VALGRIND=yes then you need to
check status with the same setting. CTDB_VALGRIND is a debug
variable, so this is acceptable.
This also adds sourcing of /lib/lsb/init-functions to make the Debian
function status_of_proc() available.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit 687e2eace4f48400cf5029914f62b6ddabb85378)
In RHEL 6+, rpc.statd runs as "rpcuser" instead of root as on RHEL 5. This
prevents CTDB tool commands talking to daemon since "rpcuser" cannot access
CTDB socket.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-Programmed-With: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit fe8c4880b371492a38554868d4ca10918c54e412)
This is an artifact from older versions of Samba. In the newer versions of
Samba, "smbstatus -np" command does not do anything useful, but causes a
traverse in CTDB which is expensive and causes CPU utilization to shoot up.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 053b89c6dbce47001505524606889334559d2ec4)
This means it can be set like any other configuration option in the
configuration file, without needing to export it there.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit a0ef73e197dc9147f7718e0813fe803ff0b3d54d)
Use an environment variable instead. This just means that the
initscript exports CTDB_DEBUG_HUNG_SCRIPT and the code checks for the
environment variable.
The justification for this simplification is that more debug options
will be arriving soon and we want to handle them consistently without
needing to add a command-line option for each. So, the convention
will be to use an environment variable for each debug option.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 0581f9a84e58764d194f4e04064c2c5b393c348b)
Unobtrusive recovery: Ganesha will not be restarted on failovers.
Ganesha health: Use the counters in /var/lib/nfs/ganesha_local to track progress
instead of the null call which can timeout if the server is too busy.
Signed-off-by: Srikrishan Malik <srimalik@in.ibm.com>
Signed-off-by: Lance Russell <lancerus@us.ibm.com>
(This used to be ctdb commit 0e651e9da0f1f3c836b4474612ab13d0ccd272d9)
Currently it silently continues without attempting to set tunables.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 735ec99b99c7bb579851ce8293011aaf1dcc552a)
When using syslog any provided message arguments are ignored and not
passed to logger. This means that logger blocks waiting on stdin.
That's bad.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 50abf597cefe6f8ea2a2ff7694bf84641344a9b1)
This improves maintainability.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit e2aaa64925cca359c71520e01a18fc9461b0da4d)
Incorporate some of the logic from ctdb-crash-cleanup.sh that ensures
IPs are deleted even if they have the wrong netmask or are on the
wrong interface.
Factoring out some of the code will allow it to be used elsewhere.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 03356fd5ae7a3ac35fde0289cbea7c71ecf07367)
This makes it easier to run the scripts externally.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 740ea8ea5084149c8b552a01ee1c98c558b12384)
... so it can be improved and used elsewhere.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit b23c30253cc9eb274b895cac0f8c65245ba0a200)
A default action of restarting the service doesn't obey the principle
of least surprise. It cause the NFS service to be implicitly
reintroduced.
This allows no-op functions to be removed from some eventscripts and
service restart functions to be added to others.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit c75b5e5b4d000f5c7dab403df8238ceed390c1c0)
It looks like this restart was accidentally reintroduced in commit
fc0678d351187cfa4c71123f97c0f493aacd5d16 when $service_reconfigure
became unset so the default action of restarting the service would
occur. From there cleanups have explicitly reintroduced it and
carried it through the code.
Also update the unit tests affected by this change.
The restart was originally removed in commit
bc481c3f1a44c50648488c4f8a7f15ec395d446f.
The default reconfigure action of restarting a service is clearly
suboptimal and will be addressed in a separate patch.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 2629de72e1f37b5e46772c2ef8d8d0012fc4ed37)
At the moment the caller has no idea why it thinks CTDB isn't running
and we can't debug failures...
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 776590bf84d221092298346a28d7fc0552a67c9d)
creating the smb.conf cache with "-v" results in a cache file
that fails to load with "testparm -s ..." later on due to
"copy = " not being processable. (Copying the empty service name fails).
Signed-off-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit 81788cfabe960497b050c5ee4e4e487ee061012a)
The current code lists available interfaces. If IPs are configured in
some other way than the public addresses file (e.g. ctdb addip) and their
interfaces default to being marked down then, since down interfaces are
not available, these interfaces can never be marked up.
The configured interfaces should be listed instead.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit d8f010355b715e49709836e057a5d0f110919897)
Provided that monitor_interfaces() sets the state of each interface,
there's no need to mark all interfaces as up before running
monitor_interfaces() in the startup event. monitor_interfaces() will
set the true status of each interface anyway. The duplication is
unnecessary and may cause extra action in the recovery daemon because
the state of some interfaces is changed an extra time.
Instead, add a comment at the top of the loop in monitor_interfaces()
to warn against early loop exits.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit f243a916ee71013f7402b9c396c2ead88eb3aab0)
This file is #!/bin/sh. On sn-devel at least, with this /bin/sh the
shell does not like == for string equality.
(This used to be ctdb commit e2213db479129ce9c2b2fb88ec8c53cbd33d54b3)
This reverts commit 88f88d86b0d08240f749fb721b8c401c2eeb1099.
This is dangerous and, on reflection, I can't see it being useful.
There are often permanent IPs on interfaces that CTDB shares with its
public IPs.
(This used to be ctdb commit 16aba4eb620844626a1c71c58b51658caf44dea6)
The recovery process has no protection against the "recovered" event
failing, so this can cause a recovery loop.
Instead of failing the "recovered" event, add a "monitor" event and
fail that instead. In this case the failure semantics are well
defined.
A separate patch should ban nodes if the "recovered" event fails for
an unknown reason.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit eaa7c165f58abd7e259c37d76b7dd37c91e13d9f)
We've seen this function report "Unknown family, 0" and then CTDB
disappeared without a trace. If we can reproduce it then this might
help us to debug it.
The idea is that you do something like the following in /etc/sysconfig/ctdb:
export CTDB_EXTERNAL_TRACE="/etc/ctdb/config/gcore_trace.sh"
When we hit this error than we call out to gcore to get a core file so
we can do forensics. This might block CTDB for a few seconds.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 7895bc003f087ab2f3181df3c464386f59bfcc39)
ctdb_check_counter_limits does not fail but succeed if count >= limit
Signed-off-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit af540ef728303b4a0a188b17c695e9aefab34489)