* Command is now multiple arguments, preserving quoting
* $service_name no longer printed, no longer an argument
* Debug output from failed command
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 9e25fb261447a196de05937052779b36e75e7215)
The documentation comments are wrong... and remove optional
$service_name argument.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit d9e6cb945c5edac9ca6405c9228bf647fab814f5)
Complicated argument handling was introduced to deal with multiple
services per eventscript. This was a failure and we split 50.samba.
This simplifies several functions to use global $service_name
unconditionally instead of having an optional argument.
$service_name is no longer automatically set in the functions file.
This means it needs to be explicitly set in 13.per_ip_routing because
this script uses ctdb_service_check_reconfigure().
Eventscript unit test infrastructure needs to set $service_name during
fake service setup, and policy routing tests need to be updated
accordingly.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 27aab8783898a50da8c4bc887b512d8f0c0d842c)
Complicated argument handling was introduced to deal with multiple
services per eventscript. This was a failure and we split 50.samba.
This simplifies several functions to use global $service_name
unconditionally instead of having an optional argument.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit b5802c4735e1c719a5cf9ce69489d5947bd5e8c5)
Complicated argument handling was introduced to deal with multiple
services per eventscript. This was a failure and we split 50.samba.
This simplifies several functions to use global $service_name
unconditionally instead of having an optional argument.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit e24baac0d2952e86d5ff31235901f06e2f2b2449)
Complicated argument handling was introduced to deal with multiple
services per eventscript. This was a failure and we split 50.samba.
This simplifies several functions to use global $service_name
unconditionally instead of having an optional argument.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit c2ea72ff565222f9edab408638bd45dbba6e8ff7)
5940a2494e9e43a83f2bca098bd04dfc1a8f2e93 makes script_log() always
pass a message to logger, so script_log() can no longer log stdin.
Put all the tag fu in the actual tag so the message argument is empty
if no message was passed.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 9dee4c84273633b9ad82e94dabbf0e6f86edbcef)
Our practice is to search logs for "ctdbd:". We want to make sure we
find everything.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 5940a2494e9e43a83f2bca098bd04dfc1a8f2e93)
Previous commits stopped the top level of the script from creating
certain directories but some functions assume that required
directories exist.
Create those directories instead.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 0076cfc4666e5a96eb2c8affb59585b090840e00)
The current logic is horrible and creates an unnecessary file. Let's
make the script debug level independent of ctdb's debug level.
* Have debug() use $CTDB_SCRIPT_DEBUGLEVEL directly
* Remove ctdb_set_current_debuglevel()
* Remove the "getdebug" command from ctdb stub in eventscript unit
tests
* Update relevant eventscript unit tests to use
$CTDB_SCRIPT_DEBUGLEVEL
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 85efa446c7f5c5af1c3a960001aa777775ae562f)
Unobtrusive recovery: Ganesha will not be restarted on failovers.
Ganesha health: Use the counters in /var/lib/nfs/ganesha_local to track progress
instead of the null call which can timeout if the server is too busy.
Signed-off-by: Srikrishan Malik <srimalik@in.ibm.com>
Signed-off-by: Lance Russell <lancerus@us.ibm.com>
(This used to be ctdb commit 0e651e9da0f1f3c836b4474612ab13d0ccd272d9)
When using syslog any provided message arguments are ignored and not
passed to logger. This means that logger blocks waiting on stdin.
That's bad.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 50abf597cefe6f8ea2a2ff7694bf84641344a9b1)
Incorporate some of the logic from ctdb-crash-cleanup.sh that ensures
IPs are deleted even if they have the wrong netmask or are on the
wrong interface.
Factoring out some of the code will allow it to be used elsewhere.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 03356fd5ae7a3ac35fde0289cbea7c71ecf07367)
... so it can be improved and used elsewhere.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit b23c30253cc9eb274b895cac0f8c65245ba0a200)
A default action of restarting the service doesn't obey the principle
of least surprise. It causes the NFS service to be implicitly
reintroduced.
This allows no-op functions to be removed from some eventscripts and
service restart functions to be added to others.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit c75b5e5b4d000f5c7dab403df8238ceed390c1c0)
ctdb_check_counter_limits does not fail but succeeds if count >= limit
Signed-off-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit af540ef728303b4a0a188b17c695e9aefab34489)
If $CTDB_SERVICE_AUTOSTARTSTOP="yes" then service start/stop is done
in the background with logging.
Fix some unit tests for samba and winbind.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 3a3dae4cb5ec8b4b8381a4013adda25b87641f3a)
Originally from Srikrishan Malik <srikrishan.malik@in.ibm.com> with
some style changes by me.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 637cab6304dae66b85668506028c76ea1ee88980)
This can be optional because the 1st item of each action-triple is a
test comparison that starts with '-'.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 92f74fd589467b46c758e116e97417edfe8773d7)
Make add_ip_to_iface() and delete_ip_from_iface() do their own locking
so the external script is no longer required.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 93f90caf91246074d9359bf31a39b26212cccc42)
This is no longer used by 13.per_ip_routing or anything else.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 2a2ea6c61a05af2d0765e964abcc7ef04047431e)
The relevant functions are now in that script.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 45c3476d12bf0f52966b72d286f101fce1382cd2)
Args:
1. Error message to be printed.
2. Optional exit code (default 1)
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 97b0c138cb97e30db27c40b4ee1481109ae90c78)
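The two arguments listed in the commit message above could be handled roughly as follows. This is only a sketch: the name die_sketch is hypothetical, and sending the message to stderr is an assumption, since the commit message does not say where it is printed.

```shell
# Sketch only; die_sketch is a hypothetical name.
# Assumption: the error message is written to stderr.
die_sketch ()
{
    _msg="$1"
    # Second argument is optional; the exit code defaults to 1.
    _rc="${2:-1}"
    echo "$_msg" >&2
    exit "$_rc"
}
```

Because the function calls exit, a caller that wants to survive the failure has to run it in a subshell.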
Print useful output and return a suitable exit code.
The DISABLED and TIMEDOUT statuses use fake negative return codes, and
these can't be faked from the shell. So we map DISABLED to OK and
TIMEDOUT to ERROR - this should avoid nearly all surprises. When we
do this we add a note to the beginning of the output. The alternative
is to "fix" ctdbd to use only codes that can actually be returned by
shell scripts. However, the reason for using negative codes is
probably to distinguish them from real ones...
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit dda44d026e0c1b02feb02185b8c200a542be341a)
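The status mapping described in the commit message above can be illustrated like this. It is a sketch under assumptions: report_status is a hypothetical name, the wording of the notes is invented, and statuses are passed by name because the commit message does not give the numeric codes.

```shell
# Sketch only; report_status is a hypothetical name.
# $1 = status name, $2 = saved output from the event (may be empty).
# Only the DISABLED -> OK and TIMEDOUT -> ERROR mapping comes from the
# commit message; the note text is an assumption.
report_status ()
{
    _status="$1"
    _output="$2"
    case "$_status" in
    OK)       _rc=0 ;;
    ERROR)    _rc=1 ;;
    DISABLED) echo "NOTE: status was DISABLED, reporting OK" ; _rc=0 ;;
    TIMEDOUT) echo "NOTE: status was TIMEDOUT, reporting ERROR" ; _rc=1 ;;
    *)        _rc=1 ;;
    esac
    # The note, if any, goes before the saved output.
    [ -z "$_output" ] || echo "$_output"
    return $_rc
}
```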
In the current code services can only be reconfigured asynchronously.
This means that configuration file changes can be made, an asynchronous
reconfigure event can be triggered, and it always succeeds. Some time
later when a service is actually reconfigured then a failure may be
seen.
This adds a synthetic reconfigure event that reconfigures a service
synchronously so that any failure is reported on exit.
ctdb_service_check_reconfigure() is essentially reimplemented.
If a reconfigure event is in flight and an ipreallocated or monitor
event occurs then any scheduled asynchronous reconfigure is deferred
until the next monitor cycle. This is to avoid reconfigures trampling
on each other. In this case a monitor event will also replay the
previous status to try to avoid exposing any temporary instability.
If a reconfigure event collides with another reconfigure event it will
exit with status 2, indicating that the reconfigure should be retried.
The reconfigure event is implemented using a subprocess to control the
exit from the synthetic event.
As before, if a monitor event causes a scheduled synchronous
reconfigure to occur then it will replay the previous status for the
service, given that a reconfigure can cause temporary instability.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 220578bfd3507152b29ba4c28942f9d5e8733886)
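The collision rule in the commit message above (a second reconfigure gives up with status 2 so it can be retried) might be sketched as below. The commit message does not say how a collision is detected, so the lock directory is purely an assumption, as are the names RECONFIGURE_LOCK, reconfigure_sketch and do_reconfigure.

```shell
# Sketch only. Hypothetical names: RECONFIGURE_LOCK, reconfigure_sketch,
# do_reconfigure. Using a lock directory is an assumption.
RECONFIGURE_LOCK="${RECONFIGURE_LOCK:-/tmp/ctdb-sketch-reconfigure.lock}"

do_reconfigure ()
{
    : # placeholder for the service-specific reconfigure
}

reconfigure_sketch ()
{
    # mkdir is atomic, so only one caller can take the lock.
    if ! mkdir "$RECONFIGURE_LOCK" 2>/dev/null ; then
        echo "Reconfigure already in progress, try again"
        return 2
    fi
    # Run synchronously so that any failure is reported to the caller.
    _rc=0
    do_reconfigure || _rc=$?
    rmdir "$RECONFIGURE_LOCK"
    return $_rc
}
```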
Pass this "$@" to do common eventscript argument checking.
For regular use putting this in 00.ctdb would be enough. However, for
developer testing it can be useful to call this in other eventscripts.
For example, 10.interfaces and 13.per_ip_routing currently check these
by hand.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 36de7e7fd6dfeed61ef9977b8d5b568f90a9707b)
Some of the current auto-start/stop logic is broken, particularly for
Samba. Fixing it is non-trivial.
If $CTDB_SERVICE_AUTOSTARTSTOP is "yes" then auto-start/stop services
when told to newly manage or no longer manage them. This defaults to
"yes".
However, if using a canned configuration file that doesn't set
$CTDB_SERVICE_AUTOSTARTSTOP then this stops the auto-start-stop logic
from working. Therefore, this works around CQ S1026685 - on the
system in question another daemon controls service auto-start/stop and
CTDB just gets in the way.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit ef71b8290ae49117d7bcc7166598b77cb64cc8a0)
New function ctdb_check_tcp_ports_ctdb(). This should be fast... and
is now the default checker. If it fails in an unexpected way we fall
back to the nmap and netstat checkers.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit a1e16a707ce204817531a61455000361f972080a)
Split the netstat-specific parts of ctdb_check_tcp_ports() into new
function ctdb_check_tcp_ports_netstat().
Implement new ctdb_check_tcp_ports_nmap() function that uses
"nmap -PS" to check if the desired ports are listening.
ctdb_check_ctdb_ports() now uses new configuration variable
CTDB_TCP_PORT_CHECKERS to decide which port checkers to try. Default
value is currently "nmap netstat". If nmap is not found then this
will fall back to netstat - if logging is at debug level this will
also fill the logs with messages saying the nmap checker failed. This
indicates that either nmap should be installed or the default value of
CTDB_TCP_PORT_CHECKERS should be changed (in a configuration file) to
avoid trying to use nmap.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit d9651175b40b9454e7d4e98291955fcf1445085e)
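The fallback between checkers described in the commit message above could look something like the following. This is a sketch, not the real code: the function names and the return code convention (0 for all ports listening, 1 for a port not listening, anything else for a checker that could not run) are assumptions made for illustration.

```shell
# Sketch only. Assumption: each checker is a function named
# check_ports_<name> returning 0 (all ports listening), 1 (a port is
# not listening) or anything else (the checker itself could not run).
CTDB_TCP_PORT_CHECKERS="${CTDB_TCP_PORT_CHECKERS:-nmap netstat}"

check_ports_with_fallback ()
{
    for _checker in $CTDB_TCP_PORT_CHECKERS ; do
        _rc=0
        "check_ports_${_checker}" "$@" || _rc=$?
        case "$_rc" in
        0|1) return $_rc ;;
        *)   echo "DEBUG: ${_checker} checker failed, trying next" >&2 ;;
        esac
    done
    # No checker could run at all.
    return 2
}
```

With this shape, setting CTDB_TCP_PORT_CHECKERS="netstat" in a configuration file skips the nmap attempt entirely.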
Use the new debug function to conditionally print the netstat output.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 44c14aeeb11080980fe07c7396d06843a4870747)
Sometimes smbd and other services can take a while to start,
especially when there is a lot of activity after ctdbd has just
started. The TCP port check can then pollute the logs with lots of
"ERROR" messages and possibly extra debug.
This creates a flag file when a service is started (but not restarted)
and this flag is removed the first time that TCP port checks succeed
for that service. When a port check fails and the flag file still
exists, a less extreme "INFO" message is printed rather than the usual
"ERROR" message. This means that until the node actually becomes
healthy we see more friendly messages.
The subtext is that we're hearing false positive reports ("recreates")
of CQ S1024874 (samba stopped responding on port 445) quite often when
ctdbd is started. This reduces the chances of people reporting such
false recreates...
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 571865eb6ef847857129d0b1e2ba5fa7254bfe8c)
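The flag file behaviour described in the commit message above can be sketched as follows. The names STARTED_FLAG_DIR, mark_service_started and report_port_check are hypothetical, as are the flag location and the message wording.

```shell
# Sketch only. Hypothetical names: STARTED_FLAG_DIR,
# mark_service_started, report_port_check.
STARTED_FLAG_DIR="${STARTED_FLAG_DIR:-/tmp/ctdb-sketch-flags}"

# Called when a service is started (but not when it is restarted).
mark_service_started ()
{
    mkdir -p "$STARTED_FLAG_DIR"
    touch "${STARTED_FLAG_DIR}/$1"
}

# $1 = service name, $2 = result of the port check (0 means success)
report_port_check ()
{
    _flag="${STARTED_FLAG_DIR}/$1"
    if [ "$2" -eq 0 ] ; then
        # First success: the service is up, so drop the flag.
        rm -f "$_flag"
    elif [ -e "$_flag" ] ; then
        # Still starting up: use the less alarming message.
        echo "INFO: $1 is not listening yet"
    else
        echo "ERROR: $1 is not listening"
    fi
}
```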
ctdb_check_tcp_ports() runs "netstat -a -t -n" in a loop for each
port. There are 2 problems with this:
* Netstat is run on each loop iteration when it need only be run once.
* The -a option is used to list all connections but the function only
cares about the listening ports. There may be many thousands of
non-listening ports to grep through.
This changes ctdb_check_tcp_ports() to run netstat with the -l option
instead of the -a option. It also only runs netstat once before the
main loop.
When a port is found to not be listening the output of the netstat
command is now dumped to help with debugging.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 830355a8b18c53cfcc3ad1e3009bbb1a7a681fa0)
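The "run netstat once, then loop over the ports" change described in the commit message above might be sketched like this. The function name check_ports_listening is hypothetical and the grep pattern assumes the usual Linux netstat column layout; the listing is passed in as an argument so the sketch can be exercised without calling netstat.

```shell
# Sketch only; check_ports_listening is a hypothetical name.
# $1 = captured output of one netstat run, remaining args = ports.
check_ports_listening ()
{
    _listing="$1"
    shift
    for _port ; do
        # Match the local address column, e.g. "0.0.0.0:445 ... LISTEN".
        if ! printf '%s\n' "$_listing" |
            grep -q -E ":${_port}[[:space:]].*LISTEN" ; then
            echo "ERROR: port ${_port} is not listening"
            # Dump the captured listing to help with debugging.
            printf '%s\n' "$_listing"
            return 1
        fi
    done
    return 0
}

# Intended use (not run here): capture once, before the loop.
#   check_ports_listening "$(netstat -l -t -n 2>/dev/null)" 445 139
```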
The debug function passes its arguments to echo if
$CTDB_CURRENT_DEBUGLEVEL is >= 4 (i.e. DEBUG). If no args are given
then use stdin - this allows the function to be used with here
documents.
To ensure $CTDB_CURRENT_DEBUGLEVEL is set,
ctdb_set_current_debuglevel() is called near the end of the functions
file.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 6143483d9f87322578c00f12081e381f425226ca)
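The debug helper described in the commit message above behaves roughly like this sketch. The name debug_sketch is hypothetical, and treating an unset $CTDB_CURRENT_DEBUGLEVEL as 0 is an assumption.

```shell
# Sketch only; debug_sketch is a hypothetical name.
# Assumption: an unset CTDB_CURRENT_DEBUGLEVEL is treated as 0.
debug_sketch ()
{
    if [ "${CTDB_CURRENT_DEBUGLEVEL:-0}" -ge 4 ] ; then
        if [ $# -gt 0 ] ; then
            # Arguments given: pass them to echo.
            echo "$@"
        else
            # No arguments: copy stdin, so here documents work.
            cat
        fi
    fi
}
```

A caller can then either write `debug_sketch "some message"` or feed it a here document.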
This function ensures that CTDB_CURRENT_DEBUGLEVEL is set. It works
like this:
1. If it is already set then do nothing, since it might have been set
some other way.
The recommended "other way" would be to add a file in rc.local.d/.
2. If it is not set then set it by sourcing
/var/ctdb/eventscript_debuglevel.
3. If this file does not exist then create it using output from "ctdb
getdebug".
If the optional 1st argument is set to "create" then don't source an
existing file but create a new one instead - this is useful for
creating the file just once in each event run in, say, 00.ctdb.
If there's a problem getting the debug level from ctdb then it is
silently set to 0 - no use spamming logs if our debug code is
broken...
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 93910921c8a25f2b029733cd938069ff7c7bdab7)
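The three steps in the commit message above can be sketched as follows. This is an illustration under assumptions, not the real function: set_debuglevel_sketch, DEBUGLEVEL_FILE and query_debuglevel are hypothetical names, query_debuglevel stands in for parsing the output of "ctdb getdebug", and the sketch assumes that an already-set variable wins even when "create" is passed.

```shell
# Sketch only. Hypothetical names: set_debuglevel_sketch,
# DEBUGLEVEL_FILE, query_debuglevel.
DEBUGLEVEL_FILE="${DEBUGLEVEL_FILE:-/var/ctdb/eventscript_debuglevel}"

# Stand-in for asking the daemon via "ctdb getdebug"; prints a number.
query_debuglevel ()
{
    echo 2
}

set_debuglevel_sketch ()
{
    # 1. Already set some other way: do nothing.
    #    (Assumption: this applies even if "create" is passed.)
    [ -z "$CTDB_CURRENT_DEBUGLEVEL" ] || return 0

    # 3. Create the file if it is missing or "create" was requested.
    if [ "$1" = "create" ] || [ ! -r "$DEBUGLEVEL_FILE" ] ; then
        # On any problem, silently fall back to level 0.
        _level=$(query_debuglevel 2>/dev/null) || _level=0
        echo "export CTDB_CURRENT_DEBUGLEVEL=\"${_level:-0}\"" \
            >"$DEBUGLEVEL_FILE"
    fi

    # 2. Set the variable by sourcing the file.
    . "$DEBUGLEVEL_FILE"
}
```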
This effectively reverts 953dbfbddad656a64e30a6aca115cb1479d11573 and
is a policy decision.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 380c9263eb37db5a250264316e250c2160908263)
This adds a helper function called nfs_check_rpc_service() and uses it
to make the monitor event much more readable. An example of usage is
as follows:
nfs_check_rpc_service "mountd" \
-ge 10 "verbose restart:b unhealthy" \
-eq 5 "restart:b"
The first argument to nfs_check_rpc_service() is the name of the RPC
service to be checked. The RPC service corresponding to this command
is checked for availability using the rpcinfo command. If the service
is available then the function succeeds and subsequent arguments are
ignored.
If the rpcinfo check fails then a failure counter for that particular
RPC service is incremented and subsequent arguments are processed in
groups of 3:
1. An integer comparison operator supported by test.
2. An integer failure limit.
3. An action string.
The value of the failure counter is checked using (1) and (2) above.
The first check that succeeds has its action string processed - note
that this explains the somewhat curious reverse ordering of checks.
In the example above:
* If the counter is >= 10 then a verbose message is printed
describing the failure, the service is restarted in the background
and the node is marked as unhealthy (via an "exit 1" from the
function).
* If the counter is == 5 then the service is restarted in the
background.
For more action options please see the code.
This also changes the ctdb_check_rpc() function so that it no longer
takes a program number to check. It now just takes a real RPC program
name that rpcinfo can resolve via /etc/rpc.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 9b66057964756a6245bafb436eb6106fb6a2866e)
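The processing of argument triples described in the commit message above can be sketched in isolation. select_action is a hypothetical name; it only picks the action string for a given failure count and leaves out the rpcinfo check, the counter bookkeeping and the interpretation of the action string.

```shell
# Sketch only; select_action is a hypothetical name.
# $1 = current failure count, then (operator, limit, action) triples.
select_action ()
{
    _count="$1"
    shift
    while [ $# -ge 3 ] ; do
        _op="$1" ; _limit="$2" ; _action="$3"
        shift 3
        # The first comparison that succeeds wins.
        if [ "$_count" "$_op" "$_limit" ] ; then
            echo "$_action"
            return 0
        fi
    done
    # No check matched: nothing to do yet.
    return 1
}
```

For instance, `select_action 5 -ge 10 "verbose restart:b unhealthy" -eq 5 "restart:b"` prints `restart:b`.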
* Make this function applicable to "ipreallocated" event too.
* Monitor event should not always succeed just because we reconfigure.
If the service was unhealthy before the reconfigure and we end the
reconfigure with "exit 0" then we can cause the node's health status
to flip-flop.
To avoid this we return the status of the service from the previous
monitor event.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 21dfcbbdccd906fcd6ab7bba81418ce565bf63aa)
This should eventually be able to replace ctdb_check_counter_limit()
and ctdb_check_counter_equal(), although it doesn't issue warnings
like the former.
It takes 4 optional arguments:
1. _msg - If "error" then over limit causes an error message and an
exit 1. Anything else fails silently but the function returns 1.
Default is "error".
2. _op - An integer operator supported by test (e.g. -eq, -ge, -gt).
Default is -ge.
3. _limit - Limit for the counter to be used in comparison. Default is
$service_fail_limit.
4. _service_name - Used to identify the counter. Default is
$service_name.
For example:
ctdb_check_counter error -ge 5 foo
will print a message and exit 1 if the counter for foo is >= 5,
whereas
ctdb_check_counter check -ge 5 foo
will just return 1 if the counter for foo is >= 5, and
ctdb_check_counter
will print a message and exit 1 if the counter for $service_name is >=
$service_fail_limit.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 5b01b7233515669e995e037205796e265643b176)
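The four optional arguments and their defaults, as listed in the commit message above, could be handled along these lines. This is a sketch under assumptions: check_counter_sketch and COUNTER_DIR are hypothetical names, the one-file-per-service counter layout is invented for illustration, and so is the error message text.

```shell
# Sketch only. Hypothetical: check_counter_sketch, COUNTER_DIR and the
# one-file-per-service counter layout.
COUNTER_DIR="${COUNTER_DIR:-/tmp/ctdb-sketch-counters}"

check_counter_sketch ()
{
    _msg="${1:-error}"
    _op="${2:--ge}"
    _limit="${3:-${service_fail_limit}}"
    _name="${4:-${service_name}}"

    # A missing counter file counts as zero failures.
    _count=$(cat "${COUNTER_DIR}/${_name}" 2>/dev/null || echo 0)

    if [ "$_count" "$_op" "$_limit" ] ; then
        if [ "$_msg" = "error" ] ; then
            echo "ERROR: failure counter for ${_name} is ${_count}"
            exit 1
        fi
        # Any other _msg value: fail silently.
        return 1
    fi
    return 0
}
```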