This uses potentially insecure temporary files and is not referenced
anywhere else.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 4b914d7e217202f3d11a8e95f9f74bc17869475b)
Anecdotal evidence suggests that most nfsd RPC check failures are due
to cluster filesystem or storage problems. Apparently these are rarely
helped by attempting to restart the NFS service because the restart
tends to hang.
Fail after 2 nfsd RPC check failures, instead of waiting for 6
failures. Restart on every 10th failure to try to bring the node back
to good health.
Update unit tests to match.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit e9ef93f7b6dad59eabaa32124df81f3e74c651ef)
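The thresholds described in the commit above (e9ef93f7) can be sketched roughly as follows. This is only an illustration: the function name nfsd_failure_action and the action words it prints are hypothetical, and it assumes "fail after 2 failures" means the node is marked unhealthy once the count reaches 2.

```shell
#!/bin/sh
# Hypothetical helper (not from the CTDB source): print the action to
# take for a given number of consecutive nfsd RPC check failures.
nfsd_failure_action ()
{
    _count="$1"
    _action=""

    # Assumption: unhealthy once the 2nd consecutive failure is seen
    if [ "$_count" -ge 2 ] ; then
        _action="unhealthy"
    fi

    # Additionally attempt a restart on every 10th failure
    if [ "$_count" -gt 0 ] && [ $((_count % 10)) -eq 0 ] ; then
        _action="${_action:+${_action} }restart"
    fi

    echo "${_action:-none}"
}
```

For example, "nfsd_failure_action 10" prints both actions, while counts 2 to 9 only report the node as unhealthy.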
It should print the actual number of consecutive failures rather than
the limit.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit ff5f0d1e29af2b293e30cdc54bed03a644be7038)
This makes the gaps in the logs more obvious.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 11fbf4789d783dd0bac22754b374dd9ea4b03bad)
Passing "localhost" to the rpcinfo command causes overheads, like
reading /etc/services multiple times.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 1d61988af9e4fa3621a3e2d06a859bcb53df2d67)
Also add it to the corresponding eventscript unit test infrastructure.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit f4ef83a256f59eeb00b9a5bc10c28347e1ad1031)
While doing this:
* Explicitly assign RPC program and version information in
_nfs_check_rpc_common(). This is more lines of code but is easier
to read.
* Don't print the options when starting a service. Trying to print it
makes the code messy for little benefit.
Update the eventscript unit testing code and a Ganesha test to
reflect this.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit e8b531405665885196c95fe1608db33a255bf761)
They're hard to maintain and provide very little benefit.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 1a1be43f8466d46913dcdfe6dcedb94316cd28ad)
That is, /dev/null the "stop" output. This is consistent with the way
CTDB generally deals with the output when stopping a service.
It also makes updating the eventscript unit tests easier.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit c7332526b1b488abefeb4be78a7cd3f2f9abc451)
The CTDB daemon is not ready to accept clients in the INIT runstate
(init event). It starts accepting connections in the SETUP runstate
(setup event) and later.
Also, minor log formatting changes.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 81d7ce03b28d592a1337639e14d9ea141e20bfff)
On a cluster where the recovery lock file is not being used, asking the
CTDB daemon is unnecessary overhead. And if CTDB is using a recovery
lock file, then changing the configuration without restarting is *stupid*.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-Programmed-With: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 44eb86e6042adb6efe75d2a5528b82a0f21d496d)
This ensures that any invocation of the ctdb tool (within the wrapper)
gets the desired value. This at least ensures that ctdbd will be
started.
If a non-standard value is set for CTDB_SOCKET then command-line users
will still need the variable in their environment.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 37ccc7c6cc43a80aaa92291aea7a438f4225488a)
This avoids issuing multiple "ctdb killtcp" commands to terminate tcp
connections, one per connection. This will considerably reduce the
time when there is a large number of tcp connections. This also makes
it possible to avoid calling "ctdb killtcp" when there are no connections.
Add a couple of unit tests for killtcp and update the eventscript unit
test infrastructure to support them.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit a20d94717d2e4ab866d8a002cdf39c0669b74c6a)
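The batching idea in the commit above (a20d9471) might look something like the sketch below. The helper name kill_connections and the KILLTCP_CMD variable are hypothetical; the sketch assumes the command reads one "SRC:PORT DST:PORT" connection per line on standard input, and the command is overridable so the logic can be tried without a cluster.

```shell
#!/bin/sh
# Assumption: this command reads connections, one per line, on stdin.
KILLTCP_CMD="${KILLTCP_CMD:-ctdb killtcp}"

# Hypothetical helper: $1 is a newline-separated list of connections
kill_connections ()
{
    _conns="$1"

    # No connections, so avoid running the tool at all
    if [ -z "$_conns" ] ; then
        return 0
    fi

    # A single invocation handles every connection in the list
    printf '%s\n' "$_conns" | $KILLTCP_CMD
}
```

The point of the design is that the cost of starting the tool is paid once per event rather than once per connection.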
The timeout information printed by ctdbd is less than useful because
it refers to the cumulative time taken by the eventscripts run so far.
Adding scriptstatus output indicates where time was actually spent.
Since there is now quite a bit of output, serialise the calls to this
script using flock.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 1b016b2dfc5d7d3f2a42ce4dfe569608e90eb714)
A missing interface is at least as bad as an interface with a link
that is down so should have a similar effect.
This couldn't be done previously because orphaned interfaces used to
be listed for monitoring. This was worked around in 10.interface in
commit 49b2d1bd9554461ed8edbfc21e777c0eca9e1443 and fixed in ctdbd in
commit cc1a3ae911d3fee8b87fda5de5ab6d9499d7510a.
If $CTDB_PARTIALLY_ONLINE_INTERFACES="yes" then monitoring won't
actually fail but the interface is still marked as down.
While we're touching this code, use "ip link" instead of "ip addr".
It is marginally cheaper but not enough for a separate patch. ;-)
This effectively reverts d67955b42f7627be9dae995230c8fcbb8a948ec2.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 501f19b16fd6d67fbb754248868c38ee5bcf79ef)
This was previously changed because ctdbd didn't garbage collect
orphaned interfaces. This was fixed in commit
cc1a3ae911d3fee8b87fda5de5ab6d9499d7510a.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit c6ab0f9405d5fa5b0b1693bc92e59da0d555a9d7)
It can be very disconcerting when logging to syslog is expected but
nothing is being logged there.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 412bc0e20bef694d4e911dc9c984fd7716231f1f)
Based on an original patch by Sumit Bose <sbose@redhat.com>.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit e43a4b7b69a21c4cec2453dcac436b64bf5d7f06)
Currently the initscript is very complex. This makes it hard to read
and hard to add support for new init systems, such as systemd.
Create a wrapper called ctdbd_wrapper to be installed alongside ctdbd.
This is called by the initscript to start and stop ctdbd. It
constructs the ctdbd options and waits until ctdbd is properly
initialised before it exits.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit e3abc7eebab5cceddc4ce7817890dd5db9be3450)
This allows 60.ganesha to be unit tested, except for the core Ganesha
monitoring code.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit f606df4f2db754592e6d1a16c26e155cacb2beef)
Support for this was removed in commit
77302dbfd85754e02559eccb2dd6c090db0b6b9f and I overlooked its use in
60.ganesha.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 520914e7ee1b879c1080e5857fda18ed5b973fd6)
The "setup" event isn't called until ctdbd is in CTDB_RUNSTATE_SETUP
anyway...
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 9ea57af557028b1d2e5c560e7bcf4d014b9a8b1e)
This essentially reverts d4621277240721e6d130a930b0100506b64467ea.
This was added for testing but the test code was actually broken.
CTDB itself will only process public IPs if $CTDB_PUBLIC_ADDRESSES is
set, so no code should try to be more flexible than that!
The test code has been fixed instead.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 3b11b27f3e22e99947bc2d6c49c4427bd7a0e332)
It makes sense to do this in the "init" event and make the initscript
less complicated.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 3bc93f312b8464fbfa2b2c44fffedc591fe5a3e0)
It makes sense to do this in the "init" event and make the initscript
less complicated.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 0b77cceb49a30a181063adc7868d42d2851318e8)
Otherwise secondary addresses that aren't owned by CTDB could be
dropped.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 5ffce65a1ad659b198ddf647622b899bdde45c72)
Change all callers to maintain current behaviour.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 0b67397ef5419c781a35916575151da7b7e7cc27)
If some nfsd threads are still alive after a shutdown during a restart
then this indicates the maximum number of threads for which a stack
trace should be dumped. This can be useful for trying to determine
why nfsd is stuck.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 2503245db10d567af708a04edd3a3b488c24f401)
Consider the following example:
1. There are 256 nfsd threads configured.
2. 200 threads are "stuck" in system calls, perhaps waiting for the
underlying filesystem when an attempt is made to restart NFS.
3. 56 threads exit when NFS is stopped.
4. 56 new threads are started when NFS is started.
5. 200 "stuck" threads exit leaving only 56 threads running.
Setting this option to "yes" makes the 60.nfs monitor event look for
this situation and try to correct it.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 99b0d8b8ecc36dfc493775b9ebced54539c182d2)
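A check for the situation in the example above (commit 99b0d8b8) could be sketched as below. The helper name nfs_fix_thread_count is hypothetical and this is not the code from 60.nfs. It assumes the running thread count can be read from, and written back to, a single file; on Linux that is /proc/fs/nfsd/threads, and the path is a parameter here so the sketch can be run against an ordinary file.

```shell
#!/bin/sh
# Hypothetical helper (not the 60.nfs implementation).
# $1: configured number of nfsd threads
# $2: file holding the running count (assumed default shown below)
nfs_fix_thread_count ()
{
    _want="$1"
    _file="${2:-/proc/fs/nfsd/threads}"

    # If the count can't be read then don't guess
    [ -r "$_file" ] || return 0
    _running=$(cat "$_file")
    [ -n "$_running" ] || return 0

    # Fewer threads than configured: report it and ask for the full set
    if [ "$_running" -lt "$_want" ] ; then
        echo "Only ${_running} of ${_want} nfsd threads running, correcting"
        echo "$_want" >"$_file"
    fi
}
```

With the numbers from the example, a file containing 56 and a configured count of 256 triggers the correction.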
60.nfs and 60.ganesha touch $statd_update_trigger every time they're
run. This stops the statd-callout updates from ever being called.
Make this logic self-contained and move it to new function
nfs_statd_update() in the functions file. Call this in 60.nfs and
60.ganesha with the appropriate update period as the only argument.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reported-by: Poornima Gupte <poornima.gupte@in.ibm.com>
(This used to be ctdb commit 1b5968f6be084590667f4f15ff3bef13ed9a2973)
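The bug fixed in the commit above (1b5968f6) comes from refreshing the trigger on every run, so the update period never elapses. A minimal sketch of self-contained period logic follows. It is not the real nfs_statd_update(): the name statd_update_if_due, the stamp file argument and the trailing command arguments are all hypothetical, and it assumes a "date" that supports "+%s".

```shell
#!/bin/sh
# Hypothetical sketch of "run an update at most once per period".
# $1: update period in seconds
# $2: file recording when the update last ran
# remaining arguments: the update command to run
statd_update_if_due ()
{
    _period="$1"
    _stamp_file="$2"
    shift 2

    _now=$(date +%s)
    _last=""
    if [ -r "$_stamp_file" ] ; then
        _last=$(cat "$_stamp_file")
    fi
    _last="${_last:-0}"

    # Not due yet: leave the stamp untouched so the period can elapse
    if [ $((_now - _last)) -lt "$_period" ] ; then
        return 0
    fi

    # Due: record the time only now, then run the update
    echo "$_now" >"$_stamp_file"
    "$@"
}
```

The stamp is only rewritten when the update actually runs, which is the property the original code lacked.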
Every time a node that wasn't the NAT gateway master gets reconfigured
something like this appears in the log:
ctdbd: 11.natgw: Failed to del 10.0.1.139 on dev eth1
Since this usually fails it is better to mute the error than to have
it pollute the log.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 0ca7a98ffef50cbd06849cfbf65fb4a3d668b7bd)
This is needed for AIX and possibly others.
Also provide a cheaper mktemp function, which is needed in the
run_tests script.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit b2b572e9049c7138bd223226475bef8fe3e01f10)
The current code calls "ctdb setnatgwstate ..." on every event.
However, calling the ctdb tool in the "init" event is not permitted.
Instead, update the capability when it is needed and at regular
intervals via the "monitor" event.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 39a43feae7c7de07ddaf2d6cb962f923d47d0c19)
This adds more serialisation to the startup, ensuring that the
"startup" event runs after everything to do with the first recovery
(including the "recovered" event).
Given that it now takes longer to get to the "startup" state, the
initscript needs to wait until ctdbd gets to "first_recovery".
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit ed6814ff0a59ddbb1c1b3128b505380f60d7aeb7)
If one or more run states are specified then "ctdb runstate" succeeds
only if ctdbd is in one of those run states.
At the moment, if the "setup" event fails then the initscript succeeds
but ctdbd exits almost immediately. This behaviour isn't very
friendly.
The initscript now waits until ctdbd is in "startup" or "running" run
state via the use of "ctdb runstate startup running", meaning that ctdbd
has successfully passed the "setup" event.
The "setup" event code in 00.ctdb now waits until ctdbd is in the
"setup" run state before proceeding via the use of "ctdb runstate setup".
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 4a2effcc455be67ff4a779a59ca81ba584312cd6)
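Waiting on "ctdb runstate" as described in the commit above (4a2effcc) could be sketched like this. The function name wait_for_runstate, the timeout argument and the CTDB variable are hypothetical; the only behaviour taken from the commit message is that "ctdb runstate STATE..." succeeds when ctdbd is in one of the listed run states.

```shell
#!/bin/sh
# Overridable so the loop can be exercised without a running ctdbd
CTDB="${CTDB:-ctdb}"

# Hypothetical helper.
# $1: timeout in seconds, remaining arguments: acceptable run states
wait_for_runstate ()
{
    _timeout="$1"
    shift

    _count=0
    while [ "$_count" -lt "$_timeout" ] ; do
        # Succeeds only if ctdbd is in one of the given run states
        if $CTDB runstate "$@" >/dev/null 2>&1 ; then
            return 0
        fi
        _count=$((_count + 1))
        sleep 1
    done

    # Timed out without reaching any of the requested run states
    return 1
}
```

An initscript might then call "wait_for_runstate 10 startup running" and report failure if it returns non-zero; the 10 second timeout is purely illustrative.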
This makes it easier to add notification handlers.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit d29e9a420b133088bf23a847c8d1dbce56c25eb0)
fff88940f71058e4eefd65f50a6701389c005c17 introduced a regression.
Without $service_name set by default, the CTDB configuration is no
longer loaded when loadconfig() is called without any arguments.
That's bad.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit f1619a36c1beba11533052dc5728fa3adaa08870)
No longer used, support removed from test infrastructure.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 0eb351ff4c7ee096de7c5e0a59561067091fa32e)
* New directory nfs-rpc-checks.d/ replaces hardcoded rules in 60.nfs
* Installation and packaging additions to handle nfs-rpc-checks.d/
* Unit test updates, including deleting 1 test that sanity checked
test infrastructure
* Test infrastructure changes to use nfs-rpc-checks.d/
Note that this removes support for $CTDB_NFS_SKIP_KNFSD_ALIVE_CHECK in
60.nfs. To get the equivalent behaviour, edit 20.nfsd.check and
remove/comment all lines.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 7e792d6768d9ca420ce3713cb122e63afd594b15)
Want nfs_check_rpc_services() to support filenames without the 'k'.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit d9775fcbd6e30eef8382bea68e2f9bad2309f2c1)
This is intended to replace nfs_check_rpc_service(), which builds
configuration into eventscripts.
nfs_check_rpc_services() uses a directory of configuration checks that
can be edited by an administrator. The files have one limit check and
a set of actions per line. The program name is extracted from the
file name.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 9bc8fbee6550ed2814fb35c70d57fab21ef1b8fd)
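Extracting the program name from a check file name, as mentioned in the commit above (9bc8fbee), might be done with plain parameter expansion. The helper name rpc_check_progname is hypothetical, and the sketch assumes file names of the form NN.PROGRAM.check, matching the 20.nfsd.check example mentioned elsewhere in this log. The per-line format of the files is not shown here because the log does not specify it.

```shell
#!/bin/sh
# Hypothetical helper: derive the RPC program name from a check file
# name, assuming the form NN.PROGRAM.check (e.g. 20.nfsd.check).
rpc_check_progname ()
{
    _f="${1##*/}"        # strip any leading directory
    _f="${_f#[0-9]*.}"   # strip the numeric ordering prefix ("20.")
    _f="${_f%.check}"    # strip the ".check" suffix
    echo "$_f"
}
```

For example, "rpc_check_progname nfs-rpc-checks.d/20.nfsd.check" prints "nfsd".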
This creates new function _nfs_check_rpc_common().
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit cc3bb42e48bbdabd19187c231846b98589b4f4f3)