IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
If lock_request could not be allocated, free lock_ctx since there can
only be a single lock_request per lock_ctx.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Signed-off-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Wed Mar 26 06:24:01 CET 2014 on sn-devel-104
Extend CTDB_NATGW_STATIC_ROUTES so that each network can have an
optional gateway that overrides CTDB_NATGW_DEFAULT_GATEWAY.
Signed-off-by: Martin Schwenke <martin@meltin.net>
This has been implied since the command to add the route has had
errors redirected to /dev/null. If infrastucture (e.g. ADS, DNS) is
on the same network as CTDB_NATGW_PUBLIC_IP then no route is
necessary.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Although the dots in $CTDB_NATGW_PUBLIC_IP could probably only help
match an invalid public IP address, this is only executed once so do
as exact a check as possible.
Use CTDB_BASE instead of hardcoding /etc/ctdb.
Make the error message less redundant.
Signed-off-by: Martin Schwenke <martin@meltin.net>
delete_all() really needed renaming for clarity. While doing this,
might as well rename some of the others that don't start with
"natgw_".
Signed-off-by: Martin Schwenke <martin@meltin.net>
NAT gateway really can't operate unless most of the configuration
variables are set.
A check in delete_all() can be removed - strange that this isn't also
done in the add case.
Signed-off-by: Martin Schwenke <martin@meltin.net>
This includes adding support for:
* Configuring fake NATGW state in the eventscript unit tests
* "natgwlist" and "setnatgwstate" in ctdb command stub
* ip command stub to default to "main table" when no table specified,
allow routes to be added without "dev" option (just add a default
dev), support "metric" option
Signed-off-by: Martin Schwenke <martin@meltin.net>
Previous commits maintained the ordering between
ctdb_remove_orphaned_ifaces() and ctdb_vnn_unassign_iface(). This
meant that ctdb_remove_orphaned_ifaces() needed to steal the orphaned
interfaces and they would be freed later.
Unassign the interface first and things get simpler.
ctdb_remove_orphaned_ifaces() is now self-contained.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Sun Mar 23 06:20:43 CET 2014 on sn-devel-104
reloadips really expects deleted IPs to be released before completing.
Otherwise the recovery daemon starts failing the local IP check. The
races that follow can cause a node to be banned.
To make the error handling simple, do the actual deletion in
release_ip_callback().
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
It is hard to diagnose failures in the NFS tickle test because there's
no way of telling if the test node doesn't have the tickle or if it
didn't get propagated.
Factor out check_tickles() into local.bash and give it some
parameters.
Have the NFS test call it first to ensure the tickle has been
registered. Then use new function check_tickles_all() to ensure the
tickle has been propagated to all nodes. Give this a bit of extra
time (double the timeout) just in case we're racing with the update.
Add a useful comment to the CIFS test so that I stop asking myself how
the test could ever have worked reliably. :-)
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Commit 176ae6c704 caused these functions
to exit on failure. This is incorrect and broke NAT gateway.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This is racy and cbffbb7c2f makes it
unnecessary.
The eventscript code still knows that monitor events are special
compared to other events. However, the general concept of monitoring
is no longer tangled up with running scripts.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
"statd-callout notify" currently complains until an add-client or
del-client is done.
Given that we might use ctdb.tdb for something else in the future it
makes sense attach to it in the "startup" event. This could be done
in the background but it should be so lightweight that a timeout will
indicate serious problems.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Commit 0723fedced added a cheap
implemention of ctdb_control_startup() that simply flags the recipient
node as needing to send updates for each IP when the tickle update
loop next fires. Commit 026996550d
ensures that a node only sends tickle updates once being flagged to do
so.
CTDB_CONTROL_STARTUP is broadcast to all nodes, so this is a good
start. However, the tickle updates are only broadcast to connected
nodes. A recently started node may not yet be considered to be
connected because the keepalive monitoring loop may not yet have
marked the node as connected. This means that the tickle update loop
races with the keepalive monitoring loop. If the tickle update loop
wins then updates will not be sent to the recently started node.
The simplest improvement is to stop the tickle update from depending
on whether a node is connected or not. So instead of broadcasting
tickle updates to connected nodes, they are broadcast to all nodes.
Since no reply is expected, this should work just fine.
While looking at this code, ctdb_ctrl_set_tcp_tickles() is named like
a client function. It isn't a client function. Also, 2 of the
arguments are ignored. So rename this function to
ctdb_send_set_tcp_tickles_for_ip() and remove the ignored arguments.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
This way this logic is centralised. It also means that the IP address
comparisons in the NAT gateway code are IPv6 safe.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
There is a check for empty lines in the loop just below.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Factor it from read_nodes_file(). Use it there and in
read_natgw_nodes_file().
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Tests for xpnn need to implement a stub for ctdb_sys_have_ip(). The
cheapest way of doing this is to read a fake nodemap using the
existing code and check if the IP of the "current" node is the one
being asked about. However, the fake state initialisation isn't
currently available to without_daemon commands because it is meant to
represent daemon state. However, it can be made available by moving
the relevant code into a new stub for tevent_context_init(). The stub
still needs to initialise a tevent context - this can be done by
calling a lower level function.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Add another filter function, like the ones for capabilities and flags
to, for filtering by NAT gateway nodes. This makes the main
natgw_list function more readable.
Note that this drops the early filtering of disconnected nodes, so
they will now be listed in a NAT gateway group. This makes sense.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The index of the nodes array in nodemap isn't the PNN.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Instead of just finding the first node that doesn't have any flags in
flag_mask set, change it into a function that filters a nodemap to
exclude nodes with the given flags.
This makes the NATGW code simpler but also provides a function that
can be used in other code.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Check capabilities once to build a filtered node list instead of
repeatedly checking capabilities
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This looks to have got left behind a long time ago when things got
moved around...
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
It looks like the original without_daemon code still tried to
establish a client connection to the daemon. Closing stderr looks to
be a cheap way of hiding the errors when this failed.
However, later cleanups avoid the client connection altogether, so do
not close stderr. Now debug output from without_daemon commands
actually appears.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The commit "pmda: Use upstream assigned PCP domain id" updated the
Performance Metrics Namespace (pmns) file, without changing the
corresponding metric identifiers used by the agent.
This change fixes the agent metric identifier values to match the pmns
definitions.
Signed-off-by: David Disseldorp <ddiss@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
Autobuild-User(master): Jeremy Allison <jra@samba.org>
Autobuild-Date(master): Sat Mar 22 00:07:31 CET 2014 on sn-devel-104
when bumping skipped, decrement left, so the sum is correct
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Thu Mar 6 03:32:33 CET 2014 on sn-devel-104
We need to have left records == 0 at the end of the delete list processing.
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Failure in traversal of the DB should not
prevent further processing.
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
We should try to continue vacuuming as much as possible.
Failure to send records to one lmaster doesn't mean the
others will fail too.
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
we know anyways the record to store is empty at this point.
So skip pointer calculations.
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
With the new vacuuming, we consider it an error if there are
records left for deletion after processing the various lists.
All records that can be deleted should have been deleted by
tdb_delete calls.
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
I.e. no number of records found to delete will trigger the
repacking.
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This describes more precisely what it actually is, nowadays.
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This lets the "fast vacuum" delete queue traverse do the actual work.
On the positive side, we note that this lets the "full vacuuming"
treat the records that have never been migrated with data correctly.
These had previously been added to the delete list for complicated
cross-node deletion instead of directly deleting them.
On the other hand side, there might be a slight overhead
since the records are read again in the delete queu traverse,
but this is OK because this change is in preparation of
untangling the db traverse altogether from the vacuum run,
making it independent.
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This in preparation of modifying the db traverse to
fill the delete_queue that is processed by the fast
vacuum run, instead of filling the same lists as the
fast vacuum run for further processing.
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
I.e. when RepackLimit is set to 0, no size of the freelist
should trigger a repack in vacuuming.
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
CTDB tracks connections to be able to send tickle ACKs and gratuitous
ARPs. When there are no public IPs, there is no need for tickle ACKs
and gratuitous ARPs.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Tue Mar 4 03:01:38 CET 2014 on sn-devel-104
Fix suggested by by Kevin Osborn <kosborn@overlandstorage.com>.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu Feb 27 13:54:59 CET 2014 on sn-devel-104
tcp_update_flag is set to true whenever tickles are added or deleted.
This flag is used to determine whether or not to send tickles list to
other nodes. Once tickles list is sent to other nodes successfully,
set tcp_update_flag to false, so ctdbd does not keep sending same tickles
list every TickleUpdateInterval (20 seconds).
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
This doesn't implement what was recommended. That would require
careful error handling, probably with a fallback to this code anyway.
This is simple and does no worse that the current code. That is, the
new node is updated on the next call to tdb_update_tcp_tickles().
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This fixes ctdb crash reported in bug #10366.
Fix suggested by Kevin Osborn <kosborn@overlandstorage.com>.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
This feature was added quite a while ago but was not enabled by
default. It is a useful feature so enable it to dump stack traces of
up to 5 stuck processes by default.
This can be disabled by setting:
CTDB_NFS_DUMP_STUCK_THREADS=0
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Feb 25 04:06:45 CET 2014 on sn-devel-104
This comment was true when 50.samba was spaghetti because it tried to
automatically manage both smbd (and nmbd) and winbind. It isn't true
anymore.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Wed Feb 19 04:07:12 CET 2014 on sn-devel-104
* Should stop on 1st error
* Fix up value of CTDB_TESTS_ARE_INSTALLED
* Improve fixing of broken symlinks in INSTALL
This is all of the links in tests/eventscript/etc-ctdb/ so no need
to list them. Just find and fix them.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
* Remove unnecessary candimbl parameter.
This parameter can be cheaply calculated in
lcp2_failback_candidate(). The compiler will probably do an
excellent job optimising it. :-)
* Clarify a debug statement
This is much clearer than doing a complex recalculation of a known
value.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Currently this can be checked many times. However, there's no point
calling the rebalance/failback code at all if there are no rebalance
candidates.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
* Add stack dumps for "interesting" processes that sometimes get
stuck, so try to print stack traces for them if they appear in the
pstree output.
* Add new configuration variables CTDB_DEBUG_HUNG_SCRIPT_LOGFILE and
CTDB_DEBUG_HUNG_SCRIPT_STACKPAT. These are primarily for testing
but the latter may be useful for live debugging.
* Load CTDB configuration so that above configuration variables can be
set/changed without restarting ctdbd.
Add a test that tries to ensure that all of this is working.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The fast vacuum run may have increased the freelist size.
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Feb 14 03:15:30 CET 2014 on sn-devel-104
In the first case, reconfiguration can longer happen in a monitor
event, so this is no longer a problem. Drop it.
Running a monitor event by hand no longer cancels the existing monitor
event. Instead the hand-run event fails. So do this differently and
just wait for a monitor event before continuing.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Thu Feb 13 04:05:57 CET 2014 on sn-devel-104
srcimbl gets changed on every iteration of the loop. The value that
should be stored for the new imbalance of the source node is
minsrcimbl.
To help diagnose this, added some extra debug that can be left in.
The extra debug changes the output of a couple of tests. Note that
the resulting IP allocations in those tests is unchanged - only the
debug output is changed.
Also add some new tests that illustrates the bug.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This adds a lot of IPs (currently 100) in a new network and deletes
them in a few steps. First the primary is deleted and then a check is
done to ensure that the remaining IPs are all correct. Then about 1/2
of the IPs and deleted and remaining IPs are checked. Then the
remaining IPs are deleted and a check is done to ensure they are all
gone.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Just enable this behaviour by default in the ip command stub, since
10.interface assumes/sets it. The rc.local replacement for set_proc()
doesn't do anything...
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
If a primary IP address is being deleted from an interface, the
secondaries are remembered and added back after the primary is
deleted. This is done under a lock shared by the add/del script code.
It is necessary because, by default, Linux deletes secondaries when
the corresponding primary is deleted.
There is a race here between ctdbd and the scripts, since ctdbd
doesn't know about the lock. If ctdbd receives a release IP control
and the IP address is not on an interface then it is regarded as a
"Redundant release of IP" so no "releaseip" event is generated. This
can occur if the IP address in question is a secondary that has been
temporarily dropped. It is more likely if the number of secondaries
is large.
Since Linux 2.6.12 (i.e. 2005) Linux has supported a
promote_secondaries option on interfaces. This option is currently
undocumented but that will change in Linux 3.14. With
promote_secondaries enabled the kernel will not drop secondaries but
will promote a corresponding secondary instead. The kernel does all
necessary locking.
Use promote_secondaries to simplify the code, avoid re-adding
secondaries, avoid re-adding routes and provide improved performance.
This could be done conditionally, with a fallback to legacy
secondary-re-adding code, but no supported Linux distribution is
running a pre-2.6.12 kernel so this is unnecessary.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
If CTDB_DEUB_HUNG_SCRIPT is set, use that instead of the default
debug script. This code was dropped by mistake in commit
18c1f43210.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Wed Feb 12 08:47:47 CET 2014 on sn-devel-104
This adds new files for Ganesha's recovery. myreleaseip_* are used by
the recovery thread on the node where IP is released. The releaseip_*
and tekeip_* files are used by recovery thread where IP is taken over.
Signed-off-by: Srikrishan Malik <srimalik@in.ibm.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Fri Jan 31 07:52:46 CET 2014 on sn-devel-104
If event script does not exist or does not have execute permissions, then
return negative errno to distinguish from the exit errors of event script.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
ctdb_get_my_public_addresses() attempts to echo things and this causes
an error if head has taken the first line and the pipe is closed.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Jan 31 05:30:38 CET 2014 on sn-devel-104
It should support primary and secondaries per network instead of per
interface.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Srikrishan Malik <srimalik@in.ibm.com>
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu Jan 30 11:18:19 CET 2014 on sn-devel-104
Instead of using RB tree for sorting the script names (incorrectly since
it's only using the leading numbers in the script name), use scandir
with alphasort.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Tue Jan 21 06:41:25 CET 2014 on sn-devel-104
Any currently running monitor events are cancelled if any other events
are scheduled. However, this does not stop monitor events to be run
when other events are already running.
Keep track of the number of active events and schedule monitor event
only if there are no active events.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Services can be flagged for reconfigure when they release IPs at
shutdown. The flag is never removed and the service is prematurely
reconfigured during the first "ipreallocated" event, before any IPs
are hosted and before the "startup" event has actually started the
services.
$CTDB_VARDIR/state directly contained the service state subdirectories
and is already removed in the "init" event. Just push the service
state subdirectories down a level and put everything else in a
subdirectory.
This way all the eventscript state gets cleaned up every time CTDB
starts up.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Jan 17 09:58:26 CET 2014 on sn-devel-104
At the moment ctdb_check_healthy() is overloaded to wait until the
first recovery is complete, handle the "startup" event and also
actually handle monitoring. This is untidy and hard to follow.
Instead, have the daemon explicitly wait for 1st recovery after the
"setup" event. When first recovery is complete, schedule a function
to handle the "startup" event. When the "startup" event succeeds then
explicitly enable monitoring.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Currently the lock is held until the corresponding eventscript
completes, since the process still exists. If the regular part of an
eventscript hangs then the lock might unnecessarily be held for a long
time. The pathological case is when a monitor event gets stuck in
D-wait state and the script times out but can't be killed so the lock
is still held. This can cause an unwanted monitor replay.
Change this so that the lock is released immediately after the
reconfiguration is complete.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Failure might be expected when disabling takeover runs on banned
nodes, since they might be suffering from performance problems or
similar. More broadly, administrators who reconfigure a cluster that
isn't in a happy state aren't necessarily doing something sensible.
However, allowing takeover runs to be disabled on inactive nodes stops
reconfiguration of stopped nodes. This is probaby an unreasonable
limitation, so drop it.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Currently timeouts for controls to inactive nodes can cause banning
credits to be applied. This should not happen.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This function has been replaced with ctdb_vfork_with_logging().
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu Jan 16 04:05:35 CET 2014 on sn-devel-104
This will be used to spawn lightweight helper processes to run
eventscripts.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
This was added to support external monitoring using CTDB event scripts.
However, it was never used.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
"monitor" events can be cancelled. If a reconfigure action does a
service restart then the "monitor" event can be cancelled at the
inconvenient moment after the service is stopped. In this case the
service stays down and the node may become unhealthy (depending on
whether there are any repair actions in the monitor event).
A long time ago we did service reconfiguration in "monitor" events
following failovers. Service reconfiguration was then moved to the
"ipreallocated" event. However, reconfiguration in "monitor" events
has been kept as a last resort in case an "ipreallocate" event does
not occur. The only important case that this covers is "ctdb
deleteip", where "releaseip" events are generated without a
corresponding "ipreallocated". Therefore, IPs can be deleted without
running the required service reconfiguration.
The supported way of removing IP addresses is now via "ctdb
reloadips", which always causes a takeover run with a corresponding
"ipreallocate" event.
This means that service reconfiguration in "monitor" events is no
longer required and should be removed because it is unsafe.
Also update the associated tests. Make the first confirm that the
monitor event no longer does reconfiguration. Change the others to
test that monitor status is correctly replayed when something else is
doing a reconfigure and currently holds the reconfigure lock.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Dec 17 06:32:35 CET 2013 on sn-devel-104
autogen is already run in maketarball.sh which generates
the tarball for the RPM.
This way, we don't have a rpm build dependency on autoconf.
Recent changes introduced a dependency into autoconf
version >= 2.60, so this fix allows the generated
source RPM to be built also on older platforms.
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Mon Dec 9 05:47:00 CET 2013 on sn-devel-104
explain how to run individual tests and test collections and remove mention of
tests/scripts/run_tests which does not exist any more.
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
At the moment run_tests.sh has quite fragile argument processing. It
needs that annoying "--" between options and tests. The random
default (mktemp -d) for TEST_VAR_DIR is wrong and is worked around in
various places.
Instead:
* Change the default behaviour to print a summary, add new option -N
to turn off summary, and remove old -s option.
* Change the default behaviour to run integration tests with local
daemons, add new options -c to run on a cluster, remove old -l
option.
* Make $testdir/var the default if the tests are not installed, and
$(mktemp -d ) the default if tests are installed.
* Move the default tests for local/cluster into scripts/run_tests.
run_tests.sh (and the run_cluster_tests.sh symlink) should behave as
before but with slightly more reasonable defaults.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
Don't scatter the TEST_LOCAL_DAEMONS logic around the code. Limit it
to the local daemons file.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
setup_ctdb() doesn't need to do anything on a cluster. To avoid a
conditional, just override it for local daemons.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
This was the start of some refactorisation that was never completed.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
This hasn't been required for a long time and is probably broken. If
it is needed in future then we know where to find it.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
This is just a straight move. The clever stuff will follow. :-)
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
This currently requires an eventscript to be dynamically installed.
This eventscript is only used to help determine when a monitor event
has occurred. This code is horrible and fragile.
A better way is to just monitor the output of "ctdb scriptstatus".
When changes it changes then a monitor event has occurred.
Also remove the old code that checks for tickle information in shared
storage. CTDB hasn't done things this way for a long time.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
This case of "ip link show" does not break autobuild with
"Broken pipe" messages, but let's be consistent.
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu Nov 28 09:23:03 CET 2013 on sn-devel-104
This removes the requirement to create a temporary file
and hence makes this test runnable against local daemons
and against a real cluster without further changes.
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This fixes running "make autotest" from autobuild, since
it prevents irritating error output in delete_ip_from_iface()
when calling ip addr list ... | grep -Fq "inet ..." .
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This makes scripts called in the unit tests behave like
when called from ctdbd which ignodes SIGPIPE.
This also makes the scrips behave the same when
called from "make autotest" directly and via autobuild (python).
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
To test CTDB on a cluster we need to be able to build test RPMs with
relatively sane version numbers. This is a minimal change to allow
that to happen, until CTDB is integrated into the Samba build system.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Otherwise this should use mktemp, something should look at the output
and the file should be removed. :-)
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
Autobuild-User(master): Michael Adam <obnox@samba.org>
Autobuild-Date(master): Wed Nov 27 20:39:00 CET 2013 on sn-devel-104
This needs to run with socket wrapper and needs to stop after the 1st
error.
Pair-Programmed-With: Michael Adam <obnox@samba.org>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Martin Schwenke <martin@meltin.net>
Signed-off-by: Michael Adam <obnox@samba.org>
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Using a variable is too fragile, so use a function instead.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
Also match $TEST_VAR_DIR in the socket name. This means that we'll
only ever kill ctdbd process belong to our own test run.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
If these configuration variables are not defined, then there should
a default fallback. This is a workaround till CTDB compile time
configuration can be accessed at runtime.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
If NFS RPC checks do restart Ganesha, then it's possible that share
check can fail prematurely.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
If a node isn't numeric then it is silently converted to 0.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
Otherwise new requests can come in during the latter parts of the
takeover run when the IP allocation algorithm has already run, and the
new requests will be dequeued even though they haven't really be
processed.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
If $statd_state is empty then the loop will run once and print
spurious errors.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
This fixes an alignment discrepancy on 32-bit vs 64-bit platforms.
sizeof(struct ctdb_ltdb_header) = 20 (32-bit)
= 24 (64-bit)
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
* Low level DB checks should ignore the sequence number record.
* A restart is needed after messing with the RecoverPDBBySeqNum
tunable.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
Otherwise recovery ends up done by RSN when it is unnecessary.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
This code can then be used to track child processes created with vfork().
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
This prevents spamming of logs if multiple lock requests are waiting
and keep timing out.
Also, improve the logging format with separators.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
When a child process is created for a lock request, the current locks
statistics should be updated immediately. This will provide accurate
information on number of active lock requests.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
This limit was currently a global limit and not per database. This
prevents any database freeze lock requests from getting scheduled if
the global limit was reached.
Only individual record requests should be limited and database freeze
requests should always get scheduled.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
This is naive and assumes no performance problems when updating
persistent DBs. It also does no error handling.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
This fixes the issue of transaction commit failing due to an empty
__db_sequence_number__ record in persistent database left by previous
cancelled transaction.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
Also:
* More <refentryinfo> above <refmeta> to make the XML valid.
* Describe DB argument in introduction and use it for database
commands.
* Remove unnecessary format="linespecific" from <screen> tags, since
it will not be allowed in DocBook 5.0.
* Sort the items in "INTERNAL COMMANDS".
* Update/simplify some command descriptions.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
Also add test.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
This can be useful for piping data to onnode in certain circumstances.
There are now also enough command-line options that they should
definitely be alphabetically ordered.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
Currently it only passes the last (non -v) option seen. It should
pass them all.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
When running a mixed version cluster, compatibility with older
versions was was broken during recent refactorisation.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
That is, don't use fixed paths.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
This also happens earlier in do_recovery() and the nodemap is not
updated after that, so this update is redundant.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
$CTDB_TEST_WRAPPER is required only to run test functions or test binaries
on remote nodes. For running ctdb command, $CTDB is sufficient.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
Autobuild-User(master): Michael Adam <obnox@samba.org>
Autobuild-Date(master): Tue Nov 19 19:06:51 CET 2013 on sn-devel-104