IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Parallel database recovery freezes databases in parallel and irrespective
of database priority. So drop priority from freeze/thaw code.
Database priority will be dropped completely soon.
Now FREEZE and THAW controls operate on all the databases.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
If the database is not frozen and recovery mode is not active, then
vacuuming can continue.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
This makes it easier to drop unused async implementation of the same.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Databases should never be thawed manually. A database recovery will
correctly thaw all databases. Otherwise there is a bug in the database
recovery.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
CTDB web pages are now tracked in a separate repo. Most of the pages
are outdated and will be removed soon with links to the wiki.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
CTDB has a lot of scripts and shellcheck can be useful to find dubious
scripting practices.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Thu Jul 21 06:06:49 CEST 2016 on sn-devel-144
This helps unit tests locate CTDB's scripts. It is being created to
support some new shellcheck unit tests.
Hopefully, it can also be used to simplify some of the symlink
complexity in some other unit tests suites, such as eventscripts.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
SC2006: Use $(..) instead of legacy `..`.
Make the obvious changes to $(...) but convert some to $((...)).
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
SC2124: Assigning an array to a string!
Assign as array, or use * instead of @ to concatenate.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
SC2059: Don't use variables in the printf format string.
Use printf "..%s.." "$foo".
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
SC2012: Use find instead of ls to better handle non-alphanumeric filenames.
Make this cope better with unexpected whitespace.
Unfortunately, this results in shellcheck warning:
SC2035: Use ./*.tdb.* so names with dashes won't become options.
No! Then stat(1) will print ./file.tdb.X. We want the basenames and
we know the filenames don't start with dashes.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
SC2012: Use find instead of ls to better handle non-alphanumeric filenames.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
SC2038: Use -print0/-0 or -exec + to allow for non-alphanumeric filenames.
The suggested options aren't POSIX-compliant. This is more portable.
Base filenames can't have whitespace so rework to avoid problems with
whitespace in directory name.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
SC2039: In POSIX sh, ulimit -c/-n is not supported.
Have shellcheck suppress the warnings. If -n is not supported then
don't set CTDB_MAX_OPEN_FILES. If packaging for a platform where -c
is not supported then remove this code and associated documentation.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
SC2039: In POSIX sh, -nt is not supported.
This script is specific to the Linux NFS implementation. The -nt
operator is well supported in Linux shells (e.g. dash, bash, ksh).
The alternatives (e.g. using stat(1)) would result in less readable
code.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
SC2094: Make sure not to read and write the same file in the same pipeline.
The semantics here are unclear, so use a separate flock file in each
case for clarity.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
SC2039: In POSIX sh, echo flags are not supported.
echo -n is well supported but the changes are simple.
Improve some logic, replace some instances with printf. Who knew
printf was in POSIX?
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
SC2039: In POSIX sh, 'type' is not supported.
type is commonly supported and is more portable than which(1).
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
SC2015: Note that A && B || C is not if-then-else. C may run when A is
true.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
SC2119: Use FUNC "$@" if function's $1 should meanscript's $1.
SC2120: FUNC references arguments, but none are ever passed.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
SC2017: Increase precision by replacing a/b*c with a*c/b.
This code intentionally rounds to an even value.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
SC1004: You don't break lines with \ in single quotes, it results in
literal backslash-linefeed.
These don't hurt, since awk can cope with the continuations. However,
they don't add anything.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
SC2154: VAR is referenced but not assigned.
Change ctdb_setup_service_state_dir(), ctdb_get_pnn() and
ctdb_get_ip_address() to print the value so it can be assigned to a
variable. The performance gain from avoiding the sub-shells when
calling these functions is close to zero.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
SC2046: Quote this to prevent word splitting.
SC2086: Double quote to prevent globbing and word splitting.
Add some quoting where it makes sense. Use shellcheck directives for
false-positives.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
SC2034: VAR appears unused. Verify it or export it.
Drop some variables that are unnecessarily used. Use shellcheck
directive for false-positives.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
SC2030: Modification of VAR is local (to subshell caused by (..) group).
SC2031: VAR was modified in a subshell. That change might be lost.
Fix a related, incorrect comment.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
SC2016: Expressions don't expand in single quotes, use double quotes for that.
Error messages are now arguably more readable.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
It is no longer used and adds needless complexity.
As a side-effect, the functions file can now be parsed by shellcheck.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This makes the logic more obvious.
Fix the (probably) accidental fall-through to the regular monitor
failure.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
* Re-indent case labels as per new script style
Other indentation can be tweaked later as code changes, but the
labels are an obvious bulk change.
* Minor whitespace fixes
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
It doesn't do anything. Add a comment to its definition to explain
why it is still there.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Ensures that backups of corrupt TDB files are correctly limited in
number.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Tweak eventscript unit test infrastructure to support.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
If there are insufficient arguments then they can't be shifted.
This function will be removed shortly. However, it needs to work for
now as tests will be added that depend on it to work.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
==27786== Syscall param write(buf) points to uninitialised byte(s)
==27786== at 0x62820D0: __write_nocancel (syscall-template.S:84)
==27786== by 0x428B57: ctdb_queue_send (ctdb_io.c:322)
==27786== by 0x41F3B1: ctdb_client_queue_pkt (ctdb_client.c:153)
==27786== by 0x41F3B1: ctdb_client_send_message (ctdb_client.c:603)
==27786== by 0x419FA3: srvid_broadcast.constprop.26 (ctdb.c:1965)
==27786== by 0x41B869: control_reload_nodes_file (ctdb.c:5696)
==27786== by 0x404DBA: main (ctdb.c:6008)
==27786== Address 0x7ead310 is 144 bytes inside a block of size 168 alloc'd
==27786== at 0x4C2BBCF: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==27786== by 0x564DBEC: __talloc_with_prefix (talloc.c:675)
==27786== by 0x564DBEC: __talloc (talloc.c:716)
==27786== by 0x564DBEC: _talloc_named_const (talloc.c:873)
==27786== by 0x564DBEC: _talloc_zero (talloc.c:2318)
==27786== by 0x41E1E2: _ctdbd_allocate_pkt (ctdb_client.c:59)
==27786== by 0x41F37D: ctdb_client_send_message (ctdb_client.c:594)
==27786== by 0x419FA3: srvid_broadcast.constprop.26 (ctdb.c:1965)
==27786== by 0x41B869: control_reload_nodes_file (ctdb.c:5696)
==27786== by 0x404DBA: main (ctdb.c:6008)
==27786==
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
State is stolen onto tmp_ctx above so can't be referenced after
tmp_ctx is freed. So, state->status has to be looked at earlier.
Moving it immediately before the talloc_free(tmp_ctx) isn't sufficient
because invoking the callback appears to cause a recursive call to
ctdb_control_recv(), which also frees state.
Referencing it at the top seems safe.
==23982== Invalid read of size 4
==23982== at 0x4204AE: ctdb_control_recv (ctdb_client.c:1181)
==23982== by 0x420645: invoke_control_callback (ctdb_client.c:971)
==23982== by 0x5E675EC: tevent_common_loop_timer_delay (tevent_timed.c:341)
==23982== by 0x5E68639: epoll_event_loop_once (tevent_epoll.c:911)
==23982== by 0x5E66BD6: std_event_loop_once (tevent_standard.c:114)
==23982== by 0x5E622EC: _tevent_loop_once (tevent.c:533)
==23982== by 0x4255F7: ctdb_client_async_wait (ctdb_client.c:3385)
==23982== by 0x42578A: ctdb_client_async_control (ctdb_client.c:3442)
==23982== by 0x41B405: ctdb_get_nodes_files (ctdb.c:5488)
==23982== by 0x41B405: check_all_node_files_are_identical (ctdb.c:5530)
==23982== by 0x41B405: control_reload_nodes_file (ctdb.c:5673)
==23982== by 0x404DBA: main (ctdb.c:6008)
==23982== Address 0x7e98d9c is 108 bytes inside a block of size 168 free'd
==23982== at 0x4C2CDFB: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==23982== by 0x5652692: _tc_free_internal (talloc.c:1125)
==23982== by 0x5652692: _tc_free_children_internal (talloc.c:1570)
==23982== by 0x564B952: _tc_free_internal (talloc.c:1081)
==23982== by 0x564B952: _talloc_free_internal (talloc.c:1151)
==23982== by 0x564B952: _talloc_free (talloc.c:1693)
==23982== by 0x4204C9: ctdb_control_recv (ctdb_client.c:1182)
==23982== by 0x4207AA: async_callback (ctdb_client.c:3350)
==23982== by 0x4204AD: ctdb_control_recv (ctdb_client.c:1179)
==23982== by 0x420645: invoke_control_callback (ctdb_client.c:971)
==23982== by 0x5E675EC: tevent_common_loop_timer_delay (tevent_timed.c:341)
==23982== by 0x5E68639: epoll_event_loop_once (tevent_epoll.c:911)
==23982== by 0x5E66BD6: std_event_loop_once (tevent_standard.c:114)
==23982== by 0x5E622EC: _tevent_loop_once (tevent.c:533)
==23982== by 0x4255F7: ctdb_client_async_wait (ctdb_client.c:3385)
==23982== Block was alloc'd at
==23982== at 0x4C2BBCF: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==23982== by 0x564DBEC: __talloc_with_prefix (talloc.c:675)
==23982== by 0x564DBEC: __talloc (talloc.c:716)
==23982== by 0x564DBEC: _talloc_named_const (talloc.c:873)
==23982== by 0x564DBEC: _talloc_zero (talloc.c:2318)
==23982== by 0x42017F: ctdb_control_send (ctdb_client.c:1086)
==23982== by 0x425746: ctdb_client_async_control (ctdb_client.c:3431)
==23982== by 0x41B405: ctdb_get_nodes_files (ctdb.c:5488)
==23982== by 0x41B405: check_all_node_files_are_identical (ctdb.c:5530)
==23982== by 0x41B405: control_reload_nodes_file (ctdb.c:5673)
==23982== by 0x404DBA: main (ctdb.c:6008)
==23982==
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
The point of this code is almost certainly to return non-zero when
state->errormsg is set. So, return state->status if non-zero, -1
otherwise.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
This avoids relevant shellcheck warnings. This is most of the
shellcheck low hanging fruit in the non-test code. Many of the other
warnings produced by shellcheck are either false positives, are
non-trivial to fix or a fix may result in worse code.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Wed Jul 6 08:15:49 CEST 2016 on sn-devel-144
* Assign the output of dirname to temporary variable to avoid word
splitting when directory name contains whitespace
* Drop export of CTDB_BASE to avoid masking broken return value -
functions file does the export anyway
* Quote path when including functions file
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This avoids having to export it in every file that includes the
functions file.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Added so that nfs_check_services() could be run against an arbirary
directory. However, with the function moved to the event script, this
isn't useful. CTDB_NFS_CHECKS_DIR can be used for testing instead.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
There is no need to explicitly check that recovery is not active before
sending TRANS33_COMMIT control. Just try TRANS3_COMMIT control and if
recovery occurs before the control is completed, the control will fail
and it can be retried.
Make sure g_lock lock is released after the transaction is complete.
Also, add timeout to the client api.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Since g_lock checks if the process exists in case of conflicting lock,
there is no need to register srvid.
Transaction start returns a transaction handle and transaction
commit/cancel will free that handle. Since we cannot call async code
in a talloc destructor, this avoids the use of talloc destructor for
cancelling the transaction.
If user frees the transaction handle instead of calling transaction
cancel, it will leave stale g_lock lock. This stale g_lock lock will
get cleaned up on next transaction attempt.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
If a conflicting g_lock entry is found, check if the process exists.
This matches Samba implementation.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Instead of delaying for 1 second, try to get g_lock lock again after
1 milli-second.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
In delete_record, sync call to ctdb_ctrl_schedule_for_deletion will
cause nested event loops. Instead wrap the async version.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
In commit 1ee7053180, the
ctdb_rec_buffer_traverse always passes NULL for header. So explicitly
extract header from the data.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Logging that node has lost election is less useful than knowing which
node has won the election.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
The ipalloc code doesn't need a CTDB context so neither should the
code that tests it.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
No longer require CTDB context but pass in number of nodes, algorithm,
no_ip_failback and force_rebalance_nodes.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
At the moment IP is short-circuited when there are no available IP
addresses. However, if some IP addresses are already allocated then
"no available IP addresses" means that all the addresses should
(probably) be released. The current short-circuit means that no
already hosted IP addresses will be released.
The short-circuit exists to avoid lots of messages saying that all IP
addresses can not be assigned at startup time. So, add a check to
ipalloc_can_host_ips() so that it succeeds if IP addresses are already
allocated to nodes.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Abstracts out code involving internals of IP allocation state.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This is never used in the allocation algorithms. It is only used when
building the merged IP list.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
How the existing IP layout is constructed and how the merged IP list is
sorted are important aspects of the IP allocation algorithm. Construct the
merged IP list when known and available IPs are assigned.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Use ctdb_fetch_remote_public_ips() inline to fetch each list. Assign
them into the IP allocation state separately.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Factor out new function ctdb_fetch_remote_public_ips() to fetch known
or available public IP addresses, according to flags.
This also drops the hack where the array from a
ctdb_public_ip_list_old was assigned to a pointer in a
ctdb_public_ip_list.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
It isn't used outside this function, so just use a local variable.
This makes create_merged_ip_list() independent of the CTDB context.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
It is only run during a takeover run and only logs errors. It doesn't
actually do anything to fix potential errors. The takeover run should
fix any inconsistencies anyway.
Instead, leave a comment in the recovery daemon's monitoring loop to
add proper remote IP verification later.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This is unnecessary. IP allocation state already has a node count and
"i" is already a PNN.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Deleted (and other inactive) nodes will have an empty list of known
IP addresses.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This pointer is for an array that is always allocated. The check is
meant to skip a node that has no IP addresses. However, when there
are no IP addresses the loop below will not do anything anyway.
Add this as a check at the beginning of the function instead.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
No need to allocate these and iterate as
read_ctdb_public_ip_info_node() now returns a usable array.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
If there is per-node data then each chunk is read in a separate call
and is cherry-picked out into known_public_ips[] for each node. This
is confusing.
Instead, a single call now reads all data for multiple nodes and
returns complete arrays of known and available IP addresses.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
It isn't used outside this function. Instead, update k directly.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Known public IPs array is now dynamically allocated instead of
allocated once with artificial size limit.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
These tests aren't run anywhere. They were used to test internal
functions during development.
The aim is to simplify this test program so that it can be linked with
the ipalloc subsystem, allowing removal of ctdbd_test.c and all of its
complications.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
In case of database recovery failure, if there are no banning credits
assigned, then the async computation is never terminated. The else
condition is missing in (max_credits >= NUM_RETRIES) check.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Fri Jun 24 09:56:23 CEST 2016 on sn-devel-144
(showing what is the rule and what is the exception)
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Michael Adam <obnox@samba.org>
Autobuild-Date(master): Tue Jun 21 11:48:29 CEST 2016 on sn-devel-144
... instead of --nosetsched command-line option.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
Autobuild-User(master): Michael Adam <obnox@samba.org>
Autobuild-Date(master): Mon Jun 20 20:22:57 CEST 2016 on sn-devel-144
Test with 0-sized arrays in various data types.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
Autobuild-User(master): Volker Lendecke <vl@samba.org>
Autobuild-Date(master): Sat Jun 18 23:31:50 CEST 2016 on sn-devel-144
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Sat Jun 11 10:23:03 CEST 2016 on sn-devel-144
This makes ctdbd_wrapper usable in non-standard installs.
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Once DB_PUSH_START is processed as part of recovery, push_started
flag tracks if there are multiple attempts to send DB_PUSH_START.
In DB_PUSH_CONFIRM, once the record count is confirmed, all information
related to DB_PUSH should be reset. However, The push_started flag was
not reset when the push_state was reset.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Wed Jun 8 14:31:52 CEST 2016 on sn-devel-144
BUG: https://bugzilla.samba.org/show_bug.cgi?id=11956
In do_recovery, after the recovery and takeover is complete, recoverd
event is triggered. When the parallel database recovery was separated,
ctdb_recovery_helper implemented sending END_RECOVERY control which
causes recoverd event to be triggered. So when there is parallel database
recovery, recoverd event is triggered twice.
Instead move the call to run_recovered_eventscript() explicitly in
the serial recovery code path. This avoids the duplication trigger of
recoverd event.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
This generates takeip-pre and releaseip-pre call-out events.
One use is to put NFS into grace before an IP is assigned to an
interface.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
A second NFS eventscript may be required, so make this code available
to it.
The initialisation code can't be evaluated in the functions file
because service_state_dir isn't yet setup, so put it in a function and
call it with other initialisation code.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Wed Jun 8 04:52:18 CEST 2016 on sn-devel-144
The recovery lock helper must exit when it notices its parent is gone.
However, that can take a few seconds.
The usual way of terminating the recovery daemon is for the main ctdbd
to send it a SIGTERM. Installing a handler is nice and simple.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
If the process holding the recovery lock terminates unexpectedly then
the recovery daemon needs to know that the lock is no longer held.
While here, rename hold_reclock_handler() to take_reclock_handler() so
there is a clear difference between the two handler names.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This makes the API more general. If they are needed in a handler then
they can be in the private data.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This will allow a simplification of the cluster mutex API, so the
private data can be registered when calling ctdb_cluster_mutex().
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
It won't be called more than once by the cluster mutex code.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
After the first activity on the file descriptor, ignore any subsequent
activity. Single-shot handlers are easier to write.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
It isn't necessarily a file.
Don't bother changing the control, since it doesn't pervade the code.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Setting the recovery lock file at startup can be done more simply.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Support for updating the recovery lock is being removed because it
isn't possible to recover from failure.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
If the recovery lock setting is not consistent with that of the
recovery master then abort.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The recovery lock can not be reliably updated at run-time. If it
fails to update on some nodes then split-brain protection is gone and
there is no reasonable way to repair the situation. CTDB will have to
be restarted on all nodes. So, if this feature is being used to avoid
scheduling an outage then an outage will have to be scheduled just in
case!
To update the recovery lock, shut down CTDB on all nodes, reconfigure
the recovery lock and start CTDB again.
Those that *really* want to be able to change the recovery lock at
run-time can still do so. Set CTDB_RECOVERY_LOCK to point to a script
and this script can then be modified at run-time. However, please
don't report bugs if bad things happen...
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
BUG: https://bugzilla.samba.org/show_bug.cgi?id=11946
Commit 670db6ac1d split tevent-util public
library to create tevent-unix-util public library for standalone ctdb
use. This created a public library dependency between samba and ctdb
for packaging.
Bundle tevent_unix.c in public library tevent-util as before. However,
to avoid the dependencies for packaging, standalone ctdb build will
build tevent-util as a private library with only tevent_unix.c
This simplifies any new subsystems (or libraries) which need tevent-util
and are linked in both samba and ctdb.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
The timeout RecoverTimeout (default 120) is used for control messages
sent during the recovery. If any of the nodes does not respond to any
of the recovery control messages for RecoverTimeout seconds, then it
will cause a failure of recovery of a database. Recovery helper will
retry the recovery for a database 5 times.
In the worst case, if a database could not be recovered within 5 attempts,
a total of 600 seconds would have passed. During this time period other
timeouts will be triggered causing unnecessary failures as follows:
1. During the recovery, even though recoverd is processing events,
it does not send a ping message to ctdb daemon. If a ping message is
not received for RecdPingTimeout (default 60) seconds, then ctdb will
count it as unresponsive recovery daemon. If the recovery daemon
fails for RecdFailCount (default 10) times, then ctdb daemon will
restart recovery daemon. So after 600 seconds, ctdb daemon will
restart recovery daemon.
2. If ctdb daemon stays in recovery for RecoveryDropAllIPs (default 120),
then it will drop all the public addresses. This will cause all
SMB client to be disconnected unnecessarily. The released public
addresses will not be taken over till the recovery is complete.
To avoid dropping of IPs and restarting recovery daemon during a delayed
recovery, adjust RecoverTimeout to 30 seconds and limit number of
retries for recovering a database to 3. If we don't hear from a node
for more than 25 seconds, then the node is considered disconnected.
So 30 seconds is sufficient timeout for controls during recovery.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Mon Jun 6 08:49:15 CEST 2016 on sn-devel-144
If the node becomes stopped or banned after recovery is marked
active, then it will never freeze the databases, and hence the
node will keep banning itself indefinitely, until ctdbd is restarted.
This is a regression from 4.3, introduced with
b4357a79d9
and
d8f3b490bb
BUG: https://bugzilla.samba.org/show_bug.cgi?id=11945
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Michael Adam <obnox@samba.org>
Autobuild-Date(master): Wed Jun 1 17:36:12 CEST 2016 on sn-devel-144
This adapts the debug message in local_node_got_banned
to reflect what the function is currently doing.
This message was not adapted when the function was changed.
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
Autobuild-User(master): Jeremy Allison <jra@samba.org>
Autobuild-Date(master): Wed Jun 1 04:30:36 CEST 2016 on sn-devel-144
Both of these expand to 1. However, AF_LOCAL is a Unix domain socket,
which makes no sense when reading the code.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu May 26 11:42:46 CEST 2016 on sn-devel-144
With an empty value the first expression adds a trailing opening
quote, so the second expression doesn't add the closing quote. Handle
this with a special case.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This doesn't need to print the family. Nothing uses it.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Stub scripts are more obvious. rc.local should only be used when
strictly necessary.
iptables_wrapper doesn't need to be no-op-ed, provided flock is
installed.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Block and unblock IP addresses using these new functions. This makes
the code more readable.
The case statement in each function is very cheap, so there is no need
to prematurely optimise and pass the family.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
CTDB_INIT_STYLE isn't used in this script.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Kai Blin <kai@samba.org>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Fri May 20 21:06:18 CEST 2016 on sn-devel-144
Some Linux distributions don't have a "service" compatibility command.
To avoid breaking working systems, prefer the "service" compatibility
command just in case it does some extra, unexpected magic.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Kai Blin <kai@samba.org>
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Uri Simchoni <uri@samba.org>
Autobuild-User(master): Uri Simchoni <uri@samba.org>
Autobuild-Date(master): Tue May 17 21:21:30 CEST 2016 on sn-devel-144
If this fails, we want to know which states it wanted to move to. Don't do the
return before the debug.
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
Signed-off-by: Jose A. Rivera <jarrpa@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Sat May 14 03:06:05 CEST 2016 on sn-devel-144
Add CTDB_NFS_STATE_FS_TYPE and CTDB_NFS_STATE_MNT config options, show use in
nfs-ganesha-callout. Since the callout script is only an example, we
officially don't have default values for these.
Signed-off-by: Jose A. Rivera <jarrpa@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Comment typos and clarifications, erroneous variable names, corrected
pathnames, reorganizing variables, and squashing a few non-fatal
scripting errors.
Signed-off-by: Jose A. Rivera <jarrpa@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
If a recovery is going to be done then this will be followed by a
takeover run anyway. So, there's no use doing the takeover run
checks, potentially doing a takeover run and then doing a recovery.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The recovery daemon should be less involved in the service monitoring
logic.
The cases handled here are already handled elsewhere:
* When a node becomes unhealthy/healthy the monitoring code will
trigger a takeover run
* When a node is disabled/enabled the ctdb CLI tool will trigger a
takeover run
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
It will just become healthy again in the next monitor cycle.
Instead, let the recovery master ban it if the problem persists.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Banning is now handled by the takeover code sending banning credit
messages.
This commit makes a change in behaviour quite obvious. Takeover runs
were initiated from several locations in the code but banning was only
done from one of these locations. Now banning can be done from any
failed takeover run.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Post-process failues and only send banning credits to the node with
the most failures.
If there is a widespread problem or a problem on the recovery master
node then this should help avoid banning all the nodes.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This will allow banning credits assignments to be limited according to
some criteria.
Note that this only matters when multiple controls are sent to each
node: RELEASE_IP and TAKEOVER_IP. This doesn't change the behaviour
for IPREALLOCATED.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Banning credits are now assigned by takeover runs called from all
locations in the recovery daemon. Previously this only happened from
one of the callers. When separating out the takeover run code the
behaviour should be consistent.
The callback (and corresponding data) passed to ctdb_takeover_run() is
now ignored. Dropping this will allow the interface between the
recovery daemon and IP takeover to be simplified.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Probably due to oversight, this is currently only used for the
"takeip" step.
This does consistent error handling and provides a layer of
indirection to the passed callback, so use it for "releaseip" and
"ipreallocated" steps too.
The callback data now needs to be initialised before the first
possible jump to "ipreallocated".
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Abstract out the initialisation of the callback data. Later, we'll
need to do it multiple times or move it.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The ipreallocated control has been in CTDB for a long time.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
These tests do not use ctdb tool stub anymore.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Wed May 11 02:19:20 CEST 2016 on sn-devel-144