1
0
mirror of https://github.com/samba-team/samba.git synced 2024-12-30 13:18:05 +03:00
Commit Graph

5023 Commits

Author SHA1 Message Date
Amitay Isaacs
7562701153 ctdb-common: Coverity fixes
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
2013-11-19 17:13:05 +01:00
Amitay Isaacs
c72e745511 ctdb-client: Coverity fixes
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
2013-11-19 17:13:04 +01:00
Amitay Isaacs
6d1b74f052 ctdb-server: Coverity fixes
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
2013-11-19 17:13:03 +01:00
Amitay Isaacs
25f3c8b526 tests: Fix calling of ctdb tool from test
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 9381c33dfd40192b7532d942059c2959dfae059d)
2013-11-07 16:08:44 +11:00
Amitay Isaacs
e3e6c8576a Revert "tests: If transaction_start fails, try again"
This reverts commit ed7d999214ee009e480c26410a04fa105028cb8e.

This is not necessary since ctdb_transaction_start() now will return NULL
only when there is a failure and not when another transaction is currently
active.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 46615c8e0e63291605d76a6d35f1a93180718c36)
2013-11-07 16:08:32 +11:00
Amitay Isaacs
537d4abc11 client: Make g_lock_lock() wait till lock is obtained
This makes the behaviour of g_lock_lock() similar to that implemented in
Samba.  Now ctdb_transaction_start() will return NULL only when there are
failures and not when another transaction is active.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 59489019ad15a5ad6b0f295e742fc9832745a842)
2013-11-07 16:08:17 +11:00
Srikrishan Malik
ab59087775 eventscript: Fix link creation failure if the link already exist but the target path is missing
Signed-off-by: Srikrishan Malik <srimalik@in.ibm.com>

(This used to be ctdb commit 370022e1ff654db99d0c3ce0c49914c249e57289)
2013-11-01 13:09:05 +11:00
Martin Schwenke
a3e9fb9b8f doc: Update NEWS
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 30a6565a7b476516f3daed0669b5650e1be3cd18)
2013-10-30 16:01:14 +11:00
Amitay Isaacs
f7ccbf5187 web: Add links to new manpages
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit a7a844e7600b59d876de94ec5bf7bd1647508cdf)
2013-10-30 15:37:54 +11:00
Martin Schwenke
6315432f87 doc: Major updates to manual pages
This includes new manpages for ctdb.7, ctdb.conf.5 and ctdb-tunables.7.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 15b5c6c00c248bc1a8364a6da103296a55d7bfb6)
2013-10-30 15:37:54 +11:00
Amitay Isaacs
41d37058ca tunables: Remove obsolete tunables
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit ca5fc3431573c44d55d09d987c715fb53756fc1f)
2013-10-30 15:37:11 +11:00
Martin Schwenke
62076d3089 recoverd: Rebalancing should be done regardless tunable
Rebalance target nodes should be set even if a deferred rebalance is
not configured.  The user can explicitly cause a takeover run.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit afd9b51644af074752d74c412cb4e7ec2eba2c69)
2013-10-30 12:19:49 +11:00
Martin Schwenke
6b42805717 recoverd: Improve an error message in the election code
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 275ed9ebe287e39d891888c13810c70f347af8ac)
2013-10-30 11:34:56 +11:00
Martin Schwenke
5f80f4255c Revert "if a new node enters the cluster, that node will already be frozen at start"
This is unnecessary due to 03e2e436db5cfd29a56d13f5d2101e42389bfc94.
Furthermore, if a node doesn't force an election but wins it then it
can fail to record that it is the new recovery master.  This can lead
to a reverse split brain where there is no recovery master.

This reverts commit c5035657606283d2e35bea40992505e84ca8e7be.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

Conflicts:
	server/ctdb_recoverd.c

(This used to be ctdb commit c8b542e059a54b8d524bd430cad9d82e5edd864d)
2013-10-30 11:34:56 +11:00
Martin Schwenke
45b44a7155 ctdbd: When a node is connected, log at DEBUG NOTICE not DEBUG_INFO
This is important enough that we should see it when the log level is
DEBUG_NOTICE.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit eb8ec5681bfccb26c8ffae72952d54bb0ba46249)
2013-10-29 17:14:56 +11:00
Martin Schwenke
a41df343de tests/complex: Remove CTDB_NFS_SKIP_SHARE_CHECK test
This is a needlessly complex way of testing the same thing as the
eventscripts unit tests 60.nfs.monitor.161.sh and
60.nfs.monitor.162.sh.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d1674aad224f8f0c9a03c3cd38a647318ba0f03e)
2013-10-29 17:14:56 +11:00
Martin Schwenke
1c4605cc81 tests/complex: Remove CTDB_SAMBA_SKIP_SHARE_CHECK test
This is adequately covered by eventscripts unit tests
50.samba.monitor.105.sh and 50.samba.monitor.106.sh.

This test is broken if CTDB_SAMBA_CHECK_PORTS is not specified in the
CTDB configuration.  Fixing it is hard and involves adding a more
complex stub for testparm.  We already have that in the eventscript
unit tests above.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 81b94fbb7495ac3204f1a84c673c8babf04663bc)
2013-10-29 17:14:56 +11:00
Martin Schwenke
edda442b36 eventscripts: Rewrite the smb.conf cache file handling
The background update is never guaranteed to complete before the cache
is used, so don't bother trying it at the beginning.  Instead, put a
timeout on a foreground update.

If the foreground update fails:

* If there's no available cache file then die.

* If there is a previous cache file then use it and log a warning.

* Do a background update at the end of the monitor event.

Also remove commas in the "smb ports" list before use, since (newer?)
testparm seem to insert commas into the default value.  Update the
associated test to add a comma.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 8c6f511254ecb0381a609b37e3a0ee6e5ec5d562)
2013-10-29 17:14:55 +11:00
Martin Schwenke
d9e2411ace tools/ctdb: Fix documentation string for ban command
Ban time of 0 is not supported.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c072eb1f6488f94f83a6d3a81d88bf29ad866943)
2013-10-29 17:14:55 +11:00
Martin Schwenke
f88cf2d013 Revert "recoverd: Disable takeover runs on other nodes for 5 minutes"
5 minutes is too long to leave the cluster in limbo if the recovery
daemon dies during a takeover run, even though this is quite unlikely.
We need a new recover master to be able to do takeover runs fairly
quickly.

This reverts commit 71080676bb4acbd0d9b595a30cf7fe6dddbf426f.

(This used to be ctdb commit 3e41170c78fc7a2bf526129c9b7db3739b61c6bf)
2013-10-29 17:14:55 +11:00
Martin Schwenke
e8de58abd7 tools/onnode: Fix healthy/ok node handling
This bit-rotted a long time ago when the "ThisNode" column was added
to "ctdb -Y status" output.  The fake "ctdb -Y status" output in the
test was never updated to reflect this change.

Instead of making sure that all columns are "0", just check that
they're not "1".  This implicitly ignores "Y" and "N" in this
"ThisNode" column without having to do anything else clever.

Also update associated tests.  The main "ctdb ok" test had a duplicate
opening line for a here document, which was tickled by this change.

This fixes samba bz#8122.

Signed-off-by: Martin Schwenke <martin@meltin.net>

onnode test fixup

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 01a46205c3a3d6609dc0b0324319b89667dffa32)
2013-10-29 17:14:55 +11:00
Amitay Isaacs
fc7f335843 daemon: Change the default recovery method for persistent databases
Use sequence numbers to do recovery for persistent databases instead of
RSNs.  This fixes the problem of registry corruption during recovery.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 56486d1c01cc8ad0e4b8cee7a22429e72e50f03d)
2013-10-28 18:51:22 +11:00
Amitay Isaacs
a5c4794048 packaging: Create runtime directories for CTDB
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit c7450f9e22133333bf82c88a17ac25990ebc77ab)
2013-10-25 12:06:07 +11:00
Martin Schwenke
ab1b91caa4 initscript: Update systemd configuration to put PID file in /run/ctdb
Elsewhere we're moving the socket to /var/run/ctdb.  We might end up
with PID files and sockets for other daemons later, so let's call the
directory "ctdb" instead of "ctdbd".

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit b63f6fd2d295c8e18cbf3420ab05fce07b727f31)
2013-10-25 12:06:07 +11:00
Amitay Isaacs
7eb680a95f build: Move the default CTDB socket from /tmp to /var/run/ctdb
Use /var/run/ctdb/ctdbd.socket because there might be other daemons
that need sockets in the future.

The local daemons test code to create a link for the default
convenience socket has to be removed because the link can't be created
as a regular user in the new location.  This should be OK since all
calls to the ctdb tool in the test code should be wrapped in onnode.
When debugging tests, a developer will have to set CTDB_SOCKET by
hand.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-with: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit dc67a4e24af9d07aead2a1710eeaf5d6cc409201)
2013-10-25 12:06:07 +11:00
Amitay Isaacs
4432aef6d1 packaging: Move ctdb/ directory from /var to /var/lib
Introduce CTDB_VARDIR variable that points to /var/lib/ctdb by default.
This makes CTDB_VARDIR consistent across C code and scripts.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 2c09aac71188f43cd592572b10ea30b7a2969678)
2013-10-25 12:06:07 +11:00
Martin Schwenke
b595712f25 ctdbd: Simplify database directory setting logic
No need to check if the options are set.  The options are always set
via static defaults.

No need to talloc_strdup() the values via wrapper functions.  The
options aren't going away.  Remove now unused ctdb_set_tdb_dir() and
similar functions.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 1fe82f3d7b610547ff4945887f15dd6c5798a49b)
2013-10-25 12:06:06 +11:00
Martin Schwenke
a604c3d945 ctdbd: Remove duplicate database directory setting logic
Defaults for ctdb->db_directory and similar variables are currently
set in 2 places.

Change this to set them in only 1 place and make the directories at
initialisation time instead of waiting until later.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit d73d84346488a2ed54e6a86f9d7ec641c8e33ace)
2013-10-25 12:06:06 +11:00
Martin Schwenke
bd73e017b0 common: New function ctdb_mkdir_p_or_die()
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 7b971df79b0b63f83555205eacf48d49ca3a273a)
2013-10-25 12:06:06 +11:00
Martin Schwenke
c07e3830b3 common: New function mkdir_p()
Behaves like mkdir -p.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit afe2145d91725daf1399f0a24f1cddcf65f0ec31)
2013-10-25 12:06:06 +11:00
Amitay Isaacs
c393c8027f tcp: Create socket lock in /var/run/ctdb instead of /tmp
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-with: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit b9b9f6738fba5c32e87cb9c36b358355b444fb9b)
2013-10-25 12:06:06 +11:00
Amitay Isaacs
c330df8552 doc/examples: Add CTDB configuration examples
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 6a5469a63547029f4fc704a4d4075543e06c36d1)
2013-10-24 16:57:16 +11:00
Mathieu Parent
cdf507c4b5 Add missing $remote_fs LSB dependency
(This used to be ctdb commit a0b965bb73777dde7a4abf80c5c4742581bce520)
2013-10-24 16:54:08 +11:00
Mathieu Parent
6a03128c56 Improved check_ctdb
- increase verbosity with "-v"
- concat error messages (if there are several)
- handle 255 return code as warning (as it is the return code when any of the node is missing)
- read /etc/ctdb/nodes remotely (ctdb_check can be run on a non-ctdb host)

(This used to be ctdb commit cea81bdd503f6ef8b5bbd3582a8e0085bb02bc9f)
2013-10-24 16:54:08 +11:00
Mathieu Parent
0e80ca24f3 Add missing events.d/99.timeout
(This used to be ctdb commit 1f6cc8764e28058c56d0350147032b6e30cb355d)
2013-10-24 16:54:07 +11:00
Amitay Isaacs
17f8295460 eventscripts: Instead of listing all tunables, query EventScriptTimeout
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 58ca2c3e7e3a27023ad86660f01a2052e2a19635)
2013-10-24 16:54:07 +11:00
Michael Adam
49fcfd2cb3 ctdb_client.h: fix build on AIX by removing C++-style comments
Reported by John P Janosik <jpjanosi@us.ibm.com>

Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 1f327401f2e181780937aa3f6c479376ff787f3f)
2013-10-23 00:53:56 +02:00
Martin Schwenke
e782b61732 ctdbd: Pass the public address file location in ctdb context
No need to pass it as an extra argument to ctdb_start_daemon.

Also ensure options.public_address_list gets a nice static default.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a3d63a9db89d08bb284b3b3a6db773422f21b477)
2013-10-22 15:37:54 +11:00
Martin Schwenke
463a091a77 ctdbd: Debug locks by default with override from enviroment variable
Default is debug_locks.sh, relative to CTDB_BASE.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c11803e3dcc905a45a08d743595e63f9ca445f0d)
2013-10-22 15:37:54 +11:00
Martin Schwenke
4adc8f4f09 ctdbd: Default for event_script_dir should use CTDB_BASE
Also get rid of ctdb_set_event_script_dir().  It creates an
unnecessary copy of something that will be around for the lifetime of
the process.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 21b4d1aba00902f1eee0cbf4f082b0794fd5b738)
2013-10-22 15:37:54 +11:00
Martin Schwenke
f9ce563135 ctdbd: Add nodes_file member to struct ctdb_context
This allows ctdb_load_nodes_file() to move to ctdb_server.c and
ctdb_set_nlist() to become static.

Setting ctdb->nodes_file needs to be done early, before the nodes file
is loaded.  It is now set from CTDB_BASE instead ETCDIR, so setting
CTDB_BASE also needs to be done earlier.

Unhack ctdbd_test.c - it no longer needs to define
ctdb_load_nodes_file().

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 20e705e63bd3b20837cc3ac92fdcf2a9650ccfc8)
2013-10-22 15:37:54 +11:00
Martin Schwenke
27a6343369 tools/ctdb: CTDB_BASE is the default location of configuration files
Ensure that environment variable CTDB_BASE is set.

Update defaults for nodes and natgw_nodes to use CTDB_BASE.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 2b6dc0d2799f3563b767622b6f9246450aa4036b)
2013-10-22 15:37:54 +11:00
Martin Schwenke
7c90395136 ctdbd: Don't check CTDB_BASE before setting it, just don't override
That's what the 3rd argument to setenv(3) is for...  :-)

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 30ca419aa1c78008f81839497921bbfba480e7fc)
2013-10-22 15:37:54 +11:00
Martin Schwenke
b331fab515 tests/integration: Pass --valgrinding option when running under valgrind
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 913f229508302378212678d98c22606a4954b09c)
2013-10-22 15:37:54 +11:00
Martin Schwenke
82e5effc40 ctdbd: Fix some errors in the popt configuration
That 4th argument isn't a default or similar, so consistently make it 0.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 1c0a627df1b510f49c65ffeb4474240c8856cdf2)
2013-10-22 14:34:05 +11:00
Martin Schwenke
4fbf3e5bdf initscript: New configuration variable CTDB_DBDIR_STATE
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 30d9b634b16c3cc740e5e453ea5c21012b1fde88)
2013-10-22 14:34:05 +11:00
Martin Schwenke
37aea37269 scripts: Make detect_init_style() more readable
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 516cdea0e73cf3f63b3303e22809834c8cbc64e4)
2013-10-22 14:34:05 +11:00
Martin Schwenke
0b69785eb2 eventscripts: Rework the iSCSI eventscript
* It should run on "ipreallocated" instead of "recovered"
* Variable name NODE -> ip since that's what it is
* Simplify some logic

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 45e2bc66abf9fcfeadcc279a656ed7fd1838920a)
2013-10-22 14:34:05 +11:00
Martin Schwenke
04c31bf50d eventscripts: Don't update static routes on "recovered" event
Routes only need to be updated when IPs have moved.  IP takeover runs
will generate "ipreallocated", which is enough.  "recovered" always
follows "ipreallocated" anyway, so avoid the redundancy.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 1152215fc69217e4292762e28d193b7ea0e06ee3)
2013-10-22 14:34:05 +11:00
Martin Schwenke
3132550a88 eventscripts: NAT gateway script doesn't need to handle "recovered" event
Any time a node changes flags in any significant way there will be a
takeover run, which will generate an "ipreallocated" event.  The
"recovered" event always happens straight after a takeover run so we
update the NAT gateway twice.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 542c70d6281d636ecd51502fbbf219f418bfac66)
2013-10-22 14:34:05 +11:00
Martin Schwenke
5369f711dc eventscripts: Delete placeholder "recovered" and "shutdown" events
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 00736a21fc268c10b6a718731e56b3dbb7e60554)
2013-10-22 14:34:04 +11:00
Martin Schwenke
2e819aa00f eventscripts: Clean up comment at the top of 00.ctdb
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 2ea9d3acfe7e8665685f54294f5edc9b8ffc2f3f)
2013-10-22 14:34:04 +11:00
Martin Schwenke
cf04ff178c eventscripts: Remove reconfigure check from samba and winbind eventscripts
There is no reconfigure code for these scripts so no need to check for
reconfiguration.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 41df1637c1d8a7b2f5a9974408db71b1f74cb2f2)
2013-10-22 14:34:04 +11:00
Martin Schwenke
a45aae410c eventscripts: Remove reconfigure code from httpd eventscript
Nothing ever (or has ever) set the "needs reconfigure" flag, so this
code is unnecessary.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5b77fd95bda5f1960aca952e1b759231890b56f3)
2013-10-22 14:34:04 +11:00
Martin Schwenke
49d0153b10 eventscripts: Fold ctdb_check_tcp_ports_ctdb() into ctdb_check_tcp_ports()
A generic framework is no longer needed now that the "ctdb" checker is
the only one left.  Simplify the code.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 044d302b41a2040642355401e3236fcecc3a620a)
2013-10-22 14:34:04 +11:00
Martin Schwenke
0e9c939c0c eventscripts: Remove TCP port checks other than the built-in CTDB one
"ctdb checktcpport" is no longer experimental so the other checkers
are no longer required.

Remove tests related to the removed checkers.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 50e330d0679614bee2e7bab028436e929f74ca50)
2013-10-22 14:34:04 +11:00
Martin Schwenke
d02a645691 scripts: Remove setting of PATH from functions file
The current setting is inconsistent with settings on most systems,
putting /bin before /sbin.  Use of /usr/local/bin, which may be
required on some systems, is also overridden.  This can make it
difficult to do interactive debugging of script problems.

Rely on the system PATH instead.

If system-specific changes need to be made then this can be done in a
configuration file.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit cfbff39e22e42f3997f637290748290833525714)
2013-10-22 14:34:04 +11:00
Martin Schwenke
5d65335d60 tests/eventscripts: Run scripts under sh by default
Some scripts are disabled by default so are no executable.  Explicitly
running them under sh allows them to be run without having to mess
around and make them executable or similar.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 9437d4809bfbbb5c6a32a610665333d2f641881d)
2013-10-22 14:34:04 +11:00
Martin Schwenke
05f5fe9179 tests/eventscripts: New tests for 20.multipathd
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 212d4b201c30804f69cffe4b7150d4b74bf2e54f)
2013-10-22 14:34:04 +11:00
Martin Schwenke
1ede20925f eventscripts: Clean up 20.multipathd
Reduce the complexity, including the depth of background processes.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 49f077c475b078889ff0492fe7d567a64d6cb87c)
2013-10-22 14:34:04 +11:00
Martin Schwenke
1e4c965f52 eventscripts: NAT gateway script should export CTDB_NATGW_NODES
Otherwise calls to "ctdb natgwlist" will not behave as expected if a
non-standard file is used, since that command will use the default
file location.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e574b30257126679704b088c4334a8e7a53a9c3f)
2013-10-22 14:34:04 +11:00
Martin Schwenke
cd4041760b scripts: Simplify script_log() to just look at CTDB_SYSLOG variable
The old logic was actually wrong.  If CTDB_LOGFILE is unset then a
default is used, not syslog.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 79e2029f9bc078126e865aa715100a3870c7604b)
2013-10-22 14:34:04 +11:00
Martin Schwenke
4526fdbbca scripts: Remove support for CTDB_OPTIONS configuration variable
Allowing people to put random options in CTDB_OPTIONS complicates some
logic (particularly around use of syslog).  If we're going to have
variables for options then let's make sure we have a variable for each
option and make people use them.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e55f3a1577eff0182802b0341d865d961aeae1c7)
2013-10-22 14:34:04 +11:00
Martin Schwenke
1043b53d12 scripts: Remove unused configuration variable CTDB_MANAGES_SCP
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit bda0da41aaf629a252cc361b73ebc5328f26ed04)
2013-10-22 14:34:03 +11:00
Martin Schwenke
04f67b1066 eventscripts: Deprecate NFS_SERVER_MODE, use CTDB_NFS_SERVER_MODE instead
All CTDB configuration variables should start with CTDB_.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f12658aff125996ae45eea23241d8c3d0567b893)
2013-10-22 14:34:03 +11:00
Martin Schwenke
fbd2617cb8 recoverd: Remove function reload_nodes_file()
It is a 1 line wrapper around ctdb_load_nodes_file(), so use that
instead.  We need less code...  :-)

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 4a5d5935f4410a93a3343d85a24dbcddae2c4c20)
2013-10-22 14:34:03 +11:00
Martin Schwenke
a93361fca2 Revert "null out the pointer before we reload the nodes file"
This reverts commit 4b0f32047e8bece0a052bdbe2209afe91b7e8ce3.

This is not necessary.  It just causes a memory leak.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 25fd05505f61dc595c0ef25bb6e332274d5530e8)
2013-10-22 14:34:03 +11:00
Martin Schwenke
19a911bf1a client: Fix a format string argument compiler warning
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f3413fb8b90c4d9f0c2c2a69825c66d080117193)
2013-10-22 14:34:03 +11:00
Amitay Isaacs
e63232e974 recoverd: Ignore failed flag updates on inactive nodes
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-with: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 484c46eaae056480baf050fd91868f2fd0537985)
2013-10-22 14:34:03 +11:00
Amitay Isaacs
a42b6e1cad common/util: Use AIX specific code for setting high priority for CTDB daemon
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 7764cf67a61bbf1caad5aa8e2d75a262b9da654c)
2013-10-22 14:33:56 +11:00
Martin Schwenke
63c534b1e2 git: Ignore generated documentation files
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit b9af66032f3d96f2fe12b7a4fcc5e71d4a282365)
2013-10-22 13:07:13 +11:00
Martin Schwenke
5e0eb7bf84 tests: When running local tests with run_tests.sh, use fixed TEST_VAR_DIR
Otherwise we end up with lots of useless temporary directories.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 63924ff372b066cd878b79e71f06de4c24c814a2)
2013-10-22 13:07:13 +11:00
Martin Schwenke
ace6c1ee62 eventscripts: Fix comment - CTDB_TCP_PORT_CHECKS -> CTDB_TCP_PORT_CHECKERS
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0a79ba2f1277a776347e2c3f04ce8419e0be62de)
2013-10-22 13:07:13 +11:00
Martin Schwenke
9256010cfb tests/integration: Tweak ctdbd startup options
* --public-interface is not needed

* Add --sloppy-start to speed up restarts

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d0dec5b8e60316701fdd02150c4dd8f01aacbfda)
2013-10-22 13:07:13 +11:00
Martin Schwenke
4812291ff8 recoverd: Fix the VNN lmaster consistency check
It does cope with node that don't have the lmaster capability.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 588172bcb6bf267339e2bd09e23d2c4904a27a41)
2013-10-22 11:49:54 +11:00
Amitay Isaacs
ec7c9952d5 tests: If transaction_start fails, try again
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit ed7d999214ee009e480c26410a04fa105028cb8e)
2013-10-08 17:10:13 +11:00
Amitay Isaacs
30f422b960 tests: Make sure test exits with zero status on successful completion
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit af4b6b8b3222d2a3c425fcc6833db579d0cd7ffa)
2013-10-08 17:10:08 +11:00
Amitay Isaacs
dae9e86461 tests: Re-enable transaction test code
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 929045335212e825deb645cc6c7f97b8a40fdbb3)
2013-10-04 15:47:11 +10:00
Amitay Isaacs
c4a80c7a67 tools/ctdb: Remove setdbseqnum command
This command was added to test persistent database recovery with sequence
numbers.  With the new persistent transaction code, sequence numbers get
updated automatically, so there is no need for this command.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 14bfd22fad1a5fd27eede1be7fccbaed9466e13e)
2013-10-04 15:47:11 +10:00
Amitay Isaacs
524696fa26 tests: No need to set sequence number when modifying persistent database
With the new persistent transaction code, sequence numbers will be
automatically updated whenever a record is updated.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 961dd5d0acbb971756944ea9f69992020ea7d9fc)
2013-10-04 15:47:11 +10:00
Amitay Isaacs
d0f99926e4 client: Remove old persistent transaction code
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 41bdbcfd72092cdd25da87e60689c087bca97933)
2013-10-04 15:47:11 +10:00
Amitay Isaacs
c5ec04f24e client: Reimplement persistent transaction code using TRANS3_COMMIT
Implementing persistent trasnaction code from Samba.

Persistent transaction code was reimplemented in Samba using g_lock.tdb
to hold transaction locks and using TRANS3_COMMIT control.

Implementation details:

1. When starting a transaction, create a record with "transaction-<dbid>"
   as key and store current server_id in the structure.

2. If a record already exists, some other client has already started a
   transaction.  Verify that the process corresponding to server_id stored
   in the record really exists or it's a stale record and overwrite it.

3. All modifications to the actual persistent database are stored in a
   marshal buffer.

4. When transaction is committed, read the sequence number of the
   persistent database and increment it.  Sequence number record is also
   stored in the marshal buffer.

5. Send the changed records (marshal buffer) in TRANS3_COMMIT control
   to all the active nodes.

6. If all controls succeed, verify that the sequence number has been
   incremented.  Commit is successful.  If any of the controls fail,
   abort the transaction.

7. In case sequence number has not yet been incremented, then database
   recovery has been triggered.  So repeat from step 5.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 4e0f1971792c9431d8d51dc57d54ecc9e4576dd5)
2013-10-04 15:46:15 +10:00
Amitay Isaacs
1203e82d9b client: Add functions to parse g_lock.tdb records
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 40589ae5259880431f358250c1f0d07bcaa21d1f)
2013-10-04 15:43:32 +10:00
Amitay Isaacs
0205d52657 client: Add functions to handle server_id structure
server_id records are stored in g_lock.tdb for persistent transactions.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 55f91ea4373c54ddb5faad87fa2826d86a4b6172)
2013-10-04 15:43:31 +10:00
Amitay Isaacs
be33efa3e4 ctdbd: Remove transaction code related to TRANS2 commits
This removes data types and structure elements related to TRANS2
persistent transaction code.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 22a253b7ccf1ff854cddf0b67969dc84d7d6a654)
2013-10-04 15:20:25 +10:00
Amitay Isaacs
91d644325d ctdbd: Deprecate TRANS2 commit controls
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 7d176352986317e63696d74252ff5d8eccb2fee5)
2013-10-04 15:20:25 +10:00
Amitay Isaacs
1ff9645865 ctdbd: Create a utility function to log error for "not implemented" controls
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 3c892ea1b5aa42686adb82ce29b9fcfdf9d204a1)
2013-10-04 15:20:25 +10:00
Amitay Isaacs
fe62936bb6 include: Remove unused set_dmaster structure
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 2ce3a48cc969d563c26dd295723416c0d7b077a2)
2013-10-04 15:20:25 +10:00
Martin Schwenke
24fb430d6e tests/tool: Remove references in libctdb in file and function names
Main changes are:

  libctdb_test.c -> ctdb_test_stubs.c
  ctdb_tool_libctdb.c -> ctdb_functest.c

ctdb_tool_stubby.c is gone, replaced with existing ctdb_test.c.

Functions starting with "libctdb_test_" now start with
"ctdb_test_stubs_".

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 6182bd0c19f215a997efe5272e633b1b1bd0c882)
2013-10-04 15:15:35 +10:00
Martin Schwenke
f3b1790819 tests/tool: Rework test programs so they no longer expect libctdb
Instead, override controls using preprocessor magic.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 10aac42f30cc0d56dca42ece17d04ccbc321056d)
2013-10-04 15:15:35 +10:00
Martin Schwenke
a6992b7b07 tests/tool: Fix some comment typos
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 59bd4ede15a5958b87e0d253461eb9111885bd2f)
2013-10-04 15:15:35 +10:00
Martin Schwenke
7a3e2f1627 tools/ctdb: Stop return value from being clobbered in control_lvsmaster()
ret is initialised too early and is clobbered by the call to
ctdb_ctrl_getcapabilities().  Initialising it later means that the
function returns -1 when no LVS master is found.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3296559c43e70f755fcf2c06677891e0319c8142)
2013-10-04 15:15:35 +10:00
Martin Schwenke
b527236efe client: Fix some format string compiler warnings
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5619754343003016ede27014567dbb4701f97928)
2013-10-04 15:15:35 +10:00
Amitay Isaacs
e229ff6133 common: Fix setting of debug level in the client code
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 299fa487549e36572b757852d21471f9e23f6e8f)
2013-10-04 15:15:35 +10:00
Amitay Isaacs
7a8337a01d libctdb: Remove incomplete libctdb
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit c5a7f2b4ff011e1393c4ff34864f85e6b472ff07)
2013-10-04 15:15:35 +10:00
Amitay Isaacs
2b68d143cb tools/ctdb: Pass memory context for returning nodes in parse_nodestring
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 1585a8e275b0143e5e46311b3d5e9785119f735f)
2013-10-04 15:15:35 +10:00
Amitay Isaacs
d4643abe88 tests: Do not use libctdb code in tests
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit ae0d8f432ef98a72c85a6cd42c503b718bef0e4e)
2013-10-04 15:15:34 +10:00
Amitay Isaacs
03379e332c tools/ctdb: Do not use libctdb for commandline tool
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit cd66282c635cf53386d8970b89c895076ea21cbd)
2013-10-04 15:15:34 +10:00
Amitay Isaacs
4ca9b96114 client: Add ctdb_ctrl_getdbseqnum() function
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 8cb1fbbfe88327c9c7ab68e8eded586dff611e57)
2013-10-04 15:15:34 +10:00
Amitay Isaacs
5d47f28e15 client: Add ctdb_ctrl_getdbstatistics() function
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 1e7fca5cdc1d7205cf084e35aace1a5dc46ea294)
2013-10-04 15:15:34 +10:00
Amitay Isaacs
105afa543e client: Add ctdb_client_check_message_handlers() function
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit c9a9d14c91f203ce964a426a8a1e2c1715af2098)
2013-10-04 15:15:34 +10:00
Amitay Isaacs
151bb4b97d client: Remove extra whitespaces
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 962eb63c6d500e29a03ae087757d81be449888c6)
2013-10-04 15:15:34 +10:00
Amitay Isaacs
2814c9a0c5 tests: Remove unused test program ctdb_fetch_lock_once
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 873b9cadbcc363a9e5f450b0a1feb1cf2ce1e6c9)
2013-10-04 15:15:34 +10:00
Amitay Isaacs
f165ed1594 tools/ctdb: When printing TDB data as a string, use correct length of the string
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit d94a10f93a0925b17458d009e604966666b3d880)
2013-10-04 15:15:27 +10:00
Amitay Isaacs
d3783ae140 tools/ctdb: Remove un-implemented ctdb vacuum command
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 8b238852884004a56f76a1762199c338864d1249)
2013-10-04 15:15:27 +10:00
Amitay Isaacs
e4ed152d59 tests: Add a simple test to test cluster wide database traverse
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 713c9ecc791e3319a2d109838471833de5a158c8)
2013-09-26 10:21:31 +10:00
Amitay Isaacs
a2d6bbe67a traverse: Send traverse end record from traverse child process
Traverse records are sent directly from traverse child process, but
the last empty record signalling end of traverse is sent from ctdbd.
This creates a race condition between ctdbd and traverse child.
There are two fds from traverse child to ctdbd - a pipe to track status
of the child process and unix socket connection for sending records.
It's possible that last few records are sitting in unix socket buffer
when ctdbd reads the status written from traverse child.  This will
be interpreted as end of traverse and ctdbd will send the last empty
record to originating node before it has processed the pending packets
in unix socket connection.

The race is avoided by sending the last empty record marking end of
traverse from the child process.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 37e22fc3ac3eb64732f2e67058f5b7b06c093fbf)
2013-09-25 14:59:45 +10:00
Amitay Isaacs
f1f1788f10 traverse: Wait till all data has been flushed from output queue
To improve the traverse performance, records are directly sent from
traverse child process to the originating node.  Make sure that all the
data is sent via socket, before informing ctdbd that traverse is complete.

Without waiting for all the packets to be flushed from the queue,
child process can incorrectly signal ctdbd that traverse has ended.
This will cause the pending records in the queue never to make it to
the originating node and traverse information will not be complete.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 482ac708cb79cb6378d814a79c2cf13f88435bc4)
2013-09-25 14:59:45 +10:00
Amitay Isaacs
1740cbb58c traverse: Use ctdb local variable for convenience
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 25e9cf86328252f96215b54b94551dd7bbdd2db4)
2013-09-25 14:59:45 +10:00
Amitay Isaacs
c4f49a5342 traverse: Check if local traverse failed or succeeded
By passing the result of tdb_traverse_read() allows ctdbd to determine
if the local traverse succeeded or not.  In case of a problem with local
traverse, ctdbd can log an error.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit abd51a9f41ebb178c4ea4491bdedf9a9433e7232)
2013-09-25 14:59:45 +10:00
Amitay Isaacs
76d9d2e5e1 traverse: Log information when traverse starts and ends
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit e4aba8598b00a810e721de64ac44dccc9af04ab6)
2013-09-25 14:59:45 +10:00
Martin Schwenke
613313fa52 tool/ltdbtool: -h option does not require an argument
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 9e18f3c173863919587e25d704f66372624ed8ed)
2013-09-25 14:35:46 +10:00
Martin Schwenke
5818771192 scripts: Add support for optional ctdbd.conf configuration file
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 8f660d0dd52013e5876806be908e8e603aa6e968)
2013-09-25 14:35:46 +10:00
Martin Schwenke
066b671de0 utils: Make debug level strings case-insensitive
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit c700dd0c7b6b43b61b3e231643b5d7cbe2f9592a)
2013-09-25 14:35:31 +10:00
Martin Schwenke
5b2c8ba880 tools/ctdb: Fix help messages for ctdb commands
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 49c87699fad151933a0aefebfee968fc850e6383)
2013-09-25 14:34:55 +10:00
Martin Schwenke
058037d58c tools/ctdb: Ban time of 0 is invalid
Apparently it used to mean a permanent ban but it is unclear if this
was ever supported.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c8a6e5ce579e2fe320c40268e7e9ddfe68b8cd30)
2013-09-25 14:34:55 +10:00
Amitay Isaacs
4c4bfcbd6f eventscripts: Load CTDB configuration settings in 70.iscsi
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit ff41ce5ef202f8f6342e285d195bb5df61d848ce)
2013-09-23 18:38:28 +10:00
Martin Schwenke
430ae84877 recoverd: Disable takeover runs on other nodes for 5 minutes
60 seconds might not be long enough to kill all connections and
release IPs.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 71080676bb4acbd0d9b595a30cf7fe6dddbf426f)
2013-09-19 12:58:32 +10:00
Martin Schwenke
07d3a1b234 recoverd: Improve logging for takeover runs
Takeover runs are currently silent when they succeed.  However, they
are important, so log something by default.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit b39aa2e401fbb581207d986bac93778e9c01acdc)
2013-09-19 12:57:36 +10:00
Martin Schwenke
236b2524de tools/ctdb: Use the standard long timeout when disabling takeover runs
This means that takeover runs will be disabled for about as long as the
reloadips control can take to complete.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 6d44657a5e5b0df22bab2d487a503dd1c5ba79b4)
2013-09-19 12:56:50 +10:00
Martin Schwenke
5f0d85d4db tools/ctdb: Fix arguments/semantics of rebalance node
There's no reason why specifying a node should be compulsory.  This is
a cluster-wide operation because it is implemented by the recovery
master so multiple nodes should not be specified using -n.  However,
the command should be able to specify multiple nodes so let it have
its own nodestring argument.

This change should be backward compatible with the old requirement of
specifying a single node via -n.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0846c00597adb66bba8c9dbf63443d0c2f91a7d1)
2013-09-19 12:54:32 +10:00
Martin Schwenke
c484361076 tools/ctdb: Make rebalancenode more robust
Use a broadcast instead of trying to win the race of determining the
recovery master and then sending the message before the recovery
master changes.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ac946ee4ad01b1e5cd1006930b9f8a190a0a58ba)
2013-09-19 12:54:32 +10:00
Martin Schwenke
44b7397962 tests/simple: Fix the reloadips test to cope with changes to reloadips
Specifying nodes to reload no longer uses -n.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d921b2756d5f1c4ad7a35fe120f6fda9f5bf5686)
2013-09-19 12:54:32 +10:00
Martin Schwenke
566d66e6ab recoverd: Be careful about freeing the list of IP rebalance target nodes
It can change during a takeover run.  If it does then don't free it.

There are potentially fancier solutions (e.g. check what PNNs are new
to the list) to this issue but this is the simplest.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e81589b7084c661adf617e166cc2c25b4939f841)
2013-09-19 12:54:31 +10:00
Martin Schwenke
4fb0d4a301 recoverd: reloadips should rebalance target nodes for new IPs
Otherwise, if existing IPs are added to extra nodes (that have,
perhaps, been disconnected) then those IPs will not be rebalanced
across the extra nodes.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ceb30432a9a550778aed0b422a654fc5287b82a3)
2013-09-19 12:54:31 +10:00
Martin Schwenke
950e23f664 ctdbd: Make ctdb_reloadips_child send controls asynchronously
Deleting IPs can take a while because IPs are released and connections
are killed.  This can take a while so do them in parallel.  In fact,
since the set of IPs being added and deleted will be disjoint, send
all the adds/deletes at the same time and then wait.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 85a5b544ec032173e98c9cc3b5402a76b961aa3b)
2013-09-19 12:54:31 +10:00
Martin Schwenke
b33ee7a2a4 recoverd: Fix the implementation of CTDB_SRVID_REBALANCE_NODE
The current implementation has a few flaws:

* A takeover run is called unconditionally when the timer goes even if
  the recovery master role has moved.  This means a node other than
  the recovery master can incorrectly do a takeover run.

* The rebalancing target nodes are cleared in the setup for a takeover
  run, regardless of whether the takeover run succeeds.

* The timer to force a rebalance isn't cleared if another takeover run
  occurs before the deadline.  Any forced rebalancing will happen in
  the first takeover run and when the timer expires some time later
  then an unnecessary takeover run will occur.

* If the recovery master role moves then the rebalancing data will
  stay on the original node and affect the next takeover run to occur
  if the recovery master role should come back to the original node.

Instead, store an array of rebalance target nodes in the recovery
master context.  This is passed as an extra argument to
ctdb_takeover_run() each time it is called and is cleared when a
takeover run succeeds.  The timer hangs off the array of rebalance
target nodes, which is cleared if the node isn't the recovery master.

This means that it is possible to lose rebalance data if the recovery
master role moves.  However, that's a difficult problem to solve.  The
best way of approaching it is probably to try to stop the recovery
master role from jumping around unnecesarily when inactive nodes join
the cluster.

The long term solution is to avoid this nonsense completely.  The IP
allocation algorithm needs to cache state between runs so that it
knows which nodes have just become healthy.  This also needs recovery
master stability.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c51c1efe5fc7fa668597f2acd435dee16e410fc9)
2013-09-19 12:54:31 +10:00
Martin Schwenke
1793412de2 recoverd: Remove unused CTDB_SRVID_RELOAD_ALL_IPS and handler
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 4cd727439a0824ebb8dbcf737d9888ffc3c41184)
2013-09-19 12:54:31 +10:00
Martin Schwenke
6f1935ea6d tools/ctdb: Reimplement reloadips
This implementation disables takeover runs on all nodes before trying
to reload IPs.  It also takes "all" or the list of PNNs as an argument
to the command instead of to -n.  -n can still be specified with a
single node indicating that node should be considered the current node
- that might be confusing so could be removed.

This implementation does not use CTDB_SRVID_RELOAD_ALL_IPS, so it can
be removed.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d66a072d9b120c78c47e726e9f29a3c1cfdd87ce)
2013-09-19 12:54:31 +10:00
Martin Schwenke
e7cc998570 recoverd: Defer ipreallocated requests when takeover runs are disabled
The takeover run will fail anyway but deferring seems like a cleaner
option.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 428f800bcdf3dbfe19de8bb36099fbf01ebeaab4)
2013-09-19 12:54:31 +10:00
Martin Schwenke
2f472b4573 recoverd: Reimplement CTDB_SRVID_DISABLE_IP_CHECK
Use disable_takeover_runs_handler() instead of maintaining duplicate
logic.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0a51a85915486b2a8fded7ba6444b18c6c1ee8e8)
2013-09-19 12:54:31 +10:00
Martin Schwenke
5f0913d321 recoverd: New SRVID message CTDB_SRVID_DISABLE_TAKEOVER_RUNS
This implements a superset of CTDB_SRVID_DISABLE_IP_CHECK.  It stops
the IP checks but also causes any attempted takeover runs to fail and
be rescheduled.

This is meant to completely stop IP movements.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 00db4de53a0d86013e79e6577e7e6cf3ef864e56)
2013-09-19 12:54:31 +10:00
Martin Schwenke
e79b750e5e tools/ctdb: Add a wait_for_all option to srvid_broadcast()
This will be useful for other SRVIDs.

The error checking in the handler depends on the SRVID responding with
a uint32_t where <0 indicates an error and >=0 is a PNN that
succeeded.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 52050e1c75b21961dafe2bc410268b44240ab24e)
2013-09-19 12:54:31 +10:00
Martin Schwenke
51db81344e tools/ctdb: Factor out SRVID broadcast code from ipreallocate()
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a566fb5e70282c4e9f76654b1be4dc80829dced0)
2013-09-19 12:54:30 +10:00
Martin Schwenke
8a6979dac3 tools/ctdb: Change ipreallocate() to use a local done flag
Instead of the current global variable.  This is in anticipation of
abstracting the code.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c58ee0eddf7ae3283e3ca8bd25575e6e677e1b17)
2013-09-19 12:54:30 +10:00
Martin Schwenke
0ba7e2ce31 recoverd: Factor out the SRVID handling code
The code that handles IP reallocate requests can be reused.

This also changes the result back to a SRVID caller to the PNN on
success or a negative error code on failure.  None of the callers
currently look at the result so this is harmless... but it will be
useful later.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e4eae6e3291baa299a1d0f733ab11b138ee699a3)
2013-09-19 12:54:30 +10:00
Martin Schwenke
4c3f8dc3bb recoverd: Make the SRVID request structure generic
No need for a separate one for each SRVID.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d9c22b04d5aa7938a3965bd3144568664eb772ce)
2013-09-19 12:54:30 +10:00
Martin Schwenke
c503997746 recoverd: Move disabling of IP checks into do_takeover_run()
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 48b603fbf16311daa47b01e7a33d477ed51da56d)
2013-09-19 12:54:30 +10:00
Martin Schwenke
bbbb55eef9 recoverd: do_takeover_run() should mark when a takeover run is in progress
Nested takeover runs should never happens so they should fail.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 8ed29c60c0a7dd29f2a6efdf694d38e94281e1c4)
2013-09-19 12:54:30 +10:00
Martin Schwenke
a1f915f6b5 recoverd: takeover_fail_callback() doesn't need to set rec->need_takeover_run
It is set on every failure anyway.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e5f94c7857405bdeac233069003c3769b3dc3616)
2013-09-19 12:54:30 +10:00
Martin Schwenke
701c450e90 recoverd: Fail takeover run if "ipreallocated" fails
Previously flagging a failure was probably avoided because of attempts
to run "ipreallocated" events on stopped and banned nodes, which would
fail because they are in recovery.  Given the change to a new control
and that fallback only retries the old method on active nodes, this
should never fail in reasonable circumstances.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 53722430ad35f80935aabd12fa07654126443b8b)
2013-09-19 12:54:30 +10:00
Martin Schwenke
e167e2e7c7 recoverd: New function do_takeover_run()
Factor the calling sequence for ctdb_takeover_run() into a new
function and call it instead.  This changes rec->need_takeover_run to
false for each successful takeover run and that seems to be the right
thing to do.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 9a3f0c0e61ca5c17e020c6e0463d73c7cf4f7c09)
2013-09-19 12:54:30 +10:00
Martin Schwenke
30a50c6e1e recoverd: Stabilise the recovery master role
On rare occasions when a node that has been inactive it will trigger
an election when it becomes active again.  If that node has been up
for the longest then it will win the election and the recovery master
role will spuriously move.

While a node remains inactive we reset the priority time to discourage
it from winning elections.  The priority time will now reflect roughly
how long the node has been active rather than how long it has been up.
That means the most stable node is more likely to win elections.

Having a stable recovery master means that disabling takeover runs
while reloading IPs is more likely to succeed.  It also improves the
chances of being able to cache information in the recovery master -
for example, between takeover runs.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f0f48f22f45e4c82eba2582efae307e25385de81)
2013-09-19 12:54:29 +10:00
Martin Schwenke
630196423a recoverd: Banned nodes should not be told to run "ipreallocated" event
They will reject it because they are in recovery.  This can result in
extra banning credits being applied to banned nodes.

This corresponds to commit 9132e6814ed927fa317f333f03dedb18f75d0e5b
from the 1.2.40 branch.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 403938804caf1322f9773d63197e4303a7b2a788)
2013-09-18 17:16:35 +10:00
Martin Schwenke
d30e269ecc common: Make parse_ip() valgrind-clean
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit c0bb147ca09e82019b05ec22995623cffc3184e2)
2013-09-11 15:35:38 +10:00
Martin Schwenke
8d11da3546 recoverd: Remove an orphaned comment
This should have been removed with the associated code in commit
14bd0b6961ef1294e9cba74ce875386b7dfbf446.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 36de63843de10a1f2a9ccdbbee24cc1d08542984)
2013-09-11 15:35:16 +10:00
Martin Schwenke
4e62553fcb recoverd: Update a comment to use current terminology
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ea5576071b22e1877903ec0921d375626a23e13b)
2013-09-11 15:35:10 +10:00
Martin Schwenke
fe7f66547b client: Remove unused function list_of_active_nodes_except_pnn()
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d8a76cf79f07dfb5a93c6c9a13f16e3268c7dd57)
2013-09-11 15:35:03 +10:00
Martin Schwenke
c870f01160 tools/ctdb: list_of_active_nodes_except_pnn() -> list_of_nodes()
list_of_active_nodes_except_pnn() is only used here and can be removed
if we remove this call.  Less is more...

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d4e206fb818048b7fab4797c877b854bdbb1ab70)
2013-09-11 15:34:58 +10:00
Martin Schwenke
2d31ec2131 tools/ctdb: Fix a memory leak in parse_nodestring()
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 8753a094b97340deb26dd44f6ea345ca0a642a95)
2013-09-11 15:34:51 +10:00