1
0
mirror of https://github.com/samba-team/samba.git synced 2024-12-30 13:18:05 +03:00
Commit Graph

5086 Commits

Author SHA1 Message Date
Martin Schwenke
e782b61732 ctdbd: Pass the public address file location in ctdb context
No need to pass it as an extra argument to ctdb_start_daemon.

Also ensure options.public_address_list gets a nice static default.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a3d63a9db89d08bb284b3b3a6db773422f21b477)
2013-10-22 15:37:54 +11:00
Martin Schwenke
463a091a77 ctdbd: Debug locks by default with override from enviroment variable
Default is debug_locks.sh, relative to CTDB_BASE.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c11803e3dcc905a45a08d743595e63f9ca445f0d)
2013-10-22 15:37:54 +11:00
Martin Schwenke
4adc8f4f09 ctdbd: Default for event_script_dir should use CTDB_BASE
Also get rid of ctdb_set_event_script_dir().  It creates an
unnecessary copy of something that will be around for the lifetime of
the process.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 21b4d1aba00902f1eee0cbf4f082b0794fd5b738)
2013-10-22 15:37:54 +11:00
Martin Schwenke
f9ce563135 ctdbd: Add nodes_file member to struct ctdb_context
This allows ctdb_load_nodes_file() to move to ctdb_server.c and
ctdb_set_nlist() to become static.

Setting ctdb->nodes_file needs to be done early, before the nodes file
is loaded.  It is now set from CTDB_BASE instead ETCDIR, so setting
CTDB_BASE also needs to be done earlier.

Unhack ctdbd_test.c - it no longer needs to define
ctdb_load_nodes_file().

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 20e705e63bd3b20837cc3ac92fdcf2a9650ccfc8)
2013-10-22 15:37:54 +11:00
Martin Schwenke
27a6343369 tools/ctdb: CTDB_BASE is the default location of configuration files
Ensure that environment variable CTDB_BASE is set.

Update defaults for nodes and natgw_nodes to use CTDB_BASE.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 2b6dc0d2799f3563b767622b6f9246450aa4036b)
2013-10-22 15:37:54 +11:00
Martin Schwenke
7c90395136 ctdbd: Don't check CTDB_BASE before setting it, just don't override
That's what the 3rd argument to setenv(3) is for...  :-)

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 30ca419aa1c78008f81839497921bbfba480e7fc)
2013-10-22 15:37:54 +11:00
Martin Schwenke
b331fab515 tests/integration: Pass --valgrinding option when running under valgrind
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 913f229508302378212678d98c22606a4954b09c)
2013-10-22 15:37:54 +11:00
Martin Schwenke
82e5effc40 ctdbd: Fix some errors in the popt configuration
That 4th argument isn't a default or similar, so consistently make it 0.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 1c0a627df1b510f49c65ffeb4474240c8856cdf2)
2013-10-22 14:34:05 +11:00
Martin Schwenke
4fbf3e5bdf initscript: New configuration variable CTDB_DBDIR_STATE
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 30d9b634b16c3cc740e5e453ea5c21012b1fde88)
2013-10-22 14:34:05 +11:00
Martin Schwenke
37aea37269 scripts: Make detect_init_style() more readable
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 516cdea0e73cf3f63b3303e22809834c8cbc64e4)
2013-10-22 14:34:05 +11:00
Martin Schwenke
0b69785eb2 eventscripts: Rework the iSCSI eventscript
* It should run on "ipreallocated" instead of "recovered"
* Variable name NODE -> ip since that's what it is
* Simplify some logic

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 45e2bc66abf9fcfeadcc279a656ed7fd1838920a)
2013-10-22 14:34:05 +11:00
Martin Schwenke
04c31bf50d eventscripts: Don't update static routes on "recovered" event
Routes only need to be updated when IPs have moved.  IP takeover runs
will generate "ipreallocated", which is enough.  "recovered" always
follows "ipreallocated" anyway, so avoid the redundancy.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 1152215fc69217e4292762e28d193b7ea0e06ee3)
2013-10-22 14:34:05 +11:00
Martin Schwenke
3132550a88 eventscripts: NAT gateway script doesn't need to handle "recovered" event
Any time a node changes flags in any significant way there will be a
takeover run, which will generate an "ipreallocated" event.  The
"recovered" event always happens straight after a takeover run so we
update the NAT gateway twice.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 542c70d6281d636ecd51502fbbf219f418bfac66)
2013-10-22 14:34:05 +11:00
Martin Schwenke
5369f711dc eventscripts: Delete placeholder "recovered" and "shutdown" events
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 00736a21fc268c10b6a718731e56b3dbb7e60554)
2013-10-22 14:34:04 +11:00
Martin Schwenke
2e819aa00f eventscripts: Clean up comment at the top of 00.ctdb
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 2ea9d3acfe7e8665685f54294f5edc9b8ffc2f3f)
2013-10-22 14:34:04 +11:00
Martin Schwenke
cf04ff178c eventscripts: Remove reconfigure check from samba and winbind eventscripts
There is no reconfigure code for these scripts so no need to check for
reconfiguration.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 41df1637c1d8a7b2f5a9974408db71b1f74cb2f2)
2013-10-22 14:34:04 +11:00
Martin Schwenke
a45aae410c eventscripts: Remove reconfigure code from httpd eventscript
Nothing ever (or has ever) set the "needs reconfigure" flag, so this
code is unnecessary.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5b77fd95bda5f1960aca952e1b759231890b56f3)
2013-10-22 14:34:04 +11:00
Martin Schwenke
49d0153b10 eventscripts: Fold ctdb_check_tcp_ports_ctdb() into ctdb_check_tcp_ports()
A generic framework is no longer needed now that the "ctdb" checker is
the only one left.  Simplify the code.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 044d302b41a2040642355401e3236fcecc3a620a)
2013-10-22 14:34:04 +11:00
Martin Schwenke
0e9c939c0c eventscripts: Remove TCP port checks other than the built-in CTDB one
"ctdb checktcpport" is no longer experimental so the other checkers
are no longer required.

Remove tests related to the removed checkers.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 50e330d0679614bee2e7bab028436e929f74ca50)
2013-10-22 14:34:04 +11:00
Martin Schwenke
d02a645691 scripts: Remove setting of PATH from functions file
The current setting is inconsistent with settings on most systems,
putting /bin before /sbin.  Use of /usr/local/bin, which may be
required on some systems, is also overridden.  This can make it
difficult to do interactive debugging of script problems.

Rely on the system PATH instead.

If system-specific changes need to be made then this can be done in a
configuration file.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit cfbff39e22e42f3997f637290748290833525714)
2013-10-22 14:34:04 +11:00
Martin Schwenke
5d65335d60 tests/eventscripts: Run scripts under sh by default
Some scripts are disabled by default so are no executable.  Explicitly
running them under sh allows them to be run without having to mess
around and make them executable or similar.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 9437d4809bfbbb5c6a32a610665333d2f641881d)
2013-10-22 14:34:04 +11:00
Martin Schwenke
05f5fe9179 tests/eventscripts: New tests for 20.multipathd
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 212d4b201c30804f69cffe4b7150d4b74bf2e54f)
2013-10-22 14:34:04 +11:00
Martin Schwenke
1ede20925f eventscripts: Clean up 20.multipathd
Reduce the complexity, including the depth of background processes.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 49f077c475b078889ff0492fe7d567a64d6cb87c)
2013-10-22 14:34:04 +11:00
Martin Schwenke
1e4c965f52 eventscripts: NAT gateway script should export CTDB_NATGW_NODES
Otherwise calls to "ctdb natgwlist" will not behave as expected if a
non-standard file is used, since that command will use the default
file location.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e574b30257126679704b088c4334a8e7a53a9c3f)
2013-10-22 14:34:04 +11:00
Martin Schwenke
cd4041760b scripts: Simplify script_log() to just look at CTDB_SYSLOG variable
The old logic was actually wrong.  If CTDB_LOGFILE is unset then a
default is used, not syslog.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 79e2029f9bc078126e865aa715100a3870c7604b)
2013-10-22 14:34:04 +11:00
Martin Schwenke
4526fdbbca scripts: Remove support for CTDB_OPTIONS configuration variable
Allowing people to put random options in CTDB_OPTIONS complicates some
logic (particularly around use of syslog).  If we're going to have
variables for options then let's make sure we have a variable for each
option and make people use them.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e55f3a1577eff0182802b0341d865d961aeae1c7)
2013-10-22 14:34:04 +11:00
Martin Schwenke
1043b53d12 scripts: Remove unused configuration variable CTDB_MANAGES_SCP
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit bda0da41aaf629a252cc361b73ebc5328f26ed04)
2013-10-22 14:34:03 +11:00
Martin Schwenke
04f67b1066 eventscripts: Deprecate NFS_SERVER_MODE, use CTDB_NFS_SERVER_MODE instead
All CTDB configuration variables should start with CTDB_.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f12658aff125996ae45eea23241d8c3d0567b893)
2013-10-22 14:34:03 +11:00
Martin Schwenke
fbd2617cb8 recoverd: Remove function reload_nodes_file()
It is a 1 line wrapper around ctdb_load_nodes_file(), so use that
instead.  We need less code...  :-)

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 4a5d5935f4410a93a3343d85a24dbcddae2c4c20)
2013-10-22 14:34:03 +11:00
Martin Schwenke
a93361fca2 Revert "null out the pointer before we reload the nodes file"
This reverts commit 4b0f32047e8bece0a052bdbe2209afe91b7e8ce3.

This is not necessary.  It just causes a memory leak.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 25fd05505f61dc595c0ef25bb6e332274d5530e8)
2013-10-22 14:34:03 +11:00
Martin Schwenke
19a911bf1a client: Fix a format string argument compiler warning
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f3413fb8b90c4d9f0c2c2a69825c66d080117193)
2013-10-22 14:34:03 +11:00
Amitay Isaacs
e63232e974 recoverd: Ignore failed flag updates on inactive nodes
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-with: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 484c46eaae056480baf050fd91868f2fd0537985)
2013-10-22 14:34:03 +11:00
Amitay Isaacs
a42b6e1cad common/util: Use AIX specific code for setting high priority for CTDB daemon
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 7764cf67a61bbf1caad5aa8e2d75a262b9da654c)
2013-10-22 14:33:56 +11:00
Martin Schwenke
63c534b1e2 git: Ignore generated documentation files
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit b9af66032f3d96f2fe12b7a4fcc5e71d4a282365)
2013-10-22 13:07:13 +11:00
Martin Schwenke
5e0eb7bf84 tests: When running local tests with run_tests.sh, use fixed TEST_VAR_DIR
Otherwise we end up with lots of useless temporary directories.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 63924ff372b066cd878b79e71f06de4c24c814a2)
2013-10-22 13:07:13 +11:00
Martin Schwenke
ace6c1ee62 eventscripts: Fix comment - CTDB_TCP_PORT_CHECKS -> CTDB_TCP_PORT_CHECKERS
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0a79ba2f1277a776347e2c3f04ce8419e0be62de)
2013-10-22 13:07:13 +11:00
Martin Schwenke
9256010cfb tests/integration: Tweak ctdbd startup options
* --public-interface is not needed

* Add --sloppy-start to speed up restarts

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d0dec5b8e60316701fdd02150c4dd8f01aacbfda)
2013-10-22 13:07:13 +11:00
Martin Schwenke
4812291ff8 recoverd: Fix the VNN lmaster consistency check
It does cope with node that don't have the lmaster capability.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 588172bcb6bf267339e2bd09e23d2c4904a27a41)
2013-10-22 11:49:54 +11:00
Amitay Isaacs
ec7c9952d5 tests: If transaction_start fails, try again
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit ed7d999214ee009e480c26410a04fa105028cb8e)
2013-10-08 17:10:13 +11:00
Amitay Isaacs
30f422b960 tests: Make sure test exits with zero status on successful completion
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit af4b6b8b3222d2a3c425fcc6833db579d0cd7ffa)
2013-10-08 17:10:08 +11:00
Amitay Isaacs
dae9e86461 tests: Re-enable transaction test code
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 929045335212e825deb645cc6c7f97b8a40fdbb3)
2013-10-04 15:47:11 +10:00
Amitay Isaacs
c4a80c7a67 tools/ctdb: Remove setdbseqnum command
This command was added to test persistent database recovery with sequence
numbers.  With the new persistent transaction code, sequence numbers get
updated automatically, so there is no need for this command.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 14bfd22fad1a5fd27eede1be7fccbaed9466e13e)
2013-10-04 15:47:11 +10:00
Amitay Isaacs
524696fa26 tests: No need to set sequence number when modifying persistent database
With the new persistent transaction code, sequence numbers will be
automatically updated whenever a record is updated.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 961dd5d0acbb971756944ea9f69992020ea7d9fc)
2013-10-04 15:47:11 +10:00
Amitay Isaacs
d0f99926e4 client: Remove old persistent transaction code
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 41bdbcfd72092cdd25da87e60689c087bca97933)
2013-10-04 15:47:11 +10:00
Amitay Isaacs
c5ec04f24e client: Reimplement persistent transaction code using TRANS3_COMMIT
Implementing persistent trasnaction code from Samba.

Persistent transaction code was reimplemented in Samba using g_lock.tdb
to hold transaction locks and using TRANS3_COMMIT control.

Implementation details:

1. When starting a transaction, create a record with "transaction-<dbid>"
   as key and store current server_id in the structure.

2. If a record already exists, some other client has already started a
   transaction.  Verify that the process corresponding to server_id stored
   in the record really exists or it's a stale record and overwrite it.

3. All modifications to the actual persistent database are stored in a
   marshal buffer.

4. When transaction is committed, read the sequence number of the
   persistent database and increment it.  Sequence number record is also
   stored in the marshal buffer.

5. Send the changed records (marshal buffer) in TRANS3_COMMIT control
   to all the active nodes.

6. If all controls succeed, verify that the sequence number has been
   incremented.  Commit is successful.  If any of the controls fail,
   abort the transaction.

7. In case sequence number has not yet been incremented, then database
   recovery has been triggered.  So repeat from step 5.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 4e0f1971792c9431d8d51dc57d54ecc9e4576dd5)
2013-10-04 15:46:15 +10:00
Amitay Isaacs
1203e82d9b client: Add functions to parse g_lock.tdb records
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 40589ae5259880431f358250c1f0d07bcaa21d1f)
2013-10-04 15:43:32 +10:00
Amitay Isaacs
0205d52657 client: Add functions to handle server_id structure
server_id records are stored in g_lock.tdb for persistent transactions.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 55f91ea4373c54ddb5faad87fa2826d86a4b6172)
2013-10-04 15:43:31 +10:00
Amitay Isaacs
be33efa3e4 ctdbd: Remove transaction code related to TRANS2 commits
This removes data types and structure elements related to TRANS2
persistent transaction code.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 22a253b7ccf1ff854cddf0b67969dc84d7d6a654)
2013-10-04 15:20:25 +10:00
Amitay Isaacs
91d644325d ctdbd: Deprecate TRANS2 commit controls
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 7d176352986317e63696d74252ff5d8eccb2fee5)
2013-10-04 15:20:25 +10:00
Amitay Isaacs
1ff9645865 ctdbd: Create a utility function to log error for "not implemented" controls
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 3c892ea1b5aa42686adb82ce29b9fcfdf9d204a1)
2013-10-04 15:20:25 +10:00
Amitay Isaacs
fe62936bb6 include: Remove unused set_dmaster structure
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 2ce3a48cc969d563c26dd295723416c0d7b077a2)
2013-10-04 15:20:25 +10:00
Martin Schwenke
24fb430d6e tests/tool: Remove references in libctdb in file and function names
Main changes are:

  libctdb_test.c -> ctdb_test_stubs.c
  ctdb_tool_libctdb.c -> ctdb_functest.c

ctdb_tool_stubby.c is gone, replaced with existing ctdb_test.c.

Functions starting with "libctdb_test_" now start with
"ctdb_test_stubs_".

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 6182bd0c19f215a997efe5272e633b1b1bd0c882)
2013-10-04 15:15:35 +10:00
Martin Schwenke
f3b1790819 tests/tool: Rework test programs so they no longer expect libctdb
Instead, override controls using preprocessor magic.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 10aac42f30cc0d56dca42ece17d04ccbc321056d)
2013-10-04 15:15:35 +10:00
Martin Schwenke
a6992b7b07 tests/tool: Fix some comment typos
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 59bd4ede15a5958b87e0d253461eb9111885bd2f)
2013-10-04 15:15:35 +10:00
Martin Schwenke
7a3e2f1627 tools/ctdb: Stop return value from being clobbered in control_lvsmaster()
ret is initialised too early and is clobbered by the call to
ctdb_ctrl_getcapabilities().  Initialising it later means that the
function returns -1 when no LVS master is found.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3296559c43e70f755fcf2c06677891e0319c8142)
2013-10-04 15:15:35 +10:00
Martin Schwenke
b527236efe client: Fix some format string compiler warnings
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5619754343003016ede27014567dbb4701f97928)
2013-10-04 15:15:35 +10:00
Amitay Isaacs
e229ff6133 common: Fix setting of debug level in the client code
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 299fa487549e36572b757852d21471f9e23f6e8f)
2013-10-04 15:15:35 +10:00
Amitay Isaacs
7a8337a01d libctdb: Remove incomplete libctdb
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit c5a7f2b4ff011e1393c4ff34864f85e6b472ff07)
2013-10-04 15:15:35 +10:00
Amitay Isaacs
2b68d143cb tools/ctdb: Pass memory context for returning nodes in parse_nodestring
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 1585a8e275b0143e5e46311b3d5e9785119f735f)
2013-10-04 15:15:35 +10:00
Amitay Isaacs
d4643abe88 tests: Do not use libctdb code in tests
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit ae0d8f432ef98a72c85a6cd42c503b718bef0e4e)
2013-10-04 15:15:34 +10:00
Amitay Isaacs
03379e332c tools/ctdb: Do not use libctdb for commandline tool
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit cd66282c635cf53386d8970b89c895076ea21cbd)
2013-10-04 15:15:34 +10:00
Amitay Isaacs
4ca9b96114 client: Add ctdb_ctrl_getdbseqnum() function
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 8cb1fbbfe88327c9c7ab68e8eded586dff611e57)
2013-10-04 15:15:34 +10:00
Amitay Isaacs
5d47f28e15 client: Add ctdb_ctrl_getdbstatistics() function
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 1e7fca5cdc1d7205cf084e35aace1a5dc46ea294)
2013-10-04 15:15:34 +10:00
Amitay Isaacs
105afa543e client: Add ctdb_client_check_message_handlers() function
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit c9a9d14c91f203ce964a426a8a1e2c1715af2098)
2013-10-04 15:15:34 +10:00
Amitay Isaacs
151bb4b97d client: Remove extra whitespaces
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 962eb63c6d500e29a03ae087757d81be449888c6)
2013-10-04 15:15:34 +10:00
Amitay Isaacs
2814c9a0c5 tests: Remove unused test program ctdb_fetch_lock_once
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 873b9cadbcc363a9e5f450b0a1feb1cf2ce1e6c9)
2013-10-04 15:15:34 +10:00
Amitay Isaacs
f165ed1594 tools/ctdb: When printing TDB data as a string, use correct length of the string
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit d94a10f93a0925b17458d009e604966666b3d880)
2013-10-04 15:15:27 +10:00
Amitay Isaacs
d3783ae140 tools/ctdb: Remove un-implemented ctdb vacuum command
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 8b238852884004a56f76a1762199c338864d1249)
2013-10-04 15:15:27 +10:00
Amitay Isaacs
e4ed152d59 tests: Add a simple test to test cluster wide database traverse
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 713c9ecc791e3319a2d109838471833de5a158c8)
2013-09-26 10:21:31 +10:00
Amitay Isaacs
a2d6bbe67a traverse: Send traverse end record from traverse child process
Traverse records are sent directly from traverse child process, but
the last empty record signalling end of traverse is sent from ctdbd.
This creates a race condition between ctdbd and traverse child.
There are two fds from traverse child to ctdbd - a pipe to track status
of the child process and unix socket connection for sending records.
It's possible that last few records are sitting in unix socket buffer
when ctdbd reads the status written from traverse child.  This will
be interpreted as end of traverse and ctdbd will send the last empty
record to originating node before it has processed the pending packets
in unix socket connection.

The race is avoided by sending the last empty record marking end of
traverse from the child process.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 37e22fc3ac3eb64732f2e67058f5b7b06c093fbf)
2013-09-25 14:59:45 +10:00
Amitay Isaacs
f1f1788f10 traverse: Wait till all data has been flushed from output queue
To improve the traverse performance, records are directly sent from
traverse child process to the originating node.  Make sure that all the
data is sent via socket, before informing ctdbd that traverse is complete.

Without waiting for all the packets to be flushed from the queue,
child process can incorrectly signal ctdbd that traverse has ended.
This will cause the pending records in the queue never to make it to
the originating node and traverse information will not be complete.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 482ac708cb79cb6378d814a79c2cf13f88435bc4)
2013-09-25 14:59:45 +10:00
Amitay Isaacs
1740cbb58c traverse: Use ctdb local variable for convenience
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 25e9cf86328252f96215b54b94551dd7bbdd2db4)
2013-09-25 14:59:45 +10:00
Amitay Isaacs
c4f49a5342 traverse: Check if local traverse failed or succeeded
By passing the result of tdb_traverse_read() allows ctdbd to determine
if the local traverse succeeded or not.  In case of a problem with local
traverse, ctdbd can log an error.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit abd51a9f41ebb178c4ea4491bdedf9a9433e7232)
2013-09-25 14:59:45 +10:00
Amitay Isaacs
76d9d2e5e1 traverse: Log information when traverse starts and ends
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit e4aba8598b00a810e721de64ac44dccc9af04ab6)
2013-09-25 14:59:45 +10:00
Martin Schwenke
613313fa52 tool/ltdbtool: -h option does not require an argument
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 9e18f3c173863919587e25d704f66372624ed8ed)
2013-09-25 14:35:46 +10:00
Martin Schwenke
5818771192 scripts: Add support for optional ctdbd.conf configuration file
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 8f660d0dd52013e5876806be908e8e603aa6e968)
2013-09-25 14:35:46 +10:00
Martin Schwenke
066b671de0 utils: Make debug level strings case-insensitive
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit c700dd0c7b6b43b61b3e231643b5d7cbe2f9592a)
2013-09-25 14:35:31 +10:00
Martin Schwenke
5b2c8ba880 tools/ctdb: Fix help messages for ctdb commands
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 49c87699fad151933a0aefebfee968fc850e6383)
2013-09-25 14:34:55 +10:00
Martin Schwenke
058037d58c tools/ctdb: Ban time of 0 is invalid
Apparently it used to mean a permanent ban but it is unclear if this
was ever supported.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c8a6e5ce579e2fe320c40268e7e9ddfe68b8cd30)
2013-09-25 14:34:55 +10:00
Amitay Isaacs
4c4bfcbd6f eventscripts: Load CTDB configuration settings in 70.iscsi
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit ff41ce5ef202f8f6342e285d195bb5df61d848ce)
2013-09-23 18:38:28 +10:00
Martin Schwenke
430ae84877 recoverd: Disable takeover runs on other nodes for 5 minutes
60 seconds might not be long enough to kill all connections and
release IPs.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 71080676bb4acbd0d9b595a30cf7fe6dddbf426f)
2013-09-19 12:58:32 +10:00
Martin Schwenke
07d3a1b234 recoverd: Improve logging for takeover runs
Takeover runs are currently silent when they succeed.  However, they
are important, so log something by default.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit b39aa2e401fbb581207d986bac93778e9c01acdc)
2013-09-19 12:57:36 +10:00
Martin Schwenke
236b2524de tools/ctdb: Use the standard long timeout when disabling takeover runs
This means that takeover runs will be disabled for about as long as the
reloadips control can take to complete.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 6d44657a5e5b0df22bab2d487a503dd1c5ba79b4)
2013-09-19 12:56:50 +10:00
Martin Schwenke
5f0d85d4db tools/ctdb: Fix arguments/semantics of rebalance node
There's no reason why specifying a node should be compulsory.  This is
a cluster-wide operation because it is implemented by the recovery
master so multiple nodes should not be specified using -n.  However,
the command should be able to specify multiple nodes so let it have
its own nodestring argument.

This change should be backward compatible with the old requirement of
specifying a single node via -n.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0846c00597adb66bba8c9dbf63443d0c2f91a7d1)
2013-09-19 12:54:32 +10:00
Martin Schwenke
c484361076 tools/ctdb: Make rebalancenode more robust
Use a broadcast instead of trying to win the race of determining the
recovery master and then sending the message before the recovery
master changes.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ac946ee4ad01b1e5cd1006930b9f8a190a0a58ba)
2013-09-19 12:54:32 +10:00
Martin Schwenke
44b7397962 tests/simple: Fix the reloadips test to cope with changes to reloadips
Specifying nodes to reload no longer uses -n.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d921b2756d5f1c4ad7a35fe120f6fda9f5bf5686)
2013-09-19 12:54:32 +10:00
Martin Schwenke
566d66e6ab recoverd: Be careful about freeing the list of IP rebalance target nodes
It can change during a takeover run.  If it does then don't free it.

There are potentially fancier solutions (e.g. check what PNNs are new
to the list) to this issue but this is the simplest.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e81589b7084c661adf617e166cc2c25b4939f841)
2013-09-19 12:54:31 +10:00
Martin Schwenke
4fb0d4a301 recoverd: reloadips should rebalance target nodes for new IPs
Otherwise, if existing IPs are added to extra nodes (that have,
perhaps, been disconnected) then those IPs will not be rebalanced
across the extra nodes.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ceb30432a9a550778aed0b422a654fc5287b82a3)
2013-09-19 12:54:31 +10:00
Martin Schwenke
950e23f664 ctdbd: Make ctdb_reloadips_child send controls asynchronously
Deleting IPs can take a while because IPs are released and connections
are killed.  This can take a while so do them in parallel.  In fact,
since the set of IPs being added and deleted will be disjoint, send
all the adds/deletes at the same time and then wait.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 85a5b544ec032173e98c9cc3b5402a76b961aa3b)
2013-09-19 12:54:31 +10:00
Martin Schwenke
b33ee7a2a4 recoverd: Fix the implementation of CTDB_SRVID_REBALANCE_NODE
The current implementation has a few flaws:

* A takeover run is called unconditionally when the timer goes even if
  the recovery master role has moved.  This means a node other than
  the recovery master can incorrectly do a takeover run.

* The rebalancing target nodes are cleared in the setup for a takeover
  run, regardless of whether the takeover run succeeds.

* The timer to force a rebalance isn't cleared if another takeover run
  occurs before the deadline.  Any forced rebalancing will happen in
  the first takeover run and when the timer expires some time later
  then an unnecessary takeover run will occur.

* If the recovery master role moves then the rebalancing data will
  stay on the original node and affect the next takeover run to occur
  if the recovery master role should come back to the original node.

Instead, store an array of rebalance target nodes in the recovery
master context.  This is passed as an extra argument to
ctdb_takeover_run() each time it is called and is cleared when a
takeover run succeeds.  The timer hangs off the array of rebalance
target nodes, which is cleared if the node isn't the recovery master.

This means that it is possible to lose rebalance data if the recovery
master role moves.  However, that's a difficult problem to solve.  The
best way of approaching it is probably to try to stop the recovery
master role from jumping around unnecesarily when inactive nodes join
the cluster.

The long term solution is to avoid this nonsense completely.  The IP
allocation algorithm needs to cache state between runs so that it
knows which nodes have just become healthy.  This also needs recovery
master stability.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c51c1efe5fc7fa668597f2acd435dee16e410fc9)
2013-09-19 12:54:31 +10:00
Martin Schwenke
1793412de2 recoverd: Remove unused CTDB_SRVID_RELOAD_ALL_IPS and handler
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 4cd727439a0824ebb8dbcf737d9888ffc3c41184)
2013-09-19 12:54:31 +10:00
Martin Schwenke
6f1935ea6d tools/ctdb: Reimplement reloadips
This implementation disables takeover runs on all nodes before trying
to reload IPs.  It also takes "all" or the list of PNNs as an argument
to the command instead of to -n.  -n can still be specified with a
single node indicating that node should be considered the current node
- that might be confusing so could be removed.

This implementation does not use CTDB_SRVID_RELOAD_ALL_IPS, so it can
be removed.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d66a072d9b120c78c47e726e9f29a3c1cfdd87ce)
2013-09-19 12:54:31 +10:00
Martin Schwenke
e7cc998570 recoverd: Defer ipreallocated requests when takeover runs are disabled
The takeover run will fail anyway but deferring seems like a cleaner
option.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 428f800bcdf3dbfe19de8bb36099fbf01ebeaab4)
2013-09-19 12:54:31 +10:00
Martin Schwenke
2f472b4573 recoverd: Reimplement CTDB_SRVID_DISABLE_IP_CHECK
Use disable_takeover_runs_handler() instead of maintaining duplicate
logic.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0a51a85915486b2a8fded7ba6444b18c6c1ee8e8)
2013-09-19 12:54:31 +10:00
Martin Schwenke
5f0913d321 recoverd: New SRVID message CTDB_SRVID_DISABLE_TAKEOVER_RUNS
This implements a superset of CTDB_SRVID_DISABLE_IP_CHECK.  It stops
the IP checks but also causes any attempted takeover runs to fail and
be rescheduled.

This is meant to completely stop IP movements.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 00db4de53a0d86013e79e6577e7e6cf3ef864e56)
2013-09-19 12:54:31 +10:00
Martin Schwenke
e79b750e5e tools/ctdb: Add a wait_for_all option to srvid_broadcast()
This will be useful for other SRVIDs.

The error checking in the handler depends on the SRVID responding with
a uint32_t where <0 indicates an error and >=0 is a PNN that
succeeded.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 52050e1c75b21961dafe2bc410268b44240ab24e)
2013-09-19 12:54:31 +10:00
Martin Schwenke
51db81344e tools/ctdb: Factor out SRVID broadcast code from ipreallocate()
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a566fb5e70282c4e9f76654b1be4dc80829dced0)
2013-09-19 12:54:30 +10:00
Martin Schwenke
8a6979dac3 tools/ctdb: Change ipreallocate() to use a local done flag
Instead of the current global variable.  This is in anticipation of
abstracting the code.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c58ee0eddf7ae3283e3ca8bd25575e6e677e1b17)
2013-09-19 12:54:30 +10:00
Martin Schwenke
0ba7e2ce31 recoverd: Factor out the SRVID handling code
The code that handles IP reallocate requests can be reused.

This also changes the result back to a SRVID caller to the PNN on
success or a negative error code on failure.  None of the callers
currently look at the result so this is harmless... but it will be
useful later.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e4eae6e3291baa299a1d0f733ab11b138ee699a3)
2013-09-19 12:54:30 +10:00
Martin Schwenke
4c3f8dc3bb recoverd: Make the SRVID request structure generic
No need for a separate one for each SRVID.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d9c22b04d5aa7938a3965bd3144568664eb772ce)
2013-09-19 12:54:30 +10:00
Martin Schwenke
c503997746 recoverd: Move disabling of IP checks into do_takeover_run()
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 48b603fbf16311daa47b01e7a33d477ed51da56d)
2013-09-19 12:54:30 +10:00
Martin Schwenke
bbbb55eef9 recoverd: do_takeover_run() should mark when a takeover run is in progress
Nested takeover runs should never happens so they should fail.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 8ed29c60c0a7dd29f2a6efdf694d38e94281e1c4)
2013-09-19 12:54:30 +10:00
Martin Schwenke
a1f915f6b5 recoverd: takeover_fail_callback() doesn't need to set rec->need_takeover_run
It is set on every failure anyway.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e5f94c7857405bdeac233069003c3769b3dc3616)
2013-09-19 12:54:30 +10:00
Martin Schwenke
701c450e90 recoverd: Fail takeover run if "ipreallocated" fails
Previously flagging a failure was probably avoided because of attempts
to run "ipreallocated" events on stopped and banned nodes, which would
fail because they are in recovery.  Given the change to a new control
and that fallback only retries the old method on active nodes, this
should never fail in reasonable circumstances.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 53722430ad35f80935aabd12fa07654126443b8b)
2013-09-19 12:54:30 +10:00
Martin Schwenke
e167e2e7c7 recoverd: New function do_takeover_run()
Factor the calling sequence for ctdb_takeover_run() into a new
function and call it instead.  This changes rec->need_takeover_run to
false for each successful takeover run and that seems to be the right
thing to do.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 9a3f0c0e61ca5c17e020c6e0463d73c7cf4f7c09)
2013-09-19 12:54:30 +10:00
Martin Schwenke
30a50c6e1e recoverd: Stabilise the recovery master role
On rare occasions when a node that has been inactive it will trigger
an election when it becomes active again.  If that node has been up
for the longest then it will win the election and the recovery master
role will spuriously move.

While a node remains inactive we reset the priority time to discourage
it from winning elections.  The priority time will now reflect roughly
how long the node has been active rather than how long it has been up.
That means the most stable node is more likely to win elections.

Having a stable recovery master means that disabling takeover runs
while reloading IPs is more likely to succeed.  It also improves the
chances of being able to cache information in the recovery master -
for example, between takeover runs.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f0f48f22f45e4c82eba2582efae307e25385de81)
2013-09-19 12:54:29 +10:00
Martin Schwenke
630196423a recoverd: Banned nodes should not be told to run "ipreallocated" event
They will reject it because they are in recovery.  This can result in
extra banning credits being applied to banned nodes.

This corresponds to commit 9132e6814ed927fa317f333f03dedb18f75d0e5b
from the 1.2.40 branch.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 403938804caf1322f9773d63197e4303a7b2a788)
2013-09-18 17:16:35 +10:00
Martin Schwenke
d30e269ecc common: Make parse_ip() valgrind-clean
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit c0bb147ca09e82019b05ec22995623cffc3184e2)
2013-09-11 15:35:38 +10:00
Martin Schwenke
8d11da3546 recoverd: Remove an orphaned comment
This should have been removed with the associated code in commit
14bd0b6961ef1294e9cba74ce875386b7dfbf446.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 36de63843de10a1f2a9ccdbbee24cc1d08542984)
2013-09-11 15:35:16 +10:00
Martin Schwenke
4e62553fcb recoverd: Update a comment to use current terminology
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ea5576071b22e1877903ec0921d375626a23e13b)
2013-09-11 15:35:10 +10:00
Martin Schwenke
fe7f66547b client: Remove unused function list_of_active_nodes_except_pnn()
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d8a76cf79f07dfb5a93c6c9a13f16e3268c7dd57)
2013-09-11 15:35:03 +10:00
Martin Schwenke
c870f01160 tools/ctdb: list_of_active_nodes_except_pnn() -> list_of_nodes()
list_of_active_nodes_except_pnn() is only used here and can be removed
if we remove this call.  Less is more...

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d4e206fb818048b7fab4797c877b854bdbb1ab70)
2013-09-11 15:34:58 +10:00
Martin Schwenke
2d31ec2131 tools/ctdb: Fix a memory leak in parse_nodestring()
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 8753a094b97340deb26dd44f6ea345ca0a642a95)
2013-09-11 15:34:51 +10:00
Martin Schwenke
e003699686 tests/eventscripts: Tests for memory checking in 00.ctdb
... plus updates to test infrastructure to support.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 4a388fc6bf54636b7e1f6da8e6aa451cddd574f7)
2013-09-11 15:34:42 +10:00
Martin Schwenke
b88bf1275c eventscripts: Clean up monitoring of system memory in 00.ctdb
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 16fcff0d1993b7a0479341862ea44d10bd5c6d6d)
2013-09-11 15:34:30 +10:00
Michael Adam
18f17aaa33 server: standardize formatting of comment block for ctdb_reply_dmaster() while I'm at it..
This was the comment block I was touching and meant to adapt in
commit 00d3bf092e2f72eda330978c75ec85f17e870553.
My search was apparently not unique...

Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 09940255011b119dc6af3304f5d3e9568e6006fd)
2013-08-26 13:24:32 +02:00
Martin Schwenke
128e2cb29d doc: Update NEWS
Signed-off-by: Martin Schwenke <martin@meltin.net>
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit c446579fc442955ecc74f5566eaa0635c3171498)
2013-08-22 18:07:49 +10:00
Amitay Isaacs
7531b9528f build: Fix build dependencies for ctdb_lock_tdb
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit eb8575718400c45626cd1b2e0fd247bc3ebff655)
2013-08-22 17:59:59 +10:00
Martin Schwenke
1c3f4f55b0 tests/simple: Minimise the chance of a monitor event being cancelled
A monitor event following a "ctdb delip" might reconfigure services.
If the monitor event is cancelled then a service might be stopped but
not yet restarted and this could result in the subsequent monitor
events failing.

This obviously needs to be fixed in CTDB itself.  This will happen by
making "ctdb reloadips" the supported way of reconfiguring IPs.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 618ea3660e36e7bd92b686e1ca8728cf63c3c068)
2013-08-22 17:00:20 +10:00
Martin Schwenke
aecd66d0a0 packaging: Remove pushd/popd from maketarball.sh, don't need bash
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3ffca990a18cbd31c8bd3ae01c6671d60da58f58)
2013-08-22 17:00:20 +10:00
Martin Schwenke
a04fb43708 tools/ctdb_diagnostics: Add output of "ctdb getdbmap"
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f0d69a9079b7aecc68f1d2d8510702046b618b19)
2013-08-22 17:00:20 +10:00
Martin Schwenke
6c468c94a2 tools/ctdb_diagnostics: Safer temporary file creation
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 406e1cb1fdd17ddd239774d0228e3657b73ae68f)
2013-08-22 17:00:20 +10:00
Martin Schwenke
cc74417341 eventscripts: Avoid using a temporary file in 62.cnfs
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 81833052d7ee8f76b1e98376a0273448640cfa8e)
2013-08-22 17:00:20 +10:00
Martin Schwenke
bb974f150b scripts: Remove gdb_backtrace
This uses potentially insecure temporary files and is not referenced
anywhere else.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 4b914d7e217202f3d11a8e95f9f74bc17869475b)
2013-08-22 17:00:20 +10:00
Martin Schwenke
d1918ba27a tools/ctdb: Make most non-auto-all commands abort if run with -n all
Or if run with -n A,B,...

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit b1d8732b5da18ae80aea1df0e66b0b5cdcd919bc)
2013-08-22 17:00:20 +10:00
Martin Schwenke
fd79a86d8f tools/ctdb: Remove more non-essential fetching of PNN from daemon
The useful cases are either CTDB_CURRENT_NODE, in which case
ctdb_get_pnn() does the job, or a PNN, which is... ummm... a PNN!  :-)

This works because parse_nodestring() validates PNNs.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 7b3f7eea2465efb099a2faf3e42174bc97b13a16)
2013-08-22 17:00:20 +10:00
Martin Schwenke
3402ae9ffb tools/ctdb: Improve auto-all settings for some commands
* ipreallocate is cluster-wide so should not be auto-all

* enablescript, disablescript, getreclock, setreclock, natgwlist can
  all be auto-all without issues

* xpnn, ipiface a local-only so don't work with -n, so might as well
  not be auto-all

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 123a4677528cb46bee1c6dad8a5162eba9880bc1)
2013-08-22 17:00:20 +10:00
Martin Schwenke
3afcc53516 recoverd: Remove an unused temporary talloc context
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit da22d5e60dc023009854025cc9e6bc4b0a84c60e)
2013-08-22 17:00:20 +10:00
Martin Schwenke
1ae731198a recoverd: Move struct ctdb_public_ip_list back into ctdb_takeover.c
This is an internal structure.  It was moved into ctdb_private.h a
long time ago to allow unit testing.  Unit test compilation was
changed shortly afterwards to make this unnecessary.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit db57261d7dc264e161659a8c547f44fbd9e88eeb)
2013-08-22 17:00:20 +10:00
Martin Schwenke
e657f75484 recoverd: Log more information when interfaces change
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3ef93a1a3e60cdf5d8954e7a16a988ea6126916b)
2013-08-22 17:00:20 +10:00
Amitay Isaacs
58e96eb178 traverse: Log when database traverse is started
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 256b157232c60bc432c94e54b1fae9699f737557)
2013-08-22 17:00:19 +10:00
Amitay Isaacs
e850a6d2ca ctdbd: Finish eventscript callback processing before debugging hung script
This ensures that the result of eventscripts is updated and callback is
processed before debugging hung script.  So "ctdb scriptstatus" output
will be useful from debug hung script.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-Programmed-With: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 4ed2efb838d2ac97746666f614ebef5fdf3cdd5e)
2013-08-22 17:00:19 +10:00
Amitay Isaacs
19444f7c3d ctdbd: Make sure call data is freed if doing an early return
This should avoid memory bloat when a request bounces between nodes.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 7677fb263f06a97398e2c546e32273fb96edca69)
2013-08-22 16:59:49 +10:00
Amitay Isaacs
a61a4b1254 common/io: Limit the queue buffer size for fair scheduling via tevent
If we process all the data available in a socket buffer, CTDB can stay busy
processing lots of packets via immediate event mechanism in tevent.  After
processing an immediate event, tevent returns without epoll_wait.  So as long
as there are immediate events, tevent will never poll other FDs.  CTDB will
report this as "Event handling took xx seconds" warning.  This is misleading
since CTDB is very busy processing packets, but never gets to the point of
polling FDs.

The improvement in socket handling made it worse when handling traverse
control.  There were lots of packets filled in the socket buffer quickly and
CTDB stayed busy processing those packets and not polling other FDs and timer
events.  This can lead to controls timing out and in worse case other nodes
marking busy node as disconnected.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 92939c1178d04116d842708bc2d6a9c2950e36cc)
2013-08-22 14:08:52 +10:00
Amitay Isaacs
cfb7f74fa2 Revert "common/io: Keep queue buffer size multiple of 4K"
This reverts commit 5e9b1a7e24d058ff88aaa0563db36a804e866fa9.

This is not the best approach.  Allowing queue buffer size to grow
indefinitely causes large number of CTDB packets to be queued up very
quickly which when processed via immediate events will block CTDB from
processing events from other FDs.  If there are immediate events queued
up, tevent will never process any of the FDs till all immediate events
are processed.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit d8b094e804efc53fae9f44c6ef961b7b5797d290)
2013-08-22 14:08:52 +10:00
Amitay Isaacs
1467b666f2 Revert "LACOUNT: Add back lacount mechanism to defer migrating a fetched/read copy until after default of 20 consecutive requests from the same node"
This reverts commit 035c0d981bde8c0eee8b3f24ba8e2dc817e5b504.

This is a premature optimization.  Record can bounce between nodes
very quickly if it is a contended record.  There is no need to hold a
record on a node unnecessarily.  In case record contention becomes bad,
enabling sticky records on a database is a better idea.

Conflicts:
	include/ctdb_private.h
	server/ctdb_tunables.c

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit ac417b0003f0116f116834ad2ac51482d25cfa0d)
2013-08-22 14:08:52 +10:00
Amitay Isaacs
59dae19f5a ctdbd: Print a log message when a key becomes hot
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 48f40985f4592c28402303ccbb458756f4914f75)
2013-08-22 14:08:52 +10:00
Amitay Isaacs
27fd34e9ff ctdbd: For volatile databases, write an empty record with rsn=0 only on dmaster
Empty record with rsn=0 should not be written on any other node other than
dmaster.  This is however not true for persistent databases.  So currently
apply the check only for volatile databases.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit df83ae7a047dab4803e0d94b1c11df48ae17ca96)
2013-08-22 14:08:52 +10:00
Martin Schwenke
73da6c0201 tools/ctdb: Fix message in showban when node is banned
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5cdad2b8ebd71a5e458c301d00eac00a211feeb3)
2013-08-21 14:02:36 +10:00
Martin Schwenke
b74c232b8a tools/ctdb: Reimplement ban/unban using update_flags_wait_and_ipreallocate()
This has the side effect of making these commands more resilient to
control timeouts.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0fe79662e20e347d9e1cb12a42cd356e33572402)
2013-08-21 14:02:36 +10:00
Martin Schwenke
b42b0e4676 tools/ctdb: Factor out common pattern used in disable/enable/stop/continue
Now we will only have one set of bugs.  :-)

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 444521c852749558f39dc6131acce9e47eefd489)
2013-08-21 14:02:36 +10:00
Martin Schwenke
f72f4c362b tools/ctdb: Factor, simplify and improve robustness of ipreallocate code
Having other functions call control_ipreallocate() suggests that the
it might look at the argv/argv arguments that are passed.  This is not
the case.  Change the callers so they call the new ipreallocate()
function instead.

Broadcast CTDB_SRVID_TAKEOVER_RUN to all connected nodes.  Inactive
nodes will ignore it.  This is safe since we only want 1 reply.  If we
didn't get a response, we don't actually care if there's no active
recovery master - just fire, wait, retry, ...

Ignore some failures on the basis that they might be transient, so it
is probably worth retrying.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 4bf0b1c9d21986eecb7682f935bd6154c65533cc)
2013-08-21 14:02:36 +10:00
Martin Schwenke
db121b4c8f tools/ctdb: Use ctdb_get_pnn() to get PNN of the current node
This has already been stored at connect time and can't fail.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d8eb2e7fdd7645719370dad4f2faa5c3fffa8249)
2013-08-21 14:02:36 +10:00
Michael Adam
aa1360aeb2 util: In passing the code, fix a space vs. tab in set_close_on_exec().
Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit f9556a6f1fe0046308c8b363e6dcaf3f7ce6f2b7)
2013-08-19 17:12:33 +02:00
Michael Adam
621bfe8b0d server: standardize formatting of comment block for ctdb_reply_dmaster() while I'm at it..
Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 00d3bf092e2f72eda330978c75ec85f17e870553)
2013-08-19 17:12:33 +02:00
Michael Adam
922246de73 server: fix wording and punctuation in comment block for ctdb_reply_dmaster().
Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit cb3a1c5af3b796dba30cae07118670d3c9e57df7)
2013-08-19 17:12:32 +02:00
Amitay Isaacs
cb8310ddb6 recoverd: Improve log message when nodes disagree on recmaster
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 7b7aa7b599536cd60ebb84d363607bb4e953248a)
2013-08-14 16:55:51 +10:00
Amitay Isaacs
3c0a477911 common: Null terminate process name string so valgrind doesn't complain
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 1c9025fdd08d1cea342af7487d0123015e08831b)
2013-08-14 16:55:51 +10:00
Amitay Isaacs
ae30b61255 vacuuming: Fix vacuuming bug where requests keep bouncing between nodes (part 2)
This is caused by corruption of a record header such that the records
on two nodes point to each other as dmaster.  This makes a request for
that record bounce between nodes endlessly.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit f0853013655ac3bedf1b793de128fb679c6db6c6)
2013-08-14 16:55:51 +10:00
Amitay Isaacs
ee8d573069 vacuuming: Fix vacuuming bug where requests keep bouncing between nodes (part 1)
This is caused by corruption of a record header such that the records
on two nodes point to each other as dmaster.  This makes a request for
that record bounce between nodes endlessly.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit a610bc351f0754c84c78c27d02f9a695e60c5b0f)
2013-08-14 16:55:51 +10:00
Amitay Isaacs
f9be4803cb db_wrap: Make sure tdb messages are logged correctly
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 60cb40d090e45ff6134c098a238fac7ad854f134)
2013-08-14 16:55:51 +10:00
Martin Schwenke
fec69034ee eventscripts: Become unhealthy faster on nfsd failure
Anecdotal evidence suggests that most nfsd RPC check failures are due
to cluster filesystem or storage problem.  Apparently these are rarely
helped by attempting to restart the NFS service because the restart
tends to hang.

Fail after 2 nfsd RPC check failures, instead of waiting for 6
failures.  Restart on every 10th failure to try to bring the node back
to good health.

Update unit tests to match.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e9ef93f7b6dad59eabaa32124df81f3e74c651ef)
2013-08-14 16:10:30 +10:00
Martin Schwenke
4cb3e2cd78 tools/ctdb: Increase default control timeout to 10 seconds
The current 3 second timeout is arbitrary and users trip over it
sometimes.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit b49c4f39666d5b1596213bf41bcdc47ed3c327ae)
2013-08-14 15:57:04 +10:00
Martin Schwenke
e6ce2f55ef eventscripts: Improve message logged when a counter hits a limit
It should print the actual number of consecutive failures rather than
the limit.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ff5f0d1e29af2b293e30cdc54bed03a644be7038)
2013-08-14 15:57:04 +10:00
Martin Schwenke
35d9631eda eventscripts: Print a message when waiting for TCP connections to be killed
This makes the gaps in the logs more obvious.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 11fbf4789d783dd0bac22754b374dd9ea4b03bad)
2013-08-14 15:57:04 +10:00
Martin Schwenke
b1f7337d2b eventscripts: New configuration variable $CTDB_RPCINFO_LOCALHOST
Passing "localhost" to the rpcinfo command causes overheads, like
reading /etc/services multiple times.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 1d61988af9e4fa3621a3e2d06a859bcb53df2d67)
2013-08-14 15:57:04 +10:00
Martin Schwenke
0ca046577f eventscripts: Add modulo (%) operator to ctdb_check_counter()
Also add it to the corresponding eventscript unit test infrastructure.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f4ef83a256f59eeb00b9a5bc10c28347e1ad1031)
2013-08-14 15:57:03 +10:00
Martin Schwenke
bdbe37b24f eventscripts: Separate out RPC service restart code
While doing this:

* Explicitly assign RPC program and version information in
  _nfs_check_rpc_common().  This is more lines of code but is easier
  to read.

* Don't print the options when starting a service.  Trying to print it
  makes the code messy for little benefit.

  Update the eventscript unit testing code and a Ganesha test to
  reflect this.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e8b531405665885196c95fe1608db33a255bf761)
2013-08-14 15:57:03 +10:00
Martin Schwenke
2afb5632c7 tests/eventscripts: Override background_with_logging(), just prepend "&"
That is, output that goes through background_with_logging() just gets
"&" prepended to each line.  This is cleaner than having the tests
grovel through logs.

Update some 49.winbind/50.samba tests to deal with this.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3ba933d806106d12bc48b83b22d0f314d9d1e5e5)
2013-08-14 15:57:03 +10:00
Martin Schwenke
df539a66cb eventscripts: Remove support for RPC service 'q' and 's' restart flags
They're hard to maintain and provide very little benefit.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 1a1be43f8466d46913dcdfe6dcedb94316cd28ad)
2013-08-14 15:57:03 +10:00
Martin Schwenke
5459cdc8a6 eventscripts: When restarting the nfslock service only show output of start
That is, /dev/null the "stop" output.  This is consistent with the way
CTDB generally deals with the output when stopping a service.

It also makes updating the eventscript unit tests easier.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c7332526b1b488abefeb4be78a7cd3f2f9abc451)
2013-08-14 15:57:03 +10:00
Martin Schwenke
d63cf0e7a7 tests/simple: Unreachable node test should wait for recovery to complete
This should minimise the chances of a control timing out.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 63be516673c5d9c0d543617bf1bb8bca919956a8)
2013-08-14 15:57:03 +10:00
Martin Schwenke
0997b0c400 tests/simple: Fix the missing IP test
Update the missing IP test to wait until restarts are complete.
Otherwise a service restart can collide with the following monitor
event and cause chaos.

Also, do not disable 10.interface until it matters.  Disabling it too
early can cause even more chaos if something goes wrong with the
monitor step.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 4e3bd06916bd3adac213fb18c7c2a24854b02d45)
2013-08-14 15:57:03 +10:00
Amitay Isaacs
8f1e94dfa4 recoverd: Use TDB_INCOMPATIBLE_HASH when creating volatile databases
When creating missing databases either locally or remotely, recovery
master calls ctdb_ctrl_createdb().  Recovery master always passes 0
for tdb_flags.  For volatile databases, if TDB_INCOMPATIBLE_HASH is not
specified, then they will be attached without using jenkins hash causing
database corruption.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 2fc6b6403707a292d134140fc0b9145b454992c5)
2013-08-14 15:54:48 +10:00
Amitay Isaacs
de6b97ce4f Revert "recoverd: Use correct tdb flags when creating missing databases"
This reverts commit 10a057d8e15c8c18e540598a940d3548c731b0b4.

This approach would not work when creating local databases since currently
there is no control to receive TDB flags for remote databases.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit ca61eb776ab862bd269e45ee0f9f96e7e1e0e001)
2013-08-14 14:15:33 +10:00
Amitay Isaacs
d349b56e2d common/io: Keep queue buffer size multiple of 4K
Currently queue buffer size is realloc'd every time we need to extend the
buffer.  Small increments can cause memory fragmentation.  Instead always
extend buffer in multiples of 4K.  This should reduce multiple talloc_realloc
calls when there are lots of packets in the socket buffer.

Also, if queue buffer has grown larger than 64K, throw away the buffer once
all the requests in the queue have been processed.  That way queue does not
hold on to large buffers.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 5e9b1a7e24d058ff88aaa0563db36a804e866fa9)
2013-08-09 11:07:37 +10:00
Martin Schwenke
6f9090648a packaging: Allow setting custom release number in RPM spec file
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-Programmed-With: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 867afb247bd8cc86c8d738f051a44cc534cafacf)
2013-08-09 11:07:37 +10:00
Amitay Isaacs
a98baa539e ctdbd: When a record is made sticky, log only once
Instead of logging from ctdb_request_call(), log the message from
ctdb_make_record_sticky().  That way if the record is already sticky, the
message is not repeated unnecessarily.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 44a64d1c388bfe3c3388b191edfaedecfb7bb831)
2013-08-09 11:07:37 +10:00
Amitay Isaacs
d42cea6efe ctdbd: Improve high hopcount log messages when request is redirected
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 9cde47e1a5bf1b9ca3b4da8c2db94caac2b1aa5e)
2013-08-09 11:07:37 +10:00
Martin Schwenke
98163e01a9 scripts: Do not run ctdb tool commands when debugging hung "init" event
CTDB daemon is not ready to accept clients in INIT runstate (init event).
CTDB daemon will start accepting connections in SETUP runstate (setup event)
and later.

Also, minor log formatting changes.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 81d7ce03b28d592a1337639e14d9ea141e20bfff)
2013-08-09 11:04:55 +10:00
Amitay Isaacs
ded2f28954 ctdbd: Avoid leaking file descriptor if talloc fails
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit d7f6bc3fed2dc61e6e587b4c0ec0ac27d533bbbe)
2013-08-09 11:04:55 +10:00
Amitay Isaacs
a030b938ca eventscript: Wait for debug hung script to finish or timeout before continuing
Currently if the debug hung script takes long time to finish, the subsequent
monitor event can collide with the previous event which is not yet finished.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 9e99e0eb072e2b845914ee3896acbc66b96138d7)
2013-08-09 11:04:55 +10:00
Amitay Isaacs
f5ddb49e62 eventscripts: Use configured RECLOCK file instead of asking CTDB
On cluster where recovery lock file is not being used, asking CTDB daemon
is unnecessary overhead.  And if CTDB is using recovery file, then changing
configuration without restarting is *stupid*.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-Programmed-With: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 44eb86e6042adb6efe75d2a5528b82a0f21d496d)
2013-08-09 11:04:55 +10:00
Amitay Isaacs
477a51aba5 locking: Do not create multiple lock processes for the same key
If there are multiple lock helper processes waiting for the same record, then
it will cause a thundering herd when that record has been unlocked.  So avoid
scheduling lock contexts for the same record.  This will also mean that
multiple requests will get queued up behind the same lock context and can be
processed quickly once the lock has been obtained.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit ebecc3a18f1cb397a78b56eaf8f752dd5495bcc9)
2013-08-09 11:04:55 +10:00
Amitay Isaacs
9ba793a80f locking: Move function find_lock_context() before ctdb_lock_schedule()
So that ctdb_lock_schedule() can call this function without requiring extra
prototype declaration.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 68af5405acc123b5a90decd2123e2a02961a8fcf)
2013-08-09 11:04:42 +10:00
Amitay Isaacs
b77fec9381 ctdbd: Print set db sticky message after it's set
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 824dcec35ec461d78e22b2ea109473b32bfe3972)
2013-08-01 11:08:26 +10:00
Amitay Isaacs
1d9d1d8cf9 tests: Add a test program to hold a lock on a database
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit f6b066a23610fb0092298861c21a9b354b91e2f1)
2013-08-01 11:08:26 +10:00
Amitay Isaacs
f15e1a28a7 recoverd: Use correct tdb flags when creating missing databases
When creating missing databases either locally or remotely, make sure
to use the correct tdb flags from other nodes.  Without this, volatile
databases can get attached without TDB_INCOMPATIBLE_HASH flag.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 10a057d8e15c8c18e540598a940d3548c731b0b4)
2013-08-01 11:08:25 +10:00
Amitay Isaacs
e44c38dc45 client: Always use jenkins hash when attaching volatile databases
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 7e7e59c4047c78159387089eca65d90037bcf722)
2013-08-01 11:08:25 +10:00
Amitay Isaacs
5ba280d8ce recoverd: Make sure to use jenkins hash for recovery databases
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 32c83e209823e9a4d6306bb7fd63d4500f3e2668)
2013-08-01 10:51:14 +10:00
Amitay Isaacs
f1f787ccac recoverd: Assemble up-to-date node flags information from remote nodes
Currently nodemap used by recovery master is the one obtained from the local
node.  This information may have been updated while processing main loop.
Before comparing node flags on all the nodes, create up-to-date node flags
information based on the information received from all the nodes.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit fcf77dec5af973a0e32f3999bc012053a6f47a96)
2013-07-30 15:34:32 +10:00
Amitay Isaacs
16b519c51b tools/ctdb: Only print the hot records with non-zero hopcount
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 049d9beb3783482490e6273a434ccbad23f85f0a)
2013-07-30 15:34:32 +10:00
Amitay Isaacs
0993387f4a ctdbd: Don't consider a hot record if the hopcount is zero
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit ab35773518ad15588013f4d859f7bee790437450)
2013-07-30 15:34:32 +10:00
Amitay Isaacs
054d8727ed ctdbd: Fix updating of hot keys in database statistics
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit fde4b4db5a57f75c5efa5647c309f33e0d5a68f3)
2013-07-29 16:00:46 +10:00
Amitay Isaacs
d8fc36781c ctdbd: Remove incomplete ctdb_db_statistics_wire structure
Instead of maintaining another structure, add an element as place holder for
marshall buffer of hot keys.  This avoids duplication of the structure.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit e73b2e12adc9db1dedb48d32bba3a8406a80f4cd)
2013-07-29 16:00:46 +10:00
Amitay Isaacs
854216236b Revert "ctdbd: Remove incomplete ctdb_db_statistics_wire structure"
The structure cannot be removed without adding support for marshalling keys
for hot records.

This reverts commit 26a4653df594d351ca0dc1bd5f5b2f5b0eb0a9a5.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 023ca2e84f5ed064a288526b9c2bc7e06674dd81)
2013-07-29 16:00:46 +10:00
Martin Schwenke
e14fa50941 doc: Update XML files to use standard DocBook DTD
This simplifies building since we don't use any of the Samba
extensions.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 57aa2dffea60abd73a95233f8b761cc676adebb6)
2013-07-29 15:58:51 +10:00
Martin Schwenke
3c73949317 initscript: The wrapper script should export CTDB_SOCKET
This ensures that any invocation of the ctdb tool (within the wrapper)
gets the desired value.  This at least ensures that ctdbd will be
started.

If a non-standard value is set for CTDB_SOCKET then command-line users
will still need the variable in their environment.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 37ccc7c6cc43a80aaa92291aea7a438f4225488a)
2013-07-29 15:58:51 +10:00
Martin Schwenke
a5cb72cac3 ctdbd: Kill client process without checking for tracked child
Commit f73a4b1495830bcdd094a93732a89dd53b3c2f78 added a safety check
to ensure that CTDB never kills unrelated processes.  However, client
processes are unrelated.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 782814288bb560099ee44b607bf35f3eddf37f82)
2013-07-29 15:58:51 +10:00
Martin Schwenke
a8dd716146 eventscripts: kill_tcp_connections() should send connections to stdin
This avoids issuing multiple "ctdb killtcp" commands to terminate tcp
connections, one per connection.  This will considerably reduce the
time when there is a large number of tcp connections.  This also makes
it possible to avoid calling "ctdb killtcp" when there are no connections.

Add a couple of unit tests for killtcp and update eventscript unit
test infrastructure to support.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit a20d94717d2e4ab866d8a002cdf39c0669b74c6a)
2013-07-29 15:53:06 +10:00
Martin Schwenke
200c28fbb2 tools/ctdb: Allow killtcp to read connections from standard input
This will allows eventscripts to send information about multiple tcp
connections to a single "ctdb killtcp" command, saving the overhead of
setting up a client connection per tcp connection.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit af5aa369c266430fe912df0c26116b68bac3572e)
2013-07-29 15:51:03 +10:00
Martin Schwenke
34d55048bc tests: Always tally the number of passed/failed tests
Regardless of whether a summary is being printed!

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a69e03a5e4671e998d45b4fef8611a421bbdb3e1)
2013-07-29 15:49:23 +10:00
Martin Schwenke
f46ab595d1 recoverd: Call takeover fail callback only once per node
Currently the fail callback is called once per (takeip/releaseip) control
failure.  This is overkill and can get a node banned much too quickly.

Instead, keep track of control failures per node and only call fail
callback once per failed node.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit bf4a7c1ad87e0e848296d15d63eb8cd901ca5335)
2013-07-29 15:48:48 +10:00
Martin Schwenke
67b22b6e94 scripts: Run scriptstatus for hung event
The timeout information printed by ctdbd is less than useful because
it refers to the cumulative time taken by the eventscripts run so far.
Adding scriptstatus output indicates where time was actually spent.

Since there is now quite a bit of output, serialise the calls to this
script using flock.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 1b016b2dfc5d7d3f2a42ce4dfe569608e90eb714)
2013-07-29 14:02:13 +10:00
Martin Schwenke
6cbcc4a8d9 ctdbd: Pass event name to hung script debugger
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit e0f3fa1020e13b84bdd672538168d148f1847d57)
2013-07-23 11:28:07 +10:00
Martin Schwenke
6882625cfe tests/complex: Fix NFS tests to work with root_squash
Refactor the NFS test setup/cleanup code into new common functions.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 29e98017221326bdc9b1c4f7c05b3b495c1de29b)
2013-07-23 11:28:07 +10:00
Martin Schwenke
1584f296b4 tests: Fix exit status of run_tests when a single test is run with -H
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 9d6e1c147bd036d832b98c155f405ee2a5d6f57f)
2013-07-22 19:38:50 +10:00
Martin Schwenke
417ee2f0aa tests/simple: Add -p in onnode test to help show groups of connections
Change the command from "true" to "hostname" since the former won't
produce any output when used in combination with "onnode -p".  This
could just be changed to "echo" but the hostname might actually be
useful.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ae3c03d80264e997b7da9f3279d7810e18b8a1df)
2013-07-22 19:36:58 +10:00
Martin Schwenke
88ba32b787 ctdbd: Sleep at exit to allow time for log messages to flush
Register print_exit_message() earlier so that it covers most of the
early exits.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 90d792cf28d6a823141e4c417b6978f02a9cf596)
2013-07-19 15:40:59 +10:00
Martin Schwenke
84f5528d9b ctdbd: Exit if something is already listening on CTDB socket
Don't blindly remove the socket.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3dd5b925dcf0e9a5b877638e471c5ecf36b46c58)
2013-07-19 15:40:43 +10:00
Martin Schwenke
363315aca5 tests/eventscripts: Add tests for monitoring of missing interfaces
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 53e4eca74429f76adc81d98e3d11d1bd61194d71)
2013-07-19 15:37:14 +10:00
Martin Schwenke
1da757d91a eventscripts: A missing interface should cause monitoring to fail
A missing interface is at least as bad as an interface with a link
that is down so should have a similar effect.

This couldn't be done previously because orphaned interfaces used to
be listed for monitoring.  This was worked around in 10.interface in
commit 49b2d1bd9554461ed8edbfc21e777c0eca9e1443 and fixed in ctdbd in
commit cc1a3ae911d3fee8b87fda5de5ab6d9499d7510a.

If $CTDB_PARTIALLY_ONLINE_INTERFACES="yes" then monitoring won't
actually fail but the interface is still marked as down.

While we're touching this code, use "ip link" instead of "ip addr".
It is marginally cheaper but not enough for a separate patch.  ;-)

This effectively reverts d67955b42f7627be9dae995230c8fcbb8a948ec2.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 501f19b16fd6d67fbb754248868c38ee5bcf79ef)
2013-07-19 15:35:41 +10:00
Martin Schwenke
4b5c9c7991 eventscripts: Get list of configured interfaces using "ctdb ifaces"
This was previosuly changed because ctdbd didn't garbage collect
orphaned interfaces.  This was fixed in commit
cc1a3ae911d3fee8b87fda5de5ab6d9499d7510a.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c6ab0f9405d5fa5b0b1693bc92e59da0d555a9d7)
2013-07-19 15:35:41 +10:00
Martin Schwenke
a3bef911f3 ctdbd: Allow extra recovery to repair persistent DBs during first recovery
Commit 8076773a9924dcf8aff16f7d96b2b9ac383ecc28 introduced a potential
regression because a node may not have completed the "recovered" event
(so might still be in CTDB_RUNSTATE_FIRST_RECOVERY) when another node
becomes healthy.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 57ef5d3827ea3417a32703e259a53ce6fd10ac45)
2013-07-19 15:35:41 +10:00
Amitay Isaacs
5f0b19c6f7 packaging: Bundle debug_locks.sh script in RPM
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 5740155cc5de1a223412e8529aa1a383a5412514)
2013-07-16 12:59:50 +10:00
Amitay Isaacs
bf2b388837 packaging: No need to check for existence of scripts, they always do
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 67c227a5d30cb8487b20b19b20bdfa4613906609)
2013-07-16 12:59:47 +10:00
Martin Schwenke
7610b6c009 scripts: ctdbd_wrapper logs a message to syslog if syslog is not being used
It can be very disconcerting when logging to syslog is expected but
nothing is being logged there.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 412bc0e20bef694d4e911dc9c984fd7716231f1f)
2013-07-11 15:18:06 +10:00
Mathieu Parent
27c2c61c21 Update Nagios check to work with ctdb versions past 30 Aug 2011
Because of commit a779d83a6213e2ba

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a4afe7af9c9391048d6f80135bbd5e15367770c7)
2013-07-11 15:18:06 +10:00
Martin Schwenke
ca13f28eef recoverd: Really fix bogus info in message about changed flags
Commit 9119a568c2b4601318f7751f537dca2f92a7230b attempted to fix this.
However, this was wrong because old_flags and new_flags were confused.
The latter has since been fixed in commit
7eb2f89979360b6cc98ca9b17c48310277fa89fc so this can now be fixed
properly.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 40f2825d6e818dc8c745b6385a545969dfb45fbc)
2013-07-11 15:18:06 +10:00
Martin Schwenke
2b8913fed6 doc: Update NEWS
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 76703514040b804b880cab909f6ff52576f80f89)
2013-07-11 15:17:57 +10:00
Sumit Bose
67f8a0ed91 Print deleted nodes as well
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 0930a3b806977555509c3228726e2250aef1f971)
2013-07-11 15:16:56 +10:00
Sumit Bose
3dc280f5b0 IPv6 neighbor solicit cleanup
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit a81edf7eb908659a379f0cb55fd5d04551dc2c37)
2013-07-11 15:16:55 +10:00
Sumit Bose
1f96f42b73 Fix memory leak in ctdb_send_message()
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit da87395d29f5d11ecfedaf36b53fa060a9140bfd)
2013-07-11 15:16:55 +10:00
Sumit Bose
157f1cfefd Fixes for various issues found by Coverity
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 05bfdbbd0d4abdfbcf28e3930086723508b35952)
2013-07-11 15:16:55 +10:00
Sumit Bose
d039f799ac Check return value of tdb_delete()
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 5cdcc3d45d358ddbcd7e864898eed9cbd9935429)
2013-07-11 15:16:55 +10:00
Amitay Isaacs
a40b9f2e7c web: Update webpages
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit ed9ba1d3dcfcb51aa69bf4d7a74b95063743d8d9)
2013-07-11 15:16:55 +10:00
Amitay Isaacs
f6f2cad9df Tests: Correct the arguments to memset
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 9ffcd6a91287d86bae7b0c73aa129c81126e08e7)
2013-07-11 11:34:46 +10:00
Amitay Isaacs
94b8e3926b doc: Update NEWS
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-programmed-with: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 14141b02b61d2783b750ee5b30f9520253e88f09)
2013-07-10 18:15:38 +10:00
Martin Schwenke
e4d99cc899 packaging: Add systemd support
Based on an original patch by Sumit Bose <sbose@redhat.com>.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e43a4b7b69a21c4cec2453dcac436b64bf5d7f06)
2013-07-10 18:14:33 +10:00
Martin Schwenke
af0f11a4ab build: Turn off all deprecation warnings
The "‘tevent_loop_allow_nesting’ is deprecated" warnings will be
around for a while and are annoying.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 30a0040fbb7c4d97d107f0e55c600295c2603a68)
2013-07-10 18:14:32 +10:00
Martin Schwenke
4349cb9807 build: Remove -DTEVENT_DEPRECATED_QUIET=1 from CFLAGS
This reverts the last part of 788cdbddbc902a5b076d23473450065b551d274d
- the rest of this has been implicitly reverted via tevent syncs.
This is just leftover noise.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit b6bbfb4c464c39e322830cbbebcc51c225508584)
2013-07-10 18:14:32 +10:00
Martin Schwenke
adbee6ae4e initscript: Simpify initscript and control CTDB via new ctdbd_wrapper
Currently the initscript is very complex.  This makes it hard to read
and hard to add support for new init systems, such as systemd.

Create a wrapper called ctdbd_wrapper to be installed alongside ctdbd.
This is called by the initscript to start and stop ctdbd.  It does the
ctdbd option construct and waits until ctdbd is properly initialised
before it exits.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit e3abc7eebab5cceddc4ce7817890dd5db9be3450)
2013-07-10 15:19:27 +10:00
Martin Schwenke
a86f1f109a recoverd: Recovery daemon should use ctdb_get_pnn, which can't fail
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c6fded59fa4da67f738a90fdacb51900e41801f9)
2013-07-10 15:19:27 +10:00
Amitay Isaacs
14c49eabe4 ctdbd: Print tdb flags when logging attached to database message
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 846109169ee5e3d03135156e45c8dac93aa2e95b)
2013-07-10 14:33:19 +10:00
Amitay Isaacs
1c21f37e57 ctdbd: Set process names for child processes
This helps distinguish processes in process list in top, perf, etc.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 2493f57ce268d6fe7e4c40a87852c347fd60d29e)
2013-07-10 14:33:19 +10:00
Amitay Isaacs
500b26e48f common/system: Add ctdb_set_process_name() function
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit fc3689c977f48d7988eed0654fb8e5ce4b8bfc8b)
2013-07-10 14:33:19 +10:00
Amitay Isaacs
4357aebdb9 traverse: Remove unused start_time field
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit dc834d5e78c3fb97ae15cddf1139b3c4a4051a7c)
2013-07-10 14:33:19 +10:00
Amitay Isaacs
bf3dd9488e traverse: Send records directly from traverse child to srcnode
Currently CTDB daemon reads records from a child process and then sends them to
srcnode via TRAVERSE_DATA control.  This ties up main CTDB daemon and also
requires an extra copy of the record in the CTDB daemon.  Instead send records
directly from traverse child process.

The control from child process still goes via local CTDB daemon as there
is no infrastructure currently to open a TCP socket to the srcnode.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 1a74192aa7d51ed99553e7292860027f06b6ef37)
2013-07-10 14:33:19 +10:00
Amitay Isaacs
557b92fc88 traverse: Pass reqid and srcnode information to local database traverse
So that traverse child process can directly send the TRAVERSE_DATA control to
the srcnode without first sending it to local node.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit faabce1b99fb3de9ff03bf54d303e7656538fee3)
2013-07-10 14:33:19 +10:00
Amitay Isaacs
3dcdd39801 packaging: When building with system libraries, add dependency for them
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 8225b3e77e140db34b52571a95d553d1e59e3f1e)
2013-07-10 14:33:18 +10:00
Amitay Isaacs
d46c24f4d0 ctdbd: No need for DeadlockTimeout tunable
The code for deadlock detection and killing smbd process causing deadlock
has been removed and replaced with external debug script.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 2211cd94bea266547d3e6f167d3160a6b23bec88)
2013-07-10 14:33:18 +10:00
Amitay Isaacs
ae0afad8ee initscript: Export CTDB_DEBUG_LOCKS variable
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit a415a1986900135f889efc25ecaf2761b1dae81a)
2013-07-10 14:33:18 +10:00
Amitay Isaacs
f46d0e783c scripts: Add an example debug_locks.sh script to debug locking issue
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit c711ff4702c5f95b75e4bf030665fc2afffc2f9e)
2013-07-10 14:33:18 +10:00
Amitay Isaacs
c620457c0b locking: Use external script to debug locking issues
Use an external script to parse /proc/locks and log useful debugging
information about locks rather than doing that in C code.

To use this feature, add configuration variable to /etc/sysconfig/ctdb:

  CTDB_DEBUG_LOCKS=/etc/ctdb/debug_locks.sh

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 2bfb8499366d530f16515b08928056bbda40f781)
2013-07-10 14:33:18 +10:00
Amitay Isaacs
9ae379c91a locking: Update locking bucket intervals
0   < 1 ms
 1   < 10 ms
 2   < 100 ms
 3   < 1 s
 4   < 2 s
 5   < 4 s
 6   < 8 s
 7   < 16 s
 8   < 32 s
 9   < 64 s
10   >= 64 s

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 6fc36a7036933237d09151a0baf4d8ccd2bc2c99)
2013-07-10 14:33:18 +10:00
Amitay Isaacs
1afb7fccb2 locking: Update locks latency in CTDB statistics only for RECORD or DB locks
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit dcc42a75b4638b3aa40c44ed9e0aaae26483e2b0)
2013-07-10 14:33:18 +10:00
Amitay Isaacs
81e6d60f01 tools/ctdb: Fix the format of DB statistics output
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 594c421f90ce132c75fbd985872114e4967f92b5)
2013-07-10 14:33:18 +10:00
Amitay Isaacs
d36aa928fd ctdbd: Remove incomplete ctdb_db_statistics_wire structure
Send the ctdb_db_statistics directly instead of first copying it to
duplicate ctdb_db_statistics_wire structure.  This simplifies the
implementation of the control to get database statistics.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 26a4653df594d351ca0dc1bd5f5b2f5b0eb0a9a5)
2013-07-10 14:33:18 +10:00
Amitay Isaacs
c0798dfb64 ctdbd: Update debug messages for setting readonly property on database
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 545a46437dfb2b755bb2fddb11dea8c4ccce3ed7)
2013-07-10 14:32:52 +10:00
Amitay Isaacs
bcb64aa55f recoverd: Fix buffer overflow error in reloadips
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-Programmed-With: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 41182623891d74a7e9e9c453183411a161201e67)
2013-07-05 15:52:34 +10:00
Martin Schwenke
f92e49f6f8 tests/eventscripts: Add some rudimentary tests for 60.ganesha
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e1cf1f728236d808bb41265e74bc65f54bf1c133)
2013-07-05 15:52:34 +10:00
Martin Schwenke
d6d1fb1f46 eventscripts: New configuration variable $CTDB_SKIP_GANESHA_NFSD_CHECK
This allows 60.ganesha to be unit tested, except for the core Ganesha
monitoring code.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f606df4f2db754592e6d1a16c26e155cacb2beef)
2013-07-05 15:52:33 +10:00
Martin Schwenke
7f6169b207 eventscript: Move Ganesha nfsd monitoring to a function
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ceb5b2d37f7ab4894908ec26f3812b3bed991525)
2013-07-05 15:52:33 +10:00
Martin Schwenke
c3e83d4532 eventscripts: Drop RPC service version from nfs_check_rpc_service() calls
Support for this was removed in commit
77302dbfd85754e02559eccb2dd6c090db0b6b9f and I overlooked its use in
60.ganesha.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 520914e7ee1b879c1080e5857fda18ed5b973fd6)
2013-07-05 15:52:33 +10:00
Martin Schwenke
dcdae86dc7 ctdbd: Log something when releasing all IPs
At the moment this is silent and it can be confusing to see IPs just
disappear.

Also, this message:

  Been in recovery mode for too long. Dropping all IPS

can cause anxiety when all IPs should already have been dropped.
Adding a comforting message saying that 0 IPs were dropped relieves
such anxiety.  :-)

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 4d0f26b306fc465d551d340b0e7dce4412eae3fd)
2013-07-05 15:52:33 +10:00
Martin Schwenke
0108e8ff10 recoverd: Minor style improvements for ctdb_reload_remote_public_ips()
* Add a variable to the loop to make the code more readable and have
  it generally fit into 80 columns.

* Improve comments.

* Improve log messages.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0a292fa8939a1343e44cadaa8ed9f3c0f18ca82f)
2013-07-05 15:52:33 +10:00
Martin Schwenke
7290798a41 recoverd: Clean up log messages in remote IP verification
The log messages in verify_remote_ip_allocation() are confusing
because they don't include the PNN of the problem node, because it is
not known in this function.

Add the PNN of the node being verified as a function argument and then
shuffle the log messages around to make them clearer.

Also fold 3 nested if statements into just one.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f0942fa01cd422133fc9398f56b4855397d7bc86)
2013-07-05 15:52:33 +10:00
Martin Schwenke
15115becef recoverd: Fix an unclear log message - "Restart recovery process"
When the recovery master notices a node in recovery mode it starts the
recovery process, it doesn't restart it.

Update documentation to match.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 298c4d2c3b4ea3d900c91f5a0a5aca2952a13d61)
2013-07-05 15:52:33 +10:00
Martin Schwenke
bfe0b93652 recoverd: Fix an incorrect comment
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 9f6cd8b0bea619991c9f3bf35188c5950dabf8f4)
2013-07-05 15:52:33 +10:00
Martin Schwenke
9c8cc863f7 ctdbd: Use ctdb_die() on "setup" event failure
This is slightly easier to read because it all fits on 1 line.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 035bf3eecf99337c84d4ad16cdbf297b1fa037db)
2013-07-05 15:52:33 +10:00