mirror of https://github.com/samba-team/samba.git synced 2024-12-24 21:34:56 +03:00
Commit Graph

620 Commits

Author SHA1 Message Date
Martin Schwenke
07d2ecfbcc ctdb natgwlist should return non-zero when there is no natgw.
This makes it 2, since this error corresponds loosely to ENOENT.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 1bf289abdd3067a40e9a67091aba78222d13eddf)
2011-08-03 15:39:33 +10:00
Ronnie Sahlberg
20a7c19691 Add log output to wipedb and backupdb
CQ S1025379

(This used to be ctdb commit 6f51d4a75f8a9f2cdb8ecde946ed31809ab5a415)
2011-07-06 13:13:18 +10:00
Martin Schwenke
5ddc10128a onnode: fix natgwlist nodespec
This hasn't worked for a while if ever.

We treat this case specially because the output has 2 words on the 1st
line.  We also handle the error case where /etc/ctdb_natgw_nodes
exists but none of the other $NATGW_* configuration is done.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 66e89797c7866d207a5bbf1836f52d70dba7cea6)
2011-06-08 14:24:00 +10:00
Martin Schwenke
1ef399e48d onnode: fix get_nodes_with_status()
Setting IFS and looping through items with colons in them doesn't work.
Change this to read through the output line by line.  The header line
needs to be thrown away by throwing away everything up to the 1st
newline.

Keep stderr from the "ctdb status" command, otherwise debugging is
impossible.

On error, append any output from ctdb to onnode's error message.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d60592cf99999f10344a05ef0571fb300bb9d97c)
2011-06-08 14:23:40 +10:00
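The line-by-line approach described in the commit above can be sketched roughly as follows. This is an illustration in Python, not the shell code from onnode: the function name `nodes_with_clear_flags`, the sample column names and the sample addresses are all hypothetical, and the sketch assumes colon-delimited lines with IPv4 addresses. It does not depend on how many status columns follow the IP address.

```python
def nodes_with_clear_flags(status_output):
    """Return the IPs of nodes whose status flags are all "0".

    Assumes (hypothetically) one header line followed by lines of the
    form ":<node>:<ip>:<flag>:<flag>:...:" with IPv4 addresses.
    """
    ips = []
    # Throw away the header by discarding everything up to the 1st newline,
    # then read the remaining output line by line.
    for line in status_output.splitlines()[1:]:
        fields = line.strip().strip(":").split(":")
        if len(fields) < 3:
            continue  # not a node line
        ip, flags = fields[1], fields[2:]
        # No assumption about the number of status columns.
        if all(flag == "0" for flag in flags):
            ips.append(ip)
    return ips


# Hypothetical sample; the real column names are not taken from the source.
SAMPLE = """\
:Node:IP:FlagA:FlagB:
:0:10.0.0.1:0:0:
:1:10.0.0.2:1:0:
:2:10.0.0.3:0:0:
"""

print(nodes_with_clear_flags(SAMPLE))
```

Because the flags are taken as "everything after the IP", adding another status column to the output would not require a change to this sketch.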
Martin Schwenke
41436193dd onnode: Remove an unnecessary comment.
The comment about $CTDB_NODES_SOCKETS is meaningless.  The code it
refers to works just fine with $CTDB_NODES_SOCKETS.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 74e69a564bac653dadfffe8b08145b9b3be16e61)
2011-06-08 14:23:14 +10:00
Martin Schwenke
f730194f12 onnode: Future-proof get_nodes_with_status().
The current code requires knowledge of the number of status bits
output by "ctdb status -Y".

This changes the code to be completely general.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e1788f25fde3d1f26bf4831a331741aa280f6fbc)
2011-06-08 14:22:49 +10:00
Martin Schwenke
f3ea7bec68 onnode: Exit with error for unknown command-line flags.
Use of "local" was masking errors in command-line processing.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ca80adda7517b43147ef30156ae34c66b29fa2bd)
2011-06-08 14:22:16 +10:00
Martin Schwenke
350f3e5b09 onnode: Be defensive when listing IPs of nodes with designated status.
The current version gives the last item left after stripping the known
fields.  If an insufficient number of status fields is stripped then
this would return a residual status field value, which turned out to
be a valid IP address for localhost...  so no error occurs.

This change means that the node number is stripped and any residual
status field value will stay appended, causing an error the first time
this command is tested.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 74715e6ec7b67c6f0e863aa51c87279758d6bf91)
2011-06-08 14:21:53 +10:00
Martin Schwenke
597083d37a onnode - Fix long standing bug in onnode healthy/ok/connected/con.
When the output of "ctdb status -Y" changed to add an extra status
column we didn't fix onnode.

This adds a match for the extra column.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 793febaebd3d484ddfbbcb47aaa0cdf3cfc1a00d)
2011-06-08 14:21:26 +10:00
Ronnie Sahlberg
5b93e0a870 Remove all checking of GPFS from ctdb_diagnostics
CQ S1023524

(This used to be ctdb commit 4cddba08b46db0a56a86b32403a41b89cd097317)
2011-05-11 21:25:25 +10:00
Gregor Beck
082de99f87 add ltdbtool - a standalone ltdb tool
This is a tool to handle (dump and convert) ctdb's local tdb
copies (ltdbs) without connecting to a ctdb daemon.

It can be used to

* dump the contents of a ltdb, printing
  the ctdb record header information

* dump a non-clustered tdb database (like tdbdump)

* convert between an ltdb and a non-clustered tdb
  (adding or removing ctdb headers)

* convert between 64 and 32 bit ltdbs
  (the ctdb record headers differ by 4 bytes of padding)

usage: bin/ltdbtool dump [-p] [-s{0|32|64}] <idb>
       bin/ltdbtool convert [-s{0|32|64}] [-o{0|32|64}] <idb> <odb>

Pair-Programmed-With: Michael Adam <obnox@samba.org>

(This used to be ctdb commit efcf2815711cd5371633614fb91273bd0a786da0)
2011-05-04 12:48:50 +02:00
Ronnie Sahlberg
c23f2e8bea We default to non-deterministic ip now where ips are "sticky" and don't change
too much.
This means we can simplify the way we add ips significantly and stop
trying to move them.

We also check if the node already hosts the ip, in which case we used to return an error. Instead just print an error string but return 0, ok.
This makes it easier to script, and works around broken scripts.

CQ1021034

(This used to be ctdb commit 307e5e95548155a31682dfcb0956834d0c85838e)
2011-02-08 17:06:10 +11:00
Ronnie Sahlberg
6494574d8f db_exists() takes 3 arguments, not two.
(This used to be ctdb commit 2c02fc2d45cd7364d7bee0d6a89f1386131ef002)
2011-01-14 09:53:25 +11:00
Ronnie Sahlberg
2edbf0b2fb ADDIP failure
Found during automatic regression testing.
We do not allow the takeip/releaseip events to be executed during a recovery.

All of "ctdb addip, ctdb delip, ctdb moveip" use and force these events to
trigger to perform the ip assignments required.

If these commands collide with a recovery, these commands could fail since we do
not allow takeip/releaseip events to trigger during the recovery.
While it is easy to just try running the command again, this is suboptimal for script use.

Change these commands to retry these operations a few times until either successful or until we give up.
This makes the commands much easier to use in scripts.

(This used to be ctdb commit 6954c9df67501183995f408cca358c8fdfb176ab)
2011-01-13 16:18:58 +11:00
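The retry behaviour described in the commit above can be illustrated with a small sketch. It is not the C code from the ctdb tool; the helper name `retry`, the default of five attempts and the one-second delay are assumptions made for the example.

```python
import time


def retry(operation, attempts=5, delay=1.0, sleep=time.sleep):
    """Call operation() until it returns True or attempts run out.

    operation is any callable returning True on success, for example a
    wrapper around a command that may collide with a recovery.
    """
    for attempt in range(1, attempts + 1):
        if operation():
            return True
        if attempt < attempts:
            sleep(delay)  # give the recovery a chance to finish
    return False  # give up; the caller reports the failure


# Hypothetical usage: an operation that fails twice, then succeeds.
outcomes = iter([False, False, True])
print(retry(lambda: next(outcomes), delay=0))
```

Passing `sleep` as a parameter is only a convenience for testing the sketch without real delays.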
Ronnie Sahlberg
99d7e39efc ctdb addip:
After finishing "ctdb addip"  wait for an implicit "iptakeover" to complete
the assignment to a node.

This makes it more wasteful and time-consuming when adding multiple ips
at once, or the same ip to multiple nodes,
but makes it easier to script the use of this command.

(This used to be ctdb commit d86cbf3d7d426c558d110d67dc985634c754a522)
2010-12-13 14:24:30 +11:00
Ronnie Sahlberg
a75bf138ab add new command line functions
ctdb readkey <dbid> <key>
ctdb writekey <dbid> <key> <value>

these are mainly intended for debugging of databases and dmaster migration issues

(This used to be ctdb commit 70c2e7dd04727371590fb94579ffd20318fbeb58)
2010-12-07 15:33:08 +11:00
Stefan Metzmacher
e75f6907c0 tools/ctdb: allow "ctdb pfetch" only on persistent databases
metze

(This used to be ctdb commit 63ad4a7fe7bd7c9597a4f5573e87f66e5234eb48)
2010-10-21 11:10:21 +11:00
Stefan Metzmacher
be7545e83a tools/ctdb: add 'persistent' flag to "ctdb attach"
metze

(This used to be ctdb commit 7a5790de22e8370b2812414aa1adef8201e8b269)
2010-10-21 11:10:15 +11:00
Stefan Metzmacher
19bc2e40ca tools/ctdb: let "ctdb catdb" pass the persistent flag to ctdb_attach()
metze

(This used to be ctdb commit 4ec99c1eeab529865ac790ef554f3b099a14faf1)
2010-10-21 11:09:55 +11:00
Ronnie Sahlberg
c1612205f1 Remove a debug message "Timed out waiting ..."
from the ctdb command.

This is a debugging message and is normal to trigger on a busy system.
It should not be logged as ERROR.

(This used to be ctdb commit 9ddf89e01f1845eec1712d75fb811240e8bb0e37)
2010-10-13 09:23:17 +11:00
Ronnie Sahlberg
5ef29f9f25 Update latency counters to show min/max and average
(This used to be ctdb commit 1919e949af4641ffe919123e44b02fb87c13ab9f)
2010-10-11 15:12:24 +11:00
Ronnie Sahlberg
d0054d383d get rid of the "ctdb setflags" command since
1, we don't need it
2, it uses the ugly "modify flags" control that should die

(This used to be ctdb commit 25f96db966230e90291eee57841c9faaae33713b)
2010-10-07 16:19:24 +11:00
Ronnie Sahlberg
b67754fa4d when printing machinereadable statistics only print the header with the fieldnames once
(This used to be ctdb commit 70c8d429d7c13cbbd08184ff8f0aa506de5adccc)
2010-09-30 15:08:12 +10:00
Ronnie Sahlberg
1a716ec300 add a machinereadable version of ctdb stats/statistics
(This used to be ctdb commit 3a033156c48d821d48fd18f12c3b0ac14bbddc93)
2010-09-30 15:01:08 +10:00
Ronnie Sahlberg
9f66a93f12 Add rolling statistics that are collected across 10 second intervals.
Add a new command "ctdb stats [num]" that prints the [num] most recent statistics intervals collected.

(This used to be ctdb commit e6e16fcd5a45ebd3739a8160c8fb5f44494edb9e)
2010-09-29 12:14:45 +10:00
Ronnie Sahlberg
bb22ff0f50 Don't try to read the nodemap from the daemon for "ctdb listnodes"
Always read it from the /etc/ctdb/nodes file

(This used to be ctdb commit a0fdb25bb2cac177cdc32b938fa08fd665aa873e)
2010-09-09 07:38:28 +10:00
Ronnie Sahlberg
f5c0539dc6 Change how NATGW is configured to allow special nodes that do not have
network connectivity outside of the cluster to still be able to
participate in a natgw group.
These nodes can not become natgw master since they lack external network
connectivity.

These nodes are configured just the same way as for any other node with
NATGW, with the following two exceptions :
* we do NOT set CTDB_NATGW_PUBLIC_IFACE at all on these nodes.
  since these nodes lack external network we should not check the interface
  for link.
* we must set CTDB_NATGW_SLAVE_ONLY=yes to flag that this is a node that
  can not become natgw master.

(This used to be ctdb commit ab7b00a37e55beffc074be95b55d8a5c7cb9eef2)
2010-09-08 09:20:16 +10:00
Ronnie Sahlberg
55c619f072 the tfetch command can be used without the daemon running, so flag it as such.
fix a couple of incorrect settings for "auto-all" for a few of the commands as well.

(This used to be ctdb commit 9999771105d7105efaa232fe2842e21e66f78706)
2010-08-25 11:11:12 +10:00
Ronnie Sahlberg
018063b8eb add a new command "ctdb tfetch" that can read a record straight out of the
tdb file.

the command automatically strips the initial ctdb header off the record so it can only be used on ctdb managed tdb files, not on normal tdb files.

(This used to be ctdb commit c3a816e5174abefb5155f65d8faad7b1e831e481)
2010-08-25 10:56:02 +10:00
Ronnie Sahlberg
f75b984b71 When "ctdb pfetch" creates a new file, make sure we set some initial sane mode bits
(This used to be ctdb commit 87160c91bfd87e8b9c510dacbf00e5aa481d2305)
2010-08-25 10:35:12 +10:00
Ronnie Sahlberg
4c5a4015f3 change "ctdb pfetch" to take an optional third argument
as a file to store the record in.

(This used to be ctdb commit 6d7e62f5401f0647a519fe0b74ec628418e33231)
2010-08-25 08:07:47 +10:00
Ronnie Sahlberg
a8db1adcd6 add a command to write a record to a persistent database
"ctdb pstore <db> <key> <file containing possibly binary data>"

(This used to be ctdb commit 14184ab7c80a3ef16c54b4ab168fd635b7add445)
2010-08-24 14:00:18 +10:00
Ronnie Sahlberg
4da818504a get rid of two compiler warnings
(This used to be ctdb commit 0865f0e6ef671396aa862f6a79a48a4891d72122)
2010-08-24 14:00:10 +10:00
Ronnie Sahlberg
401732a56b Add a command "ctdb pfetch <db> <record>" to read a record from
a persistent database.

(This used to be ctdb commit 3bef831b96ce8b40457ed4de527f0d62fa6a5b00)
2010-08-24 14:00:02 +10:00
Ronnie Sahlberg
1ef66379d7 ctdb ip is very busy.
revert the default case back to only showing the ip and node
and only display the extra info if -v verbose output is requested

(This used to be ctdb commit 6488651aa7e105c57324f4a300760a010d098fbb)
2010-08-20 11:38:34 +10:00
Ronnie Sahlberg
08a5b0c7c5 add a new commandline flag -v to enable verbose output
(This used to be ctdb commit 96dd9f40f9464c3d9de98f1323568724a1e31dc9)
2010-08-20 11:28:24 +10:00
Ronnie Sahlberg
388d18cc93 make it possible to "ctdb gettickle" to only list tickles for a certain
port.

Default is to continue to show all tickles, but if a second argument
is given, only tickles for that port will be shown.

(This used to be ctdb commit 5b985eb2cbbb92bf6ccfcacd633d793bcd4e3ec1)
2010-08-20 11:25:12 +10:00
Ronnie Sahlberg
31126b2ef0 Add machinereadable output for the "ctdb gettickles <ip>" command
(This used to be ctdb commit c3eb53509331045074579468d94ed7e31101bba4)
2010-08-18 14:37:16 +10:00
Ronnie Sahlberg
5aa5f3e7bf Remove the structure ctdb_control_tcp_vnn since this is identical to the structure ctdb_tcp_connection.
Add a new "ctdb deltickle" command to delete tickles from the database.
This can ONLY be used for tickles created by "ctdb addtickle".

Push any "addtickle/deltickle" updates to other nodes every TickleUpdateInterval seconds.

(This used to be ctdb commit acded034e2f0dcae4c2c9e54e16a001caf23caec)
2010-08-18 12:36:03 +10:00
Ronnie Sahlberg
44ff992806 Add a new "ctdb addtickle" command to manually add tickles to ctdbd
This can be used to set ctdbd up to generate a tickle for non-samba
services.
(samba contains code to set tickles up automatically)

(This used to be ctdb commit 7ef2cddad5326fdcc26138906948342039829495)
2010-08-18 11:09:32 +10:00
Ronnie Sahlberg
e8ffb0d8a4 We use eventloop nesting in a couple of places, notably the sync
parts of the recovery daemon.

Initialize all event contexts to allow nesting

(This used to be ctdb commit 5bf6bd5e7f33aabbeb7b9707716ef99cf471e590)
2010-08-18 10:11:59 +10:00
Ronnie Sahlberg
ddf3c621c1 Merge commit 'rusty/libctdb-new' into foo
(This used to be ctdb commit 1566d2d23ab698896b3b6a76974a5c7452db4a62)
2010-08-18 09:53:52 +10:00
Rusty Russell
f93440c4b7 event: Update events to latest Samba version 0.9.8
In Samba this is now called "tevent", and while we use the backwards
compatibility wrappers they don't offer EVENT_FD_AUTOCLOSE: that is now
a separate tevent_fd_set_auto_close() function.

This is based on Samba version 7f29f817fa.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 85e5e760cc91eb3157d3a88996ce474491646726)
2010-08-18 09:16:31 +09:30
Ronnie Sahlberg
8b0bbf960b Create a new command "ctdb sync" that is just an alias for "ctdb ipreallocate"
(This used to be ctdb commit eededd592c92c59b435f0046989b2327fcc280b1)
2010-08-10 09:49:55 +10:00
Ronnie Sahlberg
7139faaeac Update a log message to reflect that this does no longer only happen
when trying/failing to ban a node.

(This used to be ctdb commit dc6b143c4785449e8c4ef7a46bf16adba750ab56)
2010-08-10 09:48:50 +10:00
Ronnie Sahlberg
f7ead50738 Merge remote branch 'martins/master'
(This used to be ctdb commit 9ca09ee9129b787428a2ceac9731b12166dc8718)
2010-08-09 11:35:38 +10:00
Martin Schwenke
0f18859a6c Add some command-line options to ctdb_diagnostics.
In some contexts ctdb_diagnostics generates too many errors when it is
run on heterogeneous and machine-configured clusters.  In some
clusters some nodes are expected to be differently configured and also
machine-generated configured files can have comments containing
timestamps.

This adds some command-line options that can be used to reduce the
number of errors reported:

    -n <nodes>  Comma separated list of nodes to operate on
    -c          Ignore comment lines (starting with '#') in file comparisons
    -w          Ignore whitespace in file comparisons
    --no-ads    Do not use commands that assume an Active Directory Server

The -n option simply allows ctdb_diagnostics to operate on a subset of
nodes, avoiding file comparisons with and data collection on nodes
that are differently configured.  For file comparisons, instead of
showing each file on the current node and then comparing other nodes
to that file, the file from the first (available or requested) nodes
is shown and then other nodes are compared to that.  That has resulted
in changes in output - that is, ctdb diagnostics no longer prints
messages referencing the current node.

-c and -w are used to weaken comparisons between configuration files.

--no-ads can be used to avoid running ADS-specific commands if a
cluster uses LDAP (or other non-ADS) configuration.

This also fixes a number of bugs in related code:

* A call to onnode was losing the >> NODE ...  << lines because they
  now go to stderr.  This was changed in onnode long ago but
  ctdb_diagnostics was never updated to match.

* ctdb_diagnostics was counting lines in /etc/ctdb/nodes to determine
  what nodes to operate on.  For some time the nodes file has
  supported syntax that makes this invalid.  "ctdb listnodes -Y" is
  now used to list available nodes.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 36c8244a0f68c7c9bbee40982f230e9d14d3c0ea)
2010-08-06 11:10:56 +10:00
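How -c and -w weaken a file comparison can be sketched as below. This is a Python illustration under stated assumptions, not the shell code of ctdb_diagnostics: the function names `normalize` and `same_config` and the sample configuration text are hypothetical, and "ignore whitespace" is taken here to mean removing all whitespace from each line before comparing.

```python
def normalize(text, ignore_comments=False, ignore_whitespace=False):
    """Return the lines of text, reduced according to the two switches."""
    lines = []
    for line in text.splitlines():
        if ignore_comments and line.startswith("#"):
            continue  # like -c: drop comment lines starting with '#'
        if ignore_whitespace:
            line = "".join(line.split())  # like -w: drop all whitespace
        lines.append(line)
    return lines


def same_config(a, b, **options):
    """Compare two file contents after normalizing both the same way."""
    return normalize(a, **options) == normalize(b, **options)


# Hypothetical machine-generated files differing in timestamp and spacing.
node0 = "# generated 2010-08-06 11:10\noption = 1\n"
node1 = "# generated 2010-08-07 09:00\noption  =  1\n"

print(same_config(node0, node1))
print(same_config(node0, node1, ignore_comments=True, ignore_whitespace=True))
```

With neither switch the two samples differ; with both switches they compare equal, which is the reduction in reported errors that the commit describes.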
Ronnie Sahlberg
043045dcc5 remove the "ctdb freeze" debugging command
(This used to be ctdb commit bd005b987255eb65cd3826dce984281ee757daf6)
2010-08-05 16:30:47 +10:00
Rusty Russell
61d3e09632 ctdb: fix crash on "ctdb scriptstatus --events=releaseip"
Martin accidentally typed this instead of "ctdb scriptstatus releaseip"
and it crashes.

CQ:S1018859
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit 70877b2e7f8fd0d46899bbeca2c6caad6e6e6820)
2010-07-12 16:08:37 +09:30
Ronnie Sahlberg
5699091e9a Some "ctdb ..." commands can be run without having the main daemon running.
In that case, when the main daemon is not running
the ctdb context will be initialized to NULL, since we can not connect.

Move the calls to read the ctdb socketname and connecting via libctdb to
only happen when we are executing a "ctdb ..." command that requires that we talk to the actual daemon.
Otherwise we will get an ugly SEGV for the "ctdb ..." commandline tool
when trying to run a command that is supposed to work also when the daemon is down.

(This used to be ctdb commit 18168da84a6aa8d69465e43402444c7ec979604a)
2010-06-09 09:17:35 +10:00
Ronnie Sahlberg
6e0d612750 update "ctdb pnn" to use the new return value for _recv() where
bool false means failure and true means success.

(This used to be ctdb commit 8fec60cb92d26886d853c918b8bc7931fec46469)
2010-06-05 14:38:01 +10:00
Ronnie Sahlberg
433bc560fb Update the ctdb tool to use the new signature for ctdb_connect()
(This used to be ctdb commit ced3bc40f841d353bc86a6ee9dd1868473223f52)
2010-06-05 14:21:42 +10:00
Ronnie Sahlberg
c05f3ee99b When we say "current time of statistics" in the "ctdb statistics" output,
print the current time and not the start time

(This used to be ctdb commit d42ea3b1892f6a4abd1dbcf822d0a4d5db422d38)
2010-06-02 17:07:27 +10:00
Ronnie Sahlberg
53ea238c6c Add a variable for start/current time to ctdb statistics
and print the time the statistics were taken and for how long the statistics have been collected to the "ctdb statistics" output.

(This used to be ctdb commit 1bdfe0cd3370a335b960ce1ef97eade93b0cd2fa)
2010-06-02 13:14:53 +10:00
Ronnie Sahlberg
ae3e91f0ce link ctdb with libctdb and connect to the daemon both the old way and by using libctdb
update the function "control_pnn()" to use libctdb to ask the daemon for the pnn

(This used to be ctdb commit 3f651eb8d71c7af0268460bc4b1476112140b290)
2010-06-02 10:37:00 +10:00
Ronnie Sahlberg
bc208bc916 rename ctdb_set_message_handler to ctdb_client_set_message_handler
to avoid a collision with the function of the same name in libctdb

(This used to be ctdb commit 41dbdd4fc0ab560420fb0e24a3179ff7c94c5bb7)
2010-06-02 09:51:47 +10:00
Ronnie Sahlberg
761a075de9 rename ctdb_send_message to ctdb_client_send_message to resolve collision with the function of the same name in libctdb
(This used to be ctdb commit ac3292c12832484a22715f1d46aa23f3b7c8a6f6)
2010-06-02 09:45:21 +10:00
Ronnie Sahlberg
0d46488f6e Merge commit 'rusty/libctdb2'
(This used to be ctdb commit d41b802250ddc0a89581eb6285edfd66bdc7a78a)
2010-05-25 12:48:49 +10:00
Rusty Russell
d5f6026a22 libctdb: reorganize headers: remove ctdb.h, add ctdb_client.h and ctdb_protocol.h
ctdb_client.h is the existing internal client interface (which was mainly
in ctdb.h), and ctdb_protocol.h is the information needed for the wire
protocol only.

ctdb.h will be the new, shiny, libctdb API.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit 4bba6b8cd47b352f98d41f9f06258d5ac3c9adef)
2010-05-20 15:18:30 +09:30
Ronnie Sahlberg
6c5256ae69 In control_ipreallocate() we wait at most 5 tries before aborting the command
and returning an error.
This might not be sufficient if there are several recoveries in a row.

Instead loop as long as it takes for the recovery master to finish the recoveries and
respond to the ipreallocate call.

Increase the log level of the error message when the recovery master was busy and could
not perform the ipreallocation promptly

BZ61783

(This used to be ctdb commit 8e9fd36e4619b7cc7bb6f7f7416d13e4c00a296a)
2010-05-20 12:35:57 +10:00
Ronnie Sahlberg
e1d1c230d3 Enhance the "ctdb restoredb" command so you can restore a backup into a different database.
(This used to be ctdb commit c692b09851fce85b61c8c654faafb49db8cb601b)
2010-05-20 11:26:37 +10:00
Ronnie Sahlberg
6f1221e9e1 Add the number of performed recoveries to the "ctdb statistics" output.
(This used to be ctdb commit fa045733cb81412f0d02ab52d74eabc7efca8b3d)
2010-05-11 09:44:53 +10:00
Ronnie Sahlberg
4a43428440 The recent change to the recovery daemon to keep track of and
verify that all nodes agree on the most recent ip address assignments
broke "ctdb moveip ..." since that call would never trigger
a full takeover run and thus would immediately trigger an inconsistency.

Add a new message to the recovery daemon where we can tell the recovery daemon to update its assignments.

BZ62782

(This used to be ctdb commit e7069082e5f0380dcddee247db8754218ce18cab)
2010-05-03 15:47:17 +10:00
Ronnie Sahlberg
05dcbed90e ctdb regsrvids is much more useful for testing if it sleeps once it has registered its srvid.
Otherwise, as soon as it terminates, ctdbd will deregister the id automatically.

(This used to be ctdb commit 23b059dcb8074872d7900b225790d4df7da071b6)
2010-02-22 15:34:26 +11:00
Ronnie Sahlberg
e01c8454ef commands that relate to manual failover of ip addresses (moveip)
can sometimes take long so allow for a longer timeout for the controls used.

(This used to be ctdb commit 144c69b633eeb17e120f962162feed6de3dc16a6)
2010-02-09 18:34:47 +11:00
Ronnie Sahlberg
ca9386a7f4 don't just exit(0) upon successful completion of waiting for an ipreallocate to finish.
return success back to the caller instead.

otherwise things like 'ctdb enable -n all' will just finish after the first disabled node has become enabled.

(This used to be ctdb commit f4eb41cd3a1099da8265351818fba9bd4688a188)
2010-02-09 14:35:10 +11:00
Ronnie Sahlberg
7a889c5f1d When trying to enable/disable a node.
Check if the node is already enabled/disabled and log an information
message if so.

(This used to be ctdb commit c3eec8f10764a647106087099eeb47b7196f7aac)
2010-02-04 10:03:21 +11:00
Ronnie Sahlberg
7a5254ae69 add two new debug controls to send and receive messages
ctdb msglisten and msgsend

(This used to be ctdb commit 8c89aac20260dc7f3746e29fe99f17422a77cb88)
2010-02-04 09:45:32 +11:00
Martin Schwenke
52dbd65825 onnode: update algorithm for finding nodes file.
2 changes:

* If a relative nodes file is specified via -f or $CTDB_NODES_FILE but
  this file does not exist then try looking for the file in /etc/ctdb
  (or $CTDB_BASE if set).

* If a nodes file is specified via -f or $CTDB_NODES_FILE but this
  file does not exist (even when checked as per above) then do not
  fall back to /etc/ctdb/nodes (or $CTDB_BASE if set).  The old
  behaviour was surprising and hid errors.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 60aa570aaa77d293b963105b3f605f9625a4594b)
2010-01-21 18:52:44 +11:00
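The lookup rules in the commit above can be written out as a short sketch. This is a Python illustration rather than the onnode shell code; the function name `find_nodes_file` and its parameters are hypothetical, and `requested` stands for whatever was given via -f or $CTDB_NODES_FILE (or `None` if neither was given).

```python
import os
import posixpath


def find_nodes_file(requested, ctdb_base="/etc/ctdb", exists=os.path.exists):
    """Resolve the nodes file path following the two rules above."""
    if requested is None:
        # Nothing specified: use the default nodes file.
        return posixpath.join(ctdb_base, "nodes")
    if exists(requested):
        return requested
    if not posixpath.isabs(requested):
        # Rule 1: a relative name is also looked up under the base directory.
        candidate = posixpath.join(ctdb_base, requested)
        if exists(candidate):
            return candidate
    # Rule 2: no silent fallback to the default file; report the error.
    raise FileNotFoundError(requested)


# Hypothetical usage with a fake filesystem.
present = {"/etc/ctdb/mynodes"}
print(find_nodes_file("mynodes", exists=present.__contains__))
```

Raising an error in the last step, instead of returning the default file, mirrors the commit's point that the old fallback was surprising and hid errors.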
Martin Schwenke
7569b21f2d onnode - respect $CTDB_BASE rather than hard-coding /etc/ctdb.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 503e4908b3028330bc25dc6de8561dbd53ee6a8d)
2010-01-21 18:52:31 +11:00
Stefan Metzmacher
f2854f75c8 tools/ctdb: add PartiallyOnline state for "ctdb status" and "ctdb status -Y"
This is based on the GET_IFACES control against each node.

metze

(This used to be ctdb commit 38cb972382a09f830673277d0a9bd5d20deafff2)
2010-01-20 11:11:00 +01:00
Stefan Metzmacher
a6437bc707 tools/ctdb: display interfaces in "ctdb ip" and "ctdb ip -Y" outputs
metze

(This used to be ctdb commit dffa2b05acce8b73c2fdd085311732bf57f01b7f)
2010-01-20 11:11:00 +01:00
Stefan Metzmacher
df5805d6a0 tools/ctdb: add "ctdb ipinfo <ip>"
metze

(This used to be ctdb commit e05e236fc019bfd3b316609a7c190e0e028a4bbc)
2010-01-20 11:11:00 +01:00
Stefan Metzmacher
a6803f42a5 tools/ctdb: add "ctdb setifacelink <iface> <status>"
metze

(This used to be ctdb commit 8d0c00b60db69bd10f12da4c676e1142dc37af7a)
2010-01-20 11:11:00 +01:00
Stefan Metzmacher
0ceef7036b tools/ctdb: add "ctdb ifaces"
metze

(This used to be ctdb commit 80053d09eed967fb76898f4a53437bed2b43a02f)
2010-01-20 11:11:00 +01:00
Stefan Metzmacher
a23c409e73 tools/ctdb: display INACTIVE status in "ctdb status" and "ctdb status -Y"
metze

(This used to be ctdb commit 18af37e99ef8ff5623161495be432abfe5e3407f)
2010-01-20 09:44:36 +01:00
Stefan Metzmacher
a03cf0040b ctdb: print out some hints how to debug a "ctdb catdb" failure
metze

(This used to be ctdb commit 504cf78d00d1120b556124340b9312f890b8b8b9)
2009-12-16 08:08:33 +01:00
Stefan Metzmacher
965c000c6e ctdb: add machinereadable output for "ctdb -Y getdbmap"
metze

(This used to be ctdb commit 45cfcd44093c7d2681e2ffd5cfb402823e8809f4)
2009-12-16 08:08:33 +01:00
Stefan Metzmacher
aa07a46bf5 ctdb: disallow "ctdb backupdb" on unhealthy databases
metze

(This used to be ctdb commit ecf799093c1989f5499c9d61ce8cc8a98d759160)
2009-12-16 08:08:33 +01:00
Stefan Metzmacher
c4bc231267 client: add "ctdb dumpdbbackup <filename>"
metze

(This used to be ctdb commit c63a0368d9d4b526ac1e49d891d3a1b7b8d20320)
2009-12-16 08:08:33 +01:00
Stefan Metzmacher
fb50e08942 tools/ctdb: let "ctdb restoredb" and "ctdb wipedb" mark the db as healthy on all
nodes

metze

(This used to be ctdb commit d1b10b0c0c323c39742a18e98a1dab7e82ddc7be)
2009-12-16 08:08:32 +01:00
Stefan Metzmacher
c56ce3d2f2 tools/ctdb: add "ctdb getdbstatus <dbname>"
metze

(This used to be ctdb commit 910c19f12448d293a755d1eb46d20f9591f8da7a)
2009-12-16 08:08:32 +01:00
Stefan Metzmacher
927dd3d9e5 tools/ctdb: display db health in "ctdb getdbmap"
metze

(This used to be ctdb commit c34535ff4dc6a44909283641596e0ed7c2316fbd)
2009-12-16 08:08:32 +01:00
Stefan Metzmacher
003985acfd ctdb: pass TDB_DISALLOW_NESTING to all tdb_open/tdb_wrap_open calls
metze

Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit 1635e931b909c66eb3b1f5357e3a549b1a0da70d)
2009-12-16 08:03:55 +01:00
Rusty Russell
cab8da8dc4 ctdb: don't print OUTPUT: for DISABLED scripts
In other news, did you know ctime() returns a \n-terminated string?

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit 1b4e7bb548976b99f122142b040494b6f9911962)
2009-12-14 15:46:49 +11:00
Rusty Russell
a46c3b4f2a ctdb: scriptstatus can now query non-monitor events
We also no longer return an error before scripts have been run; a special
zero-length data means we have never run the scripts.

"ctdb scriptstatus all" returns all event script results.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 9b90d671581e390e2892d3a68f3ca98d58bef4df)
2009-12-08 01:50:55 +10:30
Rusty Russell
9e87377e7a ctdb: support --machinereadable (-Y) for scriptstatus
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 47ffe75848f216568ce3db0a60ca88cfe3d6903a)
2009-12-08 01:31:53 +10:30
Rusty Russell
9753b7e793 eventscript: rename ctdb_monitoring_wire to ctdb_scripts_wire
We're going to allow fetching status of all script runs, so this
name is no longer appropriate.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit f5cb41ecf3fa986b8af243e8546eb3b985cd902a)
2009-12-08 00:51:24 +10:30
Rusty Russell
c70afe0cd4 eventscript: handle and report generic stat/execution errors
Rather than ignoring deleted event scripts (or pretending that they were "OK"),
and discarding other stat errors, we save the errno and turn it into a negative
status.

This gives us a bit more information if we can't execute a script (eg.
too many symlinks or other weird errors).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 5d894e1ae5228df6bbe4fc305ccba19803fa3798)
2009-12-07 23:12:19 +10:30
Rusty Russell
b9b75bd065 eventscript: use -ENOEXEC for disabled status value
This unifies code paths and simplifies things: we just hand -ENOEXEC to
ctdb_ctrl_event_script_stop().

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit eadf5e44ef97d7703a7d3bce0e7ea0f21cb11f14)
2009-12-07 23:11:47 +10:30
Rusty Russell
066a791770 eventscript: use -ETIME for timeout status value
This starts the move toward more expressive encoding of return values:
positive values mean the script ran, negative means we had a problem with
the script (and the value is the errno).

This does timeout, but changes the ctdb tool to recognize it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 0eb1d0aa14e68b598d9e281c8a02b8f94a042fd9)
2009-12-07 23:09:42 +10:30
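The status encoding introduced by the eventscript commits above (non-negative values mean the script ran, negative values carry an errno, with -ETIME for a timeout and -ENOEXEC for a disabled script) can be decoded along these lines. This is a Python sketch, not the ctdb tool's C code; the function name `describe_script_status` and the label strings are hypothetical.

```python
import errno
import os


def describe_script_status(status):
    """Turn an event script status value into a human-readable label."""
    if status == -errno.ETIME:
        return "TIMEDOUT"
    if status == -errno.ENOEXEC:
        return "DISABLED"
    if status < 0:
        # Any other negative value is a saved errno, e.g. from stat().
        return "ERROR (%s)" % os.strerror(-status)
    if status == 0:
        return "OK"
    # Positive values: the script ran and this is its exit code.
    return "FAILED (exit code %d)" % status


for value in (0, 1, -errno.ETIME, -errno.ENOEXEC, -errno.ELOOP):
    print(value, describe_script_status(value))
```

Keeping the errno in the negative range leaves the whole non-negative range free for real exit codes, so neither kind of result can be mistaken for the other.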
Michael Adam
92c5d9eefc ctdb: add command "ctdb wipedb" to wipe the contents of an attached tdb
Michael

(This used to be ctdb commit 5a7c1e7f15693522bbf1c39a53be2304ece9a134)
2009-12-04 11:30:20 +01:00
Ronnie Sahlberg
cc2d81a77c make the ringbuffer logging more efficient and marshall the data by writing to a tmpfile instead of continuously talloc resizing a blob
(This used to be ctdb commit 6427f0b68d60b556a023f64e15e156000ba6f943)
2009-11-18 19:10:50 +11:00
Ronnie Sahlberg
bc2675119d add an in memory ringbuffer where we store the last 500000 log entries regardless of log level.
add commands to extract this in memory buffer and to clear it

(This used to be ctdb commit 29d2ee8d9c6c6f36b2334480f646d6db209f370e)
2009-11-18 12:44:18 +11:00
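The idea of an in-memory ring buffer that keeps only the most recent log entries, regardless of log level, can be sketched as follows. This is a Python illustration, not the daemon's C implementation; the class name `LogRing` and its methods are hypothetical, and only the capacity of 500000 is taken from the commit message.

```python
from collections import deque


class LogRing:
    """Keep the last `capacity` log entries, whatever their log level."""

    def __init__(self, capacity=500000):
        # A deque with maxlen silently drops the oldest entry when full.
        self._entries = deque(maxlen=capacity)

    def log(self, level, message):
        self._entries.append((level, message))

    def dump(self):
        """Extract the buffered entries, oldest first."""
        return list(self._entries)

    def clear(self):
        """Discard all buffered entries."""
        self._entries.clear()


# Hypothetical usage with a tiny capacity to show the wrap-around.
ring = LogRing(capacity=3)
for number in range(5):
    ring.log("DEBUG", "entry %d" % number)
print(ring.dump())
```

The `dump` and `clear` methods correspond loosely to the two commands the commit mentions for extracting and clearing the buffer.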
Ronnie Sahlberg
f88fbb5f1e suggestion from Christian,
don't allow UNHEALTHY nodes to become natgw master, unless all nodes
are unhealthy

(This used to be ctdb commit e8e7129ff1371065fbd75e1aea844d6d04a96fa9)
2009-11-06 08:19:32 +11:00
Ronnie Sahlberg
fcd2ebc32b update the uptime command to indicate that time since last is either from last recovery or from last failover
(This used to be ctdb commit 467da12a785ba3367ed9cbdf79440394e9703289)
2009-10-29 10:58:14 +11:00
Ronnie Sahlberg
023d09cd38 Revert "update the "uptime" command to indicate the "time since last" is the time since the last recovery OR failover."
This reverts commit 3b0d44497800a16400d05a30bdaf6e6c285d4b36.

(This used to be ctdb commit cb36bbb5418290e8e5b770d2d836285b15da2a6f)
2009-10-29 10:49:00 +11:00
Ronnie Sahlberg
279b7ca564 update the "uptime" command to indicate the "time since last" is the time since the last recovery OR failover.
(This used to be ctdb commit 3b0d44497800a16400d05a30bdaf6e6c285d4b36)
2009-10-29 10:37:10 +11:00
Ronnie Sahlberg
4d40b86805 for debugging
add a global variable holding the pid of the main daemon.
change the tracking of time() in the event loop to only check/warn when called from the main daemon

(This used to be ctdb commit a10fc51f4c30e85ada6d4b7347b0f9a8ebc76637)
2009-10-27 13:18:52 +11:00
Stefan Metzmacher
3d713d9e53 ctdb_diagnostics: don't use hardcoded path to iptables
All event scripts use only the relative path, so we should
here.

Also PATH includes /sbin and /usr/sbin...

metze

(This used to be ctdb commit 20678e1506db1f96b58c326ee91339e797c07c22)
2009-10-26 14:23:09 +11:00
Ronnie Sahlberg
d08e3c628d Merge commit 'martins/onnode_options'
(This used to be ctdb commit 82fad66123c1b8c5d4ed3b19c39acf6f367b3f37)
2009-10-14 15:51:57 +11:00
Martin Schwenke
f0dd32e412 Merge commit 'origin/master' into onnode_options
(This used to be ctdb commit e62928f56ce8927b1d8686db2c31538c86462d1a)
2009-10-14 13:49:30 +11:00
Martin Schwenke
787a6e44c6 New onnode options: -f to specify nodes file, -n to allow use of hostnames.
The -f option allows an alternate nodes file to be specified,
overriding the CTDB_NODES_FILE environment variable.

The -n option allows hostnames to be used instead of node numbers.
Using a range of hostnames is invalid, so hostnames can't contain
hyphens ('-') - sorry!  You can use this option without a nodes file
by specifying "-f /dev/null".

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 46474e5f21fd97dd765c616647ff46055a9970e7)
2009-10-14 13:44:57 +11:00
Ronnie Sahlberg
80be59d35e when we change state between healthy/unhealthy, make sure we ask the recovery
master to perform an explicit ip reallocation.

This is more reliable and faster than having the recovery daemon track these
changes, and since we now have an explicit method to ask the recovery daemon
to perform an explicit ip reallocation, we should use this.

(This used to be ctdb commit 3807681e74f4bfe92befdae6ed616ff5f1a99880)
2009-10-14 11:59:16 +11:00
Ronnie Sahlberg
98b5caf003 we must break the loop as soon as we find that a suitable recmaster exists
otherwise "ctdb ipreallocate" will silently fail to update the addresses.

(This used to be ctdb commit 346fa055f4106497b87df97da5ebd6e51fa1ef8c)
2009-10-13 09:49:05 +11:00
Ronnie Sahlberg
771802b212 allow setting the recmode even when not completely frozen.
we sometimes have to do this when we want to trigger a recovery

(This used to be ctdb commit 46194e87e189521375b39b4ef33da2b493429fd8)
2009-10-12 13:06:16 +11:00
Ronnie Sahlberg
d4c98516a2 update the freeze/thaw commands to be able to send the requested database priority to freeze/thaw to the daemon.
this is encoded in the srvid field of the request header

(This used to be ctdb commit 0cb3d33caa42ed783e03bc825b181dde4cf63616)
2009-10-12 09:22:17 +11:00
Ronnie Sahlberg
3219f81710 add a control to read the db priority from a database
(This used to be ctdb commit ca6d045e419f308f57e74d4c978907afb05ddb85)
2009-10-10 15:04:18 +11:00
Ronnie Sahlberg
6cf7d8e131 add a control to set a database priority. Let newly created databases default to priority 1.
database priorities will be used to control the order in which databases are locked during recovery.

(This used to be ctdb commit 67741c0ee01916d94cace8e9462ef02507e06078)
2009-10-10 14:26:09 +11:00
Ronnie Sahlberg
134ed842fa always send the release/take ip controls to make sure all nodes are updated
(This used to be ctdb commit 789703ea684717781c176fd3a2a24d96abde220b)
2009-10-06 12:25:44 +11:00
Ronnie Sahlberg
166b1c97b4 add a new message to ask the recovery daemon to temporarily disable checking ip address consistency.
This is useful when we are moving addresses using moveip in the cluster, since otherwise, if we collide with the recovery daemon's own check, we could cause a recovery.

(This used to be ctdb commit 9c63858c0b22c81eaccb9865a414af0bbb2833d4)
2009-10-06 12:11:32 +11:00
Ronnie Sahlberg
617e393f6b update addip/moveip/delip to make it less likely to trigger an accidental recovery
(This used to be ctdb commit 3befe5526e147d49451fddc930aaafc3dbe2e9c1)
2009-10-06 11:41:18 +11:00
Ronnie Sahlberg
709fc77878 When adding a public ip to a node, make sure to push the assignment of ip addresses out to all nodes so all nodes become aware who currently holds the ip.
(This used to be ctdb commit e8df6fc301fb7faf72c72eb39ea68d44d1526b00)
2009-10-06 08:19:25 +11:00
Ronnie Sahlberg
22dde50be3 add machinereadable output for the ctdb getreclock command
(This used to be ctdb commit 5e7dc36f1649824db2f9dab34bede8b388502a57)
2009-09-28 13:39:54 +10:00
Ronnie Sahlberg
029fd6b00f Revert "try to restart statd everytime it fails, not just the first time"
This reverts commit 4f7b39a4871af28df1c4545ec37db179fa47a7da.

(This used to be ctdb commit db7b96304e4725f29b12398b7582e385daed63ed)
2009-09-15 19:33:35 +10:00
Ronnie Sahlberg
59cacded72 try to restart statd everytime it fails, not just the first time
(This used to be ctdb commit 4f7b39a4871af28df1c4545ec37db179fa47a7da)
2009-09-15 13:35:58 +10:00
Martin Schwenke
021892346c onnode: add "any" nodespec to select any node with running CTDB.
In testing and other situations (e.g. eventscripts) it is necessary to
select a node where a ctdb command can be run.  The whole idea here is
to avoid nodes where ctdbd is not running and where most ctdb commands
would fail.  This implements a standard way of doing this involving a
recursive onnode command.

There is still a small window for a race, where the selected node is
suddenly shutdown, but this is unavoidable.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit fb47cce86c0edae5caaf485f13ae7a151b6cb00d)
2009-09-08 15:10:20 +10:00
Ronnie Sahlberg
cda5f02c7c new prototype banning code
(This used to be ctdb commit 0c4c2240267af183d54ffd4c0aacda208f6eff6a)
2009-09-04 02:20:39 +10:00
Ronnie Sahlberg
ef9db0efc3 reduce the loglevel for the message that we switch to a different recmaster while waiting for ipreallocate to finish
(This used to be ctdb commit e5b25e1386294b1f800c32fb01c69c3c3ce85c26)
2009-08-17 10:56:12 +10:00
Ronnie Sahlberg
486bdd8ca1 if no timeout at all is specified to the ctdb tool, neither using -T nor by setting CTDB_TIMEOUT, then use 120 seconds as a default timeout before the ctdb command will exit with an error.
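
The fallback described here can be sketched in shell. This is only an illustration: the ctdb tool itself is written in C, the helper name effective_timeout is made up, and the assumption that -T takes precedence over CTDB_TIMEOUT is not stated in this message.

```sh
#!/bin/sh
# Hypothetical helper, not code from the ctdb tool.
# $1 is the value given with -T, or empty/absent if -T was not used.
# Assumption: -T wins over CTDB_TIMEOUT; 120 is the default from the
# commit message when neither is set.
effective_timeout() {
    echo "${1:-${CTDB_TIMEOUT:-120}}"
}

unset CTDB_TIMEOUT
echo "neither set:      $(effective_timeout)"
CTDB_TIMEOUT=30
echo "CTDB_TIMEOUT=30:  $(effective_timeout)"
echo "-T 5 given:       $(effective_timeout 5)"
```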
(This used to be ctdb commit d8d21884736a9610d48cf532e1c6778e511fb7a8)
2009-08-17 10:54:45 +10:00
Ronnie Sahlberg
1cc79905ad add new controls to make it possible to enable/disable individual eventscripts
update scriptstatus output so it lists disabled scripts

(This used to be ctdb commit 7e799b7523c9699bd65a8a8207f7e03d668b0b81)
2009-08-13 13:04:08 +10:00
Ronnie Sahlberg
0e09e52824 update STOP/CONTINUE to better handle when we stop the last node
(This used to be ctdb commit 9a251078f22aea15b9ca37393e0b5e2740aa21fb)
2009-08-03 12:51:55 +10:00
Martin Schwenke
e50a067cb5 Merge commit 'origin/master'
(This used to be ctdb commit d7ff60a74595dcb4ae41f5a8193de5b898d61227)
2009-07-29 10:08:56 +10:00
Ronnie Sahlberg
62c4a841d2 When processing the stop node control reply in the client code we should
also check the returned status code in case the _stop() command failed
due to the eventscripts failing.

If this happens, make "ctdb stop" log an error to the console and try
the operation again.

(This used to be ctdb commit 20e82e0c48e07d1012549f5277f1f5a3f4bd10d1)
2009-07-29 09:58:40 +10:00
Martin Schwenke
50650fbbd1 onnode: update tests for healthy and connected to cope with new stopped bit.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit bfc926c866e361ab28330747544b268ba130bf30)
2009-07-28 16:00:11 +10:00
Ronnie Sahlberg
37d68c58b8 add two commands : setlmasterrole and setrecmasterrole to enable/disable these capabilities at runtime
(This used to be ctdb commit 51aaed0e9e42e901451292e8dd545297ab725a62)
2009-07-28 13:45:13 +10:00
Ronnie Sahlberg
72e2380e92 add a command "setnatgwstate {on|off}" that can be used to indicate if this node is using natgw functionality or not.
(This used to be ctdb commit 89a9bb29a60a6fb1fba55987e6cf0a4baa695e50)
2009-07-28 09:58:11 +10:00
Ronnie Sahlberg
9c6aa4e420 update the eventscript to ensure that stopped nodes can not become the natgw master
also verify that we actually do have a natgw master available if this is configured and make the node unhealthy if not.

(This used to be ctdb commit 7f273ee769d671d8c8be87c9187302fb77e814f3)
2009-07-17 09:45:05 +10:00
Ronnie Sahlberg
5ce69e2fa3 if all nodes are STOPPED, pick one of the STOPPED nodes as natgw master
(This used to be ctdb commit 8bbd96cfbbe98f3fc19e432797cbf4478f753a0b)
2009-07-17 09:36:22 +10:00
Ronnie Sahlberg
bf9ad9c934 Do not allow STOPPED or DELETED nodes to become the NATGW master
(This used to be ctdb commit 4505ea15408ad40dd8deb4041fd75a65a0ad9336)
2009-07-17 09:29:58 +10:00
Ronnie Sahlberg
88f3c40d9c add two new controls, STOP_NODE and CONTINUE_NODE
that are used to stop/continue a node instead of using modflags messages

(This used to be ctdb commit 54b4a02053a0f98f8c424e7f658890254023d39a)
2009-07-09 12:22:46 +10:00
Ronnie Sahlberg
d6a5fd5c9d remove the header printed for the machinereadable output for natgwlist
(This used to be ctdb commit 049271c83a09afb8d6c3e5212cf9ca782956b0c6)
2009-07-09 11:43:37 +10:00
Ronnie Sahlberg
9f0dc4b93b Add a new node flag : STOPPED
This node flag means the node is DISABLED and that all its public ip addresses
are failed over, but also that it has been removed from the VNNmap.

A STOPPED node should be in recovery mode active until restarted using the continue command.

Adding two new commands "ctdb stop" "ctdb continue"

(This used to be ctdb commit d47dab1026deba0554f21282a59bd172209ea066)
2009-07-09 11:38:18 +10:00
Ronnie Sahlberg
20887a15ad Perform an ipreallocate after each enable/disable.
This will force a wait until the ip addresses have been reallocated after a disable/enable command and will make scripting of enable/disable more predictable.

This causes the enable/disable command to wait until the ip reallocation that normally follows shortly after an enable/disable has finished before the command returns to the prompt.

(This used to be ctdb commit 6e1f60d8d780c1240aaabb78ecc8550d0480cd7e)
2009-07-06 11:49:55 +10:00
Ronnie Sahlberg
289c58e9b6 add a new command "ctdb ipreallocate", this command will force the recovery master to perform a full ip reallocation process.
the ctdb command will block until the ip reallocation has completed

(This used to be ctdb commit abad7b97fe0c066b33f6e75d0953bbed892a3216)
2009-07-02 13:00:26 +10:00
Ronnie Sahlberg
8e435c0605 update enable/disable
(This used to be ctdb commit b99afc98bedf1a51d315e311f27c3fc55fd940e7)
2009-07-01 09:33:08 +10:00
Ronnie Sahlberg
2770cb4397 show the valid debuglevels that can be used in the error text when an invalid level was specified to ctdb setdebug
(This used to be ctdb commit 421c0566094b91221fab2ea68f2c9bd35d5dfbcb)
2009-07-01 09:21:07 +10:00
Ronnie Sahlberg
93026f4cbf update the handling of debug levels so that we always can use a literal instead of a numeric value.
validate the input values used and refuse setting the debug level to an unknown value

(This used to be ctdb commit daec49cea1790bcc64599959faf2159dec2c5929)
2009-07-01 09:17:13 +10:00
Ronnie Sahlberg
9802a0c2f6 when no debuglevel is specified, make 'ctdb setdebug' show the available options
(This used to be ctdb commit f4b0825d9da34578b9f90dc9bd7f99fcc2519ddf)
2009-07-01 08:26:00 +10:00
Ronnie Sahlberg
5b235c3999 add a control to set the reclock file
(This used to be ctdb commit 36cc2e586f03fa497ee9b06f3e6afc80219c4aaa)
2009-06-25 14:25:18 +10:00
Ronnie Sahlberg
2b253c094c add a control to read the current reclock file from a node
(This used to be ctdb commit ed6a4cbcdcbb4e0df83bec8be67c30288bf9bd41)
2009-06-25 12:17:19 +10:00
Martin Schwenke
635da189dc Fix minor onnode bugs relating to local daemons.
Commit a0f5148ac749758e2dfbd6099e829c5bf1d900e6 caused a subtle
regression.  Due to the subtlety, this description is much longer than
the 1 line patch that fixes it!  The regression, where a process that
invokes onnode is unexpectedly blocked, is only apparent if the
following conditions are met:

1. $CTDB_NODES_SOCKETS is set;
2. The command passed to onnode attempts to background a process; and
3. onnode is run in certain types of subshell (e.g. foo=$(onnode ...)).

In particular, when testing against local daemons (i.e. condition (1)
is met), tests/simple/07_ctdb_process_exists.sh would fail (because it
does both (2), (3)).

The problem is caused by the use of file descriptor 3 in the code that
allows separate filtering of stdout and stderr.  A backgrounded
process will have this descriptor open and the $(...) construct
appears to wait for all file descriptors to be closed.  This only
happens with local daemons because SSH is replaced by a shell and file
descriptor 3 leaks into that shell.  It does not occur when SSH is
used because the file descriptor does not leak into the remote shell
where the process is backgrounded.

The fix is simply to redirect file descriptor 3 to /dev/null in the
fakessh function, which is used when $CTDB_NODES_SOCKETS is set.
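
The blocking can be reproduced outside onnode. A minimal sketch, assuming a POSIX shell; the function names leaky, fixed and elapsed are illustrative and are not taken from onnode:

```sh
#!/bin/sh
# Hypothetical reproduction; not code from onnode itself.

# Background a process that still holds file descriptor 3, which is a
# copy of the command substitution's stdout pipe.
leaky() {
    exec 3>&1
    sleep 2 >/dev/null 2>&1 &
    echo started
}

# Same, but descriptor 3 is redirected to /dev/null for the
# backgrounded process, in the spirit of the fakessh fix.
fixed() {
    exec 3>&1
    sleep 2 >/dev/null 2>&1 3>/dev/null &
    echo started
}

# Print how many whole seconds a $(...) around function $1 takes.
elapsed() {
    _start=$(date +%s)
    _out=$("$1")
    _end=$(date +%s)
    echo $((_end - _start))
}

leaky_secs=$(elapsed leaky)
fixed_secs=$(elapsed fixed)
echo "leaky: ${leaky_secs}s, fixed: ${fixed_secs}s"
```

In the leaky case the $(...) should not return until the backgrounded sleep exits, because the sleep still holds the write end of the pipe on descriptor 3. In the fixed case it should return straight away.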

Also fixed is another minor bug when the -o option and
$CTDB_NODES_SOCKETS are used in combination.  The code uses the node
name as a suffix for the output filename(s).  Usually this is an IP
address.  However, when $CTDB_NODES_SOCKETS is in use the node name is
the socket name, which might be a path several directories deep.
Each output file is created via a simple redirection and this would
fail if unexpected directories appear in the filename.  3 possible
fixes were considered:

1. Replace all '/'s in the node name by '_'s.  Nice and simple.
2. Use the basename of the node name.  However, sockets may be in
   different directories but have the same basename.
3. Create all required directories before redirecting.  This is a
   little more complex and probably doesn't meet the user's
   expectations.

Option (1) is implemented here.
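
A sketch of what option (1) amounts to, assuming the <prefix>.<node> naming used by -o. The helper name output_file_for_node and the socket path are made up for illustration and are not taken from onnode:

```sh
#!/bin/sh
# Hypothetical helper; the name and exact form are assumptions.
# Build an output filename from a prefix and a node name, replacing
# every '/' in the node name with '_' so that a simple redirection
# never needs intermediate directories.
output_file_for_node() {
    _prefix="$1"
    _node="$2"
    _suffix=$(printf '%s' "$_node" | tr '/' '_')
    printf '%s.%s\n' "$_prefix" "$_suffix"
}

output_file_for_node out 10.0.0.1
output_file_for_node out /tmp/ctdb/node.0/ctdb.socket
```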

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5d320099025b6835eda3a1e431708f7e0a6b0ba6)
2009-06-19 18:02:17 +10:00
Ronnie Sahlberg
2bb687c4cd remove unused variable
(This used to be ctdb commit 2a52336ec021dfe8d56ba72726feb7b2dbd41f68)
2009-06-09 10:58:46 +10:00
Ronnie Sahlberg
ac931b1371 don't require particular values for NoIPFailback and DeterministicIPs when
using ctdb moveip

(This used to be ctdb commit d350c631850377c09968d2978ef57d2bd0d50116)
2009-06-09 10:57:46 +10:00
Ronnie Sahlberg
f135684766 improve ctdb moveip so that it does not always trigger a recovery.
(This used to be ctdb commit 0ca28d7336463ecd2ff65620d8dbcbb496991531)
2009-06-09 10:56:50 +10:00
Ronnie Sahlberg
f6ccf96898 try avoiding to cause a recovery when deleting a public ip from a node
(This used to be ctdb commit 6318ea13464e2fe630084c40802d8e697c2cb999)
2009-06-05 17:57:14 +10:00
Ronnie Sahlberg
b046f5e3aa when adding an ip, try manually adding and taking over the ip instead of triggering a full recovery to do the same thing
(This used to be ctdb commit 4d5d22e64270cfb31be6acd71f4f97ec43df5b2c)
2009-06-05 17:00:47 +10:00
Ronnie Sahlberg
79eef7f2b5 don't list DELETED nodes in the ctdb listnodes output
(This used to be ctdb commit 7eb137aa4c24c69bd93b98fb3c7108e5f3288ebd)
2009-06-04 13:25:58 +10:00
Ronnie Sahlberg
f691b96d84 make it possible to run 'ctdb listnodes' also if the daemon is not running.
in this case, read the nodes file directly instead of asking the local daemon for the list.

add an option -Y to provide machinereadable output to listnodes

(This used to be ctdb commit 4a55cacc4f5526abd2124460b669e633deeda408)
2009-06-04 13:21:25 +10:00
Ronnie Sahlberg
45aa542064 teach ONNODE about deleted nodes
(This used to be ctdb commit 03d304e72a5839dc8d8d2e2312b346c21dca5774)
2009-06-02 15:03:44 +10:00
Ronnie Sahlberg
1dee7a2401 hide all DELETED nodes from the ctdb command output
(This used to be ctdb commit 91fdfee371d6be83af60cd38ac34afb295b9987a)
2009-06-01 15:43:30 +10:00
Ronnie Sahlberg
e6170b5389 add a new node state : DELETED.
This is used to mark nodes as being DELETED internally in ctdb
so that nodes are not renumbered if / when they are removed from the nodes file.

This is used to be able to do "ctdb reloadnodes" at runtime without
causing nodes to be renumbered.
To do this, instead of deleting a node from the nodes file, just comment it out like

   1.0.0.1
   #1.0.0.2
   1.0.0.3

After removing 1.0.0.2 from the cluster,  the remaining nodes retain their
pnn's from prior to the deletion, namely 0 and 2

Any line in the nodes file that is commented out represents a DELETED pnn
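
The numbering rule can be sketched as a small parser. This is an illustration only, not ctdb's own parsing code; the function name list_pnns is made up, and it assumes one node per line with no blank lines:

```sh
#!/bin/sh
# Hypothetical parser, not ctdb's own code.
# Every line consumes one pnn, counting from 0; a line that starts
# with '#' is reported as DELETED but still uses up its number.
list_pnns() {
    _pnn=0
    while IFS= read -r _line; do
        case "$_line" in
            '#'*) echo "$_pnn DELETED" ;;
            *)    echo "$_pnn $_line" ;;
        esac
        _pnn=$((_pnn + 1))
    done
}

printf '%s\n' '1.0.0.1' '#1.0.0.2' '1.0.0.3' | list_pnns
```

With the three-line example above, 1.0.0.1 keeps pnn 0 and 1.0.0.3 keeps pnn 2, while pnn 1 is reported as DELETED.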

(This used to be ctdb commit 6a5e4fd7fa391206b463bb4e976502f3ac5bd343)
2009-06-01 14:18:34 +10:00
Sumit Bose
2fcedf6dac add missing checks on so far ignored return values
Most of these were found during a review by Jim Meyering <meyering@redhat.com>

(This used to be ctdb commit 3aee5ee1deb4a19be3bd3a4ce3abbe09de763344)
2009-05-21 11:22:21 +10:00
Christian Ambach
8e9736ac1f Remove error messages about a non-existing /var/log/log.ctdb when running ctdb with logging to syslog
(This used to be ctdb commit afdbf3c0df02decd823615134294abf2c8a8a5f3)
2009-05-14 18:59:31 +10:00
Ronnie Sahlberg
98a54c4675 Track how long it takes to take out the recovery lock from both the main daemon and also from the recovery daemon.
Log this in "ctdb statistics".

Also add a variable "RecLockLatencyMs" that will log an error every time it takes longer than this to access the reclock file.

(This used to be ctdb commit 042377ed803bb8f7ca9d6ea1a387427b7b8ba45a)
2009-05-14 10:33:25 +10:00
Ronnie Sahlberg
93a2829e94 check that a node is banned before trying to unban it.
(This used to be ctdb commit 4467b5f88d749d455854512f60a5d313cafa828b)
2009-05-12 18:32:41 +10:00
Martin Schwenke
53c9643104 Fix lvsmaster and natgwlist nodespecs.
They both need to use a -Y option to ctdb and for natgwlist we only
want the 1st line.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e781ff61e17d733349021bb036514f823c7cbfbb)
2009-05-12 08:58:57 +10:00
Martin Schwenke
6098464175 New lvs/lvsmaster and natgw/natgwlist nodespecs for onnode.
Some code re-factoring to implement this and to make it easy to
implement new ones.  New simpler implementation of echo_nth() no
longer uses deleted get_nth() function.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 29559f5dd099bec210e98909c9b2e048461b7c81)
2009-05-12 08:58:23 +10:00
Martin Schwenke
9616959bd6 New option "-o <prefix>" saves stdout from each node to file <prefix>.<ip>.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a0f5148ac749758e2dfbd6099e829c5bf1d900e6)
2009-05-12 08:58:04 +10:00
Ronnie Sahlberg
54a5e6c0c8 Add a -Y machinereadable flag to "lvsmaster"
(This used to be ctdb commit bbae698656d5da9a4a5b0fbfc3003844f246d54b)
2009-05-11 14:44:59 +10:00
Ronnie Sahlberg
1ee122e165 in the "lvsmaster" command, return -1 if there is no lvsmaster
(This used to be ctdb commit ce6afbdef36e3c386b75709f73ef55efe0bd1987)
2009-05-11 13:56:28 +10:00
Ronnie Sahlberg
6721546b53 change the ctdb command table to allow us to describe commands which can be run independently of the ctdb daemon.
create a new debugging command xpnn which discovers the pnn of the local node and which works even if the local daemon is not running

(This used to be ctdb commit cd78765f9400d7abce7929a2dd199f65226e7664)
2009-03-25 14:46:05 +11:00
Ronnie Sahlberg
d7ff332896 update how the NATGW configuration works.
allow the cluster to be partitioned into multiple disjoint natgw subsets

(This used to be ctdb commit 1046885cd22b5001e0251de2e536b5f6793459be)
2009-03-25 13:37:57 +11:00
Ronnie Sahlberg
7265c713db we need to set the port properly in the parse_ip helper
(This used to be ctdb commit 43fe18d86995744ba61c7a6405b70edcb265930a)
2009-03-24 13:45:11 +11:00
root
629d5ee1fa add a new command "ctdb scriptstatus"
this command shows which eventscripts were executed during the last monitoring cycle and the status from each eventscript.

If an eventscript timed out or returned an error we also
show the output from the eventscript.

Example :
[root@rcn1 ctdb-git]# ./bin/ctdb scriptstatus
6 scripts were executed last monitoring cycle
00.ctdb              Status:OK    Duration:0.021 Mon Mar 23 19:04:32 2009
10.interface         Status:OK    Duration:0.048 Mon Mar 23 19:04:32 2009
20.multipathd        Status:OK    Duration:0.011 Mon Mar 23 19:04:33 2009
40.vsftpd            Status:OK    Duration:0.011 Mon Mar 23 19:04:33 2009
41.httpd             Status:OK    Duration:0.011 Mon Mar 23 19:04:33 2009
50.samba             Status:ERROR    Duration:0.057 Mon Mar 23 19:04:33 2009
   OUTPUT:ERROR: Samba tcp port 445 is not responding

Add a new helper function "switch_from_server_to_client()" which both
the recovery daemon can use as well as in the child process we start for running the actual eventscripts.

Create several new controls, both for the eventscript child process to inform the master daemon of the current status of the scripts as well as for the ctdb tool to extract this information from the running daemon.

(This used to be ctdb commit c98f90ad61c9b1e679116fbed948ddca4111968d)
2009-03-23 19:07:45 +11:00
Ronnie Sahlberg
4d2195c503 The wbinfo --sequence command has been deprecated in favor of the new
--online-status command

(This used to be ctdb commit b6e34503ac094a274a569a69e3d93d92ad911f4d)
2009-03-19 10:43:57 +11:00
root
4088e0aceb make sure we can collect proper mmfs data
(This used to be ctdb commit 76d655f9aa3ebd39e7a40d0bbd85e40d08f3e90b)
2009-03-12 12:33:19 +11:00
root
7a11082f0f collect net conf list in ctdb_diagnostics
(This used to be ctdb commit 0bb130090b8dce5f85b0cb178a19f877759c0caa)
2007-03-10 14:10:21 +11:00
root
b1e7724eb8 check the static-routes file if it exists
(This used to be ctdb commit 9ce84a7915abaa987160ecbcae63128a9ed0a741)
2007-03-10 13:45:38 +11:00
Ronnie Sahlberg
5c7570b103 Merge branch 'martins'
(This used to be ctdb commit fe4eea45c6b5702a794424037c3f2ab4241d5e5e)
2009-02-18 13:10:03 +11:00
Michael Adam
3cca0f75e4 Fix treatment of link local ipv6 addresses: set the scope id.
metze / Michael

Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 9d12de1ca6107801dada927729e755c0949d73bf)
2009-01-19 22:50:53 +01:00
Martin Schwenke
9e3ccd9d69 Merge commit 'origin/master' into martins
(This used to be ctdb commit 099a1605574c7a8d232fd4c2d0c65e55aedeafad)
2008-12-17 15:05:44 +11:00
root
6c1359ab0d add better error checking that nodes we try to talk to using the "ctdb" tool actually exist and are connected.
two new dedicated ctdb error codes
21: node does not exist
22: node is disconnected
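
A wrapper script could map these statuses to messages along the following lines. This is a sketch: the helper name describe_ctdb_status is made up, and only 21 and 22 (from this message) and 20 (the timeout status mentioned for -T and CTDB_TIMEOUT elsewhere in this log) come from the source.

```sh
#!/bin/sh
# Hypothetical helper for scripts that call the ctdb tool.
# Only 20, 21 and 22 are taken from the commit messages; any other
# non-zero value is reported generically.
describe_ctdb_status() {
    case "$1" in
        0)  echo "success" ;;
        20) echo "timed out" ;;
        21) echo "node does not exist" ;;
        22) echo "node is disconnected" ;;
        *)  echo "failed with status $1" ;;
    esac
}

describe_ctdb_status 21
describe_ctdb_status 22
```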

(This used to be ctdb commit 7ee6db06162ad5a554058bb6160ad37b24fe42e0)
2008-12-17 14:26:01 +11:00
root
1bf3006665 update the "ctdb recover" command.
block and wait until the cluster has completed the recovery before returning.
this  makes it easier to script since it avoids the common need for
   ctdb recover
   ... complex loop to wait for recovery to complete ...
   script continues

(This used to be ctdb commit 8a0df9324a03b0f17772c64a9331236126c22124)
2008-12-10 12:06:51 +11:00
root
1209079672 add a CTDB_TIMEOUT variable for the ctdb tool.
If set, this specifies the maximum runtime for the ctdb tool before it will terminate with status == 20
Just like the -T ...  option would.

(This used to be ctdb commit c404d57afb2adda039e676877838927d3073df11)
2008-12-10 12:01:19 +11:00
root
58bf3804f0 make sure we return an errorcode when the ctdb command has hung and is timed out by the -T <timeout> setting
(This used to be ctdb commit 993f626e603b9bbc02942bb55096d63b9a4f456b)
2008-12-10 11:49:51 +11:00
Martin Schwenke
5dcc100e3e Merge commit 'origin/master' into martins
(This used to be ctdb commit 674d1660e5602f2fab1eaf219a6b8b5ddf24c402)
2008-12-10 11:42:02 +11:00
Martin Schwenke
ad47f61ea6 Merge commit 'origin/master' into martins
(This used to be ctdb commit b5eec91bd185c91a09b3f42ed26fee7b13a70d9d)
2008-12-10 11:32:24 +11:00
Martin Schwenke
5750e97944 Merge commit 'origin/master' into martins
(This used to be ctdb commit 6cbe8923ead8226de1c20cfd8718e43fe8525ce1)
2008-12-10 11:22:59 +11:00
root
762d4be8f9 add a helper that waits until the cluster is no longer in recovery mode and returns the generation number.
change the ban/unban logic to wait until we are not in recovery before it bans/unbans the node.

also wait until after the cluster has recovered from the ban/unban before returning so that the cluster is in recovery mode == normal when the command returns.  this makes it much easier to script things ...

(This used to be ctdb commit 39c77371a2f995025a584691fe61af12dc6ed5d7)
2008-12-09 12:03:42 +11:00
Martin Schwenke
370cd5e819 Merge commit 'origin/master' into martins
(This used to be ctdb commit 2ecc701869c8bc2d823a8073453c6caf1575dc47)
2008-12-09 11:46:34 +11:00
Martin Schwenke
52c76f25f6 Merge commit 'origin/master' into martins
(This used to be ctdb commit 1b00fe0bac36422d30be167a009c452058975a21)
2008-12-08 17:03:50 +11:00
root
e4722f8ce4 return -1 if ctdb ping failed
(This used to be ctdb commit 691b9c0f1771afa564a5959405f2e7a54c334d45)
2008-12-08 12:57:40 +11:00
Martin Schwenke
2764c2d7be Merge commit 'origin/master' into martins
(This used to be ctdb commit ec354d602d20700e6769deb798436d08256a49d5)
2008-12-08 08:57:46 +11:00
root
e54347fa4e redo and update how we synchronize flags across the cluster.
this simplifies the code and should close a race condition between the local recovery daemon and a remote node when flags are changing.

(This used to be ctdb commit 32d460b8469eb53145f04161a5d01166f9b5f09e)
2008-12-05 16:32:30 +11:00
Martin Schwenke
733fe4594c Merge commit 'origin/master' into martins
(This used to be ctdb commit 4ff5875c965f21ab76a5924efd92f1832aeb36d4)
2008-12-04 14:42:04 +11:00
Ronnie Sahlberg
539f044aa3 print the list of valid debug level literals when an invalid debug level
is specified in 'ctdb setdebug'

(This used to be ctdb commit 979e78cfd96d74686af6f55f726c395a75275803)
2008-12-02 14:08:10 +11:00
Ronnie Sahlberg
edb7241c05 redesign how reloadnodes is implemented.
modify the transport methods to allow to restart individual connections
and set up destructors properly.

only tear down/set-up tcp connections to nodes removed from the cluster
or nodes added to the cluster.
Leave tcp connections to unchanged nodes connected.

make "ctdb reloadnodes" explicitly cause a recovery of the cluster once
the files have been reloaded

(This used to be ctdb commit d1057ed6de7de9f2a64d8fa012c52647e89b515b)
2008-12-02 13:26:30 +11:00
root
7592a97d16 debuglevel is a signed int, not unsigned.
(This used to be ctdb commit e577a276900854622f4e9da9d1ccd7b484d0d1ec)
2008-11-28 11:29:43 +11:00
Ronnie Sahlberg
51cc8b4df8 make it possible to delete an ip from all nodes at once using
"ctdb delip x.x.x.x -n all"

This is not as straightforward as one might think since during the
delete process we do not want the ip to be bouncing from one node to
another as node by node deletes it.

Thus we first delete the ip from all connected nodes which are not
currently hosting it.

After this we delete the ip from the node which is hosting it.
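
The ordering can be sketched as follows. This only prints the order in which nodes would be visited and deletes nothing; the function name delip_order and its argument layout are made up for illustration.

```sh
#!/bin/sh
# Hypothetical sketch of the ordering only; not code from ctdb.
# $1 is the pnn currently hosting the ip, the remaining arguments are
# the connected pnns. Non-hosting nodes are listed first and the
# hosting node last, so the ip has nowhere to bounce to.
delip_order() {
    _host="$1"
    shift
    for _pnn in "$@"; do
        [ "$_pnn" = "$_host" ] || echo "$_pnn"
    done
    for _pnn in "$@"; do
        [ "$_pnn" != "$_host" ] || echo "$_pnn"
    done
}

delip_order 1 0 1 2 3
```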

(This used to be ctdb commit bbd46f341e9aa32d8dbd49f7a9a07cb3f1f92ea3)
2008-11-28 09:52:26 +11:00
Martin Schwenke
bc3a6b20c5 Merge commit 'origin/master' into martins
(This used to be ctdb commit e088116238eb107e9831fccbfd66c1db3d837a3b)
2008-11-21 13:00:37 +11:00
Andrew Tridgell
eeae32c8d2 Merge commit 'ronnie/master'
(This used to be ctdb commit fe6ddf7992ca3e72a26dbac6666e0f6270da611f)
2008-11-20 21:23:26 +11:00
Martin Schwenke
d741559fa6 Add some simple tests that can be run from within the tree.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit eacb2ef82ea4809d874158756db973dd1e3fc8fc)
2008-11-20 20:40:01 +11:00
Ronnie Sahlberg
b9bd20ce55 add a context and a timed event so that once we have been in recovery
mode for too long we drop all public ip addresses

(This used to be ctdb commit 403c68f96e1380dd07217c688de2730464f77ea0)
2008-10-22 11:04:41 +11:00
Andrew Tridgell
371e6aa155 Merge commit 'ronnie/master'
(This used to be ctdb commit 5403ed6dcfdfc101b05b43f83002e720d81b4e38)
2008-10-16 12:58:25 +11:00
Ronnie Sahlberg
6e490e8cce verify that the nodes we try to ban/unban are operational and print an
error to the user otherwise.

(This used to be ctdb commit 5747dd2d80af29d6252afb6aeb3e66328ee20de5)
2008-10-15 01:23:57 +11:00
Ronnie Sahlberg
6dbeb91e03 From Mathieu Parent
patch to make debian systems log the package versions in
ctdb_diagnostics

(This used to be ctdb commit 07dd4c7d2e8ba10f53d4cf2644fc4b7b8647e286)
2008-10-13 08:21:20 +11:00
Andrew Tridgell
c5edaf7a6e added some more gpfs commands per-filesystem
(This used to be ctdb commit a5d5aa455c7f7eb93d3fa6f403d5b8e0b795109d)
2008-10-09 18:45:12 +11:00
Ronnie Sahlberg
156662e257 Check that a database exists first before we dump its content (and
implicitly also create it) using 'ctdb catdb'

(This used to be ctdb commit 647003da975d4823abe8ed2bfb46153d68ea0fb0)
2008-09-23 01:38:28 +10:00
Andrew Tridgell
6cf004e98f expanded ctdb_diagnostics based on recent experience
(This used to be ctdb commit a06abf6bff6c4d379453e5063d8de1a6542c982a)
2008-09-17 21:00:04 +10:00