This makes it 2, since this error corresponds loosely to ENOENT.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 1bf289abdd3067a40e9a67091aba78222d13eddf)
This hasn't worked for a while if ever.
We treat this case specially because the output has 2 words on the 1st
line. We also handle the error case where /etc/ctdb_natgw_nodes
exists but none of the other $NATGW_* configuration is done.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 66e89797c7866d207a5bbf1836f52d70dba7cea6)
Setting IFS and looping through items with colons in them doesn't work.
Change this to read through the output line by line. The header line
needs to be thrown away by throwing away everything up to the 1st
newline.
Keep stderr from the "ctdb status" command, otherwise debugging is
impossible.
On error, append any output from ctdb to onnode's error message.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit d60592cf99999f10344a05ef0571fb300bb9d97c)
The comment about $CTDB_NODES_SOCKETS is meaningless. The code it
refers to works just fine with $CTDB_NODES_SOCKETS.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 74e69a564bac653dadfffe8b08145b9b3be16e61)
The current code requires knowledge of the number of status bits
output by "ctdb status -Y".
This changes the code to be completely general.
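
A minimal sketch of what parsing without knowledge of the column count can look like. onnode itself is a shell script; Python is used here only for illustration. The record layout (leading and trailing colon, node number and address as the first two fields, one header line) and the sample text are assumptions made for this example, not captured ctdb output.

```python
def parse_status(text):
    """Return a list of (pnn, address, flags) tuples from colon-delimited output.

    Assumed layout (hypothetical): a header line, then one record per line
    of the form ":<pnn>:<address>:<flag>:<flag>:...:".
    """
    nodes = []
    lines = text.splitlines()
    for line in lines[1:]:            # throw away the header line
        fields = line.strip().strip(":").split(":")
        if len(fields) < 2:
            continue                  # skip blank or malformed lines
        pnn, address, flags = fields[0], fields[1], fields[2:]
        nodes.append((int(pnn), address, flags))
    return nodes


# Hypothetical sample with four status columns; adding a fifth column
# would not require any change to parse_status().
SAMPLE = """:Node:IP:Disconnected:Banned:Disabled:Unhealthy:
:0:10.0.0.1:0:0:0:0:
:1:10.0.0.2:0:0:1:0:
"""
```

Because the flags are kept as "everything after the address", a new status column only lengthens the flags list instead of shifting the address into the wrong position.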
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit e1788f25fde3d1f26bf4831a331741aa280f6fbc)
Use of "local" was masking errors in command-line processing.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit ca80adda7517b43147ef30156ae34c66b29fa2bd)
The current version gives the last item left after stripping the known
fields. If an insufficient number of status fields is stripped then
this would return a residual status field value, which turned out to
be a valid IP address for localhost... so no error occurs.
This change means that the node number is stripped and any residual
status field value will stay appended, causing an error the first time
this command is tested.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 74715e6ec7b67c6f0e863aa51c87279758d6bf91)
When the output of "ctdb status -Y" changed to add an extra status
column we didn't fix onnode.
This adds a match for the extra column.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 793febaebd3d484ddfbbcb47aaa0cdf3cfc1a00d)
This is a tool to handle (dump and convert) ctdb's local tdb
copies (ltdbs) without connecting to a ctdb daemon.
It can be used to
 * dump the contents of an ltdb, printing
   the ctdb record header information
 * dump a non-clustered tdb database (like tdbdump)
 * convert between an ltdb and a non-clustered tdb
   (adding or removing ctdb headers)
 * convert between 64 and 32 bit ltdbs
   (the ctdb record headers differ by 4 bytes of padding)
usage: bin/ltdbtool dump [-p] [-s{0|32|64}] <idb>
       bin/ltdbtool convert [-s{0|32|64}] [-o{0|32|64}] <idb> <odb>
Pair-Programmed-With: Michael Adam <obnox@samba.org>
(This used to be ctdb commit efcf2815711cd5371633614fb91273bd0a786da0)
too much.
This means we can simplify the way we add ips significantly and stop
trying to move them.
We also check if the node already hosts the ip, in which case we used to return an error. Instead just print an error string but return 0, ok.
This makes it easier to script, and works around broken scripts.
CQ1021034
(This used to be ctdb commit 307e5e95548155a31682dfcb0956834d0c85838e)
Found during automatic regression testing.
We do not allow the takeip/releaseip events to be executed during a recovery.
All of "ctdb addip, ctdb delip, ctdb moveip" use and force these events to
trigger to perform the ip assignments required.
If these commands collide with a recovery, these commands could fail since we do
not allow takeip/releaseip events to trigger during the recovery.
While it is easy to just try running the command again, this is suboptimal for script use.
Change these commands to retry these operations a few times until either successful or until we give up.
This makes the commands much easier to use in scripts.
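
The retry behaviour described above can be sketched as follows. This is an illustration only: the helper name, the attempt count and the delay are hypothetical and are not taken from the ctdb tool.

```python
import time


def run_with_retry(operation, attempts=5, delay=1.0, sleep=time.sleep):
    """Call operation() until it returns True or the attempts run out.

    attempts and delay are illustrative values, not ctdb's actual limits.
    """
    for attempt in range(1, attempts + 1):
        if operation():
            return True
        if attempt < attempts:
            sleep(delay)      # give a colliding recovery time to finish
    return False
```

A caller would wrap the step that can collide with a recovery in `operation` and treat a False return as the final failure.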
(This used to be ctdb commit 6954c9df67501183995f408cca358c8fdfb176ab)
After finishing "ctdb addip" wait for an implicit "iptakeover" to complete
the assignment to a node.
This makes it more wasteful and time-consuming when adding multiple ips
at once, or the same ip to multiple nodes,
but makes it easier to script the use of this command.
(This used to be ctdb commit d86cbf3d7d426c558d110d67dc985634c754a522)
ctdb readkey <dbid> <key>
ctdb writekey <dbid> <key> <value>
these are mainly intended for debugging of databases and dmaster migration issues
(This used to be ctdb commit 70c2e7dd04727371590fb94579ffd20318fbeb58)
from the ctdb command.
This is a debugging message and is normal to trigger on a busy system.
It should not be logged as ERROR.
(This used to be ctdb commit 9ddf89e01f1845eec1712d75fb811240e8bb0e37)
Add a new command "ctdb stats [num]" that prints the [num] most recent statistics intervals collected.
(This used to be ctdb commit e6e16fcd5a45ebd3739a8160c8fb5f44494edb9e)
network connectivity outside of the cluster to still be able to
participate in a natgw group.
These nodes can not become natgw master since they lack external network
connectivity.
These nodes are configured just the same way as for any other node with
NATGW, with the following two exceptions :
* we do NOT set CTDB_NATGW_PUBLIC_IFACE at all on these nodes.
since these nodes lack external network we should not check the interface
for link.
* we must set CTDB_NATGW_SLAVE_ONLY=yes to flag that this is a node that
can not become natgw master.
(This used to be ctdb commit ab7b00a37e55beffc074be95b55d8a5c7cb9eef2)
fix a couple of incorrect settings for "auto-all" for a few of the commands as well.
(This used to be ctdb commit 9999771105d7105efaa232fe2842e21e66f78706)
tdb file.
the command automatically strips the initial ctdb header off the record so it can only be used on ctdb managed tdb files, not on normal tdb files.
(This used to be ctdb commit c3a816e5174abefb5155f65d8faad7b1e831e481)
revert the default case back to only showing the ip and node
and only display the extra info if -v verbose output is requested
(This used to be ctdb commit 6488651aa7e105c57324f4a300760a010d098fbb)
port.
Default is to continue to show all tickles, but if a second argument
is given, only tickles for that port will be shown.
(This used to be ctdb commit 5b985eb2cbbb92bf6ccfcacd633d793bcd4e3ec1)
Add a new "ctdb deltickle" command to delete tickles from the database.
This can ONLY be used for tickles created by "ctdb addtickle".
Push any "addtickle/deltickle" updates to other nodes every TickleUpdateInterval seconds.
(This used to be ctdb commit acded034e2f0dcae4c2c9e54e16a001caf23caec)
This can be used to set ctdbd up to generate a tickle for non-samba
services.
(samba contains code to set tickles up automatically)
(This used to be ctdb commit 7ef2cddad5326fdcc26138906948342039829495)
In Samba this is now called "tevent", and while we use the backwards
compatibility wrappers they don't offer EVENT_FD_AUTOCLOSE: that is now
a separate tevent_fd_set_auto_close() function.
This is based on Samba version 7f29f817fa.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 85e5e760cc91eb3157d3a88996ce474491646726)
In some contexts ctdb_diagnostics generates too many errors when it is
run on heterogeneous and machine-configured clusters. In some
clusters some nodes are expected to be differently configured and also
machine-generated configuration files can have comments containing
timestamps.
This adds some command-line options that can be used to reduce the
number of errors reported:
  -n <nodes>  Comma separated list of nodes to operate on
  -c          Ignore comment lines (starting with '#') in file comparisons
  -w          Ignore whitespace in file comparisons
  --no-ads    Do not use commands that assume an Active Directory Server
The -n option simply allows ctdb_diagnostics to operate on a subset of
nodes, avoiding file comparisons with and data collection on nodes
that are differently configured. For file comparisons, instead of
showing each file on the current node and then comparing other nodes
to that file, the file from the first (available or requested) node
is shown and then other nodes are compared to that. That has resulted
in changes in output - that is, ctdb_diagnostics no longer prints
messages referencing the current node.
-c and -w are used to weaken comparisons between configuration files.
--no-ads can be used to avoid running ADS-specific commands if a
cluster uses LDAP (or other non-ADS) configuration.
This also fixes a number of bugs in related code:
* A call to onnode was losing the >> NODE ... << lines because they
now go to stderr. This was changed in onnode long ago but
ctdb_diagnostics was never updated to match.
* ctdb_diagnostics was counting lines in /etc/ctdb/nodes to determine
what nodes to operate on. For some time the nodes file has
supported syntax that makes this invalid. "ctdb listnodes -Y" is
now used to list available nodes.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 36c8244a0f68c7c9bbee40982f230e9d14d3c0ea)
Martin accidentally typed this instead of "ctdb scriptstatus releaseip"
and it crashes.
CQ:S1018859
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 70877b2e7f8fd0d46899bbeca2c6caad6e6e6820)
In that case, when the main daemon is not running
the ctdb context will be initialized to NULL, since we can not connect.
Move the calls to read the ctdb socketname and connecting via libctdb to
only happen when we are executing a "ctdb ..." command that requires that we talk to the actual daemon.
Otherwise we will get an ugly SEGV for the "ctdb ..." commandline tool
when trying to run a command that is supposed to work also when the daemon is down.
(This used to be ctdb commit 18168da84a6aa8d69465e43402444c7ec979604a)
and print the time the statistics were taken and for how long the statistics have been collected to the "ctdb statistics" output.
(This used to be ctdb commit 1bdfe0cd3370a335b960ce1ef97eade93b0cd2fa)
update the function "control_pnn()" to use libctdb to ask the daemon for the pnn
(This used to be ctdb commit 3f651eb8d71c7af0268460bc4b1476112140b290)
ctdb_client.h is the existing internal client interface (which was mainly
in ctdb.h), and ctdb_protocol.h is the information needed for the wire
protocol only.
ctdb.h will be the new, shiny, libctdb API.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 4bba6b8cd47b352f98d41f9f06258d5ac3c9adef)
and returning an error.
This might not be sufficient if there are several recoveries in a row.
Instead loop as long as it takes for the recovery master to finish the recoveries and
respond to the ipreallocate call.
Increase the log level of the error message when the recovery master was busy and could
not perform the ipreallocation promptly
BZ61783
(This used to be ctdb commit 8e9fd36e4619b7cc7bb6f7f7416d13e4c00a296a)
verify that all nodes agree on the most recent ip address assignments
broke "ctdb moveip ..." since that call would never trigger
a full takeover run and thus would immediately trigger an inconsistency.
Add a new message to the recovery daemon where we can tell the recovery daemon to update its assignments.
BZ62782
(This used to be ctdb commit e7069082e5f0380dcddee247db8754218ce18cab)
Otherwise, as soon as it terminates, ctdbd will deregister the id automatically.
(This used to be ctdb commit 23b059dcb8074872d7900b225790d4df7da071b6)
return success back to the caller instead.
otherwise things like 'ctdb enable -n all' will just finish after the first disabled node has become enabled.
(This used to be ctdb commit f4eb41cd3a1099da8265351818fba9bd4688a188)
Check if the node is already enabled/disabled and log an information
message if so.
(This used to be ctdb commit c3eec8f10764a647106087099eeb47b7196f7aac)
2 changes:
* If a relative nodes file is specified via -f or $CTDB_NODES_FILE but
this file does not exist then try looking for the file in /etc/ctdb
(or $CTDB_BASE if set).
* If a nodes file is specified via -f or $CTDB_NODES_FILE but this
file does not exist (even when checked as per above) then do not
fall back to /etc/ctdb/nodes (or $CTDB_BASE if set). The old
behaviour was surprising and hid errors.
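
The lookup order described in the two changes above can be sketched as follows. onnode is a shell script, so this Python version is only an illustration; the function name and the injectable `exists` parameter are hypothetical.

```python
import os


def resolve_nodes_file(specified=None, env=None, exists=os.path.exists):
    """Return the nodes file path to use, or raise FileNotFoundError.

    specified: value of -f or $CTDB_NODES_FILE, or None if neither is set.
    """
    env = os.environ if env is None else env
    base = env.get("CTDB_BASE", "/etc/ctdb")
    if specified is None:
        return os.path.join(base, "nodes")      # default nodes file
    if exists(specified):
        return specified
    if not os.path.isabs(specified):
        candidate = os.path.join(base, specified)
        if exists(candidate):
            return candidate
    # No fallback to the default: a missing explicit file is an error.
    raise FileNotFoundError(specified)
```

Raising instead of silently using the default file is the point of the second change: a mistyped -f argument shows up as an error rather than as commands run on the wrong set of nodes.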
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 60aa570aaa77d293b963105b3f605f9625a4594b)
In other news, did you know ctime() returns a \n-terminated string?
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 1b4e7bb548976b99f122142b040494b6f9911962)
We also no longer return an error before scripts have been run; a special
zero-length data means we have never run the scripts.
"ctdb scriptstatus all" returns all event script results.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 9b90d671581e390e2892d3a68f3ca98d58bef4df)
We're going to allow fetching status of all script runs, so this
name is no longer appropriate.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit f5cb41ecf3fa986b8af243e8546eb3b985cd902a)
Rather than ignoring deleted event scripts (or pretending that they were "OK"),
and discarding other stat errors, we save the errno and turn it into a negative
status.
This gives us a bit more information if we can't execute a script (eg.
too many symlinks or other weird errors).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 5d894e1ae5228df6bbe4fc305ccba19803fa3798)
This unifies code paths and simplifies things: we just hand -ENOEXEC to
ctdb_ctrl_event_script_stop().
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit eadf5e44ef97d7703a7d3bce0e7ea0f21cb11f14)
This starts the move toward more expressive encoding of return values:
positive values mean the script ran, negative means we had a problem with
the script (and the value is the errno).
This does timeout, but changes the ctdb tool to recognize it.
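
As an illustration of this encoding, a consumer of the status value could decode it roughly as below. This is a sketch, not the ctdb tool's code (which is C); the function name is hypothetical, and treating zero as success is an assumption.

```python
import errno
import os


def describe_script_status(status):
    """Turn an event script status value into a human-readable string."""
    if status < 0:
        # Negative: there was a problem with the script; the value is -errno.
        return "could not run: " + os.strerror(-status)
    if status == 0:
        return "OK"               # assumption: zero means the script succeeded
    # Positive: the script ran and returned this status.
    return "ran, exit status %d" % status


# Example: a script that exists but cannot be executed.
print(describe_script_status(-errno.ENOEXEC))
```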
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 0eb1d0aa14e68b598d9e281c8a02b8f94a042fd9)
dont allow UNHEALTHY nodes to become natgw master, unless all nodes
are unhealthy
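
A minimal sketch of this rule, assuming candidates are given as (pnn, healthy) pairs and that the first eligible node in list order is chosen. Both the data shape and the tie-break are assumptions for illustration; the real selection may consider other node flags.

```python
def pick_natgw_master(nodes):
    """nodes: list of (pnn, healthy) tuples for candidate nodes.

    Returns the pnn of the chosen master, or None if there are no candidates.
    """
    healthy = [pnn for pnn, ok in nodes if ok]
    if healthy:
        return healthy[0]
    # All nodes are unhealthy: fall back to any node.
    return nodes[0][0] if nodes else None
```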
(This used to be ctdb commit e8e7129ff1371065fbd75e1aea844d6d04a96fa9)
add a global variable holding the pid of the main daemon.
change the tracking of time() in the event loop to only check/warn when called from the main daemon
(This used to be ctdb commit a10fc51f4c30e85ada6d4b7347b0f9a8ebc76637)
All event scripts use only the relative path, so we should
here.
Also PATH includes /sbin and /usr/sbin...
metze
(This used to be ctdb commit 20678e1506db1f96b58c326ee91339e797c07c22)
The -f option allows an alternate nodes file to be specified,
overriding the CTDB_NODES_FILE environment variable.
The -n option allows hostnames to be used instead of node numbers.
Using a range of hostnames is invalid, so hostnames can't contain
hyphens ('-') - sorry! You can use this option without a nodes file
by specifying "-f /dev/null".
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 46474e5f21fd97dd765c616647ff46055a9970e7)
master to perform an explicit ip reallocation.
This is more reliable and faster than having the recovery daemon track these
changes, and since we now have an explicit method to ask the recovery daemon
to perform an explicit ip reallocation, we should use this.
(This used to be ctdb commit 3807681e74f4bfe92befdae6ed616ff5f1a99880)
database priorities will be used to control the order in which databases are locked during recovery.
(This used to be ctdb commit 67741c0ee01916d94cace8e9462ef02507e06078)
This is useful when we are moving addresses using moveip in the cluster since otherwise if we collide with the recovery daemon's own check we could cause a recovery.
(This used to be ctdb commit 9c63858c0b22c81eaccb9865a414af0bbb2833d4)
In testing and other situations (e.g. eventscripts) it is necessary to
select a node where a ctdb command can be run. The whole idea here is
to avoid nodes where ctdbd is not running and where most ctdb commands
would fail. This implements a standard way of doing this involving a
recursive onnode command.
There is still a small window for a race, where the selected node is
suddenly shutdown, but this is unavoidable.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit fb47cce86c0edae5caaf485f13ae7a151b6cb00d)
also check the returned status code in case the _stop() command failed
due to the eventscripts failing.
If this happens, make "ctdb stop" log an error to the console and try
the operation again.
(This used to be ctdb commit 20e82e0c48e07d1012549f5277f1f5a3f4bd10d1)
also verify that we actually do have a natgw master available if this is configured and make the node unhealthy if not.
(This used to be ctdb commit 7f273ee769d671d8c8be87c9187302fb77e814f3)
This node flag means the node is DISABLED and that all its public ip addresses
are failed over, but also that it has been removed from the VNNmap.
A STOPPED node should be in recovery mode active until restarted using the continue command.
Adding two new commands "ctdb stop" "ctdb continue"
(This used to be ctdb commit d47dab1026deba0554f21282a59bd172209ea066)
This will force a wait until the ip addresses have been reallocated after a disable/enable command and will make scripting of enable/disable more predictable.
This will cause the command enable/disable to wait until the ip reallocation that normally follows shortly after an enable/disable to finish before the command returns to the prompt.
(This used to be ctdb commit 6e1f60d8d780c1240aaabb78ecc8550d0480cd7e)
validate the input values used and refuse setting the debug level to an unknown value
(This used to be ctdb commit daec49cea1790bcc64599959faf2159dec2c5929)
Commit a0f5148ac749758e2dfbd6099e829c5bf1d900e6 caused a subtle
regression. Due to the subtlety, this description is much longer than
the 1 line patch that fixes it! The regression, where a process that
invokes onnode is unexpectedly blocked, is only apparent if the
following conditions are met:
1. $CTDB_NODES_SOCKETS is set;
2. The command passed to onnode attempts to background a process; and
3. onnode is run in certain types of subshell (e.g. foo=$(onnode ...)).
In particular, when testing against local daemons (i.e. condition (1)
is met), tests/simple/07_ctdb_process_exists.sh would fail (because it
does both (2), (3)).
The problem is caused by the use of file descriptor 3 in the code that
allows separate filtering of stdout and stderr. A backgrounded
process will have this descriptor open and the $(...) construct
appears to wait for all file descriptors to be closed. This only
happens with local daemons because SSH is replaced by a shell and file
descriptor 3 leaks into that shell. It does not occur when SSH is
used because the file descriptor does not leak into the remote shell
where the process is backgrounded.
The fix is simply to redirect file descriptor 3 to /dev/null in the
fakessh function, which is used when $CTDB_NODES_SOCKETS is set.
Also fixed is another minor bug when the -o option and
$CTDB_NODES_SOCKETS are used in combination. The code uses the node
name as a suffix for the output filename(s). Usually this is an IP
address. However, when $CTDB_NODES_SOCKETS is in use the node name is
the socket name, which might be a path several directories deep.
Each output file is created via a simple redirection and this would
fail if unexpected directories appear in the filename. 3 possible
fixes were considered:
1. Replace all '/'s in the node name by '_'s. Nice and simple.
2. Use the basename of the node name. However, sockets may be in
different directories but have the same basename.
3. Create all required directories before redirecting. This is a
little more complex and probably doesn't meet the user's
expectations.
Option (1) is implemented here.
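
Option (1) amounts to a one-line substitution, sketched here in Python for illustration (onnode itself is a shell script). The function name and the "." separator between prefix and node name are assumptions.

```python
def output_filename(prefix, node):
    """Build the per-node output filename, flattening any path separators."""
    return prefix + "." + node.replace("/", "_")
```

For an IP address the name is unchanged, while a socket path such as /tmp/ctdb/sock.0 becomes a single filename component, so the redirection no longer depends on intermediate directories existing.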
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 5d320099025b6835eda3a1e431708f7e0a6b0ba6)
in this case, read the nodes file directly instead of asking the local daemon for the list.
add an option -Y to provide machinereadable output to listnodes
(This used to be ctdb commit 4a55cacc4f5526abd2124460b669e633deeda408)
This is used to mark nodes as being DELETED internally in ctdb
so that nodes are not renumbered if / when they are removed from the nodes file.
This is used to be able to do "ctdb reloadnodes" at runtime without
causing nodes to be renumbered.
To do this, instead of deleting a node from the nodes file, just comment it out like
1.0.0.1
#1.0.0.2
1.0.0.3
After removing 1.0.0.2 from the cluster, the remaining nodes retain their
pnn's from prior to the deletion, namely 0 and 2
Any line in the nodes file that is commented out represents a DELETED pnn
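
The numbering rule can be sketched as a small parser. This is an illustration, not ctdb's code; the function name is hypothetical, and skipping blank lines without consuming a pnn is an assumption.

```python
def parse_nodes_file(text):
    """Return {pnn: address} for active nodes; commented-out lines keep their slot."""
    active = {}
    pnn = 0
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue              # assumption: blank lines do not consume a pnn
        if not line.startswith("#"):
            active[pnn] = line
        pnn += 1                  # every non-blank line, deleted or not, uses a pnn
    return active
```

Applied to the three-line example above, this yields addresses for pnn 0 and pnn 2 only, matching the description.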
(This used to be ctdb commit 6a5e4fd7fa391206b463bb4e976502f3ac5bd343)
Log this in "ctdb statistics".
Also add a variable "RecLockLatencyMs" that will log an error every time it takes longer than this to access the reclock file.
(This used to be ctdb commit 042377ed803bb8f7ca9d6ea1a387427b7b8ba45a)
They both need to use a -Y option to ctdb and for natgwlist we only
want the 1st line.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit e781ff61e17d733349021bb036514f823c7cbfbb)
Some code re-factoring to implement this and to make it easy to
implement new ones. New simpler implementation of echo_nth() no
longer uses deleted get_nth() function.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 29559f5dd099bec210e98909c9b2e048461b7c81)
create a new debugging command xpnn which discovers the pnn of the local node and which works even if the local daemon is not running
(This used to be ctdb commit cd78765f9400d7abce7929a2dd199f65226e7664)
this command shows which eventscripts were executed during the last monitoring cycle and the status from each eventscript.
If an eventscript timed out or returned an error we also
show the output from the eventscript.
Example :
[root@rcn1 ctdb-git]# ./bin/ctdb scriptstatus
6 scripts were executed last monitoring cycle
00.ctdb Status:OK Duration:0.021 Mon Mar 23 19:04:32 2009
10.interface Status:OK Duration:0.048 Mon Mar 23 19:04:32 2009
20.multipathd Status:OK Duration:0.011 Mon Mar 23 19:04:33 2009
40.vsftpd Status:OK Duration:0.011 Mon Mar 23 19:04:33 2009
41.httpd Status:OK Duration:0.011 Mon Mar 23 19:04:33 2009
50.samba Status:ERROR Duration:0.057 Mon Mar 23 19:04:33 2009
OUTPUT:ERROR: Samba tcp port 445 is not responding
Add a new helper function "switch_from_server_to_client()" which both
the recovery daemon can use as well as in the child process we start for running the actual eventscripts.
Create several new controls, both for the eventscript child process to inform the master daemon of the current status of the scripts as well as for the ctdb tool to extract this information from the running daemon.
(This used to be ctdb commit c98f90ad61c9b1e679116fbed948ddca4111968d)
two new dedicated ctdb error codes
21: node does not exist
22: node is disconnected
(This used to be ctdb commit 7ee6db06162ad5a554058bb6160ad37b24fe42e0)
block and wait until the cluster has completed the recovery before returning.
this makes it easier to script since it avoids the common need for
ctdb recover
... complex loop to wait for recovery to complete ...
script continues
(This used to be ctdb commit 8a0df9324a03b0f17772c64a9331236126c22124)
If set, this specifies the maximum runtime for the ctdb tool before it will terminate with status == 20
Just like the -T ... option would.
(This used to be ctdb commit c404d57afb2adda039e676877838927d3073df11)
change the ban/unban logic to wait until we are not in recovery before it bans/unbans the node.
also wait until after the cluster has recovered from the ban/unban before returning so that the cluster is in recovery mode == normal when the command returns. this makes it much easier to script things ...
(This used to be ctdb commit 39c77371a2f995025a584691fe61af12dc6ed5d7)
this simplifies the code and should close a race condition between the local recovery daemon and a remote node when flags are changing.
(This used to be ctdb commit 32d460b8469eb53145f04161a5d01166f9b5f09e)
modify the transport methods to allow to restart individual connections
and set up destructors properly.
only tear down/set-up tcp connections to nodes removed from the cluster
or nodes added to the cluster.
Leave tcp connections to unchanged nodes connected.
make "ctdb reloadnodes" explicitly cause a recovery of the cluster once
the files have been reloaded
(This used to be ctdb commit d1057ed6de7de9f2a64d8fa012c52647e89b515b)
"ctdb delip x.x.x.x -n all"
This is not as straightforward as one might think since during the
delete process we do not want the ip to be bouncing from one node to
another as node by node deletes it.
Thus we first delete the ip from all connected nodes which are not
currently hosting it.
After this we delete the ip from the node which is hosting it.
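
The ordering described above can be sketched as follows. The function and parameter names are hypothetical; the sketch only shows the order of deletion, not the actual control calls the ctdb tool makes.

```python
def delip_order(connected_nodes, hosting_node):
    """Return the order in which to delete an ip from the given nodes."""
    order = [n for n in connected_nodes if n != hosting_node]
    if hosting_node in connected_nodes:
        order.append(hosting_node)    # the current host goes last
    return order
```

Deleting from the hosting node last means there is never a moment when another node still lists the ip and could take it over.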
(This used to be ctdb commit bbd46f341e9aa32d8dbd49f7a9a07cb3f1f92ea3)