verify that all nodes agree on the most recent ip address assignments
broke "ctdb moveip ..." since that call would never trigger
a full takeover run and thus would immediately trigger an inconsistency.
Add a new message to the recovery daemon where we can tell the recovery daemon to update its assignments.
BZ62782
(This used to be ctdb commit e7069082e5f0380dcddee247db8754218ce18cab)
Otherwise, as soon as it terminates, ctdbd will deregister the id automatically.
(This used to be ctdb commit 23b059dcb8074872d7900b225790d4df7da071b6)
return success back to the caller instead.
otherwise things like 'ctdb enable -n all' will just finish after the first disabled node has become enabled.
(This used to be ctdb commit f4eb41cd3a1099da8265351818fba9bd4688a188)
Check if the node is already enabled/disabled and log an information
message if so.
(This used to be ctdb commit c3eec8f10764a647106087099eeb47b7196f7aac)
In other news, did you know ctime() returns a \n-terminated string?
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 1b4e7bb548976b99f122142b040494b6f9911962)
We also no longer return an error before scripts have been run; a special
zero-length data means we have never run the scripts.
"ctdb scriptstatus all" returns all event script results.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 9b90d671581e390e2892d3a68f3ca98d58bef4df)
We're going to allow fetching status of all script runs, so this
name is no longer appropriate.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit f5cb41ecf3fa986b8af243e8546eb3b985cd902a)
Rather than ignoring deleted event scripts (or pretending that they were "OK"),
and discarding other stat errors, we save the errno and turn it into a negative
status.
This gives us a bit more information if we can't execute a script (eg.
too many symlinks or other weird errors).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 5d894e1ae5228df6bbe4fc305ccba19803fa3798)
This unifies code paths and simplifies things: we just hand -ENOEXEC to
ctdb_ctrl_event_script_stop().
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit eadf5e44ef97d7703a7d3bce0e7ea0f21cb11f14)
This starts the move toward more expressive encoding of return values:
positive values mean the script ran, negative means we had a problem with
the script (and the value is the errno).
This does timeout, but changes the ctdb tool to recognize it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 0eb1d0aa14e68b598d9e281c8a02b8f94a042fd9)
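The return-value convention described above can be illustrated with a small decoder. This is only a sketch, not the ctdb tool's code: the function name `describe_script_status` is hypothetical, and treating 0 as "ran and succeeded" is an assumption not stated in the commit message.

```python
import errno
import os


def describe_script_status(status: int) -> str:
    """Interpret an event script status (hypothetical helper).

    Negative values are treated as -errno (the script could not be run);
    zero and positive values are treated as the script's own exit status.
    """
    if status < 0:
        err = -status
        # errno.errorcode maps numeric errno values to names such as "ENOEXEC".
        name = errno.errorcode.get(err, "UNKNOWN")
        return f"could not run script: {name} ({os.strerror(err)})"
    if status == 0:
        # Assumption: 0 means the script ran and reported success.
        return "script ran: OK"
    return f"script ran: exit status {status}"


if __name__ == "__main__":
    for value in (0, 1, -errno.ENOEXEC, -errno.ELOOP):
        print(value, "->", describe_script_status(value))
```

Keeping the errno in the sign-flipped status lets a caller tell "the script failed" apart from "the script could not be started" without a second field.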
don't allow UNHEALTHY nodes to become natgw master, unless all nodes
are unhealthy
(This used to be ctdb commit e8e7129ff1371065fbd75e1aea844d6d04a96fa9)
add a global variable holding the pid of the main daemon.
change the tracking of time() in the event loop to only check/warn when called from the main daemon
(This used to be ctdb commit a10fc51f4c30e85ada6d4b7347b0f9a8ebc76637)
master to perform an explicit ip reallocation.
This is more reliable and faster than having the recovery daemon track these
changes, and since we now have an explicit method to ask the recovery daemon
to perform an explicit ip reallocation, we should use this.
(This used to be ctdb commit 3807681e74f4bfe92befdae6ed616ff5f1a99880)
database priorities will be used to control the order in which databases are locked during recovery.
(This used to be ctdb commit 67741c0ee01916d94cace8e9462ef02507e06078)
This is useful when we are moving addresses using moveip in the cluster, since otherwise, if we collide with the recovery daemon's own check, we could cause a recovery.
(This used to be ctdb commit 9c63858c0b22c81eaccb9865a414af0bbb2833d4)
also check the returned status code in case the _stop() command failed
due to the eventscripts failing.
If this happens, make "ctdb stop" log an error to the console and try
the operation again.
(This used to be ctdb commit 20e82e0c48e07d1012549f5277f1f5a3f4bd10d1)
also verify that we actually do have a natgw master available if this is configured and make the node unhealthy if not.
(This used to be ctdb commit 7f273ee769d671d8c8be87c9187302fb77e814f3)
This node flag means the node is DISABLED and that all its public ip addresses
are failed over, but also that it has been removed from the VNNmap.
A STOPPED node should be in recovery mode active until restarted using the continue command.
Adding two new commands: "ctdb stop" and "ctdb continue".
(This used to be ctdb commit d47dab1026deba0554f21282a59bd172209ea066)
This will force a wait until the ip addresses have been reallocated after a disable/enable command and will make scripting of enable/disable more predictable.
This will cause the enable/disable command to wait until the ip reallocation that normally follows shortly after an enable/disable has finished before the command returns to the prompt.
(This used to be ctdb commit 6e1f60d8d780c1240aaabb78ecc8550d0480cd7e)
validate the input values used and refuse setting the debug level to an unknown value
(This used to be ctdb commit daec49cea1790bcc64599959faf2159dec2c5929)
in this case, read the nodes file directly instead of asking the local daemon for the list.
add an option -Y to provide machine-readable output to listnodes
(This used to be ctdb commit 4a55cacc4f5526abd2124460b669e633deeda408)
This is used to mark nodes as being DELETED internally in ctdb
so that nodes are not renumbered if / when they are removed from the nodes file.
This is used to be able to do "ctdb reloadnodes" at runtime without
causing nodes to be renumbered.
To do this, instead of deleting a node from the nodes file, just comment it out like
1.0.0.1
#1.0.0.2
1.0.0.3
After removing 1.0.0.2 from the cluster, the remaining nodes retain the
pnns they had prior to the deletion, namely 0 and 2.
Any line in the nodes file that is commented out represents a DELETED pnn.
(This used to be ctdb commit 6a5e4fd7fa391206b463bb4e976502f3ac5bd343)
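The numbering rule for commented-out entries can be sketched as follows. This is an illustration only, not ctdb's parser: `parse_nodes_file` is a hypothetical name, and skipping blank lines without assigning them a pnn is an assumption.

```python
from typing import Dict, Optional


def parse_nodes_file(text: str) -> Dict[int, Optional[str]]:
    """Map pnn -> address, with None marking a DELETED (commented-out) node.

    Each non-blank line consumes one pnn, whether or not it is commented
    out, so the surviving nodes keep their original numbers.
    """
    nodes: Dict[int, Optional[str]] = {}
    # Assumption: blank lines are ignored and do not consume a pnn.
    entries = [line.strip() for line in text.splitlines() if line.strip()]
    for pnn, entry in enumerate(entries):
        nodes[pnn] = None if entry.startswith("#") else entry
    return nodes


if __name__ == "__main__":
    example = "1.0.0.1\n#1.0.0.2\n1.0.0.3\n"
    for pnn, address in parse_nodes_file(example).items():
        print(pnn, address if address is not None else "DELETED")
```

With the three-line example from the commit message, 1.0.0.1 stays pnn 0 and 1.0.0.3 stays pnn 2, while pnn 1 is marked as deleted.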
Log this in "ctdb statistics".
Also add a variable "RecLockLatencyMs" that will log an error every time it takes longer than this to access the reclock file.
(This used to be ctdb commit 042377ed803bb8f7ca9d6ea1a387427b7b8ba45a)
create a new debugging command xpnn which discovers the pnn of the local node and which works even if the local daemon is not running
(This used to be ctdb commit cd78765f9400d7abce7929a2dd199f65226e7664)
this command shows which eventscripts were executed during the last monitoring cycle and the status from each eventscript.
If an eventscript timed out or returned an error we also
show the output from the eventscript.
Example :
[root@rcn1 ctdb-git]# ./bin/ctdb scriptstatus
6 scripts were executed last monitoring cycle
00.ctdb Status:OK Duration:0.021 Mon Mar 23 19:04:32 2009
10.interface Status:OK Duration:0.048 Mon Mar 23 19:04:32 2009
20.multipathd Status:OK Duration:0.011 Mon Mar 23 19:04:33 2009
40.vsftpd Status:OK Duration:0.011 Mon Mar 23 19:04:33 2009
41.httpd Status:OK Duration:0.011 Mon Mar 23 19:04:33 2009
50.samba Status:ERROR Duration:0.057 Mon Mar 23 19:04:33 2009
OUTPUT:ERROR: Samba tcp port 445 is not responding
Add a new helper function "switch_from_server_to_client()" which can be used both
by the recovery daemon and by the child process we start for running the actual eventscripts.
Create several new controls, both for the eventscript child process to inform the master daemon of the current status of the scripts and for the ctdb tool to extract this information from the running daemon.
(This used to be ctdb commit c98f90ad61c9b1e679116fbed948ddca4111968d)
two new dedicated ctdb error codes
21: node does not exist
22: node is disconnected
(This used to be ctdb commit 7ee6db06162ad5a554058bb6160ad37b24fe42e0)
block and wait until the cluster has completed the recovery before returning.
this makes it easier to script since it avoids the common need for
ctdb recover
... complex loop to wait for recovery to complete ...
script continues
(This used to be ctdb commit 8a0df9324a03b0f17772c64a9331236126c22124)
If set, this specifies the maximum runtime for the ctdb tool before it will terminate with status == 20,
just like the -T ... option would.
(This used to be ctdb commit c404d57afb2adda039e676877838927d3073df11)
change the ban/unban logic to wait until we are not in recovery before it bans/unbans the node.
also wait until after the cluster has recovered from the ban/unban before returning so that the cluster is in recovery mode == normal when the command returns. this makes it much easier to script things ...
(This used to be ctdb commit 39c77371a2f995025a584691fe61af12dc6ed5d7)
this simplifies the code and should close a race condition between the local recovery daemon and a remote node when flags are changing.
(This used to be ctdb commit 32d460b8469eb53145f04161a5d01166f9b5f09e)
modify the transport methods to allow to restart individual connections
and set up destructors properly.
only tear down/set-up tcp connections to nodes removed from the cluster
or nodes added to the cluster.
Leave tcp connections to unchanged nodes connected.
make "ctdb reloadnodes" explicitly cause a recovery of the cluster once
the files have been reloaded
(This used to be ctdb commit d1057ed6de7de9f2a64d8fa012c52647e89b515b)
"ctdb delip x.x.x.x -n all"
This is not as straightforward as one might think since during the
delete process we do not want the ip to be bouncing from one node to
another as node by node deletes it.
Thus we first delete the ip from all connected nodes which are not
currently hosting it.
After this we delete the ip from the node which is hosting it.
(This used to be ctdb commit bbd46f341e9aa32d8dbd49f7a9a07cb3f1f92ea3)
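The two-phase deletion order described above can be sketched like this. It is a simplified illustration, not the ctdb tool's implementation: `delip_order` and its parameters are hypothetical names, and nodes are represented by plain integer pnns.

```python
from typing import List, Optional


def delip_order(connected_nodes: List[int], hosting_node: Optional[int]) -> List[int]:
    """Return the order in which to delete a public ip from the nodes.

    Nodes that are not hosting the ip go first, so the address has nowhere
    to fail over to; the node currently hosting it goes last.
    """
    order = [pnn for pnn in connected_nodes if pnn != hosting_node]
    if hosting_node is not None and hosting_node in connected_nodes:
        order.append(hosting_node)
    return order


if __name__ == "__main__":
    # Node 1 currently hosts the address, so it is handled last.
    print(delip_order([0, 1, 2, 3], hosting_node=1))
```

Deleting from the hosting node last avoids the address bouncing between nodes as each one removes it.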
Encode the database name in the header so we don't need to provide the database
name when doing a restore
Encode a timestamp in the header telling us when the backup was created
(This used to be ctdb commit 77762170ad1dbc4620565bb898af5d493fac117d)
ctdb backupdb : which will copy a database out from ctdb and write it to a file
ctdb restoredb : which will read a database backup from a file and write it into ctdb
(This used to be ctdb commit b567e215f5c58d646a392408b9cc1df8ef029b33)
This file creates additional locking stress on the backend filesystem and we may not need it anyway.
(This used to be ctdb commit 84236e03e40bcf46fa634d106903277c149a734f)
lvs: which shows which nodes are active LVS servers
lvsmaster: which shows which node is the lvs master multiplex node
pnn: which prints the pnn of the local node
(This used to be ctdb commit 00025eef662b867293829228c681df491cd6f371)
make ctdb uptime print how long the recovery took
in the recovery daemon when we check that the public ip address
allocation on the local node is correct (we have the ips we should have
and we don't have any we shouldn't have) use ctdb uptime and check the
recovery start/stop times and make sure we don't check for ip allocation
inconsistencies during a recovery where the ip address allocation is in flux.
(This used to be ctdb commit f86551580349b7f662f9a07e4eb0c1189e38e429)
this callback is called for every node where the control failed (or timed out)
when we issue the start recovery control from recovery master,
set any node that fails as a culprit so it will eventually be banned
(This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2)
ctdb_attach() so that we can pass TDB_NOSYNC when we attach to
a persistent database and want fast unsafe writes instead of
slow but safe tdb_transaction writes.
enhance the ctdb_persistent test suite to test both safe and unsafe writes
(This used to be ctdb commit 4948574f5a290434f3edd0c052cf13f3645deec4)
This enhances the framework for sending tcp tickles to be able to send ipv6 tickles as well.
Since we can not use one single RAW socket to send both handcrafted ipv4 and ipv6 packets, instead of always opening TWO sockets, one ipv4 and one ipv6 we get rid of the helper ctdb_sys_open_sending_socket() and just open (and close) a raw socket of the appropriate type inside ctdb_sys_send_tcp().
We know which type of socket v4/v6 to use based on the sin_family of the destination address.
Since ctdb_sys_send_tcp() opens its own socket we no longer need to pass a socket
descriptor as a parameter. Get rid of this redundant parameter and fixup all callers.
(This used to be ctdb commit 406a2a1e364cf71eb15e5aeec3b87c62f825da92)
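The idea of choosing the socket type from the destination address can be sketched as below. This is not the C code in ctdb_sys_send_tcp(): `raw_socket_family` is a hypothetical helper, and it only selects the address family rather than opening a raw socket, which would require elevated privileges.

```python
import ipaddress
import socket


def raw_socket_family(dest: str) -> int:
    """Pick the address family for a per-call socket from the destination.

    Mirrors the approach of deciding v4/v6 from the destination address
    instead of keeping one socket of each type open.
    """
    addr = ipaddress.ip_address(dest)  # raises ValueError on malformed input
    return socket.AF_INET if addr.version == 4 else socket.AF_INET6


if __name__ == "__main__":
    for dest in ("10.0.0.1", "fe80::1"):
        print(dest, "->", socket.AddressFamily(raw_socket_family(dest)).name)
```

Because the family is derived from the destination on every call, callers do not need to carry a socket descriptor around.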
This allows us to use the async framework also for controls that return
outdata.
Add a "capabilities" field to the ctdb_node structure. This field is
only initialized and kept valid inside the recovery daemon context and not
inside the main ctdb daemon.
change the GET_CAPABILITIES control to return the capabilities in outdata instead of in the res return variable.
When performing a recovery inside the recovery daemon, read the capabilities from all connected nodes and update the ctdb->nodes list of nodes.
when building the new vnnmap after the database rebuild in recovery, do not include any nodes which lack the LMASTER capability in the new vnnmap.
The exception is when no connected node has the LMASTER capability, in which case we let the local node (recmaster) take on the lmaster role temporarily (i.e. become a member of the vnnmap list).
(This used to be ctdb commit 0f1883c69c689b28b0c04148774840b2c4081df6)
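The vnnmap rule described above (leave out nodes without the LMASTER capability, and fall back to the recmaster if none has it) can be sketched as follows. The bit values and the names `CAP_RECMASTER`, `CAP_LMASTER` and `build_vnnmap` are assumptions made for this illustration and are not taken from the ctdb headers.

```python
from typing import Dict, List

# Hypothetical capability bits, chosen for this sketch only.
CAP_RECMASTER = 0x1
CAP_LMASTER = 0x2


def build_vnnmap(connected: List[int], capabilities: Dict[int, int], recmaster: int) -> List[int]:
    """Build the list of pnns that act as lmaster after a recovery."""
    vnnmap = [pnn for pnn in connected if capabilities.get(pnn, 0) & CAP_LMASTER]
    if not vnnmap:
        # No connected node can be lmaster: the recmaster fills in temporarily.
        vnnmap = [recmaster]
    return vnnmap


if __name__ == "__main__":
    caps = {0: CAP_RECMASTER | CAP_LMASTER, 1: CAP_RECMASTER, 2: CAP_LMASTER}
    print(build_vnnmap([0, 1, 2], caps, recmaster=1))
    print(build_vnnmap([1], caps, recmaster=1))
```

The fallback keeps the vnnmap non-empty even when every connected node has opted out of the lmaster role.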
Define two capabilities :
can be recmaster
can be lmaster
Default both capabilities to YES
Update the ctdb tool to read capabilities off a node
(This used to be ctdb commit 50f1255ea9ed15bb8fa11cf838b29afa77e857fd)
If no other node is hosting this public ip at the moment, then assign it immediately to the current node.
(This used to be ctdb commit a63825e32658b36e0964584758b9a276c18056b8)
this collects all public addresses from all nodes and presents the public ips
for the entire cluster
(This used to be ctdb commit cbf79b2158ab21a58aef967e89f0bd60890a7972)
this collects all public addresses from all nodes and presents the public ips
for the entire cluster
(This used to be ctdb commit 0a4e667f42c6fb23be13651f7b0d0a545a49900b)
and a ctdb command to pull the talloc memory map from a recovery daemon
ctdb rddumpmemory
(This used to be ctdb commit d23950be7406cf288f48b660c0f57a9b8d7bdd05)
The controls only modify the runtime setting of which public addresses a node
can serve and do not modify /etc/ctdb/public_addresses.
To make the change permanent you also need to edit /etc/ctdb/public_addresses
manually.
After ip addresses have been added/deleted you need to invoke a recovery
for the ip addresses to be redistributed.
(This used to be ctdb commit f8294d103fdd8a720d0b0c337d3973c7fdf76b5c)
Add back the controls to enable/disable monitoring we used to have for debugging but removed a while ago
(This used to be ctdb commit 8477f6a079e2beb8c09c19702733c4e17f5032fe)
ctdb moveip <IPADDRESS> <NODE>
which can be used to manually fail an ip address over to a specific node.
This can only be used if DeterministicIPs is disabled and NoIPFailback is enabled.
(This used to be ctdb commit ffee062b7e26a6aa6ad254edb58399040ecaa542)
add a new control that causes the node to drop the current nodes list
and reread it from the nodes file.
During this operation, the node will also drop the tcp layer and restart it.
When we drop the tcp layer, by talloc_free()ing the ctcp structure
add a destructor to ctcp so that we also can clean up and remove the references in the ctdb structure to the transport layer
add two new commands for the ctdb tool.
one to list all nodes in the nodes file and the second a command to trigger a node to drop the transport and reinitialize it with the new nodes file
(This used to be ctdb commit 4bc20ac73e9fa94ffd43cccb6eeb438eeff9963c)
memory tree to stdout. This is much more useful than putting it in the log, and also fixes
a bug where the pipe would overflow internally and cause ctdbd to lock up
(This used to be ctdb commit e236979e2162d9bd7a495086342168a696cf76c5)