mirror of https://github.com/samba-team/samba.git synced 2024-12-25 23:21:54 +03:00
Commit Graph

329 Commits

Author SHA1 Message Date
Martin Schwenke
635da189dc Fix minor onnode bugs relating to local daemons.
Commit a0f5148ac749758e2dfbd6099e829c5bf1d900e6 caused a subtle
regression.  Due to the subtlety, this description is much longer than
the 1 line patch that fixes it!  The regression, where a process that
invokes onnode is unexpectedly blocked, is only apparent if the
following conditions are met:

1. $CTDB_NODES_SOCKETS is set;
2. The command passed to onnode attempts to background a process; and
3. onnode is run in certain types of subshell (e.g. foo=$(onnode ...)).

In particular, when testing against local daemons (i.e. condition (1)
is met), tests/simple/07_ctdb_process_exists.sh would fail (because it
does both (2), (3)).

The problem is caused by the use of file descriptor 3 in the code that
allows separate filtering of stdout and stderr.  A backgrounded
process will have this descriptor open and the $(...) construct
appears to wait for all file descriptors to be closed.  This only
happens with local daemons because SSH is replaced by a shell and file
descriptor 3 leaks into that shell.  It does not occur when SSH is
used because the file descriptor does not leak into the remote shell
where the process is backgrounded.

The fix is simply to redirect file descriptor 3 to /dev/null in the
fakessh function, which is used when $CTDB_NODES_SOCKETS is set.
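
The blocking behaviour can be reproduced outside onnode. The sketch below is illustrative only and is not the onnode source: fakessh_leaky, fakessh_fixed and seconds_blocked are hypothetical names, and the "3>&1" duplication merely stands in for onnode's use of descriptor 3. It assumes a shell that passes descriptor 3 on to child processes, as bash and dash do.

```shell
#!/bin/sh
# Illustrative only: not the actual onnode code.

# Stand-in for the local-daemon case: the "remote" command runs in a
# local shell, so it inherits every open descriptor, including 3.
fakessh_leaky() {
    sh -c "$1"
}

# Same, but descriptor 3 is pointed at /dev/null first, so a
# backgrounded process cannot keep the caller's pipe open.
fakessh_fixed() {
    sh -c "$1" 3>/dev/null
}

# Print how many whole seconds a $(...) capture takes to return when
# descriptor 3 is a duplicate of the capture pipe.
seconds_blocked() {
    start=$(date +%s)
    captured=$( { "$@"; } 3>&1 )   # output itself is not needed here
    end=$(date +%s)
    echo $((end - start))
}

# Example:
#   seconds_blocked fakessh_leaky 'sleep 2 >/dev/null 2>&1 &'
#   seconds_blocked fakessh_fixed 'sleep 2 >/dev/null 2>&1 &'
```

In the leaky case the backgrounded sleep still holds the write end of the capture pipe through descriptor 3, so the capture cannot see end-of-file until the sleep exits. In the fixed case the capture returns as soon as the intermediate shell exits.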

Also fixed is another minor bug when the -o option and
$CTDB_NODES_SOCKETS are used in combination.  The code uses the node
name as a suffix for the output filename(s).  Usually this is an IP
address.  However, when $CTDB_NODES_SOCKETS is in use the node name is
the socket name, which might be a path several directories deep.
Each output file is created via a simple redirection and this would
fail if unexpected directories appear in the filename.  3 possible
fixes were considered:

1. Replace all '/'s in the node name by '_'s.  Nice and simple.
2. Use the basename of the node name.  However, sockets may be in
   different directories but have the same basename.
3. Create all required directories before redirecting.  This is a
   little more complex and probably doesn't meet the user's
   expectations.

Option (1) is implemented here.
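
A minimal sketch of option (1), assuming the output file is named <prefix>.<node>. The helper name output_file_name is hypothetical and this is not the code from the patch.

```shell
#!/bin/sh
# Illustrative only: hypothetical helper, not the actual onnode code.

# Build "<prefix>.<node>" with every '/' in the node name replaced by
# '_', so the node part never refers to a missing directory.
output_file_name() {
    prefix=$1
    node=$2
    printf '%s.%s\n' "$prefix" "$(printf '%s' "$node" | tr '/' '_')"
}

# Example:
#   output_file_name out 10.0.0.1
#   output_file_name out /tmp/ctdb/sock.1
```

Only the node name is rewritten. A prefix that itself contains directories is left alone, since that is the user's explicit choice.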

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5d320099025b6835eda3a1e431708f7e0a6b0ba6)
2009-06-19 18:02:17 +10:00
Ronnie Sahlberg
2bb687c4cd remove unused variable
(This used to be ctdb commit 2a52336ec021dfe8d56ba72726feb7b2dbd41f68)
2009-06-09 10:58:46 +10:00
Ronnie Sahlberg
ac931b1371 don't require particular values for NoIPFailback and DeterministicIPs when
using ctdb moveip

(This used to be ctdb commit d350c631850377c09968d2978ef57d2bd0d50116)
2009-06-09 10:57:46 +10:00
Ronnie Sahlberg
f135684766 improve ctdb moveip so that it does not always trigger a recovery.
(This used to be ctdb commit 0ca28d7336463ecd2ff65620d8dbcbb496991531)
2009-06-09 10:56:50 +10:00
Ronnie Sahlberg
f6ccf96898 try to avoid causing a recovery when deleting a public ip from a node
(This used to be ctdb commit 6318ea13464e2fe630084c40802d8e697c2cb999)
2009-06-05 17:57:14 +10:00
Ronnie Sahlberg
b046f5e3aa when adding an ip, try manually adding and taking over the ip instead of triggering a full recovery to do the same thing
(This used to be ctdb commit 4d5d22e64270cfb31be6acd71f4f97ec43df5b2c)
2009-06-05 17:00:47 +10:00
Ronnie Sahlberg
79eef7f2b5 don't list DELETED nodes in the ctdb listnodes output
(This used to be ctdb commit 7eb137aa4c24c69bd93b98fb3c7108e5f3288ebd)
2009-06-04 13:25:58 +10:00
Ronnie Sahlberg
f691b96d84 make it possible to run 'ctdb listnodes' even if the daemon is not running.
in this case, read the nodes file directly instead of asking the local daemon for the list.

add an option -Y to listnodes to provide machine-readable output

(This used to be ctdb commit 4a55cacc4f5526abd2124460b669e633deeda408)
2009-06-04 13:21:25 +10:00
Ronnie Sahlberg
45aa542064 teach ONNODE about deleted nodes
(This used to be ctdb commit 03d304e72a5839dc8d8d2e2312b346c21dca5774)
2009-06-02 15:03:44 +10:00
Ronnie Sahlberg
1dee7a2401 hide all DELETED nodes from the ctdb command output
(This used to be ctdb commit 91fdfee371d6be83af60cd38ac34afb295b9987a)
2009-06-01 15:43:30 +10:00
Ronnie Sahlberg
e6170b5389 add a new node state : DELETED.
This is used to mark nodes as being DELETED internally in ctdb
so that nodes are not renumbered if / when they are removed from the nodes file.

This is used to be able to do "ctdb reloadnodes" at runtime without
causing nodes to be renumbered.
To do this, instead of deleting a node from the nodes file, just comment it out like

   1.0.0.1
   #1.0.0.2
   1.0.0.3

After removing 1.0.0.2 from the cluster, the remaining nodes retain the
pnns they had prior to the deletion, namely 0 and 2.

Any line in the nodes file that is commented out represents a DELETED pnn
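
The numbering rule can be sketched as follows. This is illustrative only and not how ctdb itself parses the file: list_active_nodes is a hypothetical helper, and it assumes one node per line, no blank lines, and that the pnn is the zero-based line number.

```shell
#!/bin/sh
# Illustrative only: not how ctdb itself parses the nodes file.

# Print "<pnn> <address>" for every active node. The pnn is the
# zero-based line number, so a commented-out (DELETED) line still
# consumes a pnn and the lines after it keep their numbers.
list_active_nodes() {
    awk '!/^#/ { print NR - 1, $1 }' "$1"
}

# Example:
#   list_active_nodes /path/to/nodes
```

Applied to the three-line example above, this reports pnn 0 for 1.0.0.1 and pnn 2 for 1.0.0.3, with pnn 1 left unused.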

(This used to be ctdb commit 6a5e4fd7fa391206b463bb4e976502f3ac5bd343)
2009-06-01 14:18:34 +10:00
Sumit Bose
2fcedf6dac add missing checks on so far ignored return values
Most of these were found during a review by Jim Meyering <meyering@redhat.com>

(This used to be ctdb commit 3aee5ee1deb4a19be3bd3a4ce3abbe09de763344)
2009-05-21 11:22:21 +10:00
Christian Ambach
8e9736ac1f Remove error messages about a non-existing /var/log/log.ctdb when running ctdb with logging to syslog
(This used to be ctdb commit afdbf3c0df02decd823615134294abf2c8a8a5f3)
2009-05-14 18:59:31 +10:00
Ronnie Sahlberg
98a54c4675 Track how long it takes to take out the recovery lock from both the main daemon and also from the recovery daemon.
Log this in "ctdb statistics".

Also add a variable "RecLockLatencyMs" that will log an error every time it takes longer than this to access the reclock file.

(This used to be ctdb commit 042377ed803bb8f7ca9d6ea1a387427b7b8ba45a)
2009-05-14 10:33:25 +10:00
Ronnie Sahlberg
93a2829e94 check that a node is banned before trying to unban it.
(This used to be ctdb commit 4467b5f88d749d455854512f60a5d313cafa828b)
2009-05-12 18:32:41 +10:00
Martin Schwenke
53c9643104 Fix lvsmaster and natgwlist nodespecs.
They both need to use a -Y option to ctdb and for natgwlist we only
want the 1st line.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e781ff61e17d733349021bb036514f823c7cbfbb)
2009-05-12 08:58:57 +10:00
Martin Schwenke
6098464175 New lvs/lvsmaster and natgw/natgwlist nodespecs for onnode.
Some code re-factoring to implement this and to make it easy to
implement new ones.  New simpler implementation of echo_nth() no
longer uses deleted get_nth() function.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 29559f5dd099bec210e98909c9b2e048461b7c81)
2009-05-12 08:58:23 +10:00
Martin Schwenke
9616959bd6 New option "-o <prefix>" saves stdout from each node to file <prefix>.<ip>.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a0f5148ac749758e2dfbd6099e829c5bf1d900e6)
2009-05-12 08:58:04 +10:00
Ronnie Sahlberg
54a5e6c0c8 Add a -Y machine-readable flag to "lvsmaster"
(This used to be ctdb commit bbae698656d5da9a4a5b0fbfc3003844f246d54b)
2009-05-11 14:44:59 +10:00
Ronnie Sahlberg
1ee122e165 in the "lvsmaster" command, return -1 if there is no lvsmaster
(This used to be ctdb commit ce6afbdef36e3c386b75709f73ef55efe0bd1987)
2009-05-11 13:56:28 +10:00
Ronnie Sahlberg
6721546b53 change the ctdb command table to allow us to describe commands which can be run independently of the ctdb daemon.
create a new debugging command xpnn which discovers the pnn of the local node and which works even if the local daemon is not running

(This used to be ctdb commit cd78765f9400d7abce7929a2dd199f65226e7664)
2009-03-25 14:46:05 +11:00
Ronnie Sahlberg
d7ff332896 update how the NATGW configuration works.
allow the cluster to be partitioned into multiple disjoint natgw subsets

(This used to be ctdb commit 1046885cd22b5001e0251de2e536b5f6793459be)
2009-03-25 13:37:57 +11:00
Ronnie Sahlberg
7265c713db we need to set the port properly in the parse_ip helper
(This used to be ctdb commit 43fe18d86995744ba61c7a6405b70edcb265930a)
2009-03-24 13:45:11 +11:00
root
629d5ee1fa add a new command "ctdb scriptstatus"
this command shows which eventscripts were executed during the last monitoring cycle and the status from each eventscript.

If an eventscript timed out or returned an error we also
show the output from the eventscript.

Example :
[root@rcn1 ctdb-git]# ./bin/ctdb scriptstatus
6 scripts were executed last monitoring cycle
00.ctdb              Status:OK    Duration:0.021 Mon Mar 23 19:04:32 2009
10.interface         Status:OK    Duration:0.048 Mon Mar 23 19:04:32 2009
20.multipathd        Status:OK    Duration:0.011 Mon Mar 23 19:04:33 2009
40.vsftpd            Status:OK    Duration:0.011 Mon Mar 23 19:04:33 2009
41.httpd             Status:OK    Duration:0.011 Mon Mar 23 19:04:33 2009
50.samba             Status:ERROR    Duration:0.057 Mon Mar 23 19:04:33 2009
   OUTPUT:ERROR: Samba tcp port 445 is not responding

Add a new helper function "switch_from_server_to_client()" which is used both by
the recovery daemon and by the child process we start for running the actual eventscripts.

Create several new controls, both for the eventscript child process to inform the master daemon of the current status of the scripts and for the ctdb tool to extract this information from the running daemon.

(This used to be ctdb commit c98f90ad61c9b1e679116fbed948ddca4111968d)
2009-03-23 19:07:45 +11:00
Ronnie Sahlberg
4d2195c503 The wbinfo --sequence command has been deprecated in favor of the new
--online-status command

(This used to be ctdb commit b6e34503ac094a274a569a69e3d93d92ad911f4d)
2009-03-19 10:43:57 +11:00
root
4088e0aceb make sure we can collect proper mmfs data
(This used to be ctdb commit 76d655f9aa3ebd39e7a40d0bbd85e40d08f3e90b)
2009-03-12 12:33:19 +11:00
root
7a11082f0f collect net conf list in ctdb_diagnostics
(This used to be ctdb commit 0bb130090b8dce5f85b0cb178a19f877759c0caa)
2007-03-10 14:10:21 +11:00
root
b1e7724eb8 check the static-routes file if it exists
(This used to be ctdb commit 9ce84a7915abaa987160ecbcae63128a9ed0a741)
2007-03-10 13:45:38 +11:00
Ronnie Sahlberg
5c7570b103 Merge branch 'martins'
(This used to be ctdb commit fe4eea45c6b5702a794424037c3f2ab4241d5e5e)
2009-02-18 13:10:03 +11:00
Michael Adam
3cca0f75e4 Fix treatment of link local ipv6 addresses: set the scope id.
metze / Michael

Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 9d12de1ca6107801dada927729e755c0949d73bf)
2009-01-19 22:50:53 +01:00
Martin Schwenke
9e3ccd9d69 Merge commit 'origin/master' into martins
(This used to be ctdb commit 099a1605574c7a8d232fd4c2d0c65e55aedeafad)
2008-12-17 15:05:44 +11:00
root
6c1359ab0d add better error checking that nodes we try to talk to using the "ctdb" tool actually exist and that they are connected.
two new dedicated ctdb error codes
21: node does not exist
22: node is disconnected
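
A script calling the tool could tell these cases apart as sketched below. Only the codes 21 and 22 come from this commit message. The helper name describe_ctdb_status is hypothetical, and treating 0 as success is the usual shell convention rather than something stated here.

```shell
#!/bin/sh
# Illustrative only: hypothetical helper for scripts that call ctdb.

# Translate an exit status from the ctdb tool into a short message.
describe_ctdb_status() {
    case "$1" in
        0)  echo "success" ;;
        21) echo "node does not exist" ;;
        22) echo "node is disconnected" ;;
        *)  echo "other failure ($1)" ;;
    esac
}

# Example:
#   ctdb -n 5 ping; describe_ctdb_status $?
```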

(This used to be ctdb commit 7ee6db06162ad5a554058bb6160ad37b24fe42e0)
2008-12-17 14:26:01 +11:00
root
1bf3006665 update the "ctdb recover" command.
block and wait until the cluster has completed the recovery before returning.
this makes it easier to script since it avoids the common need for
   ctdb recover
   ... complex loop to wait for recovery to complete ...
   script continues

(This used to be ctdb commit 8a0df9324a03b0f17772c64a9331236126c22124)
2008-12-10 12:06:51 +11:00
root
1209079672 add a CTDB_TIMEOUT variable for the ctdb tool.
If set, this specifies the maximum runtime for the ctdb tool before it will terminate with status == 20
Just like the -T ...  option would.
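
A wrapper along the following lines could make use of the variable from a script. It is a sketch under assumptions: with_ctdb_timeout is a hypothetical name, and the unit of the limit is not stated in this message, so the value is passed through untouched.

```shell
#!/bin/sh
# Illustrative only: hypothetical wrapper, not part of ctdb.

# Run a command with CTDB_TIMEOUT set to the given limit, and note on
# stderr when it ends with status 20 (the timeout status named above).
with_ctdb_timeout() {
    limit=$1
    shift
    # Subshell keeps CTDB_TIMEOUT out of the caller's environment.
    ( CTDB_TIMEOUT=$limit; export CTDB_TIMEOUT; "$@" )
    status=$?
    if [ "$status" -eq 20 ]; then
        echo "command hit the CTDB_TIMEOUT limit ($limit)" >&2
    fi
    return "$status"
}

# Example:
#   with_ctdb_timeout 5 ctdb status
```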

(This used to be ctdb commit c404d57afb2adda039e676877838927d3073df11)
2008-12-10 12:01:19 +11:00
root
58bf3804f0 make sure we return an error code when the ctdb command has hung and is timed out by the -T <timeout> setting
(This used to be ctdb commit 993f626e603b9bbc02942bb55096d63b9a4f456b)
2008-12-10 11:49:51 +11:00
Martin Schwenke
5dcc100e3e Merge commit 'origin/master' into martins
(This used to be ctdb commit 674d1660e5602f2fab1eaf219a6b8b5ddf24c402)
2008-12-10 11:42:02 +11:00
Martin Schwenke
ad47f61ea6 Merge commit 'origin/master' into martins
(This used to be ctdb commit b5eec91bd185c91a09b3f42ed26fee7b13a70d9d)
2008-12-10 11:32:24 +11:00
Martin Schwenke
5750e97944 Merge commit 'origin/master' into martins
(This used to be ctdb commit 6cbe8923ead8226de1c20cfd8718e43fe8525ce1)
2008-12-10 11:22:59 +11:00
root
762d4be8f9 add a helper that waits until the cluster is no longer in recovery mode and returns the generation number.
change the ban/unban logic to wait until we are not in recovery before it bans/unbans the node.

also wait until after the cluster has recovered from the ban/unban before returning so that the cluster is in recovery mode == normal when the command returns.  this makes it much easier to script things ...

(This used to be ctdb commit 39c77371a2f995025a584691fe61af12dc6ed5d7)
2008-12-09 12:03:42 +11:00
Martin Schwenke
370cd5e819 Merge commit 'origin/master' into martins
(This used to be ctdb commit 2ecc701869c8bc2d823a8073453c6caf1575dc47)
2008-12-09 11:46:34 +11:00
Martin Schwenke
52c76f25f6 Merge commit 'origin/master' into martins
(This used to be ctdb commit 1b00fe0bac36422d30be167a009c452058975a21)
2008-12-08 17:03:50 +11:00
root
e4722f8ce4 return -1 if ctdb ping failed
(This used to be ctdb commit 691b9c0f1771afa564a5959405f2e7a54c334d45)
2008-12-08 12:57:40 +11:00
Martin Schwenke
2764c2d7be Merge commit 'origin/master' into martins
(This used to be ctdb commit ec354d602d20700e6769deb798436d08256a49d5)
2008-12-08 08:57:46 +11:00
root
e54347fa4e redo and update how we synchronize flags across the cluster.
this simplifies the code and should close a race condition between the local recovery daemon and a remote node when flags are changing.

(This used to be ctdb commit 32d460b8469eb53145f04161a5d01166f9b5f09e)
2008-12-05 16:32:30 +11:00
Martin Schwenke
733fe4594c Merge commit 'origin/master' into martins
(This used to be ctdb commit 4ff5875c965f21ab76a5924efd92f1832aeb36d4)
2008-12-04 14:42:04 +11:00
Ronnie Sahlberg
539f044aa3 print the list of valid debug level literals when an invalid debug level
is specified in 'ctdb setdebug'

(This used to be ctdb commit 979e78cfd96d74686af6f55f726c395a75275803)
2008-12-02 14:08:10 +11:00
Ronnie Sahlberg
edb7241c05 redesign how reloadnodes is implemented.
modify the transport methods to allow to restart individual connections
and set up destructors properly.

only tear down/set-up tcp connections to nodes removed from the cluster
or nodes added to the cluster.
Leave tcp connections to unchanged nodes connected.

make "ctdb reloadnodes" explicitly cause a recovery of the cluster once
the files have been reloaded

(This used to be ctdb commit d1057ed6de7de9f2a64d8fa012c52647e89b515b)
2008-12-02 13:26:30 +11:00
root
7592a97d16 debuglevel is a signed int, not unsigned.
(This used to be ctdb commit e577a276900854622f4e9da9d1ccd7b484d0d1ec)
2008-11-28 11:29:43 +11:00
Ronnie Sahlberg
51cc8b4df8 make it possible to delete an ip from all nodes at once using
"ctdb delip x.x.x.x -n all"

This is not as straightforward as one might think since during the
delete process we do not want the ip to be bouncing from one node to
another as node by node deletes it.

Thus we first delete the ip from all connected nodes which are not
currently hosting it.

After this we delete the ip from the node which is hosting it.
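
The ordering can be sketched as below. This is illustrative only and is not the ctdb implementation: delip_order is a hypothetical helper that takes the pnn of the hosting node followed by the pnns of all connected nodes.

```shell
#!/bin/sh
# Illustrative only: hypothetical helper, not the ctdb implementation.

# Print the pnns in the order the ip should be deleted: every node
# that is not hosting the ip first, the hosting node last.
delip_order() {
    hosting=$1
    shift
    order=""
    for pnn in "$@"; do
        if [ "$pnn" != "$hosting" ]; then
            order="$order $pnn"
        fi
    done
    # Append the hosting node only if it is among the connected nodes.
    for pnn in "$@"; do
        if [ "$pnn" = "$hosting" ]; then
            order="$order $pnn"
        fi
    done
    echo "${order# }"
}

# Example:
#   delip_order 1 0 1 2
```

Because the non-hosting nodes give up the ip first, there is no remaining node for it to fail over to when the hosting node finally deletes it.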

(This used to be ctdb commit bbd46f341e9aa32d8dbd49f7a9a07cb3f1f92ea3)
2008-11-28 09:52:26 +11:00
Martin Schwenke
bc3a6b20c5 Merge commit 'origin/master' into martins
(This used to be ctdb commit e088116238eb107e9831fccbfd66c1db3d837a3b)
2008-11-21 13:00:37 +11:00