IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
The valgrind start case should not use daemon, since this is specific
to Red Hat.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 867f57d166395c92949e480ca725249b0ca8950b)
Use a local variable $ctdbd so that we always run ctdbd from the the
same place and so that we know what to kill. This variable respects
the $CTDBD environment variable, which may be used to specify an
alternative location for the daemon.
In the important cases use "pkill -0 -f" to check if ctdbd is
running. Also, remove the special case for killing ctdbd when running
under valgrind. The regular case will handle this just fine.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 070305adfe636c2580776e6bf24bb8be06622b86)
Glitches during restarts of the CTDB cluster have been causing some
tests to fail. This is because restarts are initiated in the body of
many tests. This adds a simple function ctdb_restart_when_done, which
schedules a restart using an existing hook in the test exit code.
This function is now used in tests that need to restart CTDB.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit fc69b6a66282d5be6edeb286bf72aeafb252e6dd)
Commit a0f5148ac749758e2dfbd6099e829c5bf1d900e6 caused a subtle
regression. Due to the subtlety, this description is much longer than
the 1 line patch that fixes it! The regression, where a process that
invokes onnode is unexpectedly blocked, is only apparent if the
following conditions are met:
1. $CTDB_NODES_SOCKETS is set;
2. The command passed to onnode attempts to background a process; and
3. onnode is run in certain types of subshell (e.g. foo=$(onnode ...)).
In particular, when testing against local daemons (i.e. condition (1)
is met), tests/simple/07_ctdb_process_exists.sh would fail (because it
does both (2), (3)).
The problem is caused by the use of file descriptor 3 in the code that
allows separate filtering of stdout and stderr. A backgrounded
process will have this descriptor open and the $(...) construct
appears to wait for all file descriptors to be closed. This only
happens with local daemons because SSH is replaced by a shell and file
descriptor 3 leaks into that shell. It does not occur when SSH is
used because the file descriptor does not leak into the remote shell
where the process is backgrounded.
The fix is simply to redirect file descriptor 3 to /dev/null in the
fakessh function, which is used when $CTDB_NODES_SOCKETS is set.
Also fixed is another minor bug when the -o option and
$CTDB_NODES_SOCKETS are used in combination. The code uses the node
name as a suffix for the output filename(s). Usually this is an IP
address. However, when $CTDB_NODES_SOCKETS is in use the node name is
the socket name, which might be a path several directories deep.
Each output file is created via a simple redirection and this would
fail if unexpected directories appear in the filename. 3 possible
fixes were considered:
1. Replace all '/'s in the node name by '_'s. Nice and simple.
2. Use the basename of the node name. However, sockets may be in
different directories but have the same basename.
3. Create all required directories before redirecting. This is a
little more complex and probably doesn't meet the user's
expectations.
Option (1) is implemented here.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 5d320099025b6835eda3a1e431708f7e0a6b0ba6)
This allows us to timeout the operation if the underlying filesystem has become temporarily unresponsive without causing a new recovery.
(This used to be ctdb commit d177b08f1dc79534491f27726b05405d47e12e20)
Commit a0f5148ac749758e2dfbd6099e829c5bf1d900e6 caused a subtle
regression. Due to the subtlety, this description is much longer than
the 1 line patch that fixes it! The regression, where a process that
invokes onnode is unexpectedly blocked, is only apparent if the
following conditions are met:
1. $CTDB_NODES_SOCKETS is set;
2. The command passed to onnode attempts to background a process; and
3. onnode is run in certain types of subshell (e.g. foo=$(onnode ...)).
In particular, when testing against local daemons (i.e. condition (1)
is met), tests/simple/07_ctdb_process_exists.sh would fail (because it
does both (2), (3)).
The problem is caused by the use of file descriptor 3 in the code that
allows separate filtering of stdout and stderr. A backgrounded
process will have this descriptor open and the $(...) construct
appears to wait for all file descriptors to be closed. This only
happens with local daemons because SSH is replaced by a shell and file
descriptor 3 leaks into that shell. It does not occur when SSH is
used because the file descriptor does not leak into the remote shell
where the process is backgrounded.
The fix is simply to redirect file descriptor 3 to /dev/null in the
fakessh function, which is used when $CTDB_NODES_SOCKETS is set.
Also fixed is another minor bug when the -o option and
$CTDB_NODES_SOCKETS are used in combination. The code uses the node
name as a suffix for the output filename(s). Usually this is an IP
address. However, when $CTDB_NODES_SOCKETS is in use the node name is
the socket name, which might be a path several directories deep.
Each output file is created via a simple redirection and this would
fail if unexpected directories appear in the filename. 3 possible
fixes were considered:
1. Replace all '/'s in the node name by '_'s. Nice and simple.
2. Use the basename of the node name. However, sockets may be in
different directories but have the same basename.
3. Create all required directories before redirecting. This is a
little more complex and probably doesn't meet the user's
expectations.
Option (1) is implemented here.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit c97d56d93d9c1007a4e85affb19ed0c2d0e11b6d)
Glitches during restarts of the CTDB cluster have been causing some
tests to fail. This is because restarts are initiated in the body of
many tests. This adds a simple function ctdb_restart_when_done, which
schedules a restart using an existing hook in the test exit code.
This function is now used in tests that need to restart CTDB.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit d440e83bb4f0c19c085915d0f0e87cc0dabbc569)
New tests/complex/ subdirectory contains 2 new tests to ensure that
NFS and CIFS connections are tracked by CTDB and that tickle resets
are sent when a node is disabled.
Changes to ctdb_test_functions.bash to support these tests.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 5d188af387a2a1d68d66f47edb7a9ca546ed357c)
The threshold for the difference in the number messages sent in either
direction around the ring of nodes was set to 2%. Something
environmental is causing this different to sometimes be as high as 3%.
We're confident it isn't a CTDB issue so we're increasing the
threshold to 5%.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit be3e23c9fcb9c716e492af102830a4f6ad8bda7b)
New tests/complex/ subdirectory contains 2 new tests to ensure that
NFS and CIFS connections are tracked by CTDB and that tickle resets
are sent when a node is disabled.
Changes to ctdb_test_functions.bash to support these tests.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 31cc46eb157ca1301312f14879e4fb4da7d81088)
Limit the allowable difference in message counts in either direction
around the ring to 5% (up from 2%). There is something environmental
making this blow out to 3% very occasionally when there's no obvious
problem with ctdb.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit d6e6909ac629212b3028e13b958e1a17c64bee8c)
in this case, read the nodes file directly instead of asking the local daemon for the list.
add an option -Y to provide machinereadable output to listnodes
(This used to be ctdb commit 4a55cacc4f5526abd2124460b669e633deeda408)
AIX dont have getopt.h by default.
Dont try including this file when building on AIX
(This used to be ctdb commit 06b33a826e71e1dd2f9e02ad614be55535d42045)
* Move building of CTDB_OPTIONS to new function build_ctdb_options()
and have it use a helper function for readability.
* New functions check_persistent_databases() and set_ctdb_variables().
* Remove valgrind-specific stop code, since the general pkill should
kill ctdbd when running under valgrind.
* Remove some bash-isms (e.g. >& /dev/null) since the script is /bin/sh.
* Make indentation consistent.
* Minor clean-ups.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 951dbcb29fd53cf51a08958efe185db4954d24f3)
The valgrind start case should not use daemon, since this is specific
to Red Hat.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 1ea6af7007fe3b5a48d48440a0924c71d7a6000a)
Use a local variable $ctdbd so that we always run ctdbd from the the
same place and so that we know what to kill. This variable respects
the $CTDBD environment variable, which may be used to specify an
alternative location for the daemon.
In the important cases use "pkill -0 -f" to check if ctdbd is
running. Also, remove the special case for killing ctdbd when running
under valgrind. The regular case will handle this just fine.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit ee5d49324155e3e51371f6f8e5ed9eef4179f08d)
This is used to mark nodes as being DELETED internally in ctdb
so that nodes are not renumbered if / when they are removed from the nodes file.
This is used to be able to do "ctdb reloadnodes" at runtime without
causing nodes to be renumbered.
To do this, instead of deleting a node from the nodes file, just comment it out like
1.0.0.1
#1.0.0.2
1.0.0.3
After removing 1.0.0.2 from the cluster, the remaining nodes retain their
pnn's from prior to the deletion, namely 0 and 2
Any line in the nodes file that is commented out represents a DELETED pnn
(This used to be ctdb commit 6a5e4fd7fa391206b463bb4e976502f3ac5bd343)
Make the default be that transaction is not allowed and any attempt to create a nested transaction will fail with TDB_ERR_NESTING.
If an application can cope with transaction nesting and the implicit
semantics of tdb_transaction_commit(), it can enable transaction nesting
by using the TDB_ALLOW_NESTING flag.
(This used to be ctdb commit 3e49e41c21eb8c53084aa8cc7fd3557bdd8eb7b6)
Add a helper function that checks whether a unix domain socket exists
and there is a daemon LISTENING to it similar to the existing function
to check for a daemon LISTENING to a tcp/ip socket.
(This used to be ctdb commit 025a836ab3be3c078fccd8c10b10dfffbfdd94d0)
Log this in "ctdb statistics".
Also add a varaible "RecLockLatencyMs" that will log an error everytime it takes longer than this to access the reclock file.
(This used to be ctdb commit 042377ed803bb8f7ca9d6ea1a387427b7b8ba45a)
negative. ctdb_bench.c checks to ensure the timer has advanced from 0
before dividing.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 723413f246399b25166462d2018237920515655f)
simple/12_ctdb_getdebug.sh now recognises output with multi-digit node
numbers.
Sharing the ctdb directory via NFS and testing on a real cluster by
setting CTDB_TEST_REAL_CLUSTER didn't work by default. The fix is to
hack scripts/test_wrap so that it tries to find a valid bin directory
next to the directory containing it is in.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit ea2ca769e1d1068fbbad843750b19acfd87360e0)
They both need to use a -Y option to ctdb and for natgwlist we only
want the 1st line.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit e781ff61e17d733349021bb036514f823c7cbfbb)
Some code re-factoring to implement this and to make it easy to
implement new ones. New simpler implementation of echo_nth() no
longer uses deleted get_nth() function.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 29559f5dd099bec210e98909c9b2e048461b7c81)
RHEL5 can SIGKILL httpd when stopping it, causing it to leak
semaphores. This means that eventually a node runs out of semaphores
and httpd can't be started. So, before we attempt to start httpd we
clean up any semaphores owned by apache. We also try to restart httpd
in the monitor event if httpd has gone away.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 2d3fbbbb63f443686f9fec42c0bc2058d115806e)
When a client (such as smbstatus) is killed, it may have outstanding
traverse children on remote nodes. We need to catch the client
disconnect in ctdbd and send a control to all nodes telling them to
kill those outstanding traverse children.
(This used to be ctdb commit f2fb2df4619a14f7f6c11f9132ee7d793028042c)
Otherwise there is the chance that we will reset the statistics after the counter has been incremented (client connects) to zero and when the client disconnects we decrement it to a negative number.
this is a pure cosmetic patch with no operational impact to ctdb
(This used to be ctdb commit 72f1c696ee77899f7973878f2568a60d199d4fea)
this now defaults to 60 seconds
This is useful if a split brain occurs due to network partitioning since it will make sure that the "other half" of the cluster that does not contain the recovery master will eventually release all ips and thus avoiding a duplicate ip situation for the public addresses
(This used to be ctdb commit 70f21428c9eec96bcc787be191e7478ad68956dc)
We want ctdb to shutdown first, as it manages many other
services. With the old level of 32 the NFS service would shutdown
first, and that would trigger ctdb to do a recovery. Then ctdb itself
would be shutdown a few seconds later, which causes a lot of error
messages in the other nodes logs
(This used to be ctdb commit 2f952af1a12e81a652ec9a4794db96f9593f2676)