IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
This groups function prototypes for common client/server functions in
common/common.h and removes them from ctdb_private.h.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Instead of includes.h, include the required header files explicitly.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
This groups function prototypes for system specific functions in
common/system.h and removes them from ctdb_private.h.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Just use ctdb_tcp_connection. It is the same. There are no external
users.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Volker Lendecke <vl@samba.org>
If a control fails and error message is set, the returned status of the
control is always set to -1 ignoring the status passed by the daemon.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
This has not been used for a long time.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Fri Jul 10 23:41:18 CEST 2015 on sn-devel-104
==20761== Invalid read of size 8
==20761== at 0x11BE30: ctdb_ctrl_dbstatistics (ctdb_client.c:1286)
==20761== by 0x12BA89: control_dbstatistics (ctdb.c:713)
==20761== by 0x1312E0: main (ctdb.c:6543)
==20761== Address 0x713b0d0 is 0 bytes after a block of size 560 alloc'd
==20761== at 0x4C27A2E: malloc (vg_replace_malloc.c:270)
==20761== by 0x5CB0954: _talloc_memdup (talloc.c:615)
==20761== by 0x11395C: ctdb_control_recv (ctdb_client.c:1146)
==20761== by 0x11BDD7: ctdb_ctrl_dbstatistics (ctdb_client.c:1265)
==20761== by 0x12BA89: control_dbstatistics (ctdb.c:713)
==20761== by 0x1312E0: main (ctdb.c:6543)
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
ctdb_get_capabilities() gets capabilities from all connected nodes
into an array. ctdb_get_node_capabilities() gets capabilities for a
particular node from array. ctdb_node_has_capabilities() returns true
if given node has all of the given capabilities.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This is like CTDB_CONTROL_GET_NODEMAP but it loads from the nodes file
instead of the daemon.
Also new client function ctdb_ctrl_getnodesfile()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
server_id_get with the next patch will be a global parsing function.
I've decided to rename this here in ctdb, as it's only a static function
in ctdb_client.c and apparently not intended for wider use. Please speak
up if you don't like this :-)
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
This makes it consistent with Samba, to ease transition.
Update unit test code to link to with tdb_wrap instead of including
db_wrap.c.
There are some potential whitespace fixes in this commit that have
been ignored. CTDB's lib/tdb_wrap will be deleted after the
transition to Samba's lib/tdb_wrap, so there's no point polishing it
too much.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Otherwise it conflicts with the same function provided by Samba's
lib/util. Using the Samba one drags in too many dependencies, so
rename the CTDB version to avoid a declaration clash when CTDB starts
including samba_util.h.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
To enable TDB mutex support, set tunable TDBMutexEnabled=1.
When databases are attached for the first time, attach flags must include
TDB_MUTEX_LOCKING and TDBMutexEnabled must set to enable mutex support.
However, when CTDB attaches databases internally for recovery, it will
enable mutex support if TDBMutexEnabled is set.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Wed Jul 9 06:45:17 CEST 2014 on sn-devel-104
This duplicates ctdb->ctdbd_pid.
Thanks to Sumit Bose <sbose@redhat.com> for the suggestion.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
When readonly delegations were added, ctdb_fetch_lock code should have
been modified to include the check for readonly flags.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
This fixes the issue of transaction commit failing due to an empty
__db_sequence_number__ record in persistent database left by previous
cancelled transaction.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
This makes the behaviour of g_lock_lock() similar to that implemented in
Samba. Now ctdb_transaction_start() will return NULL only when there are
failures and not when another transaction is active.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 59489019ad15a5ad6b0f295e742fc9832745a842)
Implementing persistent trasnaction code from Samba.
Persistent transaction code was reimplemented in Samba using g_lock.tdb
to hold transaction locks and using TRANS3_COMMIT control.
Implementation details:
1. When starting a transaction, create a record with "transaction-<dbid>"
as key and store current server_id in the structure.
2. If a record already exists, some other client has already started a
transaction. Verify that the process corresponding to server_id stored
in the record really exists or it's a stale record and overwrite it.
3. All modifications to the actual persistent database are stored in a
marshal buffer.
4. When transaction is committed, read the sequence number of the
persistent database and increment it. Sequence number record is also
stored in the marshal buffer.
5. Send the changed records (marshal buffer) in TRANS3_COMMIT control
to all the active nodes.
6. If all controls succeed, verify that the sequence number has been
incremented. Commit is successful. If any of the controls fail,
abort the transaction.
7. In case sequence number has not yet been incremented, then database
recovery has been triggered. So repeat from step 5.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 4e0f1971792c9431d8d51dc57d54ecc9e4576dd5)
server_id records are stored in g_lock.tdb for persistent transactions.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 55f91ea4373c54ddb5faad87fa2826d86a4b6172)
This reverts commit 035c0d981bde8c0eee8b3f24ba8e2dc817e5b504.
This is a premature optimization. Record can bounce between nodes
very quickly if it is a contended record. There is no need to hold a
record on a node unnecessarily. In case record contention becomes bad,
enabling sticky records on a database is a better idea.
Conflicts:
include/ctdb_private.h
server/ctdb_tunables.c
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit ac417b0003f0116f116834ad2ac51482d25cfa0d)
When creating missing databases either locally or remotely, recovery
master calls ctdb_ctrl_createdb(). Recovery master always passes 0
for tdb_flags. For volatile databases, if TDB_INCOMPATIBLE_HASH is not
specified, then they will be attached without using jenkins hash causing
database corruption.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 2fc6b6403707a292d134140fc0b9145b454992c5)
This reverts commit 10a057d8e15c8c18e540598a940d3548c731b0b4.
This approach would not work when creating local databases since currently
there is no control to receive TDB flags for remote databases.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit ca61eb776ab862bd269e45ee0f9f96e7e1e0e001)
When creating missing databases either locally or remotely, make sure
to use the correct tdb flags from other nodes. Without this, volatile
databases can get attached without TDB_INCOMPATIBLE_HASH flag.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 10a057d8e15c8c18e540598a940d3548c731b0b4)
Otherwise there is no way of treating a timeout differently to a
general failure.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 40e34773b8063196457746ffe7a048eb87d96d61)
Otherwise callers can't tell the difference between some other failure
(e.g. memory allocation failure) and an unknown tunable.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 03fd90d41f9cd9b8c42dc6b8b8d46ae19101a544)
Also new client function ctdb_ctrl_get_runstate().
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit dc4220e6f618cc688b3ca8e52bcb3eec6cb55bb1)
This was apparently not used before in this context, and the bug hence
not detected. It becomes necessary when ctdb_local_schedule_for_deletion()
is called from a client ctdbd (the vacuuming child), hence needs to send
the SCHEDULE_FOR_DELETION control to its parent.
Pair-Programmed-With: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-By: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit e72a5e11845fe445baaee4730bb0bea8588ee9e3)
If the socket is set non-blocking before connect, then we should catch
EAGAIN errors and retry. Instead of adding a random number of retries,
better to wait for connect to succeed and then set the socket to
non-blocking.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 524ec206e6a5e8b11723f4d8d1251ed5d84063b0)
This reverts commit dc0c58547cd4b20a8e2cd21f3c8363f34fd03e75.
There is a simpler solution that retrying random number of times. Do not set
socket non-blocking till connect succeeds.
(This used to be ctdb commit 74acc2c568300ef42740cf11299a1b2507047f60)
This reduces repetition.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit f505020a5720faa4ecc6414e0bfaa6b3c0e47291)
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit a73bb56991b8c07ed0e9517ffcf0dc264be30487)
Do some other hosuekeeping including stopping tevent.
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 212298279557a2833ef0f81809b4a5cdac72ca02)
This can improve performance and stop clients from having to chase a rapidly migrating/bouncing record
(This used to be ctdb commit d0d98f7e45e5084b81335b004d50bddc80cdc219)
This can improve performance slightly on certain workloads where smbds frequently read from the same record
(This used to be ctdb commit 035c0d981bde8c0eee8b3f24ba8e2dc817e5b504)
there are some child processes where we do not create a connection to the main daemon (switch_from_server_to_client()) because it is expensive to set up and we normally might not need to talk to the daemon at all via a domainsocket.
but we might want to still call to ctdb_ltdb_store() from such chil processes.
(This used to be ctdb commit 9e372a08c40087e6b5335aa298e94d88273566a5)
By this, the original CTDB_CONTROL_TRAVERSE_START control that is
used by e.g. samba's smbstatus, is not changed, so that samba
continues working without code change.
The CTDB_CONTROL_TRAVERSE_START currently just adds the "withemptyrecords"
flag to the state and processon on as CTDB_CONTROL_TRAVERSE_START_EXT.
(This used to be ctdb commit 8281bb210858ed04992eacea7f6d02261e0fc1b1)
This will be useful for also printing information about empty/deleted
records in "ctdb catdb", e.g. for debugging vacuuming issues.
(This used to be ctdb commit ddc5da3a0df7701934404192a0a0aa659a806acb)
Rename parameter script_status to scripts to avoid shadowing a global
function with the same name in eventscript.c.
This is in the context of wanting to run CCAN-style tests where most
of the ctdbd code is just included in the test program.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 668358057c1e6b9bbad7209212f9135c5e6241a0)
This avoids a name clash with a slightly different function in
ctdb_control.c.
This is in the context of wanting to run CCAN-style tests where most
of the ctdbd code is just included in the test program.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 82f6108bfbc7e48ca88650297c6a1c6ede0e1c9c)
Move identical copies of ctdb_null_func(), ctdb_fetch_func(),
ctdb_fetch_with_header_func() from ctdb_client.c and
ctdb_ltdb_server.c to somewhere common.
This is in the context of wanting to run CCAN-style tests where most
of the ctdbd code is just included in the test program.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 126cb0d369b2b1aed63801dc4ba0554399e8b7e4)
In a few places functions are called, the return code is assigned into
a variable but it is not checked. This generates a compiler warning
like this:
warning: variable ‘ret’ set but not used [-Wunused-but-set-variable]
Instead we remove the warning by checking the return code variable and
log a warning at DEBUG level if the return code indicates an error.
The justification is that there may have been a future intent to check
the return code but it hasn't been important enough to follow-up. If
it matters, it will be logged for easy debugging.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 1932466c76de2b184c2a257120768ab8c9d6c12a)
Remove the code in the example client code that writes the record to the local tdb.
Add code to the local ctdbd processing of replies to check if this reply contain a ro delegation and if so, spawn a child process to lock the tdb and then write the data.
(This used to be ctdb commit bf1d429227dc4f5818263cc39401d0a22663cdba)
Attempt to reconnect to ctdbd on fetch while it is unreachable.
We must provide our own queue callback wrapper, as ctdb_client_read_cb()
exits on transport failure.
(This used to be ctdb commit 28df6fbf1273b8d095a2bc38dca6a6c35c5c31bd)
let all databases default to not support this until enabled through this control
(This used to be ctdb commit 908a07c42e5135a3ba30a625fc4f4e4916de197a)
This function differs from the old FETCH in that this function will also fetch the record header and not just the record data
(This used to be ctdb commit c7196d16e8e03bb2a64be164d15a7502300eae0e)
Client connections to the ctdbd unix domain socket may fail
intermittently while the server is under heavy load. This change
introduces a client connect retry loop.
During failure the client will retry for a maximum of 64 seconds, the
ctdb --timelimit option can be used to cap client runtime.
Signed-off-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit dc0c58547cd4b20a8e2cd21f3c8363f34fd03e75)
This concept didnt work out and it is really just as expensive as a full migration
anyway, without the benefit of caching the data for subsequence accesses.
Now, migrate the records immediately on first access.
This will be combined with a "cheap vacuum-lite" for special empty records to
prevent growth of databases.
Later extensions to mimic read-only behaviour of records will include proper shared read-only locking of database records, making the laccessor/lacount read-only access to the data obsolete anyway.
By removing this special case and handling of lacount laccessor makes the codapath where shared read-only locking will be be implemented simpler, and frees up space in the ctdb_ltdb header for use by vacuuming flags as well as read-only locking flags.
(This used to be ctdb commit 155dd1f4885fe142c6f8bd09430f65daf8a17e51)
Revert this patch:
commit 482c302d46e2162d0cf552f8456bc49573ae729d
We may need to use real-time processes for the main daemon and the recovery daemon to handle the cases where systems come under very high loads.
(This used to be ctdb commit 08bef9dcab6e4da15fc783f8624e5ed09aa060b5)
Add a new command "ctdb stats [num]" that prints the [num] most recent statistics intervals collected.
(This used to be ctdb commit e6e16fcd5a45ebd3739a8160c8fb5f44494edb9e)
so set the TALLOC_DEPRECATED sympol to allow use of this call
from ctdb_client.c
(This used to be ctdb commit 3afa5d945a56952a7f211af068d671945de960e5)
This means we can distinguish which child is logging, esp. via syslog where we have no pid.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 68b3761a0874429b90731741f0531f76dcfbb081)
In Samba this is now called "tevent", and while we use the backwards
compatibility wrappers they don't offer EVENT_FD_AUTOCLOSE: that is now
a separate tevent_fd_set_auto_close() function.
This is based on Samba version 7f29f817fa.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 85e5e760cc91eb3157d3a88996ce474491646726)
We've been seeing "Invalid packet of length 0" errors, but we don't know
what is sending them. Add a name for each queue, and print nread.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit e6cf0e8f14f4263fbd8b995418909199924827e9)
We dont want it to wrap almost immediately so that basically all "ctdb ..."
commands log the "Reqid wrap" warning.
(This used to be ctdb commit f26b59d8b96a70baa80ab1bad406ee6a21330b68)
Ronnie and I tracked down a bug which seems to be caused by a node
running so slowly that we timed out the request and reused the request
id before it responded.
The result was that we unlocked the wrong record, leading to the
following:
ctdbd: tdb_unlock: count is 0
ctdbd: tdb_chainunlock failed
smbd[1630912]: [2010/06/08 15:32:28.251716, 0] lib/util_sock.c:1491(get_peer_addr_internal)
ctdbd: Could not find idr:43
ctdbd: server/ctdb_call.c:492 reqid 43 not found
This exact problem is now detected, but in general we want to delay
id reuse as long as possible to make our system more robust.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 9eb9c53ef29f4871ae2fe62fc5cb6145fca89eed)
and print the time startistics was taken and for how long the statistics have been collected to the "ctdb statistics" output.
(This used to be ctdb commit 1bdfe0cd3370a335b960ce1ef97eade93b0cd2fa)
1) It's buggy. Code needs to be carefully written (ie. no busy
loops) to handle running with it, and we fork and run scripts.[1]
2) It makes debugging harder. If ctdbd loops (as has happened recently)
it can be extremely hard to get in and see what's happening. We've already
seen the valgrind hacks.
3) We have seen recent scheduler problems. Perhaps they are unrelated,
but removing this very unusual setup is unlikely to hurt.
4) It doesn't make anything faster. Under all but the most perverse of
circumstances, 99% of the cpu gives the same performance as 100%, and
we will always preempt normal processes anyway.
[1] I made this worse in 0fafdcb8d353 "eventscript: fork() a child for
each script" by removing the switch_from_server_to_client() which
restored it, but even that was only for monitor scripts. Others were
run with RT priority.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 482c302d46e2162d0cf552f8456bc49573ae729d)
The do_setsched was being tested for whether to mmap tdbs: let's make it
explicit. We can also happily move the kill-child eventscript hack under
this flag.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 2ee86cc1f311d7b7504c7b14d142b9c4f6f4b469)
We also no longer return an error before scripts have been run; a special
zero-length data means we have never run the scripts.
"ctdb scriptstatus all" returns all event script results.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 9b90d671581e390e2892d3a68f3ca98d58bef4df)
We're going to allow fetching status of all script runs, so this
name is no longer appropriate.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit f5cb41ecf3fa986b8af243e8546eb3b985cd902a)
The child no longer uses ctdb_ctrl_event_script_init or
ctdb_ctrl_event_script_finished, and the others are redundant: it
doesn't need to tell us it's starting a script when it only runs one.
We move start and stop calls to the parent, and eliminate the RPC
infrastructure altogether.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 391926a87a7af73840f10bb314c0a2f951a0854c)
This unifies code paths and simplifies things: we just hand -ENOEXEC to
ctdb_ctrl_event_script_stop().
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit eadf5e44ef97d7703a7d3bce0e7ea0f21cb11f14)
To cope with timeouts when recoveries and transactions collide.
Maybe 100 is too high.
Michael
(This used to be ctdb commit c23d804165e84bdf95ba960c953c736d361011d7)
So that it is correctly handled by recoveries.
Also explicitly set the dmaster field to the current node's pnn.
Michael
(This used to be ctdb commit 03a5bb727b9db1ba952632f08ceb5355f0df842d)
The gap that remained is between checking whether a transaction commit
is in progress and taking the lock. Now we first take the lock and then
check whether a transaction commit is in progress. If so, we release the
lock, wait for one second and retry.
Michael
(This used to be ctdb commit b95524c08bf12914120cb6c818ecc1c99738fe37)
* add __location__
* wrap overly long line
* print unsigned ints as unsigned (reqid, opcode, destnode)
Michael
(This used to be ctdb commit 6b47ea111867c845974aa2687a658ebca2854816)
In ctdb_transaction_commit(), when the trans2_commit control fails, there
is a race condition in the 1 second sleep between the local transaction_cancel
and the call to ctdb_replay_transaction(): The database is not locked, and
neither is the transaction_lock record. So another client can start and possibly
complete a new transaction in this gap, but only on the same node: The locking
of the transaction_lock record on a different node which involves migration of
the record to the other node has been disabled by introduction of the
transaction_active flag on the db which closes precisely this gap from the start
of the commit until the call to TRANS2_FINISH or TRANS2_ERROR.
But this mechanism does not cover the case where a process on the same node
tries to start a transaction: There is no obstacle to locking the transaction_lock
record because the record does not need to be migrated.
This commit closes this race condition in ctdb_transaction_fetch_start()
by using the new ctdb_ctrl_transaction_active() call to ask the local
ctdb daemon whether it has a transaction running on the database.
If so, the check is repeated until the running transaction is done.
This does introduce an additional call to the local ctdbd when starting
transactions, but it does close the (hopefully) last race condition.
Michael
(This used to be ctdb commit 02ee9dfd3c6b09f5c5172a7e38738c20b7f0aecd)
add a global variable holding the pid of the main daemon.
change the tracking of time() in the event loop to only check/warn when called from the main daemon
(This used to be ctdb commit a10fc51f4c30e85ada6d4b7347b0f9a8ebc76637)
database priorities will be used to control in which order databases are locked during recovery in.
(This used to be ctdb commit 67741c0ee01916d94cace8e9462ef02507e06078)
There are two races in concurrent transactions on a single node.
One in starting a transaction, and one with committing (replaying).
This commit closes the first race by storing the pid in the
transaction-lock record and comparing the own pid against it
as a measure to prevent starting a second transaction when
a second node has come inbetween and changed the pid in the lock
record.
Michael
(This used to be ctdb commit 84e5a55a900b01903b80e23045edfc726d8d77a1)
also check the returned status code in case the _stop() command failed
due to the eventscripts failing.
If this happens, make "ctdb stop" log an error to the console and try
the operation again.
(This used to be ctdb commit 20e82e0c48e07d1012549f5277f1f5a3f4bd10d1)
Log this in "ctdb statistics".
Also add a varaible "RecLockLatencyMs" that will log an error everytime it takes longer than this to access the reclock file.
(This used to be ctdb commit 042377ed803bb8f7ca9d6ea1a387427b7b8ba45a)
this command shows which eventscripts were executed during the last monitoring cycle and the status from each eventscript.
If an eventscript timedout or returned an error we also
show the output from the eventscript.
Example :
[root@rcn1 ctdb-git]# ./bin/ctdb scriptstatus
6 scripts were executed last monitoring cycle
00.ctdb Status:OK Duration:0.021 Mon Mar 23 19:04:32 2009
10.interface Status:OK Duration:0.048 Mon Mar 23 19:04:32 2009
20.multipathd Status:OK Duration:0.011 Mon Mar 23 19:04:33 2009
40.vsftpd Status:OK Duration:0.011 Mon Mar 23 19:04:33 2009
41.httpd Status:OK Duration:0.011 Mon Mar 23 19:04:33 2009
50.samba Status:ERROR Duration:0.057 Mon Mar 23 19:04:33 2009
OUTPUT:ERROR: Samba tcp port 445 is not responding
Add a new helper function "switch_from_server_to_client()" which both
the recovery daemon can use as well as in the child process we start for running the actual eventscripts.
Create several new controls, both for the eventscript child process to inform the master daemon of the current status of the scripts as well as for the ctdb tool to extract this information from the runninc daemon.
(This used to be ctdb commit c98f90ad61c9b1e679116fbed948ddca4111968d)
a ctdb client instance.
use this from the recovery daemon child process to switch to client mode
and connect back to the main daemon
(This used to be ctdb commit 16f31786a031255ab5b3099a0a3c745de973347a)
race between the ctdb tool and the recovery daemon both at once
trying to push flag changes across the cluster.
(This used to be ctdb commit a9a1156ea4e10483a4bf4265b8e9203f0af033aa)
fallback to the old-style ipv4-only controls if the new-style ipv4/ipv6
control fails.
this allows a 1.0.59+ (ipv4/ipv6) ctdb daemon being recmaster to be
compatible with
pre-1.0.59 versions of ctdb that are ipv4 only.
(This used to be ctdb commit 8e912abc2c68f5fe7b06c600ba6fec1a6900127c)
older ipv4-only version of these controls.
We need this so that we are backwardcompatible with old versions of ctdb
and so that we can interoperate with a ipv4-only recmaster during a
rolling upgrade.
(This used to be ctdb commit 6b76c520f97127099bd9fbaa0fa7af1c61947fb7)
we currently only monitor that the dameons are running by kill(0, pid)
and verifying the the domain socket between them is ok.
this is not sufficient since we can have a situation where the recovery
daemon is hung.
this new code monitors that the recovery daemon is operating.
if the recovery hangs, we log this and shut down the main daemon
(This used to be ctdb commit cd69d292292eaab3aac0e9d9fc57cb621597c63c)