IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
The code for deadlock detection and killing smbd process causing deadlock
has been removed and replaced with external debug script.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 2211cd94bea266547d3e6f167d3160a6b23bec88)
Send the ctdb_db_statistics directly instead of first copying it to
duplicate ctdb_db_statistics_wire structure. This simplifies the
implementation of the control to get database statistics.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 26a4653df594d351ca0dc1bd5f5b2f5b0eb0a9a5)
The log messages in verify_remote_ip_allocation() are confusing
because they don't include the PNN of the problem node, because it is
not known in this function.
Add the PNN of the node being verified as a function argument and then
shuffle the log messages around to make them clearer.
Also fold 3 nested if statements into just one.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit f0942fa01cd422133fc9398f56b4855397d7bc86)
This is like ctdb_fatal() but exits cleanly without dumping core or
generating a backtrace.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit c0a9456692c88a7a5542cd893d8f326524d3f94e)
When this function is called, we are already committed to banning
and there is no point in failing this function. In case, freezing of
databases fails, it will be fixed from recovery daemon.
(This used to be ctdb commit bb178338658b4ae32382a1f62f7c21cee1d4878f)
If this function fails due to memory errors, there is no way to recover.
The best course of action is to abort.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 46efe7a886f8c4c56f19536adc98a73c22db906a)
This adds more serialisation to the startup, ensuring that the
"startup" event runs after everything to do with the first recovery
(including the "recovered" event).
Given that it now takes longer to get to the "startup" state, the
initscript needs to wait until ctdbd gets to "first_recovery".
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit ed6814ff0a59ddbb1c1b3128b505380f60d7aeb7)
Also new client function ctdb_ctrl_get_runstate().
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit dc4220e6f618cc688b3ca8e52bcb3eec6cb55bb1)
This allows states, including startup and shutdown states, to be
clearly tracked. This doesn't include regular runtime "states", which
are handled by node flags.
Introduce new functions ctdb_set_runstate(), runstate_to_string() and
runstate_from_string().
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 8076773a9924dcf8aff16f7d96b2b9ac383ecc28)
These functions were used in locking child process to do the locking. With
locking helper, these are not required.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit c660f33c3eaa1b4a2c4e951c1982979e57374ed4)
These should never be seen outside the IP allocation code.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit e143abd16ccde2e0edfe103673d31a5fb06b6aef)
This really needs to be per-node. The rename is because nodes with
this tunable switched on should drop IPs if they become unhealthy (or
disabled in some other way).
* Add new flag NODE_FLAGS_NOIPHOST, only used in recovery daemon.
* Enhance set_ipflags_internal() and set_ipflags() to setup
NODE_FLAGS_NOIPHOST depending on setting of NoIPHostOnAllDisabled
and/or whether nodes are disabled/inactive.
* Replace can_node_servce_ip() with functions can_node_host_ip() and
can_node_takeover_ip(). These functions are the only ones that need
to look at NODE_FLAGS_NOIPTAKEOVER and NODE_FLAGS_NOIPHOST. They
can make the decision without looking at any other flags due to
previous setup.
* Remove explicit flag checking in IP allocation functions (including
unassign_unsuitable_ips()) and just call can_node_host_ip() and
can_node_takeover_ip() as appropriate.
* Update test code to handle CTDB_SET_NoIPHostOnAllDisabled.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 1308a51f73f2e29ba4dbebb6111d9309a89732cc)
It isn't used, superceded by "ipreallocated".
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit c2bb8596a8af6406ef50e53953884df9d6246a96)
This is an alternative to using ctdb_run_eventscripts() that can be
used when in recovery.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 27a44685f0d7a88804b61a1542bb42adc8f88cb1)
This is used for some checks
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-By: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit c7924ce6404bb18641b00d5fbd2fe9da9aaf7959)
This in preparation of turning the vacuming on the lmaster into
into a two phase process:
- First the node sends the list of records to be vacuumed
to all other nodes with this new RECEIVE_RECORDS control.
The remote nodes should store the lmaster's empty current copy.
- Only those records that could be stored on all other nodes
are processed further. They are send to all other nodes with
the TRY_DELETE_RECORDS control as before for deletion.
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-By: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit e397702e271af38204fd99733bbeba7c1db3a999)
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit 1e989894764e4cd1d551c44784d91cb295cd790d)
It really is internal.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit abb64f62efaa70df4b87c030b96300eafd98e6a3)
Default is not to create a pid file.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit 996e74d3db0c50f91b320af8ab7c43ea6b1136af)
Must be called by all child processes.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit 59b019a97aad9a731f9080ea5be14d0dbdfe03d6)
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 2e92deef5221ee651028ef87138b3113f1fece91)
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit e691df43d20871468142c8fb83f7c7303c4ec307)
When CTDB is busy with lots of smbd, CTDB was spending too much time in
daemon_check_srvids() which searches a list of srvids in the registered
message handlers. Using a hash based index significantly improves the
performance of search in a linked list.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 3e09f25d419635f6dd679b48fa65370f7860be7d)
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit a73bb56991b8c07ed0e9517ffcf0dc264be30487)
Commit a82d3ec12f0fda16d6bfa8442a07595de897c10e broke fetching from
the log ringbuffer. The solution there is still generally good: there
is no need to keep the ringbuffer in children created by
ctdb_fork()... except for those special children that are created to
fetch data from the ringbuffer!
Introduce a new function ctdb_fork_no_free_ringbuffer() that does
everything ctdb_fork() needs to do except free the ringbuffer (i.e. it
is the old ctdb_fork() function). The new ctdb_fork() function just
calls that function and then frees the ringbuffer in the child.
This means all callers of ctdb_fork() have the convenience of having
the ringbuffer freed. There are 3 special cases:
* Forking the recovery daemon. We want to be able to fetch from the
ringbuffer there.
* The ringbuffer fetching code. Change the 2 calls in this code (main
daemon, recovery daemon) to call ctdb_fork_no_free_ringbuffer()
instead.
While we're here, clear the log ringbuffer when the recovery deamon is
forked, since it will contain a copy of the messages from the main
daemon.
Note to self: always test... even the most obvious patches... ;-)
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 00db5fa00474f8a83f1aa3b603fd756cc9b49ff4)
Use an environment variable instead. This just means that the
initscript exports CTDB_DEBUG_HUNG_SCRIPT and the code checks for the
environment variable.
The justification for this simplification is that more debug options
will be arriving soon and we want to handle them consistently without
needing to add a command-line option for each. So, the convention
will be to use an environment variable for each debug option.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 0581f9a84e58764d194f4e04064c2c5b393c348b)
The only allocation against this context is by
ctdb_fork_with_logging(). This memory is freed by ctdb_log_handler()
anyway. There should be no memory leak.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 501461cc3e132d4adee9e91b5d4513a26bae2846)
When CTDB is shut down and monitoring has been stopped, monitor_context
gets freed and all the callback states hanging off it. This includes
callback state for current_monitor, if the current monitor event has
not yet finished. As a result, when the shutdown event is called,
current_monitor->callback state is not NULL, but it's actually freed
and it's a dangling reference.
So before executing callback function and freeing callback state check
if ctdb->monitor->monitor_context is not NULL.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 7d8546ee4353851f0543d0ca2c4c67cb0cc75aea)
When CTDB is shutting down, recovery daemon is stopped, but the
event that checks if recovery daemon is still alive is not destroyed.
So recovery master is restarted during shutdown if CTDB daemon takes
longer to shutdown.
There are two processes that check if recovery daemon is working.
1. ctdb_check_recd() - which checks every 30 seconds if the recovery
daemon process exists.
2. ctdb_recd_ping_timeout() - which is triggered when recovery daemon
fails to ping CTDB daemon.
Both the events are periodic and need to be destroyed when shutting down.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 746168df2e691058e601016110fae818c6a265c3)
This effectively reverts d96cb02c2c24f9eabbc53d3d38e90dea49cff3e0
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 496387a585b2c5778c808cf02b8e1435abde4c3e)
Samba versions 3.6.x and older do not set the database priority.
This can cause deadlock between Samba and CTDB since the locking order
of database will be different. A hack was added for automatic promotion
of priority for specific databases to avoid deadlock. This code should
not be invoked with Samba version 4.x which correctly specifies the
priority for each database.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit 4a9e96ad3d8fc46da1cd44cd82309c1b54301eb7)
This fixes a bug where wrong variable is checked.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit f81e9add466b1d9b2796c09c6ba63b77296ea149)
RECLOCK is for recovery lock in CTDB. Do not override the meaning for
tracking locks on databases. Database lock latency has nothing to do
with recovery lock latency.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 54e24a151d2163954e5a2a1c0f41a2b5c19ae44b)
If any of the nodes fail takeover run (either due to timeout or failure
to complete within takeover_timeout interval) from main loop, recovery
master will give up trying takeover run with following message:
"Unable to setup public takeover addresses. Try again later"
And as a side-effect the monitoring is disabled on all the nodes. Before
ctdb_takeover_run() is called from main loop, monitoring get disabled via
startrecovery event. Since ctdb_takeover_run() fails, it never runs
recovered event and monitoring does not get re-enabled.
In main_loop, ctdb_takeover_run() is called with a takeover_fail_callback.
This callback will get called if any of the nodes fail in handling
takeip/releaseip/ipreallocated events in ctdb_takeover_run().
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit a5c6bb1fffb8dc3960af113957a1fd080cc7c245)
These support getting and clearing logs from the ring-buffer in the
recovery daemon.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit cbca233d1e03b2410e0bb63b936328d4a8b3c7b4)
When building samba with CTDB, if samba configure/waf does not support
setting of SOCKPATH, fallback to /tmp/ctdb.socket.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit a9511cf5ecd5bc39b0070f0afa8ac4d4926c6cab)
The ctdb socket path currently defaults to /tmp/ctdb.socket and can be
modified at runtime using the --socket=filename option, common to both
ctdb and ctdbd binaries.
This change allows the default path to be set at configure time using
the --with-socketpath=FILE argument. When not specified, the default
path remains /tmp/ctdb.socket, documentation remains unchanged as a
result.
Signed-off-by: David Disseldorp <ddiss@samba.org>
(This used to be ctdb commit f92b9c83a2f39fba9a141417a88de96fc8c592ff)
This introduces a consistent API for handling locks on single record, complete
db or all dbs. The locks are taken out in a child process. In cases of timeout,
find the processes that currently hold the lock and log.
Callback functions for locking requests take locked boolean to indicate
whether the lock was successfully obtained or not.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 1af99cf0de9919dd89af1feab6d1bd18b95d82ff)
Currently these functions are implemented only for Linux.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit be4051326b0c6a0fd301561af10fd15a0e90023b)
We've seen this function report "Unknown family, 0" and then CTDB
disappeared without a trace. If we can reproduce it then this might
help us to debug it.
The idea is that you do something like the following in /etc/sysconfig/ctdb:
export CTDB_EXTERNAL_TRACE="/etc/ctdb/config/gcore_trace.sh"
When we hit this error than we call out to gcore to get a core file so
we can do forensics. This might block CTDB for a few seconds.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 7895bc003f087ab2f3181df3c464386f59bfcc39)
There's a race here where release and takeover events for an IP can
run at the same time. For example, a "ctdb deleteip" and a takeover
initiated by the recovery daemon. The timeline is as follows:
1. The release code registers a callback to update the VNN. The
callback is executed *after* the eventscripts run the releaseip
event.
2. The release code calls the eventscripts for the releaseip event,
removing IP from its interface.
The takeover code "updates" the VNN saying that IP is on some
iface.... even if/though the address is already there.
3. The release callback runs, removing the iface associated with IP in
the VNN.
The takeover code calls the eventscripts for the takeip event,
adding IP to an interface.
As a result, CTDB doesn't think it should be hosting IP but IP is on
an interface. The recovery daemon fixes this later... but it
shouldn't happen.
This patch can cause some additional noise in the logs:
Release of IP 10.0.2.133/24 on interface eth2 node:2
recoverd:We are still serving a public address '10.0.2.133' that we should not be serving. Removing it.
Release of IP 10.0.2.133/24 rejected update for this IP already in flight
recoverd:client/ctdb_client.c:2455 ctdb_control for release_ip failed
recoverd:Failed to release local ip address
In this case the node has started releasing an IP when the recovery
daemon notices the addresses is still hosted and initiates another
release. This noise is harmless but annoying.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit bfe16cf69bf2eee93c0d831f76d88bba0c2b96c2)
Stops the behaviour where unhealthy nodes can host IPs when there are
no healthy nodes. Set this to 1 when an immediate complete outage is
preferred when all nodes are unhealthy. The alternative
(i.e. default) can lead to undefined behaviour when the shared
filesystem is unavailable.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit a555940fb5c914b7581667a05153256ad7d17774)
With an old ctdb_protocol.h installed under /usr/local, ctdb will
not compile because the <> form of include will find the header
under /usr/local
(This used to be ctdb commit c4f5a58471b206e2287c7958c7f29c1f1c0626ac)