IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Also get rid of ctdb_set_event_script_dir(). It creates an
unnecessary copy of something that will be around for the lifetime of
the process.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 21b4d1aba00902f1eee0cbf4f082b0794fd5b738)
This allows ctdb_load_nodes_file() to move to ctdb_server.c and
ctdb_set_nlist() to become static.
Setting ctdb->nodes_file needs to be done early, before the nodes file
is loaded. It is now set from CTDB_BASE instead ETCDIR, so setting
CTDB_BASE also needs to be done earlier.
Unhack ctdbd_test.c - it no longer needs to define
ctdb_load_nodes_file().
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 20e705e63bd3b20837cc3ac92fdcf2a9650ccfc8)
This removes data types and structure elements related to TRANS2
persistent transaction code.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 22a253b7ccf1ff854cddf0b67969dc84d7d6a654)
The current implementation has a few flaws:
* A takeover run is called unconditionally when the timer goes even if
the recovery master role has moved. This means a node other than
the recovery master can incorrectly do a takeover run.
* The rebalancing target nodes are cleared in the setup for a takeover
run, regardless of whether the takeover run succeeds.
* The timer to force a rebalance isn't cleared if another takeover run
occurs before the deadline. Any forced rebalancing will happen in
the first takeover run and when the timer expires some time later
then an unnecessary takeover run will occur.
* If the recovery master role moves then the rebalancing data will
stay on the original node and affect the next takeover run to occur
if the recovery master role should come back to the original node.
Instead, store an array of rebalance target nodes in the recovery
master context. This is passed as an extra argument to
ctdb_takeover_run() each time it is called and is cleared when a
takeover run succeeds. The timer hangs off the array of rebalance
target nodes, which is cleared if the node isn't the recovery master.
This means that it is possible to lose rebalance data if the recovery
master role moves. However, that's a difficult problem to solve. The
best way of approaching it is probably to try to stop the recovery
master role from jumping around unnecesarily when inactive nodes join
the cluster.
The long term solution is to avoid this nonsense completely. The IP
allocation algorithm needs to cache state between runs so that it
knows which nodes have just become healthy. This also needs recovery
master stability.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit c51c1efe5fc7fa668597f2acd435dee16e410fc9)
No need for a separate one for each SRVID.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit d9c22b04d5aa7938a3965bd3144568664eb772ce)
This is an internal structure. It was moved into ctdb_private.h a
long time ago to allow unit testing. Unit test compilation was
changed shortly afterwards to make this unnecessary.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit db57261d7dc264e161659a8c547f44fbd9e88eeb)
This reverts commit 035c0d981bde8c0eee8b3f24ba8e2dc817e5b504.
This is a premature optimization. Record can bounce between nodes
very quickly if it is a contended record. There is no need to hold a
record on a node unnecessarily. In case record contention becomes bad,
enabling sticky records on a database is a better idea.
Conflicts:
include/ctdb_private.h
server/ctdb_tunables.c
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit ac417b0003f0116f116834ad2ac51482d25cfa0d)
The structure cannot be removed without adding support for marshalling keys
for hot records.
This reverts commit 26a4653df594d351ca0dc1bd5f5b2f5b0eb0a9a5.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 023ca2e84f5ed064a288526b9c2bc7e06674dd81)
The code for deadlock detection and killing smbd process causing deadlock
has been removed and replaced with external debug script.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 2211cd94bea266547d3e6f167d3160a6b23bec88)
Send the ctdb_db_statistics directly instead of first copying it to
duplicate ctdb_db_statistics_wire structure. This simplifies the
implementation of the control to get database statistics.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 26a4653df594d351ca0dc1bd5f5b2f5b0eb0a9a5)
The log messages in verify_remote_ip_allocation() are confusing
because they don't include the PNN of the problem node, because it is
not known in this function.
Add the PNN of the node being verified as a function argument and then
shuffle the log messages around to make them clearer.
Also fold 3 nested if statements into just one.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit f0942fa01cd422133fc9398f56b4855397d7bc86)
This is like ctdb_fatal() but exits cleanly without dumping core or
generating a backtrace.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit c0a9456692c88a7a5542cd893d8f326524d3f94e)
When this function is called, we are already committed to banning
and there is no point in failing this function. In case, freezing of
databases fails, it will be fixed from recovery daemon.
(This used to be ctdb commit bb178338658b4ae32382a1f62f7c21cee1d4878f)
If this function fails due to memory errors, there is no way to recover.
The best course of action is to abort.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 46efe7a886f8c4c56f19536adc98a73c22db906a)
This adds more serialisation to the startup, ensuring that the
"startup" event runs after everything to do with the first recovery
(including the "recovered" event).
Given that it now takes longer to get to the "startup" state, the
initscript needs to wait until ctdbd gets to "first_recovery".
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit ed6814ff0a59ddbb1c1b3128b505380f60d7aeb7)
This allows states, including startup and shutdown states, to be
clearly tracked. This doesn't include regular runtime "states", which
are handled by node flags.
Introduce new functions ctdb_set_runstate(), runstate_to_string() and
runstate_from_string().
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 8076773a9924dcf8aff16f7d96b2b9ac383ecc28)
These functions were used in locking child process to do the locking. With
locking helper, these are not required.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit c660f33c3eaa1b4a2c4e951c1982979e57374ed4)
This really needs to be per-node. The rename is because nodes with
this tunable switched on should drop IPs if they become unhealthy (or
disabled in some other way).
* Add new flag NODE_FLAGS_NOIPHOST, only used in recovery daemon.
* Enhance set_ipflags_internal() and set_ipflags() to setup
NODE_FLAGS_NOIPHOST depending on setting of NoIPHostOnAllDisabled
and/or whether nodes are disabled/inactive.
* Replace can_node_servce_ip() with functions can_node_host_ip() and
can_node_takeover_ip(). These functions are the only ones that need
to look at NODE_FLAGS_NOIPTAKEOVER and NODE_FLAGS_NOIPHOST. They
can make the decision without looking at any other flags due to
previous setup.
* Remove explicit flag checking in IP allocation functions (including
unassign_unsuitable_ips()) and just call can_node_host_ip() and
can_node_takeover_ip() as appropriate.
* Update test code to handle CTDB_SET_NoIPHostOnAllDisabled.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 1308a51f73f2e29ba4dbebb6111d9309a89732cc)
It isn't used, superceded by "ipreallocated".
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit c2bb8596a8af6406ef50e53953884df9d6246a96)
This is an alternative to using ctdb_run_eventscripts() that can be
used when in recovery.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 27a44685f0d7a88804b61a1542bb42adc8f88cb1)
This in preparation of turning the vacuming on the lmaster into
into a two phase process:
- First the node sends the list of records to be vacuumed
to all other nodes with this new RECEIVE_RECORDS control.
The remote nodes should store the lmaster's empty current copy.
- Only those records that could be stored on all other nodes
are processed further. They are send to all other nodes with
the TRY_DELETE_RECORDS control as before for deletion.
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-By: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit e397702e271af38204fd99733bbeba7c1db3a999)
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit 1e989894764e4cd1d551c44784d91cb295cd790d)
It really is internal.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit abb64f62efaa70df4b87c030b96300eafd98e6a3)
Default is not to create a pid file.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit 996e74d3db0c50f91b320af8ab7c43ea6b1136af)
Must be called by all child processes.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit 59b019a97aad9a731f9080ea5be14d0dbdfe03d6)
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 2e92deef5221ee651028ef87138b3113f1fece91)
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit e691df43d20871468142c8fb83f7c7303c4ec307)
When CTDB is busy with lots of smbd, CTDB was spending too much time in
daemon_check_srvids() which searches a list of srvids in the registered
message handlers. Using a hash based index significantly improves the
performance of search in a linked list.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 3e09f25d419635f6dd679b48fa65370f7860be7d)
Commit a82d3ec12f0fda16d6bfa8442a07595de897c10e broke fetching from
the log ringbuffer. The solution there is still generally good: there
is no need to keep the ringbuffer in children created by
ctdb_fork()... except for those special children that are created to
fetch data from the ringbuffer!
Introduce a new function ctdb_fork_no_free_ringbuffer() that does
everything ctdb_fork() needs to do except free the ringbuffer (i.e. it
is the old ctdb_fork() function). The new ctdb_fork() function just
calls that function and then frees the ringbuffer in the child.
This means all callers of ctdb_fork() have the convenience of having
the ringbuffer freed. There are 3 special cases:
* Forking the recovery daemon. We want to be able to fetch from the
ringbuffer there.
* The ringbuffer fetching code. Change the 2 calls in this code (main
daemon, recovery daemon) to call ctdb_fork_no_free_ringbuffer()
instead.
While we're here, clear the log ringbuffer when the recovery deamon is
forked, since it will contain a copy of the messages from the main
daemon.
Note to self: always test... even the most obvious patches... ;-)
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 00db5fa00474f8a83f1aa3b603fd756cc9b49ff4)
Use an environment variable instead. This just means that the
initscript exports CTDB_DEBUG_HUNG_SCRIPT and the code checks for the
environment variable.
The justification for this simplification is that more debug options
will be arriving soon and we want to handle them consistently without
needing to add a command-line option for each. So, the convention
will be to use an environment variable for each debug option.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 0581f9a84e58764d194f4e04064c2c5b393c348b)
The only allocation against this context is by
ctdb_fork_with_logging(). This memory is freed by ctdb_log_handler()
anyway. There should be no memory leak.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 501461cc3e132d4adee9e91b5d4513a26bae2846)
When CTDB is shut down and monitoring has been stopped, monitor_context
gets freed and all the callback states hanging off it. This includes
callback state for current_monitor, if the current monitor event has
not yet finished. As a result, when the shutdown event is called,
current_monitor->callback state is not NULL, but it's actually freed
and it's a dangling reference.
So before executing callback function and freeing callback state check
if ctdb->monitor->monitor_context is not NULL.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 7d8546ee4353851f0543d0ca2c4c67cb0cc75aea)
When CTDB is shutting down, recovery daemon is stopped, but the
event that checks if recovery daemon is still alive is not destroyed.
So recovery master is restarted during shutdown if CTDB daemon takes
longer to shutdown.
There are two processes that check if recovery daemon is working.
1. ctdb_check_recd() - which checks every 30 seconds if the recovery
daemon process exists.
2. ctdb_recd_ping_timeout() - which is triggered when recovery daemon
fails to ping CTDB daemon.
Both the events are periodic and need to be destroyed when shutting down.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 746168df2e691058e601016110fae818c6a265c3)
This effectively reverts d96cb02c2c24f9eabbc53d3d38e90dea49cff3e0
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 496387a585b2c5778c808cf02b8e1435abde4c3e)
Samba versions 3.6.x and older do not set the database priority.
This can cause deadlock between Samba and CTDB since the locking order
of database will be different. A hack was added for automatic promotion
of priority for specific databases to avoid deadlock. This code should
not be invoked with Samba version 4.x which correctly specifies the
priority for each database.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit 4a9e96ad3d8fc46da1cd44cd82309c1b54301eb7)
This fixes a bug where wrong variable is checked.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit f81e9add466b1d9b2796c09c6ba63b77296ea149)
RECLOCK is for recovery lock in CTDB. Do not override the meaning for
tracking locks on databases. Database lock latency has nothing to do
with recovery lock latency.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 54e24a151d2163954e5a2a1c0f41a2b5c19ae44b)
If any of the nodes fail takeover run (either due to timeout or failure
to complete within takeover_timeout interval) from main loop, recovery
master will give up trying takeover run with following message:
"Unable to setup public takeover addresses. Try again later"
And as a side-effect the monitoring is disabled on all the nodes. Before
ctdb_takeover_run() is called from main loop, monitoring get disabled via
startrecovery event. Since ctdb_takeover_run() fails, it never runs
recovered event and monitoring does not get re-enabled.
In main_loop, ctdb_takeover_run() is called with a takeover_fail_callback.
This callback will get called if any of the nodes fail in handling
takeip/releaseip/ipreallocated events in ctdb_takeover_run().
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit a5c6bb1fffb8dc3960af113957a1fd080cc7c245)
These support getting and clearing logs from the ring-buffer in the
recovery daemon.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit cbca233d1e03b2410e0bb63b936328d4a8b3c7b4)
This introduces a consistent API for handling locks on single record, complete
db or all dbs. The locks are taken out in a child process. In cases of timeout,
find the processes that currently hold the lock and log.
Callback functions for locking requests take locked boolean to indicate
whether the lock was successfully obtained or not.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 1af99cf0de9919dd89af1feab6d1bd18b95d82ff)
Currently these functions are implemented only for Linux.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit be4051326b0c6a0fd301561af10fd15a0e90023b)
We've seen this function report "Unknown family, 0" and then CTDB
disappeared without a trace. If we can reproduce it then this might
help us to debug it.
The idea is that you do something like the following in /etc/sysconfig/ctdb:
export CTDB_EXTERNAL_TRACE="/etc/ctdb/config/gcore_trace.sh"
When we hit this error than we call out to gcore to get a core file so
we can do forensics. This might block CTDB for a few seconds.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 7895bc003f087ab2f3181df3c464386f59bfcc39)
There's a race here where release and takeover events for an IP can
run at the same time. For example, a "ctdb deleteip" and a takeover
initiated by the recovery daemon. The timeline is as follows:
1. The release code registers a callback to update the VNN. The
callback is executed *after* the eventscripts run the releaseip
event.
2. The release code calls the eventscripts for the releaseip event,
removing IP from its interface.
The takeover code "updates" the VNN saying that IP is on some
iface.... even if/though the address is already there.
3. The release callback runs, removing the iface associated with IP in
the VNN.
The takeover code calls the eventscripts for the takeip event,
adding IP to an interface.
As a result, CTDB doesn't think it should be hosting IP but IP is on
an interface. The recovery daemon fixes this later... but it
shouldn't happen.
This patch can cause some additional noise in the logs:
Release of IP 10.0.2.133/24 on interface eth2 node:2
recoverd:We are still serving a public address '10.0.2.133' that we should not be serving. Removing it.
Release of IP 10.0.2.133/24 rejected update for this IP already in flight
recoverd:client/ctdb_client.c:2455 ctdb_control for release_ip failed
recoverd:Failed to release local ip address
In this case the node has started releasing an IP when the recovery
daemon notices the addresses is still hosted and initiates another
release. This noise is harmless but annoying.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit bfe16cf69bf2eee93c0d831f76d88bba0c2b96c2)
Stops the behaviour where unhealthy nodes can host IPs when there are
no healthy nodes. Set this to 1 when an immediate complete outage is
preferred when all nodes are unhealthy. The alternative
(i.e. default) can lead to undefined behaviour when the shared
filesystem is unavailable.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit a555940fb5c914b7581667a05153256ad7d17774)
Change this to instead preallocate , by default, 10MByte chunks to the data buffer.
This significantly reduces the number of potential reallocate and move operations that may be required.
Create a tunable to override/change how much preallocation should be used.
(This used to be ctdb commit 1f262deaad0818f159f9c68330f7fec121679023)
Add tunables to control when to log these instances and allow it to be completely turned off by setting the threshold to 0
(This used to be ctdb commit 9ed58fef4991725f75509433496f4d5ffae0ae87)
Break this debug and datacollection out into an external script to make it easier to modify what data we need to collect.
For now we only collect a pstree so we can see what part of the script we hung in.
S1037271
(This used to be ctdb commit 6e68797af67bee36f2bad045f94806e7e98f27e9)
Wrap all creation of child processes inside ctdb_fork() which is used to track all processes we have spawned.
Capture SIGCHLD to track also which child processes have terminated.
Wrap kill() inside ctdb_kill() and make sure that we never send a !0 signal to a child process pid that has already terminated (and might have been replaced with a
(This used to be ctdb commit f73a4b1495830bcdd094a93732a89dd53b3c2f78)
and also update the "read public address file" to not check if the address exists already locally when we read if from the child process, to stop it
from spamming the logs with "We already host ..."
messages
(This used to be ctdb commit 334ea830f1bf33419f4a1e78f23afd41a852d0f4)
Also add a method to use the recovery master/daemon to reload the public ips on all nodes in the cluster.
Reloading the public ips on all node sin the cluster is only suported if all nodes in the cluster are available and healthy.
(This used to be ctdb commit 05603e914f8c12618d7e06943c0f7df207f645b0)
The implementation of DisableIPFailover got intermingled with
--nopublicipcheck. This just looks wrong - Ronnie must have been
having a bad day. :-)
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 5083b266dd68b292c4275505f3d1b878dbf12f11)
This can improve performance and stop clients from having to chase a rapidly migrating/bouncing record
(This used to be ctdb commit d0d98f7e45e5084b81335b004d50bddc80cdc219)
This can improve performance slightly on certain workloads where smbds frequently read from the same record
(This used to be ctdb commit 035c0d981bde8c0eee8b3f24ba8e2dc817e5b504)
Instead, tie them together via referencing a permanent linked list hung off the ctdb structure.
(This used to be ctdb commit a95c02da6c67dc4bd8716b75318a4188301df6f9)
there are some child processes where we do not create a connection to the main daemon (switch_from_server_to_client()) because it is expensive to set up and we normally might not need to talk to the daemon at all via a domainsocket.
but we might want to still call to ctdb_ltdb_store() from such chil processes.
(This used to be ctdb commit 9e372a08c40087e6b5335aa298e94d88273566a5)
By this, the original CTDB_CONTROL_TRAVERSE_START control that is
used by e.g. samba's smbstatus, is not changed, so that samba
continues working without code change.
The CTDB_CONTROL_TRAVERSE_START currently just adds the "withemptyrecords"
flag to the state and processon on as CTDB_CONTROL_TRAVERSE_START_EXT.
(This used to be ctdb commit 8281bb210858ed04992eacea7f6d02261e0fc1b1)
Add a new tunable that changes the mode how persistent databases are recovered.
RecoveryPDBBySeqNum
When set to 1, persistent databases will be recovered in whole from the node which
has the highest "__db_sequence_number__" record.
This record is managed by samba for those databases where we do persistent writes and have
inter-record relations.
For these databases we do not want the usual "blend records from all nodes based
on individual record RSN" but instead a mode where we pick one instance of the persistent database.
If no node was found with a "__db_sequence_number__" record at all, we fail back to the original "recover records independently based on record RSN".
Some persistent databases do not contain record interrelations and as such does not
contain this special record at all.
(This used to be ctdb commit 502150c764298a9fa8c4d8aa445bf7d85d4ee9dc)
These were intentionally not static so they could be linked to in unit
test programs. However, using the CCAN-style unit tests where
relevant code is just included, this is no longer necessary.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit d0e9e8554614bd49ffb9ec3509feaa0e80d0f65d)
Move identical copies of ctdb_null_func(), ctdb_fetch_func(),
ctdb_fetch_with_header_func() from ctdb_client.c and
ctdb_ltdb_server.c to somewhere common.
This is in the context of wanting to run CCAN-style tests where most
of the ctdbd code is just included in the test program.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 126cb0d369b2b1aed63801dc4ba0554399e8b7e4)
When multiple clients fetch the same record concurrently, send only one single
fetch across the network and deferr all other fetches locally.
This improves performance for hot records and reduces cpu load on ctdb.
(This used to be ctdb commit 82d6946ad8b3348e8b9d3d971f24925ade02d1be)
Attempt to reconnect to ctdbd on fetch while it is unreachable.
We must provide our own queue callback wrapper, as ctdb_client_read_cb()
exits on transport failure.
(This used to be ctdb commit 28df6fbf1273b8d095a2bc38dca6a6c35c5c31bd)
This will make it much easier to root-cause problems such as
S1029023
when an external application deleted the interface while it is still is in use by ctdbd.
(This used to be ctdb commit 9abf9c919a7e6789695490e2c3de56c21b63fa57)
check that the actual interface exist, print error and fail startup if the interface does not exist.
(This used to be ctdb commit cd33bbe6454b7b0316bdfffbd06c67b29779e873)
When set to 0, clients will not be able to attach to databases
via the db_attach control. This might can be useful for maintenance
where ctdb should be kept running but clients should not be able
to modify databases.
(This used to be ctdb commit ddfeecda87955b4e46777599f678e6926d37f4c4)
let all databases default to not support this until enabled through this control
(This used to be ctdb commit 908a07c42e5135a3ba30a625fc4f4e4916de197a)
This triggers a child process to be created to perform the actual potentially blocking calls that are required.
(This used to be ctdb commit 7d575ee92c95bc4aab78a33bc1aac7ff0811ab3a)
Once the contexts are freed, the deferred calls are re-issued to the input packet processing functions again.
This is needed when/if a CALL can not currently be processed by the main engine due to the record being locked down for revoking of all delegations.
The data is passed through several layers of callbacks, and finally a timed event callback to ensure that the processing of the packet will be restarted again at the topmost eventloop, avoinding event loop nesting.
(This used to be ctdb commit cc6f78efcfa3b8caeffbd68018e6dfbf81488dce)
Assume all databases will support readonly mode for now and se thte flag for all databases. At later stage we will add support to control on a per database level whether delegations will be supported or not.
(This used to be ctdb commit 502f86f79944df4bac9094f716e54110c511dc24)
Move struct ctdb_public_ip_list to ctdb_private.h and put some
definitions for some functions from ctdb_takeover.c there. This
allows those functions to be called from unit tests.
Add ctdb_takeover_tests.c and the Makefile support to build it.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 9d34be0233edf3bc022345c0494c4b2a4d7f8480)
The current non-deterministic IP allocation algorithm balances IPs
across the whole cluster. It does not consider different
interfaces/VLANs/subnets, so these different groups of IPs aren't
generally well balanced.
This adds the LCP2 algorithm for IP allocation and allows it to be
enabled by setting the "LCP2PublicIPs" tunable to 1.
The LCP2 algorithm calculates the imbalance of a node by totalling the
squares of the distances between each IP on the node. The IP distance
is defined as the length longest common prefix (LCP) of bits that is
found when comparing 2 IPs. The imbalance of a cluster is the maximum
imbalance for any node. At each step the algorithm selects an
allocation to the IP/node combination that results in the choosing the
allocation that best reduces the imbalance of the cluster.
The implementation splits out the IP allocation part of
ctdb_takeover_run() into new function ctdb_takeover_run_core(), and
then extracts out the basic IP assignment code into new functions
basic_allocate_unassigned() and basic_failback(). 3 new functions
lcp2_init(), lcp2_allocate_unassigned() and lcp2_failback() implement
the LCP2 algorithm, and are hooked into ctdb_takeover_run_core().
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 61fc7fbd0235469df22deb6581c6bd47e30bc0be)
This is realized by adding a ctdb_ltdb_store_fn function pointer to the db
context and filling it in the attach procedure for non-persistent dbs.
(This used to be ctdb commit df49ec44de80affa5ccc637dec12a20a26e8706e)
This is for the control dispatcher to check whether the input data has
a required minimum size.
(This used to be ctdb commit 2038e745db33cc5c3b4e2db8a00a57ede03906a2)
This will control how many fast-path vacuuming runs wil have to
be done, before a full vacuuming will be triggered, i.e. one with
a db-traversal.
(This used to be ctdb commit 0d997ec7e61a7bee2cb05456f9c7d5e6f7a44797)
This list will be filled by the client using a new
delete control. The list will then be used to implement
a fast-path vacuuming that will traverse this list instead
of traversing the database.
(This used to be ctdb commit 9bbedf786b26bb074f668b31f29a9032af958673)
This function walks all databases and checks for running trans3 commits.
It sends replies to all of them (with error code) and ends them.
To be called when a recovery finishes.
(This used to be ctdb commit 70ba153b532528bdccea70c5ea28972257f384c1)
This concept didnt work out and it is really just as expensive as a full migration
anyway, without the benefit of caching the data for subsequence accesses.
Now, migrate the records immediately on first access.
This will be combined with a "cheap vacuum-lite" for special empty records to
prevent growth of databases.
Later extensions to mimic read-only behaviour of records will include proper shared read-only locking of database records, making the laccessor/lacount read-only access to the data obsolete anyway.
By removing this special case and handling of lacount laccessor makes the codapath where shared read-only locking will be be implemented simpler, and frees up space in the ctdb_ltdb header for use by vacuuming flags as well as read-only locking flags.
(This used to be ctdb commit 155dd1f4885fe142c6f8bd09430f65daf8a17e51)
Add a dlist to track all active lockwait child processes.
Everytime creating a new lockwait handle, check if there is already an
active lockwait process for this database/key and if so,
send the new request straight to the overflow queue.
This means we will only have one active lockwaic child process for a certain key,
even if there were thousands of fetch-lock requests for this key.
When the lockwait processing finishes for the original request, the processing in d_overflow() will automagically process all remaining keys as well.
Add back a --nosetsched argument to make it easier to run under gdb
(This used to be ctdb commit 3e9317a2e1f687b04bf51575d47fcd4faa6e6515)
Once we have more than 200 children waiting on a particular db, don't create
any more. Just put them on an overflow queue, and when a child gets a lock
search that queue to see if others were after the same lock (they probably
were).
(This used to be ctdb commit 5e614e8cfd1e9a4b13035a0e400b7a60a745b510)
scheduler for the child.
Use ctdb_fork() from callers where we dont want the child to be running
at real-time privilege.
(This used to be ctdb commit 58795a4c9e0624e20fa3e0023b65127053edd103)
Revert this patch:
commit 482c302d46e2162d0cf552f8456bc49573ae729d
We may need to use real-time processes for the main daemon and the recovery daemon to handle the cases where systems come under very high loads.
(This used to be ctdb commit 08bef9dcab6e4da15fc783f8624e5ed09aa060b5)
Add a new command "ctdb stats [num]" that prints the [num] most recent statistics intervals collected.
(This used to be ctdb commit e6e16fcd5a45ebd3739a8160c8fb5f44494edb9e)
This function returns a pointer to a nodemap structure.
The returned structure must later be freed by calling ctdb_free_nodemap().
Move the definition of ctdb_sock_addr from ctdb_client.h to ctdb_protocol.h
Move the definition of the node flags, ctdb_node_and_flags and ctdb_node_map from ctdb_private.h to ctdb_protocol.h
Add both sync and async example for ctdb_getnodemap to the test application libctdb/tst.c
(This used to be ctdb commit 31c10eb2b337fd7d8a97a1f9e69b0e7570fec71d)
Add a new "ctdb deltickle" command to delete tickles from the database.
This can ONLY be used for tickles created by "ctdb addtickle".
Push any "addtickle/deltickle" updates to other nodes every TickleUpdateInterval seconds'
(This used to be ctdb commit acded034e2f0dcae4c2c9e54e16a001caf23caec)
There are some reports of freeze timeouts, and it looks like vacuuming might
be the culprit. So we add code to tell them to abort when a freeze is
going on.
(This is based on the 1.0.112 branch version 517f05e42f, but far
simpler since tdb is now robust against processes being killed during
transaction commit)
CQ:S1018154 & S1018349
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit f5d7dc679501e607c2c83a248a89d3cada9df146)
In Samba this is now called "tevent", and while we use the backwards
compatibility wrappers they don't offer EVENT_FD_AUTOCLOSE: that is now
a separate tevent_fd_set_auto_close() function.
This is based on Samba version 7f29f817fa.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 85e5e760cc91eb3157d3a88996ce474491646726)
We've been seeing "Invalid packet of length 0" errors, but we don't know
what is sending them. Add a name for each queue, and print nread.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit e6cf0e8f14f4263fbd8b995418909199924827e9)
The extra recovery interval wait was introduced in 821333afb458 but no
explanation was provided in that message. Nonetheless, if starting
the entire cluster for the first time, it should be safe to skip this.
We use the commandline arg --sloppy-start which should discourage
people from using it outside testing.
Seconds between ctdbd first log message and node healthy:
BEFORE: 16.10
AFTER: 4.03
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 509e2e89ae233a0e91998d95267bf62f296a73cd)
Ronnie and I tracked down a bug which seems to be caused by a node
running so slowly that we timed out the request and reused the request
id before it responded.
The result was that we unlocked the wrong record, leading to the
following:
ctdbd: tdb_unlock: count is 0
ctdbd: tdb_chainunlock failed
smbd[1630912]: [2010/06/08 15:32:28.251716, 0] lib/util_sock.c:1491(get_peer_addr_internal)
ctdbd: Could not find idr:43
ctdbd: server/ctdb_call.c:492 reqid 43 not found
This exact problem is now detected, but in general we want to delay
id reuse as long as possible to make our system more robust.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 9eb9c53ef29f4871ae2fe62fc5cb6145fca89eed)
and print the time startistics was taken and for how long the statistics have been collected to the "ctdb statistics" output.
(This used to be ctdb commit 1bdfe0cd3370a335b960ce1ef97eade93b0cd2fa)
ctdb_client.h is the existing internal client interface (which was mainly
in ctdb.h), and ctdb_protocol.h is the information needed for the wire
protocol only.
ctdb.h will be the new, shiny, libctdb API.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 4bba6b8cd47b352f98d41f9f06258d5ac3c9adef)
This resolves a problem with huge numbers of requests which could overflow
16 bits. Fortunately, the IDR should scale reasonably well, so we can simply
hold all the requests.
Although noone checks for failure, I added a constant for that.
BZ: 60540
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 72efc4122e37798227c3420a65ed1f706ca9ebe7)
verify that all nodes agree on the most recent ip address assignments
broke "ctdb moveip ..." since that call would never trigger
a full takeover run and thus would immediately trigger an inconsistency.
Add a new message to the recovery daemon where we can tell the recovery daemon to update its assignments.
BZ62782
(This used to be ctdb commit e7069082e5f0380dcddee247db8754218ce18cab)
In the case of a timeout, we dump a log of what's happening to a file
in /tmp. We do it from the signal handler, which is an unreliable hack
(BZ58365).
Instead, create another (lower-priority) child to do the dump, then
kill the timedout script.
Note that this doesn't quite work as intended (the dump is often run
after the script has been killed), so the next patch resolves this.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 7ee5ecc8d53e78e2dec21197b74a74cc4ae1834c)
addresses and verify that the remote nodes have/keep a consistent view of
assigned addresses.
If a remote node has an inconsistent view of addresses visavi the recovery
master this will trigger a full ip reallocation.
(This used to be ctdb commit f3bf2ab61f8dbbc806ec23a68a87aaedd458e712)
We know ask for the known and available interfaces.
This means a node gets a RELEASE_IP event for all interfaces
it "knows", but doesn't serve and a node only gets a TAKE_IP event
for "available" interfaces.
metze
(This used to be ctdb commit a695a38e49e7c3e15a9706392dc920eeab1f11ba)
This is needed because the "startup" event runs after the initial recovery,
but we need to do some actions before the initial recovery.
metze
(This used to be ctdb commit e953808449c102258abb6cba6f4abf486dda3b82)
configureable using --log-ringbuf-size=<num-entries>.
Add an entry in the sysconfig file to set this persistently.
(This used to be ctdb commit c79c2da69bc352f509e7fca4b9172a4b7f23c0f8)
We don't want ctdb stalling due to paging; this can be far worse than
scheduling delays. But if we simply do mlockall(MCL_FUTURE), it
increases the risk that mmap (ie. tdb open) or malloc will fail,
causing us to abort.
This patch is a compromise: we mlock all current pages (including
10k of future stack for expansion) and then relock when a client
asks us to open a TDB. We warn, but don't exit, if it fails.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 82f778e85440bc713d3f87c08ddc955d3cfce926)
1) It's buggy. Code needs to be carefully written (ie. no busy
loops) to handle running with it, and we fork and run scripts.[1]
2) It makes debugging harder. If ctdbd loops (as has happened recently)
it can be extremely hard to get in and see what's happening. We've already
seen the valgrind hacks.
3) We have seen recent scheduler problems. Perhaps they are unrelated,
but removing this very unusual setup is unlikely to hurt.
4) It doesn't make anything faster. Under all but the most perverse of
circumstances, 99% of the cpu gives the same performance as 100%, and
we will always preempt normal processes anyway.
[1] I made this worse in 0fafdcb8d353 "eventscript: fork() a child for
each script" by removing the switch_from_server_to_client() which
restored it, but even that was only for monitor scripts. Others were
run with RT priority.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 482c302d46e2162d0cf552f8456bc49573ae729d)
The do_setsched was being tested for whether to mmap tdbs: let's make it
explicit. We can also happily move the kill-child eventscript hack under
this flag.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 2ee86cc1f311d7b7504c7b14d142b9c4f6f4b469)
Depending on --max-persistent-check-errors we allow ctdb
to start with unhealthy persistent databases.
The default is 0 which means to reject a startup with
unhealthy dbs.
The health of the persistent databases is checked after each
recovery. Node monitoring and the "startup" is deferred
until all persistent databases are healthy.
Databases can become healthy automaticly by a completely
HEALTHY node joining the cluster. Or by an administrator
with "ctdb backupdb/restoredb" or "ctdb wipedb".
metze
(This used to be ctdb commit 15f133d5150ed1badb4fef7d644f10cd08a25cb5)
This reverts commit 401f421fa003d9515df15e759b50b56e0c67d69c.
Conflicts:
include/ctdb_private.h
server/ctdb_tunables.c
(This used to be ctdb commit b883d19a495a41a22db37f9c2cf6250fee529de0)
since we no longer ban nodes when dodgy scripts continue to hang.
We now only mark nodes as unhealthy if monitor events fail or timeout. Never ban.
(This used to be ctdb commit 5c8e56fc7a518e115bceac257867739283cf6a1e)
there is no rational need for a setting where we permanently mark nodes as disabled everytime an eventscript fails
(This used to be ctdb commit 68a8ee99b128a5ec883600735626bdb3bbc9c503)
This patch improves the handling of the fetch_lock operation on non-persistent
databases that ctdb clients have to do very frequently.
The normal flow how this goes is the following:
1. Client does a local fetch_lock on the database
2. Client looks if the local node is dmaster.
If yes, everything is fine
If no, continue here
3. Client unlocks the local record
4. Client issues a "get me the record" call to ctdbd
5. ctdbd goes out and fetches the dmaster role
6. ctdbd tells the client to retry
7. Client starts over again
The problem is between step 6 and 7: Before the client has had the chance to
retry (i.e. catch the record with a fetch_locked), another node might have come
asking ctdbd to migrate away the record again. This is a real problem, I've
seen >20 loops of this kind in real workloads.
This patch does the following: Whenever ctdb receives a record as result of
step 5, it puts the key on a "holdback list". As long as a key is on this list,
a request to migrate away the dmaster is put on hold. It is the client's duty
to issue the "CTDB_CONTROL_GOTIT" control when it has successfully done step 2
after having asked ctdb to fetch the record. This will release the key from the
"holdback list" and re-issue all dmaster migration requests.
As a safeguard against malicious clients, once a second (default 1000msecs,
tunable "HoldbackCleanupInterval" in milliseconds) ctdbd goes over the list of
held back keys, deletes them and releases all held back migration requests.
(This used to be ctdb commit 5736e17c139c9a8049e235429aeae0c6c9d0e93d)
This is a simplified version of the trans2 commit control:
It just rolls out the marshall buffer to all active nodes.
It is the main ctdbd part of the re-implementation of the
persistent transactions. The client code is changed to
take a global lock to start a transactions and store into
the marshal buffer instead of writing to the local tdb
under a local transaction.
The old transaction implementation is going to be
removed in a later commit.
Michael
(This used to be ctdb commit f66428f9d2013080a414404c1ba6117888352fd6)
Date: Wed, 9 Dec 2009 22:45:12 +0100
Subject: [PATCH] Revert an accidential commit
(This used to be ctdb commit af6656f2844d8fd72204a70358c9d589dbe1bd34)
This might be a bit less efficient, but experience in winbind has shown that
event callbacks can trigger changes in the socket state in very hard to
diagnose ways.
(This used to be ctdb commit a78b8ea7168e5fdb2d62379ad3112008b2748576)
We also no longer return an error before scripts have been run; a special
zero-length data means we have never run the scripts.
"ctdb scriptstatus all" returns all event script results.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 9b90d671581e390e2892d3a68f3ca98d58bef4df)
We're going to need this so ctdb can query non-monitor status.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 53bc5ca23ca55a3ac63a440051f16716944a2a51)
Rather than only tranferring to last_status for monitor events, do
it for every event (ctdb->last_status is now an array).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit c73ea56275d4be76f7ed983d7565b20237dbdce3)
We're going to allow fetching status of all script runs, so this
name is no longer appropriate.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit f5cb41ecf3fa986b8af243e8546eb3b985cd902a)
A new helper functions which sets up an event attached to the child's
stdout/stderr which gets routed to the logging callback after being
placed in the normal logs.
This is a generalization of the previous code which was hardcoded to
call ctdb_log_event_script_output.
The only subtlety is that we hang the child fds off the output buffer;
the destructor for that will flush, which means it has to be destroyed
before the output buffer is.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 32cfdc3aec34272612f43a3588e4cabed9c85b68)
The child no longer uses ctdb_ctrl_event_script_init or
ctdb_ctrl_event_script_finished, and the others are redundant: it
doesn't need to tell us it's starting a script when it only runs one.
We move start and stop calls to the parent, and eliminate the RPC
infrastructure altogether.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 391926a87a7af73840f10bb314c0a2f951a0854c)
We put a "scripts" member in ctdb_event_script_state, rather than using
a special struct for monitor events. This will fit better as we further
unify the different events, and holds the reports from the child process
running each monitor script.
Rather than making the monitor state a child of current_monitor_status_ctx,
we just point current_monitor directly at it. This means we need to reset
that pointer in the destructor for ctdb_event_script_state.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 9a2b4f6b17e54685f878d75bad27aa5090b4571f)
We have monitor_event_script_ctx and other_event_script_ctx, and
current_monitor_status_ctx in struct ctdb_context. This seems more
complex than it needs to be.
We use a single "event_script_ctx" as parent for all event script
state structures. Then we explicitly reparent monitor events under
current_monitor_status_ctx: this is freed every script invocation to
kill off any running scripts anyway.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 0d925e6f2767691fa561f15bbb857a2aec531143)
eventscript.c uses this now, but our next patch makes others use it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit a305cb7743c24386e464f6b2efab7e2108bb1e7e)
This unifies code paths and simplifies things: we just hand -ENOEXEC to
ctdb_ctrl_event_script_stop().
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit eadf5e44ef97d7703a7d3bce0e7ea0f21cb11f14)
This simplifies the code a little: last_status is now read to go
(it's only used by the scriptstatus command at the moment).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 6be931266a4e41fd0253f760936ad9707dd97c47)
This controls is only used by samba when samba wants to check if a subrecord held by a <node-id>:<smbd-pid> is still valid or if it can be reclaimed.
If the node is banned or stopped, we kill the smbd process and return that the process does not exist to the caller. This allows us to recover subrecords from stopped/banned nodes where smbd is hung waiting for the databases to thaw.
bz58185
(This used to be ctdb commit 157807af72ed4f7314afbc9c19756f9787b92c15)
Add the mapping to the list everytime we accept() a new client connection
and set it up to remove in the destructor when the client structure is freed.
(This used to be ctdb commit f75d379377f5d4abbff2576ddc5d58d91dc53bf4)
Now we're doing checking, we might as well make sure the commands from
"ctdb eventscripts" are valid.
This gets rid of the "UNKNOWN" event type.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 1d24a3869fe89fc9a109fd9e9b69df5fc665a5f6)
Rather than doing strcmp everywhere, pass an explicit enum around. This
also subtly documents what options are available. The "options" arg
is now used for extra arguments only.
Unfortunately, gcc complains on empty format strings, so we make
ctdb_event_script() take no varargs, and add ctdb_event_script_args(). We
leave ctdb_event_script_callback() taking varargs, which means callers
have to do "%s", "".
For the moment, we have CTDB_EVENT_UNKNOWN for handling forced scripts
from the ctdb tool.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 8001488be4f2beb25e943fe01b2afc2e8779930d)
Everyone uses the same timeout value, so just remove it from the API.
If we ever need variable timeouts, that might as well be central too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(This used to be ctdb commit 533c3e053293941d2a9484b495e78d45f478bb08)
command.
Use the existing context used for non-monitor events
Multiple concurrent uses of "ctdb eventscript ..." could otherwise lead to a SEGV
(This used to be ctdb commit 80a8d728e9680040e00d24361dfc9367dd372a56)
This allows running the actual monitoring asynchronously from ctdbd
and only using "status" to pick up the actual results.
(This used to be ctdb commit 1908bac812650ca25151051f5d86815e0b8ed319)
use a udp socket on the ctdbd port to send messages to teh syslog child process for loggign.
we need this when syslog becomes "slow", like very slow, and on boxes where syslog is limited to 100 lines per second and starts to block after that
(This used to be ctdb commit 1446f4c247310e2ff2d522055bd8927d1a78d017)
Otherwise a node can lock itself out, e.g. when a commit control times out...
Michael
(This used to be ctdb commit cb432e30351d5e5a41e98da3c7b1c2a4d400a3a2)
This aske the daemon wheter a transaction is currently active on a
given DB on that node. More precisely this asks for the transaction_active
flag in the ctdb_db_context that is set in the CTDB_TRANS2_COMMIT
control and cleared in the CTDB_TRANS2_ERROR or CTDB_TRANS2_FINISHED controls.
This will be useful for fixing race conditions in the transaction code.
Michael
(This used to be ctdb commit 8d430ae6968dfe566614379436fc3c56003fcd88)