mirror of https://github.com/samba-team/samba.git synced 2024-12-23 17:34:34 +03:00
810 Commits

Author SHA1 Message Date
Amitay Isaacs
c6914e3891 banning: Make ctdb_local_node_got_banned() a void function
When this function is called, we are already committed to banning
and there is no point in failing this function.  If freezing of
databases fails, the recovery daemon will fix it.

(This used to be ctdb commit bb178338658b4ae32382a1f62f7c21cee1d4878f)
2013-07-02 12:59:08 +10:00
Amitay Isaacs
622ccd09f9 freeze: Make ctdb_start_freeze() a void function
If this function fails due to memory errors, there is no way to recover.
The best course of action is to abort.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 46efe7a886f8c4c56f19536adc98a73c22db906a)
2013-07-02 12:59:08 +10:00
Martin Schwenke
6a52a87028 ctdbd: Refactor shutdown sequence
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit b32fd04bfbf33062d45365b37a7247e272a76ceb)
2013-06-22 15:51:02 +10:00
Martin Schwenke
6d9667f01c ctdbd: Add new runstate CTDB_RUNSTATE_FIRST_RECOVERY
This adds more serialisation to the startup, ensuring that the
"startup" event runs after everything to do with the first recovery
(including the "recovered" event).

Given that it now takes longer to get to the "startup" state, the
initscript needs to wait until ctdbd gets to "first_recovery".

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit ed6814ff0a59ddbb1c1b3128b505380f60d7aeb7)
2013-05-24 14:08:07 +10:00
Martin Schwenke
77671b9ef5 ctdbd: New control CTDB_CONTROL_GET_RUNSTATE
Also new client function ctdb_ctrl_get_runstate().

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit dc4220e6f618cc688b3ca8e52bcb3eec6cb55bb1)
2013-05-24 14:08:07 +10:00
Martin Schwenke
63577c96db ctdbd: Replace ctdb->done_startup with ctdb->runstate
This allows states, including startup and shutdown states, to be
clearly tracked.  This doesn't include regular runtime "states", which
are handled by node flags.

Introduce new functions ctdb_set_runstate(), runstate_to_string() and
runstate_from_string().

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 8076773a9924dcf8aff16f7d96b2b9ac383ecc28)
2013-05-24 14:08:06 +10:00
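A minimal sketch of the runstate idea from 63577c96db, not the actual ctdb code. The function names runstate_to_string() and runstate_from_string() come from the commit message; the enum members other than FIRST_RECOVERY, STARTUP and SHUTDOWN, the string spellings, the signatures, and the forward-only rule in set_runstate() are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative run states.  Only some of these are named in the log;
 * the full set and its ordering are assumptions of this sketch. */
enum runstate {
	RUNSTATE_UNKNOWN,
	RUNSTATE_INIT,
	RUNSTATE_FIRST_RECOVERY,
	RUNSTATE_STARTUP,
	RUNSTATE_RUNNING,
	RUNSTATE_SHUTDOWN
};

static const struct {
	enum runstate state;
	const char *name;
} runstate_map[] = {
	{ RUNSTATE_UNKNOWN,        "UNKNOWN" },
	{ RUNSTATE_INIT,           "INIT" },
	{ RUNSTATE_FIRST_RECOVERY, "FIRST_RECOVERY" },
	{ RUNSTATE_STARTUP,        "STARTUP" },
	{ RUNSTATE_RUNNING,        "RUNNING" },
	{ RUNSTATE_SHUTDOWN,       "SHUTDOWN" }
};

#define RUNSTATE_MAP_LEN (sizeof(runstate_map) / sizeof(runstate_map[0]))

/* Map a state to its name; unrecognised values map to "UNKNOWN". */
const char *runstate_to_string(enum runstate state)
{
	size_t i;

	for (i = 0; i < RUNSTATE_MAP_LEN; i++) {
		if (runstate_map[i].state == state) {
			return runstate_map[i].name;
		}
	}
	return runstate_map[0].name;
}

/* Map a name back to a state; unrecognised names map to UNKNOWN. */
enum runstate runstate_from_string(const char *name)
{
	size_t i;

	for (i = 0; i < RUNSTATE_MAP_LEN; i++) {
		if (strcmp(runstate_map[i].name, name) == 0) {
			return runstate_map[i].state;
		}
	}
	return RUNSTATE_UNKNOWN;
}

/* Assumption: states only ever move forward.  Returns 0 on success,
 * -1 if the requested state is not later than the current one. */
int set_runstate(enum runstate *current, enum runstate next)
{
	if (next <= *current) {
		return -1;
	}
	*current = next;
	return 0;
}
```

A single ordered enum replaces a boolean such as done_startup, so code can ask "have we reached STARTUP yet?" with one comparison.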
Amitay Isaacs
1ddc7b0d10 locking: Remove functions that are not used anymore
These functions were used in locking child process to do the locking.  With
locking helper, these are not required.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit c660f33c3eaa1b4a2c4e951c1982979e57374ed4)
2013-05-24 09:06:40 +10:00
Martin Schwenke
54e91df60d recoverd: Move IP flags into ctdb_takeover.c
These should never be seen outside the IP allocation code.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e143abd16ccde2e0edfe103673d31a5fb06b6aef)
2013-05-09 12:55:42 +10:00
Martin Schwenke
0445c988e2 recoverd: Fix tunable NoIPTakeoverOnDisabled, rename to NoIPHostOnAllDisabled
This really needs to be per-node.  The rename is because nodes with
this tunable switched on should drop IPs if they become unhealthy (or
disabled in some other way).

* Add new flag NODE_FLAGS_NOIPHOST, only used in recovery daemon.

* Enhance set_ipflags_internal() and set_ipflags() to setup
  NODE_FLAGS_NOIPHOST depending on setting of NoIPHostOnAllDisabled
  and/or whether nodes are disabled/inactive.

* Replace can_node_servce_ip() with functions can_node_host_ip() and
  can_node_takeover_ip().  These functions are the only ones that need
  to look at NODE_FLAGS_NOIPTAKEOVER and NODE_FLAGS_NOIPHOST.  They
  can make the decision without looking at any other flags due to
  previous setup.

* Remove explicit flag checking in IP allocation functions (including
  unassign_unsuitable_ips()) and just call can_node_host_ip() and
  can_node_takeover_ip() as appropriate.

* Update test code to handle CTDB_SET_NoIPHostOnAllDisabled.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 1308a51f73f2e29ba4dbebb6111d9309a89732cc)
2013-05-07 16:20:46 +10:00
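One possible reading of 0445c988e2, sketched under stated assumptions rather than taken from the ctdb sources. The flag and function names appear in the commit message; the bit values, the struct node layout, the signatures, and the exact rule inside set_ipflags() are hypothetical simplifications.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag names come from the commit message; the bit values are assumptions. */
#define NODE_FLAGS_NOIPTAKEOVER 0x01u
#define NODE_FLAGS_NOIPHOST     0x02u

/* Hypothetical per-node view used only by this sketch. */
struct node {
	bool inactive;                    /* e.g. banned or stopped */
	bool disabled;                    /* e.g. unhealthy */
	bool no_ip_host_on_all_disabled;  /* per-node tunable setting */
	uint32_t ipflags;
};

/* Decide once, up front, which nodes may host IPs.  A disabled node may
 * host only when no healthy node exists and its tunable is switched off. */
void set_ipflags(struct node *nodes, size_t n)
{
	bool all_disabled = true;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!nodes[i].inactive && !nodes[i].disabled) {
			all_disabled = false;
		}
	}
	for (i = 0; i < n; i++) {
		nodes[i].ipflags &= ~NODE_FLAGS_NOIPHOST;
		if (nodes[i].inactive) {
			nodes[i].ipflags |= NODE_FLAGS_NOIPHOST;
		} else if (nodes[i].disabled &&
			   (!all_disabled ||
			    nodes[i].no_ip_host_on_all_disabled)) {
			nodes[i].ipflags |= NODE_FLAGS_NOIPHOST;
		}
	}
}

/* After set_ipflags(), these two checks need no other node state. */
bool can_node_host_ip(const struct node *node)
{
	return (node->ipflags & NODE_FLAGS_NOIPHOST) == 0;
}

bool can_node_takeover_ip(const struct node *node)
{
	if (node->ipflags & NODE_FLAGS_NOIPTAKEOVER) {
		return false;
	}
	return can_node_host_ip(node);
}
```

Doing the health and tunable checks once in the setup step is what lets the allocation code call the two predicates without looking at any other flags.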
Martin Schwenke
fa16cccf02 ctdbd: Remove the "stopped" event
It isn't used, superseded by "ipreallocated".

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c2bb8596a8af6406ef50e53953884df9d6246a96)
2013-05-06 13:38:21 +10:00
Martin Schwenke
2e59cd5428 ctdbd: New control CTDB_CONTROL_IPREALLOCATED
This is an alternative to using ctdb_run_eventscripts() that can be
used when in recovery.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 27a44685f0d7a88804b61a1542bb42adc8f88cb1)
2013-05-06 13:38:21 +10:00
Michael Adam
1aa09dd5c3 include: define CTDB_REC_RO_FLAGS - all read-only related record flags
This is used for some checks

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-By: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit c7924ce6404bb18641b00d5fbd2fe9da9aaf7959)
2013-04-24 18:48:31 +10:00
Michael Adam
527976d02a vacuum: introduce the RECEIVE_RECORDS control
This is in preparation for turning the vacuuming on the lmaster
into a two-phase process:

- First the node sends the list of records to be vacuumed
  to all other nodes with this new RECEIVE_RECORDS control.
  The remote nodes should store the lmaster's empty current copy.
- Only those records that could be stored on all other nodes
  are processed further. They are sent to all other nodes with
  the TRY_DELETE_RECORDS control as before for deletion.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-By: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit e397702e271af38204fd99733bbeba7c1db3a999)
2013-04-24 18:47:32 +10:00
Martin Schwenke
7ba42d2c89 util: Removed unused declaration of ctdbd_start()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 1e989894764e4cd1d551c44784d91cb295cd790d)
2013-04-18 13:22:12 +10:00
Martin Schwenke
7ccde44d30 include: Move ctdb_start_daemon() from ctdb_client.h to ctdb_private.h
It really is internal.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit abb64f62efaa70df4b87c030b96300eafd98e6a3)
2013-04-18 13:22:12 +10:00
Martin Schwenke
dcf1ac34ab ctdbd: Add --pidfile option
Default is not to create a pid file.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 996e74d3db0c50f91b320af8ab7c43ea6b1136af)
2013-04-18 13:21:59 +10:00
Martin Schwenke
4ede763f3b util: New functions ctdb_set_child_info() and ctdb_is_child_process()
Must be called by all child processes.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 59b019a97aad9a731f9080ea5be14d0dbdfe03d6)
2013-04-18 13:18:29 +10:00
Michael Adam
b1a6289b44 ctdbd: unimplement the unused SET_DMASTER control
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 2e92deef5221ee651028ef87138b3113f1fece91)
2013-04-17 12:44:08 +02:00
Amitay Isaacs
9e0f8fa09c traverse: Add CTDB_CONTROL_TRAVERSE_ALL_EXT to support withemptyrecords
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit e691df43d20871468142c8fb83f7c7303c4ec307)
2013-04-17 12:30:59 +02:00
Amitay Isaacs
dd050cd4ba util: Add hex_decode_talloc() to decode hex string into a binary blob
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 307416afda707b687f5e89e8438e45c154a4c806)
2013-03-25 17:45:23 +11:00
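For dd050cd4ba, a rough sketch of decoding a hex string into a binary blob. The real hex_decode_talloc() allocates with talloc and its signature is not shown in this log; the hex_decode() function below is a hypothetical stand-in that uses malloc() so the example stays self-contained.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
static int hex_nibble(char c)
{
	if (c >= '0' && c <= '9') return c - '0';
	if (c >= 'a' && c <= 'f') return c - 'a' + 10;
	if (c >= 'A' && c <= 'F') return c - 'A' + 10;
	return -1;
}

/* Decode a NUL-terminated hex string.  Returns a malloc'ed buffer and
 * stores its length in *out_len, or returns NULL if the string has odd
 * length or contains a non-hex character.  The caller frees the buffer. */
uint8_t *hex_decode(const char *hex, size_t *out_len)
{
	size_t len = strlen(hex);
	size_t i;
	uint8_t *buf;

	if (len % 2 != 0) {
		return NULL;
	}
	/* +1 so an empty input still yields a valid pointer */
	buf = malloc(len / 2 + 1);
	if (buf == NULL) {
		return NULL;
	}
	for (i = 0; i < len / 2; i++) {
		int hi = hex_nibble(hex[2 * i]);
		int lo = hex_nibble(hex[2 * i + 1]);

		if (hi < 0 || lo < 0) {
			free(buf);
			return NULL;
		}
		buf[i] = (uint8_t)((hi << 4) | lo);
	}
	*out_len = len / 2;
	return buf;
}
```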
Amitay Isaacs
5d7efb4cf1 ctdbd: Add an index db for message list for faster searches
When CTDB is busy with lots of smbd, CTDB was spending too much time in
daemon_check_srvids() which searches a list of srvids in the registered
message handlers.  Using a hash based index significantly improves the
performance of search in a linked list.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 3e09f25d419635f6dd679b48fa65370f7860be7d)
2013-03-06 15:32:33 +11:00
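The approach in 5d7efb4cf1 can be illustrated with a small sketch: keep the linked list of handlers, and add a hash index next to it so that an srvid lookup walks one short bucket chain instead of the whole list. All names, the bucket count and the hash function here are assumptions; the commit says only that a hash-based index was added.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define INDEX_BUCKETS 256

/* Hypothetical message handler entry. */
struct handler {
	uint64_t srvid;
	struct handler *next;       /* link in the full handler list */
	struct handler *hash_next;  /* link within one index bucket */
};

struct handler_list {
	struct handler *head;
	struct handler *buckets[INDEX_BUCKETS];
};

/* Multiplicative hash; the top 8 bits select one of 256 buckets. */
static size_t bucket_of(uint64_t srvid)
{
	return (size_t)((srvid * 0x9E3779B97F4A7C15ULL) >> 56);
}

/* Add a handler to both the list and the index.  Returns 0 or -1. */
int register_handler(struct handler_list *l, uint64_t srvid)
{
	struct handler *h = calloc(1, sizeof(*h));
	size_t b = bucket_of(srvid);

	if (h == NULL) {
		return -1;
	}
	h->srvid = srvid;
	h->next = l->head;
	l->head = h;
	h->hash_next = l->buckets[b];
	l->buckets[b] = h;
	return 0;
}

/* Look up an srvid through the index rather than the full list. */
bool srvid_registered(const struct handler_list *l, uint64_t srvid)
{
	const struct handler *h;

	for (h = l->buckets[bucket_of(srvid)]; h != NULL; h = h->hash_next) {
		if (h->srvid == srvid) {
			return true;
		}
	}
	return false;
}
```

With many registered handlers, each check touches only the entries that hash to the same bucket, which is where the speed-up over a plain list scan comes from.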
Martin Schwenke
dab2f6817d client: New generic node listing function list_of_nodes()
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit a73bb56991b8c07ed0e9517ffcf0dc264be30487)
2013-02-20 14:44:38 +11:00
Martin Schwenke
689384a7b4 Logging: Fix breakage when freeing the log ringbuffer
Commit a82d3ec12f0fda16d6bfa8442a07595de897c10e broke fetching from
the log ringbuffer.  The solution there is still generally good: there
is no need to keep the ringbuffer in children created by
ctdb_fork()... except for those special children that are created to
fetch data from the ringbuffer!

Introduce a new function ctdb_fork_no_free_ringbuffer() that does
everything ctdb_fork() needs to do except free the ringbuffer (i.e. it
is the old ctdb_fork() function).  The new ctdb_fork() function just
calls that function and then frees the ringbuffer in the child.

This means all callers of ctdb_fork() have the convenience of having
the ringbuffer freed.  There are 3 special cases:

* Forking the recovery daemon.  We want to be able to fetch from the
  ringbuffer there.

* The ringbuffer fetching code.  Change the 2 calls in this code (main
  daemon, recovery daemon) to call ctdb_fork_no_free_ringbuffer()
  instead.

While we're here, clear the log ringbuffer when the recovery daemon is
forked, since it will contain a copy of the messages from the main
daemon.

Note to self: always test... even the most obvious patches...  ;-)

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 00db5fa00474f8a83f1aa3b603fd756cc9b49ff4)
2013-02-07 11:26:29 +11:00
Martin Schwenke
bc5f0a2b65 ctdbd: Remove command-line option --debug-hung-script
Use an environment variable instead.  This just means that the
initscript exports CTDB_DEBUG_HUNG_SCRIPT and the code checks for the
environment variable.

The justification for this simplification is that more debug options
will be arriving soon and we want to handle them consistently without
needing to add a command-line option for each.  So, the convention
will be to use an environment variable for each debug option.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0581f9a84e58764d194f4e04064c2c5b393c348b)
2013-02-05 16:05:13 +11:00
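The convention described in bc5f0a2b65 (one environment variable per debug option) might look like the sketch below. The variable name CTDB_DEBUG_HUNG_SCRIPT is from the commit message; the helper name debug_hung_script() and the choice to treat an empty value as "unset" are assumptions.

```c
#include <stddef.h>
#include <stdlib.h>

/* Return the value of CTDB_DEBUG_HUNG_SCRIPT, or NULL when the variable
 * is unset.  Treating an empty value as unset is an assumption of this
 * sketch, not something the commit message states. */
const char *debug_hung_script(void)
{
	const char *value = getenv("CTDB_DEBUG_HUNG_SCRIPT");

	if (value == NULL || value[0] == '\0') {
		return NULL;
	}
	return value;
}
```

Adding another debug option then needs only another getenv() lookup, with no new command-line parsing.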
Martin Schwenke
f2428cadd8 ctdbd: Remove debug_hung_script_ctx
The only allocation against this context is by
ctdb_fork_with_logging().  This memory is freed by ctdb_log_handler()
anyway.  There should be no memory leak.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 501461cc3e132d4adee9e91b5d4513a26bae2846)
2013-02-05 16:05:13 +11:00
Martin Schwenke
f2ba0e8a65 Logging: New function ctdb_log_ringbuffer_free()
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a4f622e85168f59417c11705f1734e0352e1d44a)
2013-02-05 12:40:30 +11:00
Amitay Isaacs
4a6fa39ff9 daemon: Protect against double free of callback state while shutting down
When CTDB is shut down and monitoring has been stopped, monitor_context
gets freed and all the callback states hanging off it.  This includes
callback state for current_monitor, if the current monitor event has
not yet finished.  As a result, when the shutdown event is called,
current_monitor->callback state is not NULL, but it's actually freed
and it's a dangling reference.

So before executing callback function and freeing callback state check
if ctdb->monitor->monitor_context is not NULL.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 7d8546ee4353851f0543d0ca2c4c67cb0cc75aea)
2013-01-09 14:39:23 +11:00
Amitay Isaacs
30299c387f daemon: On shutdown, destroy timed events that check if recoverd is active
When CTDB is shutting down, recovery daemon is stopped, but the
event that checks if recovery daemon is still alive is not destroyed.
So the recovery daemon is restarted during shutdown if the CTDB daemon
takes longer to shut down.

There are two processes that check if recovery daemon is working.

1. ctdb_check_recd() - which checks every 30 seconds if the recovery
   daemon process exists.

2. ctdb_recd_ping_timeout() - which is triggered when recovery daemon
   fails to ping CTDB daemon.

Both the events are periodic and need to be destroyed when shutting down.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 746168df2e691058e601016110fae818c6a265c3)
2013-01-09 13:20:26 +11:00
Martin Schwenke
80a2bb84e7 ctdbd: Remove debug option --node-ip, use --listen instead
This effectively reverts d96cb02c2c24f9eabbc53d3d38e90dea49cff3e0

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 496387a585b2c5778c808cf02b8e1435abde4c3e)
2013-01-07 10:35:39 +11:00
Amitay Isaacs
a73f13ada7 daemon: Add a tunable to enable automatic database priority setting
Samba versions 3.6.x and older do not set the database priority.
This can cause deadlock between Samba and CTDB since the locking order
of database will be different. A hack was added for automatic promotion
of priority for specific databases to avoid deadlock.  This code should
not be invoked with Samba version 4.x which correctly specifies the
priority for each database.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 4a9e96ad3d8fc46da1cd44cd82309c1b54301eb7)
2013-01-05 01:14:57 +01:00
Amitay Isaacs
13518b9e33 daemon: Check if log_latency_ms is set before using it
This fixes a bug where the wrong variable is checked.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit f81e9add466b1d9b2796c09c6ba63b77296ea149)
2012-11-30 12:21:30 +11:00
Amitay Isaacs
442d9905fe locking: Do not use RECLOCK for tracking DB locks and latencies
RECLOCK is for recovery lock in CTDB. Do not override the meaning for
tracking locks on databases.  Database lock latency has nothing to do
with recovery lock latency.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 54e24a151d2163954e5a2a1c0f41a2b5c19ae44b)
2012-11-14 15:51:59 +11:00
Amitay Isaacs
85c8deca3f recoverd: Track the nodes that fail takeover run and set culprit count
If any of the nodes fail takeover run (either due to timeout or failure
to complete within takeover_timeout interval) from main loop, recovery
master will give up trying takeover run with following message:

  "Unable to setup public takeover addresses. Try again later"

And as a side-effect the monitoring is disabled on all the nodes. Before
ctdb_takeover_run() is called from main loop, monitoring gets disabled via
startrecovery event. Since ctdb_takeover_run() fails, it never runs
recovered event and monitoring does not get re-enabled.

In main_loop, ctdb_takeover_run() is called with a takeover_fail_callback.
This callback will get called if any of the nodes fail in handling
takeip/releaseip/ipreallocated events in ctdb_takeover_run().

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit a5c6bb1fffb8dc3960af113957a1fd080cc7c245)
2012-11-14 10:59:54 +11:00
Martin Schwenke
db5dfe891c recoverd: Add CTDB_SRVID_GETLOG and CTDB_SRVID_CLEARLOG
These support getting and clearing logs from the ring-buffer in the
recovery daemon.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit cbca233d1e03b2410e0bb63b936328d4a8b3c7b4)
2012-10-22 11:15:36 +11:00
Amitay Isaacs
bc126ccdd4 build: Set CTDB_PATH to /tmp/ctdb.socket if SOCKPATH is not defined
When building samba with CTDB, if samba configure/waf does not support
setting of SOCKPATH, fallback to /tmp/ctdb.socket.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit a9511cf5ecd5bc39b0070f0afa8ac4d4926c6cab)
2012-10-22 09:01:27 +11:00
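The fallback described in bc126ccdd4 amounts to a preprocessor default. A sketch follows; SOCKPATH, CTDB_PATH and /tmp/ctdb.socket are from the commit message, while the accessor default_socket_path() is a hypothetical helper added so the value can be inspected.

```c
/* Use the path chosen at configure time when the build defines SOCKPATH,
 * otherwise fall back to the historical default. */
#ifdef SOCKPATH
#define CTDB_PATH SOCKPATH
#else
#define CTDB_PATH "/tmp/ctdb.socket"
#endif

/* Hypothetical accessor for the compiled-in default socket path. */
const char *default_socket_path(void)
{
	return CTDB_PATH;
}
```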
David Disseldorp
8cbf1a00c4 Build: Set the default ctdb socket path at configure time
The ctdb socket path currently defaults to /tmp/ctdb.socket and can be
modified at runtime using the --socket=filename option, common to both
ctdb and ctdbd binaries.

This change allows the default path to be set at configure time using
the --with-socketpath=FILE argument. When not specified, the default
path remains /tmp/ctdb.socket, documentation remains unchanged as a
result.

Signed-off-by: David Disseldorp <ddiss@samba.org>

(This used to be ctdb commit f92b9c83a2f39fba9a141417a88de96fc8c592ff)
2012-10-21 01:39:08 +11:00
Amitay Isaacs
a00e50e503 ctdbd: Replace lockwait with locking API and remove ctdb_lockwait.c
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 2126795153dacb255e441abcb36ee05107b6282a)
2012-10-20 02:48:44 +11:00
Amitay Isaacs
83306337df ctdbd: locking: Provide non-blocking API for locking of TDB record/db/alldb
This introduces a consistent API for handling locks on single record, complete
db or all dbs. The locks are taken out in a child process. In cases of timeout,
find the processes that currently hold the lock and log.

Callback functions for locking requests take locked boolean to indicate
whether the lock was successfully obtained or not.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 1af99cf0de9919dd89af1feab6d1bd18b95d82ff)
2012-10-20 02:48:44 +11:00
Amitay Isaacs
1011d10a51 common: Add routines to get process and lock information
Currently these functions are implemented only for Linux.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit be4051326b0c6a0fd301561af10fd15a0e90023b)
2012-10-20 02:48:44 +11:00
Amitay Isaacs
ef79dc012e header: Added DB statistics update macros
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit a0cdfae7438092f5c605f0608daa536be860b7fe)
2012-10-20 02:48:44 +11:00
Martin Schwenke
8d7562f3f8 common: Debug ctdb_addr_to_str() using new function ctdb_external_trace()
We've seen this function report "Unknown family, 0" and then CTDB
disappeared without a trace.  If we can reproduce it then this might
help us to debug it.

The idea is that you do something like the following in /etc/sysconfig/ctdb:

  export CTDB_EXTERNAL_TRACE="/etc/ctdb/config/gcore_trace.sh"

When we hit this error then we call out to gcore to get a core file so
we can do forensics.  This might block CTDB for a few seconds.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 7895bc003f087ab2f3181df3c464386f59bfcc39)
2012-10-18 20:05:42 +11:00
Martin Schwenke
4b4e4d8870 ctdbd: Stop takeovers and releases from colliding in mid-air
There's a race here where release and takeover events for an IP can
run at the same time.  For example, a "ctdb deleteip" and a takeover
initiated by the recovery daemon.  The timeline is as follows:

1. The release code registers a callback to update the VNN.  The
   callback is executed *after* the eventscripts run the releaseip
   event.

2. The release code calls the eventscripts for the releaseip event,
   removing IP from its interface.

   The takeover code "updates" the VNN saying that IP is on some
   iface.... even if/though the address is already there.

3. The release callback runs, removing the iface associated with IP in
   the VNN.

   The takeover code calls the eventscripts for the takeip event,
   adding IP to an interface.

As a result, CTDB doesn't think it should be hosting IP but IP is on
an interface.  The recovery daemon fixes this later... but it
shouldn't happen.

This patch can cause some additional noise in the logs:

  Release of IP 10.0.2.133/24 on interface eth2  node:2
  recoverd:We are still serving a public address '10.0.2.133' that we should not be serving. Removing it.
  Release of IP 10.0.2.133/24 rejected update for this IP already in flight
  recoverd:client/ctdb_client.c:2455 ctdb_control for release_ip failed
  recoverd:Failed to release local ip address

In this case the node has started releasing an IP when the recovery
daemon notices the address is still hosted and initiates another
release.  This noise is harmless but annoying.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit bfe16cf69bf2eee93c0d831f76d88bba0c2b96c2)
2012-10-11 12:10:45 +11:00
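The log line "rejected update for this IP already in flight" in 4b4e4d8870 suggests a per-address guard. The sketch below shows one way such a guard could work; the struct, field and function names are hypothetical, and the real patch may track the in-flight state differently.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-address state: which interface hosts the IP and
 * whether a takeover or release is currently running for it. */
struct vnn_state {
	bool update_in_flight;
	const char *iface;
};

/* Start a takeover or release.  Returns -1 if another update for the
 * same address has not finished yet, so the two cannot interleave. */
int vnn_begin_update(struct vnn_state *vnn)
{
	if (vnn->update_in_flight) {
		return -1;
	}
	vnn->update_in_flight = true;
	return 0;
}

/* Called from the event callback once the eventscripts have run.
 * Pass NULL for iface after a release. */
void vnn_finish_update(struct vnn_state *vnn, const char *iface)
{
	vnn->iface = iface;
	vnn->update_in_flight = false;
}
```

With a guard like this, the takeover in the timeline above would be refused at step 2 instead of racing with the release callback in step 3.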
Martin Schwenke
79ea15bf96 ctdbd: New tunable NoIPTakeoverOnDisabled
Stops the behaviour where unhealthy nodes can host IPs when there are
no healthy nodes.  Set this to 1 when an immediate complete outage is
preferred when all nodes are unhealthy.  The alternative
(i.e. default) can lead to undefined behaviour when the shared
filesystem is unavailable.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a555940fb5c914b7581667a05153256ad7d17774)
2012-10-11 12:10:45 +11:00
Volker Lendecke
a68512c7d8 Correct include for ctdb_protocol.h
With an old ctdb_protocol.h installed under /usr/local, ctdb will
not compile because the <> form of include will find the header
under /usr/local

(This used to be ctdb commit c4f5a58471b206e2287c7958c7f29c1f1c0626ac)
2012-10-09 23:13:29 +11:00
Martin Schwenke
e05fc0e7b0 libctdb: add ctdb_getcapabilities()
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 140fafef23050d40d66f5b5558c7efcb78f80cd2)
2012-09-28 17:05:34 +10:00
Ronnie Sahlberg
d21337a0fb Add new command to find which interface is located on
(This used to be ctdb commit f07376309e70f5ccdb7de8453caacc71b451ab48)
2012-06-20 15:11:49 +10:00
Ronnie Sahlberg
59565c05cf STATISTICS: Add tracking of the 10 hottest keys per database measured in hopcount
and add mechanisms to dump it using the ctdb dbstatistics command

(This used to be ctdb commit 8307c70ed98996b430c470e9641a09fdeeb81bd8)
2012-06-13 16:19:18 +10:00
Amitay Isaacs
7631830152 server: Replace BOOL datatype with bool, True/False with true/false
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 6e5cbe8fff71985e5a2fc16b7e9f2b868011ff5d)
2012-05-28 11:22:25 +10:00
Ronnie Sahlberg
e7d21834ae RECOVER: When we pull databases during recovery, we used to reallocate the databuffer for each entry added. This would normally not be an issue, but for cases where memory is fragmented, this could start to cost significant cpu if we need to reallocate and move to a different region.
Change this to instead preallocate, by default, 10MByte chunks for the data buffer.
This significantly reduces the number of potential reallocate-and-move operations that may be required.

Create a tunable to override/change how much preallocation should be used.

(This used to be ctdb commit 1f262deaad0818f159f9c68330f7fec121679023)
2012-05-25 12:34:06 +10:00
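A sketch of the preallocation strategy from e7d21834ae: grow the buffer by a fixed chunk rather than by the size of each record, so most appends need no realloc() at all. The 10 MByte default is from the commit message; the struct and function names are assumptions, and the reallocs counter exists only to make the effect visible.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DEFAULT_PREALLOC (10 * 1024 * 1024)  /* 10 MByte, per the commit */

/* Hypothetical growable buffer for pulled records. */
struct pull_buf {
	unsigned char *data;
	size_t used;
	size_t allocated;
	size_t chunk;           /* how much to preallocate per growth step */
	unsigned int reallocs;  /* bookkeeping for this sketch only */
};

/* Append one record.  Returns 0 on success, -1 if allocation fails. */
int pull_buf_append(struct pull_buf *b, const void *rec, size_t len)
{
	if (len == 0) {
		return 0;
	}
	if (len > b->allocated - b->used) {
		/* Grow by a whole chunk, or by the record size if larger. */
		size_t grow = (len > b->chunk) ? len : b->chunk;
		unsigned char *p = realloc(b->data, b->allocated + grow);

		if (p == NULL) {
			return -1;
		}
		b->data = p;
		b->allocated += grow;
		b->reallocs++;
	}
	memcpy(b->data + b->used, rec, len);
	b->used += len;
	return 0;
}
```

Initialise the struct with data = NULL, used = 0, allocated = 0 and chunk = DEFAULT_PREALLOC (or a tunable value) before the first append.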
Ronnie Sahlberg
26322d257d DEBUG: Add checks for and print debug messages when 1) a database contains very many records, 2) when a database is very big, 3) when a single record is very big.
Add tunables to control when to log these instances and allow it to be completely turned off by setting the threshold to 0

(This used to be ctdb commit 9ed58fef4991725f75509433496f4d5ffae0ae87)
2012-05-21 13:26:13 +10:00
Ronnie Sahlberg
dce5969d12 Debug: When scripts hang, we may need to collect additional data in order to debug why the script hung.
Break this debug and data collection out into an external script to make it easier to modify what data we need to collect.
For now we only collect a pstree so we can see what part of the script we hung in.

S1037271

(This used to be ctdb commit 6e68797af67bee36f2bad045f94806e7e98f27e9)
2012-05-17 10:29:03 +10:00
Ronnie Sahlberg
a57eba2bb4 Track all child process so we never send a signal to an unrelated process (our child died and kernel wrapped the pid-space and reused the pid for a different process)
Wrap all creation of child processes inside ctdb_fork() which is used to track all processes we have spawned.
Capture SIGCHLD to track also which child processes have terminated.

Wrap kill() inside ctdb_kill() and make sure that we never send a !0 signal to a child process pid that has already terminated (and might have been replaced with a

(This used to be ctdb commit f73a4b1495830bcdd094a93732a89dd53b3c2f78)
2012-05-03 14:03:26 +10:00
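The child-tracking scheme in a57eba2bb4 could be sketched as follows. The real wrappers are ctdb_fork() and ctdb_kill(); the functions below are hypothetical stand-ins with assumed names, a fixed-size table, and ESRCH as the assumed error for an untracked pid.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

#define MAX_CHILDREN 64

static pid_t children[MAX_CHILDREN];
static size_t num_children;

/* Record a child; a fork wrapper would call this in the parent. */
int track_child(pid_t pid)
{
	if (num_children >= MAX_CHILDREN) {
		return -1;
	}
	children[num_children++] = pid;
	return 0;
}

/* Forget a child; call this once SIGCHLD handling has reaped it. */
void untrack_child(pid_t pid)
{
	size_t i;

	for (i = 0; i < num_children; i++) {
		if (children[i] == pid) {
			children[i] = children[--num_children];
			return;
		}
	}
}

bool is_tracked(pid_t pid)
{
	size_t i;

	for (i = 0; i < num_children; i++) {
		if (children[i] == pid) {
			return true;
		}
	}
	return false;
}

/* Send a signal only to pids we still track.  Signal 0 is an existence
 * probe and is passed through.  Any other signal to an untracked pid is
 * refused, because that pid may now belong to an unrelated process. */
int tracked_kill(pid_t pid, int signum)
{
	if (signum != 0 && !is_tracked(pid)) {
		errno = ESRCH;
		return -1;
	}
	return kill(pid, signum);
}
```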
Ronnie Sahlberg
a367fa6138 RELOADIPS: simplify the reloadips code a bit
and also update the "read public address file" to not check if the address exists already locally when we read it from the child process, to stop it
from spamming the logs with "We already host ..."
messages

(This used to be ctdb commit 334ea830f1bf33419f4a1e78f23afd41a852d0f4)
2012-05-01 15:34:26 +10:00
Ronnie Sahlberg
7a1aa560e7 Add new control to reload the public ip address file on a node
Also add a method to use the recovery master/daemon to reload the public ips on all nodes in the cluster.
Reloading the public ips on all nodes in the cluster is only supported if all nodes in the cluster are available and healthy.

(This used to be ctdb commit 05603e914f8c12618d7e06943c0f7df207f645b0)
2012-05-01 10:48:08 +10:00
Amitay Isaacs
131d35d67d includes: Move special tevent defines from tevent.h to includes.h
This allows building against the system tevent library. Also include the tevent header
along with other common headers.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 9ae4389c2c959c5dcd8395fdae2b25ed7e1e873a)
2012-04-13 17:28:14 +10:00
Martin Schwenke
fbe64dec01 Undo damage done by d8d37493478a26c5f1809a5f3df89ffd6e149281
The implementation of DisableIPFailover got intermingled with
--nopublicipcheck.  This just looks wrong - Ronnie must have been
having a bad day.  :-)

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5083b266dd68b292c4275505f3d1b878dbf12f11)
2012-03-22 15:34:52 +11:00
Ronnie Sahlberg
2456f77ca6 NoIPTakeover: change the tunable name for the "dont allow failing addresses over onto the node" to NoIPTakeover
(This used to be ctdb commit 35592e618cfd827b6978af6332f80504f232c46a)
2012-03-22 11:05:15 +11:00
Ronnie Sahlberg
befa9df152 Make NoIPFailback a node local setting. Nodes that have NoIPFailback set to !0 can not takeover new ip addresses during failover.
Remove the old global setting for this unused tunable and add it as a new node flag. This node flag is only valid/defined within the takeover subsystem in the recovery daemon. Add async functions to collect the NoIPFailback settings for each node.

This will later be used to disqualify certain nodes from being takeover targets when we perform reallocation.

(This used to be ctdb commit 668f3e88a9e5f598706952b7140547640c85a5ed)
2012-03-22 09:09:57 +11:00
Ronnie Sahlberg
fa3a06246a STICKY: add prototype code to make records stick to a node to "calm" down if they are found to be very hot and accessed by a lot of clients.
This can improve performance and stop clients from having to chase a rapidly migrating/bouncing record

(This used to be ctdb commit d0d98f7e45e5084b81335b004d50bddc80cdc219)
2012-03-20 17:12:19 +11:00
Ronnie Sahlberg
e7e51ddb64 LACOUNT: Add back lacount mechanism to defer migrating a fetched/read copy until after default of 20 consecutive requests from the same node
This can improve performance slightly on certain workloads where smbds frequently read from the same record

(This used to be ctdb commit 035c0d981bde8c0eee8b3f24ba8e2dc817e5b504)
2012-03-20 12:26:22 +11:00
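The lacount mechanism in e7e51ddb64 can be pictured as a per-record counter of consecutive requests from the same node. In this sketch the default of 20 is from the commit message; the names, the struct, and the choice to migrate on the request that brings the count up to the limit are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEFAULT_MAX_LACOUNT 20  /* default from the commit message */

/* Hypothetical per-record bookkeeping. */
struct record_state {
	uint32_t last_accessor;  /* node that made the previous request */
	uint32_t lacount;        /* consecutive requests from that node */
};

/* Returns true when the record should migrate to the requesting node.
 * Until then the request is served without moving the record.  A
 * request from a different node restarts the count. */
bool should_migrate(struct record_state *rec, uint32_t node,
		    uint32_t max_lacount)
{
	if (rec->last_accessor != node) {
		rec->last_accessor = node;
		rec->lacount = 0;
	}
	rec->lacount++;
	if (rec->lacount >= max_lacount) {
		rec->lacount = 0;
		return true;
	}
	return false;
}
```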
Ronnie Sahlberg
6a493a0b08 STATISTICS: add per-db hop count statistics
(This used to be ctdb commit 1c976d83b1d7dac6f0ef81306774998e4c8b56a1)
2012-03-20 12:11:55 +11:00
Ronnie Sahlberg
c051f67d67 FETCH COLLAPSE : Change the fetch-lock collapse to collapse ALL fetches, including fetch-locks into a single command in flight per record. Also add a tunable to enable/disable this optimization for hot records
(This used to be ctdb commit eafd7bbaaa5931546a96c8beae3cf9a39a49c925)
2012-03-20 11:39:00 +11:00
Ronnie Sahlberg
038c946e80 add max hop count buckets to see how bad hopcounts are
(This used to be ctdb commit 7d3931298e6477d92f43652c3006b0c426cb1307)
2012-03-20 11:20:53 +11:00
Ronnie Sahlberg
f3600276fc Add a tunable variable to control how long we defer after a ctdb addip until we force a rebalance and try to failback addresses onto this node
Have it default to 300 seconds.

(This used to be ctdb commit 49791db7dc74cffd7e88bd73091590cdc1909328)
2012-02-28 06:58:59 +11:00
Ronnie Sahlberg
ef2bd0b016 When adding ips to nodes, set up a deferred rebalance for the whole node to trigger after 60 seconds in case the normal ipreallocated is not sufficient to trigger rebalance.
(This used to be ctdb commit 4340263b219d75c39f8de22abe3f6f1c1ee63ea2)
2012-02-28 06:56:04 +11:00
Ronnie Sahlberg
93ec9c589c Eventscripts: remove the horrible horrible circular reference between state and callback since these two structures do not even share the same parent talloc context.
Instead, tie them together via referencing a permanent linked list hung off the ctdb structure.

(This used to be ctdb commit a95c02da6c67dc4bd8716b75318a4188301df6f9)
2012-02-23 06:49:47 +11:00
Ronnie Sahlberg
42e477b14e READONLY: only send a control to schedule fast-vacuuming from child context iff we have a connection open to the main daemon
There are some child processes where we do not create a connection to the main daemon (switch_from_server_to_client()) because it is expensive to set up and we normally might not need to talk to the daemon at all via a domain socket.
But we might still want to call ctdb_ltdb_store() from such child processes.

(This used to be ctdb commit 9e372a08c40087e6b5335aa298e94d88273566a5)
2012-02-21 07:03:44 +11:00
Ronnie Sahlberg
73f8be16c6 ReadOnly: add per-database statistics to view how many delegations/revokes we have
(This used to be ctdb commit 751ed46197661eb841042ab6a02855a51dd0b17c)
2012-02-08 15:29:27 +11:00
Ronnie Sahlberg
1eafa68f0f STATISTICS: add total counts for number of delegations and number of revokes
Every time we give a delegation to another node we count this as one delegation.
If the same record is delegated to several nodes we count one for each node.

Every time a record has all its delegations revoked we count this as one revoke.

(This used to be ctdb commit b098bcf8007be63889aaed640a951b0eeaa9d191)
2012-02-08 13:42:30 +11:00
Martin Schwenke
ed8a8ee966 libctdb - add ctdb_getvnnmap()
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f6039eaece4224b866a98dd49010f278a7b3f015)
2012-02-06 16:00:23 +11:00
Ronnie Sahlberg
e648045499 Merge branch 'master' of ssh://git.samba.org/data/git/ctdb
(This used to be ctdb commit 15d8ae8b0f80f95d7839528b8ac60aa0e2485c77)
2012-01-03 12:40:15 +11:00
Michael Adam
e04fad0ee4 vacuum: add new tunable VacuumInterval and mark Vacuum{Default,Min,Max}Interval obsolete
And use VacuumInterval instead of VacuumDefaultInterval in the code.

(This used to be ctdb commit 78530f40338f511a7cd1d33ada450905742bfa8f)
2011-12-23 17:39:02 +01:00
Michael Adam
a481ca711f vacuum: add ctdb_local_remove_from_delete_queue()
Pair-Programmed-With: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit a5065b42a98c709173503e02d217f97792878625)
2011-12-23 17:39:00 +01:00
Martin Schwenke
8b74037633 ctdb tool - generalise nodestring parsing for -n
Centralise -n nodestring parsing and add the ability to pass a
comma-separated list of node numbers.  Listing a node that is
disconnected or deleted results in failure, similar to the way passing
a single node currently works.  All of the auto_all commands inherit
this functionality.  For now, the non-auto_all commands do not inherit
this - they need to be individually tweaked.  Therefore, we haven't
updated the documentation to advertise this feature.

Implemented via a new function parse_nodestring() that parses an
optional (pass NULL when not available to indicate "current node")
comma-separated list of node numbers or "all".  parse_nodestring() can
be told to be non-fatal for disconnected/deleted nodes so it can also
be used in other contexts (yes, coming soon).  main() is changed to
call this function.

A new magic PNN value CTDB_MULTICAST is added, along with a
corresponding option.nodes structure member (a talloc-ed array of
PNNs).  This is also populated for "all".

control_status() has new function pretty_print_flags() factored out so
pretty-printed flags can be used in error/debug messages.  New
function is_partially_online() is also factored out - this simplifies
some of the logic.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 920e3a732eb9e09004edde6cfb3c7db8a004016f)
2011-12-08 17:00:17 +11:00
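The nodestring handling described in the commit above (NULL for the current node, "all", or a comma-separated list of node numbers) can be sketched roughly as follows. This is a minimal illustration and not the real parse_nodestring(): the name parse_nodes_sketch(), its parameters, and the fixed output array are assumptions made for the example, and the check for disconnected or deleted nodes is reduced to a simple range check.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not the ctdb tool's parse_nodestring()).
 * Fills out[] with the selected node numbers and returns how many were
 * written, or -1 on a malformed string, an out-of-range node number,
 * or an out[] array that is too small. */
static int parse_nodes_sketch(const char *s, uint32_t current_pnn,
                              uint32_t num_nodes,
                              uint32_t *out, size_t max_out)
{
    size_t n = 0;

    assert(out != NULL);

    if (s == NULL) {                     /* no -n given: current node */
        if (max_out < 1) return -1;
        out[0] = current_pnn;
        return 1;
    }
    if (strcmp(s, "all") == 0) {         /* every node in the cluster */
        if (max_out < num_nodes) return -1;
        for (uint32_t i = 0; i < num_nodes; i++) out[i] = i;
        return (int)num_nodes;
    }
    while (*s != '\0') {
        char *end;
        unsigned long v;

        /* Reject empty items, signs and whitespace up front. */
        if (*s < '0' || *s > '9') return -1;
        errno = 0;
        v = strtoul(s, &end, 10);
        if (errno != 0 || v >= num_nodes || n >= max_out) return -1;
        out[n++] = (uint32_t)v;

        if (*end == ',') {
            if (end[1] == '\0') return -1;   /* trailing comma */
            s = end + 1;
        } else if (*end == '\0') {
            s = end;
        } else {
            return -1;                       /* junk after the number */
        }
    }
    return (n > 0) ? (int)n : -1;
}
```

A caller would pass the -n argument (or NULL) together with the local PNN and the cluster size, then loop over the returned entries.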
Ronnie Sahlberg
609149bdc8 LibCTDB: Add support for the 'get interfaces' control and update the ctdb tool to use this interface
(This used to be ctdb commit 77dc0c7351071243d9096d3607d7499c82f46ec0)
2011-12-06 13:12:18 +11:00
Mathieu Parent
bb3d6698e9 Move platform-specific code to common/system_*
This removes #ifdef AIX and eases the addition of new platforms.

(This used to be ctdb commit 2fd1067a075fe0e4b2a36d4ea18af139d03f17bf)
2011-12-06 11:57:11 +11:00
Michael Adam
ad0de5494e traverse: fix traversing with empty records by adding a new (internal) control CTDB_CONTROL_TRAVERSE_START_EXT
By this, the original CTDB_CONTROL_TRAVERSE_START control that is
used by e.g. samba's smbstatus, is not changed, so that samba
continues working without code change.

The CTDB_CONTROL_TRAVERSE_START control currently just adds the "withemptyrecords"
flag to the state and proceeds as CTDB_CONTROL_TRAVERSE_START_EXT.

(This used to be ctdb commit 8281bb210858ed04992eacea7f6d02261e0fc1b1)
2011-12-03 02:15:30 +01:00
Ronnie Sahlberg
11f3c947e6 LibCTDB: add support for the check-srvids control
(This used to be ctdb commit c32604fd0016de0df14845a2f222edaa3c52a4fa)
2011-11-30 10:00:07 +11:00
Volker Lendecke
5a1da0ac55 Add CTDB_CONTROL_CHECK_SRVID
(This used to be ctdb commit ad64ef2c40a2a12b37dbf39142e95c6781c2fc3b)
2011-11-30 09:02:26 +11:00
Ronnie Sahlberg
0420449a6c Recover Persistent database DB by DB and not record by record
Add a new tunable that changes how persistent databases are recovered.
RecoveryPDBBySeqNum

When set to 1, persistent databases will be recovered in whole from the node which
has the highest "__db_sequence_number__" record.
This record is managed by samba for those databases where we do persistent writes and have
inter-record relations.
For these databases we do not want the usual "blend records from all nodes based
on individual record RSN" but instead a mode where we pick one instance of the persistent database.

If no node was found with a "__db_sequence_number__" record at all, we fall back to the original "recover records independently based on record RSN".
Some persistent databases do not contain record interrelations and as such do not
contain this special record at all.

(This used to be ctdb commit 502150c764298a9fa8c4d8aa445bf7d85d4ee9dc)
2011-11-30 08:48:23 +11:00
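The selection rule in the commit above (take the whole database from the node with the highest "__db_sequence_number__", otherwise fall back to per-record RSN recovery) can be sketched as below. The struct node_seqnum type and the pick_recovery_source() function are hypothetical names for this illustration and do not come from the ctdb sources; ties are resolved in favour of the lowest node index, which is also an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-node result of looking up the special record. */
struct node_seqnum {
    bool     has_seqnum;   /* node holds a "__db_sequence_number__" record */
    uint64_t seqnum;       /* its value, only valid when has_seqnum is set */
};

/* Returns the index of the node to pull the whole database from, or -1
 * when no node has the record, in which case the caller would use the
 * original record-by-record RSN recovery. */
static int pick_recovery_source(const struct node_seqnum *nodes, size_t count)
{
    int best = -1;

    assert(nodes != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        if (!nodes[i].has_seqnum) {
            continue;
        }
        if (best < 0 || nodes[i].seqnum > nodes[(size_t)best].seqnum) {
            best = (int)i;
        }
    }
    return best;
}
```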
Ronnie Sahlberg
3cbff2edd8 LibCTDB: add get persistent db seqnum control
(This used to be ctdb commit 6e96a62494bbb2c7b0682ebf0c2115dd2f44f7af)
2011-11-30 08:48:14 +11:00
Michael Adam
31d62794fe ctdb: add an option --print-recordflags to trigger printing record flags in catdb and dumpdbbackup
This changes the default behaviour to not print record flags.

(This used to be ctdb commit 2d2ce07c51055d9400b22cd3c1fd682597cb921c)
2011-11-29 13:43:35 +01:00
Michael Adam
e6923904e8 ctdb: add an option --print-hash to enable printing of record hashes when dumping dbs
(This used to be ctdb commit efc033c28ade97f9884794256d59a4553e052d5f)
2011-11-29 13:43:34 +01:00
Michael Adam
86cd78efee ctdb: add an option --print-lmaster to enable printing of lmaster in "ctdb catdb"
(This used to be ctdb commit 326f88ef622620cb9e0569c4497bc0e86124beaa)
2011-11-29 13:43:33 +01:00
Michael Adam
dc98c12ac9 ctdb: add an option --print-datasize to only print datasize instead of dumping data in db dumps
Used in catdb, cattdb and dumpdbbackup.

(This used to be ctdb commit dd866116041e71cbf91e7fd91edcc9501634051d)
2011-11-29 13:43:32 +01:00
Michael Adam
1fcc7651f4 ctdb: add an option --print-emptyrecords to enable printing of empty records in dumping databases
This option is used with the commands catdb, cattdb and dumpdbbackup.

(This used to be ctdb commit 6ec68a2e667f66d2b194fe48cb75229a2777842e)
2011-11-29 10:30:24 +01:00
Michael Adam
1a31c84348 traverse: add a flag to enable transferring empty records in cluster wide traverse
This will be useful for also printing information about empty/deleted
records in "ctdb catdb", e.g. for debugging vacuuming issues.

(This used to be ctdb commit ddc5da3a0df7701934404192a0a0aa659a806acb)
2011-11-29 10:30:24 +01:00
Martin Schwenke
3ae8273d86 Make some ctdb_takeover.c functions static
These were intentionally not static so they could be linked to in unit
test programs.  However, using the CCAN-style unit tests where
relevant code is just included, this is no longer necessary.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d0e9e8554614bd49ffb9ec3509feaa0e80d0f65d)
2011-11-11 14:41:47 +11:00
Martin Schwenke
f186dd90b6 Move some common functions to common/ctdb_ltdb.c
Move identical copies of ctdb_null_func(), ctdb_fetch_func(),
ctdb_fetch_with_header_func() from ctdb_client.c and
ctdb_ltdb_server.c to somewhere common.

This is in the context of wanting to run CCAN-style tests where most
of the ctdbd code is just included in the test program.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 126cb0d369b2b1aed63801dc4ba0554399e8b7e4)
2011-11-11 14:31:50 +11:00
Martin Schwenke
52ff485958 Added some #ifndefs to stop files being included multiple times.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit fdca12c25e6fce6206135b994dedf44265e4eb09)
2011-11-11 14:31:50 +11:00
Ronnie Sahlberg
44de394796 SRVID ranges: Change the ranges for SRVIDs to allow 8 bit prefixes
Update the ranges used for SRVID allocation to allow 8 bit prefixes and thus
56 user-defined bits.
Define the de facto use of the 0x00 prefix as a SRVID used to register a process id.
Upgrade SAMBA/iSCSI/NFS/TEST from a 32 bit prefix each to an 8 bit prefix each
for private use.

(This used to be ctdb commit 5de9ec2bdf8067406165bc470becdca87f458ae9)
2011-11-09 08:12:44 +11:00
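As an illustration of the layout described in the commit above, a 64-bit SRVID can be split into an 8-bit prefix and 56 user-defined bits. The sketch below assumes the prefix sits in the most significant byte; the macro and function names are hypothetical and are not the definitions from the ctdb headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names: 8-bit prefix in the top byte, 56 user bits below. */
#define SRVID_PREFIX_SHIFT 56
#define SRVID_USER_MASK    ((UINT64_C(1) << SRVID_PREFIX_SHIFT) - 1)

/* Combine a prefix with user-defined bits into one SRVID. */
static uint64_t srvid_make(uint8_t prefix, uint64_t user_bits)
{
    assert((user_bits & ~SRVID_USER_MASK) == 0);   /* must fit in 56 bits */
    return ((uint64_t)prefix << SRVID_PREFIX_SHIFT) | user_bits;
}

/* Extract the 8-bit prefix that identifies the owning subsystem. */
static uint8_t srvid_prefix(uint64_t srvid)
{
    return (uint8_t)(srvid >> SRVID_PREFIX_SHIFT);
}

/* Extract the 56 user-defined bits. */
static uint64_t srvid_user(uint64_t srvid)
{
    return srvid & SRVID_USER_MASK;
}
```

Under this layout a plain process id used as a SRVID naturally has prefix 0x00, which matches the de facto use the commit message mentions.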
Ronnie Sahlberg
0e79b2d1e8 Record Fetch Collapse: Collapse multiple fetch requests into one single request.
When multiple clients fetch the same record concurrently, send only one single
fetch across the network and defer all other fetches locally.
This improves performance for hot records and reduces cpu load on ctdb.

(This used to be ctdb commit 82d6946ad8b3348e8b9d3d971f24925ade02d1be)
2011-11-08 16:08:28 +11:00
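The fetch-collapse idea in the commit above can be sketched with a small table of in-flight keys: the first request for a key goes out on the network, later requests for the same key are only counted as deferred, and the reply releases them. Everything here (the fixed-size table, string keys, fetch_begin() and fetch_complete()) is an assumption made for illustration; the daemon's actual data structures are not shown in this log.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_INFLIGHT 16
#define MAX_KEYLEN   32

/* One entry per key that currently has a network fetch outstanding. */
struct inflight {
    bool     used;
    char     key[MAX_KEYLEN];
    unsigned deferred;     /* requests waiting on that single fetch */
};

static struct inflight inflight_table[MAX_INFLIGHT];

/* Returns true if the caller should send a fetch over the network,
 * false if the request was deferred behind an existing fetch. */
static bool fetch_begin(const char *key)
{
    struct inflight *free_slot = NULL;

    assert(key != NULL && strlen(key) < MAX_KEYLEN);

    for (int i = 0; i < MAX_INFLIGHT; i++) {
        struct inflight *e = &inflight_table[i];
        if (e->used && strcmp(e->key, key) == 0) {
            e->deferred++;
            return false;
        }
        if (!e->used && free_slot == NULL) {
            free_slot = e;
        }
    }
    if (free_slot != NULL) {
        free_slot->used = true;
        strcpy(free_slot->key, key);
        free_slot->deferred = 0;
    }
    /* If the table is full we still fetch, just without collapsing. */
    return true;
}

/* Called when the reply arrives. Returns how many deferred requests
 * can now be answered locally, and forgets the key. */
static unsigned fetch_complete(const char *key)
{
    assert(key != NULL);

    for (int i = 0; i < MAX_INFLIGHT; i++) {
        struct inflight *e = &inflight_table[i];
        if (e->used && strcmp(e->key, key) == 0) {
            e->used = false;
            return e->deferred;
        }
    }
    return 0;
}
```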
Ronnie Sahlberg
c21ec9fffc ReadOnly: add readonly record lock requests to libctdb
Initial readonly record support in libctdb.
New records are not yet created by the library but existing records will be delegated as readonly records.
This needs a bit more testing before we can drop the "old style" implementation of client
code in client/ctdb_client.c

(This used to be ctdb commit fb50a45a21ff56480d76acd1c33c13c323cbf5e2)
2011-10-28 11:55:46 +11:00
Ronnie Sahlberg
8e4bfba75c ReadOnly: Rename the function ctdb_ltdb_fetch_readonly() to ctdb_ltdb_fetch_with_header() since this is what it actually does.
(This used to be ctdb commit 94a5ce4e08e7891f07dbfe4c822ca4be5ab10965)
2011-09-13 18:38:20 +10:00
Ronnie Sahlberg
0dc5584101 Merge branch 'master-readonly-records' into foo
Conflicts:

	Makefile.in
	tools/ctdb.c

(This used to be ctdb commit 0fedef0ffba4178126eee9544c5e2db52f5db893)
2011-09-12 09:34:34 +10:00
Ronnie Sahlberg
1c05db2c9c Merge remote branch 'ddiss/master_pmda_and_client_timeouts'
(This used to be ctdb commit 7bebfc7bad8f36e54003b8e25372fdaf54836e21)
2011-09-08 11:22:53 +10:00
David Disseldorp
2f925f1e64 pmda: Attempt reconnects while ctdbd is unavailable
Attempt to reconnect to ctdbd on fetch while it is unreachable.

We must provide our own queue callback wrapper, as ctdb_client_read_cb()
exits on transport failure.

(This used to be ctdb commit 28df6fbf1273b8d095a2bc38dca6a6c35c5c31bd)
2011-09-06 14:01:18 +02:00
David Disseldorp
5296da5609 client: add timeout argument to ctdb_attach
Rather than using a fixed 2 second CTDB_CONTROL_GETDBPATH timeout.

(This used to be ctdb commit 9e178671560cb95121e11d718a76b05380ecd6c5)
2011-09-06 13:57:04 +02:00
David Disseldorp
0628d1c0e6 client: add req timeout argument to ctdb_cmdline_client
Following connection to the local ctdbd, ctdb_cmdline_client() currently
issues a CTDB_CONTROL_GET_PNN request with a fixed 3 second timeout.

The ctdb cmd line client accepts a --timelimit argument for specifying
a per request timeout, pass this value through to ctdb_cmdline_client()
for use as a CTDB_CONTROL_GET_PNN request timeout.

(This used to be ctdb commit 0634d0305f42f17048b6830733767e8dc300e11c)
2011-09-06 13:56:54 +02:00
Ronnie Sahlberg
783ceca07b Interface monitoring: add an event to trigger every 30 seconds to check that all interfaces referenced by the public address list actually exist.
This will make it much easier to root-cause problems such as
S1029023
when an external application deleted the interface while it is still in use by ctdbd.

(This used to be ctdb commit 9abf9c919a7e6789695490e2c3de56c21b63fa57)
2011-09-06 17:02:19 +10:00
Ronnie Sahlberg
64378fea58 Check interfaces: when reading the public addresses file to create the vnn list
check that the actual interface exists, print an error and fail startup if the interface does not exist.

(This used to be ctdb commit cd33bbe6454b7b0316bdfffbd06c67b29779e873)
2011-09-06 16:11:00 +10:00
Michael Adam
a3e0079568 Add a tunable "AllowClientDBAttach" with default value 1.
When set to 0, clients will not be able to attach to databases
via the db_attach control. This can be useful for maintenance
where ctdb should be kept running but clients should not be able
to modify databases.

(This used to be ctdb commit ddfeecda87955b4e46777599f678e6926d37f4c4)
2011-09-05 16:17:39 +10:00
Ronnie Sahlberg
206a3c0c66 ReadOnly: add a new control to activate readonly lock capability for a database.
Let all databases default to not supporting this until enabled through this control.

(This used to be ctdb commit 908a07c42e5135a3ba30a625fc4f4e4916de197a)
2011-09-01 11:08:18 +10:00
Ronnie Sahlberg
a0d4d240c3 ReadOnly: add a readonly flag to the getdbmap control and show the readonly setting in ctdb getdbmap output
(This used to be ctdb commit 4cac9ad7d9c9ca657a247a6c215476399c7d2210)
2011-09-01 10:28:15 +10:00
Ronnie Sahlberg
63dc96cdb2 ReadOnly: Change the ctdb_db structure to keep a uint8_t for flags instead of a boolean for
the persistent flag.
This is the same size as the original boolean but allows us to add additional flags for the database

(This used to be ctdb commit 7462761638d25880ad46024ad4ef21667eb99a98)
2011-09-01 10:21:55 +10:00
Ronnie Sahlberg
2902203900 Logging: when we log stdout/stderr messages from eventscripts to the system log, prefix every line of output with the name of the eventscript.
CQ S1028412

(This used to be ctdb commit 392363c04185f47a826fc6ed95038342be2150bf)
2011-08-26 09:39:25 +10:00
Ronnie Sahlberg
b00b0e9d2e LibCTDB : add support for getrecmode
(This used to be ctdb commit b663f286ea8edd64c0405a1ab45b6ef1da501bf5)
2011-08-23 15:32:14 +10:00
Ronnie Sahlberg
5e72ee5127 LibCTDB : add support for getrecmode
(This used to be ctdb commit 0893fa0f3257f50d54896ffa78ec12ee11e8c6d2)
2011-08-23 15:00:27 +10:00
Ronnie Sahlberg
af19b5acff LibCTDB: add commands where an application can query how many commands are active
and we have not yet received a reply to.
Applications may use this command to query if it is "safe" to stop the event system and sleep
or whether it should first wait for all activity to ctdb daemons to cease first.

(This used to be ctdb commit 8d89bfdfd1f55dfeb22890b8bb0f08f31d1fa91a)
2011-08-23 12:43:16 +10:00
Ronnie Sahlberg
37608d70fc ReadOnly: Add clientside code to fetch readonly records
(This used to be ctdb commit 6fccc902bce21fa6ff13ed08ee3341bbf8be39f2)
2011-08-23 10:34:15 +10:00
Ronnie Sahlberg
1bbd4cbf35 ReadOnly: Add a ctdb_ltdb_fetch_readonly() helper function
(This used to be ctdb commit 8551420fb331dd2a897f4619278a981fcefb96e8)
2011-08-23 10:33:17 +10:00
Ronnie Sahlberg
17f0e0890c ReadOnly: Add a new flag to call request packet to indicate that the client wants a readonly delegation
(This used to be ctdb commit a3f54a556e97170eedf43708d58dd32446ca5840)
2011-08-23 10:29:40 +10:00
Ronnie Sahlberg
dda2616cf5 ReadOnly: Add a function to start a revoke of all delegations for a record.
This triggers a child process to be created to perform the actual potentially blocking calls that are required.

(This used to be ctdb commit 7d575ee92c95bc4aab78a33bc1aac7ff0811ab3a)
2011-08-23 10:27:31 +10:00
Ronnie Sahlberg
1bb855bd52 ReadOnly: Add functions to register CALLs to a context used to handle deferral of processing of CALL commands.
Once the contexts are freed, the deferred calls are re-issued to the input packet processing functions again.
This is needed when/if a CALL can not currently be processed by the main engine due to the record being locked down for revoking of all delegations.

The data is passed through several layers of callbacks, and finally a timed event callback to ensure that the processing of the packet will be restarted again at the topmost eventloop, avoiding event loop nesting.

(This used to be ctdb commit cc6f78efcfa3b8caeffbd68018e6dfbf81488dce)
2011-08-23 10:25:57 +10:00
Ronnie Sahlberg
3d495c48d2 ReadOnly: Add an extra flag to ctdb_call_local to specify whether we want to write the record and header back to the tdb (for example we do when performing dmaster migrations)
(This used to be ctdb commit b935e83255aeb3754b2fd37cf5611e02f7283514)
2011-08-23 10:25:05 +10:00
Ronnie Sahlberg
1441b77cce ReadOnly: Add "readonly" flag to the ctdb_db_context to indicate if this database supports readonly operations or not. Add a private lock-less tdb file to the ctdb_db_context to use for tracking delegations for records
Assume all databases will support readonly mode for now and set the flag for all databases. At a later stage we will add support to control on a per database level whether delegations will be supported or not.

(This used to be ctdb commit 502f86f79944df4bac9094f716e54110c511dc24)
2011-08-23 10:24:26 +10:00
Ronnie Sahlberg
8f63a5dadd ReadOnly: Add 4 new record flags to handle read only delegation and revoking of delegations
(This used to be ctdb commit 875b0bede217547b51f02648b6a28a3c98b6b949)
2011-08-23 10:17:08 +10:00
Ronnie Sahlberg
e8127f0e0f ReadOnly: Add clientside functions to send the UPDATE_RECORD control
(This used to be ctdb commit 74a5b3d7bafd8827a4ee80095fde5798263821e4)
2011-08-23 10:11:38 +10:00
Ronnie Sahlberg
f924b3f40e ReadOnly: Add helper functions to manipulate a TDB_DATA as a bitmap for nodes that we are tracking as having a readonly delegation
(This used to be ctdb commit d10084e62d37674bb8d9e31d457fd23e050545be)
2011-08-23 10:09:42 +10:00
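A byte buffer used as a per-node bitmap, as the commit above describes, could look roughly like this. The struct blob type is only a stand-in for TDB_DATA (a pointer plus a size), and the function names are hypothetical. This sketch works on a fixed-size buffer and refuses node numbers that do not fit, whereas a real helper may well grow the buffer instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for TDB_DATA: a byte pointer and a length. */
struct blob {
    uint8_t *ptr;
    size_t   len;
};

/* Mark node pnn as holding a delegation. Returns false if the bitmap
 * is too small to represent that node. */
static bool bitmap_set_node(struct blob *b, uint32_t pnn)
{
    assert(b != NULL && b->ptr != NULL);
    if (pnn / 8 >= b->len) {
        return false;
    }
    b->ptr[pnn / 8] |= (uint8_t)(1u << (pnn % 8));
    return true;
}

/* Clear the bit for node pnn; out-of-range nodes are ignored. */
static void bitmap_clear_node(struct blob *b, uint32_t pnn)
{
    assert(b != NULL && b->ptr != NULL);
    if (pnn / 8 < b->len) {
        b->ptr[pnn / 8] &= (uint8_t)~(1u << (pnn % 8));
    }
}

/* Test whether node pnn is recorded as holding a delegation. */
static bool bitmap_has_node(const struct blob *b, uint32_t pnn)
{
    assert(b != NULL && b->ptr != NULL);
    if (pnn / 8 >= b->len) {
        return false;
    }
    return (b->ptr[pnn / 8] & (1u << (pnn % 8))) != 0;
}
```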
Ronnie Sahlberg
00a870f759 ReadOnly records: Add a new RPC function FETCH_WITH_HEADER.
This function differs from the old FETCH in that this function will also fetch the record header and not just the record data

(This used to be ctdb commit c7196d16e8e03bb2a64be164d15a7502300eae0e)
2011-08-23 10:06:59 +10:00
Volker Lendecke
21bb8abc93 libctdb: "ctdb_request_free" does not need the ctdb_connection parameter
Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 5a5ed2a43b76bec69494b6cdc6451527f5c472e5)
2011-08-22 17:11:07 +02:00
Martin Schwenke
5ac67504ca Tests: Initial test code for LCP2 IP allocation algorithm.
Move struct ctdb_public_ip_list to ctdb_private.h and put some
definitions for some functions from ctdb_takeover.c there.  This
allows those functions to be called from unit tests.

Add ctdb_takeover_tests.c and the Makefile support to build it.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 9d34be0233edf3bc022345c0494c4b2a4d7f8480)
2011-07-29 09:01:36 +10:00
Martin Schwenke
ff1a81c872 IP allocation - add LCP2 algorithm.
The current non-deterministic IP allocation algorithm balances IPs
across the whole cluster.  It does not consider different
interfaces/VLANs/subnets, so these different groups of IPs aren't
generally well balanced.

This adds the LCP2 algorithm for IP allocation and allows it to be
enabled by setting the "LCP2PublicIPs" tunable to 1.

The LCP2 algorithm calculates the imbalance of a node by totalling the
squares of the distances between each IP on the node.  The IP distance
is defined as the length of the longest common prefix (LCP) of bits that is
found when comparing 2 IPs.  The imbalance of a cluster is the maximum
imbalance for any node.  At each step the algorithm chooses the
IP/node allocation that best reduces the imbalance of the
cluster.

The implementation splits out the IP allocation part of
ctdb_takeover_run() into new function ctdb_takeover_run_core(), and
then extracts out the basic IP assignment code into new functions
basic_allocate_unassigned() and basic_failback().  3 new functions
lcp2_init(), lcp2_allocate_unassigned() and lcp2_failback() implement
the LCP2 algorithm, and are hooked into ctdb_takeover_run_core().

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 61fc7fbd0235469df22deb6581c6bd47e30bc0be)
2011-07-29 09:01:17 +10:00
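The two quantities defined in the LCP2 commit above, the IP distance and the per-node imbalance, can be sketched as follows. This is an illustration only: it handles IPv4 addresses given as 32-bit values in host byte order, the names ip_distance() and node_imbalance() are hypothetical, and it assumes each unordered pair of IPs on a node contributes once to the sum, which the commit message does not spell out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Length of the longest common prefix of two IPv4 addresses, in bits.
 * Identical addresses give 32; addresses differing in the top bit give 0. */
static uint32_t ip_distance(uint32_t a, uint32_t b)
{
    uint32_t diff = a ^ b;
    uint32_t len = 0;

    while (len < 32 && (diff & UINT32_C(0x80000000)) == 0) {
        diff <<= 1;
        len++;
    }
    return len;
}

/* Imbalance of one node: the sum of squared distances over every
 * unordered pair of IPs hosted on that node (assumption, see above). */
static uint64_t node_imbalance(const uint32_t *ips, size_t count)
{
    uint64_t sum = 0;

    assert(ips != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        for (size_t j = i + 1; j < count; j++) {
            uint64_t d = ip_distance(ips[i], ips[j]);
            sum += d * d;
        }
    }
    return sum;
}
```

Two addresses from the same subnet share a long prefix and so add a large term, which is why a node holding many IPs from one subnet scores as badly imbalanced.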
Michael Adam
827e871ec4 ctdb_private.h: add record flag CTDB_REC_FLAG_AUTOMATIC
This is a flag that shall signal that a record has been automatically generated by ctdb
and not by an explicit client store operation. This will be used in the ctdb_ltdb_fetch
operation which stores an empty record with default initial header before trying to
migrate the record from the dmaster when the record does not exist in the local tdb.

(This used to be ctdb commit 46381a3cb58ccc11422af8f7798c80ea8d72294f)
2011-03-14 13:35:51 +01:00
Michael Adam
9e8d6b82b5 server: Use the ctdb_ltdb_store_server() in the ctdb daemon for non-persistent dbs
This is realized by adding a ctdb_ltdb_store_fn function pointer to the db
context and filling it in the attach procedure for non-persistent dbs.

(This used to be ctdb commit df49ec44de80affa5ccc637dec12a20a26e8706e)
2011-03-14 13:35:50 +01:00
Michael Adam
a6b13b21c1 client: add accessor function ctdb_header_from_record_handle().
(This used to be ctdb commit cf57efd440ccc3db381386f4749bfcbf8ac5ecae)
2011-03-14 13:35:50 +01:00
Michael Adam
50bd249990 vacuum: add ctdb_local_schedule_for_deletion()
(This used to be ctdb commit b70bc141d84f7355d2c6c901961b7366db566980)
2011-03-14 13:35:49 +01:00
Michael Adam
8569fcbc83 server: implement a new control SCHEDULE_FOR_DELETION to fill the delete_queue.
(This used to be ctdb commit 680223074e992b32ccf6f42cb80c3fa93074fee7)
2011-03-14 13:35:49 +01:00
Michael Adam
46a05397a4 control: add a new control opcode CTDB_CONTROL_SCHEDULE_FOR_DELETION
(This used to be ctdb commit 4cebfa33db3c7effa087f753530c52b2dd8550e6)
2011-03-14 13:35:49 +01:00
Michael Adam
77d4d156d3 control: add macro CHECK_CONTROL_MIN_DATA_SIZE.
This is for the control dispatcher to check whether the input data has
a required minimum size.

(This used to be ctdb commit 2038e745db33cc5c3b4e2db8a00a57ede03906a2)
2011-03-14 13:35:49 +01:00
Michael Adam
9d20f76052 Add a tunable VacuumFastPathCount.
This will control how many fast-path vacuuming runs will have to
be done, before a full vacuuming will be triggered, i.e. one with
a db-traversal.

(This used to be ctdb commit 0d997ec7e61a7bee2cb05456f9c7d5e6f7a44797)
2011-03-14 13:35:47 +01:00
Michael Adam
cd061f3dee Add a delete_queue to the ctdb database context struct.
This list will be filled by the client using a new
delete control. The list will then be used to implement
a fast-path vacuuming that will traverse this list instead
of traversing the database.

(This used to be ctdb commit 9bbedf786b26bb074f668b31f29a9032af958673)
2011-03-14 13:35:45 +01:00
Michael Adam
f7eeb42219 add a new record flag CTDB_REC_FLAG_VACUUM_MIGRATED.
This is to be used internally. The purpose is to flag a record
as having been migrated by a VACUUM_MIGRATION, which is triggered by
a VACUUM_FETCH message as part of the vacuuming. The local store
routine will base its decision whether to delete or to store
the record (among other things) upon the value of this flag.

This flag should never be stored in the local database copies.

(This used to be ctdb commit dd2449c422f323f9b5485e45107a9cc5acc09e08)
2011-03-14 13:35:44 +01:00
Michael Adam
f3fbd31d85 call: Move definition of call flags down to the definition of the flags field.
(This used to be ctdb commit 86c844fb08a7fd33e94f56b8d5e43278120e1162)
2011-03-14 13:35:44 +01:00
Michael Adam
a2c11d6edc call: add new call flag CTDB_CALL_FLAG_VACUUM_MIGRATION
This is to be used when the CTDB_SRVID_VACUUM_FETCH message
triggers the migration of deleted records to the lmaster.
The lmaster can then delete records that have not been
migrated with data instead of storing them.

(This used to be ctdb commit 455cc6616e10b7f09589f9b87cb60f591bb502b0)
2011-03-14 13:35:44 +01:00
Ronnie Sahlberg
8acb677c9c Deferred attach : at early startup, defer any db attach calls until we are out of recovery.
(This used to be ctdb commit eeaabd579841f60ab2c5b004cbbb1f5de2bfe685)
2011-03-01 12:13:34 +11:00
Michael Adam
2bd04f0ff8 persistent: add ctdb_persistent_finish_trans3_commits().
This function walks all databases and checks for running trans3 commits.
It sends replies to all of them (with error code) and ends them.
To be called when a recovery finishes.

(This used to be ctdb commit 70ba153b532528bdccea70c5ea28972257f384c1)
2011-02-24 10:35:26 +01:00
Michael Adam
ace1efb878 persistent: add a ctdb_persistent_state member to the ctdb_db context.
To be used for tracking running transaction commits through recoveries.

(This used to be ctdb commit 1237e15df4af58a3d220eea42a4b75e21e65029f)
2011-02-24 10:35:25 +01:00
Ronnie Sahlberg
65f44e159f Add two new flags for the ltdb header.
One of which signals that the record has never been migrated to/from a node
while containing data.
This property "has never been migrated while non-zero" is important later
to provide heuristics on which records we might be able to purge
from the tdb files cheaply, i.e. without having to rely on the full-blown
database vacuum.

These records are believed to be very common and the pattern would look like
this :
1, no record exists at all.
2, client opens a file
3, samba requests the record for this file
4, an empty record is created on the LMASTER
5, the empty record is migrated to the DMASTER
6, samba writes a <sharemode> to the record locally and the record grows
7, client finishes working the file and closes the file
8, samba removes the sharemode and the record becomes empty again.
9, much later : vacuuming will delete the record

At stage 8, since the record has never been migrated onto a node while being
non-zero it would be safe, and much more efficient to just delete the record
completely from the database and hand it back to the LMASTER.

The flags occupy the same uint32_t as was previously used for laccessor/lacount
in the header. For now, make sure the flags only define/use the top 16 bits
of this field so that we are sure we don't collide with bits set to one
from previous generations of the ctdb cluster database prior to this
change in semantics of this word.

This is a rework of Michael's patch :
commit 2af1a47cbe1a608496c8caf3eb0c990eb7259a0d
Author: Michael Adam <obnox@samba.org>
Date:   Tue Nov 30 17:00:54 2010 +0100

    add a DEFAULT record flag and a MIGRATED_WITH_DATA record flag.

(This used to be ctdb commit e075670dee8e6ecaba54986f87a85be3d0528b6b)
2011-02-18 10:14:56 +11:00
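The "top 16 bits only" rule and the cheap-delete condition from the commit above can be illustrated with a small sketch. The macro names, the specific bit chosen for the flag, and the can_purge_cheaply() function are assumptions made for this example and are not taken from the ctdb headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values: flags live in the top 16 bits only, so stale
 * laccessor/lacount data in the low 16 bits is never read as a flag. */
#define SKETCH_REC_FLAGS_MASK           UINT32_C(0xFFFF0000)
#define SKETCH_FLAG_MIGRATED_WITH_DATA  UINT32_C(0x00010000)

/* An empty record that has never been migrated while holding data can
 * be deleted on the spot instead of waiting for a full vacuum run. */
static bool can_purge_cheaply(uint32_t header_word, size_t data_len)
{
    uint32_t flags = header_word & SKETCH_REC_FLAGS_MASK;

    /* Sanity check: the flag itself must lie inside the flag mask. */
    assert((SKETCH_FLAG_MIGRATED_WITH_DATA & ~SKETCH_REC_FLAGS_MASK) == 0);

    return data_len == 0 &&
           (flags & SKETCH_FLAG_MIGRATED_WITH_DATA) == 0;
}
```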
Ronnie Sahlberg
b57bd0f896 Remove LACOUNT and LACCESSOR and migrate the records immediately.
This concept didn't work out and it is really just as expensive as a full migration
anyway, without the benefit of caching the data for subsequent accesses.

Now, migrate the records immediately on first access.
This will be combined with a "cheap vacuum-lite" for special empty records to
prevent growth of databases.

Later extensions to mimic read-only behaviour of records will include proper shared read-only locking of database records, making the laccessor/lacount read-only access to the data obsolete anyway.

Removing this special case and the handling of lacount/laccessor makes the codepath where shared read-only locking will be implemented simpler, and frees up space in the ctdb_ltdb header for use by vacuuming flags as well as read-only locking flags.

(This used to be ctdb commit 155dd1f4885fe142c6f8bd09430f65daf8a17e51)
2011-02-18 10:08:32 +11:00
Ronnie Sahlberg
0f33605866 LockWait congestion.
Add a dlist to track all active lockwait child processes.
Every time we create a new lockwait handle, check if there is already an
active lockwait process for this database/key and if so,
send the new request straight to the overflow queue.

This means we will only have one active lockwait child process for a certain key,
even if there were thousands of fetch-lock requests for this key.

When the lockwait processing finishes for the original request, the processing in d_overflow() will automagically process all remaining keys as well.

Add back a --nosetsched argument to make it easier to run under gdb

(This used to be ctdb commit 3e9317a2e1f687b04bf51575d47fcd4faa6e6515)
2011-01-24 12:21:58 +11:00
Rusty Russell
e57362ecf4 ctdb_lockwait: create overflow queue.
Once we have more than 200 children waiting on a particular db, don't create
any more.  Just put them on an overflow queue, and when a child gets a lock
search that queue to see if others were after the same lock (they probably
were).

(This used to be ctdb commit 5e614e8cfd1e9a4b13035a0e400b7a60a745b510)
2011-01-24 12:21:50 +11:00
Ronnie Sahlberg
fcd98a7e59 LIBCTDB: add support for traverse
(This used to be ctdb commit 9463e04038ba36792583f83bd95c1af322dc283a)
2011-01-14 17:38:56 +11:00
Ronnie Sahlberg
c4006ce844 Add ctdb_fork() which will fork a child process and drop the real-time
scheduler for the child.

Use ctdb_fork() from callers where we don't want the child to be running
at real-time privilege.

(This used to be ctdb commit 58795a4c9e0624e20fa3e0023b65127053edd103)
2011-01-11 07:40:41 +11:00
Ronnie Sahlberg
ea0df6d882 Revert scheduling back to use real-time processes
Revert this patch:
commit 482c302d46e2162d0cf552f8456bc49573ae729d

We may need to use real-time processes for the main daemon and the recovery daemon to handle the cases where systems come under very high loads.

(This used to be ctdb commit 08bef9dcab6e4da15fc783f8624e5ed09aa060b5)
2011-01-11 07:40:35 +11:00
Ronnie Sahlberg
c69ada0090 add a new ctdb_ltdb function to delete a record in a normal database
(This used to be ctdb commit fe9070ec9be69e6a6fcbf9899e7ced24541c9c3a)
2010-12-07 15:32:30 +11:00
Ronnie Sahlberg
83e68b62dd delay loading the public ip address file until after we have started the transport and discovered our own pnn number
(This used to be ctdb commit 1b57fc866fc836b5dbd3ef7b646e5a0f4280e81e)
2010-11-10 14:55:24 +11:00
Ronnie Sahlberg
5f76f3c0e2 Add a new tunable : DisableIPFailover that when set to non-zero
will stop any ip reallocations at all from happening.

(This used to be ctdb commit d8d37493478a26c5f1809a5f3df89ffd6e149281)
2010-11-10 14:55:24 +11:00
Ronnie Sahlberg
5ef29f9f25 Update latency counters to show min/max and average
(This used to be ctdb commit 1919e949af4641ffe919123e44b02fb87c13ab9f)
2010-10-11 15:12:24 +11:00
Ronnie Sahlberg
3ba7ac13eb Create a tunable for how often to collect rolling statistics and initialize it to 1 second
(This used to be ctdb commit cb8c779bb5d9862abbe08919aa181a1a1b2bef18)
2010-09-30 15:00:57 +10:00