samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2024-12-23 17:34:34 +03:00

Author	SHA1	Message	Date
Martin Schwenke	bc5f0a2b65	ctdbd: Remove command-line option --debug-hung-script Use an environment variable instead. This just means that the initscript exports CTDB_DEBUG_HUNG_SCRIPT and the code checks for the environment variable. The justification for this simplification is that more debug options will be arriving soon and we want to handle them consistently without needing to add a command-line option for each. So, the convention will be to use an environment variable for each debug option. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 0581f9a84e58764d194f4e04064c2c5b393c348b)	2013-02-05 16:05:13 +11:00
Martin Schwenke	f2428cadd8	ctdbd: Remove debug_hung_script_ctx The only allocation against this context is by ctdb_fork_with_logging(). This memory is freed by ctdb_log_handler() anyway. There should be no memory leak. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 501461cc3e132d4adee9e91b5d4513a26bae2846)	2013-02-05 16:05:13 +11:00
Martin Schwenke	f2ba0e8a65	Logging: New function ctdb_log_ringbuffer_free() Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit a4f622e85168f59417c11705f1734e0352e1d44a)	2013-02-05 12:40:30 +11:00
Amitay Isaacs	4a6fa39ff9	daemon: Protect against double free of callback state while shutting down When CTDB is shut down and monitoring has been stopped, monitor_context gets freed and all the callback states hanging off it. This includes callback state for current_monitor, if the current monitor event has not yet finished. As a result, when the shutdown event is called, current_monitor->callback state is not NULL, but it's actually freed and it's a dangling reference. So before executing callback function and freeing callback state check if ctdb->monitor->monitor_context is not NULL. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 7d8546ee4353851f0543d0ca2c4c67cb0cc75aea)	2013-01-09 14:39:23 +11:00
Amitay Isaacs	30299c387f	daemon: On shutdown, destroy timed events that check if recoverd is active When CTDB is shutting down, recovery daemon is stopped, but the event that checks if recovery daemon is still alive is not destroyed. So recovery master is restarted during shutdown if CTDB daemon takes longer to shutdown. There are two processes that check if recovery daemon is working. 1. ctdb_check_recd() - which checks every 30 seconds if the recovery daemon process exists. 2. ctdb_recd_ping_timeout() - which is triggered when recovery daemon fails to ping CTDB daemon. Both the events are periodic and need to be destroyed when shutting down. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 746168df2e691058e601016110fae818c6a265c3)	2013-01-09 13:20:26 +11:00
Martin Schwenke	80a2bb84e7	ctdbd: Remove debug option --node-ip, use --listen instead This effectively reverts d96cb02c2c24f9eabbc53d3d38e90dea49cff3e0 Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 496387a585b2c5778c808cf02b8e1435abde4c3e)	2013-01-07 10:35:39 +11:00
Amitay Isaacs	a73f13ada7	daemon: Add a tunable to enable automatic database priority setting Samba versions 3.6.x and older do not set the database priority. This can cause deadlock between Samba and CTDB since the locking order of database will be different. A hack was added for automatic promotion of priority for specific databases to avoid deadlock. This code should not be invoked with Samba version 4.x which correctly specifies the priority for each database. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit 4a9e96ad3d8fc46da1cd44cd82309c1b54301eb7)	2013-01-05 01:14:57 +01:00
Amitay Isaacs	13518b9e33	daemon: Check if log_latency_ms is set before using it This fixes a bug where wrong variable is checked. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit f81e9add466b1d9b2796c09c6ba63b77296ea149)	2012-11-30 12:21:30 +11:00
Amitay Isaacs	442d9905fe	locking: Do not use RECLOCK for tracking DB locks and latencies RECLOCK is for recovery lock in CTDB. Do not override the meaning for tracking locks on databases. Database lock latency has nothing to do with recovery lock latency. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 54e24a151d2163954e5a2a1c0f41a2b5c19ae44b)	2012-11-14 15:51:59 +11:00
Amitay Isaacs	85c8deca3f	recoverd: Track the nodes that fail takeover run and set culprit count If any of the nodes fail takeover run (either due to timeout or failure to complete within takeover_timeout interval) from main loop, recovery master will give up trying takeover run with following message: "Unable to setup public takeover addresses. Try again later" And as a side-effect the monitoring is disabled on all the nodes. Before ctdb_takeover_run() is called from main loop, monitoring gets disabled via startrecovery event. Since ctdb_takeover_run() fails, it never runs recovered event and monitoring does not get re-enabled. In main_loop, ctdb_takeover_run() is called with a takeover_fail_callback. This callback will get called if any of the nodes fail in handling takeip/releaseip/ipreallocated events in ctdb_takeover_run(). Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit a5c6bb1fffb8dc3960af113957a1fd080cc7c245)	2012-11-14 10:59:54 +11:00
Martin Schwenke	db5dfe891c	recoverd: Add CTDB_SRVID_GETLOG and CTDB_SRVID_CLEARLOG These support getting and clearing logs from the ring-buffer in the recovery daemon. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit cbca233d1e03b2410e0bb63b936328d4a8b3c7b4)	2012-10-22 11:15:36 +11:00
Amitay Isaacs	bc126ccdd4	build: Set CTDB_PATH to /tmp/ctdb.socket if SOCKPATH is not defined When building samba with CTDB, if samba configure/waf does not support setting of SOCKPATH, fallback to /tmp/ctdb.socket. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit a9511cf5ecd5bc39b0070f0afa8ac4d4926c6cab)	2012-10-22 09:01:27 +11:00
David Disseldorp	8cbf1a00c4	Build: Set the default ctdb socket path at configure time The ctdb socket path currently defaults to /tmp/ctdb.socket and can be modified at runtime using the --socket=filename option, common to both ctdb and ctdbd binaries. This change allows the default path to be set at configure time using the --with-socketpath=FILE argument. When not specified, the default path remains /tmp/ctdb.socket, documentation remains unchanged as a result. Signed-off-by: David Disseldorp <ddiss@samba.org> (This used to be ctdb commit f92b9c83a2f39fba9a141417a88de96fc8c592ff)	2012-10-21 01:39:08 +11:00
Amitay Isaacs	a00e50e503	ctdbd: Replace lockwait with locking API and remove ctdb_lockwait.c Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 2126795153dacb255e441abcb36ee05107b6282a)	2012-10-20 02:48:44 +11:00
Amitay Isaacs	83306337df	ctdbd: locking: Provide non-blocking API for locking of TDB record/db/alldb This introduces a consistent API for handling locks on single record, complete db or all dbs. The locks are taken out in a child process. In cases of timeout, find the processes that currently hold the lock and log. Callback functions for locking requests take locked boolean to indicate whether the lock was successfully obtained or not. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 1af99cf0de9919dd89af1feab6d1bd18b95d82ff)	2012-10-20 02:48:44 +11:00
Amitay Isaacs	1011d10a51	common: Add routines to get process and lock information Currently these functions are implemented only for Linux. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit be4051326b0c6a0fd301561af10fd15a0e90023b)	2012-10-20 02:48:44 +11:00
Amitay Isaacs	ef79dc012e	header: Added DB statistics update macros Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit a0cdfae7438092f5c605f0608daa536be860b7fe)	2012-10-20 02:48:44 +11:00
Martin Schwenke	8d7562f3f8	common: Debug ctdb_addr_to_str() using new function ctdb_external_trace() We've seen this function report "Unknown family, 0" and then CTDB disappeared without a trace. If we can reproduce it then this might help us to debug it. The idea is that you do something like the following in /etc/sysconfig/ctdb: export CTDB_EXTERNAL_TRACE="/etc/ctdb/config/gcore_trace.sh" When we hit this error then we call out to gcore to get a core file so we can do forensics. This might block CTDB for a few seconds. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 7895bc003f087ab2f3181df3c464386f59bfcc39)	2012-10-18 20:05:42 +11:00
Martin Schwenke	4b4e4d8870	ctdbd: Stop takeovers and releases from colliding in mid-air There's a race here where release and takeover events for an IP can run at the same time. For example, a "ctdb deleteip" and a takeover initiated by the recovery daemon. The timeline is as follows: 1. The release code registers a callback to update the VNN. The callback is executed after the eventscripts run the releaseip event. 2. The release code calls the eventscripts for the releaseip event, removing IP from its interface. The takeover code "updates" the VNN saying that IP is on some iface.... even if/though the address is already there. 3. The release callback runs, removing the iface associated with IP in the VNN. The takeover code calls the eventscripts for the takeip event, adding IP to an interface. As a result, CTDB doesn't think it should be hosting IP but IP is on an interface. The recovery daemon fixes this later... but it shouldn't happen. This patch can cause some additional noise in the logs: Release of IP 10.0.2.133/24 on interface eth2 node:2 recoverd:We are still serving a public address '10.0.2.133' that we should not be serving. Removing it. Release of IP 10.0.2.133/24 rejected update for this IP already in flight recoverd:client/ctdb_client.c:2455 ctdb_control for release_ip failed recoverd:Failed to release local ip address In this case the node has started releasing an IP when the recovery daemon notices the address is still hosted and initiates another release. This noise is harmless but annoying. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit bfe16cf69bf2eee93c0d831f76d88bba0c2b96c2)	2012-10-11 12:10:45 +11:00
Martin Schwenke	79ea15bf96	ctdbd: New tunable NoIPTakeoverOnDisabled Stops the behaviour where unhealthy nodes can host IPs when there are no healthy nodes. Set this to 1 when an immediate complete outage is preferred when all nodes are unhealthy. The alternative (i.e. default) can lead to undefined behaviour when the shared filesystem is unavailable. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit a555940fb5c914b7581667a05153256ad7d17774)	2012-10-11 12:10:45 +11:00
Volker Lendecke	a68512c7d8	Correct include for ctdb_protocol.h With an old ctdb_protocol.h installed under /usr/local, ctdb will not compile because the <> form of include will find the header under /usr/local (This used to be ctdb commit c4f5a58471b206e2287c7958c7f29c1f1c0626ac)	2012-10-09 23:13:29 +11:00
Martin Schwenke	e05fc0e7b0	libctdb: add ctdb_getcapabilities() Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 140fafef23050d40d66f5b5558c7efcb78f80cd2)	2012-09-28 17:05:34 +10:00
Ronnie Sahlberg	d21337a0fb	Add new command to find which interface is located on (This used to be ctdb commit f07376309e70f5ccdb7de8453caacc71b451ab48)	2012-06-20 15:11:49 +10:00
Ronnie Sahlberg	59565c05cf	STATISTICS: Add tracking of the 10 hottest keys per database measured in hopcount and add mechanisms to dump it using the ctdb dbstatistics command (This used to be ctdb commit 8307c70ed98996b430c470e9641a09fdeeb81bd8)	2012-06-13 16:19:18 +10:00
Amitay Isaacs	7631830152	server: Replace BOOL datatype with bool, True/False with true/false Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 6e5cbe8fff71985e5a2fc16b7e9f2b868011ff5d)	2012-05-28 11:22:25 +10:00
Ronnie Sahlberg	e7d21834ae	RECOVER: When we pull databases during recovery, we used to reallocate the databuffer for each entry added. This would normally not be an issue, but for cases where memory is fragmented, this could start to cost significant cpu if we need to reallocate and move to a different region. Change this to instead preallocate, by default, 10MByte chunks to the data buffer. This significantly reduces the number of potential reallocate and move operations that may be required. Create a tunable to override/change how much preallocation should be used. (This used to be ctdb commit 1f262deaad0818f159f9c68330f7fec121679023)	2012-05-25 12:34:06 +10:00
Ronnie Sahlberg	26322d257d	DEBUG: Add checks for and print debug messages when 1) a database contains very many records, 2) when a database is very big, 3) when a single record is very big. Add tunables to control when to log these instances and allow it to be completely turned off by setting the threshold to 0 (This used to be ctdb commit 9ed58fef4991725f75509433496f4d5ffae0ae87)	2012-05-21 13:26:13 +10:00
Ronnie Sahlberg	dce5969d12	Debug: When scripts hang, we may need to collect additional data in order to debug why the script hung. Break this debug and datacollection out into an external script to make it easier to modify what data we need to collect. For now we only collect a pstree so we can see what part of the script we hung in. S1037271 (This used to be ctdb commit 6e68797af67bee36f2bad045f94806e7e98f27e9)	2012-05-17 10:29:03 +10:00
Ronnie Sahlberg	a57eba2bb4	Track all child processes so we never send a signal to an unrelated process (our child died and kernel wrapped the pid-space and reused the pid for a different process). Wrap all creation of child processes inside ctdb_fork() which is used to track all processes we have spawned. Capture SIGCHLD to track also which child processes have terminated. Wrap kill() inside ctdb_kill() and make sure that we never send a !0 signal to a child process pid that has already terminated (and might have been replaced with a (This used to be ctdb commit f73a4b1495830bcdd094a93732a89dd53b3c2f78)	2012-05-03 14:03:26 +10:00
Ronnie Sahlberg	a367fa6138	RELOADIPS: simplify the reloadips code a bit and also update the "read public address file" to not check if the address exists already locally when we read it from the child process, to stop it from spamming the logs with "We already host ..." messages (This used to be ctdb commit 334ea830f1bf33419f4a1e78f23afd41a852d0f4)	2012-05-01 15:34:26 +10:00
Ronnie Sahlberg	7a1aa560e7	Add new control to reload the public ip address file on a node Also add a method to use the recovery master/daemon to reload the public ips on all nodes in the cluster. Reloading the public ips on all nodes in the cluster is only supported if all nodes in the cluster are available and healthy. (This used to be ctdb commit 05603e914f8c12618d7e06943c0f7df207f645b0)	2012-05-01 10:48:08 +10:00
Amitay Isaacs	131d35d67d	includes: Move special tevent defines from tevent.h to includes.h This allows to build against system tevent library. Also include tevent header along with other common headers. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 9ae4389c2c959c5dcd8395fdae2b25ed7e1e873a)	2012-04-13 17:28:14 +10:00
Martin Schwenke	fbe64dec01	Undo damage done by d8d37493478a26c5f1809a5f3df89ffd6e149281 The implementation of DisableIPFailover got intermingled with --nopublicipcheck. This just looks wrong - Ronnie must have been having a bad day. :-) Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 5083b266dd68b292c4275505f3d1b878dbf12f11)	2012-03-22 15:34:52 +11:00
Ronnie Sahlberg	2456f77ca6	NoIPTakeover: change the tunable name for the "don't allow failing addresses over onto the node" to NoIPTakeover (This used to be ctdb commit 35592e618cfd827b6978af6332f80504f232c46a)	2012-03-22 11:05:15 +11:00
Ronnie Sahlberg	befa9df152	Make NoIPFailback a node local setting. Nodes that have NoIPFailback set to !0 can not takeover new ip addresses during failover. Remove the old global setting for this unused tunable and add it as a new node flag. This node flag is only valid/defined within the takeover subsystem in the recovery daemon. Add async functions to collect the NoIPFailback settings for each node. This will later be used to disqualify certain nodes from being takeover targets when we perform reallocation. (This used to be ctdb commit 668f3e88a9e5f598706952b7140547640c85a5ed)	2012-03-22 09:09:57 +11:00
Ronnie Sahlberg	fa3a06246a	STICKY: add prototype code to make records stick to a node to "calm" down if they are found to be very hot and accessed by a lot of clients. This can improve performance and stop clients from having to chase a rapidly migrating/bouncing record (This used to be ctdb commit d0d98f7e45e5084b81335b004d50bddc80cdc219)	2012-03-20 17:12:19 +11:00
Ronnie Sahlberg	e7e51ddb64	LACOUNT: Add back lacount mechanism to defer migrating a fetched/read copy until after default of 20 consecutive requests from the same node This can improve performance slightly on certain workloads where smbds frequently read from the same record (This used to be ctdb commit 035c0d981bde8c0eee8b3f24ba8e2dc817e5b504)	2012-03-20 12:26:22 +11:00
Ronnie Sahlberg	6a493a0b08	STATISTICS: add per-db hop count statistics (This used to be ctdb commit 1c976d83b1d7dac6f0ef81306774998e4c8b56a1)	2012-03-20 12:11:55 +11:00
Ronnie Sahlberg	c051f67d67	FETCH COLLAPSE : Change the fetch-lock collapse to collapse ALL fetches, including fetch-locks into a single command in flight per record. Also add a tunable to enable/disable this optimization for hot records (This used to be ctdb commit eafd7bbaaa5931546a96c8beae3cf9a39a49c925)	2012-03-20 11:39:00 +11:00
Ronnie Sahlberg	038c946e80	add max hop count buckets to see how bad hopcounts are (This used to be ctdb commit 7d3931298e6477d92f43652c3006b0c426cb1307)	2012-03-20 11:20:53 +11:00
Ronnie Sahlberg	f3600276fc	Add a tunable variable to control how long we defer after a ctdb addip until we force a rebalance and try to failback addresses onto this node Have it default to 300 seconds. (This used to be ctdb commit 49791db7dc74cffd7e88bd73091590cdc1909328)	2012-02-28 06:58:59 +11:00
Ronnie Sahlberg	ef2bd0b016	When adding ips to nodes, set up a deferred rebalance for the whole node to trigger after 60 seconds in case the normal ipreallocated is not sufficient to trigger rebalance. (This used to be ctdb commit 4340263b219d75c39f8de22abe3f6f1c1ee63ea2)	2012-02-28 06:56:04 +11:00
Ronnie Sahlberg	93ec9c589c	Eventscripts: remove the horrible horrible circular reference between state and callback since these two structures do not even share the same parent talloc context. Instead, tie them together via referencing a permanent linked list hung off the ctdb structure. (This used to be ctdb commit a95c02da6c67dc4bd8716b75318a4188301df6f9)	2012-02-23 06:49:47 +11:00
Ronnie Sahlberg	42e477b14e	READONLY: only send a control to schedule fast-vacuuming from child context iff we have a connection open to the main daemon. There are some child processes where we do not create a connection to the main daemon (switch_from_server_to_client()) because it is expensive to set up and we normally might not need to talk to the daemon at all via a domainsocket, but we might still want to call ctdb_ltdb_store() from such child processes. (This used to be ctdb commit 9e372a08c40087e6b5335aa298e94d88273566a5)	2012-02-21 07:03:44 +11:00
Ronnie Sahlberg	73f8be16c6	ReadOnly: add per-database statistics to view how many delegations/revokes we have (This used to be ctdb commit 751ed46197661eb841042ab6a02855a51dd0b17c)	2012-02-08 15:29:27 +11:00
Ronnie Sahlberg	1eafa68f0f	STATISTICS: add total counts for number of delegations and number of revokes Every time we give a delegation to another node we count this as one delegation. If the same record is delegated to several nodes we count one for each node. Every time a record has all its delegations revoked we count this as one revoke. (This used to be ctdb commit b098bcf8007be63889aaed640a951b0eeaa9d191)	2012-02-08 13:42:30 +11:00
Martin Schwenke	ed8a8ee966	libctdb - add ctdb_getvnnmap() Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit f6039eaece4224b866a98dd49010f278a7b3f015)	2012-02-06 16:00:23 +11:00
Ronnie Sahlberg	e648045499	Merge branch 'master' of ssh://git.samba.org/data/git/ctdb (This used to be ctdb commit 15d8ae8b0f80f95d7839528b8ac60aa0e2485c77)	2012-01-03 12:40:15 +11:00
Michael Adam	e04fad0ee4	vacuum: add new tunable VacuumInterval and mark Vacuum{Default,Min,Max}Interval obsolete And use VacuumInterval instead of VacuumDefaultInterval in the code. (This used to be ctdb commit 78530f40338f511a7cd1d33ada450905742bfa8f)	2011-12-23 17:39:02 +01:00
Michael Adam	a481ca711f	vacuum: add ctdb_local_remove_from_delete_queue() Pair-Programmed-With: Stefan Metzmacher <metze@samba.org> (This used to be ctdb commit a5065b42a98c709173503e02d217f97792878625)	2011-12-23 17:39:00 +01:00

687 Commits