samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2024-12-24 21:34:56 +03:00

Author	SHA1	Message	Date
Amitay Isaacs	abd5d7c8fd	ctdb-daemon: Use refactored tunable code Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2016-07-25 21:29:46 +02:00
Amitay Isaacs	d5e4a7abdc	ctdb-locking: Drop code for Samba 3.x compatibility Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2016-07-25 21:29:41 +02:00
Amitay Isaacs	93dcca2a5f	ctdb-recovery: Update timeout and number of retries during recovery The timeout RecoverTimeout (default 120) is used for control messages sent during the recovery. If any of the nodes does not respond to any of the recovery control messages for RecoverTimeout seconds, then it will cause a failure of recovery of a database. Recovery helper will retry the recovery for a database 5 times. In the worst case, if a database could not be recovered within 5 attempts, a total of 600 seconds would have passed. During this time period other timeouts will be triggered causing unnecessary failures as follows: 1. During the recovery, even though recoverd is processing events, it does not send a ping message to ctdb daemon. If a ping message is not received for RecdPingTimeout (default 60) seconds, then ctdb will count it as unresponsive recovery daemon. If the recovery daemon fails for RecdFailCount (default 10) times, then ctdb daemon will restart recovery daemon. So after 600 seconds, ctdb daemon will restart recovery daemon. 2. If ctdb daemon stays in recovery for RecoveryDropAllIPs (default 120), then it will drop all the public addresses. This will cause all SMB client to be disconnected unnecessarily. The released public addresses will not be taken over till the recovery is complete. To avoid dropping of IPs and restarting recovery daemon during a delayed recovery, adjust RecoverTimeout to 30 seconds and limit number of retries for recovering a database to 3. If we don't hear from a node for more than 25 seconds, then the node is considered disconnected. So 30 seconds is sufficient timeout for controls during recovery. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> Autobuild-User(master): Martin Schwenke <martins@samba.org> Autobuild-Date(master): Mon Jun 6 08:49:15 CEST 2016 on sn-devel-144	2016-06-06 08:49:15 +02:00
Amitay Isaacs	c41808e6d2	ctdb-tunables: Add new tunable RecBufferSizeLimit This will be used to limit the size of record buffer sent in newer controls for recovery and existing controls for vacuuming. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2016-03-25 03:26:14 +01:00
Volker Lendecke	deaab95b8d	ctdb: Fix CID 1356313 Explicit null dereferenced Signed-off-by: Volker Lendecke <vl@samba.org> Reviewed-by: Jeremy Allison <jra@samba.org>	2016-03-18 00:29:14 +01:00
Martin Schwenke	fa8bd41009	ctdb-tunables: Mark tunable DeferredRebalanceOnNodeAdd obsolete Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> Autobuild-User(master): Amitay Isaacs <amitay@samba.org> Autobuild-Date(master): Thu Mar 10 06:51:46 CET 2016 on sn-devel-144	2016-03-10 06:51:46 +01:00
Amitay Isaacs	bd23b43bfe	ctdb-tunables: Fix the implementation of LIST_TUNABLES control Do not assume the first tunable is not obsolete. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2016-03-10 03:34:18 +01:00
Amitay Isaacs	4a78200f43	ctdb-tunables: Mark tunable ReclockPingPeriod obsolete Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2016-03-10 03:34:18 +01:00
Amitay Isaacs	73ab0f9911	ctdb-tunables: Mark tunable MaxRedirectCount obsolete Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2016-03-10 03:34:18 +01:00
Amitay Isaacs	aa700deb64	ctdb-tunables: Add missing flags in the initializer Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2016-03-10 03:34:18 +01:00
Martin Schwenke	9166c30a41	ctdb-daemon: Rename EventScriptTimeoutCount to MonitorTimeoutCount This only applies to monitor events so renaming clarifies this. Note that this change is not backward compatible. Users with CTDB_SET_EventScriptTimeoutCount=<n> in their configuration will get failures when starting CTDB but the cause will be clearly logged. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-11-16 08:42:11 +01:00
Amitay Isaacs	f50db5cba5	ctdb-server: Replace ctdb_logging.h with common/logging.h Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Michael Adam <obnox@samba.org>	2015-11-16 00:46:15 +01:00
Amitay Isaacs	1278a5a4ed	ctdb-daemon: Rename struct ctdb_control_set_tunable to ctdb_tunable_old Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2015-11-04 00:47:15 +01:00
Amitay Isaacs	cb0be4126f	ctdb-daemon: Rename struct ctdb_tunable to ctdb_tunable_list Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2015-11-04 00:47:15 +01:00
Amitay Isaacs	4647787773	ctdb-daemon: Separate prototypes for common client/server functions This groups function prototypes for common client/server functions in common/common.h and removes them from ctdb_private.h. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2015-10-30 02:00:27 +01:00
Amitay Isaacs	01c6c90e98	ctdb-daemon: Remove dependency on includes.h Instead of includes.h, include the required header files explicitly. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2015-10-30 02:00:27 +01:00
Amitay Isaacs	62ba95a9f3	ctdb-daemon: Drop tunable that is no longer in use Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2015-03-27 06:40:08 +01:00
Martin Schwenke	54f0c39e5a	ctdb-client: Return a value of 1 when setting obsolete tunable variable Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-02-18 05:34:06 +01:00
Martin Schwenke	4d3b52f1ce	ctdb-daemon: Log a warning when setting obsolete tunables Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-02-13 07:19:07 +01:00
Martin Schwenke	d110fe2318	ctdb-daemon: Mark tunable VerifyRecoveryLock as obsolete It is pointless having a recovery lock but not sanity checking that it is working. Also, the logic that uses this tunable is confusing. In some places the recovery lock is released unnecessarily because the tunable isn't set. Simplify the logic by assuming that if a recovery lock is specified then it should be verified. Update documentation that references this tunable. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-02-13 07:19:07 +01:00
Amitay Isaacs	3ff8ec0283	ctdb-locking: Increase number of lock processes per database to 200 This was the original limit in the older versions of CTDB. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Volker Lendecke <vl@samba.org>	2014-08-04 17:59:52 +02:00
Amitay Isaacs	59d45ea307	ctdb-locking: Add new tunable LockProcessesPerDB This allows to change the maximum number of lock processes that can be active. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Volker Lendecke <vl@samba.org>	2014-08-04 17:59:52 +02:00
Amitay Isaacs	55fbe364b9	ctdb-daemon: Support per-node robust mutex feature To enable TDB mutex support, set tunable TDBMutexEnabled=1. When databases are attached for the first time, attach flags must include TDB_MUTEX_LOCKING and TDBMutexEnabled must set to enable mutex support. However, when CTDB attaches databases internally for recovery, it will enable mutex support if TDBMutexEnabled is set. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Stefan Metzmacher <metze@samba.org> Autobuild-User(master): Amitay Isaacs <amitay@samba.org> Autobuild-Date(master): Wed Jul 9 06:45:17 CEST 2014 on sn-devel-104	2014-07-09 06:45:17 +02:00
Amitay Isaacs	41d37058ca	tunables: Remove obsolete tunables Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit ca5fc3431573c44d55d09d987c715fb53756fc1f)	2013-10-30 15:37:11 +11:00
Amitay Isaacs	fc7f335843	daemon: Change the default recovery method for persistent databases Use sequence numbers to do recovery for persistent databases instead of RSNs. This fixes the problem of registry corruption during recovery. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 56486d1c01cc8ad0e4b8cee7a22429e72e50f03d)	2013-10-28 18:51:22 +11:00
Amitay Isaacs	1467b666f2	Revert "LACOUNT: Add back lacount mechanism to defer migrating a fetched/read copy until after default of 20 consecutive requests from the same node" This reverts commit 035c0d981bde8c0eee8b3f24ba8e2dc817e5b504. This is a premature optimization. Record can bounce between nodes very quickly if it is a contended record. There is no need to hold a record on a node unnecessarily. In case record contention becomes bad, enabling sticky records on a database is a better idea. Conflicts: include/ctdb_private.h server/ctdb_tunables.c Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit ac417b0003f0116f116834ad2ac51482d25cfa0d)	2013-08-22 14:08:52 +10:00
Amitay Isaacs	d46c24f4d0	ctdbd: No need for DeadlockTimeout tunable The code for deadlock detection and killing smbd process causing deadlock has been removed and replaced with external debug script. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 2211cd94bea266547d3e6f167d3160a6b23bec88)	2013-07-10 14:33:18 +10:00
Martin Schwenke	140f0cfd3b	ctdbd: Update the get_tunable code to return -EINVAL for unknown tunable Otherwise callers can't tell the difference between some other failure (e.g. memory allocation failure) and an unknown tunable. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 03fd90d41f9cd9b8c42dc6b8b8d46ae19101a544)	2013-05-24 16:04:50 +10:00
Martin Schwenke	0445c988e2	recoverd: Fix tunable NoIPTakeoverOnDisabled, rename to NoIPHostOnAllDisabled This really needs to be per-node. The rename is because nodes with this tunable switched on should drop IPs if they become unhealthy (or disabled in some other way). * Add new flag NODE_FLAGS_NOIPHOST, only used in recovery daemon. * Enhance set_ipflags_internal() and set_ipflags() to setup NODE_FLAGS_NOIPHOST depending on setting of NoIPHostOnAllDisabled and/or whether nodes are disabled/inactive. * Replace can_node_servce_ip() with functions can_node_host_ip() and can_node_takeover_ip(). These functions are the only ones that need to look at NODE_FLAGS_NOIPTAKEOVER and NODE_FLAGS_NOIPHOST. They can make the decision without looking at any other flags due to previous setup. * Remove explicit flag checking in IP allocation functions (including unassign_unsuitable_ips()) and just call can_node_host_ip() and can_node_takeover_ip() as appropriate. * Update test code to handle CTDB_SET_NoIPHostOnAllDisabled. Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 1308a51f73f2e29ba4dbebb6111d9309a89732cc)	2013-05-07 16:20:46 +10:00
Amitay Isaacs	1d3eebbca4	ctdbd: Fix the PullDBPreallocation size to 10MB as intended In 1f262deaad0818f159f9c68330f7fec121679023, Ronnie changed recovery code to allocate chunks of 10MB in traverse_pulldb() and traverse_recdb(). The tunable PullDBPreallocation size was set to 100MB. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit e204fac03412520e877ab04363b3ece02667c55b)	2013-02-14 09:40:35 +11:00
Amitay Isaacs	a73f13ada7	daemon: Add a tunable to enable automatic database priority setting Samba versions 3.6.x and older do not set the database priority. This can cause deadlock between Samba and CTDB since the locking order of database will be different. A hack was added for automatic promotion of priority for specific databases to avoid deadlock. This code should not be invoked with Samba version 4.x which correctly specifies the priority for each database. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit 4a9e96ad3d8fc46da1cd44cd82309c1b54301eb7)	2013-01-05 01:14:57 +01:00
Amitay Isaacs	83306337df	ctdbd: locking: Provide non-blocking API for locking of TDB record/db/alldb This introduces a consistent API for handling locks on single record, complete db or all dbs. The locks are taken out in a child process. In cases of timeout, find the processes that currently hold the lock and log. Callback functions for locking requests take locked boolean to indicate whether the lock was successfully obtained or not. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 1af99cf0de9919dd89af1feab6d1bd18b95d82ff)	2012-10-20 02:48:44 +11:00
Martin Schwenke	79ea15bf96	ctdbd: New tunable NoIPTakeoverOnDisabled Stops the behaviour where unhealthy nodes can host IPs when there are no healthy nodes. Set this to 1 when an immediate complete outage is preferred when all nodes are unhealthy. The alternative (i.e. default) can lead to undefined behaviour when the shared filesystem is unavailable. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit a555940fb5c914b7581667a05153256ad7d17774)	2012-10-11 12:10:45 +11:00
Ronnie Sahlberg	bbd33d6394	RECOVERY: Increase the time we allow before timing out recovery related tasks. If the system is temporarily taking unusually long to perform these tasks it is better to wait a lot longer and allow the tasks to complete than timing out repeatedly and then becomming banned. (This used to be ctdb commit 03fa2a517247eb2adfba67248e2466f17ea14418)	2012-05-25 12:34:14 +10:00
Ronnie Sahlberg	e7d21834ae	RECOVER: When we pull databases during recovery, we used to reallocate the databuffer for each entry added. This would normally not be an issue, but for cases where memory is fragmented, this could start to cost significant cpu if we need to reallocate and move to a different region. Change this to instead preallocate , by default, 10MByte chunks to the data buffer. This significantly reduces the number of potential reallocate and move operations that may be required. Create a tunable to override/change how much preallocation should be used. (This used to be ctdb commit 1f262deaad0818f159f9c68330f7fec121679023)	2012-05-25 12:34:06 +10:00
Ronnie Sahlberg	26322d257d	DEBUG: Add checks for and print debug messages when 1) a database contains very many records, 2) when a database is very big, 3) when a single record is very big. Add tunables to control when to log these instances and allow it to be completely turned off by setting the threshold to 0 (This used to be ctdb commit 9ed58fef4991725f75509433496f4d5ffae0ae87)	2012-05-21 13:26:13 +10:00
Ronnie Sahlberg	dce5969d12	Debug: When scripts hang, we may need to collect additional data in order to debug why the script hung. Break this debug and datacollection out into an external script to make it easier to modify what data we need to collect. For now we only collect a pstree so we can see what part of the script we hung in. S1037271 (This used to be ctdb commit 6e68797af67bee36f2bad045f94806e7e98f27e9)	2012-05-17 10:29:03 +10:00
Ronnie Sahlberg	2456f77ca6	NoIPTakeover: change the tunable name for the "dont allow failing addresses over onto the node" to NoIPTakeover (This used to be ctdb commit 35592e618cfd827b6978af6332f80504f232c46a)	2012-03-22 11:05:15 +11:00
Ronnie Sahlberg	fa3a06246a	STICKY: add prototype code to make records stick to a node to "calm" down if they are found to be very hot and accessed by a lot of clients. This can improve performance and stop clients from having to chase a rapidly migrating/bouncing record (This used to be ctdb commit d0d98f7e45e5084b81335b004d50bddc80cdc219)	2012-03-20 17:12:19 +11:00
Ronnie Sahlberg	e7e51ddb64	LACOUNT: Add back lacount mechanism to defer migrating a fetched/read copy until after default of 20 consecutive requests from the same node This can improve performance slightly on certain workloads where smbds frequently read from the same record (This used to be ctdb commit 035c0d981bde8c0eee8b3f24ba8e2dc817e5b504)	2012-03-20 12:26:22 +11:00
Ronnie Sahlberg	c051f67d67	FETCH COLLAPSE : Change the fetch-lock collapse to collapse ALL fetches, including fetch-locks into a single command in flight per record. Also add a tunable to enable/disable this optimization for hot records (This used to be ctdb commit eafd7bbaaa5931546a96c8beae3cf9a39a49c925)	2012-03-20 11:39:00 +11:00
Ronnie Sahlberg	e086333743	Vacuuming: change default timeout to 120 seconds (This used to be ctdb commit 5ae94c6b9b3000a6c79fccaaea1e007ebd5be1a9)	2012-02-29 12:29:22 +11:00
Ronnie Sahlberg	f3600276fc	Add a tunable variable to control how long we defer after a ctdb addip until we force a rebalance and try to failback addresses onto this node Have it default to 300 seconds. (This used to be ctdb commit 49791db7dc74cffd7e88bd73091590cdc1909328)	2012-02-28 06:58:59 +11:00
Michael Adam	8b89e542e1	tunables: don't list obsolete tunables in the list_tunables control (This used to be ctdb commit d8ab86f0eb11437e50d18183858dd3177a8f61e6)	2011-12-23 17:39:10 +01:00
Michael Adam	25ac808b07	tunables: add a bool obsolete flag to the tunable_map list (This used to be ctdb commit 1a7d9b25fdcf7b59598618d406c2a681c90d9163)	2011-12-23 17:39:09 +01:00
Michael Adam	e04fad0ee4	vacuum: add new tunable VacuumInterval and mark Vacuum{Default,Min,Max}Interval obsolete And use VacuumInterval instead of VacuumDefaultInterval in the code. (This used to be ctdb commit 78530f40338f511a7cd1d33ada450905742bfa8f)	2011-12-23 17:39:02 +01:00
Ronnie Sahlberg	0420449a6c	Recover Persistent database DB by DB and not record by record Add a new tunable that changes the mode how persistent databases are recovered. RecoveryPDBBySeqNum When set to 1, persistent databases will be recovered in whole from the node which has the highest "__db_sequence_number__" record. This record is managed by samba for those databases where we do persistent writes and have inter-record relations. For these databases we do not want the usual "blend records from all nodes based on individual record RSN" but instead a mode where we pick one instance of the persistent database. If no node was found with a "__db_sequence_number__" record at all, we fail back to the original "recover records independently based on record RSN". Some persistent databases do not contain record interrelations and as such does not contain this special record at all. (This used to be ctdb commit 502150c764298a9fa8c4d8aa445bf7d85d4ee9dc)	2011-11-30 08:48:23 +11:00
Michael Adam	a3e0079568	Add a tunable "AllowClientDBAttach" with default value 1. When set to 0, clients will not be able to attach to databases via the db_attach control. This might can be useful for maintenance where ctdb should be kept running but clients should not be able to modify databases. (This used to be ctdb commit ddfeecda87955b4e46777599f678e6926d37f4c4)	2011-09-05 16:17:39 +10:00
Ronnie Sahlberg	d1f5177374	Change the default for ip failover to be LCP2 and not DeterministicIPs (This used to be ctdb commit 038916248a73d6a250108c9235c0c4f76dba8e0c)	2011-08-15 10:43:42 +10:00
Martin Schwenke	ff1a81c872	IP allocation - add LCP2 algorithm. The current non-deterministic IP allocation algorithm balances IPs across the whole cluster. It does not consider different interfaces/VLANs/subnets, so these different groups of IPs aren't generally well balanced. This adds the LCP2 algorithm for IP allocation and allows it to be enabled by setting the "LCP2PublicIPs" tunable to 1. The LCP2 algorithm calculates the imbalance of a node by totalling the squares of the distances between each IP on the node. The IP distance is defined as the length longest common prefix (LCP) of bits that is found when comparing 2 IPs. The imbalance of a cluster is the maximum imbalance for any node. At each step the algorithm selects an allocation to the IP/node combination that results in the choosing the allocation that best reduces the imbalance of the cluster. The implementation splits out the IP allocation part of ctdb_takeover_run() into new function ctdb_takeover_run_core(), and then extracts out the basic IP assignment code into new functions basic_allocate_unassigned() and basic_failback(). 3 new functions lcp2_init(), lcp2_allocate_unassigned() and lcp2_failback() implement the LCP2 algorithm, and are hooked into ctdb_takeover_run_core(). Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 61fc7fbd0235469df22deb6581c6bd47e30bc0be)	2011-07-29 09:01:17 +10:00

1 2 3

106 Commits