samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2024-12-22 13:34:15 +03:00

Author	SHA1	Message	Date
Martin Schwenke	97a45f6f25	ctdb-recoverd: Add log reopening on SIGHUP to helpers Recovery and takeover helpers can run for a while and generate non-trivial logs. They should support log reopening. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2022-01-17 03:43:30 +00:00
Martin Schwenke	2efce7d477	ctdb-recovery: Simplify database push function names Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2020-09-11 05:06:42 +00:00
Martin Schwenke	f4e2206e88	ctdb-recovery: Drop unnecessary database push wrapper Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2020-09-11 05:06:42 +00:00
Martin Schwenke	225a699633	ctdb-recovery: Drop passing of capabilities into database pull This is no longer necessary because the capability new style database pull is assumed to always be available. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2020-09-11 05:06:42 +00:00
Martin Schwenke	595c1a7c0f	ctdb-recovery: Simplify database pull function names Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2020-09-11 05:06:42 +00:00
Martin Schwenke	f968576642	ctdb-recovery: Remove use of old pull and push controls Removes use of the old controls without cleaning up the code. Clean up can be done later. After this change the CTDB_CAP_FRAGMENTED_CONTROLS capability is no longer checked. This capability can be removed along with the controls. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2020-09-11 05:06:42 +00:00
Ralph Boehme	2327471756	lib: relicense smb_strtoul(l) under LGPLv3 Signed-off-by: Ralph Boehme <slow@samba.org> Reviewed-by: Swen Schillig <swen@linux.ibm.com> Reviewed-by: Volker Lendecke <vl@samba.org> Autobuild-User(master): Jeremy Allison <jra@samba.org> Autobuild-Date(master): Mon Aug 3 22:21:04 UTC 2020 on sn-devel-184	2020-08-03 22:21:02 +00:00
Martin Schwenke	76a8174279	ctdb-recovery: Create database on nodes where it is missing BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2020-03-23 23:45:38 +00:00
Martin Schwenke	e6e63f8fb8	ctdb-recovery: Fetch database name from all nodes where it is attached BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2020-03-23 23:45:38 +00:00
Martin Schwenke	1bdfeb3fdc	ctdb-recovery: Pass db structure for each database recovery Instead of db_id and db_flags. BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2020-03-23 23:45:38 +00:00
Martin Schwenke	c6f74e590f	ctdb-recovery: GET_DBMAP from all nodes This builds a complete list of databases across the cluster so it can be used to create databases on the nodes where they are missing. BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2020-03-23 23:45:38 +00:00
Martin Schwenke	4c0b9c3605	ctdb-recovery: Replace use of ctdb_dbid_map with local db_list This will be used to build a merged list of databases from all nodes, allowing the recovery helper to create missing databases. It would be possible to also include the db_name field in this structure but that would cause a lot of churn. This field is used locally in the recovery of each database so can continue to live in the relevant state structure(s). BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2020-03-23 23:45:38 +00:00
Amitay Isaacs	1c56d6413f	ctdb-recovery: Refactor banning a node into separate computation If a node is marked for banning, confirm that it's not become inactive during the recovery. If yes, then don't ban the node. BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294 Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2020-03-23 23:45:37 +00:00
Amitay Isaacs	c6a0ff1bed	ctdb-recovery: Don't trust nodemap obtained from local node It's possible to have a node stopped, but recovery master not yet updated flags on the local ctdb daemon when recovery is started. So do not trust the list of active nodes obtained from the local node. Query the connected nodes to calculate the list of active nodes. BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294 Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2020-03-23 23:45:37 +00:00
Amitay Isaacs	6e2f8756f1	ctdb-recovery: Consolidate node state This avoids passing multiple arguments to async computation. BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294 Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2020-03-23 23:45:37 +00:00
Amitay Isaacs	072ff4d12b	ctdb-recovery: Fetched vnnmap is never used, so don't fetch it New vnnmap is constructed using the information from all the connected nodes. So there is no need to fetch the vnnmap from recovery master. BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294 Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2020-03-23 23:45:37 +00:00
Swen Schillig	73640b8ad8	ctdb: Update all consumers of strtoul_err(), strtoull_err() to new API Signed-off-by: Swen Schillig <swen@linux.ibm.com> Reviewed-by: Ralph Boehme <slow@samba.org> Reviewed-by: Christof Schmitt <cs@samba.org>	2019-06-30 11:32:18 +00:00
Martin Schwenke	90622ab901	ctdb-recovery: Fix signed/unsigned comparisons by declaring as unsigned Simple cases where variables and function parameters need to be declared as an unsigned type instead of an int. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-06-05 10:25:50 +00:00
Amitay Isaacs	278eb236ae	ctdb-daemon: Fix maybe-uninitialized error with picky developer [263/386] Compiling ctdb/server/ctdb_recovery_helper.c In file included from ../../server/ctdb_recovery_helper.c:24:0: ../../server/ctdb_recovery_helper.c: In function ‘main’: ../../../lib/talloc/talloc.h:911:34: error: ‘mem_ctx’ may be used uninitialized in this function [-Werror=maybe-uninitialized] #define TALLOC_FREE(ctx) do { if (ctx != NULL) { talloc_free(ctx); ctx=NULL; } } while(0) Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Jeremy Allison <jra@samba.org>	2019-03-01 17:21:15 +00:00
Swen Schillig	55acae774a	ctdb-server: Use wrapper for string to integer conversion In order to detect a value overflow error during the string to integer conversion with strtoul/strtoull, the errno variable must be set to zero before the execution and checked after the conversion is performed. This is achieved by using the wrapper functions strtoul_err and strtoull_err. Signed-off-by: Swen Schillig <swen@linux.ibm.com> Reviewed-by: Ralph Böhme <slow@samba.org> Reviewed-by: Jeremy Allison <jra@samba.org>	2019-03-01 00:32:11 +00:00
Martin Schwenke	27df4f002a	ctdb-recovery: Ban a node that causes recovery failure ... instead of applying banning credits. There have been a couple of cases where recovery repeatedly takes just over 2 minutes to fail. Therefore, banning credits expire between failures and a continuously problematic node is never banned, resulting in endless recoveries. This is because it takes 2 applications of banning credits before a node is banned, which generally involves 2 recovery failures. The recovery helper makes up to 3 attempts to recover each database during a single run. If a node causes 3 failures then this is really equivalent to 3 recovery failures in the model that existed before the recovery helper added retries. In that case the node would have been banned after 2 failures. So, instead of applying banning credits to the "most failing" node, simply ban it directly from the recovery helper. If multiple nodes are causing recovery failures then this can cause a node to be banned more quickly than it might otherwise have been, even pre-recovery-helper. However, 90 seconds (i.e. 3 failures) is a long time to be in recovery, so banning earlier seems like the best approach. BUG: https://bugzilla.samba.org/show_bug.cgi?id=13670 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> Autobuild-User(master): Amitay Isaacs <amitay@samba.org> Autobuild-Date(master): Mon Nov 5 06:52:33 CET 2018 on sn-devel-144	2018-11-05 06:52:33 +01:00
Martin Schwenke	7dbf833697	ctdb: Fix some -Werror=strict-overflow issues All quite obvious. For the LCP2 one, we're not actually counting so use a bool instead of an int. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2018-04-27 06:53:16 +02:00
Amitay Isaacs	de3f0d889b	ctdb-recovery-helper: Deregister message handler in error paths BUG: https://bugzilla.samba.org/show_bug.cgi?id=13188 If PULL_DB control times out but the remote node is still sending the data, then the tevent_req for pull_database_send will be freed without removing the message handler. So when the data is received, srvid handler will be called and it will try to access tevent_req which will result in use-after-free and abort. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2017-12-13 08:48:18 +01:00
Amitay Isaacs	676df8770b	ctdb-protocol: Fix marshalling for ctdb_rec_buffer Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2017-08-30 14:59:23 +02:00
Amitay Isaacs	b8a0420d10	ctdb-daemon: Add implementation for CTDB_CONTROL_DB_ATTACH_REPLICATED control Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2017-06-29 10:34:27 +02:00
Amitay Isaacs	1e10f224ff	ctdb-recovery: Use db_flags instead of a boolean persistent flag Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2017-06-29 10:34:27 +02:00
Amitay Isaacs	c9d9f56bff	ctdb-recovery: Assign banning credits if database fails to freeze https://bugzilla.samba.org/show_bug.cgi?id=12857 Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2017-06-24 10:28:21 +02:00
Amitay Isaacs	6ebcba49d0	ctdb-recovery: Delete empty records during recovery Persistent databases are now always recovered by sequence number. So there is no need to keep the empty records in the database since they will never be recovered record-by-record using RSN. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> Autobuild-User(master): Martin Schwenke <martins@samba.org> Autobuild-Date(master): Sat Jun 17 16:47:55 CEST 2017 on sn-devel-144	2017-06-17 16:47:55 +02:00
Amitay Isaacs	40cc7a1eb3	ctdb-recovery: Log messages at various debug levels This avoids spamming the logs during recovery at NOTICE level. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> Autobuild-User(master): Martin Schwenke <martins@samba.org> Autobuild-Date(master): Tue Jun 13 13:22:09 CEST 2017 on sn-devel-144	2017-06-13 13:22:09 +02:00
Amitay Isaacs	41c964fdbc	ctdb-recovery: Start recovery helper with ctdb_vfork_exec The recovery helper does its own logging, so there is no need to pass logfd. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> Autobuild-User(master): Martin Schwenke <martins@samba.org> Autobuild-Date(master): Mon Dec 5 11:59:42 CET 2016 on sn-devel-144	2016-12-05 11:59:42 +01:00
Martin Schwenke	bdc049dfce	ctdb-common: Drop CTDB's copy of sys_read() and sys_write() Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> Autobuild-User(master): Amitay Isaacs <amitay@samba.org> Autobuild-Date(master): Tue Nov 29 11:22:40 CET 2016 on sn-devel-144	2016-11-29 11:22:40 +01:00
Amitay Isaacs	f2414841f2	ctdb-daemon: Mark RecoverPDBBySeqNum tunable deprecated Persistent databases are now always recovered by sequence number, so there is no need for this tunable. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> Autobuild-User(master): Martin Schwenke <martins@samba.org> Autobuild-Date(master): Fri Nov 25 08:13:59 CET 2016 on sn-devel-144	2016-11-25 08:13:59 +01:00
Amitay Isaacs	54e392b385	ctdb-recovery: Avoid NULL dereference in failure case BUG: https://bugzilla.samba.org/show_bug.cgi?id=12434 Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Volker Lendecke <vl@samba.org> Autobuild-User(master): Volker Lendecke <vl@samba.org> Autobuild-Date(master): Mon Nov 21 12:26:04 CET 2016 on sn-devel-144	2016-11-21 12:26:04 +01:00
Amitay Isaacs	6b93b57921	ctdb-recovery-helper: Add missing initialisation of ban_credits BUG: https://bugzilla.samba.org/show_bug.cgi?id=12275 Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2016-09-19 08:23:22 +02:00
Amitay Isaacs	f1a8fb11dd	ctdb-recovery-helper: Fix format-nonliteral warning ... and printf format errors. BUG: https://bugzilla.samba.org/show_bug.cgi?id=12137 Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Uri Simchoni <uri@samba.org>	2016-08-10 08:18:16 +02:00
Amitay Isaacs	600cec4d44	ctdb-recovery: Terminate if recovery fails without any banning credits In case of database recovery failure, if there are no banning credits assigned, then the async computation is never terminated. The else condition is missing in (max_credits >= NUM_RETRIES) check. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> Autobuild-User(master): Martin Schwenke <martins@samba.org> Autobuild-Date(master): Fri Jun 24 09:56:23 CEST 2016 on sn-devel-144	2016-06-24 09:56:23 +02:00
Amitay Isaacs	1847556562	ctdb-recovery-helper: Fix a comment The sequence of events is incorrectly documented. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2016-06-24 05:59:08 +02:00
Amitay Isaacs	93dcca2a5f	ctdb-recovery: Update timeout and number of retries during recovery The timeout RecoverTimeout (default 120) is used for control messages sent during the recovery. If any of the nodes does not respond to any of the recovery control messages for RecoverTimeout seconds, then it will cause a failure of recovery of a database. Recovery helper will retry the recovery for a database 5 times. In the worst case, if a database could not be recovered within 5 attempts, a total of 600 seconds would have passed. During this time period other timeouts will be triggered causing unnecessary failures as follows: 1. During the recovery, even though recoverd is processing events, it does not send a ping message to ctdb daemon. If a ping message is not received for RecdPingTimeout (default 60) seconds, then ctdb will count it as unresponsive recovery daemon. If the recovery daemon fails for RecdFailCount (default 10) times, then ctdb daemon will restart recovery daemon. So after 600 seconds, ctdb daemon will restart recovery daemon. 2. If ctdb daemon stays in recovery for RecoveryDropAllIPs (default 120), then it will drop all the public addresses. This will cause all SMB client to be disconnected unnecessarily. The released public addresses will not be taken over till the recovery is complete. To avoid dropping of IPs and restarting recovery daemon during a delayed recovery, adjust RecoverTimeout to 30 seconds and limit number of retries for recovering a database to 3. If we don't hear from a node for more than 25 seconds, then the node is considered disconnected. So 30 seconds is sufficient timeout for controls during recovery. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> Autobuild-User(master): Martin Schwenke <martins@samba.org> Autobuild-Date(master): Mon Jun 6 08:49:15 CEST 2016 on sn-devel-144	2016-06-06 08:49:15 +02:00
Amitay Isaacs	c51b8c2234	ctdb-recovery-helper: Add banning to parallel recovery If one or more nodes are misbehaving during recovery, keep track of failures as ban_credits. If the node with the highest ban_credits exceeds 5 ban credits, then tell recovery daemon to assign banning credits. This will ban only a single node at a time in case of recovery failure. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> Autobuild-User(master): Martin Schwenke <martins@samba.org> Autobuild-Date(master): Fri Mar 25 06:57:32 CET 2016 on sn-devel-144	2016-03-25 06:57:32 +01:00
Amitay Isaacs	ad7a407a13	ctdb-recovery-helper: Introduce new #define variable ... instead of hardcoding number of retries. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2016-03-25 03:26:16 +01:00
Amitay Isaacs	e5a714a3c2	ctdb-recovery-helper: Improve log message Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2016-03-25 03:26:16 +01:00
Amitay Isaacs	ffea827bae	ctdb-recovery-helper: Introduce push database abstraction This abstraction uses capabilities of the remote nodes to either send older PUSH_DB controls or newer DB_PUSH_START and DB_PUSH_CONFIRM controls. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2016-03-25 03:26:15 +01:00
Amitay Isaacs	b96a4759b3	ctdb-recovery-helper: Introduce pull database abstraction This abstraction depending on the capability of the remote node either uses older PULL_DB control or newer DB_PULL control. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2016-03-25 03:26:15 +01:00
Amitay Isaacs	e1fdfdd1c1	ctdb-recovery-helper: Write recovery records to a recovery file Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2016-03-25 03:26:15 +01:00
Amitay Isaacs	9058fe06df	ctdb-recovery-helper: Re-factor function to retain records from recdb Also, rename traverse function and traverse state for recdb_records consistently. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2016-03-25 03:26:15 +01:00
Amitay Isaacs	a80ff09ed3	ctdb-recovery-helper: Create accessors for recdb structure fields Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2016-03-25 03:26:15 +01:00
Amitay Isaacs	70011a1bfb	ctdb-recovery-helper: Rename pnn to dmaster in recdb_records() This variable is used to set the dmaster value for each record in recdb_traverse(). Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2016-03-25 03:26:15 +01:00
Amitay Isaacs	5b926d882e	ctdb-recovery-helper: Pass capabilities to database recovery functions Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2016-03-25 03:26:15 +01:00
Amitay Isaacs	5f43f92796	ctdb-recovery-helper: Factor out generic recv function Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2016-03-25 03:26:15 +01:00
Amitay Isaacs	700f39372a	ctdb-recovery-helper: Get tunables first, so control timeout can be set During the recovery process, the timeout value for sending all controls is decided by RecoverTimeout tunable. So in the recovery process, first get the tunables, so the control timeout gets set correctly. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2016-03-10 03:34:18 +01:00
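
Commit 55acae774a above describes the errno handling needed to detect overflow in strtoul/strtoull. The following is a minimal sketch of that pattern only. The helper name `parse_ulong` and its signature are hypothetical, chosen for illustration; it is not the strtoul_err() or smb_strtoul() code from the Samba tree.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not Samba's wrapper): parse an unsigned long
 * and report failure through the return value.
 */
static bool parse_ulong(const char *s, int base, unsigned long *out)
{
	char *end = NULL;
	unsigned long val;

	/* strtoul() does not reset errno on success, so clear it first. */
	errno = 0;
	val = strtoul(s, &end, base);

	if (errno != 0) {
		/* ERANGE on overflow; some libcs set EINVAL for a bad base. */
		return false;
	}
	if (end == s) {
		/* No digits were consumed at all. */
		return false;
	}

	*out = val;
	return true;
}
```

With this sketch, `parse_ulong("42", 10, &v)` succeeds and stores 42, while a digit string too large for unsigned long is rejected instead of being silently clamped to ULONG_MAX.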


55 Commits
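
The timeout reasoning in commit 93dcca2a5f can be checked with a little arithmetic. The sketch below is a simplified model, not code from the recovery helper: the macro and function names are illustrative assumptions, and the numeric defaults are the ones quoted in that commit message.

```c
#include <stdbool.h>

/* Defaults quoted in the commit message (seconds, except the fail count). */
#define RECD_PING_TIMEOUT      60
#define RECD_FAIL_COUNT        10
#define RECOVERY_DROP_ALL_IPS 120

/* Worst-case seconds spent retrying the recovery of one database. */
static unsigned int worst_case_recovery(unsigned int recover_timeout,
					unsigned int num_retries)
{
	return recover_timeout * num_retries;
}

/*
 * True if the worst case finishes before public IPs are dropped and
 * before the ctdb daemon would restart the recovery daemon.
 */
static bool avoids_side_effects(unsigned int recover_timeout,
				unsigned int num_retries)
{
	unsigned int worst = worst_case_recovery(recover_timeout, num_retries);
	unsigned int recd_restart = RECD_PING_TIMEOUT * RECD_FAIL_COUNT;

	return worst < RECOVERY_DROP_ALL_IPS && worst < recd_restart;
}
```

Under this model the old settings (120 seconds, 5 retries) give a 600 second worst case, which reaches both thresholds, while the new settings (30 seconds, 3 retries) give 90 seconds, which stays under both.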