samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2024-12-23 17:34:34 +03:00

Author	SHA1	Message	Date
Martin Schwenke	41a41d5f3e	ctdb-daemon: Implement DB_VACUUM control Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-10-24 04:06:43 +00:00
Martin Schwenke	d462d64cdf	ctdb-vacuum: Only schedule next vacuum event if vacuuming is scheduled At the moment vacuuming is always scheduled. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-10-24 04:06:43 +00:00
Martin Schwenke	13cedaf019	ctdb-daemon: Factor out code to create vacuuming child This changes the behaviour for some failures from exiting to simply attempting to schedule the next run. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-10-24 04:06:43 +00:00
Martin Schwenke	5539edfdbe	ctdb-vacuum: Simplify recording of in-progress vacuuming child There can only be one, so simplify the logic. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-10-24 04:06:43 +00:00
Amitay Isaacs	d0cc9edc05	ctdb-vacuum: Avoid processing any more packets All the vacuum operations if required have an event loop to ensure completion of pending operations. Once all the steps are complete, there is no reason to process any more packets. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2019-10-24 04:06:43 +00:00
Amitay Isaacs	680df07630	ctdb-daemon: Avoid memory leak when packet is deferred Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2019-10-24 04:06:43 +00:00
Amitay Isaacs	c6427dddf5	ctdb-recoverd: No need for database detach handler The only reason for recoverd attaching to databases was to migrate records to the local node as part of vacuuming. Recovery daemon does not take part in database vacuuming any more. The actual database recovery is handled via the recovery_helper and recovery daemon should not need to attach to the databases any more. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2019-10-24 04:06:43 +00:00
Amitay Isaacs	fc81729dd2	ctdb-recoverd: Drop VACUUM_FETCH message handling This is now implemented in the ctdb daemon using VACUUM_FETCH control. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2019-10-24 04:06:43 +00:00
Amitay Isaacs	498932c0e8	ctdb-vacuum: Replace VACUUM_FETCH message with control Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2019-10-24 04:06:42 +00:00
Amitay Isaacs	86521837b6	ctdb-vacuum: Add processing of fetch queue Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2019-10-24 04:06:42 +00:00
Amitay Isaacs	da617f90d9	ctdb-daemon: Add implementation of VACUUM_FETCH control Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2019-10-24 04:06:42 +00:00
Martin Schwenke	815ae64400	ctdb-vacuum: Drop debug level of repacking message to NOTICE This occurs rarely but can adversely impact performance, so it is worth logging it more frequently. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-10-04 05:47:35 +00:00
Amitay Isaacs	33f1c9d965	ctdb-vacuum: Process all records not deleted on a remote node This currently skips the last record. BUG: https://bugzilla.samba.org/show_bug.cgi?id=14147 RN: Avoid potential data loss during recovery after vacuuming error Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2019-10-04 05:47:34 +00:00
Mathieu Parent	7cb0ca4171	Spelling fixes s/ dont / don't / Excluding examples/tridge/smb.conf Signed-off-by: Mathieu Parent <math.parent@gmail.com> Reviewed-by: Andrew Bartlett <abartlet@samba.org> Reviewed-by: Gary Lockyer <gary@catalyst.net.nz>	2019-09-01 22:21:27 +00:00
Mathieu Parent	736bb924f7	Spelling fixes s/ ot / to / Signed-off-by: Mathieu Parent <math.parent@gmail.com> Reviewed-by: Andrew Bartlett <abartlet@samba.org> Reviewed-by: Gary Lockyer <gary@catalyst.net.nz>	2019-09-01 22:21:27 +00:00
Martin Schwenke	8190993d99	ctdb-recoverd: Fix typo in previous fix BUG: https://bugzilla.samba.org/show_bug.cgi?id=14085 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> Autobuild-User(master): Amitay Isaacs <amitay@samba.org> Autobuild-Date(master): Tue Aug 27 15:29:11 UTC 2019 on sn-devel-184	2019-08-27 15:29:11 +00:00
Martin Schwenke	5d655ac6f2	ctdb-recoverd: Only check for LMASTER nodes in the VNN map BUG: https://bugzilla.samba.org/show_bug.cgi?id=14085 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-08-21 11:50:30 +00:00
Martin Schwenke	e9f2e205ee	ctdb-daemon: Make node inactive in the NODE_STOP control Currently some of this is supported by a periodic check in the recovery daemon's main_loop(), which notices the flag change, sets recovery mode active and freezes databases. If STOP_NODE returns immediately then the associated recovery can complete and the node can be continued before databases are actually frozen. Instead, immediately do all of the things that make a node inactive. BUG: https://bugzilla.samba.org/show_bug.cgi?id=14087 RN: Stop "ctdb stop" from completing before freezing databases Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> Autobuild-User(master): Amitay Isaacs <amitay@samba.org> Autobuild-Date(master): Tue Aug 20 08:32:27 UTC 2019 on sn-devel-184	2019-08-20 08:32:27 +00:00
Martin Schwenke	91ac4c13d8	ctdb-daemon: Drop unused function ctdb_local_node_got_banned() BUG: https://bugzilla.samba.org/show_bug.cgi?id=14087 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-08-20 07:15:41 +00:00
Martin Schwenke	0f5f7b7cf4	ctdb-daemon: Switch banning code to use ctdb_node_become_inactive() There's no reason to avoid immediately setting recovery mode to active and initiating freeze of databases. This effectively reverts the following commits: `d8f3b490bb` `b4357a79d9` The latter is now implemented using a control, resulting in looser coupling. See also the following commit: `f8141e91a6` BUG: https://bugzilla.samba.org/show_bug.cgi?id=14087 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-08-20 07:15:41 +00:00
Martin Schwenke	a42bcaabb6	ctdb-daemon: Factor out new function ctdb_node_become_inactive() This is a superset of ctdb_local_node_got_banned() so will replace that function, and will also be used in the NODE_STOP control. BUG: https://bugzilla.samba.org/show_bug.cgi?id=14087 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-08-20 07:15:41 +00:00
Martin Schwenke	3acb8e9d1c	ctdb-daemon: Add function ctdb_ip_to_node() This is the core logic from ctdb_ip_to_pnn(), so re-implement that function using ctdb_ip_to_node(). Something similar (ctdb_ip_to_nodeid()) was recently removed in commit `010c1d77cd` because it wasn't required. Now there is a use case. BUG: https://bugzilla.samba.org/show_bug.cgi?id=14084 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-08-16 21:30:35 +00:00
Martin Schwenke	6c9d1f855e	ctdb-daemon: Avoid signed/unsigned comparison by casting Compiling with -Wsign-compare complains: 1047 | && (call->call_id == CTDB_FETCH_WITH_HEADER_FUNC)) { | ^~ struct ctdb_call is a protocol element, so we can't simply change it. Found by csbuild. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> Autobuild-User(master): Amitay Isaacs <amitay@samba.org> Autobuild-Date(master): Wed Aug 14 10:29:59 UTC 2019 on sn-devel-184	2019-08-14 10:29:59 +00:00
Martin Schwenke	4bdfbbd8d4	ctdb-daemon: Avoid signed/unsigned comparison by declaring as unsigned Compiling with -Wsign-compare complains: ctdb/server/ctdb_call.c:831:12: warning: comparison of integer expressions of different signedness: ‘int’ and ‘uint32_t’ {aka ‘unsigned int’} [-Wsign-compare] 831 | if (count <= ctdb_db->statistics.hot_keys[0].count) { | ^~ and ctdb/server/ctdb_call.c:844:13: warning: comparison of integer expressions of different signedness: ‘int’ and ‘uint32_t’ {aka ‘unsigned int’} [-Wsign-compare] 844 | if (count <= ctdb_db->statistics.hot_keys[i].count) { | ^~ Found by cs-build. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-08-14 09:11:36 +00:00
Martin Schwenke	41cd44724e	ctdb-mutex: Add support for exiting if the lock file disappears If the lock file is inaccessible or the inode number changes then the lock is lost, so exit. This allows the recovery daemon to trigger an election. The ensuing recovery will re-take the lock. By default the lock file is checked every 60 seconds. A lot can happen in 60 seconds but being more aggressive and accessing the lock too often could result in a performance issue for the cluster filesystem. A new optional 2nd argument is added, which is the lock file re-check time in seconds. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-07-26 03:34:17 +00:00
Martin Schwenke	af8de1bcfd	ctdb-mutex: Add an intermediate asynchronous computation for waiting This will allow more conditions to be waited on via additional sub-requests. At the moment this just completes when the parent wait completes. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-07-26 03:34:17 +00:00
Martin Schwenke	fae8e438f0	ctdb-mutex: Change parent checking to use an asynchronous computation Put the checking for the process being immediately re-parented into the computation too. This will be very rare and doing it consistently makes testing saner. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-07-26 03:34:17 +00:00
Martin Schwenke	2f768a090e	ctdb-mutex: Exit immediately if the lock isn't taken There is no need to wait until the parent kills the helper. The parent will get the initial response, indicating contention or similar, and will then get a separate event indicating that the pipe is gone. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-07-26 03:34:17 +00:00
Martin Schwenke	2b6f1a8ee6	ctdb-mutex: Drop dependency on ctdb_set_helper This makes the code more explicit and makes testing easier due to less dependencies. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-07-26 03:34:17 +00:00
Martin Schwenke	76ab0a2b82	ctdb-mutex: Drop unneeded assignment clang warns: ctdb/server/ctdb_mutex_fcntl_helper.c:61:3: warning: Value stored to 'fd' is never read Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-07-26 03:34:16 +00:00
Martin Schwenke	98169241ef	ctdb-mutex: Update to use modern debug macro One of these had a missing space, so this implicitly fixes it. It also drops the need to unnecessarily include common.h, which comes with some dependency baggage. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-07-26 03:34:16 +00:00
Martin Schwenke	6fe963c3f7	ctdb-recoverd: Periodically log recovery master of incomplete cluster Only do this if the recovery lock is unset. Log every minute for the first 10 minutes, then every 10 minutes, then every hour. This is useful for determining whether a split brain occurred. It is particularly useful if logging failed or was throttled at startup, so there is no evidence of the split brain when it began. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-07-26 03:34:16 +00:00
Martin Schwenke	f2559ef8ce	ctdb-recoverd: Log the master at the end of elections Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-07-26 03:34:16 +00:00
Martin Schwenke	755a9e654f	ctdb-daemon: Don't check if lock_ctx->ctdb_db is NULL This can never be NULL. It could probably be NULL in the past when "all database" locks existed. There are paths where it is checked for NULL and then later dereferenced, causing static analysers to produce spurious warnings. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-07-05 05:03:25 +00:00
Martin Schwenke	79a7cc3fb9	ctdb-daemon: Drop unused function ctdb_vfork_with_logging() Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-07-05 05:03:24 +00:00
Martin Schwenke	75a808fd86	ctdb-daemon: Don't index by PNN when initialising node flags Indexing by PNN is wrong. This also removes a signed/unsigned comparison because the PNN is not compared to -1 anymore. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-07-05 05:03:24 +00:00
Martin Schwenke	010c1d77cd	ctdb-daemon: Replace function ctdb_ip_to_nodeid() with ctdb_ip_to_pnn() Node ID is a poorly defined concept, indicating the slot in the node map where the IP address was found. This signed value also ends up compared to num_nodes, which is unsigned, producing unwanted warnings. Just return the PNN because this is what both callers really want. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-07-05 05:03:23 +00:00
Martin Schwenke	4c24d434b9	ctdb-cluster-mutex: Ensure that the configured command is not empty ... and does not just contain whitespace. Otherwise NULL can be passed as the first argument to execv(). Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-07-05 05:03:23 +00:00
Martin Schwenke	9c75ad6818	ctdb-daemon: Drop unused values assigned to variable Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-07-05 05:03:23 +00:00
Martin Schwenke	c39441f62d	ctdb-daemon: Fix signed/unsigned comparisons by using constant Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-07-05 05:03:23 +00:00
Martin Schwenke	76e930d784	ctdb-daemon: Fix signed/unsigned comparisons by casting Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-07-05 05:03:23 +00:00
Martin Schwenke	1e47a1b3f6	ctdb-daemon: Fix signed/unsigned comparisons by declaring as unsigned Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-07-05 05:03:23 +00:00
Martin Schwenke	3ccce53e3e	ctdb-daemon: Make type of list_of_nodes() consistent with callers Instead of taking exclude_pnn as a parameter, calculate it from an include_self_parameter, which is passed through from the 2 calling functions. While doing this, fix a signed/unsigned comparison issue by declaring the new exclude_pnn local variable as an unsigned type. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-07-05 05:03:22 +00:00
Martin Schwenke	6556347901	ctdb-daemon: Make old list_of_nodes() function static The next commit will change the type of this function, which is only used in this file. So, make it static to isolate the change. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-07-05 05:03:22 +00:00
Swen Schillig	73640b8ad8	ctdb: Update all consumers of strtoul_err(), strtoull_err() to new API Signed-off-by: Swen Schillig <swen@linux.ibm.com> Reviewed-by: Ralph Boehme <slow@samba.org> Reviewed-by: Christof Schmitt <cs@samba.org>	2019-06-30 11:32:18 +00:00
Martin Schwenke	b1d83fb3e8	ctdb-daemon: Attempt to silence CID 1357985 (Unchecked return value) Yes, the other callers check the return value of ctdb_lockdb_mark(). However, this is called in a void function and ctdb_lockdb_mark() has already printed any error message. All we can do is explicitly ignore the return value. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-06-05 10:25:50 +00:00
Martin Schwenke	2db0e71d3b	ctdb-ipalloc: Fix warning about unused value assigned to srcimbl To make this much clearer, move the declaration into the scope where it is used. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-06-05 10:25:50 +00:00
Martin Schwenke	7df15b246a	ctdb-ipalloc: Avoid -1 as a PNN, use CTDB_UNKNOWN_PNN instead This fixes warnings about signed versus unsigned comparisons. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-06-05 10:25:50 +00:00
Martin Schwenke	86666d6570	ctdb-ipalloc: Fix signed/unsigned comparisons by declaring as unsigned Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-06-05 10:25:50 +00:00
Martin Schwenke	90622ab901	ctdb-recovery: Fix signed/unsigned comparisons by declaring as unsigned Simple cases where variables and function parameters need to be declared as an unsigned type instead of an int. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-06-05 10:25:50 +00:00
Martin Schwenke	35368d871d	ctdb-recovery: Avoid -1 as a PNN, use CTDB_UNKNOWN_PNN instead Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-06-05 10:25:50 +00:00
Martin Schwenke	978c7dbd55	ctdb-recovery: Fix signed/unsigned comparison by casting Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-06-05 10:25:50 +00:00
Martin Schwenke	fa7bd35b6a	ctdb-recovery: Fix signed/unsigned comparisons by declaring as unsigned Simple cases where variables need to be declared as an unsigned type instead of an int. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-06-05 10:25:50 +00:00
Stefan Metzmacher	b9b3acf23e	ctdb:takeover: add better debugging when a client connects to a non public address Signed-off-by: Stefan Metzmacher <metze@samba.org> Reviewed-by: Martin Schwenke <martin@meltin.net>	2019-06-04 22:13:07 +00:00
Noel Power	71e7b5d14e	ctdb/server: cppcheck: fix shiftTooManyBitsSigned error Fixes ctdb/server/ipalloc_lcp2.c:61: error: shiftTooManyBitsSigned: Shifting signed 32-bit value by 31 bits is undefined behaviour <--[cppcheck] Signed-off-by: Noel Power <noel.power@suse.com> Reviewed-by: Andreas Schneider <asn@samba.org>	2019-06-04 22:13:07 +00:00
Volker Lendecke	e7424897a1	ctdb: Make TDB_SEQNUM work synchronously with ctdb Old war story completely from memory, I could not find the commit that introduced TDB_SEQNUM so far...: Back in the days when ctdb was initially developed, TDB_SEQNUM's only user was the notify.tdb that held one huge record for all notify records. With that use case in mind it made perfect sense to keep the SEQNUM stable locally, sacrificing precision. By now notify.tdb is long gone, and the only user of TDB_SEQNUM right now is brlock.tdb, which contains special case code for the imprecise ctdb implementation of TDB_SEQNUM. With this commit, that special code can go: The TDB_SEQNUM will also increment when just the DMASTER header field changes, indicating to smbd that someone else might have changed the record. This will of course increase the SEQNUM frequency, but it should not increase the load on ctdb: If you look at the brlock.c workaround, it just does not do the caching that is possible with precise TDB_SEQNUMs working. How did I get here? I want to move brl_num_read_oplocks() from brlock.tdb into locking.tdb, and for that I need precise TDB_SEQNUMs for locking.tdb. Signed-off-by: Volker Lendecke <vl@samba.org> Reviewed-by: Jeremy Allison <jra@samba.org> Autobuild-User(master): Jeremy Allison <jra@samba.org> Autobuild-Date(master): Fri May 24 00:42:17 UTC 2019 on sn-devel-184	2019-05-24 00:42:17 +00:00
Martin Schwenke	6a2941e2a9	ctdb-recoverd: Fix memory leak state is always freed before exiting this function, so allocate fde off it instead of long-lived ctdb context. BUG: https://bugzilla.samba.org/show_bug.cgi?id=13943 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-05-14 07:25:37 +00:00
Martin Schwenke	8663e0a64f	ctdb-daemon: Never use 0 as a client ID ctdb_control_db_attach() and ctdb_control_db_detach() assume that any control with client ID 0 comes from another daemon and treat it specially. BUG: https://bugzilla.samba.org/show_bug.cgi?id=13930 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-05-13 07:27:24 +00:00
Martin Schwenke	95477e69e3	ctdb-daemon: Log when ctdbd CPU utilisation exceeds a threshold This is to help us notice when ctdbd is using the full capacity of a CPU, so is saturated. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-05-07 05:45:34 +00:00
Volker Lendecke	43cacaad57	ctdb: Fix a typo Signed-off-by: Volker Lendecke <vl@samba.org> Reviewed-by: Martin Schwenke <martin@meltin.net> Autobuild-User(master): Martin Schwenke <martins@samba.org> Autobuild-Date(master): Sat Apr 6 11:51:55 UTC 2019 on sn-devel-144	2019-04-06 11:51:55 +00:00
Volker Lendecke	bb1e32297e	ctdb: Slightly simplify ctdb_ltdb_lock_fetch_requeue Reduce indentation with an early return Signed-off-by: Volker Lendecke <vl@samba.org> Reviewed-by: Martin Schwenke <martin@meltin.net>	2019-04-06 10:47:13 +00:00
Amitay Isaacs	edd4a23d76	ctdb-version: Simplify version string usage There is no need to write SAMBA_VERSION_STRING as CTDB_VERSION_STRING. Wherever required use SAMBA_VERSION_STRING directly. Avoids the confusion with two version.h files. BUG: https://bugzilla.samba.org/show_bug.cgi?id=13789 Signed-off-by: Amitay Isaacs <amitay@samba.org> Reviewed-by: Martin Schwenke <martin@meltin.net> Autobuild-User(master): Amitay Isaacs <amitay@samba.org> Autobuild-Date(master): Fri Mar 15 06:31:50 UTC 2019 on sn-devel-144	2019-03-15 06:31:50 +00:00
Martin Schwenke	8c2ff3f2b5	ctdb-daemon: Add an environment variable to set version This can be used to test the version checking logic. Cache the version to avoid re-checking the environment variable each time. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@samba.org>	2019-03-15 05:17:14 +00:00
Amitay Isaacs	278eb236ae	ctdb-daemon: Fix maybe-uninitialized error with picky developer [263/386] Compiling ctdb/server/ctdb_recovery_helper.c In file included from ../../server/ctdb_recovery_helper.c:24:0: ../../server/ctdb_recovery_helper.c: In function ‘main’: ../../../lib/talloc/talloc.h:911:34: error: ‘mem_ctx’ may be used uninitialized in this function [-Werror=maybe-uninitialized] #define TALLOC_FREE(ctx) do { if (ctx != NULL) { talloc_free(ctx); ctx=NULL; } } while(0) Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Jeremy Allison <jra@samba.org>	2019-03-01 17:21:15 +00:00
Swen Schillig	55acae774a	ctdb-server: Use wrapper for string to integer conversion In order to detect a value overflow error during the string to integer conversion with strtoul/strtoull, the errno variable must be set to zero before the execution and checked after the conversion is performed. This is achieved by using the wrapper function strtoul_err and strtoull_err. Signed-off-by: Swen Schillig <swen@linux.ibm.com> Reviewed-by: Ralph Böhme <slow@samba.org> Reviewed-by: Jeremy Allison <jra@samba.org>	2019-03-01 00:32:11 +00:00
Martin Schwenke	c93430fe8f	ctdb-cluster-mutex: Separate out command and file handling This code is difficult to read and there really is no common code between the 2 cases. For example, there is no need to split a filename into words. Separating each of the 2 cases into its own function makes the logic much easier to understand. BUG: https://bugzilla.samba.org/show_bug.cgi?id=13800 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> Autobuild-User(master): Amitay Isaacs <amitay@samba.org> Autobuild-Date(master): Mon Feb 25 03:40:16 CET 2019 on sn-devel-144	2019-02-25 03:40:16 +01:00
Martin Schwenke	13a1a48089	ctdb-recoverd: Time out attempt to take recovery lock after 120s Currently this will wait forever. It really needs a timeout in case the cluster filesystem (or other lock mechanism) is completely wedged. BUG: https://bugzilla.samba.org/show_bug.cgi?id=13800 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-02-25 02:12:17 +01:00
Martin Schwenke	45a77d65b2	ctdb-recoverd: Ban node on unknown error when taking recovery lock We really shouldn't see unknown errors. They probably represent a misconfigured recovery lock or similar. BUG: https://bugzilla.samba.org/show_bug.cgi?id=13800 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-02-25 02:12:17 +01:00
Martin Schwenke	c0fb62ed39	ctdb-recoverd: Make recoverd context available in recovery lock handle BUG: https://bugzilla.samba.org/show_bug.cgi?id=13800 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-02-25 02:12:16 +01:00
Martin Schwenke	7e4aae6943	ctdb-recoverd: Clean up logging on failure to take recovery lock Add an explicit case for a timeout and clean up the other messages. BUG: https://bugzilla.samba.org/show_bug.cgi?id=13800 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-02-25 02:12:16 +01:00
Martin Schwenke	621658cbed	ctdb-recoverd: Free cluster mutex handler on failure to take lock If nested events occur while the file descriptor handler is still active then chaos can ensue. For example, if a node is banned and the lock is explicitly cancelled (e.g. due to election loss) then double-talloc-free()s abound. BUG: https://bugzilla.samba.org/show_bug.cgi?id=13800 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2019-02-25 02:12:16 +01:00
Martin Schwenke	944c92a15d	ctdb-daemon: Modernise debug during record deletion for vacuuming Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> Autobuild-User(master): Amitay Isaacs <amitay@samba.org> Autobuild-Date(master): Tue Dec 18 10:13:50 CET 2018 on sn-devel-144	2018-12-18 10:13:50 +01:00
Martin Schwenke	cdca0d7e78	ctdb-daemon: Add extra debug during record deletion for vacuuming It isn't currently possible to distinguish these 2 cases. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2018-12-18 07:12:10 +01:00
Martin Schwenke	f1b594dce1	ctdb-daemon: Do not force full vacuum on first vacuuming run When the number of fast path vacuuming runs is 0 then a full vacuuming run is done. This means the first one is a full run, which is almost certainly not what is intended. Combine the 2 conditionals to only flag a full vacuuming run when the count exceeds the configured limit. This means that the full_vacuum_run flag is set in both parent and child, but this is harmless... and is better than getting it wrong. Also tweak the comparison to be less-than-or-equal, since the zeroth run needs to be counted. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2018-12-18 07:12:09 +01:00
Martin Schwenke	da8aaf2aee	ctdb-recoverd: Call an election when the recovery lock is lost The lock may have been lost due to a failure in the underlying locking mechanism. This could be due to quorum loss or similar. It is best to call an election to confirm that this node should still be master. At worst, the node will reelect itself, fail to take the lock and then ban itself. This is a suitable outcome for a node that has been partitioned from others in the cluster. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2018-12-18 02:02:03 +01:00
Martin Schwenke	93284ed032	ctdb-daemon: Divide by 2 when calculating hop count bucket This provides finer resolution while still maintaining a reasonable maximum. In this case the top bucket contains any hop counts >= 16384, compared to the current situation where the top bucket contains hop counts >= 268435456. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2018-12-18 02:02:03 +01:00
Martin Schwenke	dd7574afd1	ctdb-daemon: Exit with error if a database directory does not exist Since 4.9.0, the log messages can be confusing if a required database directory does not exist. Explicitly check for database directories, logging a clear error and exiting if one is missing. BUG: https://bugzilla.samba.org/show_bug.cgi?id=13696 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> Autobuild-User(master): Amitay Isaacs <amitay@samba.org> Autobuild-Date(master): Mon Dec 3 06:56:41 CET 2018 on sn-devel-144	2018-12-03 06:56:41 +01:00
Andreas Schneider	2d512b278e	debug: Use debuglevel_(get|set) function Signed-off-by: Andreas Schneider <asn@samba.org> Reviewed-by: Jeremy Allison <jra@samba.org> Autobuild-User(master): Andreas Schneider <asn@cryptomilk.org> Autobuild-Date(master): Thu Nov 8 11:03:11 CET 2018 on sn-devel-144	2018-11-08 11:03:11 +01:00
Martin Schwenke	6e16e95f74	ctdb-daemon: Do not fork when CTDB_TEST_MODE is set Explicitly background ctdbd instead. This has the advantage of leaving stdin open. ctdbd can then be enhanced to exit when stdin closes, allowing better cleanup in a test environment. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> Autobuild-User(master): Amitay Isaacs <amitay@samba.org> Autobuild-Date(master): Tue Nov 6 10:30:14 CET 2018 on sn-devel-144	2018-11-06 10:30:14 +01:00
Martin Schwenke	01f6fbba4e	ctdb-daemon: Switch interactive variable to a bool popt uses an int in place of a bool, so declare an extra int and make the conversion explicit. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2018-11-06 07:16:18 +01:00
Martin Schwenke	4e6bd42493	ctdb-daemon: Improve documentation for -i option Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2018-11-06 07:16:15 +01:00
Martin Schwenke	9c41481f21	ctdb-daemon: Don't set log_to_stdout for become_daemon() ctdbd logs to stderr in interactive mode, not stdout. This way stdout is always closed. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2018-11-06 07:16:15 +01:00
Martin Schwenke	c84254d23d	ctdb-daemon: Avoid unnecessarily spamming the logs when in test mode Logging the logging location to syslog can be useful on production systems when the configuration goes unexpectedly missing. However, in test mode this just adds noise to the logs on the test system. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2018-11-06 07:16:14 +01:00
Martin Schwenke	d75fa2c3fd	ctdb-daemon: Drop unused function ctdb_set_socketname() Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2018-11-06 07:16:14 +01:00
Martin Schwenke	5f478b7c5f	ctdb-daemon: Use path functions for socket and PID file Drop the use of ctdb_set_sockname() because it complicates the memory allocation and this is the only place it is used. Just assign to the relevant pointer. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2018-11-06 07:16:14 +01:00
Martin Schwenke	27df4f002a	ctdb-recovery: Ban a node that causes recovery failure ... instead of applying banning credits. There have been a couple of cases where recovery repeatedly takes just over 2 minutes to fail. Therefore, banning credits expire between failures and a continuously problematic node is never banned, resulting in endless recoveries. This is because it takes 2 applications of banning credits before a node is banned, which generally involves 2 recovery failures. The recovery helper makes up to 3 attempts to recover each database during a single run. If a node causes 3 failures then this is really equivalent to 3 recovery failures in the model that existed before the recovery helper added retries. In that case the node would have been banned after 2 failures. So, instead of applying banning credits to the "most failing" node, simply ban it directly from the recovery helper. If multiple nodes are causing recovery failures then this can cause a node to be banned more quickly than it might otherwise have been, even pre-recovery-helper. However, 90 seconds (i.e. 3 failures) is a long time to be in recovery, so banning earlier seems like the best approach. BUG: https://bugzilla.samba.org/show_bug.cgi?id=13670 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> Autobuild-User(master): Amitay Isaacs <amitay@samba.org> Autobuild-Date(master): Mon Nov 5 06:52:33 CET 2018 on sn-devel-144	2018-11-05 06:52:33 +01:00
Martin Schwenke	fbea9d3699	ctdb-daemon: Fix valgrind hit in event code ==25741== Syscall param write(buf) points to uninitialised byte(s) ==25741== at 0x4939291: write (write.c:27) ==25741== by 0x4868285: sys_write (sys_rw.c:68) ==25741== by 0x13915D: sock_queue_trigger (sock_io.c:316) ==25741== by 0x4DE6478: tevent_common_invoke_immediate_handler (in /usr/lib/x86_64-linux-gnu/libtevent.so.0.9.37) ==25741== by 0x4DE64A2: tevent_common_loop_immediate (in /usr/lib/x86_64-linux-gnu/libtevent.so.0.9.37) ==25741== by 0x4DEBE5A: ??? (in /usr/lib/x86_64-linux-gnu/libtevent.so.0.9.37) ==25741== by 0x4DEA2D6: ??? (in /usr/lib/x86_64-linux-gnu/libtevent.so.0.9.37) ==25741== by 0x4DE57E3: _tevent_loop_once (in /usr/lib/x86_64-linux-gnu/libtevent.so.0.9.37) ==25741== by 0x15D1BA: ctdb_event_script_args (eventscript.c:821) ==25741== by 0x13B437: ctdb_start_daemon (ctdb_daemon.c:1315) ==25741== by 0x110642: main (ctdbd.c:393) ==25741== Address 0x57888a4 is 100 bytes inside a block of size 144 alloc'd ==25741== at 0x48357BF: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==25741== by 0x4B9B7C0: talloc_named_const (in /usr/lib/x86_64-linux-gnu/libtalloc.so.2.1.14) ==25741== by 0x15CCC6: eventd_client_write (eventscript.c:430) ==25741== by 0x15CCC6: eventd_client_run (eventscript.c:556) ==25741== by 0x15CCC6: ctdb_event_script_run (eventscript.c:649) ==25741== by 0x15D198: ctdb_event_script_args (eventscript.c:812) ==25741== by 0x13B437: ctdb_start_daemon (ctdb_daemon.c:1315) ==25741== by 0x110642: main (ctdbd.c:393) ==25741== BUG: https://bugzilla.samba.org/show_bug.cgi?id=13659 Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> Autobuild-User(master): Amitay Isaacs <amitay@samba.org> Autobuild-Date(master): Mon Oct 22 09:27:15 CEST 2018 on sn-devel-144	2018-10-22 09:27:15 +02:00
Martin Schwenke	c9e1603a5d	ctdb-daemon: Exit if eventd goes away ctdbd enters a broken state if eventd goes away. A clean shutdown is not possible because that involves running events. Restarting eventd is possible but this might mask a serious problem and it is possible that eventd might keep on disappearing. Just exit. BUG: https://bugzilla.samba.org/show_bug.cgi?id=13659 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2018-10-22 06:04:20 +02:00
Martin Schwenke	a3d12252fa	ctdb-daemon: Return early when refusing to run an event script BUG: https://bugzilla.samba.org/show_bug.cgi?id=13659 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2018-10-22 06:04:20 +02:00
Amitay Isaacs	d18385ea2a	ctdb-daemon: Drop implementation of RECEIVE_RECORDS control BUG: https://bugzilla.samba.org/show_bug.cgi?id=13641 Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2018-10-08 02:46:21 +02:00
Amitay Isaacs	e15cdc652d	ctdb-vacuum: Remove unnecessary check for zero records in delete list Since no records are deleted from RB tree during step 1, there is no need for the check. Run step 2 unconditionally. BUG: https://bugzilla.samba.org/show_bug.cgi?id=13641 Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2018-10-08 02:46:21 +02:00
Amitay Isaacs	ef05239717	ctdb-vacuum: Fix the incorrect counting of remote errors If a node fails to delete a record in TRY_DELETE_RECORDS control during vacuuming, then it's possible that other nodes also may fail to delete a record. So instead of deleting the record from RB tree on first failure, keep track of the remote failures. Update delete_list.remote_error and delete_list.left statistics only once per record during the delete_record_traverse. BUG: https://bugzilla.samba.org/show_bug.cgi?id=13641 Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2018-10-08 02:46:21 +02:00
Amitay Isaacs	202b9027ba	ctdb-vacuum: Simplify the deletion of vacuumed records The 3-phase deletion of vacuumed records was introduced to overcome the problem of record(s) resurrection during recovery. This problem is now handled by avoiding the records from recently INACTIVE nodes in the recovery process. BUG: https://bugzilla.samba.org/show_bug.cgi?id=13641 Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2018-10-08 02:46:20 +02:00
Amitay Isaacs	c4ec99b1d3	ctdb-daemon: Invalidate records if a node becomes INACTIVE BUG: https://bugzilla.samba.org/show_bug.cgi?id=13641 Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2018-10-08 02:46:20 +02:00
Amitay Isaacs	040401ca3a	ctdb-daemon: Don't pull any records if records are invalidated This avoids unnecessary work during recovery to pull records from nodes that were INACTIVE just before the recovery. BUG: https://bugzilla.samba.org/show_bug.cgi?id=13641 Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2018-10-08 02:46:20 +02:00
Martin Schwenke	486022ef8f	ctdb-recoverd: Set recovery lock handle at start of attempt This allows the attempt to be cancelled if an election is lost and an unlock is done before the attempt is completed. BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> Autobuild-User(master): Martin Schwenke <martins@samba.org> Autobuild-Date(master): Tue Sep 18 02:18:30 CEST 2018 on sn-devel-144	2018-09-18 02:18:30 +02:00
Martin Schwenke	b1dc568784	ctdb-recoverd: Handle cancellation when releasing recovery lock If the recovery lock is in the process of being taken then free the cluster mutex handle but leave the recovery lock handle in place. This allows ctdb_recovery_lock() to fail. Note that this isn't yet live because rec->recovery_lock_handle is still only set at the completion of the attempt to take the lock. BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2018-09-17 22:58:20 +02:00
Martin Schwenke	a755d060c1	ctdb-recoverd: Return early when the recovery lock is not held This makes upcoming changes simpler. Update to modern debug macro while touching relevant line. BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2018-09-17 22:58:20 +02:00
Martin Schwenke	c52216740b	ctdb-recoverd: Store recovery lock handle ... not just cluster mutex handle. This makes the recovery lock handle long-lived and will allow the releasing code to cancel an in-progress attempt to take the recovery lock. The cluster mutex handle is now allocated off the recovery lock handle. BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2018-09-17 22:58:20 +02:00
Martin Schwenke	a53b264aee	ctdb-recoverd: Use talloc() to allocate recovery lock handle At the moment this is still local and is freed after the mutex is successfully taken. BUG: https://bugzilla.samba.org/show_bug.cgi?id=13617 Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2018-09-17 22:58:20 +02:00


2488 Commits