mirror of https://github.com/samba-team/samba.git synced 2025-01-10 01:18:15 +03:00

8626 Commits

Author SHA1 Message Date
Martin Schwenke
5c8dfbbf9b ctdb-daemon: Add extra logging of hot keys
ctdbd currently only logs when a new hot key is added.  If a key gets
hotter then nothing new is logged.

Log hot key updates when the number of migrations has doubled since
the last time that key was logged.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-05-22 06:41:45 +00:00
Martin Schwenke
baf058dcf7 ctdb-daemon: Update hot key logging
This message indicates that a hot key was added, so say that.  After
all the hot key slots have been filled the id will always be 0, so
don't bother logging it.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-05-22 06:41:44 +00:00
Martin Schwenke
1ab39b3270 ctdb-daemon: Fix bug in slot 0 comparison optimisation
This is only valid if all slots are in use.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-05-22 06:41:44 +00:00
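The bug fixed in commit 1ab39b3270 above can be illustrated with a hypothetical sketch. Assume the hot key slots are kept ordered so that slot 0 holds the coldest key. The shortcut "if a key is not hotter than slot 0, ignore it" is then only correct once every slot is occupied. While a slot is free, even a cold key should be recorded. `hot_key_pick_slot()` and its parameters are illustrative names, not ctdb's. Updating a key that is already in the table is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical hot key table: counts[] holds migration counts for
 * num_used of num_slots slots, ordered so that counts[0] is the
 * coldest entry.
 *
 * Returns the slot a key with new_count should occupy, or -1 if the
 * key is not hot enough to be recorded. */
int hot_key_pick_slot(const uint32_t *counts, unsigned int num_used,
		      unsigned int num_slots, uint32_t new_count)
{
	assert(num_used <= num_slots);

	/* A free slot exists: record the key regardless of its count. */
	if (num_used < num_slots) {
		return (int)num_used;
	}

	/* All slots are in use, so a single comparison with the coldest
	 * entry (slot 0) is enough to decide. */
	if (num_slots == 0 || new_count <= counts[0]) {
		return -1;
	}
	return 0; /* replace the coldest key */
}
```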
Martin Schwenke
f9f60c2a60 ctdb-daemon: Switch some variables to unsigned
These should be unsigned but luck is currently on our side.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-05-22 06:41:44 +00:00
Martin Schwenke
21b9844bcb ctdb-daemon: Add separate hot keys array for database statistics
There are 2 reasons for this.  Sorting of hot keys is broken and will
be changed to an implementation that needs a named (i.e. not
anonymous) structure.  Also, at least one non-protocol field will be
added to facilitate more useful logging.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-05-22 06:41:44 +00:00
Martin Schwenke
c28914bfa7 ctdb-build: Fix a typo
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-05-22 06:41:44 +00:00
Ralph Boehme
6e419dda71 ctdb: increase TasksMax limit, the systemd default is just 512
In 2015 systemd introduced a TasksMax setting, which limits the number of processes in a
unit:

https://lists.freedesktop.org/archives/systemd-devel/2015-November/035006.html

The default of 512 may be too low in certain situations leading to vfork()
failing with errno=EAGAIN when trying to spawn lock-helper processes.

With the default for LockProcessesPerDB being 200 the increased TasksMax limit
should cover the problematic scenario.

Additional background: the failing vfork()s have been seen on production
clusters and were tracked down to being logged in the context of ctdb calling
tdb_repack().

Links:

9ded9cd14c
https://www.suse.com/support/kb/doc/?id=000015901
https://success.docker.com/article/how-to-reserve-resource-temporarily-unavailable-errors-due-to-tasksmax-setting
https://www.percona.com/blog/2019/01/02/tasksmax-another-setting-that-can-cause-mysql-error-messages/

Signed-off-by: Ralph Boehme <slow@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>

Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Wed May 13 13:30:12 UTC 2020 on sn-devel-184
2020-05-13 13:30:12 +00:00
Amitay Isaacs
23c2195e2c ctdb-build: Add messages_dgm build to ctdb
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Volker Lendecke <vl@samba.org>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Wed May  6 01:47:16 UTC 2020 on sn-devel-184
2020-05-06 01:47:16 +00:00
Amitay Isaacs
a59fd8164c lib/util: Build genrand for util core
messages_dgm depends on genrand.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Volker Lendecke <vl@samba.org>
2020-05-06 00:06:40 +00:00
Volker Lendecke
d9ccd853c3 ctdb: Implement CTDB_CONTROL_ECHO_DATA
Testing control: 4 bytes msec delay plus a blob, return the request after the
delay. This is an enhanced "ping" which can be used to test asynchronous
clients.

This doesn't have the full protocol implementation yet.

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2020-04-28 09:08:39 +00:00
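The request described in commit d9ccd853c3 above is "4 bytes msec delay plus a blob". A hypothetical marshalling round trip for such a request can be sketched as below. The wire layout (4-byte delay, 4-byte length, then the blob, in host byte order) and the names `echo_data_sketch`, `echo_data_push()` and `echo_data_pull()` are assumptions for illustration. They do not reproduce the marshalling added in bdabf78122 and 6f56f45639.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory form: a delay in milliseconds plus a blob. */
struct echo_data_sketch {
	uint32_t delay_msec;
	uint32_t len;
	const uint8_t *blob;
};

/* Assumed layout: [4 bytes delay_msec][4 bytes len][len bytes blob].
 * Returns the number of bytes written, or 0 if buf is too small. */
size_t echo_data_push(const struct echo_data_sketch *in,
		      uint8_t *buf, size_t buflen)
{
	size_t needed;

	assert(in != NULL);
	needed = 2 * sizeof(uint32_t) + (size_t)in->len;
	if (buflen < needed) {
		return 0;
	}
	memcpy(buf, &in->delay_msec, sizeof(uint32_t));
	memcpy(buf + 4, &in->len, sizeof(uint32_t));
	if (in->len > 0) {
		memcpy(buf + 8, in->blob, in->len);
	}
	return needed;
}

/* Parses buf. On success out->blob points into buf (no copy).
 * Returns 0 on success, or -1 if buf is truncated. */
int echo_data_pull(const uint8_t *buf, size_t buflen,
		   struct echo_data_sketch *out)
{
	if (buflen < 8) {
		return -1;
	}
	memcpy(&out->delay_msec, buf, sizeof(uint32_t));
	memcpy(&out->len, buf + 4, sizeof(uint32_t));
	if (buflen - 8 < out->len) {
		return -1;
	}
	out->blob = buf + 8;
	return 0;
}
```

A server implementing the control would pull the request, wait `delay_msec`, and then send the same bytes back. This is what makes it useful as an "enhanced ping" for asynchronous clients.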
Volker Lendecke
bdabf78122 ctdb-protocol: Add marshalling for control ECHO_DATA
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2020-04-28 09:08:39 +00:00
Volker Lendecke
6f56f45639 ctdb-protocol: Add marshalling for struct ctdb_echo_data
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2020-04-28 09:08:39 +00:00
Volker Lendecke
4f3db63d5e ctdb-protocol: Add new control CTDB_CONTROL_ECHO_DATA
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2020-04-28 09:08:39 +00:00
Volker Lendecke
861dd8c48a ctdb: Fix duplicate ;;
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2020-04-28 09:08:39 +00:00
Renaud Fortier
fdfc480a56 ctdb-scripts: Update nfs-ganesha-callout
On Debian buster, this variable doesn't exist anymore. See this PR
for reference:

  https://github.com/gluster/storhaug/pull/30

Signed-off-by: Renaud Fortier <renaud.fortier@fsaa.ulaval.ca>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>

Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu Apr 23 08:07:51 UTC 2020 on sn-devel-184
2020-04-23 08:07:51 +00:00
Volker Lendecke
ad4b53f2d9 ctdb: Fix a memleak
Bug: https://bugzilla.samba.org/show_bug.cgi?id=14348
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>

Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Fri Apr 17 08:32:35 UTC 2020 on sn-devel-184
2020-04-17 08:32:35 +00:00
Martin Schwenke
f8f3d7954d ctdb-vacuum: Reschedule vacuum event if VacuumInterval has increased
The vacuuming integration tests set VacuumInterval to a very high
number to avoid vacuuming collisions.  This is done after the cluster
is healthy, so Samba will have already been started and vacuuming will
already be scheduled *at the default interval* for databases attached
by Samba.  This means that vacuuming controls used by vacuuming tests
can still collide with the scheduled vacuuming events.

Add some logic to reschedule a vacuuming event that has fired but
where VacuumInterval has increased since it was originally scheduled.
The increase in VacuumInterval is used as the time offset for
rescheduling the event.

Although this changes production behaviour for the convenience of
testing, the new behaviour is completely reasonable and obeys the
principle of least surprise.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Apr  7 03:04:57 UTC 2020 on sn-devel-184
2020-04-07 03:04:57 +00:00
Martin Schwenke
5d03a3c86e ctdb-vacuum: Store value of VacuumInterval in ctdb_vacuum_handle
No behaviour change.  This is the final staging step to make the next change
completely obvious.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-04-07 01:26:41 +00:00
Martin Schwenke
7ad7c0b932 ctdb-vacuum: Use vacuum_handle local variables
No behaviour change.  This just makes future changes clearer by
avoiding reformatting (or introducing local variables).

Clean up error handling while touching a relevant line.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-04-07 01:26:41 +00:00
Martin Schwenke
716f52f68b ctdb-recoverd: Avoid dereferencing NULL rec->nodemap
Inside the nested event loop in ctdb_ctrl_getnodemap(), various
asynchronous handlers may dereference rec->nodemap, which will be
NULL.

One example is lost_reclock_handler(), which causes rec->nodemap to be
unconditionally dereferenced in list_of_nodes() via this call chain:

  list_of_nodes()
  list_of_active_nodes()
  set_recovery_mode()
  force_election()
  lost_reclock_handler()

Instead of attempting to trace all of the cases, just avoid leaving
rec->nodemap set to NULL.  Attempting to use an old value is generally
harmless, especially since it will be the same as the new value in
most cases.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14324

Reported-by: Volker Lendecke <vl@samba.org>
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Tue Mar 24 01:22:45 UTC 2020 on sn-devel-184
2020-03-24 01:22:45 +00:00
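The approach in commit 716f52f68b above can be sketched as follows. Fetch the new node map into a local variable and replace the old one only on success, so that rec->nodemap is never NULL while nested handlers run. All names here are hypothetical. `demo_fetch_nodemap()` stands in for ctdb_ctrl_getnodemap() and simulates a nested handler (like lost_reclock_handler()) dereferencing rec->nodemap in the middle of the fetch. Plain malloc()/free() replace talloc for self-containment.

```c
#include <assert.h>
#include <stdlib.h>

struct nodemap_sketch { unsigned int num; };
struct recoverd_sketch { struct nodemap_sketch *nodemap; };

typedef struct nodemap_sketch *(*fetch_nodemap_fn)(struct recoverd_sketch *);

/* Stand-in for the real fetch: a nested handler may run before it
 * returns and dereference rec->nodemap unconditionally. */
struct nodemap_sketch *demo_fetch_nodemap(struct recoverd_sketch *rec)
{
	unsigned int seen = rec->nodemap->num; /* crashes if NULL */
	struct nodemap_sketch *m = malloc(sizeof(*m));

	if (m != NULL) {
		m->num = seen + 1; /* pretend a node joined */
	}
	return m;
}

/* Refresh rec->nodemap without leaving it NULL. The old map stays
 * visible during the fetch and is freed only once a new one exists. */
int refresh_nodemap(struct recoverd_sketch *rec, fetch_nodemap_fn fetch)
{
	struct nodemap_sketch *new_map;

	assert(rec != NULL && fetch != NULL);
	new_map = fetch(rec);
	if (new_map == NULL) {
		return -1; /* keep using the old map */
	}
	free(rec->nodemap);
	rec->nodemap = new_map;
	return 0;
}
```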
Martin Schwenke
147afe77de ctdb-daemon: Don't allow attach from recovery if recovery is not active
Neither the recovery daemon nor the recovery helper should attach
databases outside of the recovery process.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-03-23 23:45:38 +00:00
Martin Schwenke
052f1bdb9c ctdb-daemon: Remove more unused old client database functions
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-03-23 23:45:38 +00:00
Martin Schwenke
3a66d181b6 ctdb-recovery: Remove old code for creating missing databases
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-03-23 23:45:38 +00:00
Martin Schwenke
76a8174279 ctdb-recovery: Create database on nodes where it is missing
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-03-23 23:45:38 +00:00
Martin Schwenke
e6e63f8fb8 ctdb-recovery: Fetch database name from all nodes where it is attached
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-03-23 23:45:38 +00:00
Martin Schwenke
1bdfeb3fdc ctdb-recovery: Pass db structure for each database recovery
Instead of db_id and db_flags.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-03-23 23:45:38 +00:00
Martin Schwenke
c6f74e590f ctdb-recovery: GET_DBMAP from all nodes
This builds a complete list of databases across the cluster so it can
be used to create databases on the nodes where they are missing.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-03-23 23:45:38 +00:00
Martin Schwenke
4c0b9c3605 ctdb-recovery: Replace use of ctdb_dbid_map with local db_list
This will be used to build a merged list of databases from all nodes,
allowing the recovery helper to create missing databases.

It would be possible to also include the db_name field in this
structure but that would cause a lot of churn.  This field is used
locally in the recovery of each database so can continue to live in
the relevant state structure(s).

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-03-23 23:45:38 +00:00
Martin Schwenke
7e5a8a4884 ctdb-daemon: Respect CTDB_CTRL_FLAG_ATTACH_RECOVERY when attaching databases
This is currently only set by the recovery daemon when it attaches
missing databases, so there is no obvious behaviour change.  However,
attaching missing databases can now be moved to the recovery helper as
long as it sets this flag.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-03-23 23:45:38 +00:00
Martin Schwenke
98e3d0db2b ctdb-recovery: Use CTDB_CTRL_FLAG_ATTACH_RECOVERY to attach during recovery
ctdb_ctrl_createdb() is only called by the recovery daemon, so this is
a safe, temporary change.  This is temporary because
ctdb_ctrl_createdb(), create_missing_remote_databases() and
create_missing_local_databases() will all go away soon.

Note that this doesn't cause a change in behaviour.  The main daemon
will still only defer attaches from non-recoverd processes during
recovery.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-03-23 23:45:38 +00:00
Martin Schwenke
17ed042590 ctdb-protocol: Add control flag CTDB_CTRL_FLAG_ATTACH_RECOVERY
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-03-23 23:45:37 +00:00
Martin Schwenke
fc23cd1b9c ctdb-daemon: Remove unused old client database functions
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-03-23 23:45:37 +00:00
Martin Schwenke
c6c89495fb ctdb-daemon: Fix database attach deferral logic
Commit 3cc230b5ee says:

  Dont allow clients to connect to databases untile we are well past
  and through the initial recovery phase

It is unclear what this commit was attempting to do.  The commit
message implies that more attaches should be deferred but the code
change adds a conjunction that causes fewer attaches to be deferred.
In particular, no attaches will be deferred after startup is complete.
This seems wrong.

To implement what seems to be stated in the commit message an "or"
needs to be used so that non-recovery daemon attaches are deferred
either when in recovery or before startup is complete.  Making this
change highlights that attaches need to be allowed during the
"startup" event because this is when smbd is started.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-03-23 23:45:37 +00:00
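The corrected condition in commit c6c89495fb above can be sketched as a boolean function. Attaches that do not come from recovery are deferred if the node is in recovery *or* startup is not yet complete. An exception is made once the "startup" event is running, since that is when smbd is started. With the old "and", nothing would ever be deferred after startup. The run-state enum and `attach_should_defer()` are hypothetical stand-ins, not ctdb's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified daemon run states, in order. */
enum run_state_sketch {
	RUNSTATE_INIT_SKETCH,    /* before the "startup" event */
	RUNSTATE_STARTUP_SKETCH, /* "startup" event running (smbd starts here) */
	RUNSTATE_RUNNING_SKETCH, /* startup complete */
};

/* Returns true if a database attach request should be deferred. */
bool attach_should_defer(bool from_recovery, bool in_recovery,
			 enum run_state_sketch runstate)
{
	/* Attaches made on behalf of recovery are not deferred. */
	if (from_recovery) {
		return false;
	}
	/* "or", not "and": defer during recovery, and also before startup
	 * is complete, except once the "startup" event has begun. */
	return in_recovery || runstate < RUNSTATE_STARTUP_SKETCH;
}
```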
Amitay Isaacs
1c56d6413f ctdb-recovery: Refactor banning a node into separate computation
If a node is marked for banning, confirm that it has not become inactive
during the recovery.  If it has, then don't ban the node.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2020-03-23 23:45:37 +00:00
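The check added in commit 1c56d6413f above can be sketched as follows. Before banning a node that was marked during recovery, consult its flags as gathered after the recovery and skip it if it has become inactive. `node_state_sketch`, `NODE_FLAGS_INACTIVE_SKETCH` and `pick_node_to_ban()` are hypothetical. Returning only the first eligible node is also a simplification of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NODE_FLAGS_INACTIVE_SKETCH 0x1 /* hypothetical flag bit */

/* Hypothetical per-node state gathered during recovery. */
struct node_state_sketch {
	uint32_t flags;      /* node flags as seen after the recovery */
	bool marked_for_ban; /* node was marked for banning */
};

/* Returns the index of the node to ban, or -1 if there is none.
 * Nodes that became inactive during the recovery are not banned. */
int pick_node_to_ban(const struct node_state_sketch *nodes, unsigned int count)
{
	unsigned int i;

	for (i = 0; i < count; i++) {
		if (!nodes[i].marked_for_ban) {
			continue;
		}
		if (nodes[i].flags & NODE_FLAGS_INACTIVE_SKETCH) {
			continue; /* became inactive: don't ban */
		}
		return (int)i;
	}
	return -1;
}
```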
Amitay Isaacs
c6a0ff1bed ctdb-recovery: Don't trust nodemap obtained from local node
It's possible for a node to be stopped when recovery is started, before
the recovery master has updated the flags on the local ctdb daemon.  So do
not trust the list of active nodes obtained from the local node.  Query
the connected nodes to calculate the list of active nodes.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2020-03-23 23:45:37 +00:00
Amitay Isaacs
6e2f8756f1 ctdb-recovery: Consolidate node state
This avoids passing multiple arguments to async computation.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2020-03-23 23:45:37 +00:00
Amitay Isaacs
072ff4d12b ctdb-recovery: Fetched vnnmap is never used, so don't fetch it
A new vnnmap is constructed using the information from all the connected
nodes.  So there is no need to fetch the vnnmap from the recovery master.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14294

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2020-03-23 23:45:37 +00:00
Martin Schwenke
319c93f0c6 ctdb-tcp: Do not stop outbound connection in ctdb_tcp_node_connect()
The only place the outgoing connection needs to be stopped is when
there is a timeout when waiting for the connection to become writable.
Add a new function ctdb_tcp_node_connect_timeout() to handle this
case.

All of the other cases are attempts to establish a new outgoing
connection (initial attempt, retry after an error or disconnect, ...)
so drop stopping the connection in those cases.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14295

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Martin Schwenke <martin@meltin.net>

Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu Mar 12 05:29:20 UTC 2020 on sn-devel-184
2020-03-12 05:29:20 +00:00
Martin Schwenke
3c8747fe29 ctdb-tcp: Factor out function ctdb_tcp_start_outgoing()
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14295

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Martin Schwenke <martin@meltin.net>
2020-03-12 03:47:30 +00:00
Ralph Boehme
2c73dbafba ctdb-tcp: add ctdb_tcp_stop_incoming()
No change in behaviour.  This makes the code self-documenting.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14295

Signed-off-by: Ralph Boehme <slow@samba.org>
Signed-off-by: Martin Schwenke <martin@meltin.net>
2020-03-12 03:47:30 +00:00
Ralph Boehme
1e2a967ff4 ctdb-tcp: rename ctdb_tcp_stop_connection() to ctdb_tcp_stop_outgoing()
No change in behaviour.  This makes the code self-documenting.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14295

Signed-off-by: Ralph Boehme <slow@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2020-03-12 03:47:30 +00:00
Ralph Boehme
ea37ecdcd5 ctdb-tcp: Remove redundant restart in ctdb_tcp_tnode_cb()
The node dead upcall has already restarted the outgoing connection.
There's no need to repeat it.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14295

Signed-off-by: Ralph Boehme <slow@samba.org>
Signed-off-by: Martin Schwenke <martin@meltin.net>
2020-03-12 03:47:30 +00:00
Ralph Boehme
b83ef98c74 ctdb-tcp: always call node_dead() upcall in ctdb_tcp_tnode_cb()
ctdb_tcp_tnode_cb() is called when we receive data on the outgoing connection.

This can happen when we get an EOF on the connection because the other side has
closed. In this case data will be NULL.

It would also be called if we received data from the peer. In this case data
will not be NULL.

The latter case is a fatal error, though, and we already call
ctdb_tcp_stop_connection() for it as well.  This means that even though the
node is no longer fully connected, NODE_FLAGS_DISCONNECTED will not be set,
because the node_dead() upcall is not called.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14295

Signed-off-by: Ralph Boehme <slow@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2020-03-12 03:47:30 +00:00
Noel Power
0ff1b78fc2 ctdb-tcp: move free of inbound queue to TCP restart
Since commit 77deaadca8, if nodeA had previously accepted a connection
from nodeB and nodeB then dies (e.g. as a result of fencing), nodeB's
attempts to connect again after restarting are always rejected, with

 ctdb_listen_event: Incoming queue active, rejecting connection from w.x.y.z

messages.

Consolidate dead node handling in the TCP restart handling.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14295

Signed-off-by: Noel Power <noel.power@suse.com>
Reviewed-by: Ralph Boehme <slow@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2020-03-12 03:47:30 +00:00
Martin Schwenke
15762a3455 ctdb-daemon: more logical whitespace, debug modernisation
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14295

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Ralph Boehme <slow@samba.org>
2020-03-12 03:47:30 +00:00
Ralph Boehme
6a4fa0785f ctdb-daemon: ensure restart() callback is called in half-connected state
If NODE_FLAGS_DISCONNECTED is set, the node can be in a half-connected state.
This change ensures that the transport is restarted in this case.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14295

Signed-off-by: Ralph Boehme <slow@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2020-03-12 03:47:30 +00:00
Martin Schwenke
9f9dcfb6c3 ctdb-tests: Use built-in hexdump() in system socket tests
Better compatibility, since od output isn't consistent on FreeBSD.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>

Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Tue Mar 10 09:17:12 UTC 2020 on sn-devel-184
2020-03-10 09:17:12 +00:00
Martin Schwenke
602694522f ctdb-tests: Split system socket test
One test each for types, TCP and ARP.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-03-10 07:37:34 +00:00
Martin Schwenke
b10e79f208 ctdb-tests: Skip "ctdb process-exists" tests when not on Linux
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-03-10 07:37:34 +00:00
Martin Schwenke
c5dd476715 ctdb-tests: Add function ctdb_test_check_supported_OS
Skips test if not on one of the supported OSes.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
2020-03-10 07:37:34 +00:00