samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2024-12-27 03:21:53 +03:00

Author	SHA1	Message	Date
Martin Schwenke	d340f308e7	ctdb-daemon: Don't delay reloading the nodes file Presumably this was done to minimise the chance of a recovery occurring while the nodemaps are inconsistent across nodes. Another potential theory is that the forced recovery in the ctdb.c:control_reload_nodes_file() stops another recovery occurring for ReRecoveryTimeout seconds, so this delay causes the reloads to occur during that period. This is no longer necessary because recoveries are now explicitly disabled while node files are reloaded. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-04-07 07:43:13 +02:00
Martin Schwenke	a5be2c245d	ctdb-daemon: Store node addresses as ctdb_sock_addr rather than strings Every time a nodemap is constructed the node IP addresses all need to be parsed. This isn't a very productive use of CPU. Instead, parse each string once when the nodes file is loaded. This results in much simpler code. This code also removes the use of ctdb_address. Duplicating the port is pointless without an abstraction layer around ctdb_address. If CTDB gets an incompatible transport in the future then add an abstraction layer. Note that the infiniband code is not updated. Compilation of the infiniband code is already broken. Fixing it will be a separate, properly tested effort. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>	2015-03-23 12:23:12 +01:00
Martin Schwenke	39d2fd330a	ctdb-recoverd: Abort when daemon can take recovery lock during recovery Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> Autobuild-User(master): Amitay Isaacs <amitay@samba.org> Autobuild-Date(master): Fri Feb 13 09:48:15 CET 2015 on sn-devel-104	2015-02-13 09:48:15 +01:00
Martin Schwenke	432d677489	ctdb-recoverd: Improve error messages on recovery lock coherence fail When the daemon is able to take the recovery lock during recovery we might as well guess that the cluster filesystem has a lock coherence problem and print a more useful message. This will be more helpful to those trying out cluster filesystems that don't have lock coherence or that are difficult to setup. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-02-13 07:19:07 +01:00
Martin Schwenke	1d6ed91f55	ctdb-recoverd: Simplify ctdb_recovery_lock() Have it just silently take or fail to take the lock, except on an unexpected failure (where it should log an error). This means that when it is called we need to keep the old behaviour and explicitly release the lock. In do_recovery() the lock is released and a message is printed before attempting to take the lock. In the daemon sanity check the lock must be released in the error path if it is actually taken. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-02-13 07:19:07 +01:00
Martin Schwenke	db32a2bce5	ctdb-recoverd: New function ctdb_recovery_unlock() Unlock the recovery lock file. This way knowledge of the file descriptor isn't sprinkled throughout the code. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-02-13 07:19:07 +01:00
Martin Schwenke	72701be663	ctdb-recoverd: New function ctdb_recovery_have_lock() True if this recovery daemon holds the lock. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-02-13 07:19:07 +01:00
Martin Schwenke	d110fe2318	ctdb-daemon: Mark tunable VerifyRecoveryLock as obsolete It is pointless having a recovery lock but not sanity checking that it is working. Also, the logic that uses this tunable is confusing. In some places the recovery lock is released unnecessarily because the tunable isn't set. Simplify the logic by assuming that if a recovery lock is specified then it should be verified. Update documentation that references this tunable. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-02-13 07:19:07 +01:00
Michael Adam	a59fb322d6	ctdb: improve helpfulness of debug message when taking reclock fails Print out the errno of the fcntl call. Signed-off-by: Michael Adam <obnox@samba.org> Reviewed-by: Richard Sharpe <rsharpe@samba.org> Autobuild-User(master): Michael Adam <obnox@samba.org> Autobuild-Date(master): Fri Jan 9 04:25:02 CET 2015 on sn-devel-104	2015-01-09 04:25:02 +01:00
Martin Schwenke	acf26089f1	ctdb-util: Rename db_wrap to tdb_wrap and make it a build subsystem This makes it consistent with Samba, to ease transition. Update unit test code to link with tdb_wrap instead of including db_wrap.c. There are some potential whitespace fixes in this commit that have been ignored. CTDB's lib/tdb_wrap will be deleted after the transition to Samba's lib/tdb_wrap, so there's no point polishing it too much. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2014-09-10 01:36:15 +02:00
Martin Schwenke	b0f9d33058	ctdb: Fix some "declarations after code" problems Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2014-09-10 01:36:14 +02:00
Martin Schwenke	c1558adeaa	ctdb: Use sys_read() and sys_write() to ensure correct signal interaction ... and avoid compiler warnings in some cases. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2014-08-21 04:46:13 +02:00
Amitay Isaacs	f87b7f664f	ctdb-vacuum: Use existing function ctdb_marshall_finish Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Volker Lendecke <vl@samba.org> Autobuild-User(master): Amitay Isaacs <amitay@samba.org> Autobuild-Date(master): Wed Jul 23 09:44:00 CEST 2014 on sn-devel-104	2014-07-23 09:44:00 +02:00
Amitay Isaacs	2855173dac	ctdb-daemon: Do not thaw databases if recovery is active This prevents ctdb tool from thawing databases prematurely in thaw/wipedb/restoredb commands if recovery is active. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2014-07-07 13:29:50 +02:00
Amitay Isaacs	7aa20ccb5c	ctdb-daemon: No need to call event scripts with CTDB_CALLED_BY_USER This was added to support external monitoring using CTDB event scripts. However, it was never used. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2014-01-16 11:41:12 +11:00
Amitay Isaacs	6d1b74f052	ctdb-server: Coverity fixes Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Michael Adam <obnox@samba.org>	2013-11-19 17:13:03 +01:00
Amitay Isaacs	ae30b61255	vacuuming: Fix vacuuming bug where requests keep bouncing between nodes (part 2) This is caused by corruption of a record header such that the records on two nodes point to each other as dmaster. This makes a request for that record bounce between nodes endlessly. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit f0853013655ac3bedf1b793de128fb679c6db6c6)	2013-08-14 16:55:51 +10:00
Amitay Isaacs	ee8d573069	vacuuming: Fix vacuuming bug where requests keep bouncing between nodes (part 1) This is caused by corruption of a record header such that the records on two nodes point to each other as dmaster. This makes a request for that record bounce between nodes endlessly. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit a610bc351f0754c84c78c27d02f9a695e60c5b0f)	2013-08-14 16:55:51 +10:00
Sumit Bose	d039f799ac	Check return value of tdb_delete() Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 5cdcc3d45d358ddbcd7e864898eed9cbd9935429)	2013-07-11 15:16:55 +10:00
Amitay Isaacs	1c21f37e57	ctdbd: Set process names for child processes This helps distinguish processes in process list in top, perf, etc. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 2493f57ce268d6fe7e4c40a87852c347fd60d29e)	2013-07-10 14:33:19 +10:00
Mathieu Parent	d82b9ae410	build: Fix tdb.h path to enable building with system TDB library (This used to be ctdb commit f8bf99de3a5f56be67aaa67ed836458b1cf73e86)	2013-06-14 16:45:27 +10:00
Amitay Isaacs	140336383b	ctdbd: Log node state transitions at higher debug level Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit db31dc48bd3135e9242af08bb79b67a17a2b1668)	2013-05-29 17:47:15 +10:00
Amitay Isaacs	a002c6ec12	vacuum: Reduce the priority of non-critical error Since the complete database is not locked when the receive_records control is received, it's possible that we may not be able to obtain lock on a chain. We will try again to store this record. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit 32723c9efdad1c6ca4aa53f308ccd9bef1aadfff)	2013-05-24 14:22:16 +02:00
Martin Schwenke	6d9667f01c	ctdbd: Add new runstate CTDB_RUNSTATE_FIRST_RECOVERY This adds more serialisation to the startup, ensuring that the "startup" event runs after everything to do with the first recovery (including the "recovered" event). Given that it now takes longer to get to the "startup" state, the initscript needs to wait until ctdbd gets to "first_recovery". Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit ed6814ff0a59ddbb1c1b3128b505380f60d7aeb7)	2013-05-24 14:08:07 +10:00
Martin Schwenke	5aeae9744e	ctdbd: Log a message when recovery master changes Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-Programmed-With: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 1f96ea08f9a39dfe537c9b957ac512c84dc76f91)	2013-05-23 16:17:18 +10:00
Martin Schwenke	fa16cccf02	ctdbd: Remove the "stopped" event It isn't used, superseded by "ipreallocated". Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit c2bb8596a8af6406ef50e53953884df9d6246a96)	2013-05-06 13:38:21 +10:00
Michael Adam	217d2ad7b8	recover: use CTDB_REC_RO_FLAGS where appropriate Signed-off-by: Michael Adam <obnox@samba.org> Reviewed-By: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit b5a8791268e938d7e017056e0e2bd2cbec1fa690)	2013-04-24 18:49:08 +10:00
Michael Adam	527976d02a	vacuum: introduce the RECEIVE_RECORDS control This is in preparation for turning the vacuuming on the lmaster into a two-phase process: - First the node sends the list of records to be vacuumed to all other nodes with this new RECEIVE_RECORDS control. The remote nodes should store the lmaster's empty current copy. - Only those records that could be stored on all other nodes are processed further. They are sent to all other nodes with the TRY_DELETE_RECORDS control as before for deletion. Signed-off-by: Michael Adam <obnox@samba.org> Reviewed-By: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit e397702e271af38204fd99733bbeba7c1db3a999)	2013-04-24 18:47:32 +10:00
Michael Adam	b1a6289b44	ctdbd: unimplement the unused SET_DMASTER control Signed-off-by: Michael Adam <obnox@samba.org> Reviewed-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 2e92deef5221ee651028ef87138b3113f1fece91)	2013-04-17 12:44:08 +02:00
Amitay Isaacs	30299c387f	daemon: On shutdown, destroy timed events that check if recoverd is active When CTDB is shutting down, recovery daemon is stopped, but the event that checks if recovery daemon is still alive is not destroyed. So recovery master is restarted during shutdown if CTDB daemon takes longer to shutdown. There are two processes that check if recovery daemon is working. 1. ctdb_check_recd() - which checks every 30 seconds if the recovery daemon process exists. 2. ctdb_recd_ping_timeout() - which is triggered when recovery daemon fails to ping CTDB daemon. Both the events are periodic and need to be destroyed when shutting down. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 746168df2e691058e601016110fae818c6a265c3)	2013-01-09 13:20:26 +11:00
Michael Adam	f5b15e21c5	ctdb:recover: fix a comment typo Signed-off-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit 5067392d2e06795559f25828b65c129608b65c0b)	2013-01-05 01:15:19 +01:00
Amitay Isaacs	08ffbc342c	ctdb_recover: Replace static locking functions with locking API Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 4456a01d8f54ca6c771d7488048de5f638477d21)	2012-10-20 02:48:44 +11:00
Ronnie Sahlberg	e7d21834ae	RECOVER: When we pull databases during recovery, we used to reallocate the databuffer for each entry added. This would normally not be an issue, but for cases where memory is fragmented, this could start to cost significant cpu if we need to reallocate and move to a different region. Change this to instead preallocate, by default, 10MByte chunks to the data buffer. This significantly reduces the number of potential reallocate and move operations that may be required. Create a tunable to override/change how much preallocation should be used. (This used to be ctdb commit 1f262deaad0818f159f9c68330f7fec121679023)	2012-05-25 12:34:06 +10:00
Ronnie Sahlberg	26322d257d	DEBUG: Add checks for and print debug messages when 1) a database contains very many records, 2) when a database is very big, 3) when a single record is very big. Add tunables to control when to log these instances and allow it to be completely turned off by setting the threshold to 0 (This used to be ctdb commit 9ed58fef4991725f75509433496f4d5ffae0ae87)	2012-05-21 13:26:13 +10:00
Ronnie Sahlberg	a57eba2bb4	Track all child processes so we never send a signal to an unrelated process (our child died and kernel wrapped the pid-space and reused the pid for a different process). Wrap all creation of child processes inside ctdb_fork() which is used to track all processes we have spawned. Capture SIGCHLD to track also which child processes have terminated. Wrap kill() inside ctdb_kill() and make sure that we never send a !0 signal to a child process pid that has already terminated (and might have been replaced with a (This used to be ctdb commit f73a4b1495830bcdd094a93732a89dd53b3c2f78)	2012-05-03 14:03:26 +10:00
Amitay Isaacs	4392591555	Remove explicit include of lib/tevent/tevent.h. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 0681014ca5ed2a9b56f63fdace7f894beccf8a9a)	2012-04-13 17:28:14 +10:00
Amitay Isaacs	e2d83970e9	recovery: Add prototypes for tdb internal functions Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 98ac99c4a79fe2ee024890bb27c3ca68dc02d434)	2012-03-30 12:33:28 +11:00
Ronnie Sahlberg	fa3a06246a	STICKY: add prototype code to make records stick to a node to "calm" down if they are found to be very hot and accessed by a lot of clients. This can improve performance and stop clients from having to chase a rapidly migrating/bouncing record (This used to be ctdb commit d0d98f7e45e5084b81335b004d50bddc80cdc219)	2012-03-20 17:12:19 +11:00
Ronnie Sahlberg	6f83805183	READONLY: skip vacuuming or deleting records with readonly delegations. they are hot. wait until they have been revoked before we recall them. (This used to be ctdb commit 7417d994c2a159f71d27d4bcd2f53684862eece3)	2012-02-29 16:09:24 +11:00
Ronnie Sahlberg	e3cdf429b6	ReadOnly: revokechild_active is a list, not a context. Don't reset the pointer to NULL after deleting the first entry, loop deleting one entry at a time until they are all gone or we will leak some memory and possibly a process. (This used to be ctdb commit 8a86ac72088ad9f64ca83218c704f84c9abe00b6)	2011-09-13 18:47:18 +10:00
Ronnie Sahlberg	206a3c0c66	ReadOnly: add a new control to activate readonly lock capability for a database. let all databases default to not support this until enabled through this control (This used to be ctdb commit 908a07c42e5135a3ba30a625fc4f4e4916de197a)	2011-09-01 11:08:18 +10:00
Ronnie Sahlberg	a0d4d240c3	ReadOnly: add a readonly flag to the getdbmap control and show the readonly setting in ctdb getdbmap output (This used to be ctdb commit 4cac9ad7d9c9ca657a247a6c215476399c7d2210)	2011-09-01 10:28:15 +10:00
Ronnie Sahlberg	63dc96cdb2	ReadOnly: Change the ctdb_db structure to keep a uint8_t for flags instead of a boolean for the persistent flag. This is the same size as the original boolean but allows us to add additional flags for the database (This used to be ctdb commit 7462761638d25880ad46024ad4ef21667eb99a98)	2011-09-01 10:21:55 +10:00
Ronnie Sahlberg	9729d3e339	ReadOnly: Check the readonly flag instead of whether the tdb pointer is NULL or not (This used to be ctdb commit 01314c2cb3a480917d6a632b83c39f0a48bba0e7)	2011-08-23 10:41:52 +10:00
Ronnie Sahlberg	59d8d9b695	ReadOnly: Once recovery has finished, make sure to free all revoke child processes and trigger the destructors for all deferred calls to re-queue the original packets to the input packet processing function (This used to be ctdb commit 530a78aa05910beeca0867c4dbe226d4ce73f946)	2011-08-23 10:30:57 +10:00
Ronnie Sahlberg	b01dc029ca	ReadOnly: After recovering all databases, make sure to clear out the tracking database used to track delegations and revoke. This is because the recovery will implicitly result in a revoke of all delegations. (This used to be ctdb commit b5520933b9922d6af6f59f535824e1cdacb9f774)	2011-08-23 10:24:44 +10:00
Ronnie Sahlberg	6ff039d444	ReadOnly: After performing a recovery, clear out all flags related to readonly delegations and revoke (This used to be ctdb commit 9985a97e11688f3f688bb84e1180fd57c42077f4)	2011-08-23 10:24:18 +10:00
Ronnie Sahlberg	a1abcd41e0	Restart recovery daemon if it looks like it hung. Don't shutdown ctdbd completely, that only makes the problem worse. (This used to be ctdb commit 221ecc2509f6d267d1854c1042ff945a620510bb)	2011-03-07 06:39:10 +11:00
Ronnie Sahlberg	8acb677c9c	Deferred attach : at early startup, defer any db attach calls until we are out of recovery. (This used to be ctdb commit eeaabd579841f60ab2c5b004cbbb1f5de2bfe685)	2011-03-01 12:13:34 +11:00
Michael Adam	40e922f4e6	recover: finish pending trans3 commits when a recovery is finished. When the end_recovery control is received, pending trans3 commits are finished. During the recovery, all the actions like persistent_callback and persistent_store_timeout had been disabled to let the recovery do its job. After the recovery is completed, send the reply to the waiting clients. (This used to be ctdb commit f7dfeb7143f574c2434f7dd16917380dfd1f4f64)	2011-02-24 10:35:26 +01:00

144 Commits