samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2025-02-11 17:58:16 +03:00

Author	SHA1	Message	Date
Michael Adam	ace1efb878	persistent: add a ctdb_persistent_state member to the ctdb_db context. To be used for tracking running transaction commits through recoveries. (This used to be ctdb commit 1237e15df4af58a3d220eea42a4b75e21e65029f)	2011-02-24 10:35:25 +01:00
Michael Adam	76acf72bc5	persistent_callback: print "no error message given" instead of "(null)" (This used to be ctdb commit d871a38978219e004833608c11aae98fe47614b9)	2011-02-24 10:35:25 +01:00
Michael Adam	e050266690	persistent: reduce indentation for the finishing moves in ctdb_persistent_callback (This used to be ctdb commit 2c2d1646eb753ea9561f085bcb101153267b052b)	2011-02-24 10:35:24 +01:00
Michael Adam	033ba0b466	persistent: if a node failed to update_record, trigger a recovery and stop processing of the update_record replies in order to let the recovery finish the trans3_commit control. (This used to be ctdb commit cab95570dc1eefb08abbac5ae411c29f699b51cc)	2011-02-24 10:35:24 +01:00
Michael Adam	0c93a2932c	persistent_store_timout: do not really time out the trans3_commit control in recovery. If a recovery was started, then all further processing of the update_record controls sent by the trans3_commit control, including timing them out, is disabled. The recovery should trigger sending the reply for the update record control when finished. (This used to be ctdb commit 983c1ca2e18ecd60fca69bfe9e116125cc695857)	2011-02-24 10:35:24 +01:00
Michael Adam	c9df23ae1d	persistent_callback: ignore the update-record return code of remote node in recovery. If a recovery was started, then all further processing of the update_record controls sent by the trans3_commit control is disabled. The recovery should trigger sending the reply for the update record control when finished. (This used to be ctdb commit 12cf0619255b12230843cd8bb49cbfdea376ca2f)	2011-02-24 10:35:24 +01:00
Ronnie Sahlberg	92f86534ac	ctdb_req_dmaster from non-master. If we find a situation where we get a stray packet with the wrong dmaster, don't suicide with ctdb_fatal() since this is too disruptive. Just drop the stray packet and force a recovery to make sure all is good again. CQ S1022004 (This used to be ctdb commit 62b7fe853db37c0a90e48a0332a3426a8dcb4ed8)	2011-02-18 11:29:44 +11:00
Ronnie Sahlberg	a453e79050	50.samba : Tell winbind every time we add/remove an ip from the node CQ S1021636 (This used to be ctdb commit 87b279027616cffbcedfd534ac0032cd51238dfe)	2011-02-18 11:29:35 +11:00
Ronnie Sahlberg	65f44e159f	Add two new flags for the ltdb header. One of which signals that the record has never been migrated to/from a node while containing data. This property "has never been migrated while non-zero" is important later to provide heuristics on which records we might be able to purge from the tdb files cheaply, i.e. without having to rely on the full-blown database vacuum. These records are believed to be very common and the pattern would look like this : 1, no record exists at all. 2, client opens a file 3, samba requests the record for this file 4, an empty record is created on the LMASTER 5, the empty record is migrated to the DMASTER 6, samba writes a <sharemode> to the record locally and the record grows 7, client finishes working the file and closes the file 8, samba removes the sharemode and the record becomes empty again. 9, much later : vacuuming will delete the record At stage 8, since the record has never been migrated onto a node while being non-zero it would be safe, and much more efficient, to just delete the record completely from the database and hand it back to the LMASTER. The flags occupy the same uint32_t as was previously used for laccessor/lacount in the header. For now, make sure the flags only define/use the top 16 bits of this field so that we are sure we don't collide with bits set to one from previous generations of the ctdb cluster database prior to this change in semantics of this word. This is a rework of Michael's patch : commit 2af1a47cbe1a608496c8caf3eb0c990eb7259a0d Author: Michael Adam <obnox@samba.org> Date: Tue Nov 30 17:00:54 2010 +0100 add a DEFAULT record flag and a MIGRATED_WITH_DATA record flag. (This used to be ctdb commit e075670dee8e6ecaba54986f87a85be3d0528b6b)	2011-02-18 10:14:56 +11:00
Ronnie Sahlberg	d32a4dd501	remove checking for filesystems and filesystem health from the cnfs script. remove the gpfsmount and gpfsumount entry points (This used to be ctdb commit 7db5a4832a9555be53c301f198f72b9e075a8ae7)	2011-02-18 10:11:56 +11:00
Ronnie Sahlberg	ef0ab7eee1	60.nfs Don't update the statd settings that often. When we have very many nodes and very many ips, this would generate a lot of unnecessary load on the system (This used to be ctdb commit 0c030c9384500f340d8382c20e1e91b11aa377e9)	2011-02-18 10:10:34 +11:00
Ronnie Sahlberg	b57bd0f896	Remove LACOUNT and LACCESSOR and migrate the records immediately. This concept didn't work out and it is really just as expensive as a full migration anyway, without the benefit of caching the data for subsequent accesses. Now, migrate the records immediately on first access. This will be combined with a "cheap vacuum-lite" for special empty records to prevent growth of databases. Later extensions to mimic read-only behaviour of records will include proper shared read-only locking of database records, making the laccessor/lacount read-only access to the data obsolete anyway. Removing this special case and the handling of lacount/laccessor makes the codepath where shared read-only locking will be implemented simpler, and frees up space in the ctdb_ltdb header for use by vacuuming flags as well as read-only locking flags. (This used to be ctdb commit 155dd1f4885fe142c6f8bd09430f65daf8a17e51)	2011-02-18 10:08:32 +11:00
Ronnie Sahlberg	0aa2282c9c	change the hash function to use the much better Jenkins hash from the tdb library cq S1020233 (This used to be ctdb commit b86feb6fe463dfdb67b2798491df18a4c434a430)	2011-02-18 10:05:09 +11:00
Ronnie Sahlberg	c23f2e8bea	We default to non-deterministic ip now where ips are "sticky" and don't change too much. This means we can simplify the way we add ips significantly and stop trying to move them. We also check if the node already hosts the ip, in which case we used to return an error. Instead just print an error string but return 0, ok. This makes it easier to script, and works around broken scripts. CQ1021034 (This used to be ctdb commit 307e5e95548155a31682dfcb0956834d0c85838e)	2011-02-08 17:06:10 +11:00
Ronnie Sahlberg	40bd94bd5e	If the node is stopped, put a log entry in /var/log/* to indicate this is why we never become ready (This used to be ctdb commit ef1de8211f83259ea37dcd57562139a3b63d9631)	2011-02-02 14:09:56 +11:00
Ronnie Sahlberg	0f33605866	LockWait congestion. Add a dlist to track all active lockwait child processes. Every time we create a new lockwait handle, check if there is already an active lockwait process for this database/key and if so, send the new request straight to the overflow queue. This means we will only have one active lockwait child process for a certain key, even if there were thousands of fetch-lock requests for this key. When the lockwait processing finishes for the original request, the processing in d_overflow() will automagically process all remaining keys as well. Add back a --nosetsched argument to make it easier to run under gdb (This used to be ctdb commit 3e9317a2e1f687b04bf51575d47fcd4faa6e6515)	2011-01-24 12:21:58 +11:00
Ronnie Sahlberg	f91f063fe0	Compile fix (This used to be ctdb commit a81da1e67cd11734839c3fa7ae1ddaaf3459416d)	2011-01-24 12:21:53 +11:00
Rusty Russell	e57362ecf4	ctdb_lockwait: create overflow queue. Once we have more than 200 children waiting on a particular db, don't create any more. Just put them on an overflow queue, and when a child gets a lock search that queue to see if others were after the same lock (they probably were). (This used to be ctdb commit 5e614e8cfd1e9a4b13035a0e400b7a60a745b510)	2011-01-24 12:21:50 +11:00
Ronnie Sahlberg	b2d7554b32	Add a new test tool that fetch locks a record and then blocks until it receives user input to unlock the record again. (This used to be ctdb commit 1b3c5278aa1bf712606e2ec138e6be7b2e8a6ad1)	2011-01-24 12:21:46 +11:00
Ronnie Sahlberg	3f819741ad	ctdb: hold transaction locks during freeze, mark during recover. Make the ctdb parent "mark" the transaction lock once the child process has frozen/locked the entire database. This stops the ctdb daemon from using a blocking fcntl() locking on the tdb during the read traverse during recovery. CQ 1021388 (This used to be ctdb commit 52ee2b3ce822344d0f55ac040fe25f6ec5c0d7c2)	2011-01-18 14:07:44 +11:00
Rusty Russell	e68b97ffc9	tdb: expose transaction lock infrastructure for ctdb. tdb_traverse_read() grabs the transaction lock. This can cause ctdbd (which uses it) to block when it should not; expose mark and normal variants of this lock, so ctdbd's child (the recovery daemon) can acquire it and the ctdbd parent can mark it as held. (This used to be ctdb commit d09fa845bd848d04507853809acf42e0471b44bf)	2011-01-18 14:07:41 +11:00
Ronnie Sahlberg	849ef2e39b	change Christian's previous patch to only perform the check/logging if we are the main ctdb daemon. Other daemons/child processes are not guaranteed to get events on a regular basis so those should not be checked. (This used to be ctdb commit ac2afe9c25753b837d5f6396020e0f3c65ef3628)	2011-01-17 12:01:28 +11:00
Christian Ambach	ad56f321c8	improve timing issue detections. The original "Time jumped" messages are too coarse to interpret exactly what was going wrong inside of CTDB. This patch removes the original logs and adds two other logs that differentiate between the time it took to work on an event and the time it took to get the next event. (This used to be ctdb commit fd8d54292f10b35bc4960d64cfa6843ce9aba225)	2011-01-17 11:56:55 +11:00
Ronnie Sahlberg	fcd98a7e59	LIBCTDB: add support for traverse (This used to be ctdb commit 9463e04038ba36792583f83bd95c1af322dc283a)	2011-01-14 17:38:56 +11:00
Ronnie Sahlberg	6494574d8f	db_exists() takes 3 arguments, not two. (This used to be ctdb commit 2c02fc2d45cd7364d7bee0d6a89f1386131ef002)	2011-01-14 09:53:25 +11:00
Ronnie Sahlberg	d903473d82	We can not always rely on the recovery daemon pinging us in a timely manner so we need a "ticker" in the main ctdbd daemon too to ensure we get at least one event to process every second. This will improve the accuracy of "Time jumped" messages and remove false positives when the recovery daemon is "slow". (This used to be ctdb commit 70154e5e19e219de086b2995d41e8f6e069ee20d)	2011-01-14 09:47:44 +11:00
Ronnie Sahlberg	2edbf0b2fb	ADDIP failure. Found during automatic regression testing. We do not allow the takeip/releaseip events to be executed during a recovery. All of "ctdb addip, ctdb delip, ctdb moveip" use and force these events to trigger to perform the ip assignments required. If these commands collide with a recovery, these commands could fail since we do not allow takeip/releaseip events to trigger during the recovery. While it is easy to just try running the command again, this is suboptimal for script use. Change these commands to retry these operations a few times until either successful or until we give up. This makes the commands much easier to use in scripts. (This used to be ctdb commit 6954c9df67501183995f408cca358c8fdfb176ab)	2011-01-13 16:18:58 +11:00
Ronnie Sahlberg	93bea39391	IPALLOCATION : If the node is held pinned down in "init" state by external services failing to start, or blocking CTDBD from finishing the startup phase, we can encounter a situation where we have not yet fully initialized, but a remote recovery master tries to release a certain ip clusterwide. In this situation the node that is pinned down in init/startup phase would fail to perform the release of the ip address since we are not yet fully operational and do not yet host any valid interfaces. In this situation, we just need to remain unhealthy, there is no need to also ban the node. Remove the autobanning for this condition and just let the node remain in unhealthy mode. Banning is overkill in this situation when the system is broken and just draws attention to ctdbd instead of the root cause. (This used to be ctdb commit d8af74e4c4961deb94c18dde8ba7fc07e944729c)	2011-01-13 09:42:01 +11:00
Martin Schwenke	59c5a9f279	Eventscripts: lower the fail/restart limits for nfsd. We were potentially leaving a node unable to serve requests for too long. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 5be8610ffa33db49e33949560d0ef2fa5f3c0c73)	2011-01-11 16:49:46 +11:00
Martin Schwenke	96378d6dc8	Eventscripts: use "startstop_nfs restart" to reconfigure NFS. This was defaulting to just "service nfs restart", which doesn't have the workarounds we need. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 0f462e9e9fe12b595f3c7452123db8e69548abd6)	2011-01-11 16:49:14 +11:00
Martin Schwenke	3efd5ef77c	Eventscripts: only autostart during a monitor event. Otherwise we might short-circuit events that are run only once and actually need to do something. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit c4f9e8a43540bc049b2771e0a2d76d37b9d17331)	2011-01-11 16:48:50 +11:00
Martin Schwenke	fb8f199651	Eventscripts: print a message when reconfiguring a service. Otherwise there can be strange error messages from services stopping/starting, without any context. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 8bcf7ab164429ddc0ae530133e114f186a8146dd)	2011-01-11 16:48:17 +11:00
Martin Schwenke	934ae76d38	Eventscripts: work around NFS restart failure under load. "service nfs restart" can fail. To stop nfsd it sends a SIGINT and nfsd might take a while to process it if the system is loaded. Starting nfsd may then fail because resources are still in use. This does some /proc magic to tell nfsd to do no more processing. It then runs service stop, kills nfsd with SIGKILL, and then runs service start. This is much less likely to fail. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit a9bf4f82852975b0b627f61ceb2d23401f630805)	2011-01-11 16:47:43 +11:00
Ronnie Sahlberg	47aad74673	TYPO (This used to be ctdb commit 38dc1ac2e87416a22c9356596286b773d601e71c)	2011-01-11 16:17:33 +11:00
Ronnie Sahlberg	2a3442d972	STATD is 100027 not 1000247 (This used to be ctdb commit f4cf15a2b06ffefde0cba803603b48040ad0fa05)	2011-01-11 16:16:28 +11:00
Ronnie Sahlberg	1859cde18d	LIBCTDB uninitialized inqueue element. From Michael Anderson, initialize the inqueue element of the ctdb structure to NULL, else it might be used uninitialized and cause a segv. (This used to be ctdb commit 775d02180b825ae32d6536eaf2059884d5fed9f4)	2011-01-11 07:40:57 +11:00
Ronnie Sahlberg	d236c970d0	recoverd: avoid triggering a full recovery if just some ip allocation has failed. We don't need to rebuild the databases in this situation, we just need to try again to sort out the ip address allocations. (This used to be ctdb commit 044c398ffea23d36ee033c8ddf07d11028197346)	2011-01-11 07:40:49 +11:00
Ronnie Sahlberg	c4006ce844	Add ctdb_fork() which will fork a child process and drop the real-time scheduler for the child. Use ctdb_fork() from callers where we don't want the child to be running at real-time privilege. (This used to be ctdb commit 58795a4c9e0624e20fa3e0023b65127053edd103)	2011-01-11 07:40:41 +11:00
Ronnie Sahlberg	ea0df6d882	Revert scheduling back to use real-time processes. Revert this patch: commit 482c302d46e2162d0cf552f8456bc49573ae729d We may need to use real-time processes for the main daemon and the recovery daemon to handle the cases where systems come under very high loads. (This used to be ctdb commit 08bef9dcab6e4da15fc783f8624e5ed09aa060b5)	2011-01-11 07:40:35 +11:00
Ronnie Sahlberg	7e747aab8d	60.nfs Check if we have rpc.statd and if not, skip checking for statd availability at all (since we can't restart it, there is no point checking if it is alive) (This used to be ctdb commit 6075e85ba6c0f58fd1ab2ce3b09dd3d6ff491365)	2011-01-06 15:49:15 +11:00
Ronnie Sahlberg	ded7c23122	41.HTTPD Httpd can be very slow to start on some platforms, wait 5 monitor intervals before we try to restart it if it has not bound to port 80 yet. After 10 failed intervals, flag the node as unhealthy. (This used to be ctdb commit 6ec1993aa5f2778b8227ce5f6eca0d19e4ae9788)	2010-12-22 10:31:41 +11:00
Ronnie Sahlberg	e9ff38be7d	60.nfs Try to restart LOCKD after 10 failures and flag the node as unhealthy after 15 failures (This used to be ctdb commit 5a67889c9166835aef3443051812d14af07dfca5)	2010-12-22 10:31:31 +11:00
Ronnie Sahlberg	57e74f6d8a	Don't run net serverid wipe in the background (This used to be ctdb commit 76c515f9f05f4fb5683b5ff65cf136c168fd882f)	2010-12-22 10:31:26 +11:00
Ronnie Sahlberg	97a6eccaf7	50.samba Net serverid wipe can take a bit of time sometimes so background it. Only perform auto start/stop of the managed service on the monitor event (This used to be ctdb commit deba5cbbf7703a1a24ce88a06c73fca056e05521)	2010-12-14 21:19:28 +11:00
Ronnie Sahlberg	99d7e39efc	ctdb addip: After finishing "ctdb addip" wait for an implicit "iptakeover" to complete the assignment to a node. This makes it more wasteful and time-consuming when adding multiple ips at once, or the same ip to multiple nodes, but makes it easier to script the use of this command. (This used to be ctdb commit d86cbf3d7d426c558d110d67dc985634c754a522)	2010-12-13 14:24:30 +11:00
Ronnie Sahlberg	1e41ab5fa3	LVS update lvs configuration on ipreallocated events too (This used to be ctdb commit a4e98073d955676fdcbb91affae1de1a733d0bc2)	2010-12-13 14:24:16 +11:00
Ronnie Sahlberg	a9a6ae064d	When assigning the single-public-ip during startup, flag the interface as initially being "link ok" so that we can add it and startup. The eventscript can later drop the flag if required (This used to be ctdb commit 720849b756c825fb8b285f09972a8c39f1888a99)	2010-12-13 14:24:04 +11:00
Ronnie Sahlberg	220c5371c7	Revert "server: when we migrate off a record with data, set the MIGRATED_WITH_DATA flag" This reverts commit 17e231abf5ade83d7fa624b5cf54ae876e2795aa. (This used to be ctdb commit 23f81ba39ee7cd8a7360f4602b3eb264eb221552)	2010-12-13 14:23:48 +11:00
Ronnie Sahlberg	dff88a8a6a	Revert "Add a new header flag for "migrated with data" and set this to 1" This reverts commit a8cc35191df1cd4b866897df71d317ce5f198cb5. (This used to be ctdb commit 7c37435fb517a621c45b21a21b4eb15f8bbd3c83)	2010-12-13 14:23:32 +11:00
Ronnie Sahlberg	f815237f8f	libctdb fix a compile problem after renaming a structure field (This used to be ctdb commit f44c02f45dbc13e3cc2e89ee1c96bd0d57042fcc)	2010-12-10 14:19:53 +11:00

3278 Commits