samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2025-01-13 13:18:06 +03:00

Author	SHA1	Message	Date
Michael Adam	0113744fec	server: trans2_active: don't report a transaction active on the node that performs the transaction Otherwise a node can lock itself out, e.g. when a commit control times out... Michael (This used to be ctdb commit cb432e30351d5e5a41e98da3c7b1c2a4d400a3a2)	2009-10-30 09:22:18 +11:00
Ronnie Sahlberg	784a89ec62	new version 1.0.102 (This used to be ctdb commit 4892222ffb255dccd8ced1cb047f199386bb3e98)	2009-10-29 13:49:27 +11:00
Wolfgang Mueller-Friedt	9713b8ea9a	ensure tdb names end with .tdb. and any number of digits (This used to be ctdb commit 8ab1349feb64a91cb500c130ea299e2182491f06)	2009-10-29 13:46:37 +11:00
Wolfgang Mueller-Friedt	2c137b7030	vacuuming needed additional check before getting rid of the record; there is a gap between selecting the records and deleting them, therefore we have to check if the records still can be deleted when we actually are about to delete them (This used to be ctdb commit a6fbc65aca35c41c428a82d7402e43c6eaac1d6e)	2009-10-29 13:45:17 +11:00
Ronnie Sahlberg	f5e90ec3b5	Revert "From Wolfgang M." This reverts commit 5b70fa8cfd5916d3c212823ad5cc1b251ae175ed. (This used to be ctdb commit 363e7e939ad46b3f75c83c30d4163d63876c2456)	2009-10-29 13:44:12 +11:00
Ronnie Sahlberg	9e235af3a2	make the error logged when winbindd fails to access the dc during startup more scary and easier to spot in the logs (This used to be ctdb commit 0c9b0466fd87b3f1e5d53f867c863217802ac43b)	2009-10-29 11:54:24 +11:00
Ronnie Sahlberg	fcd2ebc32b	update the uptime command to indicate that time since last is either from last recovery or from last failover (This used to be ctdb commit 467da12a785ba3367ed9cbdf79440394e9703289)	2009-10-29 10:58:14 +11:00
Ronnie Sahlberg	023d09cd38	Revert "update the "uptime" command to indicate the "time since last" is the time since the last recovery OR failover." This reverts commit 3b0d44497800a16400d05a30bdaf6e6c285d4b36. (This used to be ctdb commit cb36bbb5418290e8e5b770d2d836285b15da2a6f)	2009-10-29 10:49:00 +11:00
Ronnie Sahlberg	a4b8a17b26	update the manpage for "update" to indicate the "time since last" indicates the time since the last recovery OR failover (This used to be ctdb commit 22712c577f64ec84851b4addcf4a46c7e99e0662)	2009-10-29 10:32:28 +11:00
Ronnie Sahlberg	279b7ca564	update the "uptime" command to indicate the "time since last" is the time since the last recovery OR failover. (This used to be ctdb commit 3b0d44497800a16400d05a30bdaf6e6c285d4b36)	2009-10-29 10:37:10 +11:00
Michael Adam	2419eab0d9	ctdb_client: reformat a comment slightly to enhance clearness. Michael (This used to be ctdb commit 9560f8b7fe0f7ee0386a87c2653333071050fe4b)	2009-10-29 10:15:54 +11:00
Michael Adam	5d579cf665	client: fix race condition with concurrent transactions on the same node. In ctdb_transaction_commit(), when the trans2_commit control fails, there is a race condition in the 1 second sleep between the local transaction_cancel and the call to ctdb_replay_transaction(): The database is not locked, and neither is the transaction_lock record. So another client can start and possibly complete a new transaction in this gap, but only on the same node: The locking of the transaction_lock record on a different node which involves migration of the record to the other node has been disabled by introduction of the transaction_active flag on the db which closes precisely this gap from the start of the commit until the call to TRANS2_FINISH or TRANS2_ERROR. But this mechanism does not cover the case where a process on the same node tries to start a transaction: There is no obstacle to locking the transaction_lock record because the record does not need to be migrated. This commit closes this race condition in ctdb_transaction_fetch_start() by using the new ctdb_ctrl_transaction_active() call to ask the local ctdb daemon whether it has a transaction running on the database. If so, the check is repeated until the running transaction is done. This does introduce an additional call to the local ctdbd when starting transactions, but it does close the (hopefully) last race condition. Michael (This used to be ctdb commit 02ee9dfd3c6b09f5c5172a7e38738c20b7f0aecd)	2009-10-29 10:15:21 +11:00
Michael Adam	953ccee5c5	client: add ctdb_ctrl_transaction_active() which calls out to CTDB_TRANS2_ACTIVE Michael (This used to be ctdb commit 813cfd7c625ac8af4ef169cc92fb6d69f66004c9)	2009-10-29 10:15:00 +11:00
Michael Adam	abac42ca34	server: add a new ctdb control CTDB_TRANS2_ACTIVE. This asks the daemon whether a transaction is currently active on a given DB on that node. More precisely this asks for the transaction_active flag in the ctdb_db_context that is set in the CTDB_TRANS2_COMMIT control and cleared in the CTDB_TRANS2_ERROR or CTDB_TRANS2_FINISHED controls. This will be useful for fixing race conditions in the transaction code. Michael (This used to be ctdb commit 8d430ae6968dfe566614379436fc3c56003fcd88)	2009-10-29 10:14:30 +11:00
Ronnie Sahlberg	019f3c930e	version 1.0.101 (This used to be ctdb commit 47b67077bdfa64938bb0fa6d1ca8f56fbd5c960e)	2009-10-28 17:42:01 +11:00
Ronnie Sahlberg	d379b30182	create a separate context for non-monitor eventscripts so they dont collide (This used to be ctdb commit 325de818f88f339a16dc4544e899a2d735933c44)	2009-10-28 17:35:15 +11:00
Ronnie Sahlberg	f8a8c0d6e4	return 0 in the event script callback if it was aborted by a different script (This used to be ctdb commit 8d5cb2586a1d5a0255cc18295430927b914d4527)	2009-10-28 16:40:31 +11:00
Ronnie Sahlberg	d82fdcb56f	new version 1.0.100 (This used to be ctdb commit fa34e8a5d588026029dca949151697817fe7f127)	2009-10-28 16:18:28 +11:00
Ronnie Sahlberg	e07ca41886	change the eventscript handling to allow EventScriptTimeout for each individual script instead of for the entire set of scripts. restructure the talloc hierarchy to allow this (This used to be ctdb commit 64da4402c6ad485f1d0a604878a7b0c01a0ea5f0)	2009-10-28 16:11:54 +11:00
Martin Schwenke	8767c894a0	Test suite: Regression fix - wait_until should not run command in sub-shell. Commit 25e82a8a667a54c6921ef076c63fdd738dd75d19 changed wait_until() to protect the command it runs from "set -e" by running it in a subshell. This breaks uses where the command is expected to set global variables. For example, wait_until_get_src_socket lost the value of $out from its call to get_src_socket(). The fix is to not be lazy and use a sub-shell! Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 39642e745254d93d74dde907787503854fe6ca4a)	2009-10-28 13:02:18 +11:00
Ronnie Sahlberg	3526bc830d	Enhance the logging from eventscripts. When a single script is finished, also log the name of the script, the duration it took and the return status. In the loop where we signal back to the main daemon that the script finished, do this once every 100ms instead of once every 1 second (This used to be ctdb commit 6a1f7a7b1b3a0b8f89998db8fdad83bbb4e9b5a5)	2009-10-28 09:07:43 +11:00
Ronnie Sahlberg	0588b5f9c5	add a check that winbind can actually talk to the dc during the startup event and refuse to start up if it can not (This used to be ctdb commit 4037b6e73a819a8e2463dfe0959b42875e05e106)	2009-10-27 15:45:03 +11:00
Ronnie Sahlberg	d1bf89a617	temporarily try allowing clients to attach to databases even if the node is banned/stopped or inactive in any other way. (This used to be ctdb commit 227fe99f105bdc3a4f1000f238cbe3adeb3f22f0)	2009-10-27 15:17:45 +11:00
Ronnie Sahlberg	1d7681709b	don't run the monitor event so frequently after an event has failed. use _exit() instead of exit() when terminating an eventscript. (This used to be ctdb commit cc30ee2f4f33cb75b2be980c2d4dff6c7c23852f)	2009-10-27 13:51:45 +11:00
Ronnie Sahlberg	4d40b86805	for debugging add a global variable holding the pid of the main daemon. change the tracking of time() in the event loop to only check/warn when called from the main daemon (This used to be ctdb commit a10fc51f4c30e85ada6d4b7347b0f9a8ebc76637)	2009-10-27 13:18:52 +11:00
Stefan Metzmacher	3d713d9e53	ctdb_diagnostics: don't use hardcoded path to iptables All event scripts use only the relative path, so we should here. Also PATH includes /sbin and /usr/sbin... metze (This used to be ctdb commit 20678e1506db1f96b58c326ee91339e797c07c22)	2009-10-26 14:23:09 +11:00
Stefan Metzmacher	1c6829f3c2	ctdb_client: fix DEBUG statement in ctdb_ctrl_modflags() metze (This used to be ctdb commit a244b75ee49556b0ff51e254cc812594ee3b23a7)	2009-10-26 14:22:07 +11:00
Stefan Metzmacher	198866d82d	server: if takeover runs when the recovery master becomes unhealthy The problem was this: When the monitor event fails, the node->flags get updated, and an update (containing the old and new flags) is sent to the recovery master. If the recovery master sends the update to itself (the same process), it was comparing the node->flags variable with the received new flags. This check always found both flag values to be equal and never set the rec->need_takeover_run variable to true. There were two problems: first, the push_flags_handler() function didn't pass the received old flags; second, the ctdb_control_modflags() function ignored the received old flags. metze (This used to be ctdb commit 8ec633b64a05a2d903c2b9639909f15f6375548f)	2009-10-26 14:21:45 +11:00
Stefan Metzmacher	7a616a0d7b	server: print out the full 64-bit srvid on 32-bit hosts metze (This used to be ctdb commit 440e870d61267054b24404bcb69e599226353949)	2009-10-26 14:20:52 +11:00
Stefan Metzmacher	ee97e2676d	tcp: don't log an error when we successfully bind to the desired address metze (This used to be ctdb commit 752a9c81de97be509de7e7feddde749cc5ee22a8)	2009-10-26 14:20:23 +11:00
Ronnie Sahlberg	299b027b8c	patch the event loop so we read the current time every iteration. log an error if the clock jumps backwards also log an error if the clock jumps >5 seconds forward (we assume here we will get at least one event every 5 seconds) (This used to be ctdb commit 11193e1e192bee6f579bdf1303153571a82711d7)	2009-10-26 13:20:35 +11:00
Ronnie Sahlberg	8aacfa348d	Suggestion from Volker, make ctdb_queue_length() cheaper by using a counter variable instead of counting the number of packets each time. (This used to be ctdb commit 331c6e3afd96d8b5e191153a631efdbdabb6ea33)	2009-10-26 12:20:52 +11:00
Ronnie Sahlberg	c36fa583f3	disable the multipath eventscript by default (This used to be ctdb commit e79c3bcead7bd4bfb74d0aec81908da71551c107)	2009-10-26 10:22:00 +11:00
Ronnie Sahlberg	9db2a5ca05	update the manpage for ctdb setreclock (This used to be ctdb commit ab4a6a58fb002ec29c19d167800e47987b023fe4)	2009-10-26 10:11:00 +11:00
Ronnie Sahlberg	2d06e9d252	automatically re-activate the reclock file check if we set the reclock file to something (This used to be ctdb commit db250cad7c92c1cc0a690725a4e39531a2e1b7fd)	2009-10-26 10:13:20 +11:00
Ronnie Sahlberg	5aaa15fdb2	lower the log level of a debug message (This used to be ctdb commit 496dc2e80b714811c6e69dc928deaad61cf603b1)	2009-10-26 09:35:18 +11:00
Ronnie Sahlberg	86d1b4c465	Add a mechanism where we can register notifications to be sent out to a SRVID when the client disconnects. The way to use this is from a client to : 1, first create a message handle and bind it to a SRVID A special prefix for the srvid space has been set aside for samba : Only samba is allowed to use srvid's with the top 32 bits set like this. The lower 32 bits are for samba to use internally. 2, register a "notification" using the new control : CTDB_CONTROL_REGISTER_NOTIFY = 114, This control takes as indata a structure like this : struct ctdb_client_notify_register { uint64_t srvid; uint32_t len; uint8_t notify_data[1]; }; srvid is the srvid used in the space set aside above. len and notify_data are an arbitrary blob. When notifications are later sent out to all clients, this is the payload of that notification message. If a client has registered with control 114 and then disconnects from ctdbd, ctdbd will broadcast a message to that srvid to all nodes/listeners in the cluster. A client can register itself with as many different srvid's as it wants, but this is handled through a linked list from the client structure so it is mainly designed for "few notifications per client". 3, a client that no longer wants to have a notification set up can deregister using control CTDB_CONTROL_DEREGISTER_NOTIFY = 115, which takes this as arguments : struct ctdb_client_notify_deregister { uint64_t srvid; }; When a client deregisters, a message will no longer be sent to all other clients when this client disconnects from ctdbd. (This used to be ctdb commit f1b6ee4a55cdca60f93d992f0431d91bf301af2c)	2009-10-23 15:24:51 +11:00
Ronnie Sahlberg	c61c655769	when scripts timeout, log pstree to a file in /tmp and just log the filename in the messages file (This used to be ctdb commit 0785afba8e5cd501b9e0ecb4a6a44edf43b57ab0)	2009-10-23 13:55:21 +11:00
Ronnie Sahlberg	3c9b43531a	set the eventscripts to timeout after 20 seconds change the ban count to 10 failures before we ban by default (This used to be ctdb commit 38d7487bc68c8cf85980004aceeef24ae32d6f36)	2009-10-23 13:54:45 +11:00
Ronnie Sahlberg	65757fe1d6	Merge commit 'martins/master' (This used to be ctdb commit 514a60c57557042e463efeff53dd11b9fec40561)	2009-10-23 10:43:13 +11:00
Ronnie Sahlberg	42718a8842	new version 1.0.99 (This used to be ctdb commit 14fca8383b6b1da49278a9181a975543b956161b)	2009-10-22 18:16:33 +11:00
Martin Schwenke	69cca03851	Merge commit 'origin/master' (This used to be ctdb commit f3e09f2cfd33e79e69fc8c84ce4781a31a7a0437)	2009-10-22 17:48:09 +11:00
Martin Schwenke	a128b7e3bb	Document onnode -n and -f options. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 431f79f7c9038ebd95d27c2465207ca40b8f4f23)	2009-10-22 17:47:10 +11:00
Ronnie Sahlberg	e627fae600	if a lock wait child died/finished, we could have released the lockwait handle and set it to NULL before we call the destructors for releasing the waiters. The waiters reference the lockwait handle in order to remove itself from the linked list which caused a SEGV. We dont actually need to remove ourselves from this list here since if the parent freeze_handle holding the list is freed, then all waiters are released as well, and the only place we actually need to relink the waiter is in ctdb_freeze_lock_handler, where we want to respond back to the clients and release the waiters but we still want to keep the freeze_handle hanging around. (This used to be ctdb commit e01ab46bafad09a5e320d420734db129d35863bc)	2009-10-22 13:41:28 +11:00
Ronnie Sahlberg	902c476c03	From Volker L Fix some warnings and an incorrect check for a talloc failure (This used to be ctdb commit 27296a47b3d057a6729287acf128b2b67775ecde)	2009-10-22 12:19:40 +11:00
Ronnie Sahlberg	831f9e05a6	From Wolfgang M. With the new vacuuming code, dont treat an invalid dmaster as fatal. Let it update to the new value instead. (This used to be ctdb commit 5b70fa8cfd5916d3c212823ad5cc1b251ae175ed)	2009-10-22 07:58:44 +11:00
Martin Schwenke	8b2101bc61	Merge commit 'origin/master' (This used to be ctdb commit 61282d4a9be9e544aaa86f3cffc5b58e417f5ab1)	2009-10-21 21:48:15 +11:00
Martin Schwenke	12798118a1	Test suite: Remove the disable/enable monitor tests - they are useless. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 8264c42969d4be7fc6c5b4d56f8b5ef7c62b3bfb)	2009-10-21 21:47:06 +11:00
Martin Schwenke	f2a9ba6976	Test suite: Fix the timeouts on the skip share check tests. The timeout for waiting for state changes isn't very predictable. It is "about" MonitorInterval seconds... but can be longer given the duration of eventscript runs and other things. So, we change the timeout to MonitorInterval + EventScriptTimeout, hoping it never takes that long. Move the eventscript installation/removal from the old fake-tests into a function in the functions file. Implement supporting functions to create/remove/check-for various files that it handles. Also add a function that uses all of this that waits for the next monitor event (but only if all other monitor events pass). The final check in the skip share check tests uses the above and waits for a monitor event, and then checks that the node is still healthy. Also enhance the wait_until function to handle a command starting with '!' (as a separate word) to make it easy to wait for a file not to exist. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 25e82a8a667a54c6921ef076c63fdd738dd75d19)	2009-10-21 21:36:39 +11:00
Ronnie Sahlberg	d5fd4fc0ce	During tests it is common to add/delete test eventscripts at runtime. This can race with the eventscript handling that does a : list all scripts, sort them, then execute them. So trap status code 127, which means the script could not be executed (or /bin/sh does not exist), and treat it as not causing the node to become unhealthy (This used to be ctdb commit befabc917edb036ca81f5216f65a6d62b26ee83e)	2009-10-21 16:50:39 +11:00


2470 Commits