samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2024-12-24 21:34:56 +03:00

Author	SHA1	Message	Date
Amitay Isaacs	f165ed1594	tools/ctdb: When printing TDB data as a string, use correct length of the string Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit d94a10f93a0925b17458d009e604966666b3d880)	2013-10-04 15:15:27 +10:00
Amitay Isaacs	d3783ae140	tools/ctdb: Remove un-implemented ctdb vacuum command Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 8b238852884004a56f76a1762199c338864d1249)	2013-10-04 15:15:27 +10:00
Amitay Isaacs	e4ed152d59	tests: Add a simple test to test cluster wide database traverse Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 713c9ecc791e3319a2d109838471833de5a158c8)	2013-09-26 10:21:31 +10:00
Amitay Isaacs	a2d6bbe67a	traverse: Send traverse end record from traverse child process Traverse records are sent directly from traverse child process, but the last empty record signalling end of traverse is sent from ctdbd. This creates a race condition between ctdbd and traverse child. There are two fds from traverse child to ctdbd - a pipe to track status of the child process and unix socket connection for sending records. It's possible that last few records are sitting in unix socket buffer when ctdbd reads the status written from traverse child. This will be interpreted as end of traverse and ctdbd will send the last empty record to originating node before it has processed the pending packets in unix socket connection. The race is avoided by sending the last empty record marking end of traverse from the child process. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 37e22fc3ac3eb64732f2e67058f5b7b06c093fbf)	2013-09-25 14:59:45 +10:00
Amitay Isaacs	f1f1788f10	traverse: Wait till all data has been flushed from output queue To improve the traverse performance, records are directly sent from traverse child process to the originating node. Make sure that all the data is sent via socket, before informing ctdbd that traverse is complete. Without waiting for all the packets to be flushed from the queue, child process can incorrectly signal ctdbd that traverse has ended. This will cause the pending records in the queue never to make it to the originating node and traverse information will not be complete. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 482ac708cb79cb6378d814a79c2cf13f88435bc4)	2013-09-25 14:59:45 +10:00
Amitay Isaacs	1740cbb58c	traverse: Use ctdb local variable for convenience Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 25e9cf86328252f96215b54b94551dd7bbdd2db4)	2013-09-25 14:59:45 +10:00
Amitay Isaacs	c4f49a5342	traverse: Check if local traverse failed or succeeded Passing the result of tdb_traverse_read() allows ctdbd to determine if the local traverse succeeded or not. In case of a problem with local traverse, ctdbd can log an error. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit abd51a9f41ebb178c4ea4491bdedf9a9433e7232)	2013-09-25 14:59:45 +10:00
Amitay Isaacs	76d9d2e5e1	traverse: Log information when traverse starts and ends Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit e4aba8598b00a810e721de64ac44dccc9af04ab6)	2013-09-25 14:59:45 +10:00
Martin Schwenke	613313fa52	tool/ltdbtool: -h option does not require an argument Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 9e18f3c173863919587e25d704f66372624ed8ed)	2013-09-25 14:35:46 +10:00
Martin Schwenke	5818771192	scripts: Add support for optional ctdbd.conf configuration file Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 8f660d0dd52013e5876806be908e8e603aa6e968)	2013-09-25 14:35:46 +10:00
Martin Schwenke	066b671de0	utils: Make debug level strings case-insensitive Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit c700dd0c7b6b43b61b3e231643b5d7cbe2f9592a)	2013-09-25 14:35:31 +10:00
Martin Schwenke	5b2c8ba880	tools/ctdb: Fix help messages for ctdb commands Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 49c87699fad151933a0aefebfee968fc850e6383)	2013-09-25 14:34:55 +10:00
Martin Schwenke	058037d58c	tools/ctdb: Ban time of 0 is invalid Apparently it used to mean a permanent ban but it is unclear if this was ever supported. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit c8a6e5ce579e2fe320c40268e7e9ddfe68b8cd30)	2013-09-25 14:34:55 +10:00
Amitay Isaacs	4c4bfcbd6f	eventscripts: Load CTDB configuration settings in 70.iscsi Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit ff41ce5ef202f8f6342e285d195bb5df61d848ce)	2013-09-23 18:38:28 +10:00
Martin Schwenke	430ae84877	recoverd: Disable takeover runs on other nodes for 5 minutes 60 seconds might not be long enough to kill all connections and release IPs. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 71080676bb4acbd0d9b595a30cf7fe6dddbf426f)	2013-09-19 12:58:32 +10:00
Martin Schwenke	07d3a1b234	recoverd: Improve logging for takeover runs Takeover runs are currently silent when they succeed. However, they are important, so log something by default. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit b39aa2e401fbb581207d986bac93778e9c01acdc)	2013-09-19 12:57:36 +10:00
Martin Schwenke	236b2524de	tools/ctdb: Use the standard long timeout when disabling takeover runs This means that takeover runs will be disabled for about as long as the reloadips control can take to complete. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 6d44657a5e5b0df22bab2d487a503dd1c5ba79b4)	2013-09-19 12:56:50 +10:00
Martin Schwenke	5f0d85d4db	tools/ctdb: Fix arguments/semantics of rebalance node There's no reason why specifying a node should be compulsory. This is a cluster-wide operation because it is implemented by the recovery master so multiple nodes should not be specified using -n. However, the command should be able to specify multiple nodes so let it have its own nodestring argument. This change should be backward compatible with the old requirement of specifying a single node via -n. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 0846c00597adb66bba8c9dbf63443d0c2f91a7d1)	2013-09-19 12:54:32 +10:00
Martin Schwenke	c484361076	tools/ctdb: Make rebalancenode more robust Use a broadcast instead of trying to win the race of determining the recovery master and then sending the message before the recovery master changes. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit ac946ee4ad01b1e5cd1006930b9f8a190a0a58ba)	2013-09-19 12:54:32 +10:00
Martin Schwenke	44b7397962	tests/simple: Fix the reloadips test to cope with changes to reloadips Specifying nodes to reload no longer uses -n. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit d921b2756d5f1c4ad7a35fe120f6fda9f5bf5686)	2013-09-19 12:54:32 +10:00
Martin Schwenke	566d66e6ab	recoverd: Be careful about freeing the list of IP rebalance target nodes It can change during a takeover run. If it does then don't free it. There are potentially fancier solutions (e.g. check what PNNs are new to the list) to this issue but this is the simplest. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit e81589b7084c661adf617e166cc2c25b4939f841)	2013-09-19 12:54:31 +10:00
Martin Schwenke	4fb0d4a301	recoverd: reloadips should rebalance target nodes for new IPs Otherwise, if existing IPs are added to extra nodes (that have, perhaps, been disconnected) then those IPs will not be rebalanced across the extra nodes. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit ceb30432a9a550778aed0b422a654fc5287b82a3)	2013-09-19 12:54:31 +10:00
Martin Schwenke	950e23f664	ctdbd: Make ctdb_reloadips_child send controls asynchronously Deleting IPs can take a while because IPs are released and connections are killed. This can take a while so do them in parallel. In fact, since the set of IPs being added and deleted will be disjoint, send all the adds/deletes at the same time and then wait. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 85a5b544ec032173e98c9cc3b5402a76b961aa3b)	2013-09-19 12:54:31 +10:00
Martin Schwenke	b33ee7a2a4	recoverd: Fix the implementation of CTDB_SRVID_REBALANCE_NODE The current implementation has a few flaws: * A takeover run is called unconditionally when the timer goes off even if the recovery master role has moved. This means a node other than the recovery master can incorrectly do a takeover run. * The rebalancing target nodes are cleared in the setup for a takeover run, regardless of whether the takeover run succeeds. * The timer to force a rebalance isn't cleared if another takeover run occurs before the deadline. Any forced rebalancing will happen in the first takeover run and when the timer expires some time later then an unnecessary takeover run will occur. * If the recovery master role moves then the rebalancing data will stay on the original node and affect the next takeover run to occur if the recovery master role should come back to the original node. Instead, store an array of rebalance target nodes in the recovery master context. This is passed as an extra argument to ctdb_takeover_run() each time it is called and is cleared when a takeover run succeeds. The timer hangs off the array of rebalance target nodes, which is cleared if the node isn't the recovery master. This means that it is possible to lose rebalance data if the recovery master role moves. However, that's a difficult problem to solve. The best way of approaching it is probably to try to stop the recovery master role from jumping around unnecessarily when inactive nodes join the cluster. The long term solution is to avoid this nonsense completely. The IP allocation algorithm needs to cache state between runs so that it knows which nodes have just become healthy. This also needs recovery master stability. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit c51c1efe5fc7fa668597f2acd435dee16e410fc9)	2013-09-19 12:54:31 +10:00
Martin Schwenke	1793412de2	recoverd: Remove unused CTDB_SRVID_RELOAD_ALL_IPS and handler Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 4cd727439a0824ebb8dbcf737d9888ffc3c41184)	2013-09-19 12:54:31 +10:00
Martin Schwenke	6f1935ea6d	tools/ctdb: Reimplement reloadips This implementation disables takeover runs on all nodes before trying to reload IPs. It also takes "all" or the list of PNNs as an argument to the command instead of to -n. -n can still be specified with a single node indicating that node should be considered the current node - that might be confusing so could be removed. This implementation does not use CTDB_SRVID_RELOAD_ALL_IPS, so it can be removed. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit d66a072d9b120c78c47e726e9f29a3c1cfdd87ce)	2013-09-19 12:54:31 +10:00
Martin Schwenke	e7cc998570	recoverd: Defer ipreallocated requests when takeover runs are disabled The takeover run will fail anyway but deferring seems like a cleaner option. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 428f800bcdf3dbfe19de8bb36099fbf01ebeaab4)	2013-09-19 12:54:31 +10:00
Martin Schwenke	2f472b4573	recoverd: Reimplement CTDB_SRVID_DISABLE_IP_CHECK Use disable_takeover_runs_handler() instead of maintaining duplicate logic. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 0a51a85915486b2a8fded7ba6444b18c6c1ee8e8)	2013-09-19 12:54:31 +10:00
Martin Schwenke	5f0913d321	recoverd: New SRVID message CTDB_SRVID_DISABLE_TAKEOVER_RUNS This implements a superset of CTDB_SRVID_DISABLE_IP_CHECK. It stops the IP checks but also causes any attempted takeover runs to fail and be rescheduled. This is meant to completely stop IP movements. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 00db4de53a0d86013e79e6577e7e6cf3ef864e56)	2013-09-19 12:54:31 +10:00
Martin Schwenke	e79b750e5e	tools/ctdb: Add a wait_for_all option to srvid_broadcast() This will be useful for other SRVIDs. The error checking in the handler depends on the SRVID responding with a uint32_t where <0 indicates an error and >=0 is a PNN that succeeded. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 52050e1c75b21961dafe2bc410268b44240ab24e)	2013-09-19 12:54:31 +10:00
Martin Schwenke	51db81344e	tools/ctdb: Factor out SRVID broadcast code from ipreallocate() Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit a566fb5e70282c4e9f76654b1be4dc80829dced0)	2013-09-19 12:54:30 +10:00
Martin Schwenke	8a6979dac3	tools/ctdb: Change ipreallocate() to use a local done flag Instead of the current global variable. This is in anticipation of abstracting the code. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit c58ee0eddf7ae3283e3ca8bd25575e6e677e1b17)	2013-09-19 12:54:30 +10:00
Martin Schwenke	0ba7e2ce31	recoverd: Factor out the SRVID handling code The code that handles IP reallocate requests can be reused. This also changes the result back to a SRVID caller to the PNN on success or a negative error code on failure. None of the callers currently look at the result so this is harmless... but it will be useful later. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit e4eae6e3291baa299a1d0f733ab11b138ee699a3)	2013-09-19 12:54:30 +10:00
Martin Schwenke	4c3f8dc3bb	recoverd: Make the SRVID request structure generic No need for a separate one for each SRVID. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit d9c22b04d5aa7938a3965bd3144568664eb772ce)	2013-09-19 12:54:30 +10:00
Martin Schwenke	c503997746	recoverd: Move disabling of IP checks into do_takeover_run() Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 48b603fbf16311daa47b01e7a33d477ed51da56d)	2013-09-19 12:54:30 +10:00
Martin Schwenke	bbbb55eef9	recoverd: do_takeover_run() should mark when a takeover run is in progress Nested takeover runs should never happen so they should fail. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 8ed29c60c0a7dd29f2a6efdf694d38e94281e1c4)	2013-09-19 12:54:30 +10:00
Martin Schwenke	a1f915f6b5	recoverd: takeover_fail_callback() doesn't need to set rec->need_takeover_run It is set on every failure anyway. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit e5f94c7857405bdeac233069003c3769b3dc3616)	2013-09-19 12:54:30 +10:00
Martin Schwenke	701c450e90	recoverd: Fail takeover run if "ipreallocated" fails Previously flagging a failure was probably avoided because of attempts to run "ipreallocated" events on stopped and banned nodes, which would fail because they are in recovery. Given the change to a new control and that fallback only retries the old method on active nodes, this should never fail in reasonable circumstances. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 53722430ad35f80935aabd12fa07654126443b8b)	2013-09-19 12:54:30 +10:00
Martin Schwenke	e167e2e7c7	recoverd: New function do_takeover_run() Factor the calling sequence for ctdb_takeover_run() into a new function and call it instead. This changes rec->need_takeover_run to false for each successful takeover run and that seems to be the right thing to do. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 9a3f0c0e61ca5c17e020c6e0463d73c7cf4f7c09)	2013-09-19 12:54:30 +10:00
Martin Schwenke	30a50c6e1e	recoverd: Stabilise the recovery master role On rare occasions a node that has been inactive will trigger an election when it becomes active again. If that node has been up for the longest then it will win the election and the recovery master role will spuriously move. While a node remains inactive we reset the priority time to discourage it from winning elections. The priority time will now reflect roughly how long the node has been active rather than how long it has been up. That means the most stable node is more likely to win elections. Having a stable recovery master means that disabling takeover runs while reloading IPs is more likely to succeed. It also improves the chances of being able to cache information in the recovery master - for example, between takeover runs. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit f0f48f22f45e4c82eba2582efae307e25385de81)	2013-09-19 12:54:29 +10:00
Martin Schwenke	630196423a	recoverd: Banned nodes should not be told to run "ipreallocated" event They will reject it because they are in recovery. This can result in extra banning credits being applied to banned nodes. This corresponds to commit 9132e6814ed927fa317f333f03dedb18f75d0e5b from the 1.2.40 branch. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 403938804caf1322f9773d63197e4303a7b2a788)	2013-09-18 17:16:35 +10:00
Martin Schwenke	d30e269ecc	common: Make parse_ip() valgrind-clean Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit c0bb147ca09e82019b05ec22995623cffc3184e2)	2013-09-11 15:35:38 +10:00
Martin Schwenke	8d11da3546	recoverd: Remove an orphaned comment This should have been removed with the associated code in commit 14bd0b6961ef1294e9cba74ce875386b7dfbf446. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 36de63843de10a1f2a9ccdbbee24cc1d08542984)	2013-09-11 15:35:16 +10:00
Martin Schwenke	4e62553fcb	recoverd: Update a comment to use current terminology Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit ea5576071b22e1877903ec0921d375626a23e13b)	2013-09-11 15:35:10 +10:00
Martin Schwenke	fe7f66547b	client: Remove unused function list_of_active_nodes_except_pnn() Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit d8a76cf79f07dfb5a93c6c9a13f16e3268c7dd57)	2013-09-11 15:35:03 +10:00
Martin Schwenke	c870f01160	tools/ctdb: list_of_active_nodes_except_pnn() -> list_of_nodes() list_of_active_nodes_except_pnn() is only used here and can be removed if we remove this call. Less is more... Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit d4e206fb818048b7fab4797c877b854bdbb1ab70)	2013-09-11 15:34:58 +10:00
Martin Schwenke	2d31ec2131	tools/ctdb: Fix a memory leak in parse_nodestring() Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 8753a094b97340deb26dd44f6ea345ca0a642a95)	2013-09-11 15:34:51 +10:00
Martin Schwenke	e003699686	tests/eventscripts: Tests for memory checking in 00.ctdb ... plus updates to test infrastructure to support. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 4a388fc6bf54636b7e1f6da8e6aa451cddd574f7)	2013-09-11 15:34:42 +10:00
Martin Schwenke	b88bf1275c	eventscripts: Clean up monitoring of system memory in 00.ctdb Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 16fcff0d1993b7a0479341862ea44d10bd5c6d6d)	2013-09-11 15:34:30 +10:00
Michael Adam	18f17aaa33	server: standardize formatting of comment block for ctdb_reply_dmaster() while I'm at it.. This was the comment block I was touching and meant to adapt in commit 00d3bf092e2f72eda330978c75ec85f17e870553. My search was apparently not unique... Signed-off-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit 09940255011b119dc6af3304f5d3e9568e6006fd)	2013-08-26 13:24:32 +02:00
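
Several of the commits above (e79b750e5e, 0ba7e2ce31) describe a reply convention for SRVID requests: the handler answers with a 32-bit value that is the PNN on success or a negative error code on failure. The following is a minimal sketch of how a caller might decode such a reply. It is not taken from the ctdb sources; the function names `srvid_reply_decode()` and `srvid_count_failures()` are hypothetical, and the sketch assumes that a "negative" value in a `uint32_t` means the top bit is set (two's complement).

```c
#include <stdint.h>

/*
 * Hypothetical helper, not a ctdb function.
 *
 * Decode one SRVID reply carried as a uint32_t. Assumption: values
 * above INT32_MAX represent negative error codes, everything else is
 * the PNN of the node that succeeded.
 *
 * Returns 0 and stores the PNN in *pnn on success, -1 on error
 * (in which case *pnn is left untouched).
 */
int srvid_reply_decode(uint32_t raw, uint32_t *pnn)
{
	if (raw > (uint32_t)INT32_MAX) {
		/* Top bit set: treat as a negative error code */
		return -1;
	}
	*pnn = raw;
	return 0;
}

/*
 * Hypothetical helper: count how many replies in an array signal an
 * error, as a caller waiting for all nodes might want to do.
 */
int srvid_count_failures(const uint32_t *replies, int n)
{
	int failures = 0;
	uint32_t pnn;

	for (int i = 0; i < n; i++) {
		if (srvid_reply_decode(replies[i], &pnn) != 0) {
			failures++;
		}
	}
	return failures;
}
```

Comparing against `INT32_MAX` rather than casting to `int32_t` keeps the check well defined in standard C, since converting an out-of-range unsigned value to a signed type is implementation-defined.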

5056 Commits