samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2024-12-23 17:34:34 +03:00

Author	SHA1	Message	Date
Amitay Isaacs	01c6c90e98	ctdb-daemon: Remove dependency on includes.h Instead of includes.h, include the required header files explicitly. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2015-10-30 02:00:27 +01:00
Amitay Isaacs	2fdb332fad	ctdb-daemon: Stop using tevent compatibility definitions Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2015-10-30 02:00:27 +01:00
Amitay Isaacs	b900adc55c	ctdb-daemon: Separate prototypes for system specific functions This groups function prototypes for system specific functions in common/system.h and removes them from ctdb_private.h. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2015-10-30 02:00:27 +01:00
Amitay Isaacs	b25c1135a7	ctdb-daemon: Use reqid abstraction Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2015-10-07 14:53:28 +02:00
Martin Schwenke	bce6a386d3	ctdb-daemon: Drop struct ctdb_control_killtcp Just use ctdb_tcp_connection. It is the same. There are no external users. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Volker Lendecke <vl@samba.org>	2015-09-07 07:01:13 +02:00
Martin Schwenke	952a50485f	ctdb-daemon: Check if updates are in flight when releasing all IPs Some code involved in releasing IPs is not re-entrant. Memory corruption can occur if, for example, overlapping attempts are made to ban a node. We haven't been able to recreate the corruption but this should protect against it. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-07-29 10:33:29 +02:00
Martin Schwenke	036c2a9243	ctdb-recoverd: Add new function clear_ip_assignment_tree() This needs to be cleared to avoid stale data when a new recovery master is elected. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-07-01 04:18:28 +02:00
Martin Schwenke	7d0a4ab622	ctdb-daemon: Never release all IPs when DisableIPFailover is set If DisableIPFailover is set then something else may be managing public IP addresses so CTDB should leave them alone. Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-05-13 06:42:13 +02:00
Martin Schwenke	5483d0c799	ctdb-daemon: Don't update IP tree if DisableIPFailover is set There won't be an IP tree. It is only ever initialised during a takeover run. The alternative to this would be to avoid sending CTDB_SRVID_RECD_UPDATE_IP in "ctdb moveip". This logic is probably best kept out of the CLI tool. Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-05-13 06:42:13 +02:00
Martin Schwenke	3c7bcea368	ctdb-daemon: Mark interfaces as "up" by default The potential for public IP addresses to shuffle around during node initialisation disappeared a while ago because IP addresses can only be assigned to a node that is in CTDB_RUNSTATE_RUNNING. This means that interfaces might as well just be initialised as "up". If any interfaces are actually "down" then this will be rectified by the "startup" event in 10.interfaces. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-05-13 06:42:13 +02:00
Martin Schwenke	694482fb3f	ctdb-daemon: Skip "IP on interface" checks if DisableIPFailover is set To support external failover of IP addresses if DisableIPFailover is set. CTDB's idea of IP address assignment can be manipulated using "ctdb moveip". Checking if the IP address is already held breaks this in several places. Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-05-13 06:42:13 +02:00
Martin Schwenke	3e6660c46f	ctdb-daemon: Improve readability of code by nesting if-statements ctdb_sys_have_ip() should only be run if do_publicipcheck is set. This is clearer if written as 2 nested if-statements rather than as a lazy conjunction. Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-05-13 06:42:13 +02:00
Amitay Isaacs	9b6865475e	ctdb-daemon: Remove obsolete IPv4 only controls Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Jeremy Allison <jra@samba.org>	2015-05-12 01:32:11 +02:00
Amitay Isaacs	4f4e6ebace	ctdb-daemon: Remove older data structure that supports only IPv4 addresses Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Jeremy Allison <jra@samba.org>	2015-05-12 01:32:11 +02:00
Martin Schwenke	6808b0aa6a	ctdb-daemon: Drop interface monitoring This is done by 10.interface where the monitor event fails when there is a missing interface. The in-daemon interface checking adds no value. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-05-10 03:22:14 +02:00
Martin Schwenke	7ee57b8d7c	ctdb-recoverd: Short circuit takeover run if no nodes are RUNNING If all nodes are still in, say, FIRST_RECOVERY runstate, then the logs contain unfortunate noise like: recoverd:Failed to find node to cover ip 10.0.2.131 This avoids that by adding an early exit that avoids running takeover_run_core() when there are no nodes in the CTDB_RUNSTATE_RUNNING. To support this add the runstate to the ipflags structure. There are clearly other ways of hacking this but this seems the simplest. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2015-05-10 03:22:13 +02:00
Rajesh Joseph	9b33732a57	ctdb: Coverity fix for CID 1125630 Due to usage of CTDB_NO_MEMORY macro, some of the resources are not freed in failure cases. Signed-off-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-by: Guenther Deschner <gd@samba.org> Reviewed-by: Michael Adam <obnox@samba.org> Autobuild-User(master): Günther Deschner <gd@samba.org> Autobuild-Date(master): Fri Apr 17 16:49:05 CEST 2015 on sn-devel-104	2015-04-17 16:49:04 +02:00
Amitay Isaacs	41ed26cbf7	ctdb-recoverd: Fix typo in comment Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2015-03-27 06:40:08 +01:00
Martin Schwenke	108b1be0ee	ctdb-daemon: Trust vnn->interface for an IP when releasing it ctdb_sys_find_ifname() doesn't work for IPv6 addresses so don't use it. Trust the eventscript to do sanity checking on the interface. Current warnings are replaced with equivalents generated by the eventscript. The unlikely message: Public IP %s is hosted on interface %s but we have no VNN will be replaced by: WARNING: Public IP %s hosted on interface %s but VNN says __none__ which is clear enough. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2014-12-05 21:02:40 +01:00
Martin Schwenke	a4e76b58a5	ctdb-util: Add extra max_size argument to file_lines_load() This is part of a migration to Samba's lib/util. CTDB always passes 0 (i.e. no max_size) so use a simple assert() to enforce this, rather than changing a lot of code that will be discarded anyway. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2014-09-10 01:36:15 +02:00
Martin Schwenke	c1558adeaa	ctdb: Use sys_read() and sys_write() to ensure correct signal interaction ... and avoid compiler warnings in some cases. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2014-08-21 04:46:13 +02:00
Martin Schwenke	6f43896e12	ctdb-daemon: Debugging for tickle updates This was useful for debugging the race fixed by commit `4f79fa6c7c`. It might be useful again. Also fix a nearby comment typo. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> Autobuild-User(master): Amitay Isaacs <amitay@samba.org> Autobuild-Date(master): Fri Jun 20 02:07:48 CEST 2014 on sn-devel-104	2014-06-20 02:07:48 +02:00
Martin Schwenke	cbd6beb469	ctdb-daemon: Move a ZERO_STRUCT() to a better place It might as well be near where it is used. Add a comment explaining it. Also add/update comments at the top of the RELEASE_IP and TAKEOVER_IP loops to explain what is happening. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> Autobuild-User(master): Amitay Isaacs <amitay@samba.org> Autobuild-Date(master): Mon May 5 06:20:39 CEST 2014 on sn-devel-104	2014-05-05 06:20:38 +02:00
Gregor Beck	6cdde2711b	ctdb:daemon avoid goto in ctdb_remove_orphaned_ifaces() Signed-off-by: Gregor Beck <gbeck@sernet.de> Reviewed-by: David Disseldorp <ddiss@samba.org> Reviewed-by: Michael Adam <obnox@samba.org> Autobuild-User(master): Michael Adam <obnox@samba.org> Autobuild-Date(master): Tue Apr 1 02:59:05 CEST 2014 on sn-devel-104	2014-04-01 02:59:05 +02:00
Gregor Beck	dd56afc7df	ctdb:daemon take a shortcut in all_nodes_are_disabled() Signed-off-by: Gregor Beck <gbeck@sernet.de> Reviewed-by: David Disseldorp <ddiss@samba.org> Reviewed-by: Michael Adam <obnox@samba.org>	2014-04-01 00:55:45 +02:00
Martin Schwenke	20c719677a	ctdb/daemon: Optimise deletion of IPs Previous commits maintained the ordering between ctdb_remove_orphaned_ifaces() and ctdb_vnn_unassign_iface(). This meant that ctdb_remove_orphaned_ifaces() needed to steal the orphaned interfaces and they would be freed later. Unassign the interface first and things get simpler. ctdb_remove_orphaned_ifaces() is now self-contained. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> Autobuild-User(master): Amitay Isaacs <amitay@samba.org> Autobuild-Date(master): Sun Mar 23 06:20:43 CET 2014 on sn-devel-104	2014-03-23 06:20:43 +01:00
Martin Schwenke	9b907536fb	ctdb/daemon: Make delete IP wait until the IP is released reloadips really expects deleted IPs to be released before completing. Otherwise the recovery daemon starts failing the local IP check. The races that follow can cause a node to be banned. To make the error handling simple, do the actual deletion in release_ip_callback(). Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2014-03-23 04:20:15 +01:00
Martin Schwenke	4f79fa6c7c	ctdb-daemon: Fix tickle updates to recently started nodes Commit `0723fedced` added a cheap implementation of ctdb_control_startup() that simply flags the recipient node as needing to send updates for each IP when the tickle update loop next fires. Commit `026996550d` ensures that a node only sends tickle updates once being flagged to do so. CTDB_CONTROL_STARTUP is broadcast to all nodes, so this is a good start. However, the tickle updates are only broadcast to connected nodes. A recently started node may not yet be considered to be connected because the keepalive monitoring loop may not yet have marked the node as connected. This means that the tickle update loop races with the keepalive monitoring loop. If the tickle update loop wins then updates will not be sent to the recently started node. The simplest improvement is to stop the tickle update from depending on whether a node is connected or not. So instead of broadcasting tickle updates to connected nodes, they are broadcast to all nodes. Since no reply is expected, this should work just fine. While looking at this code, ctdb_ctrl_set_tcp_tickles() is named like a client function. It isn't a client function. Also, 2 of the arguments are ignored. So rename this function to ctdb_send_set_tcp_tickles_for_ip() and remove the ignored arguments. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>	2014-03-23 04:20:14 +01:00
Amitay Isaacs	fb2631f5df	ctdb-daemon: Do not support connection tracking if there are no public IPs CTDB tracks connections to be able to send tickle ACKs and gratuitous ARPs. When there are no public IPs, there is no need for tickle ACKs and gratuitous ARPs. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> Autobuild-User(master): Martin Schwenke <martins@samba.org> Autobuild-Date(master): Tue Mar 4 03:01:38 CET 2014 on sn-devel-104	2014-03-04 03:01:38 +01:00
Amitay Isaacs	7d05baa96b	ctdb-recoverd: Check if callback function is registered before calling Fix suggested by Kevin Osborn <kosborn@overlandstorage.com>. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net> Autobuild-User(master): Martin Schwenke <martins@samba.org> Autobuild-Date(master): Thu Feb 27 13:54:59 CET 2014 on sn-devel-104	2014-02-27 13:54:59 +01:00
Amitay Isaacs	026996550d	ctdb-daemon: After updating tickles on other nodes, set update flag to false tcp_update_flag is set to true whenever tickles are added or deleted. This flag is used to determine whether or not to send tickles list to other nodes. Once tickles list is sent to other nodes successfully, set tcp_update_flag to false, so ctdbd does not keep sending same tickles list every TickleUpdateInterval (20 seconds). Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2014-02-27 11:49:39 +01:00
Martin Schwenke	0723fedced	ctdb-daemon: Implement ctdb_control_startup() This doesn't implement what was recommended. That would require careful error handling, probably with a fallback to this code anyway. This is simple and does no worse than the current code. That is, the new node is updated on the next call to tdb_update_tcp_tickles(). Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2014-02-27 11:49:39 +01:00
Amitay Isaacs	75ca1216a6	ctdb-daemon: Fix whitespaces Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2014-02-27 11:49:39 +01:00
Amitay Isaacs	f2cd999189	ctdb-daemon: Always talloc tickle array off vnn instead of ctdb->nodes This fixes ctdb crash reported in bug #10366. Fix suggested by Kevin Osborn <kosborn@overlandstorage.com>. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2014-02-27 11:49:39 +01:00
Martin Schwenke	24b734f084	ctdb-recoverd: LCP2 cleanups * Remove unnecessary candimbl parameter. This parameter can be cheaply calculated in lcp2_failback_candidate(). The compiler will probably do an excellent job optimising it. :-) * Clarify a debug statement This is much clearer than doing a complex recalculation of a known value. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2014-02-19 12:04:47 +11:00
Martin Schwenke	9e5ef44f32	ctdb-recoverd: Optimise check for rebalance candidates in LCP2 Currently this can be checked many times. However, there's no point calling the rebalance/failback code at all if there are no rebalance candidates. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2014-02-19 12:04:47 +11:00
Martin Schwenke	f1a20d748f	ctdb-recoverd: Fix a bug in the LCP2 rebalancing code srcimbl gets changed on every iteration of the loop. The value that should be stored for the new imbalance of the source node is minsrcimbl. To help diagnose this, add some extra debug that can be left in. The extra debug changes the output of a couple of tests. Note that the resulting IP allocations in those tests are unchanged - only the debug output is changed. Also add some new tests that illustrate the bug. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2014-02-13 02:03:24 +01:00
Martin Schwenke	e5778cc172	ctdb/daemon: reloadips must register state of asynchronous controls Otherwise ctdb_client_async_wait() is a no-op. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2014-01-31 13:36:04 +11:00
Martin Schwenke	a955d0bedc	ctdb-recoverd: Ignore failed ipreallocated controls to inactive nodes Currently timeouts for controls to inactive nodes can cause banning credits to be applied. This should not happen. Signed-off-by: Martin Schwenke <martin@meltin.net> Reviewed-by: Amitay Isaacs <amitay@gmail.com>	2014-01-17 17:59:08 +11:00
Amitay Isaacs	7aa20ccb5c	ctdb-daemon: No need to call event scripts with CTDB_CALLED_BY_USER This was added to support external monitoring using CTDB event scripts. However, it was never used. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Martin Schwenke <martin@meltin.net>	2014-01-16 11:41:12 +11:00
Amitay Isaacs	6d1b74f052	ctdb-server: Coverity fixes Signed-off-by: Amitay Isaacs <amitay@gmail.com> Reviewed-by: Michael Adam <obnox@samba.org>	2013-11-19 17:13:03 +01:00
Martin Schwenke	4adc8f4f09	ctdbd: Default for event_script_dir should use CTDB_BASE Also get rid of ctdb_set_event_script_dir(). It creates an unnecessary copy of something that will be around for the lifetime of the process. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 21b4d1aba00902f1eee0cbf4f082b0794fd5b738)	2013-10-22 15:37:54 +11:00
Martin Schwenke	4fb0d4a301	recoverd: reloadips should rebalance target nodes for new IPs Otherwise, if existing IPs are added to extra nodes (that have, perhaps, been disconnected) then those IPs will not be rebalanced across the extra nodes. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit ceb30432a9a550778aed0b422a654fc5287b82a3)	2013-09-19 12:54:31 +10:00
Martin Schwenke	950e23f664	ctdbd: Make ctdb_reloadips_child send controls asynchronously Deleting IPs can take a while because IPs are released and connections are killed. This can take a while so do them in parallel. In fact, since the set of IPs being added and deleted will be disjoint, send all the adds/deletes at the same time and then wait. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 85a5b544ec032173e98c9cc3b5402a76b961aa3b)	2013-09-19 12:54:31 +10:00
Martin Schwenke	b33ee7a2a4	recoverd: Fix the implementation of CTDB_SRVID_REBALANCE_NODE The current implementation has a few flaws: * A takeover run is called unconditionally when the timer goes even if the recovery master role has moved. This means a node other than the recovery master can incorrectly do a takeover run. * The rebalancing target nodes are cleared in the setup for a takeover run, regardless of whether the takeover run succeeds. * The timer to force a rebalance isn't cleared if another takeover run occurs before the deadline. Any forced rebalancing will happen in the first takeover run and when the timer expires some time later then an unnecessary takeover run will occur. * If the recovery master role moves then the rebalancing data will stay on the original node and affect the next takeover run to occur if the recovery master role should come back to the original node. Instead, store an array of rebalance target nodes in the recovery master context. This is passed as an extra argument to ctdb_takeover_run() each time it is called and is cleared when a takeover run succeeds. The timer hangs off the array of rebalance target nodes, which is cleared if the node isn't the recovery master. This means that it is possible to lose rebalance data if the recovery master role moves. However, that's a difficult problem to solve. The best way of approaching it is probably to try to stop the recovery master role from jumping around unnecessarily when inactive nodes join the cluster. The long term solution is to avoid this nonsense completely. The IP allocation algorithm needs to cache state between runs so that it knows which nodes have just become healthy. This also needs recovery master stability. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit c51c1efe5fc7fa668597f2acd435dee16e410fc9)	2013-09-19 12:54:31 +10:00
Martin Schwenke	c503997746	recoverd: Move disabling of IP checks into do_takeover_run() Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 48b603fbf16311daa47b01e7a33d477ed51da56d)	2013-09-19 12:54:30 +10:00
Martin Schwenke	701c450e90	recoverd: Fail takeover run if "ipreallocated" fails Previously flagging a failure was probably avoided because of attempts to run "ipreallocated" events on stopped and banned nodes, which would fail because they are in recovery. Given the change to a new control and that fallback only retries the old method on active nodes, this should never fail in reasonable circumstances. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 53722430ad35f80935aabd12fa07654126443b8b)	2013-09-19 12:54:30 +10:00
Martin Schwenke	630196423a	recoverd: Banned nodes should not be told to run "ipreallocated" event They will reject it because they are in recovery. This can result in extra banning credits being applied to banned nodes. This corresponds to commit 9132e6814ed927fa317f333f03dedb18f75d0e5b from the 1.2.40 branch. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 403938804caf1322f9773d63197e4303a7b2a788)	2013-09-18 17:16:35 +10:00
Martin Schwenke	8d11da3546	recoverd: Remove an orphaned comment This should have been removed with the associated code in commit 14bd0b6961ef1294e9cba74ce875386b7dfbf446. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 36de63843de10a1f2a9ccdbbee24cc1d08542984)	2013-09-11 15:35:16 +10:00
Martin Schwenke	4e62553fcb	recoverd: Update a comment to use current terminology Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit ea5576071b22e1877903ec0921d375626a23e13b)	2013-09-11 15:35:10 +10:00
Martin Schwenke	1ae731198a	recoverd: Move struct ctdb_public_ip_list back into ctdb_takeover.c This is an internal structure. It was moved into ctdb_private.h a long time ago to allow unit testing. Unit test compilation was changed shortly afterwards to make this unnecessary. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit db57261d7dc264e161659a8c547f44fbd9e88eeb)	2013-08-22 17:00:20 +10:00
Martin Schwenke	a5cb72cac3	ctdbd: Kill client process without checking for tracked child Commit f73a4b1495830bcdd094a93732a89dd53b3c2f78 added a safety check to ensure that CTDB never kills unrelated processes. However, client processes are unrelated. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 782814288bb560099ee44b607bf35f3eddf37f82)	2013-07-29 15:58:51 +10:00
Martin Schwenke	f46ab595d1	recoverd: Call takeover fail callback only once per node Currently the fail callback is called once per (takeip/releaseip) control failure. This is overkill and can get a node banned much too quickly. Instead, keep track of control failures per node and only call fail callback once per failed node. Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit bf4a7c1ad87e0e848296d15d63eb8cd901ca5335)	2013-07-29 15:48:48 +10:00
Amitay Isaacs	1c21f37e57	ctdbd: Set process names for child processes This helps distinguish processes in process list in top, perf, etc. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 2493f57ce268d6fe7e4c40a87852c347fd60d29e)	2013-07-10 14:33:19 +10:00
Amitay Isaacs	bcb64aa55f	recoverd: Fix buffer overflow error in reloadips Signed-off-by: Amitay Isaacs <amitay@gmail.com> Pair-Programmed-With: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 41182623891d74a7e9e9c453183411a161201e67)	2013-07-05 15:52:34 +10:00
Martin Schwenke	dcdae86dc7	ctdbd: Log something when releasing all IPs At the moment this is silent and it can be confusing to see IPs just disappear. Also, this message: Been in recovery mode for too long. Dropping all IPS can cause anxiety when all IPs should already have been dropped. Adding a comforting message saying that 0 IPs were dropped relieves such anxiety. :-) Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 4d0f26b306fc465d551d340b0e7dce4412eae3fd)	2013-07-05 15:52:33 +10:00
Martin Schwenke	7290798a41	recoverd: Clean up log messages in remote IP verification The log messages in verify_remote_ip_allocation() are confusing because they don't include the PNN of the problem node, because it is not known in this function. Add the PNN of the node being verified as a function argument and then shuffle the log messages around to make them clearer. Also fold 3 nested if statements into just one. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit f0942fa01cd422133fc9398f56b4855397d7bc86)	2013-07-05 15:52:33 +10:00
Martin Schwenke	26b161156a	ctdbd: Release IP callback should fail if the IP is still hosted At the moment there are (at least) 2 bugs that cause rogue IPs: * A race where release_ip_callback() runs after a "subsequent" take IP has completed. The IP is back on an interface but we unset vnn->iface in the callback. * A "releaseip" eventscript times out. We ignore the timeout and call it success, deleting the VNN even if the IP is still hosted. We could decide not to ignore the timeout and ban the node, but killing TCP connections can take a long time and that might result in a lot of banning. We probably won't reinstate banning on "releaseip" until killing TCP connections has been optimised. In both cases, a rogue IP can be avoided by leaving vnn->iface set and simply failing the control. Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit c5797f2942e83da24df548ea07196fbbac0eab20)	2013-07-05 15:52:32 +10:00
Martin Schwenke	793233f6b6	ctdbd: Log warnings in release IP when unexpected interface is encountered Previous code changes work around potential problems but do not provide useful information when a problem occurs. Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit f1f1b0c24b9b6cd24b83a4e4da16e179287ec6ac)	2013-07-05 15:52:32 +10:00
Amitay Isaacs	6391f61fbc	build: Fix compiler warnings for uninitialized variables Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 5408c5c4050539e5aa06a5e82ceb63a6cb5cef0c)	2013-07-04 20:43:52 +10:00
Mathieu Parent	d82b9ae410	build: Fix tdb.h path to enable building with system TDB library (This used to be ctdb commit f8bf99de3a5f56be67aaa67ed836458b1cf73e86)	2013-06-14 16:45:27 +10:00
Martin Schwenke	1ab2bbb349	recoverd: Backward compatibility for nodes without IPREALLOCATED control Consider the case of upgrading a cluster node by node, where some nodes are still running older versions of CTDB without the IPREALLOCATED control. If a "new" node takes over as recovery master and a failover occurs, then it will attempt to send IPREALLOCATED controls to all nodes. The "old" nodes will fail in a fairly nondescript way (result == -1). To try to handle this situation, fall back to the EVENTSCRIPT control to handle "ipreallocated". Only do this on the failed nodes. However, do not do this on nodes that timed out (they've probably implemented the control and we should call the regular fail_callback to get those nodes banned) or for stopped nodes (since they can't actually run the "ipreallocated" event via the EVENTSCRIPT control). Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit b2654853ce9b7c18c5874b080bc94d3118078a5d)	2013-05-27 15:15:25 +10:00
Martin Schwenke	f35e9bba9b	recoverd: Nodes can only takeover IPs if they are in runstate RUNNING Currently the order of the first IP allocation, including the first "ipreallocated" event, and the "startup" event is undefined. Both of these events can (re)start services. This stops IPs being hosted before the "startup" event has completed. Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit f15dd562fd8c08cafd957ce9509102db7eb49668)	2013-05-24 16:27:55 +10:00
Martin Schwenke	7f03618ae4	recoverd: Handle errors carefully when fetching tunables If a tunable is not implemented on a remote node then this should not be fatal. In this case the takeover run can continue using benign defaults for the tunables. However, timeouts and any unexpected errors should be fatal. These should abort the takeover run because they can lead to unexpected IP movements. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit c0c27762ea728ed86405b29c642ba9e43200f4ae)	2013-05-24 16:27:55 +10:00
Martin Schwenke	116f62a7b3	recoverd: Set explicit default value when getting tunable from nodes Both of the current defaults are implicitly 0. It is better to make the defaults obvious. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 1190bb0d9c14dc5889c2df56f6c8986db23d81a1)	2013-05-24 16:04:57 +10:00
Martin Schwenke	e78b064dcc	recoverd: Whitespace improvements Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 473cfcb019f0cb4a094bf10397f7414f7923ee57)	2013-05-24 15:55:11 +10:00
Martin Schwenke	1a181a4284	recoverd: Use talloc_array_length() for simpler code Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit f6792f478197774d2f3b2258c969b67c83e017ab)	2013-05-24 15:55:10 +10:00
Martin Schwenke	63577c96db	ctdbd: Replace ctdb->done_startup with ctdb->runstate This allows states, including startup and shutdown states, to be clearly tracked. This doesn't include regular runtime "states", which are handled by node flags. Introduce new functions ctdb_set_runstate(), runstate_to_string() and runstate_from_string(). Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 8076773a9924dcf8aff16f7d96b2b9ac383ecc28)	2013-05-24 14:08:06 +10:00
Martin Schwenke	5fdf71b898	recoverd: takeover_run_core() should not use modified node flags Modifying the node flags with IP-allocation-only flags is not necessary. It causes breakage if the flags are not cleared after use. ctdb_takeover_run() no longer needs the general node flags - it only needs the IP flags. Instead of modifying the node flags in nodemap, construct a custom IP flags list and have takeover_run_core() use that instead of node flags. As well as being safer, this makes the IP allocation code more self contained and a little bit clearer. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 14bd0b6961ef1294e9cba74ce875386b7dfbf446)	2013-05-23 16:18:23 +10:00
Martin Schwenke	e769f8575a	ctdbd: Log add and delete of IPs At the moment, when someone deletes all the IPs on a node, all we see are the release IP messages and we have to guess why. Some would argue that add/release are more significant than take/release so they should be logged. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 3c3df1d6afec7e3e721f9bcd4e8b8e008fd6e50b)	2013-05-22 14:24:22 +10:00
Martin Schwenke	0baefba368	ctdbd: Removed bogus comment in ctdb_find_iface() Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 4a8d90d0812a3242f58a2a0e2aa0f528f60f7013)	2013-05-22 14:24:21 +10:00
Martin Schwenke	54e91df60d	recoverd: Move IP flags into ctdb_takeover.c These should never be seen outside the IP allocation code. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit e143abd16ccde2e0edfe103673d31a5fb06b6aef)	2013-05-09 12:55:42 +10:00
Martin Schwenke	50f19b5bd4	recoverd: Clear IP flags after IP allocation algorithm has run If these flags are left set they will confuse other recovery daemon code. Factor the clearing code into new function clear_ipflags(). Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 45c776958017ea7001f061842c9e0f60e4a25f23)	2013-05-09 12:55:42 +10:00
Martin Schwenke	530020d83b	recoverd: Remove unused mask argument and initial mask calculation This has been replaced by set_ipflags() and associated functionality. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit d0a3822573db296e73cc897835f783c8abc084b3)	2013-05-07 16:20:47 +10:00
Martin Schwenke	ee7357de51	recoverd: When calculating rebalance candidates don't consider flags This is really a check to see if a node is already hosting IPs. If so, we assume it was previously healthy so it isn't considered as a rebalance candidate. There's no need to limit this to healthy nodes, since this is checked elsewhere. Due to this the variable newly_healthy is renamed everywhere to rebalance_candidates. The mask argument is now completely unused. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 65e0ea6c2c0629e19349ba4b9affa221fde2b070)	2013-05-07 16:20:47 +10:00
Martin Schwenke	c9056b4f88	recoverd: Remove unused mask argument from IP allocation functions This is a no-op and is in a separate commit to make the previous commit less cumbersome. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 107e656bbe24f9d21fbaf886a3e9417da4effe5a)	2013-05-07 16:20:47 +10:00
Martin Schwenke	0445c988e2	recoverd: Fix tunable NoIPTakeoverOnDisabled, rename to NoIPHostOnAllDisabled This really needs to be per-node. The rename is because nodes with this tunable switched on should drop IPs if they become unhealthy (or disabled in some other way). * Add new flag NODE_FLAGS_NOIPHOST, only used in recovery daemon. * Enhance set_ipflags_internal() and set_ipflags() to setup NODE_FLAGS_NOIPHOST depending on setting of NoIPHostOnAllDisabled and/or whether nodes are disabled/inactive. * Replace can_node_servce_ip() with functions can_node_host_ip() and can_node_takeover_ip(). These functions are the only ones that need to look at NODE_FLAGS_NOIPTAKEOVER and NODE_FLAGS_NOIPHOST. They can make the decision without looking at any other flags due to previous setup. * Remove explicit flag checking in IP allocation functions (including unassign_unsuitable_ips()) and just call can_node_host_ip() and can_node_takeover_ip() as appropriate. * Update test code to handle CTDB_SET_NoIPHostOnAllDisabled. Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 1308a51f73f2e29ba4dbebb6111d9309a89732cc)	2013-05-07 16:20:46 +10:00
Martin Schwenke	ac80824709	recoverd: Factor out new function all_nodes_are_disabled() Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 12aef10e9889760d98f58c8d916f19d069fa381a)	2013-05-07 16:20:46 +10:00
Martin Schwenke	657162fb34	recoverd: Refactor code to get NoIPTakeover tunable from all nodes Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 1fb5352d2b6918fcc6f630db49275d25a3eebe8d)	2013-05-07 16:20:46 +10:00
Martin Schwenke	17521b31b2	recoverd: Add debug message when dropping IPs in IP allocation Update tests accordingly. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 91405282ba4abad4ad8e8c5f7ee4c83c75f38280)	2013-05-07 16:20:46 +10:00
Martin Schwenke	745c6bc363	recoverd: ctdb_takeover_run() uses CTDB_CONTROL_IPREALLOCATED This means "ipreallocated" is now run on stopped nodes. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 83b61f7414b1f7a3424497ac987ca0724fba9eaa)	2013-05-06 13:38:21 +10:00
Martin Schwenke	2e59cd5428	ctdbd: New control CTDB_CONTROL_IPREALLOCATED This is an alternative to using ctdb_run_eventscripts() that can be used when in recovery. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 27a44685f0d7a88804b61a1542bb42adc8f88cb1)	2013-05-06 13:38:21 +10:00
Amitay Isaacs	77a29b3733	recoverd/takeover: Use IP->node mapping info from nodes hosting that IP When collating IP information for IP layout, only trust the nodes that are hosting an IP to have correct information about that IP. Ignore what all the other nodes think. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 1c7adbccc69ac276d2b957ad16c3802fdb8868ca)	2013-04-08 11:14:32 +10:00
Martin Schwenke	53bd183683	recoverd: Separate each IP allocation algorithm into its own function This makes the code much more readable and maintainable. As a side effect, fix a memory leak in LCP2. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 6a1d88a17321f7e1dc84b4823d5e7588516a6904)	2013-01-08 10:16:11 +11:00
Martin Schwenke	2e8df43561	recoverd: New function unassign_unsuitable_ips() Move the code into a new function so it can be called from a number of places. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 8adb255e62dbe60d1e983047acd7b9c941231d11)	2013-01-08 10:16:11 +11:00
Martin Schwenke	bcefb76884	recoverd: Move failback retry loop into basic_failback() and lcp2_failback() The retry loop is currently in ctdb_takeover_run_core(). Pushing it into each function will make it possible to put each algorithm into a separate top-level function. This will make the code much clearer and more maintainable. Also keep associated test code compatible. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit f6ce18d011dd9043b04256690d826deb2640cd89)	2013-01-08 10:16:11 +11:00
Martin Schwenke	443fbb9e01	recoverd: Trying to failback more IPs no longer allocates unassigned IPs Neither basic_failback() nor lcp2_failback() unassign IPs anymore, so there's no point looping back that far. Also fix a unit test that now fails because looping back to handle unassigned IPs is no longer logged. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit c09aeaecad7d3232b1c07bab826b96818756f5e0)	2013-01-08 10:16:11 +11:00
Martin Schwenke	dfa7ce7b73	recoverd: basic_failback() can call find_takeover_node() directly Instead of unassigning, looping back and depending on basic_allocate_unassigned. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 4dc08e37dec464c8785a2ddae15c7c69d3c81ac3)	2013-01-08 10:16:11 +11:00
Martin Schwenke	326328d520	recoverd: Don't do failback at all when deterministic IPs are in use This seems to be the right thing to do instead of calling into the failback code and continually skipping the release of an IP. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 4c87e7cb3fa2cf2e034fa8454364e0a7fe0c8f81)	2013-01-08 10:16:11 +11:00
Martin Schwenke	ef403f70f2	recoverd: Move the test for both 'DeterministicIPs' and 'NoIPFailback' set If this is done earlier then some other logic can be improved. Also, this should be a warning since no error condition is set. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit e06476e07197b7327b8bdac9c0b2e7281798ffec)	2013-01-08 10:16:11 +11:00
Martin Schwenke	a3911ed7bf	recoverd: Fix a memory leak in IP allocation Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit bcd5f587aff3ba536cb0b5ef00d2d802352bae25)	2013-01-08 10:16:11 +11:00
Martin Schwenke	4f0d68cba6	ctdbd: Clean up orphaned interfaces when an IP is deleted Add a new function ctdb_remove_orphaned_ifaces() and call it in ctdb_control_del_public_address(). ctdb_remove_orphaned_ifaces() uses a naive implementation that does things in a very obvious way. There are many ways to improve the performance - some are mentioned in a comment in the code. However, I doubt that this will be a bottleneck even with a large number of public IPs. Running the eventscript is likely to outweigh the cost of this cleanup. Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit cc1a3ae911d3fee8b87fda5de5ab6d9499d7510a)	2013-01-07 12:19:33 +11:00
Martin Schwenke	0f1bcebc80	ctdbd: Make the link status of new interfaces more flexible Neither up nor down is a good default value for the link status of a new interface. Up means that IPs can be assigned to interfaces before the true state is known and they can move away quickly if the interface is actually down. Down means that IPs can't be assigned to an interface for a variable amount of time - until a monitor cycle occurs - and this can result in imbalanced IPs. This is a neat compromise. Before the startup event completes, IPs can't be assigned to interfaces because all interfaces begin in a down state. As soon as the startup event completes, IPs can be allocated to any interface that has been marked up by the eventscript. Later, during normal operation, newly added IPs can be assigned to new interfaces immediately. The IPs will still move away if an interface is noticed to be down in the next monitor cycle, but that is the exception rather than the rule. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 9275a69a414482f1053ae14528d5972575b9214e)	2012-11-19 15:53:13 +11:00
Amitay Isaacs	85c8deca3f	recoverd: Track the nodes that fail takeover run and set culprit count If any of the nodes fail takeover run (either due to timeout or failure to complete within takeover_timeout interval) from main loop, recovery master will give up trying takeover run with the following message: "Unable to setup public takeover addresses. Try again later" And as a side-effect the monitoring is disabled on all the nodes. Before ctdb_takeover_run() is called from main loop, monitoring gets disabled via startrecovery event. Since ctdb_takeover_run() fails, it never runs recovered event and monitoring does not get re-enabled. In main_loop, ctdb_takeover_run() is called with a takeover_fail_callback. This callback will get called if any of the nodes fail in handling takeip/releaseip/ipreallocated events in ctdb_takeover_run(). Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit a5c6bb1fffb8dc3960af113957a1fd080cc7c245)	2012-11-14 10:59:54 +11:00
Martin Schwenke	62046a8a4c	recoverd: When starting a takeover run disable IP verification Disable for TakeoverTimeout seconds. Otherwise the recovery daemon can get overzealous and start trying to add/delete addresses that it thinks are missing but where the eventscript just hasn't finished. This didn't use to matter so much but it is more important now that concurrent takeip/releaseip/updateip generate errors - we want to avoid spamming the log. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 56fcee3c7730cb12fa666072d5400949af6e5f7c)	2012-10-11 12:10:45 +11:00
Martin Schwenke	4b4e4d8870	ctdbd: Stop takeovers and releases from colliding in mid-air There's a race here where release and takeover events for an IP can run at the same time. For example, a "ctdb deleteip" and a takeover initiated by the recovery daemon. The timeline is as follows: 1. The release code registers a callback to update the VNN. The callback is executed after the eventscripts run the releaseip event. 2. The release code calls the eventscripts for the releaseip event, removing IP from its interface. The takeover code "updates" the VNN saying that IP is on some iface... even if/though the address is already there. 3. The release callback runs, removing the iface associated with IP in the VNN. The takeover code calls the eventscripts for the takeip event, adding IP to an interface. As a result, CTDB doesn't think it should be hosting IP but IP is on an interface. The recovery daemon fixes this later... but it shouldn't happen. This patch can cause some additional noise in the logs: Release of IP 10.0.2.133/24 on interface eth2 node:2 recoverd:We are still serving a public address '10.0.2.133' that we should not be serving. Removing it. Release of IP 10.0.2.133/24 rejected update for this IP already in flight recoverd:client/ctdb_client.c:2455 ctdb_control for release_ip failed recoverd:Failed to release local ip address In this case the node has started releasing an IP when the recovery daemon notices the address is still hosted and initiates another release. This noise is harmless but annoying. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit bfe16cf69bf2eee93c0d831f76d88bba0c2b96c2)	2012-10-11 12:10:45 +11:00
Martin Schwenke	79ea15bf96	ctdbd: New tunable NoIPTakeoverOnDisabled Stops the behaviour where unhealthy nodes can host IPs when there are no healthy nodes. Set this to 1 when an immediate complete outage is preferred when all nodes are unhealthy. The alternative (i.e. default) can lead to undefined behaviour when the shared filesystem is unavailable. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit a555940fb5c914b7581667a05153256ad7d17774)	2012-10-11 12:10:45 +11:00
Martin Schwenke	9aa9abcc19	ctdbd: Avoid unnecessary updateip event The existing code makes one fatally bad assumption: vnn->iface->references can never be -1 (or the maximum uint32_t value in this case). Right now the reference counting is broken so a reference count of -1 is possible and causes a spurious updateip when vnn->iface is the same as best_iface. This can occur frequently because we get a lot of redundant takeovers, especially when each IP can only be hosted on one interface. This makes the code much more defensive by noting that when best_iface is the same as vnn->iface there is never a need for an updateip event. This effectively neuters the updateip code path when IPs can only be hosted by a single interface. This should obsolete 6a74515f0a1e24d97cee3ba05d89133aac7ad2b7. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 7054e4ded59c6b8f254dcfefaef64da05f25aecd)	2012-10-10 14:54:53 +11:00
Amitay Isaacs	3c1f656764	Revert "when creating/adding a public ip, set the initial interface to be the first interface specified" This reverts commit 4308935ba48ac7a29e7523315acf580019715f0f. This fixes 16_ctdb_config_add_ip.sh test when run against local daemons. When running against local daemons, if the interface is assigned as soon as an IP is added, then takeover would never assign this IP address. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 06dfd13604d08910e07cbf927c338d7b9fce9a2f)	2012-10-07 15:25:34 +11:00
Martin Schwenke	7df1da1c91	recoverd: Update a log message that has bit-rotted This message used to be correct because the ipreallocated event only handled updating the NAT gateway. However, that has changed so the message needs to be updated. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit cc9d96f4248e45ea99c5f00db1526426ac26fbc2)	2012-08-08 16:11:11 +10:00
Martin Schwenke	75a0041567	ctdbd: Fix ctdb_control_release_ip() on local daemons When running on local daemons no IPs are actually assigned to interfaces. Commit 9a806dec8687e2ec08a308853b61af6aed5e5d1e broke ctdb_control_release_ip() for local daemons because it asks the system which interface the given IP is on, instead of the old behaviour of trusting CTDB's internal records. For local daemons (i.e. !ctdb->do_checkpublicip) revert to the old behaviour of looking up the interface internally. This is good enough, given that the tests don't tend to misconfigure the addresses. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 38e8651b955afdbaf0ae87c24c55c052f8209290)	2012-07-26 22:10:54 +10:00
Amitay Isaacs	e379fc3ea5	Fix compiler warnings. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit d29e1880c8ce7219e065d31b47b0e8ad9e83146d)	2012-07-13 14:50:56 +10:00
Ronnie Sahlberg	c7e648c2d1	When we release an ip, get the interface name from the kernel instead of using the interface where ctdb thinks the ip is hosted at. The difference is that this now allows us to handle cases where we want to release an ip but ctdbd does not know which interface the ip is assigned on. (user has used 'ip addr add...' and manually assigned an ip to the wrong interface) (This used to be ctdb commit c6bf22ba5c01001b7febed73dd16a03bd3fd2bed)	2012-06-20 15:11:56 +10:00
Amitay Isaacs	7631830152	server: Replace BOOL datatype with bool, True/False with true/false Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 6e5cbe8fff71985e5a2fc16b7e9f2b868011ff5d)	2012-05-28 11:22:25 +10:00
Ronnie Sahlberg	a57eba2bb4	Track all child process so we never send a signal to an unrelated process (our child died and kernel wrapped the pid-space and reused the pid for a different process Wrap all creation of child processes inside ctdb_fork() which is used to track all processes we have spawned. Capture SIGCHLD to track also which child processes have terminated. Wrap kill() inside ctdb_kill() and make sure that we never send a !0 signal to a child process pid that has already terminated (and might have been replaced with a (This used to be ctdb commit f73a4b1495830bcdd094a93732a89dd53b3c2f78)	2012-05-03 14:03:26 +10:00
Ronnie Sahlberg	a367fa6138	RELOADIPS: simplify the reloadips code a bit and also update the "read public address file" to not check if the address exists already locally when we read it from the child process, to stop it from spamming the logs with "We already host ..." messages (This used to be ctdb commit 334ea830f1bf33419f4a1e78f23afd41a852d0f4)	2012-05-01 15:34:26 +10:00
Ronnie Sahlberg	7a1aa560e7	Add new control to reload the public ip address file on a node Also add a method to use the recovery master/daemon to reload the public ips on all nodes in the cluster. Reloading the public ips on all nodes in the cluster is only supported if all nodes in the cluster are available and healthy. (This used to be ctdb commit 05603e914f8c12618d7e06943c0f7df207f645b0)	2012-05-01 10:48:08 +10:00
Ronnie Sahlberg	db411aaada	Merge remote branch 'amitay/tevent-sync' (This used to be ctdb commit 17ff3f240b0d72c72ed28d70fb9aeb3b20c80670)	2012-04-26 08:09:23 +10:00
Amitay Isaacs	4392591555	Remove explicit include of lib/tevent/tevent.h. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 0681014ca5ed2a9b56f63fdace7f894beccf8a9a)	2012-04-13 17:28:14 +10:00
Amitay Isaacs	b3d098ced7	ctdbd: Fix spurious warnings when running with --nopublicipcheck Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 67b909a0718d6cfce82ffce0830da3a6ff1f6c4b)	2012-04-13 15:38:11 +10:00
Amitay Isaacs	425b8768ee	ctdbd: Fix the error message string Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 15f63ebab9686734f41a6adf38d4a7faa919ac66)	2012-04-13 14:51:13 +10:00
Ronnie Sahlberg	2456f77ca6	NoIPTakeover: change the tunable name for the "dont allow failing addresses over onto the node" to NoIPTakeover (This used to be ctdb commit 35592e618cfd827b6978af6332f80504f232c46a)	2012-03-22 11:05:15 +11:00
Ronnie Sahlberg	9f31f76805	NoIPFailback: Exclude nodes which have NoIPFailback as failback targets during reallocation (This used to be ctdb commit c262c29773d1608e7ce04bdfb7f4469df0a9637b)	2012-03-22 09:24:32 +11:00
Ronnie Sahlberg	befa9df152	Make NoIPFailback a node local setting. Nodes that have NoIPFailback set to !0 cannot take over new ip addresses during failover. Remove the old global setting for this unused tunable and add it as a new node flag. This node flag is only valid/defined within the takeover subsystem in the recovery daemon. Add async functions to collect the NoIPFailback settings for each node. This will later be used to disqualify certain nodes from being takeover targets when we perform reallocation. (This used to be ctdb commit 668f3e88a9e5f598706952b7140547640c85a5ed)	2012-03-22 09:09:57 +11:00
Ronnie Sahlberg	ef2bd0b016	When adding ips to nodes, set up a deferred rebalance for the whole node to trigger after 60 seconds in case the normal ipreallocated is not sufficient to trigger rebalance. (This used to be ctdb commit 4340263b219d75c39f8de22abe3f6f1c1ee63ea2)	2012-02-28 06:56:04 +11:00
Ronnie Sahlberg	91c9371f2d	Make KILLTCP structure a child of VNN so that it is freed at the same time the referenced VNN structure is. Also, remove the circular reference between the two objects KILLTCP and VNN (This used to be ctdb commit 02b62482164a3c69715949074feb7f191a29d534)	2012-02-27 07:21:26 +11:00
Volker Lendecke	5e3b13a32a	FreeBSD does not define s6_addr32, only s6_addr Signed-off-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit d657af4fb68ce3f7c462856f2934f6bf169e120b)	2012-02-13 16:20:12 +01:00
Martin Schwenke	3ae8273d86	Make some ctdb_takeover.c functions static These were intentionally not static so they could be linked to in unit test programs. However, using the CCAN-style unit tests where relevant code is just included, this is no longer necessary. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit d0e9e8554614bd49ffb9ec3509feaa0e80d0f65d)	2011-11-11 14:41:47 +11:00
Ronnie Sahlberg	8db9b73920	Merge remote branch 'martins/lcp2fix' (This used to be ctdb commit 7c02d242af552aa732f5c70ea4eeefbc8a8542e2)	2011-11-08 14:06:30 +11:00
Ronnie Sahlberg	0f92fa224c	RB_TREE: Add mechanism to abort a traverse This patch changes the callback signature for traversal functions to allow a client to abort a traverse before it finishes. Updates to all callers and examples as well as rb-test tool. (This used to be ctdb commit 8ab0c63ad36cfbbb1e5fed46a1f4c47b1fdb581f)	2011-11-08 13:40:28 +11:00
Martin Schwenke	c0939af571	LCP IP allocation algorithm - try harder to find a candidate source node There's a bug in LCP2. Selecting the node with the highest imbalance doesn't always work. Some nodes can have a high imbalance metric because they have a lot of IPs. However, these nodes can be part of a group that is perfectly balanced. Nodes in another group with fewer IPs might actually be imbalanced. Instead of just trying the source node with the highest imbalance this tries them in descending order of imbalance until it finds one where an IP can be moved to another node. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 574091d5aced5e87aefad52f8bc47aa75c25fbf6)	2011-11-02 10:17:00 +11:00
Martin Schwenke	98c27f973d	LCP IP allocation algorithm - new function lcp2_failback_candidate() There's a bug in LCP2. Selecting the node with the highest imbalance doesn't always work. Some nodes can have a high imbalance metric because they have a lot of IPs. However, these nodes can be part of a group that is perfectly balanced. Nodes in another group with fewer IPs might actually be imbalanced. Factor out the code from lcp2_failback() that actually takes a node and decides which address should be moved to which node. This is the first step in fixing the above bug. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 75718c5768b5bb5c0bcd7dd90e0327c6ed22a63d)	2011-11-01 21:01:25 +11:00
Ronnie Sahlberg	d79596ba1a	One of the entry points to release an ip reset the pnn field before invoking the eventscript. This triggered a check for "only run the eventscript if we host the address" to trigger and short-circuit calling the eventscript. An effect of this would be that 'ctdb delip' would remove the ip from ctdb, but fail to delete it from the interface. S1028798 (This used to be ctdb commit b82524f240bf21769dd7624ca6026763d38b9396)	2011-09-22 15:17:23 +10:00
Ronnie Sahlberg	4587bdb052	when checking that the interfaces exist in ctdb_add_public_address() we can't talloc off vnn since it is not yet initialized and might not always be NULL (This used to be ctdb commit 3d37be3e2bfb61ede824028aeebaa18ba304faae)	2011-09-21 11:42:19 +10:00
Ronnie Sahlberg	783ceca07b	Interface monitoring: add an event to trigger every 30 seconds to check that all interfaces referenced by the public address list actually exist. This will make it much easier to root-cause problems such as S1029023 when an external application deleted the interface while it is still in use by ctdbd. (This used to be ctdb commit 9abf9c919a7e6789695490e2c3de56c21b63fa57)	2011-09-06 17:02:19 +10:00
Ronnie Sahlberg	64378fea58	Check interfaces: when reading the public addresses file to create the vnn list check that the actual interface exist, print error and fail startup if the interface does not exist. (This used to be ctdb commit cd33bbe6454b7b0316bdfffbd06c67b29779e873)	2011-09-06 16:11:00 +10:00
Volker Lendecke	1cf1670f0a	Fix a const warning Signed-off-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit e25559087c9752502580875f7e33f3c416c05f84)	2011-08-22 17:11:07 +02:00
Ronnie Sahlberg	fea64f65b5	Remove a log message about setting linkstate for an unknown interface. Sometimes we do want to try to set the linkstate for interfaces that are not in use by public addresses right now (but possibly by other mechanisms) and these messages just spam the logs. S1026357 (This used to be ctdb commit f2fe0a090a9650910ebe49514b3ca01dc593bea3)	2011-08-05 10:05:12 +10:00
Martin Schwenke	5ac67504ca	Tests: Initial test code for LCP2 IP allocation algorithm. Move struct ctdb_public_ip_list to ctdb_private.h and put some definitions for some functions from ctdb_takeover.c there. This allows those functions to be called from unit tests. Add ctdb_takeover_tests.c and the Makefile support to build it. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 9d34be0233edf3bc022345c0494c4b2a4d7f8480)	2011-07-29 09:01:36 +10:00
Martin Schwenke	ff1a81c872	IP allocation - add LCP2 algorithm. The current non-deterministic IP allocation algorithm balances IPs across the whole cluster. It does not consider different interfaces/VLANs/subnets, so these different groups of IPs aren't generally well balanced. This adds the LCP2 algorithm for IP allocation and allows it to be enabled by setting the "LCP2PublicIPs" tunable to 1. The LCP2 algorithm calculates the imbalance of a node by totalling the squares of the distances between each pair of IPs on the node. The IP distance is defined as the length of the longest common prefix (LCP) of bits found when comparing 2 IPs. The imbalance of a cluster is the maximum imbalance for any node. At each step the algorithm selects the IP/node allocation that best reduces the imbalance of the cluster. The implementation splits out the IP allocation part of ctdb_takeover_run() into new function ctdb_takeover_run_core(), and then extracts out the basic IP assignment code into new functions basic_allocate_unassigned() and basic_failback(). 3 new functions lcp2_init(), lcp2_allocate_unassigned() and lcp2_failback() implement the LCP2 algorithm, and are hooked into ctdb_takeover_run_core(). Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 61fc7fbd0235469df22deb6581c6bd47e30bc0be)	2011-07-29 09:01:17 +10:00
Ronnie Sahlberg	e707f23596	Update the delip command Don't talloc_free(vnn) immediately but postpone it until later when the eventscript callback has completed. CQ S1026664 (This used to be ctdb commit 0a99e8742a261b1d3a2c8830f5c19ea6c2c47cad)	2011-07-29 08:50:48 +10:00
Ronnie Sahlberg	c93a968619	When trying to re-balance the ip assignment and shuffle ips from nodes with many addresses to nodes with few addresses, loop up to num_ips+5 times instead of only 5 times. When we have very many public ips per node, we might need to loop more than 5 times or else we will exit without reaching optimal balance. (This used to be ctdb commit aa8114a625a637277561a66c80bdece3c27e9e20)	2011-07-06 13:14:13 +10:00
Ronnie Sahlberg	f84bd3b5f1	Dont call the UPDATE event if both old and new interface is the same. CQ S1018175 (This used to be ctdb commit 6a74515f0a1e24d97cee3ba05d89133aac7ad2b7)	2011-05-04 13:29:29 +10:00
Ronnie Sahlberg	c04505724a	IFACE handling. Assume links are always good on startup (they almost always Simplify the handling of setting the links in the 10.interface eventscript and remove the optimization to only call setifacelink on state change to make the code simpler to read. If a take ip event fails, flag the node as unhealthy. Add a check to the interface script to check if the interface exists or if it has been deleted, so that we can capture this and become UNHEALTHY if someone deletes an interface we are using to host public addresses. (This used to be ctdb commit 4ab63d2a7262aff30d5eced184c294c9c9dd4974)	2011-04-11 07:40:05 +10:00
Ronnie Sahlberg	f82936402f	IP reallocation. If a public address is already hosted on the node when we startup, log a warning message but do not cause the recovery to fail. CQ S1022356 Signed-off-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit 89f8169c24da96c1fdd0ac19b8a1e0e1df01a72a)	2011-03-14 13:35:53 +01:00
Ronnie Sahlberg	93bea39391	IPALLOCATION : If the node is held pinned down in "init" state by external services failing to start, or blocking CTDBD from finishing the startup phase, we can encounter a situation where we have not yet fully initialized, but a remote recovery master tries to release a certain ip clusterwide. In this situation the node that is pinned down in init/startup phase would fail to perform the release of the ip address since we are not yet fully operational and do not yet host any valid interfaces. In this situation, we just need to remain unhealthy; there is no need to also ban the node. Remove the autobanning for this condition and just let the node remain in unhealthy mode. Banning is overkill in this situation when the system is broken and just draws attention to ctdbd instead of the root cause. (This used to be ctdb commit d8af74e4c4961deb94c18dde8ba7fc07e944729c)	2011-01-13 09:42:01 +11:00
Ronnie Sahlberg	a9a6ae064d	When assigning the single-public-ip during startup, flag the interface as initially being "link ok" so that we can add it and startup. The eventscript can later drop the flag if required (This used to be ctdb commit 720849b756c825fb8b285f09972a8c39f1888a99)	2010-12-13 14:24:04 +11:00
Ronnie Sahlberg	c2c53db49d	during ip allocation, there are failure modes where a node might hold a ip address but thinks it is still unassigned (-1). add code to the recovery daemon to detect this case and trigger a reallocation so that the ip gets covered and change the takeip code to allow for this condition, taking on an ip address that is already hosted. cq s1021073 (This used to be ctdb commit 9020baf27cab7821c9094cda185206fb7af0fee7)	2010-12-03 13:30:39 +11:00
Ronnie Sahlberg	dbcf0de18c	Don't exit the update ip function if the old and new interfaces are the same since if they are the same for whatever reason this triggers the system to go into an infinite loop and is not robust. The scripts have been changed instead to be able to cope with this situation for enhanced robustness. During takeover_run and when merging all ip allocations across the cluster try to keep track of when and which node currently hosts an ip address so that we avoid extra ip failovers between nodes (This used to be ctdb commit cf778b5aaf6356401e3985acccc7df9e08ab6930)	2010-11-10 14:55:25 +11:00
Ronnie Sahlberg	6fa8e1fddb	when we load the public address file, at the same time check if we are already hosting the public address, if so, set ourselves up as the pnn for that address (This used to be ctdb commit 0f2a2dac91a61be188c3578c8bb89d47cbf9a0f8)	2010-11-10 14:55:24 +11:00
Ronnie Sahlberg	5f76f3c0e2	Add a new tunable : DisableIPFailover that when set to non 0 will stop any ip reallocations at all from happening. (This used to be ctdb commit d8d37493478a26c5f1809a5f3df89ffd6e149281)	2010-11-10 14:55:24 +11:00
Ronnie Sahlberg	87a0ece976	when creating/adding a public ip, set the initial interface to be the first interface specified (This used to be ctdb commit 4308935ba48ac7a29e7523315acf580019715f0f)	2010-11-10 14:55:23 +11:00
Ronnie Sahlberg	d8d8b9e1d7	add a new serverid to send a message every time an ip address is taken on the local node (This used to be ctdb commit 1261f3d9702800a4e59550c881350daf479f00ef)	2010-09-13 15:43:19 +10:00
Ronnie Sahlberg	19211f99c8	remove an unused variable (This used to be ctdb commit e07fdbaf12bbe84370bc47a1979fe198a06a6cc8)	2010-09-13 13:13:12 +10:00
Ronnie Sahlberg	c95f4258d8	Add a new event "ipreallocated". This is called every time a reallocation is performed. While STARTRECOVERY/RECOVERED events are only called when we do ipreallocation as part of a full database/cluster recovery, this new event can be used to trigger on when we just do a light failover due to a node becoming unhealthy. I.e. situations where we do a failover but we do not perform a full cluster recovery. Use this to trigger for natgw so we select a new natgw master node when failover happens and not just when cluster rebuilds happen. (This used to be ctdb commit 7f4c591388adae20e98984001385cba26598ec67)	2010-08-30 18:09:30 +10:00
Ronnie Sahlberg	2e8aac6689	Merge commit 'rusty/ports-from-1.0.112' into foo (This used to be ctdb commit 13e58d92f5f1723e850a82ae030d0ca57e89b1ee)	2010-08-19 13:17:56 +10:00
Ronnie Sahlberg	5aa5f3e7bf	Remove the structure ctdb_control_tcp_vnn since this is identical to the structure ctdb_tcp_connection. Add a new "ctdb deltickle" command to delete tickles from the database. This can ONLY be used for tickles created by "ctdb addtickle". Push any "addtickle/deltickle" updates to other nodes every TickleUpdateInterval seconds. (This used to be ctdb commit acded034e2f0dcae4c2c9e54e16a001caf23caec)	2010-08-18 12:36:03 +10:00
Rusty Russell	1a009aff73	takeover: prevent crash by avoiding free in traverse on RST timeout After 5 attempts to send a RST to a client without any response, we free "con"; this is done during a traverse. This frees the node we are walking through (the node is made a child of "con" down in rb_tree.c's trbt_create_node()) (Valgrind would catch this, as Martin confirmed). So, we create a temporary parent and reparent onto that; then we free that parent after the traverse, thus deleting the unwanted nodes. CQ:S1019041 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 08f7f85477610a4916c1ec866aa467b28f1bbec3)	2010-08-18 11:40:17 +09:30
Rusty Russell	f93440c4b7	event: Update events to latest Samba version 0.9.8 In Samba this is now called "tevent", and while we use the backwards compatibility wrappers they don't offer EVENT_FD_AUTOCLOSE: that is now a separate tevent_fd_set_auto_close() function. This is based on Samba version `7f29f817fa`. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 85e5e760cc91eb3157d3a88996ce474491646726)	2010-08-18 09:16:31 +09:30
Ronnie Sahlberg	4136f27145	When adding an ip at runtime, it might not yet have an iface assigned to it, so ensure that the next takeover_ip call will fall through to accept the ip and add it. (This used to be ctdb commit 2d60f96680d16c2992e2a35517822f88c12538b7)	2010-06-01 16:22:48 +10:00
Ronnie Sahlberg	92340e4d6f	check if vnn is a valid pointer before dereferencing it, based on rusty's patch for bz62783 (This used to be ctdb commit bdd250b9afdd1060cfd1e2b0f0a5a567150bb380)	2010-05-26 13:43:28 +10:00
Ronnie Sahlberg	4a43428440	The recent change to the recovery daemon to keep track of and verify that all nodes agree on the most recent ip address assignments broke "ctdb moveip ..." since that call would never trigger a full takeover run and thus would immediately trigger an inconsistency. Add a new message to the recovery daemon where we can tell the recovery daemon to update its assignments. BZ62782 (This used to be ctdb commit e7069082e5f0380dcddee247db8754218ce18cab)	2010-05-03 15:47:17 +10:00
Ronnie Sahlberg	c3c7aa934f	Make create_merged_ip_list() a static function since it is not called from outside of ctdb_takeover.c (This used to be ctdb commit 880896a27adfdd5173b2810b6b2f3889802046f0)	2010-05-03 15:47:06 +10:00
Ronnie Sahlberg	79fac9771d	In the log message when we have found an inconsistent ip address allocation, add extra log information about what the inconsistency is. (This used to be ctdb commit d2e4a9912c4bd13eb4f12681adebe7e59a6d1fb2)	2010-05-03 15:46:36 +10:00
Ronnie Sahlberg	06885ea9a7	In the recovery daemon, keep track of which nodes we have assigned public ip addresses to and verify that the remote nodes have/keep a consistent view of assigned addresses. If a remote node has an inconsistent view of addresses vis-à-vis the recovery master this will trigger a full ip reallocation. (This used to be ctdb commit f3bf2ab61f8dbbc806ec23a68a87aaedd458e712)	2010-04-08 14:25:26 +10:00
Ronnie Sahlberg	7f2f7364ad	lower the loglevel for a debug message for redundant releases of public ips (This used to be ctdb commit cfc1a4f878b61c85063af649d2339431e799647d)	2010-02-16 11:01:09 +11:00
Stefan Metzmacher	76cb4ce34c	server: ban ourself if the ctdb and kernel knowledge of a public ip differs metze (This used to be ctdb commit 48e0af91113d6cead6cae3f28d8d8f610cacaa71)	2010-01-20 11:11:04 +01:00
Stefan Metzmacher	405368eeb0	server: give an error if we're getting a takeover_ip event with a wrong pnn metze (This used to be ctdb commit 2f44d6f3d290cc1b37b19ec34edfbad12cc0c0a7)	2010-01-20 11:11:04 +01:00
Stefan Metzmacher	a5ba5c129a	server: return an error if we get a takeover ip event and we cannot serve the ip metze (This used to be ctdb commit f5c221e6abc118aefa489aa7e07755af952fd2bb)	2010-01-20 11:11:03 +01:00
Stefan Metzmacher	55d824bd77	server: print node number as signed integer on release ip event metze (This used to be ctdb commit 6c456face30606641f6b8beaad3121c9b05ca763)	2010-01-20 11:11:03 +01:00
Stefan Metzmacher	c5e579b56a	server: debug redundant takeover ip events with level INFO metze (This used to be ctdb commit 7bc9969c4c28f2c4a4848bd730db3c63bb9204fe)	2010-01-20 11:11:03 +01:00
Stefan Metzmacher	ffdf32dedf	server: be less verbose on redundant release_ip events metze (This used to be ctdb commit 72ef5f891f85ce51f5ca7e0c03d0c7cc955be110)	2010-01-20 11:11:03 +01:00
Stefan Metzmacher	58d7c44b1c	server: add a ctdb_do_updateip() metze (This used to be ctdb commit eded224368dded2264e53546c196b1b485cb2094)	2010-01-20 11:11:02 +01:00
Stefan Metzmacher	aa485b17bb	server: split out a ctdb_do_takeover_ip() function metze (This used to be ctdb commit 8fd6f4aab0c173b4c9c4c02c546e7d2ec1a98423)	2010-01-20 11:11:02 +01:00
Stefan Metzmacher	da59e0b162	server: split out a ctdb_announce_vnn_iface() function metze (This used to be ctdb commit ec87a51660cfa8a6851923f757fed31f7ffc7153)	2010-01-20 11:11:02 +01:00
Stefan Metzmacher	179c098e86	server: start with disabled interfaces and let the event scripts enable the interfaces explicitly. This makes sure that we don't get public addresses assigned during the initial recovery and remove them again in the startup event. metze (This used to be ctdb commit f872e8c63a2f8979e6a0d088630575bdd4d7b4f1)	2010-01-20 11:11:01 +01:00
Stefan Metzmacher	f4f72024fe	server: implement ctdb_control_set_iface_link() This only marks the interface status and doesn't generate any directly triggered action. The action is later taken by the recovery process in verify_ip_allocation. metze (This used to be ctdb commit cff58b27c970e9252d131125941c372019fd6660)	2010-01-20 11:10:59 +01:00
Stefan Metzmacher	0dd7e1bfa1	server: implement ctdb_control_get_ifaces() metze (This used to be ctdb commit 0e982a416a126d9856145c19baef320cd0e71d66)	2010-01-20 11:10:59 +01:00
Stefan Metzmacher	80e3ab04de	server: implement ctdb_control_get_public_ip_info() metze (This used to be ctdb commit 486fbd15f4cc4f45a4c110b2ddbba48bade22c9f)	2010-01-20 11:10:59 +01:00
Stefan Metzmacher	32d00d0a0d	controls: add stubs for GET_PUBLIC_IP_INFO, GET_IFACES and SET_IFACE_LINK_STATE metze (This used to be ctdb commit a2c9e4578e149eccb2c6183f64a6b657eb95c5e1)	2010-01-20 11:10:59 +01:00
Stefan Metzmacher	37880b0d0a	server: use CTDB_PUBLIC_IP_FLAGS_ONLY_AVAILABLE during a takeover run We now ask for the known and available interfaces. This means a node gets a RELEASE_IP event for all interfaces it "knows", but doesn't serve and a node only gets a TAKE_IP event for "available" interfaces. metze (This used to be ctdb commit a695a38e49e7c3e15a9706392dc920eeab1f11ba)	2010-01-20 11:10:59 +01:00
Stefan Metzmacher	d89604afab	server: implement CTDB_PUBLIC_IP_FLAGS_ONLY_AVAILABLE behavior metze (This used to be ctdb commit 09a5c59bc8d1301edf60d7ae77504dc6d11a7da2)	2010-01-20 11:10:58 +01:00
Stefan Metzmacher	bea53c60b8	server: keep the interface information in a list of ctdb_iface structures metze (This used to be ctdb commit ff5291778f0752e176539397e9530dcf0e546bea)	2010-01-20 11:10:58 +01:00
Stefan Metzmacher	539ebdc94c	server: we don't need to copy strings we pass as talloc_asprintf() arguments metze (This used to be ctdb commit 080ba5ac2195fb73ef6f18740abdde57a7b97151)	2010-01-20 11:10:58 +01:00
Stefan Metzmacher	a1da4e05b5	server: allow multiple interfaces comma separated in public_addresses metze (This used to be ctdb commit 33a00ef7233051acdbc66410130ec5d876a8422f)	2010-01-20 11:10:58 +01:00
Stefan Metzmacher	8d50eda2b1	server: add a ctdb_vnn_iface_string() helper function to access vnn->iface metze (This used to be ctdb commit 9e5532e215892b2e0aadd9b106a730727f92c62e)	2010-01-20 11:10:58 +01:00
Stefan Metzmacher	bec35e6441	server: add a ctdb_set_single_public_ip() helper function metze (This used to be ctdb commit 400b4806c4a9686a2ee6398b5d7c3e0ca0793fd1)	2010-01-20 11:10:57 +01:00
Rusty Russell	928b8dcb31	eventscript: handle banning within the callbacks Currently the timeout handler in eventscript.c does the banning if a timeout happens. However, because monitor events are different, it has to special case them. As we call the callback anyway in this case, we should make that handle -ETIME as it sees fit: for everyone but the monitor event, we simply ban ourselves. The more complicated monitor event banning logic is now in ctdb_monitor.c where it belongs. Note: I wrapped the other bans in "if (status == -ETIME)", though they should probably ban themselves on any error. This change should be a noop. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 9ecee127e19a9e7cae114a66f3514ee7a75276c5)	2009-12-07 23:48:57 +10:30
Ronnie Sahlberg	569001afd0	Merge commit 'martins/status-test-2' Conflicts: server/eventscript.c (This used to be ctdb commit e9b3477a5b9a2eff18f727e7d59338bfb5214793)	2009-12-01 10:53:18 +11:00
Martin Schwenke	a64ccf07c1	Add flag to ctdb_event_script_callback indicating when called by client. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit a1d654a982ca56fade82552f4e6b5586236d3233)	2009-11-26 15:49:49 +11:00
Ronnie Sahlberg	926261aafc	use a binary tree and sort all ipv4/v6 addresses before we assign them out on nodes. (This used to be ctdb commit 862526e558099fad4c8259cb88da9b776aa7f80d)	2009-11-25 11:54:40 +11:00
Rusty Russell	2d9254404d	eventscript: introduce enum for different event script calls. Rather than doing strcmp everywhere, pass an explicit enum around. This also subtly documents what options are available. The "options" arg is now used for extra arguments only. Unfortunately, gcc complains on empty format strings, so we make ctdb_event_script() take no varargs, and add ctdb_event_script_args(). We leave ctdb_event_script_callback() taking varargs, which means callers have to do "%s", "". For the moment, we have CTDB_EVENT_UNKNOWN for handling forced scripts from the ctdb tool. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 8001488be4f2beb25e943fe01b2afc2e8779930d)	2009-11-24 11:16:49 +10:30
Rusty Russell	2763df22de	eventscript: put timeout inside ctdb_event_script_callback_v Everyone uses the same timeout value, so just remove it from the API. If we ever need variable timeouts, that might as well be central too. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 533c3e053293941d2a9484b495e78d45f478bb08)	2009-11-24 11:09:46 +10:30
Ronnie Sahlberg	e07ca41886	change the eventscript handling to allow EventScriptTimeout for each individual script instead of for the entire set of scripts restructure the talloc hierarchy to allow this (This used to be ctdb commit 64da4402c6ad485f1d0a604878a7b0c01a0ea5f0)	2009-10-28 16:11:54 +11:00
Ronnie Sahlberg	902c476c03	From Volker L Fix some warnings and an incorrect check for a talloc failure (This used to be ctdb commit 27296a47b3d057a6729287acf128b2b67775ecde)	2009-10-22 12:19:40 +11:00
Ronnie Sahlberg	50712d48d3	change some loglevels and also print the pnn of the ip for takeip/releaseip logging (This used to be ctdb commit 9d95dfbd12898975ba0d8560d95a974210d3de7c)	2009-10-06 11:40:38 +11:00
Ronnie Sahlberg	3133dadd8f	allocate takeoverip state as a child of vnn and also make the takeoverip context a child of vnn (This used to be ctdb commit 804e5905be51f43c8a338bfbe216fd8d5718850f)	2009-10-06 09:35:15 +11:00
Ronnie Sahlberg	263d76f8c2	lower the loglevel for the info messages that a public ip is not hosted locally for takeip/releaseip (This used to be ctdb commit f76132b0d555e52ee0a379ec2c156350b37b0280)	2009-09-04 04:09:30 +10:00
Ronnie Sahlberg	1593e67399	send ARPs with an interval of 1.1 seconds during ip takeover. this is to better handle linux clients which often default to ignore grat arps that arrive within 1 second of each other. (This used to be ctdb commit 5664da36943b4901a807a9594b0f45e859aafbf3)	2009-07-07 11:40:01 +10:00
Ronnie Sahlberg	b046f5e3aa	when adding an ip, try manually adding and takingover the ip instead of triggering a full recovery to do the same thing (This used to be ctdb commit 4d5d22e64270cfb31be6acd71f4f97ec43df5b2c)	2009-06-05 17:00:47 +10:00
Ronnie Sahlberg	e6170b5389	add a new node state : DELETED. This is used to mark nodes as being DELETED internally in ctdb so that nodes are not renumbered if / when they are removed from the nodes file. This is used to be able to do "ctdb reloadnodes" at runtime without causing nodes to be renumbered. To do this, instead of deleting a node from the nodes file, just comment it out like 1.0.0.1 #1.0.0.2 1.0.0.3 After removing 1.0.0.2 from the cluster, the remaining nodes retain their pnn's from prior to the deletion, namely 0 and 2 Any line in the nodes file that is commented out represents a DELETED pnn (This used to be ctdb commit 6a5e4fd7fa391206b463bb4e976502f3ac5bd343)	2009-06-01 14:18:34 +10:00
Sumit Bose	2fcedf6dac	add missing checks on so far ignored return values Most of these were found during a review by Jim Meyering <meyering@redhat.com> (This used to be ctdb commit 3aee5ee1deb4a19be3bd3a4ce3abbe09de763344)	2009-05-21 11:22:21 +10:00
Ronnie Sahlberg	9a3e19658d	Change the loglevel of "registered tcp client for ..." to INFO instead of ERR (This used to be ctdb commit 92b5580c38c23b99c1692708540983b0c0fcd6cf)	2009-05-19 08:55:42 +10:00
Michael Adam	3cca0f75e4	Fix treatment of link local ipv6 addresses: set the scope id. metze / Michael Signed-off-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit 9d12de1ca6107801dada927729e755c0949d73bf)	2009-01-19 22:50:53 +01:00
Stefan Metzmacher	23b550d6fc	Fix segfault in ip takeover fallback code. metze Signed-off-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit 3b88f3dec5227e8579672974f7028fb356ee1d94)	2009-01-16 07:22:59 +11:00
root	321866dbba	finish the ipv6 support. allow clients to register either ipv4 or ipv6 client connections to the tickles list (This used to be ctdb commit d9b44d7c3255b0fd7359b9afeb613e6ff4c4eaac)	2009-01-13 16:17:20 +11:00
Ronnie Sahlberg	b9bd20ce55	add a context and a timed event so that once we have been in recovery mode for too long we drop all public ip addresses (This used to be ctdb commit 403c68f96e1380dd07217c688de2730464f77ea0)	2008-10-22 11:04:41 +11:00
Ronnie Sahlberg	233b0e5cbb	lower the loglevel for the informational message that a TCP_ADD operation described an ip address not known to be a public address. This could happen if someone for genuine reasons accesses a share through a static ip address. It can also happen if non-homogeneous public address configurations are used and when a tcp description is pushed out to a different node that does not serve/know the specific ip address. (This used to be ctdb commit 9b1d089c99413f3681440f3cf33c293d118c9108)	2008-10-15 03:02:09 +11:00
Ronnie Sahlberg	cb300382b0	update TAKEIP/RELEASEIP/GETPUBLICIP/GETNODEMAP controls so we retain an older ipv4-only version of these controls. We need this so that we are backward compatible with old versions of ctdb and so that we can interoperate with an ipv4-only recmaster during a rolling upgrade. (This used to be ctdb commit 6b76c520f97127099bd9fbaa0fa7af1c61947fb7)	2008-10-14 10:40:29 +11:00
Ronnie Sahlberg	3411e98e14	skip empty lines in the public addresses file, not skip all non-empty lines (This used to be ctdb commit dc108adada33bb713f71a2859eda3b439ed0cd1a)	2008-10-07 19:34:34 +11:00
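Commit e6170b5389 describes how a commented-out line in the nodes file marks a DELETED pnn, so that the remaining nodes are not renumbered after "ctdb reloadnodes". The following is a minimal Python sketch of that numbering rule, assuming one address per line. `Node` and `parse_nodes_file` are hypothetical names, not ctdb functions. Ignoring blank lines, so that they consume no pnn, is an assumption of this sketch and is not stated in the commit.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    pnn: int
    address: Optional[str]  # None for a DELETED slot
    deleted: bool


def parse_nodes_file(text: str) -> List[Node]:
    """Give every non-blank line a pnn; commented-out lines become DELETED."""
    nodes: List[Node] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines do not occupy a pnn slot
        pnn = len(nodes)  # pnn = position among the non-blank lines
        if line.startswith("#"):
            # A commented-out entry keeps its slot, so later nodes keep their pnn.
            nodes.append(Node(pnn, None, True))
        else:
            nodes.append(Node(pnn, line, False))
    return nodes


# The example from the commit: 1.0.0.2 has been removed from the cluster.
nodes = parse_nodes_file("1.0.0.1\n#1.0.0.2\n1.0.0.3\n")
active = [(n.pnn, n.address) for n in nodes if not n.deleted]
print(active)  # [(0, '1.0.0.1'), (2, '1.0.0.3')]
```

Keeping a placeholder slot, rather than dropping the line, is what lets the surviving nodes keep pnns 0 and 2, as the commit message describes.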


430 Commits
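Commits 3411e98e14, a1da4e05b5 and 87a0ece976 each describe part of how the public addresses file is read. Empty lines are skipped, several interfaces may be listed comma-separated, and a newly created public ip starts on the first interface listed. The following is a minimal Python sketch of those three rules, assuming each entry has the form `ADDRESS/MASK IFACE[,IFACE...]`. `PublicIP` and `parse_public_addresses` are hypothetical names, not ctdb functions.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Union

Address = Union[ipaddress.IPv4Interface, ipaddress.IPv6Interface]


@dataclass
class PublicIP:
    address: Address
    ifaces: List[str]   # every interface allowed to host this address
    current_iface: str  # initial interface: the first one listed


def parse_public_addresses(text: str) -> List[PublicIP]:
    """Parse lines of the assumed form 'ADDRESS/MASK IFACE[,IFACE...]'."""
    entries: List[PublicIP] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip empty lines only; every non-empty line is parsed
        fields = line.split()
        if len(fields) < 2:
            raise ValueError(f"missing interface list: {line!r}")
        ifaces = fields[1].split(",")  # comma-separated interface list
        entries.append(PublicIP(ipaddress.ip_interface(fields[0]), ifaces, ifaces[0]))
    return entries


ips = parse_public_addresses("10.0.0.10/24 eth1,eth2\n\nfd00::10/64 eth3\n")
for ip in ips:
    print(ip.address, ip.ifaces, ip.current_iface)
# 10.0.0.10/24 ['eth1', 'eth2'] eth1
# fd00::10/64 ['eth3'] eth3
```

`ipaddress.ip_interface` accepts both IPv4 and IPv6 notation, matching the mixed ipv4/ipv6 support described in the commits. Rejecting a line that has no interface is this sketch's own choice.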