samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2024-12-23 17:34:34 +03:00

Author	SHA1	Message	Date
Martin Schwenke	c9056b4f88	recoverd: Remove unused mask argument from IP allocation functions This is a no-op and is in a separate commit to make the previous commit less cumbersome. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 107e656bbe24f9d21fbaf886a3e9417da4effe5a)	2013-05-07 16:20:47 +10:00
Martin Schwenke	0445c988e2	recoverd: Fix tunable NoIPTakeoverOnDisabled, rename to NoIPHostOnAllDisabled This really needs to be per-node. The rename is because nodes with this tunable switched on should drop IPs if they become unhealthy (or disabled in some other way). * Add new flag NODE_FLAGS_NOIPHOST, only used in recovery daemon. * Enhance set_ipflags_internal() and set_ipflags() to setup NODE_FLAGS_NOIPHOST depending on setting of NoIPHostOnAllDisabled and/or whether nodes are disabled/inactive. * Replace can_node_servce_ip() with functions can_node_host_ip() and can_node_takeover_ip(). These functions are the only ones that need to look at NODE_FLAGS_NOIPTAKEOVER and NODE_FLAGS_NOIPHOST. They can make the decision without looking at any other flags due to previous setup. * Remove explicit flag checking in IP allocation functions (including unassign_unsuitable_ips()) and just call can_node_host_ip() and can_node_takeover_ip() as appropriate. * Update test code to handle CTDB_SET_NoIPHostOnAllDisabled. Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 1308a51f73f2e29ba4dbebb6111d9309a89732cc)	2013-05-07 16:20:46 +10:00
Martin Schwenke	ac80824709	recoverd: Factor out new function all_nodes_are_disabled() Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 12aef10e9889760d98f58c8d916f19d069fa381a)	2013-05-07 16:20:46 +10:00
Martin Schwenke	657162fb34	recoverd: Refactor code to get NoIPTakeover tunable from all nodes Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 1fb5352d2b6918fcc6f630db49275d25a3eebe8d)	2013-05-07 16:20:46 +10:00
Martin Schwenke	17521b31b2	recoverd: Add debug message when dropping IPs in IP allocation Update tests accordingly. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 91405282ba4abad4ad8e8c5f7ee4c83c75f38280)	2013-05-07 16:20:46 +10:00
Martin Schwenke	745c6bc363	recoverd: ctdb_takeover_run() uses CTDB_CONTROL_IPREALLOCATED This means "ipreallocated" is now run on stopped nodes. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 83b61f7414b1f7a3424497ac987ca0724fba9eaa)	2013-05-06 13:38:21 +10:00
Martin Schwenke	2e59cd5428	ctdbd: New control CTDB_CONTROL_IPREALLOCATED This is an alternative to using ctdb_run_eventscripts() that can be used when in recovery. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 27a44685f0d7a88804b61a1542bb42adc8f88cb1)	2013-05-06 13:38:21 +10:00
Amitay Isaacs	77a29b3733	recoverd/takeover: Use IP->node mapping info from nodes hosting that IP When collating IP information for IP layout, only trust the nodes that are hosting an IP, to have correct information about that IP. Ignore what all the other nodes think. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 1c7adbccc69ac276d2b957ad16c3802fdb8868ca)	2013-04-08 11:14:32 +10:00
Martin Schwenke	53bd183683	recoverd: Separate each IP allocation algorithm into its own function This makes the code much more readable and maintainable. As a side effect, fix a memory leak in LCP2. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 6a1d88a17321f7e1dc84b4823d5e7588516a6904)	2013-01-08 10:16:11 +11:00
Martin Schwenke	2e8df43561	recoverd: New function unassign_unsuitable_ips() Move the code into a new function so it can be called from a number of places. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 8adb255e62dbe60d1e983047acd7b9c941231d11)	2013-01-08 10:16:11 +11:00
Martin Schwenke	bcefb76884	recoverd: Move failback retry loop into basic_failback() and lcp2_failback() The retry loop is currently in ctdb_takeover_run_core(). Pushing it into each function will make it possible to put each algorithm into a separate top-level function. This will make the code much clearer and more maintainable. Also keep associated test code compatible. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit f6ce18d011dd9043b04256690d826deb2640cd89)	2013-01-08 10:16:11 +11:00
Martin Schwenke	443fbb9e01	recoverd: Trying to failback more IPs no longer allocates unassigned IPs Neither basic_failback() nor lcp2_failback() unassign IPs anymore, so there's no point looping back that far. Also fix a unit test that now fails because looping back to handle unassigned IPs is no longer logged. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit c09aeaecad7d3232b1c07bab826b96818756f5e0)	2013-01-08 10:16:11 +11:00
Martin Schwenke	dfa7ce7b73	recoverd: basic_failback() can call find_takeover_node() directly Instead of unassigning, looping back and depending on basic_allocate_unassigned. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 4dc08e37dec464c8785a2ddae15c7c69d3c81ac3)	2013-01-08 10:16:11 +11:00
Martin Schwenke	326328d520	recoverd: Don't do failback at all when deterministic IPs are in use This seems to be the right thing to do instead of calling into the failback code and continually skipping the release of an IP. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 4c87e7cb3fa2cf2e034fa8454364e0a7fe0c8f81)	2013-01-08 10:16:11 +11:00
Martin Schwenke	ef403f70f2	recoverd: Move the test for both 'DeterministicIPs' and 'NoIPFailback' set If this is done earlier then some other logic can be improved. Also, this should be a warning since no error condition is set. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit e06476e07197b7327b8bdac9c0b2e7281798ffec)	2013-01-08 10:16:11 +11:00
Martin Schwenke	a3911ed7bf	recoverd: Fix a memory leak in IP allocation Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit bcd5f587aff3ba536cb0b5ef00d2d802352bae25)	2013-01-08 10:16:11 +11:00
Martin Schwenke	4f0d68cba6	ctdbd: Clean up orphaned interfaces when an IP is deleted Add a new function ctdb_remove_orphaned_ifaces() and call it in ctdb_control_del_public_address(). ctdb_remove_orphaned_ifaces() uses a naive implementation that does things in a very obvious way. There are many ways to improve the performance - some are mentioned in a comment in the code. However, I doubt that this will be a bottleneck even with a large number of public IPs. Running the eventscript is likely to outweigh the cost of this cleanup. Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit cc1a3ae911d3fee8b87fda5de5ab6d9499d7510a)	2013-01-07 12:19:33 +11:00
Martin Schwenke	0f1bcebc80	ctdbd: Make the link status of new interfaces more flexible Neither up nor down is a good default value for the link status of a new interface. Up means that IPs can be assigned to interfaces before the true state is known and they can move away quickly if the interface is actually down. Down means that IPs can't be assigned to an interface for a variable amount of time - until a monitor cycle occurs - and this can result in imbalanced IPs. This is a neat compromise. Before the startup event completes, IPs can't be assigned to interfaces because all interfaces begin in a down state. As soon as the startup event completes, IPs can be allocated to any interface that has been marked up by the eventscript. Later, during normal operation, newly added IPs can be assigned to new interfaces immediately. The IPs will still move away if an interface is noticed to be down in the next monitor cycle, but that is the exception rather than the rule. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 9275a69a414482f1053ae14528d5972575b9214e)	2012-11-19 15:53:13 +11:00
Amitay Isaacs	85c8deca3f	recoverd: Track the nodes that fail takeover run and set culprit count If any of the nodes fail a takeover run (either due to timeout or failure to complete within the takeover_timeout interval) from the main loop, the recovery master gives up trying the takeover run with the following message: "Unable to setup public takeover addresses. Try again later" As a side-effect, monitoring is disabled on all the nodes. Before ctdb_takeover_run() is called from the main loop, monitoring gets disabled via the startrecovery event. Since ctdb_takeover_run() fails, it never runs the recovered event and monitoring does not get re-enabled. In main_loop, ctdb_takeover_run() is called with a takeover_fail_callback. This callback will get called if any of the nodes fail in handling takeip/releaseip/ipreallocated events in ctdb_takeover_run(). Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit a5c6bb1fffb8dc3960af113957a1fd080cc7c245)	2012-11-14 10:59:54 +11:00
Martin Schwenke	62046a8a4c	recoverd: When starting a takeover run disable IP verification Disable for TakeoverTimeout seconds. Otherwise the recovery daemon can get overzealous and start trying to add/delete addresses that it thinks are missing but where the eventscript just hasn't finished. This didn't used to matter so much but it is more important now that concurrent takeip/releaseip/updateip generate errors - we want to avoid spamming the log. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 56fcee3c7730cb12fa666072d5400949af6e5f7c)	2012-10-11 12:10:45 +11:00
Martin Schwenke	4b4e4d8870	ctdbd: Stop takeovers and releases from colliding in mid-air There's a race here where release and takeover events for an IP can run at the same time. For example, a "ctdb deleteip" and a takeover initiated by the recovery daemon. The timeline is as follows: 1. The release code registers a callback to update the VNN. The callback is executed after the eventscripts run the releaseip event. 2. The release code calls the eventscripts for the releaseip event, removing IP from its interface. The takeover code "updates" the VNN saying that IP is on some iface.... even if/though the address is already there. 3. The release callback runs, removing the iface associated with IP in the VNN. The takeover code calls the eventscripts for the takeip event, adding IP to an interface. As a result, CTDB doesn't think it should be hosting IP but IP is on an interface. The recovery daemon fixes this later... but it shouldn't happen. This patch can cause some additional noise in the logs: Release of IP 10.0.2.133/24 on interface eth2 node:2 recoverd:We are still serving a public address '10.0.2.133' that we should not be serving. Removing it. Release of IP 10.0.2.133/24 rejected update for this IP already in flight recoverd:client/ctdb_client.c:2455 ctdb_control for release_ip failed recoverd:Failed to release local ip address In this case the node has started releasing an IP when the recovery daemon notices the address is still hosted and initiates another release. This noise is harmless but annoying. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit bfe16cf69bf2eee93c0d831f76d88bba0c2b96c2)	2012-10-11 12:10:45 +11:00
Martin Schwenke	79ea15bf96	ctdbd: New tunable NoIPTakeoverOnDisabled Stops the behaviour where unhealthy nodes can host IPs when there are no healthy nodes. Set this to 1 when an immediate complete outage is preferred when all nodes are unhealthy. The alternative (i.e. default) can lead to undefined behaviour when the shared filesystem is unavailable. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit a555940fb5c914b7581667a05153256ad7d17774)	2012-10-11 12:10:45 +11:00
Martin Schwenke	9aa9abcc19	ctdbd: Avoid unnecessary updateip event The existing code makes one fatally bad assumption: vnn->iface->references can never be -1 (or max-uint32_t in this case). Right now the reference counting is broken so a reference count of -1 is possible and causes a spurious updateip when vnn->iface is the same as best_iface. This can occur frequently because we get a lot of redundant takeovers, especially when each IP can only be hosted on one interface. This makes the code much more defensive by noting that when best_iface is the same as vnn->iface there is never a need for an updateip event. This effectively neuters the updateip code path when IPs can only be hosted by a single interface. This should obsolete 6a74515f0a1e24d97cee3ba05d89133aac7ad2b7. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 7054e4ded59c6b8f254dcfefaef64da05f25aecd)	2012-10-10 14:54:53 +11:00
Amitay Isaacs	3c1f656764	Revert "when creating/adding a public ip, set the initial interface to be the first interface specified" This reverts commit 4308935ba48ac7a29e7523315acf580019715f0f. This fixes 16_ctdb_config_add_ip.sh test when run against local daemons. When running against local daemons, if the interface is assigned as soon as an IP is added, then takeover would never assign this IP address. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 06dfd13604d08910e07cbf927c338d7b9fce9a2f)	2012-10-07 15:25:34 +11:00
Martin Schwenke	7df1da1c91	recoverd: Update a log message that has bit-rotted This message used to be correct because the ipreallocated event only handled updating the NAT gateway. However, that has changed so the message needs to be updated. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit cc9d96f4248e45ea99c5f00db1526426ac26fbc2)	2012-08-08 16:11:11 +10:00
Martin Schwenke	75a0041567	ctdbd: Fix ctdb_control_release_ip() on local daemons When running on local daemons no IPs are actually assigned to interfaces. Commit 9a806dec8687e2ec08a308853b61af6aed5e5d1e broke ctdb_control_release_ip() for local daemons because it asks the system which interface the given IP is on, instead of the old behaviour of trusting CTDB's internal records. For local daemons (i.e. !ctdb->do_checkpublicip) revert to the old behaviour of looking up the interface internally. This is good enough, given that the tests don't tend to misconfigure the addresses. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 38e8651b955afdbaf0ae87c24c55c052f8209290)	2012-07-26 22:10:54 +10:00
Amitay Isaacs	e379fc3ea5	Fix compiler warnings. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit d29e1880c8ce7219e065d31b47b0e8ad9e83146d)	2012-07-13 14:50:56 +10:00
Ronnie Sahlberg	c7e648c2d1	When we release an ip, get the interface name from the kernel instead of using the interface where ctdb thinks the ip is hosted at. The difference is that this now allows us to handle cases where we want to release an ip but ctdbd does not know which interface the ip is assigned on. (user has used 'ip addr add...' and manually assigned an ip to the wrong interface) (This used to be ctdb commit c6bf22ba5c01001b7febed73dd16a03bd3fd2bed)	2012-06-20 15:11:56 +10:00
Amitay Isaacs	7631830152	server: Replace BOOL datatype with bool, True/False with true/false Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 6e5cbe8fff71985e5a2fc16b7e9f2b868011ff5d)	2012-05-28 11:22:25 +10:00
Ronnie Sahlberg	a57eba2bb4	Track all child processes so we never send a signal to an unrelated process (our child died and kernel wrapped the pid-space and reused the pid for a different process) Wrap all creation of child processes inside ctdb_fork() which is used to track all processes we have spawned. Capture SIGCHLD to track also which child processes have terminated. Wrap kill() inside ctdb_kill() and make sure that we never send a !0 signal to a child process pid that has already terminated (and might have been replaced with a (This used to be ctdb commit f73a4b1495830bcdd094a93732a89dd53b3c2f78)	2012-05-03 14:03:26 +10:00
Ronnie Sahlberg	a367fa6138	RELOADIPS: simplify the reloadips code a bit and also update the "read public address file" to not check if the address exists already locally when we read it from the child process, to stop it from spamming the logs with "We already host ..." messages (This used to be ctdb commit 334ea830f1bf33419f4a1e78f23afd41a852d0f4)	2012-05-01 15:34:26 +10:00
Ronnie Sahlberg	7a1aa560e7	Add new control to reload the public ip address file on a node Also add a method to use the recovery master/daemon to reload the public ips on all nodes in the cluster. Reloading the public ips on all nodes in the cluster is only supported if all nodes in the cluster are available and healthy. (This used to be ctdb commit 05603e914f8c12618d7e06943c0f7df207f645b0)	2012-05-01 10:48:08 +10:00
Ronnie Sahlberg	db411aaada	Merge remote branch 'amitay/tevent-sync' (This used to be ctdb commit 17ff3f240b0d72c72ed28d70fb9aeb3b20c80670)	2012-04-26 08:09:23 +10:00
Amitay Isaacs	4392591555	Remove explicit include of lib/tevent/tevent.h. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 0681014ca5ed2a9b56f63fdace7f894beccf8a9a)	2012-04-13 17:28:14 +10:00
Amitay Isaacs	b3d098ced7	ctdbd: Fix spurious warnings when running with --nopublicipcheck Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 67b909a0718d6cfce82ffce0830da3a6ff1f6c4b)	2012-04-13 15:38:11 +10:00
Amitay Isaacs	425b8768ee	ctdbd: Fix the error message string Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 15f63ebab9686734f41a6adf38d4a7faa919ac66)	2012-04-13 14:51:13 +10:00
Ronnie Sahlberg	2456f77ca6	NoIPTakeover: change the tunable name for the "dont allow failing addresses over onto the node" to NoIPTakeover (This used to be ctdb commit 35592e618cfd827b6978af6332f80504f232c46a)	2012-03-22 11:05:15 +11:00
Ronnie Sahlberg	9f31f76805	NoIPFailback: Exclude nodes which have NoIPFailback as failback targets during reallocation (This used to be ctdb commit c262c29773d1608e7ce04bdfb7f4469df0a9637b)	2012-03-22 09:24:32 +11:00
Ronnie Sahlberg	befa9df152	Make NoIPFailback a node local setting. Nodes that have NoIPFailback set to !0 can not takeover new ip addresses during failover. Remove the old global setting for this unused tunable and add it as a new node flag. This node flag is only valid/defined within the takeover subsystem in the recovery daemon. Add async functions to collect the NoIPFailback settings for each node. This will later be used to disqualify certain nodes from being takeover targets when we perform reallocation. (This used to be ctdb commit 668f3e88a9e5f598706952b7140547640c85a5ed)	2012-03-22 09:09:57 +11:00
Ronnie Sahlberg	ef2bd0b016	When adding ips to nodes, set up a deferred rebalance for the whole node to trigger after 60 seconds in case the normal ipreallocated is not sufficient to trigger rebalance. (This used to be ctdb commit 4340263b219d75c39f8de22abe3f6f1c1ee63ea2)	2012-02-28 06:56:04 +11:00
Ronnie Sahlberg	91c9371f2d	Make KILLTCP structure a child of VNN so that it is freed at the same time the referenced VNN structure is. Also, remove the circular reference between the two objects KILLTCP and VNN (This used to be ctdb commit 02b62482164a3c69715949074feb7f191a29d534)	2012-02-27 07:21:26 +11:00
Volker Lendecke	5e3b13a32a	FreeBSD does not define s6_addr32, only s6_addr Signed-off-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit d657af4fb68ce3f7c462856f2934f6bf169e120b)	2012-02-13 16:20:12 +01:00
Martin Schwenke	3ae8273d86	Make some ctdb_takeover.c functions static These were intentionally not static so they could be linked to in unit test programs. However, using the CCAN-style unit tests where relevant code is just included, this is no longer necessary. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit d0e9e8554614bd49ffb9ec3509feaa0e80d0f65d)	2011-11-11 14:41:47 +11:00
Ronnie Sahlberg	8db9b73920	Merge remote branch 'martins/lcp2fix' (This used to be ctdb commit 7c02d242af552aa732f5c70ea4eeefbc8a8542e2)	2011-11-08 14:06:30 +11:00
Ronnie Sahlberg	0f92fa224c	RB_TREE: Add mechanism to abort a traverse This patch changes the callback signature for traversal functions to allow a client to abort a traverse before it finishes. Updates to all callers and examples as well as rb-test tool. (This used to be ctdb commit 8ab0c63ad36cfbbb1e5fed46a1f4c47b1fdb581f)	2011-11-08 13:40:28 +11:00
Martin Schwenke	c0939af571	LCP IP allocation algorithm - try harder to find a candidate source node There's a bug in LCP2. Selecting the node with the highest imbalance doesn't always work. Some nodes can have a high imbalance metric because they have a lot of IPs. However, these nodes can be part of a group that is perfectly balanced. Nodes in another group with less IPs might actually be imbalanced. Instead of just trying the source node with the highest imbalance this tries them in descending order of imbalance until it finds one where an IP can be moved to another node. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 574091d5aced5e87aefad52f8bc47aa75c25fbf6)	2011-11-02 10:17:00 +11:00
Martin Schwenke	98c27f973d	LCP IP allocation algorithm - new function lcp2_failback_candidate() There's a bug in LCP2. Selecting the node with the highest imbalance doesn't always work. Some nodes can have a high imbalance metric because they have a lot of IPs. However, these nodes can be part of a group that is perfectly balanced. Nodes in another group with less IPs might actually be imbalanced. Factor out the code from lcp2_failback() that actually takes a node and decides which address should be moved to which node. This is the first step in fixing the above bug. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 75718c5768b5bb5c0bcd7dd90e0327c6ed22a63d)	2011-11-01 21:01:25 +11:00
Ronnie Sahlberg	d79596ba1a	One of the entry points to release an ip reset the pnn field before invoking the eventscript. This caused the "only run the eventscript if we host the address" check to trigger and short-circuit calling the eventscript. An effect of this would be that 'ctdb delip' would remove the ip from ctdb, but fail to delete it from the interface. S1028798 (This used to be ctdb commit b82524f240bf21769dd7624ca6026763d38b9396)	2011-09-22 15:17:23 +10:00
Ronnie Sahlberg	4587bdb052	when checking that the interfaces exist in ctdb_add_public_address() can't talloc off vnn since it is not yet initialized and might not always be NULL (This used to be ctdb commit 3d37be3e2bfb61ede824028aeebaa18ba304faae)	2011-09-21 11:42:19 +10:00
Ronnie Sahlberg	783ceca07b	Interface monitoring: add an event to trigger every 30 seconds to check that all interfaces referenced by the public address list actually exist. This will make it much easier to root-cause problems such as S1029023 when an external application deleted the interface while it is still in use by ctdbd. (This used to be ctdb commit 9abf9c919a7e6789695490e2c3de56c21b63fa57)	2011-09-06 17:02:19 +10:00
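
Commit c0939af571 above describes trying candidate source nodes in descending order of imbalance until one is found from which an IP can be moved. The following is a minimal sketch of that selection idea only, not the CTDB implementation: the struct sketch_node type, its fields, and sketch_pick_source() are hypothetical names invented for illustration, and the imbalance metric is assumed to be precomputed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified node record; not CTDB's real data structure. */
struct sketch_node {
	int pnn;            /* node number */
	unsigned imbalance; /* assumed to be a precomputed imbalance metric */
	bool can_move_ip;   /* stand-in for "an IP can be moved to another node" */
};

/* qsort() comparator: highest imbalance first. */
static int cmp_imbalance_desc(const void *a, const void *b)
{
	const struct sketch_node *x = a;
	const struct sketch_node *y = b;

	if (x->imbalance < y->imbalance) {
		return 1;
	}
	if (x->imbalance > y->imbalance) {
		return -1;
	}
	return 0;
}

/*
 * Sort the nodes in place by descending imbalance, then return the pnn
 * of the first node from which an IP can be moved. Returns -1 if no
 * node qualifies.
 */
int sketch_pick_source(struct sketch_node *nodes, size_t n)
{
	size_t i;

	qsort(nodes, n, sizeof(nodes[0]), cmp_imbalance_desc);

	for (i = 0; i < n; i++) {
		if (nodes[i].can_move_ip) {
			return nodes[i].pnn;
		}
	}
	return -1;
}
```

With this approach a node that has the highest metric but belongs to an already balanced group is skipped rather than ending the search, which is the behaviour the commit message asks for.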


205 Commits