samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2024-12-28 07:21:54 +03:00

Author	SHA1	Message	Date
Martin Schwenke	3ae8273d86	Make some ctdb_takeover.c functions static These were intentionally not static so they could be linked to in unit test programs. However, using the CCAN-style unit tests where relevant code is just included, this is no longer necessary. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit d0e9e8554614bd49ffb9ec3509feaa0e80d0f65d)	2011-11-11 14:41:47 +11:00
Ronnie Sahlberg	8db9b73920	Merge remote branch 'martins/lcp2fix' (This used to be ctdb commit 7c02d242af552aa732f5c70ea4eeefbc8a8542e2)	2011-11-08 14:06:30 +11:00
Ronnie Sahlberg	0f92fa224c	RB_TREE: Add mechanism to abort a traverse This patch changes the callback signature for traversal functions to allow a client to abort a traverse before it finishes. Updates to all callers and examples as well as rb-test tool. (This used to be ctdb commit 8ab0c63ad36cfbbb1e5fed46a1f4c47b1fdb581f)	2011-11-08 13:40:28 +11:00
Martin Schwenke	c0939af571	LCP IP allocation algorithm - try harder to find a candidate source node There's a bug in LCP2. Selecting the node with the highest imbalance doesn't always work. Some nodes can have a high imbalance metric because they have a lot of IPs. However, these nodes can be part of a group that is perfectly balanced. Nodes in another group with less IPs might actually be imbalanced. Instead of just trying the source node with the highest imbalance this tries them in descending order of imbalance until it finds one where an IP can be moved to another node. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 574091d5aced5e87aefad52f8bc47aa75c25fbf6)	2011-11-02 10:17:00 +11:00
Martin Schwenke	98c27f973d	LCP IP allocation algorithm - new function lcp2_failback_candidate() There's a bug in LCP2. Selecting the node with the highest imbalance doesn't always work. Some nodes can have a high imbalance metric because they have a lot of IPs. However, these nodes can be part of a group that is perfectly balanced. Nodes in another group with less IPs might actually be imbalanced. Factor out the code from lcp2_failback() that actually takes a node and decides which address should be moved to which node. This is the first step in fixing the above bug. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 75718c5768b5bb5c0bcd7dd90e0327c6ed22a63d)	2011-11-01 21:01:25 +11:00
Ronnie Sahlberg	d79596ba1a	One of the entry points to release an ip reset the pnn field before invoking the eventscript. This caused the check for "only run the eventscript if we host the address" to trigger and short-circuit calling the eventscript. An effect of this would be that 'ctdb delip' would remove the ip from ctdb, but fail to delete it from the interface. S1028798 (This used to be ctdb commit b82524f240bf21769dd7624ca6026763d38b9396)	2011-09-22 15:17:23 +10:00
Ronnie Sahlberg	4587bdb052	when checking that the interfaces exist in ctdb_add_public_address() cant talloc off vnn since it is not yet initialized and might not always be NULL (This used to be ctdb commit 3d37be3e2bfb61ede824028aeebaa18ba304faae)	2011-09-21 11:42:19 +10:00
Ronnie Sahlberg	783ceca07b	Interface monitoring: add an event that triggers every 30 seconds to check that all interfaces referenced by the public address list actually exist. This will make it much easier to root-cause problems such as S1029023 when an external application deleted the interface while it is still in use by ctdbd. (This used to be ctdb commit 9abf9c919a7e6789695490e2c3de56c21b63fa57)	2011-09-06 17:02:19 +10:00
Ronnie Sahlberg	64378fea58	Check interfaces: when reading the public addresses file to create the vnn list check that the actual interface exist, print error and fail startup if the interface does not exist. (This used to be ctdb commit cd33bbe6454b7b0316bdfffbd06c67b29779e873)	2011-09-06 16:11:00 +10:00
Volker Lendecke	1cf1670f0a	Fix a const warning Signed-off-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit e25559087c9752502580875f7e33f3c416c05f84)	2011-08-22 17:11:07 +02:00
Ronnie Sahlberg	fea64f65b5	Remove a log message about setting linkstate for an unknown interface. Sometimes we do want to try to set the linkstate for interfaces that are not in use by public addresses right now (but possibly by other mechanisms) and these messages just spam the logs. S1026357 (This used to be ctdb commit f2fe0a090a9650910ebe49514b3ca01dc593bea3)	2011-08-05 10:05:12 +10:00
Martin Schwenke	5ac67504ca	Tests: Initial test code for LCP2 IP allocation algorithm. Move struct ctdb_public_ip_list to ctdb_private.h and put some definitions for some functions from ctdb_takeover.c there. This allows those functions to be called from unit tests. Add ctdb_takeover_tests.c and the Makefile support to build it. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 9d34be0233edf3bc022345c0494c4b2a4d7f8480)	2011-07-29 09:01:36 +10:00
Martin Schwenke	ff1a81c872	IP allocation - add LCP2 algorithm. The current non-deterministic IP allocation algorithm balances IPs across the whole cluster. It does not consider different interfaces/VLANs/subnets, so these different groups of IPs aren't generally well balanced. This adds the LCP2 algorithm for IP allocation and allows it to be enabled by setting the "LCP2PublicIPs" tunable to 1. The LCP2 algorithm calculates the imbalance of a node by totalling the squares of the distances between each IP on the node. The IP distance is defined as the length of the longest common prefix (LCP) of bits that is found when comparing 2 IPs. The imbalance of a cluster is the maximum imbalance for any node. At each step the algorithm selects the IP/node combination whose allocation best reduces the imbalance of the cluster. The implementation splits out the IP allocation part of ctdb_takeover_run() into new function ctdb_takeover_run_core(), and then extracts out the basic IP assignment code into new functions basic_allocate_unassigned() and basic_failback(). 3 new functions lcp2_init(), lcp2_allocate_unassigned() and lcp2_failback() implement the LCP2 algorithm, and are hooked into ctdb_takeover_run_core(). Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 61fc7fbd0235469df22deb6581c6bd47e30bc0be)	2011-07-29 09:01:17 +10:00
Ronnie Sahlberg	e707f23596	Update the delip command. Don't talloc_free(vnn) immediately but postpone it until later when the eventscript callback has completed. CQ S1026664 (This used to be ctdb commit 0a99e8742a261b1d3a2c8830f5c19ea6c2c47cad)	2011-07-29 08:50:48 +10:00
Ronnie Sahlberg	c93a968619	When trying to re-balance the ip assignment and shuffle ips from nodes with many addresses to nodes with few addresses, loop up to num_ips+5 times instead of only 5 times. When we have very many public ips per node, we might need to loop more than 5 times or else we will exit without reaching optimal balance. (This used to be ctdb commit aa8114a625a637277561a66c80bdece3c27e9e20)	2011-07-06 13:14:13 +10:00
Ronnie Sahlberg	f84bd3b5f1	Dont call the UPDATE event if both old and new interface is the same. CQ S1018175 (This used to be ctdb commit 6a74515f0a1e24d97cee3ba05d89133aac7ad2b7)	2011-05-04 13:29:29 +10:00
Ronnie Sahlberg	c04505724a	IFACE handling. Assume links are always good on startup (they almost always Simplify the handling of setting the links in the 10.interface eventscript and remove the optimization to only call setifacelink on state change to make the code simpler to read. If a take ip event fails, flag the node as unhealthy. Add a check to the interface script to check if the interface exists or if it has been deleted, so that we can capture this and become UNHEALTHY if someone deletes an interface we are using to host public addresses. (This used to be ctdb commit 4ab63d2a7262aff30d5eced184c294c9c9dd4974)	2011-04-11 07:40:05 +10:00
Ronnie Sahlberg	f82936402f	IP reallocation. If a public address is already hosted on the node when we startup, log a warning message but do not cause the recovery to fail. CQ S1022356 Signed-off-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit 89f8169c24da96c1fdd0ac19b8a1e0e1df01a72a)	2011-03-14 13:35:53 +01:00
Ronnie Sahlberg	93bea39391	IPALLOCATION : If the node is held pinned down in "init" state by external services failing to start, or blocking CTDBD from finishing the startup phase, we can encounter a situation where we have not yet fully initialized, but a remote recovery master tries to release a certain ip clusterwide. In this situation the node that is pinned down in init/startup phase would fail to perform the release of the ip address since we are not yet fully operational and do not yet host any valid interfaces. In this situation, we just need to remain unhealthy, there is no need to also ban the node. Remove the autobanning for this condition and just let the node remain in unhealthy mode. Banning is overkill in this situation when the system is broken and just draws attention to ctdbd instead of the root cause. (This used to be ctdb commit d8af74e4c4961deb94c18dde8ba7fc07e944729c)	2011-01-13 09:42:01 +11:00
Ronnie Sahlberg	a9a6ae064d	When assigning the single-public-ip during startup, flag the interface as initially being "link ok" so that we can add it and startup. The eventscript can later drop the flag if required (This used to be ctdb commit 720849b756c825fb8b285f09972a8c39f1888a99)	2010-12-13 14:24:04 +11:00
Ronnie Sahlberg	c2c53db49d	during ip allocation, there are failure modes where a node might hold a ip address but thinks it is still unassigned (-1). add code to the recovery daemon to detect this case and trigger a reallocation so that the ip gets covered and change the takeip code to allow for this condition, taking on an ip address that is already hosted. cq s1021073 (This used to be ctdb commit 9020baf27cab7821c9094cda185206fb7af0fee7)	2010-12-03 13:30:39 +11:00
Ronnie Sahlberg	dbcf0de18c	Don't exit the update ip function if the old and new interfaces are the same, since if they are the same for whatever reason this triggers the system to go into an infinite loop and is not robust. The scripts have been changed instead to be able to cope with this situation for enhanced robustness. During takeover_run and when merging all ip allocations across the cluster, try to keep track of when and which node currently hosts an ip address so that we avoid extra ip failovers between nodes. (This used to be ctdb commit cf778b5aaf6356401e3985acccc7df9e08ab6930)	2010-11-10 14:55:25 +11:00
Ronnie Sahlberg	6fa8e1fddb	when we load the public address file, at the same time check if we are already hosting the public address, if so, set ourselves up as the pnn for that address (This used to be ctdb commit 0f2a2dac91a61be188c3578c8bb89d47cbf9a0f8)	2010-11-10 14:55:24 +11:00
Ronnie Sahlberg	5f76f3c0e2	Add a new tunable : DisableIPFailover that when set to non 0 will stop any ip reallocations at all from happening. (This used to be ctdb commit d8d37493478a26c5f1809a5f3df89ffd6e149281)	2010-11-10 14:55:24 +11:00
Ronnie Sahlberg	87a0ece976	when creating/adding a public ip, set the initial interface to be the first interface specified (This used to be ctdb commit 4308935ba48ac7a29e7523315acf580019715f0f)	2010-11-10 14:55:23 +11:00
Ronnie Sahlberg	d8d8b9e1d7	add a new serverid to send a message everytime an ip address is taken on the local node (This used to be ctdb commit 1261f3d9702800a4e59550c881350daf479f00ef)	2010-09-13 15:43:19 +10:00
Ronnie Sahlberg	19211f99c8	remove an unused variable (This used to be ctdb commit e07fdbaf12bbe84370bc47a1979fe198a06a6cc8)	2010-09-13 13:13:12 +10:00
Ronnie Sahlberg	c95f4258d8	Add a new event "ipreallocated". This is called every time a reallocation is performed. While STARTRECOVERY/RECOVERED events are only called when we do ipreallocation as part of a full database/cluster recovery, this new event can be used to trigger on when we just do a light failover due to a node becoming unhealthy. I.e. situations where we do a failover but we do not perform a full cluster recovery. Use this to trigger for natgw so we select a new natgw master node when failover happens and not just when cluster rebuilds happen. (This used to be ctdb commit 7f4c591388adae20e98984001385cba26598ec67)	2010-08-30 18:09:30 +10:00
Ronnie Sahlberg	2e8aac6689	Merge commit 'rusty/ports-from-1.0.112' into foo (This used to be ctdb commit 13e58d92f5f1723e850a82ae030d0ca57e89b1ee)	2010-08-19 13:17:56 +10:00
Ronnie Sahlberg	5aa5f3e7bf	Remove the structure ctdb_control_tcp_vnn since this is identical to the structure ctdb_tcp_connection. Add a new "ctdb deltickle" command to delete tickles from the database. This can ONLY be used for tickles created by "ctdb addtickle". Push any "addtickle/deltickle" updates to other nodes every TickleUpdateInterval seconds. (This used to be ctdb commit acded034e2f0dcae4c2c9e54e16a001caf23caec)	2010-08-18 12:36:03 +10:00
Rusty Russell	1a009aff73	takeover: prevent crash by avoiding free in traverse on RST timeout After 5 attempts to send a RST to a client without any response, we free "con"; this is done during a traverse. This frees the node we are walking through (the node is made a child of "con" down in rb_tree.c's trbt_create_node() (Valgrind would catch this, as Martin confirmed). So, we create a temporary parent and reparent onto that; then we free that parent after the traverse, thus deleting the unwanted nodes. CQ:S1019041 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 08f7f85477610a4916c1ec866aa467b28f1bbec3)	2010-08-18 11:40:17 +09:30
Rusty Russell	f93440c4b7	event: Update events to latest Samba version 0.9.8 In Samba this is now called "tevent", and while we use the backwards compatibility wrappers they don't offer EVENT_FD_AUTOCLOSE: that is now a separate tevent_fd_set_auto_close() function. This is based on Samba version `7f29f817fa`. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 85e5e760cc91eb3157d3a88996ce474491646726)	2010-08-18 09:16:31 +09:30
Ronnie Sahlberg	4136f27145	When adding an ip at runtime, it might not yet have an iface assigned to it, so ensure that the next takover_ip call will fall through to accept the ip and add it. (This used to be ctdb commit 2d60f96680d16c2992e2a35517822f88c12538b7)	2010-06-01 16:22:48 +10:00
Ronnie Sahlberg	92340e4d6f	check if vnn is a valid pointer before dereferencing it based on rustys patch for bz62783 (This used to be ctdb commit bdd250b9afdd1060cfd1e2b0f0a5a567150bb380)	2010-05-26 13:43:28 +10:00
Ronnie Sahlberg	4a43428440	The recent change to the recovery daemon to keep track of and verify that all nodes agree on the most recent ip address assignments broke "ctdb moveip ..." since that call would never trigger a full takeover run and thus would immediately trigger an inconsistency. Add a new message to the recovery daemon where we can tell the recovery daemon to update its assignments. BZ62782 (This used to be ctdb commit e7069082e5f0380dcddee247db8754218ce18cab)	2010-05-03 15:47:17 +10:00
Ronnie Sahlberg	c3c7aa934f	Make create_merged_ip_list() a static function since it is not called from outside of ctdb_takeover.c (This used to be ctdb commit 880896a27adfdd5173b2810b6b2f3889802046f0)	2010-05-03 15:47:06 +10:00
Ronnie Sahlberg	79fac9771d	In the log message when we have found an inconsistent ip address allocation, add extra log information about what the inconsistency is. (This used to be ctdb commit d2e4a9912c4bd13eb4f12681adebe7e59a6d1fb2)	2010-05-03 15:46:36 +10:00
Ronnie Sahlberg	06885ea9a7	In the recovery daemon, keep track of which node we have assigned public ip addresses and verify that the remote nodes have/keep a consistent view of assigned addresses. If a remote node has an inconsistent view of addresses vis-a-vis the recovery master this will trigger a full ip reallocation. (This used to be ctdb commit f3bf2ab61f8dbbc806ec23a68a87aaedd458e712)	2010-04-08 14:25:26 +10:00
Ronnie Sahlberg	7f2f7364ad	lower the loglevel for a debug message for redundant releases of public ips (This used to be ctdb commit cfc1a4f878b61c85063af649d2339431e799647d)	2010-02-16 11:01:09 +11:00
Stefan Metzmacher	76cb4ce34c	server: ban ourself if the ctdb and kernel knowledge of a public ip differs metze (This used to be ctdb commit 48e0af91113d6cead6cae3f28d8d8f610cacaa71)	2010-01-20 11:11:04 +01:00
Stefan Metzmacher	405368eeb0	server: give an error if we're getting an takeover_ip event with a wrong pnn metze (This used to be ctdb commit 2f44d6f3d290cc1b37b19ec34edfbad12cc0c0a7)	2010-01-20 11:11:04 +01:00
Stefan Metzmacher	a5ba5c129a	server: return an error if we get an takeover ip event and we cannot serve the ip metze (This used to be ctdb commit f5c221e6abc118aefa489aa7e07755af952fd2bb)	2010-01-20 11:11:03 +01:00
Stefan Metzmacher	55d824bd77	server: print node number as signed integer on release ip event metze (This used to be ctdb commit 6c456face30606641f6b8beaad3121c9b05ca763)	2010-01-20 11:11:03 +01:00
Stefan Metzmacher	c5e579b56a	server: debug redundant takeover ip events with level INFO metze (This used to be ctdb commit 7bc9969c4c28f2c4a4848bd730db3c63bb9204fe)	2010-01-20 11:11:03 +01:00
Stefan Metzmacher	ffdf32dedf	server: be less verbose on redundant release_ip events metze (This used to be ctdb commit 72ef5f891f85ce51f5ca7e0c03d0c7cc955be110)	2010-01-20 11:11:03 +01:00
Stefan Metzmacher	58d7c44b1c	server: add a ctdb_do_updateip() metze (This used to be ctdb commit eded224368dded2264e53546c196b1b485cb2094)	2010-01-20 11:11:02 +01:00
Stefan Metzmacher	aa485b17bb	server: split out a ctdb_do_takeover_ip() function metze (This used to be ctdb commit 8fd6f4aab0c173b4c9c4c02c546e7d2ec1a98423)	2010-01-20 11:11:02 +01:00
Stefan Metzmacher	da59e0b162	server: split out a ctdb_announce_vnn_iface() function metze (This used to be ctdb commit ec87a51660cfa8a6851923f757fed31f7ffc7153)	2010-01-20 11:11:02 +01:00
Stefan Metzmacher	179c098e86	server: start with disabled interfaces and let the event scripts enable the interfaces explicit This makes sure that we don't get public addresses assigned during the initial recovery and remove them again in the startup event. metze (This used to be ctdb commit f872e8c63a2f8979e6a0d088630575bdd4d7b4f1)	2010-01-20 11:11:01 +01:00
Stefan Metzmacher	f4f72024fe	server: implement ctdb_control_set_iface_link() This only marks the interface status and doesn't generate any directly triggered action. The actions is later taken by the recovery process in verify_ip_allocation. metze (This used to be ctdb commit cff58b27c970e9252d131125941c372019fd6660)	2010-01-20 11:10:59 +01:00
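
The imbalance metric described in commit ff1a81c872 can be illustrated with a short sketch. This is not the ctdb code (which is C, in ctdb_takeover.c); it is a minimal Python illustration of the commit message only. The function names ip_distance, node_imbalance and cluster_imbalance are hypothetical, "between each IP on the node" is assumed to mean each unordered pair of IPs, and addresses of different families are assumed to have distance 0.

```python
from ipaddress import ip_address
from itertools import combinations


def ip_distance(a: str, b: str) -> int:
    """Length in bits of the longest common prefix (LCP) of two addresses."""
    x, y = ip_address(a), ip_address(b)
    if x.version != y.version:
        return 0  # assumption: IPv4 and IPv6 addresses share no prefix
    diff = int(x) ^ int(y)
    # Leading bits that agree = address width minus position of highest differing bit.
    return x.max_prefixlen - diff.bit_length()


def node_imbalance(ips) -> int:
    """Sum of squared distances over each unordered pair of IPs on one node (assumed pairing)."""
    return sum(ip_distance(a, b) ** 2 for a, b in combinations(ips, 2))


def cluster_imbalance(assignment) -> int:
    """Maximum node imbalance; assignment maps a node number to its list of IPs."""
    return max((node_imbalance(ips) for ips in assignment.values()), default=0)
```

Under these assumptions, two addresses from the same subnet hosted on one node score higher than two addresses from different subnets, so an allocation that spreads each subnet's addresses across nodes yields a lower cluster imbalance.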


163 Commits