mirror of git://git.proxmox.com/git/pve-ha-manager.git synced 2025-01-31 05:47:19 +03:00

Thomas Lamprecht
7fd7af67e5 manager: recompute online usage: iterate over keys sorted
mostly to be safe regarding reproducibility with the test system.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-03-20 11:09:01 +01:00
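
A minimal Perl sketch of the pattern the commit above applies (the hash and helper names are hypothetical): Perl randomizes hash key order per process, so iterating unsorted keys makes test-system runs non-reproducible.

    # iterate services in sorted order for a deterministic result
    foreach my $sid (sort keys %$services) {
        add_service_usage($online_node_usage, $sid, $services->{$sid});    # helper name assumed
    }
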
Thomas Lamprecht
b159176a9b manager: service start: make EWRONG_NODE a non-fatal error
and let it traverse the usual error counting mechanisms; the
select_service_node helper then either picks up the right node, and
the service starts there, or it can trigger fencing of that node.

Note, in practice this can normally only happen if the admin
butchered around in the node cluster state, but as we only select
safe nodes from the configured groups, we should be safe in any case.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-03-20 11:09:01 +01:00
Thomas Lamprecht
49b0ccc7fe sim hardware: avoid hard error on usage stats parsing
now that we can automatically derive them from the SID

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-03-20 11:09:01 +01:00
Thomas Lamprecht
d9a55b5d3c sim env: derive service usage from ID as fallback
so that we don't need to specify all usage stats explicitly for
bigger tests.

Note, we explicitly use two digits for memory, as with just one a lot
of services are exactly the same, which gives us flaky tests due to
rounding, or some flakiness in the Rust code - so this is a bit of a
stopgap for that too and should be reduced to a single digit once we
fix that in the future.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-03-20 11:09:01 +01:00
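
A hypothetical sketch of such a fallback, assuming a service ID of the form 'vm:101'; the exact derivation in the simulator may differ.

    sub derive_usage_from_sid {
        my ($sid) = @_;
        my ($vmid) = $sid =~ /:(\d+)$/;    # numeric part of e.g. 'vm:101'
        $vmid //= 0;
        return {
            maxcpu => ($vmid % 3) + 1,                # 1..3 cores
            maxmem => (($vmid % 90) + 10) * 1024**3,  # 10..99 GiB: two digits, see note above
        };
    }
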
Thomas Lamprecht
de225e04c4 update readme to be a bit less confusing/outdated
E.g., pve-ha-manager is our current HA manager, so talking about the
"current HA stack" being EOL without mentioning that the `rgmanager`
one was actually meant got taken up the wrong way by some potential
users. Correct that and a few other things, but as some things are
definitely still out-of-date, or will be in a few months, mention at
the top that this is an older readme and refer to the HA reference
docs.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-01-03 13:19:18 +01:00
Thomas Lamprecht
071e69ce7f bump version to 3.5.1
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-11-19 15:51:16 +01:00
Thomas Lamprecht
475f19fe7d api: status: add CRS info to manager if not set to default
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-11-19 15:50:14 +01:00
Thomas Lamprecht
f2c729829f manager: slightly clarify log message for fallback on init-failure
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-11-19 15:50:14 +01:00
Thomas Lamprecht
d062598531 api: status: code and indentation cleanup
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-11-19 15:50:14 +01:00
Thomas Lamprecht
1b81383180 manager: make crs a full blown hash
To support potential additional CRS settings more easily.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-11-19 15:50:14 +01:00
Thomas Lamprecht
086f7075d0 manager: update crs scheduling mode once per round
Pretty safe to do as we recompute everything per round anyway (and
much more often on top of that, but that's another topic).

Actually I'd argue that it's safer this way, as the user doesn't need
to actively restart the manager, which grinds much more gears and
causes more watchdog churn than checking periodically and updating it
internally. Plus, a lot of admins won't expect that they need to
restart the currently active master and will thus complain that their
recently made change to the CRS config had no effect/the CRS doesn't
work at all.

We should codify such a change in a test for this though.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-11-19 14:05:26 +01:00
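
As a rough sketch of the approach: re-read the CRS setting at the start of every manager round instead of only once at manager start-up. The 'crs'/'ha' key layout is assumed here, set_crs_mode() is a hypothetical helper; get_datacenter_settings() is the environment method introduced further down this log.

    # refresh the CRS configuration once per manager round
    my $dc_cfg = $haenv->get_datacenter_settings();
    $self->set_crs_mode($dc_cfg->{crs}->{ha} // 'basic');    # basic is always the fallback
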
Thomas Lamprecht
cb06cd421a manager: factor out setting crs scheduling mode
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-11-19 13:36:28 +01:00
Thomas Lamprecht
83a84eb0e3 manager: various code style cleanups
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-11-19 13:06:03 +01:00
Thomas Lamprecht
091f890416 bump version to 3.5.0
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-11-18 15:03:00 +01:00
Thomas Lamprecht
c2d8b56a97 manager: better convey that basic is always the fallback
to hint to a potential "code optimizer" that it may not be easily
moved up to the scheduling selection.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-11-18 15:01:12 +01:00
Thomas Lamprecht
42d9b683f2 d/control: add (build-)dependency for libpve-rs-perl
to ensure we have the perlmod for the basic scheduler available.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-11-18 13:44:45 +01:00
Fiona Ebner
4788830551 resources: add missing PVE::Cluster use statements
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-11-18 13:25:21 +01:00
Fiona Ebner
f348399fe4 test: add tests for static resource scheduling
See the READMEs for more information about the tests.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-11-18 13:25:21 +01:00
Fiona Ebner
223a2ca493 usage: static: use service count on nodes as a fallback
if something goes wrong with the TOPSIS scoring. Not expected to
happen, but it's rather cheap to be on the safe side.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-11-18 13:25:21 +01:00
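
A hypothetical sketch of the fallback; score_nodes_topsis() stands in for the real scoring call and $service_count is assumed to map node names to their service counts.

    my $scores = eval { score_nodes_topsis($nodes, $service_stats) };
    if (!$scores) {
        warn "static scoring failed, falling back to service count - $@";
        # negate the count so that fewer services yields a higher score
        $scores = { map { $_ => -($service_count->{$_} // 0) } @$nodes };
    }
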
Fiona Ebner
c724ce1be7 manager: avoid scoring nodes when not trying next and current node is valid
With the Usage::Static plugin, scoring is not as cheap anymore and
select_service_node() is called for each running service.

This should cover most calls of select_service_node().

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-11-18 13:25:21 +01:00
Fiona Ebner
631ba60ef2 manager: avoid scoring nodes if maintenance fallback node is valid
With the Usage::Static plugin, scoring is not as cheap anymore and
select_service_node() is called for each running service.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-11-18 13:25:21 +01:00
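
A hypothetical sketch of both short-circuits in select_service_node() described by the two commits above; $allowed_nodes as a hash set of valid target nodes, all variable names assumed.

    # keep the service where it is, no scoring needed
    if (!$try_next && $allowed_nodes->{$current_node}) {
        return $current_node;
    }
    # likewise, prefer a still-valid maintenance fallback node
    if ($maintenance_fallback && $allowed_nodes->{$maintenance_fallback}) {
        return $maintenance_fallback;
    }
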
Fiona Ebner
561e7f4bfb manager: use static resource scheduler when configured
Note that recompute_online_node_usage() becomes much slower when the
'static' resource scheduler mode is used. Tested it with ~300 HA
services (minimal containers) running on my virtual test cluster.

Timings with 'basic' mode were between 0.0004 - 0.001 seconds
Timings with 'static' mode were between 0.007 - 0.012 seconds

Combined with the fact that recompute_online_node_usage() is currently
called very often, this can lead to a lot of delay during recovery
situations with hundreds of services; with low thousands of services
overall, generous estimates suggest it could even run into the
watchdog timer.

Ideas to remedy this are using PVE::Cluster's
get_guest_config_properties() instead of load_config() and/or
optimizing how often recompute_online_node_usage() is called.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-11-18 13:25:21 +01:00
Fiona Ebner
f74f8ffb24 manager: set resource scheduler mode upon init
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-11-18 13:25:21 +01:00
Fiona Ebner
7c142d6822 env: datacenter config: include crs (cluster-resource-scheduling) setting
Suggested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-11-18 13:25:21 +01:00
Fiona Ebner
749d8161be env: rename get_ha_settings to get_datacenter_settings
The method will be extended to include other HA-relevant settings from
datacenter.cfg.

Suggested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-11-18 13:25:21 +01:00
Fiona Ebner
48f2144b27 usage: add Usage::Static plugin
for calculating node usage of services based upon static CPU and
memory configuration as well as scoring the nodes with that
information to decide where to start a new or recovered service.

For getting the service stats, it's necessary to also consider the
migration target (if present), because the configuration file might
have already moved.

It's necessary to update the cluster filesystem upon stealing the
service to be able to always read the moved config right away when
adding the usage.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-11-18 13:25:21 +01:00
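
For illustration, a simplified, hypothetical TOPSIS scoring sketch in Perl over two cost criteria; the actual plugin delegates the scoring to the Rust implementation shipped via libpve-rs-perl (see the d/control commit above).

    use List::Util qw(sum min max);

    # score nodes by TOPSIS over two cost criteria: projected CPU and
    # memory usage after adding the service to the node
    sub score_nodes_topsis {
        my ($usage) = @_;    # { node => { cpu => ..., mem => ... } }
        my @nodes = keys %$usage;

        for my $c (qw(cpu mem)) {    # vector-normalize each criterion
            my $norm = sqrt(sum(map { $usage->{$_}{$c} ** 2 } @nodes)) || 1;
            $usage->{$_}{$c} /= $norm for @nodes;
        }

        my (%ideal, %anti);    # both criteria are costs: the ideal is the minimum
        for my $c (qw(cpu mem)) {
            my @vals = map { $usage->{$_}{$c} } @nodes;
            ($ideal{$c}, $anti{$c}) = (min(@vals), max(@vals));
        }

        my %score;
        for my $node (@nodes) {
            my $d_best  = sqrt(sum(map { ($usage->{$node}{$_} - $ideal{$_}) ** 2 } qw(cpu mem)));
            my $d_worst = sqrt(sum(map { ($usage->{$node}{$_} - $anti{$_}) ** 2 } qw(cpu mem)));
            $score{$node} = $d_worst / (($d_best + $d_worst) || 1);    # closeness, higher is better
        }
        return \%score;
    }
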
Fiona Ebner
5d724d4dd9 manager: online node usage: switch to Usage::Basic plugin
no functional change is intended.

One test needs adaptation too, because it created its own version of
$online_node_usage.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-11-18 13:25:21 +01:00
Fiona Ebner
b259857688 manager: select service node: add $sid to parameters
In preparation for scheduling based on static information, where the
scoring of nodes depends on information from the service's
VM/CT configuration file (and the $sid is required to query that).

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-11-18 13:25:21 +01:00
Fiona Ebner
c8c6e462fc add Usage base plugin and Usage::Basic plugin
in preparation to also support static resource scheduling via another
such Usage plugin.

The interface is designed in anticipation of the Usage::Static plugin;
the Usage::Basic plugin doesn't require all parameters.

In Usage::Static, the $haenv will be necessary for logging and getting
the static node stats. add_service_usage_to_node() and
score_nodes_to_start_service() take the sid and service node, and the
former also takes the optional migration target (during a migration
it's not clear whether the config file has already been moved or not),
to be able to get the static service stats.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-11-18 13:25:21 +01:00
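
An outline of the interface as described above; only the method names and parameters come from the commit message, the package layout is assumed for illustration.

    package PVE::HA::Usage;    # package name assumed

    use strict;
    use warnings;

    sub new {
        my ($class, $haenv) = @_;    # $haenv: needed by Usage::Static for logging and node stats
        return bless { haenv => $haenv }, $class;
    }

    # $migration_target is optional - during a migration it's not clear
    # whether the config file has already been moved or not
    sub add_service_usage_to_node {
        my ($self, $nodename, $sid, $service_node, $migration_target) = @_;
        die "implement in subclass";
    }

    sub score_nodes_to_start_service {
        my ($self, $sid, $service_node) = @_;
        die "implement in subclass";
    }

    1;
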
Fiona Ebner
eea0c60923 resources: add get_static_stats() method
to be used for static resource scheduling.

In container's vmstatus(), the 'cores' option takes precedence over
the 'cpulimit' one, but it felt more accurate to prefer 'cpulimit'
here.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-11-18 13:25:21 +01:00
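
A hypothetical sketch of how the container plugin's method could look, assuming the usual LXC config fields ('cpulimit', 'cores', 'memory' in MiB); the real method signature may differ.

    use PVE::LXC::Config;

    sub get_static_stats {
        my ($class, $id) = @_;
        my $conf = PVE::LXC::Config->load_config($id);
        return {
            # prefer 'cpulimit' over 'cores' here, unlike vmstatus()
            maxcpu => $conf->{cpulimit} || $conf->{cores} || 1,
            maxmem => ($conf->{memory} // 512) * 1024 * 1024,    # config value is in MiB
        };
    }
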
Fiona Ebner
5db695c3f3 env: add get_static_node_stats() method
to be used for static resource scheduling. In the simulation
environment, the information can be added in hardware_status.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-11-18 13:25:21 +01:00
Thomas Lamprecht
0869c306ba fixup variable name typo
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-07-22 12:39:27 +02:00
Thomas Lamprecht
a3ffb0b3d4 manager: add top level comment section to explain common variables
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-07-22 12:15:55 +02:00
Thomas Lamprecht
bc64c08e37 d/lintian-overrides: update for newer lintian
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-07-22 10:06:47 +02:00
Thomas Lamprecht
2a1638b77b bump version to 3.4.0
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-07-22 09:22:52 +02:00
Thomas Lamprecht
6f818da13f manager: online node usage: factor out possible target and future proof
only count up the target selection if that node is already in the
online node usage list, to avoid that an offline node is considered
online if it's the target of any command

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-07-22 09:12:38 +02:00
Thomas Lamprecht
8c80973d40 test: update pre-existing policy tests for fixed balancing spread
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-07-22 08:49:41 +02:00
Thomas Lamprecht
1280368d31 fix variable name typo
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-07-22 07:25:02 +02:00
Thomas Lamprecht
066fd01670 fix spreading out services if source node isn't operational but otherwise ok
as is the case when going into maintenance mode

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-07-21 18:14:33 +02:00
Thomas Lamprecht
6756e14aed tests: add shutdown policy scenario with multiple guests to spread out
currently wrong, as online_node_usage doesn't consider counting the
target node if the source node isn't considered online (=
operational) anymore

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-07-21 18:09:42 +02:00
Thomas Lamprecht
c00c44818a bump version to 3.3-4
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-04-27 14:02:22 +02:00
Fabian Grünbichler
ad6456997e lrm: fix getting stuck on restart
run_workers is responsible for updating the state after workers have
exited. if the current LRM state is 'active', but a shutdown_request was
issued in 'restart' mode (like on package upgrades), this call is the
only one made in the LRM work() loop.

skipping it if there are active services means the following sequence of
events effectively keeps the LRM from restarting or making any progress:

- start HA migration on node A
- reload LRM on node A while migration is still running

even once the migration is finished, the service count is still >= 1
since the LRM never calls run_workers (directly or via
manage_resources), so the service having been migrated is never noticed.

maintenance mode (i.e., rebooting the node with shutdown policy migrate)
does call manage_resources and thus run_workers, and will proceed once
the last worker has exited.

reported by a user:

https://forum.proxmox.com/threads/lrm-hangs-when-updating-while-migration-is-running.108628

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2022-04-27 13:57:37 +02:00
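
A hypothetical sketch of the resulting control flow in the LRM work() loop; run_workers() is named in the commit message, the other helper names are assumed.

    if ($self->{shutdown_request}) {
        # always collect exited workers first, so a finished migration is
        # noticed and the active service count can actually drop to zero
        $self->run_workers();
        my $service_count = $self->active_service_count();    # helper name assumed
        if ($service_count == 0 && $self->{mode} eq 'restart') {
            $self->request_restart();    # helper name assumed
        }
    }
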
Thomas Lamprecht
fe3781e8ab buildsys: track and upload debug package
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-01-20 18:08:27 +01:00
Thomas Lamprecht
c15a8b803e bump version to 3.3-3
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-01-20 18:05:37 +01:00
Thomas Lamprecht
eef4f86338 lrm: increase run_worker loop-time partition
every LRM round is scheduled to run for 10s, but we spend only half
of that actively trying to run workers (within the max_worker limit).

Raise that to an 80% duty cycle.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-01-20 16:17:28 +01:00
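
The arithmetic, as a hypothetical sketch (helper names assumed): out of a 10 s LRM round, 8 s instead of 5 s are now spent trying to run workers.

    my $loop_time   = 10;                  # seconds per LRM round
    my $worker_time = $loop_time * 0.8;    # 80% duty cycle, was $loop_time * 0.5
    my $deadline    = time() + $worker_time;
    while (time() < $deadline && $self->has_queued_workers()) {
        $self->try_run_next_worker();      # still respects the max_worker limit
    }
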
Thomas Lamprecht
65c1fbac99 lrm: avoid job starvation on huge workloads
If a setup has a lot of VMs we may run into the time limit of the
run_worker loop before processing all workers, which can easily
happen if an admin did not increase the default max_workers for their
setup; but even with a bigger max_worker setting one can run into it.

That, combined with the fact that we sorted just by the $sid
alpha-numerically, means that CTs were preferred over VMs (C comes
before V) and additionally lower VMIDs were preferred too.

That means a set of SIDs had a lower chance of ever actually getting
run, which is naturally not ideal at all.
Improve on that behavior by adding a counter to the queued worker and
preferring those that have a higher one, i.e., that spent more time
waiting on getting actively run.

Note, due to the way the stop state is enforced, i.e., always
enqueued as a new worker, its start-try counter will be reset every
round and thus have a lower priority compared to other request
states. We probably want to differentiate between a stop request when
the service is/was in another state just before, and a stop that is
just re-requested even though the service was already stopped for a
while.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-01-20 16:14:03 +01:00
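
A hypothetical sketch of the prioritization (field and helper names assumed): workers that waited more rounds are run first, instead of plain alphanumeric SID order.

    my @queue = sort {
        $b->{start_tries} <=> $a->{start_tries}    # most-starved first
            || $a->{sid} cmp $b->{sid}             # stable tie-break
    } values %$queued_workers;

    for my $worker (@queue) {
        if ($self->may_start_worker()) {           # max_worker limit / time budget
            $self->start_worker($worker);
        } else {
            $worker->{start_tries}++;    # waited another round, bump priority
        }
    }
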
Thomas Lamprecht
b538340c9d lrm: code/style cleanups
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-01-20 14:40:27 +01:00
Thomas Lamprecht
f613e426ce lrm: run worker: avoid an indentation level
best viewed with the `-w` flag to ignore the whitespace changes themselves

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-01-20 13:42:15 +01:00
Thomas Lamprecht
a25a516ac6 lrm: log actual error if fork fails
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-01-20 13:39:35 +01:00
Thomas Lamprecht
2deff1ae35 manager: refactor fence processing and rework fence-but-no-service log
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-01-20 13:31:04 +01:00