mirror of git://git.proxmox.com/git/pve-ha-manager.git synced 2025-01-31 05:47:19 +03:00

802 Commits

Author SHA1 Message Date
Wolfgang Bumiller
800a0c3e48 bump version to 4.0.5
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2024-06-04 11:10:11 +02:00
Lukas Wagner
f43a6009ff env: notify: use named templates instead of passing template strings
Signed-off-by: Lukas Wagner <l.wagner@proxmox.com>
Tested-by: Max Carrara <m.carrara@proxmox.com>
Reviewed-by: Max Carrara <m.carrara@proxmox.com>
2024-06-03 14:16:35 +02:00
Thomas Lamprecht
822def8250 bump version to 4.0.4
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2024-04-22 13:47:22 +02:00
Fabian Grünbichler
8bac62a877 d/postinst: make deb-systemd-invoke non-fatal
else this can break an upgrade for unrelated reasons.

this also mimics the debhelper behaviour more closely (we only avoid
debhelper here because it lacks reload support) - the snippet was also
restructured to be more similar, with an explicit `if`.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2024-04-17 16:56:02 +02:00
Thomas Lamprecht
2db44501bc bump version to 4.0.3
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-11-17 14:49:08 +01:00
Lukas Wagner
868d3cd4bb env: switch to matcher-based notification system
Signed-off-by: Lukas Wagner <l.wagner@proxmox.com>
2023-11-17 14:47:55 +01:00
Thomas Lamprecht
07284f1194 usage stats: tiny code style clean-up
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-11-17 14:47:12 +01:00
Thomas Lamprecht
56d4c7a50a watchdog-mux: code indentation and style cleanups
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-11-17 14:46:49 +01:00
Thomas Lamprecht
6548300e33 buildsys: use dpkg default makefile snippet
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-11-17 14:45:35 +01:00
Fiona Ebner
1c61138341 crs: avoid auto-vivification when adding node to service usage
Part of what caused bug #4984. Make the code future-proof and warn
when the node was never registered in the plugin, similar to what the
'static' usage plugin already does.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
 [ TL: rework commit message subject ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-10-06 12:27:04 +02:00
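
A minimal Perl sketch of the guard described above; the method shape and
field names are assumptions, not necessarily the real usage-plugin API:

    sub add_service_usage_to_node {
        my ($self, $nodename, $sid, $service_node) = @_;

        # Warn instead of silently autovivifying an entry for a node that
        # was never registered via add_node() beforehand.
        if (!exists($self->{nodes}->{$nodename})) {
            warn "did not get usage information for node '$nodename'\n";
            return;
        }

        $self->{nodes}->{$nodename}++;
    }
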
Fiona Ebner
c7843a315d fix #4984: manager: add service to migration-target usage only if online
Otherwise, when using the 'basic' plugin, this would lead to
auto-vivification of the $target node in the Perl hash tracking the
usage and it would wrongly be considered online when selecting the
recovery node.

The 'static' plugin was not affected, because it would check and warn
before adding usage to a node that was not registered with add_node()
first. Doing the same in the 'basic' plugin will be done by another
patch.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
 [ TL: shorten commit message subject ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-10-06 12:22:53 +02:00
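
An illustrative caller-side version of the same idea, with assumed helper
and variable names rather than the manager's real ones:

    # Only record usage on the migration target if that node is online;
    # a plain hash lookup would otherwise autovivify an entry and the
    # target could wrongly look like a valid recovery candidate later.
    if ($ns->node_is_online($target)) {
        $online_node_usage->add_service_usage_to_node($target, $sid, $sd->{node});
    }
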
Lukas Wagner
4cb3b2cf9b manager: send notifications via new notification module
... instead of using sendmail directly.

If the new 'notify.target-fencing' parameter from datacenter config
is set, we use it as a target for notifications. If it is not set,
we send the notification to the default target (mail-to-root).

There is also a new 'notify.fencing' parameter which controls whether
notifications should be sent at all. If it is not set, we
default to the old behavior, which is to send.

Also add dependency to the `libpve-notify-perl` package to d/control.

Signed-off-by: Lukas Wagner <l.wagner@proxmox.com>
2023-08-03 17:34:52 +02:00
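
A rough Perl sketch of the decision logic described above; the two
datacenter-config keys are taken from the message, the surrounding helper
and variable names are assumptions:

    my $notify_cfg = $dc_conf->{notify} // {};

    # 'notify.fencing' controls whether to notify at all; default is to send.
    my $send_notification = $notify_cfg->{fencing} // 1;

    if ($send_notification) {
        # 'notify.target-fencing' overrides the default target (mail-to-root).
        my $target = $notify_cfg->{'target-fencing'} // 'mail-to-root';
        send_fence_notification($target, $subject, $body); # assumed helper
    }
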
Thomas Lamprecht
dfe080bab1 bump version to 4.0.2
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-06-13 08:35:56 +02:00
Fiona Ebner
17c6cbeab9 manager: clear stale maintenance node caused by simultaneous cluster shutdown
Currently, the maintenance node for a service is only cleared when the
service is started on another node. In the edge case of a simultaneous
cluster shutdown however, it might be that the service never was
started anywhere else after the maintenance node was recorded, because
the other nodes were already in the process of being shut down too.

If a user ends up in this edge case, it would be rather surprising
that the service would be automatically migrated back to the
"maintenance node" which actually is not in maintenance mode anymore
after a migration away from it.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-06-13 08:33:52 +02:00
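
Sketched in Perl with assumed field and variable names: once the recorded
maintenance node reports a non-maintenance LRM mode again, the manager can
simply forget it so the service is not migrated back:

    if (my $maintenance_node = $sd->{maintenance_node}) {
        my $lrm_mode = $lrm_modes->{$maintenance_node} // '';
        # The node is active again and no longer in maintenance mode, so
        # the recorded maintenance node is stale and must not trigger a
        # migration back to it.
        delete $sd->{maintenance_node} if $lrm_mode ne 'maintenance';
    }
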
Fiona Ebner
a1b9918d30 tests: simulate stale maintenance node caused by simultaneous cluster shutdown
In the test log, it can be seen that the service will unexpectedly be
migrated back. This is caused by the service's maintenance node
property being set by the initial shutdown, but never cleared, because
that currently happens only when the service is started on a different
node. The next commit will address the issue.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-06-13 08:33:52 +02:00
Thomas Lamprecht
eee63557bc bump version to 4.0.1
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-06-09 10:41:59 +02:00
Thomas Lamprecht
bf5d92725e d/control: bump versioned dependency for pve-container & qemu-server
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-06-09 10:33:40 +02:00
Fiona Ebner
e0346eccaf resources: pve: avoid relying on internal configuration details
Instead, use the new get_derived_property() method to get the same
information in a way that is robust regarding changes in the
configuration structure.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-06-09 07:28:29 +02:00
Fiona Ebner
afa1aa9cb8 api: fix/add return description for status endpoint
The fact that no 'items' was specified made the api-viewer throw a
JavaScript exception: retinf.items is undefined

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-06-07 17:40:48 +02:00
Fiona Ebner
5a9c3a2808 lrm: do not migrate if service already running upon rebalance on start
As reported in the community forum[0], currently, a newly added
service that's already running is shut down, offline migrated and
started again if rebalance selects a new node for it. This is
unexpected.

An improvement would be online migrating the service, but rebalance
is only supposed to happen for a stopped->start transition[1], so the
service should not be migrated at all.

The cleanest solution would be for the CRM to use the state 'started'
instead of 'request_start' for newly added services that are already
running, i.e. restore the behavior from before commit c2f2b9c
("manager: set new request_start state for services freshly added to
HA") for such services. But currently, there is no mechanism for the
CRM to check if the service is already running, because it could be on
a different node. For now, avoiding the migration has to be handled in
the LRM instead. If the CRM ever has access to the necessary
information in the future, the solution mentioned above can be
reconsidered.

Note that the CRM log message relies on the fact that the LRM only
returns the IGNORED status in this case, but it's more user-friendly
than using a generic message like "migration ignored (check LRM
log)".

[0]: https://forum.proxmox.com/threads/125597/
[1]: https://pve.proxmox.com/pve-docs/chapter-ha-manager.html#_crs_scheduling_points

Suggested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
 [ T: split out adding the test to a previous commit so that one can
   see in git what the original bad behavior was and how it's now ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-06-06 19:08:00 +02:00
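
A condensed Perl sketch of the LRM-side check; the command, helper and
constant names are assumptions about the shape of the code, not the actual
identifiers:

    # Rebalance-on-start is only meant for the stopped->start transition;
    # if the freshly added service is already running, skip the migration
    # and report that back to the CRM with the dedicated IGNORED code.
    if ($rebalance_on_start && service_is_running($sid)) {
        $haenv->log('info', "service $sid already running, ignoring rebalance-on-start request");
        return $IGNORED;
    }
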
Thomas Lamprecht
c1aaa05b85 tests: simulate adding running services to HA with rebalance-on-start
Split out from Fiona's original series, to better show what actually
changes with her fix.

Currently, a newly added service that's already running is shut down,
offline migrated and started again if rebalance selects a new node
for it. This is unexpected and should be fixed; encode that behavior
as a test now, still showing the undesired behavior, and fix it in
the next commit.

Originally-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-06-06 19:05:22 +02:00
Fiona Ebner
c0dbab3c32 tools: add IGNORED return code
Will be used to ignore rebalance-on-start when an already running
service is newly added to HA.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-06-06 19:05:22 +02:00
Fiona Ebner
81e8e7d000 sim: hardware: commands: make it possible to add already running service
Will be used in a test for balance on start, where it should make a
difference if the service is running or not.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-06-06 19:05:22 +02:00
Fiona Ebner
b8d86ec48c sim: hardware: commands: fix documentation for add
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-06-06 19:05:22 +02:00
Thomas Lamprecht
973bf0324f bump version to 4.0.0
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-24 19:27:04 +02:00
Thomas Lamprecht
3de087a57b buildsys: derive upload dist automatically
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-24 19:26:27 +02:00
Thomas Lamprecht
c1b4249bde d/control: raise standards version compliance to 4.6.2
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-24 19:26:27 +02:00
Thomas Lamprecht
cfe9011673 buildsys: improve DSC target & add sbuild convenience target
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-24 19:26:27 +02:00
Thomas Lamprecht
1b91242ae9 buildsys: make build-dir generation atomic
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-24 19:26:27 +02:00
Thomas Lamprecht
576ae6e7d5 buildsys: rework doc-gen cleanup and makefile inclusion
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-24 19:26:27 +02:00
Thomas Lamprecht
df0c583fc3 buildsys: use full DEB_VERSION and correct DEB_HOST_ARCH
The DEB_HOST_ARCH is the one the package is actually built for, the
DEB_BUILD_ARCH is the one of the build host; having this correct
makes cross-building easier, but otherwise it makes no difference.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-24 19:26:27 +02:00
Thomas Lamprecht
69e37516e9 makefile: convert to use simple parenthesis
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-24 19:26:27 +02:00
Thomas Lamprecht
b0274c4acf bump version to 3.6.1
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-04-20 14:16:18 +02:00
Thomas Lamprecht
f129138cb0 lrm: keep manual maintenance mode independent of shutdown policy
We did not handle being in maintenance mode explicitly with shutdown
policies. In practice this is not often an issue, as most users of the
maintenance mode also switched the shutdown policy to 'migrate', which
keeps the maintenance mode. But for all those evaluating HA or only
using the manual maintenance mode, it meant that on shutdown the mode
was set to 'restart' or 'shutdown', which made the active manager think
that the node left the maintenance state again and marked it as online.
As the node wasn't really online (it was on its way to shutdown), this
not only cleared the maintenance mode by mistake, it also had a chance
to cause fencing if any service was still on the node – i.e., if
maintenance mode wasn't reached yet but was still in the process of
moving HA services (guests) away.

Fix that by checking if maintenance mode is requested, or already
active (we currently don't differ those two explicitly, but could be
determined from active service count if required), and avoid changing
the mode in the shutdown and restart case. Log that also explicitly
so admins can understand what happened and why.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-04-20 14:12:07 +02:00
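
Sketched below in Perl with assumed variable names: before applying the
shutdown policy, the LRM checks whether maintenance mode is already
requested or active and, if so, leaves the mode untouched:

    if ($shutdown) { # shutdown or reboot was requested
        if ($self->{mode} eq 'maintenance') {
            # A maintenance request is already pending or active; do not
            # overwrite it with 'restart'/'shutdown', otherwise the active
            # CRM would consider the node online again.
            $haenv->log('info', "shutdown during maintenance mode, keeping the mode");
        } elsif ($shutdown_policy eq 'migrate') {
            $self->{mode} = 'maintenance';
        } else {
            $self->{mode} = $reboot_requested ? 'restart' : 'shutdown';
        }
    }
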
Thomas Lamprecht
f12abfe072 test behavior of maintenance mode with another shutdown policy
Encode what happens if a node is in maintenance and gets shutdown
with a shutdown policy other than 'migrate' (= maintenance mode)
active.

Currently it causes the maintenance mode to be disabled and might even
make fencing possible (if not all services got moved away already).
This will be addressed in the next commit.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-04-20 14:10:18 +02:00
Thomas Lamprecht
26bbff0d55 manager: ensure node-request state transferred to new active CRM
We do not simply take over the full CRM status of the old master when
a new one becomes active; we only take over the most relevant parts
like the node state. But the relatively new node_request entry is also
important, as without that a maintenance state request may get lost
if a new CRM becomes the active master.

Simply copy it over on initial manager construction, if it exists.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-04-20 14:10:18 +02:00
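
A minimal sketch of that copy-over during manager construction, assuming
the previous manager status is available as $old_ms (names are
illustrative):

    # Take over pending node requests (e.g. a requested maintenance mode)
    # from the previously active CRM, so they do not get lost on handover.
    $ms->{node_request} = $old_ms->{node_request}
        if defined($old_ms->{node_request});
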
Thomas Lamprecht
6925144443 test behavior of shutdown with maintenance mode on active master
This encodes the current bad behavior of the maintenance mode getting
lost on an active CRM switch, due to the node-request state not being
transferred. Will be fixed in the next commit.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-04-20 14:10:18 +02:00
Thomas Lamprecht
ef2c0f29f6 lrm: add maintenance to comment about available modes
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-04-20 14:10:18 +02:00
Thomas Lamprecht
3361156205 ha config: code style/indentation cleanups
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-04-20 14:10:18 +02:00
Thomas Lamprecht
f6c61fe8a3 cli: assert that node exists when changing CRS request state
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-04-06 14:09:01 +02:00
Thomas Lamprecht
03f825dbc7 bump version to 3.6.0
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-03-20 13:45:36 +01:00
Thomas Lamprecht
4600bf8998 cli: expose new "crm-command node-maintenance enable/disable" commands
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-03-20 13:38:23 +01:00
Thomas Lamprecht
989c4c4929 add CRM command to switch an online node manually into maintenance without reboot
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-03-20 13:38:23 +01:00
Thomas Lamprecht
279d91c2ec lrm: always give up lock if node went successfully into maintenance
the change as of now is a no-op, as we only ever switched to
maintenance mode on shutdown-request, and there we exited immediately
if no active service and worker were around anyway.

So this is mostly preparing for a manual maintenance mode without any
pending shutdown.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-03-20 13:07:45 +01:00
Thomas Lamprecht
73faade519 lrm: factor out check for maintenance-request
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-03-20 13:04:11 +01:00
Thomas Lamprecht
0916918022 manager: some code style cleanups
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-03-20 11:09:01 +01:00
Thomas Lamprecht
314ef2579e request start: allow to auto-rebalance on a new start request
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-03-20 11:09:01 +01:00
Thomas Lamprecht
2fdf40f282 manager: select service node: allow to force best-score selection without try-next
useful for re-balancing on start, where we do not want to exclude
the current node like setting the $try_next param does, but also
don't want to favor it like not setting the $try_next param does.

We might want to transform both, `try_next` and `best_scored` into a
single `mode` parameter to reduce complexity and make it more
explicit what we want here.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-03-20 11:09:01 +01:00
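
A hedged sketch of the parameter semantics described above; the argument
list is an assumption and the real select_service_node() signature may
differ:

    # $try_next:    exclude the current node and try the next-best one.
    # $best_scored: just pick the best-scoring node, neither excluding nor
    #               favoring the current one (used for rebalance-on-start).
    my $node = select_service_node(
        $groups, $online_node_usage, $sid, $service_conf, $current_node,
        $try_next, $tried_nodes, $maintenance_fallback, $best_scored,
    );
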
Thomas Lamprecht
c2f2b9c62c manager: set new request_start state for services freshly added to HA
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-03-20 11:09:01 +01:00
Thomas Lamprecht
4931b58659 manager: add new intermediate state for stop->start transitions
We always check for re-starting a service if it's in the started
state, but for those that go from a (request_)stop to the stopped
state it can be useful to explicitly have a separate transition.

The newly introduced `request_start` state can also be used for CRS
to opt-into starting a service up on a load-wise better suited node
in the future.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-03-20 11:09:01 +01:00
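
An illustrative Perl fragment of such a transition; the state-change helper
and the surrounding variable names are assumptions:

    # stopped -> request_start -> started: the intermediate state gives the
    # CRS a hook to pick a (load-wise) better suited node before the
    # service actually gets started.
    if ($last_state eq 'stopped' && $cd->{state} eq 'started') {
        &$change_service_state($self, $sid, 'request_start', node => $sd->{node});
    }
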