pve-ha-manager

mirror of git://git.proxmox.com/git/pve-ha-manager.git synced 2025-01-20 18:03:53 +03:00

Author	SHA1	Message	Date
Thomas Lamprecht	800a2de6a3	FenceConfig: early return if file is empty Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2019-01-13 12:45:45 +01:00
Thomas Lamprecht	1c2561110f	d/lintian-overrids: add repeated-trigger-name override in this package we provide api functions, thus we want to activate the pve-api-update trigger, so that packages like pve-manager get notified about it. But we also use api functions directly so we setup an interest in the pve-api-update trigger. This results in an lintian error (lintian version from buster or newer) which we can override: > [...] > This tag is also triggered if the package has an activate trigger > for something on which it also declares an interest. The only (but > rather unlikely) reason to do this is if another package also > declares an interest and this package needs to activate that other > package. If the package is using it for this exact purpose, then > please use a Lintian override to state this. -- https://lintian.debian.org/tags/repeated-trigger-name.html Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2019-01-08 17:35:59 +01:00
Thomas Lamprecht	3220c3391c	sim: show sent emails in regression tests its good to check if any regression regarding sendmail happened, as it can be annoying if a sendmail loop happens.	2019-01-08 17:32:05 +01:00
Thomas Lamprecht	7488b3cc2c	fence config: allow to pass arguments to fence agents via short-opts Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2019-01-08 15:28:06 +01:00
Thomas Lamprecht	a57a3b7809	d/control: add missing pve-container dependency Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2019-01-08 15:23:56 +01:00
Thomas Lamprecht	7583bf275c	fencing: fixup run_fence_jobs Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2019-01-08 15:23:04 +01:00
Thomas Lamprecht	7655c92c81	fixup changelog line length and typos Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2019-01-07 13:35:34 +01:00
Thomas Lamprecht	e3e02f4688	bump version 2.0-6 Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2019-01-07 13:00:00 +01:00
Wolfgang Bumiller	0354cbe945	fixup parse_sid call This call was missed in the commit moving it from PVE::HA::Tools to PVE::HA:Config. Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com> Fixes: 0087839aa530 ("Tools: remove dependency on PVE::Cluster")	2019-01-07 12:10:40 +01:00
Thomas Lamprecht	d2236278ac	followup code cleanup addresses a few nits from Fabians review at: https://pve.proxmox.com/pipermail/pve-devel/2018-December/035061.html https://pve.proxmox.com/pipermail/pve-devel/2018-December/035085.html Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2019-01-07 12:07:05 +01:00
Thomas Lamprecht	7a20d688d8	lrm: explicitly log shutdown_policy on node shutdown Makes regression test a bit more telling and it helps to be verbose for an user here too. Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2019-01-07 11:17:30 +01:00
Thomas Lamprecht	ba15a9b908	fix #1378 : allow to specify a service shutdown policy Allow an admin to set a datacenter wide HA policy which can change the way we handle services on a node shutdown. There's: * freeze: always freeze services, independent of the shutdown type (reboot, poweroff) * failover: never freeze services, this means that a service will get recovered to another node if possible and if the current node does not come back up in the grace period of 1 minute. * default: this is the current behavior, freeze on reboot but do not freeze on poweroff Add to tests, shutdown-policy1 which is based on the reboot1 test, but enforces no freeze with a failover policy, and shutdown-policy2 which is based on the shutdown1 test but with an explicit freeze policy. You can compare (diff) each test's log result to the test it's based on to see what changes. Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2019-01-07 11:17:30 +01:00
Thomas Lamprecht	ed408b4491	Env: add get_ha_settings method Add get_ha_settings, a method which returns the datacenter wide HA settings Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2019-01-07 11:17:30 +01:00
Rhonda D'Vine	b9350791a3	Add missing Build-Depends Signed-off-by: Rhonda D'Vine <rhonda@proxmox.com>	2018-12-17 09:41:11 +01:00
Thomas Lamprecht	c974828745	install simulator executable into bin not sbin Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2018-10-17 11:51:04 +02:00
Thomas Lamprecht	1e07d70c29	Tools: add note about indirect include of Config module Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2018-10-17 11:41:44 +02:00
Fabian Grünbichler	728d9a2a97	build: actually ship SOURCE file Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>	2018-10-17 11:20:41 +02:00
Fabian Grünbichler	6ea95574cc	build: bump compat level to 10 Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>	2018-10-17 11:20:41 +02:00
Fabian Grünbichler	1116ca25b8	build: restructure packaging use dpkg-buildpackage and debhelper properly, add missing dependencies and embed used perl modules from libpve-common-perl to make pve-ha-simulator standalone. Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>	2018-10-17 11:20:41 +02:00
Fabian Grünbichler	0087839aa5	Tools: remove dependency on PVE::Cluster by moving parse_sid to PVE::HA::Env, with the default implementation in PVE::HA::Config. the bash completion methods use PVE::HA::Config (and PVE::Cluster), but the corresponding use statements are only in PVE::CLI::ha_manager, where the bash completion is actually used. Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>	2018-10-17 11:20:41 +02:00
Fabian Grünbichler	6529b6a4e2	Tools/Config: refactor lrm status json reading to avoid unnecessary dependency on PVE::Cluster in PVE::HA::Tools. reading the LRM status file was the only instance of reading from the CFS via this method. Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>	2018-10-17 11:20:41 +02:00
Fabian Grünbichler	5f52cd3c42	sim: don't install PVE::HA::Config it is not needed anymore by the simulator. Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>	2018-09-28 15:26:26 +02:00
Fabian Grünbichler	dd970f9ea6	sim: don't install real resources they are not needed, the simulator contains its own (simulated) resources. Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>	2018-09-28 15:26:21 +02:00
Fabian Grünbichler	7d33cb12de	groups: register groups directly and use PVE::HA::Groups to parse the config when testing/simulating. this allows us to drop the dependency on PVE::HA::Config, which would otherwise pull in a lot of additional dependencies that we don't want in the simulator. Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>	2018-09-28 14:06:59 +02:00
Fabian Grünbichler	f503a7bf77	pve-ha-tester: use correct lib path since we want to test the version from the current working tree, and not the installed one. Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>	2018-09-28 14:06:59 +02:00
Fabian Grünbichler	e649331eab	remove unused use statements Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>	2018-09-28 14:06:59 +02:00
Fabian Grünbichler	745fd425c4	build: remove leftover PHONY declaration simdeb is already declared PHONY on its own Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>	2018-09-28 14:06:59 +02:00
Dominik Csapak	2799edd464	document api result for ha resources Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>	2018-09-17 12:43:54 +02:00
Thomas Lamprecht	c253924fd3	bump version to 2.0-5 Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2018-02-07 11:20:28 +01:00
Thomas Lamprecht	9cdf16b1c8	buildsys: use correct git revision for SOURCE file	2018-02-07 10:38:51 +01:00
Thomas Lamprecht	724bd3f311	do not do active work if cfs update failed We ignored if the cluster state update failed and happily worked with an empty state, resulting in strange actions, e.g., the removal of all (not so) "stale" services or changing the all but the masters node state to unknown. Check on the update result and if failed, either do not get active, or, if already active, skip the current round with the knowledge that we only got here because the update failed but our lock renew worked => cfs got already in a working and quorate state again - (probably just a restart) Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com> Reviewed-by: Dominik Csapak <d.csapak@proxmox.com> Tested-by: Dominik Csapak <d.csapak@proxmox.com>	2018-01-30 09:33:16 +01:00
Thomas Lamprecht	3df1538094	move cfs update to common code We updated the CRM and LRM view of the cluster state only in the PVE2 environment, outside of all regression testing and simulation scope. Further, we ignored if this update failed and happily worked with an empty state, resulting in strange actions, e.g., the removal of all (not so) "stale" services or changing the all but the masters node state to unknown. This patch tries to improve this by moving out the update in a own environment method, cluster_update_state, calling this in the LRM and CRM and saving its result. As with our introduced functionallity to simulate cfs rw or update errors we can also simulate failures of this state update with the RT system. Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com> Reviewed-by: Dominik Csapak <d.csapak@proxmox.com> Tested-by: Dominik Csapak <d.csapak@proxmox.com>	2018-01-30 09:33:16 +01:00
Thomas Lamprecht	da6f041699	move start/end hooks to common code We called them at similar times anyways, and have them under the regression test cover with this change. Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com> Reviewed-by: Dominik Csapak <d.csapak@proxmox.com> Tested-by: Dominik Csapak <d.csapak@proxmox.com>	2018-01-30 09:33:16 +01:00
Thomas Lamprecht	ada4b9a830	Revert "wrap possible problematic cfs_read_file calls in eval" This reverts commit bf7febe3771d6f9a2aef97bcd6eab4ece098c5aa. Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com> Reviewed-by: Dominik Csapak <d.csapak@proxmox.com> Tested-by: Dominik Csapak <d.csapak@proxmox.com>	2018-01-30 09:33:16 +01:00
Thomas Lamprecht	30b4f397a0	CRM: refactor check if state transition to active is ok Mainly addresses a problem where we read the manager status without catching any possible exceptions. As this was done only to check if our node has active fencing jobs, which tells us that it makes no sense to even try to acquire the manager lock - as we're fenced soon anyway. Besides this check we always checked if we're quorate and if there are services configured, so move both checks in the new 'can_get_active' method, which replaces the check_pending_fencing and the has_services method. Move the quorum check in front and catch a possible error from the following manager status read. As a side effect the state transition code gets a bit shorter without hiding the check intention. Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com> Reviewed-by: Dominik Csapak <d.csapak@proxmox.com> Tested-by: Dominik Csapak <d.csapak@proxmox.com>	2018-01-30 09:33:16 +01:00
Thomas Lamprecht	8e940b68f9	lrm: handle an error during service_status update we may get an error here if the cluster filesystem is (temporarily) unavailable here, this error resulted in stopping the whole CRM service immediately, which then triggered a node reset (if happened on the current master), even if we had still time left to retry and thus, for example, handle a update of pve-cluster gracefully. Add a method which wraps the status read in an eval and logs an eventual error, but does not abort the service. Instead we rely on our get_protected_ha_agent_lock method to detect a problem and switch to the lost_agent_lock state. If the pmxcfs outage was really short, so that the manager status read failed but the lock update worked again we update also always before doing real work when in the 'active' state. If this update fails we return from the eval and try next round again, as no point in doing anything without consistent state. Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com> Reviewed-by: Dominik Csapak <d.csapak@proxmox.com> Tested-by: Dominik Csapak <d.csapak@proxmox.com>	2018-01-30 09:33:16 +01:00
Thomas Lamprecht	ba2a45cd9d	test/sim: allow to simulate cfs failures Add simulated hardware commands for the cluster file system. This allows to tell the regression test or simulator system that a certain nodes calls to methods accessing the CFS should fail, i.e., die. With this we can cover a situation which mainly happen during a cluster file system update. For now allow to define if the CFS is read-/writeable (state rw) and if updates of the CFS (state update) should work or fail. Add 'can read/write' assertions all over the relevant methods. Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com> Reviewed-by: Dominik Csapak <d.csapak@proxmox.com> Tested-by: Dominik Csapak <d.csapak@proxmox.com>	2018-01-30 09:31:03 +01:00
Thomas Lamprecht	3166752f13	postinst: use auto generated postinst This was introduced for cleaning up an possible left over systemd watchdog mux enable link, which is gone for good now. Then it was extended with trigger targets, as the HA Manager services now restart when the pve-api-update trigger fires. As the autogenerated postinst does the same unconditionally for the pve-ha-lrm.service and pve-ha-crm.service already we may remove it too. The only difference is that try-restart is used by the auto generated script, not reload-or-try-restart, but this does not matter, as the HA services have currently no reload ability. Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2018-01-26 09:37:22 +01:00
Thomas Lamprecht	c122969ff2	postinst: we do not use templates, remove debconf This was copied by accident when adding the transitional code for removing the left over of the systemd managed watchdog mux in commit f8a3fc80af299e613c21c9b67e29aee8cc807018 Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2018-01-26 09:37:22 +01:00
Thomas Lamprecht	e2c96fdae4	postinst: drop transitional systemd watchdog mux socket cleanup This transitional code was added first with commit f8a3fc80af299e613c21c9b67e29aee8cc807018 and fixed up with commit ecc145c9724f056549e5458f17d7714ac8c83459 during Proxmox VE 4.1 and 4.2 to remove the problematic systemd managed watchdog mux socket. As each system going for an distribution upgrade must first upgrade to 4.4, where this gets handled, we can remove it now. Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2018-01-26 09:37:22 +01:00
Thomas Lamprecht	0da2e042e1	watchdog mux: trailing whitespace cleanup Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2018-01-16 09:10:01 +01:00
Thomas Lamprecht	1dd1d6cd3a	watchdog mux: fix comment, there's no systemd .socket anymore Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2018-01-16 09:09:47 +01:00
Thomas Lamprecht	5f09eb480d	fix typo in simulator package description Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2018-01-15 13:15:48 +01:00
Fabian Grünbichler	a6b9892808	debian/rules: add some explaining comments Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com> Reviewed-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2017-12-28 16:37:25 +01:00
Fabian Grünbichler	1abfa1f8ec	debian/rules: don't dh_systemd_start watchdog-mux as it's a static unit dh_systemd_starting it is not possible - but it gets pulled in and started by pve-ha-crm/pve-ha-lrm anyway. Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com> Reviewed-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2017-12-28 16:37:25 +01:00
Fabian Grünbichler	449a03b794	debian/rules: add file names to dh_systemd_enable otherwise it gets confused and enables pve-ha-crm twice in the postinst. Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com> Reviewed-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2017-12-28 16:37:25 +01:00
Thomas Lamprecht	cf1ad777ff	buildsys: also cleanup *.buildinfo files Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2017-11-16 11:32:58 +01:00
Wolfgang Bumiller	5d82b887eb	bump version to 2.0-4 Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>	2017-11-09 11:47:36 +01:00
Thomas Lamprecht	bf7febe377	wrap possible problematic cfs_read_file calls in eval Wrap those calls to the cfs_read_file method, which may now also die if there was a grave problem reading the file, into eval in all methods which are used by the ha services. The ones only used by API calls or CLI helpers are not wrapped, as there it can be handled more gracefully (i.e., no watchdog is running) and further, this is more intended to temporarily workaround until we handle such an exception explicitly in the services - which is a bit bigger change, so let's just go back to the old behavior for now. Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2017-11-09 11:42:12 +01:00
Thomas Lamprecht	f466005d20	swap native syslog command with HA environment one Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>	2017-11-08 06:01:46 +01:00

730 Commits