mirror of git://git.proxmox.com/git/pve-ha-manager.git synced 2025-01-31 05:47:19 +03:00

Thomas Lamprecht
7fd7af67e5 manager: recompute online usage: iterate over keys sorted
mostly to be safe regarding reproducibility with the test system.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-03-20 11:09:01 +01:00
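
A minimal Perl sketch of the pattern the commit above applies (the hash and helper names are hypothetical): Perl randomizes hash key order per process, so iterating unsorted keys makes test-system runs non-reproducible.

    # iterate services in sorted order for a deterministic result
    foreach my $sid (sort keys %$services) {
        add_service_usage($online_node_usage, $sid, $services->{$sid});    # helper name assumed
    }
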
Thomas Lamprecht
b159176a9b manager: service start: make EWRONG_NODE a non-fatal error
and let it traverse the usual error counting mechanisms; the
select_service_node helper then either picks up the right node, and
the service starts there, or it can trigger fencing of that node.

Note, in practice this can normally only happen if the admin
butchered around in the node cluster state, but as we only select
safe nodes from the configured groups, we should be safe in any case.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-03-20 11:09:01 +01:00
Thomas Lamprecht
49b0ccc7fe sim hardware: avoid hard error on usage stats parsing
now that we can automatically derive them from the SID

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-03-20 11:09:01 +01:00
Thomas Lamprecht
d9a55b5d3c sim env: derive service usage from ID as fallback
so that we don't need to specify all usage stats explicitly for
bigger tests.

Note, we explicitly use two digits for memory, as with just one a lot
of services are exactly the same, which gives us flaky tests due to
rounding, or some flakiness in the Rust code - so this is a bit of a
stopgap for that too and should be reduced to a single digit once we
fix that in the future.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-03-20 11:09:01 +01:00
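
A hypothetical sketch of such a fallback, assuming a service ID of the form 'vm:101'; the exact derivation in the simulator may differ.

    sub derive_usage_from_sid {
        my ($sid) = @_;
        my ($vmid) = $sid =~ /:(\d+)$/;    # numeric part of e.g. 'vm:101'
        $vmid //= 0;
        return {
            maxcpu => ($vmid % 3) + 1,                # 1..3 cores
            maxmem => (($vmid % 90) + 10) * 1024**3,  # 10..99 GiB: two digits, see note above
        };
    }
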
Thomas Lamprecht
de225e04c4 update readme to be a bit less confusing/outdated
E.g., pve-ha-manager is our current HA manager, so talking about the
"current HA stack" being EOL without mentioning that the `rgmanager`
one was actually meant got taken up the wrong way by some potential
users. Correct that and a few other things, but as some things are
definitely still out-of-date, or will be in a few months, mention at
the top that this is an older readme and refer to the HA reference
docs.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-01-03 13:19:18 +01:00
Thomas Lamprecht
071e69ce7f bump version to 3.5.1
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-11-19 15:51:16 +01:00
Thomas Lamprecht
475f19fe7d api: status: add CRS info to manager if not set to default
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-11-19 15:50:14 +01:00
Thomas Lamprecht
f2c729829f manager: slightly clarify log message for fallback on init-failure
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-11-19 15:50:14 +01:00
Thomas Lamprecht
d062598531 api: status: code and indentation cleanup
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-11-19 15:50:14 +01:00
Thomas Lamprecht
1b81383180 manager: make crs a full blown hash
To support potential additional CRS settings more easily.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-11-19 15:50:14 +01:00
Thomas Lamprecht
086f7075d0 manager: update crs scheduling mode once per round
Pretty safe to do as we recompute everything per round anyway (and
much more often on top of that, but that's another topic).

Actually I'd argue that it's safer this way, as the user doesn't need
to actively restart the manager, which grinds much more gears and
causes more watchdog churn than checking periodically and updating it
internally. Plus, a lot of admins won't expect that they need to
restart the currently active master and will thus complain that their
recently made change to the CRS config had no effect/the CRS doesn't
work at all.

We should codify such a change in a test for this though.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-11-19 14:05:26 +01:00
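
As a rough sketch of the approach: re-read the CRS setting at the start of every manager round instead of only once at manager start-up. The 'crs'/'ha' key layout is assumed here, set_crs_mode() is a hypothetical helper; get_datacenter_settings() is the environment method introduced further down this log.

    # refresh the CRS configuration once per manager round
    my $dc_cfg = $haenv->get_datacenter_settings();
    $self->set_crs_mode($dc_cfg->{crs}->{ha} // 'basic');    # basic is always the fallback
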
Thomas Lamprecht
cb06cd421a manager: factor out setting crs scheduling mode
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-11-19 13:36:28 +01:00
Thomas Lamprecht
83a84eb0e3 manager: various code style cleanups
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-11-19 13:06:03 +01:00
Thomas Lamprecht
091f890416 bump version to 3.5.0
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-11-18 15:03:00 +01:00
Thomas Lamprecht
c2d8b56a97 manager: better convey that basic is always the fallback
to hint to a potential "code optimizer" that it may not be easily
moved up to the scheduling selection.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-11-18 15:01:12 +01:00
Thomas Lamprecht
42d9b683f2 d/control: add (build-)dependency for libpve-rs-perl
to ensure we have the perlmod for the basic scheduler available.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-11-18 13:44:45 +01:00
Fiona Ebner
4788830551 resources: add missing PVE::Cluster use statements
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-11-18 13:25:21 +01:00
Fiona Ebner
f348399fe4 test: add tests for static resource scheduling
See the READMEs for more information about the tests.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-11-18 13:25:21 +01:00
Fiona Ebner
223a2ca493 usage: static: use service count on nodes as a fallback
if something goes wrong with the TOPSIS scoring. Not expected to
happen, but it's rather cheap to be on the safe side.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-11-18 13:25:21 +01:00
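
A hypothetical sketch of the fallback; score_nodes_topsis() stands in for the real scoring call and $service_count is assumed to map node names to their service counts.

    my $scores = eval { score_nodes_topsis($nodes, $service_stats) };
    if (!$scores) {
        warn "static scoring failed, falling back to service count - $@";
        # negate the count so that fewer services yields a higher score
        $scores = { map { $_ => -($service_count->{$_} // 0) } @$nodes };
    }
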
Fiona Ebner
c724ce1be7 manager: avoid scoring nodes when not trying next and current node is valid
With the Usage::Static plugin, scoring is not as cheap anymore and
select_service_node() is called for each running service.

This should cover most calls of select_service_node().

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-11-18 13:25:21 +01:00
Fiona Ebner
631ba60ef2 manager: avoid scoring nodes if maintenance fallback node is valid
With the Usage::Static plugin, scoring is not as cheap anymore and
select_service_node() is called for each running service.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-11-18 13:25:21 +01:00
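
A hypothetical sketch of both short-circuits in select_service_node() described by the two commits above; $allowed_nodes as a hash set of valid target nodes, all variable names assumed.

    # keep the service where it is, no scoring needed
    if (!$try_next && $allowed_nodes->{$current_node}) {
        return $current_node;
    }
    # likewise, prefer a still-valid maintenance fallback node
    if ($maintenance_fallback && $allowed_nodes->{$maintenance_fallback}) {
        return $maintenance_fallback;
    }
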
Fiona Ebner
561e7f4bfb manager: use static resource scheduler when configured
Note that recompute_online_node_usage() becomes much slower when the
'static' resource scheduler mode is used. Tested it with ~300 HA
services (minimal containers) running on my virtual test cluster.

Timings with 'basic' mode were between 0.0004 - 0.001 seconds
Timings with 'static' mode were between 0.007 - 0.012 seconds

Combined with the fact that recompute_online_node_usage() is currently
called very often, this can lead to a lot of delay during recovery
situations with hundreds of services; with low thousands of services
overall, generous estimates suggest it could even run into the
watchdog timer.

Ideas to remedy this are using PVE::Cluster's
get_guest_config_properties() instead of load_config() and/or
optimizing how often recompute_online_node_usage() is called.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-11-18 13:25:21 +01:00
Fiona Ebner
f74f8ffb24 manager: set resource scheduler mode upon init
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-11-18 13:25:21 +01:00
Fiona Ebner
7c142d6822 env: datacenter config: include crs (cluster-resource-scheduling) setting
Suggested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-11-18 13:25:21 +01:00
Fiona Ebner
749d8161be env: rename get_ha_settings to get_datacenter_settings
The method will be extended to include other HA-relevant settings from
datacenter.cfg.

Suggested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-11-18 13:25:21 +01:00
Fiona Ebner
48f2144b27 usage: add Usage::Static plugin
for calculating node usage of services based upon static CPU and
memory configuration as well as scoring the nodes with that
information to decide where to start a new or recovered service.

For getting the service stats, it's necessary to also consider the
migration target (if present), because the configuration file might
have already moved.

It's necessary to update the cluster filesystem upon stealing the
service to be able to always read the moved config right away when
adding the usage.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-11-18 13:25:21 +01:00
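
For illustration, a simplified, hypothetical TOPSIS scoring sketch in Perl over two cost criteria; the actual plugin delegates the scoring to the Rust implementation shipped via libpve-rs-perl (see the d/control commit above).

    use List::Util qw(sum min max);

    # score nodes by TOPSIS over two cost criteria: projected CPU and
    # memory usage after adding the service to the node
    sub score_nodes_topsis {
        my ($usage) = @_;    # { node => { cpu => ..., mem => ... } }
        my @nodes = keys %$usage;

        for my $c (qw(cpu mem)) {    # vector-normalize each criterion
            my $norm = sqrt(sum(map { $usage->{$_}{$c} ** 2 } @nodes)) || 1;
            $usage->{$_}{$c} /= $norm for @nodes;
        }

        my (%ideal, %anti);    # both criteria are costs: the ideal is the minimum
        for my $c (qw(cpu mem)) {
            my @vals = map { $usage->{$_}{$c} } @nodes;
            ($ideal{$c}, $anti{$c}) = (min(@vals), max(@vals));
        }

        my %score;
        for my $node (@nodes) {
            my $d_best  = sqrt(sum(map { ($usage->{$node}{$_} - $ideal{$_}) ** 2 } qw(cpu mem)));
            my $d_worst = sqrt(sum(map { ($usage->{$node}{$_} - $anti{$_}) ** 2 } qw(cpu mem)));
            $score{$node} = $d_worst / (($d_best + $d_worst) || 1);    # closeness, higher is better
        }
        return \%score;
    }
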
Fiona Ebner
5d724d4dd9 manager: online node usage: switch to Usage::Basic plugin
no functional change is intended.

One test needs adaptation too, because it created its own version of
$online_node_usage.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-11-18 13:25:21 +01:00
Fiona Ebner
b259857688 manager: select service node: add $sid to parameters
In preparation for scheduling based on static information, where the
scoring of nodes depends on information from the service's
VM/CT configuration file (and the $sid is required to query that).

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-11-18 13:25:21 +01:00
Fiona Ebner
c8c6e462fc add Usage base plugin and Usage::Basic plugin
in preparation to also support static resource scheduling via another
such Usage plugin.

The interface is designed in anticipation of the Usage::Static plugin;
the Usage::Basic plugin doesn't require all parameters.

In Usage::Static, the $haenv will be necessary for logging and getting
the static node stats. add_service_usage_to_node() and
score_nodes_to_start_service() take the sid and service node, and the
former also takes the optional migration target (during a migration
it's not clear whether the config file has already been moved or not),
to be able to get the static service stats.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-11-18 13:25:21 +01:00
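
An outline of the interface as described above; only the method names and parameters come from the commit message, the package layout is assumed for illustration.

    package PVE::HA::Usage;    # package name assumed

    use strict;
    use warnings;

    sub new {
        my ($class, $haenv) = @_;    # $haenv: needed by Usage::Static for logging and node stats
        return bless { haenv => $haenv }, $class;
    }

    # $migration_target is optional - during a migration it's not clear
    # whether the config file has already been moved or not
    sub add_service_usage_to_node {
        my ($self, $nodename, $sid, $service_node, $migration_target) = @_;
        die "implement in subclass";
    }

    sub score_nodes_to_start_service {
        my ($self, $sid, $service_node) = @_;
        die "implement in subclass";
    }

    1;
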
Fiona Ebner
eea0c60923 resources: add get_static_stats() method
to be used for static resource scheduling.

In container's vmstatus(), the 'cores' option takes precedence over
the 'cpulimit' one, but it felt more accurate to prefer 'cpulimit'
here.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-11-18 13:25:21 +01:00
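
A hypothetical sketch of how the container plugin's method could look, assuming the usual LXC config fields ('cpulimit', 'cores', 'memory' in MiB); the real method signature may differ.

    use PVE::LXC::Config;

    sub get_static_stats {
        my ($class, $id) = @_;
        my $conf = PVE::LXC::Config->load_config($id);
        return {
            # prefer 'cpulimit' over 'cores' here, unlike vmstatus()
            maxcpu => $conf->{cpulimit} || $conf->{cores} || 1,
            maxmem => ($conf->{memory} // 512) * 1024 * 1024,    # config value is in MiB
        };
    }
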
Fiona Ebner
5db695c3f3 env: add get_static_node_stats() method
to be used for static resource scheduling. In the simulation
environment, the information can be added in hardware_status.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-11-18 13:25:21 +01:00
Thomas Lamprecht
0869c306ba fixup variable name typo
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-07-22 12:39:27 +02:00
Thomas Lamprecht
a3ffb0b3d4 manager: add top level comment section to explain common variables
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-07-22 12:15:55 +02:00
Thomas Lamprecht
bc64c08e37 d/lintian-overrides: update for newer lintian
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-07-22 10:06:47 +02:00
Thomas Lamprecht
2a1638b77b bump version to 3.4.0
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-07-22 09:22:52 +02:00
Thomas Lamprecht
6f818da13f manager: online node usage: factor out possible target and future proof
only count up the target selection if that node is already in the
online node usage list, to avoid that an offline node is considered
online if it's the target of any command

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-07-22 09:12:38 +02:00
Thomas Lamprecht
8c80973d40 test: update pre-existing policy tests for fixed balancing spread
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-07-22 08:49:41 +02:00
Thomas Lamprecht
1280368d31 fix variable name typo
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-07-22 07:25:02 +02:00
Thomas Lamprecht
066fd01670 fix spreading out services if source node isn't operational but otherwise ok
as is the case when going into maintenance mode

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-07-21 18:14:33 +02:00
Thomas Lamprecht
6756e14aed tests: add shutdown policy scenario with multiple guests to spread out
currently wrong, as online_node_usage doesn't consider counting the
target node if the source node isn't considered online (=
operational) anymore

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-07-21 18:09:42 +02:00
Thomas Lamprecht
c00c44818a bump version to 3.3-4
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-04-27 14:02:22 +02:00
Fabian Grünbichler
ad6456997e lrm: fix getting stuck on restart
run_workers is responsible for updating the state after workers have
exited. if the current LRM state is 'active', but a shutdown_request was
issued in 'restart' mode (like on package upgrades), this call is the
only one made in the LRM work() loop.

skipping it if there are active services means the following sequence of
events effectively keeps the LRM from restarting or making any progress:

- start HA migration on node A
- reload LRM on node A while migration is still running

even once the migration is finished, the service count is still >= 1
since the LRM never calls run_workers (directly or via
manage_resources), so the service having been migrated is never noticed.

maintenance mode (i.e., rebooting the node with shutdown policy migrate)
does call manage_resources and thus run_workers, and will proceed once
the last worker has exited.

reported by a user:

https://forum.proxmox.com/threads/lrm-hangs-when-updating-while-migration-is-running.108628

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2022-04-27 13:57:37 +02:00
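
A hypothetical sketch of the resulting control flow in the LRM work() loop; run_workers() is named in the commit message, the other helper names are assumed.

    if ($self->{shutdown_request}) {
        # always collect exited workers first, so a finished migration is
        # noticed and the active service count can actually drop to zero
        $self->run_workers();
        my $service_count = $self->active_service_count();    # helper name assumed
        if ($service_count == 0 && $self->{mode} eq 'restart') {
            $self->request_restart();    # helper name assumed
        }
    }
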
Thomas Lamprecht
fe3781e8ab buildsys: track and upload debug package
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-01-20 18:08:27 +01:00
Thomas Lamprecht
c15a8b803e bump version to 3.3-3
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-01-20 18:05:37 +01:00
Thomas Lamprecht
eef4f86338 lrm: increase run_worker loop-time partition
every LRM round is scheduled to run for 10s, but we spend only half
of that actively trying to run workers (within the max_worker limit).

Raise that to an 80% duty cycle.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-01-20 16:17:28 +01:00
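
The arithmetic, as a hypothetical sketch (helper names assumed): out of a 10 s LRM round, 8 s instead of 5 s are now spent trying to run workers.

    my $loop_time   = 10;                  # seconds per LRM round
    my $worker_time = $loop_time * 0.8;    # 80% duty cycle, was $loop_time * 0.5
    my $deadline    = time() + $worker_time;
    while (time() < $deadline && $self->has_queued_workers()) {
        $self->try_run_next_worker();      # still respects the max_worker limit
    }
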
Thomas Lamprecht
65c1fbac99 lrm: avoid job starvation on huge workloads
If a setup has a lot of VMs we may run into the time limit of the
run_worker loop before processing all workers, which can easily
happen if an admin did not increase the default max_workers for their
setup; but even with a bigger max_worker setting one can run into it.

That, combined with the fact that we sorted just by the $sid
alpha-numerically, means that CTs were preferred over VMs (C comes
before V) and additionally lower VMIDs were preferred too.

That means a set of SIDs had a lower chance of ever actually getting
run, which is naturally not ideal at all.
Improve on that behavior by adding a counter to the queued worker and
preferring those that have a higher one, i.e., that spent more time
waiting on getting actively run.

Note, due to the way the stop state is enforced, i.e., always
enqueued as a new worker, its start-try counter will be reset every
round and thus have a lower priority compared to other request
states. We probably want to differentiate between a stop request when
the service is/was in another state just before, and a stop that is
just re-requested even though the service was already stopped for a
while.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-01-20 16:14:03 +01:00
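
A hypothetical sketch of the prioritization (field and helper names assumed): workers that waited more rounds are run first, instead of plain alphanumeric SID order.

    my @queue = sort {
        $b->{start_tries} <=> $a->{start_tries}    # most-starved first
            || $a->{sid} cmp $b->{sid}             # stable tie-break
    } values %$queued_workers;

    for my $worker (@queue) {
        if ($self->may_start_worker()) {           # max_worker limit / time budget
            $self->start_worker($worker);
        } else {
            $worker->{start_tries}++;    # waited another round, bump priority
        }
    }
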
Thomas Lamprecht
b538340c9d lrm: code/style cleanups
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-01-20 14:40:27 +01:00
Thomas Lamprecht
f613e426ce lrm: run worker: avoid an indentation level
best viewed with the `-w` flag to ignore the whitespace changes themselves

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-01-20 13:42:15 +01:00
Thomas Lamprecht
a25a516ac6 lrm: log actual error if fork fails
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-01-20 13:39:35 +01:00
Thomas Lamprecht
2deff1ae35 manager: refactor fence processing and rework fence-but-no-service log
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-01-20 13:31:04 +01:00