Check if systemd is active by testing whether the /run/systemd/system
directory exists, just like debhelper-generated code does, before
running systemctl.
This allows setting up a package's build dependencies in containers
not managed by systemd.
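
A minimal sketch of that guard, written in Perl purely for illustration
(the actual change lives in the packaging scripts; the helper name here
is made up):

    use strict;
    use warnings;

    # Only talk to systemd when it actually manages the running system,
    # mirroring the check that debhelper-generated maintainer scripts perform.
    sub systemctl_if_available {
        my (@cmd) = @_;
        return if !-d '/run/systemd/system'; # e.g. chroots or plain containers
        system('systemctl', @cmd) == 0
            or warn "systemctl @cmd failed: $?\n";
    }
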
Signed-off-by: Maximiliano Sandoval <m.sandoval@proxmox.com>
[TL: extend commit message and note that this fixes setting up the
build env, not the build itself]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
There is no user of File::Path remaining after commit 787b66e
("SimCluster: setup status dir inside new"), which was the only user
of remove_tree(). make_path() was not used at all according to git
history.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
This is mostly cosmetic, because as long as there are configured
services a CRM would get active anyway. But it can happen that a
maintenance mode is left over while all services got removed; a fresh
cluster start will then keep all CRMs idle and thus never clear the
maintenance state.
This can be especially confusing now, as a recent pve-manager commit
993d05abc ("api/ui: include the node ha status in resources call and
show as icon") started to show the maintenance mode as an icon in the
web UI, making this blip much more prominent.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Add a helper that returns whether the CRM command queue holds any
commands without altering the state of the queue at all, unlike the
existing read_crm_commands method does.
This will be used to check if a CRM needs to become active when there
are pending CRM commands but no master seems to process them as all
are idle/offline.
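
A rough sketch of such a read-only check (the queue file path and the
helper name are assumptions, not the actual implementation):

    use strict;
    use warnings;

    use PVE::Tools;

    # Peek at the CRM command queue without consuming it, unlike
    # read_crm_commands(), which truncates the queue file after reading it.
    sub any_crm_command_pending {
        my $raw = eval { PVE::Tools::file_get_contents('/etc/pve/ha/crm_commands') };
        return defined($raw) && $raw =~ /\S/ ? 1 : 0;
    }
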
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
To test the behavior for when a CRM should get active or stay active
(for a bit longer).
These cases show the status quo, which will be improved on in the next
commits.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
This is mostly a convenience for when one does a quick HA evaluation
and then removes all services again; the biggest visible effect is
that there will be no status updates once the CRM is idle, reducing
some moderately frequent updates to pmxcfs.
As the CRM never lets the watchdog run out pro-actively to trigger
fencing, this won't make much of a difference w.r.t. "accidental"
self-fencing in the common outage situations that happen in practice,
i.e. quorum loss due to a network outage or corosync getting
misconfigured; that's also why this was not considered when adding the
auto-idling for the LRM back in commit 2105170 ("LRM: release lock and
close watchdog if no service configured for >10min").
In short, the watchdog for the CRM is mostly here to avoid a situation
where the process of the currently active CRM hangs, or does not get
scheduled for a while, such that another CRM becomes active, only for
the previous one to then resume, still think it is the active one
and, e.g., write out an outdated manager_status file; there are some
other situations, but the reason is always similar.
Compared to the LRM idle mechanism, we require more rounds for the CRM
to go idle (90 for the CRM vs. 60 for the LRM); the reason is that the
LRM needs an active CRM for some operations to progress in the FSM, so
having the CRM simply wait a bit longer is enough to ensure that this
can happen.
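
A simplified sketch of the idle-counter idea (names and structure are
illustrative; only the 90 vs. 60 round relation is from this change):

    use strict;
    use warnings;

    # The CRM only counts idle rounds while there is truly nothing to do, and
    # it uses a higher threshold than the LRM (90 vs. 60 rounds), as the LRM
    # needs an active CRM for some FSM transitions and thus should idle first.
    my $MAX_CRM_IDLE_ROUNDS = 90;

    sub next_idle_round_count {
        my ($idle_rounds, $has_services, $has_pending_commands) = @_;
        return 0 if $has_services || $has_pending_commands; # reset as soon as there is work
        return $idle_rounds + 1;
    }

    sub crm_may_go_idle {
        my ($idle_rounds) = @_;
        return $idle_rounds > $MAX_CRM_IDLE_ROUNDS;
    }
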
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
This will be reused for the auto-idling mechanism; also factor out
getting the manager status, as this will be used for more specific
checks about being able to go idle or about when a CRM should be
active.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
See the added comment for full details on why the watchdog protection
of the CRM needs less strict safety requirements compared to that of
an (active) LRM.
In short, the CRM does not manage services itself but directs them
through the manager_status state-file. This means the watchdog mainly
protects against a hung system where locks would time out before the
state is written out, and thus a race with the new CRM could happen.
So the CRM can basically always give up the watchdog safely when it
stops being the active CRM anyway.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
else this can break an upgrade for unrelated reasons.
this also mimics debhelper behaviour more closely (which we only don't
use here because it lacks reload support) - restructured the snippet to
be more similar, with an explicit `if`, as well.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Part of what caused bug #4984. Make the code future-proof and warn
when the node was never registered in the plugin, similar to what the
'static' usage plugin already does.
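
A sketch of the intended check (the signature is simplified and the data
layout is assumed, not copied from the plugin):

    use strict;
    use warnings;

    # Refuse to account usage for a node that was never registered via
    # add_node() and warn, instead of silently creating the node entry.
    sub add_service_usage_to_node {
        my ($self, $nodename) = @_;
        if (!exists($self->{nodes}->{$nodename})) {
            warn "node '$nodename' was never registered via add_node()\n";
            return;
        }
        $self->{nodes}->{$nodename}++; # the 'basic' plugin simply counts services
    }
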
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
[ TL: rework commit message subject ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Otherwise, when using the 'basic' plugin, this would lead to
auto-vivification of the $target node in the Perl hash tracking the
usage, and it would wrongly be considered online when selecting the
recovery node.
The 'static' plugin was not affected, because it would check and warn
before adding usage to a node that was not registered with add_node()
first. Doing the same in the 'basic' plugin will be done by another
patch.
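
For illustration, a standalone snippet showing the auto-vivification
pitfall (not the actual plugin code):

    use strict;
    use warnings;

    my %services_per_node = (node1 => 2, node2 => 1);

    my $target = 'node3';          # never registered, e.g. an offline node
    $services_per_node{$target}++; # auto-vivifies the 'node3' key

    # 'node3' now looks like a tracked (and thus online) node:
    print join(', ', sort keys %services_per_node), "\n"; # node1, node2, node3
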
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
[ TL: shorten commit message subject ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
... instead of using sendmail directly.
If the new 'notify.target-fencing' parameter from datacenter config
is set, we use it as a target for notifications. If it is not set,
we send the notification to the default target (mail-to-root).
There is also a new 'notify.fencing' parameter which controls whether
notifications should be sent at all. If it is not set, we default to
the old behavior, which is to send them.
Also add a dependency on the `libpve-notify-perl` package to d/control.
Signed-off-by: Lukas Wagner <l.wagner@proxmox.com>
Currently, the maintenance node for a service is only cleared when the
service is started on another node. In the edge case of a simultaneous
cluster shutdown however, it might be that the service never was
started anywhere else after the maintenance node was recorded, because
the other nodes were already in the process of being shut down too.
If a user ends up in this edge case, it would be rather surprising
that the service would be automatically migrated back to the
"maintenance node", which actually is not in maintenance mode anymore,
after a migration away from it.
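
A rough sketch of the kind of cleanup this adds (field names and the
helper are assumptions):

    use strict;
    use warnings;

    # If the service already sits on its recorded "maintenance node" (as after
    # a simultaneous cluster shutdown), the record is stale and migrating back
    # would be wrong, so simply drop it.
    sub clear_stale_maintenance_node {
        my ($sd) = @_;
        if (defined($sd->{maintenance_node}) && $sd->{maintenance_node} eq $sd->{node}) {
            delete $sd->{maintenance_node};
            return 1;
        }
        return 0;
    }
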
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
In the test log, it can be seen that the service will unexpectedly be
migrated back. This is caused by the service's maintenance node
property being set by the initial shutdown, but never cleared, because
that currently happens only when the service is started on a different
node. The next commit will address the issue.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Instead, use the new get_derived_property() method to get the same
information in a way that is robust regarding changes in the
configuration structure.
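
A usage sketch (the property names are assumptions) of how callers can
stay independent of the concrete config layout:

    use strict;
    use warnings;

    # Ask the resource's config class for derived values instead of reaching
    # into the raw, format-specific guest configuration hash.
    sub static_service_stats {
        my ($conf_class, $conf) = @_;
        return {
            maxcpu    => $conf_class->get_derived_property($conf, 'max-cpu'),
            maxmemory => $conf_class->get_derived_property($conf, 'max-memory'),
        };
    }
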
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
The fact that no 'items' was specified made the api-viewer throw a
JavaScript exception: "retinf.items is undefined".
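
The fix amounts to declaring the array's element schema; a minimal
sketch of such a return definition (the element details are made up for
illustration):

    use strict;
    use warnings;

    # An 'array' return type needs an 'items' schema, otherwise the api-viewer
    # has nothing to render for the elements and fails as described above.
    my $returns = {
        type  => 'array',
        items => {
            type       => 'object',
            properties => {}, # element description would go here
        },
    };
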
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
As reported in the community forum[0], currently, a newly added
service that's already running is shut down, offline migrated and
started again if rebalance selects a new node for it. This is
unexpected.
An improvement would be to online migrate the service, but rebalance
is only supposed to happen for a stopped->start transition[1], so the
service should not be migrated at all.
The cleanest solution would be for the CRM to use the state 'started'
instead of 'request_start' for newly added services that are already
running, i.e. restore the behavior from before commit c2f2b9c
("manager: set new request_start state for services freshly added to
HA") for such services. But currently, there is no mechanism for the
CRM to check if the service is already running, because it could be on
a different node. For now, avoiding the migration has to be handled in
the LRM instead. If the CRM ever has access to the necessary
information in the future, the solution mentioned above can be
reconsidered.
Note that the CRM log message relies on the fact that the LRM only
returns the IGNORED status in this case, but it's more user-friendly
than using a generic message like "migration ignored (check LRM
log)".
[0]: https://forum.proxmox.com/threads/125597/
[1]: https://pve.proxmox.com/pve-docs/chapter-ha-manager.html#_crs_scheduling_points
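
A condensed sketch of the LRM-side handling (the constant value, names
and parameters are illustrative, not the actual code):

    use strict;
    use warnings;

    use constant IGNORED => 'ignored'; # stand-in value; the real constant lives in the LRM

    # Skip a migration that was triggered only by rebalance-on-start when the
    # freshly added service is already running, and report it as IGNORED so
    # the CRM can log a specific, user-friendly message for it.
    sub handle_start_rebalance_migration {
        my ($service_is_running, $triggered_by_rebalance_on_start) = @_;
        return IGNORED if $service_is_running && $triggered_by_rebalance_on_start;
        return 'proceed';
    }
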
Suggested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
[ TL: split out adding the test into a previous commit so that one can
see in git what the original bad behavior was and how it behaves now ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Split out from Fiona's original series, to better show what actually
changes with her fix.
Currently, a newly added service that's already running is shut down,
offline migrated and started again if rebalance selects a new node
for it. This is unexpected and should be fixed; encode that behavior
as a test now, still showing the undesired behavior, and fix it in
the next commit.
Originally-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Will be used to ignore rebalance-on-start when an already running
service is newly added to HA.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Will be used in a test for balance on start, where it should make a
difference if the service is running or not.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
DEB_HOST_ARCH is the architecture the package is actually built for,
while DEB_BUILD_ARCH is that of the build host; getting this right
makes cross-building easier, but otherwise it makes no difference.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>