mirror of https://github.com/systemd/systemd.git synced 2025-03-31 14:50:15 +03:00

core/service: introduce sd_notify() RESTART_RESET=1 for resetting restart counter

We have RestartMaxDelaySec= + RestartSteps= to exponentially increase
auto restart durations, but it currently cannot be reset by the service
itself, which makes it sometimes awkward to use. A typical pattern
in real life is that a service was once down (e.g. due to temporary
network interruption) and multiple restarts were attempted. Then,
future restarts would always wait for an increased amount of time based on
RestartMaxDelaySec=, even after the original problem got resolved.
Such "persistence" could result in longer unavailability than there
should be for failures that come later.
(Cf. https://utcc.utoronto.ca/~cks/space/blog/linux/SystemdResettingUnitBackoff)

Let's introduce a new sd_notify() notification for resetting the restart
counter. There were discussions about making this timer-based, but I think
it's more flexible to leave the decision-making to the service. This enables
a service to combine, for instance, N successful requests with an uptime check.
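
As an illustration of the "N successful requests + uptime check" idea, here is a minimal sketch of what a service might do. The thresholds, the struct and all function names are hypothetical and are not part of this commit. The sketch writes the datagram to $NOTIFY_SOCKET directly so that it needs nothing beyond libc; a service that links against libsystemd would more likely just call sd_notify(0, "RESTART_RESET=1").

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical thresholds, to be tuned for the service at hand. */
#define HEALTHY_MIN_REQUESTS 100u
#define HEALTHY_MIN_UPTIME_SEC 300u

/* Hypothetical health bookkeeping kept by the service. */
struct health {
        unsigned ok_requests;   /* requests served successfully in this run */
        uint64_t uptime_sec;    /* seconds since the service became ready */
        bool reset_sent;        /* RESTART_RESET=1 already sent in this run */
};

/* Policy: consider the service healthy once both thresholds are met,
 * and ask for a reset at most once per run. */
static bool should_reset_restart_counter(const struct health *h) {
        assert(h);
        return !h->reset_sent &&
               h->ok_requests >= HEALTHY_MIN_REQUESTS &&
               h->uptime_sec >= HEALTHY_MIN_UPTIME_SEC;
}

/* Send one datagram to $NOTIFY_SOCKET.
 * Returns 1 if sent, 0 if $NOTIFY_SOCKET is unset, negative errno on error. */
static int notify_manager(const char *message) {
        const char *path = getenv("NOTIFY_SOCKET");
        if (!path || !*path)
                return 0;

        struct sockaddr_un addr = { .sun_family = AF_UNIX };
        size_t len = strlen(path);
        if ((path[0] != '/' && path[0] != '@') || len >= sizeof(addr.sun_path))
                return -EINVAL;

        memcpy(addr.sun_path, path, len);
        if (addr.sun_path[0] == '@')
                addr.sun_path[0] = '\0';        /* abstract namespace socket */

        int fd = socket(AF_UNIX, SOCK_DGRAM | SOCK_CLOEXEC, 0);
        if (fd < 0)
                return -errno;

        socklen_t addr_len = (socklen_t) (offsetof(struct sockaddr_un, sun_path) + len);
        ssize_t n = sendto(fd, message, strlen(message), 0,
                           (const struct sockaddr *) &addr, addr_len);
        int r = n < 0 ? -errno : 1;
        close(fd);
        return r;
}

/* Meant to be called periodically, e.g. from the service's main loop. */
static int maybe_reset_restart_counter(struct health *h) {
        if (!should_reset_restart_counter(h))
                return 0;

        int r = notify_manager("RESTART_RESET=1");
        if (r > 0)
                h->reset_sent = true;
        return r;
}
```

The policy check is kept separate from the notification so that it can be tested without a service manager. Outside of systemd, where $NOTIFY_SOCKET is unset, maybe_reset_restart_counter() simply does nothing.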
Mike Yuan 2024-10-26 01:51:04 +02:00
parent a364ebd46d
commit 406aeb5da6
GPG Key ID: 417471C0A40F58B3
2 changed files with 25 additions and 2 deletions


@@ -333,7 +333,7 @@
<citerefentry><refentrytitle>systemd.service</refentrytitle><manvolnum>5</manvolnum></citerefentry>
for information how to enable this functionality and
<citerefentry><refentrytitle>sd_watchdog_enabled</refentrytitle><manvolnum>3</manvolnum></citerefentry>
-for the details of how the service can check whether the watchdog is enabled. </para></listitem>
+for the details of how the service can check whether the watchdog is enabled.</para></listitem>
</varlistentry>
<varlistentry>
@@ -345,7 +345,7 @@
in time. Note that <varname>WatchdogSec=</varname> does not need to be enabled for
<literal>WATCHDOG=trigger</literal> to trigger the watchdog action. See
<citerefentry><refentrytitle>systemd.service</refentrytitle><manvolnum>5</manvolnum></citerefentry>
-for information about the watchdog behavior. </para>
+for information about the watchdog behavior.</para>
<xi:include href="version-info.xml" xpointer="v243"/></listitem>
</varlistentry>
@@ -376,6 +376,18 @@
<xi:include href="version-info.xml" xpointer="v236"/></listitem>
</varlistentry>
<varlistentry>
<term>RESTART_RESET=1</term>
<listitem><para>Reset the restart counter of the service, which has the effect of restoring
the restart duration to <varname>RestartSec=</varname> if <varname>RestartSteps=</varname> and
<varname>RestartMaxDelaySec=</varname> are in use. For more information, refer to
<citerefentry><refentrytitle>systemd.service</refentrytitle><manvolnum>5</manvolnum></citerefentry>.
</para>
<xi:include href="version-info.xml" xpointer="v258"/></listitem>
</varlistentry>
<varlistentry>
<term>FDSTORE=1</term>


@@ -4861,6 +4861,17 @@ static void service_notify_message(
service_override_watchdog_timeout(s, watchdog_override_usec);
}
        /* Interpret RESTART_RESET=1 */
        if (strv_contains(tags, "RESTART_RESET=1") && IN_SET(s->state, SERVICE_RUNNING, SERVICE_STOP)) {
                log_unit_struct(u, LOG_NOTICE,
                                LOG_UNIT_MESSAGE(u, "Got RESTART_RESET=1, resetting restart counter from %u.", s->n_restarts),
                                "N_RESTARTS=0",
                                LOG_UNIT_INVOCATION_ID(u));
                s->n_restarts = 0;
                notify_dbus = true;
        }

        /* Process FD store messages. Either FDSTOREREMOVE=1 for removal, or FDSTORE=1 for addition. In both cases,
         * process FDNAME= for picking the file descriptor name to use. Note that FDNAME= is required when removing
         * fds, but optional when pushing in new fds, for compatibility reasons. */