mirror of https://github.com/systemd/systemd-stable.git synced 2024-12-25 23:21:33 +03:00

Merge pull request #23200 from keszybz/oomd-docs

Extend the documentation for oomd a bit
Zbigniew Jędrzejewski-Szmek 2022-04-28 17:46:03 +02:00 committed by GitHub
commit 6ef00eb846
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
5 changed files with 83 additions and 58 deletions

@@ -29,23 +29,36 @@
<refsect1>
<title>Description</title>
<para><command>systemd-oomd</command> is a system service that uses cgroups-v2 and pressure stall information (PSI)
to monitor and take action on processes before an OOM occurs in kernel space.</para>
<para><command>systemd-oomd</command> is a system service that uses cgroups-v2 and pressure stall
information (PSI) to monitor and take corrective action before an OOM occurs in the kernel space.</para>
<para>You can enable monitoring and actions on units by setting <varname>ManagedOOMSwap=</varname> and/or
<varname>ManagedOOMMemoryPressure=</varname> to the appropriate value. <command>systemd-oomd</command> will
periodically poll enabled units' cgroup data to detect when corrective action needs to occur. When an action needs
to happen, it will only be performed on the descendant cgroups of the enabled units. More precisely, only cgroups with
<filename>memory.oom.group</filename> set to <constant>1</constant> and leaf cgroup nodes are eligible candidates.
Action will be taken recursively on all of the processes under the chosen candidate.</para>
<para>You can enable monitoring and actions on units by setting <varname>ManagedOOMSwap=</varname> and
<varname>ManagedOOMMemoryPressure=</varname> in the unit configuration, see
<citerefentry><refentrytitle>systemd.resource-control</refentrytitle><manvolnum>5</manvolnum></citerefentry>.
<command>systemd-oomd</command> retrieves information about such units from <command>systemd</command>
when it starts and watches for subsequent changes.</para>
<para>See
<citerefentry><refentrytitle>oomd.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>
<para>Cgroups of units with <varname>ManagedOOMSwap=</varname> or
<varname>ManagedOOMMemoryPressure=</varname> set to <option>kill</option> will be monitored.
<command>systemd-oomd</command> periodically polls PSI statistics for the system and those cgroups to
decide when to take action. If the configured limits are exceeded, <command>systemd-oomd</command> will
select a cgroup to terminate, and send <constant>SIGKILL</constant> to all processes in it. Note that
only descendant cgroups are eligible candidates for killing; the unit with its property set to
<option>kill</option> is not a candidate (unless one of its ancestors set their property to
<option>kill</option>). Also only leaf cgroups and cgroups with <filename>memory.oom.group</filename> set
to <constant>1</constant> are eligible candidates; see <varname>OOMPolicy=</varname> in
<citerefentry><refentrytitle>systemd.service</refentrytitle><manvolnum>5</manvolnum></citerefentry>.
</para>
<para><citerefentry><refentrytitle>oomctl</refentrytitle><manvolnum>1</manvolnum></citerefentry> can
be used to list monitored cgroups and pressure information.</para>
<para>See <citerefentry><refentrytitle>oomd.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>
for more information about the configuration of this service.</para>
</refsect1>
<refsect1>
<title>Setup Information</title>
<title>System requirements and configuration</title>
<para>The system must be running systemd with a full unified cgroup hierarchy for the expected cgroups-v2 features.
Furthermore, memory accounting must be turned on for all units monitored by <command>systemd-oomd</command>.
@@ -53,23 +66,25 @@
is set to <constant>true</constant> in
<citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>.</para>
<para>You will need a kernel compiled with PSI support. This is available in Linux 4.20 and above.</para>
<para>The kernel must be compiled with PSI support. This is available in Linux 4.20 and above.</para>
<para>It is highly recommended for the system to have swap enabled for <command>systemd-oomd</command> to function
optimally. With swap enabled, the system spends enough time swapping pages to let <command>systemd-oomd</command> react.
Without swap, the system enters a livelocked state much more quickly and may prevent <command>systemd-oomd</command>
from responding in a reasonable amount of time. See
<ulink url="https://chrisdown.name/2018/01/02/in-defence-of-swap.html">"In defence of swap: common misconceptions"</ulink>
for more details on swap. Any swap-based actions on systems without swap will be ignored. While
<command>systemd-oomd</command> can perform pressure-based actions on a system without swap, the pressure increases
will be more abrupt and may require more tuning to get the desired thresholds and behavior.</para>
<para>It is highly recommended for the system to have swap enabled for <command>systemd-oomd</command> to
function optimally. With swap enabled, the system spends enough time swapping pages to let
<command>systemd-oomd</command> react. Without swap, the system enters a livelocked state much more
quickly and may prevent <command>systemd-oomd</command> from responding in a reasonable amount of
time. See <ulink url="https://chrisdown.name/2018/01/02/in-defence-of-swap.html">"In defence of swap:
common misconceptions"</ulink> for more details on swap. Any swap-based actions on systems without swap
will be ignored. While <command>systemd-oomd</command> can perform pressure-based actions on such a
system, the pressure increases will be more abrupt and may require more tuning to get the desired
thresholds and behavior.</para>
<para>Be aware that if you intend to enable monitoring and actions on <filename>user.slice</filename>,
<filename>user-$UID.slice</filename>, or their ancestor cgroups, it is highly recommended that your programs be
managed by the systemd user manager to prevent running too many processes under the same session scope (and thus
avoid a situation where memory intensive tasks trigger <command>systemd-oomd</command> to kill everything under the
cgroup). If you're using a desktop environment like GNOME, it already spawns many session components with the
systemd user manager.</para>
<filename>user-$UID.slice</filename>, or their ancestor cgroups, it is highly recommended that your
programs be managed by the systemd user manager to prevent running too many processes under the same
session scope (and thus avoid a situation where memory intensive tasks trigger
<command>systemd-oomd</command> to kill everything under the cgroup). If you're using a desktop
environment like GNOME or KDE, it already spawns many session components with the systemd user manager.
</para>
</refsect1>
<refsect1>
@@ -79,11 +94,11 @@
<filename>-.slice</filename>, and allowing all descendant cgroups to be eligible candidates may make the most
sense.</para>
<para><varname>ManagedOOMMemoryPressure=</varname> tends to work better on the cgroups below the root slice
<filename>-.slice</filename>. For units which tend to have processes that are less latency sensitive (e.g.
<filename>system.slice</filename>), a higher limit like the default of 60% may be acceptable, as those processes
can usually ride out slowdowns caused by lack of memory without serious consequences. However, something like
<filename>user@$UID.service</filename> may prefer a much lower value like 40%.</para>
<para><varname>ManagedOOMMemoryPressure=</varname> tends to work better on the cgroups below the root
slice. For units which tend to have processes that are less latency sensitive (e.g.
<filename>system.slice</filename>), a higher limit like the default of 60% may be acceptable, as those
processes can usually ride out slowdowns caused by lack of memory without serious consequences. However,
something like <filename>user@$UID.service</filename> may prefer a much lower value like 40%.</para>
</refsect1>
<refsect1>

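The hunks above say that systemd-oomd polls PSI statistics and compares them against limits such as the 60% default or a lower 40% for `user@$UID.service`. A minimal C sketch of that comparison follows. It assumes the value compared is the `avg10` figure on the `full` line of a cgroup's `memory.pressure` file (format `some avg10=... total=...` / `full avg10=... total=...`); `parse_full_avg10()` and `pressure_above()` are hypothetical names, not systemd functions. The real service also requires the pressure to stay above the limit for `DefaultMemoryPressureDurationSec=` before acting, which this single-sample check deliberately ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Find the line starting with "full " in PSI-formatted text (for example the
 * contents of a cgroup's memory.pressure) and parse its avg10 field.
 * Returns 0 on success, -1 if there is no parsable "full" line. */
static int parse_full_avg10(const char *psi, double *ret_avg10) {
        assert(psi);
        assert(ret_avg10);

        for (const char *line = psi; line && *line; ) {
                if (strncmp(line, "full ", 5) == 0)
                        return sscanf(line, "full avg10=%lf", ret_avg10) == 1 ? 0 : -1;
                line = strchr(line, '\n');
                if (line)
                        line++; /* step past the newline to the next line */
        }
        return -1;
}

/* True if the "full" avg10 pressure exceeds limit_percent (e.g. 60.0 or 40.0).
 * Unparsable input is treated as "no pressure", so no action would be taken. */
static bool pressure_above(const char *psi, double limit_percent) {
        double avg10;

        if (parse_full_avg10(psi, &avg10) < 0)
                return false;
        return avg10 > limit_percent;
}
```

With a sample reading of `full avg10=45.50`, the same cgroup would be over a 40% limit but under the 60% default, which is the trade-off the text describes between latency-sensitive and tolerant units.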
@@ -1108,24 +1108,24 @@ DeviceAllow=/dev/loop-control
<citerefentry><refentrytitle>systemd-oomd.service</refentrytitle><manvolnum>8</manvolnum></citerefentry>
will act on this unit's cgroups. Defaults to <option>auto</option>.</para>
<para>When set to <option>kill</option>, <command>systemd-oomd</command> will actively monitor this unit's
cgroup metrics to decide whether it needs to act. If the cgroup passes the limits set by
<citerefentry><refentrytitle>oomd.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry> or its
overrides, <command>systemd-oomd</command> will send a <constant>SIGKILL</constant> to all of the processes
under the chosen candidate cgroup. Note that only descendant cgroups can be eligible candidates for killing;
the unit that set its property to <option>kill</option> is not a candidate (unless one of its ancestors set
their property to <option>kill</option>). You can find more details on candidates and kill behavior at
<para>When set to <option>kill</option>, the unit becomes a candidate for monitoring by
<command>systemd-oomd</command>. If the cgroup passes the limits set by
<citerefentry><refentrytitle>oomd.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry> or
the unit configuration, <command>systemd-oomd</command> will select a descendant cgroup and send
<constant>SIGKILL</constant> to all of the processes under it. You can find more details on
candidates and kill behavior at
<citerefentry><refentrytitle>systemd-oomd.service</refentrytitle><manvolnum>8</manvolnum></citerefentry>
and <citerefentry><refentrytitle>oomd.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>. Setting
either of these properties to <option>kill</option> will also automatically acquire
<varname>After=</varname> and <varname>Wants=</varname> dependencies on
<filename>systemd-oomd.service</filename> unless <varname>DefaultDependencies=no</varname>.
</para>
and
<citerefentry><refentrytitle>oomd.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>.</para>
<para>When set to <option>auto</option>, <command>systemd-oomd</command> will not actively use this cgroup's
data for monitoring and detection. However, if an ancestor cgroup has one of these properties set to
<option>kill</option>, a unit with <option>auto</option> can still be an eligible candidate for
<command>systemd-oomd</command> to act on.</para>
<para>Setting either of these properties to <option>kill</option> will also result in
<varname>After=</varname> and <varname>Wants=</varname> dependencies on
<filename>systemd-oomd.service</filename> unless <varname>DefaultDependencies=no</varname>.</para>
<para>When set to <option>auto</option>, <command>systemd-oomd</command> will not actively use this
cgroup's data for monitoring and detection. However, if an ancestor cgroup has one of these
properties set to <option>kill</option>, a unit with <option>auto</option> can still be a candidate
for <command>systemd-oomd</command> to terminate.</para>
</listitem>
</varlistentry>

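For `ManagedOOMSwap=kill`, the limit that `oomd.conf(5)` provides is a system-wide swap usage percentage (`SwapUsedLimit=`). The sketch below assumes usage is computed from the `SwapTotal` and `SwapFree` fields of `/proc/meminfo`; `meminfo_field_kb()` and `swap_used_above()` are hypothetical helpers, not systemd's implementation. It returns false when `SwapTotal` is 0, in line with the statement earlier in this commit that swap-based actions are ignored on systems without swap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Parse "<key>:   <value> kB" from /proc/meminfo-formatted text.
 * Returns 0 on success, -1 if the key is missing or unparsable. */
static int meminfo_field_kb(const char *meminfo, const char *key, unsigned long long *ret) {
        size_t klen;

        assert(meminfo && key && ret);
        klen = strlen(key);
        for (const char *line = meminfo; line && *line; ) {
                /* Require the colon so "SwapTotal" does not match a longer key. */
                if (strncmp(line, key, klen) == 0 && line[klen] == ':')
                        return sscanf(line + klen + 1, "%llu", ret) == 1 ? 0 : -1;
                line = strchr(line, '\n');
                if (line)
                        line++;
        }
        return -1;
}

/* True if swap usage exceeds limit_percent. Without swap (SwapTotal: 0) or with
 * unparsable input this returns false, so no swap-based action would be taken. */
static bool swap_used_above(const char *meminfo, double limit_percent) {
        unsigned long long total, free_kb;

        if (meminfo_field_kb(meminfo, "SwapTotal", &total) < 0 ||
            meminfo_field_kb(meminfo, "SwapFree", &free_kb) < 0)
                return false;
        if (total == 0 || free_kb > total)
                return false;
        return 100.0 * (double) (total - free_kb) / (double) total > limit_percent;
}
```

Reading the text from a buffer rather than opening `/proc/meminfo` directly keeps the check testable; a caller would read the file into a string first.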
@@ -1123,15 +1123,25 @@
<varlistentry>
<term><varname>OOMPolicy=</varname></term>
<listitem><para>Configure the Out-Of-Memory (OOM) killer policy. On Linux, when memory becomes scarce
the kernel might decide to kill a running process in order to free up memory and reduce memory
<listitem><para>Configure the out-of-memory (OOM) kernel killer policy. Note that the userspace OOM
killer
<citerefentry><refentrytitle>systemd-oomd.service</refentrytitle><manvolnum>8</manvolnum></citerefentry>
is a more flexible solution that aims to prevent out-of-memory situations for the userspace, not just
the kernel.</para>
<para>On Linux, when memory becomes scarce to the point that the kernel has trouble allocating memory
for itself, it might decide to kill a running process in order to free up memory and reduce memory
pressure. This setting takes one of <constant>continue</constant>, <constant>stop</constant> or
<constant>kill</constant>. If set to <constant>continue</constant> and a process of the service is
killed by the kernel's OOM killer this is logged but the service continues running. If set to
<constant>stop</constant> the event is logged but the service is terminated cleanly by the service
manager. If set to <constant>kill</constant> and one of the service's processes is killed by the OOM
killer the kernel is instructed to kill all remaining processes of the service, too. Defaults to the
setting <varname>DefaultOOMPolicy=</varname> in
killer the kernel is instructed to kill all remaining processes of the service too, by setting the
<filename>memory.oom.group</filename> attribute to <constant>1</constant>; also see <ulink
url="https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html">kernel documentation</ulink>.
</para>
<para>Defaults to the setting <varname>DefaultOOMPolicy=</varname> in
<citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>
is set to, except for services where <varname>Delegate=</varname> is turned on, where it defaults to
<constant>continue</constant>.</para>
@@ -1142,9 +1152,9 @@
<citerefentry><refentrytitle>systemd.exec</refentrytitle><manvolnum>5</manvolnum></citerefentry> for
details.</para>
<para>This setting also applies to <command>systemd-oomd</command>, similar to kernel OOM kills
this setting determines the state of the service after <command>systemd-oomd</command> kills a cgroup associated
with the service.</para></listitem>
<para>This setting also applies to <command>systemd-oomd</command>, similar to the kernel OOM kills
this setting determines the state of the service after <command>systemd-oomd</command> kills a cgroup
associated with the service.</para></listitem>
</varlistentry>
</variablelist>

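The `OOMPolicy=` hunk explains that with `kill` the kernel is told to kill all remaining processes of the service by setting the cgroup's `memory.oom.group` attribute to `1`. As a hypothetical illustration only (this is not how the service manager is implemented), writing that attribute amounts to the following; `write_oom_group()` is an assumed name, and on a real system the path would be a cgroup-v2 directory under `/sys/fs/cgroup`, which requires appropriate privileges.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Write "1" (kill the whole cgroup together) or "0" to <cgroup_path>/memory.oom.group.
 * Returns 0 on success, -1 on any error (path too long, open, write or close failure). */
static int write_oom_group(const char *cgroup_path, bool enable) {
        char file[PATH_MAX];
        FILE *f;
        int n, r = 0;

        n = snprintf(file, sizeof file, "%s/memory.oom.group", cgroup_path);
        if (n < 0 || (size_t) n >= sizeof file)
                return -1;
        f = fopen(file, "w");
        if (!f)
                return -1;
        if (fputs(enable ? "1\n" : "0\n", f) == EOF)
                r = -1;
        if (fclose(f) == EOF) /* on cgroupfs, write errors may only surface here */
                r = -1;
        return r;
}
```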
@@ -180,13 +180,13 @@ finish:
return r;
}
/* Fill `new_h` with `path`'s descendent OomdCGroupContexts. Only include descendent cgroups that are possible
/* Fill 'new_h' with 'path's descendant OomdCGroupContexts. Only include descendant cgroups that are possible
* candidates for action. That is, only leaf cgroups or cgroups with memory.oom.group set to "1".
*
* This function ignores most errors in order to handle cgroups that may have been cleaned up while populating
* the hashmap.
* This function ignores most errors in order to handle cgroups that may have been cleaned up while
* populating the hashmap.
*
* `new_h` is of the form { key: cgroup paths -> value: OomdCGroupContext } */
* 'new_h' is of the form { key: cgroup paths -> value: OomdCGroupContext } */
static int recursively_get_cgroup_context(Hashmap *new_h, const char *path) {
_cleanup_free_ char *subpath = NULL;
_cleanup_closedir_ DIR *d = NULL;

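The comment above describes `recursively_get_cgroup_context()` collecting only descendant cgroups that are possible candidates, that is, leaf cgroups or cgroups with `memory.oom.group` set to `"1"`. The following is an independent sketch of that selection rule over a plain directory tree, not the function's actual body. `walk_candidates()` and its helpers are hypothetical names, and not descending into a cgroup with `memory.oom.group=1` is this sketch's own assumption (its whole subtree would be killed together anyway). Like the original, it quietly skips directories that vanish mid-walk.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <dirent.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Optional callback invoked for each candidate path (hypothetical API). */
typedef void (*candidate_cb)(const char *path, void *userdata);

/* Join dir and name into buf; returns false if the result would be truncated. */
static bool join_path(char *buf, size_t size, const char *dir, const char *name) {
        int n = snprintf(buf, size, "%s/%s", dir, name);
        return n >= 0 && (size_t) n < size;
}

/* True if path is a directory; a vanished path simply reads as "not a directory". */
static bool is_dir(const char *path) {
        struct stat st;
        return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}

/* True if <path>/memory.oom.group exists and starts with '1'. */
static bool oom_group_enabled(const char *path) {
        char file[PATH_MAX], buf[8] = "";
        FILE *f;
        bool enabled;

        if (!join_path(file, sizeof file, path, "memory.oom.group"))
                return false;
        f = fopen(file, "r");
        if (!f)
                return false; /* missing, or cgroup removed meanwhile: treat as unset */
        enabled = fgets(buf, sizeof buf, f) != NULL && buf[0] == '1';
        fclose(f);
        return enabled;
}

/* True if path has at least one subdirectory, i.e. it is not a leaf cgroup. */
static bool has_subdirs(const char *path) {
        DIR *d = opendir(path);
        struct dirent *de;
        bool found = false;

        if (!d)
                return false;
        while (!found && (de = readdir(d)) != NULL) {
                char child[PATH_MAX];

                if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
                        continue;
                found = join_path(child, sizeof child, path, de->d_name) && is_dir(child);
        }
        closedir(d);
        return found;
}

/* Report candidates below path (never path itself) and return how many were found.
 * A child with memory.oom.group=1 is one candidate and is not descended into; any
 * other child is a candidate only if it is a leaf, otherwise it is walked recursively. */
static int walk_candidates(const char *path, candidate_cb cb, void *userdata) {
        DIR *d = opendir(path);
        struct dirent *de;
        int n = 0;

        if (!d)
                return 0;
        while ((de = readdir(d)) != NULL) {
                char child[PATH_MAX];

                if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
                        continue;
                if (!join_path(child, sizeof child, path, de->d_name) || !is_dir(child))
                        continue; /* skip interface files such as memory.pressure */

                if (oom_group_enabled(child) || !has_subdirs(child)) {
                        if (cb)
                                cb(child, userdata);
                        n++;
                } else
                        n += walk_candidates(child, cb, userdata);
        }
        closedir(d);
        return n;
}
```

Pointed at a cgroup-v2 directory such as `/sys/fs/cgroup/system.slice`, it would count (and optionally report) the cgroups that the documented rule makes eligible; it only inspects the tree and takes no action.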
@@ -170,7 +170,7 @@ static int run(int argc, char *argv[]) {
assert_se(sigprocmask_many(SIG_BLOCK, NULL, SIGTERM, SIGINT, -1) >= 0);
if (arg_mem_pressure_usec > 0 && arg_mem_pressure_usec < 1 * USEC_PER_SEC)
log_error_errno(SYNTHETIC_ERRNO(EINVAL), "DefaultMemoryPressureDurationSec= must be 0 or at least 1s");
return log_error_errno(SYNTHETIC_ERRNO(EINVAL), "DefaultMemoryPressureDurationSec= must be 0 or at least 1s");
r = manager_new(&m);
if (r < 0)
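
The last hunk turns a logged-but-ignored validation error into an early `return`, so an invalid `DefaultMemoryPressureDurationSec=` now stops startup instead of falling through to `manager_new()`. A minimal standalone sketch of the same check follows; `validate_mem_pressure_usec()` and `run_sketch()` are hypothetical names, `USEC_PER_SEC` is redefined locally, and the message is printed with `fprintf()` rather than systemd's logging helpers.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

#define USEC_PER_SEC UINT64_C(1000000) /* microseconds per second */

/* Accept 0 or any duration of at least one second; reject everything in between.
 * Returning the error (instead of only logging it) lets the caller stop. */
static int validate_mem_pressure_usec(uint64_t usec) {
        if (usec > 0 && usec < 1 * USEC_PER_SEC) {
                fprintf(stderr, "DefaultMemoryPressureDurationSec= must be 0 or at least 1s\n");
                return -EINVAL;
        }
        return 0;
}

/* Hypothetical caller mirroring the shape of run(): bail out on invalid input. */
static int run_sketch(uint64_t usec) {
        int r = validate_mem_pressure_usec(usec);
        if (r < 0)
                return r;
        /* ... manager setup would continue here ... */
        return 0;
}
```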