Mirror of https://github.com/systemd/systemd-stable.git, synced 2024-12-25 23:21:33 +03:00

Merge pull request #23200 from keszybz/oomd-docs

Extend the documentation for oomd a bit

commit 6ef00eb846
@@ -29,23 +29,36 @@
<refsect1>
<title>Description</title>

<para><command>systemd-oomd</command> is a system service that uses cgroups-v2 and pressure stall information (PSI)
to monitor and take action on processes before an OOM occurs in kernel space.</para>
<para><command>systemd-oomd</command> is a system service that uses cgroups-v2 and pressure stall
information (PSI) to monitor and take corrective action before an OOM occurs in the kernel space.</para>

<para>You can enable monitoring and actions on units by setting <varname>ManagedOOMSwap=</varname> and/or
<varname>ManagedOOMMemoryPressure=</varname> to the appropriate value. <command>systemd-oomd</command> will
periodically poll enabled units' cgroup data to detect when corrective action needs to occur. When an action needs
to happen, it will only be performed on the descendant cgroups of the enabled units. More precisely, only cgroups with
<filename>memory.oom.group</filename> set to <constant>1</constant> and leaf cgroup nodes are eligible candidates.
Action will be taken recursively on all of the processes under the chosen candidate.</para>
<para>You can enable monitoring and actions on units by setting <varname>ManagedOOMSwap=</varname> and
<varname>ManagedOOMMemoryPressure=</varname> in the unit configuration, see
<citerefentry><refentrytitle>systemd.resource-control</refentrytitle><manvolnum>5</manvolnum></citerefentry>.
<command>systemd-oomd</command> retrieves information about such units from <command>systemd</command>
when it starts and watches for subsequent changes.</para>

<para>See
<citerefentry><refentrytitle>oomd.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>
<para>Cgroups of units with <varname>ManagedOOMSwap=</varname> or
<varname>ManagedOOMMemoryPressure=</varname> set to <option>kill</option> will be monitored.
<command>systemd-oomd</command> periodically polls PSI statistics for the system and those cgroups to
decide when to take action. If the configured limits are exceeded, <command>systemd-oomd</command> will
select a cgroup to terminate, and send <constant>SIGKILL</constant> to all processes in it. Note that
only descendant cgroups are eligible candidates for killing; the unit with its property set to
<option>kill</option> is not a candidate (unless one of its ancestors set their property to
<option>kill</option>). Also only leaf cgroups and cgroups with <filename>memory.oom.group</filename> set
to <constant>1</constant> are eligible candidates; see <varname>OOMPolicy=</varname> in
<citerefentry><refentrytitle>systemd.service</refentrytitle><manvolnum>5</manvolnum></citerefentry>.
</para>

<para><citerefentry><refentrytitle>oomctl</refentrytitle><manvolnum>1</manvolnum></citerefentry> can
be used to list monitored cgroups and pressure information.</para>

<para>See <citerefentry><refentrytitle>oomd.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>
for more information about the configuration of this service.</para>
</refsect1>

<refsect1>
<title>Setup Information</title>
<title>System requirements and configuration</title>

<para>The system must be running systemd with a full unified cgroup hierarchy for the expected cgroups-v2 features.
Furthermore, memory accounting must be turned on for all units monitored by <command>systemd-oomd</command>.
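The rewritten Description says that once a candidate cgroup has been selected, SIGKILL is sent to all processes in it. Below is a minimal C sketch of that last step. It assumes the PIDs are read from a cgroup.procs-style file with one decimal PID per line. kill_cgroup_procs() is a hypothetical helper, not systemd's implementation. It does not descend into child cgroups, and it does not re-read the file to catch processes forked while it runs; a real implementation has to handle both.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: send SIGKILL to every PID listed in a
 * cgroup.procs-style file (one decimal PID per line).
 * Returns the number of processes signalled, or -errno on failure. */
static int kill_cgroup_procs(const char *procs_path) {
        FILE *f;
        long pid;
        int n = 0;

        assert(procs_path);

        f = fopen(procs_path, "r");
        if (!f)
                return -errno;

        while (fscanf(f, "%ld", &pid) == 1) {
                if (pid <= 0)
                        continue;       /* never target process groups or "all processes" */

                if (kill((pid_t) pid, SIGKILL) < 0) {
                        int r = -errno;

                        if (r == -ESRCH)
                                continue;       /* process already exited */
                        fclose(f);
                        return r;
                }
                n++;
        }

        fclose(f);
        return n;
}
```

PIDs of 0 or below are skipped because kill() treats 0 and negative values as process groups or "every process". ESRCH is tolerated because a process may exit between being listed and being signalled. Linux 5.14 and later also provide a cgroup.kill file that kills a whole cgroup subtree in one write.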
@@ -53,23 +66,25 @@
is set to <constant>true</constant> in
<citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>.</para>

<para>You will need a kernel compiled with PSI support. This is available in Linux 4.20 and above.</para>
<para>The kernel must be compiled with PSI support. This is available in Linux 4.20 and above.</para>

<para>It is highly recommended for the system to have swap enabled for <command>systemd-oomd</command> to function
optimally. With swap enabled, the system spends enough time swapping pages to let <command>systemd-oomd</command> react.
Without swap, the system enters a livelocked state much more quickly and may prevent <command>systemd-oomd</command>
from responding in a reasonable amount of time. See
<ulink url="https://chrisdown.name/2018/01/02/in-defence-of-swap.html">"In defence of swap: common misconceptions"</ulink>
for more details on swap. Any swap-based actions on systems without swap will be ignored. While
<command>systemd-oomd</command> can perform pressure-based actions on a system without swap, the pressure increases
will be more abrupt and may require more tuning to get the desired thresholds and behavior.</para>
<para>It is highly recommended for the system to have swap enabled for <command>systemd-oomd</command> to
function optimally. With swap enabled, the system spends enough time swapping pages to let
<command>systemd-oomd</command> react. Without swap, the system enters a livelocked state much more
quickly and may prevent <command>systemd-oomd</command> from responding in a reasonable amount of
time. See <ulink url="https://chrisdown.name/2018/01/02/in-defence-of-swap.html">"In defence of swap:
common misconceptions"</ulink> for more details on swap. Any swap-based actions on systems without swap
will be ignored. While <command>systemd-oomd</command> can perform pressure-based actions on such a
system, the pressure increases will be more abrupt and may require more tuning to get the desired
thresholds and behavior.</para>

<para>Be aware that if you intend to enable monitoring and actions on <filename>user.slice</filename>,
<filename>user-$UID.slice</filename>, or their ancestor cgroups, it is highly recommended that your programs be
managed by the systemd user manager to prevent running too many processes under the same session scope (and thus
avoid a situation where memory intensive tasks trigger <command>systemd-oomd</command> to kill everything under the
cgroup). If you're using a desktop environment like GNOME, it already spawns many session components with the
systemd user manager.</para>
<filename>user-$UID.slice</filename>, or their ancestor cgroups, it is highly recommended that your
programs be managed by the systemd user manager to prevent running too many processes under the same
session scope (and thus avoid a situation where memory intensive tasks trigger
<command>systemd-oomd</command> to kill everything under the cgroup). If you're using a desktop
environment like GNOME or KDE, it already spawns many session components with the systemd user manager.
</para>
</refsect1>

<refsect1>
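The setup notes state that swap-based actions are ignored on systems without swap. Below is a C sketch of that check under several assumptions. Swap usage is derived from the SwapTotal: and SwapFree: fields of /proc/meminfo-formatted text. It is compared against a percentage limit, such as the one SwapUsedLimit= configures in oomd.conf(5). meminfo_get_kb() and swap_used_exceeds() are hypothetical names. Where systemd-oomd actually reads its swap figures from is not shown in this diff.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Find a "Key:   <value> kB" line in /proc/meminfo-formatted text. */
static int meminfo_get_kb(const char *text, const char *key, uint64_t *ret) {
        size_t len = strlen(key);
        const char *p = text;

        while (p && *p) {
                unsigned long long v;

                if (strncmp(p, key, len) == 0 && p[len] == ':' &&
                    sscanf(p + len + 1, "%llu", &v) == 1) {
                        *ret = v;
                        return 0;
                }
                p = strchr(p, '\n');    /* advance to the next line */
                if (p)
                        p++;
        }
        return -ENOENT;
}

/* Hypothetical: 1 if swap usage exceeds limit_percent, 0 if not,
 * -ENODATA if there is no swap at all (the swap-based action is skipped),
 * -ENOENT if a field is missing. */
static int swap_used_exceeds(const char *meminfo, unsigned limit_percent) {
        uint64_t total, free_kb;
        int r;

        assert(meminfo);

        r = meminfo_get_kb(meminfo, "SwapTotal", &total);
        if (r < 0)
                return r;
        if (total == 0)
                return -ENODATA;
        r = meminfo_get_kb(meminfo, "SwapFree", &free_kb);
        if (r < 0)
                return r;
        if (free_kb > total)
                free_kb = total;

        /* used / total > limit / 100, kept in integer arithmetic */
        return (total - free_kb) * 100 > (uint64_t) limit_percent * total;
}
```

The comparison multiplies instead of dividing so that it stays in integer arithmetic. With 64-bit values, the products cannot realistically overflow for kB-sized swap totals.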
@@ -79,11 +94,11 @@
<filename>-.slice</filename>, and allowing all descendant cgroups to be eligible candidates may make the most
sense.</para>

<para><varname>ManagedOOMMemoryPressure=</varname> tends to work better on the cgroups below the root slice
<filename>-.slice</filename>. For units which tend to have processes that are less latency sensitive (e.g.
<filename>system.slice</filename>), a higher limit like the default of 60% may be acceptable, as those processes
can usually ride out slowdowns caused by lack of memory without serious consequences. However, something like
<filename>user@$UID.service</filename> may prefer a much lower value like 40%.</para>
<para><varname>ManagedOOMMemoryPressure=</varname> tends to work better on the cgroups below the root
slice. For units which tend to have processes that are less latency sensitive (e.g.
<filename>system.slice</filename>), a higher limit like the default of 60% may be acceptable, as those
processes can usually ride out slowdowns caused by lack of memory without serious consequences. However,
something like <filename>user@$UID.service</filename> may prefer a much lower value like 40%.</para>
</refsect1>

<refsect1>
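ManagedOOMMemoryPressure= limits such as the 60% default or the suggested 40% are compared against PSI memory pressure percentages. The sketch below parses memory.pressure-formatted text and checks the avg10 value of the "full" line against a limit. Which PSI line and averaging window systemd-oomd really uses is an assumption here. The requirement that the limit stay exceeded for DefaultMemoryPressureDurationSec= is not modelled. pressure_full_avg10() and pressure_exceeds() are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Read the avg10 percentage from the "full" line of PSI-formatted text:
 *   some avg10=1.00 avg60=0.50 avg300=0.10 total=12345
 *   full avg10=0.75 avg60=0.40 avg300=0.05 total=6789 */
static int pressure_full_avg10(const char *text, double *ret) {
        const char *p = text;

        assert(text);
        assert(ret);

        while (p && *p) {
                if (strncmp(p, "full ", 5) == 0)
                        return sscanf(p, "full avg10=%lf", ret) == 1 ? 0 : -EINVAL;
                p = strchr(p, '\n');
                if (p)
                        p++;
        }
        return -ENOENT;
}

/* Hypothetical: 1 if the "full" avg10 pressure exceeds limit_percent,
 * 0 if not, -errno if the text cannot be parsed. */
static int pressure_exceeds(const char *text, double limit_percent) {
        double avg10;
        int r = pressure_full_avg10(text, &avg10);

        if (r < 0)
                return r;
        return avg10 > limit_percent;
}
```

For example, with a "full" avg10 of 45.50, pressure_exceeds() returns 0 against a 60% limit and 1 against 40%. This illustrates why a latency-sensitive unit reacts earlier with the lower value.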
|
@@ -1108,24 +1108,24 @@ DeviceAllow=/dev/loop-control
<citerefentry><refentrytitle>systemd-oomd.service</refentrytitle><manvolnum>8</manvolnum></citerefentry>
will act on this unit's cgroups. Defaults to <option>auto</option>.</para>

<para>When set to <option>kill</option>, <command>systemd-oomd</command> will actively monitor this unit's
cgroup metrics to decide whether it needs to act. If the cgroup passes the limits set by
<citerefentry><refentrytitle>oomd.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry> or its
overrides, <command>systemd-oomd</command> will send a <constant>SIGKILL</constant> to all of the processes
under the chosen candidate cgroup. Note that only descendant cgroups can be eligible candidates for killing;
the unit that set its property to <option>kill</option> is not a candidate (unless one of its ancestors set
their property to <option>kill</option>). You can find more details on candidates and kill behavior at
<para>When set to <option>kill</option>, the unit becomes a candidate for monitoring by
<command>systemd-oomd</command>. If the cgroup passes the limits set by
<citerefentry><refentrytitle>oomd.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry> or
the unit configuration, <command>systemd-oomd</command> will select a descendant cgroup and send
<constant>SIGKILL</constant> to all of the processes under it. You can find more details on
candidates and kill behavior at
<citerefentry><refentrytitle>systemd-oomd.service</refentrytitle><manvolnum>8</manvolnum></citerefentry>
and <citerefentry><refentrytitle>oomd.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>. Setting
either of these properties to <option>kill</option> will also automatically acquire
<varname>After=</varname> and <varname>Wants=</varname> dependencies on
<filename>systemd-oomd.service</filename> unless <varname>DefaultDependencies=no</varname>.
</para>
and
<citerefentry><refentrytitle>oomd.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>.</para>

<para>When set to <option>auto</option>, <command>systemd-oomd</command> will not actively use this cgroup's
data for monitoring and detection. However, if an ancestor cgroup has one of these properties set to
<option>kill</option>, a unit with <option>auto</option> can still be an eligible candidate for
<command>systemd-oomd</command> to act on.</para>
<para>Setting either of these properties to <option>kill</option> will also result in
<varname>After=</varname> and <varname>Wants=</varname> dependencies on
<filename>systemd-oomd.service</filename> unless <varname>DefaultDependencies=no</varname>.</para>

<para>When set to <option>auto</option>, <command>systemd-oomd</command> will not actively use this
cgroup's data for monitoring and detection. However, if an ancestor cgroup has one of these
properties set to <option>kill</option>, a unit with <option>auto</option> can still be a candidate
for <command>systemd-oomd</command> to terminate.</para>
</listitem>
</varlistentry>

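The kill/auto semantics above reduce to a rule over cgroup paths. A cgroup is eligible only if a strict ancestor unit has the property set to kill. As a result, the kill unit itself is watched but never chosen, while a descendant set to auto can still be chosen. Below is a minimal C sketch of just that rule. It assumes the cgroup paths of kill units are available as an array of strings. path_strictly_below() and has_kill_ancestor() are hypothetical, and the separate leaf / memory.oom.group condition is not checked here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if 'path' lies strictly below 'ancestor': "/a" is an ancestor of
 * "/a/b", but not of "/a" itself or of "/ab". */
static bool path_strictly_below(const char *ancestor, const char *path) {
        size_t n = strlen(ancestor);

        if (strncmp(ancestor, path, n) != 0)
                return false;
        if (n > 0 && ancestor[n - 1] == '/')    /* e.g. the root "/" */
                return path[n] != '\0';
        return path[n] == '/' && path[n + 1] != '\0';
}

/* Hypothetical: a cgroup, whatever its own setting (e.g. auto), may be
 * acted on only if some strict ancestor has the property set to kill.
 * The kill unit itself is monitored but is not a candidate. */
static bool has_kill_ancestor(const char *path, const char *const *kill_paths, size_t n_kill) {
        assert(path);
        assert(kill_paths || n_kill == 0);

        for (size_t i = 0; i < n_kill; i++)
                if (path_strictly_below(kill_paths[i], path))
                        return true;
        return false;
}
```

The separator check keeps "/system.slicex" from being treated as a child of "/system.slice", which a plain prefix comparison would get wrong.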
|
@@ -1123,15 +1123,25 @@
<varlistentry>
<term><varname>OOMPolicy=</varname></term>

<listitem><para>Configure the Out-Of-Memory (OOM) killer policy. On Linux, when memory becomes scarce
the kernel might decide to kill a running process in order to free up memory and reduce memory
<listitem><para>Configure the out-of-memory (OOM) kernel killer policy. Note that the userspace OOM
killer
<citerefentry><refentrytitle>systemd-oomd.service</refentrytitle><manvolnum>8</manvolnum></citerefentry>
is a more flexible solution that aims to prevent out-of-memory situations for the userspace, not just
the kernel.</para>

<para>On Linux, when memory becomes scarce to the point that the kernel has trouble allocating memory
for itself, it might decide to kill a running process in order to free up memory and reduce memory
pressure. This setting takes one of <constant>continue</constant>, <constant>stop</constant> or
<constant>kill</constant>. If set to <constant>continue</constant> and a process of the service is
killed by the kernel's OOM killer this is logged but the service continues running. If set to
<constant>stop</constant> the event is logged but the service is terminated cleanly by the service
manager. If set to <constant>kill</constant> and one of the service's processes is killed by the OOM
killer the kernel is instructed to kill all remaining processes of the service, too. Defaults to the
setting <varname>DefaultOOMPolicy=</varname> in
killer the kernel is instructed to kill all remaining processes of the service too, by setting the
<filename>memory.oom.group</filename> attribute to <constant>1</constant>; also see <ulink
url="https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html">kernel documentation</ulink>.
</para>

<para>Defaults to the setting <varname>DefaultOOMPolicy=</varname> in
<citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>
is set to, except for services where <varname>Delegate=</varname> is turned on, where it defaults to
<constant>continue</constant>.</para>
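The OOMPolicy= paragraphs describe three things: the three accepted values, a default inherited from DefaultOOMPolicy= unless Delegate= is on, and a kill value implemented by setting memory.oom.group to 1. Below is a compact C sketch of that decision table. The enum and function names are hypothetical and do not correspond to systemd's internal API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names; systemd's internal enum and helpers differ. */
typedef enum {
        OOM_POLICY_CONTINUE,
        OOM_POLICY_STOP,
        OOM_POLICY_KILL,
        OOM_POLICY_INVALID = -1,
} OOMPolicyValue;

static OOMPolicyValue oom_policy_from_string(const char *s) {
        assert(s);

        if (strcmp(s, "continue") == 0)
                return OOM_POLICY_CONTINUE;
        if (strcmp(s, "stop") == 0)
                return OOM_POLICY_STOP;
        if (strcmp(s, "kill") == 0)
                return OOM_POLICY_KILL;
        return OOM_POLICY_INVALID;
}

/* 'configured' is the unit's OOMPolicy= value, or NULL if unset. Unset means
 * DefaultOOMPolicy=, except that services with Delegate= on default to continue. */
static OOMPolicyValue effective_oom_policy(const char *configured, bool delegate,
                                           OOMPolicyValue default_policy) {
        if (configured)
                return oom_policy_from_string(configured);
        return delegate ? OOM_POLICY_CONTINUE : default_policy;
}

/* Only kill asks the kernel to take down all remaining processes of the
 * service, which is done by setting memory.oom.group to 1. */
static bool wants_memory_oom_group(OOMPolicyValue p) {
        return p == OOM_POLICY_KILL;
}
```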
@@ -1142,9 +1152,9 @@
<citerefentry><refentrytitle>systemd.exec</refentrytitle><manvolnum>5</manvolnum></citerefentry> for
details.</para>

<para>This setting also applies to <command>systemd-oomd</command>, similar to kernel OOM kills
this setting determines the state of the service after <command>systemd-oomd</command> kills a cgroup associated
with the service.</para></listitem>
<para>This setting also applies to <command>systemd-oomd</command>, similar to the kernel OOM kills
this setting determines the state of the service after <command>systemd-oomd</command> kills a cgroup
associated with the service.</para></listitem>
</varlistentry>

</variablelist>
|
@@ -180,13 +180,13 @@ finish:
return r;
}

/* Fill `new_h` with `path`'s descendent OomdCGroupContexts. Only include descendent cgroups that are possible
/* Fill 'new_h' with 'path's descendant OomdCGroupContexts. Only include descendant cgroups that are possible
* candidates for action. That is, only leaf cgroups or cgroups with memory.oom.group set to "1".
*
* This function ignores most errors in order to handle cgroups that may have been cleaned up while populating
* the hashmap.
* This function ignores most errors in order to handle cgroups that may have been cleaned up while
* populating the hashmap.
*
* `new_h` is of the form { key: cgroup paths -> value: OomdCGroupContext } */
* 'new_h' is of the form { key: cgroup paths -> value: OomdCGroupContext } */
static int recursively_get_cgroup_context(Hashmap *new_h, const char *path) {
_cleanup_free_ char *subpath = NULL;
_cleanup_closedir_ DIR *d = NULL;
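The comment on recursively_get_cgroup_context() describes the traversal. It collects descendant cgroups that are leaves or have memory.oom.group set to "1", while tolerating cgroups that disappear mid-walk. Below is a standalone C sketch of the same idea over an ordinary directory tree. It assumes a cgroup can be modelled as a directory that may contain a memory.oom.group file. walk_candidates() and its helpers are hypothetical, not the function from the diff. They count candidates, optionally passing each path to a callback, instead of filling a Hashmap of OomdCGroupContexts. They also treat only ENOENT as "already cleaned up". Stopping the descent at a memory.oom.group=1 cgroup is this sketch's own choice, based on the kernel killing such a group as a unit.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

typedef void (*candidate_cb)(const char *path, void *userdata);

/* Build "<parent>/<name>" into buf; true only for an existing directory
 * (skips ".", "..", over-long paths and files such as memory.oom.group). */
static bool child_dir(const char *parent, const char *name, char *buf, size_t size) {
        struct stat st;
        int k;
        if (strcmp(name, ".") == 0 || strcmp(name, "..") == 0)
                return false;
        k = snprintf(buf, size, "%s/%s", parent, name);
        if (k < 0 || (size_t) k >= size)
                return false;
        return stat(buf, &st) == 0 && S_ISDIR(st.st_mode);
}

/* True if <dir>/memory.oom.group starts with '1'; a missing file counts as 0. */
static bool oom_group_enabled(const char *dir) {
        char p[PATH_MAX];
        FILE *f;
        int c, k;
        k = snprintf(p, sizeof p, "%s/memory.oom.group", dir);
        if (k < 0 || (size_t) k >= sizeof p)
                return false;
        f = fopen(p, "r");
        if (!f)
                return false;
        c = fgetc(f);
        fclose(f);
        return c == '1';
}

/* 1 if dir has a subdirectory, 0 if it is a leaf, -errno if it cannot be read. */
static int has_subdirs(const char *dir) {
        char p[PATH_MAX];
        struct dirent *de;
        int found = 0;
        DIR *d = opendir(dir);
        if (!d)
                return -errno;
        while (!found && (de = readdir(d)))
                found = child_dir(dir, de->d_name, p, sizeof p);
        closedir(d);
        return found;
}

/* Count (and pass to cb, if non-NULL) each descendant of path that is a leaf
 * or has memory.oom.group set to 1; the latter is not descended into.
 * Returns the count, or -errno. */
static int walk_candidates(const char *path, candidate_cb cb, void *userdata) {
        char sub[PATH_MAX];
        struct dirent *de;
        int n = 0, r;
        DIR *d;

        assert(path);

        d = opendir(path);
        if (!d)
                return errno == ENOENT ? 0 : -errno;   /* already cleaned up */

        while ((de = readdir(d))) {
                if (!child_dir(path, de->d_name, sub, sizeof sub))
                        continue;

                r = oom_group_enabled(sub) ? 0 : has_subdirs(sub);
                if (r == -ENOENT)
                        continue;                       /* removed mid-walk */
                if (r < 0)
                        goto finish;
                if (r == 0) {                           /* oom group or leaf */
                        if (cb)
                                cb(sub, userdata);
                        n++;
                        continue;
                }
                r = walk_candidates(sub, cb, userdata);
                if (r < 0)
                        goto finish;
                n += r;
        }
        r = n;
finish:
        closedir(d);
        return r;
}
```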
|
@@ -170,7 +170,7 @@ static int run(int argc, char *argv[]) {
assert_se(sigprocmask_many(SIG_BLOCK, NULL, SIGTERM, SIGINT, -1) >= 0);

if (arg_mem_pressure_usec > 0 && arg_mem_pressure_usec < 1 * USEC_PER_SEC)
log_error_errno(SYNTHETIC_ERRNO(EINVAL), "DefaultMemoryPressureDurationSec= must be 0 or at least 1s");
return log_error_errno(SYNTHETIC_ERRNO(EINVAL), "DefaultMemoryPressureDurationSec= must be 0 or at least 1s");

r = manager_new(&m);
if (r < 0)
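The last hunk adds a missing return. Previously, an out-of-range DefaultMemoryPressureDurationSec= was logged, but run() carried on. Below is the same rule as a standalone helper. validate_pressure_duration() is hypothetical. USEC_PER_SEC is defined locally as a stand-in for systemd's constant. fprintf() replaces log_error_errno(), which is only available inside the systemd tree.

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Local stand-in for systemd's USEC_PER_SEC. */
#define USEC_PER_SEC ((uint64_t) 1000000)

/* Hypothetical helper: 0 is accepted, any other value must be at least 1s.
 * Returns 0 or -EINVAL; the caller must propagate the error. */
static int validate_pressure_duration(const char *setting, uint64_t usec) {
        assert(setting);

        if (usec > 0 && usec < 1 * USEC_PER_SEC) {
                fprintf(stderr, "%s must be 0 or at least 1s (got %" PRIu64 " us)\n", setting, usec);
                return -EINVAL;
        }
        return 0;
}
```

A caller would write r = validate_pressure_duration("DefaultMemoryPressureDurationSec=", arg_mem_pressure_usec); followed by if (r < 0) return r;. This has the same effect as the return added in run().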
|