From 6f83ea60e90b18e44cc979834aae2947afa66834 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Zbigniew=20J=C4=99drzejewski-Szmek?=
Date: Tue, 26 Apr 2022 22:04:31 +0200
Subject: [PATCH] man: beef up the description of systemd-oomd.service

The gist of the description is moved from systemd.resource-control to the
systemd-oomd man page. Cross-references to OOMPolicy, memory.oom.group,
oomctl, ManagedOOMSwap and ManagedOOMMemoryPressure are added in all places.

The descriptions are also more down-to-earth: instead of talking about
"taking action", let's just say "kill". We *might* add configuration for
different actions in the future, but we're not there yet, so let's just
describe what we do now.
---
 man/systemd-oomd.service.xml     | 75 +++++++++++++++++++-------------
 man/systemd.resource-control.xml | 32 +++++++-------
 man/systemd.service.xml          | 14 +++---
 3 files changed, 70 insertions(+), 51 deletions(-)

diff --git a/man/systemd-oomd.service.xml b/man/systemd-oomd.service.xml
index e87a753987..11c9237645 100644
--- a/man/systemd-oomd.service.xml
+++ b/man/systemd-oomd.service.xml
@@ -29,23 +29,36 @@
Description - systemd-oomd is a system service that uses cgroups-v2 and pressure stall information (PSI) - to monitor and take action on processes before an OOM occurs in kernel space. + systemd-oomd is a system service that uses cgroups-v2 and pressure stall + information (PSI) to monitor and take corrective action before an OOM occurs in the kernel space. - You can enable monitoring and actions on units by setting ManagedOOMSwap= and/or - ManagedOOMMemoryPressure= to the appropriate value. systemd-oomd will - periodically poll enabled units' cgroup data to detect when corrective action needs to occur. When an action needs - to happen, it will only be performed on the descendant cgroups of the enabled units. More precisely, only cgroups with - memory.oom.group set to 1 and leaf cgroup nodes are eligible candidates.
- Action will be taken recursively on all of the processes under the chosen candidate. + You can enable monitoring and actions on units by setting ManagedOOMSwap= and + ManagedOOMMemoryPressure= in the unit configuration, see + systemd.resource-control5. + systemd-oomd retrieves information about such units from systemd + when it starts and watches for subsequent changes. - See - oomd.conf5 + Cgroups of units with ManagedOOMSwap= or + ManagedOOMMemoryPressure= set to will be monitored. + systemd-oomd periodically polls PSI statistics for the system and those cgroups to + decide when to take action. If the configured limits are exceeded, systemd-oomd will + select a cgroup to terminate, and send SIGKILL to all processes in it. Note that + only descendant cgroups are eligible candidates for killing; the unit with its property set to + is not a candidate (unless one of its ancestors set their property to + ). Also only leaf cgroups and cgroups with memory.oom.group set + to 1 are eligible candidates; see OOMPolicy= in + systemd.service5. + + + oomctl1 can + be used to list monitored cgroups and pressure information. + + See oomd.conf5 for more information about the configuration of this service. - Setup Information + System requirements and configuration The system must be running systemd with a full unified cgroup hierarchy for the expected cgroups-v2 features. Furthermore, memory accounting must be turned on for all units monitored by systemd-oomd. @@ -53,23 +66,25 @@ is set to true in systemd-system.conf5. - You will need a kernel compiled with PSI support. This is available in Linux 4.20 and above. + The kernel must be compiled with PSI support. This is available in Linux 4.20 and above. - It is highly recommended for the system to have swap enabled for systemd-oomd to function - optimally. With swap enabled, the system spends enough time swapping pages to let systemd-oomd react. 
- Without swap, the system enters a livelocked state much more quickly and may prevent systemd-oomd - from responding in a reasonable amount of time. See - "In defence of swap: common misconceptions" - for more details on swap. Any swap-based actions on systems without swap will be ignored. While - systemd-oomd can perform pressure-based actions on a system without swap, the pressure increases - will be more abrupt and may require more tuning to get the desired thresholds and behavior. + It is highly recommended for the system to have swap enabled for systemd-oomd to + function optimally. With swap enabled, the system spends enough time swapping pages to let + systemd-oomd react. Without swap, the system enters a livelocked state much more + quickly and may prevent systemd-oomd from responding in a reasonable amount of + time. See "In defence of swap: + common misconceptions" for more details on swap. Any swap-based actions on systems without swap + will be ignored. While systemd-oomd can perform pressure-based actions on such a + system, the pressure increases will be more abrupt and may require more tuning to get the desired + thresholds and behavior. Be aware that if you intend to enable monitoring and actions on user.slice, - user-$UID.slice, or their ancestor cgroups, it is highly recommended that your programs be - managed by the systemd user manager to prevent running too many processes under the same session scope (and thus - avoid a situation where memory intensive tasks trigger systemd-oomd to kill everything under the - cgroup). If you're using a desktop environment like GNOME, it already spawns many session components with the - systemd user manager. 
+ user-$UID.slice, or their ancestor cgroups, it is highly recommended that your + programs be managed by the systemd user manager to prevent running too many processes under the same + session scope (and thus avoid a situation where memory intensive tasks trigger + systemd-oomd to kill everything under the cgroup). If you're using a desktop + environment like GNOME or KDE, it already spawns many session components with the systemd user manager. + @@ -79,11 +94,11 @@ -.slice, and allowing all descendant cgroups to be eligible candidates may make the most sense. - ManagedOOMMemoryPressure= tends to work better on the cgroups below the root slice - -.slice. For units which tend to have processes that are less latency sensitive (e.g. - system.slice), a higher limit like the default of 60% may be acceptable, as those processes - can usually ride out slowdowns caused by lack of memory without serious consequences. However, something like - user@$UID.service may prefer a much lower value like 40%. + ManagedOOMMemoryPressure= tends to work better on the cgroups below the root + slice. For units which tend to have processes that are less latency sensitive (e.g. + system.slice), a higher limit like the default of 60% may be acceptable, as those + processes can usually ride out slowdowns caused by lack of memory without serious consequences. However, + something like user@$UID.service may prefer a much lower value like 40%. diff --git a/man/systemd.resource-control.xml b/man/systemd.resource-control.xml index d9edb6ab74..ce03a2f1a6 100644 --- a/man/systemd.resource-control.xml +++ b/man/systemd.resource-control.xml @@ -1108,24 +1108,24 @@ DeviceAllow=/dev/loop-control systemd-oomd.service8 will act on this unit's cgroups. Defaults to . - When set to , systemd-oomd will actively monitor this unit's - cgroup metrics to decide whether it needs to act. 
If the cgroup passes the limits set by - oomd.conf5 or its - overrides, systemd-oomd will send a SIGKILL to all of the processes - under the chosen candidate cgroup. Note that only descendant cgroups can be eligible candidates for killing; - the unit that set its property to is not a candidate (unless one of its ancestors set - their property to ). You can find more details on candidates and kill behavior at + When set to , the unit becomes a candidate for monitoring by + systemd-oomd. If the cgroup passes the limits set by + oomd.conf5 or + the unit configuration, systemd-oomd will select a descendant cgroup and send + SIGKILL to all of the processes under it. You can find more details on + candidates and kill behavior at systemd-oomd.service8 - and oomd.conf5. Setting - either of these properties to will also automatically acquire - After= and Wants= dependencies on - systemd-oomd.service unless DefaultDependencies=no. - + and + oomd.conf5. - When set to , systemd-oomd will not actively use this cgroup's - data for monitoring and detection. However, if an ancestor cgroup has one of these properties set to - , a unit with can still be an eligible candidate for - systemd-oomd to act on. + Setting either of these properties to will also result in + After= and Wants= dependencies on + systemd-oomd.service unless DefaultDependencies=no. + + When set to , systemd-oomd will not actively use this + cgroup's data for monitoring and detection. However, if an ancestor cgroup has one of these + properties set to , a unit with can still be a candidate + for systemd-oomd to terminate.
diff --git a/man/systemd.service.xml b/man/systemd.service.xml
index 4e4a9732e4..ad303d440b 100644
--- a/man/systemd.service.xml
+++ b/man/systemd.service.xml
@@ -1130,8 +1130,12 @@
killed by the kernel's OOM killer this is logged but the service continues running. If set to stop the event is logged but the service is terminated cleanly by the service manager.
If set to kill and one of the service's processes is killed by the OOM - killer the kernel is instructed to kill all remaining processes of the service, too. Defaults to the - setting DefaultOOMPolicy= in + killer the kernel is instructed to kill all remaining processes of the service too, by setting the + memory.oom.group attribute to 1; also see kernel documentation. + + + Defaults to the setting DefaultOOMPolicy= in systemd-system.conf5 is set to, except for services where Delegate= is turned on, where it defaults to continue. @@ -1142,9 +1146,9 @@ systemd.exec5 for details. - This setting also applies to systemd-oomd, similar to kernel OOM kills - this setting determines the state of the service after systemd-oomd kills a cgroup associated - with the service. + This setting also applies to systemd-oomd, similar to the kernel OOM kills + this setting determines the state of the service after systemd-oomd kills a cgroup + associated with the service.
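The candidate rule that this patch documents for systemd-oomd.service can be illustrated with a small Python model. It makes three points concrete:

- Only descendants of a monitored cgroup are considered.
- Among those, only leaf cgroups and cgroups with memory.oom.group set to 1 qualify.
- Action is taken only when a pressure limit is exceeded.

This is a sketch, not systemd-oomd's implementation. The `Cgroup` class, `parse_psi()`, `pressure_exceeds()` and `eligible_candidates()` are hypothetical names. The tree is an in-memory stand-in for the cgroup filesystem, and the cgroup paths are made up. Comparing the "full" avg10 value against the limit is also an assumption, because the patch does not say which PSI figure is compared. `eligible_candidates()` follows the man page text literally, so it also descends into cgroups that have memory.oom.group set.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cgroup:
    """Hypothetical in-memory stand-in for a cgroup directory (not a systemd API)."""
    path: str
    oom_group: bool = False  # models memory.oom.group == 1
    children: list[Cgroup] = field(default_factory=list)


def parse_psi(text: str) -> dict[str, dict[str, float]]:
    """Parse PSI text such as the contents of a cgroup's memory.pressure file.

    Each line looks like: "some avg10=0.00 avg60=0.00 avg300=0.00 total=0".
    """
    result: dict[str, dict[str, float]] = {}
    for line in text.strip().splitlines():
        kind, *pairs = line.split()
        result[kind] = {key: float(value) for key, value in (p.split("=", 1) for p in pairs)}
    return result


def pressure_exceeds(psi: dict[str, dict[str, float]], limit_percent: float,
                     kind: str = "full", window: str = "avg10") -> bool:
    """Assumption: compare one PSI average (default "full" avg10) to a percentage limit."""
    return psi[kind][window] > limit_percent


def eligible_candidates(monitored: Cgroup) -> list[str]:
    """Literal reading of the man page: descendants that are leaves or have memory.oom.group=1.

    The monitored cgroup itself is never returned.
    """
    found: list[str] = []

    def walk(node: Cgroup) -> None:
        for child in node.children:
            if child.oom_group or not child.children:
                found.append(child.path)
            walk(child)

    walk(monitored)
    return found


if __name__ == "__main__":
    sample = ("some avg10=72.50 avg60=40.00 avg300=10.00 total=123456\n"
              "full avg10=65.00 avg60=30.00 avg300=8.00 total=98765\n")
    psi = parse_psi(sample)
    print(pressure_exceeds(psi, 60.0))  # True: full avg10 is 65.00, above 60

    # Hypothetical hierarchy below a monitored user.slice.
    tree = Cgroup("user.slice", children=[
        Cgroup("user.slice/user-1000.slice", children=[
            Cgroup("user.slice/user-1000.slice/session-1.scope"),
            Cgroup("user.slice/user-1000.slice/user@1000.service", oom_group=True,
                   children=[Cgroup("user.slice/user-1000.slice/user@1000.service/app.slice")]),
        ]),
    ])
    for path in eligible_candidates(tree):
        print(path)
```

In this example, user.slice (the monitored unit) is never a candidate. The intermediate user-1000.slice is not a candidate either, because it has children and no memory.oom.group. The leaf session-1.scope, the memory.oom.group=1 service, and the leaf app.slice are all candidates.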