mirror of
https://github.com/systemd/systemd.git
synced 2025-05-27 21:05:55 +03:00
Merge pull request #26393 from poettering/mempress
watch and act on memory pressure in most of our long-running services, including PID 1
commit
adee01643d
5
TODO
@ -159,6 +159,11 @@ Features:
|
||||
invokes systemd-mount and exits. This is then useful to use in
|
||||
ENV{SYSTEMD_WANTS} in udev rules, and a bit prettier than using RUN+=
|
||||
|
||||
* udevd: extend memory pressure logic: also kill any idle worker processes
|
||||
|
||||
* SIGRTMIN+18 and memory pressure handling should still be added to: hostnamed,
|
||||
localed, oomd, timedated.
|
||||
|
||||
* sd-journal puts a limit on parallel journal files to view at once. journald
|
||||
should probably honour that same limit (JOURNAL_FILES_MAX) when vacuuming to
|
||||
ensure we never generate more files than we can actually view.
|
||||
|
240
docs/MEMORY_PRESSURE.md
Normal file
@ -0,0 +1,240 @@
|
||||
---
|
||||
title: Memory Pressure Handling
|
||||
category: Interfaces
|
||||
layout: default
|
||||
SPDX-License-Identifier: LGPL-2.1-or-later
|
||||
---
|
||||
|
||||
# Memory Pressure Handling in systemd
|
||||
|
||||
When the system is under memory pressure (i.e. some component of the OS
|
||||
requires memory allocation but there is only very little or none available),
|
||||
it can attempt various things to make more memory available again ("reclaim"):
|
||||
|
||||
* The kernel can flush out memory pages backed by files on disk, under the
|
||||
knowledge that it can reread them from disk when needed again. Candidate
|
||||
pages are the many memory mapped executable files and shared libraries on
|
||||
disk, among others.
|
||||
|
||||
* The kernel can flush out memory pages not backed by files on disk
|
||||
("anonymous" memory, i.e. memory allocated via `malloc()` and similar calls,
|
||||
or `tmpfs` file system contents) if there's swap to write it to.
|
||||
|
||||
* Userspace can proactively release memory it allocated but doesn't immediately
|
||||
require back to the kernel. This includes allocation caches, and other forms
|
||||
of caches that are not required for normal operation to continue.
|
||||
|
||||
The latter is what we want to focus on in this document: how to ensure
|
||||
userspace processes can detect mounting memory pressure early and release memory
|
||||
back to the kernel as it happens, relieving the memory pressure before it
|
||||
becomes too critical.
|
||||
|
||||
The effects of memory pressure during runtime are generally growing latencies
|
||||
during operation: when a program requires memory but the system is busy writing
|
||||
out memory to (relatively slow) disks in order to make some available, this
|
||||
generally surfaces in scheduling latencies, and applications and services will
|
||||
slow down until memory pressure is relieved. Hence, to ensure stable service
|
||||
latencies it is essential to release unneeded memory back to the kernel early
|
||||
on.
|
||||
|
||||
On Linux the [Pressure Stall Information
|
||||
(PSI)](https://docs.kernel.org/accounting/psi.html) Linux kernel interface is
|
||||
the primary way to determine whether the system or a part of it is under memory
|
||||
pressure. PSI provides a way for userspace to acquire a `poll()`-able file
|
||||
descriptor that gets notifications whenever memory pressure latencies for the
|
||||
system or for a control group grow beyond some level.
|
||||
|
||||
`systemd` itself makes use of PSI, and helps applications to do so
|
||||
too. Specifically:
|
||||
|
||||
* Most of systemd's long running components watch for PSI memory pressure
|
||||
events, and release allocation caches and other resources once seen.
|
||||
|
||||
* systemd's service manager provides a protocol for asking services to listen
|
||||
to PSI events and configure the appropriate pressure thresholds.
|
||||
|
||||
* systemd's `sd-event` event loop API provides a high-level call
|
||||
`sd_event_add_memory_pressure()` which allows programs using it to
|
||||
efficiently hook into the PSI memory pressure protocol provided by the
|
||||
service manager, with very few lines of code.
|
||||
|
||||
## Memory Pressure Service Protocol
|
||||
|
||||
If memory pressure handling for a specific service is enabled via
|
||||
`MemoryPressureWatch=` the memory pressure service protocol is used to tell the
|
||||
service code about this. Specifically two environment variables are set by the
|
||||
service manager, and typically consumed by the service:
|
||||
|
||||
* The `$MEMORY_PRESSURE_WATCH` environment variable will contain an absolute
|
||||
path in the file system to the file to watch for memory pressure events. This
|
||||
will usually point to a PSI file such as the `memory.pressure` file of the
|
||||
service's cgroup. In order to make debugging easier, and allow later
|
||||
extension it is recommended for applications to also allow this path to refer
|
||||
to an `AF_UNIX` stream socket in the file system or a FIFO inode in the file
|
||||
system. Regardless which of the three types of inodes this absolute path
|
||||
refers to, all three are `poll()`-able for memory pressure events. The
|
||||
variable can also be set to the literal string `/dev/null`. If so the service
|
||||
code should take this as indication that memory pressure monitoring is not
|
||||
desired and should be turned off.
|
||||
|
||||
* The `$MEMORY_PRESSURE_WRITE` environment variable is optional. If set by the
|
||||
service manager it contains Base64 encoded data (that may contain arbitrary
|
||||
binary values, including NUL bytes) that should be written into the path
|
||||
provided via `$MEMORY_PRESSURE_WATCH` right after opening it. Typically, if
|
||||
talking directly to a PSI kernel file this will contain information about the
|
||||
threshold settings configurable in the service manager.
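
Since the C standard library has no Base64 decoder, a service that does not use
`sd-event` has to bring its own. The following is a minimal sketch of a decoder
that could be applied to `$MEMORY_PRESSURE_WRITE`. The names `b64_value()` and
`b64_decode()` are hypothetical, and the code assumes the standard Base64
alphabet with optional `=` padding and no embedded whitespace. It returns a
byte count rather than a C string, because the decoded data may contain NUL
bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Maps one Base64 character to its 6-bit value, or -1 if it is invalid. */
static int b64_value(char c) {
        if (c >= 'A' && c <= 'Z')
                return c - 'A';
        if (c >= 'a' && c <= 'z')
                return c - 'a' + 26;
        if (c >= '0' && c <= '9')
                return c - '0' + 52;
        if (c == '+')
                return 62;
        if (c == '/')
                return 63;
        return -1;
}

/* Decodes Base64 text into 'out'. Decoding stops at the first '=' padding
 * character or at the end of the string. Returns the number of bytes written,
 * or -1 on invalid input or if 'out' is too small. */
static long b64_decode(const char *in, unsigned char *out, size_t out_size) {
        unsigned acc = 0; /* bit accumulator, only the low bits are used */
        int bits = 0;     /* number of valid bits currently in 'acc' */
        size_t n = 0;

        assert(in);
        assert(out);

        for (; *in && *in != '='; in++) {
                int v = b64_value(*in);
                if (v < 0)
                        return -1;

                acc = (acc << 6) | (unsigned) v;
                bits += 6;

                if (bits >= 8) {
                        bits -= 8;
                        if (n >= out_size)
                                return -1;
                        out[n++] = (unsigned char) ((acc >> bits) & 0xFF);
                }
        }

        return (long) n;
}
```

The decoded bytes would then be passed to a single `write()` call on the file
descriptor obtained from `$MEMORY_PRESSURE_WATCH`, using the returned length.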
|
||||
|
||||
When a service initializes it hence should look for
|
||||
`$MEMORY_PRESSURE_WATCH`. If set, it should try to open the specified path. If
|
||||
it detects the path to refer to a regular file it should assume it refers to a
|
||||
PSI kernel file. If so, it should write the data from `$MEMORY_PRESSURE_WRITE`
|
||||
into the file descriptor (after Base64-decoding it, and only if the variable is
|
||||
set) and then watch for `POLLPRI` events on it. If it detects the path refers
|
||||
to a FIFO inode, it should open it, write the `$MEMORY_PRESSURE_WRITE` data
|
||||
into it (as above) and then watch for `POLLIN` events on it. Whenever `POLLIN`
|
||||
is seen it should read and discard any data queued in the FIFO. If the path
|
||||
refers to an `AF_UNIX` socket in the file system, the application should
|
||||
`connect()` a stream socket to it, write `$MEMORY_PRESSURE_WRITE` into it (as
|
||||
above) and watch for `POLLIN`, discarding any data it might receive.
|
||||
|
||||
To summarize:
|
||||
|
||||
* If `$MEMORY_PRESSURE_WATCH` points to a regular file: open and watch for
|
||||
`POLLPRI`, never read from the file descriptor.
|
||||
|
||||
* If `$MEMORY_PRESSURE_WATCH` points to a FIFO: open and watch for `POLLIN`,
|
||||
read/discard any incoming data.
|
||||
|
||||
* If `$MEMORY_PRESSURE_WATCH` points to an `AF_UNIX` socket: connect and watch
|
||||
for `POLLIN`, read/discard any incoming data.
|
||||
|
||||
* If `$MEMORY_PRESSURE_WATCH` contains the literal string `/dev/null`, turn off
|
||||
memory pressure handling.
|
||||
|
||||
(And in each case, immediately after opening/connecting to the path, write the
|
||||
decoded `$MEMORY_PRESSURE_WRITE` data into it.)
|
||||
|
||||
Whenever a `POLLPRI`/`POLLIN` event is seen the service is under memory
|
||||
pressure. It should use this as a hint to release suitable redundant resources,
|
||||
for example:
|
||||
|
||||
* glibc's memory allocation cache, via
|
||||
[`malloc_trim()`](https://man7.org/linux/man-pages/man3/malloc_trim.3.html). Similarly,
|
||||
allocation caches implemented in the service itself.
|
||||
|
||||
* Any other local caches, such as DNS caches or web caches (in particular if
|
||||
the service is a web browser).
|
||||
|
||||
* Terminate any idle worker threads or processes.
|
||||
|
||||
* Run a garbage collection (GC) cycle, if the programming language supports that.
|
||||
|
||||
* Terminate the process if idle, and if it can be automatically started when
|
||||
needed next.
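
To make the first two items a little more concrete, here is a minimal sketch of
a release routine that drops an application-level cache and then asks glibc to
return free heap memory to the kernel. The `Cache` type and the functions
`cache_put()`, `cache_flush()` and `release_on_pressure()` are invented for
this example. `malloc_trim()` is glibc-specific and declared in `<malloc.h>`,
so the sketch assumes a glibc-based system.

```c
#include <assert.h>
#include <malloc.h>
#include <stdlib.h>

#define CACHE_MAX 16

/* Hypothetical application cache: a fixed table of heap buffers that can be
 * regenerated on demand and is therefore safe to drop at any time. */
typedef struct {
        void *entries[CACHE_MAX];
        size_t n_entries;
} Cache;

/* Adds a buffer of the given size to the cache. Returns 0 on success, -1 if
 * the cache is full or the allocation failed. */
static int cache_put(Cache *c, size_t size) {
        void *p;

        assert(c);

        if (c->n_entries >= CACHE_MAX)
                return -1;

        p = malloc(size);
        if (!p)
                return -1;

        c->entries[c->n_entries++] = p;
        return 0;
}

/* Frees all cached buffers and returns how many were dropped. */
static size_t cache_flush(Cache *c) {
        size_t n;

        assert(c);

        n = c->n_entries;
        while (c->n_entries > 0)
                free(c->entries[--c->n_entries]);

        return n;
}

/* To be called when a memory pressure event is seen. */
static size_t release_on_pressure(Cache *c) {
        size_t n = cache_flush(c);

        /* Hand free heap memory back to the kernel (glibc only). */
        (void) malloc_trim(0);

        return n;
}
```

The order matters: the service's own caches are freed first, so that
`malloc_trim()` has something to give back.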
|
||||
|
||||
Which actions precisely to take depends on the service in question. Note that
|
||||
the notifications are delivered when memory allocation latency already degraded
|
||||
beyond some point. Hence when discussing which resources to keep and which ones
|
||||
to discard, keep in mind that higher latencies for recovering the discarded
resources later are typically acceptable, given that latencies are *already*
affected negatively.
|
||||
|
||||
In case the path supplied via `$MEMORY_PRESSURE_WATCH` points to a PSI kernel
|
||||
API file, or to an `AF_UNIX` socket, opening it multiple times is safe and reliable,
|
||||
and should deliver notifications to each of the opened file descriptors. This
|
||||
is specifically useful for services that consist of multiple processes, and
|
||||
where each of them shall be able to release resources on memory pressure.
|
||||
|
||||
The `POLLPRI`/`POLLIN` conditions will be triggered every time memory pressure
|
||||
is detected, but not continuously. It is thus safe to keep `poll()`-ing on the
|
||||
same file descriptor continuously, and executing resource release operations
|
||||
whenever the file descriptor triggers without having to expect overloading the
|
||||
process.
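
A single wait step of such a loop might look like the sketch below. The
function name `wait_for_pressure()` and its return convention are assumptions
of this example. For the FIFO and socket cases it reads and discards queued
data once per wakeup. For PSI files, where only `POLLPRI` is requested, it
never reads.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Waits up to 'timeout_ms' milliseconds (-1 for no timeout) for a memory
 * pressure notification on 'fd'. 'events' is POLLPRI for PSI files and POLLIN
 * for FIFOs and sockets. Returns 1 if a notification was seen, 0 on timeout
 * and -1 on error or if the event source went away. */
static int wait_for_pressure(int fd, short events, int timeout_ms) {
        struct pollfd p = { .fd = fd, .events = events };
        int r;

        assert(fd >= 0);

        do
                r = poll(&p, 1, timeout_ms);
        while (r < 0 && errno == EINTR);
        if (r <= 0)
                return r;

        if (p.revents & (POLLERR|POLLNVAL))
                return -1;

        /* For example only POLLHUP was reported: the peer is gone. */
        if (!(p.revents & events))
                return -1;

        if (p.revents & POLLIN) {
                char buf[4096];

                /* Discard queued data. If more is pending, poll() simply
                 * reports POLLIN again on the next call. */
                if (read(fd, buf, sizeof(buf)) < 0 && errno != EAGAIN && errno != EINTR)
                        return -1;
        }

        return 1;
}
```

A service would call this in a loop (or integrate the descriptor into its
existing event loop) and run its release operations each time 1 is returned.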
|
||||
|
||||
(Currently, the protocol defined here only allows configuration of a single
|
||||
"degree" of memory pressure, there's no distinction made on how strong the
|
||||
pressure is. In future, if it becomes apparent that there's clear need to
|
||||
extend this we might eventually add different degrees, most likely by adding
|
||||
additional environment variables such as `$MEMORY_PRESSURE_WRITE_LOW` and
|
||||
`$MEMORY_PRESSURE_WRITE_HIGH` or similar, which may contain different settings
|
||||
for lower or higher memory pressure thresholds.)
|
||||
|
||||
## Service Manager Settings
|
||||
|
||||
The service manager provides two per-service settings that control the memory
|
||||
pressure handling:
|
||||
|
||||
* The
|
||||
[`MemoryPressureWatch=`](https://www.freedesktop.org/software/systemd/man/systemd.resource-control.html#MemoryPressureWatch=)
|
||||
setting controls whether to enable the memory pressure protocol for the
|
||||
service in question.
|
||||
|
||||
* The `MemoryPressureThresholdSec=` setting allows configuring the threshold
|
||||
when to signal memory pressure to the services. It takes a time value
|
||||
(usually in the millisecond range) that defines a threshold per 1s time
|
||||
window: if memory allocation latencies grow beyond this threshold
|
||||
notifications are generated towards the service, requesting it to release
|
||||
resources.
|
||||
|
||||
The `/etc/systemd/system.conf` file provides two settings that may be used to
|
||||
select the default values for the above settings. If the threshold is neither
|
||||
configured via the per-service nor via the default system-wide option, it
|
||||
defaults to 100ms.
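
As an illustration, a per-service drop-in might look like the following. The
unit name `example.service`, the drop-in file name and the 200ms value are
made up for this example.

```ini
# /etc/systemd/system/example.service.d/memory-pressure.conf (hypothetical)
[Service]
MemoryPressureWatch=on
MemoryPressureThresholdSec=200ms
```

The system-wide defaults are named `DefaultMemoryPressureWatch=` and
`DefaultMemoryPressureThresholdSec=` in the `systemd-system.conf` man page
changes of this commit. The values below are the documented defaults.

```ini
# /etc/systemd/system.conf
[Manager]
DefaultMemoryPressureWatch=auto
DefaultMemoryPressureThresholdSec=100ms
```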
|
||||
|
||||
When memory pressure monitoring is enabled for a service via
|
||||
`MemoryPressureWatch=` this primarily does three things:
|
||||
|
||||
* It enables cgroup memory accounting for the service (this is a requirement
|
||||
for per-cgroup PSI).
|
||||
|
||||
* It sets the aforementioned two environment variables for processes invoked
|
||||
for the service, based on the control group of the service and provided
|
||||
settings.
|
||||
|
||||
* The `memory.pressure` PSI control group file associated with the service's
|
||||
cgroup is delegated to the service (i.e. permissions are relaxed so that
|
||||
unprivileged service payload code can open the file for writing).
|
||||
|
||||
## Memory Pressure Events in `sd-event`
|
||||
|
||||
The
|
||||
[`sd-event`](https://www.freedesktop.org/software/systemd/man/sd-event.html)
|
||||
event loop library provides two API calls that encapsulate the
|
||||
functionality described above:
|
||||
|
||||
* The
|
||||
[`sd_event_add_memory_pressure()`](https://www.freedesktop.org/software/systemd/man/sd_event_add_memory_pressure.html)
|
||||
call implements the service-side of the memory pressure protocol and
|
||||
integrates it with an `sd-event` event loop. It reads the two environment
|
||||
variables, connects/opens the specified file, writes the specified data
|
||||
to it and then watches for events.
|
||||
|
||||
* The `sd_event_trim_memory()` call may be called to trim the calling
|
||||
process's memory. It's a wrapper around glibc's `malloc_trim()`, but first
|
||||
releases allocation caches maintained by libsystemd internally. If the
|
||||
callback function passed to `sd_event_add_memory_pressure()` is passed as
|
||||
`NULL` this function is called as the default implementation.
|
||||
|
||||
Making use of this, in order to hook up a service using `sd-event` with
|
||||
automatic memory pressure handling, it's typically sufficient to add a line
|
||||
such as:
|
||||
|
||||
```c
|
||||
(void) sd_event_add_memory_pressure(event, NULL, NULL, NULL);
|
||||
```
|
||||
|
||||
– right after allocating the event loop object `event`.
|
||||
|
||||
## Other APIs
|
||||
|
||||
Other programming environments might have native APIs to watch memory
|
||||
pressure/low memory events. Most notable is probably GLib's
|
||||
[GMemoryMonitor](https://developer-old.gnome.org/gio/stable/GMemoryMonitor.html). It
|
||||
currently uses the per-system Linux PSI interface as backend, but it operates
|
||||
differently than the above: memory pressure events are picked up by a system
|
||||
service, which then propagates this through D-Bus to the applications. This is
|
||||
typically less than ideal, since this means each notification event has to
|
||||
travel through three processes before being handled, and this creates
|
||||
additional latencies at a time where the system is already experiencing adverse
|
||||
latencies. Moreover, it focusses on system-wide PSI events, even though
|
||||
service-local ones are generally the better approach.
|
@ -529,6 +529,10 @@ node /org/freedesktop/systemd1 {
|
||||
readonly t DefaultLimitRTTIMESoft = ...;
|
||||
@org.freedesktop.DBus.Property.EmitsChangedSignal("false")
|
||||
readonly t DefaultTasksMax = ...;
|
||||
@org.freedesktop.DBus.Property.EmitsChangedSignal("false")
|
||||
readonly t DefaultMemoryPressureThresholdUSec = ...;
|
||||
@org.freedesktop.DBus.Property.EmitsChangedSignal("false")
|
||||
readonly s DefaultMemoryPressureWatch = '...';
|
||||
@org.freedesktop.DBus.Property.EmitsChangedSignal("const")
|
||||
readonly t TimerSlackNSec = ...;
|
||||
@org.freedesktop.DBus.Property.EmitsChangedSignal("const")
|
||||
@ -782,6 +786,10 @@ node /org/freedesktop/systemd1 {
|
||||
|
||||
<!--property DefaultTasksMax is not documented!-->
|
||||
|
||||
<!--property DefaultMemoryPressureThresholdUSec is not documented!-->
|
||||
|
||||
<!--property DefaultMemoryPressureWatch is not documented!-->
|
||||
|
||||
<!--property TimerSlackNSec is not documented!-->
|
||||
|
||||
<!--property DefaultOOMPolicy is not documented!-->
|
||||
@ -1208,6 +1216,10 @@ node /org/freedesktop/systemd1 {
|
||||
|
||||
<variablelist class="dbus-property" generated="True" extra-ref="DefaultTasksMax"/>
|
||||
|
||||
<variablelist class="dbus-property" generated="True" extra-ref="DefaultMemoryPressureThresholdUSec"/>
|
||||
|
||||
<variablelist class="dbus-property" generated="True" extra-ref="DefaultMemoryPressureWatch"/>
|
||||
|
||||
<variablelist class="dbus-property" generated="True" extra-ref="TimerSlackNSec"/>
|
||||
|
||||
<variablelist class="dbus-property" generated="True" extra-ref="DefaultOOMPolicy"/>
|
||||
@ -2803,6 +2815,10 @@ node /org/freedesktop/systemd1/unit/avahi_2ddaemon_2eservice {
|
||||
readonly a(iiqq) SocketBindDeny = [...];
|
||||
@org.freedesktop.DBus.Property.EmitsChangedSignal("false")
|
||||
readonly (bas) RestrictNetworkInterfaces = ...;
|
||||
@org.freedesktop.DBus.Property.EmitsChangedSignal("false")
|
||||
readonly s MemoryPressureWatch = '...';
|
||||
@org.freedesktop.DBus.Property.EmitsChangedSignal("false")
|
||||
readonly t MemoryPressureThresholdUSec = ...;
|
||||
@org.freedesktop.DBus.Property.EmitsChangedSignal("const")
|
||||
readonly as Environment = ['...', ...];
|
||||
@org.freedesktop.DBus.Property.EmitsChangedSignal("const")
|
||||
@ -3395,6 +3411,10 @@ node /org/freedesktop/systemd1/unit/avahi_2ddaemon_2eservice {
|
||||
|
||||
<!--property RestrictNetworkInterfaces is not documented!-->
|
||||
|
||||
<!--property MemoryPressureWatch is not documented!-->
|
||||
|
||||
<!--property MemoryPressureThresholdUSec is not documented!-->
|
||||
|
||||
<!--property EnvironmentFiles is not documented!-->
|
||||
|
||||
<!--property PassEnvironment is not documented!-->
|
||||
@ -3995,6 +4015,10 @@ node /org/freedesktop/systemd1/unit/avahi_2ddaemon_2eservice {
|
||||
|
||||
<variablelist class="dbus-property" generated="True" extra-ref="RestrictNetworkInterfaces"/>
|
||||
|
||||
<variablelist class="dbus-property" generated="True" extra-ref="MemoryPressureWatch"/>
|
||||
|
||||
<variablelist class="dbus-property" generated="True" extra-ref="MemoryPressureThresholdUSec"/>
|
||||
|
||||
<variablelist class="dbus-property" generated="True" extra-ref="Environment"/>
|
||||
|
||||
<variablelist class="dbus-property" generated="True" extra-ref="EnvironmentFiles"/>
|
||||
@ -4747,6 +4771,10 @@ node /org/freedesktop/systemd1/unit/avahi_2ddaemon_2esocket {
|
||||
readonly a(iiqq) SocketBindDeny = [...];
|
||||
@org.freedesktop.DBus.Property.EmitsChangedSignal("false")
|
||||
readonly (bas) RestrictNetworkInterfaces = ...;
|
||||
@org.freedesktop.DBus.Property.EmitsChangedSignal("false")
|
||||
readonly s MemoryPressureWatch = '...';
|
||||
@org.freedesktop.DBus.Property.EmitsChangedSignal("false")
|
||||
readonly t MemoryPressureThresholdUSec = ...;
|
||||
@org.freedesktop.DBus.Property.EmitsChangedSignal("const")
|
||||
readonly as Environment = ['...', ...];
|
||||
@org.freedesktop.DBus.Property.EmitsChangedSignal("const")
|
||||
@ -5359,6 +5387,10 @@ node /org/freedesktop/systemd1/unit/avahi_2ddaemon_2esocket {
|
||||
|
||||
<!--property RestrictNetworkInterfaces is not documented!-->
|
||||
|
||||
<!--property MemoryPressureWatch is not documented!-->
|
||||
|
||||
<!--property MemoryPressureThresholdUSec is not documented!-->
|
||||
|
||||
<!--property EnvironmentFiles is not documented!-->
|
||||
|
||||
<!--property PassEnvironment is not documented!-->
|
||||
@ -5949,6 +5981,10 @@ node /org/freedesktop/systemd1/unit/avahi_2ddaemon_2esocket {
|
||||
|
||||
<variablelist class="dbus-property" generated="True" extra-ref="RestrictNetworkInterfaces"/>
|
||||
|
||||
<variablelist class="dbus-property" generated="True" extra-ref="MemoryPressureWatch"/>
|
||||
|
||||
<variablelist class="dbus-property" generated="True" extra-ref="MemoryPressureThresholdUSec"/>
|
||||
|
||||
<variablelist class="dbus-property" generated="True" extra-ref="Environment"/>
|
||||
|
||||
<variablelist class="dbus-property" generated="True" extra-ref="EnvironmentFiles"/>
|
||||
@ -6590,6 +6626,10 @@ node /org/freedesktop/systemd1/unit/home_2emount {
|
||||
readonly a(iiqq) SocketBindDeny = [...];
|
||||
@org.freedesktop.DBus.Property.EmitsChangedSignal("false")
|
||||
readonly (bas) RestrictNetworkInterfaces = ...;
|
||||
@org.freedesktop.DBus.Property.EmitsChangedSignal("false")
|
||||
readonly s MemoryPressureWatch = '...';
|
||||
@org.freedesktop.DBus.Property.EmitsChangedSignal("false")
|
||||
readonly t MemoryPressureThresholdUSec = ...;
|
||||
@org.freedesktop.DBus.Property.EmitsChangedSignal("const")
|
||||
readonly as Environment = ['...', ...];
|
||||
@org.freedesktop.DBus.Property.EmitsChangedSignal("const")
|
||||
@ -7130,6 +7170,10 @@ node /org/freedesktop/systemd1/unit/home_2emount {
|
||||
|
||||
<!--property RestrictNetworkInterfaces is not documented!-->
|
||||
|
||||
<!--property MemoryPressureWatch is not documented!-->
|
||||
|
||||
<!--property MemoryPressureThresholdUSec is not documented!-->
|
||||
|
||||
<!--property EnvironmentFiles is not documented!-->
|
||||
|
||||
<!--property PassEnvironment is not documented!-->
|
||||
@ -7638,6 +7682,10 @@ node /org/freedesktop/systemd1/unit/home_2emount {
|
||||
|
||||
<variablelist class="dbus-property" generated="True" extra-ref="RestrictNetworkInterfaces"/>
|
||||
|
||||
<variablelist class="dbus-property" generated="True" extra-ref="MemoryPressureWatch"/>
|
||||
|
||||
<variablelist class="dbus-property" generated="True" extra-ref="MemoryPressureThresholdUSec"/>
|
||||
|
||||
<variablelist class="dbus-property" generated="True" extra-ref="Environment"/>
|
||||
|
||||
<variablelist class="dbus-property" generated="True" extra-ref="EnvironmentFiles"/>
|
||||
@ -8406,6 +8454,10 @@ node /org/freedesktop/systemd1/unit/dev_2dsda3_2eswap {
|
||||
readonly a(iiqq) SocketBindDeny = [...];
|
||||
@org.freedesktop.DBus.Property.EmitsChangedSignal("false")
|
||||
readonly (bas) RestrictNetworkInterfaces = ...;
|
||||
@org.freedesktop.DBus.Property.EmitsChangedSignal("false")
|
||||
readonly s MemoryPressureWatch = '...';
|
||||
@org.freedesktop.DBus.Property.EmitsChangedSignal("false")
|
||||
readonly t MemoryPressureThresholdUSec = ...;
|
||||
@org.freedesktop.DBus.Property.EmitsChangedSignal("const")
|
||||
readonly as Environment = ['...', ...];
|
||||
@org.freedesktop.DBus.Property.EmitsChangedSignal("const")
|
||||
@ -8932,6 +8984,10 @@ node /org/freedesktop/systemd1/unit/dev_2dsda3_2eswap {
|
||||
|
||||
<!--property RestrictNetworkInterfaces is not documented!-->
|
||||
|
||||
<!--property MemoryPressureWatch is not documented!-->
|
||||
|
||||
<!--property MemoryPressureThresholdUSec is not documented!-->
|
||||
|
||||
<!--property EnvironmentFiles is not documented!-->
|
||||
|
||||
<!--property PassEnvironment is not documented!-->
|
||||
@ -9426,6 +9482,10 @@ node /org/freedesktop/systemd1/unit/dev_2dsda3_2eswap {
|
||||
|
||||
<variablelist class="dbus-property" generated="True" extra-ref="RestrictNetworkInterfaces"/>
|
||||
|
||||
<variablelist class="dbus-property" generated="True" extra-ref="MemoryPressureWatch"/>
|
||||
|
||||
<variablelist class="dbus-property" generated="True" extra-ref="MemoryPressureThresholdUSec"/>
|
||||
|
||||
<variablelist class="dbus-property" generated="True" extra-ref="Environment"/>
|
||||
|
||||
<variablelist class="dbus-property" generated="True" extra-ref="EnvironmentFiles"/>
|
||||
@ -10053,6 +10113,10 @@ node /org/freedesktop/systemd1/unit/system_2eslice {
|
||||
readonly a(iiqq) SocketBindDeny = [...];
|
||||
@org.freedesktop.DBus.Property.EmitsChangedSignal("false")
|
||||
readonly (bas) RestrictNetworkInterfaces = ...;
|
||||
@org.freedesktop.DBus.Property.EmitsChangedSignal("false")
|
||||
readonly s MemoryPressureWatch = '...';
|
||||
@org.freedesktop.DBus.Property.EmitsChangedSignal("false")
|
||||
readonly t MemoryPressureThresholdUSec = ...;
|
||||
};
|
||||
interface org.freedesktop.DBus.Peer { ... };
|
||||
interface org.freedesktop.DBus.Introspectable { ... };
|
||||
@ -10219,6 +10283,10 @@ node /org/freedesktop/systemd1/unit/system_2eslice {
|
||||
|
||||
<!--property RestrictNetworkInterfaces is not documented!-->
|
||||
|
||||
<!--property MemoryPressureWatch is not documented!-->
|
||||
|
||||
<!--property MemoryPressureThresholdUSec is not documented!-->
|
||||
|
||||
<!--Autogenerated cross-references for systemd.directives, do not edit-->
|
||||
|
||||
<variablelist class="dbus-interface" generated="True" extra-ref="org.freedesktop.systemd1.Unit"/>
|
||||
@ -10391,6 +10459,10 @@ node /org/freedesktop/systemd1/unit/system_2eslice {
|
||||
|
||||
<variablelist class="dbus-property" generated="True" extra-ref="RestrictNetworkInterfaces"/>
|
||||
|
||||
<variablelist class="dbus-property" generated="True" extra-ref="MemoryPressureWatch"/>
|
||||
|
||||
<variablelist class="dbus-property" generated="True" extra-ref="MemoryPressureThresholdUSec"/>
|
||||
|
||||
<!--End of Autogenerated section-->
|
||||
|
||||
<refsect2>
|
||||
@ -10586,6 +10658,10 @@ node /org/freedesktop/systemd1/unit/session_2d1_2escope {
|
||||
readonly a(iiqq) SocketBindDeny = [...];
|
||||
@org.freedesktop.DBus.Property.EmitsChangedSignal("false")
|
||||
readonly (bas) RestrictNetworkInterfaces = ...;
|
||||
@org.freedesktop.DBus.Property.EmitsChangedSignal("false")
|
||||
readonly s MemoryPressureWatch = '...';
|
||||
@org.freedesktop.DBus.Property.EmitsChangedSignal("false")
|
||||
readonly t MemoryPressureThresholdUSec = ...;
|
||||
@org.freedesktop.DBus.Property.EmitsChangedSignal("const")
|
||||
readonly s KillMode = '...';
|
||||
@org.freedesktop.DBus.Property.EmitsChangedSignal("const")
|
||||
@ -10772,6 +10848,10 @@ node /org/freedesktop/systemd1/unit/session_2d1_2escope {
|
||||
|
||||
<!--property RestrictNetworkInterfaces is not documented!-->
|
||||
|
||||
<!--property MemoryPressureWatch is not documented!-->
|
||||
|
||||
<!--property MemoryPressureThresholdUSec is not documented!-->
|
||||
|
||||
<!--property KillMode is not documented!-->
|
||||
|
||||
<!--property KillSignal is not documented!-->
|
||||
@ -10974,6 +11054,10 @@ node /org/freedesktop/systemd1/unit/session_2d1_2escope {
|
||||
|
||||
<variablelist class="dbus-property" generated="True" extra-ref="RestrictNetworkInterfaces"/>
|
||||
|
||||
<variablelist class="dbus-property" generated="True" extra-ref="MemoryPressureWatch"/>
|
||||
|
||||
<variablelist class="dbus-property" generated="True" extra-ref="MemoryPressureThresholdUSec"/>
|
||||
|
||||
<variablelist class="dbus-property" generated="True" extra-ref="KillMode"/>
|
||||
|
||||
<variablelist class="dbus-property" generated="True" extra-ref="KillSignal"/>
|
||||
|
@ -160,6 +160,9 @@
|
||||
accessible for invocation at any time (see above). This function will log a structured log message at
|
||||
<constant>LOG_DEBUG</constant> level (with message ID f9b0be465ad540d0850ad32172d57c21) about the memory
|
||||
pressure operation.</para>
|
||||
|
||||
<para>For further details see <ulink url="https://systemd.io/MEMORY_PRESSURE">Memory Pressure Handling in
|
||||
systemd</ulink>.</para>
|
||||
</refsect1>
|
||||
|
||||
<refsect1>
|
||||
|
@ -556,6 +556,18 @@
|
||||
to configure the rate limit window, and <varname>ReloadLimitBurst=</varname> takes a positive integer to
|
||||
configure the maximum allowed number of reloads within the configured time window.</para></listitem>
|
||||
</varlistentry>
|
||||
|
||||
<varlistentry>
|
||||
<term><varname>DefaultMemoryPressureWatch=</varname></term>
|
||||
<term><varname>DefaultMemoryPressureThresholdSec=</varname></term>
|
||||
|
||||
<listitem><para>Configures the default settings for the per-unit
|
||||
<varname>MemoryPressureWatch=</varname> and <varname>MemoryPressureThresholdSec=</varname>
|
||||
settings. See
|
||||
<citerefentry><refentrytitle>systemd.resource-control</refentrytitle><manvolnum>5</manvolnum></citerefentry>
|
||||
for details. Defaults to <literal>auto</literal> and <literal>100ms</literal>, respectively. This
|
||||
also sets the memory pressure monitoring threshold for the service manager itself.</para></listitem>
|
||||
</varlistentry>
|
||||
</variablelist>
|
||||
</refsect1>
|
||||
|
||||
|
@ -3779,6 +3779,16 @@ StandardInputData=V2XigLJyZSBubyBzdHJhbmdlcnMgdG8gbG92ZQpZb3Uga25vdyB0aGUgcnVsZX
|
||||
</para></listitem>
|
||||
</varlistentry>
|
||||
|
||||
<varlistentry>
|
||||
<term><varname>$MEMORY_PRESSURE_WATCH</varname></term>
|
||||
<term><varname>$MEMORY_PRESSURE_WRITE</varname></term>
|
||||
|
||||
<listitem><para>If memory pressure monitoring is enabled for this service unit, the path to watch
|
||||
and the data to write into it. See <ulink url="https://systemd.io/MEMORY_PRESSURE">Memory Pressure
|
||||
Handling</ulink> for details about these variables and the service protocol data they
|
||||
convey.</para></listitem>
|
||||
</varlistentry>
|
||||
|
||||
</variablelist>
|
||||
|
||||
<para>For system services, when <varname>PAMName=</varname> is enabled and <command>pam_systemd</command> is part
|
||||
|
@ -1169,6 +1169,53 @@ DeviceAllow=/dev/loop-control
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
<varlistentry>
|
||||
<term><varname>MemoryPressureWatch=</varname></term>
|
||||
|
||||
<listitem><para>Controls memory pressure monitoring for invoked processes. Takes one of
|
||||
<literal>off</literal>, <literal>on</literal>, <literal>auto</literal> or <literal>skip</literal>. If
|
||||
<literal>off</literal> tells the service not to watch for memory pressure events, by setting the
|
||||
<varname>$MEMORY_PRESSURE_WATCH</varname> environment variable to the literal string
|
||||
<filename>/dev/null</filename>. If <literal>on</literal> tells the service to watch for memory
|
||||
pressure events. This enables memory accounting for the service, and ensures the
|
||||
<filename>memory.pressure</filename> cgroup attribute file is accessible for read and write to the
|
||||
service's user. It then sets the <varname>$MEMORY_PRESSURE_WATCH</varname> environment variable for
|
||||
processes invoked by the unit to the file system path to this file. The threshold information
|
||||
configured with <varname>MemoryPressureThresholdSec=</varname> is encoded in the
|
||||
<varname>$MEMORY_PRESSURE_WRITE</varname> environment variable. If the <literal>auto</literal> value
|
||||
is set the protocol is enabled if memory accounting is enabled anyway for the unit, and disabled
|
||||
otherwise. If set to <literal>skip</literal> the logic is neither enabled, nor disabled and the two
|
||||
environment variables are not set.</para>
|
||||
|
||||
<para>Note that services are free to use the two environment variables, but it's unproblematic if
|
||||
they ignore them. Memory pressure handling must be implemented individually in each service, and
|
||||
usually means different things for different software. For further details on memory pressure
|
||||
handling see <ulink url="https://systemd.io/MEMORY_PRESSURE">Memory Pressure Handling in
|
||||
systemd</ulink>.</para>
|
||||
|
||||
<para>Services implemented using
|
||||
<citerefentry><refentrytitle>sd-event</refentrytitle><manvolnum>3</manvolnum></citerefentry> may use
|
||||
<citerefentry><refentrytitle>sd_event_add_memory_pressure</refentrytitle><manvolnum>3</manvolnum></citerefentry>
|
||||
to watch for and handle memory pressure events.</para>
|
||||
|
||||
<para>If not explicitly set, defaults to the <varname>DefaultMemoryPressureWatch=</varname> setting in
|
||||
<citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>.</para></listitem>
|
||||
</varlistentry>
|
||||
|
||||
<varlistentry>
|
||||
<term><varname>MemoryPressureThresholdSec=</varname></term>
|
||||
|
||||
<listitem><para>Sets the memory pressure threshold time for the memory pressure monitor as configured via
|
||||
<varname>MemoryPressureWatch=</varname>. Specifies the maximum allocation latency before a memory
|
||||
pressure event is signalled to the service, per 1s window. If not specified defaults to the
|
||||
<varname>DefaultMemoryPressureThresholdSec=</varname> setting in
|
||||
<citerefentry><refentrytitle>systemd-system.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>
|
||||
(which in turn defaults to 100ms). The specified value expects a time unit such as
|
||||
<literal>ms</literal> or <literal>µs</literal>, see
|
||||
<citerefentry><refentrytitle>systemd.time</refentrytitle><manvolnum>7</manvolnum></citerefentry> for
|
||||
details on the permitted syntax.</para></listitem>
|
||||
</varlistentry>
|
||||
</variablelist>
|
||||
</refsect1>
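The man page text above says a service may use or ignore the two environment variables. As a rough sketch of the service side, the following C helper classifies the value of `MEMORY_PRESSURE_WATCH`. The enum and function names are hypothetical and not part of systemd. The `/dev/null` convention is taken from the `exec_child()` change in this commit, and treating an empty value like an unset one is an assumption of this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper type (not a systemd API): what the service manager
 * asked for through the MEMORY_PRESSURE_WATCH environment variable. */
typedef enum {
    PRESSURE_UNSET,    /* variable absent: no request either way (e.g. "skip") */
    PRESSURE_DISABLED, /* "/dev/null": explicit request not to watch */
    PRESSURE_ENABLED,  /* any other value: path of the file to watch */
} PressureRequest;

/* Classify a raw MEMORY_PRESSURE_WATCH value. An empty string is treated
 * like an unset variable, which is an assumption of this sketch. */
PressureRequest classify_pressure_watch(const char *watch) {
    if (!watch || watch[0] == '\0')
        return PRESSURE_UNSET;

    if (strcmp(watch, "/dev/null") == 0)
        return PRESSURE_DISABLED;

    return PRESSURE_ENABLED;
}

/* Convenience wrapper that reads the process environment. */
PressureRequest pressure_request_from_env(void) {
    return classify_pressure_watch(getenv("MEMORY_PRESSURE_WATCH"));
}
```

A service that gets `PRESSURE_ENABLED` would go on to open the named file and write the decoded `MEMORY_PRESSURE_WRITE` payload to it. Services built on sd-event can leave all of this to `sd_event_add_memory_pressure()`.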
|
||||
|
||||
|
@ -175,6 +175,9 @@ void cgroup_context_init(CGroupContext *c) {
|
||||
.moom_swap = MANAGED_OOM_AUTO,
|
||||
.moom_mem_pressure = MANAGED_OOM_AUTO,
|
||||
.moom_preference = MANAGED_OOM_PREFERENCE_NONE,
|
||||
|
||||
.memory_pressure_watch = _CGROUP_PRESSURE_WATCH_INVALID,
|
||||
.memory_pressure_threshold_usec = USEC_INFINITY,
|
||||
};
|
||||
}
|
||||
|
||||
@ -517,7 +520,8 @@ void cgroup_context_dump(Unit *u, FILE* f, const char *prefix) {
|
||||
"%sManagedOOMSwap: %s\n"
|
||||
"%sManagedOOMMemoryPressure: %s\n"
|
||||
"%sManagedOOMMemoryPressureLimit: " PERMYRIAD_AS_PERCENT_FORMAT_STR "\n"
|
||||
"%sManagedOOMPreference: %s\n",
|
||||
"%sManagedOOMPreference: %s\n"
|
||||
"%sMemoryPressureWatch: %s\n",
|
||||
prefix, yes_no(c->cpu_accounting),
|
||||
prefix, yes_no(c->io_accounting),
|
||||
prefix, yes_no(c->blockio_accounting),
|
||||
@ -559,7 +563,12 @@ void cgroup_context_dump(Unit *u, FILE* f, const char *prefix) {
|
||||
prefix, managed_oom_mode_to_string(c->moom_swap),
|
||||
prefix, managed_oom_mode_to_string(c->moom_mem_pressure),
|
||||
prefix, PERMYRIAD_AS_PERCENT_FORMAT_VAL(UINT32_SCALE_TO_PERMYRIAD(c->moom_mem_pressure_limit)),
|
||||
prefix, managed_oom_preference_to_string(c->moom_preference));
|
||||
prefix, managed_oom_preference_to_string(c->moom_preference),
|
||||
prefix, cgroup_pressure_watch_to_string(c->memory_pressure_watch));
|
||||
|
||||
if (c->memory_pressure_threshold_usec != USEC_INFINITY)
|
||||
fprintf(f, "%sMemoryPressureThresholdSec: %s\n",
|
||||
prefix, FORMAT_TIMESPAN(c->memory_pressure_threshold_usec, 1));
|
||||
|
||||
if (c->delegate) {
|
||||
_cleanup_free_ char *t = NULL;
|
||||
@ -2362,6 +2371,13 @@ static int unit_update_cgroup(
|
||||
cgroup_context_apply(u, target_mask, state);
|
||||
cgroup_xattr_apply(u);
|
||||
|
||||
/* For most units we expect that memory monitoring is set up before the unit is started and we won't
|
||||
* touch it after. For PID 1 this is different though, because we couldn't possibly do that given
|
||||
* that PID 1 runs before init.scope is even set up. Hence, whenever init.scope is realized, let's
|
||||
* try to open the memory pressure interface anew. */
|
||||
if (unit_has_name(u, SPECIAL_INIT_SCOPE))
|
||||
(void) manager_setup_memory_pressure_event_source(u->manager);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
@ -4369,3 +4385,12 @@ static const char* const freezer_action_table[_FREEZER_ACTION_MAX] = {
|
||||
};
|
||||
|
||||
DEFINE_STRING_TABLE_LOOKUP(freezer_action, FreezerAction);
|
||||
|
||||
static const char* const cgroup_pressure_watch_table[_CGROUP_PRESSURE_WATCH_MAX] = {
|
||||
[CGROUP_PRESSURE_WATCH_OFF] = "off",
|
||||
[CGROUP_PRESSURE_WATCH_AUTO] = "auto",
|
||||
[CGROUP_PRESSURE_WATCH_ON] = "on",
|
||||
[CGROUP_PRESSURE_WATCH_SKIP] = "skip",
|
||||
};
|
||||
|
||||
DEFINE_STRING_TABLE_LOOKUP_WITH_BOOLEAN(cgroup_pressure_watch, CGroupPressureWatch, CGROUP_PRESSURE_WATCH_ON);
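The table above is paired with a boolean-aware lookup macro, so the setting should accept plain booleans in addition to the four names. The C sketch below illustrates that idea in isolation. It assumes that a true boolean maps to the value named as the macro's last argument (`on`) and a false boolean maps to the first table entry (`off`). All identifiers and the list of accepted boolean spellings are hypothetical stand-ins, not systemd's own helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

typedef enum {
    WATCH_OFF,
    WATCH_AUTO,
    WATCH_ON,
    WATCH_SKIP,
    WATCH_MAX,
    WATCH_INVALID = -1,
} Watch;

static const char *const watch_table[WATCH_MAX] = {
    [WATCH_OFF]  = "off",
    [WATCH_AUTO] = "auto",
    [WATCH_ON]   = "on",
    [WATCH_SKIP] = "skip",
};

/* Simplified boolean parser: 1 for true, 0 for false, -1 if the string is
 * not a boolean. The accepted spellings are an assumption of this sketch. */
int parse_bool_simple(const char *s) {
    static const char *const yes[] = { "1", "yes", "y", "true", "t", "on" };
    static const char *const no[]  = { "0", "no", "n", "false", "f", "off" };

    for (size_t i = 0; i < sizeof(yes) / sizeof(yes[0]); i++)
        if (strcasecmp(s, yes[i]) == 0)
            return 1;
    for (size_t i = 0; i < sizeof(no) / sizeof(no[0]); i++)
        if (strcasecmp(s, no[i]) == 0)
            return 0;
    return -1;
}

/* Booleans are tried first, then the string table is searched. */
Watch watch_from_string(const char *s) {
    if (!s)
        return WATCH_INVALID;

    int b = parse_bool_simple(s);
    if (b > 0)
        return WATCH_ON;
    if (b == 0)
        return WATCH_OFF;

    for (int i = 0; i < WATCH_MAX; i++)
        if (strcmp(watch_table[i], s) == 0)
            return (Watch) i;

    return WATCH_INVALID;
}

/* Returns NULL for values outside the table, including WATCH_INVALID. */
const char *watch_to_string(Watch w) {
    if (w < 0 || w >= WATCH_MAX)
        return NULL;
    return watch_table[w];
}
```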
|
||||
|
@ -110,6 +110,15 @@ struct CGroupSocketBindItem {
|
||||
uint16_t port_min;
|
||||
};
|
||||
|
||||
typedef enum CGroupPressureWatch {
|
||||
CGROUP_PRESSURE_WATCH_OFF, /* → tells the service payload explicitly not to watch for memory pressure */
|
||||
CGROUP_PRESSURE_WATCH_AUTO, /* → on if memory accounting is on anyway for the unit, otherwise off */
|
||||
CGROUP_PRESSURE_WATCH_ON,
|
||||
CGROUP_PRESSURE_WATCH_SKIP, /* → doesn't set up memory pressure watch, but also doesn't explicitly tell payload to avoid it */
|
||||
_CGROUP_PRESSURE_WATCH_MAX,
|
||||
_CGROUP_PRESSURE_WATCH_INVALID = -EINVAL,
|
||||
} CGroupPressureWatch;
|
||||
|
||||
struct CGroupContext {
|
||||
bool cpu_accounting;
|
||||
bool io_accounting;
|
||||
@ -207,6 +216,12 @@ struct CGroupContext {
|
||||
ManagedOOMMode moom_mem_pressure;
|
||||
uint32_t moom_mem_pressure_limit; /* Normalized to 2^32-1 == 100% */
|
||||
ManagedOOMPreference moom_preference;
|
||||
|
||||
/* Memory pressure logic */
|
||||
CGroupPressureWatch memory_pressure_watch;
|
||||
usec_t memory_pressure_threshold_usec;
|
||||
/* NB: For now we don't make the period configurable, nor the type, nor do we allow multiple
|
||||
* triggers, nor triggers for non-memory pressure. We might add that later. */
|
||||
};
|
||||
|
||||
/* Used when querying IP accounting data */
|
||||
@ -248,6 +263,13 @@ void cgroup_context_free_blockio_device_bandwidth(CGroupContext *c, CGroupBlockI
|
||||
void cgroup_context_remove_bpf_foreign_program(CGroupContext *c, CGroupBPFForeignProgram *p);
|
||||
void cgroup_context_remove_socket_bind(CGroupSocketBindItem **head);
|
||||
|
||||
static inline bool cgroup_context_want_memory_pressure(const CGroupContext *c) {
|
||||
assert(c);
|
||||
|
||||
return c->memory_pressure_watch == CGROUP_PRESSURE_WATCH_ON ||
|
||||
(c->memory_pressure_watch == CGROUP_PRESSURE_WATCH_AUTO && c->memory_accounting);
|
||||
}
|
||||
|
||||
int cgroup_add_device_allow(CGroupContext *c, const char *dev, const char *mode);
|
||||
int cgroup_add_bpf_foreign_program(CGroupContext *c, uint32_t attach_type, const char *path);
|
||||
|
||||
@ -351,3 +373,6 @@ int unit_cgroup_freezer_action(Unit *u, FreezerAction action);
|
||||
|
||||
const char* freezer_action_to_string(FreezerAction a) _const_;
|
||||
FreezerAction freezer_action_from_string(const char *s) _pure_;
|
||||
|
||||
const char* cgroup_pressure_watch_to_string(CGroupPressureWatch a) _const_;
|
||||
CGroupPressureWatch cgroup_pressure_watch_from_string(const char *s) _pure_;
|
||||
|
@ -24,6 +24,7 @@
|
||||
#include "socket-util.h"
|
||||
|
||||
BUS_DEFINE_PROPERTY_GET(bus_property_get_tasks_max, "t", TasksMax, tasks_max_resolve);
|
||||
BUS_DEFINE_PROPERTY_GET_ENUM(bus_property_get_cgroup_pressure_watch, cgroup_pressure_watch, CGroupPressureWatch);
|
||||
|
||||
static BUS_DEFINE_PROPERTY_GET_ENUM(property_get_cgroup_device_policy, cgroup_device_policy, CGroupDevicePolicy);
|
||||
static BUS_DEFINE_PROPERTY_GET_ENUM(property_get_managed_oom_mode, managed_oom_mode, ManagedOOMMode);
|
||||
@ -494,6 +495,8 @@ const sd_bus_vtable bus_cgroup_vtable[] = {
|
||||
SD_BUS_PROPERTY("SocketBindAllow", "a(iiqq)", property_get_socket_bind, offsetof(CGroupContext, socket_bind_allow), 0),
|
||||
SD_BUS_PROPERTY("SocketBindDeny", "a(iiqq)", property_get_socket_bind, offsetof(CGroupContext, socket_bind_deny), 0),
|
||||
SD_BUS_PROPERTY("RestrictNetworkInterfaces", "(bas)", property_get_restrict_network_interfaces, 0, 0),
|
||||
SD_BUS_PROPERTY("MemoryPressureWatch", "s", bus_property_get_cgroup_pressure_watch, offsetof(CGroupContext, memory_pressure_watch), 0),
|
||||
SD_BUS_PROPERTY("MemoryPressureThresholdUSec", "t", bus_property_get_usec, offsetof(CGroupContext, memory_pressure_threshold_usec), 0),
|
||||
SD_BUS_VTABLE_END
|
||||
};
|
||||
|
||||
@ -743,6 +746,47 @@ static int bus_cgroup_set_transient_property(
|
||||
}
|
||||
}
|
||||
|
||||
return 1;
|
||||
|
||||
} else if (streq(name, "MemoryPressureWatch")) {
|
||||
CGroupPressureWatch p;
|
||||
const char *t;
|
||||
|
||||
r = sd_bus_message_read(message, "s", &t);
|
||||
if (r < 0)
|
||||
return r;
|
||||
|
||||
if (isempty(t))
|
||||
p = _CGROUP_PRESSURE_WATCH_INVALID;
|
||||
else {
|
||||
p = cgroup_pressure_watch_from_string(t);
|
||||
if (p < 0)
|
||||
return p;
|
||||
}
|
||||
|
||||
if (!UNIT_WRITE_FLAGS_NOOP(flags)) {
|
||||
c->memory_pressure_watch = p;
|
||||
unit_write_settingf(u, flags, name, "MemoryPressureWatch=%s", strempty(cgroup_pressure_watch_to_string(p)));
|
||||
}
|
||||
|
||||
return 1;
|
||||
|
||||
} else if (streq(name, "MemoryPressureThresholdUSec")) {
|
||||
uint64_t t;
|
||||
|
||||
r = sd_bus_message_read(message, "t", &t);
|
||||
if (r < 0)
|
||||
return r;
|
||||
|
||||
if (!UNIT_WRITE_FLAGS_NOOP(flags)) {
|
||||
c->memory_pressure_threshold_usec = t;
|
||||
|
||||
if (t == UINT64_MAX)
|
||||
unit_write_setting(u, flags, name, "MemoryPressureThresholdUSec=");
|
||||
else
|
||||
unit_write_settingf(u, flags, name, "MemoryPressureThresholdUSec=%" PRIu64, t);
|
||||
}
|
||||
|
||||
return 1;
|
||||
}
|
||||
|
||||
|
@ -10,5 +10,6 @@
|
||||
extern const sd_bus_vtable bus_cgroup_vtable[];
|
||||
|
||||
int bus_property_get_tasks_max(sd_bus *bus, const char *path, const char *interface, const char *property, sd_bus_message *reply, void *userdata, sd_bus_error *ret_error);
|
||||
int bus_property_get_cgroup_pressure_watch(sd_bus *bus, const char *path, const char *interface, const char *property, sd_bus_message *reply, void *userdata, sd_bus_error *ret_error);
|
||||
|
||||
int bus_cgroup_set_property(Unit *u, CGroupContext *c, const char *name, sd_bus_message *message, UnitWriteFlags flags, sd_bus_error *error);
|
||||
|
@ -2943,6 +2943,8 @@ const sd_bus_vtable bus_manager_vtable[] = {
|
||||
SD_BUS_PROPERTY("DefaultLimitRTTIME", "t", bus_property_get_rlimit, offsetof(Manager, rlimit[RLIMIT_RTTIME]), SD_BUS_VTABLE_PROPERTY_CONST),
|
||||
SD_BUS_PROPERTY("DefaultLimitRTTIMESoft", "t", bus_property_get_rlimit, offsetof(Manager, rlimit[RLIMIT_RTTIME]), SD_BUS_VTABLE_PROPERTY_CONST),
|
||||
SD_BUS_PROPERTY("DefaultTasksMax", "t", bus_property_get_tasks_max, offsetof(Manager, default_tasks_max), 0),
|
||||
SD_BUS_PROPERTY("DefaultMemoryPressureThresholdUSec", "t", bus_property_get_usec, offsetof(Manager, default_memory_pressure_threshold_usec), 0),
|
||||
SD_BUS_PROPERTY("DefaultMemoryPressureWatch", "s", bus_property_get_cgroup_pressure_watch, offsetof(Manager, default_memory_pressure_watch), 0),
|
||||
SD_BUS_PROPERTY("TimerSlackNSec", "t", property_get_timer_slack_nsec, 0, SD_BUS_VTABLE_PROPERTY_CONST),
|
||||
SD_BUS_PROPERTY("DefaultOOMPolicy", "s", bus_property_get_oom_policy, offsetof(Manager, default_oom_policy), SD_BUS_VTABLE_PROPERTY_CONST),
|
||||
SD_BUS_PROPERTY("DefaultOOMScoreAdjust", "i", property_get_oom_score_adjust, 0, SD_BUS_VTABLE_PROPERTY_CONST),
|
||||
|
@ -80,6 +80,7 @@
|
||||
#include "parse-util.h"
|
||||
#include "path-util.h"
|
||||
#include "process-util.h"
|
||||
#include "psi-util.h"
|
||||
#include "random-util.h"
|
||||
#include "recurse-dir.h"
|
||||
#include "rlimit-util.h"
|
||||
@ -1808,6 +1809,7 @@ static int build_environment(
|
||||
const Unit *u,
|
||||
const ExecContext *c,
|
||||
const ExecParameters *p,
|
||||
const CGroupContext *cgroup_context,
|
||||
size_t n_fds,
|
||||
char **fdnames,
|
||||
const char *home,
|
||||
@ -1815,6 +1817,7 @@ static int build_environment(
|
||||
const char *shell,
|
||||
dev_t journal_stream_dev,
|
||||
ino_t journal_stream_ino,
|
||||
const char *memory_pressure_path,
|
||||
char ***ret) {
|
||||
|
||||
_cleanup_strv_free_ char **our_env = NULL;
|
||||
@ -1826,7 +1829,7 @@ static int build_environment(
|
||||
assert(p);
|
||||
assert(ret);
|
||||
|
||||
#define N_ENV_VARS 17
|
||||
#define N_ENV_VARS 19
|
||||
our_env = new0(char*, N_ENV_VARS + _EXEC_DIRECTORY_TYPE_MAX);
|
||||
if (!our_env)
|
||||
return -ENOMEM;
|
||||
@ -1990,8 +1993,35 @@ static int build_environment(
|
||||
|
||||
our_env[n_env++] = x;
|
||||
|
||||
our_env[n_env++] = NULL;
|
||||
assert(n_env <= N_ENV_VARS + _EXEC_DIRECTORY_TYPE_MAX);
|
||||
if (memory_pressure_path) {
|
||||
x = strjoin("MEMORY_PRESSURE_WATCH=", memory_pressure_path);
|
||||
if (!x)
|
||||
return -ENOMEM;
|
||||
|
||||
our_env[n_env++] = x;
|
||||
|
||||
if (cgroup_context && !path_equal(memory_pressure_path, "/dev/null")) {
|
||||
_cleanup_free_ char *b = NULL, *e = NULL;
|
||||
|
||||
if (asprintf(&b, "%s " USEC_FMT " " USEC_FMT,
|
||||
MEMORY_PRESSURE_DEFAULT_TYPE,
|
||||
cgroup_context->memory_pressure_threshold_usec == USEC_INFINITY ? MEMORY_PRESSURE_DEFAULT_THRESHOLD_USEC :
|
||||
CLAMP(cgroup_context->memory_pressure_threshold_usec, 1U, MEMORY_PRESSURE_DEFAULT_WINDOW_USEC),
|
||||
MEMORY_PRESSURE_DEFAULT_WINDOW_USEC) < 0)
|
||||
return -ENOMEM;
|
||||
|
||||
if (base64mem(b, strlen(b) + 1, &e) < 0)
|
||||
return -ENOMEM;
|
||||
|
||||
x = strjoin("MEMORY_PRESSURE_WRITE=", e);
|
||||
if (!x)
|
||||
return -ENOMEM;
|
||||
|
||||
our_env[n_env++] = x;
|
||||
}
|
||||
}
|
||||
|
||||
assert(n_env < N_ENV_VARS + _EXEC_DIRECTORY_TYPE_MAX);
|
||||
#undef N_ENV_VARS
|
||||
|
||||
*ret = TAKE_PTR(our_env);
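The hunk above builds the payload for `MEMORY_PRESSURE_WRITE` as `<type> <threshold> <window>`, with the threshold clamped between 1µs and the window length, and then base64-encodes it together with its trailing NUL byte. The C sketch below reproduces only the clamping and formatting step. The constants are assumptions: the man page text in this change mentions a 100ms default threshold and a 1s window, and `some` is assumed as the pressure type. The function names are hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define USEC_INFINITY UINT64_MAX

/* Assumed defaults, see the lead-in; the real values live in psi-util.h. */
#define DEFAULT_TYPE "some"
#define DEFAULT_THRESHOLD_USEC UINT64_C(100000)  /* 100ms */
#define DEFAULT_WINDOW_USEC    UINT64_C(1000000) /* 1s */

/* USEC_INFINITY means "not configured" and selects the default; any other
 * value is clamped to the range [1, window]. */
uint64_t effective_threshold(uint64_t configured) {
    if (configured == USEC_INFINITY)
        return DEFAULT_THRESHOLD_USEC;
    if (configured < 1)
        return 1;
    if (configured > DEFAULT_WINDOW_USEC)
        return DEFAULT_WINDOW_USEC;
    return configured;
}

/* Writes "<type> <threshold> <window>" into buf. Returns 0 on success and
 * -1 if the buffer is too small or formatting fails. */
int format_trigger(uint64_t configured, char *buf, size_t size) {
    int n = snprintf(buf, size, "%s %" PRIu64 " %" PRIu64,
                     DEFAULT_TYPE,
                     effective_threshold(configured),
                     DEFAULT_WINDOW_USEC);

    return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

Clamping against the window makes sense because a stall threshold longer than the observation window could never be reached.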
|
||||
@ -4246,6 +4276,7 @@ static int exec_child(
|
||||
const ExecParameters *params,
|
||||
ExecRuntime *runtime,
|
||||
DynamicCreds *dcreds,
|
||||
const CGroupContext *cgroup_context,
|
||||
int socket_fd,
|
||||
const int named_iofds[static 3],
|
||||
int *params_fds,
|
||||
@ -4259,7 +4290,7 @@ static int exec_child(
|
||||
int r, ngids = 0, exec_fd;
|
||||
_cleanup_free_ gid_t *supplementary_gids = NULL;
|
||||
const char *username = NULL, *groupname = NULL;
|
||||
_cleanup_free_ char *home_buffer = NULL;
|
||||
_cleanup_free_ char *home_buffer = NULL, *memory_pressure_path = NULL;
|
||||
const char *home = NULL, *shell = NULL;
|
||||
char **final_argv = NULL;
|
||||
dev_t journal_stream_dev = 0;
|
||||
@ -4672,15 +4703,41 @@ static int exec_child(
|
||||
}
|
||||
}
|
||||
|
||||
/* If delegation is enabled we'll pass ownership of the cgroup to the user of the new process. On cgroup v1
|
||||
* this is only about systemd's own hierarchy, i.e. not the controller hierarchies, simply because that's not
|
||||
* safe. On cgroup v2 there's only one hierarchy anyway, and delegation is safe there, hence in that case only
|
||||
* touch a single hierarchy too. */
|
||||
if (params->cgroup_path && context->user && (params->flags & EXEC_CGROUP_DELEGATE)) {
|
||||
r = cg_set_access(SYSTEMD_CGROUP_CONTROLLER, params->cgroup_path, uid, gid);
|
||||
if (r < 0) {
|
||||
*exit_status = EXIT_CGROUP;
|
||||
return log_unit_error_errno(unit, r, "Failed to adjust control group access: %m");
|
||||
if (params->cgroup_path) {
|
||||
/* If delegation is enabled we'll pass ownership of the cgroup to the user of the new process. On cgroup v1
|
||||
* this is only about systemd's own hierarchy, i.e. not the controller hierarchies, simply because that's not
|
||||
* safe. On cgroup v2 there's only one hierarchy anyway, and delegation is safe there, hence in that case only
|
||||
* touch a single hierarchy too. */
|
||||
|
||||
if (params->flags & EXEC_CGROUP_DELEGATE) {
|
||||
r = cg_set_access(SYSTEMD_CGROUP_CONTROLLER, params->cgroup_path, uid, gid);
|
||||
if (r < 0) {
|
||||
*exit_status = EXIT_CGROUP;
|
||||
return log_unit_error_errno(unit, r, "Failed to adjust control group access: %m");
|
||||
}
|
||||
}
|
||||
|
||||
if (cgroup_context && cg_unified() > 0 && is_pressure_supported() > 0) {
|
||||
if (cgroup_context_want_memory_pressure(cgroup_context)) {
|
||||
r = cg_get_path("memory", params->cgroup_path, "memory.pressure", &memory_pressure_path);
|
||||
if (r < 0) {
|
||||
*exit_status = EXIT_MEMORY;
|
||||
return log_oom();
|
||||
}
|
||||
|
||||
r = chmod_and_chown(memory_pressure_path, 0644, uid, gid);
|
||||
if (r < 0) {
|
||||
log_unit_full_errno(unit, r == -ENOENT || ERRNO_IS_PRIVILEGE(r) ? LOG_DEBUG : LOG_WARNING, r,
|
||||
"Failed to adjust ownership of '%s', ignoring: %m", memory_pressure_path);
|
||||
memory_pressure_path = mfree(memory_pressure_path);
|
||||
}
|
||||
} else if (cgroup_context->memory_pressure_watch == CGROUP_PRESSURE_WATCH_OFF) {
|
||||
memory_pressure_path = strdup("/dev/null"); /* /dev/null is explicit indicator for turning off memory pressure watch */
|
||||
if (!memory_pressure_path) {
|
||||
*exit_status = EXIT_MEMORY;
|
||||
return log_oom();
|
||||
}
|
||||
}
|
||||
}
|
||||
}
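The branch structure above can be read as a small decision table: `on` (or `auto` with memory accounting) exports the cgroup's `memory.pressure` path, `off` exports `/dev/null`, and everything else leaves the variable unset. The C sketch below restates that logic on its own. The names are hypothetical, and it leaves out what the real code also handles: the cgroup v2 check, the ownership adjustment, and dropping the path when that adjustment fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { WATCH_OFF, WATCH_AUTO, WATCH_ON, WATCH_SKIP } Watch;

/* Returns the value to export as MEMORY_PRESSURE_WATCH, or NULL when the
 * variable should be left unset. cgroup_pressure_file stands in for the
 * unit's memory.pressure path. */
const char *pressure_watch_path(Watch mode,
                                bool memory_accounting,
                                bool pressure_supported,
                                const char *cgroup_pressure_file) {
    if (!pressure_supported)
        return NULL;

    /* Same condition as cgroup_context_want_memory_pressure() */
    if (mode == WATCH_ON || (mode == WATCH_AUTO && memory_accounting))
        return cgroup_pressure_file;

    /* Explicitly tell the payload not to watch */
    if (mode == WATCH_OFF)
        return "/dev/null";

    /* "skip", or "auto" without memory accounting: say nothing */
    return NULL;
}
```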
|
||||
|
||||
@ -4704,6 +4761,7 @@ static int exec_child(
|
||||
unit,
|
||||
context,
|
||||
params,
|
||||
cgroup_context,
|
||||
n_fds,
|
||||
fdnames,
|
||||
home,
|
||||
@ -4711,6 +4769,7 @@ static int exec_child(
|
||||
shell,
|
||||
journal_stream_dev,
|
||||
journal_stream_ino,
|
||||
memory_pressure_path,
|
||||
&our_env);
|
||||
if (r < 0) {
|
||||
*exit_status = EXIT_MEMORY;
|
||||
@ -5358,6 +5417,7 @@ int exec_spawn(Unit *unit,
|
||||
const ExecParameters *params,
|
||||
ExecRuntime *runtime,
|
||||
DynamicCreds *dcreds,
|
||||
const CGroupContext *cgroup_context,
|
||||
pid_t *ret) {
|
||||
|
||||
int socket_fd, r, named_iofds[3] = { -1, -1, -1 }, *fds = NULL;
|
||||
@ -5445,6 +5505,7 @@ int exec_spawn(Unit *unit,
|
||||
params,
|
||||
runtime,
|
||||
dcreds,
|
||||
cgroup_context,
|
||||
socket_fd,
|
||||
named_iofds,
|
||||
fds,
|
||||
|
@ -441,6 +441,7 @@ int exec_spawn(Unit *unit,
|
||||
const ExecParameters *exec_params,
|
||||
ExecRuntime *runtime,
|
||||
DynamicCreds *dynamic_creds,
|
||||
const CGroupContext *cgroup_context,
|
||||
pid_t *ret);
|
||||
|
||||
void exec_command_done_array(ExecCommand *c, size_t n);
|
||||
|
@ -146,6 +146,7 @@ DEFINE_CONFIG_PARSE_ENUM(config_parse_service_timeout_failure_mode, service_time
|
||||
DEFINE_CONFIG_PARSE_ENUM(config_parse_socket_bind, socket_address_bind_ipv6_only_or_bool, SocketAddressBindIPv6Only, "Failed to parse bind IPv6 only value");
|
||||
DEFINE_CONFIG_PARSE_ENUM(config_parse_oom_policy, oom_policy, OOMPolicy, "Failed to parse OOM policy");
|
||||
DEFINE_CONFIG_PARSE_ENUM(config_parse_managed_oom_preference, managed_oom_preference, ManagedOOMPreference, "Failed to parse ManagedOOMPreference=");
|
||||
DEFINE_CONFIG_PARSE_ENUM(config_parse_cgroup_pressure_watch, cgroup_pressure_watch, CGroupPressureWatch, "Failed to parse CGroupPressureWatch=");
|
||||
DEFINE_CONFIG_PARSE_ENUM_WITH_DEFAULT(config_parse_ip_tos, ip_tos, int, -1, "Failed to parse IP TOS value");
|
||||
DEFINE_CONFIG_PARSE_PTR(config_parse_blockio_weight, cg_blkio_weight_parse, uint64_t, "Invalid block IO weight");
|
||||
DEFINE_CONFIG_PARSE_PTR(config_parse_cg_weight, cg_weight_parse, uint64_t, "Invalid weight");
|
||||
|
@ -152,6 +152,7 @@ CONFIG_PARSER_PROTOTYPE(config_parse_watchdog_sec);
|
||||
CONFIG_PARSER_PROTOTYPE(config_parse_tty_size);
|
||||
CONFIG_PARSER_PROTOTYPE(config_parse_log_filter_patterns);
|
||||
CONFIG_PARSER_PROTOTYPE(config_parse_open_file);
|
||||
CONFIG_PARSER_PROTOTYPE(config_parse_cgroup_pressure_watch);
|
||||
|
||||
/* gperf prototypes */
|
||||
const struct ConfigPerfItem* load_fragment_gperf_lookup(const char *key, GPERF_LEN_TYPE length);
|
||||
|
@ -75,6 +75,7 @@
|
||||
#include "pretty-print.h"
|
||||
#include "proc-cmdline.h"
|
||||
#include "process-util.h"
|
||||
#include "psi-util.h"
|
||||
#include "random-util.h"
|
||||
#include "rlimit-util.h"
|
||||
#if HAVE_SECCOMP
|
||||
@ -162,6 +163,8 @@ static bool arg_default_blockio_accounting;
|
||||
static bool arg_default_memory_accounting;
|
||||
static bool arg_default_tasks_accounting;
|
||||
static TasksMax arg_default_tasks_max;
|
||||
static usec_t arg_default_memory_pressure_threshold_usec;
|
||||
static CGroupPressureWatch arg_default_memory_pressure_watch;
|
||||
static sd_id128_t arg_machine_id;
|
||||
static EmergencyAction arg_cad_burst_action;
|
||||
static OOMPolicy arg_default_oom_policy;
|
||||
@ -686,6 +689,8 @@ static int parse_config_file(void) {
|
||||
{ "Manager", "DefaultMemoryAccounting", config_parse_bool, 0, &arg_default_memory_accounting },
|
||||
{ "Manager", "DefaultTasksAccounting", config_parse_bool, 0, &arg_default_tasks_accounting },
|
||||
{ "Manager", "DefaultTasksMax", config_parse_tasks_max, 0, &arg_default_tasks_max },
|
||||
{ "Manager", "DefaultMemoryPressureThresholdSec", config_parse_sec, 0, &arg_default_memory_pressure_threshold_usec },
|
||||
{ "Manager", "DefaultMemoryPressureWatch", config_parse_cgroup_pressure_watch, 0, &arg_default_memory_pressure_watch },
|
||||
{ "Manager", "CtrlAltDelBurstAction", config_parse_emergency_action, arg_system, &arg_cad_burst_action },
|
||||
{ "Manager", "DefaultOOMPolicy", config_parse_oom_policy, 0, &arg_default_oom_policy },
|
||||
{ "Manager", "DefaultOOMScoreAdjust", config_parse_oom_score_adjust, 0, NULL },
|
||||
@ -767,6 +772,8 @@ static void set_manager_defaults(Manager *m) {
|
||||
m->default_memory_accounting = arg_default_memory_accounting;
|
||||
m->default_tasks_accounting = arg_default_tasks_accounting;
|
||||
m->default_tasks_max = arg_default_tasks_max;
|
||||
m->default_memory_pressure_watch = arg_default_memory_pressure_watch;
|
||||
m->default_memory_pressure_threshold_usec = arg_default_memory_pressure_threshold_usec;
|
||||
m->default_oom_policy = arg_default_oom_policy;
|
||||
m->default_oom_score_adjust_set = arg_default_oom_score_adjust_set;
|
||||
m->default_oom_score_adjust = arg_default_oom_score_adjust;
|
||||
@ -2474,6 +2481,8 @@ static void reset_arguments(void) {
|
||||
arg_default_memory_accounting = MEMORY_ACCOUNTING_DEFAULT;
|
||||
arg_default_tasks_accounting = true;
|
||||
arg_default_tasks_max = DEFAULT_TASKS_MAX;
|
||||
arg_default_memory_pressure_threshold_usec = MEMORY_PRESSURE_DEFAULT_THRESHOLD_USEC;
|
||||
arg_default_memory_pressure_watch = CGROUP_PRESSURE_WATCH_AUTO;
|
||||
arg_machine_id = (sd_id128_t) {};
|
||||
arg_cad_burst_action = EMERGENCY_ACTION_REBOOT_FORCE;
|
||||
arg_default_oom_policy = OOM_STOP;
|
||||
|
@ -31,6 +31,7 @@
|
||||
#include "bus-util.h"
|
||||
#include "clean-ipc.h"
|
||||
#include "clock-util.h"
|
||||
#include "common-signal.h"
|
||||
#include "constants.h"
|
||||
#include "core-varlink.h"
|
||||
#include "creds-util.h"
|
||||
@ -69,6 +70,7 @@
|
||||
#include "path-lookup.h"
|
||||
#include "path-util.h"
|
||||
#include "process-util.h"
|
||||
#include "psi-util.h"
|
||||
#include "ratelimit.h"
|
||||
#include "rlimit-util.h"
|
||||
#include "rm-rf.h"
|
||||
@ -567,7 +569,11 @@ static int manager_setup_signals(Manager *m) {
|
||||
SIGRTMIN+15, /* systemd: Immediate reboot */
|
||||
SIGRTMIN+16, /* systemd: Immediate kexec */
|
||||
|
||||
/* ... space for more immediate system state changes ... */
|
||||
/* ... space for one more immediate system state change ... */
|
||||
|
||||
SIGRTMIN+18, /* systemd: control command */
|
||||
|
||||
/* ... space ... */
|
||||
|
||||
SIGRTMIN+20, /* systemd: enable status messages */
|
||||
SIGRTMIN+21, /* systemd: disable status messages */
|
||||
@ -638,6 +644,8 @@ static char** sanitize_environment(char **l) {
|
||||
"LOG_NAMESPACE",
|
||||
"MAINPID",
|
||||
"MANAGERPID",
|
||||
"MEMORY_PRESSURE_WATCH",
|
||||
"MEMORY_PRESSURE_WRITE",
|
||||
"MONITOR_EXIT_CODE",
|
||||
"MONITOR_EXIT_STATUS",
|
||||
"MONITOR_INVOCATION_ID",
|
||||
@ -787,6 +795,31 @@ static int manager_setup_sigchld_event_source(Manager *m) {
|
||||
return 0;
|
||||
}
|
||||
|
||||
int manager_setup_memory_pressure_event_source(Manager *m) {
|
||||
int r;
|
||||
|
||||
assert(m);
|
||||
|
||||
m->memory_pressure_event_source = sd_event_source_disable_unref(m->memory_pressure_event_source);
|
||||
|
||||
r = sd_event_add_memory_pressure(m->event, &m->memory_pressure_event_source, NULL, NULL);
|
||||
if (r < 0)
|
||||
log_full_errno(ERRNO_IS_NOT_SUPPORTED(r) || ERRNO_IS_PRIVILEGE(r) || (r == -EHOSTDOWN) ? LOG_DEBUG : LOG_NOTICE, r,
|
||||
"Failed to establish memory pressure event source, ignoring: %m");
|
||||
else if (m->default_memory_pressure_threshold_usec != USEC_INFINITY) {
|
||||
|
||||
/* If there's a default memory pressure threshold set, also apply it to the service manager itself */
|
||||
r = sd_event_source_set_memory_pressure_period(
|
||||
m->memory_pressure_event_source,
|
||||
m->default_memory_pressure_threshold_usec,
|
||||
MEMORY_PRESSURE_DEFAULT_WINDOW_USEC);
|
||||
if (r < 0)
|
||||
log_warning_errno(r, "Failed to adjust memory pressure threshold, ignoring: %m");
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int manager_find_credentials_dirs(Manager *m) {
|
||||
const char *e;
|
||||
int r;
|
||||
@ -877,6 +910,9 @@ int manager_new(LookupScope scope, ManagerTestRunFlags test_run_flags, Manager *
|
||||
.test_run_flags = test_run_flags,
|
||||
|
||||
.default_oom_policy = OOM_STOP,
|
||||
|
||||
.default_memory_pressure_watch = CGROUP_PRESSURE_WATCH_AUTO,
|
||||
.default_memory_pressure_threshold_usec = USEC_INFINITY,
|
||||
};
|
||||
|
||||
#if ENABLE_EFI
|
||||
@ -967,6 +1003,10 @@ int manager_new(LookupScope scope, ManagerTestRunFlags test_run_flags, Manager *
|
||||
if (r < 0)
|
||||
return r;
|
||||
|
||||
r = manager_setup_memory_pressure_event_source(m);
|
||||
if (r < 0)
|
||||
return r;
|
||||
|
||||
#if HAVE_LIBBPF
|
||||
if (MANAGER_IS_SYSTEM(m) && lsm_bpf_supported(/* initialize = */ true)) {
|
||||
r = lsm_bpf_setup(m);
|
||||
@ -1541,6 +1581,7 @@ Manager* manager_free(Manager *m) {
|
||||
sd_event_source_unref(m->jobs_in_progress_event_source);
|
||||
sd_event_source_unref(m->run_queue_event_source);
|
||||
sd_event_source_unref(m->user_lookup_event_source);
|
||||
sd_event_source_unref(m->memory_pressure_event_source);
|
||||
|
||||
safe_close(m->signal_fd);
|
||||
safe_close(m->notify_fd);
|
||||
@ -2892,6 +2933,47 @@ static int manager_dispatch_signal_fd(sd_event_source *source, int fd, uint32_t
|
||||
|
||||
switch (sfsi.ssi_signo - SIGRTMIN) {
|
||||
|
||||
case 18: {
|
||||
bool generic = false;
|
||||
|
||||
if (sfsi.ssi_code != SI_QUEUE)
|
||||
generic = true;
|
||||
else {
|
||||
/* Override a few select commands by our own PID1-specific logic */
|
||||
|
||||
switch (sfsi.ssi_int) {
|
||||
|
||||
case _COMMON_SIGNAL_COMMAND_LOG_LEVEL_BASE..._COMMON_SIGNAL_COMMAND_LOG_LEVEL_END:
|
||||
manager_override_log_level(m, sfsi.ssi_int - _COMMON_SIGNAL_COMMAND_LOG_LEVEL_BASE);
|
||||
break;
|
||||
|
||||
case COMMON_SIGNAL_COMMAND_CONSOLE:
|
||||
manager_override_log_target(m, LOG_TARGET_CONSOLE);
|
||||
break;
|
||||
|
||||
case COMMON_SIGNAL_COMMAND_JOURNAL:
|
||||
manager_override_log_target(m, LOG_TARGET_JOURNAL);
|
||||
break;
|
||||
|
||||
case COMMON_SIGNAL_COMMAND_KMSG:
|
||||
manager_override_log_target(m, LOG_TARGET_KMSG);
|
||||
break;
|
||||
|
||||
case COMMON_SIGNAL_COMMAND_NULL:
|
||||
manager_override_log_target(m, LOG_TARGET_NULL);
|
||||
break;
|
||||
|
||||
default:
|
||||
generic = true;
|
||||
}
|
||||
}
|
||||
|
||||
if (generic)
|
||||
return sigrtmin18_handler(source, &sfsi, NULL);
|
||||
|
||||
break;
|
||||
}
|
||||
|
||||
case 20:
|
||||
manager_override_show_status(m, SHOW_STATUS_YES, "signal");
|
||||
break;
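The `case 18` handler above looks at `ssi_code` to tell a queued signal, which carries an integer command, from a plain one. The C sketch below shows the underlying mechanism: `sigqueue()` attaches a value to `SIGRTMIN+18`, and the receiver sees `SI_QUEUE` plus the value. It uses `sigtimedwait()` rather than the signalfd that PID 1 reads from, and the function name is hypothetical. Real command values are defined in `common-signal.h`, which is not part of this excerpt, so the caller picks an arbitrary number.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>

/* Queues SIGRTMIN+18 with an integer payload to the calling process and
 * dequeues it synchronously. Returns 0 and stores the payload when the
 * signal was sent with sigqueue(), 1 when it arrived without a payload,
 * and -1 on error. */
int roundtrip_command(int command, int *ret_command) {
    sigset_t mask;
    siginfo_t si;
    struct timespec timeout = { .tv_sec = 1 };
    union sigval v = { .sival_int = command };

    /* The signal must be blocked so that it stays pending until collected. */
    sigemptyset(&mask);
    sigaddset(&mask, SIGRTMIN+18);
    if (sigprocmask(SIG_BLOCK, &mask, NULL) < 0)
        return -1;

    if (sigqueue(getpid(), SIGRTMIN+18, v) < 0)
        return -1;

    if (sigtimedwait(&mask, &si, &timeout) < 0)
        return -1;

    /* A signal sent with plain kill() carries no command. */
    if (si.si_code != SI_QUEUE)
        return 1;

    *ret_command = si.si_value.sival_int;
    return 0;
}
```

The daemons touched by this change add `SIGRTMIN+18` to their blocked signal mask for the same reason: the signal has to be blocked before the event loop can collect it.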
|
||||
|
@ -377,6 +377,9 @@ struct Manager {
|
||||
int default_oom_score_adjust;
|
||||
bool default_oom_score_adjust_set;
|
||||
|
||||
CGroupPressureWatch default_memory_pressure_watch;
|
||||
usec_t default_memory_pressure_threshold_usec;
|
||||
|
||||
int original_log_level;
|
||||
LogTarget original_log_target;
|
||||
bool log_level_overridden;
|
||||
@ -464,6 +467,8 @@ struct Manager {
|
||||
|
||||
/* Allow users to configure a rate limit for Reload() operations */
|
||||
RateLimit reload_ratelimit;
|
||||
|
||||
sd_event_source *memory_pressure_event_source;
|
||||
};
|
||||
|
||||
static inline usec_t manager_default_timeout_abort_usec(Manager *m) {
|
||||
@ -517,6 +522,8 @@ void manager_unwatch_pid(Manager *m, pid_t pid);
|
||||
|
||||
unsigned manager_dispatch_load_queue(Manager *m);
|
||||
|
||||
int manager_setup_memory_pressure_event_source(Manager *m);
|
||||
|
||||
int manager_default_environment(Manager *m);
|
||||
int manager_transient_environment_add(Manager *m, char **plus);
|
||||
int manager_client_environment_modify(Manager *m, char **minus, char **plus);
|
||||
|
@ -922,6 +922,7 @@ static int mount_spawn(Mount *m, ExecCommand *c, pid_t *_pid) {
|
||||
&exec_params,
|
||||
m->exec_runtime,
|
||||
&m->dynamic_creds,
|
||||
&m->cgroup_context,
|
||||
&pid);
|
||||
if (r < 0)
|
||||
return r;
|
||||
|
@ -1709,6 +1709,7 @@ static int service_spawn_internal(
|
||||
&exec_params,
|
||||
s->exec_runtime,
|
||||
&s->dynamic_creds,
|
||||
&s->cgroup_context,
|
||||
&pid);
|
||||
if (r < 0)
|
||||
return r;
|
||||
|
@ -1948,6 +1948,7 @@ static int socket_spawn(Socket *s, ExecCommand *c, pid_t *_pid) {
|
||||
&exec_params,
|
||||
s->exec_runtime,
|
||||
&s->dynamic_creds,
|
||||
&s->cgroup_context,
|
||||
&pid);
|
||||
if (r < 0)
|
||||
return r;
|
||||
|
@ -690,6 +690,7 @@ static int swap_spawn(Swap *s, ExecCommand *c, pid_t *_pid) {
|
||||
&exec_params,
|
||||
s->exec_runtime,
|
||||
&s->dynamic_creds,
|
||||
&s->cgroup_context,
|
||||
&pid);
|
||||
if (r < 0)
|
||||
goto fail;
|
||||
|
@ -184,6 +184,9 @@ static void unit_init(Unit *u) {
|
||||
|
||||
if (u->type != UNIT_SLICE)
|
||||
cc->tasks_max = u->manager->default_tasks_max;
|
||||
|
||||
cc->memory_pressure_watch = u->manager->default_memory_pressure_watch;
|
||||
cc->memory_pressure_threshold_usec = u->manager->default_memory_pressure_threshold_usec;
|
||||
}
|
||||
|
||||
ec = unit_get_exec_context(u);
|
||||
|
@ -18,6 +18,7 @@
|
||||
#include "bus-log-control-api.h"
|
||||
#include "bus-polkit.h"
|
||||
#include "clean-ipc.h"
|
||||
#include "common-signal.h"
|
||||
#include "conf-files.h"
|
||||
#include "device-util.h"
|
||||
#include "dirent-util.h"
|
||||
@ -225,6 +226,15 @@ int manager_new(Manager **ret) {
|
||||
if (r < 0)
|
||||
return r;
|
||||
|
||||
r = sd_event_add_memory_pressure(m->event, NULL, NULL, NULL);
|
||||
if (r < 0)
|
||||
log_full_errno(ERRNO_IS_NOT_SUPPORTED(r) || ERRNO_IS_PRIVILEGE(r) || (r == -EHOSTDOWN) ? LOG_DEBUG : LOG_WARNING, r,
|
||||
"Failed to allocate memory pressure watch, ignoring: %m");
|
||||
|
||||
r = sd_event_add_signal(m->event, NULL, SIGRTMIN+18, sigrtmin18_handler, NULL);
|
||||
if (r < 0)
|
||||
return r;
|
||||
|
||||
(void) sd_event_set_watchdog(m->event, true);
|
||||
|
||||
m->homes_by_uid = hashmap_new(&homes_by_uid_hash_ops);
|
||||
|
@ -29,7 +29,7 @@ static int run(int argc, char *argv[]) {
|
||||
|
||||
umask(0022);
|
||||
|
||||
assert_se(sigprocmask_many(SIG_BLOCK, NULL, SIGCHLD, SIGTERM, SIGINT, -1) >= 0);
|
||||
assert_se(sigprocmask_many(SIG_BLOCK, NULL, SIGCHLD, SIGTERM, SIGINT, SIGRTMIN+18, -1) >= 0);
|
||||
|
||||
r = manager_new(&m);
|
||||
if (r < 0)
|
||||
|
@ -10,6 +10,7 @@
|
||||
#include "bus-get-properties.h"
|
||||
#include "bus-log-control-api.h"
|
||||
#include "bus-polkit.h"
|
||||
#include "common-signal.h"
|
||||
#include "constants.h"
|
||||
#include "env-util.h"
|
||||
#include "fd-util.h"
|
||||
@ -636,7 +637,23 @@ static int manager_new(Manager **ret) {
|
||||
if (r < 0)
|
||||
return r;
|
||||
|
||||
sd_event_set_watchdog(m->event, true);
|
||||
(void) sd_event_set_watchdog(m->event, true);
|
||||
|
||||
r = sd_event_add_signal(m->event, NULL, SIGINT, NULL, NULL);
|
||||
if (r < 0)
|
||||
return r;
|
||||
|
||||
r = sd_event_add_signal(m->event, NULL, SIGTERM, NULL, NULL);
|
||||
if (r < 0)
|
||||
return r;
|
||||
|
||||
r = sd_event_add_signal(m->event, NULL, SIGRTMIN+18, sigrtmin18_handler, NULL);
|
||||
if (r < 0)
|
||||
return r;
|
||||
|
||||
r = sd_event_add_memory_pressure(m->event, NULL, NULL, NULL);
|
||||
if (r < 0)
|
||||
log_debug_errno(r, "Failed to allocate memory pressure event source, ignoring: %m");
|
||||
|
||||
r = sd_bus_default_system(&m->bus);
|
||||
if (r < 0)
|
||||
@ -1389,7 +1406,7 @@ static int run(int argc, char *argv[]) {
|
||||
|
||||
umask(0022);
|
||||
|
||||
assert_se(sigprocmask_many(SIG_BLOCK, NULL, SIGCHLD, -1) >= 0);
|
||||
assert_se(sigprocmask_many(SIG_BLOCK, NULL, SIGCHLD, SIGTERM, SIGINT, SIGRTMIN+18, -1) >= 0);
|
||||
|
||||
r = manager_new(&m);
|
||||
if (r < 0)
|
||||
|
@ -636,6 +636,10 @@ static void client_context_try_shrink_to(Server *s, size_t limit) {
|
||||
}
|
||||
}
|
||||
|
||||
void client_context_flush_regular(Server *s) {
|
||||
client_context_try_shrink_to(s, 0);
|
||||
}
|
||||
|
||||
void client_context_flush_all(Server *s) {
|
||||
assert(s);
|
||||
|
||||
@ -644,7 +648,7 @@ void client_context_flush_all(Server *s) {
|
||||
s->my_context = client_context_release(s, s->my_context);
|
||||
s->pid1_context = client_context_release(s, s->pid1_context);
|
||||
|
||||
client_context_try_shrink_to(s, 0);
|
||||
client_context_flush_regular(s);
|
||||
|
||||
assert(prioq_size(s->client_contexts_lru) == 0);
|
||||
assert(hashmap_size(s->client_contexts) == 0);
|
||||
|
@ -89,6 +89,7 @@ void client_context_maybe_refresh(
|
||||
|
||||
void client_context_acquire_default(Server *s);
|
||||
void client_context_flush_all(Server *s);
|
||||
void client_context_flush_regular(Server *s);
|
||||
|
||||
static inline size_t client_context_extra_fields_n_iovec(const ClientContext *c) {
|
||||
return c ? c->extra_fields_n_iovec : 0;
|
||||
|
@ -1707,7 +1707,7 @@ static int server_setup_signals(Server *s) {
|
||||
|
||||
assert(s);
|
||||
|
||||
assert_se(sigprocmask_many(SIG_SETMASK, NULL, SIGINT, SIGTERM, SIGUSR1, SIGUSR2, SIGRTMIN+1, -1) >= 0);
|
||||
assert_se(sigprocmask_many(SIG_SETMASK, NULL, SIGINT, SIGTERM, SIGUSR1, SIGUSR2, SIGRTMIN+1, SIGRTMIN+18, -1) >= 0);
|
||||
|
||||
r = sd_event_add_signal(s->event, &s->sigusr1_event_source, SIGUSR1, dispatch_sigusr1, s);
|
||||
if (r < 0)
|
||||
@ -1747,6 +1747,10 @@ static int server_setup_signals(Server *s) {
|
||||
if (r < 0)
|
||||
return r;
|
||||
|
||||
r = sd_event_add_signal(s->event, NULL, SIGRTMIN+18, sigrtmin18_handler, &s->sigrtmin18_info);
|
||||
if (r < 0)
|
||||
return r;
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
@ -2420,6 +2424,42 @@ static int server_set_namespace(Server *s, const char *namespace) {
|
||||
return 1;
|
||||
}
|
||||
|
||||
static int server_memory_pressure(sd_event_source *es, void *userdata) {
|
||||
Server *s = ASSERT_PTR(userdata);
|
||||
|
||||
log_info("Under memory pressure, flushing caches.");
|
||||
|
||||
/* Flush the cached info we might have about client processes */
|
||||
client_context_flush_regular(s);
|
||||
|
||||
/* Let's also close all user files (but keep the system/runtime one open) */
|
||||
for (;;) {
|
||||
ManagedJournalFile *first = ordered_hashmap_steal_first(s->user_journals);
|
||||
|
||||
if (!first)
|
||||
break;
|
||||
|
||||
(void) managed_journal_file_close(first);
|
||||
}
|
||||
|
||||
sd_event_trim_memory();
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int server_setup_memory_pressure(Server *s) {
|
||||
int r;
|
||||
|
||||
assert(s);
|
||||
|
||||
r = sd_event_add_memory_pressure(s->event, NULL, server_memory_pressure, s);
|
||||
if (r < 0)
|
||||
log_full_errno(ERRNO_IS_NOT_SUPPORTED(r) || ERRNO_IS_PRIVILEGE(r) || (r == -EHOSTDOWN) ? LOG_DEBUG : LOG_NOTICE, r,
|
||||
"Failed to install memory pressure event source, ignoring: %m");
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
int server_init(Server *s, const char *namespace) {
|
||||
const char *native_socket, *syslog_socket, *stdout_socket, *varlink_socket, *e;
|
||||
_cleanup_fdset_free_ FDSet *fds = NULL;
|
||||
@ -2470,6 +2510,9 @@ int server_init(Server *s, const char *namespace) {
|
||||
.interval = DEFAULT_KMSG_OWN_INTERVAL,
|
||||
.burst = DEFAULT_KMSG_OWN_BURST,
|
||||
},
|
||||
|
||||
.sigrtmin18_info.memory_pressure_handler = server_memory_pressure,
|
||||
.sigrtmin18_info.memory_pressure_userdata = s,
|
||||
};
|
||||
|
||||
r = server_set_namespace(s, namespace);
|
||||
@ -2652,6 +2695,10 @@ int server_init(Server *s, const char *namespace) {
|
||||
if (r < 0)
|
||||
return r;
|
||||
|
||||
r = server_setup_memory_pressure(s);
|
||||
if (r < 0)
|
||||
return r;
|
||||
|
||||
s->ratelimit = journal_ratelimit_new();
|
||||
if (!s->ratelimit)
|
||||
return log_oom();
|
||||
|
@ -8,6 +8,7 @@
|
||||
|
||||
typedef struct Server Server;
|
||||
|
||||
#include "common-signal.h"
|
||||
#include "conf-parser.h"
|
||||
#include "hashmap.h"
|
||||
#include "journald-context.h"
|
||||
@ -95,6 +96,7 @@ struct Server {
|
||||
sd_event_source *notify_event_source;
|
||||
sd_event_source *watchdog_event_source;
|
||||
sd_event_source *idle_event_source;
|
||||
struct sigrtmin18_info sigrtmin18_info;
|
||||
|
||||
ManagedJournalFile *runtime_journal;
|
||||
ManagedJournalFile *system_journal;
|
||||
|
@ -14,6 +14,7 @@
|
||||
#include "bus-log-control-api.h"
|
||||
#include "bus-polkit.h"
|
||||
#include "cgroup-util.h"
|
||||
#include "common-signal.h"
|
||||
#include "constants.h"
|
||||
#include "daemon-util.h"
|
||||
#include "device-util.h"
|
||||
@ -85,6 +86,14 @@ static int manager_new(Manager **ret) {
|
||||
if (r < 0)
|
||||
return r;
|
||||
|
||||
r = sd_event_add_signal(m->event, NULL, SIGRTMIN+18, sigrtmin18_handler, NULL);
|
||||
if (r < 0)
|
||||
return r;
|
||||
|
||||
r = sd_event_add_memory_pressure(m->event, NULL, NULL, NULL);
|
||||
if (r < 0)
|
||||
log_debug_errno(r, "Failed to allocate memory pressure event source, ignoring: %m");
|
||||
|
||||
(void) sd_event_set_watchdog(m->event, true);
|
||||
|
||||
manager_reset_config(m);
|
||||
@ -1196,7 +1205,7 @@ static int run(int argc, char *argv[]) {
|
||||
(void) mkdir_label("/run/systemd/users", 0755);
|
||||
(void) mkdir_label("/run/systemd/sessions", 0755);
|
||||
|
||||
assert_se(sigprocmask_many(SIG_BLOCK, NULL, SIGHUP, SIGTERM, SIGINT, SIGCHLD, -1) >= 0);
|
||||
assert_se(sigprocmask_many(SIG_BLOCK, NULL, SIGHUP, SIGTERM, SIGINT, SIGCHLD, SIGRTMIN+18, -1) >= 0);
|
||||
|
||||
r = manager_new(&m);
|
||||
if (r < 0)
|
||||
|
@ -12,6 +12,7 @@
|
||||
#include "bus-log-control-api.h"
|
||||
#include "bus-polkit.h"
|
||||
#include "cgroup-util.h"
|
||||
#include "common-signal.h"
|
||||
#include "daemon-util.h"
|
||||
#include "dirent-util.h"
|
||||
#include "discover-image.h"
|
||||
@ -61,6 +62,15 @@ static int manager_new(Manager **ret) {
|
||||
if (r < 0)
|
||||
return r;
|
||||
|
||||
r = sd_event_add_signal(m->event, NULL, SIGRTMIN+18, sigrtmin18_handler, NULL);
|
||||
if (r < 0)
|
||||
return r;
|
||||
|
||||
r = sd_event_add_memory_pressure(m->event, NULL, NULL, NULL);
|
||||
if (r < 0)
|
||||
log_full_errno(ERRNO_IS_NOT_SUPPORTED(r) || ERRNO_IS_PRIVILEGE(r) || r == -EHOSTDOWN ? LOG_DEBUG : LOG_NOTICE, r,
|
||||
"Unable to create memory pressure event source, ignoring: %m");
|
||||
|
||||
(void) sd_event_set_watchdog(m->event, true);
|
||||
|
||||
*ret = TAKE_PTR(m);
|
||||
@ -339,7 +349,7 @@ static int run(int argc, char *argv[]) {
|
||||
* make sure this check stays in. */
|
||||
(void) mkdir_label("/run/systemd/machines", 0755);
|
||||
|
||||
assert_se(sigprocmask_many(SIG_BLOCK, NULL, SIGCHLD, SIGTERM, SIGINT, -1) >= 0);
|
||||
assert_se(sigprocmask_many(SIG_BLOCK, NULL, SIGCHLD, SIGTERM, SIGINT, SIGRTMIN+18, -1) >= 0);
|
||||
|
||||
r = manager_new(&m);
|
||||
if (r < 0)
|
||||
|
@ -16,6 +16,7 @@
|
||||
#include "bus-log-control-api.h"
|
||||
#include "bus-polkit.h"
|
||||
#include "bus-util.h"
|
||||
#include "common-signal.h"
|
||||
#include "conf-parser.h"
|
||||
#include "constants.h"
|
||||
#include "daemon-util.h"
|
||||
@ -521,6 +522,11 @@ int manager_setup(Manager *m) {
|
||||
(void) sd_event_add_signal(m->event, NULL, SIGINT | SD_EVENT_SIGNAL_PROCMASK, signal_terminate_callback, m);
|
||||
(void) sd_event_add_signal(m->event, NULL, SIGUSR2 | SD_EVENT_SIGNAL_PROCMASK, signal_restart_callback, m);
|
||||
(void) sd_event_add_signal(m->event, NULL, SIGHUP | SD_EVENT_SIGNAL_PROCMASK, signal_reload_callback, m);
|
||||
(void) sd_event_add_signal(m->event, NULL, (SIGRTMIN+18) | SD_EVENT_SIGNAL_PROCMASK, sigrtmin18_handler, NULL);
|
||||
|
||||
r = sd_event_add_memory_pressure(m->event, NULL, NULL, NULL);
|
||||
if (r < 0)
|
||||
log_debug_errno(r, "Failed to allocate memory pressure event source, ignoring: %m");
|
||||
|
||||
r = sd_event_add_post(m->event, NULL, manager_dirty_handler, m);
|
||||
if (r < 0)
|
||||
|
@ -35,6 +35,7 @@
|
||||
#include "capability-util.h"
|
||||
#include "cgroup-util.h"
|
||||
#include "chase-symlinks.h"
|
||||
#include "common-signal.h"
|
||||
#include "copy.h"
|
||||
#include "cpu-set-util.h"
|
||||
#include "creds-util.h"
|
||||
@ -5162,6 +5163,12 @@ static int run_container(
|
||||
(void) sd_event_add_signal(event, NULL, SIGTERM, NULL, NULL);
|
||||
}
|
||||
|
||||
(void) sd_event_add_signal(event, NULL, SIGRTMIN+18, sigrtmin18_handler, NULL);
|
||||
|
||||
r = sd_event_add_memory_pressure(event, NULL, NULL, NULL);
|
||||
if (r < 0)
|
||||
log_debug_errno(r, "Failed to allocate memory pressure event source, ignoring: %m");
|
||||
|
||||
/* Exit when the child exits */
|
||||
(void) sd_event_add_signal(event, NULL, SIGCHLD, on_sigchld, PID_TO_PTR(*pid));
|
||||
|
||||
@ -5803,7 +5810,7 @@ static int run(int argc, char *argv[]) {
|
||||
log_info("Spawning container %s on %s.\nPress Ctrl-] three times within 1s to kill container.",
|
||||
arg_machine, arg_image ?: arg_directory);
|
||||
|
||||
assert_se(sigprocmask_many(SIG_BLOCK, NULL, SIGCHLD, SIGWINCH, SIGTERM, SIGINT, -1) >= 0);
|
||||
assert_se(sigprocmask_many(SIG_BLOCK, NULL, SIGCHLD, SIGWINCH, SIGTERM, SIGINT, SIGRTMIN+18, -1) >= 0);
|
||||
|
||||
if (prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) < 0) {
|
||||
r = log_error_errno(errno, "Failed to become subreaper: %m");
|
||||
|
@ -8,6 +8,7 @@
|
||||
#include "alloc-util.h"
|
||||
#include "bus-log-control-api.h"
|
||||
#include "bus-polkit.h"
|
||||
#include "common-signal.h"
|
||||
#include "constants.h"
|
||||
#include "daemon-util.h"
|
||||
#include "main-func.h"
|
||||
@ -43,6 +44,14 @@ static int manager_new(Manager **ret) {
|
||||
if (r < 0)
|
||||
return r;
|
||||
|
||||
r = sd_event_add_signal(m->event, NULL, SIGRTMIN+18, sigrtmin18_handler, NULL);
|
||||
if (r < 0)
|
||||
return r;
|
||||
|
||||
r = sd_event_add_memory_pressure(m->event, NULL, NULL, NULL);
|
||||
if (r < 0)
|
||||
log_debug_errno(r, "Failed to allocate memory pressure event source, ignoring: %m");
|
||||
|
||||
(void) sd_event_set_watchdog(m->event, true);
|
||||
|
||||
*ret = TAKE_PTR(m);
|
||||
@ -143,7 +152,7 @@ static int run(int argc, char *argv[]) {
|
||||
if (argc != 1)
|
||||
return log_error_errno(SYNTHETIC_ERRNO(EINVAL), "This program takes no arguments.");
|
||||
|
||||
assert_se(sigprocmask_many(SIG_BLOCK, NULL, SIGCHLD, SIGTERM, SIGINT, -1) >= 0);
|
||||
assert_se(sigprocmask_many(SIG_BLOCK, NULL, SIGCHLD, SIGTERM, SIGINT, SIGRTMIN+18, -1) >= 0);
|
||||
|
||||
r = manager_new(&m);
|
||||
if (r < 0)
|
||||
|
@ -543,6 +543,30 @@ static int manager_sigrtmin1(sd_event_source *s, const struct signalfd_siginfo *
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int manager_memory_pressure(sd_event_source *s, void *userdata) {
|
||||
Manager *m = ASSERT_PTR(userdata);
|
||||
|
||||
log_info("Under memory pressure, flushing caches.");
|
||||
|
||||
manager_flush_caches(m, LOG_INFO);
|
||||
sd_event_trim_memory();
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int manager_memory_pressure_listen(Manager *m) {
|
||||
int r;
|
||||
|
||||
assert(m);
|
||||
|
||||
r = sd_event_add_memory_pressure(m->event, NULL, manager_memory_pressure, m);
|
||||
if (r < 0)
|
||||
log_full_errno(ERRNO_IS_NOT_SUPPORTED(r) || ERRNO_IS_PRIVILEGE(r) || (r == -EHOSTDOWN) ? LOG_DEBUG : LOG_NOTICE, r,
|
||||
"Failed to install memory pressure event source, ignoring: %m");
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
int manager_new(Manager **ret) {
|
||||
_cleanup_(manager_freep) Manager *m = NULL;
|
||||
int r;
|
||||
@ -572,6 +596,9 @@ int manager_new(Manager **ret) {
|
||||
.need_builtin_fallbacks = true,
|
||||
.etc_hosts_last = USEC_INFINITY,
|
||||
.read_etc_hosts = true,
|
||||
|
||||
.sigrtmin18_info.memory_pressure_handler = manager_memory_pressure,
|
||||
.sigrtmin18_info.memory_pressure_userdata = m,
|
||||
};
|
||||
|
||||
r = dns_trust_anchor_load(&m->trust_anchor);
|
||||
@ -621,6 +648,10 @@ int manager_new(Manager **ret) {
|
||||
if (r < 0)
|
||||
return r;
|
||||
|
||||
r = manager_memory_pressure_listen(m);
|
||||
if (r < 0)
|
||||
return r;
|
||||
|
||||
r = manager_connect_bus(m);
|
||||
if (r < 0)
|
||||
return r;
|
||||
@ -628,6 +659,7 @@ int manager_new(Manager **ret) {
|
||||
(void) sd_event_add_signal(m->event, &m->sigusr1_event_source, SIGUSR1, manager_sigusr1, m);
|
||||
(void) sd_event_add_signal(m->event, &m->sigusr2_event_source, SIGUSR2, manager_sigusr2, m);
|
||||
(void) sd_event_add_signal(m->event, &m->sigrtmin1_event_source, SIGRTMIN+1, manager_sigrtmin1, m);
|
||||
(void) sd_event_add_signal(m->event, NULL, SIGRTMIN+18, sigrtmin18_handler, &m->sigrtmin18_info);
|
||||
|
||||
manager_cleanup_saved_user(m);
|
||||
|
||||
|
@ -7,6 +7,7 @@
|
||||
#include "sd-netlink.h"
|
||||
#include "sd-network.h"
|
||||
|
||||
#include "common-signal.h"
|
||||
#include "hashmap.h"
|
||||
#include "list.h"
|
||||
#include "ordered-set.h"
|
||||
@ -156,6 +157,8 @@ struct Manager {
|
||||
LIST_HEAD(SocketGraveyard, socket_graveyard);
|
||||
SocketGraveyard *socket_graveyard_oldest;
|
||||
size_t n_socket_graveyard;
|
||||
|
||||
struct sigrtmin18_info sigrtmin18_info;
|
||||
};
|
||||
|
||||
/* Manager */
|
||||
|
@ -67,7 +67,7 @@ static int run(int argc, char *argv[]) {
|
||||
return log_error_errno(r, "Failed to drop privileges: %m");
|
||||
}
|
||||
|
||||
assert_se(sigprocmask_many(SIG_BLOCK, NULL, SIGTERM, SIGINT, SIGUSR1, SIGUSR2, SIGRTMIN+1, -1) >= 0);
|
||||
assert_se(sigprocmask_many(SIG_BLOCK, NULL, SIGTERM, SIGINT, SIGUSR1, SIGUSR2, SIGRTMIN+1, SIGRTMIN+18, -1) >= 0);
|
||||
|
||||
r = manager_new(&m);
|
||||
if (r < 0)
|
||||
|
@ -460,7 +460,8 @@ static int bus_append_cgroup_property(sd_bus_message *m, const char *field, cons
|
||||
"Slice",
|
||||
"ManagedOOMSwap",
|
||||
"ManagedOOMMemoryPressure",
|
||||
"ManagedOOMPreference"))
|
||||
"ManagedOOMPreference",
|
||||
"MemoryPressureWatch"))
|
||||
return bus_append_string(m, field, eq);
|
||||
|
||||
if (STR_IN_SET(field, "ManagedOOMMemoryPressureLimit")) {
|
||||
@ -913,6 +914,9 @@ static int bus_append_cgroup_property(sd_bus_message *m, const char *field, cons
|
||||
return 1;
|
||||
}
|
||||
|
||||
if (streq(field, "MemoryPressureThresholdSec"))
|
||||
return bus_append_parse_sec_rename(m, field, eq);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
|
94
src/shared/common-signal.c
Normal file
@ -0,0 +1,94 @@
|
||||
/* SPDX-License-Identifier: LGPL-2.1-or-later */
|
||||
|
||||
#include "common-signal.h"
|
||||
#include "fd-util.h"
|
||||
#include "fileio.h"
|
||||
#include "process-util.h"
|
||||
#include "signal-util.h"
|
||||
|
||||
int sigrtmin18_handler(sd_event_source *s, const struct signalfd_siginfo *si, void *userdata) {
|
||||
struct sigrtmin18_info *info = userdata;
|
||||
_cleanup_free_ char *comm = NULL;
|
||||
int r;
|
||||
|
||||
assert(s);
|
||||
assert(si);
|
||||
|
||||
(void) get_process_comm(si->ssi_pid, &comm);
|
||||
|
||||
if (si->ssi_code != SI_QUEUE) {
|
||||
log_notice("Received control signal %s from process " PID_FMT " (%s) without command value, ignoring.",
|
||||
signal_to_string(si->ssi_signo),
|
||||
(pid_t) si->ssi_pid,
|
||||
strna(comm));
|
||||
return 0;
|
||||
}
|
||||
|
||||
log_debug("Received control signal %s from process " PID_FMT " (%s) with command 0x%08x.",
|
||||
signal_to_string(si->ssi_signo),
|
||||
(pid_t) si->ssi_pid,
|
||||
strna(comm),
|
||||
(unsigned) si->ssi_int);
|
||||
|
||||
switch (si->ssi_int) {
|
||||
|
||||
case _COMMON_SIGNAL_COMMAND_LOG_LEVEL_BASE..._COMMON_SIGNAL_COMMAND_LOG_LEVEL_END:
|
||||
log_set_max_level(si->ssi_int - _COMMON_SIGNAL_COMMAND_LOG_LEVEL_BASE);
|
||||
break;
|
||||
|
||||
case COMMON_SIGNAL_COMMAND_CONSOLE:
|
||||
log_set_target_and_open(LOG_TARGET_CONSOLE);
|
||||
break;
|
||||
case COMMON_SIGNAL_COMMAND_JOURNAL:
|
||||
log_set_target_and_open(LOG_TARGET_JOURNAL);
|
||||
break;
|
||||
case COMMON_SIGNAL_COMMAND_KMSG:
|
||||
log_set_target_and_open(LOG_TARGET_KMSG);
|
||||
break;
|
||||
case COMMON_SIGNAL_COMMAND_NULL:
|
||||
log_set_target_and_open(LOG_TARGET_NULL);
|
||||
break;
|
||||
|
||||
case COMMON_SIGNAL_COMMAND_MEMORY_PRESSURE:
|
||||
if (info && info->memory_pressure_handler)
|
||||
return info->memory_pressure_handler(s, info->memory_pressure_userdata);
|
||||
|
||||
sd_event_trim_memory();
|
||||
break;
|
||||
|
||||
case COMMON_SIGNAL_COMMAND_MALLOC_INFO: {
|
||||
_cleanup_free_ char *data = NULL;
|
||||
_cleanup_fclose_ FILE *f = NULL;
|
||||
size_t sz;
|
||||
|
||||
f = open_memstream_unlocked(&data, &sz);
|
||||
if (!f) {
|
||||
log_oom();
|
||||
break;
|
||||
}
|
||||
|
||||
if (malloc_info(0, f) < 0) {
|
||||
log_error_errno(errno, "Failed to invoke malloc_info(): %m");
|
||||
break;
|
||||
}
|
||||
|
||||
fputc(0, f);
|
||||
|
||||
r = fflush_and_check(f);
|
||||
if (r < 0) {
|
||||
log_error_errno(r, "Failed to flush malloc_info() buffer: %m");
|
||||
break;
|
||||
}
|
||||
|
||||
log_dump(LOG_INFO, data);
|
||||
break;
|
||||
}
|
||||
|
||||
default:
|
||||
log_notice("Received control signal %s with unknown command 0x%08x, ignoring.",
|
||||
signal_to_string(si->ssi_signo), (unsigned) si->ssi_int);
|
||||
break;
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
58
src/shared/common-signal.h
Normal file
@ -0,0 +1,58 @@
|
||||
/* SPDX-License-Identifier: LGPL-2.1-or-later */
|
||||
|
||||
#include <syslog.h>
|
||||
|
||||
#include <sd-event.h>
|
||||
|
||||
/* All our long-running services should implement a SIGRTMIN+18 handler that can be used to trigger certain
|
||||
* actions that affect service runtime. The specific action is indicated via the "value integer" you can pass
|
||||
* along realtime signals. This is mostly intended for debugging purposes and is entirely asynchronous in
|
||||
* nature.
|
||||
*
|
||||
* Currently available operations:
|
||||
*
|
||||
* • Change maximum log level
|
||||
* • Change log target
|
||||
* • Invoke memory trimming, like under memory pressure
|
||||
* • Write glibc malloc() allocation info to logs
|
||||
*
|
||||
* How to use this? Via a command like the following:
|
||||
*
|
||||
* /usr/bin/kill -s RTMIN+18 -q 768 1
|
||||
*
|
||||
* (This will tell PID 1 to trim its memory use.)
|
||||
*
|
||||
* or:
|
||||
*
|
||||
* systemctl kill --kill-value=0x300 -s RTMIN+18 systemd-journald
|
||||
*
|
||||
* (This will tell journald to trim its memory use.)
|
||||
*/
|
||||
|
||||
enum {
|
||||
_COMMON_SIGNAL_COMMAND_LOG_LEVEL_BASE = 0x100,
|
||||
COMMON_SIGNAL_COMMAND_LOG_EMERG = _COMMON_SIGNAL_COMMAND_LOG_LEVEL_BASE + LOG_EMERG,
|
||||
COMMON_SIGNAL_COMMAND_LOG_ALERT = _COMMON_SIGNAL_COMMAND_LOG_LEVEL_BASE + LOG_ALERT,
|
||||
COMMON_SIGNAL_COMMAND_LOG_CRIT = _COMMON_SIGNAL_COMMAND_LOG_LEVEL_BASE + LOG_CRIT,
|
||||
COMMON_SIGNAL_COMMAND_LOG_ERR = _COMMON_SIGNAL_COMMAND_LOG_LEVEL_BASE + LOG_ERR,
|
||||
COMMON_SIGNAL_COMMAND_LOG_WARNING = _COMMON_SIGNAL_COMMAND_LOG_LEVEL_BASE + LOG_WARNING,
|
||||
COMMON_SIGNAL_COMMAND_LOG_NOTICE = _COMMON_SIGNAL_COMMAND_LOG_LEVEL_BASE + LOG_NOTICE,
|
||||
COMMON_SIGNAL_COMMAND_LOG_INFO = _COMMON_SIGNAL_COMMAND_LOG_LEVEL_BASE + LOG_INFO,
|
||||
COMMON_SIGNAL_COMMAND_LOG_DEBUG = _COMMON_SIGNAL_COMMAND_LOG_LEVEL_BASE + LOG_DEBUG,
|
||||
_COMMON_SIGNAL_COMMAND_LOG_LEVEL_END = COMMON_SIGNAL_COMMAND_LOG_DEBUG,
|
||||
|
||||
COMMON_SIGNAL_COMMAND_CONSOLE = 0x200,
|
||||
COMMON_SIGNAL_COMMAND_JOURNAL,
|
||||
COMMON_SIGNAL_COMMAND_KMSG,
|
||||
COMMON_SIGNAL_COMMAND_NULL,
|
||||
|
||||
COMMON_SIGNAL_COMMAND_MEMORY_PRESSURE = 0x300,
|
||||
COMMON_SIGNAL_COMMAND_MALLOC_INFO,
|
||||
};
|
||||
|
||||
struct sigrtmin18_info {
|
||||
sd_event_handler_t memory_pressure_handler;
|
||||
void *memory_pressure_userdata;
|
||||
};
|
||||
|
||||
int sigrtmin18_handler(sd_event_source *s, const struct signalfd_siginfo *si, void *userdata);
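The header comment above describes sending SIGRTMIN+18 with a "value integer" from the command line. As a minimal sketch of the same mechanism in C (not part of this commit), the snippet below queues the signal with `sigqueue()` and reads it back through a `signalfd`, which is the form in which `sigrtmin18_handler()` receives it. The names `demo_send_command()` and `demo_roundtrip()` are hypothetical helpers introduced only for illustration; the value `0x300` is copied from `COMMON_SIGNAL_COMMAND_MEMORY_PRESSURE` in the enum above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <sys/signalfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Copied from the enum in common-signal.h; the macro name is illustrative. */
#define DEMO_COMMAND_MEMORY_PRESSURE 0x300

/* Hypothetical helper: queue SIGRTMIN+18 carrying a command value to a process.
 * Returns 0 on success, -1 on error (errno set by sigqueue()). */
int demo_send_command(pid_t pid, int command) {
        union sigval v = { .sival_int = command };

        return sigqueue(pid, SIGRTMIN + 18, v);
}

/* Hypothetical self-test: block the signal, send it to ourselves, and read it
 * back via signalfd. Returns the received command value, or -1 on any error or
 * if the signal did not carry a queued value. */
int demo_roundtrip(int command) {
        struct signalfd_siginfo si;
        sigset_t mask;
        int fd, result = -1;

        /* The signal must be blocked so that it stays pending for the signalfd. */
        sigemptyset(&mask);
        sigaddset(&mask, SIGRTMIN + 18);
        if (sigprocmask(SIG_BLOCK, &mask, NULL) < 0)
                return -1;

        fd = signalfd(-1, &mask, SFD_CLOEXEC);
        if (fd < 0)
                return -1;

        /* ssi_code == SI_QUEUE is the same check the shared handler applies
         * before it interprets ssi_int as a command. */
        if (demo_send_command(getpid(), command) == 0 &&
            read(fd, &si, sizeof(si)) == (ssize_t) sizeof(si) &&
            si.ssi_code == SI_QUEUE)
                result = si.ssi_int;

        close(fd);
        return result;
}

#ifdef DEMO_MAIN
int main(void) {
        int v = demo_roundtrip(DEMO_COMMAND_MEMORY_PRESSURE);

        assert(v >= 0);
        printf("received command 0x%08x\n", (unsigned) v);
        return 0;
}
#endif
```

Building with `-DDEMO_MAIN` gives a standalone program. To target a real service instead, `demo_send_command()` would be called with that service's PID, which requires permission to signal the process.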
|
@ -35,6 +35,7 @@ shared_sources = files(
|
||||
'chown-recursive.c',
|
||||
'clean-ipc.c',
|
||||
'clock-util.c',
|
||||
'common-signal.c',
|
||||
'compare-operator.c',
|
||||
'condition.c',
|
||||
'conf-parser.c',
|
||||
|
@ -15,6 +15,7 @@
|
||||
|
||||
#include "alloc-util.h"
|
||||
#include "bus-polkit.h"
|
||||
#include "common-signal.h"
|
||||
#include "dns-domain.h"
|
||||
#include "event-util.h"
|
||||
#include "fd-util.h"
|
||||
@ -1129,6 +1130,11 @@ int manager_new(Manager **ret) {
|
||||
|
||||
(void) sd_event_add_signal(m->event, NULL, SIGTERM, NULL, NULL);
|
||||
(void) sd_event_add_signal(m->event, NULL, SIGINT, NULL, NULL);
|
||||
(void) sd_event_add_signal(m->event, NULL, SIGRTMIN+18, sigrtmin18_handler, NULL);
|
||||
|
||||
r = sd_event_add_memory_pressure(m->event, NULL, NULL, NULL);
|
||||
if (r < 0)
|
||||
log_debug_errno(r, "Failed to allocate memory pressure event source, ignoring: %m");
|
||||
|
||||
(void) sd_event_set_watchdog(m->event, true);
|
||||
|
||||
|
@ -174,7 +174,7 @@ static int run(int argc, char *argv[]) {
|
||||
return log_error_errno(r, "Failed to drop privileges: %m");
|
||||
}
|
||||
|
||||
assert_se(sigprocmask_many(SIG_BLOCK, NULL, SIGTERM, SIGINT, -1) >= 0);
|
||||
assert_se(sigprocmask_many(SIG_BLOCK, NULL, SIGTERM, SIGINT, SIGRTMIN+18, -1) >= 0);
|
||||
|
||||
r = manager_new(&m);
|
||||
if (r < 0)
|
||||
|
@ -31,6 +31,7 @@
|
||||
#include "blockdev-util.h"
|
||||
#include "cgroup-setup.h"
|
||||
#include "cgroup-util.h"
|
||||
#include "common-signal.h"
|
||||
#include "cpu-set-util.h"
|
||||
#include "daemon-util.h"
|
||||
#include "dev-setup.h"
|
||||
@ -112,6 +113,9 @@ typedef struct Manager {
|
||||
|
||||
sd_event_source *kill_workers_event;
|
||||
|
||||
sd_event_source *memory_pressure_event_source;
|
||||
sd_event_source *sigrtmin18_event_source;
|
||||
|
||||
usec_t last_usec;
|
||||
|
||||
bool udev_node_needs_cleanup;
|
||||
@ -264,6 +268,9 @@ static Manager* manager_free(Manager *manager) {
|
||||
safe_close(manager->inotify_fd);
|
||||
safe_close_pair(manager->worker_watch);
|
||||
|
||||
sd_event_source_unref(manager->memory_pressure_event_source);
|
||||
sd_event_source_unref(manager->sigrtmin18_event_source);
|
||||
|
||||
free(manager->cgroup);
|
||||
return mfree(manager);
|
||||
}
|
||||
@ -1918,7 +1925,7 @@ static int main_loop(Manager *manager) {
|
||||
udev_watch_restore(manager->inotify_fd);
|
||||
|
||||
/* block and listen to all signals on signalfd */
|
||||
assert_se(sigprocmask_many(SIG_BLOCK, NULL, SIGTERM, SIGINT, SIGHUP, SIGCHLD, -1) >= 0);
|
||||
assert_se(sigprocmask_many(SIG_BLOCK, NULL, SIGTERM, SIGINT, SIGHUP, SIGCHLD, SIGRTMIN+18, -1) >= 0);
|
||||
|
||||
r = sd_event_default(&manager->event);
|
||||
if (r < 0)
|
||||
@ -1976,6 +1983,16 @@ static int main_loop(Manager *manager) {
|
||||
if (r < 0)
|
||||
return log_error_errno(r, "Failed to create post event source: %m");
|
||||
|
||||
/* Eventually, we probably want to do more here on memory pressure, for example, kill idle workers immediately */
|
||||
r = sd_event_add_memory_pressure(manager->event, &manager->memory_pressure_event_source, NULL, NULL);
|
||||
if (r < 0)
|
||||
log_full_errno(ERRNO_IS_NOT_SUPPORTED(r) || ERRNO_IS_PRIVILEGE(r) || (r == -EHOSTDOWN) ? LOG_DEBUG : LOG_WARNING, r,
|
||||
"Failed to allocate memory pressure watch, ignoring: %m");
|
||||
|
||||
r = sd_event_add_signal(manager->event, &manager->sigrtmin18_event_source, SIGRTMIN+18, sigrtmin18_handler, NULL);
|
||||
if (r < 0)
|
||||
return log_error_errno(r, "Failed to allocate SIGRTMIN+18 event source: %m");
|
||||
|
||||
manager->last_usec = now(CLOCK_MONOTONIC);
|
||||
|
||||
udev_builtin_init();
|
||||
|
@ -4,6 +4,7 @@
|
||||
|
||||
#include "sd-daemon.h"
|
||||
|
||||
#include "common-signal.h"
|
||||
#include "fd-util.h"
|
||||
#include "fs-util.h"
|
||||
#include "mkdir.h"
|
||||
@ -102,6 +103,14 @@ int manager_new(Manager **ret) {
|
||||
if (r < 0)
|
||||
return r;
|
||||
|
||||
r = sd_event_add_signal(m->event, NULL, SIGRTMIN+18, sigrtmin18_handler, NULL);
|
||||
if (r < 0)
|
||||
return r;
|
||||
|
||||
r = sd_event_add_memory_pressure(m->event, NULL, NULL, NULL);
|
||||
if (r < 0)
|
||||
log_debug_errno(r, "Failed to allocate memory pressure event source, ignoring: %m");
|
||||
|
||||
(void) sd_event_set_watchdog(m->event, true);
|
||||
|
||||
m->workers_fixed = set_new(NULL);
|
||||
|
@ -37,7 +37,7 @@ static int run(int argc, char *argv[]) {
|
||||
if (setenv("SYSTEMD_BYPASS_USERDB", "io.systemd.NameServiceSwitch:io.systemd.Multiplexer:io.systemd.DropIn", 1) < 0)
|
||||
return log_error_errno(errno, "Failed to set $SYSTEMD_BYPASS_USERDB: %m");
|
||||
|
||||
assert_se(sigprocmask_many(SIG_BLOCK, NULL, SIGCHLD, SIGTERM, SIGINT, SIGUSR2, -1) >= 0);
|
||||
assert_se(sigprocmask_many(SIG_BLOCK, NULL, SIGCHLD, SIGTERM, SIGINT, SIGUSR2, SIGRTMIN+18, -1) >= 0);
|
||||
|
||||
r = manager_new(&m);
|
||||
if (r < 0)
|
||||
|
1
test/TEST-79-MEMPRESS/Makefile
Symbolic link
@ -0,0 +1 @@
|
||||
../TEST-01-BASIC/Makefile
|
16
test/TEST-79-MEMPRESS/test.sh
Executable file
@ -0,0 +1,16 @@
|
||||
#!/usr/bin/env bash
|
||||
# SPDX-License-Identifier: LGPL-2.1-or-later
|
||||
set -e
|
||||
|
||||
TEST_DESCRIPTION="Test Memory Pressure handling"
|
||||
# Ignore gcov complaints caused by DynamicUser=true
|
||||
IGNORE_MISSING_COVERAGE=yes
|
||||
|
||||
# shellcheck source=test/test-functions
|
||||
. "$TEST_BASE_DIR/test-functions"
|
||||
|
||||
test_append_files() {
|
||||
image_install base64
|
||||
}
|
||||
|
||||
do_test "$@"
|
8
test/units/testsuite-79.service
Normal file
@ -0,0 +1,8 @@
|
||||
# SPDX-License-Identifier: LGPL-2.1-or-later
|
||||
[Unit]
|
||||
Description=TEST-79-MEMPRESS
|
||||
|
||||
[Service]
|
||||
Type=oneshot
|
||||
ExecStart=/usr/lib/systemd/tests/testdata/units/%N.sh
|
||||
MemoryAccounting=1
|
63
test/units/testsuite-79.sh
Executable file
@ -0,0 +1,63 @@
|
||||
#!/usr/bin/env bash
|
||||
# SPDX-License-Identifier: LGPL-2.1-or-later
|
||||
set -ex
|
||||
set -o pipefail
|
||||
|
||||
# We don't just test whether the file exists, but try to read from it, since if
|
||||
# CONFIG_PSI_DEFAULT_DISABLED is set in the kernel the file will exist and can
|
||||
# be opened, but any read()s will fail with EOPNOTSUPP, which we want to
|
||||
# detect.
|
||||
if ! cat /proc/pressure/memory >/dev/null ; then
|
||||
echo "kernel too old, has no PSI." >&2
|
||||
echo OK >/testok
|
||||
exit 0
|
||||
fi
|
||||
|
||||
systemd-analyze log-level debug
|
||||
|
||||
CGROUP=/sys/fs/cgroup/"$(systemctl show testsuite-79.service -P ControlGroup)"
|
||||
test -d "$CGROUP"
|
||||
|
||||
if ! test -f "$CGROUP"/memory.pressure ; then
|
||||
echo "No memory accounting/PSI delegated via cgroup, can't test." >&2
|
||||
echo OK >/testok
|
||||
exit 0
|
||||
fi
|
||||
|
||||
UNIT="test-mempress-$RANDOM.service"
|
||||
SCRIPT="/run/bin/mempress-$RANDOM.sh"
|
||||
|
||||
mkdir -p "/run/bin"
|
||||
|
||||
cat >"$SCRIPT" <<'EOF'
|
||||
#!/bin/bash
|
||||
|
||||
set -ex
|
||||
|
||||
export
|
||||
id
|
||||
|
||||
test -n "$MEMORY_PRESSURE_WATCH"
|
||||
test "$MEMORY_PRESSURE_WATCH" != /dev/null
|
||||
test -w "$MEMORY_PRESSURE_WATCH"
|
||||
|
||||
ls -al "$MEMORY_PRESSURE_WATCH"
|
||||
|
||||
EXPECTED="$(echo -n -e "some 123000 1000000\x00" | base64)"
|
||||
|
||||
test "$EXPECTED" = "$MEMORY_PRESSURE_WRITE"
|
||||
|
||||
EOF
|
||||
|
||||
chmod +x "$SCRIPT"
|
||||
|
||||
systemd-run -u "$UNIT" -p Type=exec -p DynamicUser=1 -p MemoryPressureWatch=on -p MemoryPressureThresholdSec=123ms --wait "$SCRIPT"
|
||||
|
||||
rm "$SCRIPT"
|
||||
|
||||
rmdir /run/bin ||:
|
||||
|
||||
systemd-analyze log-level info
|
||||
echo OK >/testok
|
||||
|
||||
exit 0
|
@ -26,3 +26,4 @@ TasksMax=infinity
|
||||
TimeoutStopSec={{ DEFAULT_USER_TIMEOUT_SEC*4//3 }}s
|
||||
KeyringMode=inherit
|
||||
OOMScoreAdjust=100
|
||||
MemoryPressureWatch=skip
|
||||
|