mirror of https://github.com/systemd/systemd.git synced 2024-12-22 17:35:35 +03:00

163 Commits

Author SHA1 Message Date
Luca Boccassi
9e615fa3aa core: add WantsMountsFor=
This is the equivalent of RequiresMountsFor=, but adds Wants= instead
of Requires=. It will be useful for example for the autogenerated
systemd-cryptsetup units.

Fixes https://github.com/systemd/systemd/issues/11646
2023-11-29 11:04:59 +00:00
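Like RequiresMountsFor=, the new setting applies to "all mount units required to access the specified path". The Python sketch below lists the candidate mount unit names for a path and each of its parent directories, using a simplified version of systemd's path escaping. The helper names are hypothetical. The escaping is not guaranteed to match `systemd-escape --path` in every edge case, and dependencies are only added on candidates that correspond to real mount units.

```python
from typing import List

# Characters that may appear unescaped in a unit name (a leading '.' is still escaped).
_ALLOWED = set("abcdefghijklmnopqrstuvwxyz"
               "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
               "0123456789:_.")


def escape_path(path: str) -> str:
    """Simplified systemd path escaping: '/' becomes '-', other odd bytes become \\xNN."""
    # Drop empty components caused by leading, trailing or duplicate slashes.
    parts = [p for p in path.split("/") if p]
    if not parts:
        return "-"  # the root directory
    out = []
    for i, ch in enumerate("/".join(parts)):
        if ch == "/":
            out.append("-")
        elif ch in _ALLOWED and not (i == 0 and ch == "."):
            out.append(ch)
        else:
            out.append("".join(f"\\x{b:02x}" for b in ch.encode("utf-8")))
    return "".join(out)


def mount_units_for(path: str) -> List[str]:
    """Mount unit names for the path itself and every parent directory, root first."""
    parts = [p for p in path.split("/") if p]
    prefixes = ["/"] + ["/" + "/".join(parts[:i]) for i in range(1, len(parts) + 1)]
    return [escape_path(p) + ".mount" for p in prefixes]


if __name__ == "__main__":
    print(mount_units_for("/var/lib/foo"))
    # → ['-.mount', 'var.mount', 'var-lib.mount', 'var-lib-foo.mount']
```

With RequiresMountsFor= the matching units get Requires= and After=; with WantsMountsFor= they get the weaker Wants=, so a failing mount does not take the dependent unit down with it.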
Yu Watanabe
58cde42f65 core: rename MemoryZswapCurrent -> MemoryZSwapCurrent
Follow-up for 26caa66867.
2023-11-13 13:54:56 +01:00
Florian Schmaus
26caa66867 cgroup: add support for memory.zswap.current 2023-11-12 21:10:40 +01:00
Florian Schmaus
37533c9432 cgroup: add support for memory.swap.current
In systemctl-show we only show current swap if the unit ever swapped or the
value is non-zero. This reduces the noise on swapless systems, which would
otherwise always show a swap value that never has the chance to become
non-zero. It further reduces the noise for services that never swapped.
2023-11-11 12:16:29 +01:00
Florian Schmaus
aac3384e56 cgroup: add support for memory.swap.peak 2023-11-11 12:14:07 +01:00
Florian Schmaus
6c71db763c cgroup: add support for memory.peak
Linux's Control Group v2 interface exposes memory.peak, which contains the
"max memory usage recorded for the cgroup and its descendants since the
creation of the cgroup."

This commit adds a new property "MemoryPeak" for units and makes "systemctl
show" display this value if it is available.

Fixes #29878.

Signed-off-by: Florian Schmaus <flo@geekplace.eu>
2023-11-06 18:08:33 +01:00
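For comparison with the MemoryPeak property, the raw counter can also be read straight from cgroupfs. The Python sketch below assumes the unified hierarchy is mounted at /sys/fs/cgroup, and the helper name is hypothetical, not a systemd API. Because memory.swap.current, memory.swap.peak and memory.zswap.current (added in the neighbouring commits) each hold a single byte count too, the same function can read them by passing a different `attr`.

```python
from pathlib import Path
from typing import Optional

# Assumption: the unified (cgroup v2) hierarchy is mounted here.
CGROUP_ROOT = Path("/sys/fs/cgroup")


def read_cgroup_memory_attr(cgroup: str, attr: str = "memory.peak",
                            root: Path = CGROUP_ROOT) -> Optional[int]:
    """Return the byte count in <root>/<cgroup>/<attr>, or None if it is unavailable.

    Hypothetical helper for illustration only.
    """
    # Strip the leading '/' so the path is joined below root instead of replacing it.
    path = root / cgroup.lstrip("/") / attr
    try:
        return int(path.read_text().strip())
    except FileNotFoundError:
        # Kernels that predate the attribute simply do not have the file.
        return None
```

For example, `read_cgroup_memory_attr("system.slice/systemd-journald.service")` returns the peak in bytes, or None where the kernel does not provide memory.peak, matching the "if it is available" behaviour above.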
Lennart Poettering
cde8cc946b Merge pull request #29272 from enr0n/coredump-container
coredump: support forwarding coredumps to containers
2023-10-16 16:13:16 +02:00
Luca Boccassi
7c83d42ef8 mount-util: use mount beneath to replace previous namespace mount
Instead of mounting over, do an atomic swap using mount beneath, if
available. This way assets can be mounted again and again (e.g.:
updates) without leaking mounts.
2023-10-16 14:33:47 +01:00
Nick Rosbrook
cfc015f09e man: document CoredumpReceive= setting 2023-10-13 15:28:50 -04:00
Mike Yuan
854eca4a95 core/execute: always set $USER and introduce SetLoginEnvironment=
Before this commit, $USER, $HOME, $LOGNAME and $SHELL are only
set when User= is set for the unit. For system services, this
results in different behavior depending on whether User=root is set.

$USER always makes sense on its own, so let's set it unconditionally.
Ideally $HOME should be set too, but it causes trouble when e.g. getty
passes '-p' to login(1), which then doesn't override $HOME. $LOGNAME and
$SHELL are more like "login environments", and are generally not
suitable for system services. Therefore, a new option SetLoginEnvironment=
is also added to control $HOME, $LOGNAME and $SHELL.

Fixes #23438

Replaces #8227
2023-10-10 00:00:26 +08:00
Luca Boccassi
559214cbbd pid1: add SurviveFinalKillSignal= to skip units on final sigterm/sigkill spree
Add a new boolean for units, SurviveFinalKillSignal=yes/no. Units that
set it will not have their process receive the final sigterm/sigkill in
the shutdown phase.

This is implemented by checking whether a process is part of a cgroup marked
with a user.survive_final_kill_signal xattr (or a trusted xattr if a user one
can't be set: user xattrs on cgroupfs were added only in kernel v5.7 and are
not supported in CentOS 8).
2023-09-28 13:48:14 +01:00
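The cgroup check described above might look roughly like this Python sketch, which uses `os.getxattr()` from the standard library (Linux only). The function name is hypothetical. The exact trusted.* attribute name and the acceptance of common boolean spellings such as "1" or "yes" are assumptions, not taken from the commit. The getter is injectable so the logic can be tried without a real cgroupfs.

```python
import errno
import os
from typing import Callable, Optional

# The user.* name comes from the commit message; the trusted.* spelling is an assumption.
XATTR_NAMES = ("user.survive_final_kill_signal",
               "trusted.survive_final_kill_signal")
TRUE_VALUES = {b"1", b"yes", b"y", b"true", b"t", b"on"}


def cgroup_survives_final_kill(
        cgroup_path: str,
        getxattr: Optional[Callable[[str, str], bytes]] = None) -> bool:
    """Return True if the cgroup directory is marked to skip the final SIGTERM/SIGKILL."""
    if getxattr is None:
        getxattr = os.getxattr  # Linux-only
    for name in XATTR_NAMES:
        try:
            value = getxattr(cgroup_path, name)
        except OSError as e:
            # Attribute absent, or this xattr namespace is unsupported: try the next name.
            if e.errno in (errno.ENODATA, errno.EOPNOTSUPP):
                continue
            raise
        # The first attribute found decides.
        return value.strip().lower() in TRUE_VALUES
    return False
```

Trying the user.* name first and falling back to trusted.* mirrors the ordering in the commit message: the trusted attribute only matters on kernels where the user one cannot be set.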
Mike Yuan
6bd8340d11 man/org.freedesktop.systemd1: add version info for NFTSet
Follow-up for dc7d69b3c1
2023-09-28 03:04:28 +08:00
Topi Miettinen
dc7d69b3c1 core: firewall integration of cgroups with NFTSet=
New directive `NFTSet=` provides a method for integrating dynamic cgroup IDs
into firewall rules with NFT sets. The benefit of using this setting is that a
control group can easily be used as a selector in firewall rules, which in
turn allows more fine-grained filtering. Also, NFT rules for cgroup matching
use numeric cgroup IDs, which change every time a service is restarted, making
them hard to use in a systemd environment.

This option expects a whitespace separated list of NFT set definitions. Each
definition consists of a colon-separated tuple of source type (only "cgroup"),
NFT address family (one of "arp", "bridge", "inet", "ip", "ip6", or "netdev"),
table name and set name. The names of tables and sets must conform to lexical
restrictions of NFT table names. The type of the element used in the NFT filter
must be "cgroupsv2". When a control group for a unit is realized, the cgroup ID
will be appended to the NFT sets, and it will be removed when the control
group is removed. systemd only inserts elements into (or removes them from) the
sets, so the related NFT rules, tables and sets must be prepared elsewhere in
advance. Failures to manage the sets will be ignored.

If the firewall rules are reinstalled so that the contents of NFT sets are
destroyed, the command systemctl daemon-reload can be used to refill the sets.

Example:

```
table inet filter {
...
        set timesyncd {
                type cgroupsv2
        }

        chain ntp_output {
                socket cgroupv2 != @timesyncd counter drop
                accept
        }
...
}
```

/etc/systemd/system/systemd-timesyncd.service.d/override.conf
```
[Service]
NFTSet=cgroup:inet:filter:timesyncd
```

```
$ sudo nft list set inet filter timesyncd
table inet filter {
        set timesyncd {
                type cgroupsv2
                elements = { "system.slice/systemd-timesyncd.service" }
        }
}
```
2023-09-27 18:10:11 +00:00
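A parser for the value format described above might look like the following Python sketch. `NFTSetSpec` and `parse_nft_set` are hypothetical names. The sketch checks the tuple shape, the "cgroup" source type and the address family, but does not implement the NFT lexical rules for table and set names.

```python
from typing import List, NamedTuple

NFT_FAMILIES = {"arp", "bridge", "inet", "ip", "ip6", "netdev"}


class NFTSetSpec(NamedTuple):
    source: str
    family: str
    table: str
    set_name: str


def parse_nft_set(value: str) -> List[NFTSetSpec]:
    """Parse a whitespace-separated list of source:family:table:set tuples."""
    specs = []
    for item in value.split():
        parts = item.split(":")
        if len(parts) != 4:
            raise ValueError(f"expected source:family:table:set, got {item!r}")
        source, family, table, set_name = parts
        if source != "cgroup":  # the only source type described in this commit
            raise ValueError(f"unsupported source type {source!r}")
        if family not in NFT_FAMILIES:
            raise ValueError(f"unknown NFT address family {family!r}")
        if not table or not set_name:
            raise ValueError(f"empty table or set name in {item!r}")
        specs.append(NFTSetSpec(source, family, table, set_name))
    return specs
```

Applied to the override above, `parse_nft_set("cgroup:inet:filter:timesyncd")` yields a single spec naming table `filter` and set `timesyncd` in family `inet`.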
Luca Boccassi
4c9a288154 man: document SystemState's possible values 2023-09-25 22:55:54 +01:00
Abderrahim Kitouni
d9d2d16aea man: add version information for dbus interfaces
These only go back to version 250 which is the first version to provide the
export-dbus-interfaces build target.
2023-09-19 14:33:34 +01:00
Lennart Poettering
2bec84e7a5 core: add new "PollLimit" settings to .socket units
This adds a new "PollLimit" pair of settings to .socket units, very
similar to the existing "TriggerLimit" logic. The differences are:

* PollLimit focuses on the polling on the sockets, and pauses that
  temporarily if a ratelimit on that is reached. TriggerLimit, on the
  other hand, focuses on the triggering effect of socket units, and stops
  triggering once the ratelimit is hit.

* While hitting the trigger limit causes the socket unit to fail,
  reaching the polling limit just temporarily disables polling on the
  socket fd, and polling resumes once the ratelimit interval is over.

* When a socket unit operates on multiple socket fds (e.g. ListenStream=
  on both an IPv6 and an IPv4 address), the PollLimit is specific to
  each fd, while the trigger limit is specific to the whole unit.

Implementation-wise this is mostly a wrapper around sd-event's
sd_event_source_set_ratelimit(), which exposes the desired behaviour
directly.

Use case for all of this: socket services which, when overloaded with
connections, should just slow down reception of them, but not fail
persistently.
2023-09-18 18:55:19 +02:00
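The per-fd behaviour can be sketched in Python with a simple fixed-window rate limit. Each listening fd gets its own limiter. When a limiter refuses an event, polling on that fd would pause until the window ends rather than failing the unit. This mimics the idea behind sd_event_source_set_ratelimit() but is not its implementation; `RateLimit`, `make_poll_limits` and `resume_at` are hypothetical names.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable


@dataclass
class RateLimit:
    """Fixed-window limiter: at most `burst` events per `interval` seconds."""
    interval: float
    burst: int
    clock: Callable[[], float] = time.monotonic
    _begin: float = field(default=0.0, init=False)
    _count: int = field(default=0, init=False)

    def test(self) -> bool:
        """Record one event; return False once the current window is exhausted."""
        now = self.clock()
        if self._count == 0 or now - self._begin >= self.interval:
            # Start a new window.
            self._begin, self._count = now, 0
        if self._count < self.burst:
            self._count += 1
            return True
        return False

    def resume_at(self) -> float:
        """Clock value at which the current window ends and polling may resume."""
        return self._begin + self.interval


def make_poll_limits(fds: Iterable[int], interval: float, burst: int,
                     clock: Callable[[], float] = time.monotonic) -> Dict[int, RateLimit]:
    """One independent limiter per listening fd, since PollLimit applies per fd."""
    return {fd: RateLimit(interval, burst, clock) for fd in fds}
```

A caller would stop watching fd 3 when `limits[3].test()` returns False and re-arm it at `limits[3].resume_at()`. The other fds and the unit as a whole are unaffected, which is the contrast with TriggerLimit.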
Michal Koutný
055665d596 dbus: Document org.freedesktop.systemd1.Service.MemoryAvailable property
The value is an optimistic estimate, make it clear in the docs.
2023-09-09 10:42:38 +02:00
Abderrahim Kitouni
ec07c3c80b man: add version info
This tries to add information about when each option was added. It goes
back to version 183.

The version info is included from a separate file to allow generating it,
which would allow more control on the formatting of the final output.
2023-08-29 14:07:24 +01:00
Luca Boccassi
b0d3095fd6 Drop split-usr and unmerged-usr support
As previously announced, execute order 66:

https://lists.freedesktop.org/archives/systemd-devel/2022-September/048352.html

The meson options split-usr, rootlibdir and rootprefix become no-ops
that print a warning if they are set to anything other than the
default values. We can remove them in a future release.
2023-07-28 19:34:03 +01:00
Luca Boccassi
3835b9aa4b Revert "core: add IgnoreOnSoftReboot= unit option"
The feature is not ready, postpone it

This reverts commit b80fc61e89.
2023-07-22 23:27:27 +01:00
Luca Boccassi
b80fc61e89 core: add IgnoreOnSoftReboot= unit option
As it says on the tin, configures the unit to survive a soft reboot.
Currently all the following options have to be set by hand:

Conflicts=reboot.target kexec.target poweroff.target halt.target
Before=reboot.target kexec.target poweroff.target halt.target
After=sysinit.target basic.target
DefaultDependencies=no
IgnoreOnIsolate=yes

This is not very user friendly. If new default dependencies are added,
or new shutdown/reboot types, they also have to be added manually.

The new option is much simpler, easy to find, and does the right thing
by default.
2023-07-21 18:05:41 +02:00
Luca Boccassi
b2deaaf01b Merge pull request #27584 from rphibel/add-restartquick-option
service: add new RestartMode option
2023-07-06 20:37:31 +01:00
Richard Phibel
e568fea9fc service: add new RestartMode option
When this option is set to direct, the service restarts without entering a failed
state. Dependent units are not notified of transitory failure.

This is useful for the following use case:

We have a target with Requires=my-service, After=my-service.
my-service.service is a oneshot service and has Restart=on-failure in
its definition.

my-service.service can get stuck for various reasons and time out, in
which case it is restarted. Currently, when it fails the first time, the
target fails, even though my-service is restarted.

The behavior we're looking for is that as long as my-service keeps being
restarted, the target stays pending, waiting for my-service.service to
either start successfully or fail without being restarted again.
2023-07-06 14:33:52 +02:00
Daniel P. Berrangé
1257274ad8 dbus: add 'ConfidentialVirtualization' property to manager object
This property reports whether the system is running inside a confidential
virtual machine.

Related: https://github.com/systemd/systemd/issues/27604
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2023-07-06 12:20:04 +01:00
Daan De Meyer
9c0c670125 core: Add RootEphemeral= setting
This setting allows services to run in an ephemeral copy of the root
directory or root image. To make sure the ephemeral copies are always
cleaned up, we add a tmpfiles snippet to unconditionally clean up
/var/lib/systemd/ephemeral. To prevent in-use ephemeral copies from
being cleaned up by tmpfiles, we use the newly added COPY_LOCK_BSD
and BTRFS_SNAPSHOT_LOCK_BSD flags to take a BSD lock on the ephemeral
copies, which instructs tmpfiles not to touch those ephemeral copies as
long as the BSD lock is held.
2023-06-21 12:48:46 +02:00
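The locking protocol can be sketched with `fcntl.flock()` in Python. The service side holds an exclusive BSD lock on the ephemeral copy's directory for as long as it is in use. A cleaner probes with a non-blocking lock attempt and skips the directory if the probe fails. The exact lock modes systemd and tmpfiles use are not stated here, so the modes below, like the helper names, are assumptions.

```python
import fcntl
import os


def _open_dir(path: str) -> int:
    return os.open(path, os.O_RDONLY | os.O_DIRECTORY | os.O_CLOEXEC)


def lock_ephemeral_copy(path: str) -> int:
    """Service side: hold an exclusive BSD lock for as long as the returned fd stays open."""
    fd = _open_dir(path)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        raise
    return fd


def is_in_use(path: str) -> bool:
    """Cleaner side: a failed non-blocking lock attempt means someone still holds the copy."""
    fd = _open_dir(path)
    try:
        fcntl.flock(fd, fcntl.LOCK_SH | fcntl.LOCK_NB)
    except BlockingIOError:
        return True
    finally:
        os.close(fd)  # closing the fd also drops the probe lock
    return False
```

A cleanup pass over /var/lib/systemd/ephemeral would then skip every entry for which `is_in_use()` returns True. BSD locks belong to the open file description, so the lock disappears automatically if the service dies, and a stale copy becomes eligible for cleanup.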
licunlong
a068eeac6f core/dbus-manager: also show DefaultIOAccounting and DefaultIPAccounting
fix: https://github.com/systemd/systemd/issues/28045
2023-06-19 09:57:11 +02:00
Lennart Poettering
e503019bc7 tree-wide: when in doubt use greek small letter mu rather than micro symbol
Doesn't really matter since the two Unicode symbols are supposedly
equivalent, but let's follow the Unicode recommendation to
prefer greek small letter mu, as per:

https://www.unicode.org/reports/tr25
2023-06-14 10:23:56 +02:00
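Python's `unicodedata` module shows in what sense the two symbols are equivalent. They are distinct code points, but NFKC (compatibility) normalization maps MICRO SIGN to GREEK SMALL LETTER MU, while NFC leaves it unchanged. The replacement helper is a hypothetical illustration of the tree-wide change, not code from the commit.

```python
import unicodedata

MICRO_SIGN = "\u00b5"  # MICRO SIGN
GREEK_MU = "\u03bc"    # GREEK SMALL LETTER MU


def prefer_greek_mu(text: str) -> str:
    """Replace every MICRO SIGN with GREEK SMALL LETTER MU; nothing else changes."""
    return text.replace(MICRO_SIGN, GREEK_MU)


if __name__ == "__main__":
    for ch in (MICRO_SIGN, GREEK_MU):
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
    # Distinct code points, but compatibility normalization folds one into the other.
    print(unicodedata.normalize("NFKC", MICRO_SIGN) == GREEK_MU)  # True
    print(unicodedata.normalize("NFC", MICRO_SIGN) == GREEK_MU)   # False
```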
Daan De Meyer
bbfb25f4b9 creds: Add ImportCredential=
ImportCredential= takes a credential name and searches for a matching
credential in all the credential stores we know about. It supports
globs which are expanded so that all matching credentials are loaded.
2023-06-08 14:09:18 +02:00
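The matching logic can be sketched in Python with `fnmatch.fnmatchcase()`. Here each credential store is modelled as a plain dict. The rule that the first store containing a name wins is an assumption made for the sketch: the real store locations and their precedence are defined by systemd, not by this code, and Python's glob dialect may differ in corner cases from fnmatch(3).

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable


def import_credentials(pattern: str,
                       stores: Iterable[Dict[str, bytes]]) -> Dict[str, bytes]:
    """Collect every credential whose name matches `pattern` (a literal name or a glob).

    Stores are searched in order; if a name occurs in several stores, the first
    occurrence wins (an assumption for this sketch).
    """
    found: Dict[str, bytes] = {}
    for store in stores:
        for name, data in store.items():
            if name not in found and fnmatchcase(name, pattern):
                found[name] = data
    return found
```

A plain name such as `"db.pass"` behaves like a single lookup, while a glob such as `"tls.*"` loads every matching credential from all stores at once.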
Stefan Roesch
85614c6e2f add support for KSM
This adds support for KSM (kernel samepage merging). It adds a new
boolean parameter called MemoryKSM to enable the feature. The feature
can only be enabled with newer kernels.
2023-06-05 11:22:43 +02:00
Lennart Poettering
4de665812a man: document the soft reboot operation 2023-06-02 18:43:10 +02:00
Luca Boccassi
d936595672 manager: restrict Dump*() to privileged callers or ratelimit
Dump*() methods can take quite some time due to the amount of data to
serialize, so they can potentially stall the manager. Make them
privileged, as they are debugging tools anyway. Use a new 'dump'
capability for polkit, and the 'reload' capability for SELinux, as
that's also non-destructive but slow.

If the caller is not privileged, allow it but rate limited to 10 calls
every 10 minutes.
2023-05-19 15:18:23 +01:00
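The access rule above (privileged callers always, everyone else at most 10 calls every 10 minutes) can be sketched in Python. This sketch deliberately uses a sliding window over recent call timestamps, which may differ from the rate-limit scheme the manager actually uses; `DumpGate` and its parameters are hypothetical names.

```python
import time
from collections import deque
from typing import Callable, Deque


class DumpGate:
    """Admit privileged callers always; others at most `burst` times per `interval` seconds."""

    def __init__(self, burst: int = 10, interval: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.burst = burst
        self.interval = interval
        self.clock = clock
        self._recent: Deque[float] = deque()  # timestamps of admitted unprivileged calls

    def admit(self, privileged: bool) -> bool:
        if privileged:
            return True
        now = self.clock()
        # Forget calls that have slid out of the window.
        while self._recent and now - self._recent[0] >= self.interval:
            self._recent.popleft()
        if len(self._recent) < self.burst:
            self._recent.append(now)
            return True
        return False
```

Refused unprivileged callers do not consume a slot, so a client that retries in a tight loop cannot extend its own lockout.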
Mike Yuan
e9f17fa8dd core: rename RestartSecMax to RestartMaxDelaySec 2023-05-18 00:23:49 +08:00
Zbigniew Jędrzejewski-Szmek
8fb350049b man: fixes for assorted issues reported by the manpage-l10n project
Fixes #26761.
2023-05-17 12:25:01 +02:00
Miao Wang
4fad639a13 doc: remove legacy DefaultControlGroup from dbus properties
DefaultControlGroup does not exist any more.
2023-05-08 22:23:00 +09:00
Lennart Poettering
a8b993dc11 core: add DelegateSubgroup= setting
This implements a minimal subset of #24961, but in a much more
restrictive way: it only allows one level of subcgroup (as that's enough
to address the no-processes-in-inner-cgroups rule), does not change
anything about threaded cgroup logic or similar, and does not make any of
this new behaviour mandatory.

All this does is the following: all non-control processes we invoke for a
unit are invoked in a subgroup with the specified name.

We'll later port all our current services that use cgroup delegation
over to this, i.e. user@.service, systemd-nspawn@.service and
systemd-udevd.service.
2023-04-27 12:18:32 +02:00
Lennart Poettering
b9c1883a9c service: add ability to pin fd store
Oftentimes it is useful to allow the per-service fd store to survive
longer than a restart. This is useful in various scenarios:

1. An fd to some security-relevant object needs to be stashed somewhere
   where it is not cleaned up automatically, because the security
   enforcement would be dropped then.

2. A user namespace fd should be allocated on first invocation and be
   kept around until the user logs out (i.e. systemd --user ends), à la
   #16328 (This does not implement what #16318 asks for, but should
   solve the use-case discussed there.)

3. There's interest in allowing a concept of "userspace reboots" where the
   kernel stays running, and userspace is swapped out (i.e. all services
   exit, and the rootfs transitioned into a new version of it) while
   keeping some select resources pinned, very similar to how we
   implement a switch root. Thus it is useful to allow services to exit,
   while leaving their fds around till the very end.

This is exposed through a new FileDescriptorStorePreserve= setting that
is closely modelled after RuntimeDirectoryPreserve= (in fact it reuses
the same internal type), since we want similar behaviour in the end, and
quite often they probably want to be used together.
2023-04-13 06:44:27 +02:00
Lennart Poettering
3af48a86d9 Merge pull request #25608 from poettering/dissect-moar
dissect: add dissection policies
2023-04-12 13:46:08 +02:00
Colin Walters
4e1ac54e1c tree-wide: A few more uses of "unmet" for conditions
This is a followup to
413e8650b7
> tree-wide: Use "unmet" for condition checks, not "failed"

I noticed this when running `systemctl status` on a recent
systemd and still seeing
`Condition: start condition failed`

To recap, the original rationale for "unmet" is that it's
normal for some units to be conditional, so the term "failure"
here is too strong.
2023-04-11 12:36:53 +09:00
Lennart Poettering
84be0c710d tree-wide: hook up image dissection policy logic everywhere 2023-04-05 20:45:30 +02:00
Mike Yuan
5171356eee core: always calculate the next restart interval
Follow-up for #26902 and #26971

Let's always calculate the next restart interval
since that's more useful.

For that, we add 1 to s->n_restarts unconditionally,
and change RestartUSecCurrent property to RestartUSecNext.
2023-03-31 01:22:58 +01:00
Lennart Poettering
2ea24611b9 pid1: add DumpFileDescriptorStore() bus call that returns fdstore content info 2023-03-29 18:53:20 +02:00
Mike Yuan
57b33e0ce7 core/dbus-service: add RestartUSecCurrent property
This new property shows how much time we actually
wait before restarting.
2023-03-27 19:31:12 +08:00
Mike Yuan
be1adc27fc core: add RestartSteps= and RestartSecMax= for exponentially increasing interval between restarts
interval between restarts

RestartSteps= accepts a positive integer as the number of steps
to take to increase the interval between auto-restarts from
RestartSec= to RestartSecMax=, or 0 to disable it.

Closes #6129
2023-03-27 19:31:12 +08:00
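The commit says the interval grows exponentially in RestartSteps= steps from RestartSec= to RestartSecMax= (later renamed RestartMaxDelaySec=), but does not give the formula. The Python sketch below assumes geometric interpolation, where each restart multiplies the delay by the same factor. Under that assumption the first restart waits RestartSec= and later delays are capped at the maximum. The function name and parameters are hypothetical.

```python
from typing import Optional


def restart_delay(n: int, restart_sec: float,
                  max_delay_sec: Optional[float], steps: int) -> float:
    """Delay in seconds before the n-th automatic restart (n >= 1).

    Assumption: delay_i = restart_sec * (max_delay_sec / restart_sec) ** (i / steps)
    with i = n - 1, capped at max_delay_sec once i reaches steps.
    """
    if n < 1:
        raise ValueError("n counts restarts starting at 1")
    # RestartSteps=0, no maximum, or a maximum not above the base disables the feature.
    if steps == 0 or restart_sec <= 0 or max_delay_sec is None or restart_sec >= max_delay_sec:
        return restart_sec
    i = n - 1
    if i >= steps:
        return max_delay_sec
    return restart_sec * (max_delay_sec / restart_sec) ** (i / steps)


if __name__ == "__main__":
    # RestartSec=1s, RestartSecMax=16s, RestartSteps=4
    print([round(restart_delay(n, 1.0, 16.0, 4), 3) for n in range(1, 7)])
    # → [1.0, 2.0, 4.0, 8.0, 16.0, 16.0]
```

With a constant ratio, each step doubles the delay in this configuration. That keeps early retries quick while bounding how long a persistently failing service waits between attempts.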
Mike Yuan
19dff6914d core: support overriding NOTIFYACCESS= through sd-notify during runtime
Closes #25963
2023-03-22 06:33:12 +08:00
Lennart Poettering
6bb0084204 pid1: add unit file settings to control memory pressure logic 2023-03-01 09:43:23 +01:00
Yu Watanabe
60c5bd7759 tree-wide: fix typo 2023-02-22 14:46:19 +09:00
Lennart Poettering
a721cd0016 pid1: add a new D-Bus method for enqueuing POSIX signals with values to unit processes
This augments the existing KillUnit() + Kill() methods with
QueueSignalUnit() + QueueSignal(), which are what sigqueue() is to
kill().

This is useful for sending our new SIGRTMIN+18 control signals to system
services.
2023-02-17 09:55:35 +01:00
Luca Boccassi
53fda560dc core: add support for Startup memory limits
We support separate Startup configurations for CPU and I/O, so
add it for memory too. Only cover cgroupsv2 settings.
2023-02-15 20:01:16 +00:00
Luca Boccassi
e0e7bc8223 core: add GetUnitByPIDFD method and use it in systemctl
A pid can be recycled, but a pidfd is pinned. Add a new method that is safer
as it takes a pidfd as input.
Return not only the D-Bus object path, but also the unit id and the last
recorded invocation id, as they are both useful (especially the id, as
converting from a path object to a unit id from a script requires another
round-trip via D-Bus).

Note that the manager still tracks processes by pid, so theoretically this
is not fully error-proof, but on the other hand the method response is
synchronous and the manager is single-threaded, so once a call is being
processed the unit database will not change anyway. Once the manager
switches to use pidfds everywhere, this can be further hardened.
2023-01-18 10:58:46 +01:00
Lennart Poettering
3bd28bf721 pid1: add new Type=notify-reload service type
Fixes: #6162
2023-01-10 18:28:38 +01:00