IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Instead of mounting over, do an atomic swap using mount beneath, if
available. This way assets can be mounted again and again (e.g.:
updates) without leaking mounts.
Before this commit, $USER, $HOME, $LOGNAME and $SHELL are only
set when User= is set for the unit. For system service, this
results in different behaviors depending on whether User=root is set.
$USER always makes sense on its own, so let's set it unconditionally.
Ideally $HOME should be set too, but it causes trouble when e.g. getty
passes '-p' to login(1), which then doesn't override $HOME. $LOGNAME and
$SHELL are more like "login environments", and are generally not
suitable for system services. Therefore, a new option SetLoginEnvironment=
is also added to control the latter three variables.
Fixes#23438
Replaces #8227
Add a new boolean for units, SurviveFinalKillSignal=yes/no. Units that
set it will not have their process receive the final sigterm/sigkill in
the shutdown phase.
This is implemented by checking if a process is part of a cgroup marked
with a user.survive_final_kill_signal xattr (or a trusted xattr if we
can't set a user one, which were added only in kernel v5.7 and are not
supported in CentOS 8).
New directive `NFTSet=` provides a method for integrating dynamic cgroup IDs
into firewall rules with NFT sets. The benefit of using this setting is to be
able to use control group as a selector in firewall rules easily and this in
turn allows more fine grained filtering. Also, NFT rules for cgroup matching
use numeric cgroup IDs, which change every time a service is restarted, making
them hard to use in systemd environment.
This option expects a whitespace separated list of NFT set definitions. Each
definition consists of a colon-separated tuple of source type (only "cgroup"),
NFT address family (one of "arp", "bridge", "inet", "ip", "ip6", or "netdev"),
table name and set name. The names of tables and sets must conform to lexical
restrictions of NFT table names. The type of the element used in the NFT filter
must be "cgroupsv2". When a control group for a unit is realized, the cgroup ID
will be appended to the NFT sets and it will be be removed when the control
group is removed. systemd only inserts elements to (or removes from) the sets,
so the related NFT rules, tables and sets must be prepared elsewhere in
advance. Failures to manage the sets will be ignored.
If the firewall rules are reinstalled so that the contents of NFT sets are
destroyed, command systemctl daemon-reload can be used to refill the sets.
Example:
```
table inet filter {
...
set timesyncd {
type cgroupsv2
}
chain ntp_output {
socket cgroupv2 != @timesyncd counter drop
accept
}
...
}
```
/etc/systemd/system/systemd-timesyncd.service.d/override.conf
```
[Service]
NFTSet=cgroup:inet:filter:timesyncd
```
```
$ sudo nft list set inet filter timesyncd
table inet filter {
set timesyncd {
type cgroupsv2
elements = { "system.slice/systemd-timesyncd.service" }
}
}
```
This adds a new "PollLimit" pair of settings to .socket units, very
similar to existing "TriggerLimit" logic. The differences are:
* PollLimit focusses on the polling on the sockets, and pauses that
temporarily if a ratelimit on that is reached. TriggerLimit otoh
focusses on the triggering effect of socket units, and stops
triggering once the ratelimit is hit.
* While the trigger limit being hit is an action that causes the socket
unit to fail the polling limit being reached will just temporarily
disable polling on the socket fd, and it is resumed once the ratelimit
interval is over.
* When a socket unit operates on multiple socket fds (e,g, ListenStream=
on both some ipv6 and an ipv4 address or so). Then the PollLimit will
be specific to each fd, while the trigger limit is specific to the
whole unit.
Implementation-wise this is mostly a wrapper around sd-event's
sd_event_source_set_ratelimit(), which exposes the desired behaviour
directly.
Usecase for all of this: socket services which when overloaded with
connections should just slow down reception of it, but not fail
persistently.
This tries to add information about when each option was added. It goes
back to version 183.
The version info is included from a separate file to allow generating it,
which would allow more control on the formatting of the final output.
As it says on the tin, configures the unit to survive a soft reboot.
Currently all the following options have to be set by hand:
Conflicts=reboot.target kexec.target poweroff.target halt.target
Before=reboot.target kexec.target poweroff.target halt.target
After=sysinit.target basic.target
DefaultDependencies=no
IgnoreOnIsolate=yes
This is not very user friendly. If new default dependencies are added,
or new shutdown/reboot types, they also have to be added manually.
The new option is much simpler, easy to find, and does the right thing
by default.
When this option is set to direct, the service restarts without entering a failed
state. Dependent units are not notified of transitory failure.
This is useful for the following use case:
We have a target with Requires=my-service, After=my-service.
my-service.service is a oneshot service and has Restart=on-failure in
its definition.
my-service.service can get stuck for various reasons and time out, in
which case it is restarted. Currently, when it fails the first time, the
target fails, even though my-service is restarted.
The behavior we're looking for is that until my-service is not restarted
anymore, the target stays pending waiting for my-service.service to
start successfully or fail without being restarted anymore.
This property reports whether the system is running inside a confidential
virtual machine.
Related: https://github.com/systemd/systemd/issues/27604
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
This setting allows services to run in an ephemeral copy of the root
directory or root image. To make sure the ephemeral copies are always
cleaned up, we add a tmpfiles snippet to unconditionally clean up
/var/lib/systemd/ephemeral. To prevent in use ephemeral copies from
being cleaned up by tmpfiles, we use the newly added COPY_LOCK_BSD
and BTRFS_SNAPSHOT_LOCK_BSD flags to take a BSD lock on the ephemeral
copies which instruct tmpfiles to not touch those ephemeral copies as
long as the BSD lock is held.
Doesn't really matter since the two unicode symbols are supposedly
equivalent, but let's better follow the unicode recommendations to
prefer greek small letter mu, as per:
https://www.unicode.org/reports/tr25
ImportCredential= takes a credential name and searches for a matching
credential in all the credential stores we know about it. It supports
globs which are expanded so that all matching credentials are loaded.
This adds support for KSM (kernel samepage merging). It adds a new
boolean parameter called MemoryKSM to enable the feature. The feature
can only be enabled with newer kernels.
Dump*() methods can take quite some time due to the amount of data to
serialize, so they can potentially stall the manager. Make them
privileged, as they are debugging tools anyway. Use a new 'dump'
capability for polkit, and the 'reload' capability for SELinux, as
that's also non-destructive but slow.
If the caller is not privileged, allow it but rate limited to 10 calls
every 10 minutes.
This implements a minimal subset of #24961, but in a lot more
restrictive way: we only allow one level of subcgroup (as that's enough
to address the no-processes in inner cgroups rule), and does not change
anything about threaded cgroup logic or similar, or make any of this new
behaviour mandatory.
All this does is this: all non-control processes we invoke for a unit
we'll invoke in a subgroup by the specified name.
We'll later port all our current services that use cgroup delegation
over to this, i.e. user@.service, systemd-nspawn@.service and
systemd-udevd.service.
Oftentimes it is useful to allow the per-service fd store to survive
longer than for a restart. This is useful in various scenarios:
1. An fd to some security relevant object needs to be stashed somewhere,
that should not be cleaned automatically, because the security
enforcement would be dropped then.
2. A user namespace fd should be allocated on first invocation and be
kept around until the user logs out (i.e. systemd --user ends), á la
#16328 (This does not implement what #16318 asks for, but should
solve the use-case discussed there.)
3. There's interest in allow a concept of "userspace reboots" where the
kernel stays running, and userspace is swapped out (i.e. all services
exit, and the rootfs transitioned into a new version of it) while
keeping some select resources pinned, very similar to how we
implement a switch root. Thus it is useful to allow services to exit,
while leaving their fds around till the very end.
This is exposed through a new FileDescriptorStorePreserve= setting that
is closely modelled after RuntimeDirectoryPreserve= (in fact it reused
the same internal type), since we want similar behaviour in the end, and
quite often they probably want to be used together.
This is a followup to
413e8650b7
> tree-wide: Use "unmet" for condition checks, not "failed"
Since I noticed when running `systemctl status` on a recent
systemd still seeing
`Condition: start condition failed`
To recap the original rationale here for "unmet" is that it's
normal for some units to be conditional, so the term "failure"
here is too strong.
Follow-up for #26902 and #26971
Let's always calculate the next restart interval
since that's more useful.
For that, we add 1 to s->n_restarts unconditionally,
and change RestartUSecCurrent property to RestartUSecNext.
interval between restarts
RestartSteps= accepts a positive integer as the number of steps
to take to increase the interval between auto-restarts from
RestartSec= to RestartSecMax=, or 0 to disable it.
Closes#6129
This augments the existing KillUnit() + Kill() methods with
QueueSignalUnit() + QueueSignal(), which are what sigqueue() is to
kill().
This is useful for sending our new SIGRTMIN+18 control signals to system
services.
A pid can be recycled, but a pidfd is pinned. Add a new method that is safer
as it takes a pidfd as input.
Return not only the D-Bus object path, but also the unit id and the last
recorded invocation id, as they are both useful (especially the id, as
converting from a path object to a unit id from a script requires another
round-trip via D-Bus).
Note that the manager still tracks processes by pid, so theorethically this
is not fully error-proof, but on the other hand the method response is
synchronous and the manager is single-threaded, so once a call is being
processed the unit database will not change anyway. Once the manager
switches to use pidfds everywhere, this can be further hardened.
Define new unit parameter (LogFilterPatterns) to filter logs processed by
journald.
This option is used to store a regular expression which is carried from
PID1 to systemd-journald through a cgroup xattrs:
`user.journald_log_filter_patterns`.
Trying to disable a unit with no install info is mostly useless, so
adding a warning like we do for enable (with the new dbus method
'DisableUnitFilesWithFlagsAndInstallInfo()'). Note that it would
still find and remove symlinks to the unit in /etc, regardless of
whether it has install info or not, just like before. And if there are
actually files to remove, we suppress the warning.
Fixes#17689
The new function DumpPatterns() can be used to limit (drastically) the size of
the data returned by PID1. Hence the optimization of serializing data into a
file descriptor should be less relevant than having the possibility to limit
the data when communicating with the service manager remotely.
NB: when passing patterns, the dump command omits the version of the manager as
well as the features and the timestamps.
In many places we spelled out the phrase behind "initrd" in full, but this
isn't terribly useful. In fact, no "RAM disk" is used, so emphasizing this
is just confusing to the reader. Let's just say "initrd" everywhere, people
understand what this refers to, and that it's in fact an initramfs image.
Also, s/i.e./e.g./ where appropriate.
Also, don't say "in RAM", when in fact it's virtual memory, whose pages
may or may not be loaded in page frames in RAM, and we have no control over
this.
Also, add <filename></filename> and other minor cleanups.
Follow-up for 10f3f4ed01.
We already have RuntimeWatchdogUSec or friends. Let's not introduce
redundant properties.
Also, drop the const qualifier for WatchdogLastPingTimestamp, as they
are actually not constant.
Not wired in by any unit type yet, just the basic to allocate,
ref, deref and plug in to other unit types.
Includes recording the trigger unit name and passing it to the
triggered unit as TRIGGER_UNIT= env var.
Do not go back to disk on each selinux access, but instead cache the
label off the inode we are actually reading. That way unit file contents
and unit file label we use for access checks are always in sync.
Based on discussions here:
https://github.com/systemd/systemd/pull/10023#issuecomment-1179835586
Replaces:
https://github.com/systemd/systemd/pull/23910
This changes behaviour a bit, because we'll reach and cache the label at
the moment of loading the unit (i.e. usually on boot and reload), but
not after relabelling. Thus, users must refresh the cache explicitly via
a "systemctl daemon-reload" if they relabelled things.
This makes the SELinux story a bit more debuggable, as it adds an
AccessSELinuxContext bus property to units that will report the label we are
using for a unit (or the empty string if not known).
This also drops using the "source" path of a unit as label source. if
there's value in it, then generators should manually copy the selinux
label from the source files onto the generated unit files, so that the
rule that "access labels are read when we read the definition files" is
upheld. But I am not convinced this is really a necessary, good idea.
In the welcome line, use NAME= as the fallback for PRETTY_NAME=.
PRETTY_NAME= doesn't have to be set, but NAME= should.
Example output:
---
Welcome to Fedora Linux 37 (Rawhide Prerelease)!
[ !! ] This OS version (Fedora Linux 37 (Rawhide Prerelease)) is past its end-of-support date (1999-01-01)
Queued start job for default target graphical.target.
[ OK ] Created slice system-getty.slice.
---
We had a description in README, and an outdated list in the man page.
I think we should keep a reference-style list in the man page. The description
in README is more free-form.
This reverts PR #22587 and its follow-up commit. More specifically,
2299b1cae3 (partially),
e176f85527,
ceb46a31a0, and
51bb9076ab.
The PR was merged without final approval, and has several issues:
- OSS fuzz reported issues in the conf parser,
- It calls synchrnous netlink call, it should not be especially in PID1,
- The importance of NFTSet for CGroup and DynamicUser may be
questionable, at least, there was no justification PID1 should support
it.
- For networkd, it should be implemented with Request object,
- There is no test for the feature.
Fixes#23711.
Fixes#23717.
Fixes#23719.
Fixes#23720.
Fixes#23721.
Fixes#23759.
New directive `DynamicUserNFTSet=` provides a method for integrating
configuration of dynamic users into firewall rules with NFT sets.
Example:
```
table inet filter {
set u {
typeof meta skuid
}
chain service_output {
meta skuid != @u drop
accept
}
}
```
```
/etc/systemd/system/dunft.service
[Service]
DynamicUser=yes
DynamicUserNFTSet=inet:filter:u
ExecStart=/bin/sleep 1000
[Install]
WantedBy=multi-user.target
```
```
$ sudo nft list set inet filter u
table inet filter {
set u {
typeof meta skuid
elements = { 64864 }
}
}
$ ps -n --format user,group,pid,command -p `pgrep sleep`
USER GROUP PID COMMAND
64864 64864 55158 /bin/sleep 1000
```
* Avoid traling slash as most links are defined without.
* Always use https:// protocol and www. subdomain
Allows for easier tree-wide linkvalidation
for our migration to systemd.io.
Backward incompatible change to avoid returning 'skipped' if a condition causes
a job activation to be skipped when using StartUnitWithFlags().
Job results are broadcasted, so it is theoretically possible that existing
software could get confused if they see this result.
Replaces https://github.com/systemd/systemd/pull/22369
When writing docs for SD_BUS_VTABLE_CAPABILITY, I noticed that we have one use
of SD_BUS_VTABLE_CAPABILITY(CAP_SYS_ADMIN) in the tree. This is the default, so
it's not very useful to specify it. But if we're touching that, I think it's
better to use mac + polkit for this like for everything else.
We don't have a very good category for this, but I don't think it makes sense
to add a new one. I just reused the same as other similar calls.
Add support for managing and configuring watchdog pretimeout values if
the watchdog hardware supports it. The ping interval is adjusted to
account for a pretimeout so that it will still ping at half the timeout
interval before a pretimeout event would be triggered. By default the
pretimeout defaults to 0s or disabled.
The RuntimeWatchdogPreSec config option is added to allow the pretimeout
to be specified (similar to RuntimeWatchdogSec). The
RuntimeWatchdogPreUSec dbus property is added to override the pretimeout
value at runtime (similar to RuntimeWatchdogUSec). Setting the
pretimeout to 0s will disable the pretimeout.
Add a new setting that follows the same principle and implementation
as ExtensionImages, but using directories as sources.
It will be used to implement support for extending portable images
with directories, since portable services can already use a directory
as root.
When an activation job is skipped because of a Condition*= setting failing,
currently the JobRemoved signal lists 'done' as the result, just as with
a successful job.
This is a problem when doing dbus activation: dbus-broker will receive a
signal that says the job was successful, so then it moves into a state where
it waits for the requested name to appear on the bus, but that never happens
because the job was actually skipped.
Add a new StartUnitWithFlags that changes the behaviour of the JobRemoved
signal to list 'done' or 'skipped'.
Fixes#21520
This introduces `ExitType=main|cgroup` for services.
Similar to how `Type` specifies the launch of a service, `ExitType` is
concerned with how systemd determines that a service exited.
- If set to `main` (the current behavior), the service manager will consider
the unit stopped when the main process exits.
- The `cgroup` exit type is meant for applications whose forking model is not
known ahead of time and which might not have a specific main process.
The service will stay running as long as at least one process in the cgroup
is running. This is intended for transient or automatically generated
services, such as graphical applications inside of a desktop environment.
Motivation for this is #16805. The original PR (#18782) was reverted (#20073)
after realizing that the exit status of "the last process in the cgroup" can't
reliably be known (#19385)
This version instead uses the main process exit status if there is one and just
listens to the cgroup empty event otherwise.
The advantages of a service with `ExitType=cgroup` over scopes are:
- Integrated logging / stdout redirection
- Avoids the race / synchronisation issue between launch and scope creation
- More extensive use of drop-ins and thus distro-level configuration:
by moving from scopes to services we can have drop ins that will affect
properties that can only be set during service creation,
like `OOMPolicy` and security-related properties
- It makes systemd-xdg-autostart-generator usable by fixing [1], as obviously
only services can be used in the generator, not scopes.
[1] https://bugs.kde.org/show_bug.cgi?id=433299
When combined with a tmpfs on /run or /var/lib, allows to create
arbitrary and ephemeral symlinks for StateDirectory or RuntimeDirectory.
This is especially useful when sharing these directories between
different services, to make the same state/runtime directory 'backend'
appear as different names to each service, so that they can be added/removed
to a sharing agreement transparently, without code changes.
An example (simplified, but real) use case:
foo.service:
StateDirectory=foo
bar.service:
StateDirectory=bar
foo.service.d/shared.conf:
StateDirectory=
StateDirectory=shared:foo
bar.service.d/shared.conf:
StateDirectory=
StateDirectory=shared:bar
foo and bar use respectively /var/lib/foo and /var/lib/bar. Then
the orchestration layer decides to stop this sharing, the drop-in
can be removed. The services won't need any update and will keep
working and being able to store state, transparently.
To keep backward compatibility, new DBUS messages are added.
Let's make our service managers slightly less likely to be killed by the
OOM killer by adjusting our services' OOM score adjustment to 100 above
ours. Do this conservatively, i.e. only for regular user sessions.
It's not "const", it can change any time if people change the fs, and we
don#t send out notifications for it. Hence don't claim it was const.
(Otherwise clients might cache it, but they should not)
Prompted-by: #20792
Currently there does not exist a way to specify a path relative to which
all binaries executed by Exec should be found. The only way is to
specify the absolute path.
This change implements the functionality to specify a path relative to which
binaries executed by Exec*= can be found.
Closes#6308