IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
This adds %q, %A and %M specifiers to tmpfiles:
- %A and %M were previously added to tmpfiles.d man page, but not to specifier_table
- %q is added via COMMON_SYSTEM_SPECIFIERS
As discussed in https://github.com/systemd/systemd/pull/32724#discussion_r1638963071
I don't find the opposite reasoning particularly convincing.
We have ProtectHome=tmpfs and friends, and those can be
pretty much trivially implemented through TemporaryFileSystem=
too. The new logic brings many benefits, and is completely generic,
hence I see no reason not to expose it. We can even get more tests
for the code path if we make it public.
This is generally useful, but in some cases particularly: when
implementing enumeration calls that use the "more" flag to return
multiple replies then for the first reply we need to return an error in
case the list of objects to enumerate is empty, usually so form of
"NoSuchXYZ" error. In many cases this shouldn't really be treated as
error, as an empty list probably more than not is as valid as a list
with one, two or more entries.
Update frameworks that work automatically in the background
occasionally need to schedule reboots. Systemd-logind already
provides a nice mechanism to schedule shutdowns, send notfications
and block logins short before the time. Systemd has a framework for
calendar events, so we may conveniently use logind to define a
maintenance time for reboots.
The existing ScheduleShutdown DBus method in logind expects a usec_t
with an absolute time. Passing USEC_INFINITY as magic value now tells
logind to take the time from the configured maintenance time if set.
"shutdown -r" leverages that and uses the maintenance time
automatically if configured. The one minute default is still used if
nothing was specified.
Similarly the new 'auto' setting for the --when parameter of systemctl
uses the maintenance time if configured or a one minute timer like the
shutdown command.
systemd-cryptsetup supports a FIDO2 mode with manual parameters, where
the user provides all the information necessary for recreating the
secret, such as: credential ID, relaying party ID and the salt. This
feature works great for implementing 2FA schemes, where the salt file
is for example a secret unsealed from the TPM or some other source.
While the unlocking part is quite straightforward to set up, enrolling
such a keyslot - not so easy. There is no clearly documented
way on how to set this up and online resources are scarce on this topic
too. By implementing a straightforward way to enroll such a keyslot
directly from systemd-cryptenroll we streamline the enrollment process
and reduce chances for user error when doing such things manually.
DynamicUser= enables PrivateTmp= implicitly to avoid files owned by reusable uids
leaking into the host. Change it to instead create a fully private tmpfs instance
instead, which also ensures the same result, since it has less impactful semantics
with respect to PrivateTmp=yes, which links the mount namespace to the host's /tmp
instead. If a user specifies PrivateTmp manually, let the existing behaviour
unchanged to ensure backward compatibility is not broken.
Historically, systemd-tmpfiles was designed to manager temporary
files, but nowadays it has become a generic tool for managing
all kinds of files. To avoid user confusion, let's remove "temporary"
from the tool's description.
As discussed in #33349
The setting of systemd clock is important and deserves an accurate description,
see for example:
https://discussion.fedoraproject.org/t/f38-to-f39-40-dnf-system-upgrade-can-fail-on-raspberry-pi/92403https://bugzilla.redhat.com/show_bug.cgi?id=2242759
The meat of the description was in systemd-timesyncd.service(8), but
actually it's systemd that sets the clock. In particular, systemd-timesyncd
doesn't know anything about /usr/lib/clock-epoch, and since systemd sets
the clock to the epoch when initializing, systemd-timesyncd would only
get to advance the clock to the epoch under special circumstances.
Also, systemd-timesyncd is an optional component, so we can't even rely
on its man page being installed in all circumstances. The description needs
to be moved to systemd(1).
The description is updated to describe the changes that were made in
previous commits.
Mention that by default, /home is managed by tmpfiles.d/home.conf, and
recommend that users run systemd-tmpfiles --dry-run --purge first to
see exactly what will be removed.
When in FIDO2 mode with manual parameters, i.e. when not reading the
parameters off the LUKS2 header, the current behavior in regards to PIN,
UP and UV features is to default to v248 logic, where we use PIN + UP
when needed, and do not configure UV at all. Let's allow users to
configure those features in manual mode too.
For putting together "varlinkctl call" command lines it's useful to
quickly enumerate all methods implemented by a service. Hence, let's add
a new "list-methods" which uses the introspection data of a service to
quickly list methods.
This is implemented as a special flavour of the "introspect" logic,
and just suppresses all output except for the method names.
let's make it easier to use the introspection functionality of
"varlinkctl": if no interface name is shown, display the introspection
data of all available interfaces. Moreover, allow that multiple
interfaces can be listed, in which case we enumerate them all.
This relieves the user from having to list interfaces first in order to
find the ones which to introspect.
I find myself wanting to check this data with a quick command, and
browsing through /sys/ manually getting binary data sucks. Hence let's
do add a nice little analysis tool.
Multipath TCP (MPTCP), standardized in RFC8684 [1], is a TCP extension
that enables a TCP connection to use different paths. It allows a device
to make use of multiple interfaces at once to send and receive TCP
packets over a single MPTCP connection. MPTCP can aggregate the
bandwidth of multiple interfaces or prefer the one with the lowest
latency, it also allows a fail-over if one path is down, and the traffic
is seamlessly re-injected on other paths.
To benefit from MPTCP, both the client and the server have to support
it. Multipath TCP is a backward-compatible TCP extension that is enabled
by default on recent Linux distributions (Debian, Ubuntu, Redhat, ...).
Multipath TCP is included in the Linux kernel since version 5.6 [2]. To
use it on Linux, an application must explicitly enable it when creating
the socket:
int sd = socket(AF_INET(6), SOCK_STREAM, IPPROTO_MPTCP);
No need to change anything else in the application.
This patch allows MPTCP protocol in the Socket unit configuration. So
now, a <unit>.socket can contain this to use MPTCP instead of TCP:
[Socket]
SocketProtocol=mptcp
MPTCP support has been allowed similarly to what has been already done
to allow SCTP: just one line in core/socket.c, a very simple addition
thanks to the flexible architecture already in place.
On top of that, IPPROTO_MPTCP has also been added in the list of allowed
protocols in two other places, and in the doc. It has also been added to
the missing_network.h file, for systems with an old libc -- note that it
was also required to include <netinet/in.h> in this file to avoid
redefinition errors.
Link: https://www.rfc-editor.org/rfc/rfc8684.html [1]
Link: https://www.mptcp.dev [2]
Set the $REMOTE_ADDR environment variable for AF_UNIX socket connections
when using per-connection socket activation (Accept=yes). $REMOTE_ADDR
will now contain the remote socket's file system path (starting with a
slash "/") or its address in the abstract namespace (starting with an
at symbol "@").
This information is essential for identifying the remote peer in AF_UNIX
socket connections, but it's not easy to obtain in a shell script for
example without pulling in a ton of additional tools. By setting
$REMOTE_ADDR, we make this information readily available to the
activated service.
Since we document /usr/local/lib/systemd/ and other paths for various things,
add notes that this is not supported if /usr/local is a separate partition. In
systemd.unit, I tried to add the footnote in the table where
/usr/local/lib/systemd/ is listed, but that get's rendered as '[sup]a[/sup]'
with a mangled footnote at the bottom of the table :( .
Also, split paragraphs in one place where the subject changes without any
transition.
Follow-up for 02f35b1c90.
Replaces https://github.com/systemd/systemd/pull/33231.
Section "Description" didn't actually say what systemd does. And we had a giant
"Concepts" section that actually described units types and other details about
them. So let's move the basic description of functionality to "Description" and
rename the following section to "Units".
The link to the Original Design Document is moved to "See Also", it is of
historical interest mostly at this point.
The only actual change is that when talking about API filesystems, /dev is also
mentioned. (I think /sys+/proc+/dev are the canonical set and should be always
listed on one breath.)
It has been mentioned in IPv4Forwarding= and IPv6Forwarding=,
but let's also explain in the settings who imply these settings.
Follow-up for 3976c43092 and
485f5148b3.
For run0 (as opposed to systemd-run in general), connecting to
the system bus (of localhost or container) as a different user
than root and then trying to elevate privilege from that
makes little sense:
https://github.com/systemd/systemd/issues/32997#issuecomment-2127992973
The @ syntax is mostly useful when connecting to the user bus,
which is not a use case for run0. Hence, let's remove the example.
The syntax will be properly refused in #32999.
- mention that /run/machine-id is used if exist.
- mention system.machine_id credential,
- credential, VM uuid, and container uuid are not read when --root=
is specified or running in a chroot environment.
Fixes https://github.com/systemd/systemd/issues/28514.
Quoting https://github.com/systemd/systemd/issues/28514#issuecomment-1831781486:
> Whenever PAM is enabled for a service, we set up the PAM session and then
> fork off a process whose only job is to eventually close the PAM session when
> the service dies. That services we run with service privileges, both to
> minimize attack surface and because we want to use PR_SET_DEATHSIG to be get
> a notification via signal whenever the main process dies. But that only works
> if we have the same credentials as that main process.
>
> Now, if pam_systemd runs inside the PAM stack (which it normally does) it's
> session close hook will ask logind to synchronously end the session via a bus
> call. Currently that call is not accessible to unprivileged clients. And
> that's the part we need to relax: allow users to end their own sessions.
The check is implemented in a way that allows the kill if the sender is in
the target session.
I found 'sudo systemctl --user -M "zbyszek@" is-system-running' to
be a convenient reproducer.
Before:
May 16 16:25:26 x1c systemd[1]: run-u24754.service: Deactivated successfully.
May 16 16:25:26 x1c dbus-broker[1489]: A security policy denied :1.24757 to send method call /org/freedesktop/login1:org.freedesktop.login1.Manager.ReleaseSession to org.freedesktop.login1.
May 16 16:25:26 x1c (sd-pam)[3036470]: pam_systemd(login:session): Failed to release session: Access denied
May 16 16:25:26 x1c systemd[1]: Stopping session-114.scope...
May 16 16:25:26 x1c systemd[1]: session-114.scope: Deactivated successfully.
May 16 16:25:26 x1c systemd[1]: Stopped session-114.scope.
May 16 16:25:26 x1c systemd[1]: session-c151.scope: Deactivated successfully.
May 16 16:25:26 x1c systemd-logind[1513]: Session c151 logged out. Waiting for processes to exit.
May 16 16:25:26 x1c systemd-logind[1513]: Removed session c151.
After:
May 16 17:02:15 x1c systemd[1]: run-u24770.service: Deactivated successfully.
May 16 17:02:15 x1c systemd[1]: Stopping session-115.scope...
May 16 17:02:15 x1c systemd[1]: session-c153.scope: Deactivated successfully.
May 16 17:02:15 x1c systemd[1]: session-115.scope: Deactivated successfully.
May 16 17:02:15 x1c systemd[1]: Stopped session-115.scope.
May 16 17:02:15 x1c systemd-logind[1513]: Session c153 logged out. Waiting for processes to exit.
May 16 17:02:15 x1c systemd-logind[1513]: Removed session c153.
Edit: this seems to also fix https://github.com/systemd/systemd/issues/8598.
It seems that with the call to ReleaseSession, we wait for the pam session
close hooks to finish. I inserted a 'sleep(10)' after the call to ReleaseSession
in pam_systemd, and things block on that, nothing is killed prematurely.
Prompted by #32895
Rather than ordering with each power operation targets,
ordering against shutdown.target which is a valid
synchronization point. This has no effect if soft-reboot
is being performed.
A fixed name is too rigid, let's give users the ability to define
custom drop-in names which at the same time also allows defining
multiple dropins per unit.
We use ~ as the separator because:
- ':' is not allowed in credential names
- '=' is used to separate credential from value in mkosi's --credential
argument.
- '-' is commonly used in filenames
- '@' already has meaning as the unit template specifier which might be
confusing when adding dropins for template units
Like much English text, the systemd documentation uses "may not" in the
sense of both "will possibly not" and "is forbidden to". In many cases
this is OK because the context makes it clear, but in others I felt it
was possible to read the "is forbidden to" sense by mistake: in
particular, I tripped over "the target file may not exist" in
systemd.unit(5) before realizing the correct interpretation.
Use "might not" or "may choose not to" in these cases to make it clear
which sense we mean.
This commit adds the new varlink interface io.systemd.Machine at
/run/systemd/machine/io.systemd.Machine with a single method Register
It supports all combinations of RegisterMachine[WithSSH,WithNetwork] all
under the same method.
Also adds three properties:
- VsockCid: the VSOCK CID of the VM
- SshAddress: the address of the VM in a format SSH can connect to
- SshPrivateKeyPath: the path to the SSH private key to use to connect
to the VM.
GetMachineSSHInfo is essentially a convenience method to query both the
SshAddress and SshPrivateKeyPath properties at once.
In the majority of cases, this is caused by
sleep_supported() returning error. Hence it's
very likely that it would fail again, so
the fallback is not really useful. Instead,
honor the --force option for these verbs.
The systemd-confext use case description was mentioning an OSConfig
project which won't say much to users. Also, it's good to call out that
systemd-confext provides a reliable way to manage configuration because
in contrast to other tools it will remove all old configuration files.
According to the documentation in systemd.resource-control(5),
resource-control options may be used in mount, scope, service,
slice, socket and swap units.
While e.g. systemd.service(5) includes that information,
documentation for some other units does not.
The most problematic example is systemd.slice(5).
Its documentation states a slice unit may only contain [Install]
and [Unit] sections, while actually it may contain also a [Slice]
section with options from systemd.resource-control(5).
units/user/app.slice is an example of a slice unit having a [Slice]
section.
TEST-26-SYSTEMCTL is racy as we call systemctl is-active immediately
after systemctl kill. Let's implement --wait for systemctl kill and
use it in TEST-26-SYSTEMCTL to avoid the race.
In mkosi CI, we want persistent journals when running interactively
and runtime journals when running in CI, so let's add a credential
that allows us to configure which one to use.
Required for integration tests to power off on PID 1 crashes. We
deprecate systemd.crash_reboot and related options by removing them
from the documentation but still parsing them.
LinkLocalAddressing accepts a boolean. This can be seen by looking at
`link_local_address_family_from_strong(cont char *s)` in
`src/network/netword-util.c#L102-108` which falls back to
`address_family_from_string`, defined two lines above (L100)
using `DEFINE_STRING_TABLE_LOOKUP_WITH_BOOLEAN`.
Also: rename Handover → Handoff. I think it makes it clearer that this
is not really about handing over any resources, but that the executor is
out off the game from that point on.
With 1df4b21abd we started to default to
enrolling into the LUKS device backing the root fs if none was specified
(and no wipe operation is used). This changes to look for /var/ instead.
On most systems /var/ is going to be on the root fs, hence this change
is with little effect.
However, on systems where / and /var/ is separate it makes more sense to
default to /var/ because that's where the persistent and variable data
is placed (i.e. where LUKS should be used) while / doesn't really have
to be variable, could as well be immutable, or ephemeral. Hence /var/
should be a safer default.
Or to say this differently: I think it makes sense to support systems
with /var/ being on / well. I also think it makes sense to support
systems with them being separate, and /var/ being variable and
persistent. But any other kind of system I find much less interesting to
support, and in that case people should just specify the device name.
Also, while we are at it, tighten the checks a bit, insist on a dm-crypt
+ LUKS superblock before continuing.
And finally, let's print a short message indicating the device we
operate on.
The log files defined using file:, append: or truncate: inherit the owner and other privileges from the effective user running systemd.
The log files are NOT created using the "User", "Group" or "UMask" defined in the service.
When starting a container with --user, the new uid will be resolved and switched to
only in the inner child, at the end of the setup, by spawning getent. But the
credentials are set up in the outer child, long before the user is resolvable,
and the directories/files are made only readable by root and read-only, which
means they cannot be changed later and made visible to the user.
When this particular combination is specified, it is obvious the caller wants
the single-process container to be able to use credentials, so make them world
readable only in that specific case.
Fixes https://github.com/systemd/systemd/issues/31794
Enable the exec_fd logic for Type=notify* services too, and change it
to send a timestamp instead of a '1' byte. Record the timestamp in a
new ExecMainHandoverTimestamp property so that users can track accurately
when control is handed over from systemd to the service payload, so
that latency and startup performance can be trivially and accurately
tracked and attributed.
When an IO event source owns relevant fd, replacing with a new fd leaks
the previously assigned fd.
===
sd_event_add_io(event, &s, fd, ...);
sd_event_source_set_io_fd_own(s, true);
sd_event_source_set_io_fd(s, new_fd); <-- The previous fd is not closed.
sd_event_source_unref(s); <-- new_fd is closed as expected.
===
Without the change, valgrind reports the leak:
==998589==
==998589== FILE DESCRIPTORS: 4 open (3 std) at exit.
==998589== Open file descriptor 4:
==998589== at 0x4F119AB: pipe2 (in /usr/lib64/libc.so.6)
==998589== by 0x408830: test_sd_event_source_set_io_fd (test-event.c:862)
==998589== by 0x403302: run_test_table (tests.h:171)
==998589== by 0x408E31: main (test-event.c:935)
==998589==
==998589==
==998589== HEAP SUMMARY:
==998589== in use at exit: 0 bytes in 0 blocks
==998589== total heap usage: 33,305 allocs, 33,305 frees, 1,283,581 bytes allocated
==998589==
==998589== All heap blocks were freed -- no leaks are possible
==998589==
==998589== For lists of detected and suppressed errors, rerun with: -s
==998589== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Follow-ups for 74c4231ce5.
Previously, the path is obtained from the fd, but it is closed in
sd_event_loop() to unpin the filesystem.
So, let's save the path when the event source is created, and make
sd_event_source_get_inotify_path() simply read it.
We look for the root fs on the device of the booted ESP, and for the
other partitions on the device of the root fs. On EFI systems this
generally boils down to the same, but there are cases where this doesn't
hold, hence document this properly.
Fixes: #31199
This commit adds support for loading, measuring and handling a ".ucode"
UKI section. This section is functionally an initrd, intended for
microcode updates. As such it will always be passed to the kernel first.
Resolve at attach/detach/inspect time, so that the image is pinned and requires
re-attaching on update, given files are extracted from it so just passing
img.v/ to RootImage= is not enough to get a portable image updated
This reworkds --recovery-pin= from a parameter that takes a boolean to
an enum supporting one of "hide", "show", "query".
If "hide" (default behaviour) we'll generate a recovery pin
automatically, but never show it, and thus just seal it and good.
If "show" we'll generate a recovery pin automatically, but display it in
the output, so the user can write it down.
If "query" we'll ask the user for a recovery pin, and not automatically
generate any.
For compatibility the old boolean behaviour is kept.
With this you can now do "systemd-pcrlock make-policy
--recovery-pin=show" to set up the first policy, write down the recovery
PIN. Later, if the PCR prediction didn't work out one day you can then
do "systemd-pcrlock make-policy --recovery-pin=query" and enter the
recovery key and write a new policy.
Running both sd-ndisc and sd-radv should be mostly a misconfiguration,
but may not. So, let's only disable sd-ndisc by default when sd-radv is
enabled, but allow when both are explicitly requested.
This adds a small, socket-activated Varlink daemon that can delegate UID
ranges for user namespaces to clients asking for it.
The primary call is AllocateUserRange() where the user passes in an
uninitialized userns fd, which is then set up.
There are other calls that allow assigning a mount fd to a userns
allocated that way, to set up permissions for a cgroup subtree, and to
allocate a veth for such a user namespace.
Since the UID assignments are supposed to be transitive, i.e. not
permanent, care is taken to ensure that users cannot create inodes owned
by these UIDs, so that persistancy cannot be acquired. This is
implemented via a BPF-LSM module that ensures that any member of a
userns allocated that way cannot create files unless the mount it
operates on is owned by the userns itself, or is explicitly
allowelisted.
BPF LSM program with contributions from Alexei Starovoitov.
This is just good style. In this particular case, if the argument is incorrect and
the function is not tested with $NOTIFY_SOCKET set, the user could not get the
proper error until running for real.
Also, remove mention of systemd. The protocol is fully generic on purpose.
F40 will be out soon, so we can update the man page already. The example should
already work.
The cloud link was dropped in fd571c9df0, so
drop the unused variable too.
Unfortunately, sd-bus-vtable.h, sd-journal.h, and sd-id128.h
have variadic macro and inline initialization of sub-object, these are
not supported in C90. So, we need to silence some errors.
I think the example should reflect the full set of lifecycle messages,
including STOPPING=1, which tells the service manager that the service
is already terminating. This is useful for reporting this information
back to the user and to suppress repeated shutdown requests.
It's not as important as the READY=1 and RELOADING=1 messages, since we
actively wait for those from the service message if the right Type= is
set. But it's still very valuable information, easy to do, and completes
the state engine.
stale HibernateLocation EFI variable
Currently, if the HibernateLocation EFI variable exists,
but we failed to resume from it, the boot carries on
without clearing the stale variable. Therefore, the subsequent
boots would still be waiting for the device timeout,
unless the variable is purged manually.
There's no point to keep trying to resume after a successful
switch-root, because the hibernation image state
would have been invalidated by then. OTOH, we don't
want to clear the variable prematurely either,
i.e. in initrd, since if the resume device is the same
as root one, the boot won't succeed and the user might
be able to try resuming again. So, let's introduce a
unit that only runs after switch-root and clears the var.
Fixes#32021
We are saying in public that the protocl is stable and can be easily
reimplemented, so provide an example doing so in the documentation,
license as MIT-0 so that it can be copied and pasted at will.
Same reason as the reload, reexec is disruptive and it requires the
same privileges, so if somebody wants to limit reloads, they'll also
want to limit reexecs, so use the same setting.
The setting is used when /sys/power/state is set to 'mem'
(common for suspend) or /sys/power/disk is set to 'suspend'
(hybrid-sleep). We default to kernel choice here, i.e.
respect what's set through 'mem_sleep_default=' kernel
cmdline option.
Today listen file descriptors created by socket unit don't get passed to
commands in Exec{Start,Stop}{Pre,Post}= socket options.
This prevents ExecXYZ= commands from accessing the created socket FDs to do
any kind of system setup which involves the socket but is not covered by
existing socket unit options.
One concrete example is to insert a socket FD into a BPF map capable of
holding socket references, such as BPF sockmap/sockhash [1] or
reuseport_sockarray [2]. Or, similarly, send the file descriptor with
SCM_RIGHTS to another process, which has access to a BPF map for storing
sockets.
To unblock this use case, pass ListenXYZ= file descriptors to ExecXYZ=
commands as listen FDs [4]. As an exception, ExecStartPre= command does not
inherit any file descriptors because it gets invoked before the listen FDs
are created.
This new behavior can potentially break existing configurations. Commands
invoked from ExecXYZ= might not expect to inherit file descriptors through
sd_listen_fds protocol.
To prevent breakage, add a new socket unit parameter,
PassFileDescriptorsToExec=, to control whether ExecXYZ= programs inherit
listen FDs.
[1] https://docs.kernel.org/bpf/map_sockmap.html
[2] https://lore.kernel.org/r/20180808075917.3009181-1-kafai@fb.com
[3] https://man.archlinux.org/man/socket.7#SO_INCOMING_CPU
[4] https://www.freedesktop.org/software/systemd/man/latest/sd_listen_fds.html