IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
These are inspired by the existing commands that return the path to the
boot or ESP partitions. However, these new commands show the path to the
boot loader (systemd-boot) or UKI/stub (systemd-stub) that was used on
the current boot. This information is derived from EFI variables.
This introduces 'i' prefix for match string. When specified, string or
pattern will match case-insensitively.
Closes#34359.
Co-authored-by: Ryan Wilson <ryantimwilson@meta.com>
The verb s not really specific to credential management, it was always a
bit misplaced. Hence move it to systemd-analyze, where we already have
some general TPM related verbs such as "srk" and "pcrs"
systemd-stub provides the signing key for TPM2 signed PCR policies in a
file tpm2-pcr-public-key.pem to userspace. Hence, to clarify that this
is the same key as used when signing via "systemd-measure", let's rename
it in the docs like that.
Also rename the private key to tpm2-pcr-private-key.pem, to keep the
symmetry.
With this we should universally stick to this nomenclature:
1. tpm2-pcr-public-key.pem ← public part of signing key
2. tpm2-pcr-private-key.pem ← private part of signing key
3. tpm2-pcr-signature.json ← signature file made with key pair
Inspired by: #34069
It is not true that "no string" is written to journal; the binary
name is used when run via `systemd-cat command`, or `cat` is used
when run via `command | systemd-cat`.
These variables closely mirror the existing
LoaderDevicePartUUID/LoaderImageIdentifier variables. But the Stub…
variables indicate the location of the stub/UKI (i.e. of systemd-stub),
while the Loader… variables indicate the location of the boot loader
(i.e. of systemd-boot). (Except of course, there is no boot loader used,
in which case both sets point to the stub/UKI, as a special case).
This actually matters, as we support that sd-boot runs off the ESP,
while a UKI then runs off XBOOTLDR, i.e. two distinct partitions.
First of all, these were always set, i.e. since sd-boot was merged into
our tree, i.e. v220. Let's say so explicitly.
Also, let's be more accurate, regarding which partition this referes to:
it's usually "the" ESP, but given that you can make firmware boot from
arbitrary disks, it could be any other partition too. Hence, be
explicit on this.
Also, clarify tha sd-stub will set this too, if sd-boot never set it.
This adds a ability to add alternative sections of a specific type in
the same UKI. The primary usecase is for supporting multiple different
kernel cmdlines that are baked into a UKI.
The mechanism is relatively simple (I think), in order to make it robust.
1. A new PE section ".profile" is introduced, that is a lot like
".osrel", but contains information about a specific "profile" to
boot. The ".profile" section can appear multiple times in the same
PE, and acts as delimiter indicating where a new profile starts.
Everything before the first ".profile" is called the "base profile",
and is shared among all other profiles, which can then override or
add addition PE sections on top.
2. An UKI's command line can be prefixed with an argument such as "@0" or
"@1" or "@2" which indicates the "profile" to boot. If no argument is
specified the default is profile 0. Also, a UKI that lacks any
.profile section is treated like one with only a profile 0, but with
no data in that profile section.
3. The stub will first search for its usual set of PE sections
(hereafter called "base sections"), and stop at the first .profile PE
section if any. It will then find the .profile matching the selected
profile by its index, and any sections found as part of that profile
on top of the base sections.
And that's already it.
Example: let's say a distro wants to provide a single UKI that can be
invoked in one of three ways:
1. The regular profile that just boots the system
2. A profile that boots into storagetm
3. A profile that initiates factory reset and reboots.
For this it would define a classic UKI with sections .linux, .initrd,
.cmdline, and whatever else it needs. The .cmdline section would contain
the kernel command line for the regular profile.
It would then insert one ".profile" section, with a contents like the
following:
ID=regular
This is the profile for profile 0. It would immediately afterwards add
another ".profile" section:
ID=storagetm
TITLE=Boot into Storage Target Mode
This would then followed with a .cmdline section that is just like the
basic one, but with "rd.systemd.unit=storage-target-mode.target"
suffixed. Then, another .profile section would be added:
ID=factory-reset
TITLE=Factory Reset
Which is then followed by one last PE section: a .cmdline one with
"systemd.unit=factory-reset.target" suffixed to te regular command line.
i.e. expressed in tabular form the above would be:
The base profile:
.linux
.initrd
.cmdline
.osrel
The regular boot profile:
.profile
The storagetm profile:
.profile
.cmdline
The factory reset profile:
.profile
.cmdline
You might wonder why the first .cmdline in the list above is placed in
the base profile rather than in the regular boot profile, given that it
is overriden in all other profiles anyway. And you are right. The only
reason I'd place it in the base profile is that it makes the UKI more
nicely extensible if later profiles are added that want to replace
something else instead of the .cmdline, for example .ucode or so. But it
really doesn't matter much.
While the primary usecase is of course multiple alternative command
lines, the concept is more powerful than that: for various usecases it
might be valuable to offer multiple choices of devicetree, ucode or
initrds.
The .profile contents is also passed to the invoked kernel as a file in
/.extra/profile (via a synthetic initrd). Thus, this functionality can
even be useful without overriding any section at all, simply by means of
reading that file from userspace.
Design choices:
1. On purposes I used a special command line marker (i.e. the "@" thing,
which maybe we should call the "profile selector"), that doesn't look
like a regular kernel command line option. This is because this is
really not a regular kernel command line option – we process it in
the stub, then remove it as prefix, and measure the unprefixed
command line only after that. The kernel will not see the profile
selector either. I think these special semantics are best
communicated by making it look substantially different from regular
options.
2. This moves around measurements a bit. Previously we measured our UKI
sections right after finding them. Now we first parse the profile
number from the command line, then search for the profile's sections,
and only then measure the sections we actually end up using for this
profile. I think that this logic makes most sense: measure what we
are using, not what we are overriding. Or in other words, if you boot
profile @3, then we'll measure .cmdline (assuming it exists) of
profile 3, and *not* measure .cmdline of the base profile. Also note
that if the user passes in a custom kernel command line via command
line arguments we'll strip off the profile selector (i.e. the initial
"@X" thing) before we pass it on.
3. The .profile stuff is supposed to be generic and extensible. For
example we could use it in future to mark "dangerous" options such as
factory reset, so that boot menus can ask for confirmation before
booting into it. Or we could introduce match expressions against
SMBIOS or other system identifiers, to filter out profiles on
specific hw.
Note btw, that PE allows defining multiple sections that point to the
same offsets in the file. This allows sharing payload under different
names. For example, if profile @4 and @7 shall carry the same .ucode
section, they can define .ucode in each profile and then make it point to
the same offset.
Also note that that one can even "mask" a base section in a profile, by
inserting an empty section. For example, if the base .dtb section should
not be used for profile @4, then add a section .dtb right after the
fourth .profile with a zero size to the UKI, and you will get your wish
fulfilled.
This code only contains changes to sd-stub. A follow-up commit will
teach sd-boot to also find this profile PE sections to synthesize
additional menu entries from a single UKI.
A later commit will add support for gnerating this via ukify.
Fixes: #24539
In mkosi, I want to add a sysupdate verb to wrap systemd-sysupdate.
The definitions will be picked up from mkosi.sysupdate/ and passed
to systemd-sysupdate. I want users to be able to write transfer
definitions that are independent of the output directory used by
mkosi. To make this possible, it should be possible to specify the
directory that transfer sources should be looked up in on the sysupdate
command line. Let's allow this via a new --transfer-source= option.
Additionally, transfer sources that want to take advantage of this
feature should specify PathRelativeTo=directory to indicate the configured
Path= is interpreted relative to the tranfer source directory specified
on the CLI.
This allows for the following transfer definition to be put in
mkosi.sysupdate:
"""
[Transfer]
ProtectVersion=%A
[Source]
Type=regular-file
Path=/
PathRelativeTo=directory
MatchPattern=ParticleOS_@v.usr-%a.@u.raw
[Target]
Type=partition
Path=auto
MatchPattern=ParticleOS_@v
MatchPartitionType=usr
PartitionFlags=0
ReadOnly=1
"""
This options is pretty simple, it allows specifying an UKI whose
sections to import first, and place at the beginning of the new UKI.
This is useful for generating multi-profile UKIs piecemeal: generate the
base UKI first, then append a profile, and another one and another one.
The sections imported this way are not included in any PCR signature,
the assumption is that that already happened before in the imported UKI.
Now that mkfs.btrfs is adding support for compressing the generated
filesystem (https://github.com/kdave/btrfs-progs/pull/882), let's
add general support for specifying the compression algorithm and
compression level to use.
We opt to not parse the specified compression algorithm and instead
pass it on as is to the mkfs tool. This has a few benefits:
- We support every compression algorithm supported by every tool
automatically.
- Users don't need to modify systemd-repart if a mkfs tool learns a
new compression algorithm in the future
- We don't need to maintain a bunch of tables for filesystem to map
from our generic compression algorithm enum to the filesystem specific
names.
We don't add support for btrfs just yet until the corresponding PR
in btrfs-progs is merged.
These operations might require slow I/O, and thus might block PID1's main
loop for an undeterminated amount of time. Instead of performing them
inline, fork a worker process and stash away the D-Bus message, and reply
once we get a SIGCHILD indicating they have completed. That way we don't
break compatibility and callers can continue to rely on the fact that when
they get the method reply the operation either succeeded or failed.
To keep backward compatibility, unlike reload control processes, these
are ran inside init.scope and not the target cgroup. Unlike ExecReload,
this is under our control and is not defined by the unit. This is necessary
because previously the operation also wasn't ran from the target cgroup,
so suddenly forking a copy-on-write copy of pid1 into the target cgroup
will make memory usage spike, and if there is a MemoryMax= or MemoryHigh=
set and the cgroup is already close to the limit, it will cause an OOM
kill, where previously it would have worked fine.
This adds two more fields in 'udevadm info':
- J for device ID, e.g. b128:1, c10:1, n1, and so on.
- B for driver subsystem, e.g. pci, i2c, and so on.
These, especially the device ID field may be useful to find udev
database file under /run/udev/data for a device.
To create the sd_device object of a driver, the function
sd_device_new_from_subsystem_sysname() requires "drivers" for subsystem
and e.g. "pci:iwlwifi" for sysname. Similarly, sd_device_new_from_device_id()
also requires driver subsystem. However, we have never provided a
way to get the driver subsystem ("pci" for the previous example) from
an existing sd_device object.
Let's introduce a way to get driver subsystem.
One of the major pait points of managing fleets of headless nodes is
that when something fails at startup, unless debug level was already
enabled (which usually isn't, as it's a firehose), one needs to manually
enable it and pray the issue can be reproduced, which often is really
hard and time consuming, just to get extra info. Usually the extra log
messages are enough to triage an issue.
This new option makes it so that when a service fails and is restarted
due to Restart=, log level for that unit is set to debug, so that all
setup code in pid1 and sd-executor logs at debug level, and also a new
DEBUG_INVOCATION=1 env var is passed to the service itself, so that it
knows it should start with a higher log level. Once the unit succeeds
or reaches the rate limit the original level is restored.
So far we manually hardcoded $LISTEN_FDNAMES to "varlink" in various
varlink service units we ship, even though FileDescriptorName=varlink
is specified in associated socket units already, because
FileDescriptorName= is currently silently ignored when combined with
Accept=yes. Let's step away from this, which seems saner.
Note that this is technically a compat break, but a mostly negligible
one as there shall be few users setting FileDescriptorName= but
still expecting LISTEN_FDNAMES=connection in the actual executable.
Preparation for #34080
DefaultRoute is a D-Bus property, not a valid setting name in .network
files nor resolved.conf.
Whether a link is the default route or not is configured with
DNSDefaultRoute= setting in .network files.
I don't actually need this anymore since we're going with a
unit based approach for the containers stuff internally so
let's just revert it.
Fixes#34085
This reverts commit ce2291730d.
A previous commit made sysupdate recognize installed versions where some
transfers are missing. This commit teaches sysupdate how to correctly
repair these incomplete versions.
Previously, if you had a incomplete installation of the OS booted, and
ran sysupdate in an attempt to repair it, sysupdate would make things
worse by creating copies of the currently-booted partitions in the
inactive slots. Then at boot you have two identical partitions, with
identical labels an UUIDs, and end up with a mess.
With this commit, sysupdate is able to recognize situations where it can
simply download the missing transfers and leave the rest of the system
undistrubed.
Partial fix for https://github.com/systemd/systemd/issues/33339
When enumerating what versions exist for a given target, sysupdate would
completely throw out any version that's incomplete (where some of the
transfers in the target have that version installed or available, and
other transfers do not).
If we're trying to find what versions we can offer for download, this is
great behavior. If the server side is advertising a partial update to
download, we shouldn't present it to the user.
On the other hand, if we're enumerating what versions we have currently
installed, this is a bad behavior. It makes sysupdate fragile. For
example, if a sysext introduces a new .conf file into
/usr/lib/sysupdate.d, suddenly the currently-installed OS stops being a
version that we've enumerated. Since it's not enumerated, it's not
protected, and so sysupdate will wipe the booted OS.
So if we're looking for installed versions, we now loosen the
restrictions and enumerate incomplete installations.
Partial fix for https://github.com/systemd/systemd/issues/33339
This has been a glaring omission the docs: when people create
.user/.group/.user-privileged/.group-privileged drop-in files, they
should also create matching .membership files.
This softens the behavior originally introduced in eded61e410 to apply
only to the fallback dns servers.
The intent is that the global FallbackDNS (instead of DNS) can now be
used in conjunction with the per-link dns, providing a fallback behavior
without introducing a scope overlap.
References: eded61e410 (resolved: demote the global unicast scope, 2024-08-19)
This commit may have been a breaking change for sd-resolved foreign
resolv.conf mode, where a legacy network management daemon directly
modifies resolv.conf and sd-resolved consumes that.
This reverts commit eded61e410.
mkfs.btrfs has recently learned new options --subvol and --default-subvol
so let's stop failing when Subvolumes= and DefaultSubvolume= are used offline
and use the new --subvol and --default-subvol options instead to create subvolumes
in the generated root filesystem without root privileges or loop devices.
This is the command-line tool to manage systemd-sysudpated
Co-authored-by: Tom Coldrick <thomas.coldrick@codethink.co.uk>
Co-authored-by: Abderrahim Kitouni <abderrahim.kitouni@codethink.co.uk>
This will greatly reduce the number of cases where the global unicast
scope overlaps with link scopes configured as default-route, making it
feasible to use the global DNS setting in conjunction with per-link dns
servers configured by the network.
This change is preferred over demoting links to default-route=no where
the user prefers to use the network provided DNS servers, and I expect
it is non-disruptive in that it should not degrade the efficacy of any
existing configuration.
Note, `systemd-analyze foo@.service --instance=hoge` is equivalent to
`systemd-analyze foo@hoge.service`. But, the option may be useful when
e.g. passing multiple template units that have restriction on their
instance name:
```
$ ls
template_aaa@.service template_bbb@.service template_ccc@.service
$ systemd-analyze ./template_* --instance=hoge
```
Without the option, we need to embed an instance name into each unit
name, so cannot use globs.
Prompted by #33681.
After 3976c43092 (#31423), IPMasquerade=
implies only per-interface IP forwarding. That means, nspawn users need
to manually enable IPv4/IPv6Forwarding= in networkd.conf when
--network-veth or friend is used. Even the change was announced in NEWS,
the change itself breaks backward compatibility and extremely reduces
usability.
Let's make the setting imply the global setting again.
Fixes#34010.
The net_id builtin only checked the of_node of a netdev's parent device,
not that of the netdev itself. While it is common that netdevs don't have
an OF node assigned themselves, as they are derived from some parent
device, this is not always the case. In particular when a single
controller provides multiple ports that can be referenced indiviually in
the Device Tree (both for aliases/MAC address assignment and phandle
references), the correct of_node will be that of the netdev itself, not
that of the parent, so it needs to be checked, too.
A new naming scheme flag NAMING_DEVICETREE_PORT_ALIASES is added to
allow selecting the new behavior.
This allows for example forcing to use /sbin/init instead of always
using /usr/lib/systemd/systemd if it exists. Or it allows using a
different path altogether.
The PrepareForShutdownWithMetadata signal was added via
e4aab5cf1a but a corresponding property
was not. A property has to be a single type, so the bool needs to be
one of the key/value pairs as 'ba{sv}' is not a valid property.
+ Scale the x-axis of the resulting plot by a factor (default 1.0)
+ Add activation timestamps to each bar
Signed-off-by: rajmohan r <rajmohan.r@kpit.com>
This fixes
commit 9b0688f491
Author: Yu Watanabe <watanabe.yu+github@gmail.com>
Date: Tue Jan 9 10:52:49 2024 +0900
virt: add Google Compute Engine support
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Arch and Tumbleweed do not do EOLs but are still stable, so clarify the paragraph.
Also break the entry in paragraphs, to make it more readable when rendered.
The point was made on https://lists.debian.org/debian-ctte/2024/08/msg00005.html
that 'pre-release sounds' like an RC candidate, ie, something that will change
very slightly in the released version. But this is not necessarily the case
for example at the beginnig of a Fedora Rawhide or Debian Testing release cycle,
so change it to a more generic 'development'
Follow-up for 7102dc52e6
This is for experimental builds of the OS made to test some specific WIP
feature.
For example, let's say the distro in question is Asahi Linux and Apple
just released the M3 SoC. The Asahi developers will start porting to the
M3, and will quickly generate builds of Asahi Linux that can technically
boot but aren't ready for any kind of daily use. These images are marked
as experimental, and can be shared among the developers. If a user
somehow stumbles upon one of these images and tries to install it,
they'll be warned that they're about to install an experimental Apple M3
port of Asahi Linux. Eventually, once the Asahi developers think that
their M3 port is ready for a wider audience, they can merge it into the
mainline Asahi repos, where it will be distributed through the usual
nightly CI builds (where RELEASE_TYPE=pre-release; M3 support is no
longer experimental).
This will allow GUIs to customize their behavior a little based on the
type of release.
For example, an OS installer may display a warning/disclaimer if
RELEASE_TYPE=prerelease. The software updates app might be a bit more
insistent about upgrading to the next major release if
RELEASE_TYPE=stable than if RELEASE_TYPE=lts
The --list-invocations command is similar to --list-boots, but shows
invocation IDs of specified unit. This should be useful when showing
a specific invocation of a unit.
The --invocation option is similar to --boot, but takes a invocation ID
or an offset. The -I option is equivalent to --invocation=0.
This allows for "per-instance" credentials for units. The use case
is best explained with an example. Currently all our getty units
have the following stanzas in their unit file:
"""
ImportCredential=agetty.*
ImportCredential=login.*
"""
This means that setting agetty.autologin=root as a system credential
will make every instance of our all our getty units autologin as the
root user. This prevents us from doing autologin on /dev/hvc0 while
still requiring manual login on all other ttys.
To solve the issue, we introduce support for renaming credentials with
ImportCredential=. This will allow us to add the following to e.g.
serial-getty@.service:
"""
ImportCredential=tty.serial.%I.agetty.*:agetty.
ImportCredential=tty.serial.%I.login.*:login.
"""
which for serial-getty@hvc0.service will make the service manager read
all credentials of the form "tty.serial.hvc0.agetty.xxx" and pass them
to the service in the form "agetty.xxx" (same goes for login). We can
apply the same to each of the getty units to allow setting agetty and
login credentials for individual ttys instead of globally.
- Improve wording for explanation when these variables are inherited
- Clarify that these variables are not placed in the process environment block,
so /proc/PID/environ cannot be used as a debugging tool
Currently inhibitors are bypassed unless an explicit request is made to
check for them, or even in that case when the requestor is root or the
same uid as the holder of the lock.
But in many cases this makes it impractical to rely on inhibitor locks.
For example, in Debian there are several convoluted and archaic
workarounds that divert systemctl/reboot to some hacky custom scripts
to try and enforce blocking accidental reboots, when it's not expected
that the requestor will remember to specify the command line option
to enable checking for active inhibitor locks.
Also in many cases one wants to ensure that locks taken by a user are
respected by actions initiated by that same user.
Change logind so that inhibitors checks are not skipped in these
cases, and systemctl so that locks are checked in order to show a
friendly error message rather than "permission denied".
Add new block-weak and delay-weak modes that keep the previous
behaviour unchanged.
This adds support in `systemd-analyze capability` for decoding
capability masks (sets), e.g.:
```console
$ systemd-analyze capability --mask 0000000000003c00
NAME NUMBER
cap_net_bind_service 10
cap_net_broadcast 11
cap_net_admin 12
cap_net_raw 13
```
This is intended as a convenience tool for pretty-printing capability
values as found in e.g. `/proc/$PID/status`.
The page was written when systemd-repart was primarily intended to be used on a
running system. But nowadays it's more often used to create images, so extend
that part of the description.
While at it, fix some whitespace issues and trim some overly complicated sentences.
Certainly on systemd 252 at least a configuration of
```
MemorySwapMax=40%
```
is supported but this was missing from the man page.
Only MemoryMax was documented as supporting a %.
Since Linux commit ddd1ad68826d ("net: bridge: Add netlink knobs for number
/ max learned FDB entries") [1] it is possible to limit to number of
dynamically learned fdb entries per bridge.
Add support to the systemd netdev bridge for the new netlink attribute
IFLA_BR_FDB_MAX_LEARNED.
[1] https://lore.kernel.org/all/20231016-fdb_limit-v5-0-32cddff87758@avm.de/
Signed-off-by: Gregor Herburger <gregor.herburger@ew.tq-group.com>
When credentials are used with Type=simple + ExecStartPost=,
i.e. when multiple sd-executor instances are running in parallel
for a single service, the state of final credential dir
might be unexpected wrt path_is_mount_point() and other
steps. So, let's imply Type=exec if not explicitly specified,
and emit a warning otherwise.
pci_get_hotplug_slot() has the following limitations:
- if slots are not hotpluggable, they are not in /sys/bus/pci/slots.
- the address at /sys/bus/pci/slots/X/addr doesn't contains the function part,
so on some system, 2 different slots with different _SUN end up with the same
hotplug_slot, leading to naming conflicts.
- it tries all parent devices until it finds a slot number, which is incorrect,
and what led to NAMING_BRIDGE_MULTIFUNCTION_SLOT being disabled.
The use of PCI hotplug to find the slot (ACPI _SUN) was introduced in
0035597a30
"udev: net_id - export PCI hotplug slot names" on 2012/11/26.
At the same time on the kernel side we got
bb74ac23b1
"ACPI: create _SUN sysfs file" on 2012/11/16.
Using PCI hotplug was the only way at the time, but now 12 years later we can use
firmware_node/sun sysfs file.
Looking at a small selection of server HW, for HPE (Gen10 DL325), the _SUN is attached
to the NIC device, whereas for Dell (R640/R6515/R6615) and Cisco (UCSC-C220-M5SX),
the _SUN is on the first parent pcieport.
We still fallback to pci_get_hotplug_slot() to handle the s390 case and
maybe some other coner cases (_SUN on grand parent device that is not a
bridge ?).
Make the warning for oneshot services (where RuntimeMaxSec= has no
effect) more actionable by pointing to the directive people can use
instead to effectively limit their runtime.
As per DPS the UUID for /var/ should be keyed by the local machine-id,
which is non-trivial to do in a script. Enhance 'systemd-id128' to
take 'var-partition-uuid' as a verb, and if so perform the
calculation.
This is an analog of x-systemd.requires that adds a Wants dependency
instead. This is useful for filesystems that support mounting in
degraded states (such as multi-device filesystems).
Makes it possible to specify URLs to a changelog and an appstream
catalog XML in the sysupdate.d/*.conf files. This will be passed along
to the clients of systemd-sysupdated, which can then present this data.
This prevents sysupdate from going out to the network to enumerate
available instances. When combined with the list command, this lets us
query installed instances
The XDG base dir spec adopted ~/.local/state/ as a thing a while back,
and we updated our docs in b4d6bc63e6, but
forgot to to update the table at the bottom to fully reflect the update.
Fix that.
This file doesn't document features of systemd, but is more a of a
general description that generalizes/modernizes FHS. As such, the items
listed in it weren't "added" in systemd versions, they simply reflect
general concepts independent of any specific systemd version. hence
let's drop this misleading and confusing version info.
Or in other words, the man page currently claims under "/usr/": "Added
in version 215." – Which of course is rubbish, the directory existed
since time began.
This also rebreaks all paragaphs this touches.
No content changes.
Previously, the order was quite chaotic, even sometimes interleaved with
entirely unrelated switches. Let's clean this up and use the same order
as in the spec.
This doesn't change anything real, but I think it's a worthy clean-up in
particular as this order is documented as the PCR measurement order of
these sections, hence there's actually a bit of relevance to always
communicate the same order everywhere.
At this point we have a clearer model:
* systemd-measure should be used for measuring UKIs on vendor build
systems, i.e. only cover stuff predictable by the OS vendor, and
identical on all systems. And that is pretty much only PCR 11.
* systemd-pcrlock should cover the other PCRs, which carry inherently
local information, and can only be predicted locally and not already
on vendor build systems.
Because of that, let's not bother with any PCRs except for 11 in
systemd-measure. This was added at a time where systemd-pcrlock didn't
exist yet, and hence it wasn't clear how this will play out in the end.
Update the man page of tmpfiles.d to remove outdated comments regarding the behavior of ownership with symlinks.
The behavior has been changed in this commit 51207ca134
The new "password-cache" option allows customizing behavior of the
ask-password module in regards to caching credentials in the kernel
keyring. There are 3 possible values for this option:
* read-only - look for credentials in kernel keyring before asking
* on - same as read-only, but also save credentials input by user
* off - disable keyring credential cache
Currently the cache is forced upon the user and this can cause issues.
For example, if user wants to attach two volumes with two different
FIDO2 tokens in a quick succession, the attachment operation for the
second volume will use the PIN cached from the first FIDO2 token, which
of course will fail and since tokens are only attempted once, this will
cause fallback to a password prompt.
Currently, if user doesn't specify a key file, /etc/cryptsetup-keys.d/
and /run/cryptsetup-keys.d/ will be searched for a key file with name
matching the volume name. But current implementation has an important
flaw. When the auto-discovered key is a socket file - it will read the
key only once, while the socket might provide different keys for
different types of tokens. The issue is fixed by trying to discover the
key on each unlock attempt, this way we can populate the socket bind
name with something the key provider might use to differentiate between
different keys it has to provide.
It seems entirely reasonable to make a policy which e.g. allows block operations
for interactive users after authentication. The tool should support this, so that
more complicated local policies can be used.
Related to https://github.com/systemd/systemd/pull/30307.
This adds %q, %A and %M specifiers to tmpfiles:
- %A and %M were previously added to tmpfiles.d man page, but not to specifier_table
- %q is added via COMMON_SYSTEM_SPECIFIERS
As discussed in https://github.com/systemd/systemd/pull/32724#discussion_r1638963071
I don't find the opposite reasoning particularly convincing.
We have ProtectHome=tmpfs and friends, and those can be
pretty much trivially implemented through TemporaryFileSystem=
too. The new logic brings many benefits, and is completely generic,
hence I see no reason not to expose it. We can even get more tests
for the code path if we make it public.
This is generally useful, but in some cases particularly: when
implementing enumeration calls that use the "more" flag to return
multiple replies then for the first reply we need to return an error in
case the list of objects to enumerate is empty, usually so form of
"NoSuchXYZ" error. In many cases this shouldn't really be treated as
error, as an empty list probably more than not is as valid as a list
with one, two or more entries.
Update frameworks that work automatically in the background
occasionally need to schedule reboots. Systemd-logind already
provides a nice mechanism to schedule shutdowns, send notfications
and block logins short before the time. Systemd has a framework for
calendar events, so we may conveniently use logind to define a
maintenance time for reboots.
The existing ScheduleShutdown DBus method in logind expects a usec_t
with an absolute time. Passing USEC_INFINITY as magic value now tells
logind to take the time from the configured maintenance time if set.
"shutdown -r" leverages that and uses the maintenance time
automatically if configured. The one minute default is still used if
nothing was specified.
Similarly the new 'auto' setting for the --when parameter of systemctl
uses the maintenance time if configured or a one minute timer like the
shutdown command.
systemd-cryptsetup supports a FIDO2 mode with manual parameters, where
the user provides all the information necessary for recreating the
secret, such as: credential ID, relaying party ID and the salt. This
feature works great for implementing 2FA schemes, where the salt file
is for example a secret unsealed from the TPM or some other source.
While the unlocking part is quite straightforward to set up, enrolling
such a keyslot - not so easy. There is no clearly documented
way on how to set this up and online resources are scarce on this topic
too. By implementing a straightforward way to enroll such a keyslot
directly from systemd-cryptenroll we streamline the enrollment process
and reduce chances for user error when doing such things manually.
DynamicUser= enables PrivateTmp= implicitly to avoid files owned by reusable uids
leaking into the host. Change it to instead create a fully private tmpfs instance
instead, which also ensures the same result, since it has less impactful semantics
with respect to PrivateTmp=yes, which links the mount namespace to the host's /tmp
instead. If a user specifies PrivateTmp manually, let the existing behaviour
unchanged to ensure backward compatibility is not broken.
Historically, systemd-tmpfiles was designed to manager temporary
files, but nowadays it has become a generic tool for managing
all kinds of files. To avoid user confusion, let's remove "temporary"
from the tool's description.
As discussed in #33349
The setting of systemd clock is important and deserves an accurate description,
see for example:
https://discussion.fedoraproject.org/t/f38-to-f39-40-dnf-system-upgrade-can-fail-on-raspberry-pi/92403https://bugzilla.redhat.com/show_bug.cgi?id=2242759
The meat of the description was in systemd-timesyncd.service(8), but
actually it's systemd that sets the clock. In particular, systemd-timesyncd
doesn't know anything about /usr/lib/clock-epoch, and since systemd sets
the clock to the epoch when initializing, systemd-timesyncd would only
get to advance the clock to the epoch under special circumstances.
Also, systemd-timesyncd is an optional component, so we can't even rely
on its man page being installed in all circumstances. The description needs
to be moved to systemd(1).
The description is updated to describe the changes that were made in
previous commits.
Mention that by default, /home is managed by tmpfiles.d/home.conf, and
recommend that users run systemd-tmpfiles --dry-run --purge first to
see exactly what will be removed.
When in FIDO2 mode with manual parameters, i.e. when not reading the
parameters off the LUKS2 header, the current behavior in regards to PIN,
UP and UV features is to default to v248 logic, where we use PIN + UP
when needed, and do not configure UV at all. Let's allow users to
configure those features in manual mode too.
For putting together "varlinkctl call" command lines it's useful to
quickly enumerate all methods implemented by a service. Hence, let's add
a new "list-methods" which uses the introspection data of a service to
quickly list methods.
This is implemented as a special flavour of the "introspect" logic,
and just suppresses all output except for the method names.
let's make it easier to use the introspection functionality of
"varlinkctl": if no interface name is shown, display the introspection
data of all available interfaces. Moreover, allow that multiple
interfaces can be listed, in which case we enumerate them all.
This relieves the user from having to list interfaces first in order to
find the ones which to introspect.
I find myself wanting to check this data with a quick command, and
browsing through /sys/ manually getting binary data sucks. Hence let's
do add a nice little analysis tool.
Multipath TCP (MPTCP), standardized in RFC8684 [1], is a TCP extension
that enables a TCP connection to use different paths. It allows a device
to make use of multiple interfaces at once to send and receive TCP
packets over a single MPTCP connection. MPTCP can aggregate the
bandwidth of multiple interfaces or prefer the one with the lowest
latency, it also allows a fail-over if one path is down, and the traffic
is seamlessly re-injected on other paths.
To benefit from MPTCP, both the client and the server have to support
it. Multipath TCP is a backward-compatible TCP extension that is enabled
by default on recent Linux distributions (Debian, Ubuntu, Redhat, ...).
Multipath TCP is included in the Linux kernel since version 5.6 [2]. To
use it on Linux, an application must explicitly enable it when creating
the socket:
int sd = socket(AF_INET(6), SOCK_STREAM, IPPROTO_MPTCP);
No need to change anything else in the application.
This patch allows MPTCP protocol in the Socket unit configuration. So
now, a <unit>.socket can contain this to use MPTCP instead of TCP:
[Socket]
SocketProtocol=mptcp
MPTCP support has been allowed similarly to what has been already done
to allow SCTP: just one line in core/socket.c, a very simple addition
thanks to the flexible architecture already in place.
On top of that, IPPROTO_MPTCP has also been added in the list of allowed
protocols in two other places, and in the doc. It has also been added to
the missing_network.h file, for systems with an old libc -- note that it
was also required to include <netinet/in.h> in this file to avoid
redefinition errors.
Link: https://www.rfc-editor.org/rfc/rfc8684.html [1]
Link: https://www.mptcp.dev [2]
Set the $REMOTE_ADDR environment variable for AF_UNIX socket connections
when using per-connection socket activation (Accept=yes). $REMOTE_ADDR
will now contain the remote socket's file system path (starting with a
slash "/") or its address in the abstract namespace (starting with an
at symbol "@").
This information is essential for identifying the remote peer in AF_UNIX
socket connections, but it's not easy to obtain in a shell script for
example without pulling in a ton of additional tools. By setting
$REMOTE_ADDR, we make this information readily available to the
activated service.
Since we document /usr/local/lib/systemd/ and other paths for various things,
add notes that this is not supported if /usr/local is a separate partition. In
systemd.unit, I tried to add the footnote in the table where
/usr/local/lib/systemd/ is listed, but that get's rendered as '[sup]a[/sup]'
with a mangled footnote at the bottom of the table :( .
Also, split paragraphs in one place where the subject changes without any
transition.
Follow-up for 02f35b1c90.
Replaces https://github.com/systemd/systemd/pull/33231.
Section "Description" didn't actually say what systemd does. And we had a giant
"Concepts" section that actually described units types and other details about
them. So let's move the basic description of functionality to "Description" and
rename the following section to "Units".
The link to the Original Design Document is moved to "See Also", it is of
historical interest mostly at this point.
The only actual change is that when talking about API filesystems, /dev is also
mentioned. (I think /sys+/proc+/dev are the canonical set and should be always
listed on one breath.)
It has been mentioned in IPv4Forwarding= and IPv6Forwarding=,
but let's also explain in the settings who imply these settings.
Follow-up for 3976c43092 and
485f5148b3.
For run0 (as opposed to systemd-run in general), connecting to
the system bus (of localhost or container) as a different user
than root and then trying to elevate privilege from that
makes little sense:
https://github.com/systemd/systemd/issues/32997#issuecomment-2127992973
The @ syntax is mostly useful when connecting to the user bus,
which is not a use case for run0. Hence, let's remove the example.
The syntax will be properly refused in #32999.
- mention that /run/machine-id is used if exist.
- mention system.machine_id credential,
- credential, VM uuid, and container uuid are not read when --root=
is specified or running in a chroot environment.
Fixes https://github.com/systemd/systemd/issues/28514.
Quoting https://github.com/systemd/systemd/issues/28514#issuecomment-1831781486:
> Whenever PAM is enabled for a service, we set up the PAM session and then
> fork off a process whose only job is to eventually close the PAM session when
> the service dies. That services we run with service privileges, both to
> minimize attack surface and because we want to use PR_SET_DEATHSIG to be get
> a notification via signal whenever the main process dies. But that only works
> if we have the same credentials as that main process.
>
> Now, if pam_systemd runs inside the PAM stack (which it normally does) it's
> session close hook will ask logind to synchronously end the session via a bus
> call. Currently that call is not accessible to unprivileged clients. And
> that's the part we need to relax: allow users to end their own sessions.
The check is implemented in a way that allows the kill if the sender is in
the target session.
I found 'sudo systemctl --user -M "zbyszek@" is-system-running' to
be a convenient reproducer.
Before:
May 16 16:25:26 x1c systemd[1]: run-u24754.service: Deactivated successfully.
May 16 16:25:26 x1c dbus-broker[1489]: A security policy denied :1.24757 to send method call /org/freedesktop/login1:org.freedesktop.login1.Manager.ReleaseSession to org.freedesktop.login1.
May 16 16:25:26 x1c (sd-pam)[3036470]: pam_systemd(login:session): Failed to release session: Access denied
May 16 16:25:26 x1c systemd[1]: Stopping session-114.scope...
May 16 16:25:26 x1c systemd[1]: session-114.scope: Deactivated successfully.
May 16 16:25:26 x1c systemd[1]: Stopped session-114.scope.
May 16 16:25:26 x1c systemd[1]: session-c151.scope: Deactivated successfully.
May 16 16:25:26 x1c systemd-logind[1513]: Session c151 logged out. Waiting for processes to exit.
May 16 16:25:26 x1c systemd-logind[1513]: Removed session c151.
After:
May 16 17:02:15 x1c systemd[1]: run-u24770.service: Deactivated successfully.
May 16 17:02:15 x1c systemd[1]: Stopping session-115.scope...
May 16 17:02:15 x1c systemd[1]: session-c153.scope: Deactivated successfully.
May 16 17:02:15 x1c systemd[1]: session-115.scope: Deactivated successfully.
May 16 17:02:15 x1c systemd[1]: Stopped session-115.scope.
May 16 17:02:15 x1c systemd-logind[1513]: Session c153 logged out. Waiting for processes to exit.
May 16 17:02:15 x1c systemd-logind[1513]: Removed session c153.
Edit: this seems to also fix https://github.com/systemd/systemd/issues/8598.
It seems that with the call to ReleaseSession, we wait for the pam session
close hooks to finish. I inserted a 'sleep(10)' after the call to ReleaseSession
in pam_systemd, and things block on that, nothing is killed prematurely.
Prompted by #32895
Rather than ordering with each power operation targets,
ordering against shutdown.target which is a valid
synchronization point. This has no effect if soft-reboot
is being performed.