IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
It is strictly mandatory that this is done during initial
transaction, and not later when the system is already running.
Hence let's refuse manual start for all of the involved units.
Additionally, refuse manual stop for systemd-factory-reset-complete.service,
as it flags the factory reset completion through
/run/systemd/factory-reset-complete, which never gets removed
for the whole boot.
While we typically just use /dev/tpmrm0 for accessing the TPM chip (i.e
via the kernel's own resource manager), some sysfs properties that
matter are on /dev/tpm0 only (i.e. the version without the kernel TPM
resource manager). Hence, wait for both to show up in tpm2.target, so
that we can be sure the full API is available.
This matters because we want to access /sys/class/tpm/tpm0/ppi/request
in the next commit.
This introduces a bunch of facilities:
1. The factory-reset.target unit that requests a factory reset is now
complemented by factory-reset-now.target that executes it at next
boot.
2. This latter is added to the initial transaction via the new trivial
systemd-factory-reset-generator.
3. A tool systemd-factory-reset has been added to query, request,
cancel, complete factory reset operations (via EFI variables). Two of
these are wrapped into units that are plugged into
factory-reset.target and factory-reset-now.target respectively. The
tool also provides a simple Varlink API.
This should make things a lot cleaner, and both be useful as explicit
implementation on UEFI, and as template + hookpoints for alternative
implementations on non-UEFI.
Terminating the plymouth/console agents when the wall agent takes over
can happen asynchronously, after all the pw queries are async anyway and
hence can be seen by both the plymouth/console agents and the wall
agent.
By stopping the two agents with "--no-block" we add a bit of robustness,
since trouble of them exiting won't block the wall agent to start.
This addresses the issue the previous commit fixes in a different way.
Let's make sure that the moment where factory reset is requested is
visible in the TPM PCR state, so that access to secrets is terminated.
This is particulary interesting when the system is booted with
systemd.unit=factory-reset.target on the kernel command line, requesting
a factory reset on the following boot. The preparations done in
userspace should already lose access to the TPM in that case.
storagetm mode means we we are network accessible. let's lock down
access to TPM secrets in this case: let's measure a pcr "phase" string
into PCR 11.
This is good as it means that if we are exploited in this state FDE
secrets protected by TPM are likely to remain protected, since the PCR
values wouldn't allow access.
Let's make the three subsystems more alike, and add remote-*setup.traget
for all three, enable them all three in the presets, and make them
behave in a similar fashion.
This is useful for booting from a freshly downloaded disk image: just
specify
rd.systemd.pull=verify=no,machine,blockdev,raw:image:https://192.168.100.1:8081/image.raw
root=/dev/disk/by-loop-ref/image.raw-part2
on the kernel command line, and we'll download that in the initrd and boot from it.
(note the above disables download-time verification, putting trust in
verity and image policy that this won#t do harm)
Here's a more complete example. From a git checkout do:
ninja -C build && mkosi -f -T serve
and then from another terminal do within the same checkout:
./build/systemd-vmspawn \
--ram=16G \
--register=no \
-n \
-i ./build/mkosi.output/image.raw \
rd.systemd.pull=verify=no,machine,blockdev,raw:image:http://192.168.100.1:8081/image.raw \
root=/dev/disk/by-loop-ref/image.raw-part2 \
rootflags=x-systemd.device-timeout=infinity \
ip=any
This will then boot via the ESP of the specified image, then download
the image via HTTP from the mkosi instance running in the first
terminal, attach it to a loopback block device, and then use its second
partition as root fs, and boot into it.
(this assumes your host is 192.168.100.1, of course)
Note that downloading the full image takes a bit of time (this downloads
it uncompressed after all), hence we turn off the timeout to wait for
the device.
This also introduces a new "imports.target" unit (and associated
"imports-pre.target") between imports are grouped, and which ensure the
imports actually are ordered correctly both on the host and in the
initrd.
This is mostly just a friendly unit wrapper around "systemd-dissect
--attach".
This is useful so that we can automatically attach disk images as
block device at boot.
09fbff57fcde47782a73f23b3d5cfdcd0e8f699b introduced new knob
for such functionality. However, that seems unnecessary.
The mount option string is ubiquitous in that all of fstab,
kernel cmdline, credentials, systemd-mount, ... speak it.
And we already have x-systemd.device-bound= that's parsed
by pid1 instead of fstab-generator. It feels hence more natural
for graceful options to be an extension of that, rather than
its own property.
There's also one nice side effect that the setting itself
is now more graceful for systemd versions not supporting
such feature.
systemd-mountfsd so far provided a MountImage() API call for mounting a
disk image and returning a set of mount fds. This complements the API
with a new MountDirectory() API call, that operates on a directory
instead of an image file. Now, what makes this interesting is that it
applies an idmapping from the foreign UID range to the provided target
userns – and in which case unpriveleged operation is allowed (well,
under some conditions: in particular the client must own a parent dir of
the provided path).
This allows container managers to run fully unprivileged from
directories – as long as those directories are owned by the foreign UID
range. Basic operation is like this:
1. acquire a transient userns from systemd-nsresourced with 64K users
2. ask systemd-mountfsd for an idmapped mount of the container dir
matching that userns
3. join the userns and bind the mount fd as root.
Note that we have to drop various sandboxing knobs from the mountfsd
service file for this to work, since the kernel's security checks that
try to ensure than an obstructed /proc/ cannot be circumvented via
mounting a new procfs will otherwise prohibit mountfsd to duplicate the
mounts properly.
File system modules should be something the kernel can autoload
automatically, and according to my testing that works fine, hence let's
drop the explicit deps, in particular as systems usually stick to one fs
for these things, not both.
I inquired bluca about the reason to add it, and didn't remember
anymore, and was fine with me removing this. So let's remove this for
now, should issues arise we can revert this.
mountfsd is supposed to be available during early boot aleady, before
systemd-tmpfiles-setup-dev-early.service completes, hence make sure
loopback devices and DM already work before that.
As suggested by yuwata here:
https://github.com/systemd/systemd/pull/35685#issuecomment-2608157569
The need for -o was introduced in db6aeda to set the -p flag for login.
Setting -o overrides agettys built-in handling of arguments, so "-- \\u" was needed to mimic it.
This broke the autologin-feature, since the -f (noauth) flag is not passed to login [1].
But with 3d2157e, the -p flag is dropped, but the full change wasn't reverted,
leaving autologin still broken - But for no reason since agetty does the right thing.
Reference:
[1]: https://github.com/util-linux/util-linux/blob/4e14b57/term-utils/agetty.c#L529-L550
- Set `RefuseManualStart=yes`.
- Order before shutdown.target and emergency.target.
- Remove wrong `Wants=remote-fs.target` dependency from
breakpoint-pre-switch-root.service.
- Remove unneeded `After=sysroot.mount` from breakpoint-pre-switch-root.service
(implied by initrd.target).
This turns systemd-ask-password into a small Varlink service, so that
there's an standard IPC way to ask for a password. It mostly directly
exposes the functionality of the Varlink service.
Otherwise, we see "WARNING: failed to open /dev/btrfs-control,
skipping device registration: Operation not permitted" in systemd-homed's
logs when creating a btrfs on luks home.
Introduce the `systemd.break=` kernel command line option to allow stopping the
boot process at a certain point and spawn a debug shell. After exiting this
shell, the system will resume booting.
It accepts the following values:
- `pre-udev`: before starting to process kernel uevents (initrd and host).
- `pre-basic`: before leaving early boot and regular services start (initrd and
host).
- `pre-mount`: before the root filesystem is mounted (initrd).
- `pre-switch-root`: before switching root (initrd).
In the rootfs these need to run after /var/lib/ has been set up. In the
initrd we want them to run as soon as possible so that they can be used
to customize setting up the rootfs.
Let's bump the kernel baseline a bit to 4.3 and thus require ambient
caps.
This allows us to remove support for a variety of special casing, most
importantly the ExecStart=!! hack.
This introduces a new unit condition check: that matches if a specific
kmod module is allowed. This should be generally useful, but there's one
usecase in particular: we can optimize modprobe@.service with this and
avoid forking out a bunch of modprobe requests during boot for the same
kmods.
Checking if a kernel module is loaded is more complicated than just
checking if /sys/module/$MODULE/ exists, since kernel modules typically
take a while to initialize and we must check that this is complete (by
checking if the sysfs attr "initstate" is "live").
In the initrd we want to run as early as possible, before
any of the filesystems are set up, so that users can use
sysexts to customize kernel modules, firmware, etc. But
in the root fs it needs to run after /var/ has been set
up. Split the unit, and have an initrd-specific one that
runs very early.
In the initrd we want to run as early as possible, before
any of the filesystems are set up, so that users can use
confexts to customize fstab/veritytab/crypttab/etc. But
in the root fs it needs to run after /var/ has been set
up. Split the unit, and have an initrd-specific one that
runs very early.
This commit introduces a build-time option to enable/disable sysupdated
separately from sysupdate. 'auto' translated to enabled by default in
developer builds.
Closes#28367 (but not really in the exact form, see below)
We have the problem of restarting all user manager instances
after upgrade. Current approaches involve systemctl kill
with SIGRTMIN+25, which is async and feels rather ugly [1][2];
or systemctl --machine=user@ --user, which requires entering
each user session. Neither is particularly elegant.
Instead, let's just signal daemon-reexec when user@.service
is reloaded from system manager. Our long goal of dropping
daemon-reload in favor of reexec (see TODO) is unlikely to happen
due to user dbus restrictions, but here the synchronization
is done via READY=1.
[1] https://gitlab.archlinux.org/archlinux/packaging/packages/systemd/-/blob/main/systemd.install?ref_type=heads#L37
[2] https://salsa.debian.org/systemd-team/systemd/-/blob/debian/master/debian/systemd.postinst#L24#28367 would not really work for us now I come to think about it,
because all processes will be reparented to pid1 as soon as
original user manager process exits. This alternative approach
seems good enough for our use case.
Add support for opening /dev/hidraw devices via logind's TakeDevice().
Same semantics as our support for evdev devices, but it requires the
HIDIOCREVOKE ioctl in the kernel.
tmpfiles might be linking the configuration for ldconfig into /etc
so make sure it runs after it so that the configuration is guaranteed
to be in place.
The configuration files required by ldconfig could be put into
place by systemd-confext.service (ldconfig only looks in /etc) so
let's order the service after systemd-confext.service to make sure
any config files are in place before the service runs.
Monitor the sysctl set by networkd for writes, if a sysctl is
overwritten with a different value than the one we set, emit a warning.
Writes are detected with an eBPF program attached as BPF_CGROUP_SYSCTL
which reports the sysctl writes only in net/.
The eBPF program only reports sysctl writes from a different cgroup than networkd.
To do this, it uses the `bpf_current_task_under_cgroup_proto()` helper,
which will be available allowed in BPF_CGROUP_SYSCTL from kernel 6.12[1].
Loading a BPF_CGROUP_SYSCTL program requires the CAP_SYS_ADMIN capability,
so drop it just after the program load, whether it loads successfully or not.
Writes are logged but permitted, in future the functionality can be
extended to also deny writes to managed sysctls.
[1] https://lore.kernel.org/bpf/20240819162805.78235-3-technoboy85@gmail.com/