mirror of https://github.com/systemd/systemd-stable.git synced 2024-12-25 23:21:33 +03:00
1419 Commits

Author SHA1 Message Date
Jason A. Donenfeld
0a1d8ac77a stub: handle random seed like sd-boot does
sd-stub has an opportunity to handle the seed the same way sd-boot does,
which would have benefits for UKIs when sd-boot is not in use. This
commit wires that up.

It refactors the XBOOTLDR partition discovery to also find the ESP
partition, so that it can access the random seed there.
2022-11-23 00:56:45 +01:00
Jason A. Donenfeld
a4eea6038c bootctl: install system token on virtualized systems
Removing the virtualization check might not be the worst thing in the
world, and would potentially get many, many more systems properly seeded
rather than not seeded. There are a few reasons to consider this:

- In most QEMU setups and most guides on how to set up QEMU, a separate
  pflash file is used for nvram variables, and this generally isn't
  copied around.

- We're now hashing in a timestamp, which should provide some level of
  differentiation, given that EFI_TIME has a nanoseconds field.

- The kernel itself will additionally hash in: a high resolution time
  stamp, a cycle counter, RDRAND output, the VMGENID uniquely
  identifying the virtual machine, any other seeds from the hypervisor
  (like from FDT or setup_data).

- During early boot, the RNG is reseeded quite frequently to account for
  the importance of early differentiation.

So maybe the mitigating factors make the actual feared problem
significantly less likely and therefore the pros of having file-based
seeding might outweigh the cons of weird misconfigured setups having a
hypothetical problem on first boot.
2022-11-21 15:13:26 +01:00
Jason A. Donenfeld
0be72218f1 boot: implement kernel EFI RNG seed protocol with proper hashing
Rather than passing seeds up to userspace via EFI variables, pass seeds
directly to the kernel's EFI stub loader, via LINUX_EFI_RANDOM_SEED_TABLE_GUID.
EFI variables can potentially leak and suffer from forward secrecy
issues, and processing these with userspace means that they are
initialized much too late in boot to be useful. In contrast,
LINUX_EFI_RANDOM_SEED_TABLE_GUID uses EFI configuration tables, and so
is hidden from userspace entirely, and is parsed extremely early on by
the kernel, so that every single call to get_random_bytes() by the
kernel is seeded.

In order to do this properly, we use a bit more robust hashing scheme,
and make sure that each input is properly memzeroed out after use. The
scheme is:

    key = HASH(LABEL || sizeof(input1) || input1 || ... || sizeof(inputN) || inputN)
    new_disk_seed = HASH(key || 0)
    seed_for_linux = HASH(key || 1)

The various inputs are:
- LINUX_EFI_RANDOM_SEED_TABLE_GUID from prior bootloaders
- 256 bits of seed from EFI's RNG
- The (immutable) system token, from its EFI variable
- The prior on-disk seed
- The UEFI monotonic counter
- A timestamp
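
The scheme above can be sketched in Python as follows. This is only a minimal illustration, not the sd-boot implementation: SHA-256 as HASH, the 8-byte little-endian length prefix, the single-byte counter, and the LABEL value are all assumptions chosen for the example.

```python
import hashlib
import struct

# Hypothetical domain-separation label; the real value is not given in the text.
LABEL = b"example random seed label"


def derive_seeds(inputs):
    """Derive (new_disk_seed, seed_for_linux) from a list of byte strings."""
    h = hashlib.sha256()
    h.update(LABEL)
    for data in inputs:
        # Prefix every input with its length so that input boundaries are
        # unambiguous. An 8-byte little-endian length is an assumption.
        h.update(struct.pack("<Q", len(data)))
        h.update(data)
    key = h.digest()

    # Two outputs from one key, separated by a trailing counter
    # (encoded here as a single byte, which is also an assumption).
    new_disk_seed = hashlib.sha256(key + b"\x00").digest()
    seed_for_linux = hashlib.sha256(key + b"\x01").digest()
    return new_disk_seed, seed_for_linux
```

Because each input is length-prefixed, the inputs [b"ab", b"c"] and [b"a", b"bc"] produce different keys, and because the two outputs are hashed with different counters, learning one of them does not reveal the other.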

This also adjusts the secure boot semantics, so that the operation is
only aborted if it's not possible to get random bytes from EFI's RNG or
a prior boot stage. With the proper hashing scheme, this should make
boot seeds safe even on secure boot.

There is currently a bug in Linux's EFI stub in which if the EFI stub
manages to generate random bytes on its own using EFI's RNG, it will
ignore what the bootloader passes. That's annoying, but it means that
either way, via systemd-boot or via EFI stub's mechanism, the RNG *does*
get initialized in a good safe way. And this bug is now fixed in the
efi.git tree, and will hopefully be backported to older kernels.

As the kernel recommends, the resultant seeds are 256 bits and are
allocated using pool memory of type EfiACPIReclaimMemory, so that it
gets freed at the right moment in boot.
2022-11-14 15:21:58 +01:00
Yu Watanabe
403ca5b8b4 unit: also prioritize input devices when triggering devices
In most cases, a tty device without input devices is meaningless.

This also swaps the priority of tty and net:
- input devices are often connected via a USB bus, hence may take
  slightly more time to be initialized. As described above, in most
  cases it is fine for tty devices to be initialized just before
  input devices,
- network configuration usually requires much time, e.g. DHCP or RA,
  hence it is better that network interfaces are initialized earlier.
  Then, network services can start the DHCP client or friends earlier.

Fixes #24026.
2022-10-26 10:49:09 +02:00
Lennart Poettering
047273e6e8 pcrphase: add two additional phases
This adds two more phases to the PCR boot phase logic: "sysinit" +
"final".

The "sysinit" one is placed between sysinit.target and basic.target.
It's good to have a milestone in this place, since this is after all
file systems/LUKS volumes are in place (which sooner or later should
result in measurements of their own) and before services are started
(where we should be able to rely on them to be complete).

This is particularly useful to make certain secrets available for
mounting secondary file systems, but making them unavailable later.

This breaks API in a way (as measurements during runtime will change),
but given that the pcrphase stuff hasn't been released yet, this should be OK.
2022-10-17 12:09:43 +02:00
Daan De Meyer
9377e53f4f meson: Fix pcrphase unit conditions 2022-10-11 15:29:08 +02:00
Topi Miettinen
75723d31a6 units: udev: partially emulate ProtectClock=
Drop CAP_SYS_TIME and CAP_WAKE_ALARM capabilities and block clock-related
system calls. Update TODO.
2022-09-26 11:40:28 +02:00
Lennart Poettering
4cebd207d1 tmpfiles: add lines for provisioning ssh keys for root by default
With this, I can now easily do:

    systemd-nspawn --load-credential=ssh.authorized_keys.root:/home/lennart/.ssh/authorized_keys --image=… --boot

To boot into an image with my SSH key copied in. Yay!
2022-09-23 09:30:00 +02:00
Lennart Poettering
40f1856791 units: add pcrphase units 2022-09-22 16:53:34 +02:00
Zbigniew Jędrzejewski-Szmek
15b3f7e309
Merge pull request #24670 from keszybz/early-boot-ordering
Early boot ordering
2022-09-17 13:26:51 +02:00
Dan Streetman
137d162c42 add CAP_LINUX_IMMUTABLE to systemd-machined, so it can handle machinectl read-only requests
Without this, the 'machinectl read-only ...' command always fails.
2022-09-16 19:50:52 +01:00
Yu Watanabe
f562abe296 unit: drop ProtectClock=yes from systemd-udevd.service
This partially reverts cabc1c6d7a.

The setting ProtectClock= implies DeviceAllow=, which is not suitable
for udevd. Although we are slowly removing cgroupsv1 support,
DeviceAllow= with cgroupsv1 is necessarily racy, and reloading PID1
during the early boot process may cause issues like #24668.

Let's disable ProtectClock= for udevd. And, if necessary, let's
explicitly drop CAP_SYS_TIME and CAP_WAKE_ALARM (and possibly others)
by using CapabilityBoundingSet= later.

Fixes #24668.
2022-09-16 03:41:29 +09:00
Zbigniew Jędrzejewski-Szmek
89c4dc52b3 units: drop path to executable in $PATH
We don't have it in other places, so let's make things a bit simpler.
2022-09-15 14:59:11 +02:00
Zbigniew Jędrzejewski-Szmek
5b5ec138c6 units: make sure that initrd-switch-root.service pulls in .target
Normally we queue initrd-switch-root.target/isolate, which pulls in the
service via Wants= in the .target unit file. But if the service is instead
started directly, there may be nothing pulling in the target. Let's make
sure that the reference exists.
2022-09-15 14:59:11 +02:00
Zbigniew Jędrzejewski-Szmek
3449814b8b units: add dependency ordering for emergency.service conflicts
If we want to stop those services which would compete for access to
the console, we need to have an ordering so that they are actually
stopped before the other things start, not asynchronously.
2022-09-15 14:59:11 +02:00
Zbigniew Jędrzejewski-Szmek
7c0e2b5559 units: add ordering dependencies on initrd-switch-root.target
For shutdown, we queue shutdown.target/start, so in every unit which should be
stopped *before* shutdown, we need both Conflicts and an ordering dependency
with shutdown.target (either Before= or After= would work, because stop jobs
are always ordered before start jobs).

For initrd transition, we queue initrd-switch-root.service/isolate. This
automatically creates a /stop job for every running unit without
IgnoreOnIsolate. But no ordering dependency is created, unless the unit has a
(possibly transitive) ordering dependency on initrd-switch-root.service.
Since most units must stop before the transition, we should add the ordering
dependency. It is nicer to use Before=initrd-switch-root.target for this.
initrd-switch-root.target is ordered before initrd-switch-root.service, so
the effect is the same when both are in a transaction.

Fixes #23745.

To also cover the case where somebody is in emergency mode in the initrd and
queues initrd-switch-root.service/start (not isolate), also add
Conflicts=initrd-switch-root.target, so various units are stopped properly.
This extends 2525682565 to cover all the other
services that are touched. It could be considered "operator error", but it's
an easy mistake to make and it's nicer if we can make this more foolproof.
2022-09-15 14:59:11 +02:00
Zbigniew Jędrzejewski-Szmek
d5fd07cdee units/systemd-network-generator.service: add forgotten ordering for shutdown 2022-09-15 14:59:11 +02:00
Zbigniew Jędrzejewski-Szmek
9810e41942 units: reorder/split unit dependency blocks
The block is reordered and split to have:
  1. description + documentation
  2. (optionally) conditions
  3. all the dependencies
I think it's easier to read the units this way.
Also, the Conflicts+Before is separated out to separate lines.
The ordering dependency is "fake", because it could just as well be
After=, we are adding it to force ordering wrt. shutdown.target, and
it plays a different role than the other Before=, which are about a
real ordering on boot.
2022-09-15 14:59:11 +02:00
Nick Rosbrook
8b8bd621e1 pstore: do not try to load all known pstore modules
Commit 70e74a5997 ("pstore: Run after modules are loaded") added After=
and Wants= entries for all known kernel modules providing a pstore.

While adding these dependencies on systems where one of the modules is
not present, or not configured, should not have a real effect on the
system, it can produce annoying error messages in the kernel log. E.g.
"mtd device must be supplied (device name is empty)" when the mtdpstore
module is not configured correctly.

Since dependencies cannot be removed with drop-ins, if a distro wants to
remove some of these modules from systemd-pstore.service, they need to
patch units/systemd-pstore.service.in. On the other hand, if they want
to append to the dependencies this can be done by shipping a drop-in.

Since the original intent of the previous commit was to fix [1], which
only requires the efi_pstore module, remove all other kernel module
dependencies from systemd-pstore.service, and let distros ship drop-ins
to add dependencies if needed.

[1] https://github.com/systemd/systemd/issues/18540
2022-09-14 05:30:03 +09:00
Lennart Poettering
d3d2dd5e4f units: prolong the stop timeout for homed
Let's give IO/resizing/… more time than usual.

Fixes: #22901
2022-09-05 15:22:53 +02:00
Frantisek Sumsal
cd7ad0cbde
Merge pull request #24054 from keszybz/initrd-no-reload
Don't do daemon-reload in the initrd
2022-08-18 13:15:14 +00:00
Zbigniew Jędrzejewski-Szmek
db5276215a initrd-parse-etc: override argv[0] to avoid dracut issue
Quoting https://github.com/systemd/systemd/pull/24054#issuecomment-1210501631:
> this would need a patch in dracut, specifically adding the
> systemd-sysroot-fstab-check to the list of installed stuff:
> fe8fa2b0ca/modules.d/00systemd/module-setup.sh (L47).
>
> I could do this manually in the CI (and I guess I'd have to do it anyway even
> if the patch lands in upstream, since it won't be available in C8S), but it
> should get there first before merging this PR, otherwise it's going to break
> Rawhide.
2022-08-18 10:27:44 +02:00
Yu Watanabe
af7a86b8a6 network/tuntap: save tun or tap file descriptor in fd store 2022-08-16 21:57:35 +09:00
Daan De Meyer
219fa78b5f units: Simplify container getty handling
Let's remove the baud settings for the container getty units since
they don't have any effect there anyway. On top of that, when we're
dealing with container TTYs, we can handle all the setup involved
ourselves so let's prevent agetty/login from touching the container
tty at all.

One example where this helps is that it actually makes disabling
TTYVHangup have an effect since before, login would unconditionally
call vhangup() on the tty.
2022-07-28 21:30:53 +02:00
Zbigniew Jędrzejewski-Szmek
45bcfcb36c units/initrd-parse-etc.service: only start units that are required
This makes use of the option switch that was added in the previous commit.
We used a pretty big hammer on a relatively small nail: we would do daemon-reload
and (in principle) allow any configuration to be changed. But in fact we only
made use of this in systemd-fstab-generator. systemd-fstab-generator filters
out all mountpoints except /usr and those marked with x-initrd.mount, i.e. on
a big majority of systems it wouldn't do anything.

Also, since systemd-fstab-generator first parses /proc/cmdline, and then
initrd's /etc/fstab, and only then /sysroot/etc/fstab, configuration in the
host would only matter if the same mountpoint wasn't configured "earlier".
So the config in the host could be used for new mountpoints, but it couldn't
be used to amend configuration for existing mountpoints. And we wouldn't actually
remount anything, so mountpoints that were already mounted wouldn't be affected,
even if we did change some config.
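
The precedence rule described above can be modelled with a small Python sketch. It is a toy model under stated assumptions, not the generator's code: the function names, the dictionary representation of fstab entries, and the sample entries are hypothetical, and the /usr and x-initrd.mount filter is applied only to the host fstab.

```python
def wanted_in_initrd(mountpoint, options):
    """Keep only /usr and entries marked with x-initrd.mount."""
    return mountpoint == "/usr" or "x-initrd.mount" in options.split(",")


def effective_mounts(earlier_sources, host_fstab):
    """Merge mount configuration in parse order; the first definition wins.

    earlier_sources is a list of (name, {mountpoint: options}) pairs that are
    parsed before the host's fstab, e.g. the kernel command line and the
    initrd's own fstab.
    """
    result = {}
    for name, entries in earlier_sources:
        for mountpoint, options in entries.items():
            result.setdefault(mountpoint, (name, options))
    for mountpoint, options in host_fstab.items():
        # The host can add new mountpoints, but cannot amend earlier ones.
        if wanted_in_initrd(mountpoint, options):
            result.setdefault(mountpoint, ("host fstab", options))
    return result


mounts = effective_mounts(
    [("cmdline", {"/usr": "ro"}), ("initrd fstab", {})],
    {"/usr": "rw", "/var": "x-initrd.mount", "/home": "defaults"},
)
```

In this example /usr keeps the options from the command line, /var is added from the host fstab because it carries x-initrd.mount, and /home is filtered out.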

In the new scheme, we will parse /sysroot/etc/fstab and explicitly start
sysroot-usr.mount and other units that we just wrote. In most cases (as written
above), this will actually result in no units being created or started.

If the generator is invoked on a system with /sysroot/etc/fstab present,
behaviour is not changed and we'll create units as before. This is needed so
that if daemon-reload is later run at some point, we don't "lose" those units.

There's a minor bugfix here: we honour x-initrd.mount for swaps, but we
wouldn't restart swap.target, i.e. the new swaps wouldn't necessarily be
pulled in immediately.
2022-07-23 19:02:39 +02:00
Lennart Poettering
a0f4426d0f tmpfiles: automatically provision /etc/issue.d/ + /etc/motd.d/ + /etc/hosts from credentials 2022-07-21 00:06:22 +02:00
Lennart Poettering
1d77721f30 tmpfiles: accept additional tmpfiles lines via credential 2022-07-20 23:53:22 +02:00
Yu Watanabe
e1b45a756f tree-wide: fix typo 2022-07-20 13:15:37 +09:00
Lennart Poettering
3acb6edef3 sysusers: allow defining additional sysusers lines via credentials 2022-07-16 00:47:22 +09:00
Lennart Poettering
39f0d1d2e7 sysctl: also process sysctl requests via the "sysctl.extra" credential 2022-07-14 18:02:58 +02:00
Franck Bui
278e815bfa logind: don't delay login for root even if systemd-user-sessions.service is not activated yet
If for any reason something goes wrong during the boot process (most likely due
to a network issue), system admins should be allowed to log in to the system to
debug the problem. However, due to the login session barrier enforced by
systemd-user-sessions.service for all users, logins for root will be delayed
until a (dbus) timeout expires. Besides being confusing, it's not a nice user
experience to wait for an indefinite period of time (no message is shown), and
this also suggests that something went wrong in the background.

The reason for this delay is that all units involved in the
creation of a user session are ordered after systemd-user-sessions.service,
which is subject to network issues. If root needs to log in at that time,
logind is requested to create a new session (via pam_systemd), which ultimately
ends up waiting for systemd-user-sessions.service to be activated. This has the
bad side effect of blocking login for root until the dbus call done by pam_systemd
times out and the PAM stack proceeds anyway.

To solve this problem, this patch orders the session scope units and the user
instances after systemd-user-sessions.service for unprivileged users only.
2022-07-12 22:54:39 +01:00
Zbigniew Jędrzejewski-Szmek
b8df7f8629 user: delegate cpu controller, assign weights to user slices
So far we didn't enable the cpu controller because of overhead of the
accounting. If I'm reading things correctly, delegation was enabled for a while
for the units with user and pam context set, i.e. for user@.service too.
a931ad47a8 added the explicit Delegate=yes|no
switch, but it was initially set to 'yes'.
acc8059129 disabled delegation for user@.service
with the justification that CPU accounting is expensive, but half a year later
a88c5b8ac4 changed DefaultCPUAccounting=yes for
kernels >=4.15 with the justification that CPU accounting is inexpensive there.

In my (very noncomprehensive) testing, I don't see a measurable overhead if the
cpu controller is enabled for user slices. I tried some repeated compilations,
and there was no statistical difference, but the noise level was fairly
high. Maybe better benchmarking would reveal a difference.

The goal of this change is very simple: currently all of the user session,
including services like the display server and pipewire, is under user@.service.
This means that when e.g. a compilation job is started in the session's
app.slice, the processes in session.slice compete for CPU and can be starved.
In particular, audio starts to stutter, etc. With the CPU controller enabled,
I can start 'ninja -C build -j40' in a tab and this doesn't have any
noticeable effect on audio.

I don't think the particular values matter too much: the CPU controller is
work-conserving, and presumably the session slice would never need more than
e.g. one full CPU, i.e. half or a quarter of available CPU resources on even
the smallest of today's machines. app.slice and session.slice are assigned
equal weights, background.slice is assigned a smaller fraction. CPUWeight=100
is the default, but I wrote it explicitly to make it easier for users to see
how the split is done. So effectively this should result in session.slice
getting as much power as it needs.
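
The effect of the weights can be illustrated with a short Python calculation. It is a simplified model of a work-conserving, weight-proportional split, not the kernel's scheduler; the function name is hypothetical and the value 30 for background.slice is an illustrative assumption (the text only says it gets a smaller fraction).

```python
def cpu_shares(weights, active):
    """Split CPU time among the active groups in proportion to their weights.

    Groups that are idle are left out, so their share is redistributed to
    the groups that have work to do (the work-conserving property).
    """
    total = sum(weights[name] for name in active)
    return {name: weights[name] / total for name in active}


# session.slice and app.slice get equal weights (100 is the default);
# the value for background.slice is an assumption made for this example.
WEIGHTS = {"session.slice": 100, "app.slice": 100, "background.slice": 30}

# A compilation in app.slice competing with the display server and audio:
contended = cpu_shares(WEIGHTS, ["session.slice", "app.slice"])

# Nothing else is running, so app.slice may use all of the CPU time:
alone = cpu_shares(WEIGHTS, ["app.slice"])
```

Under contention, session.slice is entitled to half of the CPU time in this model, which is more than it is expected to need, while a lone compilation job is not throttled at all.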

If it turns out that this does have a noticeable overhead, we could make it
opt-in. But I think that the benefit to usability is important enough to enable
it by default. W/o something like this the session is not really usable with
background tasks.
2022-07-05 14:40:01 +02:00
nl6720
0e68582323 tree-wide: link to docs.kernel.org for kernel documentation
https://www.kernel.org/ links to https://docs.kernel.org/ for the documentation.
See https://git.kernel.org/pub/scm/docs/kernel/website.git/commit/?id=ebc1c372850f249dd143c6d942e66c88ec610520

These URLs are shorter and nicer looking.
2022-07-04 19:56:53 +02:00
Zbigniew Jędrzejewski-Szmek
2f8211c64a tree-wide: use html links for kernel docs
Instead of using "*.txt" as reference name, use the actual destination title.
2022-07-02 12:13:00 +02:00
Yu Watanabe
12bdeb58a6 unit: prioritize module devices
Also, prioritize tty and network devices.

Follow-up for 2336bde964

Fixes #23850.
2022-07-01 15:47:45 +02:00
Zbigniew Jędrzejewski-Szmek
74c4bd6b1a units: add IgnoreOnIsolate=yes to systemd-journald too
We already had it on the socket units, so it's possible that
systemd-journald.service would be stopped and then restarted when traffic hits
the sockets when something logs. Let's not try to stop it. It is supposed to
run until the end and be eventually killed in the final killing spree.

This might (or not) help with #23287.
2022-07-01 14:17:33 +09:00
Alban Bedel
9625350e53 units: remove the restart limit on the modprobe@.service
There are various cases where the same module might be repeatedly
loaded in a short time frame, for example if a service depending on a
module keeps restarting, or if many instances of such a service get
started at the same time. If this happens, the modprobe@.service
instance will be marked as failed because it hit the restart limit.

Overall it doesn't seem to make much sense to have a restart limit on
the modprobe service, so just disable it.

Fixes: #23742
2022-06-21 18:15:34 +02:00
Alexander Graf
70e74a5997 pstore: Run after modules are loaded
The systemd-pstore service takes pstore files on boot and transfers them
to disk. It only does it once on boot and only if it finds any. The typical
location of the pstore on modern systems is the UEFI variable store.

Most distributions ship with CONFIG_EFI_VARS_PSTORE=m. That means, the
UEFI variable store is only available on boot after the respective module
is loaded.

In most situations, the pstore service gets loaded before the UEFI pstore,
so we don't get to transfer logs. Instead, they accumulate, filling up the
pstore over time, potentially breaking the UEFI variable store.

Let's add a service dependency on any kernel module that can provide a
pstore to ensure we only scan for pstore after we can actually see pstore.

I have seen live occurrences of systems breaking because we did not erase
the pstore entries and ran out of UEFI nvram space.

Fixes https://github.com/systemd/systemd/issues/18540
2022-06-14 10:17:20 +09:00
Benjamin Franzke
92897d768d tree-wide: replace obsolete wiki links with systemd.io/manpages
All wiki pages that contain a deprecation banner
pointing to systemd.io or manpages are updated to
point to their replacements directly.

Helpful command for identification of available links:
git grep freedesktop.org/wiki | \
    sed "s#.*\(https://www.freedesktop.org/wiki[^ $<'\\\")]*\)\(.*\)#\\1#" | \
    sort | uniq
2022-05-21 14:29:14 +02:00
Lennart Poettering
05681510c6 units: remove spurious empty line 2022-05-04 10:17:05 +02:00
Zbigniew Jędrzejewski-Szmek
8f04a1ca2b meson: also allow setting GIT_VERSION via templates
GIT_VERSION is not available as a config.h variable, because it's rendered
into version.h during builds. Let's rework jinja2 rendering to also
parse version.h. No functional change, the new variable is so far unused.

I guess this will make partial rebuilds a bit slower, but it's useful
to be able to use the full version string.
2022-04-05 22:18:31 +02:00
Yu Watanabe
2336bde964 unit: make systemd-udev-trigger.service use --prioritized-subsystem
Replaces #19637 and #22643.
2022-03-22 15:27:06 +09:00
Zbigniew Jędrzejewski-Szmek
c3fb1e43c1 spelling: weekday names are capitalized 2022-03-21 12:16:54 +01:00
Lennart Poettering
4a05d7ed72 unit: add units for new "systemd-sysupdate" tool
These units (if enabled) will try to update the OS in regular intervals.
Moreover, every day in the early morning this will attempt to reboot the
system if there's a newer version installed than running.
2022-03-19 00:13:55 +01:00
Yu Watanabe
a1f4fd3876 udev: run the main process, workers, and spawned commands in /udev subcgroup
And enable cgroup delegation for udevd.
Then, processes invoked through ExecReload= are assigned .control
subcgroup, and they are not killed by cg_kill().

Fixes #16867 and #22686.
2022-03-17 20:24:38 +09:00
Vivien Didelot
7080df5c2e units: fix factory-reset.target description
The current description for the factory reset target does not add any
value and doesn't respect the definition of the related property as
described in systemd.unit(5).

Starting the target currently results in the following log:

    [   11.139174] systemd[1]: Reached target Target that triggers factory reset. Does nothing by default..
    [  OK  ] Reached target Target that…set. Does nothing by default..

Simply update the target description to "Factory Reset".

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
2022-03-14 22:39:32 +00:00
Lennart Poettering
047c2c14c5 units: drop After=systemd-resolved.service from systemd-nspawn@.service
resolved is now started as part of early boot hence we need no explicit
ordering anymore.
2022-02-24 10:37:11 +01:00
Lennart Poettering
29a8fbf49a units: move resolved to sysinit.target (from basic.target)
79a67f3ca4 pulled systemd-resolved.service
in from basic.target instead of multi-user.target, i.e. the idea is to
make it an early boot service, instead of a regular service.

However, early boot services are supposed to be in sysinit.target, not
basic.target (the latter is just one that combines the early boot
services in sysinit.target, the sockets in sockets.target, the mounts in
local-fs.target and so on into one big target).

Also, the commit actually didn't add a synchronization point, i.e. no
Before=, so that the whole thing was racy.

Let's fix all that.

Follow-up for 79a67f3ca4
2022-02-24 10:36:47 +01:00
Yu Watanabe
6e4d122ad1 unit: escape %
Fixes #22601.
2022-02-23 06:54:54 +09:00
Lennart Poettering
b547838000 units: drop After=systemd-networkd.service from systemd-resolved.service
This ordering existed since resolved was first created, but there should
not be any need to order the two services against each other, as
resolved should be able to pick up networkd DNS metadata either way (as
it works with inotify in /run).

Let's drop this hence, and not cargo-cult this to eternity.

Also see: https://github.com/systemd/systemd/pull/22389#issuecomment-1045978403
2022-02-23 06:52:39 +09:00