IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
The user manager connects to oomd over varlink. Currently, during
shutdown, if oomd is stopped before any user manager, the user
manager will try to reconnect to the socket, leading to a warning
from pid 1 about a conflicting transaction.
Let's fix this by ordering user@.service after systemd-oomd.service,
so that user sessions are stopped before systemd-oomd is stopped,
which makes sure that the user sessions won't try to start oomd via
its socket after systemd-oomd is stopped.
Let's color output when we're forwarding to the console. To make this
work, we inherit TERM from pid 1 and use it to decide whether we should
output colors or not.
This drops all mentions of gnu-efi and its manual build machinery. A
future commit will bring bootloader builds back. A new bootloader meson
option is now used to control whether to build sd-boot and its userspace
tooling.
Let's make things systematic: the per-user and the per-system manager
should manage their own memory pressure, as they are, well, managers of
things.
This is particularly relevant and the per-user service manager should
watch its own "init.scope" subcgroup, instead of the main service unit
cgroup, and hence $MEMORY_PRESSURE_WATCH as set by the per-system
service manager would simply be wrong.
These units are also present in the initrd, so instead of an assert,
just use a condition so they are skipped where they need to be skipped.
Fixes https://github.com/systemd/systemd/issues/26358
Config options are -Ddefault-timeout-sec= and -Ddefault-user-timeout-sec=.
Existing -Dupdate-helper-user-timeout= is renamed to -Dupdate-helper-user-timeout-sec=
for consistency. All three options take an integer value in seconds. The
renaming and type-change of the option is a small compat break, but it's just
at compile time and result in a clear error message. I also doubt that anyone was
actually using the option.
This commit separates the user manager timeouts, but keeps them unchanged at 90 s.
The timeout for the user manager is set to 4/3*user-timeout, which means that it
is still 120 s.
Fedora wants to experiment with lower timeouts, but doing this via a patch would
be annoying and more work than necessary. Let's make this easy to configure.
since we don't have systemd-pcrphase built anyway, which breaks the tests:
...
I: Attempting to install /usr/lib/systemd/systemd-networkd-wait-online (based on unit file reference)
I: Attempting to install /usr/lib/systemd/systemd-network-generator (based on unit file reference)
I: Attempting to install /usr/lib/systemd/systemd-oomd (based on unit file reference)
I: Attempting to install /usr/lib/systemd/systemd-pcrphase (based on unit file reference)
W: Failed to install '/usr/lib/systemd/systemd-pcrphase'
make: *** [Makefile:4: setup] Error 1
make: Leaving directory '/root/systemd/test/TEST-01-BASIC'
Follow-up to 04959faa63.
The systemd-growfs@.service units are currently written in full for each
file system to grow. Which is kinda pointless given that (besides an
optional ordering dep) they contain always the same definition. Let's
fix that and add a static template for this logic, that the generator
simply instantiates (and adds an ordering dep for).
This mimics how systemd-fsck@.service is handled. Similar to the wait
that for root fs there's a special instance systemd-fsck-root.service
we also add a special instance systemd-growfs-root.service for the root
fs, since it has slightly different deps.
Fixes: #20788
See: #10014
We want PCR 15 to be useful for binding per-system policy to. Let's
measure the machine ID into it, to ensure that every OS we can
distinguish will get a different PCR (even if the root disk encryption
key is already measured into it).
Before this patch the only way to prevent journald from reading the audit
messages was to mask systemd-journald-audit.socket. However this had main
drawback that downstream couldn't ship the socket disabled by default (beside
the fact that masking units is not supposed to be the usual way to disable
them).
Fixes#15777
We are basically already there, just need to add MONOTONIC_USEC= to the
RELOADING=1 message, and make sure the message is generated in really
all cases.
And send READY=1 again when we are done with it.
We do this not only for "daemon-reload" but also for "daemon-reexec" and
"switch-root", since from the perspective of an encapsulating service
manager these three operations are not that different.
This adds the same condition that systemd-networkd.service already
carries also to systemd-networkd-wait-online.service. Otherwise we'll
potentially see some logs we'd rather not see about a service we BindTo=
not running. Or in other words, if service X binds to Y then X should be
at least as conditioned as Y.
Note that this drops ProtectProc=invisible from
systemd-resolved.service.
This is done because othewise access to the booted "kernel" command line is not
necessarily available. That's because in containers we want to read
/proc/1/cmdline for that.
Fixes: #24103
This renames systemd-boot-system-token.service to
systemd-boot-random-seed.service and conditions it less strictly.
Previously, the job of the service was to write a "system token" EFI
variable if it was missing. It called "bootctl --graceful random-seed"
for that. With this change we condition it more liberally: instead of
calling it only when the "system token" EFI variable isn't set, we call
it whenever a boot loader interface compatible boot loader is used. This
means, previously it was invoked on the first boot only: now it is
invoked at every boot.
This doesn#t change the command that is invoked. That's because
previously already the "bootctl --graceful random-seed" did two things:
set the system token if not set yet *and* refresh the random seed in the
ESP. Previousy we put the focus on the former, now we shift the focus to
the latter.
With this simple change we can replace the logic
f913c784ad added, but from a service that
can run much later and doesn't keep the ESP pinned.
We want to make use of that when formatting file systems, hence let's
pull in these modules explicitly.
(This is necessary because we are an early boot service that might run
before systemd-tmpfiles-dev.service, which creates /dev/loop-control and
/dev/mapper/control.)
Alternatively we could just order ourselves after
systemd-tmpfiles-dev.service, but I think there's value in adding an
explicit minimal ordering here, since we know what we'll need.
Fixes: #25775
If everything points to the fact that TPM2 should work, but then the
driver fails to initialize we should handle this gracefully and not
cause failing services all over the place.
Fixes: #25700
We don't want systemd-networkd-wait-online to start if systemd-networkd
is skipped due to condition failures. This is only guaranteed by BindsTo=
and not Requires=, so let's use BindsTo=
sd-stub has an opportunity to handle the seed the same way sd-boot does,
which would have benefits for UKIs when sd-boot is not in use. This
commit wires that up.
It refactors the XBOOTLDR partition discovery to also find the ESP
partition, so that it access the random seed there.
Removing the virtualization check might not be the worst thing in the
world, and would potentially get many, many more systems properly seeded
rather than not seeded. There are a few reasons to consider this:
- In most QEMU setups and most guides on how to setup QEMU, a separate
pflash file is used for nvram variables, and this generally isn't
copied around.
- We're now hashing in a timestamp, which should provide some level of
differentiation, given that EFI_TIME has a nanoseconds field.
- The kernel itself will additionally hash in: a high resolution time
stamp, a cycle counter, RDRAND output, the VMGENID uniquely
identifying the virtual machine, any other seeds from the hypervisor
(like from FDT or setup_data).
- During early boot, the RNG is reseeded quite frequently to account for
the importance of early differentiation.
So maybe the mitigating factors make the actual feared problem
significantly less likely and therefore the pros of having file-based
seeding might outweigh the cons of weird misconfigured setups having a
hypothetical problem on first boot.
Rather than passing seeds up to userspace via EFI variables, pass seeds
directly to the kernel's EFI stub loader, via LINUX_EFI_RANDOM_SEED_TABLE_GUID.
EFI variables can potentially leak and suffer from forward secrecy
issues, and processing these with userspace means that they are
initialized much too late in boot to be useful. In contrast,
LINUX_EFI_RANDOM_SEED_TABLE_GUID uses EFI configuration tables, and so
is hidden from userspace entirely, and is parsed extremely early on by
the kernel, so that every single call to get_random_bytes() by the
kernel is seeded.
In order to do this properly, we use a bit more robust hashing scheme,
and make sure that each input is properly memzeroed out after use. The
scheme is:
key = HASH(LABEL || sizeof(input1) || input1 || ... || sizeof(inputN) || inputN)
new_disk_seed = HASH(key || 0)
seed_for_linux = HASH(key || 1)
The various inputs are:
- LINUX_EFI_RANDOM_SEED_TABLE_GUID from prior bootloaders
- 256 bits of seed from EFI's RNG
- The (immutable) system token, from its EFI variable
- The prior on-disk seed
- The UEFI monotonic counter
- A timestamp
This also adjusts the secure boot semantics, so that the operation is
only aborted if it's not possible to get random bytes from EFI's RNG or
a prior boot stage. With the proper hashing scheme, this should make
boot seeds safe even on secure boot.
There is currently a bug in Linux's EFI stub in which if the EFI stub
manages to generate random bytes on its own using EFI's RNG, it will
ignore what the bootloader passes. That's annoying, but it means that
either way, via systemd-boot or via EFI stub's mechanism, the RNG *does*
get initialized in a good safe way. And this bug is now fixed in the
efi.git tree, and will hopefully be backported to older kernels.
As the kernel recommends, the resultant seeds are 256 bits and are
allocated using pool memory of type EfiACPIReclaimMemory, so that it
gets freed at the right moment in boot.
As in most cases, tty device without input devices is meaningless.
This also swaps the priority of tty and net:
- input devices are often connected under USB bus, hence may take
slightly much time to be initialized. As, described in the above,
in most cases it is allowed that tty devices are initialized just
before input devices,
- network configuration usually requires much time, e.g. DHCP or RA,
hence it is better that network interfaces initialized. Then,
network services can start DHCP client or friends earlier.
Fixes#24026.
This adds two more phases to the PCR boot phase logic: "sysinit" +
"final".
The "sysinit" one is placed between sysinit.target and basic.target.
It's good to have a milestone in this place, since this is after all
file systems/LUKS volumes are in place (which sooner or later should
result in measurements of their own) and before services are started
(where we should be able to rely on them to be complete).
This is particularly useful to make certain secrets available for
mounting secondary file systems, but making them unavailable later.
This breaks API in a way (as measurements during runtime will change),
but given that the pcrphase stuff wasn't realeased yet should be OK.
With this, I can now easily do:
systemd-nspawn --load-credential=ssh.authorized_keys.root:/home/lennart/.ssh/authorized_keys --image=… --boot
To boot into an image with my SSH key copied in. Yay!
This partially reverts cabc1c6d7a.
The setting ProtectClock= implies DeviceAllow=, which is not suitable
for udevd. Although we are slowly removing cgropsv1 support, but
DeviceAllow= with cgroupsv1 is necessarily racy, and reloading PID1
during the early boot process may cause issues like #24668.
Let's disable ProtectClock= for udevd. And, if necessary, let's
explicitly drop CAP_SYS_TIME and CAP_WAKE_ALARM (and possibly others)
by using CapabilityBoundingSet= later.
Fixes#24668.
Normally we queue initrd-switch-root.target/isolate, which pulls in the
service via Wants= in the .target unit file. But if the service is instead
started directly, there may be nothing pulling in the target. Let's make
sure that the reference exists.
If we want to stop those services which would compete for access to
the console, we need to have an ordering so that they are actually
stopped before the other things starts, not asynchronously.
For shutdown, we queue shutdown.target/start, so in every unit which should be
stopped *before* shutdown, we need both Conflicts and an ordering dependency
with shutdown.target (either Before= or After= would work, because stop jobs
are always ordered before start jobs).
For initrd transition, we queue initrd-switch-root.service/isolate. This
automatically creates a /stop job for every running unit without
IgnoreOnIsolate. But no ordering dependency is created, unless the unit has a
(possibly transitive) ordering dependency on initrd-switch-root.service.
Since most units must stop before the transition, we should add the ordering
dependency. It is nicer to use Before=initrd-switch-root.target for this.
initrd-switch-root.target is ordered before initrd-switch-root.service, so
the effect it the same when both are in a transaction.
Fixes#23745.
To also cover the case where somebody is emergency mode in the initrd and
queues initrd-switch-root.service/start (not isolate), also add
Conflicts=initrd-switch-root.target, so various units are stopped properly.
This extends 2525682565 to cover all the other
services that are touched. It could be consider "operator error", but it's
easy to make and it's nicer if we can make this more foolproof.
The block is reordered and split to have:
1. description + documentation
2. (optionally) conditions
3. all the dependencies
I think it's easier to read the units this way.
Also, the Conflicts+Before is seperated out to separate lines.
The ordering dependency is "fake", because it could just as well be
After=, we are adding it to force ordering wrt. shutdown.target, and
it plays a different role than the other Before=, which are about a
real ordering on boot.
Commit 70e74a5997 ("pstore: Run after modules are loaded") added After=
and Wants= entries for all known kernel modules providing a pstore.
While adding these dependencies on systems where one of the modules is
not present, or not configured, should not have a real affect on the
system, it can produce annoying error messages in the kernel log. E.g.
"mtd device must be supplied (device name is empty)" when the mtdpstore
module is not configured correctly.
Since dependencies cannot be removed with drop-ins, if a distro wants to
remove some of these modules from systemd-pstore.service, they need to
patch units/systemd-pstore.service.in. On the other hand, if they want
to append to the dependencies this can be done by shipping a drop-in.
Since the original intent of the previous commit was to fix [1], which
only requires the efi_pstore module, remove all other kernel module
dependencies from systemd-pstore.service, and let distros ship drop-ins
to add dependencies if needed.
[1] https://github.com/systemd/systemd/issues/18540
Quoting https://github.com/systemd/systemd/pull/24054#issuecomment-1210501631:
> this would need a patch in dracut, specifically adding the
> systemd-sysroot-fstab-check to the list of installed stuff:
> fe8fa2b0ca/modules.d/00systemd/module-setup.sh (L47).
>
> I could do this manually in the CI (and I guess I'd have to do it anyway even
> if the patch lands in upstream, since it won't be available in C8S), but it
> should get there first before merging this PR, otherwise it's going to break
> Rawhide.
Let's remove the baud settings for the container getty units since
they don't have any effect there anyway. On top of that, when we're
dealing with container TTYs, we can handle all the setup involved
ourselves so let's prevent agetty/login from touching the container
tty at all.
One example where this helps is that it actually makes disabling
TTYVHangup have an effect since before, login would unconditionally
call vhangup() on the tty.
This makes use of the option switch that was added in the previous commit.
We used a pretty big hammer on a relatively small nail: we would do daemon-reload
and (in principle) allow any configuration to be changed. But in fact we only
made use of this in systemd-fstab-generator. systemd-fstab-generator filters
out all mountpoints except /usr and those marked with x-initrd.mount, i.e. on
a big majority of systems it wouldn't do anything.
Also, since systemd-fstab-generator first parses /proc/cmdline, and then
initrd's /etc/fstab, and only then /sysroot/etc/fstab, configuration in the
host would only matter if it the same mountpoint wasn't configured "earlier".
So the config in the host could be used for new mountpoints, but it couldn't
be used to amend configuration for existing mountpoints. And we wouldn't actually
remount anything, so mountpoints that were already mounted wouldn't be affected,
even if did change some config.
In the new scheme, we will parse /sysroot/etc/fstab and explicitly start
sysroot-usr.mount and other units that we just wrote. In most cases (as written
above), this will actually result in no units being created or started.
If the generator is invoked on a system with /sysroot/etc/fstab present,
behaviour is not changed and we'll create units as before. This is needed so
that if daemon-reload is later at some points, we don't "lose" those units.
There's a minor bugfix here: we honour x-initrd.mount for swaps, but we
wouldn't restart swap.target, i.e. the new swaps wouldn't necessarilly be
pulled in immediately.
If for any reason something goes wrong during the boot process (most likely due
to a network issue), system admins should be allowed to log in to the system to
debug the problem. However due to the login session barrier enforced by
systemd-user-sessions.service for all users, logins for root will be delayed
until a (dbus) timeout expires. Beside being confusing, it's not a nice user
experience to wait for an indefinite period of time (no message is shown) this
and also suggests that something went wrong in the background.
The reason of this delay is due to the fact that all units involved in the
creation of a user session are ordered after systemd-user-sessions.service,
which is subject to network issues. If root needs to log in at that time,
logind is requested to create a new session (via pam_systemd), which ultimately
ends up waiting for systemd-user-session.service to be activated. This has the
bad side effect to block login for root until the dbus call done by pam_systemd
times out and the PAM stack proceeds anyways.
To solve this problem, this patch orders the session scope units and the user
instances only after systemd-user-sessions.service for unprivileged users only.
So far we didn't enable the cpu controller because of overhead of the
accounting. If I'm reading things correctly, delegation was enabled for a while
for the units with user and pam context set, i.e. for user@.service too.
a931ad47a8 added the explicit Delegate=yes|no
switch, but it was initially set to 'yes'.
acc8059129 disabled delegation for user@.service
with the justication that CPU accounting is expensive, but half a year later
a88c5b8ac4 changed DefaultCPUAccounting=yes for
kernels >=4.15 with the justification that CPU accounting is inexpensive there.
In my (very noncomprehensive) testing, I don't see a measurable overhead if the
cpu controller is enabled for user slices. I tried some repeated compilations,
and there is was no statistical difference, but the noise level was fairly
high. Maybe better benchmarking would reveal a difference.
The goal of this change is very simple: currently all of the user session,
including services like the display server and pipewire are under user@.service.
This means that when e.g. a compilation job is started in the session's
app.slice, the processes in session.slice compete for CPU and can be starved.
In particular, audio starts to stutter, etc. With CPU controller enabled,
I can start start 'ninja -C build -j40' in a tab and this doesn't have any
noticable effect on audio.
I don't think the particular values matter too much: the CPU controller is
work-convserving, and presumably the session slice would never need more than
e.g. one 1 full CPU, i.e. half or a quarter of available CPU resources on even
the smallest of today's machines. app.slice and session.slice are assigned
equal weights, background.slice is assigned a smaller fraction. CPUWeight=100
is the default, but I wrote it explicitly to make it easier for users to see
how the split is done. So effectively this should result in session.slice
getting as much power as it needs.
If if turns out that this does have a noticable overhead, we could make it
opt-in. But I think that the benefit to usability is important enough to enable
it by default. W/o something like this the session is not really usable with
background tasks.
We already had it on the socket units, so it's possible that
systemd-journald.service would be stopped and then restarted when trafic hits
the sockets when something logs. Let's not try to stop it. It is supposed to
run until the end and be eventually killed in the final killing spree.
This might (or not) help with #23287.
They are various cases where the same module might be repeatedly
loaded in a short time frame, for example if a service depending on a
module keep restarting, or if many instances of such service get
started at the same time. If this happend the modprobe@.service
instance will be marked as failed because it hit the restart limit.
Overall it doesn't seems to make much sense to have a restart limit on
the modprobe service so just disable it.
Fixes: #23742
The systemd-pstore service takes pstore files on boot and transfers them
to disk. It only does it once on boot and only if it finds any. The typical
location of the pstore on modern systems is the UEFI variable store.
Most distributions ship with CONFIG_EFI_VARS_PSTORE=m. That means, the
UEFI variable store is only available on boot after the respective module
is loaded.
In most situations, the pstore service gets loaded before the UEFI pstore,
so we don't get to transfer logs. Instead, they accumulate, filling up the
pstore over time, potentially breaking the UEFI variable store.
Let's add a service dependency on any kernel module that can provide a
pstore to ensure we only scan for pstate after we can actually see pstate.
I have seen live occurences of systems breaking because we did not erase
the pstates and ran out of UEFI nvram space.
Fixes https://github.com/systemd/systemd/issues/18540
All wiki pages that contain a deprecation banner
pointing to systemd.io or manpages are updated to
point to their replacements directly.
Helpful command for identification of available links:
git grep freedesktop.org/wiki | \
sed "s#.*\(https://www.freedesktop.org/wiki[^ $<'\\\")]*\)\(.*\)#\\1#" | \
sort | uniq
GIT_VERSION is not available as a config.h variable, because it's rendered
into version.h during builds. Let's rework jinja2 rendering to also
parse version.h. No functional change, the new variable is so far unused.
I guess this will make partial rebuilds a bit slower, but it's useful
to be able to use the full version string.
These unit (if enabled) will try to update the OS in regular intervals.
Moreover, every day in the early morning this will attempt to reboot the
system if there's a newer version installed than running.
And enable cgroup delegation for udevd.
Then, processes invoked through ExecReload= are assigned .control
subcgroup, and they are not killed by cg_kill().
Fixes#16867 and #22686.
The current description for the factory reset target does not add any
value and doesn't respect the definition of the related property as
described in systemd.unit(5).
Starting the target currently results in the following log:
[ 11.139174] systemd[1]: Reached target Target that triggers factory reset. Does nothing by default..
[ OK ] Reached target Target that…set. Does nothing by default..
Simply update the target description to "Factory Reset".
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
79a67f3ca4 pulled systemd-resolved.service
in from basic.target instead of multi-user.target, i.e. the idea is to
make it an early boot service, instead of a regular service.
However, early boot services are supposed to be in sysinit.target, not
basic.target (the latter is just one that combines the early boot
services in sysinit.target, the sockets in sockets.targt, the mounts in
local-fs.target and so on into one big target).
Also, the comit actually didn't add a synchronization point, i.e. not
Before=, so that the whole thing was racy.
Let's fix all that.
Follow-up for 79a67f3ca4
This ordering existed since resolved was first created, but there should
not be any need to order the two services against each other, as
resolved should be able to pick up networkd DNS metadata either way (as
it works with inotify in /run).
Let's drop this hence, and not cargo-cult this to eternity
Also see: https://github.com/systemd/systemd/pull/22389#issuecomment-1045978403
This is a follow-up for d5ee050ffc, and
reintroduces a requirement dep from systemd-journal-flush.service onto
systemd-journald.service, but a weaker one than originally: a Wants= one
instead of a Requires= one.
Why? Simply because the service issues an IPC call to the journald,
hence it should pull it in. (Note that socket activation doesn't happen
for the Varlink socket it uses, hence we should pull in the service
itself.)
The systemd-oomd.service unit contains
[Install]
WantedBy=multi-user.target
Alias=dbus-org.freedesktop.oom1.service
which means the symlink is supposed to be created dynamically when the
service is enabled.
In the olden days systemd-resolved used dbus and it didn't make sense to start
it before dbus which is started fairly late. But we have mostly ported resolved
over to varlink. The queries from nss-resolve are done using varlink, so name
resolution can work without dbus. resolvectl still uses dbus, so e.g. 'resolvectl
query' will not work, but by starting systemd-resolved earlier we're not making this
any worse.
If systemd-resolved is started after dbus, it registers the name and everything
is fine. If it is started before dbus, it'll watch for the dbus socket and
connect later. So it should be fine to start systemd-resolved earlier. (If dbus
is stopped and restarted, unfortunately systemd-resolved does not reconnect.
This seems to be a small bug: since our daemons know how to watch for
dbus.socket, they could restart the watch if they ever lose the connection. But
this scenario shouldn't happen in normal boot, and restarting dbus is not
supported anyway.)
Moving the start earlier the following advantages:
- name resolution becomes availabe earlier, in particular for synthesized
hostnames even before the network is up.
- basic.target is part of initrd.target, so systemd-resolved will get started
in the initrd if installed. This is required for nfs-root when the server is
specified using a name (https://bugzilla.redhat.com/show_bug.cgi?id=2037311).
Otherwise, systemd-homed-active.service will fail to deactivate all
homes because homectl can no longer talk to homed if dbus stops first.
As a result, /home cannot be umounted.
Doing this on systemd-homed-active.service instead works as well, but
systemd-homed will exit 1 if dbus is already shut down.
It is used by udevd and networkd. Since udevd is enabled statically, let's also
change the preset to "on". networkd is opt-in, so let's pull in the generator
when enabling networkd too.
Fixes#21626. (The bug report talks about /run, but the issue is actually with
/tmp.) People use /tmp for various things that fit in memory, e.g. unpacking
packages, and 400k is not much. Let's raise is a bit.
Programs run by udev triggers may need to execute the bpf() syscall. Even more
so, since on a cgroup v2 system, the only way to set up device access filtering
is to install a BPF program on the cgroup in question and one way of passing
data to such program is through BPF maps, which can only be access using the
bpf() syscall. One such use case was identified in RHBZ#2025264 related to
snap-device-helper, and led to RHBZ#2027627 being filed.
Unfortunately there is no finer grained control over what gets passed in the
syscall, so just enable bpf() and leave fine grained mediation to other
security layers (eg. SELinux).
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2027627
Signed-off-by: Maciek Borzecki <maciek.borzecki@gmail.com>
Due to the fact that systemd-journal-flush.service has
"Requires=systemd-journald.service", this service is stopped too when journald
is requested to do so.
However stopping systemd-journal-flush.service implies that journald
relinquishes /var hence implicitly switching back to the volatile storage
mode and removing /run/systemd/journal/flushed.
If journald is started afterwards, it will run in volatile storage mode
regardless of the value of 'Storage=' as it believes now that /var is not yet
ready (because the flushed flag is missing).
Because this flag is mainly an indication for journald that the initialization
of /var/log/journal (during the boot process) has been done,
systemd-journal-flush.service shouldn't be tied to the state of journald itself
but to the state of /var/log/journal, hence to the state of the system.
Parsing objects is risky as data could be malformed or malicious,
so avoid doing that from the main systemd-coredump process and
instead fork another process, and set it to avoid generating
core files itself.
Users may use rules that refer to binaries e.g. in /opt or /usr/local,
and those directories may be separate mount points. We don't need the
binfmt rules in early boot, so let's delay the service so that we can
rely on the full local filesystem being visible.
Fixes#21178.
When using "capture : true" in custom_target()s the mode of the source
file is not preserved when the generated file is not installed and so
needs to be tweaked manually. Switch from output capture to creating the
target file and copy the permissions from the input file.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
If the tty arg is set to "-", agetty uses the stdin fd as the tty.
Let's pass the tty this way so that we keep an fd open to the tty
at all times. If all fd's to a tty are closed, the kernel might
reset the tty which we want to avoid.
This adds support for dm integrity targets and an associated
/etc/integritytab file which is required as the dm integrity device
super block doesn't include all of the required metadata to bring up
the device correctly. See integritytab man page for details.
Let's make it slightly more likely that a per-user service manager is
killed than any system service. We use a conservative 100 (from a range
that goes all the way to 1000).
Replaces: #17426
Together with the previous commit this means: system manager and system
services are placed at OOM score adjustment 0 (specifically: they
inherit kernel default of 0). User service manager (both for root and
non-root) are placed at 100. User services for non-root are placed at
200, those for root inherit 100.
Note that processes forked off the user *sessions* (i.e. not forked off
the per-user service manager) remain at 0 (e.g. the shell process
created by a tty or ssh login). This probably should be
addressed too one day (maybe in pam_systemd?), but is not covered here.
Compared to PID1 where systemd-oomd has to be the client to PID1
because PID1 is a more privileged process than systemd-oomd, systemd-oomd
is the more privileged process compared to a user manager so we have
user managers be the client whereas systemd-oomd is now the server.
The same varlink protocol is used between user managers and systemd-oomd
to deliver ManagedOOM property updates. systemd-oomd now sets up a varlink
server that user managers connect to to send ManagedOOM property updates.
We also add extra validation to make sure that non-root senders don't
send updates for cgroups they don't own.
The integration test was extended to repeat the chill/bloat test using
a user manager instead of PID1.
In 2020 mount.cifs started to require a bunch for caps to work. let's
add them to the capability bounding set.
Also, SMB support obviously needs network access, hence open that up.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1962920
Normally, these services are killed because we run isolate. But I booted into
emergency mode (because of a futher bug with us timing out improperly on the
luks password prompt), and then continuted to the host system by running
'systemctl start systemd-switch-root.service'. My error, but the results are
confusing and bad: systemd in the host sees 'systemd-tmpfiles-setup.service'
as started successfully, and doesn't restart it, so the setup for /tmp/.X11 is
not done and gdm.service fails. So while we wouldn't encounter this during
normal successful boot, I think it's good to make this more robust.
The dep is added to systemd-tmpfiles-{setup,clean}, because /tmp is not
propagated over switch-root. /dev is, so I didn't touch
systemd-tmpfiles-setup-dev.service.
Boot loaders are software like any other, and hence muse be updated in
regular intervals. Let's add a simple (optional) service that updates
sd-boot automatically from the host if it is found installed but
out-of-date in the ESP.
Note that traditional distros probably should invoke "bootctl update"
directly from the package scripts whenver they update the sd-boot
package. This new service is primarily intended for image-based update
systems, i.e. where the rootfs or /usr are atomically updated in A/B
style and where the current boot loader should be synced into the ESP
from the currently booted image every now and then. It can also act as
safety net if the packaging scripts in classic systems are't doing the
bootctl update stuff themselves.
Since updating boot loaders mit be a tiny bit risky (even though we try
really hard to make them robust, by fsck'ing the ESP and mounting it only on
demand, by doing updates mostly as single file updates and by fsync()ing
heavily) this is an optional feature, i.e. subject to "systemctl
enable". However, since it's the right thing to do I think, it's enabled
by default via the preset logic.
Note that the updating logic is implemented gracefully: i.e. it's a NOP
if the boot loader is already new enough, or was never installed.
"Update about" is not gramatically correct. I also think saying "Record" makes
this easier to understand for people who don't necessarilly know what UTMP is.
In general, it's not very usuful to repeat the unit name as the description.
Especially when the word is a common name and if somebody doesn't understand
the meaning immediately, they are not going to gain anything from the
repeat either, e.g. "halt", "swap".
In the status-unit-format=combined output parentheses are used around
Description, so avoid using parenthesis in the Description itself.
Since d8f9686c0f we use the chattr +i flag
for marking containers in directories as reead-only. But to do so we
need the cap for it, hence grant it.
Fixes: #19115
I'm working on building initramfs images directly from normal packages, and it
doesn't make sense for those units to be started. Pristine system rpms need to
behave correctly as much as possible also in the initrd, and those units are
enabled by the rpms. There usually isn't enough time for the timer to actually
fire, but starting it gives a line on the console and generally looks confusing
and sloppy. Flushing the journal means that its actually lost, since the real
/var is not available yet.
Another approach would be not enable those units, but right now they are
statically enabled, and changing that would be more work, and doesn't really
seem necessary, since the condition checks are very quick.
Checking for /etc/initrd-release is the standard condition that the initrd
units use, so let's do the same here.
The comment talks about upstream development steps and doesn't make
sense for users. We used special '## ' syntax to strip it out during
build, but it got inadvertently reformatted as a normal comment
in 3982becc92.
We don't need two (and half) templating systems anymore, yay!
I'm keeping the changes minimal, to make the diff manageable. Some enhancements
due to a better templating system might be possible in the future.
For handling of '## ' — see the next commit.
Old meson fails with:
Element not a string: [<Holder: <ExternalProgram 'sh' -> ['/bin/sh']>>, '-c', 'test -n "$DESTDIR" || /bin/journalctl --update-catalog']
I'm doing it as a revert so that it's easy to undo the revert when we require
newer meson. The effect is not so bad, maybe a dozen or so lines about finding
'sh'.
Meson 0.58 has gotten quite bad with emitting a message every time
a quoted command is used:
Program /home/zbyszek/src/systemd-work/tools/meson-make-symlink.sh found: YES (/home/zbyszek/src/systemd-work/tools/meson-make-symlink.sh)
Program sh found: YES (/usr/bin/sh)
Program sh found: YES (/usr/bin/sh)
Program sh found: YES (/usr/bin/sh)
Program sh found: YES (/usr/bin/sh)
Program sh found: YES (/usr/bin/sh)
Program sh found: YES (/usr/bin/sh)
Program xsltproc found: YES (/usr/bin/xsltproc)
Configuring custom-entities.ent using configuration
Message: Skipping bootctl.1 because ENABLE_EFI is false
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Message: Skipping journal-remote.conf.5 because HAVE_MICROHTTPD is false
Message: Skipping journal-upload.conf.5 because HAVE_MICROHTTPD is false
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Message: Skipping loader.conf.5 because ENABLE_EFI is false
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
Program ln found: YES (/usr/bin/ln)
...
Let's suffer one message only for each command. Hopefully we can silence
even this when https://github.com/mesonbuild/meson/issues/8642 is
resolved.
This reverts commit 7c20dd4b6e.
Debian has now been updated to patch the issue, so SemaphoreCI should
no longer fail. The fix has also been backported to the affected
stable branches.
Otherwise a coredump started at the inconvinient moment can stop
shutdown.target leaving the system in a halfway-down state:
Pulling in shutdown.target/start from systemd-poweroff.service/start
Added job shutdown.target/start to transaction.
...
Keeping job shutdown.target/start because of systemd-poweroff.service/start
...
[ OK ] Stopped target Remote File Systems.
shutdown.target: starting held back, waiting for: systemd-networkd.socket
sysinit.target: stopping held back, waiting for: remount_tmp.service
systemd-coredump.socket: Incoming traffic
...
systemd-coredump@0-243-0.service: Trying to enqueue job systemd-coredump@0-243-0.service/start/replace
Added job systemd-coredump@0-243-0.service/start to transaction.
Pulling in systemd-journald.socket/start from systemd-coredump@0-243-0.service/start
Added job systemd-journald.socket/start to transaction.
Pulling in system.slice/start from systemd-journald.socket/start
Added job system.slice/start to transaction.
Pulling in -.slice/start from system.slice/start
Added job -.slice/start to transaction.
Pulling in system-systemd\x2dcoredump.slice/start from systemd-coredump@0-243-0.service/start
Added job system-systemd\x2dcoredump.slice/start to transaction.
Pulling in system.slice/start from system-systemd\x2dcoredump.slice/start
Pulling in shutdown.target/stop from system-systemd\x2dcoredump.slice/start
Added job shutdown.target/stop to transaction.
...
Keeping job systemd-poweroff.service/stop because of umount.target/stop
Keeping job shutdown.target/stop because of systemd-coredump@0-243-0.service/start
This changes the fstab-generator to handle mounting of /usr/ a bit
differently than before. Instead of immediately mounting the fs to
/sysroot/usr/ we'll first mount it to /sysusr/usr/ and then add a
separate bind mount that mounts it from /sysusr/usr/ to /sysroot/usr/.
This way we can access /usr independently of the root fs, without for
waiting to be mounted via the /sysusr/ hierarchy. This is useful for
invoking systemd-repart while a root fs doesn't exist yet and for
creating it, with partition data read from the /usr/ hierarchy.
This introduces a new generic target initrd-usr-fs.target that may be
used to generically order services against /sysusr/ to become available.
systemd-networkd.socket can re-start systemd-networkd.service in
shutdown and by doing this even stop shutdown.target leaving the
system in halfway-down state.
Fixes#4955.
Single-param LoadCredential= in units causes systemd v247/v248 to
assert when parsing. Disable it for now, until the fix is merged
in the stable trees, released and available (eg: in Debian
for the CI)
See: https://github.com/systemd/systemd/issues/19178
With 8f20232fcb systemd-localed supports
generating locales when required. This fails if the locale directory is
read-only, so make it writable.
Closes#19138
Let's make use of our own credentials infrastructure in our tools: let's
hook up systemd-sysusers with the credentials logic, so that the root
password can be provisioned this way. This is really useful when working
with stateless systems, in particular nspawn's "--volatile=yes" switch,
as this works now:
# systemd-nspawn -i foo.raw --volatile=yes --set-credential=passwd.plaintext-password:foo
For the first time we have a nice, non-interactive way to provision the
root password for a fully stateless system from the container manager.
Yay!
We have a chicken and egg problem: validation of DNSSEC signatures
doesn't work without a correct clock, but to set the correct clock we
need to contact NTP servers which requires resolving a hostname, which
would normally require DNSSEC validation.
Let's break the cycle by excluding NTP hostname resolution from
validation for now.
Of course, this leaves NTP traffic unprotected. To cover that we need
NTPSEC support, which we can add later.
Fixes: #5873#15607
Even though many of those scripts are very simple, it is easier to include
the header than to try to say whether each of those files is trivial enough
not to require one.
We'll leave this as opt-in (i.e. a unit that must be enabled
explicitly), since this is supposed to be a debug/developer feature
primarily, and thus no be around in regular production systems.
This adds the support for veritytab.
The veritytab file contains at most five fields, the first four are
mandatory, the last one is optional:
- The first field contains the name of the resulting verity volume; its
block device is set up /dev/mapper/</filename>.
- The second field contains a path to the underlying block data device,
or a specification of a block device via UUID= followed by the UUID.
- The third field contains a path to the underlying block hash device,
or a specification of a block device via UUID= followed by the UUID.
- The fourth field is the roothash in hexadecimal.
- The fifth field, if present, is a comma-delimited list of options.
The following options are recognized only: ignore-corruption,
restart-on-corruption, panic-on-corruption, ignore-zero-blocks,
check-at-most-once and root-hash-signature. The others options will
be implemented later.
Also, this adds support for the new kernel verity command line boolean
option "veritytab" which enables the read for veritytab, and the new
environment variable SYSTEMD_VERITYTAB which sets the path to the file
veritytab to read.
Instead of invoking meson-add-wants.sh once for each wants that has
to be added, we pass all wants to a single invocation of
meson-add-wants.sh and in meson-add-wants.sh, loop over the
arguments.
This saves about 300ms on the install step.
Before:
```
‣ Running build script...
[1/418] Generating version.h with a custom command
Installing /root/build/po/be.gmo to /root/dest/usr/share/locale/be/LC_MESSAGES/systemd.mo
Installing /root/build/po/be@latin.gmo to /root/dest/usr/share/locale/be@latin/LC_MESSAGES/systemd.mo
Installing /root/build/po/bg.gmo to /root/dest/usr/share/locale/bg/LC_MESSAGES/systemd.mo
Installing /root/build/po/ca.gmo to /root/dest/usr/share/locale/ca/LC_MESSAGES/systemd.mo
Installing /root/build/po/cs.gmo to /root/dest/usr/share/locale/cs/LC_MESSAGES/systemd.mo
Installing /root/build/po/da.gmo to /root/dest/usr/share/locale/da/LC_MESSAGES/systemd.mo
Installing /root/build/po/de.gmo to /root/dest/usr/share/locale/de/LC_MESSAGES/systemd.mo
Installing /root/build/po/el.gmo to /root/dest/usr/share/locale/el/LC_MESSAGES/systemd.mo
Installing /root/build/po/es.gmo to /root/dest/usr/share/locale/es/LC_MESSAGES/systemd.mo
Installing /root/build/po/fr.gmo to /root/dest/usr/share/locale/fr/LC_MESSAGES/systemd.mo
Installing /root/build/po/gl.gmo to /root/dest/usr/share/locale/gl/LC_MESSAGES/systemd.mo
Installing /root/build/po/hr.gmo to /root/dest/usr/share/locale/hr/LC_MESSAGES/systemd.mo
Installing /root/build/po/hu.gmo to /root/dest/usr/share/locale/hu/LC_MESSAGES/systemd.mo
Installing /root/build/po/id.gmo to /root/dest/usr/share/locale/id/LC_MESSAGES/systemd.mo
Installing /root/build/po/it.gmo to /root/dest/usr/share/locale/it/LC_MESSAGES/systemd.mo
Installing /root/build/po/ja.gmo to /root/dest/usr/share/locale/ja/LC_MESSAGES/systemd.mo
Installing /root/build/po/ko.gmo to /root/dest/usr/share/locale/ko/LC_MESSAGES/systemd.mo
Installing /root/build/po/lt.gmo to /root/dest/usr/share/locale/lt/LC_MESSAGES/systemd.mo
Installing /root/build/po/pl.gmo to /root/dest/usr/share/locale/pl/LC_MESSAGES/systemd.mo
Installing /root/build/po/pt_BR.gmo to /root/dest/usr/share/locale/pt_BR/LC_MESSAGES/systemd.mo
Installing /root/build/po/ro.gmo to /root/dest/usr/share/locale/ro/LC_MESSAGES/systemd.mo
Installing /root/build/po/ru.gmo to /root/dest/usr/share/locale/ru/LC_MESSAGES/systemd.mo
Installing /root/build/po/sk.gmo to /root/dest/usr/share/locale/sk/LC_MESSAGES/systemd.mo
Installing /root/build/po/sr.gmo to /root/dest/usr/share/locale/sr/LC_MESSAGES/systemd.mo
Installing /root/build/po/sv.gmo to /root/dest/usr/share/locale/sv/LC_MESSAGES/systemd.mo
Installing /root/build/po/tr.gmo to /root/dest/usr/share/locale/tr/LC_MESSAGES/systemd.mo
Installing /root/build/po/uk.gmo to /root/dest/usr/share/locale/uk/LC_MESSAGES/systemd.mo
Installing /root/build/po/zh_CN.gmo to /root/dest/usr/share/locale/zh_CN/LC_MESSAGES/systemd.mo
Installing /root/build/po/zh_TW.gmo to /root/dest/usr/share/locale/zh_TW/LC_MESSAGES/systemd.mo
Installing /root/build/po/pa.gmo to /root/dest/usr/share/locale/pa/LC_MESSAGES/systemd.mo
real 0m1.465s
user 0m1.025s
sys 0m0.426s
```
After:
```
‣ Running build script...
[1/418] Generating version.h with a custom command
Installing /root/build/po/be.gmo to /root/dest/usr/share/locale/be/LC_MESSAGES/systemd.mo
Installing /root/build/po/be@latin.gmo to /root/dest/usr/share/locale/be@latin/LC_MESSAGES/systemd.mo
Installing /root/build/po/bg.gmo to /root/dest/usr/share/locale/bg/LC_MESSAGES/systemd.mo
Installing /root/build/po/ca.gmo to /root/dest/usr/share/locale/ca/LC_MESSAGES/systemd.mo
Installing /root/build/po/cs.gmo to /root/dest/usr/share/locale/cs/LC_MESSAGES/systemd.mo
Installing /root/build/po/da.gmo to /root/dest/usr/share/locale/da/LC_MESSAGES/systemd.mo
Installing /root/build/po/de.gmo to /root/dest/usr/share/locale/de/LC_MESSAGES/systemd.mo
Installing /root/build/po/el.gmo to /root/dest/usr/share/locale/el/LC_MESSAGES/systemd.mo
Installing /root/build/po/es.gmo to /root/dest/usr/share/locale/es/LC_MESSAGES/systemd.mo
Installing /root/build/po/fr.gmo to /root/dest/usr/share/locale/fr/LC_MESSAGES/systemd.mo
Installing /root/build/po/gl.gmo to /root/dest/usr/share/locale/gl/LC_MESSAGES/systemd.mo
Installing /root/build/po/hr.gmo to /root/dest/usr/share/locale/hr/LC_MESSAGES/systemd.mo
Installing /root/build/po/hu.gmo to /root/dest/usr/share/locale/hu/LC_MESSAGES/systemd.mo
Installing /root/build/po/id.gmo to /root/dest/usr/share/locale/id/LC_MESSAGES/systemd.mo
Installing /root/build/po/it.gmo to /root/dest/usr/share/locale/it/LC_MESSAGES/systemd.mo
Installing /root/build/po/ja.gmo to /root/dest/usr/share/locale/ja/LC_MESSAGES/systemd.mo
Installing /root/build/po/ko.gmo to /root/dest/usr/share/locale/ko/LC_MESSAGES/systemd.mo
Installing /root/build/po/lt.gmo to /root/dest/usr/share/locale/lt/LC_MESSAGES/systemd.mo
Installing /root/build/po/pl.gmo to /root/dest/usr/share/locale/pl/LC_MESSAGES/systemd.mo
Installing /root/build/po/pt_BR.gmo to /root/dest/usr/share/locale/pt_BR/LC_MESSAGES/systemd.mo
Installing /root/build/po/ro.gmo to /root/dest/usr/share/locale/ro/LC_MESSAGES/systemd.mo
Installing /root/build/po/ru.gmo to /root/dest/usr/share/locale/ru/LC_MESSAGES/systemd.mo
Installing /root/build/po/sk.gmo to /root/dest/usr/share/locale/sk/LC_MESSAGES/systemd.mo
Installing /root/build/po/sr.gmo to /root/dest/usr/share/locale/sr/LC_MESSAGES/systemd.mo
Installing /root/build/po/sv.gmo to /root/dest/usr/share/locale/sv/LC_MESSAGES/systemd.mo
Installing /root/build/po/tr.gmo to /root/dest/usr/share/locale/tr/LC_MESSAGES/systemd.mo
Installing /root/build/po/uk.gmo to /root/dest/usr/share/locale/uk/LC_MESSAGES/systemd.mo
Installing /root/build/po/zh_CN.gmo to /root/dest/usr/share/locale/zh_CN/LC_MESSAGES/systemd.mo
Installing /root/build/po/zh_TW.gmo to /root/dest/usr/share/locale/zh_TW/LC_MESSAGES/systemd.mo
Installing /root/build/po/pa.gmo to /root/dest/usr/share/locale/pa/LC_MESSAGES/systemd.mo
real 0m1.162s
user 0m0.803s
sys 0m0.338s
```
systemd-timesyncd.service only applies the much weaker monotonic clock
from file logic, i.e should pull in and order itself before
time-set.target. The strong time-sync.target unit is pulled in by
systemd-time-wait-sync.service.
In hostnamed this is exposed as a dbus property, and in the logs in both
places.
This is of interest to network management software and such: if the fallback
hostname is used, it's not as useful as the real configured thing. Right now
various programs try to guess the source of hostname by looking at the string.
E.g. "localhost" is assumed to be not the real hostname, but "fedora" is. Any
such attempts are bound to fail, because we cannot distinguish "fedora" (a
fallback value set by a distro), from "fedora" (received from reverse dns),
from "fedora" read from /etc/hostname.
/run/systemd/fallback-hostname is written with the fallback hostname when
either pid1 or hostnamed sets the kernel hostname to the fallback value. Why
remember the fallback value and not the transient hostname in /run/hostname
instead?
We have three hostname types: "static", "transient", fallback".
– Distinguishing "static" is easy: the hostname that is set matches what
is in /etc/hostname.
– Distingiushing "transient" and "fallback" is not easy. And the
"transient" hostname may be set outside of pid1+hostnamed. In particular,
it may be set by container manager, some non-systemd tool in the initramfs,
or even by a direct call. All those mechanisms count as "transient". Trying
to get those cases to write /run/hostname is futile. It is much easier to
isolate the "fallback" case which is mostly under our control.
And since the file is only used as a flag to mark the hostname as fallback,
it can be hidden inside of our /run/systemd directory.
For https://bugzilla.redhat.com/show_bug.cgi?id=1892235.
MESON_INSTALL_QUIET is set when --quiet is passed to meson install.
Make sure we check the variable in our custom install scripts and
don't output anything if it is set.
Commit 42cc2855ba incorrectly removed the condition on sysfs in both
sys-fs-fuse-connections.mount and sys-kernel-config.mount. However there are
still needed in case modprobe of one of these modules is intentionally skipped
(due to lack of privs for example).
This patch restores the 2 conditions which should be safe for the common case,
since all conditions are only checked after all deps ordered before are
complete.
Follow-up for 42cc2855ba.
udev requests to start the fs mount units when their respective module is
loaded. For that it monitors uevents of type "ADD" for the relevant fs modules.
However the uevent is sent by the kernel too early, ie before the init() of the
module is called hence before directories in /sys/fs/ are created.
This patch workarounds adds "Requires/After=modprobe@<fs-module>.service" to
the mount unit, which means that modprobe(8) will be called once the fs module
is announced to be loaded. This sounds pointless, but given that modprobe only
returns after the initialization of the module is complete, it should
workaround the issue.
As a side effect, the module will be automatically loaded if the mount unit is
started manually.
Fixes#17586.
This reverts commit 9cbf1e58f9.
The presence of /sys/module/%I directory can't be used to assert that the load
of a given module is complete and therefore the call to modprobe(8) can be
skipped. Indeed this directory is created before the init() function of the
module is called.
Users of modprobe@.service needs to be sure that once this service returns the
module is fully operational.
This is useful for development where overwriting files out side
the configured prefix will affect the host as well as stateless
systems such as NixOS that don't let packages install to /etc but handle
configuration on their own.
Alternative to https://github.com/systemd/systemd/pull/17501
tested with:
$ mkdir inst build && cd build
$ meson \
-Dcreate-log-dirs=false \
-Dsysvrcnd-path=$(realpath ../inst)/etc/rc.d \
-Dsysvinit-path=$(realpath ../inst)/etc/init.d \
-Drootprefix=$(realpath ../inst) \
-Dinstall-sysconfdir=false \
--prefix=$(realpath ../inst) ..
$ ninja install
To make things simple and robust when debugging journald, we'll leave
the SO_TIMESTAMP invocations in the C code in place, even if they are
now typically redundant, given that the sockets are already passed into
the process with SO_TIMESTAMP turned on now.
[zjs: Replaces #17149.
I took half of the patch in
https://github.com/systemd/systemd/pull/17149#issuecomment-698399194,
hence I'm keeping Jonathan's authorship.
The original reasoning for 6c5496c492 was that we
enable remote-cryptsetup.target via presets, and since presets are not used for
the initrd, we need a different target. But since parts of the unit and target
tree are shared between the initramfs and the main system, we can't just create
a separate target for the initramfs. All the targets that depend on this one
would need to be split also. That condition is true for initrd-fs.target, but
not for sysinit.target.
So let's instead just uncoditionally pull in remote-cryptsetup.target in the
initramfs. It should normally be empty, so there should be no impact on boots
that don't have units in the target.
Jonathan's patch used initrd-root-fs.target, this version instead uses
initrd-root-device.target. initrd-root-device.target is ordered before
sysroot.mount, which means that the decrypted devices will be available earlier
too.]
This reverts commit 6c5496c492.
sysinit.target is shared between the initrd and the host system. Pulling in
initrd-cryptsetup.target into sysinit.target causes the following warning at
boot:
Oct 27 10:42:30 workstation-uefi systemd[1]: initrd-cryptsetup.target: Starting requested but asserts failed.
Oct 27 10:42:30 workstation-uefi systemd[1]: Assertion failed for initrd-cryptsetup.target.
For encrypted block devices that we need to unlock from the initramfs,
we currently rely on dracut shipping `cryptsetup.target`. This works,
but doesn't cover the case where the encrypted block device requires
networking (i.e. the `remote-cryptsetup.target` version). That target
however is traditionally dynamically enabled.
Instead, let's rework things here by adding a `initrd-cryptsetup.target`
specifically for initramfs encrypted block device setup. This plays the
role of both `cryptsetup.target` and `remote-cryptsetup.target` in the
initramfs.
Then, adapt `systemd-cryptsetup-generator` to hook all generated
services to this new unit when running from the initrd. This is
analogous to `systemd-fstab-generator` hooking all mounts to
`initrd-fs.target`, regardless of whether they're network-backed or not.
Ensure that systemd-random-seed.service has completed before marking
a first boot as completed to guarantee that a saved seed will only be
used after it has been initialized at least once.
Make sure systemd-firstboot completes before reaching first-boot-complete.target
and thus marking the first boot as completed. This way, it is
guaranteed that systemd-firstboot has a chance to complete provisioning
at least once, even in cases of the first boot getting aborted early.
Add a new target for synchronizing units that wish to run once during
the first boot of the system. The machine-id will be committed to disk
only after the target has been reached, thus ensuring that all units
ordered before it had a chance to complete.
Let's explicitly deactivate all home dirs on shutdown, in order to
properly synchronizing unmounting and avoiding blocking devices.
Previously, we'd rely on automatic deactivation when home directories
become unused. However, that scheme is asynchronous, and ongoing
deactviations might conflicts with attempts to unmount /home. Let's fix
that by providing an explicit service systemd-homed-activate.service
whose only job is to have a ExecStop= line that explicitly deactivates
all home directories on shutdown. This service can the be ordered after
home.mount and similar, ensuring that we'll first deactivate all homes
before deactivating /home itself during shutdown.
This is kept separate from systemd-homed.service so that it is possible
to restart systemd-homed.service without deactivating all home
directories.
Fixes: #16842
RC_LOCAL_SCRIPT_PATH_START and RC_LOCAL_SCRIPT_PATH_STOP were was originally
added in the conversion to meson based on the autotools name. In
4450894653 RC_LOCAL_SCRIPT_PATH_STOP was dropped.
We don't need to use such a long name.
Add After=systemd-networkd.socket to avoid a race condition and networkd
falling back to the non-socket activation code.
Also add Wants=systemd-networkd.socket, so the socket is started when
networkd is started via `systemctl start systemd-networkd.service`.
A Requires is not strictly necessary, as networkd still ships the
non-socket activation code. Should this code be removed one day, the
Wants should be bumped to Requires accordingly.
See also 5544ee8516.
Fixes: #16809
This should make /home as automount work reasonably well.
If /home is an automount this has little effect at boot, because if the
automount is not triggered it doesn't matter how the associated mount is
ordered.
It does matter at shutdown however, where home.mount is likely active
now. There the ordering means we'll end sessions first, and only then
deactivate home.mount.
Fixes: #16291
Combining Requires= with Before= doesn't really make sense, since this
means we are requiring something that runs after us, which logically
cannot be fulfilled.
Let's hence downgrade Requires= to Wants= so that the ordering is kept
but no failure propagation implied.
This should be enough to fix https://bugzilla.redhat.com/show_bug.cgi?id=1856514.
But the limit should be significantly higher than 10% anyway. By setting a
limit on /tmp at 10% we'll break many reasonable use cases, even though the
machine would deal fine with a much larger fraction devoted to /tmp.
(In the first version of this patch I made it 25% with the comment that
"Even 25% might be too low.". The kernel default is 50%, and we have been using
that seemingly without trouble since https://fedoraproject.org/wiki/Features/tmp-on-tmpfs.
So let's just make it 50% again.)
See 7d85383edb.
(Another consideration is that we learned from from the whole initiative with
zram in Fedora that a reasonable size for zram is 0.5-1.5 of RAM, and that pretty
much all systems benefit from having zram or zswap enabled. Thus it is reasonable
to assume that it'll become widely used. Taking the usual compression effectiveness
of 0.2 into account, machines have effective memory available of between
1.0 - 0.2*0.5 + 0.5 = 1.4 (for zram sized to 0.5 of RAM) and
1.0 - 0.2*1.5 + 1.5 = 2.2 (for zram 1.5 sized to 1.5 of RAM) times RAM size.
This means that the 10% was really like 7-4% of effective memory.)
A comment is added to mount-util.h to clarify that tmp.mount is separate.
This reverts commit c7220ca802.
The removal was done as a reaction to the messages from systemd:
initrd-root-fs.target: Requested dependency OnFailure=emergency.target ignored (target units cannot fail).
initrd.target: Requested dependency OnFailure=emergency.target ignored (target units cannot fail).
initrd-root-device.target: Requested dependency OnFailure=emergency.target ignored (target units cannot fail).
initrd-fs.target: Requested dependency OnFailure=emergency.target ignored (target units cannot fail).
local-fs.target: Requested dependency OnFailure=emergency.target ignored (target units cannot fail).
...
But it seems that the messages themselves are wrong, and the units were OK.
Strictly speaking you can run homed without userdb. But it doesn't
really make much sense: they go hand in hand and implement the same
concepts, just for different sets of users. Let's hence disable both
automatically by default if homed is requested.
(We don't do the reverse: opting into userdbd shouldn't mean that you
are OK with homed.)
And of course, users can always deviate from our defaults easily, and
turn off userbd again right-away if they don't like it, and things will
generally work.
This generator can be used by desktop environments to launch autostart
applications and services. The feature is an opt-in, triggered by
xdg-desktop-autostart.target being activated.
Also included is the new binary xdg-autostart-condition. This binary is
used as an ExecCondition to test the OnlyShowIn and NotShowIn XDG
desktop file keys. These need to be evaluated against the
XDG_CURRENT_DESKTOP environment variable which may not be known at
generation time.
Co-authored-by: Henri Chain <henri.chain@enioka.com>
In our regular gettys the actual shell commands live the the session
scope anyway (as long as logind is used). Hence, let's avoid
KillMode=process, it serves no purpose and is simply unsafe since it
disables systemd's own process lifecycle management.
We want to watch USB sticks being plugged in, and that requires
AF_NETLINK to work correctly and get the host's events. But if we live
in a network namespace AF_NETLINK is disconnected too and we'll not get
the host udev events.
Fixes: #15287
Limit size of various tmpfs mounts to 10% of RAM, except volatile root and /var
to 25%. Another exception is made for /dev (also /devs for PrivateDevices) and
/sys/fs/cgroup since no (or very few) regular files are expected to be used.
In addition, since directories, symbolic links, device specials and xattrs are
not counted towards the size= limit, number of inodes is also limited
correspondingly: 4MB size translates to 1k of inodes (assuming 4k each), 10% of
RAM (using 16GB of RAM as baseline) translates to 400k and 25% to 1M inodes.
Because nr_inodes option can't use ratios like size option, there's an
unfortunate side effect that with small memory systems the limit may be on the
too large side. Also, on an extremely small device with only 256MB of RAM, 10%
of RAM for /run may not be enough for re-exec of PID1 because 16MB of free
space is required.
"Login Service" doesn''t explain much, esp. considering that logind is actually is
for logins. I think "User Login Management" is better, but not that great either.
Suggestions welcome.
We unregister binfmt_misc twice during shutdown with this change:
1. A previous commit added support for doing that in the final shutdown
phase, i.e. when we do the aggressive umount loop. This is the robust
thing to do, in case the earlier ("clean") shutdown phase didn't work
for some reason.
2. This commit adds support for doing that when systemd-binfmt.service
is stopped. This is a good idea so that people can order mounts
before the service if they want to register binaries from such
mounts, as in that case we'll undo the registration on shutdown
again, before unmounting those mounts.
And all that, just because of that weird "F" flag the kernel introduced
that can pin files...
Fixes: #14981
This doesn't really matter, since in non-/usr-merged systems plymouth
needs to be in /bin and on merged ones it doesn't matter, but it is
still prettier to insert the right path, and avoid /bin on merged
systems, since it's just a compat symlink.
Replaces: #15351
This dependency is now generated automatically given we use
StateDirectory=. Moreover the combination of Wants= and After= was too
strong anway, as whether remount-fs is pulled in or not should not be up
to systemd-pstore.service, and in fact is part of the initial
transaction anyway.
sysinit.target is the target our early boot services are generally
pulled in from, make systemd-pstore.service not an exception of that.
Effectively this doesn't mean much, either way our unit is part of the
initial transaction.
Add `ProtectClock=yes` to systemd units. Since it implies certain
`DeviceAllow=` rules, make sure that the units have `DeviceAllow=` rules so
they are still able to access other devices. Exclude timesyncd and timedated.
This reverts commit 7e1ed1f3b2.
systemd-repart is not a user service that should be something people
enable/disable, instead it should just work if there's configuration for
it. It's like systemd-tmpfiles, systemd-sysusers, systemd-load-modules,
systemd-binfmt, systemd-systemd-sysctl which are NOPs if they have no
configuration, and thus don't hurt, but cannot be disabled since they
are too deep part of the OS.
This doesn't mean people couldn't disable the service if they really
want to, there's after all "systemctl mask" and build-time disabling,
but those are OS developer facing instead of admin facing, that's how it
should be.
Note that systemd-repart is in particular an initrd service, and so far
enable/disable state of those is not managed anyway via "systemctl
enable/disable" but more what dracut decides to package up and what not.
/home is posibly a remote file system. it makes sense to order homed
after it, so that we can properly enumerate users in it, but we probably
shouldn't pull it in ourselves, and leave that to users to configure
otherwise.
Fixes: #15102
It's lightweight and generally useful, so it should be enabled by default. But
users might want to disable it for whatever reason, and things should be fine
without it, so let's make it installable so it can be disabled if wanted.
Fixes#15175.
This essentially adds another layer of configurability:
build disable, this, presence of configuration. The default is
set to enabled, because the service does nothing w/o config.
Possible alternative to #14819.
For me, setting RemainAfterExit=yes would be OK, but if people think that it
might cause issues, then this could be a reasonable alternative that still
let's us skip the invocation of the separate binary.
This reverts the second part of 8125e8d38e.
The first part was reverted in 750e550eba.
The problem starts when s-v-s.s is pulled in by something that is then pulled
in by sysinit.target. Every time a unit is started, systemd recursively checks
all dependencies, and since sysinit.target is pull in by almost anything, we'll
start s-v-s.s over and over. In particular, plymouth-start.service currently
has Wants=s-v-s.s and After=s-v-s.s.
This minus has been there since the unit was added in
d42d27ead9. I think the idea was not cause things
to fail if the user instance doesn't work. But ignoring the return value
doesn't seem to be the right way to approach the problem. In particular, if
the program fails to run, we'll get a bogus fail state, see
https://bugzilla.redhat.com/show_bug.cgi?id=1727895#c1:
with the minus:
$ systemctl start user@1002
Job for user@1002.service failed because the service did not take the steps required by its unit configuration.
See "systemctl status user@1002.service" and "journalctl -xe" for details.
without the minus:
$ systemctl start user@1002
Job for user@1002.service failed because the control process exited with error code.
See "systemctl status user@1002.service" and "journalctl -xe" for details.
This patch modifies the RequireMountsFor setting in systemd-nspawn@.service to wait for the machine instance directory to be mounted, not just /var/lib/machines.
Closes#14931
machined needs access to the host mount namespace to propagate bind
mounts created with the "machinectl bind" command. However, the
"ProtectKernelLogs" directive relies on mount namespaces to make the
kernel ring buffer inaccessible. This commit removes the
"ProtectKernelLogs=yes" directive from machined service file introduced
in 6168ae5.
Closes#14559.