IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
This applies the existing SocketUser=/SocketGroup= options to units
defining a POSIX message queue, bringing them in line with UNIX
sockets and FIFOs. They are set on the file descriptor rather than
a file system path because the /dev/mqueue path interface is an
optional mount unit.
This commit adds two settings private and strict to
the ProtectControlGroups= property. Private will unshare the cgroup
namespace and mount a read-write private cgroup2 filesystem at /sys/fs/cgroup.
Strict does the same except the mount is read-only. Since the unit is
running in a cgroup namespace, the new root of /sys/fs/cgroup is the unit's
own cgroup.
We also add a new dbus property ProtectControlGroupsEx which accepts strings
instead of boolean. This will allow users to use private/strict via dbus
and systemd-run in addition to service files.
Note private and strict fall back to no and yes respectively if the kernel
doesn't support cgroup2 or system is not using unified hierarchy.
Fixes: #34634
Otherwise, ProtectHome=tmpfs makes /home/ and friends not read-only.
Also, mount options for /run/ specified in MountAPIVFS=yes are not
applied.
The function append_static_mounts() was introduced in
5327c910d2, but at that time, there were
neither .read_only nor .options in the struct. But, when later the
struct is extended, the function was not updated and they were not
copied from the static table.
The fields has been used in static tables since
e4da7d8c79, and also in
94293d65cd.
Fixes#34825.
Clients should be able to know if the idle logic is available on a
session without secondary knowledge about the session class. Let's hence
expose a property for that.
Similar for the screen lock concept.
Fixes: #34844
CNAME doesn't exist at the zone apex. When we get an unsigned noerror
response to a direct query for a CNAME record, we don't yet know if this
name is zone apex. We already request the correct DS record in this
case, but previously skipped it at validation time, causing the answer
to appear bogus. Make sure to also consider the DS record for the query
name for negative replies.
The structure of DNR options is considerably more complicated than most
DHCP options, and as a result the fuzzer has poor coverage of these code
paths.
This adds some DNR packets to the fuzzing corpus, not with the intent of
capturing some specific edge case, but with the intent to rapidly
improve the fuzzers' coverage of these codepaths by giving it a valid
example to begin with.
Also include an ndisc router advert with a few Encrypted DNS options,
for the same purpose.
Now that mkosi supports generating UKI profiles, let's make use of
that to generate the UKI profiles required for the test instead of
doing it within the test itself.
Otherwise, when the test is executed on a system with signed PCRs,
cryptenroll will automatically pick up the public key from the UKI
which results in a volume that can't be unlocked because the pcrextend
tests appends extra things to pcr 11.
As per spec image builders can create a local /etc/os-release
with per-image IDs, so modify that one instead of the original
one in /usr/lib. For example we do this when we build debian
unstable images in mkosi.
This will allow units (scopes/slices/services) to override the default
systemd-oomd setting DefaultMemoryPressureDurationSec=.
The semantics of ManagedOOMMemoryPressureDurationSec= are:
- If >= 1 second, overrides DefaultMemoryPressureDurationSec= from oomd.conf
- If is empty, uses DefaultMemoryPressureDurationSec= from oomd.conf
- Ignored if ManagedOOMMemoryPressure= is not "kill"
- Disallowed if < 1 second
Note the corresponding dbus property is DefaultMemoryPressureDurationUSec
which is in microseconds. This is consistent with other time-based
dbus properties.
Nested mounts should be carried over from host to overlayfs to overlayfs
(and back to host if unmerged). Otherwise you run into hard to debug
issues where merging extensions means you can't unmount those nested mounts
anymore as they are hidden by the overlayfs mount.
To fix this, before unmerging any previous extensions, let's move the nested
mounts from the hierarchy to the workspace, then set up the new hierachy, and
finally, just before moving the hierarchy into place, move the nested mounts
back into place.
Because there might be multiple nested mounts that consists of one or more
mounts stacked on top of each other, we make sure to move all stacked mounts
properly to the overlayfs. The kernel doesn't really provide a nice way to do
this, so we create a stack, pop off each mount onto the stack and then pop from
the stack again to the destination to re-establish the stacked mounts in the same
order in the destination.
'Default Memory Pressure Duration' field in oomctl, which can be configured
with DefaultMemoryPressureDurationSec= in oomd.conf, is a global config.
Let's check it earlier.
This also drops unnecessary cleanup at the beginning.
Fedora and friends has a drop-in config for the settings in
/usr/lib/systemd/user/slice.d/ . Hence, settings in the main .slice may be
overridden. Let's set below in a drop-in with higher decimal prefix.
Also, rename override.conf -> 99-managed-oom-preference.conf for the same reason.
- suppress unnecessary error messages, especially in loop and at_exit(),
- ensure the container service is stopped before restarting,
- do not send KILL signal, as garbages will remain, and disturb the next
invocation,
- drop unnecessary workaround of trying machine twice.
This also fixes the test for io.systemd.Machine.Terminate.
When systemd-nspawn@.service receives stop signal, then systemd-nspawn
sends SIGRTMIN+3 to the container, which was previously ignored by the
custom init script used by the container.
Let's introduce another trap for the signal, and correctly handle it.
Follow-up for 164af66f9a.
The family will be checked later in
address_section_verify() -> address_section_adjust_broadcast(),
hence it is not necessary to set here.
Follow-up for 5d15c7b19c.
Fixes oss-fuzz#372994449.
Fixes#34748.
This commit adds a corresponding integration test for ExtraFileDescriptors
after systemctl daemon-reexec. This ensures systemd keeps the file
descriptors while the service manager is restarting and we don't lose
ability to restart the service correctly.
Create a unit test for systemd timer DeferReactivation config option.
The test works by creating a timer which fires every 5 seconds and
starts an unit which runs for 5 seconds.
With DeferReactivation=true, the timer must fire every 5+5 seconds,
instead of the 5 it fires normally.
As we need at least two timer runs to check if the delta is correct,
the test duration on success will be at least 20 seconds.
To be safe, the test script waits 35 seconds: this is enough to get
at least three runs but low enough to avoid clogging the CI.
By default, in instances where timers are running on a realtime schedule,
if a service takes longer to run than the interval of a timer, the
service will immediately start again when the previous invocation finishes.
This is caused by the fact that the next elapse is calculated based on
the last trigger time, which, combined with the fact that the interval
is shorter than the runtime of the service, causes that elapse to be in
the past, which in turn means the timer will trigger as soon as the
service finishes running.
This behavior can be changed by enabling the new DeferReactivation setting,
which will cause the next calendar elapse to be calculated based on when
the trigger unit enters inactivity, rather than the last trigger time.
Thus, if a timer is on an realtime interval, the trigger will always
adhere to that specified interval.
E.g. if you have a timer that runs on a minutely interval, the setting
guarantees that triggers will happen at *:*:00 times, whereas by default
this may skew depending on how long the service runs.
Co-authored-by: Matteo Croce <teknoraver@meta.com>
On Ubuntu/Debian infrastructure QEMU crashes a lot, so mark the test
as skipped in that case as there's nothing we can do about it and
we shouldn't mark runs as failed
This adds the ExtraFileDescriptor property to StartTransient dbus API
with format "a(hs)" - array of (file descriptor, name) pairs. The FD
will be passed to the unit via sd_notify like Socket and OpenFile.
systemctl show also shows ExtraFileDescriptorName for these transient
units. We only show the name passed to dbus as the FD numbers will
change once passed over the unix socket and are duplicated, so its
confusing to display the numbers.
We do not add this functionality for systemd-run or general systemd
service units as it is not useful for general systemd services.
Arguably, it could be useful for systemd-run in bash scripts but we
prefer to be cautious and not expose the API yet.
Fixes: #34396
The API introduced in https://github.com/systemd/systemd/pull/34295
is less than ideal:
- It doesn't consider signing at all (ukify can't sign separately yet)
- Measurement is completely broken (all profile sections are marked to
not be measured)
- It focuses on a very niche use case of extending existing UKIs and makes
the more common use case of building a UKI with several profiles included
much harder than needed.
Let's instead rework the API to focus on the primary use case of building
a UKI with multiple profiles added to it immediately. We require the profiles
to be built upfront as separate PE binaries with UKI. There's no need to sign
or measure these, they're solely vehicles for profile sections. This saves us
from having to complicate the command line and config parsing to support defining
multiple profiles.
To add the profiles when building a UKI, we introduce the new --add-profile
switch which takes a path to a PE binary describing a profile. The required
sections are read from each PE binary, measured and added as a profile.
The integration test is disabled until the new API is merged and exposed in
mkosi so that building a UKI with profiles can be left to mkosi and the integration
test will only test the switching between profiles and not the building of UKIs
with profiles.
I encountered this race condition while working on TEST-13-NSPAWN.varlinkctl.sh.
The long-running machine's init script sometimes does not have time to start and
register signals. As result, occasiounally failed tests.
Previously, when the test ran on mkosi, then networkd was not masked, and
might be already started. In that case, the interface test2 would be created
soon after the .netdev file is created, and the .link file would not be
applied to the interface. Hence, the later test case for
'networkctl cat @test2:link' would fail.
This make networkd always started at the beginning of the test, and
.netdev file created after .link file is created. So, .link file is
always applied to the interface created by the .netdev file.
This feature has been deprecated since QEMU 5.0 and finally removed in
QEMU 9.1 [0] which now causes issues when running the storage tests on
latest Arch:
------ testcase_long_sysfs_path: BEGIN ------
...
qemu-system-x86_64: -device virtio-blk-pci,drive=drive0,scsi=off,bus=pci_bridge25: Property 'virtio-blk-pci.scsi' not found
E: qemu failed with exit code 1
[0] a271b8d7b2
In the nvme_subsystem test, there are only namespace IDs 16 and 17,
so there would no longer be an "obsolete" symlink created, since this
test scenaro does not create a namespace with ID 1.
Signed-off-by: Bryan Gurney <bgurney@redhat.com>
This tests the whole shebang:
1. That ukify can generate them properly
2. That systemd-boot can dissect them properly
3. That systemd-stub can accept profile selection propery
4. That the profile information ends up in /run/systemd/stub/ properly
5. That systemd-measure correctly calculates the expected PCR 11 values
for each profile and that we can unlock a public-key bound LUKS
volume with it
This introduces 'i' prefix for match string. When specified, string or
pattern will match case-insensitively.
Closes#34359.
Co-authored-by: Ryan Wilson <ryantimwilson@meta.com>
The verb s not really specific to credential management, it was always a
bit misplaced. Hence move it to systemd-analyze, where we already have
some general TPM related verbs such as "srk" and "pcrs"
TEST-64-UDEV-STORAGE is invoked with the subtest appended, so TEST_SKIP=TEST-64-UDEV-STORAGE
does not work. Fix it by using TEST_SKIP as a partial match.
Follow-up for ddc91af4ea
Linux kernel v4.18 (2018-08-12) added user-namespace support to FUSE, and
bumped the FUSE version to 7.27 (see: da315f6e0398 (Merge tag
'fuse-update-4.18' of
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse, Linus Torvalds,
2018-06-07). This means that on such kernels it is safe to enable FUSE in
nspawn containers.
In outer_child(), before calling copy_devnodes(), check the FUSE version to
decide whether enable (>=7.27) or disable (<7.27) FUSE in the container. We
look at the FUSE version instead of the kernel version in order to enable FUSE
support on older-versioned kernels that may have the mentioned patchset
backported ([as requested by @poettering][1]). However, I am not sure that
this is safe; user-namespace support is not a documented part of the FUSE
protocol, which is what FUSE_KERNEL_VERSION/FUSE_KERNEL_MINOR_VERSION are meant
to capture. While the same patchset
- added FUSE_ABORT_ERROR (which is all that the 7.27 version bump
is documented as including),
- bumped FUSE_KERNEL_MINOR_VERSION from 26 to 27, and
- added user-namespace support
these 3 things are not inseparable; it is conceivable to me that a backport
could include the first 2 of those things and exclude the 3rd; perhaps it would
be safer to check the kernel version.
Do note that our get_fuse_version() function uses the fsopen() family of
syscalls, which were not added until Linux kernel v5.2 (2019-07-07); so if
nothing has been backported, then the minimum kernel version for FUSE-in-nspawn
is actually v5.2, not v4.18.
Pass whether or not to enable FUSE to copy_devnodes(); have copy_devnodes()
copy in /dev/fuse if enabled.
Pass whether or not to enable FUSE back over fd_outer_socket to run_container()
so that it can pass that to append_machine_properties() (via either
register_machine() or allocate_scope()); have append_machine_properties()
append "DeviceAllow=/dev/fuse rw" if enabled.
For testing, simply check that /dev/fuse can be opened for reading and writing,
but that actually reading from it fails with EPERM. The test assumes that if
FUSE is supported (/dev/fuse exists), then the testsuite is running on a kernel
with FUSE >= 7.27; I am unsure how to go about writing a test that validates
that the version check disables FUSE on old kernels.
[1]: https://github.com/systemd/systemd/issues/17607#issuecomment-745418835Closes#17607
Right now it mostly duplicates a test that already exists in
TEST-50-DISSECT.mountfsd.sh, but it serves as a template for more unprivileged
nspawn tests.
The .cred suffix is stripped from a credential as it is imported from
the ESP, hence it should not be included in the credential name embedded
in the credential.
Fixes: #33497
So far you had to pick:
1. Use a signed PCR TPM2 policy to lock your disk to (i.e. UKI vendor
blesses your setup via signature)
or
2. Use a pcrlock policy (i.e. local system blesses your setup via
dynamic local policy stored in NV index)
It was not possible combine these two, because TPM2 access policies do
not allow the combination of PolicyAuthorize (used to implement #1
above) and PolicyAuthorizeNV (used to implement #2) in a single policy,
unless one is "further upstream" (and can simply remove the other from
the policy freely).
This is quite limiting of course, since we actually do want to enforce
on each TPM object that both the OS vendor policy and the local policy
must be fulfilled, without the chance for the vendor or the local system
to disable the other.
This patch addresses this: instead of trying to find a way to come up
with some adventurous scheme to combine both policy into one TPM2
policy, we simply shard the symmetric LUKS decryption key: one half we
protect via the signed PCR policy, and the other we protect via the
pcrlock policy. Only if both halves can be acquired the disk can be
decrypted.
This means:
1. we simply double the unlock key in length in case both policies shall
be used.
2. We store two resulting TPM policy hashes in the LUKS token JSON, one
for each policy
3. We store two sealed TPM policy key blobs in the LUKS token JSON, for
both halves of the LUKS unlock key.
This patch keeps the "sharding" logic relatively generic (i.e. the low
level logic is actually fine with more than 2 shards), because I figure
sooner or later we might have to encode more shards, for example if we
add further TPM2-based access policies, for example when combining FIDO2
with TPM2, or implementing TOTP for this.