Previously, chase_symlinks() always returned an absolute path, which
changed after 5bc244aaa9. This commit
fixes chase_symlinks() so it returns absolute paths all the time again.
Btrfs quotas are actually being enabled in systemd-importd via
setup_machine_directory(), not in systemd-{import,pull} where those
environment variables are checked. Therefore, also check them in
systemd-importd and avoid enabling quotas if requested by the user.
Fixes: #18421
Fixes: #15903
Fixes: #24387
Non-negative return values of setup_machine_directory() were never used
and never had clear meaning, so do not distinguish between various
non-error conditions and just return 0 in all cases.
In cases like packaging scripts, it might be desired to use
enable/disable on units without install info. So, add an
option '--no-warn' to suppress the warning.
Trying to disable a unit with no install info is mostly useless, so
add a warning like we do for enable (with the new dbus method
'DisableUnitFilesWithFlagsAndInstallInfo()'). Note that it would
still find and remove symlinks to the unit in /etc, regardless of
whether it has install info or not, just like before. And if there are
actually files to remove, we suppress the warning.
Fixes #17689
When booting with debug logs, we print:
Setting '/proc/sys/fs/file-max' to '9223372036854775807
'
Setting '/proc/sys/fs/nr_open' to '2147483640
'
Couldn't write fs.nr_open as 2147483640, halving it.
Setting '/proc/sys/fs/nr_open' to '1073741816
'
Successfully bumped fs.nr_open to 1073741816
The strange formatting is because we explicitly appended a newline in those two
places. It seems that the kernel doesn't care. In fact, we have a few dozen other
writes to sysctl where we don't append a newline. So let's just drop those here
too, to make the code a bit simpler and avoid strange output in the logs.
This function checks if the external verity data referenced in
VeritySettings covers the specified partition (indicated via
designator).
Right now we use it in one place; later commits will use it in more.
Let's store the GPT partition flags in the dissected partition info.
We don't use them for anything yet, but a later change will, when
enforcing policy on dissection.
This fixes an issue introduced by af2aea8bb6.
When an outdated address or route is passed to link_request_address()/route(),
they return 0 and the address or route is not assigned. This can happen
when we receive an RA with zero lifetime. In that case, we should not
unset the Link.ndisc_configured flag. Otherwise, even though no new
address or route is assigned, the interface enters the configuring
state, an unnecessary D-Bus property change is emitted, and the state
file is updated. That in turn triggers resolved or timesyncd to
reconfigure the interface.
Fixes #25456.
Let's make sure we can probe file systems also when unprivileged:
instead of probing the partition block devices for file system
signatures, let's go via the original "whole" fd.
libblkid makes this easy actually, as it allows us to specify the
offset/size of the area to probe. And we have the partition
offsets/sizes anyway, so it's trivial for us to make use of.
This thus enables fs probing also when lacking privs and operating on
naked regular files without loopback devices or anything like this.
Let's explicitly flush the kernel's buffer cache on the whole block
device once we ran "mkfs". This is necessary, because partition and
whole block devices maintain separate buffer caches, and thus writing
to one will not be visible on the other if cached there already, until
the latter's cache is explicitly flushed.
This is preparation for later adding support for probing file systems
also if we have no open partition block devices, and hence want to use
the whole block device instead.
Let's extend the test further, and try the codepaths where we do not
pin/add the partition block devices (i.e. the codepaths we use
when running without privs).
When repeatedly appending an object to a growing array, we would create a new
array larger by one slot, insert all the old entries and the new element with
ref count bumps into the new array, and then unref the old array.
This would cause problems when building an array with more than a few thousand
elements. If userdbctl is modified to construct an array,
'userdbctl --json=pretty group >/dev/null' with 31k groups:
0.74s (existing code)
102.17s (returning an array)
0.79s (with this patch)
We append arrays in various places, so it seems nice to make this generally
fast.
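The cost difference can be illustrated by counting element copies. This is
only a sketch, not the JSON code itself: copies_grow_by_one() and
copies_geometric() are hypothetical helpers, and capacity doubling is an
assumption chosen for illustration.

```python
def copies_grow_by_one(n: int) -> int:
    """Element copies when every append allocates an array of size + 1."""
    copies = 0
    for size in range(n):
        copies += size  # all existing elements are copied into the new array
    return copies


def copies_geometric(n: int) -> int:
    """Element copies when capacity doubles whenever the array is full."""
    copies = 0
    capacity = 0
    for size in range(n):
        if size == capacity:
            capacity = max(1, capacity * 2)
            copies += size  # existing elements move to the larger allocation
    return copies
```

With growth by one the number of copies is quadratic in the number of
elements, with doubling it stays below twice the number of elements.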
The source would be set implicitly when parsing from a named file. But
it's useful to specify the source also for cases where we're parsing a
ready string. I noticed the lack of this API when trying to write tests,
but it seems generally useful to be able to specify a source name when parsing
things.
We would output a sequence of concatenated JSON strings. 'jq' accepts such
output without fuss, and can even automatically build an array with --slurp/-s.
Nevertheless, parsing this format is more effort for the reader, since it's not
"standard JSON". E.g. Python's json module cannot do this out-of-the-box, but
needs some loop with json.JSONDecoder.raw_decode() and then collecting the
objects into an array. Such streaming output makes sense in case of logs, where
we stream the output and it has no predefined length. But here we expect at
most a few dozen entries, so it's nicer to write normal JSON that is trivial to
parse.
I'm treating this as a bugfix and not attempting to provide backwards
compatibility. I don't think the previous format was seeing much use, and it's
trivial to adapt to the new one.
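For illustration, this is roughly the loop a Python reader needed for the old
format, next to the single call that suffices for the new one. It is a sketch:
parse_concatenated() is a hypothetical helper and the sample strings are made
up, not real tool output.

```python
import json


def parse_concatenated(text: str) -> list:
    """Parse a sequence of concatenated JSON values into a list."""
    decoder = json.JSONDecoder()
    values = []
    pos = 0
    while pos < len(text):
        # raw_decode() does not skip leading whitespace, so do it by hand.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            break
        value, pos = decoder.raw_decode(text, pos)
        values.append(value)
    return values


old_style = '{"name": "a"}\n{"name": "b"}\n'  # concatenated objects
new_style = '[{"name": "a"}, {"name": "b"}]'  # one regular JSON array

entries_old = parse_concatenated(old_style)
entries_new = json.loads(new_style)
```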
libblkid really should define an enum for this on its own, but it
currently doesn't and returns literal numeric values. Let's make this
more readable by adding our own symbolic names via an enum.
E.g. vfat doesn't support symlinks, sockets, fifos, etc., so let's ignore
any copy failures related to unsupported file types when populating
filesystems.
Currently, these two flags are implied by dissect_loop_device(), but
that's not right, because this means systemd-gpt-auto-generator will
dissect the root block device with these flags set and that's not
desirable: the generator should not cause the partition devices to be
created (we don't intend to use them right-away after all, but expect
udev to find/probe them first, and then mount them through .mount units).
And there's no point in opening the partition devices, since we do not
intend to mount them via fds either.
Hence, rework this: instead of implying the flags, specify them
explicitly.
While we are at it, let's also rename the flags to make them more
descriptive:
DISSECT_IMAGE_MANAGE_PARTITION_DEVICES becomes
DISSECT_IMAGE_ADD_PARTITION_DEVICES, since that's really all this does:
add the partition devices via BLKPG.
DISSECT_IMAGE_OPEN_PARTITION_DEVICES becomes
DISSECT_IMAGE_PIN_PARTITION_DEVICES, since we not only open the devices,
but keep the devices open continuously (i.e. we "pin" them).
Also, drop the DISSECT_IMAGE_BLOCK_DEVICE combination flag, since it is
misleading, i.e. it suggests it was appropriate to specify on all
dissected block devices, but that's precisely not the case, see the
systemd-gpt-auto-generator case. My guess is that the confusion around
this was actually the cause for this bug we are addressing here.
Fixes: #25528
reset_terminal_fd sets certain minimum required terminal attributes
that systemd relies on.
One of those attributes is `ONLCR` which ensures that when a new line
is sent to the terminal, that the cursor not only moves to the next
line, but also moves to the very beginning of that line.
In order for `ONLCR` to work, the terminal needs to perform output
post-processing. That requires an additional attribute, `OPOST`,
which reset_terminal_fd currently fails to ensure is set.
In most cases `OPOST` (and `ONLCR` actually) are both set anyway, so
it's not an issue, but it could be a problem if, e.g., the terminal was
put in raw mode by a program and the program unexpectedly died before
restoring settings.
This commit ensures when `ONLCR` is set `OPOST` is set too, which is
the only thing that really makes sense to do.
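The flag logic can be sketched with Python's termios module. This is an
illustration only, not the reset_terminal_fd code: ensure_output_flags() and
reset_output_flags() are hypothetical names.

```python
import termios


def ensure_output_flags(oflag: int) -> int:
    """Return oflag with ONLCR set, plus OPOST so that ONLCR takes effect."""
    return oflag | termios.ONLCR | termios.OPOST


def reset_output_flags(fd: int) -> None:
    """Apply the flags to a terminal fd (requires fd to refer to a tty)."""
    attrs = termios.tcgetattr(fd)
    attrs[1] = ensure_output_flags(attrs[1])  # index 1 is the output flag word
    termios.tcsetattr(fd, termios.TCSANOW, attrs)
```

A terminal left in raw mode has OPOST cleared, which corresponds to the
oflag == 0 case here.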
When copying between filesystems, sometimes the target filesystem
might not support symlinks/fifos/sockets/... and we want to log and
ignore any failures to copy such files when copying. Let's introduce
a new flag to enable this behavior.
If the cgroup is owned by root there is no need to get prefix_uid. Only
check prefix_uid when uid != 0, and then set MANAGED_OOM_PREFERENCE_NONE
and return early if uid != prefix_uid.
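The resulting control flow, as a minimal sketch: effective_preference() and
its parameters are hypothetical, and the string "none" merely stands in for
MANAGED_OOM_PREFERENCE_NONE.

```python
from typing import Callable

PREFERENCE_NONE = "none"  # stands in for MANAGED_OOM_PREFERENCE_NONE


def effective_preference(
    cgroup_uid: int,
    get_prefix_uid: Callable[[], int],
    requested: str,
) -> str:
    """Return the preference to honor for a cgroup owned by cgroup_uid."""
    if cgroup_uid != 0:
        # Only a non-root owner makes the prefix lookup necessary.
        if cgroup_uid != get_prefix_uid():
            return PREFERENCE_NONE
    return requested
```

Passing the lookup as a callable makes it visible that it is never invoked
for root-owned cgroups.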
This is a wrapper around fd_reopen() that will reopen an fd if the
F_GETFL flags indicate this is necessary, and otherwise not.
This is useful for various utility calls that shall be able to operate
on O_PATH and without it, and might need to convert between the two
depending on what's passed in.
Doing the reconnect dance on some real firmware creates huge delays on
boot. This should not be needed anymore as we now ask the firmware to
make console devices and xbootldr partitions available explicitly in a
more targeted fashion.
Fixes: #25510
Currently the kernel-install man page only documents the bls layout for use
with the boot loader spec type #1. 90-loaderentry.install uses this layout to
generate loader entries and copy the kernel image and initrd to $BOOT.
This commit documents a second layout "uki" and adds 90-uki-copy.install,
which copies a UKI "uki.efi" from the staging area or any file with the .efi
extension given on the command line to
$BOOT/EFI/Linux/$ENTRY_TOKEN-$KERNEL_VERSION(+$TRIES).efi
This allows for both locally generated and distro-provided UKIs to be handled
by kernel-install.
--recursive=no will overwrite possible -P or -k option hence making the
recursive disabling impossible.
Check what counting types the system supports (encoded in the ordering
of our enum) and pick whatever the user requests but is also supported.
Fixes: #25248
Let's make sure we use loop devices if we have access to them and
only fall back to regular files if we can't use loop devices. We
prefer loop devices because when using mkfs --root options, we have
to populate a temporary staging tree which means we're copying every
file twice instead of once when using loop devices.
The EFI shell will pass the entire command line to the application it
starts, which includes the file path of the stub binary. This prevents
us from using the built-in cmdline if the command line is otherwise
empty.
Fortunately, the EFI shell registers a protocol on any images it starts
this way. The protocol even lets us access the args individually, making
it easy to strip the stub path off.
Fixes: #25201
(follow-up of #15958)
In #15958 we deprecated passing a positional argument to reboot by
generating a warning. It's been two years now and I believe it can
be dropped completely, as requested in #15773.
A call to pam_namespace is required so that children of user@.service end up in
a namespace as expected. pam_namespace gets called as part of the stack that
creates a session (login, sshd, gdm, etc.) and those processes end up in a
namespace, but it also needs to be called from our stack which is parallel and
descends from pid1 itself.
The call to pam_namespace is similar to the call to pam_keyinit that was added
in ab79099d16. The pam stack for user@.service
creates a new session which is disconnected from the parent environment. Both
calls are not suitable for inclusion in the shared part of the stack (e.g.
@system-auth on Fedora/RHEL systems), because for example su/sudo/runuser
should not include them.
Fixes #17043 (Allow to execute user service into dedicated namespace
if pam_namespace enabled)
Related to https://bugzilla.redhat.com/show_bug.cgi?id=1861836
(Polyinstantiation is ignored/bypassed in GNOME sessions)
Previously, e.g., networkd enumerated network interfaces with ifindex
in a decreasing order, as sd-netlink inverts the order of the received
multipart messages.
Let's keep the order of the multipart messages. Hopefully this changes
no behavior, as our code does not depend on the order of the received
multipart messages.
Before:
===
Nov 26 09:35:10 systemd[1]: Starting Network Configuration...
Nov 26 09:35:11 systemd-networkd[36185]: wlp59s0: Saved new link: ifindex=3, iftype=ETHER(1), kind=n/a
Nov 26 09:35:12 systemd-networkd[36185]: enp0s31f6: Saved new link: ifindex=2, iftype=ETHER(1), kind=n/a
Nov 26 09:35:12 systemd-networkd[36185]: lo: Saved new link: ifindex=1, iftype=LOOPBACK(772), kind=n/a
After:
===
Nov 26 09:45:18 systemd[1]: Starting Network Configuration...
Nov 26 09:45:19 systemd-networkd[38372]: lo: Saved new link: ifindex=1, iftype=LOOPBACK(772), kind=n/a
Nov 26 09:45:19 systemd-networkd[38372]: enp0s31f6: Saved new link: ifindex=2, iftype=ETHER(1), kind=n/a
Nov 26 09:45:19 systemd-networkd[38372]: wlp59s0: Saved new link: ifindex=3, iftype=ETHER(1), kind=n/a
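The effect can be reproduced with a toy model. Note the assumption: the sketch
supposes the inversion came from linking each new part at the head of the
chain, which is only one plausible mechanism. Both helpers are hypothetical.

```python
def link_by_prepending(parts):
    """Put each newly received part at the head of the chain (reverses order)."""
    chain = []
    for part in parts:
        chain.insert(0, part)
    return chain


def link_by_appending(parts):
    """Put each newly received part at the tail of the chain (keeps order)."""
    chain = []
    for part in parts:
        chain.append(part)
    return chain


received = ["lo", "enp0s31f6", "wlp59s0"]  # interface names from the logs above
before = link_by_prepending(received)
after = link_by_appending(received)
```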
Previously, if a single packet contained multiple non-multipart messages,
the messages were linked and saved as a single entry, even if they had
different serial numbers. It is not clear whether the kernel ever sends
such a packet, but to be safe, let's link only multipart messages.
When we receive a multi-part message and fail to parse it, then
the previously received message is freed with the _cleanup_ attribute,
but still referenced by sd_netlink.rqueue_partial. That causes
use-after-free when we receive another multi-part message.
The --empty option applies to the partition table of the block
device, not the number of definition files we've read. Also, even
if we don't find any definition files, let's not shortcut execution
so we can run repart on a device/loopback file to get information
on the partition table.
Filenames to store user linger requests are created with C-escaping.
When we enumerate the files to acquire lingering users, we use the
filenames verbatim. In the case C-escaping is not an identity map (such
as "DOMAIN\User"), we won't be able to start user instances of
such mangled users.
Unescape filenames when we treat them as usernames again.
Fixes: #25448
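A reduced sketch of the round trip, handling only the backslash case from the
example above. escape_name() and unescape_name() are hypothetical helpers and
cover far less than real C-escaping.

```python
def escape_name(name: str) -> str:
    """Escape backslashes, as when a user name is turned into a file name."""
    return name.replace("\\", "\\\\")


def unescape_name(escaped: str) -> str:
    """Undo escape_name(), as when a file name is read back as a user name."""
    out = []
    i = 0
    while i < len(escaped):
        if escaped[i] == "\\" and i + 1 < len(escaped) and escaped[i + 1] == "\\":
            out.append("\\")
            i += 2
        else:
            out.append(escaped[i])
            i += 1
    return "".join(out)


user = "DOMAIN\\User"           # the name DOMAIN\User
filename = escape_name(user)    # differs from the user name
restored = unescape_name(filename)
```

Using the file name verbatim corresponds to skipping unescape_name(), which
yields a user name that does not exist.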
Previously, we'd return the ifindex the user asked on, and if none was
specified "lo". Let's always return "lo".
This should be a better choice usually, since localhost addresses are
typically not reachable over arbitrary interfaces once SO_BINDTODEVICE
or so is used. Hence, let's report the interface that is always right
for these addresses.
The batch flag is bugged on older versions of mcopy causing failures
such as:
```
Internal error, size too big
Streamcache allocation problem:: 5
```
It's also a little unclear what the batch flag actually does, so since
everything still works without it, it doesn't hurt to remove it.
The n flag only applies when copying from fat to unix which we don't do
so it doesn't make sense in this scenario.
--include-partitions and --exclude-partitions now fully exclude
partitions from repart. Whenever a partition type is excluded, we
don't take any partitions of that type into account at all when
running systemd-repart.
--skip-partitions= is introduced to do what --exclude-partitions did
previously. Any skipped partitions are taken into account when doing
size calculations, but are not yet populated.
Why do we need both concepts? Exclusion is needed so that we can
use shared repart definitions to generate bootable and non-bootable
images. When generating a non-bootable image, we use --exclude-partitions
to exclude the ESP partition. Skipping is needed so that we can
populate the root partition while skipping the ESP partition, get
the roothash of the root partition, use that to generate a UKI, and
finally populate the ESP partition with the UKI included.
A NULL Bitmap object is by all our code considered identical to an empty
bitmap. Hence let's remove the entirely unnecessary assert().
The assert() can be triggered if debug monitoring is used and an empty
NSEC or NSEC3 RR is included in an answer resolved returns.
It's not really a security issue since enabling debug monitoring is a
manual step requiring root privileges, that is off by default. Moreover,
it's a "clean" assert(), i.e. the worst that happens is that a coredump
is generated and resolved is restarted.
Fixes: #25449
Only files and directories are supported by vfat. When we pass a
symlink to mcopy, it will try to dereference it and copy what the
symlink points at into the vfat partition instead. Let's avoid this
by skipping all unsupported file types when establishing the list of
top level targets that mcopy should copy.
We also use RECURSE_DIR_SORT everywhere when iterating directories
to make things more reproducible.
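The selection of top level targets can be sketched as follows. This is not
the repart code: top_level_targets() is a hypothetical helper, and sorting by
path stands in for what RECURSE_DIR_SORT achieves.

```python
import os


def top_level_targets(root: str) -> list:
    """List regular files and directories directly under root, sorted."""
    targets = []
    with os.scandir(root) as entries:
        for entry in entries:
            # With follow_symlinks=False a symlink counts as neither a file
            # nor a directory, so it is skipped like sockets and fifos.
            if entry.is_file(follow_symlinks=False) or entry.is_dir(follow_symlinks=False):
                targets.append(entry.path)
    return sorted(targets)
```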
How to interpret the pixel format depends on the masks in the DIB header
(if present). Also, 16bpp (unlike 24bpp) can carry an alpha channel.
This was previously not accounted for.
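Mask-driven decoding of a 16bpp pixel might look like the sketch below. It is
an illustration under assumptions, not the splash code: channel() and
decode_pixel() are hypothetical, masks are assumed to be contiguous bit runs,
and a missing alpha mask is treated as fully opaque.

```python
def channel(pixel: int, mask: int) -> int:
    """Extract the channel selected by a non-zero mask, scaled to 0..255."""
    shift = (mask & -mask).bit_length() - 1  # position of the lowest set bit
    maximum = mask >> shift                  # largest value the channel can hold
    return ((pixel & mask) >> shift) * 255 // maximum


def decode_pixel(pixel: int, r_mask: int, g_mask: int, b_mask: int, a_mask: int = 0):
    """Return (r, g, b, a); without an alpha mask the pixel is opaque."""
    alpha = channel(pixel, a_mask) if a_mask else 255
    return (channel(pixel, r_mask), channel(pixel, g_mask), channel(pixel, b_mask), alpha)


# The same 16 bits mean different things depending on the masks:
rgb565 = decode_pixel(0xF800, 0xF800, 0x07E0, 0x001F)
argb1555 = decode_pixel(0x7FFF, 0x7C00, 0x03E0, 0x001F, 0x8000)
```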
Currently, services use mount_move_root() in order to setup the root
directory of services using a mount namespace. This relies on MS_MOVE
and chroot(). However, this has serious drawbacks even for relatively
simple mount propagation scenarios.
What systemd currently does is roughly equivalent to the following shell
code:
    unshare --mount --propagation=shared
    cd /
    mount --make-rslave /
    mkdir /new-root
    mount --rbind / /new-root
    cd /new-root
    mount --move /new-root /
    chroot .
This looks simple enough but has the consequence that two separate mount
trees exist for the lifetime of the service. The first one was created
when the mount namespace was created, and the second one when a new
mount for the rootfs was created. The first mount tree sticks around as
a shadow mount tree. Both mount trees are dependent mounts with the host
rootfs as their dominating mount.
Now, when mount propagation is triggered by the host by e.g.,
    mount --bind /opt /mnt
it means that two propagation events are generated. I'm skipping over
the exact kernel details as they aren't that important. The gist is that
for every propagation event that is generated a second one is generated
for the shadow mount tree. In other words, the kernel creates two copies
for each mount that is propagated instead of one.
This isn't necessary. We can simply change the sequence above to:
    unshare --mount --propagation=shared
    cd /
    mount --make-rslave /
    mkdir /new-root
    # stash fd to old rootfs
    # stash fd to new rootfs
    mount --rbind / /new-root
    mkdir /new-root
    cd /new-root
    pivot_root . .
    # new root is tucked under old root
    # chdir into old rootfs via stashed fd
    umount -l /old-root
The pivot_root allows us to get rid of the old mount tree that was
created when the mount namespace was created. So after this sequence
only one mount tree is alive. Plus, it's safer and nicer. Moving mounts
isn't pleasant.
This patch doesn't convert nspawn yet as the requirements are more
tricky given that it wants to preserve the rootfs as a shared mount
which goes against pivot_root() requirements.
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
When attaching and /etc/systemd/system.attached can't be created or used
(eg: dead symlink) the logs are pretty much useless as even at debug
level there's no indication of what is going wrong.
Add some debug logs, and return a more specific error string over D-Bus.
This conditional with !empty_or_root(ctx->path) always returns false
because the most recent oomd_cgroup_context_acquire() call was with the
root cgroup. Make sure this test case can be reached by checking cgroup
instead of ctx->path.
While here, use an unused uid (61183) instead of the nobody uid so the
test case does not fail in unprivileged LXD containers.
Commit 652a4efb66 ("oomd: loosen the restriction on ManagedOOMPreference")
made the change to allow ManagedOOMPreference on a cgroup candidate when
the monitored cgroup and cgroup candidate are owned by the same user.
The commit assumed that this check was sufficient to continue allowing
ManagedOOMPreference on all cgroups owned by root. However, it caused a
regression for unprivileged LXD containers where e.g. /sys/fs/cgroup is
owned by nobody (uid=65534).
Fix this by explicitly allowing the ManagedOOMPreference if uid == 0 in
oomd_fetch_cgroup_oom_preference().
In our template file, we have jinja2 template markers, so the file
looks fairly messy. But once it's rendered, it looks pretty clean, except
that the columns are unaligned because of "-" in some lines in the first
column. Let's make them aligned.
I was looking at a bug in bugzilla about some boot loader issue, and it was
hard to say if the boot entry files were generated by our plugin or something
else. Add a header to make this clear.
kernel-install always invokes the plugins via absolute path, so $0 gives us
the full path of the location where the plugin is installed. This is what we want:
title Fedora Linux 37 (Workstation Edition)
# Boot Loader Specification type#1 entry
# File created by /usr/lib/kernel/install.d/90-loaderentry.install (systemd 252-409-g5028904^)
When relaxed checks are requested, let's not require the efi/xbootldr
directory to be the root of the filesystem. When building images, image
builders might install all efi/xbootldr files to a regular directory
first before packing them up into a partition. To allow bootctl to be
used in such scenarios to install systemd-boot, we need to relax the
fsroot check.
It's only used to avoid BLKDISCARD on individual partitions at the moment.
It can take a lot of time to run on very slow devices, so avoid it for
them too.
sd-stub has an opportunity to handle the seed the same way sd-boot does,
which would have benefits for UKIs when sd-boot is not in use. This
commit wires that up.
It refactors the XBOOTLDR partition discovery to also find the ESP
partition, so that it can access the random seed there.
This reenables epoll_pwait2() use, i.e. undoes the effect of
39f756d3ae.
Instead of just reverting that, this PR will change things so that we
strictly rely on glibc's new epoll_pwait2() wrapper (which was added
earlier this year), and drop our own manual fallback syscall wrapper.
That should nicely side-step any issues with correct syscall wrapping
definitions (which on some arch seem not to be easy, given the sigset_t
size final argument), by making this a glibc problem, not ours.
Given that the only benefit this delivers is time-outs more granular
than msec, it shouldn't really matter that we'll miss out on support
for this on systems with older glibcs.
This fixes some bugs that could lead to garbage getting appended to the
command line passed to the kernel:
1. The .cmdline section is not guaranteed to be NUL-terminated, but it
was used as if it was.
2. The conversion of the command line to ASCII that was passed to the
stub ate the NUL at the end.
3. LoadOptions is not guaranteed to be a NUL-terminated EFI string (it
really should be and generally always is, though).
This also fixes the inconsistent mangling of the command line. If the
.cmdline section was used, ASCII control chars (new lines in particular)
would not be converted to spaces.
As part of this commit, we optimize conversion for the generic code
instead of the (deprecated) EFI handover protocol. Previously we would
convert to ASCII/UTF-8 and then back to EFI string for the (now) default
generic code path. Instead we now convert to EFI string and mangle that
back to ASCII in the EFI handover protocol path.
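The first bug and the mangling rule can be modelled in a few lines. This is a
sketch only: section_to_cmdline() is a hypothetical helper, the input is
assumed to be ASCII, and the EFI string conversions are left out.

```python
def section_to_cmdline(section: bytes) -> str:
    """Turn raw section bytes into a command line string."""
    # Stop at the first NUL if there is one, otherwise use the whole
    # section: the data is not guaranteed to be NUL-terminated.
    end = section.find(b"\0")
    raw = section if end < 0 else section[:end]
    # Replace ASCII control characters (newlines included) with spaces.
    return "".join(" " if b < 0x20 or b == 0x7F else chr(b) for b in raw)
```

Bounding the read by the section size is what keeps trailing garbage out when
no NUL is present.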
In sd_bus_wait(), let's convert EINTR to a return code of 0, thus asking
the caller to loop again and enter sd_bus_process() again (which will
not find any queued events). This way we'll not return an error on
something that isn't really an error. This should typically make sure
things are properly handled by the caller, magically, without eating up
the event entirely, and still giving the caller time to run some code if
they want.
Sometimes, RTM_NEWLINK message with carrier is received earlier than
NL80211_CMD_CONNECT. To make SSID= or other WiFi related settings in
[Match] section work, let's try to reconfigure the interface.
Fixes a bug introduced by 96f5f9ef9a.
Fixes #25384.
If no root= switch is specified on the kernel command line we'll use the
root disk on which the partition the LoaderDevicePartUUID efi var is
located – as long as that partition is an ESP. Let's slightly liberalize
that and also allow it if that partition is an XBOOTLDR partition. This
ensures that UKIs spawned directly from XBOOTLDR work the same as those
from the ESP.
(Note that this makes no difference if sd-boot is in the mix, as in that
case LoaderDevicePartUUID is always set to the ESP, as that's where
sd-boot is located, and sd-boot will set the var first, sd-stub will
only set it later if it's not set yet.)
The compiler should recognize that these are constant expressions, but
let's better make this explicit, so that the linker can safely share the
initializations all over the place.
Now that the random seed is used on virtualized systems, there's no
point in having a random-seed-mode toggle switch. Let's just always
require it now, with the existing logic already being there to allow not
having it if EFI itself has an RNG. In other words, the logic for this
can now be automatic.
The second argument to _printf_() specifies where the arguments start. We need to
use 0 in two cases: when the args are in a va_list and can't be checked, and with journald
logging functions which accept multiple format strings with multiple argument sets,
which the _printf_ checker does not understand. But strv_extendf() can be checked.
Make sure that the sym_xyz function pointers have the types that the
functions we'll assign them have.
And of course, this found a number of incompatibilities right-away, in
particular in the bpf hookup.
(Doing this will trigger deprecation warnings from libbpf. I simply
turned them off locally now, since we are well aware of what we are
doing in that regard.)
There's one return type fix (bool → int), that actually matters I think,
as it might have created an incompatibility on some archs.
Removing the virtualization check might not be the worst thing in the
world, and would potentially get many, many more systems properly seeded
rather than not seeded. There are a few reasons to consider this:
- In most QEMU setups and most guides on how to setup QEMU, a separate
pflash file is used for nvram variables, and this generally isn't
copied around.
- We're now hashing in a timestamp, which should provide some level of
differentiation, given that EFI_TIME has a nanoseconds field.
- The kernel itself will additionally hash in: a high resolution time
stamp, a cycle counter, RDRAND output, the VMGENID uniquely
identifying the virtual machine, any other seeds from the hypervisor
(like from FDT or setup_data).
- During early boot, the RNG is reseeded quite frequently to account for
the importance of early differentiation.
So maybe the mitigating factors make the actual feared problem
significantly less likely and therefore the pros of having file-based
seeding might outweigh the cons of weird misconfigured setups having a
hypothetical problem on first boot.
For some firmware, replacing their own security arch instance with our
override using ReinstallProtocolInterface() is not enough as they will
not use it. This commit goes back to how this was done before by
directly modifying the security protocols.
Fixes: #25336
If the device path to text protocol is not available (looking angrily at
Apple) we would fail to boot because we cannot get the loaded image
path. As this is only used for cosmetic purposes, we can just silently
continue.
Fixes: #25363
Follow-up for #25368.
Let's consider ENOENT an expected error, and just debug log about it
(though, let's suffix it with `, ignoring.`). All other errors will log
loudly, as they are unexpected errors.
Usually if you specify a DNS server on some interface then we'll use
that interface to talk to it. Let's override this for localhost
addresses, as they only really make sense on "lo".
Fixes: #25397
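The override amounts to a single check, sketched here with Python's ipaddress
module. interface_for_server() is a hypothetical helper, and ifindex 1 for
"lo" is an assumption that matches the usual Linux setup.

```python
import ipaddress

LOOPBACK_IFINDEX = 1  # assumption: "lo" has ifindex 1


def interface_for_server(address: str, configured_ifindex: int) -> int:
    """Pick the interface used to talk to a DNS server."""
    if ipaddress.ip_address(address).is_loopback:
        return LOOPBACK_IFINDEX  # localhost addresses only make sense on "lo"
    return configured_ifindex
```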
We only allow a selected subset of syscalls from nspawn containers
and don't list any time64 variants (needed for 32-bit arches when
built using TIME_BITS=64, which is relatively new).
We allow sched_rr_get_interval which cpython's test suite makes
use of, but we don't allow sched_rr_get_interval_time64.
The test failures when run in an arm32 nspawn container on an arm64 host
were as follows:
```
======================================================================
ERROR: test_sched_rr_get_interval (test.test_posix.PosixTester.test_sched_rr_get_interval)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/var/tmp/portage/dev-lang/python-3.11.0_p1/work/Python-3.11.0/Lib/test/test_posix.py", line 1180, in test_sched_rr_get_interval
interval = posix.sched_rr_get_interval(0)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
PermissionError: [Errno 1] Operation not permitted
```
Then strace showed:
```
sched_rr_get_interval_time64(0, 0xffbbd4a0) = -1 EPERM (Operation not permitted)
```
This appears to be the only time64 syscall that isn't already included in one of
the sets listed in nspawn-seccomp.c that has a non-time64 variant. Checked
over each of the time64 syscalls known to systemd and verified that none
of the others had a non-time64-variant whitelisted in nspawn other than
sched_rr_get_interval.
Bug: https://bugs.gentoo.org/880131
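The check described above can be expressed as a small script. The table below
is deliberately partial and the allowlist is made up for the example. Neither
reflects the actual contents of nspawn-seccomp.c, and missing_time64() is a
hypothetical helper.

```python
# Partial table of time64 syscalls and their non-time64 counterparts.
TIME64_VARIANTS = {
    "clock_gettime64": "clock_gettime",
    "futex_time64": "futex",
    "ppoll_time64": "ppoll",
    "sched_rr_get_interval_time64": "sched_rr_get_interval",
}


def missing_time64(allowed: set) -> list:
    """Return time64 syscalls whose base variant is allowed but they are not."""
    return sorted(
        t64
        for t64, base in TIME64_VARIANTS.items()
        if base in allowed and t64 not in allowed
    )


# Hypothetical allowlist, for illustration only.
example_allowed = {
    "clock_gettime", "clock_gettime64",
    "futex", "futex_time64",
    "sched_rr_get_interval",
}
```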
The existing logic can't find the root device in scenarios where
the root has been replaced with an overlay. We support looking
at "/run/systemd/volatile-root" to find the original root, similar
to what systemd-repart and gpt-auto-generator do.
Instead of having fopen_temporary() create the file either next
to an existing file or in tmp/, let's split this up clearly into
two different functions, one for creating temporary files next to
existing files, and one for creating a temporary file in a directory.
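The split can be pictured with Python's tempfile module. Both function names
are hypothetical and do not correspond to the C functions introduced by this
change.

```python
import os
import tempfile


def temporary_next_to(target: str):
    """Create a temporary file in the same directory as target."""
    directory = os.path.dirname(os.path.abspath(target))
    prefix = "." + os.path.basename(target) + "."
    fd, path = tempfile.mkstemp(prefix=prefix, dir=directory)
    return os.fdopen(fd, "w"), path


def temporary_in(directory: str):
    """Create a temporary file inside the given directory."""
    fd, path = tempfile.mkstemp(dir=directory)
    return os.fdopen(fd, "w"), path
```

The first variant suits the write-then-rename pattern, since the temporary
file ends up on the same file system as the file it is going to replace.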
systemd supports /etc/machine-id being set to: uninitialized
In this case the expectation is that systemd creates a new
machine ID and replaces the value 'uninitialized' with the
effective machine ID. In the scope of kernel-install we
should also enforce the creation of a new machine ID in this
condition.
systemd-cryptenroll complains (but succeeds!) upon binding to a signed PCR
policy:
$ systemd-cryptenroll --unlock-key-file=/tmp/passphrase --tpm2-device=auto
--tpm2-public-key=... --tpm2-signature=..." /tmp/tmp.img
ERROR:esys:src/tss2-esys/esys_iutil.c:394:iesys_handle_to_tpm_handle() Error: Esys invalid ESAPI handle (40000001).
WARNING:esys:src/tss2-esys/esys_iutil.c:415:iesys_is_platform_handle() Convert handle from TPM2_RH to ESYS_TR, got: 0x40000001
ERROR:esys:src/tss2-esys/esys_iutil.c:394:iesys_handle_to_tpm_handle() Error: Esys invalid ESAPI handle (40000001).
WARNING:esys:src/tss2-esys/esys_iutil.c:415:iesys_is_platform_handle() Convert handle from TPM2_RH to ESYS_TR, got: 0x4000000
New TPM2 token enrolled as key slot 1.
The problem seems to be that Esys_LoadExternal() function from tpm2-tss
expects a 'ESYS_TR_RH*' constant specifying the requested hierarchy and not
a 'TPM2_RH_*' one (see Esys_LoadExternal() -> Esys_LoadExternal_Async() ->
iesys_handle_to_tpm_handle() call chain).
It all works because Esys_LoadExternal_Async() falls back to using the
supplied values when iesys_handle_to_tpm_handle() fails:
    r = iesys_handle_to_tpm_handle(hierarchy, &tpm_hierarchy);
    if (r != TSS2_RC_SUCCESS) {
        ...
        tpm_hierarchy = hierarchy;
    }
Note, TPM2_RH_OWNER was used on purpose to support older tpm2-tss versions
(pre https://github.com/tpm2-software/tpm2-tss/pull/1531), use meson magic
to preserve compatibility.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
If we call raise(), we lose the information from the original signal.
If we use rt_sigqueueinfo(), the original siginfo gets reused which
is helpful when debugging crashes.
We use mkfs.xfs's protofile (-p) support to achieve this. The
protofile is a description of the files that should be copied into
the filesystem. The format is described in the manpage of mkfs.xfs.
systemd-boot expects to be loaded from the ESP and is quite unhappy in case
the loaded image device path is something else. When running on qemu
this can easily happen though. Case one is direct kernel boot, i.e.
loading via 'qemu -kernel systemd-bootx64.efi'. Case two is sd-boot
being added to the ovmf firmware image and being loaded from there.
This patch detects both cases and goes on to inspect all file systems known to
the firmware, trying to find the ESP. When present the
VMMBootOrderNNNN variables are used to inspect the file systems in the
given order.