This is useful to make the dissection logic at boot a bit safer, as we
can reference device nodes by diskseq.
This locks down dissection a bit, since it makes it harder to swap out
the backing device between the time we dissected and validated it, until
we actually mounted it.
This is not complete though, as /bin/mount would have to verify the
diskseq after opening the diskseq symlink again.
See: https://github.com/util-linux/util-linux/issues/1786
When we dissect images automatically, let's be a bit more conservative
with the file system types we are willing to mount: only mount common
file systems automatically.
Explicit mounts requested by admins should always be OK, but when we do
automatic mounts, let's not permit barely maintained, possibly legacy
file systems.
The list for now covers the four common writable and two common
read-only file systems. Sooner or later we might want to add more to the
list.
Also, it might make sense to eventually make this configurable via the
image dissection policy logic.
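
The allowlist idea above can be sketched as follows. This is a minimal Python illustration, not the actual implementation: the message only says "four writable and two read-only" file systems, so the six names below are an assumption, and `may_automount()` is a hypothetical helper.

```python
# Assumed allowlist: four writable and two read-only file systems.
# The concrete names are illustrative; the commit message does not list them.
AUTOMOUNT_FSTYPES = frozenset({
    "btrfs", "ext4", "vfat", "xfs",   # writable
    "erofs", "squashfs",              # read-only
})


def may_automount(fstype, explicit=False):
    """Return True if a file system of this type may be mounted.

    Explicit, admin-requested mounts are always allowed; automatic
    mounts are restricted to the allowlist.
    """
    if explicit:
        return True
    return fstype in AUTOMOUNT_FSTYPES
```

Keeping the explicit case as a separate early return mirrors the distinction drawn above: the restriction applies only to automatic mounts.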
-1 was used everywhere, but -EBADF or -EBADFD started being used in various
places. Let's make things consistent in the new style.
Note that there are two candidates:
  EBADF    9  Bad file descriptor
  EBADFD  77  File descriptor in bad state
Since we're initializing the fd, we're just assigning a value that means
"no fd yet", so it's just a bad file descriptor, and the first errno fits
better. If instead we had a valid file descriptor that became invalid because
of some operation or state change, the other errno would fit better.
In some places, initialization is dropped if unnecessary.
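
The convention can be illustrated outside of C as well. The following Python sketch only models the idea (a negative errno as the "no fd yet" marker); `close_and_reset()` is a hypothetical helper name, not a function from the tree.

```python
import errno
import os

# "No fd yet": a negative value that can never be a valid descriptor.
fd = -errno.EBADF


def close_and_reset(fd):
    """Close fd if it is valid, and return the "no fd" marker either way."""
    if fd >= 0:
        os.close(fd)
    return -errno.EBADF
```

Since every valid descriptor is non-negative, a single `fd >= 0` check covers both the old `-1` style and the new `-EBADF` style.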
This function checks if the external verity data referenced in
VeritySettings covers the specified partition (indicated via
designator).
Right now we use it in one place only; a later commit will add more callers.
Let's store the GPT partition flags in the dissected partition info.
Right now we won't actually use them for anything yet, but later we'll
add that, when enforcing policy on dissection.
Let's make sure we can probe file systems also when unprivileged:
instead of probing the partition block devices for file system
signatures, let's go via the original "whole" fd.
libblkid makes this easy actually, as it allows us to specify the
offset/size of the area to probe. And we have the partition
offsets/sizes anyway, so it's trivial for us to make use of.
This thus enables fs probing also when lacking privs and operating on
naked regular files without loopback devices or anything like this.
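
To show why an offset/size pair on the "whole" fd is enough, here is a hand-rolled Python sketch. It does not use libblkid; it only checks two well-known superblock magics (squashfs `hsqs` at offset 0, ext2/3/4 `0xEF53` at offset 1080) within a byte range. `probe_fstype()` is a hypothetical name.

```python
import os


def probe_fstype(fd, offset, size):
    """Guess the file system in the byte range [offset, offset + size) of fd.

    Only two magics are checked; a real prober knows many more.
    """
    def read_at(rel, length):
        # Never read beyond the end of the partition area.
        if rel + length > size:
            return b""
        return os.pread(fd, length, offset + rel)

    if read_at(0, 4) == b"hsqs":
        return "squashfs"
    if read_at(1080, 2) == b"\x53\xef":
        return "ext"  # ext2, ext3 and ext4 share this magic
    return None
```

Because only `pread()` on the original fd is needed, this works on a plain regular file without any loopback device or privileges.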
Currently, these two flags are implied by dissect_loop_device(), but
that's not right, because this means systemd-gpt-auto-generator will
dissect the root block device with these flags set and that's not
desirable: the generator should not cause the partition devices to be
created (we don't intend to use them right away after all, but expect
udev to find/probe them first, and then mount them through .mount units).
And there's no point in opening the partition devices, since we do not
intend to mount them via fds either.
Hence, rework this: instead of implying the flags, specify them
explicitly.
While we are at it, let's also rename the flags to make them more
descriptive:
DISSECT_IMAGE_MANAGE_PARTITION_DEVICES becomes
DISSECT_IMAGE_ADD_PARTITION_DEVICES, since that's really all this does:
add the partition devices via BLKPG.
DISSECT_IMAGE_OPEN_PARTITION_DEVICES becomes
DISSECT_IMAGE_PIN_PARTITION_DEVICES, since we not only open the devices,
but keep the devices open continuously (i.e. we "pin" them).
Also, drop the DISSECT_IMAGE_BLOCK_DEVICE combination flag, since it is
misleading, i.e. it suggests it was appropriate to specify on all
dissected block devices, but that's precisely not the case, see the
systemd-gpt-auto-generator case. My guess is that the confusion around
this was actually the cause for this bug we are addressing here.
Fixes: #25528
systemd-repart generates this in a suitably stable fashion, hence let's
actually use it as an identifier for the image. As a first step parse
it, and show it.
If multiple services using the same encrypted image start
simultaneously, one may deactivate the dm device while others are still
using it. Or, similarly, after (regular) partitions are dissected, another
process may try to remove them before we mount them.
To prevent such situations, let's keep the dissected and decrypted
partitions opened. Then, use the file descriptors when we mount the
partitions.
Fixes #24617.
Currently, dissect_image() is only called through dissect_loop_device(),
and the LoopDevice object already has the device name. Hence, it is not
necessary to get the device name in dissect_image().
Note, currently, for each call of dissect_loop_device_and_warn(), the
specified name is equivalent to the path passed to loop_device_make_by_path().
Hence, this should not change the current behavior.
The loading of an extension image from a symlink "NAME.raw" to
"NAME-VERSION.raw" failed because the release file name check worked
with the backing file of the loop device which already resolves the
symlink and thus the found name "NAME-VERSION" mismatched "NAME".
Pass the original filename and use it instead of the backing file
when available. This fixes the loading of "NAME.raw" extensions which
are a symlink to "NAME-VERSION.raw" as, e.g., may be the case when
systemd-sysupdate manages multiple versions.
Fixes https://github.com/systemd/systemd/issues/24293
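
The name mismatch and the fix can be modelled in a few lines of Python. This is a sketch under assumptions: `image_name()` and `extension_name()` are hypothetical helpers, and the real check compares against the extension-release file name rather than returning a string.

```python
import os


def image_name(path):
    """Strip the directory and a ".raw" suffix from an image path."""
    name = os.path.basename(path)
    return name[:-4] if name.endswith(".raw") else name


def extension_name(original_path, backing_file):
    """Prefer the name the caller asked for over the resolved backing file.

    The backing file of a loop device has symlinks resolved already, so
    it would yield "NAME-VERSION" where "NAME" is expected.
    """
    return image_name(original_path if original_path else backing_file)
```

With a symlink "NAME.raw" pointing at "NAME-VERSION.raw", passing the original path yields "NAME", while falling back to the backing file yields "NAME-VERSION".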
This reverts a major chunk of 75d7e04eb4
Now that the loopback device code already destroys the partitions we
don't have to do this here anymore.
I am sure the right place to delete the partitions is in the loopback
code, since we really only should do that for loopback devices, see
bug #24431, and not on "real" block devices.
I am also not convinced dropping partitions the dissection logic doesn't
care about is a good idea, after all. The dissection stuff should
probably not consider itself the "owner" of the block devices it
analyzes, but take a more passive role: figure out what is what, but not
modify it.
Fixes: #24431
When closing a loop device, the kernel will asynchronously remove
the probed partitions. This can lead to race conditions where we
try to reuse a partition device that still needs to be removed by
the kernel. To avoid such issues, let's explicitly try to remove
any partitions using BLKPG_DEL_PARTITION when we're done with an
image.
To make sure we don't try to remove partitions when we want them
to remain (e.g. systemd-dissect --mount), we add
dissected_image_relinquish() in a similar vein to loop_device_relinquish()
and decrypted_image_relinquish().
This revisits the mess around waiting for partition block devices in
the image dissection code. It implements a nice little trick:
Instead of waiting for the kernel to probe the partition table for us
and generate the block devices from it, we'll just do that ourselves.
How can we do it? Via the BLKPG_ADD_PARTITION ioctl, which the kernel has
supported for a while. This ioctl allows creating partition block
devices off "whole" block devices from userspace, without the partitions
necessarily being present in the partition table at all.
So, whenever we want a partition to be there, we'll just issue
BLKPG_ADD_PARTITION. This can either work, in which case we know the
partition is there, and can use it. Yay. Or it can fail with EBUSY,
which the kernel returns if a partition by the selected partition index
already exists (or if an existing partition overlaps with the new one).
But if that's the case, then that's also OK, because the partition will
already exist.
So, regardless of whether we win or the kernel wins, for us the outcome is the
same: the partition block device will exist after invoking the ioctl.
Yay.
Net effect: we are not dependent on asynchronous uevent messages to wait
for the devices. Instead we synchronously get what we need. This makes
us independent of the (apparently less than reliable) netlink transport,
and should almost always be quicker.
Hopefully addresses #17469 even on older kernels.
Fixes: #17469
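
The "either we win or the kernel wins" logic reduces to treating EBUSY as success. A minimal Python sketch of that control flow follows; `add_partition` is a hypothetical callable standing in for the BLKPG_ADD_PARTITION ioctl on the whole-disk fd, and the real code is C.

```python
import errno


def ensure_partition(add_partition, index, start, length):
    """Make sure partition `index` exists; return True if we created it.

    `add_partition` is a stand-in for issuing BLKPG_ADD_PARTITION; it is
    expected to raise OSError on failure, like an ioctl wrapper would.
    """
    try:
        add_partition(index, start, length)
    except OSError as e:
        if e.errno == errno.EBUSY:
            # The kernel got there first: the partition device exists
            # already, which is all we wanted.
            return False
        raise
    return True
```

Either return value means the partition block device exists once the call returns; only other errors propagate to the caller.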
The implementation of MountImageUnit()/systemctl mount-image was
changed to use a /proc/self/fd path as the source, but that causes
the dm-verity files autodiscovery to fail, as it looks for files
in the same directory as the image.
Use the original file path when setting up dm-verity.
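
Why the /proc/self/fd path breaks autodiscovery can be seen from how sidecar paths are derived. In this Python sketch the suffixes ".verity" and ".roothash" and the stripping of ".raw" are assumptions for illustration, and `verity_candidates()` is a hypothetical helper.

```python
def verity_candidates(image_path):
    """Return sidecar paths next to image_path (suffixes are assumed)."""
    if image_path.endswith(".raw"):
        base = image_path[:-4]
    else:
        base = image_path
    return [base + ".verity", base + ".roothash"]
```

For "/var/lib/img/app.raw" this yields paths in /var/lib/img, whereas for "/proc/self/fd/7" it yields "/proc/self/fd/7.verity", which does not exist. Hence the original path has to be used.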
Some parts of our tree used 'Architecture' for storing architectures,
others used ints. Let's unify on the former.
Inspired by #22952's rework of the 'Virtualization' enum.
The whole point of acquiring metadata is quite often to figure out why the
image does not pass verification. Refusing to provide metadata is just being
hostile to the user.
When called from other places (e.g. image_read_metadata()), verification is
still performed.
To allow dissecting images of architectures other than the native
(or secondary) one, we add a third designator 'OTHER' to represent
architectures other than the native or secondary one.
If no partitions of the native or secondary arch are available, we
check if a root partition of any other arch is available and use that
instead if we found one.
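
The fallback order can be sketched like this. It is a simplified Python model: `partitions` maps an architecture name to a partition, `pick_root()` is a hypothetical helper, and choosing the first foreign architecture in sorted order is an assumption, since the message does not say how ties are broken.

```python
def pick_root(partitions, native, secondary=None):
    """Pick a root partition: native arch, then secondary, then any other.

    Returns a (designator, partition) tuple, or None if there is no
    root partition at all.
    """
    if native in partitions:
        return ("native", partitions[native])
    if secondary is not None and secondary in partitions:
        return ("secondary", partitions[secondary])
    for arch in sorted(partitions):
        # Assumed tie-break: first foreign architecture in sorted order.
        return ("other", partitions[arch])
    return None
```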
This should make things a bit more robust since it ensures system
extensions can only be applied to the right environments. Right now three
different "scopes" are defined:
1. "system" (for regular OS systems, after the initrd transition)
2. "initrd" (for sysext images that apply to the initrd environment)
3. "portable" (for sysext images that apply to portable images)
If not specified we imply a default of "system portable", i.e. any image
where the field is not specified is implicitly OK for application to OS
images and for portable services – but not for initrds.
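
The scope check with its implied default can be sketched as follows. This is a minimal Python model; `scope_allows()` is a hypothetical helper, and the field value is assumed to be a whitespace-separated list, as the default "system portable" suggests.

```python
DEFAULT_SCOPE = "system portable"


def scope_allows(scope_field, environment):
    """Return True if an extension with this scope field may be applied.

    `scope_field` is None (or empty) when the image does not specify
    the field, in which case the default applies.
    """
    scopes = (scope_field or DEFAULT_SCOPE).split()
    return environment in scopes
```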
Let's also pick more precise names for these helpers that are used for
the tabular output: one checks whether a partition is a candidate for
verity at all, and the other checks if it is ready to be used for it.
Let's make this clearer in the name.
Let's make the booleans indicating verity state a bit more descriptive.
Let's rename:
can_verity → has_verity: because that's really what this is about:
whether verity data is included in the image. Whether we actually
can use it is a different story.
verity → verity_ready: this one should tell us if we have everything
needed to actually set it up, hence explicitly say "ready to use" in
the name.
No change in behaviour. Just a bit of renaming.
DISKSEQ is a reliable way to find out if we missed a uevent or not, as
it's monotonically increasing. If we parse an event with a smaller or
no sequence number, we know we need to wait longer. If we parse an
event with a greater sequence number, we know we missed it and the
device was reused.
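
The three cases can be written down directly. A small Python sketch; `classify_event()` and its return strings are hypothetical names chosen for illustration.

```python
def classify_event(expected, seen):
    """Compare the DISKSEQ of a uevent with the one we are waiting for.

    `seen` is None if the event carries no sequence number.
    """
    if seen is None or seen < expected:
        return "wait"    # older event: ours has not arrived yet
    if seen > expected:
        return "missed"  # newer event: the device was reused
    return "match"
```

This only works because the sequence number is monotonically increasing: a larger number can never belong to an earlier use of the device.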
systemd-repart can grow partitions dynamically at boot, but it won't
grow the file systems inside them. In /etc/fstab you can request that
via x-systemd.growfs. So far we didn't have a nice scheme for images
with GPT auto-discovery however, and that meant in particular in tools
such as systemd-nspawn the file systems couldn't be grown automatically.
Let's address this: let's define a new GPT partition flag that can be
set for our partition types. If set it indicates that the file system
should be grown to the partition size on mount.
This commit adds the flag and adds code to discover it when dissecting
images. There's no code yet to actually do something about it.
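
Discovering such a flag while dissecting is a single bit test on the partition's 64-bit GPT attribute field. In the Python sketch below the bit position (59) is an assumption, since the message does not state it, and `wants_growfs()` is a hypothetical helper.

```python
# Assumed bit position; the commit message does not state which bit is used.
GPT_FLAG_GROWFS = 1 << 59


def wants_growfs(gpt_flags):
    """Return True if the partition asks for its file system to be grown."""
    return bool(gpt_flags & GPT_FLAG_GROWFS)
```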
This tries to shorten the race of device reuse a bit more: let's ignore
udev database entries that are older than the time when we started to
use a loopback device.
This doesn't fix the whole loopback device raciness mess, but it makes
the race window a bit shorter.