IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
In man pages, horizontal space it at premium, and everything should
generally be indented with 2 spaces to make it more likely that the
examples fit on a user's screen.
C.f. 798d3a524e.
This teaches repart to look for the root block device both as the
backing for /sysroot and for /sysusr/usr.
The latter is a new addition, and starts making more sense with the next
commit. It's about supporting systems that are shipped with only a /usr/
fs, but where a root fs is allocated and formatted on first boot via
systemd-repart (or a similar tool). In this case it's useful to be able
to mount the ultimate /usr/ early on without mounting the root fs
right-away (simple because the rootfs might not exist yet, and we need
the repart data encoded in /usr/ to actually format it). Hence, instead
of requiring that we mount /sysroot/ first and /sysroot/usr/ second as
we did so far, let's rearrange things slightly:
1. We mount the /usr/ file system we discover to /sysusr/usr/
2. We mount the root file system we discover to /sysroot/
3. Once both are established we bind mount /sysusr/usr/ to /sysroot/usr/
And that' it. The first two steps can happen in either order, and we can
access /usr/ with or without a rootfs being around.
This commit implements nothing of the above. Instead, it teaches
systemd-repart to check both /sysroot/ and /sysusr/ for repart drop-ins,
and use the first of these hierarchies it finds populated. This way
systemd-repart can be spawned once /usr is mounted and it will work
correctly without root fs having to exist, or we can invoke it when the
root fs is already mounted, where it also will work correctly.
This changes the fstab-generator to handle mounting of /usr/ a bit
differently than before. Instead of immediately mounting the fs to
/sysroot/usr/ we'll first mount it to /sysusr/usr/ and then add a
separate bind mount that mounts it from /sysusr/usr/ to /sysroot/usr/.
This way we can access /usr independently of the root fs, without for
waiting to be mounted via the /sysusr/ hierarchy. This is useful for
invoking systemd-repart while a root fs doesn't exist yet and for
creating it, with partition data read from the /usr/ hierarchy.
This introduces a new generic target initrd-usr-fs.target that may be
used to generically order services against /sysusr/ to become available.
This tries to shorten the race of device reuse a bit more: let's ignore
udev database entries that are older than the time where we started to
use a loopback device.
This doesn't fix the whole loopback device raciness mess, but it makes
the race window a bit shorter.
This is similar to the preceding work to store the uevent seqnum, but
this stores the CLOCK_MONOTONIC timestamp.
Why? This allows to validate udev database entries, to determine if they
were created *after* we attached the device.
The uevent seqnum logic allows us to validate uevent, and the timestamp
database entries, hence together we should be able to validate both
sources of truth for us.
(note that this is all racy, just a bit less racy, since we cannot
atomically attach loopback devices and get the timestamp for it, the
same way we can't get the uevent seqnum. Thus is shortens the race
window, but doesn#t close it).
We already store a CLOCK_MONOTONIC timestamp for each device appearance,
let' make this queriable.
This is useful to determine whether a udev device database entry is from
a current appearance of the device or a previous one, by comparing it
with appropriately taken timestamps.
Let's drop all monitor uevent that were enqueued before we actually
started setting up the device.
This doesn't fix the race, but it makes the race window smaller: since
we cannot determine the uevent seqnum and the loopback attachment
atomically, there's a tiny window where uevents might be generated by
the device which we mistake for being associated with out use of the
loopback device.
Later, this will allow us to ignore uevents from earlier attachments a
bit better, as we can compare uevent seqnums with this boundary. It's
not a full fix for the race though, since we cannot atomically determine
the uevent and attach the device, but it at least shortens the window a
bit.
Previously, loop_device_make() would return the device fd in one success
code path, but not the other (where' we'd just return 0).
loop_device_open() returns it in all cases.
Hence, let's clean this up, and make sure in all success code paths of
both functions we return it (even though it strictly speaking is
redundant, since we return it in LoopDevice anyway, and currently noone
actually relies on this).
the `man systemd.service` say:
Defaults to the setting DefaultOOMPolicy= in systemd-system.conf(5) is set to
but there is no such line in this config.
This is the default value I extracted from
systemctl show --property=DefaultOOMPolicy
Even if we set up a loopback device read-only and mount it read-only
this means nothing, ext4 will still write through to the backing storage
file.
Yes, I lost 6h debugging time on this.
Apparently, we have to specify "norecovery" when mounting such file
systems, to force them into truly read-only mode. Let's do so.
Let's make the GPT partition flags configurable when creating new
partitions. This is primarily useful for the read-only flag (which we
want to set for verity enabled partitions).
This adds two settings for this: Flags= and ReadOnly=, which strictly
speaking are redundant. The main reason to have both is that usually the
ReadOnly= setting is the one wants to control, and it' more generic.
Moreover we might later on introduce inherting of flags from CopyBlocks=
partitions, where one might want to control most flags as is except for
the RO flag and similar, hence let's keep them separate.
When using systemd-repart as an installer that replicates the install
medium on another medium it is useful to reference the root
partition/usr partition or verity data that is currently booted, in
particular in A/B scenarios where we have two copies and want to
reference the one we currently use. Let's add a CopyBlocks=auto for this
case: for a partition that uses that we'll copy a suitable partition
from the host.
CopyBlocks=auto finds the partition to copy like this: based on the
configured partition type uuid we determine the usual mount point (i.e.
for the /usr partition type we determine /usr/, and so on). We then
figure out the block device behind that path, through dm-verity and
dm-crypt if necessary. Finally, we compare the partition type uuid of
the partition found that way with the one we are supposed to fill and
only use it if it matches (the latter is primarily important on
dm-verity setups where a volume is likely backed by two partitions and
we need to find the right one).
This is particularly fun to use in conjunction with --image= (where
we'll restrict the device search onto the specify device, for security
reasons), as this allows "duplicating" an image like this:
# systemd-repart --image=source.raw --empty=create --size=auto target.raw
If the right repart data is embedded into "source.raw" this will be able
to create and initialize a partition table on target.raw that carrries
all needed partitions, and will stream the source's file systems onto it
as configured.
So far we already had the CopyFiles= option in systemd-repart drop-in
files, as a mechanism for populating freshly formatted file systems with
files and directories. This adds MakeDirectories= in similar style, and
creates simple directories as listed. The option is of course entirely
redundant, since the same can be done with CopyFiles= simply by copying
in a directory. It's kinda nice to encode the dirs to create directly in
the drop-in files however, instead of providing a directory subtree to
copy in somehere, to make the files more self-contained — since often
just creating dirs is entirely sufficient.
The main usecase for this are GPT OS images that carry only a /usr/
tree, and for which a root file system is only formatted on first boot
via repart. Without any additional CopyFiles=/MakeDirectories=
configuration these root file systems are entirely empty of course
initially. To mount in the /usr/ tree, a directory inode for /usr/ to
mount over needs to be created. systemd-nspawn will do so automatically
when booting up the image, as will the initrd during boot. However, this
requires the image to be writable – which is OK for npawn and
initrd-based boots, but there are plenty tools where read-only operation
is desirable after repart ran, before the image was booted for the first
time. Specifically, "systemd-dissect" opens the image in read-only to
inspect its contents, and this will only work of /usr/ can be properly
mounted. Moreover systemd-dissect --mount --read-only won't succeed
either if the fs is read-only.
Via MakeDirectories= we now provide a way that ensures that the image
can be mounted/inspected in a fully read-only way immediately after
systemd-repart completed. Specifically, let's consider a GPT disk image
shipping with a file usr/lib/repart.d/50-root.conf:
[Partition]
Type=root
Format=btrfs
MakeDirectories=/usr
MakeDirectories=/efi
With this in place systemd-repart will create a root partition when run,
and add /usr and /efi into it as directory inods. This ensures that the
whole image can then be mounted truly read-only anf /usr and /efi can be
overmounted by the /usr partition and the ESP.
libfdisk appears to return NULL when encountering an empty partition
label, let's handle this sanely, and treat NULL and "" for the current
label as the same, but for the new label as distinct: there NULL means
nothing is set, and "" means an actual empty label.
This is similar to the --image= switch in the other tools, like
systemd-sysusers or systemd-tmpfiles, i.e. it apply the configuration
from the image to the image.
This is particularly useful for downloading minimized GPT image, and
then extending it to the desired size via:
# systemd-repart --image=foo.image --size=5G
Let's have one flag to request that when dissecting an image the
loopback device is made read-only, and another one to request that when
it is mounted to make it read-only. Previously both concepts were always
done read-only together.
(Of course, making the loopback device read-only but mounting it
read-write doesn't make too much sense, but the kernel should catch that
for us, no need to make restrictions from our side there)
Use-case for this: in systemd-repart we'd like to operate on images for
adding partitions. Thus we'd like to have the loopback device writable,
but if we read repart.d/ snippets from it, we want to do that read-only.