IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
We can't safely apply the fs-verity with signature until we have
booted with the new initrd, because the public key that matches the
signature is loaded from it. So, instead we save the .sig file next
to the compoosefs, and on the first boot we detect that it is there, and
the composefs file isn't fs-verity, so we apply it.
Things get a bit more complex due to having to temporarily make
/sysroot read-write for the fsverity operation too.
Instead of using pkg-config, etc we just include composefs.
In the end the library is just 5 c source files, and it is set up
to be easy to use as a submodule.
For now, composefs support is disabled by default.
When using composefs the root fs will always be read-only, but in this
case we should still continue remounting /sysroot. So, we record a
/run/ostree-composefs-root.stamp file in ostree-prepare-root if composefs
is used, and then react to it in ostree-remount.
In the case of composefs, we cannot compare the devino of the rootfs
and the deploy dir, because the root is the composefs mount, not a
bind mount. Instead we check the devino of the etc subdir of the
deploy, because this is a bind mount even when using composefs.
This changes ostree-prepare-root to use the .ostree.cfs image as a
composefs filesystem, instead of the checkout.
By default, composefs is used if support is built in and the .ostree.cfs
file exists in the deploy dir, otherwise we fall back to the old
method. However, if the ot-composefs kernel option is specified this
can be tweaked as per:
* off: Never use composefsz
* maybe: Use if possible
* on: Fail if not possible
* signed: Fail if the cfs image is not fs-verity signed with
a key in the keyring.
* digest=....: Fail if the cfs image does not match the specified
digest.
The final layout when composefs is active is:
/ ro overlayfs mount for composefs
/sysroot "real" root
/etc rw bind mount to $deploydir/etc
/var rw bind mount to $vardir
We also specify the $deploydir/.ostree-mnt directory as the (internal)
mountpoint for the erofs mount for composefs. This can be used to map
the root fs back to the deploy id/dir in use,
A further note: I didn't test the .usr-ovl-work overlayfs case, but a
comment mentions that you can't mount overlayfs on top of a readonly
mount. That seems incompatible with composefs. If this is needed we
have to merge that with the overlayfs that composefs itself sets up,
which is possible with the libcomposefs APIs.
In many cases, such as when using osbuild, we are not preparing the final
deployment but rather a rootfs tree that will eventually be copied to the
final location. In that case we don't want to apply the signature directly
but when the deployment is copied in place.
To make this situateion workable we also write the signature to a file
next to the composefs image file. Then whatever mechanism that does
the final copy can apply the signature.
This can be used as a composefs source for the root fs instead of
the checkout by pointing the basedir to /ostree/repo/objects.
We only write the file is `composefs` is enabled.
We enable ensure_rootfs_dirs when building the image which adds the
required root dirs to the image. In particular, this includes /etc
which often isn't in ostree commits in use.
We also create an (empty) .ostree.mnt directory, where composefs
will mount the erofs image that will be used as overlayfs lowerdir
for the root overlayfs mount. This way we can find the deploy
dir from the root overlayfs mount options.
If the commit has composefs digests recorded we verify those with the
created file. It also applies the fs-verity signature if it is
recorded, unless this is disabled with the
ex-integrity.composefs-apply-sign=false option.
If `composefs-apply-sig` is enabled (default no) we add an
ostree.composefs digest to the commit metadata. This can be verified
on deploy.
This is a separate option from the generic `composefs` option which
controls whether composefs is used during deploy. It is separate
because we want to not have to force use of fs-verity, etc during the
build.
If the `composefs-certfile` and `composefs-keyfile` keys in the
ex-integrity group are set, then the commit metadata also gets a
ostree.composefs-sig containing the signature of the composefs file.
This supports checking out a commit into a tree which is then
converted into a composefs image containing fs-verity digests for all
the regular files, and payloads that are relative to a the
`repo/objects` directory of a bare ostree repo.
Some specal files are always created in the image. This ensures that
various directories (usr, etc, boot, var, sysroot) exists in the
created image, even if they were not in the source commit. These are
needed (as bindmount targets) if you want to boot from the image. In
the non-composefs case these are just created as needed in the checked
out deploydir, but we can't do that here.
This is all controlled by the new ex-integrity config section, which
has the following layout:
```
[ex-integrity]
fsverity=yes/no/maybe
composefs=yes/no/maybe
composefs-apply-sig=yes/no
composefs-add-metadata=yes/no
composefs-keyfiile=/a/path
composefs-certfile=/a/path
```
The `fsverity` key overrides the old `ex-fsverity` section if
specified. The default for all these is for the new behaviour to be
disabled. Additionally, enabling composefs implies fsverity defaults
to `maybe`, to avoid having to set both.
The `f_bfree` member of the `statvfs` struct is documented as the
"number of free blocks". However, different filesystems have different
interpretations of this. E.g. on XFS, this is truly the number of blocks
free for allocating data. On ext4 however, it includes blocks that
are actually reserved by the filesystem and cannot be used for file
data. (Note this is separate from the distinction between `f_bfree` and
`f_bavail` which isn't relevant to us here since we're privileged.)
If a kernel and initrd is sized just right so that it's still within the
`f_bfree` limit but above what we can actually allocate, the early prune
code won't kick in since it'll think that there is enough space. So we
end up hitting `ENOSPC` when we actually copy the files in.
Rework the early prune code to instead use `fallocate` which guarantees
us that a file of a certain size can fit on the filesystem. `fallocate`
requires filesystem support, but all the filesystems we care about for
the bootfs support it (including even FAT).
(There's technically a TOCTOU race here that existed also with the
`statvfs` code where free space could change between when we check
and when we copy. Ideally we'd be able to pass down that fd to the
copying bits, but anyway in practice the bootfs is pretty much owned by
libostree and one doesn't expect concurrent writes during a finalization
operation.)
Flathub has hit the 10MB limit in 2022, and we had to drop less popular
CPU architectures from the main summary to subsummaries, effectively
cutting off users running too old Flatpak version. Despite that, the
main summary containing only x86_64 is already at 7MB. As this is
eventually going to happen to subsummaries as well, preemptively bump
the limit 12 times.
It takes between 2 and 3 years for a change like this to roll out across
Linux distributions so the best time for this was yesterday.
fixes#2715
Main motivation is prep for composefs in
https://github.com/ostreedev/ostree/pull/2640
In the interest of that, we add a `bool using_composefs` but
it's currently always `false`.
Co-authored-by: Alexander Larsson <alexl@redhat.com>
I've deprecated sh-inline; in the end I think it is better
to minimize the amount of bash code we have. xshell solves
the core convenience problem of taking local variables and mapping
them to command arguments.
A full port would be nontrivial; this just starts the ball
rolling.
During the early design of FCOS and RHCOS, we chose a value of 384M
for the boot partition. This turned out to be too small: some arches
other than x86_64 have larger initrds, kernel binaries, or additional
artifacts (like device tree blobs). We'll likely bump the boot partition
size in the future, but we don't want to abandon all the nodes deployed
with the current size.[[1]]
Because stale entries in `/boot` are cleaned up after new entries are
written, there is a window in the update process during which the bootfs
temporarily must host all the `(kernel, initrd)` pairs for the union of
current and new deployments.
This patch determines if the bootfs is capable of holding all the
pairs. If it can't but it could hold all the pairs from just the new
deployments, the outgoing deployments (e.g. rollbacks) are deleted
*before* new deployments are written. This is done by updating the
bootloader in two steps to maintain atomicity.
Since this is a lot of new logic in an important section of the
code, this feature is gated for now behind an environment variable
(`OSTREE_ENABLE_AUTO_EARLY_PRUNE`). Once we gain more experience with
it, we can consider turning it on by default.
This strategy increases the fallibility of the update system since one
would no longer be able to rollback to the previous deployment if a bug
is present in the bootloader update logic after auto-pruning (see [[2]]
and following). This is however mitigated by the fact that the heuristic
is opportunistic: the rollback is pruned *only if* it's the only way for
the system to update.
[1]: https://github.com/coreos/fedora-coreos-tracker/issues/1247
[2]: https://github.com/ostreedev/ostree/issues/2670#issuecomment-1179341883Closes: #2670