docs/architecture-core.md: New file
This is long overdue. Some of this came up in recent conversation.
Let's keep up some continual background momentum on documentation,
just like CI.

Co-authored-by: Jonathan Lebon <jonathan@jlebon.com>
parent 5a79ca9035
commit 927e02100f

docs/architecture-core.md (normal file, 186 lines added)

@@ -0,0 +1,186 @@

---
nav_order: 5
---

# Architecture: RPM packages, ostree commits
{: .no_toc }

1. TOC
{:toc}

## RPMs + config -> single OSTree commit

On the compose side, the core idea behind generating the "base image"
is that we take a set of packages as input, along with other configuration
and data, and generate a single OSTree commit - a versioned filesystem
tree.

The same is true on the client side, except that it starts from a "base commit".

This document describes the "core" phases and steps in that
process, which apply on both the build/compose side and the client side.

## Philosophy: Every change is "from scratch"

For every change today, rpm-ostree generally rebuilds the target filesystem
"from scratch", adding accurate caching where needed. The goal is to
avoid [hysteresis](https://en.wikipedia.org/wiki/Hysteresis) ([related blog](https://blog.verbum.org/2020/08/22/immutable-%E2%86%92-reprovisionable-anti-hysteresis/)).

In other words, if you e.g. `rpm-ostree install foo` and then `rpm-ostree install bar`,
the new target filesystem tree will be regenerated "from scratch" and
all RPM `%post` scripts etc. will rerun.

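As a rough illustration of the "from scratch" idea, the toy model below treats the target tree as a pure function of the base tree and the full set of requested packages. Everything in it (`Tree`, `regenerate`, `install_in_order`, the fake payloads) is hypothetical and only meant to show why the result does not depend on the order in which packages were requested; it is not how rpm-ostree is implemented.

```python
from typing import Dict, Iterable, Set

# Hypothetical stand-in: a "tree" is just a mapping of path -> content.
Tree = Dict[str, str]


def regenerate(base: Tree, requested: Set[str]) -> Tree:
    """Rebuild the target tree from the pristine base plus the full package set."""
    tree = dict(base)  # always start from the base, never from the previous result
    for pkg in sorted(requested):  # deterministic order, independent of history
        tree[f"/usr/bin/{pkg}"] = f"payload of {pkg}"
    return tree


def install_in_order(base: Tree, order: Iterable[str]) -> Tree:
    """Simulate a series of single-package install requests."""
    requested: Set[str] = set()
    tree = dict(base)
    for pkg in order:
        requested.add(pkg)
        tree = regenerate(base, requested)  # full regeneration on every change
    return tree


BASE: Tree = {"/usr/bin/bash": "payload of bash"}
foo_then_bar = install_in_order(BASE, ["foo", "bar"])
bar_then_foo = install_in_order(BASE, ["bar", "foo"])
```

Because each step starts again from `BASE`, both sequences end in the same tree, which is the anti-hysteresis property described above.
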
## Overall architecture:

- For each package, download it and import it into an OSTree commit if necessary
- Unpack the "base" filesystem tree, if any, via hardlinks
- Determine an installation order, and unpack each package-ostree commit,
  again via hardlinks
- Run all the `%post` scripts (in install order)
- Run all the `%posttrans` scripts (in install order)
- Write the RPM database (if we had a "base commit", starting from that)
- If initramfs regeneration is enabled or the kernel was replaced,
  remove the base initramfs and run `dracut` to generate a new one.
- Ask libostree to commit the resulting filesystem tree, optimized
  by a (device, inode) -> checksum cache, so that files that weren't
  changed aren't re-checksummed.

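The last step relies on a (device, inode) -> checksum cache. A minimal sketch of that idea follows, assuming a plain in-memory dictionary and a SHA-256 over file content only. The helper name `checksum_with_cache` is made up for illustration, and real OSTree object checksums also cover file metadata, so treat this as a simplification rather than libostree's actual logic.

```python
import hashlib
import os
from typing import Dict, Tuple

# (st_dev, st_ino) -> hex digest; a hypothetical in-memory stand-in for the cache
DevIno = Tuple[int, int]


def checksum_with_cache(path: str, cache: Dict[DevIno, str]) -> str:
    """Return the SHA-256 of the file's content, reusing results for known inodes."""
    st = os.lstat(path)
    key = (st.st_dev, st.st_ino)
    cached = cache.get(key)
    if cached is not None:
        # Same inode as a file we already hashed (e.g. another hardlink to it).
        return cached
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    cache[key] = digest
    return digest
```

A cache like this is only sound if hardlinked files are never modified in place, which is one reason the checked-out tree has to be protected against writes through hardlinks.
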
### Generating the filesystem tree

In contrast to the above, traditional package managers like RPM are usually implemented in
a flow that does:

- Unpack package A
- Run `%post` script for A
- Update the package database with metadata for A
- Unpack package B
- Run `%post` script for B
- Update the package database with metadata for B
- ...
- Run `%posttrans` scripts

etc.

rpm-ostree, by comparison, maintains an OSTree commit corresponding
to each RPM package provided as input. On a client system,
you can see this in e.g. `ostree refs | grep rpmostree/pkg`
(assuming you have layered packages). On a build system,
these ostree commits will be stored in a repo at
`pkgcache-repo/` within the cache directory.

This acts as an optimized cache for regenerating the target
root filesystem. So for rpm-ostree, the flow is more like this:

- Unpack the filesystem tree for all packages
- Run all the `%post` scripts
- Run all the `%posttrans` scripts

...

rpm-ostree is effectively reimplementing large chunks of
the librpm userspace in order to make it use OSTree natively.

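The difference between the two flows is mostly one of ordering. The sketch below only produces the sequence of step names for each flow so they can be compared side by side; the function names and step strings are invented for illustration and nothing here touches real packages.

```python
from typing import List


def traditional_flow(packages: List[str]) -> List[str]:
    """Interleaved: each package is unpacked, scripted and recorded in turn."""
    steps: List[str] = []
    for pkg in packages:
        steps += [f"unpack {pkg}", f"%post {pkg}", f"rpmdb add {pkg}"]
    steps += [f"%posttrans {pkg}" for pkg in packages]
    return steps


def rpm_ostree_flow(packages: List[str]) -> List[str]:
    """Phased: all unpacks first, then all %post, then all %posttrans."""
    steps = [f"unpack {pkg}" for pkg in packages]
    steps += [f"%post {pkg}" for pkg in packages]
    steps += [f"%posttrans {pkg}" for pkg in packages]
    steps.append("write rpmdb")  # the database is written once, after the scripts
    return steps


if __name__ == "__main__":
    for name, flow in (("traditional", traditional_flow), ("rpm-ostree", rpm_ostree_flow)):
        print(name, "->", ", ".join(flow(["A", "B"])))
```
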
### Sandboxing scripts

On the build server side, it's obviously desirable to
have the "build" of an ostree commit for a target system
not affect the running host.

Similarly, on the client side, the default is to provide
"offline" updates that don't affect the running system.

As part of this, rpm-ostree currently uses the
[bubblewrap](https://github.com/containers/bubblewrap/)
tool to run each script in its own isolated container.

Today, scripts are run with real uid 0 (not in a user namespace),
but we [drop most capabilities](https://github.com/coreos/rpm-ostree/pull/1099).
Additionally, scripts can't see the real host root filesystem;
most notably, they do not see the real `/var` with all of the
system data. A good example of the benefit of this is
["tests: Add a test case for a %post that does rm -rf /"](https://github.com/coreos/rpm-ostree/pull/888).

In addition to bubblewrap, rpm-ostree uses `rofiles-fuse`
from the ostree project, which originally enforced the model that
a file that has multiple hardlinks is read-only, but
more recently gained `--copyup` support, which acts
in a similar fashion to the in-kernel `overlayfs`.
(See also https://github.com/ostreedev/ostree/issues/2281)

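To give a feel for what running a script under bubblewrap can look like, here is a hedged sketch that only builds a `bwrap` command line and does not execute it. The helper `bwrap_argv` and the particular selection of flags are assumptions made for illustration; this is not the argument list rpm-ostree actually passes, and in particular it drops all capabilities where the real code keeps some.

```python
from typing import List


def bwrap_argv(rootfs: str, script: str) -> List[str]:
    """Build (but do not run) a bubblewrap command line for one script."""
    return [
        "bwrap",
        "--unshare-pid", "--unshare-ipc", "--unshare-net",  # fresh namespaces
        "--die-with-parent",
        "--cap-drop", "ALL",    # illustrative; "most" capabilities are dropped in practice
        "--bind", rootfs, "/",  # the target tree becomes the script's /
        "--tmpfs", "/var",      # scratch /var instead of the host's system data
        "--proc", "/proc",
        "--dev", "/dev",
        "--chdir", "/",
        "/bin/sh", "-c", script,
    ]


if __name__ == "__main__":
    print(" ".join(bwrap_argv("/tmp/target-rootfs", "echo hello from %post")))
```

bubblewrap applies mount options in the order given, so the `--tmpfs /var` here lands on top of the bound target tree rather than underneath it.
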
### Kernel handling

ostree is entirely oriented around bootable filesystem trees;
its "source of truth" is the bootloader entries. It has opinions
about where the Linux kernel binaries are stored (the current
standard is in `/usr/lib/modules/$kver`).

In contrast, traditional RPM is unaware of what a kernel is; it's
just another package. Most higher level package managers such
as yum gained some special casing around the kernel: because
the running kernel can't be replaced without a reboot, traditional RPM
systems need to keep the kernel modules for the running kernel
around. For example, yum/dnf have a concept of "installonly" packages,
with a limit (`installonly_limit`, 3 by default in dnf) on how many
versions of the kernel package are kept installed side by side.

Additionally, for at least traditional Fedora derivatives with
yum/dnf, the initramfs is generated client side as part of
a kernel update.

But for rpm-ostree, the decision was made to default to a
pre-generated initramfs. Further, in order to implement transactional
upgrades, rpm-ostree needs to be in control of the initramfs
regeneration - it can't just be a script forked off without its
knowledge.

Further, for rpm-ostree, easily replacing the kernel (as well as userspace)
is intended to be a first-class operation; you need to be able to do that
in order to debug production issues, for example.

In contrast to the yum/dnf "installonly", for ostree there can be exactly one kernel per userspace
filesystem tree. To ostree, a "bootable ostree commit"
is the pair of (kernel, userspace).

rpm-ostree combines these two worlds, and goes to some
lengths to bend the libdnf stack to work this way. We reset
the "installonly" limit back to 1 to ensure we have exactly
one kernel. PR: https://github.com/coreos/rpm-ostree/pull/1228

Further, as noted above, rpm-ostree takes over the handling of
invoking `dracut` - just like other scripts, it is run inside
a container with just read-only access to the system. `dracut`
generates the initramfs CPIO archive, which we then place inside
the `/usr/lib/modules/$kver` location.

If client-side initramfs regeneration is enabled, we may selectively
provide desired configuration files into this process. PR: https://github.com/coreos/rpm-ostree/pull/2170

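The "exactly one kernel per filesystem tree" rule can be expressed as a small check over `/usr/lib/modules`. The sketch below is an illustration only: `find_kernel` is a hypothetical helper, and the file name `vmlinuz` inside each `$kver` directory is an assumption not stated in this document.

```python
import os
from typing import Tuple


def find_kernel(root: str) -> Tuple[str, str]:
    """Return (kver, kernel path) for the single kernel in a filesystem tree."""
    modules = os.path.join(root, "usr/lib/modules")
    found = []
    for kver in sorted(os.listdir(modules)):
        vmlinuz = os.path.join(modules, kver, "vmlinuz")  # assumed file name
        if os.path.isfile(vmlinuz):
            found.append((kver, vmlinuz))
    if len(found) != 1:
        # A bootable tree pairs one userspace with exactly one kernel.
        raise ValueError(f"expected exactly one kernel, found {len(found)}")
    return found[0]
```

Calling `find_kernel("/")` on a tree with several kernels installed would raise, which mirrors why the "installonly" limit is forced to 1.
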
### SELinux

Handling SELinux is very tricky, because the policy ships as a package that can affect
*every other package*. Specifically, the SELinux policy package
contains a vast set of regular expressions in `file_contexts`
to determine labeling.

For traditional librpm, labeling is handled by a plugin.

A major goal of OSTree from the start has been to ensure fully correct
handling of SELinux for the base operating system. The way
rpm-ostree handles this is by:

- Recompiling the policy as a `%posttrans` equivalent
- Loading the policy from the target root, and passing that loaded policy
  to libostree, which consults it for the label of each
  committed file.

This means that on an OSTree based system, the labels for the
files in the booted deployment (e.g. in `/usr`) are always
correct and set atomically - there's no need to relabel.

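To make the role of `file_contexts` more concrete, here is a toy lookup that maps a path to a label using a short list of regular expressions. The rules and the helper `label_for` are illustrative assumptions, and the "last matching rule wins" behavior is a deliberate simplification of how the real SELinux library orders and matches entries.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical, heavily trimmed rules in the spirit of `file_contexts`.
RULES: List[Tuple[str, str]] = [
    (r"/usr(/.*)?", "system_u:object_r:usr_t:s0"),
    (r"/usr/bin(/.*)?", "system_u:object_r:bin_t:s0"),
    (r"/etc(/.*)?", "system_u:object_r:etc_t:s0"),
]


def label_for(path: str, rules: List[Tuple[str, str]] = RULES) -> Optional[str]:
    """Return the context of the last rule whose regex matches the whole path."""
    label = None
    for pattern, context in rules:
        if re.fullmatch(pattern, path):
            label = context  # later (more specific) rules override earlier ones
    return label
```

Since the label of any file depends on the whole rule set, a change to the policy package can change the label of files owned by every other package, which is why labels are computed from the final policy at commit time.
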
#### SELinux policy storage location

Another major difference between traditional yum/dnf and
rpm-ostree based systems is the location of the SELinux
policy store database itself. rpm-ostree overrides it
to be back in `/etc`, whereas the RPM package moved it to `/var`
around the Fedora 24 timeframe. For more information
see https://bugzilla.redhat.com/show_bug.cgi?id=1290659
and the comments in `rpmostree-postprocess.cxx`.