mirror of
https://github.com/ostreedev/ostree.git
synced 2025-01-25 10:04:14 +03:00
d3d3e4ea13
Quite a while ago we added staged deployments, which solved a bunch of issues around the `/etc` merge. However...a persistent problem since then is that any failures in that process that happened in the *previous* boot are not very visible. We ship custom code in `rpm-ostree status` to query the previous journal. But that has a few problems - one is that on systems that have been up a while, that failure message may even get rotated out. And second, some systems may not even have a persistent journal at all. A general thing we do in e.g. Fedora CoreOS testing is to check for systemd unit failures. We do that both in our automated tests, and we even ship code that displays them on ssh logins. And beyond that obviously a lot of other projects do the same; it's easy via `systemctl --failed`. So to make failures more visible, change our `ostree-finalize-staged.service` to have an internal wrapper around the process that "catches" any errors, and copies the error message into a file in `/boot/ostree`. Then, a new `ostree-boot-complete.service` looks for this file on startup and re-emits the error message, and fails. It also deletes the file. The rationale is to avoid *continually* warning. For example we need to handle the case when an upgrade process creates a new staged deployment. Now, we could change the ostree core code to delete the warning file when that happens instead, but this is trying to be a conservative change. This should make failures here much more visible as is.