567 Commits

Author SHA1 Message Date
Andrey Smirnov
807a9950ac
fix: use custom Talos/kernel version when generating UKI
See https://github.com/siderolabs/image-factory/issues/44

Instead of using constants, use proper Talos version and kernel version
discovered from the image.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-11-03 11:14:02 +04:00
Andrey Smirnov
6dc776b8aa
fix: when writing to META in the installer/imager, use fixed name
Use fixed partition name instead of trying to auto-discover by label.

Auto-discovery by label might hit completely wrong blockdevice.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-11-01 20:34:41 +04:00
Artem Chernyshev
ffa5e05cb9
fix: make Talos work on Rockpi 4c boards again
Suppress `efivars` `ENODEV` errors: skip mount and proceed with boot
sequence.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2023-10-25 13:51:38 +03:00
Andrey Smirnov
9dfae8467d
chore: update dependencies
Containerd 1.7.7, Linux 6.1.58.

Fixes #7859

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-10-17 17:41:38 +04:00
Thomas Way
b87092ab69
fix: handle secure boot state policy pcr digest error
This does not fix the underlying digest mismatch issue, but does handle the error and should provide
further insight into issues (if present).

Refs: #7828

Signed-off-by: Thomas Way <thomas@6f.io>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-10-09 18:24:56 +04:00
Thomas Way
336aee0fdb
fix: use tpm2 hash algorithm constants and allow non-SHA-256 PCRs
The conversion from TPM 2 hash algorithm to Go crypto algorithm will fail for
uncommon algorithms like SM3256. This can be avoided by checking the constants
directly, rather than converting them. It should also be fine to allow some non
SHA-256 PCRs.

Fixes: #7810

Signed-off-by: Thomas Way <thomas@6f.io>
Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-10-04 01:02:20 +05:30
Andrey Smirnov
2b548ad0d9
feat: update containerd to 1.7.x
Also update Linux and other pkgs.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-09-28 16:33:57 +04:00
Andrey Smirnov
a52d3cda3b
chore: update gen and COSI runtime
No actual changes, adapting to use new APIs.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-09-22 12:13:13 +04:00
Andrey Smirnov
5ca4d58dc9
fix: generate of modules.dep when on the machine
When running on the machine, the extensionTreePath is not writeable, so
create and clean up a temporary directory to host `modules.dep`
extension.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-09-20 15:51:22 +04:00
Andrey Smirnov
e3b4940588
fix: build CPU ucode correctly for early loader
Closes #7729

This follows the steps described in
https://www.kernel.org/doc/html/v6.1/x86/microcode.html#early-load-microcode

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-09-18 14:03:41 +04:00
Andrey Smirnov
c5bd0ac5cf
refactor: reimplement the depmod extension rebuilder
Drop loop device/mounts completely, use userspace utilities to extract
and lay over module trees in the tmpfs.

Discover kernel version automatically instead of hardcoding it to be
current one (required for Image Service).

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-09-15 21:51:42 +04:00
Andrey Smirnov
a096f05a56
chore: update gRPC library and enable shared write buffers
Fixes #7576

See https://github.com/grpc/grpc-go/pull/6309

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-09-13 21:27:46 +04:00
Noel Georgi
3d2dad4e69
chore: show securtiystate on dashboard
Show Talos SecurityState and MountStatus on dashboard.

Fixes: #7675

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-09-06 21:46:25 +05:30
Andrey Smirnov
2d3ac925ea
refactor: update NTP spike detector
See https://github.com/siderolabs/talos/issues/7080#issuecomment-1696105986

The NTP spike detector code was refactored out of the main NTP code so
that it can be unit-tested.

I dropped one check which I think is causing false-positives in the
spike detector (when NTP offset is higher than the RTT of the best
packet received so far).

The overall flow resembles the one in systemd-timesync, the current
implementation has this check:

6639ac474e/src/timesync/timesyncd-manager.c (L357-L360)

This check was introduced in the initial release, after some
refactoring:

3dbc762003 (diff-4aa9995f07bb31b9884d40a7634f5f6d30245dfd26ac27b89cd5fd3bd4eef56aR429-R431)

There is no equivalent of it in the RFC:

https://datatracker.ietf.org/doc/html/rfc5905#appendix-A.5.2

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-08-29 20:56:42 +04:00
Andrey Smirnov
3c9f7a7de6
chore: re-enable nolintlint and typecheck linters
Drop startup/rand.go, as since Go 1.20 `rand.Seed` is done
automatically.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-08-25 01:05:41 +04:00
Noel Georgi
6778ded29d
feat: add e2e-aws for nvidia extensions
Add e2e tests for nvidia

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-08-24 17:43:36 +05:30
Utku Ozdemir
ea0d6e8c6a
fix: prevent dashboard crashes when process info is not available
Processes and their info are not guaranteed to be present on the api-based data gathered by the dashboard. Therefore, we switch to using nil-safe access to the CPU time when rendering the process table.

Closes siderolabs/talos#7645.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2023-08-22 12:55:36 +02:00
Andrey Smirnov
86c94eff8d
refactor: docgen and config examples
Short version is: move from global variables/`init()` function into
explicit functions.

`docgen` was updated to skip creating any top-level global variables,
now `Doc` information is generated on the fly when it is accessed.
Talos itself doesn't marshal the configuration often, so in general it
should never be accessed for Talos (but will be accessed e.g. for
`talosctl`).

Machine config examples were changed manually from variables to
functions returning a value and moved to a separate file.

There are no changes to the output of `talosctl gen config`.

There is a small change to the generated documentation, which I believe
is a correct one, as previously due to value reuse it was clobbered with
other data.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-08-10 14:56:01 +04:00
Andrey Smirnov
e0f383598e
chore: clean up the output of the imager
Use `Progress`, and options to pass around the way messages are written.

Fixed some tiny issues in the code, but otherwise no functional changes.

To make colored output work with `docker run`, switched back image
generation to use volume mount for output (old mode is still
functioning, but it's not the default, and it works when docker is not
running on the same host).

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-08-07 16:00:14 +04:00
Andrey Smirnov
fb536af4d1
chore: optimize memory usage of tcell library on init
There are two changes here:

* build `machined` binary with `tcell_minimal` tag (which disables
  loading some parts of the terminfo database), which also affects
  `apid`, `trustd` and `dashboard` processes, as they run from the same
  executable; in `dashboard` explicitly import `linux` terminal we're
  using when the `dashboard` runs on the machine
* pass `TCELL_MINIMIZE=1` environment variable to each Talos process
  which removes 0.5MiB of runewdith allocation for a lookup table

See #7578

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-08-04 17:59:18 +04:00
Artem Chernyshev
7d688ccfeb
fix: make encryption config provider default to luks2 if not set
Fixes: https://github.com/siderolabs/talos/issues/7515

Rename `Kind` to `Provider` in the `v1alpha1_provider`.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2023-08-04 12:20:55 +03:00
Dmitriy Matrenichev
80238a05a6
chore: unify semver under github.com/blang/semver/v4
Currently, we use `github.com/coreos/go-semver/semver` and `github.com/hashicorp/go-version`
for version parsing. As we use `github.com/blang/semver/v4` in our other projects, and it
has more features, it makes sense to use it across the projects. It also doesn't allocate
like crazy in `KubernetesVersion.SupportedWith`.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-08-04 00:29:52 +03:00
Andrey Smirnov
4eab3017b0
fix: calculate log2i properly
Fixes #7080

The real bug was off-by-one in `log2i` implementation, other changes are
cleanups as `x/sys/unix` package now contains all the constants we need.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-08-03 21:17:58 +04:00
Andrey Smirnov
87fe8f1a2a
feat: implement image generation profiles
Support full configuration for image generation, including image
outputs, support most features (where applicable) for all image output
types, unify image generation process.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-08-02 19:13:44 +04:00
Andrey Smirnov
6d71bb8df2
refactor: replace google/gopacket with gopacket/gopacket
This new fork seems to be more active. The change itself doesn't fix any
memory allocation, but I submitted a PR for gopacket/gopacket:

https://github.com/gopacket/gopacket/pull/24

Also fix crazy alloc in `tui/components` (this is only relevant for
`talosctl`).

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-28 17:34:15 +04:00
Noel Georgi
a3a2aa8ef3
fix: use fast wipe for upgrade
As part of bootloader refactoring `go-blockdevice` was used for wiping
partitions in #7329, but used standard wipe which could be fast/slow
depending on the blockdevice support. Switch to using fast-wipe for
partitions. This should not affect `wipe` option in machineconfig.

Fixes: #7531

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-07-27 21:09:06 +05:30
Utku Ozdemir
355681ddab
fix: terminate dashboard gracefully on & switch back to tty1
- Make dashboard SIGTERM-aware
- Handle panics on dashboard and terminate it gracefully, so it resets the terminal properly
- Switch to TTY2 when it starts and back to TTY1 when it stops.

Closes siderolabs/talos#7516.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2023-07-27 16:00:23 +02:00
Andrey Smirnov
544cb4fe7d
refactor: accept partial machine configuration
This refactors code to handle partial machine config - only multi-doc
without v1alpha1 config.

This uses improvements from
https://github.com/cosi-project/runtime/pull/300:

* where possible, use `TransformController`
* use integrated tracker to reduce boilerplate

Sometimes fix/rewrite tests where applicable.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-27 17:00:42 +04:00
Noel Georgi
14966e718a
fix: skip over tpm2 1.2 devices
For rng seed and pcr extend, let's ignore if the device is not TPM2.0
based. Seal/Unseal operations would still error out since it's
explicitly user enabled feature.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-07-18 12:58:45 +05:30
Noel Georgi
166d75fe88
fix: tpm2 encrypt/decrypt flow
The previous flow was using TPM PCR 11 values to bound the policy which
means TPM cannot unseal when UKI changes. Now it's fixed to use PCR 7
which is bound to the SecureBoot state (SecureBoot status and
Certificates). This provides a full chain of trust bound to SecureBoot
state and signed PCR signature.

Also the code has been refactored to use PolicyCalculator from the TPM
library.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-07-14 23:58:59 +05:30
Andrey Smirnov
c8b7095c01
refactor: use tpm2 library to calculate policy hash
No real change, just using library to do the work (should be more
readable).

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-14 15:07:47 +04:00
Andrey Smirnov
53873b8444
refactor: move ukify into Talos code
This is intemediate step to move parts of the `ukify` down to the main
Talos source tree, and call it from `talosctl` binary.

The next step will be to integrate it into the imager and move `.uki`
build out of the Dockerfile.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-13 19:14:32 +04:00
Noel Georgi
79365d9bac
feat: tpm2 based disk encryption
Support disk encryption using tpm2 and pre-calculated signed PCR values.

Fixes: #7266

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-07-12 20:41:28 +05:30
Andrey Smirnov
06369e8195
fix: retry CRI pod removal, fix upgrade flow in the tests
It seems that CRI has a bit of eventual consistency, and it might fail
to remove a stopped pod failing that it's still running.

Rewrite the upgrade API call in the upgrade test to actually wait for
the upgrade to be successful, and fail immediately if it's not
successful. This should improve the test stability and it should make
it easier to find issues immediately.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-12 16:20:10 +04:00
Andrey Smirnov
8017afb107
feat: implement CRI image management and pre-pull on K8s upgrade
Fixes #6391

Implement a set of APIs and commands to manage images in the CRI, and
pre-pull images on Kubernetes upgrades.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-11 19:25:10 +04:00
Artem Chernyshev
936111ce06
fix: properly set up tls for KMS endpoint
The condition was inverted 🤦

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2023-07-10 21:10:02 +03:00
Artem Chernyshev
cb226eec46
fix: rewrite encryption system information flow
Pass getter to the key handler instead of already fetched node uuid.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2023-07-10 19:07:46 +03:00
Andrey Smirnov
d23d04de2a
feat: seed the kernel random pool from the TPM
Use the TPM2 feature to provide high-quality random bytes.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-07 23:51:11 +04:00
Andrey Smirnov
74de562b29
fix: mount hugepages with nosuid + nodev
Fixes #7445

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-07 21:57:19 +04:00
Artem Chernyshev
ce63abb219
feat: add KMS assisted encryption key handler
Talos now supports new type of encryption keys which rely on Sealing/Unsealing randomly generated bytes with a KMS server:

```
systemDiskEncryption:
  ephemeral:
    keys:
      - kms:
          endpoint: https://1.2.3.4:443
        slot: 0
```
gRPC API definitions and a simple reference implementation of the KMS server can be found in this
[repository](https://github.com/siderolabs/kms-client/blob/main/cmd/kms-server/main.go).

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2023-07-07 19:02:39 +03:00
Andrey Smirnov
35d6adcb9a
fix: provide stashed META values before installation
Previously, if META values were supplied to the Talos ISO via
environment variable, they will be written down and available after the
install. With this fix, values are also readable and available before
the installation runs (in maintenance mode).

Most of the PR is refactoring `meta.Value(s)` to be a shared library
which is used by the installer/imager and (now) Talos.

Also fixes an issue with not returning properly `NotExist` error when
META is not yet available as a partition on disk.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-06-27 20:57:43 +04:00
Noel Georgi
e3f3f5794d
feat: implement revert for sd-boot
Implement revert for sd-boot.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-06-22 20:20:31 +05:30
Andrey Smirnov
fe0f46980f
feat: implement secure boot from disk
This includes sd-boot handling, EFI variables, etc.

There are some TODOs which need to be addressed to make things smooth.

Install to disk, upgrades work.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-06-16 20:15:16 +05:30
Andrey Smirnov
19bc223de8
refactor: bootloader interface, labels
Move labels out of the bootloader interface, while moving copying assets
into the bootloader interface. GRUB is using one set of assets,
`sd-boot` will be using another one.

Fix the problem with `bootloader.Probe()` finding boot partition on the
host when it runs in a priv container, fixing issues with image creation
in the CI.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-06-14 17:33:11 +04:00
Noel Georgi
1c0c7933df
chore: cleanup partition code
Cleanup partition code to be explicit about `Format` and `Partition`
options.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-06-08 00:35:09 +05:30
Noel Georgi
e912c0dfcf
chore: use go-blockdevice for zeroing partitions
Use the `go-blockdevice` library to zero partitions.

Also added a test that writes `ones` to the partition and verifies its
zeroes after zeroing it.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-06-07 01:12:11 +05:30
Andrey Smirnov
bab484a405
feat: use stable network interface names
Use `udevd` rules to create stable interface names.

Link controllers should wait for `udevd` to settle down, otherwise link
rename will fail (interface should not be UP).

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-06-01 21:29:12 +04:00
Utku Ozdemir
196dfb99b0
fix: do not probe kernel args in dashboard if not needed
If the dashboard is run without the "Config URL" screen, do not initialize it, and do not probe the kernel args for the code parameter.

Refactor the dashboard to do not construct the unused screens at all.

Closes siderolabs/talos#7300.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2023-06-01 10:32:43 +02:00
Andrey Smirnov
badbc51e63
refactor: rewrite code to include preliminary support for multi-doc
`config.Container` implements a multi-doc container which implements
both `Container` interface (encoding, validation, etc.), and `Conifg`
interface (accessing parts of the config).

Refactor `generate` and `bundle` packages to support multi-doc, and
provide backwards compatibility.

Implement a first (mostly example) machine config document for
SideroLink API URL.

Many places don't properly support multi-doc yet (e.g. config patches).

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-05-31 18:38:05 +04:00
Andrey Smirnov
dc6764871c
refactor: move around config interfaces, make RawV1Alpha1 typed
See #7230

Refactor more config interfaces, move config accessor interfaces
to different package to break the dependency loop.

Make `.RawV1Alpha1()` method typed to avoid type assertions everywhere.

No functional changes.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-05-23 22:08:58 +04:00