1472 Commits

Author SHA1 Message Date
Andrey Smirnov
e9077a6fb9
feat: filter the hostname to produce nodename
Fixes #7615

This extends the previous handling when Talos did `ToLower()` on the
hostname to do the full filtering as expected.
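
A minimal sketch of what such filtering could look like, assuming the target is a Kubernetes-style DNS subdomain name (lowercase alphanumerics, `-` and `.`, starting and ending with an alphanumeric). The helper name `nodenameFromHostname` and the exact rules are illustrative assumptions, not the actual Talos implementation:

```go
package main

import (
	"fmt"
	"strings"
)

// nodenameFromHostname is a hypothetical helper: it lowercases the hostname,
// drops every character outside [a-z0-9.-], and trims leading and trailing
// '-' and '.' so the result starts and ends with an alphanumeric character.
func nodenameFromHostname(hostname string) string {
	filtered := strings.Map(func(r rune) rune {
		switch {
		case r >= 'a' && r <= 'z', r >= '0' && r <= '9', r == '-', r == '.':
			return r
		default:
			return -1 // a negative value drops the character
		}
	}, strings.ToLower(hostname))

	return strings.Trim(filtered, "-.")
}

func main() {
	fmt.Println(nodenameFromHostname("Worker_01.Example.COM"))
}
```

Lowercasing alone would leave the underscore in place; the filter removes it as well.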

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-08-22 12:41:57 +04:00
Andrey Smirnov
dc8361c1d5
fix: properly GC images supplied with both tag and digest
This is a follow-up fix for #7640

I noticed that image cleanup controller cleans up the images if
specified with both tag and digest.

The problem was incorrectly building image references in the expected
set of images, so they were incorrectly marked as unused.

Refactor the code to make the core part testable.
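
As a rough illustration of the reference-building problem, the sketch below splits a reference that carries both a tag and a digest into the two aliases under which the image is stored. The function name `expectedRefs` is hypothetical and the string handling is simplified; it is not the controller's actual code:

```go
package main

import (
	"fmt"
	"strings"
)

// expectedRefs is a hypothetical helper that returns the references under
// which an image is expected to be stored. For "repo:tag@digest" it returns
// both "repo@digest" and "repo:tag"; for other references it returns the
// input unchanged (or just "repo@digest" when there is no tag).
func expectedRefs(ref string) []string {
	name, digest, hasDigest := strings.Cut(ref, "@")
	if !hasDigest {
		return []string{ref}
	}

	// A tag separator is a ':' that comes after the last '/', which keeps
	// registry ports such as "localhost:5000/app" intact.
	repo := name
	if i := strings.LastIndex(name, ":"); i > strings.LastIndex(name, "/") {
		repo = name[:i]
	}

	refs := []string{repo + "@" + digest}
	if repo != name {
		refs = append(refs, name)
	}

	return refs
}

func main() {
	for _, r := range expectedRefs("gcr.io/etcd-development/etcd:v3.5.9@sha256:abc") {
		fmt.Println(r)
	}
}
```

Keeping this logic in a pure function is what makes it easy to cover with table-driven tests.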

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-08-21 21:04:24 +04:00
Andrey Smirnov
b56e8b7d9b
fix: support 'List' type manifests
Fixes #7636

This supports `List`-type manifests by unwrapping them into individual
objects.
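
The unwrapping step can be sketched as follows, assuming the manifest has already been converted to JSON and that a `List` carries its members under `items`. The names `listManifest` and `unwrapList` are illustrative only:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// listManifest captures only the fields needed to detect and unwrap a List.
type listManifest struct {
	Kind  string            `json:"kind"`
	Items []json.RawMessage `json:"items"`
}

// unwrapList returns the individual objects of a List manifest, or the
// manifest itself as a single-element slice when it is not a List.
func unwrapList(manifest []byte) ([]json.RawMessage, error) {
	var l listManifest
	if err := json.Unmarshal(manifest, &l); err != nil {
		return nil, err
	}

	if l.Kind != "List" {
		return []json.RawMessage{json.RawMessage(manifest)}, nil
	}

	return l.Items, nil
}

func main() {
	items, err := unwrapList([]byte(`{"kind":"List","items":[{"kind":"ConfigMap"},{"kind":"Secret"}]}`))
	if err != nil {
		panic(err)
	}

	for _, item := range items {
		fmt.Println(string(item))
	}
}
```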

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-08-21 16:48:37 +04:00
Andrey Smirnov
574d48e540
fix: use image digest when starting a container
First of all, it seems to be the "right way", as it makes sure the image is
looked up by the digest.

Second, it fixes the case when an image is specified with both tag and
digest (which is not supposed to be the correct ref, but it is used
frequently).

Talos since 1.5.0 stores images with the following aliases:

```
gcr.io/etcd-development/etcd:v3.5.9
gcr.io/etcd-development/etcd@sha256:8c956d9b0d39745fa574bb4dbacd362ffdc1109479432f54094859d4cf984b17
ghcr.io/siderolabs/kubelet:v1.28.0
ghcr.io/siderolabs/kubelet@sha256:50710f2cd3328c23f57dfc7fb00940d8cfd402315e33fc7cb8184fc660650a5c
sha256:50710f2cd3328c23f57dfc7fb00940d8cfd402315e33fc7cb8184fc660650a5c
sha256:8c956d9b0d39745fa574bb4dbacd362ffdc1109479432f54094859d4cf984b17
```

This change pulls the digest format (the last in this list) and uses it
to start a container.
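
A small sketch of selecting the digest-only alias from a list like the one above. The helper `digestAlias` is hypothetical and operates on plain strings rather than on the container runtime's image objects:

```go
package main

import (
	"fmt"
	"strings"
)

// digestAlias returns the first alias that is a bare digest ("sha256:..."),
// i.e. one without any repository prefix. The boolean is false when no such
// alias is present.
func digestAlias(aliases []string) (string, bool) {
	for _, alias := range aliases {
		if strings.HasPrefix(alias, "sha256:") {
			return alias, true
		}
	}

	return "", false
}

func main() {
	aliases := []string{
		"ghcr.io/siderolabs/kubelet:v1.28.0",
		"ghcr.io/siderolabs/kubelet@sha256:50710f2cd3328c23f57dfc7fb00940d8cfd402315e33fc7cb8184fc660650a5c",
		"sha256:50710f2cd3328c23f57dfc7fb00940d8cfd402315e33fc7cb8184fc660650a5c",
	}

	if digest, ok := digestAlias(aliases); ok {
		fmt.Println(digest)
	}
}
```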

Fixes #7640

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-08-21 15:48:59 +04:00
Andrey Smirnov
ee6d639f6c
fix: match routes on the priority properly
Fixes #7592

The problem was a mismatch between a "primary key" (ID) of the
`RouteSpec` and the way routes are looked up in the kernel: with two
identical routes that differ only in priority, Talos would end up in an
infinite loop fighting to remove and re-add the same route, as the
priority never matches.
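
To illustrate the idea, the sketch below looks up a route using a key that includes the priority, so two otherwise identical routes are treated as distinct. The `routeKey` type and `findRoute` function are hypothetical simplifications, not the `RouteSpec` resource or the netlink lookup itself:

```go
package main

import "fmt"

// routeKey is a hypothetical lookup key; including Priority is the point of
// the sketch.
type routeKey struct {
	Destination string
	Gateway     string
	Priority    uint32
}

// findRoute returns the index of the kernel route that matches want on every
// field, including priority, or -1 when there is no such route.
func findRoute(kernel []routeKey, want routeKey) int {
	for i, r := range kernel {
		if r == want {
			return i
		}
	}

	return -1
}

func main() {
	kernel := []routeKey{
		{Destination: "10.0.0.0/8", Gateway: "192.168.1.1", Priority: 1024},
		{Destination: "10.0.0.0/8", Gateway: "192.168.1.1", Priority: 2048},
	}

	want := routeKey{Destination: "10.0.0.0/8", Gateway: "192.168.1.1", Priority: 2048}
	fmt.Println(findRoute(kernel, want))
}
```

A lookup that ignored the priority would always land on the first route and never find the second one.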

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-08-10 14:29:47 +04:00
Andrey Smirnov
e0f383598e
chore: clean up the output of the imager
Use `Progress`, and options to pass around the way messages are written.

Fixed some tiny issues in the code, but otherwise no functional changes.

To make colored output work with `docker run`, switched back image
generation to use volume mount for output (old mode is still
functioning, but it's not the default, and it works when docker is not
running on the same host).

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-08-07 16:00:14 +04:00
Andrey Smirnov
fb536af4d1
chore: optimize memory usage of tcell library on init
There are two changes here:

* build `machined` binary with `tcell_minimal` tag (which disables
  loading some parts of the terminfo database), which also affects
  `apid`, `trustd` and `dashboard` processes, as they run from the same
  executable; in `dashboard` explicitly import `linux` terminal we're
  using when the `dashboard` runs on the machine
* pass `TCELL_MINIMIZE=1` environment variable to each Talos process
  which removes 0.5MiB of runewidth allocation for a lookup table

See #7578

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-08-04 17:59:18 +04:00
Andrey Smirnov
0f1920bdda
chore: provide a resource to peek into Linux clock adjustments
This is a follow-up for #7567, which won't be backported to 1.5.

This allows getting output like:

```
$ talosctl -n 172.20.0.5 get adjtimestatus -w
NODE         *   NAMESPACE TYPE            ID     VERSION   OFFSET        ESTERROR   MAXERROR   STATUS               SYNC
172.20.0.5   +   runtime   AdjtimeStatus   node   47        -18.14306ms   0s         191.5ms    STA_PLL | STA_NANO   true
172.20.0.5       runtime   AdjtimeStatus   node   48        -17.109555ms  0s         206.5ms    STA_NANO | STA_PLL   true
172.20.0.5       runtime   AdjtimeStatus   node   49        -16.134923ms  0s         221.5ms    STA_NANO | STA_PLL   true
172.20.0.5       runtime   AdjtimeStatus   node   50        -15.21581ms   0s         236.5ms    STA_PLL | STA_NANO   true
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-08-03 22:06:53 +04:00
Jared Davenport
bcf2845307
fix: update providerid prefix for aws
This PR updates the ProviderID format for aws resources. There seems to
be a bug when using Talos CCM (which consumes this value from Talos)
because the format is `aws://x/y` (two slashes) vs. the expected
`aws:///x/y` (three slashes) that is set with the AWS CCM code
[here](d055109367/pkg/providers/v1/instances.go (L47-L53)).

Setting only two slashes causes important software in the workload
cluster to fail, specifically cluster-autoscaler. The regex they use for
pulling providerID is [here](702e9685d6/cluster-autoscaler/cloudprovider/aws/aws_cloud_provider.go (L195)).
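
For illustration, a sketch of building the three-slash form. It assumes that `x` and `y` stand for the availability zone and the instance ID; the function name `awsProviderID` is hypothetical:

```go
package main

import "fmt"

// awsProviderID formats a provider ID with three slashes after the scheme.
// The zone/instance-ID layout is an assumption made for this sketch.
func awsProviderID(zone, instanceID string) string {
	return fmt.Sprintf("aws:///%s/%s", zone, instanceID)
}

func main() {
	fmt.Println(awsProviderID("us-east-1a", "i-0123456789abcdef0"))
}
```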

Signed-off-by: Spencer Smith <spencer.smith@talos-systems.com>
2023-08-03 10:21:56 -04:00
Andrey Smirnov
793dcedc95
fix: fast-wipe the system disk on talosctl reset
Fixes #7558

I see no reason to keep old behavior (removing all partitions on the
disk), as it's only compatible with Talos itself.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-08-03 16:28:59 +04:00
Andrey Smirnov
87fe8f1a2a
feat: implement image generation profiles
Support full configuration for image generation, including image
outputs, support most features (where applicable) for all image output
types, unify image generation process.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-08-02 19:13:44 +04:00
Andrei Kvapil
10f958cf41
feat: network configuration improvements on the NoCloud platform
* support for bonding
* added interface selection by MAC address
* added routes management

Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-08-02 15:03:33 +04:00
Dmitriy Matrenichev
abf3831174
chore: remove cpu_manager_state on cpuManagerPolicy change
After `kubelet` is stopped, remove `/var/lib/kubelet/cpu_manager_state` if there are any changes in `cpuManagerPolicy`.
We do not add any other safeguards, so it is the user's responsibility to cordon/drain the node in advance.
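
A minimal sketch of the removal step, assuming the old and new policy values are available as strings. The function `resetCPUManagerState` is hypothetical, and the demo runs against a temporary directory rather than `/var/lib/kubelet`:

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// resetCPUManagerState removes the cpu_manager_state file from stateDir when
// the policy changed. A missing file is not treated as an error.
func resetCPUManagerState(stateDir, oldPolicy, newPolicy string) error {
	if oldPolicy == newPolicy {
		return nil
	}

	err := os.Remove(filepath.Join(stateDir, "cpu_manager_state"))
	if err != nil && !errors.Is(err, fs.ErrNotExist) {
		return err
	}

	return nil
}

func main() {
	dir, err := os.MkdirTemp("", "kubelet")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	statePath := filepath.Join(dir, "cpu_manager_state")
	if err := os.WriteFile(statePath, []byte("{}"), 0o600); err != nil {
		panic(err)
	}

	fmt.Println(resetCPUManagerState(dir, "none", "static"))
}
```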

Also minor fixes in other files.

Closes #7504

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-08-01 18:53:04 +03:00
Noel Georgi
68e6b98f7d
feat: add security state resource
Add security state resource that describes the state of Talos SecureBoot
and PCR signing key fingerprints.

The UKI fingerprint is currently not populated.

Fixes: #7514

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-07-31 22:02:08 +05:30
Andrey Smirnov
a17272cdda
chore: update hcloud API SDK to v2
There are no functional changes, but the SDK got updated to handle int ->
int64 changes. The v1 version is only supported until Sep 2023.

See https://github.com/hetznercloud/hcloud-go#support

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-28 19:00:10 +04:00
Andrey Smirnov
6d71bb8df2
refactor: replace google/gopacket with gopacket/gopacket
This new fork seems to be more active. The change itself doesn't fix any
memory allocation, but I submitted a PR for gopacket/gopacket:

https://github.com/gopacket/gopacket/pull/24

Also fix crazy alloc in `tui/components` (this is only relevant for
`talosctl`).

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-28 17:34:15 +04:00
Andrey Smirnov
846f37d84c
refactor: drop dependency on vmware/govmomi
This module was imported just for a single Go struct (for XML
unmarshalling), and it could be easily internalized.

The module causes significant allocation on startup:

```
init github.com/vmware/govmomi/vim25/types @23 ms, 1.4 ms clock, 1269864 bytes, 196 allocs
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-28 16:49:34 +04:00
Andrey Smirnov
ca0b32c514
refactor: update AWS SDK and http-getter to v2 versions
Both are much more modular and pull far fewer dependencies into the
Talos tree.

This solves the problem with allocations in AWS endpoints on import, and
removes a bunch of dependencies.

Raw binary size: -10 MiB.

Memory usage (not scientific): around -5 MiB for all Talos services.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-28 15:30:02 +04:00
Utku Ozdemir
355681ddab
fix: terminate dashboard gracefully on & switch back to tty1
- Make dashboard SIGTERM-aware
- Handle panics on dashboard and terminate it gracefully, so it resets the terminal properly
- Switch to TTY2 when it starts and back to TTY1 when it stops.

Closes siderolabs/talos#7516.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2023-07-27 16:00:23 +02:00
Andrey Smirnov
544cb4fe7d
refactor: accept partial machine configuration
This refactors code to handle partial machine config - only multi-doc
without v1alpha1 config.

This uses improvements from
https://github.com/cosi-project/runtime/pull/300:

* where possible, use `TransformController`
* use integrated tracker to reduce boilerplate

Sometimes fix/rewrite tests where applicable.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-27 17:00:42 +04:00
Andrey Smirnov
786e86f5b8
refactor: rewrite the way Talos acquires the machine configuration
Fixes #7453

The goal is to make it possible to load some multi-doc configuration
from the platform source (or persisted in STATE) before machine acquires
full configuration.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-24 14:26:42 +04:00
Andrey Smirnov
9ef4e5efca
fix: log explicitly when kubelet has no nodeIP match
Fixes #7487

When `.kubelet.nodeIP` filters yield no match, Talos should not start
the kubelet, as using empty address list results in `--node-ip=` empty
kubelet arg, which makes kubelet pick up "the first" address.

Instead, skip updating (creating) the nodeIP and log an explicit
warning.
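
The behaviour can be sketched with `net/netip`, assuming the filters are plain CIDR strings. The function `selectNodeIPs` is a hypothetical simplification of the controller logic:

```go
package main

import (
	"fmt"
	"log"
	"net/netip"
)

// selectNodeIPs returns the addresses that fall into any of the given CIDRs.
// When nothing matches it logs a warning and reports false, so the caller can
// skip updating the node IP instead of passing an empty list to the kubelet.
func selectNodeIPs(addrs, cidrs []string) ([]string, bool) {
	var matched []string

	for _, a := range addrs {
		addr, err := netip.ParseAddr(a)
		if err != nil {
			continue // ignore unparsable addresses in this sketch
		}

		for _, c := range cidrs {
			prefix, err := netip.ParsePrefix(c)
			if err != nil {
				continue
			}

			if prefix.Contains(addr) {
				matched = append(matched, a)

				break
			}
		}
	}

	if len(matched) == 0 {
		log.Printf("warning: no address matches nodeIP filters %v, skipping nodeIP update", cidrs)

		return nil, false
	}

	return matched, true
}

func main() {
	ips, ok := selectNodeIPs([]string{"10.0.0.5", "192.168.1.2"}, []string{"192.168.0.0/16"})
	fmt.Println(ips, ok)
}
```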

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-20 00:41:47 +04:00
Andrey Smirnov
6b39c6a4d3
fix: enable compression and bump gRPC max msg size
Fixes #7482

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-19 22:46:37 +04:00
Noel Georgi
2f2eca8617
chore: basic support for shutdown/poweroff flags
This adds basic support for shutdown/poweroff flags.
It can distinguish between halt/shutdown/reboot.

In the case of Talos, halt and shutdown are the same operation.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-07-19 23:35:32 +05:30
Noel Georgi
59d7d9344b
chore: use machined for shutdown, poweroff
Use the `machined` socket for `shutdown` and `poweroff` aliases. This
ensures that worker nodes do not have to wait on apid to start.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-07-19 21:48:15 +05:30
Dmitriy Matrenichev
2439bfb719
chore: explicitly add timestamps to machined logs
We can safely do it on `io.Writer` level, since `log.Logger.Output` (called by `Print|Printf`) pretty much promises
that every call to `Write` ends with `\n`.
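
A sketch of the approach: because each `Write` carries one complete line, a wrapper around the `io.Writer` can prepend the timestamp. The names `timestampWriter` and `formatLine` and the RFC 3339 format are assumptions for illustration:

```go
package main

import (
	"io"
	"log"
	"os"
	"time"
)

// formatLine prepends a timestamp and a space to one newline-terminated line.
func formatLine(ts string, p []byte) []byte {
	out := make([]byte, 0, len(ts)+1+len(p))
	out = append(out, ts...)
	out = append(out, ' ')

	return append(out, p...)
}

// timestampWriter stamps every Write call; this relies on the logger issuing
// exactly one Write per log line.
type timestampWriter struct {
	w io.Writer
}

func (t *timestampWriter) Write(p []byte) (int, error) {
	line := formatLine(time.Now().UTC().Format(time.RFC3339), p)
	if _, err := t.w.Write(line); err != nil {
		return 0, err
	}

	// Report the caller's byte count, not the longer stamped line.
	return len(p), nil
}

func main() {
	// Flags are set to 0 so the standard logger does not add its own timestamp.
	logger := log.New(&timestampWriter{w: os.Stdout}, "", 0)
	logger.Print("machined started")
}
```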

Closes #7439

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-07-19 18:29:17 +03:00
Noel Georgi
166d75fe88
fix: tpm2 encrypt/decrypt flow
The previous flow was using TPM PCR 11 values to bind the policy, which
means the TPM cannot unseal when the UKI changes. Now it's fixed to use PCR 7
which is bound to the SecureBoot state (SecureBoot status and
Certificates). This provides a full chain of trust bound to SecureBoot
state and signed PCR signature.

Also the code has been refactored to use PolicyCalculator from the TPM
library.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-07-14 23:58:59 +05:30
Dmitriy Matrenichev
5f34f5b41f
chore: rename api load balancer to KubePrism
Closes #7432

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-07-14 15:23:53 +03:00
Andrey Smirnov
53873b8444
refactor: move ukify into Talos code
This is an intermediate step to move parts of the `ukify` down to the main
Talos source tree, and call it from `talosctl` binary.

The next step will be to integrate it into the imager and move `.uki`
build out of the Dockerfile.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-13 19:14:32 +04:00
Noel Georgi
79365d9bac
feat: tpm2 based disk encryption
Support disk encryption using tpm2 and pre-calculated signed PCR values.

Fixes: #7266

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-07-12 20:41:28 +05:30
Andrey Smirnov
8017afb107
feat: implement CRI image management and pre-pull on K8s upgrade
Fixes #6391

Implement a set of APIs and commands to manage images in the CRI, and
pre-pull images on Kubernetes upgrades.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-11 19:25:10 +04:00
Andrey Smirnov
1c2f19b367
feat: update Kubernetes to 1.28.0-alpha.4
The Go modules were not tagged for alpha.4, so using alpha.3 tag.

Talos 1.5 will ship with Kubernetes 1.28.0.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-11 15:40:24 +04:00
Artem Chernyshev
cb226eec46
fix: rewrite encryption system information flow
Pass getter to the key handler instead of already fetched node uuid.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2023-07-10 19:07:46 +03:00
Andrey Smirnov
bd4f89f633
fix: disable dashboard on Azure, GCP and Scaleway
Fixes #7416

These platforms don't have video console access.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-10 17:05:56 +04:00
Andrey Smirnov
bdb96189fa
refactor: make maintenance service controller-based
Fixes #7430

Introduce a set of resources which look similar to other API
implementations: CA, certs, cert SANs, etc.

Introduce a controller which manages the service based on resource
state.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-10 15:41:52 +04:00
Andrey Smirnov
d23d04de2a
feat: seed the kernel random pool from the TPM
Use the TPM2 feature to provide high-quality random bytes.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-07 23:51:11 +04:00
LukasAuerbeck
c81ce8cfb0
feat: support controlplane resources configuration
Fixes #7379

Add the ability to configure the controlplane static pod resources via
APIServer, ControllerManager and Scheduler configs.

Signed-off-by: LukasAuerbeck <17929465+LukasAuerbeck@users.noreply.github.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-07 22:44:56 +04:00
Artem Chernyshev
ce63abb219
feat: add KMS assisted encryption key handler
Talos now supports a new type of encryption key which relies on sealing/unsealing randomly generated bytes with a KMS server:

```
systemDiskEncryption:
  ephemeral:
    keys:
      - kms:
          endpoint: https://1.2.3.4:443
        slot: 0
```
gRPC API definitions and a simple reference implementation of the KMS server can be found in this
[repository](https://github.com/siderolabs/kms-client/blob/main/cmd/kms-server/main.go).

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2023-07-07 19:02:39 +03:00
Andrey Smirnov
6be5a13d5d
feat: implement machine config documents for event and log streaming
Fixes #7228

Add some changes to make Talos accept partial machine configuration
without main v1alpha1 config.

With this change, it's possible to connect a machine already running
with machine configuration (v1alpha1); the following patch will connect
to a local SideroLink endpoint:

```yaml
apiVersion: v1alpha1
kind: SideroLinkConfig
apiUrl: grpc://172.20.0.1:4000/?jointoken=foo
---
apiVersion: v1alpha1
kind: KmsgLogConfig
name: apiSink
url: tcp://[fdae:41e4:649b:9303::1]:4001/
---
apiVersion: v1alpha1
kind: EventSinkConfig
endpoint: "[fdae:41e4:649b:9303::1]:8080"
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-01 00:22:44 +04:00
James Callahan
c02ada7d95
fix: capabilities including ALL should be uppercase
Pod security standard requires that ALL is in caps

Signed-off-by: James Callahan <james@wavesquid.com>
Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-06-29 12:58:30 +05:30
Noel Georgi
cbdf96d461
feat: support environment file for extensions
Supports setting `environmentFile` for Talos System Extension Services.

Fixes: #7316

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-06-28 00:21:13 +05:30
Andrey Smirnov
35d6adcb9a
fix: provide stashed META values before installation
Previously, if META values were supplied to the Talos ISO via an
environment variable, they were written down and available only after the
install. With this fix, values are also readable and available before
the installation runs (in maintenance mode).

Most of the PR is refactoring `meta.Value(s)` to be a shared library
which is used by the installer/imager and (now) Talos.

Also fixes an issue with not properly returning the `NotExist` error when
META is not yet available as a partition on disk.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-06-27 20:57:43 +04:00
Noel Georgi
bc371ecfda
chore: add /sbin/shutdown
Some tools like qemu-guest-agent, when run as an extension service, call
`/sbin/shutdown` instead of `/sbin/poweroff`. This adds handling for the
same.

Ref: https://github.com/siderolabs/extensions/pull/173

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-06-27 16:10:51 +05:30
Utku Ozdemir
0d313b9733
feat: add reboot-mode flag to talosctl upgrade
Allow specifying the reboot mode during upgrades by introducing `--reboot-mode` flag, similar to the `--mode` flag of the reboot command.

Closes siderolabs/talos#7302.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2023-06-26 17:37:19 +02:00
Markus Reiter
7ce87f20c3
fix: compare only basename of os.Args[0] in machined
This makes handling of `exec` more flexible.
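
A sketch of dispatching on the basename of `os.Args[0]` with `filepath.Base`. The helper `commandName` and the alias names used in the switch are illustrative assumptions:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// commandName returns only the last path element of argv[0], so the result is
// the same whether the binary is invoked as "/sbin/poweroff" or "poweroff".
func commandName(argv0 string) string {
	return filepath.Base(argv0)
}

func main() {
	switch commandName(os.Args[0]) {
	case "shutdown", "poweroff":
		fmt.Println("running as a power management alias")
	default:
		fmt.Println("running as the main binary")
	}
}
```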

Signed-off-by: Markus Reiter <me@reitermark.us>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-06-26 17:42:30 +04:00
Noel Georgi
e3f3f5794d
feat: implement revert for sd-boot
Implement revert for sd-boot.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-06-22 20:20:31 +05:30
Andrey Smirnov
fe0f46980f
feat: implement secure boot from disk
This includes sd-boot handling, EFI variables, etc.

There are some TODOs which need to be addressed to make things smooth.

Install to disk, upgrades work.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-06-16 20:15:16 +05:30
Dmitriy Matrenichev
445f5ad542
feat: support API server load balancer
This commit adds support for the API server load balancer. A quick way to enable it is during cluster creation, using the new `api-server-balancer-port` flag (0 by default, which means disabled). When enabled, all API requests will be routed across
cluster control plane endpoints.

Closes #7191

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-06-16 10:09:20 -04:00
Andrey Smirnov
19bc223de8
refactor: bootloader interface, labels
Move labels out of the bootloader interface, while moving copying assets
into the bootloader interface. GRUB is using one set of assets,
`sd-boot` will be using another one.

Fix the problem with `bootloader.Probe()` finding boot partition on the
host when it runs in a priv container, fixing issues with image creation
in the CI.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-06-14 17:33:11 +04:00
Noel Georgi
71a548d180
chore: generic bootloader implementation
This changes the bootloader code to be generic to support
multiple bootloader implementations.

Signed-off-by: Noel Georgi <git@frezbo.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-06-13 23:36:20 +04:00