4378 Commits

Author SHA1 Message Date
Andrey Smirnov
0f1920bdda
chore: provide a resource to peek into Linux clock adjustments
This is a follow-up for #7567, which won't be backported to 1.5.

This makes it possible to get output like:

```
$ talosctl -n 172.20.0.5 get adjtimestatus -w
NODE         *   NAMESPACE TYPE            ID     VERSION   OFFSET        ESTERROR   MAXERROR   STATUS               SYNC
172.20.0.5   +   runtime   AdjtimeStatus   node   47        -18.14306ms   0s         191.5ms    STA_PLL | STA_NANO   true
172.20.0.5       runtime   AdjtimeStatus   node   48        -17.109555ms  0s         206.5ms    STA_NANO | STA_PLL   true
172.20.0.5       runtime   AdjtimeStatus   node   49        -16.134923ms  0s         221.5ms    STA_NANO | STA_PLL   true
172.20.0.5       runtime   AdjtimeStatus   node   50        -15.21581ms   0s         236.5ms    STA_PLL | STA_NANO   true
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-08-03 22:06:53 +04:00
Andrey Smirnov
4eab3017b0
fix: calculate log2i properly
Fixes #7080

The real bug was an off-by-one in the `log2i` implementation; the other
changes are cleanups, as the `x/sys/unix` package now contains all the
constants we need.
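
For illustration only, here is a minimal sketch of an integer base-2 logarithm. The message does not say what the original off-by-one looked like, so the function body, its handling of zero, and the use of `math/bits` are assumptions rather than the actual Talos code:

```go
package main

import (
	"fmt"
	"math/bits"
)

// log2i returns floor(log2(v)) for v > 0, and 0 for v == 0.
// Hypothetical reimplementation for illustration; not the Talos code.
func log2i(v uint64) int {
	if v == 0 {
		return 0
	}

	// bits.Len64 returns the number of bits needed to represent v,
	// which is floor(log2(v)) + 1. Dropping the "- 1" is one typical
	// way to end up with an off-by-one result.
	return bits.Len64(v) - 1
}

func main() {
	for _, v := range []uint64{1, 2, 3, 4, 1024} {
		fmt.Printf("log2i(%d) = %d\n", v, log2i(v))
	}
}
```

Checking the boundaries around exact powers of two (for example 1023, 1024 and 1025) is a cheap way to catch this class of bug.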

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-08-03 21:17:58 +04:00
Jared Davenport
bcf2845307
fix: update providerid prefix for aws
This PR updates the ProviderID format for aws resources. There seems to
be a bug when using Talos CCM (which consumes this value from Talos)
because the format is `aws://x/y` (two slashes) vs. the expected
`aws:///x/y` (three slashes) that is set with the AWS CCM code
[here](d055109367/pkg/providers/v1/instances.go (L47-L53)).

Setting only two slashes causes important software in the workload
cluster to fail, specifically cluster-autoscaler. The regex they use for
pulling providerID is [here](702e9685d6/cluster-autoscaler/cloudprovider/aws/aws_cloud_provider.go (L195)).
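
The difference between the two forms can be sketched as follows. The helper names, the sample zone and instance ID, and the validation pattern are all hypothetical; in particular, the pattern is a simplified stand-in and not the regex used by cluster-autoscaler:

```go
package main

import (
	"fmt"
	"regexp"
)

// awsProviderID builds a provider ID in the three-slash form
// (aws:///x/y). Function name and parameters are hypothetical.
func awsProviderID(zone, instanceID string) string {
	return fmt.Sprintf("aws:///%s/%s", zone, instanceID)
}

// threeSlashID is an illustrative pattern only; it is NOT the actual
// cluster-autoscaler regex.
var threeSlashID = regexp.MustCompile(`^aws:///[-0-9a-z]+/[-0-9a-z]+$`)

// isValidProviderID reports whether id uses the three-slash form.
func isValidProviderID(id string) bool {
	return threeSlashID.MatchString(id)
}

func main() {
	good := awsProviderID("us-east-1a", "i-0123456789abcdef0")
	bad := "aws://us-east-1a/i-0123456789abcdef0" // two slashes

	fmt.Println(good, isValidProviderID(good))
	fmt.Println(bad, isValidProviderID(bad))
}
```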

Signed-off-by: Spencer Smith <spencer.smith@talos-systems.com>
2023-08-03 10:21:56 -04:00
Christian Rolland
ac2aff5cc5
fix: fix azure portion of cloud uploader
Correctly propagate errors back. Drop ARM templates and use native APIs.
Correctly handle restarted runs for creating image versions. fixes #7512.

Signed-off-by: Christian Rolland <christian.rolland@siderolabs.com>
2023-08-03 09:38:16 -04:00
Andrey Smirnov
793dcedc95
fix: fast-wipe the system disk on talosctl reset
Fixes #7558

I see no reason to keep old behavior (removing all partitions on the
disk), as it's only compatible with Talos itself.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-08-03 16:28:59 +04:00
Noel Georgi
76fa45afba
docs: update cilium instructions
Update cilium instructions to skip mounting `bpffs`.

Also fix the TPM example in release notes.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-08-03 14:47:23 +05:30
Andrey Smirnov
87fe8f1a2a
feat: implement image generation profiles
Support full configuration for image generation, including image
outputs, support most features (where applicable) for all image output
types, unify image generation process.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-08-02 19:13:44 +04:00
Andrey Smirnov
e685208ce5
chore: update go 1.20.7
Some final bumps for the go.mod before going beta.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-08-02 17:11:51 +04:00
Andrei Kvapil
10f958cf41
feat: network configuration improvements on the NoCloud platform
* support for bonding
* added interface selection by MAC address
* added routes management

Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-08-02 15:03:33 +04:00
Noel Georgi
5adeb5042f
feat: update extension spec allowlist for opengl
NVIDIA OpenGL/Vulkan files are super hard-coded.

Ref: https://github.com/siderolabs/extensions/pull/191

Fixes: https://github.com/siderolabs/extensions/issues/171

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-08-02 04:06:09 +05:30
Dmitriy Matrenichev
abf3831174
chore: remove cpu_manager_state on cpuManagerPolicy change
After `kubelet` is stopped, remove `/var/lib/kubelet/cpu_manager_state` if `cpuManagerPolicy` has changed.
We do not add any other safeguards, so it's the user's responsibility to cordon/drain the node in advance.
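
A rough sketch of the idea, assuming the old and new policy values are available as plain strings. The function names and the demo helper are hypothetical, and the demo works on a temporary directory rather than `/var/lib/kubelet`:

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// removeStateIfPolicyChanged deletes the state file when the policy
// differs. It reports whether a file was actually removed. A missing
// file is not treated as an error.
func removeStateIfPolicyChanged(statePath, oldPolicy, newPolicy string) (bool, error) {
	if oldPolicy == newPolicy {
		return false, nil
	}

	err := os.Remove(statePath)
	if err != nil && !errors.Is(err, fs.ErrNotExist) {
		return false, err
	}

	return err == nil, nil
}

// runDemo exercises the helper against a throwaway directory.
func runDemo() (removedOnChange, removedOnSame bool, err error) {
	dir, err := os.MkdirTemp("", "kubelet-demo")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)

	state := filepath.Join(dir, "cpu_manager_state")
	if err = os.WriteFile(state, []byte("{}"), 0o600); err != nil {
		return false, false, err
	}

	// Same policy: the file must stay in place.
	if removedOnSame, err = removeStateIfPolicyChanged(state, "none", "none"); err != nil {
		return false, false, err
	}

	// Changed policy: the file is removed.
	removedOnChange, err = removeStateIfPolicyChanged(state, "none", "static")

	return removedOnChange, removedOnSame, err
}

func main() {
	changed, same, err := runDemo()
	fmt.Println(changed, same, err)
}
```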

Also minor fixes in other files.

Closes #7504

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-08-01 18:53:04 +03:00
Andrey Smirnov
018e7f5871
chore: bump dependencies
Linux: 6.1.42
containerd: 1.6.22
Flannel: 0.22.1

And some other Go module bumps, new pkgs/tools/extras.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-31 22:33:22 +04:00
Noel Georgi
68e6b98f7d
feat: add security state resource
Add security state resource that describes the state of Talos SecureBoot
and PCR signing key fingerprints.

The UKI fingerprint is currently not populated.

Fixes: #7514

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-07-31 22:02:08 +05:30
Noel Georgi
209c34801e
chore: drop with-secureboot talosctl flag
The code picks up firmware files in the order they are defined. The
secureboot QEMU firmware files are defined first, so this flag is a
no-op. It was left over from when `ovmfctl` was used.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-07-31 17:33:12 +04:00
Steve Francis
ab14905d98
docs: note that Talos API requires TCP only load balancer, not HTTPS
Same note for Kubernetes API.

Signed-off-by: Steve Francis <steve.francis@talos-systems.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-31 15:26:13 +04:00
Andrey Smirnov
078c29c733
chore: re-enable cloud images step
The step was disabled for the latest alpha release to work around AWS
issues, which have since been resolved.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-31 14:55:35 +04:00
Andrey Smirnov
a17272cdda
chore: update hcloud API SDK to v2
There are no functional changes, but the SDK was updated to handle int ->
int64 changes. The v1 version is only supported until Sep 2023.

See https://github.com/hetznercloud/hcloud-go#support

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-28 19:00:10 +04:00
Andrey Smirnov
6d71bb8df2
refactor: replace google/gopacket with gopacket/gopacket
This new fork seems to be more active. The change itself doesn't fix any
memory allocation, but I submitted a PR for gopacket/gopacket:

https://github.com/gopacket/gopacket/pull/24

Also fix crazy alloc in `tui/components` (this is only relevant for
`talosctl`).

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-28 17:34:15 +04:00
Andrey Smirnov
846f37d84c
refactor: drop dependency on vmware/govmomi
This module was imported just for a single Go struct (for XML
unmarshalling), and it could be easily internalized.

The module causes significant allocation on startup:

```
init github.com/vmware/govmomi/vim25/types @23 ms, 1.4 ms clock, 1269864 bytes, 196 allocs
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-28 16:49:34 +04:00
Andrey Smirnov
ca0b32c514
refactor: update AWS SDK and http-getter to v2 versions
Both are much more modular and pull far fewer dependencies into the
Talos tree.

This solves the problem with allocations in AWS endpoints on import, and
removes a bunch of dependencies.

Raw binary size: -10 MiB.

Memory usage (not scientific): around -5 MiB for all Talos services.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-28 15:30:02 +04:00
Spencer Smith
dbb9f2bc7a
chore: add dm_multipath module
This PR pulls in the latest pkgs commit to enable dm_multipath as a module.

Signed-off-by: Spencer Smith <spencer.smith@talos-systems.com>
2023-07-27 19:05:10 -04:00
Andrey Smirnov
b70b7ea57d
chore: use new go-pcidb database
See https://github.com/siderolabs/go-pcidb/pull/2

This shows minus 2-3 MiB of resident memory usage for each of `apid`,
`dashboard`, `machined` and `trustd`.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-27 22:48:59 +04:00
Andrey Smirnov
9b533e27cf
feat: update Kubernetes to 1.28.0-rc.0
See https://github.com/kubernetes/kubernetes/releases/tag/v1.28.0-rc.0

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-27 20:39:58 +04:00
Noel Georgi
a3a2aa8ef3
fix: use fast wipe for upgrade
As part of bootloader refactoring `go-blockdevice` was used for wiping
partitions in #7329, but used standard wipe which could be fast/slow
depending on the blockdevice support. Switch to using fast-wipe for
partitions. This should not affect `wipe` option in machineconfig.

Fixes: #7531

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-07-27 21:09:06 +05:30
Andrey Smirnov
f863498ff6
fix: always override APIServer audit policy
Fixes #7537

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-27 18:22:08 +04:00
Utku Ozdemir
355681ddab
fix: terminate dashboard gracefully on & switch back to tty1
- Make dashboard SIGTERM-aware
- Handle panics on dashboard and terminate it gracefully, so it resets the terminal properly
- Switch to TTY2 when it starts and back to TTY1 when it stops.

Closes siderolabs/talos#7516.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2023-07-27 16:00:23 +02:00
Andrey Smirnov
544cb4fe7d
refactor: accept partial machine configuration
This refactors code to handle partial machine config - only multi-doc
without v1alpha1 config.

This uses improvements from
https://github.com/cosi-project/runtime/pull/300:

* where possible, use `TransformController`
* use integrated tracker to reduce boilerplate

Sometimes fix/rewrite tests where applicable.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-27 17:00:42 +04:00
Andrey Smirnov
9b0bc3e931
chore: split kernel modules out of the tree
Also update Linux 6.1.41 (Zenbleed workaround).

See https://github.com/siderolabs/pkgs/pull/768

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-26 21:42:29 +04:00
Andrey Smirnov
ffa48ac803
chore: workaround AWS AMI failures, disable Azure uploader
Fixes #7513

AWS image uploads have recently been failing consistently in some
regions, which blocks the release process. Allow skipping AMIs that fail
to upload.

Disable Azure until #7512 is resolved.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-26 17:14:31 +04:00
Spencer Smith
4cd7623cf7
chore: add alx drivers
This PR adds the alx drivers from pkgs to talos

Signed-off-by: Spencer Smith <spencer.smith@talos-systems.com>
2023-07-25 11:00:12 -04:00
Andrey Smirnov
663264c864
release(v1.5.0-alpha.3): prepare release
This is the official v1.5.0-alpha.3 release.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-25 17:26:08 +04:00
Andrey Smirnov
d2f64af863
chore: disable cloud-images, pull in new kernel and gre module
Disable cloud-images step due to the issue with AWS & Azure atm.

Pull in https://github.com/siderolabs/pkgs/pull/761

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-25 15:15:54 +04:00
Scott Cariss
8edce49063
docs: improve proxmox install guide
Improve proxmox install guide.

Fixes: #7402

Signed-off-by: Scott Cariss <scott@cariss.dev>
Signed-off-by: Noel Georgi <git@frezbo.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-24 17:59:39 +04:00
Sacha Trémoureux
c783458be0
docs: typo dhcp -> dhcp
Small typo in reference/kernel/

Signed-off-by: Sacha Trémoureux <sacha@tremoureux.fr>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-24 16:08:14 +04:00
Thomas Lemarchand
003cbd1611
docs: warn about secretboxEncryptionSecret in kubeadm migration guide
Migrating from kubeadm fix.

Signed-off-by: Thomas Lemarchand <tlemarchand@users.noreply.github.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-24 15:24:39 +04:00
Andrey Smirnov
786e86f5b8
refactor: rewrite the way Talos acquires the machine configuration
Fixes #7453

The goal is to make it possible to load some multi-doc configuration
from the platform source (or persisted in STATE) before the machine
acquires the full configuration.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-24 14:26:42 +04:00
Andrey Smirnov
5e13cafe5b
feat: enforce kernel lockdown for UKI
UKI is meant to be for UEFI Secure Boot, so it's expected to enforce
kernel lockdown. We might reconsider in the future to use a kernel patch
instead: b1a0314b08

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-22 13:52:46 +04:00
Andrey Smirnov
4d96d642fd
feat: update default Kubernetes version to 1.28.0-beta.0
See https://github.com/kubernetes/kubernetes/releases/tag/v1.28.0-beta.0

Go modules are not tagged yet, so skipped updating them.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-21 22:04:19 +04:00
Noel Georgi
170a73e161
chore: support creating qemu guest socket
Support creating a qemu guest agent socket so we can test
`qemu-guest-agent` extension in CI.

Ref: https://github.com/siderolabs/extensions/pull/173#issuecomment-1611911106

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-07-21 22:46:13 +05:30
Christian Rolland
59ac38a6bf
docs: add docs for installing azure ccm and csi
Add docs for installing Azure ccm and csi on Talos.

Signed-off-by: Christian Rolland <christian.rolland@siderolabs.com>
2023-07-21 12:30:26 -04:00
Andrey Smirnov
6288cd970e
release(v1.5.0-alpha.2): prepare release
This is the official v1.5.0-alpha.2 release.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-20 20:57:01 +04:00
Andrey Smirnov
60c304126f
chore: bump dependencies
* go.mod dependencies
* Linux 6.1.39
* runc 1.1.8
* dm-raid kernel module

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-20 18:25:41 +04:00
Andrey Smirnov
9ef4e5efca
fix: log explicitly when kubelet has no nodeIP match
Fixes #7487

When the `.kubelet.nodeIP` filters yield no match, Talos should not start
the kubelet, as using an empty address list results in an empty
`--node-ip=` kubelet arg, which makes the kubelet pick up "the first" address.

Instead, skip updating (creating) the nodeIP and log an explicit
warning.
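
The behavior described above could look roughly like this. It is a sketch under assumptions: the filters are modeled as a list of allowed subnets, and the function name, signature and return values are hypothetical rather than taken from the Talos code:

```go
package main

import (
	"fmt"
	"net/netip"
)

// selectNodeIPs returns the addresses that fall into any of the allowed
// subnets. ok is false when nothing matched, in which case the caller
// should skip the update and log a warning instead of passing an empty
// list on to the kubelet.
func selectNodeIPs(addrs, subnets []string) (matched []string, ok bool, err error) {
	prefixes := make([]netip.Prefix, 0, len(subnets))

	for _, s := range subnets {
		p, perr := netip.ParsePrefix(s)
		if perr != nil {
			return nil, false, perr
		}

		prefixes = append(prefixes, p)
	}

	for _, a := range addrs {
		addr, aerr := netip.ParseAddr(a)
		if aerr != nil {
			return nil, false, aerr
		}

		for _, p := range prefixes {
			if p.Contains(addr) {
				matched = append(matched, a)

				break
			}
		}
	}

	return matched, len(matched) > 0, nil
}

func main() {
	addrs := []string{"10.0.0.5", "192.168.1.2"}

	if ips, ok, _ := selectNodeIPs(addrs, []string{"10.0.0.0/8"}); ok {
		fmt.Println("node IPs:", ips)
	}

	if _, ok, _ := selectNodeIPs(addrs, []string{"172.16.0.0/12"}); !ok {
		fmt.Println("warning: no address matched the nodeIP filters, skipping update")
	}
}
```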

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-20 00:41:47 +04:00
Andrey Smirnov
6b39c6a4d3
fix: enable compression and bump gRPC max msg size
Fixes #7482

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-19 22:46:37 +04:00
Noel Georgi
2f2eca8617
chore: basic support for shutdown/poweroff flags
This adds basic support for shutdown/poweroff flags.
It can distinguish between halt/shutdown/reboot.

In the case of Talos, halt and shutdown are the same operation.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-07-19 23:35:32 +05:30
Florian Klink
b84277d7dc
docs: fix wrong capability name
It's CAP_SYS_BOOT, not CAP_BOOT.

Signed-off-by: Florian Klink <flokli@flokli.de>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-19 21:23:05 +04:00
Noel Georgi
59d7d9344b
chore: use machined for shutdown, poweroff
Use the `machined` socket for `shutdown` and `poweroff` aliases. This
ensures that worker nodes do not have to wait on apid to start.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-07-19 21:48:15 +05:30
Dmitriy Matrenichev
2439bfb719
chore: explicitly add timestamps to machined logs
We can safely do it on `io.Writer` level, since `log.Logger.Output` (called by `Print|Printf`) pretty much promises
that every call to `Write` ends with `\n`.
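
A minimal sketch of the approach, relying on the documented `log` package behavior that each logging operation makes a single `Write` call. The type name, the injectable clock, the RFC 3339 format and the helper function are assumptions for illustration, not the machined implementation:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"time"
)

// timestampWriter prefixes every Write with a timestamp. This is only
// safe because log.Logger emits one complete, newline-terminated line
// per Write call.
type timestampWriter struct {
	w   io.Writer
	now func() time.Time // injectable clock (hypothetical, for testing)
}

func (t *timestampWriter) Write(p []byte) (int, error) {
	line := append([]byte(t.now().Format(time.RFC3339)+" "), p...)

	if _, err := t.w.Write(line); err != nil {
		return 0, err
	}

	// Report the caller's byte count, as the io.Writer contract expects.
	return len(p), nil
}

// formatWithFixedClock logs msg through the writer using a fixed clock
// and returns the resulting line.
func formatWithFixedClock(msg string) string {
	var buf bytes.Buffer

	fixed := time.Date(2023, 7, 19, 15, 29, 17, 0, time.UTC)
	logger := log.New(&timestampWriter{w: &buf, now: func() time.Time { return fixed }}, "", 0)
	logger.Print(msg)

	return buf.String()
}

func main() {
	fmt.Print(formatWithFixedClock("service started"))
}
```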

Closes #7439

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-07-19 18:29:17 +03:00
Noel Georgi
14966e718a
fix: skip over tpm2 1.2 devices
For RNG seed and PCR extend, ignore the device if it is not TPM 2.0
based. Seal/Unseal operations still error out, since that is an
explicitly user-enabled feature.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-07-18 12:58:45 +05:30
Dmitriy Matrenichev
6716e7bc0b
docs: update cilium documentation about KubePrism usage
Closes #7400

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-07-17 19:25:09 +03:00