2030 Commits

Author SHA1 Message Date
Andrey Smirnov
bac1d00c35
chore: prepare for Talos 1.8
Fork docs, introduce version contract for 1.8.

Clean up old version contracts 0.8-0.14.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-19 18:19:36 +04:00
Dmitriy Matrenichev
908f67fa15
feat: add host dns support for resolving member addrs
Closes #8330

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-04-18 15:29:30 +03:00
Dmitriy Matrenichev
ec69d7a785
chore: replace math/rand with math/rand/v2
New package arrived in Go 1.22 which provides better rand primitives and functions.
Use it instead of the old one.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-04-18 13:20:59 +03:00
Andrey Smirnov
3433fa13bf
feat: use container DNS when in container mode
More specifically, pick up `/etc/resolv.conf` contents by default when
in container mode, and use that as a base resolver for the host DNS.

Fixes #8303

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-16 17:01:36 +04:00
Andrey Smirnov
5d07ac5a7d
fix: close apid inter-backend connections gracefully for real
Fixes #8552

This fixes up the previous fix where `for` condition was inverted, and
also updates the idle timeout, so that the transition to idle happens
before the timeout expires.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-16 16:21:34 +04:00
Artem Chernyshev
3dd1f4e88c
chore: extract pkg/imager/quirks to pkg/machinery
To make it possible to use it without pulling the whole Talos.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2024-04-15 21:37:47 +03:00
Andrey Smirnov
bfbd02abfb
fix: assign different priority to IPv6 default gateway on OpenStack
Fixes #8558

Similar fix is done for other platforms, but not OpenStack.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-10 21:02:13 +04:00
Andrey Smirnov
c8f674bd3d
test: add a test for 'spin' container runtime
See https://github.com/siderolabs/extensions/pull/355

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-10 20:42:16 +04:00
Dmitriy Matrenichev
5390ccd48c
chore: replace []byte with string and use go:embed for templates
Optimize code a bit.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-04-10 17:47:43 +03:00
Dmitriy Matrenichev
ba7cdc8c8b
chore: optimize DNSResolveCacheController
Optimize `DNSResolveCacheController` type, including `dns.Server` optimization for easy start/stop. This PR ensures that we
delete server from runners on stop (even unexpected) and restart it properly. Also fixes incorrect assumption on unit-tests.

Fixes #8563

This PR also does those things:
- Removes `utils.Runner`
- Removes `ctxutil.MonitorFn`
- Removes `dns.Runner`
- Removes `network.dnsRunner`

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-04-10 17:24:19 +03:00
Utku Ozdemir
3735add87c
fix: reconnect to the logs stream in dashboard after reboot
The log stream displayed in the dashboard was stopping to work when a node was rebooted.
Rework the log data source to establish a per-node connection and use a retry loop to always reconnect until the dashboard is terminated.

Print the connection errors in the log stream in red color.

Closes siderolabs/talos#8388.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2024-04-10 10:43:45 +02:00
Andrey Smirnov
9aa1e1b79b
fix: present all accepted CAs to the kube-apiserver
This fixes an issue with a single controlplane cluster.

Properly present all accepted CAs to the apiserver, in the test let the
cluster fully recovery between two CA rotations performed.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-08 23:33:22 +04:00
Andrey Smirnov
336e611746
fix: close the apid connection to other machines gracefully
Fixes #8552

When `apid` notices update in the PKI, it flushes its client connections
to other machines (used for proxying), as it might need to use new
client certificate.

While flushing, just calling `Close` might abort already running
connections.

So instead, try to close gracefully with a timeout when the connection
is idle.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-08 19:47:04 +04:00
Andrey Smirnov
ff2c427b04
fix: pre-create nftables chain to make kubelet use nftables
In Talos, kubelet (and kube-proxy) images use `iptables-wrapper` script
to detect which version of `iptables` (legacy or NFT) to use.

The script assumes that `kubelet` runs on the host, and uses whatever
version of `iptables` which is being used by the host. In Talos,
`kubelet` runs in a container which has same `iptables-wrapper` script,
and it defaults to `legacy` mode in our case.

We can't check the `kubelet` image, as it would affect all Talos
version, so instead pre-create the chains/tables in `nftables` so that
kubelet will pick up `nft` version of `iptables`, and `kube-proxy` will
do the same.

Without this fix, the problem arises from the mix of `nft` used by Talos
for the firewall and Kubernetes world relying on `legacy` (`xtables`).

Fixes https://github.com/siderolabs/kubelet/issues/77

See e139a11535/iptables-wrapper-installer.sh (L102-L130)

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-08 16:24:42 +04:00
Dmitriy Matrenichev
01d8b897c4
fix: make safeReset truly safe to call multiple times
Reading documentation is important, because `timer.Stop()` explicitly says that it will return false if it
already expired *OR* it has been already stopped. Previous version of code would block forever and because of
that code tunnel relay never started.

Take that into account with new version.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-04-05 00:34:17 +03:00
Dmitry Sharshakov
653f838b09
feat: support multiple Docker cluster in talosctl cluster create
Dynamically map Kubernetes and Talos API ports to an available port on
the host, so every cluster gets its own unique set of parts.

As part of the changes, refactor the provision library and interfaces,
dropping old weird interfaces replacing with (hopefully) much more
descriprive names.

Signed-off-by: Dmitry Sharshakov <dmitry.sharshakov@siderolabs.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-04 21:21:39 +04:00
Andrey Smirnov
951904554e
chore: bump dependencies (go 1.22.2)
Update Go to 1.22.2, update Go modules to resolve
[HTTP/2 issue](https://www.kb.cert.org/vuls/id/421644).

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-04 14:59:24 +04:00
Andrey Smirnov
862c76001b
feat: add support for CoreDNS forwarding to host DNS
This PR adds the support for CoreDNS forwarding to host DNS. We try to bind on 9th address on the first element from
`serviceSubnets` and create a simple service so k8s will not attempt to rebind it.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Co-authored-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-04-03 23:36:17 +03:00
Evan Johnson
e8ae5ef63a
feat: add akamai platform support
Add support for the Akamai(Linode) platform

Signed-off-by: Evan Johnson <ejohnson@akamai.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-03 19:50:42 +04:00
Andrey Smirnov
5c0f74b377
fix: don't announce the VIP on acquire failure
I noticed that while looking at #8493, but I don't know if this problem
actually happened in real life.

If acquiring a VIP fails (which can only fail for Equinix/HCloud, not L2
ARP announce), we should not set the leader flag, as it would make the
controller announce the IP, while it shouldn't do that.

If this call fails, there's no matching call to de-announce on failure.

The bug would show up as two nodes having same VIP assigned on the host.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-03 18:04:44 +04:00
Andrey Smirnov
1b17008e9d
fix: handle more OpenStack link types
Fixes #8481

The issue was that the link 'bridge' was skipped, so Talos default was
applied to run DHCP and use the DHCP hostname (instead of using
platform's hostname).

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-03 16:54:36 +04:00
Andrey Smirnov
e7d8041404
fix: always update firewall rules (kubespan)
Fixes #8498

Before KubeSpan was reimplemented to use resources for firewall rules,
the update was happening always, but it got moved to a wrong section of
the controller which gets executed on resource updates, but ignores
updates of the peer statuses.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-03 16:33:16 +04:00
Andrey Smirnov
78b9bd9273
fix: report unsupported x86_64 microarchitecture level
Fixes #8361

Talos requires v2 (circa 2008), but VMs are often configured to limit
the exposed features to the baseline (v1).

```
[    0.779218] [talos] [initramfs] booting Talos v1.7.0-alpha.1-35-gef5bbe728-dirty
[    0.779806] [talos] [initramfs] CPU: QEMU Virtual CPU version 2.5+, 4 core(s), 1 thread(s) per core
[    0.780529] [talos] [initramfs] x86_64 microarchitecture level: 1
[    0.781018] [talos] [initramfs] it might be that the VM is configured with an older CPU model, please check the VM configuration
[    0.782346] [talos] [initramfs] x86_64 microarchitecture level 2 or higher is required, halting
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-03 16:09:57 +04:00
Dmitriy Matrenichev
71d90ba5f3
fix: retry in the fixed amount of time if grpc relay failed
Before this commit, if tunnel failed with error, it would never restart again until `siderolink.TunnelType` event happen.
For most of the time it's a good idea, because it might mean that destination has changed.

But tunnel can also fail because allowed peer list is not yet loaded on newly started Omni instance.

Because of that, we want to try again and not be tied to the runtime event channel.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-04-03 14:03:42 +03:00
Andrey Smirnov
3195e5d15c
fix: force Flannel CNI to use KubePrism Kubernetes API endpoint
Fixes #8501

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-02 22:01:05 +04:00
Noel Georgi
f515741b52
chore: add equinix e2e-tests
Add equinix e2e-tests.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-04-02 17:16:59 +05:30
Andrey Smirnov
117e60583d
feat: add support for static extra fields for JSON logs
Fixes #7356

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-02 15:15:14 +04:00
Andrey Smirnov
090143b030
fix: allow platform cmdline args to be platform-specific
Fix Equnix Metal (where proper arm64 args are known) and metal platform
(using generic arm64 console arg).

Other platforms might need to be updated, but correct settings are not
known at the moment.

Fixes #8529

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-02 14:41:39 +04:00
Andrey Smirnov
7a68504b6b
feat: support rotating Kubernetes CA
Fixes #8440

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-01 22:08:02 +04:00
Dmitriy Matrenichev
8dc4910c48
chore: enable "WG over GRPC" testing in siderolink agent tests
Fixes https://github.com/siderolabs/talos/issues/8514
For https://github.com/siderolabs/talos/issues/8392

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-04-01 18:24:57 +03:00
Dmitry Sharshakov
9456489147
feat: support hardware watchdog timers
Only enabled when activated by config, disabled on shutdown/reboot

Fixes #8284

Signed-off-by: Dmitry Sharshakov <dmitry.sharshakov@siderolabs.com>
Signed-off-by: Dmitry Sharshakov <d3dx12.xx@gmail.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-03-25 18:19:39 +03:00
Dmitriy Matrenichev
949ad11a2d
chore: import siderolink as siderolink-launch subcommand
This PR ensures that we can test our siderolink communication using embedded siderolink-agent.
If `--with-siderolink` provided during `talos cluster create` talosctl will embed proper kernel string and setup `siderolink-agent` as a separate process. It should be used with combination of `--skip-injecting-config` and `--with-apply-config` (the latter will use newly generated IPv6 siderolink addresses which talosctl passes to the agent as a "pre-bind").

Fixes #8392

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-03-23 16:08:56 +03:00
Andrey Smirnov
8eacc4ba80
feat: support rotation of Talos API CA
This allows to roll all nodes to use a new CA, to refresh it, or e.g.
when the `talosconfig` was exposed accidentally.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-03-22 12:16:47 +04:00
Dmitry Sharshakov
84ec8c16f3
feat: support syncing to PTP clocks
Also abstract away from NTP types.

Signed-off-by: Dmitry Sharshakov <dmitry.sharshakov@siderolabs.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-03-21 17:20:26 +04:00
Andrey Smirnov
7d43c9aa6b
chore: annotate installer errors
I want to catch a spurious error `ENODEV`, where exactly it comes from.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-03-21 16:58:34 +04:00
Andrey Smirnov
f737e6495c
fix: populate routes to BGP neighbors (Equinix Metal)
Fixes #8267

Also refactor the code so that we don't fail hard on mutiple bonds, but
it's not clear still how to attach addresses, as they don't have a
interface name field, so for now attaching to the first bond.

Fixes #8411

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-03-21 15:44:21 +04:00
Dmitriy Matrenichev
19f15a840c
chore: bump golangci-lint to 1.57.0
Fix all discovered issues.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-03-21 01:06:53 +03:00
Artem Chernyshev
113fb646ec
chore: use go-talos-support library
The code for collecting Talos `support.zip` was extracted there.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2024-03-19 18:28:46 +03:00
Andrey Smirnov
89fc68b459
fix: service lifecycle issues
The core change is moving the context out of the `ServiceRunner` struct
to be a local variable, and using a channel to notify about shutdown
events.

Add more synchronization between Run and the moment service started to
avoid mis-identifying not running (yet) service as successfully finished.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Co-authored-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-03-19 18:11:13 +04:00
Andrey Smirnov
ead37abf09
test: disable volume tests
They're flaky, disable until the root cause is known.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-03-19 16:40:42 +04:00
Andrey Smirnov
15beb14780
feat: implement blockdevice watch controller
This controller combines kobject events, and scan of `/sys/block` to
build a consistent list of available block devices, updating resources
as the blockdevice changes.

Based on these resources the next step can run probe on the blockdevices
as they change to present a consistent view of filesystems/partitions.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-03-18 18:28:40 +04:00
Dmitriy Matrenichev
06e3bc0cbd
feat: implement Siderolink wireguard over GRPC
For #8064

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-03-18 15:38:13 +03:00
Andrey Smirnov
9afa70baf3
fix: patch correctly config in talosctl upgrade-k8s
The current code was stipping non-`v1alpha1.Config` documents. Provide a
proper method in the config provider, and update places using it.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-03-15 20:42:44 +04:00
Andrey Smirnov
3130caf954
chore: re-enable DRBD extension
See https://github.com/siderolabs/extensions/pull/343

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-03-15 15:55:18 +04:00
Utku Ozdemir
7376f34e82
fix: remove maintenance config when maintenance service is shut down
We now remove the machine config with the id `maintenance` when we are done with it - when the maintenance service is shut down.

Closes siderolabs/talos#8424, where in some configurations there would be machine configs with both `v1alpha1` and `maintenance` IDs present, causing the `talosctl edit machineconfig` to loop twice and causing confusion.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2024-03-14 12:51:59 +01:00
Noel Georgi
d118a852b9
feat: implement Install for imager overlays
Implement `Install` for imager overlays.
Also add support for generating installers.

Depends on: #8377

Fixes: #8350
Fixes: #8351
Fixes: #8350

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-03-12 22:46:29 +05:30
Dmitriy Matrenichev
32e0877607
chore: print all available logs containers in logs command completions
This is a small quality of life improvement that allows `logs` subcommand to suggest all available logs.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-03-11 17:48:01 +03:00
Utku Ozdemir
1bb6027ccd
fix: fix nil panic on maintenance upgrade with partial config
Fix the nil dereferences when a Talos node is attempted to be upgraded while in maintenance mode and having a partial machine config.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2024-03-08 12:52:21 +03:00
Noel Georgi
1ec6683e0c
chore: use go-copy
Use go-copy and drop `pkg/copy`.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-03-07 19:51:28 +05:30
Artem Chernyshev
3c8f51d707
chore: move cli formatters and version modules to machinery
To be used in the `go-talos-support` module without importing the whole
Talos repo.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2024-03-07 16:29:15 +03:00