2122 Commits

Author SHA1 Message Date
Andrey Smirnov
5ff6cf82ca
fix: drop /opt mount for containers/tink
The `/opt/cni/bin` directory in the rootfs contains CNI binaries, which get
overwritten by the volume mount.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-08-22 20:39:52 +04:00
Utku Ozdemir
3041d90751
fix: always handle PermissionDenied in dashboard resource watches
A single resource that is missing (i.e., the type does not exist on an older version of Talos) or that cannot be read for whatever reason should not interrupt the status refresh cycle of the other resources.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2024-08-20 22:25:19 +02:00
Andrey Smirnov
ee4290f684
fix: bind HostDNS to 169.254.x link-local address
This is an attempt to fix many issues related to trying to use a Service
IP for host DNS.

Fixes #9196

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-08-19 18:44:35 +04:00
Dmitriy Matrenichev
45cc8688a1
chore: replace if blocks with min/max functions
Simplify code where possible.
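
As an illustration of this kind of simplification (a minimal sketch, not code taken from the commit), the `min` and `max` builtins available since Go 1.21 can replace a pair of `if` blocks. The function names `clampOld` and `clampNew` are hypothetical.

```go
package main

import "fmt"

// clampOld shows the older style with explicit if blocks.
func clampOld(v, lo, hi int) int {
	if v < lo {
		v = lo
	}

	if v > hi {
		v = hi
	}

	return v
}

// clampNew expresses the same logic with the min and max builtins.
// Both versions assume lo <= hi.
func clampNew(v, lo, hi int) int {
	return min(max(v, lo), hi)
}

func main() {
	for _, v := range []int{-5, 3, 42} {
		fmt.Println(clampOld(v, 0, 10), clampNew(v, 0, 10))
	}
}
```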

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-08-16 10:40:44 +03:00
Dmitriy Matrenichev
a5bd770bf9
fix: retry with another upstream if the previous failed
Do not return response to the client if we got SERVFAIL or REFUSED,
until we run out of upstreams.
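
The behaviour described above can be sketched roughly as follows. This is a simplified illustration, not the actual Talos DNS code: the `response` and `upstream` types and the `resolve` function are hypothetical, and only the numeric response codes (SERVFAIL = 2, REFUSED = 5) come from the DNS specification.

```go
package main

import (
	"errors"
	"fmt"
)

// DNS response codes as defined in RFC 1035.
const (
	rcodeSuccess  = 0
	rcodeServFail = 2
	rcodeRefused  = 5
)

// response is a hypothetical, heavily simplified DNS reply.
type response struct {
	Rcode    int
	Upstream string
}

// upstream is a hypothetical resolver: it returns a reply or a transport error.
type upstream func(name string) (response, error)

// resolve tries each upstream in order. A SERVFAIL or REFUSED reply is held
// back and only returned to the client once every upstream has been tried.
func resolve(name string, upstreams []upstream) (response, error) {
	var (
		last    response
		gotResp bool
		lastErr error
	)

	for _, up := range upstreams {
		resp, err := up(name)
		if err != nil {
			lastErr = err

			continue
		}

		if resp.Rcode == rcodeServFail || resp.Rcode == rcodeRefused {
			last, gotResp = resp, true

			continue
		}

		return resp, nil
	}

	if gotResp {
		return last, nil
	}

	if lastErr == nil {
		lastErr = errors.New("no upstreams configured")
	}

	return response{}, lastErr
}

func main() {
	failing := func(string) (response, error) {
		return response{Rcode: rcodeServFail, Upstream: "first"}, nil
	}
	working := func(string) (response, error) {
		return response{Rcode: rcodeSuccess, Upstream: "second"}, nil
	}

	fmt.Println(resolve("example.org.", []upstream{failing, working}))
}
```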

Fixes #9143

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-08-14 22:19:10 +03:00
Andrey Smirnov
3c36c41a91
feat: provide device extra settle timeout
Fixes #9092

This is a workaround for broken hardware drivers (e.g. RAID
controllers), which report the settled event too early.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-08-14 17:36:45 +04:00
Andrey Smirnov
61a1c946bf
feat: bundle (some) CNI plugins with Talos core
Fixes https://github.com/siderolabs/extensions/issues/448

Bundle some standard CNI plugins plus the Flannel CNI plugin (as Flannel is
the default CNI in Talos) in the Talos `initramfs`.

With this change, no plugin install is required, so the `install-cni`
step is dropped from the Flannel default manifest.

The bundled plugins:

```
$ talosctl -n 172.20.0.2 ls -lH /opt/cni/bin/
NODE         MODE         UID   GID   SIZE(B)   LASTMOD       NAME
172.20.0.2   drwxr-xr-x   0     0     109 B     7 hours ago   .
172.20.0.2   -rwxr-xr-x   0     0     3.2 MB    7 hours ago   bridge
172.20.0.2   -rwxr-xr-x   0     0     3.3 MB    7 hours ago   firewall
172.20.0.2   -rwxr-xr-x   0     0     2.4 MB    7 hours ago   flannel
172.20.0.2   -rwxr-xr-x   0     0     2.4 MB    7 hours ago   host-local
172.20.0.2   -rwxr-xr-x   0     0     2.4 MB    7 hours ago   loopback
172.20.0.2   -rwxr-xr-x   0     0     2.8 MB    7 hours ago   portmap
```

The `initramfs` for amd64 grows 67 -> 73 MiB with this change.

The path `/opt/cni/bin` is still an overlay mount, so extra plugins can
be dropped into this directory (no change here).

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-08-14 14:33:18 +04:00
Noel Georgi
091da163b7
chore: support arm64 kexec from zboot kernel images
When using kernel images that use ZBOOT for arm64, we need to
extract the vmlinux from the vmlinuz EFI file and pass it on to the
kexec call.

Ref: https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/kexec-pe-zboot.c

Fixes: #8907

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-08-13 20:56:00 +05:30
Serge Logvinov
ee67da14c5
feat: scaleway routed ip
Support the new network feature "routed ip".
The IPv4 address is now attached to the VM directly.

Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-08-12 22:42:34 +04:00
Noel Georgi
f9f5e0ef55
chore: fix k8s tests
The check for k8s suite added in #9085 causes issues with applying k8s resources
which are global like `Namespace` or `StorageClass`.

Instead of failing, just log.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-08-09 13:28:02 +05:30
Noel Georgi
2ac8d2274f
chore: support unsupported flag for mkfs
Support `unsupported` flag for mkfs, so that `STATE` partition with size
less than 300M can be created by `mkfs.xfs`.

This allows bringing in a newer `xfsprogs` that can better repair a
corrupted FS.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-08-08 20:21:02 +05:30
Utku Ozdemir
9d34158500
fix: fix graph diffs in dashboard when node aliases are used
When `talosctl dashboard` is used with node "aliases" (e.g., node names or machine IDs in Omni) passed via the `-n` flag, the graphs in the monitor tab were not rendered correctly: the old and current data were matched incorrectly.

Fix this by pushing node alias->IP resolution down to the (api & log) data sources of the dashboard, by passing a resolver to them.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2024-08-07 14:54:32 +02:00
Noel Georgi
64914b086c
chore: add test for crun extension
Add a test to verify the `crun` runtimeclass container-runtime extension
works as expected.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-08-02 20:15:01 +05:30
Andrey Smirnov
7a1c62b8bc
feat: publish installed extensions as node labels/annotations
Extensions are posted in the following way:

`extensions.talos.dev/<name>=<version>`

The name should be valid as a label (annotation) key.

If the value is valid as a label value, use labels, otherwise use
annotations.
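
The label-or-annotation decision can be sketched like this. It is an illustrative sketch, not the Talos implementation: `publish` and `isValidLabelValue` are hypothetical names, the validation follows the documented Kubernetes label value rules (at most 63 characters, alphanumeric at both ends, with `-`, `_` and `.` allowed in between), and the extension name is assumed to already be valid as a key.

```go
package main

import (
	"fmt"
	"regexp"
)

const prefix = "extensions.talos.dev/"

// labelValueRe mirrors the Kubernetes label value syntax (empty is allowed).
var labelValueRe = regexp.MustCompile(`^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$`)

// isValidLabelValue reports whether v can be used as a Kubernetes label value.
func isValidLabelValue(v string) bool {
	return len(v) <= 63 && labelValueRe.MatchString(v)
}

// publish splits extensions (name -> version) into labels and annotations:
// versions that are valid label values become labels, the rest annotations.
func publish(exts map[string]string) (labels, annotations map[string]string) {
	labels = map[string]string{}
	annotations = map[string]string{}

	for name, version := range exts {
		key := prefix + name

		if isValidLabelValue(version) {
			labels[key] = version
		} else {
			annotations[key] = version
		}
	}

	return labels, annotations
}

func main() {
	labels, annotations := publish(map[string]string{
		"gvisor": "v1.0.0",
		"custom": "1.0.0+build.5", // '+' is not allowed in a label value
	})

	fmt.Println("labels:", labels)
	fmt.Println("annotations:", annotations)
}
```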

Also implements node annotations in the machine config as a side-effect.

Fixes #9089

Fixes #8971

See #9070

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-08-01 17:32:09 +04:00
Andrey Smirnov
3f2058aba2
fix: update containerd configuration and settings
Provide the `XDG_RUNTIME_DIR` environment variable; this specifically fixes
the `kubectl exec` action when `/tmp` is filled up.

Update containerd configuration to version 3 and fix it up.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-07-31 19:15:19 +04:00
Noel Georgi
50e5f37efb
chore: add test for apparmor
Add a test that verifies pods can be scheduled with the `RuntimeDefault`
AppArmor profile.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-07-30 20:24:57 +05:30
Noel Georgi
3ce5492f85
feat: runc memfd-bind service
Add a `runc-memfd-bind` service so that the runc binary is not copied for
every `runc` invocation.

Fixes: #9007.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-07-29 19:02:59 +05:30
Noel Georgi
117628aa60
chore: add test for gvisor extension with platform kvm
Add a test for the gVisor extension when the KVM platform is used.

The test is marked as skipped until the pod termination issue is resolved.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-07-25 19:15:27 +05:30
EricMa
0872901783
feat: use ethtool ioctl to get link status when netlink api not available
When the kernel does not support ethtool-netlink, use ethtool-ioctl to get the link status.

Signed-off-by: EricMa <307748790@qq.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-23 19:07:55 +04:00
Jean-Francois Roy
fd54dc191d
feat(talosctl): append microsoft secure boot certs
This patch adds a flag to `secureboot.database.Generate` to append the
Microsoft UEFI secure boot DB and KEK certificates to the appropriate
ESLs, in addition to complementary command line flags.

This patch also includes a copy of said Microsoft certificates. The
certificates are downloaded from an official Microsoft repo.

Signed-off-by: Jean-Francois Roy <jf@devklog.net>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-22 14:15:42 +04:00
Andrey Smirnov
fd6ddd11ef
feat: provide POD_IP env var to scheduler and controller-manager
Fixes #9031

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-17 21:41:15 +04:00
Andrey Smirnov
c288ace7b1
fix: be more smart when merging DNS resolver config
Fixes #8690

Consider the following scenario (e.g. OpenStack): platform issues a
correct list of DNS servers, which includes both IPv4 and IPv6
resolvers, and configures DHCPv4 on the interface.

DHCPv4 returns a set of IPv4 resolvers (as it can't return IPv6 ones),
and this list completely overrides the list from the platform, wiping
out the IPv6 resolvers completely.

With this change, the merge process is smarter: for example, it tries to
preserve IPv6 resolvers if the next layer provides no resolvers for
IPv6.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-16 21:20:27 +04:00
Andrey Smirnov
d983e44308
fix: panic on shutdown
Fixes #9017

Don't assume the config is there before trying to access it.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-16 17:24:03 +04:00
Andrey Smirnov
b07338f547
feat: provide machine config document to update trusted CA roots
Fixes #8867

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-12 19:28:31 +04:00
Andrey Smirnov
f14c4795e5
fix: sort ports and merge adjacent ones in the nft rule
Fixes #9009

When building a port interval set, sort the ports and merge adjacent
ranges to prevent mismatch on the nftables side.

With address sets, this was already the case due to the way the IPRange
builder works, but ports need a manual implementation.
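
A minimal sketch of the sort-and-merge step, assuming a hypothetical `portRange` type and `mergePortRanges` function (the real code builds nftables set elements, which is not shown here). The sketch also merges overlapping ranges, which goes slightly beyond the adjacent case the commit describes.

```go
package main

import (
	"fmt"
	"sort"
)

// portRange is a hypothetical inclusive port interval.
type portRange struct {
	Lo, Hi uint16
}

// mergePortRanges sorts the ranges by their lower bound and merges ranges
// that overlap or are directly adjacent (e.g. 80-80 and 81-90).
func mergePortRanges(in []portRange) []portRange {
	if len(in) == 0 {
		return nil
	}

	sorted := append([]portRange(nil), in...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Lo < sorted[j].Lo })

	out := []portRange{sorted[0]}

	for _, r := range sorted[1:] {
		last := &out[len(out)-1]

		// Compare as uint32 so that Hi == 65535 does not overflow.
		if uint32(r.Lo) <= uint32(last.Hi)+1 {
			if r.Hi > last.Hi {
				last.Hi = r.Hi
			}

			continue
		}

		out = append(out, r)
	}

	return out
}

func main() {
	fmt.Println(mergePortRanges([]portRange{
		{443, 443}, {80, 80}, {81, 90}, {8000, 8080},
	}))
}
```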

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-12 14:53:23 +04:00
Andrey Smirnov
cf5effabb2
feat: provide an option to enforce SecureBoot for TPM enrollment
Fixes #8995

There is no security impact, as the actual SecureBoot
state/configuration is measured into PCR 7 and the disk encryption
key unsealing is tied to this value.

This is more to provide a way to avoid accidentally encrypting to the
TPM while SecureBoot is not enabled.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-11 22:21:47 +04:00
Andrey Smirnov
736c1485e2
fix: change the UEFI firmware search path order
Ensure that SecureBoot enabled images come before regular ones.

With the Ubuntu 24.04 `ovmf` package, due to the ordering of the search
paths, `talosctl` might pick up the wrong image and disable SecureBoot.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-11 21:56:33 +04:00
Andrey Smirnov
398151e64f
fix: remove host bind mount for /tmp for trustd
Not sure why this mount was needed, but it was added a long time ago, and
I believe it's no longer needed.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-10 19:56:21 +04:00
Dmitriy Matrenichev
fbde9c556f
chore: bump deps
Bump github.com/siderolabs/grpc-proxy to v0.4.1 and replace deprecated calls to `grpc.CustomCodec`.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-07-09 20:01:13 +03:00
Andrey Smirnov
3bab15214d
feat: update Kubernetes to 1.31.0-alpha.3
Fixes #8911

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-09 17:49:06 +04:00
Dmitriy Matrenichev
dad9c40c73
chore: simplify code
- replace `interface{}` with `any` using `gofmt -r 'interface{} -> any' -w`
- replace `a = []T{}` with `var a []T` where possible.
- replace `a = []T{}` with `a = make([]T, 0, len(b))` where possible.
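
The three patterns might look like this in practice. This is an illustrative sketch with hypothetical function names, not code from the commit.

```go
package main

import (
	"fmt"
	"sort"
)

// describe takes any value; `any` is an alias for `interface{}`.
func describe(v any) string {
	return fmt.Sprintf("%T", v)
}

// evens does not know the result size up front, so it starts from a nil
// slice declared with var; append works on a nil slice.
func evens(in []int) []int {
	var out []int

	for _, v := range in {
		if v%2 == 0 {
			out = append(out, v)
		}
	}

	return out
}

// names knows the final size, so it preallocates the capacity with make.
func names(m map[string]int) []string {
	out := make([]string, 0, len(m))

	for k := range m {
		out = append(out, k)
	}

	sort.Strings(out)

	return out
}

func main() {
	fmt.Println(describe(42))
	fmt.Println(evens([]int{1, 2, 3, 4}))
	fmt.Println(names(map[string]int{"eth1": 1, "eth0": 0}))
}
```

Note that a nil slice and an empty slice behave the same with `len`, `range` and `append`, but they are not identical everywhere (for example, `encoding/json` marshals them as `null` and `[]` respectively), which is presumably why the commit says "where possible".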

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-07-08 18:14:00 +03:00
Andrey Smirnov
2512ef435f
test: fix the integration tests for apply-config
They got broken after refactoring.

Also use this PR to test things before the release.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-08 14:06:45 +04:00
Dmitriy Matrenichev
076f3c4f20
chore: improve link spec controller code
`SortBonds` function bothered me since the last time I refactored this part.

We always know that it only accepts `network.LinkSpec`s, but we accepted the slice of untyped Resources because
this is what `List` method returns. Now we can do better, since `safe.List` now supports `Swap` method.

We can utilize `sort.Interface` and pass `safe.List` directly to `SortBonds`.
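
The idea can be sketched with a small generic list. Everything here is a stand-in: `typedList` only imitates a typed list that exposes `Len`, `Get` and `Swap`, `linkSpec` is a hypothetical resource type, and the sketch orders by name because the real ordering rule of `SortBonds` is not described in this message.

```go
package main

import (
	"fmt"
	"sort"
)

// typedList is a hypothetical stand-in for a typed resource list.
type typedList[T any] struct {
	items []T
}

func (l *typedList[T]) Len() int      { return len(l.items) }
func (l *typedList[T]) Get(i int) T   { return l.items[i] }
func (l *typedList[T]) Swap(i, j int) { l.items[i], l.items[j] = l.items[j], l.items[i] }

// linkSpec is a hypothetical, simplified link resource.
type linkSpec struct {
	Name string
}

// byName completes sort.Interface: Len and Swap are promoted from the
// embedded list, so only Less has to be written.
type byName struct {
	*typedList[linkSpec]
}

func (b byName) Less(i, j int) bool {
	return b.Get(i).Name < b.Get(j).Name
}

// sortLinks sorts the typed list in place, without converting it to a slice
// of untyped resources first.
func sortLinks(l *typedList[linkSpec]) {
	sort.Sort(byName{l})
}

func main() {
	l := &typedList[linkSpec]{items: []linkSpec{{"eth1"}, {"bond0"}, {"eth0"}}}

	sortLinks(l)

	for i := 0; i < l.Len(); i++ {
		fmt.Println(l.Get(i).Name)
	}
}
```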

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-07-05 16:39:27 +03:00
Andrey Smirnov
0454130ad9
feat: suppress controller runtime first N failures on the console
Controllers might fail with transient errors on machine startup, but
errors are always retried, so persistent errors will still show up in
the console.
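
A rough sketch of such suppression, under the assumption that failures are counted per controller and that a success resets the count (neither detail is stated in this message). The `consoleFilter` type and its methods are hypothetical.

```go
package main

import "fmt"

// consoleFilter is a hypothetical helper that hides the first N failures
// of each controller from the console.
type consoleFilter struct {
	threshold int
	failures  map[string]int
}

func newConsoleFilter(threshold int) *consoleFilter {
	return &consoleFilter{threshold: threshold, failures: map[string]int{}}
}

// Failed records a failure and reports whether it should reach the console.
func (f *consoleFilter) Failed(controller string) bool {
	f.failures[controller]++

	return f.failures[controller] > f.threshold
}

// Succeeded resets the failure count (an assumption made for this sketch).
func (f *consoleFilter) Succeeded(controller string) {
	delete(f.failures, controller)
}

func main() {
	f := newConsoleFilter(2)

	for i := 1; i <= 4; i++ {
		fmt.Printf("failure %d shown on console: %v\n", i, f.Failed("example.Controller"))
	}
}
```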

The full `talosctl logs controller-runtime` output is not suppressed.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-05 15:36:54 +04:00
Andrey Smirnov
be35f380cc
chore: update pkgs/tools/extras
This brings in Go 1.22.5 and new Flannel CNI plugin.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-03 20:38:55 +04:00
Andrey Smirnov
b4c871e4b7
chore: bump dependencies
Update Go modules and other dependencies.

Fix linting of the Dockerfile.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-02 14:46:51 +04:00
Andrey Smirnov
cc345c8c94
feat: add support for configuring vlan filtering on the bridge
Fixes #8941

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-01 20:20:28 +04:00
Dmitriy Matrenichev
2d054ad355
chore: handle documents diff in apply-config dry run
Before this PR, the diff generator only diffed the v1alpha1 config and nothing else. With this PR, it also takes
separate docs into account.

```shell
~ > <editor> controlplane.yaml
~ > talosctl -n talos-default-controlplane-1  apply-config --file controlplane.yaml --dry-run
Dry run summary:
Applied configuration without a reboot (skipped in dry-run).
Config diff:
No changes.
Documents diff:
[]config.Document{
+	&runtime.KmsgLogV1Alpha1{
+		Meta:       meta.Meta{MetaAPIVersion: "v1alpha1", MetaKind: "KmsgLogConfig"},
+		MetaName:   "omni-kmsg",
+		KmsgLogURL: s"tcp://[fdae:41e4:649b:9303::1]:8092",
+	},
}
~ > talosctl -n talos-default-controlplane-1  apply-config --file controlplane.yaml
Applied configuration without a reboot
~ >
~ >
~ >
~ > <editor> controlplane.yaml
~ > talosctl -n talos-default-controlplane-1  apply-config --file controlplane.yaml --dry-run
Dry run summary:
Applied configuration without a reboot (skipped in dry-run).
Config diff:
No changes.
Documents diff:
[]config.Document{
	&runtime.KmsgLogV1Alpha1{Meta: {MetaAPIVersion: "v1alpha1", MetaKind: "KmsgLogConfig"}, MetaName: "omni-kmsg", KmsgLogURL: {URL: &{Scheme: "tcp", Host: "[fdae:41e4:649b:9303::1]:8092"}}},
+	&network.DefaultActionConfigV1Alpha1{
+		Meta:    meta.Meta{MetaAPIVersion: "v1alpha1", MetaKind: "NetworkDefaultActionConfig"},
+		Ingress: s"block",
+	},
}
```

Closes #8885

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-06-27 21:15:36 +03:00
Dmitriy Matrenichev
c603d2bf95
chore: output more info when ExecuteCommandInPod fails
This should make investigating things like [this](https://github.com/siderolabs/talos/actions/runs/9411253542/job/25924192027)
easier.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-06-24 20:15:45 +03:00
Noel Georgi
86a3222aee
chore: use new disks api for iscsi tests
The iSCSI test broke when the new disks API was introduced, making the
test always pass. Now filter for only `iscsi` disk types using the new
disks API.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-06-18 18:38:21 +05:30
Utku Ozdemir
5ffc3f14bd
feat: show siderolink status on dashboard
Add a new resource, `SiderolinkStatus`, which combines the following info:
- The Siderolink API endpoint without the query parameters or fragments (potentially sensitive info due to the join token)
- The status of the Siderolink connection
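
Stripping the query parameters and fragment from an endpoint can be done with the standard `net/url` package, as in the sketch below. The function name `sanitizeEndpoint` and the example URL are hypothetical; the actual resource code is not shown in this message.

```go
package main

import (
	"fmt"
	"net/url"
)

// sanitizeEndpoint drops the query string and fragment from an endpoint URL,
// so that values such as a join token are not exposed.
func sanitizeEndpoint(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}

	u.RawQuery = ""
	u.Fragment = ""
	u.RawFragment = ""

	return u.String(), nil
}

func main() {
	fmt.Println(sanitizeEndpoint("https://siderolink.example.com:8090/?jointoken=secret#section"))
}
```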

This resource is not set as sensitive, so it can be retrieved by the users with `os:operator` role (e.g., using `talosctl dashboard` through Omni).

Make use of this resource in the dashboard to display the status of the Siderolink connection.

Additionally, rework the status columns in the dashboard to:
- Display a Linux terminal compatible "tick" or a "cross" prefix for statuses in addition to the red/green color coding.
- Move and combine some statuses to save rows and make them more even.

Closes siderolabs/talos#8643.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2024-06-18 12:31:54 +02:00
Andrey Smirnov
6f6a5d1057
chore: upgrade to rtnetlink/v2 library
The v1 version is no longer supported.

The major change is the decoding of link data, but we're not using it,
as we have had our own decoders/encoders for a long time.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-13 19:20:48 +04:00
Andrey Smirnov
7fcb521a6a
feat: use hydrophone instead of sonobuoy
Fixes #8790

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-11 16:51:45 +04:00
Andrey Smirnov
d1a0c1f983
test: fix the integration test for no META name
When META has never been written (e.g. booted from a disk image), it
won't be detected as `talosmeta`.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-11 15:17:52 +04:00
Dmitriy Matrenichev
5350063340
chore: fix our dns server implementation
This PR does the following:
* No longer shuffles dns servers for each request.
* Sets a context timeout of 4.5 seconds.
* Correctly returns a proper error from the root layer.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-06-11 01:36:21 +03:00
Dmitriy Matrenichev
c6f90d0149
chore: replace sync.Map with concurrent.HashTrieMap
Also bump `cosi-project/runtime` to v0.4.4.

Closes #8851

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-06-10 20:45:47 +03:00
Andrey Smirnov
e8ced2c2dd
chore: drop k8s timeout in the default kubeconfig
(This is not user-facing, but rather internal use of the kubeconfig in
the tests/inside the machine).

This was added 4 years ago as a workaround, but instead of a global
timeout we should rather use contexts with timeouts/deadlines (and we
do!).

Setting a global timeout breaks streaming Kubernetes pod logs.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-10 18:29:50 +04:00
Andrey Smirnov
7cbdce73f7
fix: detect CD devices, fix user disks wipe test
Detect CD devices, and set size to 0 for CD without media.

In user disk wipe tests, skip device mapper devices and CD-ROM.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-10 18:00:06 +04:00
Dmitriy Matrenichev
aca475c665
chore: small usability fixes
* Replace logging.Wrap(log.Writer()) with zaptest.NewLogger(suite.T()) where possible.
* Replace reflect.DeepEqual with `==`, `slices.Equal`, or `bytes.Equal` where possible.
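
As a small illustration of the second point (a sketch, not code from the commit), the typed helpers avoid reflection and are checked at compile time. One behavioural difference worth keeping in mind: `reflect.DeepEqual` treats a nil slice and an empty slice as different, while `slices.Equal` and `bytes.Equal` do not. The helper `sameNames` is hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"reflect"
	"slices"
)

// sameNames compares two string slices element by element.
func sameNames(a, b []string) bool {
	return slices.Equal(a, b)
}

func main() {
	var nilSlice []string

	empty := []string{}

	// Reflection-based comparison distinguishes nil from empty.
	fmt.Println(reflect.DeepEqual(nilSlice, empty))

	// The typed helpers only look at length and elements.
	fmt.Println(sameNames(nilSlice, empty))
	fmt.Println(bytes.Equal([]byte("talos"), []byte("talos")))
}
```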

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-06-10 05:48:11 +03:00
Marcel Richter
5e66e117e2
fix: initial assignment of Hetzner Cloud Alias IP
The assignment of private networks happens in the Hetzner Cloud after
starting the server and therefore often after querying the network
information when assigning VIPs.

If an alias IP is to be set but no private network is yet available, an
error is now returned until the private network is assigned.

Previously, no error was returned and the network ID was set to 0, which
means that the VIP was treated as a public floating IP in the further
code and not as a private alias IP.

Signed-off-by: Marcel Richter <mail@mrclrchtr.de>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-07 21:49:24 +04:00