Commit Graph

2110 Commits

Author SHA1 Message Date
Noel Georgi
64914b086c
chore: add test for crun extension
Add a test to verify the `crun` runtimeclass container-runtime extension
works as expected.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-08-02 20:15:01 +05:30
Andrey Smirnov
7a1c62b8bc
feat: publish installed extensions as node labels/annotations
Extensions are posted the following way:

`extensions.talos.dev/<name>=<version>`

The name should be valid as a label (annotation) key.

If the value is valid as a label value, use labels, otherwise use
annotations.

Also implements node annotations in the machine config as a side-effect.

Fixes #9089

Fixes #8971

See #9070

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-08-01 17:32:09 +04:00
Andrey Smirnov
3f2058aba2
fix: update containerd configuration and settings
Provide `XDG_RUNTIME_DIR` environment variable, this specifically fixes
the `kubectl exec` action when `/tmp` is filled up.

Update containerd configuration to version 3 and fix it up.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-07-31 19:15:19 +04:00
Noel Georgi
50e5f37efb
chore: add test for apparmor
Add a test that verifies pods can be scheduled with `RuntimeDefault`
apparmor profile.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-07-30 20:24:57 +05:30
Noel Georgi
3ce5492f85
feat: runc memfd-bind service
Add a `runc-memfd-bind` service so that runc binary is not copied for
every `runc` invocation.

Fixes: #9007.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-07-29 19:02:59 +05:30
Noel Georgi
117628aa60
chore: add test for gvisor extension with platform kvm
Add test for Gvisor extensions when kvm platform is used.

The test is marked as skipped until pod termination issue is resolved.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-07-25 19:15:27 +05:30
EricMa
0872901783
feat: use ethtool ioctl to get link status when netlink api not available
when kernel not support ethtool-netlink,we will use ethtool-ioctl to get link status

Signed-off-by: EricMa <307748790@qq.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-23 19:07:55 +04:00
Jean-Francois Roy
fd54dc191d
feat(talosctl): append microsoft secure boot certs
This patch adds a flag to `secureboot.database.Generate` to append the
Microsoft UEFI secure boot DB and KEK certificates to the appropriate
ESLs, in addition to complimentary command line flags.

This patch also includes a copy of said Microsoft certificates. The
certificates are downloaded from an official Microsoft repo.

Signed-off-by: Jean-Francois Roy <jf@devklog.net>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-22 14:15:42 +04:00
Andrey Smirnov
fd6ddd11ef
feat: provide POD_IP env var to scheduler and controller-manager
Fixes #9031

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-17 21:41:15 +04:00
Andrey Smirnov
c288ace7b1
fix: be more smart when merging DNS resolver config
Fixes #8690

Consider the following scenario (e.g. OpenStack): platform issues a
correct list of DNS servers, which includes both IPv4 and IPv6
resolvers, and configures DHCPv4 on the interface.

DHCPv4 returns a set of IPv4 resolvers (as it can't return IPv6 ones),
and this list completely overrides the list from the platform, wiping
out the IPv6 resolvers completely.

With this change, the merge process is more smart, as it tries to
preserve IPv6 resolvers for example if the next layer provides no
resolvers for IPv6.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-16 21:20:27 +04:00
Andrey Smirnov
d983e44308
fix: panic on shutdown
Fixes #9017

Don't assume the config is there before trying to access it.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-16 17:24:03 +04:00
Andrey Smirnov
b07338f547
feat: provide machine config document to update trusted CA roots
Fixes #8867

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-12 19:28:31 +04:00
Andrey Smirnov
f14c4795e5
fix: sort ports and merge adjacent ones in the nft rule
Fixes #9009

When building a port interval set, sort the ports and merge adjacent
ranges to prevent mismatch on the nftables side.

With address sets, this was already the case due to the way IPRange
builder works, but ports need a manual implementation.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-12 14:53:23 +04:00
Andrey Smirnov
cf5effabb2
feat: provide an option to enforce SecureBoot for TPM enrollment
Fixes #8995

There is no security impact, as the actual SecureBoot
state/configuration is measured into the PCR 7 and the disk encryption
key unsealing is tied to this value.

This is more to provide a way to avoid accidentally encrypting to the
TPM while SecureBoot is not enabled.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-11 22:21:47 +04:00
Andrey Smirnov
736c1485e2
fix: change the UEFI firmware search path order
Ensure that SecureBoot enabled images come before regular ones.

With Ubuntu 24.04 `ovmf` package, due to the ordering of the search
paths `talosctl` might pick up a wrong image and disable SecureBoot.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-11 21:56:33 +04:00
Andrey Smirnov
398151e64f
fix: remove host bind mount for /tmp for trustd
Not sure why this mount was needed, but it was added long time ago, and
I believe it's no longer needed.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-10 19:56:21 +04:00
Dmitriy Matrenichev
fbde9c556f
chore: bump deps
Bump github.com/siderolabs/grpc-proxy to v0.4.1 and replace deprecated calls to `grpc.CustomCodec`.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-07-09 20:01:13 +03:00
Andrey Smirnov
3bab15214d
feat: update Kubernetes to 1.31.0-alpha.3
Fixes #8911

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-09 17:49:06 +04:00
Dmitriy Matrenichev
dad9c40c73
chore: simplify code
- replace `interface{}` with `any` using `gofmt -r 'interface{} -> any -w'`
- replace `a = []T{}` with `var a []T` where possible.
- replace `a = []T{}` with `a = make([]T, 0, len(b))` where possible.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-07-08 18:14:00 +03:00
Andrey Smirnov
2512ef435f
test: fix the integrtion tests for apply-config
They got broken after refactoring.

Also use this PR to test things before the release.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-08 14:06:45 +04:00
Dmitriy Matrenichev
076f3c4f20
chore: improve link spec controller code
`SortBonds` function bothered me since the last time I refactored this part.

We always know that it only accepts `network.LinkSpec`s, but we accepted the slice of untyped Resources because
this is what `List` method returns. Now we can do better, since `safe.List` now supports `Swap` method.

We can utilize `sort.Interface` and pass `safe.List` directly to `SortBonds`.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-07-05 16:39:27 +03:00
Andrey Smirnov
0454130ad9
feat: suppress controller runtime first N failures on the console
As the controllers might fail with transient errors on machine startup,
but errors are always retried, persisten errors will anyway show up in
the console.

The full `talosctl logs controller-runtime` are not suppressed.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-05 15:36:54 +04:00
Andrey Smirnov
be35f380cc
chore: update pkgs/tools/extras
This brings in Go 1.22.5 and new Flannel CNI plugin.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-03 20:38:55 +04:00
Andrey Smirnov
b4c871e4b7
chore: bump dependencies
Update Go modules and other dependencies.

Fix linting of the Dockerfile.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-02 14:46:51 +04:00
Andrey Smirnov
cc345c8c94
feat: add support for configuring vlan filtering on the bridge
Fixes #8941

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-01 20:20:28 +04:00
Dmitriy Matrenichev
2d054ad355
chore: handle documents diff in apply-config dry run
Before this PR diff generator only diffed the v1alpha1 config and nothing else. With this PR it also takes
separate docs into the account.

```shell
~ > <editor> controlplane.yaml
~ > talosctl -n talos-default-controlplane-1  apply-config --file controlplane.yaml --dry-run
Dry run summary:
Applied configuration without a reboot (skipped in dry-run).
Config diff:
No changes.
Documents diff:
[]config.Document{
+	&runtime.KmsgLogV1Alpha1{
+		Meta:       meta.Meta{MetaAPIVersion: "v1alpha1", MetaKind: "KmsgLogConfig"},
+		MetaName:   "omni-kmsg",
+		KmsgLogURL: s"tcp://[fdae:41e4:649b:9303::1]:8092",
+	},
}
~ > talosctl -n talos-default-controlplane-1  apply-config --file controlplane.yaml
Applied configuration without a reboot
~ >
~ >
~ >
~ > <editor> controlplane.yaml
~ > talosctl -n talos-default-controlplane-1  apply-config --file controlplane.yaml --dry-run
Dry run summary:
Applied configuration without a reboot (skipped in dry-run).
Config diff:
No changes.
Documents diff:
[]config.Document{
	&runtime.KmsgLogV1Alpha1{Meta: {MetaAPIVersion: "v1alpha1", MetaKind: "KmsgLogConfig"}, MetaName: "omni-kmsg", KmsgLogURL: {URL: &{Scheme: "tcp", Host: "[fdae:41e4:649b:9303::1]:8092"}}},
+	&network.DefaultActionConfigV1Alpha1{
+		Meta:    meta.Meta{MetaAPIVersion: "v1alpha1", MetaKind: "NetworkDefaultActionConfig"},
+		Ingress: s"block",
+	},
}
```

Closes #8885

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-06-27 21:15:36 +03:00
Dmitriy Matrenichev
c603d2bf95
chore: output more info when ExecuteCommandInPod fails
This should make investigating things like [this](https://github.com/siderolabs/talos/actions/runs/9411253542/job/25924192027)
easier.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-06-24 20:15:45 +03:00
Noel Georgi
86a3222aee
chore: use new disks api for iscsi tests
The iscsi test broke when the new disks api was introduced making the
test pass always, now filter other only `iscsi` disk types using the new
disks API.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-06-18 18:38:21 +05:30
Utku Ozdemir
5ffc3f14bd
feat: show siderolink status on dashboard
Add a new resource, `SiderolinkStatus`, which combines the following info:
- The Siderolink API endpoint without the query parameters or fragments (potentially sensitive info due to the join token)
- The status of the Siderolink connection

This resource is not set as sensitive, so it can be retrieved by the users with `os:operator` role (e.g., using `talosctl dashboard` through Omni).

Make use of this resource in the dashboard to display the status of the Siderolink connection.

Additionally, rework the status columns in the dashboard to:
- Display a Linux terminal compatible "tick" or a "cross" prefix for statuses in addition to the red/green color coding.
- Move and combine some statuses to save rows and make them more even.

Closes siderolabs/talos#8643.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2024-06-18 12:31:54 +02:00
Andrey Smirnov
6f6a5d1057
chore: upgrade to rtnetlink/v2 library
The v1 version is no longer supported.

The major change is the decoding of link data, but we're not using it,
as we have our own decoders/encoders for a long time.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-13 19:20:48 +04:00
Andrey Smirnov
7fcb521a6a
feat: use hydrophone instead of sonobuoy
Fixes #8790

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-11 16:51:45 +04:00
Andrey Smirnov
d1a0c1f983
test: fix the integration test for no META name
When META has never been written (e.g. booted from a disk image), it
won't be detected as `talosmeta`.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-11 15:17:52 +04:00
Dmitriy Matrenichev
5350063340
chore: fix our dns server implementation
This PR does those things:
* No longer shuffles dns servers for each request.
* Sets a context timeout of 4.5 seconds.
* Correctly returns a proper error from the root layer.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-06-11 01:36:21 +03:00
Dmitriy Matrenichev
c6f90d0149
chore: replace sync.Map with concurrent.HashTrieMap
Also bump `cosi-project/runtime` to the v0.4.4

Closes #8851

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-06-10 20:45:47 +03:00
Andrey Smirnov
e8ced2c2dd
chore: drop k8s timeout in the default kubeconfig
(This is not user-facing, but rather internal use of the kubeconfig in
the tests/inside the machine).

This was added 4 years ago as a workaround, but instead of a global
timeout we should rather use contexts with timeouts/deadlines (and we
do!).

Setting a global timeout breaks streaming Kubernetes pod logs.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-10 18:29:50 +04:00
Andrey Smirnov
7cbdce73f7
fix: detect CD devices, fix user disks wipe test
Detect CD devices, and set size to 0 for CD without media.

In user disk wipe tests, skip device mapper devices and CD-ROM.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-10 18:00:06 +04:00
Dmitriy Matrenichev
aca475c665
chore: small usability fixes
* Replace logging.Wrap(log.Writer()) with zaptest.NewLogger(suite.T()) where possible.
* Replace reflect.DeepEqual with =|slices.Equal|bytes.Equal where possible.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-06-10 05:48:11 +03:00
Marcel Richter
5e66e117e2
fix: initial assignment of Hetzner Cloud Alias IP
The assignment of private networks happens in the hetzner cloud after
starting the server and therefore often after querying the network
information when assigning VIPs.

If an alias IP is to be set but no private network is yet available, an
error message is now thrown, until the private network is assigned.

Previously, no error message was thrown and the
network ID was set to 0, which means that the VIP
is regarded as a public floating IP in the further
code and not as a private alias IP.

Signed-off-by: Marcel Richter <mail@mrclrchtr.de>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-07 21:49:24 +04:00
Andrey Smirnov
f07b79f4a8
feat: provide disk detection based on new blockdevices
Uses go-siderolabs/go-blockdevice/v2 for all the hard parts,
provides new resource `Disk` which describes all disks in the system.

Additional resource `SystemDisk` always point to the system disk (based
on the location of `META` partition).

The `Disks` API (and `talosctl disks`) provides a view now into the
`talosctl get disks` to keep backwards compatibility.

QEMU provisioner can now create extra disks of various types: IDE, AHCI,
SCSI, NVME, this allows to test detection properly.

The new resource will be the foundation for volume provisioning (to pick
up the disk to provision the volume on).

Example:

```
talosctl -n 172.20.0.5 get disks
NODE         NAMESPACE   TYPE   ID        VERSION   SIZE          READ ONLY   TRANSPORT   ROTATIONAL   WWID                                                               MODEL            SERIAL
172.20.0.5   runtime     Disk   loop0     1         65568768      true
172.20.0.5   runtime     Disk   nvme0n1   1         10485760000   false       nvme                     nvme.1b36-6465616462656566-51454d55204e564d65204374726c-00000001   QEMU NVMe Ctrl   deadbeef
172.20.0.5   runtime     Disk   sda       1         10485760000   false       virtio      true                                                                            QEMU HARDDISK
172.20.0.5   runtime     Disk   sdb       1         10485760000   false       sata        true         t10.ATA     QEMU HARDDISK                           QM00013        QEMU HARDDISK
172.20.0.5   runtime     Disk   sdc       1         10485760000   false       sata        true         t10.ATA     QEMU HARDDISK                           QM00001        QEMU HARDDISK
172.20.0.5   runtime     Disk   vda       1         12884901888   false       virtio      true
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-07 20:18:32 +04:00
Andrey Smirnov
7c9a14383e
fix: volume discovery improvements
Use shared locks, discover more partitions, some other small changes.

Re-enable the flaky test.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-06 19:45:40 +04:00
Andrey Smirnov
80ca8ff713
fix: update the cgroups for Talos core services
There was a bit of a mess here which worked fine until we bumped
runc/containerd, and the problem shows up in Talos-in-Kubernetes tests.

Use consistently `runner.WithCgroupPath`, as it handles cgroup nesting
for cases when Talos runs in a container.

Assign each service its own unique cgroup.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-06 17:24:20 +04:00
Andrey Smirnov
8dbe2128a9
feat: implement Talos diagnostics
Talos diagnostics analyzes current system state and comes up with detailed
warnings on the system misconfiguration which might be tricky to figure
out other way.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-05 22:28:15 +04:00
Andrey Smirnov
41f92e0ba4
chore: update Go to 1.22.4, other updates
Bump go modules, adjust the code.

New linter warnings.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-05 20:59:52 +04:00
Andrey Smirnov
9a23d846c1
fix: downgrade Azure IMDS required version
Fixes #8555

It seems that older version supports same set of fields we actually use
in our platform code, so we can safely downgrade to the version
supported by Azure Stack Hub.

I used
[this repo](https://github.com/Azure/azure-rest-api-specs/tree/main/specification/imds/data-plane/Microsoft.InstanceMetadataService/stable)
to check schemas across versions.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-03 19:41:09 +04:00
Andrey Smirnov
30860210cc
test: fix hardware test not to require PCI devices
On e.g. Azure VMs there are non reported.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-03 17:20:42 +04:00
Andrey Smirnov
4dd0aa7120
feat: implement PCI device bus enumeration
Fixes #8826

From the QEMU VM:

```shell
$ talosctl -n 172.20.0.5 get pcidevice
NODE         NAMESPACE   TYPE        ID             VERSION   CLASS                       SUBCLASS                    VENDOR              PRODUCT
172.20.0.5   hardware    PCIDevice   0000:00:00.0   1         Bridge                      Host bridge                 Intel Corporation   82G33/G31/P35/P31 Express DRAM Controller
172.20.0.5   hardware    PCIDevice   0000:00:01.0   1         Display controller          VGA compatible controller
172.20.0.5   hardware    PCIDevice   0000:00:02.0   1         Network controller          Ethernet controller         Red Hat, Inc.       Virtio network device
172.20.0.5   hardware    PCIDevice   0000:00:03.0   1         Unclassified device                                     Red Hat, Inc.       Virtio RNG
172.20.0.5   hardware    PCIDevice   0000:00:04.0   1         Unclassified device                                     Red Hat, Inc.       Virtio memory balloon
172.20.0.5   hardware    PCIDevice   0000:00:05.0   1         Communication controller    Communication controller    Red Hat, Inc.       Virtio console
172.20.0.5   hardware    PCIDevice   0000:00:06.0   1         Generic system peripheral   System peripheral           Intel Corporation   6300ESB Watchdog Timer
172.20.0.5   hardware    PCIDevice   0000:00:07.0   1         Mass storage controller     SCSI storage controller     Red Hat, Inc.       Virtio block device
172.20.0.5   hardware    PCIDevice   0000:00:1f.0   1         Bridge                      ISA bridge                  Intel Corporation   82801IB (ICH9) LPC Interface Controller
172.20.0.5   hardware    PCIDevice   0000:00:1f.2   1         Mass storage controller     SATA controller             Intel Corporation   82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode]
172.20.0.5   hardware    PCIDevice   0000:00:1f.3   1         Serial bus controller       SMBus                       Intel Corporation   82801I (ICH9 Family) SMBus Controller
```

```yaml
node: 172.20.0.5
metadata:
    namespace: hardware
    type: PCIDevices.hardware.talos.dev
    id: 0000:00:1f.3
    version: 1
    owner: hardware.PCIDevicesController
    phase: running
    created: 2024-05-30T12:09:05Z
    updated: 2024-05-30T12:09:05Z
spec:
    class: Serial bus controller
    subclass: SMBus
    vendor: Intel Corporation
    product: 82801I (ICH9 Family) SMBus Controller
    class_id: "0x0c"
    subclass_id: "0x05"
    vendor_id: "0x8086"
    product_id: "0x2930"
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-31 20:56:16 +04:00
Andrey Smirnov
b0466e0abf
fix: disable kexec on GCP/Azure
It looks like VMs might be stuck on kexec, while there's little value in
kexec with VMs.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-31 19:53:57 +04:00
Dmitriy Matrenichev
893e64fcb1
fix: replace nslookup with dig in integration tests
This should be more reliable on `integration-aws-*` and others.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-05-30 01:37:01 +03:00
Andrey Smirnov
0359c8537c
chore: unify toml packages being used
Drop BurntSushi one, and use /v2 of pelletier package.
There is indirect use of v1 which should hopefully go away once we move
away from sonobouy.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-29 21:22:56 +04:00
Dmitry Sharshakov
da8305ffb4
test: add a test for watchdog timers
Try to activate/deactivate watchdogs, change timeout, run only on QEMU.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Signed-off-by: Dmitry Sharshakov <dmitry.sharshakov@siderolabs.com>
2024-05-28 16:46:04 +04:00