418 Commits

Author SHA1 Message Date
Andrey Smirnov
362c9f812c
test: skip lvm test if not enough user disks available
E.g. in trusted-boot pipeline, we don't have extra disks.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-10-08 21:09:39 +04:00
Andrey Smirnov
4d279c65f4
fix: volume encryption with failing keyslots
Fix the flow when a failing key slot leads to repeated attempts to open
the volume, while it's already open, but the failure was to sync other
keys.

Refactor the code to get rid of variable assignment in the outer block
from closures.

Fixes #9415

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 0a2b4556c55eda27536ee563f60bcf5d69379479)
2024-10-08 14:49:32 +04:00
Andrey Smirnov
66228ef10e
fix: multiple fixes for LVM activation
Two fixes were in pkgs/lvm2:

* https://github.com/siderolabs/pkgs/pull/1041
* https://github.com/siderolabs/pkgs/pull/1042

Other fixes in this PR:

* adjust the controller a bit for some interactions
* make Rook test use more complicated, encrypted setup which uses LVM
* adjust LVM test to handle a case when there's more than one worker

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 74861573a793f9e143d7d2638990f37ec639aa88)
2024-10-08 14:48:19 +04:00
Noel Georgi
4c7948bb46
chore: better lvm2 tests
Use LVM2 tests that relies on module loading by lvm.

Fixes: #9300

Signed-off-by: Noel Georgi <git@frezbo.dev>
(cherry picked from commit 76318bd0bb008e9f43ae5fa86e7f862269e1ad0d)
2024-09-23 13:06:30 +04:00
Noel Georgi
67ba478253
chore: refactor tests
Refactor tests to avoid code duplication.

Signed-off-by: Noel Georgi <git@frezbo.dev>
(cherry picked from commit 9fa08e843728dbd85ed7e0035f59cdd6232de9a9)
2024-09-21 13:27:15 +04:00
Noel Georgi
70d3c91fb7
feat: support lvm auto activation
Support lvm auto-activation as per
https://man7.org/linux/man-pages/man7/lvmautoactivation.7.html.

This changes from how Talos previously used to unconditionally tried to
activate all volume groups to based on udev events.

Fixes: #9300

Signed-off-by: Noel Georgi <git@frezbo.dev>
(cherry picked from commit d8ab4981b626ff41fbcdb526a032a5584519e3df)
2024-09-21 13:26:15 +04:00
Andrey Smirnov
dd4185b144
feat: add KubeSpan extra endpoint configuration
Fixes #9174

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-09-06 14:50:12 +04:00
Andrey Smirnov
3038ccfa88
feat: add configuration for EPHEMERAL volume
Fixes #9261

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-09-06 14:11:35 +04:00
Andrey Smirnov
b453385bd9
feat: support volume configuration, provisioning, etc
This implements the first round of changes, replacing the volume backend
with the new implementation, while keeping most of the external
interfaces intact.

See #8367

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-08-30 18:32:34 +04:00
Andrey Smirnov
be2ebf6b4d
chore: bump dependencies
Update tools, pkgs, extras, Go dependencies, Go tools, etc.

Linux 6.6.47 and containerd 2.0.0-rc.4.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-08-29 20:44:37 +04:00
Andrey Smirnov
a9551b7caa
fix: host DNS access with firewall enabled
Explicitly enable access to host DNS from pod/service IPs.

Also fix the Kubernetes health checks to assert number of ready pods to
match expectation, otherwise the check might skip a pod (e.g.
`kube-proxy` one) which is not ready, allowing the test to proceed too
early.

Update DNS test to print more logs on error.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-08-27 15:44:14 +04:00
Noel Georgi
8fe39eacba
chore: move csi tests as go test
Move rook-ceph CSI tests as go tests.
This allows us to add more CSI tests in the future.

Fixes: #9135

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-08-26 18:18:09 +05:30
Andrey Smirnov
5ff6cf82ca
fix: drop /opt mount for containers/tink
The `/opt/cni/bin` in the rootfs contains CNI binaries, which get
overwritten by the volume mount.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-08-22 20:39:52 +04:00
Noel Georgi
f9f5e0ef55
chore: fix k8s tests
The check for k8s suite added in #9085 causes issues with applying k8s resources
which are global like `Namespace` or `StorageClass`.

Instead of failing just log.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-08-09 13:28:02 +05:30
Noel Georgi
64914b086c
chore: add test for crun extension
Add a test to verify the `crun` runtimeclass container-runtime extension
works as expected.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-08-02 20:15:01 +05:30
Andrey Smirnov
7a1c62b8bc
feat: publish installed extensions as node labels/annotations
Extensions are posted the following way:

`extensions.talos.dev/<name>=<version>`

The name should be valid as a label (annotation) key.

If the value is valid as a label value, use labels, otherwise use
annotations.

Also implements node annotations in the machine config as a side-effect.

Fixes #9089

Fixes #8971

See #9070

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-08-01 17:32:09 +04:00
Noel Georgi
50e5f37efb
chore: add test for apparmor
Add a test that verifies pods can be scheduled with `RuntimeDefault`
apparmor profile.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-07-30 20:24:57 +05:30
Noel Georgi
117628aa60
chore: add test for gvisor extension with platform kvm
Add test for Gvisor extensions when kvm platform is used.

The test is marked as skipped until pod termination issue is resolved.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-07-25 19:15:27 +05:30
Andrey Smirnov
b07338f547
feat: provide machine config document to update trusted CA roots
Fixes #8867

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-12 19:28:31 +04:00
Andrey Smirnov
736c1485e2
fix: change the UEFI firmware search path order
Ensure that SecureBoot enabled images come before regular ones.

With Ubuntu 24.04 `ovmf` package, due to the ordering of the search
paths `talosctl` might pick up a wrong image and disable SecureBoot.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-11 21:56:33 +04:00
Dmitriy Matrenichev
dad9c40c73
chore: simplify code
- replace `interface{}` with `any` using `gofmt -r 'interface{} -> any -w'`
- replace `a = []T{}` with `var a []T` where possible.
- replace `a = []T{}` with `a = make([]T, 0, len(b))` where possible.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-07-08 18:14:00 +03:00
Andrey Smirnov
2512ef435f
test: fix the integrtion tests for apply-config
They got broken after refactoring.

Also use this PR to test things before the release.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-08 14:06:45 +04:00
Dmitriy Matrenichev
2d054ad355
chore: handle documents diff in apply-config dry run
Before this PR diff generator only diffed the v1alpha1 config and nothing else. With this PR it also takes
separate docs into the account.

```shell
~ > <editor> controlplane.yaml
~ > talosctl -n talos-default-controlplane-1  apply-config --file controlplane.yaml --dry-run
Dry run summary:
Applied configuration without a reboot (skipped in dry-run).
Config diff:
No changes.
Documents diff:
[]config.Document{
+	&runtime.KmsgLogV1Alpha1{
+		Meta:       meta.Meta{MetaAPIVersion: "v1alpha1", MetaKind: "KmsgLogConfig"},
+		MetaName:   "omni-kmsg",
+		KmsgLogURL: s"tcp://[fdae:41e4:649b:9303::1]:8092",
+	},
}
~ > talosctl -n talos-default-controlplane-1  apply-config --file controlplane.yaml
Applied configuration without a reboot
~ >
~ >
~ >
~ > <editor> controlplane.yaml
~ > talosctl -n talos-default-controlplane-1  apply-config --file controlplane.yaml --dry-run
Dry run summary:
Applied configuration without a reboot (skipped in dry-run).
Config diff:
No changes.
Documents diff:
[]config.Document{
	&runtime.KmsgLogV1Alpha1{Meta: {MetaAPIVersion: "v1alpha1", MetaKind: "KmsgLogConfig"}, MetaName: "omni-kmsg", KmsgLogURL: {URL: &{Scheme: "tcp", Host: "[fdae:41e4:649b:9303::1]:8092"}}},
+	&network.DefaultActionConfigV1Alpha1{
+		Meta:    meta.Meta{MetaAPIVersion: "v1alpha1", MetaKind: "NetworkDefaultActionConfig"},
+		Ingress: s"block",
+	},
}
```

Closes #8885

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-06-27 21:15:36 +03:00
Dmitriy Matrenichev
c603d2bf95
chore: output more info when ExecuteCommandInPod fails
This should make investigating things like [this](https://github.com/siderolabs/talos/actions/runs/9411253542/job/25924192027)
easier.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-06-24 20:15:45 +03:00
Noel Georgi
86a3222aee
chore: use new disks api for iscsi tests
The iscsi test broke when the new disks api was introduced making the
test pass always, now filter other only `iscsi` disk types using the new
disks API.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-06-18 18:38:21 +05:30
Andrey Smirnov
7fcb521a6a
feat: use hydrophone instead of sonobuoy
Fixes #8790

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-11 16:51:45 +04:00
Andrey Smirnov
d1a0c1f983
test: fix the integration test for no META name
When META has never been written (e.g. booted from a disk image), it
won't be detected as `talosmeta`.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-11 15:17:52 +04:00
Andrey Smirnov
e8ced2c2dd
chore: drop k8s timeout in the default kubeconfig
(This is not user-facing, but rather internal use of the kubeconfig in
the tests/inside the machine).

This was added 4 years ago as a workaround, but instead of a global
timeout we should rather use contexts with timeouts/deadlines (and we
do!).

Setting a global timeout breaks streaming Kubernetes pod logs.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-10 18:29:50 +04:00
Andrey Smirnov
7cbdce73f7
fix: detect CD devices, fix user disks wipe test
Detect CD devices, and set size to 0 for CD without media.

In user disk wipe tests, skip device mapper devices and CD-ROM.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-10 18:00:06 +04:00
Andrey Smirnov
f07b79f4a8
feat: provide disk detection based on new blockdevices
Uses go-siderolabs/go-blockdevice/v2 for all the hard parts,
provides new resource `Disk` which describes all disks in the system.

Additional resource `SystemDisk` always point to the system disk (based
on the location of `META` partition).

The `Disks` API (and `talosctl disks`) provides a view now into the
`talosctl get disks` to keep backwards compatibility.

QEMU provisioner can now create extra disks of various types: IDE, AHCI,
SCSI, NVME, this allows to test detection properly.

The new resource will be the foundation for volume provisioning (to pick
up the disk to provision the volume on).

Example:

```
talosctl -n 172.20.0.5 get disks
NODE         NAMESPACE   TYPE   ID        VERSION   SIZE          READ ONLY   TRANSPORT   ROTATIONAL   WWID                                                               MODEL            SERIAL
172.20.0.5   runtime     Disk   loop0     1         65568768      true
172.20.0.5   runtime     Disk   nvme0n1   1         10485760000   false       nvme                     nvme.1b36-6465616462656566-51454d55204e564d65204374726c-00000001   QEMU NVMe Ctrl   deadbeef
172.20.0.5   runtime     Disk   sda       1         10485760000   false       virtio      true                                                                            QEMU HARDDISK
172.20.0.5   runtime     Disk   sdb       1         10485760000   false       sata        true         t10.ATA     QEMU HARDDISK                           QM00013        QEMU HARDDISK
172.20.0.5   runtime     Disk   sdc       1         10485760000   false       sata        true         t10.ATA     QEMU HARDDISK                           QM00001        QEMU HARDDISK
172.20.0.5   runtime     Disk   vda       1         12884901888   false       virtio      true
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-07 20:18:32 +04:00
Andrey Smirnov
7c9a14383e
fix: volume discovery improvements
Use shared locks, discover more partitions, some other small changes.

Re-enable the flaky test.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-06 19:45:40 +04:00
Andrey Smirnov
30860210cc
test: fix hardware test not to require PCI devices
On e.g. Azure VMs there are non reported.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-03 17:20:42 +04:00
Andrey Smirnov
4dd0aa7120
feat: implement PCI device bus enumeration
Fixes #8826

From the QEMU VM:

```shell
$ talosctl -n 172.20.0.5 get pcidevice
NODE         NAMESPACE   TYPE        ID             VERSION   CLASS                       SUBCLASS                    VENDOR              PRODUCT
172.20.0.5   hardware    PCIDevice   0000:00:00.0   1         Bridge                      Host bridge                 Intel Corporation   82G33/G31/P35/P31 Express DRAM Controller
172.20.0.5   hardware    PCIDevice   0000:00:01.0   1         Display controller          VGA compatible controller
172.20.0.5   hardware    PCIDevice   0000:00:02.0   1         Network controller          Ethernet controller         Red Hat, Inc.       Virtio network device
172.20.0.5   hardware    PCIDevice   0000:00:03.0   1         Unclassified device                                     Red Hat, Inc.       Virtio RNG
172.20.0.5   hardware    PCIDevice   0000:00:04.0   1         Unclassified device                                     Red Hat, Inc.       Virtio memory balloon
172.20.0.5   hardware    PCIDevice   0000:00:05.0   1         Communication controller    Communication controller    Red Hat, Inc.       Virtio console
172.20.0.5   hardware    PCIDevice   0000:00:06.0   1         Generic system peripheral   System peripheral           Intel Corporation   6300ESB Watchdog Timer
172.20.0.5   hardware    PCIDevice   0000:00:07.0   1         Mass storage controller     SCSI storage controller     Red Hat, Inc.       Virtio block device
172.20.0.5   hardware    PCIDevice   0000:00:1f.0   1         Bridge                      ISA bridge                  Intel Corporation   82801IB (ICH9) LPC Interface Controller
172.20.0.5   hardware    PCIDevice   0000:00:1f.2   1         Mass storage controller     SATA controller             Intel Corporation   82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode]
172.20.0.5   hardware    PCIDevice   0000:00:1f.3   1         Serial bus controller       SMBus                       Intel Corporation   82801I (ICH9 Family) SMBus Controller
```

```yaml
node: 172.20.0.5
metadata:
    namespace: hardware
    type: PCIDevices.hardware.talos.dev
    id: 0000:00:1f.3
    version: 1
    owner: hardware.PCIDevicesController
    phase: running
    created: 2024-05-30T12:09:05Z
    updated: 2024-05-30T12:09:05Z
spec:
    class: Serial bus controller
    subclass: SMBus
    vendor: Intel Corporation
    product: 82801I (ICH9 Family) SMBus Controller
    class_id: "0x0c"
    subclass_id: "0x05"
    vendor_id: "0x8086"
    product_id: "0x2930"
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-31 20:56:16 +04:00
Dmitriy Matrenichev
893e64fcb1
fix: replace nslookup with dig in integration tests
This should be more reliable on `integration-aws-*` and others.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-05-30 01:37:01 +03:00
Dmitry Sharshakov
da8305ffb4
test: add a test for watchdog timers
Try to activate/deactivate watchdogs, change timeout, run only on QEMU.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Signed-off-by: Dmitry Sharshakov <dmitry.sharshakov@siderolabs.com>
2024-05-28 16:46:04 +04:00
Dmitriy Matrenichev
a9cf9b7892
fix: correctly handle dns messages in our dns implementation
- By default, github.com/miekg/dns uses `dns.MinMsgSize` for UDP messages, which is 512 bytes. This is too small for some
DNS request/responses, and can cause truncation and errors. This change sets the buffer size to `dns.DefaultMsgSize`
4096 bytes, which is the maximum size of a dns packet payload per RFC 6891.
- We also retry the request if the response is truncated or previous connection was closed.
- And finally we properly handle the case where the response is larger than the client buffer size,
and we return a truncated correct response.

Closes #8763

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-05-24 21:41:00 +03:00
Andrey Smirnov
c2b19dcb97
chore: move to containerd 2.0 API
Lots of module moves/renames.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-24 21:48:55 +04:00
Andrey Smirnov
2e64e9e4e0
fix: require accepted CAs on worker nodes
Note: this issue never happens with default Talos worker configuration
(generated by Omni, `talosctl gen config` or CABPT).

Before change https://github.com/siderolabs/talos/pull/4294 3 years ago,
worker nodes connected to trustd in "insecure" mode (without validating
the trustd server certificate). The change kept backwards compatibility,
so it still allowed insecure mode on upgrades.

Now it's time to break this compatibility promise, and require
accepted CAs to be always present. Adds validation for machine
configuration, so if upgrade is attempeted, it would not validate the
machine config without accepted CAs.

Now lack of accepted CAs would lead to failure to connect to trustd.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-23 17:48:16 +04:00
Andrey Smirnov
b7afe2669b
feat: update Linux 6.6.30
Update tools/pkgs to the latest version, brings in all updates.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-13 17:14:03 +04:00
Andrey Smirnov
763dae2508
fix: add cluster name to the worker machine config
This is 1.8+ only.

Fixes #8694

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-07 20:11:23 +04:00
Andrey Smirnov
b690ffeb89
test: improve DNS resolver test stability
Run a health check before the test, as the test depends on CoreDNS being
healthy, and previous tests might disturb the cluster.

Also refactor by using watch instead of retries, make pods terminate
fast.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-29 19:31:34 +04:00
Andrey Smirnov
05fd042bb3
test: improve the reset integration tests
Provide a trace for each step of the reset sequence taken, so if one of
those fails, integration test produces a meaningful message instead of
proceeding and failing somewhere else.

More cleanup/refactor, should be functionally equivalent.

Fixes #8635

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-24 18:35:39 +04:00
Andrey Smirnov
bac1d00c35
chore: prepare for Talos 1.8
Fork docs, introduce version contract for 1.8.

Clean up old version contracts 0.8-0.14.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-19 18:19:36 +04:00
Dmitriy Matrenichev
ec69d7a785
chore: replace math/rand with math/rand/v2
New package arrived in Go 1.22 which provides better rand primitives and functions.
Use it instead of the old one.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-04-18 13:20:59 +03:00
Andrey Smirnov
3433fa13bf
feat: use container DNS when in container mode
More specifically, pick up `/etc/resolv.conf` contents by default when
in container mode, and use that as a base resolver for the host DNS.

Fixes #8303

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-16 17:01:36 +04:00
Andrey Smirnov
c8f674bd3d
test: add a test for 'spin' container runtime
See https://github.com/siderolabs/extensions/pull/355

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-10 20:42:16 +04:00
Andrey Smirnov
9aa1e1b79b
fix: present all accepted CAs to the kube-apiserver
This fixes an issue with a single controlplane cluster.

Properly present all accepted CAs to the apiserver, in the test let the
cluster fully recovery between two CA rotations performed.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-08 23:33:22 +04:00
Dmitry Sharshakov
653f838b09
feat: support multiple Docker cluster in talosctl cluster create
Dynamically map Kubernetes and Talos API ports to an available port on
the host, so every cluster gets its own unique set of parts.

As part of the changes, refactor the provision library and interfaces,
dropping old weird interfaces replacing with (hopefully) much more
descriprive names.

Signed-off-by: Dmitry Sharshakov <dmitry.sharshakov@siderolabs.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-04 21:21:39 +04:00
Andrey Smirnov
78b9bd9273
fix: report unsupported x86_64 microarchitecture level
Fixes #8361

Talos requires v2 (circa 2008), but VMs are often configured to limit
the exposed features to the baseline (v1).

```
[    0.779218] [talos] [initramfs] booting Talos v1.7.0-alpha.1-35-gef5bbe728-dirty
[    0.779806] [talos] [initramfs] CPU: QEMU Virtual CPU version 2.5+, 4 core(s), 1 thread(s) per core
[    0.780529] [talos] [initramfs] x86_64 microarchitecture level: 1
[    0.781018] [talos] [initramfs] it might be that the VM is configured with an older CPU model, please check the VM configuration
[    0.782346] [talos] [initramfs] x86_64 microarchitecture level 2 or higher is required, halting
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-03 16:09:57 +04:00
Noel Georgi
f515741b52
chore: add equinix e2e-tests
Add equinix e2e-tests.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-04-02 17:16:59 +05:30