4890 Commits

Author SHA1 Message Date
Andrey Smirnov
c9aeeca3d4
chore: fix the Makefile
Fix the error when not on a release.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-08 15:55:35 +04:00
Andrey Smirnov
48cdbe0de7
release(v1.8.0-alpha.1): prepare release
This is the official v1.8.0-alpha.1 release.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-08 14:48:13 +04:00
Andrey Smirnov
2512ef435f
test: fix the integration tests for apply-config
They got broken after refactoring.

Also use this PR to test things before the release.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-08 14:06:45 +04:00
Dmitriy Matrenichev
076f3c4f20
chore: improve link spec controller code
The `SortBonds` function has bothered me since the last time I refactored this part.

We always know that it only accepts `network.LinkSpec`s, but we accepted a slice of untyped resources because
this is what the `List` method returns. Now we can do better, since `safe.List` now supports the `Swap` method.

We can utilize `sort.Interface` and pass `safe.List` directly to `SortBonds`.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-07-05 16:39:27 +03:00
Andrey Smirnov
0454130ad9
feat: suppress controller runtime first N failures on the console
Controllers might fail with transient errors on machine startup, but as
errors are always retried, persistent errors will still show up on the
console.

The full `talosctl logs controller-runtime` output is not suppressed.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-05 15:36:54 +04:00
Andrey Smirnov
3d35e54683
chore: update hydrophone library
My PR https://github.com/kubernetes-sigs/hydrophone/pull/198 got merged
upstream, so drop local workaround.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-05 14:42:47 +04:00
Noel Georgi
1f28726d46
chore: support version with and without v prefix
Support passing in version with and without `v` prefix to Talos machine
config version contract parser.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-07-04 08:10:48 +05:30
Noel Georgi
9a56b8527b
chore(ci): fix parallel runs of tf pipelines
Previously, the same name was generated for the state file, causing
parallel runs to delete resources created by another running test.

Fix names to be unique by reading `cluster_name`.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-07-03 23:08:37 +05:30
Andrey Smirnov
be35f380cc
chore: update pkgs/tools/extras
This brings in Go 1.22.5 and new Flannel CNI plugin.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-03 20:38:55 +04:00
Justin Garrison
93df234445
docs: update opengraph image for main landing pages
Set the default image and explicitly set it for main pages.
Lint pre-rendered HTML for `_index.html`.

Signed-off-by: Justin Garrison <justin.garrison@siderolabs.com>
2024-07-02 09:43:53 -07:00
Andrey Smirnov
d9d62d4da6
feat: update Linux to 6.6.36
Also update containerd to 2.0.0-rc.3, runc to 1.2.0-rc.2.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-02 19:55:50 +04:00
Marco Franssen
6b0fe5b8ca
docs: update deploying cilium docs for v1.7 and v1.8
Updates to reflect the changes in the latest Cilium CLI, as well as a small fix in the last example.

Signed-off-by: Marco Franssen <marco.franssen@gmail.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-02 16:53:31 +04:00
Andrey Smirnov
52611a90d8
feat: update Kubernetes to v1.30.2
See https://github.com/kubernetes/kubernetes/releases/tag/v1.30.2

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-02 15:54:34 +04:00
Steve Francis
c19cc4ccbc
docs: clarify direct access needed to nodes in insecure mode
And some small updates.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-02 15:23:48 +04:00
Andrey Smirnov
b4c871e4b7
chore: bump dependencies
Update Go modules and other dependencies.

Fix linting of the Dockerfile.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-02 14:46:51 +04:00
Andrey Smirnov
cc345c8c94
feat: add support for configuring vlan filtering on the bridge
Fixes #8941

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-01 20:20:28 +04:00
Dmitriy Matrenichev
2d054ad355
chore: handle documents diff in apply-config dry run
Before this PR, the diff generator only diffed the v1alpha1 config and nothing else. With this PR, it also takes
separate documents into account.

```shell
~ > <editor> controlplane.yaml
~ > talosctl -n talos-default-controlplane-1  apply-config --file controlplane.yaml --dry-run
Dry run summary:
Applied configuration without a reboot (skipped in dry-run).
Config diff:
No changes.
Documents diff:
[]config.Document{
+	&runtime.KmsgLogV1Alpha1{
+		Meta:       meta.Meta{MetaAPIVersion: "v1alpha1", MetaKind: "KmsgLogConfig"},
+		MetaName:   "omni-kmsg",
+		KmsgLogURL: s"tcp://[fdae:41e4:649b:9303::1]:8092",
+	},
}
~ > talosctl -n talos-default-controlplane-1  apply-config --file controlplane.yaml
Applied configuration without a reboot
~ >
~ >
~ >
~ > <editor> controlplane.yaml
~ > talosctl -n talos-default-controlplane-1  apply-config --file controlplane.yaml --dry-run
Dry run summary:
Applied configuration without a reboot (skipped in dry-run).
Config diff:
No changes.
Documents diff:
[]config.Document{
	&runtime.KmsgLogV1Alpha1{Meta: {MetaAPIVersion: "v1alpha1", MetaKind: "KmsgLogConfig"}, MetaName: "omni-kmsg", KmsgLogURL: {URL: &{Scheme: "tcp", Host: "[fdae:41e4:649b:9303::1]:8092"}}},
+	&network.DefaultActionConfigV1Alpha1{
+		Meta:    meta.Meta{MetaAPIVersion: "v1alpha1", MetaKind: "NetworkDefaultActionConfig"},
+		Ingress: s"block",
+	},
}
```

Closes #8885

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-06-27 21:15:36 +03:00
Konrad Eriksson
bd34f71f3e
feat: add apparmor pkg
Bring in the AppArmor package from `pkgs`, which adds
`/sbin/apparmor_parser`, which then gets picked up by containerd.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-06-27 20:52:08 +05:30
Fabian Topfstedt
71857fd4d3
docs: fix typo: messure -> measure
Fix a typo in the Cilium docs.

Signed-off-by: Fabian Topfstedt <topfstedt@schneevonmorgen.com>
Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-06-27 18:28:39 +05:30
Noel Georgi
f75f16b0a8
chore(ci): fix cluster name generation
Append the target name to the cluster name so that parallel tests do not
create resources with the same names.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-06-26 15:17:18 +05:30
Dmitriy Matrenichev
c603d2bf95
chore: output more info when ExecuteCommandInPod fails
This should make investigating things like [this](https://github.com/siderolabs/talos/actions/runs/9411253542/job/25924192027)
easier.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-06-24 20:15:45 +03:00
Grzegorz Rozniecki
4b5a7445e9
docs: fix missing Akamai platform in supported matrix
Add Akamai Connected Cloud (Linode) to supported cloud platforms matrix docs.

Signed-off-by: Grzegorz Rozniecki <grozniec@akamai.com>
Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-06-24 20:44:31 +05:30
Noel Georgi
4701498a1b
chore(ci): run e2e-aws-nvidia with zfs extension enabled
Run e2e-aws-nvidia-oss with zfs extension enabled.

Also fix the iscsi tests to get transport info using the new disks API.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-06-24 15:12:34 +05:30
Noel Georgi
86a3222aee
chore: use new disks api for iscsi tests
The iscsi test broke when the new disks API was introduced, making the
test always pass; now filter only `iscsi` disk types using the new
disks API.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-06-18 18:38:21 +05:30
Utku Ozdemir
5ffc3f14bd
feat: show siderolink status on dashboard
Add a new resource, `SiderolinkStatus`, which combines the following info:
- The Siderolink API endpoint without the query parameters or fragments (potentially sensitive info due to the join token)
- The status of the Siderolink connection
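A minimal sketch of dropping the query string and fragment with Go's `net/url`, assuming the endpoint is a parseable URL; the function name and example endpoints are hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
)

// redactEndpoint returns the endpoint without query parameters or fragment,
// which may carry secrets such as a join token.
func redactEndpoint(endpoint string) (string, error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return "", err
	}

	u.RawQuery = ""
	u.ForceQuery = false // avoid a trailing "?" for URLs like "https://host/?"
	u.Fragment = ""
	u.RawFragment = ""

	return u.String(), nil
}

func main() {
	s, err := redactEndpoint("https://example.com:8090/?jointoken=secret#frag")
	fmt.Println(s, err)
}
```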

This resource is not set as sensitive, so it can be retrieved by users with the `os:operator` role (e.g., using `talosctl dashboard` through Omni).

Make use of this resource in the dashboard to display the status of the Siderolink connection.

Additionally, rework the status columns in the dashboard to:
- Display a Linux terminal compatible "tick" or a "cross" prefix for statuses in addition to the red/green color coding.
- Move and combine some statuses to save rows and make them more even.

Closes siderolabs/talos#8643.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2024-06-18 12:31:54 +02:00
Andrey Smirnov
6f6a5d1057
chore: upgrade to rtnetlink/v2 library
The v1 version is no longer supported.

The major change is the decoding of link data, but we're not using it,
as we have had our own decoders/encoders for a long time.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-13 19:20:48 +04:00
Andrey Smirnov
1fb8453c2d
chore: update Go modules
The Azure SDK has a CVE; bump other modules as well.

Update `hydrophone` with my fixes which got merged upstream.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-12 15:56:58 +04:00
Noel Georgi
8e15621e83
chore(ci): add conformance pipelines
This was missed when moving to GHA.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-06-12 10:47:43 +07:00
Andrey Smirnov
7fcb521a6a
feat: use hydrophone instead of sonobuoy
Fixes #8790

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-11 16:51:45 +04:00
Andrey Smirnov
d1a0c1f983
test: fix the integration test for no META name
When META has never been written (e.g. booted from a disk image), it
won't be detected as `talosmeta`.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-11 15:17:52 +04:00
Dmitriy Matrenichev
5350063340
chore: fix our dns server implementation
This PR does the following:
* No longer shuffles dns servers for each request.
* Sets a context timeout of 4.5 seconds.
* Correctly returns a proper error from the root layer.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-06-11 01:36:21 +03:00
Dmitriy Matrenichev
c6f90d0149
chore: replace sync.Map with concurrent.HashTrieMap
Also bump `cosi-project/runtime` to v0.4.4.

Closes #8851

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-06-10 20:45:47 +03:00
Andrey Smirnov
e8ced2c2dd
chore: drop k8s timeout in the default kubeconfig
(This is not user-facing, but rather internal use of the kubeconfig in
the tests/inside the machine).

This was added 4 years ago as a workaround, but instead of a global
timeout we should rather use contexts with timeouts/deadlines (and we
do!).

Setting a global timeout breaks streaming Kubernetes pod logs.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-10 18:29:50 +04:00
Andrey Smirnov
7cbdce73f7
fix: detect CD devices, fix user disks wipe test
Detect CD devices, and set size to 0 for CDs without media.

In user disk wipe tests, skip device mapper devices and CD-ROMs.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-10 18:00:06 +04:00
Dmitriy Matrenichev
aca475c665
chore: small usability fixes
* Replace `logging.Wrap(log.Writer())` with `zaptest.NewLogger(suite.T())` where possible.
* Replace `reflect.DeepEqual` with `==`, `slices.Equal`, or `bytes.Equal` where possible.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-06-10 05:48:11 +03:00
Dmitriy Matrenichev
26cf566dc8
chore: bump our coredns fork
Update from github.com/coredns/coredns v1.11.2 to v1.11.3 and apply our changes.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-06-07 22:18:28 +03:00
Marcel Richter
5e66e117e2
fix: initial assignment of Hetzner Cloud Alias IP
In Hetzner Cloud, private networks are assigned after the server starts,
and therefore often after the network information has been queried
while assigning VIPs.

If an alias IP is to be set but no private network is available yet, an
error is now returned until the private network is assigned.

Previously, no error was returned and the network ID was set to 0, which
means that the VIP was treated by the rest of the code as a public
floating IP rather than a private alias IP.

Signed-off-by: Marcel Richter <mail@mrclrchtr.de>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-07 21:49:24 +04:00
Andrey Smirnov
f07b79f4a8
feat: provide disk detection based on new blockdevices
Uses go-siderolabs/go-blockdevice/v2 for all the hard parts and
provides a new resource, `Disk`, which describes all disks in the system.

An additional resource, `SystemDisk`, always points to the system disk
(based on the location of the `META` partition).

The `Disks` API (and `talosctl disks`) now provides a view into
`talosctl get disks` to keep backwards compatibility.

The QEMU provisioner can now create extra disks of various types (IDE,
AHCI, SCSI, NVMe), which allows testing detection properly.

The new resource will be the foundation for volume provisioning (to pick
up the disk to provision the volume on).

Example:

```
talosctl -n 172.20.0.5 get disks
NODE         NAMESPACE   TYPE   ID        VERSION   SIZE          READ ONLY   TRANSPORT   ROTATIONAL   WWID                                                               MODEL            SERIAL
172.20.0.5   runtime     Disk   loop0     1         65568768      true
172.20.0.5   runtime     Disk   nvme0n1   1         10485760000   false       nvme                     nvme.1b36-6465616462656566-51454d55204e564d65204374726c-00000001   QEMU NVMe Ctrl   deadbeef
172.20.0.5   runtime     Disk   sda       1         10485760000   false       virtio      true                                                                            QEMU HARDDISK
172.20.0.5   runtime     Disk   sdb       1         10485760000   false       sata        true         t10.ATA     QEMU HARDDISK                           QM00013        QEMU HARDDISK
172.20.0.5   runtime     Disk   sdc       1         10485760000   false       sata        true         t10.ATA     QEMU HARDDISK                           QM00001        QEMU HARDDISK
172.20.0.5   runtime     Disk   vda       1         12884901888   false       virtio      true
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-07 20:18:32 +04:00
Noel Georgi
8ee0872683
chore(ci): drop crashdump, save logs as artifacts
Drop `--crashdump` and save talos cluster logs as artifacts.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-06-07 10:52:05 +08:00
Andrey Smirnov
7c9a14383e
fix: volume discovery improvements
Use shared locks, discover more partitions, some other small changes.

Re-enable the flaky test.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-06 19:45:40 +04:00
Andrey Smirnov
80ca8ff713
fix: update the cgroups for Talos core services
There was a bit of a mess here, which worked fine until we bumped
runc/containerd; the problem showed up in Talos-in-Kubernetes tests.

Consistently use `runner.WithCgroupPath`, as it handles cgroup nesting
for cases when Talos runs in a container.

Assign each service its own unique cgroup.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-06 17:24:20 +04:00
Ron Olson
fe317f1e16
docs: fix typo in QEMU guest agent support on Proxmox
Fix typo in parameter for installing QEMU guest agent support on
Proxmox.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-06 16:50:20 +04:00
Andrey Smirnov
8dbe2128a9
feat: implement Talos diagnostics
Talos diagnostics analyzes the current system state and produces detailed
warnings about system misconfigurations that might otherwise be tricky
to figure out.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-05 22:28:15 +04:00
Andrey Smirnov
357d7754fd
fix: clean up VM runners on cluster destroy
This never worked properly, as `Wait()` only works for child processes,
and the processes created by `talosctl cluster create` are not children
of `talosctl cluster destroy`.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-05 21:53:24 +04:00
Andrey Smirnov
41f92e0ba4
chore: update Go to 1.22.4, other updates
Bump go modules, adjust the code.

Fix new linter warnings.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-05 20:59:52 +04:00
Andrey Smirnov
4621e9bb77
chore: add stale and lock issue workflows
Mark issues as stale and close them (to keep ourselves focused, and to
ensure that the issue is still relevant today).

Lock old issues to force a new issue to be created, even if the problem
looks similar.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-04 14:55:40 +04:00
Andrey Smirnov
82d9cd3229
fix: add upgrade errata for arm64/zboot kernels
Fixes #8854

Talos 1.8.0 introduces EFI ZBoot compression, and kexec from 1.7.0 into
a compressed kernel doesn't work.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-03 20:10:24 +04:00
Andrey Smirnov
9a23d846c1
fix: downgrade Azure IMDS required version
Fixes #8555

It seems that the older version supports the same set of fields we
actually use in our platform code, so we can safely downgrade to the
version supported by Azure Stack Hub.

I used
[this repo](https://github.com/Azure/azure-rest-api-specs/tree/main/specification/imds/data-plane/Microsoft.InstanceMetadataService/stable)
to check schemas across versions.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-03 19:41:09 +04:00
Andrey Smirnov
30860210cc
test: fix hardware test not to require PCI devices
On some VMs (e.g. Azure), none are reported.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-03 17:20:42 +04:00
Andrey Smirnov
9fcc9b8415
feat: update Flannel to v0.25.3
See https://github.com/flannel-io/flannel/releases/tag/v0.25.3

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-03 12:19:21 +04:00