2137 Commits

Author SHA1 Message Date
Nico Berlee
4919f6ee22
feat: add GOMEMLIMIT to shipped manifests with memory limits
This commit integrates the GOMEMLIMIT environment variable into shipped K8S
manifests when resources.limits.memory is defined. It is set to 95% of the
memory limit to optimize the performance of the Go garbage collector,
mitigating the risk of OOMKills in containerized environments.

When configuring the controller-manager or scheduler custom resources in
machine config, they where accepted, but ignored.

This commit adds Resources to NewControlPlaneSchedulerController and
NewControlPlaneControllerManagerController so machine config resources
Fixes #7874

Signed-off-by: Nico Berlee <nico.berlee@on2it.net>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-10-20 20:41:40 +04:00
Andrey Smirnov
9dfae8467d
chore: update dependencies
Containerd 1.7.7, Linux 6.1.58.

Fixes #7859

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-10-17 17:41:38 +04:00
Serge Logvinov
38ce3c827a
feat: nocloud prefer mac address
Use MAC address over network interface name.

Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-10-16 17:35:58 +04:00
Andrey Smirnov
c3e4182000
refactor: use COSI runtime with new controller runtime DB
See https://github.com/cosi-project/runtime/pull/336

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-10-12 19:44:44 +04:00
Serge Logvinov
0ff7350abe
fix: oracle integration fixes
* Set static gateway IPv6 if it possible.
  Some cni do not work properly with ipv6, so we will fix it.
* Disable talos dashboard.

Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-10-12 17:51:50 +04:00
Andrey Smirnov
f9639fb531
test: fix 'talosctl gen' tests
There were weird hacks put into the tests, while each test already runs
in a temporary directory as 'working directory', so no hacks are needed.

Moreover, using fixed `/tmp/...` paths leads to test failures, as CI
runs docker & QEMU tests in parallel conflicting with each other.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-10-12 16:24:02 +04:00
Andrei Kvapil
6142d87a0f
feat: hostname configuration improvements on the NoCloud platform
* support for local-hostname parameter
* support for hostnames passed via user-data (for Proxmox VE)

Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-10-12 15:45:25 +04:00
Andrey Smirnov
7bb205ebe2
fix: don't use runtime-specs Mount struct in machine config
First of all, it breaks our backwards compatibility promises and breaks
documentation generation. Upstream `specs.Mount` might change at any
time.

The issue was that containerd 1.7.x brings in new `specs.Mount` which
contains extra fields which don't have `omitempty` for YAML, so
machinery always generates them which confuses old Talos versions.

Use a copy of the upstream struct with proper YAML tags, and also
provide a special trick to make sure if the upstream struct changes, we
have a chance to update our copy of the struct.

Also this fixes docs and JSON schema.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-10-11 23:06:19 +04:00
Thomas Way
b87092ab69
fix: handle secure boot state policy pcr digest error
This does not fix the underlying digest mismatch issue, but does handle the error and should provide
further insight into issues (if present).

Refs: #7828

Signed-off-by: Thomas Way <thomas@6f.io>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-10-09 18:24:56 +04:00
Thomas Way
336aee0fdb
fix: use tpm2 hash algorithm constants and allow non-SHA-256 PCRs
The conversion from TPM 2 hash algorithm to Go crypto algorithm will fail for
uncommon algorithms like SM3256. This can be avoided by checking the constants
directly, rather than converting them. It should also be fine to allow some non
SHA-256 PCRs.

Fixes: #7810

Signed-off-by: Thomas Way <thomas@6f.io>
Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-10-04 01:02:20 +05:30
Noel Georgi
69d8054c9e
chore: drop UpdateEndpointSuite
drop `UpdateEndpointSuite` suite since KubePrism is enabled by default
starting Talos 1.6 and the test never passes since K8s node is always
ready since it can connect to api server over KubePrism.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-10-04 00:26:59 +05:30
Andrey Smirnov
ef7be16c80
fix: clear the encryption config in META when STATE is reset
When STATE is reset, we need to make sure we wipe the META keys
containing encryption config as well.

Fixes #7819

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-10-03 22:00:21 +04:00
Noel Georgi
9b5cfdd0bc
chore: add tests for iscsi
Add tests for iscsi to make sure it works.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-10-02 22:12:42 +05:30
Andrey Smirnov
10ed130679
fix: the node IP for kubelet shouldn't change if nothing matches
This was a fix some time ago, but it was incorrect (missing `continue`),
which was failing the unit-tests.

Also fix a data race in another unit-test (which is unit-test only, not
affecting production).

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-09-29 14:22:52 +04:00
Andrey Smirnov
e7575ecaae
feat: support n-5 latest Kubernetes versions
For Talos 1.6 this means 1.24-1.29 Kubernetes.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-09-29 13:41:56 +04:00
Andrey Smirnov
2b548ad0d9
feat: update containerd to 1.7.x
Also update Linux and other pkgs.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-09-28 16:33:57 +04:00
Andrey Smirnov
a52d3cda3b
chore: update gen and COSI runtime
No actual changes, adapting to use new APIs.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-09-22 12:13:13 +04:00
Noel Georgi
9c2ba7c6fa
chore: add tests for chelsio drivers
Add tests for Chelsio drivers and firmware.

Ref: https://github.com/siderolabs/extensions/pull/232

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-09-20 20:07:25 +05:30
Andrey Smirnov
5ca4d58dc9
fix: generate of modules.dep when on the machine
When running on the machine, the extensionTreePath is not writeable, so
create and clean up a temporary directory to host `modules.dep`
extension.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-09-20 15:51:22 +04:00
Andrey Smirnov
96f2a62eaf
test: update upgrade tests versions
Use a 1.4/1.5 releases.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-09-19 14:53:25 +04:00
Andrey Smirnov
e3b4940588
fix: build CPU ucode correctly for early loader
Closes #7729

This follows the steps described in
https://www.kernel.org/doc/html/v6.1/x86/microcode.html#early-load-microcode

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-09-18 14:03:41 +04:00
Andrey Smirnov
c5bd0ac5cf
refactor: reimplement the depmod extension rebuilder
Drop loop device/mounts completely, use userspace utilities to extract
and lay over module trees in the tmpfs.

Discover kernel version automatically instead of hardcoding it to be
current one (required for Image Service).

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-09-15 21:51:42 +04:00
Andrey Smirnov
a7edd0523f
fix: set default route priority for hcloud platform
Otherwise route gets created with priority '0' and it seems to get into
conflict with what Cilium tries to add.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-09-15 14:47:45 +04:00
Andrey Smirnov
9698e45479
fix: handle correctly change of listen address for maintenance service
Fixes #7738

If the SideroLink address changes, maintenance service should listen on
new address. Previously it worked "sometimes", as there was a race on
maintenance config either be removed/recreated or just updated. In case
of an update the listen address was not updated properly, but recreate
case worked correctly.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-09-14 19:07:22 +04:00
Andrey Smirnov
a096f05a56
chore: update gRPC library and enable shared write buffers
Fixes #7576

See https://github.com/grpc/grpc-go/pull/6309

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-09-13 21:27:46 +04:00
Artem Chernyshev
2960f93baa
feat: add readonly information to the disks API response
Forward device readonly info from `go-blockdevice` library.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2023-09-12 18:09:59 +03:00
Serge Logvinov
3f52320752
feat: upgrade-k8s without comments
This feature allows us to remove any comments from the machineconfig after
upgrading Kubernetes.

Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-09-12 14:50:56 +04:00
Noel Georgi
3d2dad4e69
chore: show securtiystate on dashboard
Show Talos SecurityState and MountStatus on dashboard.

Fixes: #7675

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-09-06 21:46:25 +05:30
Noel Georgi
1eebbce357
chore: add output flag for talosctl config info
Add output flag for `talosctl config info`.

This allows to programatically gather endpoints for CI tests.

Eg:

```bash
_out/talosctl-linux-amd64 config info --output json | jq '.Contexts[].Endpoints[0]'
```

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-09-05 21:25:21 +04:00
Noel Georgi
3fbed806c4
chore: add tests for util-linux extensions
Add tests for utils-linux extensions.

Ref: https://github.com/siderolabs/extensions/pull/216

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-09-05 19:29:50 +05:30
Andrey Smirnov
6058c36023
fix: shorten VLAN link names to fit into the limit of 15 characters
Fixes #7679

This should be no-op if the link name is <= 10 chars, but with
predictable interface names based on MAC addresses, they have to be
shortened to make some space for VLAN ID.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-09-05 14:51:09 +04:00
Andrey Smirnov
9c2f765c86
fix: allow network device selector to match multiple links
Fixes #7673

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-09-04 20:37:04 +04:00
Andrey Smirnov
d91b5b3a31
feat: set environment variables early in the boot
Fixes #7696

This allows to set env variables from `talos.environment=` command line
arg.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-09-04 15:12:10 +04:00
Andrey Smirnov
c918c0855d
fix: set correct (1 year) talosconfig expiration
Fixes #7698

Also fix `talosctl config info` for `talosconfig` without a client
certificate (e.g. Omni-generated one).

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-09-04 14:46:28 +04:00
Andrey Smirnov
79bbdf454e
fix: set proper timeouts for KubePrism loadbalancer
The default timeouts are very aggressive, and we should use explicit
timeouts so that healh checks don't run that often.

Fixes #7690

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-09-01 00:16:09 +04:00
Andrey Smirnov
b8fb55d5c2
fix: use a mount prefix when installing a bootloader
This is not a problem in general, but when running multiple image
generation procedures using the same mount point is a problem.

This is a no-op if `MountPrefix` is not set (when installing/upgrading
vs. creating an image).

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-08-31 22:21:41 +04:00
Andrey Smirnov
2d3ac925ea
refactor: update NTP spike detector
See https://github.com/siderolabs/talos/issues/7080#issuecomment-1696105986

The NTP spike detector code was refactored out of the main NTP code so
that it can be unit-tested.

I dropped one check which I think is causing false-positives in the
spike detector (when NTP offset is higher than the RTT of the best
packet received so far).

The overall flow resembles the one in systemd-timesync, the current
implementation has this check:

6639ac474e/src/timesync/timesyncd-manager.c (L357-L360)

This check was introduced in the initial release, after some
refactoring:

3dbc762003 (diff-4aa9995f07bb31b9884d40a7634f5f6d30245dfd26ac27b89cd5fd3bd4eef56aR429-R431)

There is no equivalent of it in the RFC:

https://datatracker.ietf.org/doc/html/rfc5905#appendix-A.5.2

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-08-29 20:56:42 +04:00
Noel Georgi
d03dc7a8af
chore: validate new system extensions
Validate the amdgpu and intel-ice firmware extensions.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-08-25 17:18:19 +05:30
Andrey Smirnov
3c9f7a7de6
chore: re-enable nolintlint and typecheck linters
Drop startup/rand.go, as since Go 1.20 `rand.Seed` is done
automatically.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-08-25 01:05:41 +04:00
Andrey Smirnov
8670450d28
release(v1.6.0-alpha.0): prepare release
This is the official v1.6.0-alpha.0 release.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-08-24 17:09:34 +04:00
Noel Georgi
6778ded29d
feat: add e2e-aws for nvidia extensions
Add e2e tests for nvidia

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-08-24 17:43:36 +05:30
Andrey Smirnov
74c07ed714
chore: update Go to 1.21
This fixes a problem in the `RouteSpecController` which is due to a
subtle (but correct) change in the behavior in the `stdlib`.

Also some small (but should be safe) bumps.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-08-23 22:52:04 +04:00
Andrey Smirnov
c0ea4d7ba5
fix: properly calculate overal of node address with subnet filters
Example: host has address `10.0.0.1/8`, while Kubernetes pod CIDR is
`10.244.0.0/16`. These two subnets overlap, but the address `10.0.0.1`
isn't contained in the `10.244.0.0/16` subnet.

This change fixes the check to make sure address is not contained vs.
the address subnet overlaps with the filter.

NB: this is still a bad idea to have host network subnet to overlap with
Kubernetes pod/service CIDRs.

Also refactor the unit-tests to use new (better ways) to do assertions.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-08-23 21:16:58 +04:00
Noel Georgi
d6b2719e2e
chore: drone: move extensions step to a function
Move drone extensions integration to a function. This allows us to
re-use the code and just depend on a single step rather than explicitly
defining all dependencies.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-08-23 20:56:43 +05:30
Andrey Smirnov
9608ef56dc
chore: allow bridge traffic with DHCP broadcast traffic
This is required for https://github.com/siderolabs/sidero/pull/1070, as
we need to allow DHCP traffic from Sidero controller running in a VM
through the bridge to other VMs.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-08-23 18:37:37 +04:00
Noel Georgi
833895940b
chore: add tests for zfs extension
Add tests for ZFS and btrfs extensions.
Also fix the e2e-aws cron pipeline.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-08-23 11:16:25 +05:30
Utku Ozdemir
ea0d6e8c6a
fix: prevent dashboard crashes when process info is not available
Processes and their info are not guaranteed to be present on the api-based data gathered by the dashboard. Therefore, we switch to using nil-safe access to the CPU time when rendering the process table.

Closes siderolabs/talos#7645.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2023-08-22 12:55:36 +02:00
Andrey Smirnov
e9077a6fb9
feat: filter the hostname to produce nodename
Fixes #7615

This extends the previous handling when Talos did `ToLower()` on the
hostname to do the full filtering as expected.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-08-22 12:41:57 +04:00
Andrey Smirnov
dc8361c1d5
fix: properly GC images supplied with both tag and digest
This is a follow-up fix for #7640

I noticed that image cleanup controller cleans up the images if
specified with both tag and digest.

The problem was incorrectly building image references in the expected
set of images, so they were incorrectly marked as unused.

Refactor the code to make the core part testable.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-08-21 21:04:24 +04:00
Andrey Smirnov
b56e8b7d9b
fix: support 'List' type manifests
Fixes #7636

This support a `List`-type manifests by unwrapping them into individual
objects.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-08-21 16:48:37 +04:00