4666 Commits

Author SHA1 Message Date
Noel Georgi
5aea424278
fix(ci): fix crons by setting up buildx always
Fix crons by setting up buildx always, also make sure `images-essential`
has `uki-certs` as dependency.

Also use platform as `linux/amd64` in CI integration tests and cron
jobs, since we don't run tests with arm64 binaries.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-05-24 15:19:27 +05:30
Justin Garrison
84706c3e29
docs: default to brew docs for talosctl
Updated all install instructions and added install page for future OS
specific install instructions

Signed-off-by: Justin Garrison <justin.garrison@siderolabs.com>
2024-05-23 16:37:45 -07:00
Dmitriy Matrenichev
fcd65ff65c
feat: enable forwardKubeDNSToHost by default
And ensure that it works.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-05-23 20:31:36 +03:00
Andrey Smirnov
2e64e9e4e0
fix: require accepted CAs on worker nodes
Note: this issue never happens with default Talos worker configuration
(generated by Omni, `talosctl gen config` or CABPT).

Before change https://github.com/siderolabs/talos/pull/4294 3 years ago,
worker nodes connected to trustd in "insecure" mode (without validating
the trustd server certificate). The change kept backwards compatibility,
so it still allowed insecure mode on upgrades.

Now it's time to break this compatibility promise, and require
accepted CAs to be always present. Adds validation for machine
configuration, so if an upgrade is attempted, it would not validate the
machine config without accepted CAs.

Now lack of accepted CAs would lead to failure to connect to trustd.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-23 17:48:16 +04:00
Noel Georgi
23c1c4560e
fix(ci): fix crons by rekres
Fix cron jobs by pulling in new kres changes.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-05-23 15:51:48 +05:30
Andrey Smirnov
2d50392c5a
feat: update containerd to 2.0.0-rc.2, runc to 1.2.0-rc.1
This only updates the binaries, the API update will be handled via
PR #8766.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-22 19:18:34 +04:00
Noel Georgi
a12e4bb24e
chore(ci): fix github action crons
Fix GitHub action cron jobs.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-05-22 18:39:37 +05:30
Dmitriy Matrenichev
e7bd9cd2bb
fix: decrease maximum negative ttl for dns responses
The maximum negative ttl (ttl for non-existent domain responses) was set to 1 hour, which is
too long. This PR decreases the maximum negative ttl to 10 seconds.
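
The cap described above can be sketched as a simple clamp. This is a minimal illustration, not the CoreDNS or Talos code: `clampNegativeTTL` and `maxNegativeTTL` are hypothetical names, and TTLs are assumed to be plain seconds.

```go
package main

import "fmt"

// maxNegativeTTL is the new cap from the commit message, in seconds.
// The name is hypothetical.
const maxNegativeTTL uint32 = 10

// clampNegativeTTL limits the TTL (in seconds) of a negative
// (non-existent domain) response to maxNegativeTTL.
func clampNegativeTTL(ttl uint32) uint32 {
	if ttl > maxNegativeTTL {
		return maxNegativeTTL
	}

	return ttl
}

func main() {
	fmt.Println(clampNegativeTTL(3600)) // a 1 hour TTL is cut down to 10
	fmt.Println(clampNegativeTTL(5))    // shorter TTLs pass through unchanged
}
```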

Also update CoreDNS module while we are at it.

Closes #8631

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-05-21 23:20:42 +03:00
Noel Georgi
9c3ebad9fd
chore(ci): kresify gh actions
Kresify, only handle gh workflows.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-05-22 00:17:09 +05:30
Andrey Smirnov
ff60f6fde6
refactor: make some of the extensions package public
Moving the loading and validation to the machinery package, so that we
can import and use that from other projects.

Co-authored-by: Noel Georgi <git@frezbo.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-21 21:24:36 +04:00
Andrey Smirnov
ce8c86d640
fix: panic in osroot controller
Fixes #8753

There seems to be a problem in the machine config anyways, as
`machine.ca.crt` is missing for the worker (this should break `apid`
connectivity), but still Talos controller shouldn't enter a panic loop.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-21 18:26:11 +04:00
Andrey Smirnov
e1711cd3c9
chore: stop using containerd package for cri namespace
In containerd 2.0 source tree, this constant is under `internal`, so we
can't import it directly.

So instead re-declare it as a Talos constant.

Doing this multi-staged, as `go-talos-support` is using it as well, and
to update it to stop importing old containerd library I need first to
declare the constant in Talos source tree.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-21 17:45:52 +04:00
Andrey Smirnov
d4307043ff
fix: update go-tail library to fix 'short read' error
See https://github.com/siderolabs/go-tail/pull/2

It seems to pop up more with compressed logs, but overall makes sense to
be fixed.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-20 20:44:43 +04:00
Michael Trip
7cd13ef4a6
docs: add documentation on using Multus with Talos
Short introduction into running Multus CNI.

Signed-off-by: Michael Trip <michael@alcatrash.net>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-20 17:12:08 +04:00
Andrey Smirnov
4784da3ef8
feat: use new circular buffer compressed chunks feature
Nothing changes from a functional point of view: Talos still keeps a
maximum of 1M of logs per buffer, but the chunks after the first 64k are
compressed on the fly.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-17 14:51:23 +04:00
Andrey Smirnov
78b48eb3ae
feat: include EDAC drivers
See https://github.com/siderolabs/pkgs/pull/957

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-15 23:05:36 +04:00
Andrey Smirnov
0bf2d69fbb
feat: update Kubernetes to 1.30.1
Latest v1.30.x version.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-15 21:18:04 +04:00
Dmitriy Matrenichev
53f5489130
fix: increase host dns packet ttl for pods
This PR fixes incorrect packet TTL if `forwardKubeDNSToHost` is enabled.

Credits go to Julian Wiedmann.
Closes #8698.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-05-15 17:40:32 +03:00
Dmitriy Matrenichev
dedb6d360d
fix: update github.com/siderolabs/siderolink to v0.3.7
Version 0.3.6 contains incorrect server implementation which breaks our integration tests.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-05-15 17:10:06 +03:00
Steve Francis
43939f1a6e
docs: fix typos, add docker socket info
Adjust docker docs.

Signed-off-by: Steve Francis <steve.francis@talos-systems.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-15 17:23:02 +04:00
Spencer Smith
6663068bbd
chore: update project in GCP testing
This PR moves the GCP tests to a new project there. I'm working on consolidating projects, names, and doing some reservations out there.

Signed-off-by: Spencer Smith <spencer.smith@talos-systems.com>
2024-05-14 14:00:45 -04:00
Spencer Smith
b86edc6776
chore: update office hours in talos repo
This updates the office hours in all "published" docs versions and in the readme.

Signed-off-by: Spencer Smith <spencer.smith@talos-systems.com>
2024-05-14 09:26:26 -04:00
Spencer Smith
cfa25d22dc
chore: remove docs prior to 1.0 from website navigation
These docs are still present in the repo, but won't be an option in the talos docs site.

Signed-off-by: Spencer Smith <spencer.smith@talos-systems.com>
2024-05-14 09:19:28 -04:00
Noel Georgi
1207054599
chore: handle I/O error for xfs_repair
Run `xfs_repair` on `unix.EIO` error.

```text
16T18:19:30.85674118Z]: XFS (sdb5): Mounting V5 Filesystem
109.200.197.196: kern:    info: [2024-04-16T18:19:30.92421418Z]: XFS (sdb5): Ending clean mount
109.200.197.196: kern:  notice: [2024-04-16T18:19:36.42651618Z]: XFS (sdb6): Mounting V5 Filesystem
109.200.197.196: kern:    info: [2024-04-16T18:19:36.49568918Z]: XFS (sdb6): Ending clean mount
109.200.197.196: kern:  notice: [2024-04-16T18:19:36.54484918Z]: XFS (sdb6): Quotacheck needed: Please wait.
109.200.197.196: kern:  notice: [2024-04-16T18:19:36.54586418Z]: XFS (sdb6): Quotacheck: Done.
109.200.197.196: kern:   alert: [2024-05-13T15:13:11.99476118Z]: XFS (sdb6): log I/O error -5
109.200.197.196: kern:   alert: [2024-05-13T15:13:11.99477118Z]: XFS (sdb6): Filesystem has been shut down due to log error (0x2).
109.200.197.196: kern:   alert: [2024-05-13T15:13:11.99477318Z]: XFS (sdb6): Please unmount the filesystem and rectify the problem(s).
```

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-05-13 21:19:50 +05:30
Andrey Smirnov
b7afe2669b
feat: update Linux 6.6.30
Update tools/pkgs to the latest version, brings in all updates.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-13 17:14:03 +04:00
USBAkimbo
26519ceed0
docs: update proxmox.md
Update proxmox guide to show example of using qemu-guest-agent.

Signed-off-by: USBAkimbo <71508071+USBAkimbo@users.noreply.github.com>
Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-05-13 14:50:53 +05:30
Andrey Smirnov
851b91a0e2
fix: don't enable hostDNS for versions of Talos which do not have it
The problem is that `talosctl cluster create` tries to enable
forwardKubeDNSToHost (for 1.7+), but due to the wrong condition this
tries to enable `hostDNS` for any version of Talos, while it's only
supported since 1.7+.
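
The intended condition can be sketched as a version gate. The `version` type and `hostDNSSupported` are hypothetical and stand in for whatever version contract `talosctl` actually uses.

```go
package main

import "fmt"

// version is a hypothetical major.minor pair used only for this sketch.
type version struct{ major, minor int }

// atLeast reports whether v is the same as or newer than other.
func (v version) atLeast(other version) bool {
	if v.major != other.major {
		return v.major > other.major
	}

	return v.minor >= other.minor
}

// hostDNSSupported gates the feature on Talos 1.7+, as the commit describes.
func hostDNSSupported(v version) bool {
	return v.atLeast(version{1, 7})
}

func main() {
	for _, v := range []version{{1, 6}, {1, 7}, {1, 8}} {
		fmt.Printf("%d.%d: %v\n", v.major, v.minor, hostDNSSupported(v))
	}
}
```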

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-09 19:22:20 +04:00
Artem Chernyshev
42ac5cd0c2
fix: check for nil machine config during installation
Otherwise we get `nil reference` exception during maintenance mode
upgrade with partial machine configs.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2024-05-08 16:34:55 +03:00
Andrey Smirnov
1d29111d43
chore: update Go to 1.22.3
Also bump dependencies.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-08 14:59:41 +04:00
Serge Logvinov
f4d7b9d9a9
feat: gather platform dns names
Retrieve the DNS names of instances from the platform metadata.

Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-08 00:11:24 +04:00
Steve Francis
0b0f9995a6
docs: add resource information, some grammar fixes
Improve the ingress firewall docs.

Signed-off-by: Steve Francis <steve.francis@talos-systems.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-07 21:35:15 +04:00
Andrey Smirnov
763dae2508
fix: add cluster name to the worker machine config
This is 1.8+ only.

Fixes #8694

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-07 20:11:23 +04:00
Andrew Rynhard
4aac5b4ec3
feat: mount /sys/kernel/security into kubelet
This allows the kubelet to detect AppArmor.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-07 19:12:06 +04:00
Will Bush
817f18153f
docs: remove mention of enabling KubePrism after v1.6
I noticed in the docs
[here](8df5b85ec7/website/content/v1.8/kubernetes-guides/network/deploying-cilium.md (L241))
it mentions enabling the KubePrism feature. However,
[here](8df5b85ec7/website/content/v1.8/kubernetes-guides/configuration/kubeprism.md (L25))
the docs mention it's enabled by default since 1.6.

So I was wondering if mention of enabling KubePrism after v1.6 is a mistake?
Note it was mentioned several times in the docs v1.5.

```
❯ rg "kubePrism:" --glob "*deploying-cilium.md" -A1
website/content/v1.8/kubernetes-guides/network/deploying-cilium.md
240:    kubePrism:
241-      enabled: true

website/content/v1.7/kubernetes-guides/network/deploying-cilium.md
240:    kubePrism:
241-      enabled: true

website/content/v1.6/kubernetes-guides/network/deploying-cilium.md
240:    kubePrism:
241-      enabled: true

website/content/v1.5/kubernetes-guides/network/deploying-cilium.md
32:    kubePrism:
33-      enabled: true
--
56:    kubePrism:
57-      enabled: true
--
212:    kubePrism:
213-      enabled: true
--
240:    kubePrism:
241-      enabled: true
--
264:    kubePrism:
265-      enabled: true
```

Signed-off-by: Will Bush <git@willbush.dev>
2024-05-07 17:49:52 +04:00
dhaines-quera
c08d797326
docs: fix the variable name typo
Update building-images.md.

Signed-off-by: dhaines-quera <139260712+dhaines-quera@users.noreply.github.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-07 16:29:05 +04:00
Utku Ozdemir
478b862b4c
fix: do not fail cli action tracker when boot id cannot be read
If the `reboot/reset/shutdown/upgrade` action tracker cannot read the boot ID from the node under `/proc/sys/kernel/random/boot_id` due to insufficient permissions (e.g., when `talosctl reboot` is used over Omni), fall back to skipping boot ID check instead of hard-failing.

Closes siderolabs/talos#7197.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2024-05-07 13:51:28 +02:00
Simon-Boyer
be510f9eb2
docs: fix grpc_tunnel value to true
grpc_tunnel is described as being enabled by using the value yes in the docs, but it should be true.

Signed-off-by: Simon-Boyer <si.boyer@hotmail.ca>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-07 14:40:19 +04:00
Artem Chernyshev
b7b8a8d8fa
docs: add logs example for the certificate errors troubleshooting
Should simplify the search of this error over the Internet.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2024-05-04 12:21:47 +03:00
Andrey Smirnov
8df5b85ec7
release(v1.8.0-alpha.0): prepare release
This is the official v1.8.0-alpha.0 release.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-01 22:40:04 +04:00
Andrey Smirnov
07f78182c6
fix: use a fresh context for etcd unlock
By the time unlock is called, context might be already canceled.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-01 18:59:50 +04:00
Andrey Smirnov
84cd7dbec4
feat: update Linux to 6.6.29
Pull in fixes for cloud-image-uploader from #8667.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-01 15:59:04 +04:00
Spencer Smith
70fdca6a43
chore: update minimum hardware requirement for vmware ova
This PR bumps the minimum hardware family to vmx-15. This corresponds to "ESXi 6.7 U2" and matches the minimum required for anyone deploying the vSphere CSI as shown in [this](https://docs.vmware.com/en/VMware-vSphere-Container-Storage-Plug-in/3.0/vmware-vsphere-csp-getting-started/GUID-0AB6E692-AA47-4B6A-8CEA-38B754E16567.html) doc. This allows us to bypass an extra step anytime Talos is deployed into a vSphere environment.

Signed-off-by: Spencer Smith <spencer.smith@talos-systems.com>
2024-04-30 16:03:42 -04:00
Andrey Smirnov
b690ffeb89
test: improve DNS resolver test stability
Run a health check before the test, as the test depends on CoreDNS being
healthy, and previous tests might disturb the cluster.

Also refactor by using watch instead of retries, make pods terminate
fast.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-29 19:31:34 +04:00
Birger J. Nordølum
5aa0299b6e
style: use correct capitalization for openstack
OpenStack is currently not capitalized correctly. "Stack" should be
written with a capital S, as in OpenStack and not Openstack.

Signed-off-by: Birger J. Nordølum <contact@mindtooth.no>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-29 18:46:06 +04:00
Andrey Smirnov
4c0c626b78
feat: use zstd compression in place of xz
Initramfs and kernel are compressed with zstd.

Extensions are compressed with zstd for Talos 1.8+.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-29 18:09:12 +04:00
Andrey Smirnov
98906ed6ea
fix: use reboot delay only in case of error
Delay the reboot for 10 seconds only if Talos hits an error, but
otherwise just proceed with the requested action.

This removes 10 seconds on "regular" reboot without kexec.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-26 18:46:00 +04:00
Andrey Smirnov
05fd042bb3
test: improve the reset integration tests
Provide a trace for each step of the reset sequence taken, so if one of
those fails, integration test produces a meaningful message instead of
proceeding and failing somewhere else.

More cleanup/refactor, should be functionally equivalent.

Fixes #8635

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-24 18:35:39 +04:00
darox
8cdf0f7cb0
docs: fix typo in Cilium instructions
Use correct pod security label.

Signed-off-by: darox <maderdario@gmail.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-24 16:14:01 +04:00
Utku Ozdemir
dd1d279daa
fix: allow more flags in talosctl cluster create --input-dir
Some of the flags passed to `talosctl cluster create` were failing the input validation due to being incorrectly marked as mutually exclusive with the `--input-dir` flag.

Clean up the check to allow passing all flags along with the `--input-dir` flag if those flags impact the provisioning process in any way (i.e., not solely used in generating machine config).

Additionally, replace the mutual exclusion checks with Cobra's built-in function for that.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2024-04-24 10:49:24 +02:00
Dmitry Sharshakov
ef4394e586
chore: update kernel and other packages
Kernel updates enable SELinux and intel_idle, and update the kernel version.

Signed-off-by: Dmitry Sharshakov <dmitry.sharshakov@siderolabs.com>
2024-04-24 10:03:46 +03:00