4882 Commits

Author SHA1 Message Date
Dmitriy Matrenichev
893e64fcb1
fix: replace nslookup with dig in integration tests
This should be more reliable on `integration-aws-*` and others.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-05-30 01:37:01 +03:00
Andrey Smirnov
0359c8537c
chore: unify toml packages being used
Drop BurntSushi one, and use /v2 of pelletier package.
There is indirect use of v1 which should hopefully go away once we move
away from sonobuoy.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-29 21:22:56 +04:00
Artem Chernyshev
4feb94ca09
feat: add multidoc check to the Talos quirks module
Make it report true for Talos >= 1.5.0.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2024-05-29 17:44:14 +03:00
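A version-gated check like the one in commit 4feb94ca09 above can be sketched as follows. This is a minimal illustration and not the Talos quirks module itself: the `Quirks` type, `NewQuirks`, and `SupportsMultiDoc` names are hypothetical, and treating an unparseable version as current (so the check reports true) is an assumption.

```go
package main

import (
	"strconv"
	"strings"
)

// Quirks is a hypothetical holder for a parsed Talos version.
type Quirks struct {
	major, minor, patch int
	valid               bool
}

// NewQuirks parses versions such as "v1.5.0" or "1.7.2".
// Pre-release suffixes ("-alpha.1") are dropped, which is a simplification.
func NewQuirks(version string) Quirks {
	parts := strings.SplitN(strings.TrimPrefix(version, "v"), ".", 3)
	if len(parts) != 3 {
		return Quirks{}
	}
	if i := strings.IndexAny(parts[2], "-+"); i >= 0 {
		parts[2] = parts[2][:i]
	}
	nums := make([]int, 3)
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return Quirks{}
		}
		nums[i] = n
	}
	return Quirks{major: nums[0], minor: nums[1], patch: nums[2], valid: true}
}

// SupportsMultiDoc reports true for Talos >= 1.5.0.
// Assumption: an unparseable version is treated as the current release.
func (q Quirks) SupportsMultiDoc() bool {
	if !q.valid {
		return true
	}
	if q.major != 1 {
		return q.major > 1
	}
	return q.minor >= 5
}
```

With this sketch, `NewQuirks("v1.4.8").SupportsMultiDoc()` is false and `NewQuirks("v1.5.0").SupportsMultiDoc()` is true.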
Justin Garrison
0b4a9777fc
docs: update talosctl install instructions for 1.8
Pulled changes from 1.7 docs

Signed-off-by: Justin Garrison <justin.garrison@siderolabs.com>
2024-05-28 11:32:29 -07:00
Dmitry Sharshakov
da8305ffb4
test: add a test for watchdog timers
Try to activate/deactivate watchdogs, change timeout, run only on QEMU.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Signed-off-by: Dmitry Sharshakov <dmitry.sharshakov@siderolabs.com>
2024-05-28 16:46:04 +04:00
Andrey Smirnov
da7f276409
fix: mount tracefs filesystem
Fixes https://github.com/siderolabs/pkgs/issues/963

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-28 13:54:42 +04:00
Noel Georgi
7b37e5b63d
chore(ci): fix integration extensions
Now that extensions run the `extensions-validator` we need to fetch proper tags.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-05-27 17:16:03 +05:30
Noel Georgi
de7553d77f
fix(ci): cron jobs
Crons needing extensions need the `generate` step as a dependency for
the `talos-metadata` file.

TrustedBoot needs the `secureboot-installer`.
Equinix needs arm64 since we boot an arm64 box as part of integration.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-05-26 17:10:42 +05:30
Dmitriy Matrenichev
eb510d9fdf
chore: require enabled bootloader for docker provisioner
Otherwise, it doesn't make sense.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-05-24 23:15:04 +03:00
Dmitriy Matrenichev
a9cf9b7892
fix: correctly handle dns messages in our dns implementation
- By default, github.com/miekg/dns uses `dns.MinMsgSize` (512 bytes) for UDP messages. This is too small for some
DNS requests/responses and can cause truncation and errors. This change sets the buffer size to `dns.DefaultMsgSize`
(4096 bytes), which is the maximum size of a DNS packet payload per RFC 6891.
- We also retry the request if the response is truncated or the previous connection was closed.
- Finally, we properly handle the case where the response is larger than the client buffer size
and return a correctly truncated response.

Closes #8763

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-05-24 21:41:00 +03:00
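The retry rule from commit a9cf9b7892 above (retry when the response is truncated or the previous connection was closed) can be sketched with the standard library alone. This is not the Talos implementation and does not use github.com/miekg/dns: `exchangeFunc`, `exchangeWithRetry`, and `errTruncated` are hypothetical names, and what the hook does on a retry (for example, opening a fresh connection) is left to the caller.

```go
package main

import (
	"errors"
	"net"
)

// errTruncated is returned when every attempt produced a truncated response.
var errTruncated = errors.New("dns: response still truncated after retries")

// exchangeFunc is a hypothetical transport hook: it sends query and returns
// the raw response. The attempt number lets it open a fresh connection.
type exchangeFunc func(attempt int, query []byte) ([]byte, error)

// isTruncated reports whether the TC bit is set in a raw DNS message.
// The header is 12 bytes and TC is bit 0x02 of the third byte (RFC 1035).
func isTruncated(msg []byte) bool {
	return len(msg) >= 12 && msg[2]&0x02 != 0
}

// exchangeWithRetry retries on truncated responses and closed connections;
// any other error is returned immediately.
func exchangeWithRetry(ex exchangeFunc, query []byte, attempts int) ([]byte, error) {
	if attempts < 1 {
		attempts = 1
	}
	var lastErr error
	for attempt := 0; attempt < attempts; attempt++ {
		resp, err := ex(attempt, query)
		switch {
		case err == nil && !isTruncated(resp):
			return resp, nil
		case err == nil:
			lastErr = errTruncated
		case errors.Is(err, net.ErrClosed):
			lastErr = err
		default:
			return nil, err
		}
	}
	return nil, lastErr
}
```

Checking the TC bit on the raw bytes keeps the sketch free of third-party imports; real code would more likely inspect the parsed message.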
Andrey Smirnov
c2b19dcb97
chore: move to containerd 2.0 API
Lots of module moves/renames.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-24 21:48:55 +04:00
Andrey Smirnov
92a274e9a0
fix: workaround problems with udevd races
When `udevd` rescans block device partitions while Talos is working with
partitions, Talos might hit the following error
while trying to open/mount a partition:

```
no such device or address
```

Previous attempts to fix that were using `ENODEV`, while the proper code
is `ENXIO`.

Also take exclusive lock while working with user disks to prevent
concurrent udevd rescan.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-24 21:22:07 +04:00
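The errno distinction in commit 92a274e9a0 above can be illustrated with a small retry helper. This is a sketch under assumptions rather than the Talos code: `retryOpen` and `isDeviceRace` are hypothetical names, and the attempt count and delay are arbitrary. On Linux, `ENXIO` renders as "no such device or address", while `ENODEV` renders as "no such device".

```go
package main

import (
	"errors"
	"syscall"
	"time"
)

// The two errno values contrasted in the commit message.
const (
	errNoSuchDeviceOrAddress = syscall.ENXIO  // the error actually observed
	errNoSuchDevice          = syscall.ENODEV // what earlier attempts matched
)

// isDeviceRace reports whether err looks like the transient udevd rescan race.
// errors.Is unwraps *os.PathError and similar wrappers around the errno.
func isDeviceRace(err error) bool {
	return errors.Is(err, errNoSuchDeviceOrAddress)
}

// retryOpen calls open until it succeeds, fails with an unrelated error,
// or the attempts are exhausted. It returns the last error seen.
func retryOpen(open func() error, attempts int, delay time.Duration) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = open(); err == nil || !isDeviceRace(err) {
			return err
		}
		time.Sleep(delay)
	}
	return err
}
```

A caller might wrap a mount or `os.OpenFile` call in the `open` closure, for example `retryOpen(func() error { _, err := os.OpenFile(path, os.O_RDONLY, 0); return err }, 5, 100*time.Millisecond)`, where `path` is a placeholder.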
Noel Georgi
31b24ea3d7
chore(ci): split integration misc
Split integration misc into three.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-05-24 20:32:18 +05:30
Andrey Smirnov
8a1371337f
fix: produce stable order of bonds with equinix
Fixes the problem when bonds can be listed in random order.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-24 17:47:03 +04:00
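One common way to get a stable order, as commit 8a1371337f above calls for, is to sort explicitly before emitting the list. The sketch below assumes the bonds were collected in a Go map (whose iteration order is randomized); that cause is a guess, and the `bond` type and `stableBonds` function are hypothetical rather than taken from the Equinix Metal platform code.

```go
package main

import "sort"

// bond is a hypothetical, trimmed-down view of a bond in platform metadata.
type bond struct {
	Name  string
	Ports []string
}

// stableBonds turns a name -> ports map into a slice with deterministic
// ordering: bonds sorted by name, ports sorted within each bond.
func stableBonds(byName map[string][]string) []bond {
	out := make([]bond, 0, len(byName))
	for name, ports := range byName {
		// Copy before sorting so the caller's slice is left untouched.
		sorted := append([]string(nil), ports...)
		sort.Strings(sorted)
		out = append(out, bond{Name: name, Ports: sorted})
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}
```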
Andrey Smirnov
6406193f46
test: add Equinix Metal sample metadata with two bonds
Talos doesn't support MLAG aggregation yet, but having the initial
testcase is a good step forward.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-24 15:39:58 +04:00
Andrey Smirnov
01ea82053e
fix: time sync over NTP from future era
Logs:

```
[    7.127481] [talos] adjusting time (jump) by -205704h26m36.111961385s via 162.159.200.1, state TIME_OK, status STA_NANO {"component": "controller-runtime", "controller":t}
```

Fix: https://github.com/beevik/ntp/pull/47

Fixes: #8771

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-24 14:49:27 +04:00
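For background on the "era" in commit 01ea82053e above: NTP timestamps carry a 32-bit seconds field counted from 1900-01-01, so the field wraps every 2^32 seconds (era 0 ends on 2036-02-07 06:28:16 UTC). The sketch below shows one way to resolve the era by picking the value closest to a reference clock. It is only an illustration of the concept and is not necessarily what the linked beevik/ntp change does; `ntpSecondsToUnix` is a hypothetical name.

```go
package main

const (
	// ntpEraSeconds is the length of one NTP era (2^32 seconds).
	ntpEraSeconds = int64(1) << 32
	// ntpToUnixOffset is the number of seconds from 1900-01-01 to 1970-01-01.
	ntpToUnixOffset = int64(2208988800)
)

// ntpSecondsToUnix converts the 32-bit NTP seconds field to Unix seconds,
// choosing the era that places the result within half an era of refUnix.
func ntpSecondsToUnix(sec uint32, refUnix int64) int64 {
	refNTP := refUnix + ntpToUnixOffset
	diff := refNTP - int64(sec) + ntpEraSeconds/2

	// Floor division: Go truncates toward zero, so adjust for negatives.
	era := diff / ntpEraSeconds
	if diff%ntpEraSeconds < 0 {
		era--
	}

	return era*ntpEraSeconds + int64(sec) - ntpToUnixOffset
}
```

The result can be turned into a timestamp with `time.Unix(ntpSecondsToUnix(sec, time.Now().Unix()), 0).UTC()`.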
Noel Georgi
5aea424278
fix(ci): fix crons by setting up buildx always
Fix crons by setting up buildx always, also make sure `images-essential`
has `uki-certs` as dependency.

Also use platform as `linux/amd64` in CI integration tests and cron
jobs, since we don't run tests with arm64 binaries.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-05-24 15:19:27 +05:30
Justin Garrison
84706c3e29
docs: default to brew docs for talosctl
Updated all install instructions and added install page for future OS
specific install instructions

Signed-off-by: Justin Garrison <justin.garrison@siderolabs.com>
2024-05-23 16:37:45 -07:00
Dmitriy Matrenichev
fcd65ff65c
feat: enable forwardKubeDNSToHost by default
And ensure that it works.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-05-23 20:31:36 +03:00
Andrey Smirnov
2e64e9e4e0
fix: require accepted CAs on worker nodes
Note: this issue never happens with default Talos worker configuration
(generated by Omni, `talosctl gen config` or CABPT).

Before change https://github.com/siderolabs/talos/pull/4294 3 years ago,
worker nodes connected to trustd in "insecure" mode (without validating
the trustd server certificate). The change kept backwards compatibility,
so it still allowed insecure mode on upgrades.

Now it's time to break this compatibility promise, and require
accepted CAs to be always present. Adds validation for machine
configuration, so if upgrade is attempted, it would not validate the
machine config without accepted CAs.

Now lack of accepted CAs would lead to failure to connect to trustd.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-23 17:48:16 +04:00
Noel Georgi
23c1c4560e
fix(ci): fix crons by rekres
Fix cron jobs by pulling in new kres changes.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-05-23 15:51:48 +05:30
Andrey Smirnov
2d50392c5a
feat: update containerd to 2.0.0-rc.2, runc to 1.2.0-rc.1
This only updates the binaries, the API update will be handled via
PR #8766.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-22 19:18:34 +04:00
Noel Georgi
a12e4bb24e
chore(ci): fix github action crons
Fix GitHub action cron jobs.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-05-22 18:39:37 +05:30
Dmitriy Matrenichev
e7bd9cd2bb
fix: decrease maximum negative ttl for dns responses
The maximum negative ttl (ttl for non-existent domain responses) was set to 1 hour, which is
too long. This PR decreases the maximum negative ttl to 10 seconds.

Also update CoreDNS module while we are at it.

Closes #8631

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-05-21 23:20:42 +03:00
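The cap described in commit e7bd9cd2bb above amounts to clamping the TTL of negative (non-existent domain) answers before caching them. A minimal sketch, assuming TTLs are handled as whole seconds; `cacheTTL` and `maxNegativeTTLSeconds` are hypothetical names, and how the limit is actually wired into the DNS cache is not shown in the commit message.

```go
package main

// maxNegativeTTLSeconds is the new upper bound for negative answers.
// The previous bound mentioned in the commit message was 1 hour.
const maxNegativeTTLSeconds uint32 = 10

// cacheTTL returns the TTL to use when caching a response. Negative
// responses are capped; positive responses keep their TTL in this sketch.
func cacheTTL(ttl uint32, negative bool) uint32 {
	if negative && ttl > maxNegativeTTLSeconds {
		return maxNegativeTTLSeconds
	}
	return ttl
}
```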
Noel Georgi
9c3ebad9fd
chore(ci): kresify gh actions
Kresify, only handle gh workflows.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-05-22 00:17:09 +05:30
Andrey Smirnov
ff60f6fde6
refactor: make some of the extensions package public
Moving the loading and validation to the machinery package, so that we
can import and use that from other projects.

Co-authored-by: Noel Georgi <git@frezbo.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-21 21:24:36 +04:00
Andrey Smirnov
ce8c86d640
fix: panic in osroot controller
Fixes #8753

There seems to be a problem in the machine config anyways, as
`machine.ca.crt` is missing for the worker (this should break `apid`
connectivity), but still Talos controller shouldn't enter a panic loop.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-21 18:26:11 +04:00
Andrey Smirnov
e1711cd3c9
chore: stop using containerd package for cri namespace
In containerd 2.0 source tree, this constant is under `internal`, so we
can't import it directly.

So instead re-declare it as a Talos constant.

Doing this multi-staged, as `go-talos-support` is using it as well, and
to update it to stop importing old containerd library I need first to
declare the constant in Talos source tree.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-21 17:45:52 +04:00
Andrey Smirnov
d4307043ff
fix: update go-tail library to fix 'short read' error
See https://github.com/siderolabs/go-tail/pull/2

It seems to pop up more with compressed logs, but overall makes sense to
be fixed.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-20 20:44:43 +04:00
Michael Trip
7cd13ef4a6
docs: add documentation on using Multus with Talos
Short introduction into running Multus CNI.

Signed-off-by: Michael Trip <michael@alcatrash.net>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-20 17:12:08 +04:00
Andrey Smirnov
4784da3ef8
feat: use new circular buffer compressed chunks feature
Nothing changes from a functional point of view: Talos still keeps a maximum of
1M of logs per buffer, but the chunks after the first 64k are compressed on
the fly.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-17 14:51:23 +04:00
Andrey Smirnov
78b48eb3ae
feat: include EDAC drivers
See https://github.com/siderolabs/pkgs/pull/957

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-15 23:05:36 +04:00
Andrey Smirnov
0bf2d69fbb
feat: update Kubernetes to 1.30.1
Latest v1.30.x version.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-15 21:18:04 +04:00
Dmitriy Matrenichev
53f5489130
fix: increase host dns packet ttl for pods
This PR fixes incorrect packet TTL if `forwardKubeDNSToHost` is enabled.

Credits go to Julian Wiedmann.
Closes #8698.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-05-15 17:40:32 +03:00
Dmitriy Matrenichev
dedb6d360d
fix: update github.com/siderolabs/siderolink to v0.3.7
Version 0.3.6 contains incorrect server implementation which breaks our integration tests.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-05-15 17:10:06 +03:00
Steve Francis
43939f1a6e
docs: fix typos, add docker socket info
Adjust docker docs.

Signed-off-by: Steve Francis <steve.francis@talos-systems.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-15 17:23:02 +04:00
Spencer Smith
6663068bbd
chore: update project in GCP testing
This PR moves the GCP tests to a new project there. I'm working on consolidating projects, names, and doing some reservations out there.

Signed-off-by: Spencer Smith <spencer.smith@talos-systems.com>
2024-05-14 14:00:45 -04:00
Spencer Smith
b86edc6776
chore: update office hours in talos repo
This updates the office hours in all "published" docs versions and in the readme.

Signed-off-by: Spencer Smith <spencer.smith@talos-systems.com>
2024-05-14 09:26:26 -04:00
Spencer Smith
cfa25d22dc
chore: remove docs prior to 1.0 from website navigation
These docs are still present in the repo, but won't be an option in the talos docs site.

Signed-off-by: Spencer Smith <spencer.smith@talos-systems.com>
2024-05-14 09:19:28 -04:00
Noel Georgi
1207054599
chore: handle I/O error for xfs_repair
Run `xfs_repair` on `unix.EIO` error.

```text
16T18:19:30.85674118Z]: XFS (sdb5): Mounting V5 Filesystem
109.200.197.196: kern:    info: [2024-04-16T18:19:30.92421418Z]: XFS (sdb5): Ending clean mount
109.200.197.196: kern:  notice: [2024-04-16T18:19:36.42651618Z]: XFS (sdb6): Mounting V5 Filesystem
109.200.197.196: kern:    info: [2024-04-16T18:19:36.49568918Z]: XFS (sdb6): Ending clean mount
109.200.197.196: kern:  notice: [2024-04-16T18:19:36.54484918Z]: XFS (sdb6): Quotacheck needed: Please wait.
109.200.197.196: kern:  notice: [2024-04-16T18:19:36.54586418Z]: XFS (sdb6): Quotacheck: Done.
109.200.197.196: kern:   alert: [2024-05-13T15:13:11.99476118Z]: XFS (sdb6): log I/O error -5
109.200.197.196: kern:   alert: [2024-05-13T15:13:11.99477118Z]: XFS (sdb6): Filesystem has been shut down due to log error (0x2).
109.200.197.196: kern:   alert: [2024-05-13T15:13:11.99477318Z]: XFS (sdb6): Please unmount the filesystem and rectify the problem(s).
```

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-05-13 21:19:50 +05:30
Andrey Smirnov
b7afe2669b
feat: update Linux 6.6.30
Update tools/pkgs to the latest version, brings in all updates.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-13 17:14:03 +04:00
USBAkimbo
26519ceed0
docs: update proxmox.md
Update proxmox guide to show example of using qemu-guest-agent.

Signed-off-by: USBAkimbo <71508071+USBAkimbo@users.noreply.github.com>
Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-05-13 14:50:53 +05:30
Andrey Smirnov
851b91a0e2
fix: don't enable hostDNS for versions of Talos which do not have it
The problem is that `talosctl cluster create` tries to enable
forwardKubeDNSToHost (for 1.7+), but due to the wrong condition this
tries to enable `hostDNS` for any version of Talos, while it's only
supported since 1.7+.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-09 19:22:20 +04:00
Artem Chernyshev
42ac5cd0c2
fix: check for nil machine config during installation
Otherwise we get `nil reference` exception during maintenance mode
upgrade with partial machine configs.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2024-05-08 16:34:55 +03:00
Andrey Smirnov
1d29111d43
chore: update Go to 1.22.3
Also bump dependencies.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-08 14:59:41 +04:00
Serge Logvinov
f4d7b9d9a9
feat: gather platform dns names
Retrieve the DNS names of instances from the platform metadata.

Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-08 00:11:24 +04:00
Steve Francis
0b0f9995a6
docs: add resource information, some grammar fixes
Improve the ingress firewall docs.

Signed-off-by: Steve Francis <steve.francis@talos-systems.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-07 21:35:15 +04:00
Andrey Smirnov
763dae2508
fix: add cluster name to the worker machine config
This is 1.8+ only.

Fixes #8694

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-07 20:11:23 +04:00
Andrew Rynhard
4aac5b4ec3
feat: mount /sys/kernel/security into kubelet
This allows the kubelet to detect AppArmor.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-07 19:12:06 +04:00
Will Bush
817f18153f
docs: remove mention of enabling KubePrism after v1.6
I noticed in the docs
[here](8df5b85ec7/website/content/v1.8/kubernetes-guides/network/deploying-cilium.md (L241))
it mentions enabling the KubePrism feature. However,
[here](8df5b85ec7/website/content/v1.8/kubernetes-guides/configuration/kubeprism.md (L25))
the docs mention it's enabled by default since 1.6.

So I was wondering if mention of enabling KubePrism after v1.6 is a mistake?
Note it was mentioned several times in the docs v1.5.

```
❯ rg "kubePrism:" --glob "*deploying-cilium.md" -A1
website/content/v1.8/kubernetes-guides/network/deploying-cilium.md
240:    kubePrism:
241-      enabled: true

website/content/v1.7/kubernetes-guides/network/deploying-cilium.md
240:    kubePrism:
241-      enabled: true

website/content/v1.6/kubernetes-guides/network/deploying-cilium.md
240:    kubePrism:
241-      enabled: true

website/content/v1.5/kubernetes-guides/network/deploying-cilium.md
32:    kubePrism:
33-      enabled: true
--
56:    kubePrism:
57-      enabled: true
--
212:    kubePrism:
213-      enabled: true
--
240:    kubePrism:
241-      enabled: true
--
264:    kubePrism:
265-      enabled: true
```

Signed-off-by: Will Bush <git@willbush.dev>
2024-05-07 17:49:52 +04:00