593 Commits

Author SHA1 Message Date
Andrey Smirnov
862c76001b
feat: add support for CoreDNS forwarding to host DNS
This PR adds the support for CoreDNS forwarding to host DNS. We try to bind on 9th address on the first element from
`serviceSubnets` and create a simple service so k8s will not attempt to rebind it.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Co-authored-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-04-03 23:36:17 +03:00
Dmitry Sharshakov
84ec8c16f3
feat: support syncing to PTP clocks
Also abstract away from NTP types.

Signed-off-by: Dmitry Sharshakov <dmitry.sharshakov@siderolabs.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-03-21 17:20:26 +04:00
Andrey Smirnov
7d43c9aa6b
chore: annotate installer errors
I want to catch a spurious error `ENODEV`, where exactly it comes from.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-03-21 16:58:34 +04:00
Dmitriy Matrenichev
19f15a840c
chore: bump golangci-lint to 1.57.0
Fix all discovered issues.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-03-21 01:06:53 +03:00
Noel Georgi
d118a852b9
feat: implement Install for imager overlays
Implement `Install` for imager overlays.
Also add support for generating installers.

Depends on: #8377

Fixes: #8350
Fixes: #8351
Fixes: #8350

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-03-12 22:46:29 +05:30
Utku Ozdemir
1bb6027ccd
fix: fix nil panic on maintenance upgrade with partial config
Fix the nil dereferences when a Talos node is attempted to be upgraded while in maintenance mode and having a partial machine config.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2024-03-08 12:52:21 +03:00
Artem Chernyshev
3c8f51d707
chore: move cli formatters and version modules to machinery
To be used in the `go-talos-support` module without importing the whole
Talos repo.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2024-03-07 16:29:15 +03:00
Dmitriy Matrenichev
f8c556a1ce
chore: listen for dns requests on 127.0.0.53
Turns out there is actually no black magic in systemd, they simply listen on 127.0.0.53 and forward dns requests there in resolv.conf.
Reason is the same as ours — to preserve compatibility with other applications. So we do the same in our code.

This PR also does two things:
- Adds `::1` into resolv.conf for IPv6 only resolvers.
- Drops `SO_REUSEPORT` from control options (it works without them).

Closes #8328

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-02-26 20:59:12 +03:00
Andrey Smirnov
8872a7a210
fix: ignore 'no such device' in addition to 'no such file'
This errors pops up when `udevd` rescans the partition table with Talos
trying to mount a device concurrently.

This feels to be something new with Linux 6.6 probably.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-02-26 20:00:05 +04:00
Andrey Smirnov
66f3ffdd4a
fix: ensure that Talos runs in a pod (container)
Drop the Kubernetes manifests as static files clean up (this is only
needed for upgrades from 1.2.x).

Fix Talos handling of cgroup hierarchy: if started in container in a
non-root cgroup hiearachy, use that to handle proper cgroup paths.

Add a test for a simple TinK mode (Talos-in-Kubernetes).

Update the docs.

Fixes #8274

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-02-20 15:06:48 +04:00
Dmitriy Matrenichev
fa3b933705
chore: replace fmt.Errorf with errors.New where possible
This time use `eg` from `x/tools` repo tool to do this.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-02-14 17:39:30 +03:00
Noel Georgi
2f0421b406
fix: run xfs_repair on invalid argument error
Run `xfs_repair` for invalid argument error.

Part of: #8292

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-02-13 23:01:33 +05:30
Dmitriy Matrenichev
fa2d34dd88
chore: enable v6 support on the same port
Replace `SO_REUSEPORT` with `SO_REUSEPORT`.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-02-13 01:02:27 +03:00
Dmitriy Matrenichev
83e0b0c19a
chore: adjust dns sockets settings
Enable some TCP optimization, set minimal TTL, set socket reuse.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-02-12 17:13:03 +03:00
Dmitriy Matrenichev
5324d39167
chore: bump stuff
Also fix .golangci.yml file.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-02-09 19:19:25 +03:00
Dmitriy Matrenichev
afa71d6b02
chore: use "handle-like" resource in DNSResolveCacheController
Rework (and simplify) `DNSResolveCacheController` to use `DNSUpstream` "handle-like" resources.

Depends on https://github.com/cosi-project/runtime/pull/400

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-02-08 21:40:57 +03:00
Andrey Smirnov
9d8cd4d058
chore: drop deprecated method EtcdRemoveMember
It was deprecated 16 months ago, time to cleanup.

(This is to prepare for the first v1.7 release)

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-02-01 15:54:29 +04:00
Andrey Smirnov
593afeea38
fix: run the interactive installer loop to report errors
In the previous implementation, even though `installer.err` was set, it
was never checked 🤦.

The run loop was stolen from the dashboard code.

Fixes #8205

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-31 19:20:46 +04:00
Dmitriy Matrenichev
ebeef28525
feat: implement local caching dns server
This PR adds a new controller - `DNSServerController` that starts tcp and udp dns servers locally. Just like `EtcFileController` it monitors `ResolverStatusType` and updates the list of destinations from there.

Most of the caching logic is in our "lobotomized" "`CoreDNS` fork. We need this fork because default `CoreDNS` carries
full Caddy server and various other modules that we don't need in Talos. On our side we implement
random selection of the actual dns and request forwarding.

Closes #7693

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-01-29 20:26:38 +03:00
Andrey Smirnov
9782319c31
fix: support KubePrism settings in Kubernetes Discovery
Fixes #8143

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-16 20:41:13 +04:00
Utku Ozdemir
7fa7362ddc
fix: fix nodes on dashboard footer when node names are used in --nodes
When the dashboard is used via the CLI through a proxy, e.g., through Omni, node names or IDs can be used in the `--nodes` flag instead of the IPs.

This caused rendering inconsistencies in the dashboard, as some parts of it used the IPs and some used the names passed in the context.

Fix this by collecting all node IPs on dashboard start, and map these IPs to the respective nodes passed as the `--nodes` flag.

On the dashboard footer, we always display the node names as they are passed in the `--nodes` flag.

As part of it, remove the node list change reactivity from the dashboard, so it will always take the passed nodes as the truth.

The IP to node mapping collection at dashboard startup also solves another issue where the first API call by the dashboard triggered the interactive API authentication (e.g., the OIDC flow). Previously, because the terminal was already switched to the raw mode, it was not possible to authenticate properly.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2024-01-12 12:00:08 +01:00
Andrey Smirnov
f38eaaab87
feat: rework secureboot and PCR signing key
Support different providers, not only static file paths.

Drop `pcr-signing-key-public.pem` file, as we generate it on the fly
now.

See https://github.com/siderolabs/image-factory/issues/19

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-11-10 21:14:21 +04:00
Dmitriy Matrenichev
6eade3d5ef
chore: add ability to rewrite uuids and set unique tokens for Talos
This PR does those things:
- It allows API calls `MetaWrite` and `MetaRead` in maintenance mode.
- SystemInformation resource now waits for available META
- SystemInformation resource now overwrites UUID from META if there is an override
- META now supports "UUID override" and "unique token" keys
- ProvisionRequest now includes unique token and Talos version

For #7694

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-11-10 18:17:54 +03:00
Andrey Smirnov
e22ab440d7
feat: update Linux 6.1.61, containerd 1.7.8, runc 1.1.10
Bump tools/pkgs/extras.

Update Go dependencies.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-11-09 20:17:28 +04:00
Noel Georgi
75d3987c05
chore: drop sha1 from genereated pcr json
Drop `sha1` algorithm from expected PCR json calculation.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-11-07 22:11:33 +05:30
Andrey Smirnov
6f3cd05935
refactor: update packet capture to use 'afpacket' interface
First of all, this interface is way more performant than `pcap`
interface. It is Linux-specific, but we don't care in Talos Linux :)

Second, this drop dependency of `machined` on `gopacket/layers` package,
which has huge issues with memory allocations and startup time.

This cuts around 20MiB of process RSS for all Talos processes.
(`talosctl` still requires this `gopacket/layers` library for decoding
packets).

Fixes #7880

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-11-07 15:52:04 +04:00
Andrey Smirnov
807a9950ac
fix: use custom Talos/kernel version when generating UKI
See https://github.com/siderolabs/image-factory/issues/44

Instead of using constants, use proper Talos version and kernel version
discovered from the image.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-11-03 11:14:02 +04:00
Andrey Smirnov
6dc776b8aa
fix: when writing to META in the installer/imager, use fixed name
Use fixed partition name instead of trying to auto-discover by label.

Auto-discovery by label might hit completely wrong blockdevice.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-11-01 20:34:41 +04:00
Artem Chernyshev
ffa5e05cb9
fix: make Talos work on Rockpi 4c boards again
Suppress `efivars` `ENODEV` errors: skip mount and proceed with boot
sequence.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2023-10-25 13:51:38 +03:00
Andrey Smirnov
9dfae8467d
chore: update dependencies
Containerd 1.7.7, Linux 6.1.58.

Fixes #7859

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-10-17 17:41:38 +04:00
Thomas Way
b87092ab69
fix: handle secure boot state policy pcr digest error
This does not fix the underlying digest mismatch issue, but does handle the error and should provide
further insight into issues (if present).

Refs: #7828

Signed-off-by: Thomas Way <thomas@6f.io>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-10-09 18:24:56 +04:00
Thomas Way
336aee0fdb
fix: use tpm2 hash algorithm constants and allow non-SHA-256 PCRs
The conversion from TPM 2 hash algorithm to Go crypto algorithm will fail for
uncommon algorithms like SM3256. This can be avoided by checking the constants
directly, rather than converting them. It should also be fine to allow some non
SHA-256 PCRs.

Fixes: #7810

Signed-off-by: Thomas Way <thomas@6f.io>
Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-10-04 01:02:20 +05:30
Andrey Smirnov
2b548ad0d9
feat: update containerd to 1.7.x
Also update Linux and other pkgs.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-09-28 16:33:57 +04:00
Andrey Smirnov
a52d3cda3b
chore: update gen and COSI runtime
No actual changes, adapting to use new APIs.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-09-22 12:13:13 +04:00
Andrey Smirnov
5ca4d58dc9
fix: generate of modules.dep when on the machine
When running on the machine, the extensionTreePath is not writeable, so
create and clean up a temporary directory to host `modules.dep`
extension.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-09-20 15:51:22 +04:00
Andrey Smirnov
e3b4940588
fix: build CPU ucode correctly for early loader
Closes #7729

This follows the steps described in
https://www.kernel.org/doc/html/v6.1/x86/microcode.html#early-load-microcode

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-09-18 14:03:41 +04:00
Andrey Smirnov
c5bd0ac5cf
refactor: reimplement the depmod extension rebuilder
Drop loop device/mounts completely, use userspace utilities to extract
and lay over module trees in the tmpfs.

Discover kernel version automatically instead of hardcoding it to be
current one (required for Image Service).

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-09-15 21:51:42 +04:00
Andrey Smirnov
a096f05a56
chore: update gRPC library and enable shared write buffers
Fixes #7576

See https://github.com/grpc/grpc-go/pull/6309

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-09-13 21:27:46 +04:00
Noel Georgi
3d2dad4e69
chore: show securtiystate on dashboard
Show Talos SecurityState and MountStatus on dashboard.

Fixes: #7675

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-09-06 21:46:25 +05:30
Andrey Smirnov
2d3ac925ea
refactor: update NTP spike detector
See https://github.com/siderolabs/talos/issues/7080#issuecomment-1696105986

The NTP spike detector code was refactored out of the main NTP code so
that it can be unit-tested.

I dropped one check which I think is causing false-positives in the
spike detector (when NTP offset is higher than the RTT of the best
packet received so far).

The overall flow resembles the one in systemd-timesync, the current
implementation has this check:

6639ac474e/src/timesync/timesyncd-manager.c (L357-L360)

This check was introduced in the initial release, after some
refactoring:

3dbc762003 (diff-4aa9995f07bb31b9884d40a7634f5f6d30245dfd26ac27b89cd5fd3bd4eef56aR429-R431)

There is no equivalent of it in the RFC:

https://datatracker.ietf.org/doc/html/rfc5905#appendix-A.5.2

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-08-29 20:56:42 +04:00
Andrey Smirnov
3c9f7a7de6
chore: re-enable nolintlint and typecheck linters
Drop startup/rand.go, as since Go 1.20 `rand.Seed` is done
automatically.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-08-25 01:05:41 +04:00
Noel Georgi
6778ded29d
feat: add e2e-aws for nvidia extensions
Add e2e tests for nvidia

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-08-24 17:43:36 +05:30
Utku Ozdemir
ea0d6e8c6a
fix: prevent dashboard crashes when process info is not available
Processes and their info are not guaranteed to be present on the api-based data gathered by the dashboard. Therefore, we switch to using nil-safe access to the CPU time when rendering the process table.

Closes siderolabs/talos#7645.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2023-08-22 12:55:36 +02:00
Andrey Smirnov
86c94eff8d
refactor: docgen and config examples
Short version is: move from global variables/`init()` function into
explicit functions.

`docgen` was updated to skip creating any top-level global variables,
now `Doc` information is generated on the fly when it is accessed.
Talos itself doesn't marshal the configuration often, so in general it
should never be accessed for Talos (but will be accessed e.g. for
`talosctl`).

Machine config examples were changed manually from variables to
functions returning a value and moved to a separate file.

There are no changes to the output of `talosctl gen config`.

There is a small change to the generated documentation, which I believe
is a correct one, as previously due to value reuse it was clobbered with
other data.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-08-10 14:56:01 +04:00
Andrey Smirnov
e0f383598e
chore: clean up the output of the imager
Use `Progress`, and options to pass around the way messages are written.

Fixed some tiny issues in the code, but otherwise no functional changes.

To make colored output work with `docker run`, switched back image
generation to use volume mount for output (old mode is still
functioning, but it's not the default, and it works when docker is not
running on the same host).

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-08-07 16:00:14 +04:00
Andrey Smirnov
fb536af4d1
chore: optimize memory usage of tcell library on init
There are two changes here:

* build `machined` binary with `tcell_minimal` tag (which disables
  loading some parts of the terminfo database), which also affects
  `apid`, `trustd` and `dashboard` processes, as they run from the same
  executable; in `dashboard` explicitly import `linux` terminal we're
  using when the `dashboard` runs on the machine
* pass `TCELL_MINIMIZE=1` environment variable to each Talos process
  which removes 0.5MiB of runewdith allocation for a lookup table

See #7578

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-08-04 17:59:18 +04:00
Artem Chernyshev
7d688ccfeb
fix: make encryption config provider default to luks2 if not set
Fixes: https://github.com/siderolabs/talos/issues/7515

Rename `Kind` to `Provider` in the `v1alpha1_provider`.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2023-08-04 12:20:55 +03:00
Dmitriy Matrenichev
80238a05a6
chore: unify semver under github.com/blang/semver/v4
Currently, we use `github.com/coreos/go-semver/semver` and `github.com/hashicorp/go-version`
for version parsing. As we use `github.com/blang/semver/v4` in our other projects, and it
has more features, it makes sense to use it across the projects. It also doesn't allocate
like crazy in `KubernetesVersion.SupportedWith`.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-08-04 00:29:52 +03:00
Andrey Smirnov
4eab3017b0
fix: calculate log2i properly
Fixes #7080

The real bug was off-by-one in `log2i` implementation, other changes are
cleanups as `x/sys/unix` package now contains all the constants we need.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-08-03 21:17:58 +04:00
Andrey Smirnov
87fe8f1a2a
feat: implement image generation profiles
Support full configuration for image generation, including image
outputs, support most features (where applicable) for all image output
types, unify image generation process.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-08-02 19:13:44 +04:00