4565 Commits

Author SHA1 Message Date
Utku Ozdemir
cc06b5d7a6
fix: fix .der output in talosctl gen secureboot
PEM was converted to DER incorrectly when the output was an X.509 certificate rather than a public key.

Skip unnecessary parsing of it to an RSA public key before writing it in DER format as output.

Simplify the code as we do not generate `*-signing-public-key.pem` anymore.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2024-01-17 14:02:03 +01:00
Andrey Smirnov
1dbb4abf43
fix: update discovery service client to v0.1.6
This pulls in gRPC keepalive fix.

See https://github.com/siderolabs/discovery-client/pull/8

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-17 14:42:01 +04:00
Andrey Smirnov
9782319c31
fix: support KubePrism settings in Kubernetes Discovery
Fixes #8143

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-16 20:41:13 +04:00
Utku Ozdemir
6c5a0c2811
feat: generate a single JSON schema for multidoc config
Rework docgen to scan a whole directory for multidoc config types recursively and generate a single schema for all of them.

Annotate the files which need to be scanned by docgen while generating a schema by `//docgen:jsonschema`.

Move and rename the schema.

Bring back schema tests.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2024-01-16 12:25:15 +01:00
Dmitriy Matrenichev
f70b47dddc
fix: force KubePrism to connect using IPv4
Before this change, KubePrism used the hardcoded destination "localhost", which Go could resolve to an IPv6 address
and then fail to connect to. This change forces KubePrism to connect over IPv4 using the hardcoded destination "127.0.0.1",
so it always uses IPv4.

For #8112

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-01-15 21:25:05 +03:00
Andrey Smirnov
d5321e085e
fix: update kmsg with utf-8 fix
See: https://github.com/siderolabs/go-kmsg/pull/9

This fixes lots of `\xab` issues, specifically in:

* `talosctl dmesg` output
* `talosctl dashboard`
* embedded dashboard, including OAuth2 QR code display

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-12 18:33:43 +04:00
Utku Ozdemir
7fa7362ddc
fix: fix nodes on dashboard footer when node names are used in --nodes
When the dashboard is used via the CLI through a proxy, e.g., through Omni, node names or IDs can be used in the `--nodes` flag instead of the IPs.

This caused rendering inconsistencies in the dashboard, as some parts of it used the IPs and some used the names passed in the context.

Fix this by collecting all node IPs on dashboard start and mapping these IPs to the respective nodes passed in the `--nodes` flag.

On the dashboard footer, we always display the node names as they are passed in the `--nodes` flag.

As part of this, remove the node list change reactivity from the dashboard, so it always treats the passed nodes as the source of truth.

The IP to node mapping collection at dashboard startup also solves another issue where the first API call by the dashboard triggered the interactive API authentication (e.g., the OIDC flow). Previously, because the terminal was already switched to the raw mode, it was not possible to authenticate properly.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2024-01-12 12:00:08 +01:00
Utku Ozdemir
ba88678f1a
fix: merge ports and ingress configs correctly in NetworkRuleConfig
Use `replace` patch merging strategy for `portSelector.ports` and `ingress`es in `NetworkRuleConfig` document, so that they do not have duplicate entries and/or fail on port range validation.
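The toy model below contrasts an appending list merge with a `replace` merge. It assumes the default strategy appends patch entries to the existing list. The `PortRange` type and both merge functions are hypothetical stand-ins, not the Talos config types or merge package.

```go
package main

import "fmt"

// PortRange is a hypothetical stand-in for one entry of portSelector.ports.
type PortRange struct{ Lo, Hi int }

// mergeAppend models an appending list merge: patch entries are added after
// the existing ones, so re-applying a patch duplicates or overlaps ranges.
func mergeAppend(base, patch []PortRange) []PortRange {
	out := append([]PortRange(nil), base...)

	return append(out, patch...)
}

// mergeReplace models a "replace" strategy: if the patch sets the list at
// all, it replaces the existing list wholesale.
func mergeReplace(base, patch []PortRange) []PortRange {
	if patch == nil {
		return base
	}

	return append([]PortRange(nil), patch...)
}

func main() {
	base := []PortRange{{Lo: 80, Hi: 80}, {Lo: 8000, Hi: 9000}}
	patch := []PortRange{{Lo: 8000, Hi: 9000}}

	fmt.Println("append: ", mergeAppend(base, patch))
	fmt.Println("replace:", mergeReplace(base, patch))
}
```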

Closes siderolabs/talos#8136.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2024-01-11 16:09:47 +01:00
Jonomir
dea9bda2d0
fix: disk UUID & WWID always empty in talosctl disks
Add missing attributes to conversion of go-blockdevice disk
to protobuf disk.

Signed-off-by: Jonomir <68125495+Jonomir@users.noreply.github.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-11 14:37:39 +04:00
Andrey Smirnov
8dc112f36b
chore: pull in NBD modules
See https://github.com/siderolabs/pkgs/pull/862

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-29 20:08:49 +04:00
Serge Logvinov
f6926faab5
fix: default priority for ipv6
Use 2048 as the default IPv6 gateway priority.
The RA default is 1024, which leads to verbose messages such as 'error adding route: netlink receive: file exists.'

Azure uses DHCPv6 and RA for configuring IPv6 on the node.
The platform sets the default gateway as a fallback in case 'accept_ra' is not set to 2.

Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-29 18:42:23 +04:00
Andrey Smirnov
e8758dcbad
chore: support http downloads for assets in talosctl cluster create
This allows passing direct URLs to Image Factory assets (disk
image/ISO/vmlinuz/initramfs), so that we can test Image Factory with
Talos.

Also add an integration test for Image Factory.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-25 18:58:25 +04:00
Andrey Smirnov
265f21be09
fix: replace the filemap implementation to not buffer in memory
This filemap is used to generate installer image layer with artifacts.

The previous naive implementation buffered everything in memory, which
led to excessive memory usage.

See https://github.com/siderolabs/image-factory/issues/77

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-22 19:07:46 +04:00
Andrey Smirnov
8db3c5b3c6
fix: correctly pick base installer image layers
Only Talos 1.5+ provides a properly optimized image;
Talos 1.4 provided a single-layer image (which worked in this case),
while Talos 1.2-1.3 have multi-layer images which can't be replaced
easily.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-22 17:09:05 +04:00
Andrey Smirnov
0a30ef7845
fix: imager should support different Talos versions
Add some quirks to make images generated with newer Talos compatible
with images generated by older Talos.

Specifically, reset options were added in Talos 1.4, so we shouldn't
add them for older versions.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-22 16:13:34 +04:00
Andrey Smirnov
d6342cda53
docs: update latest version to v1.6.1
Also port a fix from #8103

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-22 14:42:03 +04:00
Andrey Smirnov
e6e422b92a
chore: bump dependencies
Go modules, tools, etc.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-21 19:01:16 +04:00
Andrey Smirnov
5a19d078ad
fix: properly overwrite files on install
Without truncation, the file was not overwritten properly if a file with
the same name already existed and the new contents were smaller.

Fixes #8097

Also add a 10 second timeout on UEFI ISO boot, so that boot menu can be
seen without pressing `Esc` many times.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-20 19:41:30 +04:00
Tim Jones
9eb6cea789
docs: secureboot sd-boot menu clarification
Add a note to try spamming Esc to bring up the sd-boot menu option if keys
don't automatically enroll in UEFI firmware.

Signed-off-by: Tim Jones <tim.jones@siderolabs.com>
2023-12-19 18:19:31 +01:00
Andrey Smirnov
01f0cbe61c
feat: support iPXE direct booting in talosctl cluster create
This embeds a tiny TFTP server which serves a UEFI iPXE binary with an
embedded script that chainloads a given iPXE script.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-19 17:56:08 +04:00
Andrey Smirnov
3ba84701d9
feat: pull in kernel modules for mlx Infiniband and VFIO
See:

* https://github.com/siderolabs/pkgs/pull/854
* https://github.com/siderolabs/pkgs/pull/855

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-19 13:55:42 +04:00
Andrey Smirnov
ba993e0edd
docs: announce that SecureBoot is available
Restructure the docs a bit to start with the easiest option (via Image
Factory).

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-18 20:43:08 +04:00
Andrey Smirnov
241bc9312e
fix: update the way secureboot signer fetches certificate (azure)
The previous code was a mistake; the public part of the certificate is
more easily available.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-18 17:54:51 +04:00
Dmitriy Matrenichev
59b62398f6
chore: modernize machined/pkg/controllers/k8s
This is going to be a multipart effort to finally use safe.* wrappers in the production code.
A quick regexp search shows that there are around 150 direct type assertions on resources (excluding the ones in this commit).

Also - migrate from `interface{}` to `any` and use `slices.Sort*` instead of `sort.*` where possible.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-12-15 19:33:06 +03:00
Andrey Smirnov
760f793d55
fix: use correct prefix when installing SBC files
When creating an image under a non-default mount prefix, the prefix should
be used explicitly when copying SBC files.

See https://github.com/siderolabs/image-factory/issues/65

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-15 19:46:10 +04:00
Noel Georgi
0b94550c42
chore: fix the gvisor test
The gvisor test was not using the correct runtimeclass and would
always have passed regardless.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-12-15 20:48:44 +05:30
Andrey Smirnov
3a787c1d67
docs: update 1.6 docs with Noel's feedback
I merged docs PR before receiving those updates.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-15 18:48:17 +04:00
Andrey Smirnov
d803e40ef2
docs: provide documentation for Talos 1.6
Updated lots of documentation with new/updated flows.

Provide What's New for Talos 1.6.0.

Update Troubleshooting guide to cover more steps.

Make Talos 1.6 docs the default.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-15 16:36:57 +04:00
Andrey Smirnov
9a185a30f7
feat: update Kubernetes to v1.29.0
See https://github.com/kubernetes/kubernetes/releases/v1.29.0

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-13 22:59:17 +04:00
Andrey Smirnov
5934815d2f
chore: split more kernel modules on amd64
See https://github.com/siderolabs/pkgs/pull/844

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-13 21:26:32 +04:00
Andrey Smirnov
10c59a6b90
fix: leave discovery service later in the reset sequence
Fixes #8057

I went back and forth on the way to fix it exactly, and ended up with a
pretty simple version of a fix.

The problem was that discovery service was removing the member at the
initial phase of reset, which actually still requires KubeSpan to be up:

* leaving `etcd` (need to talk to other members)
* stopping pods (might need to talk to Kubernetes API with some CNIs)

Now leaving discovery service happens way later, when network
interactions are no longer required.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-13 19:16:12 +04:00
Noel Georgi
0c86ca1cc6
chore: enable kubespan+firewall for cilium tests
Enable kubespan and default block firewall with cilium tests.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-12-12 22:50:47 +05:30
Andrey Smirnov
98fd722d51
feat: provide compatibility for future Talos 1.7
Ensure that Talos 1.6 machinery can handle compatibility for Talos 1.7.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-12 15:10:11 +04:00
Andrey Smirnov
131a1b1671
fix: add a KubeSpan option to disable extra endpoint harvesting
Extra endpoint harvesting works well for small clusters, but with bigger
clusters it puts too much load on the discovery service, as its complexity
is quadratic in the number of endpoints discovered/reported by each member.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-12 14:07:31 +04:00
Artem Chernyshev
4547ad9afa
feat: send actor id to the SideroLink events sink
This might come in handy to distinguish sequences and tasks initiated by a
particular API request.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2023-12-11 21:59:02 +03:00
Andrey Smirnov
04e7745471
docs: cap max heading level
Markdown/HTML can't have headings beyond level 6, so make sure the
maximum heading level is capped at 6.

We have just a single place with such deep nesting.
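Capping the level is a simple clamp. The sketch below uses Go 1.21's built-in `min`/`max`. The `heading` helper is hypothetical and not the docgen implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// maxHeadingLevel is the deepest heading level HTML supports (<h6>).
const maxHeadingLevel = 6

// heading renders a Markdown ATX heading, clamping level to 1..6 so that
// deeply nested config sections never produce an invalid "#######".
func heading(level int, title string) string {
	level = max(1, min(level, maxHeadingLevel)) // built-in min/max need Go 1.21+

	return strings.Repeat("#", level) + " " + title
}

func main() {
	for _, lvl := range []int{2, 6, 8} {
		fmt.Println(heading(lvl, "example"))
	}
}
```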

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-11 18:39:18 +04:00
Dmitriy Matrenichev
6bb1e99aa3
chore: optimize pcap dump
Reimplement `gopacket.PacketSource.PacketsCtx` as `forEachPacket`.

- Use `ZeroCopyPacketDataSource` instead of `PacketDataSource`. I didn't find any specific reason why `PacketDataSource` exists at all, since `NewPacket` copies the data internally unless you explicitly tell it not to.
- Use `WillPool` to pool packet buffers. It doesn't fully remove allocations, but it's a safe start.
  Send packets back into the pool after we are done with them.
- Pass `Packet` directly to the closure instead of waiting for it on the channel. We don't store this packet anywhere, so there is no reason to make this part asynchronous.
- Drop `time.Sleep` code in `forEachPacket` body.
- Drop `SnapLen` support in client and server since it didn't work anyway (details in the PR).

Closes #7994

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-12-11 15:44:42 +03:00
Andrey Smirnov
4f9d3b975f
feat: update Kubernetes to v1.29.0-rc.2
See https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.29.md

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-08 19:41:28 +04:00
Andrey Smirnov
46121c9fec
docs: rework machine config documentation generation
Generate a structured table of contents following the structure of the
config.

Make high-level examples follow the full structure of the config.

Document new multi-doc machine config.

Fixes #8023

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-08 14:16:40 +04:00
Andrey Smirnov
e128d3c827
fix: talosctl cluster create not to enforce kubeprism always
The command should be able to deploy old versions of Talos as well,
even before KubePrism.

The version contract correctly enables/disables KubePrism by default, so
treat the default flag value as "don't change defaults".

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-07 18:15:54 +04:00
Andrey Smirnov
320064c5a8
feat: update Go 1.21.5, Linux 6.1.65, etcd 3.5.11
For main version, cut the release notes to start the 1.7 process.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-07 16:52:28 +04:00
Andrey Smirnov
270604bead
fix: support user disks via symlinks
The core blockdevice library already supported resolving symlinks; we
just needed to get the raw block device name from it and use it
afterwards.

In QEMU provisioner, leave the first (system) disk as virtio (for
performance), and mount user disks as 'ata', which allows `udevd` to
pick up the disk IDs (not available for `virtio`), and use the symlink
path in the tests.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-05 22:02:56 +04:00
Andrey Smirnov
4f195dd271
chore: fix the release.toml
It was using `note` instead of `notes`, so some entries got dropped.

I blame CodePilot for that ;)

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-04 20:23:03 +04:00
Andrey Smirnov
474fa0480d
fix: store and execute desired action on emergency action
Fixes #7854

Talos runs an emergency handler if a sequence experiences an
unrecoverable failure. The emergency handler was unconditionally
executing the "reboot" action if no other action was received (which only
happens if the sequence completes successfully), so a Shutdown
request might result in Reboot behavior on an error during the shutdown
phase.

This is not a pretty fix, but it's hard to deliver the intent from one
part of the code to another right now, so instead use a global variable
which stores default emergency intention, and gets overridden early in
the Shutdown sequence.
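The global-intent approach can be modeled with a `sync/atomic` value. It defaults to reboot and is overridden as the first step of a shutdown, so a later failure is handled as a power-off. All names here (`Action`, `setEmergencyAction`, `emergencyAction`, `shutdownSequence`) are hypothetical and do not reflect the actual Talos identifiers.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// Action is a hypothetical enum for what the emergency handler does.
type Action int32

const (
	ActionReboot   Action = iota // default: reboot on unrecoverable failure
	ActionPowerOff               // requested by the Shutdown sequence
)

func (a Action) String() string {
	if a == ActionPowerOff {
		return "power off"
	}

	return "reboot"
}

// defaultEmergencyAction is the global intent; its zero value is ActionReboot.
var defaultEmergencyAction atomic.Int32

func setEmergencyAction(a Action) { defaultEmergencyAction.Store(int32(a)) }

func emergencyAction() Action { return Action(defaultEmergencyAction.Load()) }

// shutdownSequence records the intent first, before any phase can fail.
func shutdownSequence(failPhase bool) error {
	setEmergencyAction(ActionPowerOff)

	if failPhase {
		return errors.New("shutdown phase failed")
	}

	return nil
}

func main() {
	if err := shutdownSequence(true); err != nil {
		// The emergency handler now honors the Shutdown intent instead of rebooting.
		fmt.Printf("%v; emergency handler will %v\n", err, emergencyAction())
	}
}
```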

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-04 19:51:48 +04:00
Sebastian Gaiser
515ae2a184
docs: extend hetzner-cloud docs for arm64
Added docs for arm64 and updated the packer plugin.

Signed-off-by: Sebastian Gaiser <sebastiangaiser@users.noreply.github.com>
Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-12-04 20:49:25 +05:30
Dmitriy Matrenichev
eecc4dbd51
fix: trim leading spaces/newlines in inline manifest contents
On the path `LoadPatches` -> `configpatcher.Apply` -> `configloader.NewFromBytes`, any leading newlines are transformed into `|4` YAML. We want to prevent that.
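A minimal sketch of the trimming step follows. Per this commit, leading newlines otherwise end up as `|4` YAML. The cutset `" \n"` mirrors the commit title, and `trimManifest` is a hypothetical helper rather than the Talos function.

```go
package main

import (
	"fmt"
	"strings"
)

// trimManifest drops leading spaces and newlines from inline manifest
// contents before they are serialized back to YAML.
func trimManifest(contents string) string {
	return strings.TrimLeft(contents, " \n")
}

func main() {
	raw := "\n\napiVersion: v1\nkind: Namespace\n"
	fmt.Printf("%q\n", trimManifest(raw))
}
```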

Closes #7993

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-12-04 17:12:20 +03:00
Andrey Smirnov
dbf274ddf7
fix: skip writing the file if the contents haven't changed
As the controller reconciles every /etc file present, it might be called
multiple times for the same file, even if the actual contents haven't
changed.

Rewriting the file might lead to a concurrent process seeing
incomplete file contents more often than needed.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-04 15:58:03 +04:00
Dmitriy Matrenichev
6329222bdc
fix: do not panic in merge.Merge if map value is nil
Checking for `zeroValue` is not enough when accessing `map[string]any`.
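Why a zero-value check alone can panic: for a `nil` stored in `map[string]any`, `reflect.ValueOf` returns the invalid zero `reflect.Value`, and `IsZero` or `Type` on it panic. The sketch below checks `IsValid` first. `isEmpty` and `mergeMaps` are hypothetical, simplified helpers, not the Talos `merge.Merge` implementation. Skipping empty values is an assumed policy for illustration.

```go
package main

import (
	"fmt"
	"reflect"
)

// isEmpty reports whether v carries no value for merging purposes.
// reflect.ValueOf(nil) is the invalid zero reflect.Value; calling IsZero or
// Type on it panics, so validity must be checked first.
func isEmpty(v any) bool {
	rv := reflect.ValueOf(v)
	if !rv.IsValid() {
		return true // untyped nil, e.g. `key: ~` decoded into map[string]any
	}

	return rv.IsZero()
}

// mergeMaps merges src into dst, recursing into nested maps and skipping
// empty source values instead of panicking on them.
func mergeMaps(dst, src map[string]any) {
	for k, sv := range src {
		if isEmpty(sv) {
			continue
		}

		dm, dok := dst[k].(map[string]any)
		sm, sok := sv.(map[string]any)

		if dok && sok {
			mergeMaps(dm, sm)

			continue
		}

		dst[k] = sv
	}
}

func main() {
	dst := map[string]any{"cluster": map[string]any{"name": "a"}, "debug": true}
	src := map[string]any{"cluster": map[string]any{"name": "b", "proxy": nil}, "debug": nil}

	mergeMaps(dst, src)
	fmt.Println(dst)
}
```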

Closes #8005

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-12-04 12:38:09 +03:00
Andrey Smirnov
d8a435f0e4
fix: initialize boot assets with defaults early
The problem was that bootloaders were correctly picking up defaults for
`installer` mode (vs. `imager` mode), but DTB and other SBC stuff wasn't
properly initialized, so installing on SBCs failed.

Now all options are properly initialized with defaults early in the
process.

Fixes #8009

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-01 17:47:05 +04:00
Andrey Smirnov
c6835de17a
fix: pick etcd advertised addresses from 'current' addresses
Fixes #7947

This way etcd advertised address can be picked from the `external IPs`
of the machine.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-01 17:26:28 +04:00