4527 Commits

Author SHA1 Message Date
Andrey Smirnov
d677901b67
feat: implement device selector for 'physical'
Closes #8090

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-23 15:05:51 +04:00
ExtraClock
7d11172896
docs: add missing talosconfig flag
Add the missing `--talosconfig` flag to the step that sets up the vmtoolsd secret.

Signed-off-by: ExtraClock <35864862+ExtraClock@users.noreply.github.com>
Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-01-23 12:39:41 +05:30
Andrey Smirnov
8a1732bcb1
fix: pull in mptspi driver
See https://github.com/siderolabs/pkgs/pull/871

This should fix issues with VMware SCSI disk virtualization.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-22 19:52:15 +04:00
Andrey Smirnov
c1e45071f0
refactor: use etcd configuration from the EtcdSpec resource
This is currently a no-op; I just noticed it while looking into another
bug. This should make the intention clearer.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-22 16:06:16 +04:00
Andrey Smirnov
4e9b688d3f
fix: use correct TTL for talosconfig in talosctl config new
See #8152

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-22 15:39:41 +04:00
Andrey Smirnov
fb5ad05551
feat: update Kubernetes default to 1.29.1
See https://github.com/kubernetes/kubernetes/releases/v1.29.1

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-18 20:20:29 +04:00
Andrey Smirnov
fe24139f3c
docs: fork docs for v1.7
Time to start the v1.7 development cycle!

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-18 19:17:42 +04:00
Andrey Smirnov
1c2d10cccc
chore: bump dependencies
Go 1.21.6, update pkgs, tools, Go modules, etc.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-18 18:01:05 +04:00
Anthony ARNAUD
a599e38674
chore: allow custom registry to build installer/imager
Use custom pkgs repository by setting PKGS_PREFIX as argument.

Signed-off-by: Anthony ARNAUD <github@anthony-arnaud.fr>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-18 17:32:52 +04:00
Steve Francis
3911ddf7bd
docs: add how-to for cert management
Explain certificate auto-rotation.

Signed-off-by: Steve Francis <steve.francis@talos-systems.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-18 16:57:42 +04:00
Andrey Smirnov
b0ee0bfba3
fix: strategic patch merging for audit policy
The audit policy is marked as `merge: replace`, but there's no check for
zero value. So the problem is that any patch which has `cluster:`
section zeroes out previously set `cluster.apiServer.auditPolicy`.

Add regression tests.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-18 14:36:44 +04:00
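The zero-value problem described in b0ee0bfba3 can be illustrated with a minimal sketch. The `APIServer` type and `mergeReplace` helper below are hypothetical stand-ins, not the actual Talos config types or merge code; they only show why a `replace` strategy needs a zero-value check.

```go
package main

import "fmt"

// APIServer is a hypothetical, trimmed-down stand-in for the real config type.
type APIServer struct {
	AuditPolicy map[string]any
}

// mergeReplace applies "replace" semantics: the patch value replaces the base
// value wholesale, but only when the patch actually sets it.
func mergeReplace(base, patch APIServer) APIServer {
	if len(patch.AuditPolicy) == 0 {
		// Zero value in the patch: keep the previously set policy.
		return base
	}

	return APIServer{AuditPolicy: patch.AuditPolicy}
}

func main() {
	base := APIServer{AuditPolicy: map[string]any{"kind": "Policy"}}

	// A patch that touches other cluster fields but leaves auditPolicy unset.
	merged := mergeReplace(base, APIServer{})

	fmt.Println(merged.AuditPolicy["kind"])
}
```

Without the `len(...) == 0` guard, the empty patch would overwrite the base policy, which is the behavior the commit message describes.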
Andrey Smirnov
474eccdc4c
fix: watch buffer overrun for RouteStatus
Fixes #8157

This PR contains two fixes, both related to the same problem.

Several routes for different links but the same IPv6 destination might exist
at the same time, so the route resource ID should handle that. The problem
was that these routes were misreported, internally causing updates for
the same resources multiple times (equal to the number of links).

Don't trigger controllers more often than 10 times per second (with a burst of
5) for kernel notifications. This ensures Talos doesn't try to reflect
current state of the network subsystem too often as resources, which
causes excessive CPU usage and might potentially lead to the buffer
overrun under high rate of changes.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-17 19:28:25 +04:00
Utku Ozdemir
cc06b5d7a6
fix: fix .der output in talosctl gen secureboot
PEM was converted to DER incorrectly when the output was an X.509 certificate and not a public key.

Skip unnecessary parsing of it to an RSA public key before writing it in DER format as output.

Simplify the code as we do not generate `*-signing-public-key.pem` anymore.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2024-01-17 14:02:03 +01:00
Andrey Smirnov
1dbb4abf43
fix: update discovery service client to v0.1.6
This pulls in gRPC keepalive fix.

See https://github.com/siderolabs/discovery-client/pull/8

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-17 14:42:01 +04:00
Andrey Smirnov
9782319c31
fix: support KubePrism settings in Kubernetes Discovery
Fixes #8143

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-16 20:41:13 +04:00
Utku Ozdemir
6c5a0c2811
feat: generate a single JSON schema for multidoc config
Rework docgen to scan a whole directory for multidoc config types recursively and generate a single schema for all of them.

Annotate the files which need to be scanned by docgen while generating a schema by `//docgen:jsonschema`.

Move and rename the schema.

Bring back schema tests.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2024-01-16 12:25:15 +01:00
Dmitriy Matrenichev
f70b47dddc
fix: force KubePrism to connect using IPv4
Before this change, KubePrism used a hardcoded "localhost" destination, which Go could resolve to an IPv6 address and
then fail to connect to. This change makes KubePrism use a hardcoded "127.0.0.1" destination so that
it always connects over IPv4.

For #8112

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-01-15 21:25:05 +03:00
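The idea behind f70b47dddc can be sketched as follows: build the destination from the literal `127.0.0.1` instead of `localhost`, and dial with the `tcp4` network. `loopbackV4Addr` is a hypothetical helper, and this is not the KubePrism code itself.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// loopbackV4Addr builds a destination that can only be an IPv4 loopback
// address, unlike "localhost", which may resolve to ::1.
func loopbackV4Addr(port int) string {
	return net.JoinHostPort("127.0.0.1", strconv.Itoa(port))
}

func main() {
	// Start a throwaway IPv4-only listener on a random port for the demo.
	ln, err := net.Listen("tcp4", "127.0.0.1:0")
	if err != nil {
		fmt.Println("listen failed:", err)

		return
	}
	defer ln.Close()

	port := ln.Addr().(*net.TCPAddr).Port

	// "tcp4" restricts the dialer to IPv4 even if a hostname were used.
	conn, err := net.Dial("tcp4", loopbackV4Addr(port))
	if err != nil {
		fmt.Println("dial failed:", err)

		return
	}
	defer conn.Close()

	fmt.Println("connected to", conn.RemoteAddr())
}
```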
Andrey Smirnov
d5321e085e
fix: update kmsg with utf-8 fix
See: https://github.com/siderolabs/go-kmsg/pull/9

This fixes lots of `\xab` issues, specifically in:

* `talosctl dmesg` output
* `talosctl dashboard`
* embedded dashboard, including OAuth2 QR code display

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-12 18:33:43 +04:00
Utku Ozdemir
7fa7362ddc
fix: fix nodes on dashboard footer when node names are used in --nodes
When the dashboard is used via the CLI through a proxy, e.g., through Omni, node names or IDs can be used in the `--nodes` flag instead of the IPs.

This caused rendering inconsistencies in the dashboard, as some parts of it used the IPs and some used the names passed in the context.

Fix this by collecting all node IPs on dashboard start, and map these IPs to the respective nodes passed as the `--nodes` flag.

On the dashboard footer, we always display the node names as they are passed in the `--nodes` flag.

As part of it, remove the node list change reactivity from the dashboard, so it will always take the passed nodes as the truth.

The IP to node mapping collection at dashboard startup also solves another issue where the first API call by the dashboard triggered the interactive API authentication (e.g., the OIDC flow). Previously, because the terminal was already switched to the raw mode, it was not possible to authenticate properly.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2024-01-12 12:00:08 +01:00
Utku Ozdemir
ba88678f1a
fix: merge ports and ingress configs correctly in NetworkRuleConfig
Use `replace` patch merging strategy for `portSelector.ports` and `ingress`es in `NetworkRuleConfig` document, so that they do not have duplicate entries and/or fail on port range validation.

Closes siderolabs/talos#8136.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2024-01-11 16:09:47 +01:00
Jonomir
dea9bda2d0
fix: disk UUID & WWID always empty in talosctl disks
Add missing attributes to conversion of go-blockdevice disk
to protobuf disk.

Signed-off-by: Jonomir <68125495+Jonomir@users.noreply.github.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-11 14:37:39 +04:00
Andrey Smirnov
8dc112f36b
chore: pull in NBD modules
See https://github.com/siderolabs/pkgs/pull/862

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-29 20:08:49 +04:00
Serge Logvinov
f6926faab5
fix: default priority for ipv6
We will use the default IPv6 gateway priority as 2048.
The RA default is 1024, which leads to verbose messages such as 'error adding route: netlink receive: file exists.'

Azure uses DHCPv6 and RA for configuring IPv6 on the node.
The platform sets the default gateway as a fallback in case 'accept_ra' is not set to 2.

Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-29 18:42:23 +04:00
Andrey Smirnov
e8758dcbad
chore: support http downloads for assets in talosctl cluster create
This allows passing direct URLs to Image Factory assets for disk
image/ISO/vmlinuz/initramfs, so that we can test Image Factory with
Talos.

Also add an integration test for Image Factory.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-25 18:58:25 +04:00
Andrey Smirnov
265f21be09
fix: replace the filemap implementation to not buffer in memory
This filemap is used to generate installer image layer with artifacts.

The previous naive implementation buffered everything in memory, which led to
excessive memory usage.

See https://github.com/siderolabs/image-factory/issues/77

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-22 19:07:46 +04:00
Andrey Smirnov
8db3c5b3c6
fix: correctly pick base installer image layers
Only Talos 1.5+ provides proper optimized image,
Talos 1.4 provided a single-layer image (which worked in this case),
while Talos 1.2-1.3 have multi-layered images which can't be replaced
easily.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-22 17:09:05 +04:00
Andrey Smirnov
0a30ef7845
fix: imager should support different Talos versions
Add some quirks to make images generated with newer Talos compatible
with images generated by older Talos.

Specifically, reset options were added in Talos 1.4, so we shouldn't
add them for older versions.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-22 16:13:34 +04:00
Andrey Smirnov
d6342cda53
docs: update latest version to v1.6.1
Also port a fix from #8103

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-22 14:42:03 +04:00
Andrey Smirnov
e6e422b92a
chore: bump dependencies
Go modules, tools, etc.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-21 19:01:16 +04:00
Andrey Smirnov
5a19d078ad
fix: properly overwrite files on install
Without truncation, the file was not overwritten properly if a file with
the same name already existed and the new content was smaller.

Fixes #8097

Also add a 10 second timeout on UEFI ISO boot, so that boot menu can be
seen without pressing `Esc` many times.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-20 19:41:30 +04:00
Tim Jones
9eb6cea789
docs: secureboot sd-boot menu clarification
Add note to try spamming Esc to bring up the sd-boot menu option if keys
don't automatically enroll in UEFI firmware.

Signed-off-by: Tim Jones <tim.jones@siderolabs.com>
2023-12-19 18:19:31 +01:00
Andrey Smirnov
01f0cbe61c
feat: support iPXE direct booting in talosctl cluster create
This embeds a tiny TFTP server which serves UEFI iPXE which embeds a
script that chainloads a given iPXE script.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-19 17:56:08 +04:00
Andrey Smirnov
3ba84701d9
feat: pull in kernel modules for mlx Infiniband and VFIO
See:

* https://github.com/siderolabs/pkgs/pull/854
* https://github.com/siderolabs/pkgs/pull/855

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-19 13:55:42 +04:00
Andrey Smirnov
ba993e0edd
docs: announce that SecureBoot is available
Restructure the docs a bit to start with the easiest option (via Image
Factory).

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-18 20:43:08 +04:00
Andrey Smirnov
241bc9312e
fix: update the way secureboot signer fetches certificate (azure)
The previous code was a mistake; the public part of the certificate is
more easily available.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-18 17:54:51 +04:00
Dmitriy Matrenichev
59b62398f6
chore: modernize machined/pkg/controllers/k8s
This is going to be a multipart effort to finally use safe.* wrappers in the production code.
Quick regexp search shows that there are around 150 direct type assertions on resources (excluding the ones in this commit).

Also - migrate from `interface{}` to `any` and use `slices.Sort*` instead of `sort.*` where possible.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-12-15 19:33:06 +03:00
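Two of the modernizations named in 59b62398f6, `any` in place of `interface{}` and `slices.Sort*` in place of `sort.*`, look roughly like this in Go 1.21 or later. The `member` type and both helpers are hypothetical and unrelated to the actual controller code; the `safe.*` wrappers are not shown.

```go
package main

import (
	"cmp"
	"fmt"
	"slices"
)

// member is a hypothetical type used only for this illustration.
type member struct {
	name     string
	priority int
}

// sortByPriority uses slices.SortFunc with a typed comparison function,
// instead of sort.Slice with index-based closures.
func sortByPriority(members []member) {
	slices.SortFunc(members, func(a, b member) int {
		return cmp.Compare(a.priority, b.priority)
	})
}

// typeName accepts any, the alias that replaces interface{}.
func typeName(v any) string {
	return fmt.Sprintf("%T", v)
}

func main() {
	members := []member{{"c", 3}, {"a", 1}, {"b", 2}}

	sortByPriority(members)

	fmt.Println(members)
	fmt.Println(typeName(members[0]))
}
```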
Andrey Smirnov
760f793d55
fix: use correct prefix when installing SBC files
When creating an image under non-default mount prefix, it should be
used explicitly when copying SBC files.

See https://github.com/siderolabs/image-factory/issues/65

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-15 19:46:10 +04:00
Noel Georgi
0b94550c42
chore: fix the gvisor test
The gvisor test was not using the correct runtimeclass and would have
always passed regardless.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-12-15 20:48:44 +05:30
Andrey Smirnov
3a787c1d67
docs: update 1.6 docs with Noel's feedback
I merged docs PR before receiving those updates.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-15 18:48:17 +04:00
Andrey Smirnov
d803e40ef2
docs: provide documentation for Talos 1.6
Updated lots of documentation with new/updated flows.

Provide What's New for Talos 1.6.0.

Update Troubleshooting guide to cover more steps.

Make Talos 1.6 docs the default.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-15 16:36:57 +04:00
Andrey Smirnov
9a185a30f7
feat: update Kubernetes to v1.29.0
See https://github.com/kubernetes/kubernetes/releases/v1.29.0

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-13 22:59:17 +04:00
Andrey Smirnov
5934815d2f
chore: split more kernel modules on amd64
See https://github.com/siderolabs/pkgs/pull/844

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-13 21:26:32 +04:00
Andrey Smirnov
10c59a6b90
fix: leave discovery service later in the reset sequence
Fixes #8057

I went back and forth on the way to fix it exactly, and ended up with a
pretty simple version of a fix.

The problem was that discovery service was removing the member at the
initial phase of reset, which actually still requires KubeSpan to be up:

* leaving `etcd` (need to talk to other members)
* stopping pods (might need to talk to Kubernetes API with some CNIs)

Now leaving discovery service happens way later, when network
interactions are no longer required.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-13 19:16:12 +04:00
Noel Georgi
0c86ca1cc6
chore: enable kubespan+firewall for cilium tests
Enable kubespan and default block firewall with cilium tests.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-12-12 22:50:47 +05:30
Andrey Smirnov
98fd722d51
feat: provide compatibility for future Talos 1.7
Ensure that Talos 1.6 machinery can handle compatibility for Talos 1.7.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-12 15:10:11 +04:00
Andrey Smirnov
131a1b1671
fix: add a KubeSpan option to disable extra endpoint harvesting
It works well for small clusters, but with bigger clusters it puts too
much load on the discovery service, as it has quadratic complexity in
number of endpoints discovered/reported from each member.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-12 14:07:31 +04:00
Artem Chernyshev
4547ad9afa
feat: send actor id to the SideroLink events sink
This might come in handy to distinguish sequences and tasks initiated by a
particular API request.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2023-12-11 21:59:02 +03:00
Andrey Smirnov
04e7745471
docs: cap max heading level
Markdown/HTML can't have headings after level 6, so make sure the
maximum heading level is capped at 6.

We have just a single place with such deep nesting.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-11 18:39:18 +04:00
Dmitriy Matrenichev
6bb1e99aa3
chore: optimize pcap dump
Reimplement `gopacket.PacketSource.PacketsCtx` as `forEachPacket`.

- Use `ZeroCopyPacketDataSource` instead of `PacketDataSource`. I didn't find any specific reason why `PacketDataSource` exists at all, since `NewPacket` is doing copy inside if you don't explicitly tell it not to.
- Use `WillPool` to pool packet buffers. It doesn't fully remove allocations, but it's a safe start.
  Send packets back into the pool after we are done with them.
- Pass `Packet` directly to the closure instead of waiting for it on the channel. We don't store this packet anywhere so there is no reason to async this part.
- Drop `time.Sleep` code in `forEachPacket` body.
- Drop `SnapLen` support in client and server since it didn't work anyway (details in the PR).

Closes #7994

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-12-11 15:44:42 +03:00
Andrey Smirnov
4f9d3b975f
feat: update Kubernetes to v1.29.0-rc.2
See https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.29.md

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-08 19:41:28 +04:00