4637 Commits

Author SHA1 Message Date
Dmitriy Matrenichev
ebeef28525
feat: implement local caching dns server
This PR adds a new controller - `DNSServerController` that starts tcp and udp dns servers locally. Just like `EtcFileController` it monitors `ResolverStatusType` and updates the list of destinations from there.

Most of the caching logic is in our "lobotomized" `CoreDNS` fork. We need this fork because the default `CoreDNS` carries
a full Caddy server and various other modules that we don't need in Talos. On our side, we implement
random selection of the actual DNS server and request forwarding.
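
A minimal sketch of the random upstream selection described above. The `pickUpstream` helper and the example addresses are hypothetical and are not taken from the Talos code; the forwarding step is left out.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
)

// pickUpstream returns one upstream DNS address chosen uniformly at random.
// Hypothetical helper for illustration only; not the Talos implementation.
func pickUpstream(upstreams []string) (string, error) {
	if len(upstreams) == 0 {
		return "", errors.New("no upstream DNS servers configured")
	}

	return upstreams[rand.Intn(len(upstreams))], nil
}

func main() {
	// Placeholder destinations; in the PR they come from ResolverStatusType.
	upstreams := []string{"1.1.1.1:53", "8.8.8.8:53"}

	addr, err := pickUpstream(upstreams)
	if err != nil {
		fmt.Println("error:", err)
		return
	}

	fmt.Println("forwarding query to", addr)
}
```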

Closes #7693

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-01-29 20:26:38 +03:00
edwinavalos
4a3691a273
docs: fix broken links in metal-network-configuration.md
Fixed the same set of links in 1.4, 1.5, 1.6, and 1.7, with the exception
of a link in 1.4 that points to boot assets: the boot assets page, if
we were to place a copy in that version, would be missing a bunch of
supporting links. Opted to skip that update, as that documentation is
unsupported.

Signed-off-by: edwinavalos <edwin.a.avalos@gmail.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-29 18:44:21 +04:00
Spencer Smith
c4ed189a69
docs: provide sane defaults for each release series in vmware script
This PR sets proper defaults based on the Talos release series. Defaults to the last release in each series.

Signed-off-by: Spencer Smith <spencer.smith@talos-systems.com>
2024-01-29 09:25:04 -05:00
Andrey Smirnov
8138d54c6c
docs: clarify node taints/labels for worker nodes
`NodeRestriction` admission plugin heavily restricts what worker nodes
can set.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-29 17:56:46 +04:00
Andrey Smirnov
b44551ccdb
feat: update Linux to 6.6.13
See https://github.com/siderolabs/pkgs/pull/873

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-29 16:50:33 +04:00
Christian Mohn
385707c5f3
docs: update vmware.sh
Add support for using the GOVC_NETWORK environment variable to determine which vSphere vSwitch PortGroup to use.

This checks whether the GOVC_NETWORK environment variable is set; if so, that value is used. If not, the default PortGroup (VM Network) is used as before.

Checks added for both control plane and worker nodes.

Signed-off-by: Christian Mohn <christian@drible.net>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-29 14:55:21 +04:00
Spencer Smith
d1a79b845f
docs: fix small typo in etcd maintenance guide
This PR fixes a little typo in these docs, b/c etcd is under the cluster
key.

Signed-off-by: Spencer Smith <spencer.smith@talos-systems.com>
2024-01-29 14:22:04 +04:00
Utku Ozdemir
cf0603330a
docs: copy generated JSON schema to host
After the JSON schema is generated in a build container, copy it over to the host, so it becomes a part of the codebase.

This is required as the location of the schema changed recently from being under `pkg/machinery/config/types/` to be under `pkg/machinery/config/schemas/`.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2024-01-26 13:56:55 +01:00
Andrey Smirnov
f11139c229
docs: document local path provisioner install
Use kustomize (as the officially supported way to install Local Path
Provisioner).

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-26 14:30:45 +04:00
Andrey Smirnov
e0dfbb8fba
fix: allow META encoded values to be compressed
Fixes #8186

This is planned to be backported to Talos 1.6.3.

This allows passing large META values (YAML for platform network
configuration) which might otherwise exceed the limit for kernel
command line params.
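
A rough sketch of why compression helps here. The commit does not describe the actual encoding, so the gzip plus base64 scheme and the `encodeCompressed`/`decodeCompressed` names below are assumptions made purely for illustration.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/base64"
	"fmt"
	"io"
	"strings"
)

// encodeCompressed gzips the value and base64-encodes the result so that it
// stays safe to pass as text. Assumed scheme, not the Talos wire format.
func encodeCompressed(value string) (string, error) {
	var buf bytes.Buffer

	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write([]byte(value)); err != nil {
		return "", err
	}

	// Close flushes the remaining compressed data into buf.
	if err := zw.Close(); err != nil {
		return "", err
	}

	return base64.StdEncoding.EncodeToString(buf.Bytes()), nil
}

// decodeCompressed reverses encodeCompressed.
func decodeCompressed(encoded string) (string, error) {
	raw, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return "", err
	}

	zr, err := gzip.NewReader(bytes.NewReader(raw))
	if err != nil {
		return "", err
	}
	defer zr.Close()

	out, err := io.ReadAll(zr)
	if err != nil {
		return "", err
	}

	return string(out), nil
}

func main() {
	// Repetitive YAML-like content compresses well; the value is made up.
	value := strings.Repeat("addresses:\n  - 192.168.1.10/24\n", 50)

	encoded, err := encodeCompressed(value)
	if err != nil {
		fmt.Println("error:", err)
		return
	}

	fmt.Printf("raw: %d bytes, encoded: %d bytes\n", len(value), len(encoded))
}
```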

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-23 17:24:18 +04:00
Andrey Smirnov
d677901b67
feat: implement device selector for 'physical'
Closes #8090

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-23 15:05:51 +04:00
ExtraClock
7d11172896
docs: add missing talosconfig flag
Add the missing `--talosconfig` flag to the step that sets up the vmtoolsd secret.

Signed-off-by: ExtraClock <35864862+ExtraClock@users.noreply.github.com>
Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-01-23 12:39:41 +05:30
Andrey Smirnov
8a1732bcb1
fix: pull in mptspi driver
See https://github.com/siderolabs/pkgs/pull/871

This should fix issues with VMware SCSI disk virtualization.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-22 19:52:15 +04:00
Andrey Smirnov
c1e45071f0
refactor: use etcd configuration from the EtcdSpec resource
This is currently a no-op; I just noticed it while looking into another
bug. This should make the intention clearer.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-22 16:06:16 +04:00
Andrey Smirnov
4e9b688d3f
fix: use correct TTL for talosconfig in talosctl config new
See #8152

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-22 15:39:41 +04:00
Andrey Smirnov
fb5ad05551
feat: update Kubernetes default to 1.29.1
See https://github.com/kubernetes/kubernetes/releases/v1.29.1

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-18 20:20:29 +04:00
Andrey Smirnov
fe24139f3c
docs: fork docs for v1.7
Time to start the v1.7 development cycle!

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-18 19:17:42 +04:00
Andrey Smirnov
1c2d10cccc
chore: bump dependencies
Go 1.21.6, update pkgs, tools, Go modules, etc.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-18 18:01:05 +04:00
Anthony ARNAUD
a599e38674
chore: allow custom registry to build installer/imager
Use custom pkgs repository by setting PKGS_PREFIX as argument.

Signed-off-by: Anthony ARNAUD <github@anthony-arnaud.fr>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-18 17:32:52 +04:00
Steve Francis
3911ddf7bd
docs: add how-to for cert management
Explain certificate auto-rotation.

Signed-off-by: Steve Francis <steve.francis@talos-systems.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-18 16:57:42 +04:00
Andrey Smirnov
b0ee0bfba3
fix: strategic patch merging for audit policy
The audit policy is marked as `merge: replace`, but there's no check for
zero value. So the problem is that any patch which has `cluster:`
section zeroes out previously set `cluster.apiServer.auditPolicy`.
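
A simplified sketch of the behaviour described above, assuming "replace" means the patch value substitutes the base value wholesale. The `cluster` and `apiServer` types and the `mergeReplace` helper are hypothetical stand-ins, not the Talos config types.

```go
package main

import "fmt"

// apiServer and cluster are simplified stand-ins for config sections.
type apiServer struct {
	AuditPolicy map[string]any
}

type cluster struct {
	Name      string
	APIServer apiServer
}

// mergeReplace applies "replace" semantics with a zero-value check:
// a non-empty patch value replaces the base value wholesale, while an
// empty patch value leaves the base untouched.
func mergeReplace(base, patch map[string]any) map[string]any {
	if len(patch) == 0 {
		return base
	}

	return patch
}

// applyPatch merges patch into base field by field.
func applyPatch(base, patch cluster) cluster {
	if patch.Name != "" {
		base.Name = patch.Name
	}

	base.APIServer.AuditPolicy = mergeReplace(base.APIServer.AuditPolicy, patch.APIServer.AuditPolicy)

	return base
}

func main() {
	base := cluster{
		Name:      "old",
		APIServer: apiServer{AuditPolicy: map[string]any{"kind": "Policy"}},
	}

	// The patch touches the cluster section but does not set the audit policy.
	merged := applyPatch(base, cluster{Name: "new"})

	fmt.Println(merged.Name, merged.APIServer.AuditPolicy)
}
```

Without the `len(patch) == 0` check, the patch in `main` would wipe the audit policy, which is the regression the commit describes.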

Add regression tests.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-18 14:36:44 +04:00
Andrey Smirnov
474eccdc4c
fix: watch buffer overrun for RouteStatus
Fixes #8157

This PR contains two fixes, both related to the same problem.

Several routes for different links but the same IPv6 destination might exist
at the same time, so the route resource ID should handle that. The problem
was that these routes were misreported, causing internal updates for
the same resources multiple times (equal to the number of links).

Don't trigger controllers more often than 10 times/second (with burst of
5) for kernel notifications. This ensures Talos doesn't try to reflect
current state of the network subsystem too often as resources, which
causes excessive CPU usage and might potentially lead to the buffer
overrun under high rate of changes.
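
A minimal token-bucket sketch of a "10 per second with a burst of 5" limit. The commit does not say which limiter is used, so the hand-rolled `tokenBucket` type and the `simulate` helper are assumptions for illustration only.

```go
package main

import (
	"fmt"
	"time"
)

// tokenBucket allows up to `burst` events at once and refills at `rate`
// tokens per second. Illustrative only; not the limiter used by Talos.
type tokenBucket struct {
	rate   float64
	burst  float64
	tokens float64
	last   time.Time
}

func newTokenBucket(rate float64, burst int, now time.Time) *tokenBucket {
	return &tokenBucket{rate: rate, burst: float64(burst), tokens: float64(burst), last: now}
}

// allow reports whether an event arriving at `now` may pass.
func (b *tokenBucket) allow(now time.Time) bool {
	if elapsed := now.Sub(b.last).Seconds(); elapsed > 0 {
		b.tokens += elapsed * b.rate
		if b.tokens > b.burst {
			b.tokens = b.burst
		}

		b.last = now
	}

	if b.tokens < 1 {
		return false
	}

	b.tokens--

	return true
}

// simulate sends `events` notifications at one instant and the same number
// again gapMillis later, returning how many passed each time.
func simulate(events, gapMillis int) (first, second int) {
	start := time.Unix(0, 0)
	b := newTokenBucket(10, 5, start)

	for i := 0; i < events; i++ {
		if b.allow(start) {
			first++
		}
	}

	later := start.Add(time.Duration(gapMillis) * time.Millisecond)

	for i := 0; i < events; i++ {
		if b.allow(later) {
			second++
		}
	}

	return first, second
}

func main() {
	first, second := simulate(20, 200)
	fmt.Printf("passed immediately: %d, passed after 200ms: %d\n", first, second)
}
```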

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-17 19:28:25 +04:00
Utku Ozdemir
cc06b5d7a6
fix: fix .der output in talosctl gen secureboot
PEM was converted to DER incorrectly when the output was an X.509 certificate and not a public key.

Skip unnecessary parsing of it to an RSA public key before writing it in DER format as output.

Simplify the code as we do not generate `*-signing-public-key.pem` anymore.
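
A small sketch of a PEM to DER conversion that skips key parsing: the payload of a PEM block is already DER, so decoding the block is enough. The `pemToDER` and `roundTrip` helpers are hypothetical and are not the code from this commit.

```go
package main

import (
	"encoding/pem"
	"errors"
	"fmt"
)

// pemToDER returns the raw DER bytes of the first PEM block in data,
// without interpreting them as a certificate or a key.
func pemToDER(data []byte) ([]byte, error) {
	block, _ := pem.Decode(data)
	if block == nil {
		return nil, errors.New("no PEM block found")
	}

	return block.Bytes, nil
}

// roundTrip wraps der in a CERTIFICATE PEM block and converts it back.
func roundTrip(der []byte) ([]byte, error) {
	return pemToDER(pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der}))
}

func main() {
	// Placeholder bytes standing in for a real DER-encoded certificate.
	der := []byte{0x30, 0x03, 0x02, 0x01, 0x01}

	out, err := roundTrip(der)
	if err != nil {
		fmt.Println("error:", err)
		return
	}

	fmt.Printf("%x\n", out)
}
```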

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2024-01-17 14:02:03 +01:00
Andrey Smirnov
1dbb4abf43
fix: update discovery service client to v0.1.6
This pulls in gRPC keepalive fix.

See https://github.com/siderolabs/discovery-client/pull/8

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-17 14:42:01 +04:00
Andrey Smirnov
9782319c31
fix: support KubePrism settings in Kubernetes Discovery
Fixes #8143

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-16 20:41:13 +04:00
Utku Ozdemir
6c5a0c2811
feat: generate a single JSON schema for multidoc config
Rework docgen to scan a whole directory for multidoc config types recursively and generate a single schema for all of them.

Annotate the files which need to be scanned by docgen while generating a schema by `//docgen:jsonschema`.

Move and rename the schema.

Bring back schema tests.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2024-01-16 12:25:15 +01:00
Dmitriy Matrenichev
f70b47dddc
fix: force KubePrism to connect using IPv4
Before this change KubePrism used hardcoded "localhost" as destination which Go could resolve to IPv6 destination and
then fail to connect to. This change forces KubePrism to connect using IPv4 and uses hardcoded "127.0.0.1" destination so
it will always use IPv4.
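
A sketch of the idea, assuming the standard `net` package: build the endpoint from the literal `127.0.0.1` and dial with the `tcp4` network so that no name resolution can yield an IPv6 address. The helper names and the port in `main` are illustrative, not taken from the KubePrism code.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// loopbackEndpoint builds an IPv4-only loopback address for the given port.
func loopbackEndpoint(port int) string {
	return net.JoinHostPort("127.0.0.1", strconv.Itoa(port))
}

// dialIPv4 connects over the "tcp4" network, which restricts Go to IPv4.
func dialIPv4(addr string) (net.Conn, error) {
	return net.Dial("tcp4", addr)
}

// isIPv4Endpoint reports whether the host part is a literal IPv4 address,
// that is, one that needs no resolution at all.
func isIPv4Endpoint(endpoint string) bool {
	host, _, err := net.SplitHostPort(endpoint)
	if err != nil {
		return false
	}

	ip := net.ParseIP(host)

	return ip != nil && ip.To4() != nil
}

func main() {
	// Example port only.
	endpoint := loopbackEndpoint(7445)

	fmt.Println(endpoint, isIPv4Endpoint(endpoint))
	fmt.Println("localhost:7445", isIPv4Endpoint("localhost:7445"))
}
```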

For #8112

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-01-15 21:25:05 +03:00
Andrey Smirnov
d5321e085e
fix: update kmsg with utf-8 fix
See: https://github.com/siderolabs/go-kmsg/pull/9

This fixes lots of `\xab` issues, specifically in:

* `talosctl dmesg` output
* `talosctl dashboard`
* embedded dashboard, including OAuth2 QR code display

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-12 18:33:43 +04:00
Utku Ozdemir
7fa7362ddc
fix: fix nodes on dashboard footer when node names are used in --nodes
When the dashboard is used via the CLI through a proxy, e.g., through Omni, node names or IDs can be used in the `--nodes` flag instead of the IPs.

This caused rendering inconsistencies in the dashboard, as some parts of it used the IPs and some used the names passed in the context.

Fix this by collecting all node IPs on dashboard start, and map these IPs to the respective nodes passed as the `--nodes` flag.

On the dashboard footer, we always display the node names as they are passed in the `--nodes` flag.

As part of it, remove the node list change reactivity from the dashboard, so it will always take the passed nodes as the truth.

The IP to node mapping collection at dashboard startup also solves another issue where the first API call by the dashboard triggered the interactive API authentication (e.g., the OIDC flow). Previously, because the terminal was already switched to the raw mode, it was not possible to authenticate properly.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2024-01-12 12:00:08 +01:00
Utku Ozdemir
ba88678f1a
fix: merge ports and ingress configs correctly in NetworkRuleConfig
Use `replace` patch merging strategy for `portSelector.ports` and `ingress`es in `NetworkRuleConfig` document, so that they do not have duplicate entries and/or fail on port range validation.

Closes siderolabs/talos#8136.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2024-01-11 16:09:47 +01:00
Jonomir
dea9bda2d0
fix: disk UUID & WWID always empty in talosctl disks
Add missing attributes to conversion of go-blockdevice disk
to protobuf disk.

Signed-off-by: Jonomir <68125495+Jonomir@users.noreply.github.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-11 14:37:39 +04:00
Andrey Smirnov
8dc112f36b
chore: pull in NBD modules
See https://github.com/siderolabs/pkgs/pull/862

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-29 20:08:49 +04:00
Serge Logvinov
f6926faab5
fix: default priority for ipv6
We will use the default IPv6 gateway priority as 2048.
The RA default is 1024, which leads to verbose messages such as 'error adding route: netlink receive: file exists.'

Azure uses DHCPv6 and RA for configuring IPv6 on the node.
The platform sets the default gateway as a fallback in case 'accept_ra' is not set to 2.

Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-29 18:42:23 +04:00
Andrey Smirnov
e8758dcbad
chore: support http downloads for assets in talosctl cluster create
This allows passing direct URLs to Image Factory assets for disk
image/ISO/vmlinuz/initramfs, so that we can test Image Factory with
Talos.

Also add an integration test for Image Factory.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-25 18:58:25 +04:00
Andrey Smirnov
265f21be09
fix: replace the filemap implementation to not buffer in memory
This filemap is used to generate installer image layer with artifacts.

The previous naive implementation buffered everything in memory, which led
to excessive memory usage.

See https://github.com/siderolabs/image-factory/issues/77

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-22 19:07:46 +04:00
Andrey Smirnov
8db3c5b3c6
fix: correctly pick base installer image layers
Only Talos 1.5+ provides a properly optimized image,
Talos 1.4 provided a single-layer image (which worked in this case),
while Talos 1.2-1.3 have multi-layered images which can't be replaced
easily.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-22 17:09:05 +04:00
Andrey Smirnov
0a30ef7845
fix: imager should support different Talos versions
Add some quirks to make images generated with newer Talos compatible
with images generated by older Talos.

Specifically, reset options were added in Talos 1.4, so we shouldn't
add them for older versions.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-22 16:13:34 +04:00
Andrey Smirnov
d6342cda53
docs: update latest version to v1.6.1
Also port a fix from #8103

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-22 14:42:03 +04:00
Andrey Smirnov
e6e422b92a
chore: bump dependencies
Go modules, tools, etc.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-21 19:01:16 +04:00
Andrey Smirnov
5a19d078ad
fix: properly overwrite files on install
Without truncate the file was not overwritten properly if the file with
the same name already exists and has smaller size.
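
A sketch of the failure mode using `os.OpenFile`, under the assumption that the new content is shorter than what is already on disk. The `overwrite` and `overwriteDemo` helpers are hypothetical and are not the installer code.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// overwrite writes content to path, optionally truncating the file first.
func overwrite(path, content string, truncate bool) error {
	flags := os.O_WRONLY | os.O_CREATE
	if truncate {
		flags |= os.O_TRUNC
	}

	f, err := os.OpenFile(path, flags, 0o644)
	if err != nil {
		return err
	}

	if _, err := f.WriteString(content); err != nil {
		f.Close()
		return err
	}

	return f.Close()
}

// overwriteDemo writes a long file, overwrites it with shorter content,
// and returns what ends up on disk.
func overwriteDemo(truncate bool) (string, error) {
	dir, err := os.MkdirTemp("", "overwrite-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "file")

	if err := overwrite(path, "old longer content", true); err != nil {
		return "", err
	}

	if err := overwrite(path, "new", truncate); err != nil {
		return "", err
	}

	data, err := os.ReadFile(path)

	return string(data), err
}

func main() {
	for _, truncate := range []bool{false, true} {
		got, err := overwriteDemo(truncate)
		if err != nil {
			fmt.Println("error:", err)
			return
		}

		fmt.Printf("truncate=%v: %q\n", truncate, got)
	}
}
```

Without `os.O_TRUNC`, the tail of the old content stays behind the newly written bytes.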

Fixes #8097

Also add a 10 second timeout on UEFI ISO boot, so that boot menu can be
seen without pressing `Esc` many times.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-20 19:41:30 +04:00
Tim Jones
9eb6cea789
docs: secureboot sd-boot menu clarification
Add note to try spamming Esc to bring up the sd-boot menu option if keys
don't automatically enroll in UEFI firmware.

Signed-off-by: Tim Jones <tim.jones@siderolabs.com>
2023-12-19 18:19:31 +01:00
Andrey Smirnov
01f0cbe61c
feat: support iPXE direct booting in talosctl cluster create
This embeds a tiny TFTP server which serves UEFI iPXE which embeds a
script that chainloads a given iPXE script.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-19 17:56:08 +04:00
Andrey Smirnov
3ba84701d9
feat: pull in kernel modules for mlx Infiniband and VFIO
See:

* https://github.com/siderolabs/pkgs/pull/854
* https://github.com/siderolabs/pkgs/pull/855

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-19 13:55:42 +04:00
Andrey Smirnov
ba993e0edd
docs: announce that SecureBoot is available
Restructure the docs a bit to start with the easiest option (via Image
Factory).

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-18 20:43:08 +04:00
Andrey Smirnov
241bc9312e
fix: update the way secureboot signer fetches certificate (azure)
The previous code was a mistake; the public part of the certificate is
more easily available.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-18 17:54:51 +04:00
Dmitriy Matrenichev
59b62398f6
chore: modernize machined/pkg/controllers/k8s
This is going to be a multipart effort to finally use safe.* wrappers in the production code.
A quick regexp search shows that there are around 150 direct type assertions on resources (excluding the ones in this commit).

Also, migrate from `interface{}` to `any` and use `slices.Sort*` instead of `sort.*` where possible.
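
A small sketch of the style of change described above, assuming Go 1.21 or newer. The `typedGet` helper is a hypothetical generic wrapper that replaces a direct type assertion; it is not the `safe.*` API itself, and the `endpoint` type is made up.

```go
package main

import (
	"cmp"
	"fmt"
	"slices"
)

type endpoint struct {
	Name string
	Port int
}

// typedGet turns an unchecked type assertion into an error-returning call.
// Hypothetical helper for illustration; not part of the safe package.
func typedGet[T any](v any) (T, error) {
	t, ok := v.(T)
	if !ok {
		var zero T

		return zero, fmt.Errorf("unexpected type %T", v)
	}

	return t, nil
}

// sortByName uses slices.SortFunc where sort.Slice would have been used before.
func sortByName(eps []endpoint) {
	slices.SortFunc(eps, func(a, b endpoint) int {
		return cmp.Compare(a.Name, b.Name)
	})
}

func main() {
	var raw any = endpoint{Name: "b", Port: 2}

	ep, err := typedGet[endpoint](raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}

	eps := []endpoint{ep, {Name: "a", Port: 1}}
	sortByName(eps)

	fmt.Println(eps)
}
```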

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-12-15 19:33:06 +03:00
Andrey Smirnov
760f793d55
fix: use correct prefix when installing SBC files
When creating an image under non-default mount prefix, it should be
used explicitly when copying SBC files.

See https://github.com/siderolabs/image-factory/issues/65

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-15 19:46:10 +04:00
Noel Georgi
0b94550c42
chore: fix the gvisor test
The gvisor test was not using the correct runtimeclass and would have
always passed regardless.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-12-15 20:48:44 +05:30
Andrey Smirnov
3a787c1d67
docs: update 1.6 docs with Noel's feedback
I merged the docs PR before receiving those updates.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-15 18:48:17 +04:00
Andrey Smirnov
d803e40ef2
docs: provide documentation for Talos 1.6
Updated lots of documentation with new/updated flows.

Provide What's New for Talos 1.6.0.

Update Troubleshooting guide to cover more steps.

Make Talos 1.6 docs the default.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-15 16:36:57 +04:00