4565 Commits

Author SHA1 Message Date
Andrey Smirnov
013e130702
fix: error with decoding config document with wrong apiVersion
Fixes #8270

The root cause was that the registry returned a `nil` `ConfigDocument` if
the version was not registered for a kind, which resulted in weird
config decoding errors.

Add more unit tests and, while at it, more fuzzing samples.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-02-08 18:39:21 +04:00
Louis SCHNEIDER
1e77bb1c3d
chore: allow custom pkgs to build talos
Allow overriding each package reference.

Signed-off-by: Louis SCHNEIDER <louis.schneider@bedrockstreaming.com>
Signed-off-by: Louis SCHNEIDER <louis@schne.id>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-02-08 17:07:31 +04:00
Andrey Smirnov
3f8a85f1b3
fix: unlock the upgrade mutex properly
Fixes #4525

The previous implementation had several issues:

* the etcd concurrency session was never closed
* `Unlock()` was called with a potentially closed context
* unlocking happened when the upgrade sequence finished, but this overlaps
  with the machine reboot, so there was a chance that it never got unlocked
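The first two issues can be illustrated with a small Go sketch. `Session` and `Mutex` below are hypothetical interfaces loosely modeled on an etcd concurrency session and mutex, not the etcd client API, and the 10-second unlock timeout is an assumption. The session is always closed, and `Unlock()` gets its own short-lived context so a canceled caller context cannot block the release:

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Session and Mutex are hypothetical interfaces modeled loosely on an etcd
// concurrency session and mutex; they are not the etcd client API.
type Session interface {
	Close() error
}

type Mutex interface {
	Lock(ctx context.Context) error
	Unlock(ctx context.Context) error
}

// withUpgradeLock runs fn while holding mu. The session is always closed, and
// Unlock gets its own short-lived context, so cancellation of ctx cannot
// prevent the unlock.
func withUpgradeLock(ctx context.Context, sess Session, mu Mutex, fn func(context.Context) error) (err error) {
	defer func() {
		if closeErr := sess.Close(); closeErr != nil && err == nil {
			err = closeErr
		}
	}()

	if err = mu.Lock(ctx); err != nil {
		return fmt.Errorf("failed to acquire upgrade lock: %w", err)
	}

	// Deferred calls run in reverse order: unlock first, then close the session.
	defer func() {
		unlockCtx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
		defer cancel()

		if unlockErr := mu.Unlock(unlockCtx); unlockErr != nil && err == nil {
			err = fmt.Errorf("failed to release upgrade lock: %w", unlockErr)
		}
	}()

	return fn(ctx)
}

// fakeSession and fakeMutex are in-memory test doubles.
type fakeSession struct{ closed bool }

func (s *fakeSession) Close() error { s.closed = true; return nil }

type fakeMutex struct{ locked bool }

func (m *fakeMutex) Lock(ctx context.Context) error {
	if err := ctx.Err(); err != nil {
		return err
	}
	m.locked = true
	return nil
}

func (m *fakeMutex) Unlock(ctx context.Context) error {
	if err := ctx.Err(); err != nil {
		return err // an unlock with a canceled context would fail here
	}
	m.locked = false
	return nil
}

// runDemo cancels the caller's context in the middle of the "upgrade".
func runDemo() (locked, closed bool, err error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	sess, mu := &fakeSession{}, &fakeMutex{}

	err = withUpgradeLock(ctx, sess, mu, func(context.Context) error {
		cancel() // simulate shutdown while the lock is held
		return nil
	})

	return mu.locked, sess.closed, err
}

func main() {
	locked, closed, err := runDemo()
	fmt.Println("locked:", locked, "session closed:", closed, "err:", err)
}
```

The third issue (unlocking racing with the reboot) is about *when* the unlock happens in the upgrade sequence, which this sketch does not model.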

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-02-08 15:50:02 +04:00
AvnarJakob
61c3331b14
docs: update indentation in vip.md
Fix wrong YAML indentation.

Signed-off-by: AvnarJakob <75129695+AvnarJakob@users.noreply.github.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-02-08 15:16:40 +04:00
Andrey Smirnov
383e528df8
chore: allow uuid-based hostnames in talosctl cluster create
This is useful when the VMs are booted without machine config,
so default hostnames based on controlplanes/workers no longer make
sense.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-02-07 16:22:53 +04:00
Noel Georgi
1e6c8c4dec
feat: extensions services config
Support config files for extension services.

Fixes: #7791

Co-authored-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-02-06 17:12:01 +05:30
shurkys
989ca3ade1
feat: add OpenNebula platform support
Initial support without documentation.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Signed-off-by: shurkys <no@mail.com>
2024-02-05 20:43:47 +04:00
bri
914f887788
docs: update nocloud.md Proxmox information
Proxmox _does_ support manually editing the configuration files, but a safer option is to use the CLI or API for the sake of option validation.

This PR updates the documentation that suggested reading and editing the VM configuration by hand, and replaces that with CLI commands to do the same. The `qm` command needs to be run from a root shell, but you need to be `root` to edit (or even read!) the configuration via something like SFTP, anyway.

I also updated the UUID to be a real UUID, and then tested these commands on my home Proxmox server.

Signed-off-by: bri <284789+b-@users.noreply.github.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-02-05 20:05:09 +04:00
Henno Schooljan
a04cc80154
fix: pass TTL when generating client certificate
Pass the TTL to the talosconfig generation function.

Signed-off-by: Henno Schooljan <github@sfynx.nl>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-02-05 18:54:16 +04:00
Dmitriy Matrenichev
3fe8c12ca6
fix: add log line about controller runtime failing
While we decide what to do with #8263 and #8256, this quick fix at least allows us to
see what went wrong.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-02-05 17:22:02 +03:00
Andrey Smirnov
ddbabc7e58
fix: use a separate cgroup for each extension service
Fixes #8229

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-02-05 17:37:55 +04:00
Andrey Smirnov
6ccdd2c09c
chore: fix markdown-lint call
Don't ask me why the flags use this weird syntax.

Don't ask me why it fails with exit code zero (success) on invalid
flags.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-02-05 17:18:45 +04:00
Saiyam Pathak
4184e617ab
chore: add test for wasmedge runtime extension
Add tests for WasmEdge container runtime system extension.

Signed-off-by: Saiyam Pathak <saiyam911@gmail.com>
Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-02-05 18:18:13 +05:30
Andrey Smirnov
95ea3a6c65
chore: bump timeout in acquire tests
Since switching to an RSA service account key, machine config generation
takes considerably longer, so the test might not make it in time.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-02-05 15:18:22 +04:00
Andrey Smirnov
c19a505d8c
chore: bump docker dind image
We don't need the hacked one anymore.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-02-05 14:43:39 +04:00
fazledyn-or
d7d4154d5d
chore: remove channel blocking in qemu launch
The channel is never read from.

Signed-off-by: fazledyn-or <ataf@openrefactory.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-02-02 18:57:36 +04:00
Andrey Smirnov
029d7f7b9b
release(v1.7.0-alpha.0): prepare release
This is the official v1.7.0-alpha.0 release.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-02-01 22:10:27 +04:00
Andrey Smirnov
2ff81c06bc
feat: update runc 1.1.12, containerd 1.7.13
Also:

* Linux 6.6.14 + XDP enablement
* etcd 3.5.12

Various other bumps for the tools, utilities, and Go modules.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-02-01 17:01:04 +04:00
Andrey Smirnov
9d8cd4d058
chore: drop deprecated method EtcdRemoveMember
It was deprecated 16 months ago, time to clean it up.

(This is to prepare for the first v1.7 release)

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-02-01 15:54:29 +04:00
Andrey Smirnov
17567f19be
fix: take into account the moment seen when cleaning up CRI images
Fixes #8069

The image age from the CRI is the moment the image was pulled, so if it
was pulled a long time ago, the previous version would nuke the image as
soon as it became unreferenced. The new version allows the image to
stay for the full grace period in case a rollback is requested.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-02-01 14:44:22 +04:00
Andrey Smirnov
aa03204b86
docs: document the process of building custom kernel packages
Fixes #7612

Drop the customizing rootfs docs, and point towards system extensions
documentation, as it is the right way.

Document building custom Talos Linux kernel.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-02-01 14:24:31 +04:00
Andrey Smirnov
7af48bd559
feat: use RSA key for kube-apiserver service account key
Fixes #8111

Starting with 1.7, use RSA instead of ECDSA.

RSA is way slower, but it is better supported by other providers.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-31 23:05:50 +04:00
Andrey Smirnov
a5e13c696d
fix: retry blockdevice open in the installer
We had these retries in other places, but not here.

This seems to happen more frequently since the Linux 6.6 update; the tl;dr is
the same: `udevd` tries to rescan the partition table at the wrong moment,
preventing the Talos installer from opening the partition which was just created.

It's a race, so work around it by retrying the call.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-31 22:17:20 +04:00
Andrey Smirnov
593afeea38
fix: run the interactive installer loop to report errors
In the previous implementation, even though `installer.err` was set, it
was never checked 🤦.

The run loop was stolen from the dashboard code.

Fixes #8205

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-31 19:20:46 +04:00
Andrey Smirnov
87be76b878
fix: be more tolerant to error handling in Mounts API
Fixes #8202

If some mountpoint can't be queried successfully for 'diskfree'
information, don't treat that as an error, and report zero values for
disk usage/size instead.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-31 18:24:38 +04:00
stereobutter
03add75030
docs: add section on using imager with extensions from tarball
Add an example of using a custom extension via tarball.

Signed-off-by: stereobutter <sascha.desch@hotmail.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-30 15:56:59 +04:00
Steve Francis
ee0fb5effc
docs: consolidate certificate management articles
Move around some docs.

Signed-off-by: Steve Francis <steve.francis@talos-systems.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-30 15:22:04 +04:00
Dmitriy Matrenichev
9c14dea209
chore: bump coredns
Bump our CoreDNS fork.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-01-30 02:12:36 +03:00
Dmitriy Matrenichev
ebeef28525
feat: implement local caching dns server
This PR adds a new controller, `DNSServerController`, that starts TCP and UDP DNS servers locally. Just like `EtcFileController`, it monitors `ResolverStatusType` and updates the list of destinations from there.

Most of the caching logic is in our "lobotomized" `CoreDNS` fork. We need this fork because the default `CoreDNS` carries
the full Caddy server and various other modules that we don't need in Talos. On our side, we implement
random selection of the actual DNS server and request forwarding.

Closes #7693

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-01-29 20:26:38 +03:00
edwinavalos
4a3691a273
docs: fix broken links in metal-network-configuration.md
Fixed the same set of links in 1.4, 1.5, 1.6, and 1.7, with the exception
of a link in 1.4 that points to boot assets: if we were to place a copy of
the boot assets page in that version, it would be missing a bunch of
supporting links. Opted to skip that update, as that documentation is
unsupported.

Signed-off-by: edwinavalos <edwin.a.avalos@gmail.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-29 18:44:21 +04:00
Spencer Smith
c4ed189a69
docs: provide sane defaults for each release series in vmware script
This PR sets proper defaults based on the Talos release series, defaulting to the latest release in each series.

Signed-off-by: Spencer Smith <spencer.smith@talos-systems.com>
2024-01-29 09:25:04 -05:00
Andrey Smirnov
8138d54c6c
docs: clarify node taints/labels for worker nodes
`NodeRestriction` admission plugin heavily restricts what worker nodes
can set.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-29 17:56:46 +04:00
Andrey Smirnov
b44551ccdb
feat: update Linux to 6.6.13
See https://github.com/siderolabs/pkgs/pull/873

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-29 16:50:33 +04:00
Christian Mohn
385707c5f3
docs: update vmware.sh
Add support for using the GOVC_NETWORK environment variable to determine which vSphere vSwitch PortGroup to use.

The script checks whether the GOVC_NETWORK environment variable is set; if it is, that value is used. If not, it continues with the default PortGroup (VM Network) as before.

The check is added for both control plane and worker nodes.

Signed-off-by: Christian Mohn <christian@drible.net>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-29 14:55:21 +04:00
Spencer Smith
d1a79b845f
docs: fix small typo in etcd maintenance guide
This PR fixes a small typo in these docs, because `etcd` is under the `cluster`
key.

Signed-off-by: Spencer Smith <spencer.smith@talos-systems.com>
2024-01-29 14:22:04 +04:00
Utku Ozdemir
cf0603330a
docs: copy generated JSON schema to host
After the JSON schema is generated in a build container, copy it over to the host, so it becomes a part of the codebase.

This is required because the location of the schema recently changed from `pkg/machinery/config/types/` to `pkg/machinery/config/schemas/`.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2024-01-26 13:56:55 +01:00
Andrey Smirnov
f11139c229
docs: document local path provisioner install
Use kustomize (the officially supported way to install Local Path
Provisioner).

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-26 14:30:45 +04:00
Andrey Smirnov
e0dfbb8fba
fix: allow META encoded values to be compressed
Fixes #8186

This is planned to be backported to Talos 1.6.3.

This allows passing large META values (YAML for platform network
configuration) which might otherwise exceed the kernel command line
parameter length limit.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-23 17:24:18 +04:00
Andrey Smirnov
d677901b67
feat: implement device selector for 'physical'
Closes #8090

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-23 15:05:51 +04:00
ExtraClock
7d11172896
docs: add missing talosconfig flag
Add the missing `--talosconfig` flag to the step that sets up the vmtoolsd secret.

Signed-off-by: ExtraClock <35864862+ExtraClock@users.noreply.github.com>
Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-01-23 12:39:41 +05:30
Andrey Smirnov
8a1732bcb1
fix: pull in mptspi driver
See https://github.com/siderolabs/pkgs/pull/871

This should fix issues with VMware SCSI disk virtualization.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-22 19:52:15 +04:00
Andrey Smirnov
c1e45071f0
refactor: use etcd configuration from the EtcdSpec resource
This is currently a no-op; I just noticed it while looking into another
bug. This should make the intention clearer.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-22 16:06:16 +04:00
Andrey Smirnov
4e9b688d3f
fix: use correct TTL for talosconfig in talosctl config new
See #8152

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-22 15:39:41 +04:00
Andrey Smirnov
fb5ad05551
feat: update Kubernetes default to 1.29.1
See https://github.com/kubernetes/kubernetes/releases/v1.29.1

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-18 20:20:29 +04:00
Andrey Smirnov
fe24139f3c
docs: fork docs for v1.7
Time to start the v1.7 development cycle!

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-18 19:17:42 +04:00
Andrey Smirnov
1c2d10cccc
chore: bump dependencies
Go 1.21.6, update pkgs, tools, Go modules, etc.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-18 18:01:05 +04:00
Anthony ARNAUD
a599e38674
chore: allow custom registry to build installer/imager
Use a custom pkgs repository by setting `PKGS_PREFIX` as an argument.

Signed-off-by: Anthony ARNAUD <github@anthony-arnaud.fr>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-18 17:32:52 +04:00
Steve Francis
3911ddf7bd
docs: add how-to for cert management
Explain certificate auto-rotation.

Signed-off-by: Steve Francis <steve.francis@talos-systems.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-18 16:57:42 +04:00
Andrey Smirnov
b0ee0bfba3
fix: strategic patch merging for audit policy
The audit policy is marked as `merge: replace`, but there was no check for
a zero value. So any patch which has a `cluster:` section zeroed out the
previously set `cluster.apiServer.auditPolicy`.
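A simplified Go sketch of the zero-value check. `AuditPolicy`, `APIServerConfig` and `mergeAPIServer` are hypothetical stand-ins, not the Talos config types or merge implementation; they only show why a `replace` field must be skipped when the patch leaves it unset:

```go
package main

import "fmt"

// AuditPolicy and APIServerConfig are hypothetical, simplified stand-ins for
// the machine config types.
type AuditPolicy map[string]any

type APIServerConfig struct {
	Image       string
	AuditPolicy AuditPolicy // has "merge: replace" semantics
}

// mergeAPIServer applies patch on top of base. A "replace" field is swapped
// wholesale, but only when the patch actually sets it: without the zero-value
// check, any patch touching the section would wipe out the existing policy.
func mergeAPIServer(base, patch APIServerConfig) APIServerConfig {
	out := base

	if patch.Image != "" {
		out.Image = patch.Image
	}

	if patch.AuditPolicy != nil { // zero-value check
		out.AuditPolicy = patch.AuditPolicy
	}

	return out
}

func main() {
	base := APIServerConfig{
		Image:       "registry.example/kube-apiserver:v1",
		AuditPolicy: AuditPolicy{"apiVersion": "audit.k8s.io/v1", "kind": "Policy"},
	}

	// A patch that only changes the image must keep the audit policy.
	merged := mergeAPIServer(base, APIServerConfig{Image: "registry.example/kube-apiserver:v2"})
	fmt.Println(merged.Image, merged.AuditPolicy["kind"])
}
```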

Add regression tests.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-18 14:36:44 +04:00
Andrey Smirnov
474eccdc4c
fix: watch buffer overrun for RouteStatus
Fixes #8157

This PR contains two fixes, both related to the same problem.

Several routes for different links but the same IPv6 destination might exist
at the same time, so the route resource ID should account for that. The problem
was that these routes were mis-reported, causing internal updates of the
same resources multiple times (equal to the number of links).

Don't trigger controllers more often than 10 times/second (with a burst of
5) for kernel notifications. This ensures Talos doesn't try to reflect the
current state of the network subsystem as resources too often, which
causes excessive CPU usage and might potentially lead to a buffer
overrun under a high rate of changes.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-01-17 19:28:25 +04:00