Add output flag for `talosctl config info`.
This makes it possible to programmatically gather endpoints for CI tests.
E.g.:
```bash
_out/talosctl-linux-amd64 config info --output json | jq '.Contexts[].Endpoints[0]'
```
Signed-off-by: Noel Georgi <git@frezbo.dev>
Fixes #7679
This should be a no-op if the link name is <= 10 chars, but with
predictable interface names based on MAC addresses, the names have to be
shortened to leave room for the VLAN ID.
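To illustrate the length budget: Linux limits interface names to 15 characters, and a `.<vlan-id>` suffix needs at most 5 of them (VLAN IDs go up to 4094), which leaves 10 for the parent link. A minimal sketch, assuming plain truncation (the commit doesn't say how names are shortened; `vlanLinkName` is a hypothetical helper):

```go
package main

import "fmt"

// Linux interface names are limited to 15 characters (IFNAMSIZ is 16,
// including the trailing NUL byte).
const maxLinkNameLen = 15

// maxParentLen leaves room for a ".<vlan-id>" suffix: VLAN IDs go up to
// 4094, so the suffix is at most 5 characters long.
const maxParentLen = 10

// vlanLinkName is a hypothetical helper that builds a VLAN link name from
// the parent link name and the VLAN ID, truncating the parent name so the
// result always fits into maxLinkNameLen characters.
func vlanLinkName(parent string, vlanID uint16) string {
	if len(parent) > maxParentLen {
		parent = parent[:maxParentLen]
	}

	return fmt.Sprintf("%s.%d", parent, vlanID)
}

func main() {
	for _, parent := range []string{"eth0", "enx0123456789ab"} {
		name := vlanLinkName(parent, 4094)
		fmt.Printf("%s -> %s (%d chars, limit %d)\n", parent, name, len(name), maxLinkNameLen)
	}
}
```

Plain truncation can make two long parent names collide, so a real implementation may prefer a different shortening scheme.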
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Fixes #7698
Also fix `talosctl config info` for `talosconfig` without a client
certificate (e.g. Omni-generated one).
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
The default timeouts are very aggressive, and we should use explicit
timeouts so that health checks don't run that often.
Fixes #7690
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
This is not a problem in general, but using the same mount point is a
problem when running multiple image generation procedures.
This is a no-op if `MountPrefix` is not set (when installing/upgrading
vs. creating an image).
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
This fixes a problem in the `RouteSpecController` which is due to a
subtle (but correct) change in behavior in the `stdlib`.
Also includes some small (but presumably safe) bumps.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Example: host has address `10.0.0.1/8`, while Kubernetes pod CIDR is
`10.244.0.0/16`. These two subnets overlap, but the address `10.0.0.1`
isn't contained in the `10.244.0.0/16` subnet.
This change fixes the check to make sure the address itself is not contained
in the filter, rather than checking whether the address subnet overlaps with the filter.
NB: it is still a bad idea to have the host network subnet overlap with
Kubernetes pod/service CIDRs.
Also refactor the unit tests to use new (better) ways to do assertions.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Move drone extensions integration to a function. This allows us to
re-use the code and just depend on a single step rather than explicitly
defining all dependencies.
Signed-off-by: Noel Georgi <git@frezbo.dev>
This is required for https://github.com/siderolabs/sidero/pull/1070, as
we need to allow DHCP traffic from Sidero controller running in a VM
through the bridge to other VMs.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Processes and their info are not guaranteed to be present on the api-based data gathered by the dashboard. Therefore, we switch to using nil-safe access to the CPU time when rendering the process table.
Closes siderolabs/talos#7645.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Fixes #7615
This extends the previous handling when Talos did `ToLower()` on the
hostname to do the full filtering as expected.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
This is a follow-up fix for #7640
I noticed that the image cleanup controller cleans up images if they are
specified with both tag and digest.
The problem was that image references in the expected set of images were
built incorrectly, so the images were incorrectly marked as unused.
Refactor the code to make the core part testable.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Fixes #7636
This supports `List`-type manifests by unwrapping them into individual
objects.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
First of all, it seems to be the "right way", as it makes sure the image is
looked up by the digest.
Second, it fixes the case when an image is specified with both tag and
digest (which is not supposed to be a correct ref, but it is used
frequently).
Talos since 1.5.0 stores images with the following aliases:
```
gcr.io/etcd-development/etcd:v3.5.9
gcr.io/etcd-development/etcd@sha256:8c956d9b0d39745fa574bb4dbacd362ffdc1109479432f54094859d4cf984b17
ghcr.io/siderolabs/kubelet:v1.28.0
ghcr.io/siderolabs/kubelet@sha256:50710f2cd3328c23f57dfc7fb00940d8cfd402315e33fc7cb8184fc660650a5c
sha256:50710f2cd3328c23f57dfc7fb00940d8cfd402315e33fc7cb8184fc660650a5c
sha256:8c956d9b0d39745fa574bb4dbacd362ffdc1109479432f54094859d4cf984b17
```
This change pulls the digest format (the last in this list) and uses it
to start a container.
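A simplified sketch of picking the digest out of a reference that carries both a tag and a digest, using plain string splitting (`digestRef` is hypothetical; a robust implementation would use a proper image reference parser):

```go
package main

import (
	"fmt"
	"strings"
)

// digestRef returns the "sha256:..." digest of an image reference that
// may carry a tag, a digest, or both; ok is false if there is no digest.
func digestRef(ref string) (digest string, ok bool) {
	_, digest, ok = strings.Cut(ref, "@")

	return digest, ok
}

func main() {
	refs := []string{
		// tag and digest together (not strictly correct, but common)
		"ghcr.io/siderolabs/kubelet:v1.28.0@sha256:50710f2cd3328c23f57dfc7fb00940d8cfd402315e33fc7cb8184fc660650a5c",
		// tag only: there is no digest to look the image up by
		"gcr.io/etcd-development/etcd:v3.5.9",
	}

	for _, ref := range refs {
		if digest, ok := digestRef(ref); ok {
			fmt.Println(digest)
		} else {
			fmt.Println("no digest in", ref)
		}
	}
}
```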
Fixes #7640
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Short version is: move from global variables/`init()` function into
explicit functions.
`docgen` was updated to skip creating any top-level global variables;
now `Doc` information is generated on the fly when it is accessed.
Talos itself doesn't marshal the configuration often, so in general it
should never be accessed for Talos (but will be accessed e.g. for
`talosctl`).
Machine config examples were changed manually from variables to
functions returning a value and moved to a separate file.
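The pattern can be sketched as follows (the `Doc` type and both functions are hypothetical placeholders, not the generated `docgen` code):

```go
package main

import "fmt"

// Doc is a hypothetical stand-in for generated documentation.
type Doc struct {
	Type        string
	Description string
}

// Before (sketch): documentation lived in a package-level variable filled
// in by init(), so every binary importing the package paid for it at startup:
//
//	var configDoc Doc
//
//	func init() { configDoc = Doc{Type: "Config"} }

// ConfigDoc (after, sketch) keeps no package-level state: the value is
// built only when something actually asks for it.
func ConfigDoc() Doc {
	return Doc{Type: "Config", Description: "placeholder description"}
}

// exampleMachineConfig follows the same idea for examples: a function
// returning a fresh value instead of a shared global variable.
func exampleMachineConfig() map[string]string {
	return map[string]string{"type": "controlplane"}
}

func main() {
	fmt.Println(ConfigDoc().Type)

	ex := exampleMachineConfig()
	ex["type"] = "worker" // mutates only this copy

	fmt.Println(exampleMachineConfig()["type"])
}
```

Returning a fresh value on every call also avoids the value-reuse clobbering described in this commit, since callers can no longer mutate a shared example.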
There are no changes to the output of `talosctl gen config`.
There is a small change to the generated documentation, which I believe
is a correct one, as previously, due to value reuse, it was clobbered with
other data.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Fixes #7592
The problem was a mismatch between the "primary key" (ID) of the
`RouteSpec` and the way routes are looked up in the kernel: with two
identical routes but different priorities, Talos would end up in an infinite
loop fighting to remove and re-add the same route, as the priority never
matches.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Use `Progress` and options to pass around the way messages are written.
Fixed some tiny issues in the code, but otherwise no functional changes.
To make colored output work with `docker run`, switched image
generation back to using a volume mount for output (the old mode still
works, but it's not the default; it works when Docker is not
running on the same host).
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
There are two changes here:
* build `machined` binary with `tcell_minimal` tag (which disables
loading some parts of the terminfo database), which also affects
`apid`, `trustd` and `dashboard` processes, as they run from the same
executable; in `dashboard` explicitly import `linux` terminal we're
using when the `dashboard` runs on the machine
* pass the `TCELL_MINIMIZE=1` environment variable to each Talos process,
which removes a 0.5 MiB `runewidth` allocation for a lookup table
See #7578
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Currently, we use `github.com/coreos/go-semver/semver` and `github.com/hashicorp/go-version`
for version parsing. As we use `github.com/blang/semver/v4` in our other projects, and it
has more features, it makes sense to use it across the projects. It also doesn't allocate
like crazy in `KubernetesVersion.SupportedWith`.
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Fixes #7080
The real bug was an off-by-one error in the `log2i` implementation; other changes are
cleanups as `x/sys/unix` package now contains all the constants we need.
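The commit doesn't show `log2i` itself. Assuming it computes floor(log2 x), a sketch that sidesteps the off-by-one by relying on `math/bits`, next to the equivalent shift loop (both functions are hypothetical):

```go
package main

import (
	"fmt"
	"math/bits"
)

// log2i returns floor(log2(x)) for x > 0 and -1 for x == 0 (an assumed
// convention for this sketch).
func log2i(x uint64) int {
	return bits.Len64(x) - 1
}

// log2iLoop is an equivalent shift loop; starting n at -1 (not 0) is
// exactly the kind of detail an off-by-one bug hides in.
func log2iLoop(x uint64) int {
	n := -1

	for x > 0 {
		x >>= 1
		n++
	}

	return n
}

func main() {
	for _, x := range []uint64{1, 2, 3, 4, 1024, 1025} {
		fmt.Println(x, log2i(x), log2iLoop(x))
	}
}
```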
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This PR updates the ProviderID format for AWS resources. There seems to
be a bug when using Talos CCM (which consumes this value from Talos)
because the format is `aws://x/y` (two slashes) vs. the expected
`aws:///x/y` (three slashes) that is set with the AWS CCM code
[here](d055109367/pkg/providers/v1/instances.go (L47-L53)).
Setting only two slashes causes important software in the workload
cluster to fail, specifically cluster-autoscaler. The regex they use for
pulling providerID is [here](702e9685d6/cluster-autoscaler/cloudprovider/aws/aws_cloud_provider.go (L195)).
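A sketch of the expected format and why a strict consumer rejects the two-slash form (the zone and instance ID are placeholders, and `instanceIDFromProviderID` is a hypothetical parser, not cluster-autoscaler's actual regex):

```go
package main

import (
	"fmt"
	"strings"
)

// awsProviderID builds a providerID in the form the AWS cloud controller
// manager uses: "aws:///<availability-zone>/<instance-id>" (three slashes).
func awsProviderID(zone, instanceID string) string {
	return fmt.Sprintf("aws:///%s/%s", zone, instanceID)
}

// instanceIDFromProviderID extracts the instance ID, accepting only the
// three-slash form.
func instanceIDFromProviderID(providerID string) (string, bool) {
	const prefix = "aws:///"

	if !strings.HasPrefix(providerID, prefix) {
		return "", false
	}

	parts := strings.Split(strings.TrimPrefix(providerID, prefix), "/")
	if len(parts) != 2 || parts[1] == "" {
		return "", false
	}

	return parts[1], true
}

func main() {
	id := awsProviderID("us-east-1a", "i-0123456789abcdef0")
	fmt.Println(id)

	for _, pid := range []string{id, "aws://us-east-1a/i-0123456789abcdef0"} {
		instanceID, ok := instanceIDFromProviderID(pid)
		fmt.Println(pid, "->", instanceID, ok)
	}
}
```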
Signed-off-by: Spencer Smith <spencer.smith@talos-systems.com>
Fixes #7558
I see no reason to keep old behavior (removing all partitions on the
disk), as it's only compatible with Talos itself.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Support full configuration for image generation, including image
outputs, support most features (where applicable) for all image output
types, unify image generation process.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
* support for bonding
* added interface selection by MAC address
* added routes management
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
After `kubelet` is stopped, remove `/var/lib/kubelet/cpu_manager_state` if there are any changes in `cpuManagerPolicy`.
We do not add any other safeguards, so it's the user's responsibility to cordon/drain the node in advance.
Also minor fixes in other files.
Closes #7504
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Add security state resource that describes the state of Talos SecureBoot
and PCR signing key fingerprints.
The UKI fingerprint is currently not populated.
Fixes: #7514
Signed-off-by: Noel Georgi <git@frezbo.dev>
There are no functional changes, but the SDK got updated to handle int ->
int64 changes. The v1 SDK is only supported until Sep 2023.
See https://github.com/hetznercloud/hcloud-go#support
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This new fork seems to be more active. The change itself doesn't fix any
memory allocation, but I submitted a PR for gopacket/gopacket:
https://github.com/gopacket/gopacket/pull/24
Also fix crazy alloc in `tui/components` (this is only relevant for
`talosctl`).
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This module was imported just for a single Go struct (for XML
unmarshalling), and it could be easily internalized.
The module causes significant allocation on startup:
```
init github.com/vmware/govmomi/vim25/types @23 ms, 1.4 ms clock, 1269864 bytes, 196 allocs
```
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Both are much more modular and pull far fewer dependencies into the
Talos tree.
This solves the problem with allocations in AWS endpoints on import, and
removes a bunch of dependencies.
Raw binary size: -10 MiB.
Memory usage (not scientific): around -5 MiB for all Talos services.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
As part of bootloader refactoring `go-blockdevice` was used for wiping
partitions in #7329, but used standard wipe which could be fast/slow
depending on the blockdevice support. Switch to using fast-wipe for
partitions. This should not affect `wipe` option in machineconfig.
Fixes: #7531
Signed-off-by: Noel Georgi <git@frezbo.dev>
- Make dashboard SIGTERM-aware
- Handle panics on dashboard and terminate it gracefully, so it resets the terminal properly
- Switch to TTY2 when it starts and back to TTY1 when it stops.
Closes siderolabs/talos#7516.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
This refactors code to handle partial machine config - only multi-doc
without v1alpha1 config.
This uses improvements from
https://github.com/cosi-project/runtime/pull/300:
* where possible, use `TransformController`
* use integrated tracker to reduce boilerplate
Also fix/rewrite tests where applicable.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes #7453
The goal is to make it possible to load some multi-doc configuration
from the platform source (or persisted in STATE) before machine acquires
full configuration.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes #7487
When `.kubelet.nodeIP` filters yield no match, Talos should not start
the kubelet, as using empty address list results in `--node-ip=` empty
kubelet arg, which makes kubelet pick up "the first" address.
Instead, skip updating (creating) the nodeIP and log an explicit
warning.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This adds basic support for shutdown/poweroff flags.
It can distinguish between halt/shutdown/reboot.
In the case of Talos, halt and shutdown are the same operation.
Signed-off-by: Noel Georgi <git@frezbo.dev>
Use the `machined` socket for `shutdown` and `poweroff` aliases. This
ensures that worker nodes do not have to wait for apid to start.
Signed-off-by: Noel Georgi <git@frezbo.dev>
We can safely do it on `io.Writer` level, since `log.Logger.Output` (called by `Print|Printf`) pretty much promises
that every call to `Write` ends with `\n`.
Closes #7439
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
For RNG seeding and PCR extension, ignore the error if the device is not TPM 2.0
based. Seal/Unseal operations still error out, since they are an
explicitly user-enabled feature.
Signed-off-by: Noel Georgi <git@frezbo.dev>
The previous flow used TPM PCR 11 values to bind the policy, which
meant the TPM could not unseal when the UKI changed. Now it's fixed to use PCR 7
which is bound to the SecureBoot state (SecureBoot status and
Certificates). This provides a full chain of trust bound to SecureBoot
state and signed PCR signature.
Also the code has been refactored to use PolicyCalculator from the TPM
library.
Signed-off-by: Noel Georgi <git@frezbo.dev>
This is an intermediate step to move parts of `ukify` into the main
Talos source tree and call it from the `talosctl` binary.
The next step will be to integrate it into the imager and move `.uki`
build out of the Dockerfile.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
It seems that CRI has a bit of eventual consistency, and it might fail
to remove a stopped pod, claiming that it's still running.
Rewrite the upgrade API call in the upgrade test to actually wait for
the upgrade to be successful, and fail immediately if it's not
successful. This should improve the test stability and it should make
it easier to find issues immediately.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes #6391
Implement a set of APIs and commands to manage images in the CRI, and
pre-pull images on Kubernetes upgrades.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
The Go modules were not tagged for alpha.4, so using alpha.3 tag.
Talos 1.5 will ship with Kubernetes 1.28.0.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes #7430
Introduce a set of resources which look similar to other API
implementations: CA, certs, cert SANs, etc.
Introduce a controller which manages the service based on resource
state.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes #7379
Add the possibility to configure the control plane static pod resources via
APIServer, ControllerManager and Scheduler configs.
Signed-off-by: LukasAuerbeck <17929465+LukasAuerbeck@users.noreply.github.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Talos now supports a new type of encryption key, which relies on sealing/unsealing randomly generated bytes with a KMS server:
```yaml
systemDiskEncryption:
  ephemeral:
    keys:
      - kms:
          endpoint: https://1.2.3.4:443
        slot: 0
```
gRPC API definitions and a simple reference implementation of the KMS server can be found in this
[repository](https://github.com/siderolabs/kms-client/blob/main/cmd/kms-server/main.go).
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Fixes #7228
Add some changes to make Talos accept partial machine configuration
without main v1alpha1 config.
With this change, it's possible to connect a machine that is already running
with a machine configuration (v1alpha1); the following patch connects it
to a local SideroLink endpoint:
```yaml
apiVersion: v1alpha1
kind: SideroLinkConfig
apiUrl: grpc://172.20.0.1:4000/?jointoken=foo
---
apiVersion: v1alpha1
kind: KmsgLogConfig
name: apiSink
url: tcp://[fdae:41e4:649b:9303::1]:4001/
---
apiVersion: v1alpha1
kind: EventSinkConfig
endpoint: "[fdae:41e4:649b:9303::1]:8080"
```
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Previously, if META values were supplied to the Talos ISO via an
environment variable, they were written down and available only after the
install. With this fix, values are also readable and available before
the installation runs (in maintenance mode).
Most of the PR is refactoring `meta.Value(s)` to be a shared library
which is used by the installer/imager and (now) Talos.
Also fixes an issue with a `NotExist` error not being properly returned when
META is not yet available as a partition on disk.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Some tools, like qemu-guest-agent when run as an extension service, call
`/sbin/shutdown` instead of `/sbin/poweroff`. This adds handling for
`/sbin/shutdown` as well.
Ref: https://github.com/siderolabs/extensions/pull/173
Signed-off-by: Noel Georgi <git@frezbo.dev>
Allow specifying the reboot mode during upgrades by introducing `--reboot-mode` flag, similar to the `--mode` flag of the reboot command.
Closes siderolabs/talos#7302.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
This makes handling of `exec` more flexible.
Signed-off-by: Markus Reiter <me@reitermark.us>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This includes sd-boot handling, EFI variables, etc.
There are some TODOs which need to be addressed to make things smooth.
Install to disk, upgrades work.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This commit adds support for an API load balancer. A quick way to enable it is during cluster creation, using the new `api-server-balancer-port` flag (0 by default, i.e. disabled). When enabled, all API requests will be routed across the
cluster control plane endpoints.
Closes #7191
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Move labels out of the bootloader interface, while moving asset copying
into the bootloader interface. GRUB uses one set of assets, and
`sd-boot` will use another one.
Fix the problem with `bootloader.Probe()` finding the boot partition of the
host when it runs in a privileged container, fixing issues with image creation
in the CI.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
`WITH_CONFIG_PATCH_WORKER` check result was overriding any value set in `CONFIG_PATCH_FLAG` variable.
Move it to a separate variable.
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This changes the bootloader code to be generic to support
multiple bootloader implementations.
Signed-off-by: Noel Georgi <git@frezbo.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes #7233
Waiting for node readiness now happens in the `MachineStatus` controller
which won't mark the node as ready until Kubernetes `Node` is ready.
Handling cordoning/uncordoning happens with the help of an additional resource
in `NodeApplyController`.
New controller provides reactive `NodeStatus` resource to see current
status of Kubernetes `Node`.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
See #7233
The controlplane label is simply injected into existing controller-based
node label flow.
For the default control plane `NoSchedule` taint, an additional controller and
resource were implemented to handle node taints.
This also fixes a problem with `allowSchedulingOnControlPlanes` not
being reactive to config changes - now it is.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Because `SetConfig` can be called concurrently with `Config`, there is a risk of a data race. Since `config.Provider` is an interface type, its size is two machine words, so in very unpleasant situations this can lead to arbitrary RCE, because the interface variable can be observed in a partially updated state.
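One way to remove the race is to publish the interface value through an atomic pointer (Go 1.19+ `atomic.Pointer`); a `sync.RWMutex` would work as well. This is a sketch with hypothetical names, not necessarily how Talos fixed it:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Provider is a hypothetical stand-in for config.Provider.
type Provider interface {
	ClusterName() string
}

type staticProvider struct{ name string }

func (p staticProvider) ClusterName() string { return p.name }

// configHolder publishes the current config through an atomic pointer, so a
// reader always sees a fully written interface value (never a torn one).
type configHolder struct {
	current atomic.Pointer[Provider]
}

// SetConfig may be called concurrently with Config.
func (h *configHolder) SetConfig(cfg Provider) {
	h.current.Store(&cfg)
}

// Config returns the last stored config, or nil if none was stored yet.
func (h *configHolder) Config() Provider {
	if p := h.current.Load(); p != nil {
		return *p
	}

	return nil
}

func main() {
	var (
		h  configHolder
		wg sync.WaitGroup
	)

	for i := 0; i < 4; i++ {
		wg.Add(2)

		go func(i int) {
			defer wg.Done()
			h.SetConfig(staticProvider{name: fmt.Sprintf("cluster-%d", i)})
		}(i)

		go func() {
			defer wg.Done()

			if cfg := h.Config(); cfg != nil {
				_ = cfg.ClusterName() // the value is never half-written
			}
		}()
	}

	wg.Wait()
	fmt.Println(h.Config() != nil)
}
```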
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
I ended up completely rewriting the controller, simplifying the flow
(somewhat) so that there's just a single control flow in the controller,
while reading from v1alpha1 events is converted to reading from a
channel.
Fixes #7227
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes #7333
Also fixed the discovery service controller to reconnect the client on
config changes (previously it wasn't reactive on e.g. URL changes).
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Use the `go-blockdevice` library to zero partitions.
Also added a test that writes `ones` to the partition and verifies it
contains only zeroes after zeroing it.
Signed-off-by: Noel Georgi <git@frezbo.dev>
This moves the `BOOT` partition mounting/unmounting code into the
`kexecPrepare` phase. It also skips this step if the `BOOT` partition cannot be found.
Signed-off-by: Noel Georgi <git@frezbo.dev>
Fixes #7226
This follows same flow as other similar changes - split out logging
configuration as a separate resource, source it for now in the cmdline.
Rewrite the controller to allow multiple log outputs, add send retries.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This PR adds support for creating a list of API endpoints (each is a pair of host and port).
It gets them from:
- the machine config cluster endpoint;
- localhost with `LocalAPIServerPort`, if the machine is a control plane node;
- `netip.Addr[0]` and the port from affiliates, if they are control plane nodes.
For #7191
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Use `udevd` rules to create stable interface names.
Link controllers should wait for `udevd` to settle down, otherwise link
rename will fail (interface should not be UP).
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
If the dashboard is run without the "Config URL" screen, do not initialize it, and do not probe the kernel args for the code parameter.
Refactor the dashboard to not construct the unused screens at all.
Closes siderolabs/talos#7300.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
RENEW packets are sent unicast, so Talos needs the address of the DHCP
server to send RENEW packets to.
Fixes #7211
Fixes #7263
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
`config.Container` implements a multi-doc container which implements
both the `Container` interface (encoding, validation, etc.) and the `Config`
interface (accessing parts of the config).
Refactor `generate` and `bundle` packages to support multi-doc, and
provide backwards compatibility.
Implement a first (mostly example) machine config document for
SideroLink API URL.
Many places don't properly support multi-doc yet (e.g. config patches).
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This is a port of ukify.py and systemd-measure from systemd.
This requires no actual TPM to be present to calculate the PCR
signatures.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Signed-off-by: Noel Georgi <git@frezbo.dev>
See #7230
Refactor more config interfaces, move config accessor interfaces
to different package to break the dependency loop.
Make `.RawV1Alpha1()` method typed to avoid type assertions everywhere.
No functional changes.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
See #7230
This is a step towards preparing for multi-doc config.
Split the `config.Provider` interface into parts which have different
implementation:
* `config.Config` accesses the config itself, it might be implemented by
`v1alpha1.Config` for example
* `config.Container` will be a set of config documents, which implement
validation, encoding, etc.
The `Version()` method was dropped, as it makes little sense and it was barely
used.
`Raw()` method renamed to `RawV1Alpha1()` to support legacy direct
access to `v1alpha1.Config`, next PR will refactor more to make it
return proper type.
There will be many more changes coming up.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes #7246
The problem was that `udevd` watches via `inotify` any attempts to open
blockdevices with 'write' access.
Talos was opening with write access, but actually accessing as
read-only, so the fix is to open as read-only.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This is controlled with a feature flag which gets enabled automatically
for Talos 1.5+.
Fixes #7181
If enabled, configures kubelet to use project quotas to track xfs volume
usage, which is much more efficient than doing `du` periodically.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Kubelet doesn't refresh self-issued serving certificates, so force it by
removing the cert on each restart.
Fix the code which was forcing a rejoin when the nodename changes: it was
broken, as it was checking the serving certificate instead of the client
certificate. It worked by accident when not using controlplane-issued
serving certificates.
Fixes #7235
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
* drop old resources API, which was deprecated long time ago
* use bootstrapped event in `talosctl get --watch` to better align
columns in the table output
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
- github.com/containerd/typeurl to v2.1.1
- github.com/aws/aws-sdk-go to v1.44.264
- alpine to 3.18.0
- node to 20.2.0-alpine
- github.com/containernetworking/plugins to v1.3.0
- github.com/docker/docker to v23.0.6+incompatible
- github.com/hetznercloud/hcloud-go to v1.45.1
- github.com/insomniacslk/dhcp to v0.0.0-20230516061539-49801966e6cb
- github.com/rivo/tview to v0.0.0-20230511053024-822bd067b165
- tools to v1.5.0-alpha.0-7-gd2dde48
- pkgs to v1.5.0-alpha.0-16-g7958db1
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
This reverts commit a2565f6741.
The fix done in `a2565f67` was actually a no-op caused by
misunderstanding the fix done in Go and backported to [Go 1.20.4](ecf7e00db8).
The fix gave false confidence that it was working when it was tested
against the Talos `main` branch, since PR #7190 bumped the `x/sys` package
from [v0.7.0 -> v0.8.0](ecf7e00db8); the actual change in `x/sys` can be found at ff18efa0a3. This meant that when updating Go to 1.20.4, the `x/sys` package should have been updated too. The `x/sys` package changed how the syscall to set the rlimit is called: it moved into the Go stdlib instead of calling the rlimit syscall in the `x/sys` package. As a result, a combination of Go 1.20.4 and an older `x/sys` package means the `RLIMIT_NOFILE` value would not be set back to the original value.
The Talos 1.4 release branch currently has `x/sys`
at [v0.7.0](https://github.com/siderolabs/talos/blob/v1.4.3/go.mod#L133),
so the backport would consist of this change along with another commit bumping the `x/sys` package to `v0.8.0`.
Fixes: #7198
Fixes: #7206
Co-authored-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Signed-off-by: Noel Georgi <git@frezbo.dev>
This bug is pretty cosmetic, but it shows up as a wrong check when
performing worker upgrade - Talos pretends it checks e.g. kube-apiserver
version which doesn't make sense for workers.
There were two bugs in the code:
* check for machine type was done against `TypeWorker`, while
`MachineType` resource is initially created as `TypeUnknown`
* the cleanup code was not implemented
As I touched the code, I updated controller and tests to use modern
conventions.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Introduce a new resource, `SiderolinkConfig`, to store SideroLink connection configuration (the API endpoint for now).
Introduce a controller for this resource which populates it from the Kernel cmdline.
Rework the SideroLink `ManagerController` to take this new resource as input and reconfigure the link on changes.
Additionally, if the siderolink connection is lost, reconnect to it and reconfigure the links/addresses.
Closes siderolabs/talos#7142, siderolabs/talos#7143.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Fixes #7159
The change looks big, but it's actually pretty simple inside: the static
pods had an annotation which tracks a version of the secrets which
forced control plane pods to reload on a change. At the same time
`kube-apiserver` can reload certificate inputs automatically from files
without restart.
So the inputs were split: the dynamic inputs (for kube-apiserver) don't
need a pod reload, so their version is not tracked in the static pod
annotation and they don't cause a reload. The previous non-dynamic
resource still causes a reload, but it doesn't get updated when e.g.
node addresses change.
Much more refactoring could be done, as the resource chain is a bit
of a mess there, but I wanted to keep the number of changes minimal to keep
this backportable.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Talos doesn't have `rpc.statd` running, so mounting without locking is
the only option. Some places in Kubernetes don't allow setting mount
options for NFS, so setting defaults is the only way.
Fixes #6582
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Ensure Talos waits as long as the kubelet shutdown timers allow.
Related to fix of siderolabs#7138
Signed-off-by: Niklas Wik <niklas.wik@nokia.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes #7137
The `umount` syscall might hang "forever" if the underlying network
filesystem endpoint is down.
To be on the safe side, add a timeout around unmount operations, and try
to umount with force as a last resort.
Sample log:
```
[14795.458779] [talos] task unmountPodMounts (2/2): unmounting /var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/dbe8d7f58e21d06cbef1ae0849317661eba4e82776722e7db5c65194ad73e916/globalmount/0001-0009-rook-ceph-0000000000000001-1051beb3-8d7a-4291-bf45-5711c13523d1
[14795.459797] [talos] task unmountPodMounts (2/2): unmounting /var/lib/kubelet/pods/f3f4d789-7f48-4dd9-9ef5-649b002c8f9c/volumes/kubernetes.io~csi/pvc-a4e72749-a8a1-43d9-9152-5bc1f757c924/mount
[14795.460555] EXT4-fs (rbd0): unmounting filesystem.
[14813.461319] [talos] task unmountPodMounts (2/2): unmounting /var/lib/kubelet/pods/f3f4d789-7f48-4dd9-9ef5-649b002c8f9c/volumes/kubernetes.io~csi/pvc-a4e72749-a8a1-43d9-9152-5bc1f757c924/mount is taking longer than expected, still waiting for 1m11.999162834s
[14831.460813] [talos] task unmountPodMounts (2/2): unmounting /var/lib/kubelet/pods/f3f4d789-7f48-4dd9-9ef5-649b002c8f9c/volumes/kubernetes.io~csi/pvc-a4e72749-a8a1-43d9-9152-5bc1f757c924/mount is taking longer than expected, still waiting for 53.999567033s
[14849.461336] [talos] task unmountPodMounts (2/2): unmounting /var/lib/kubelet/pods/f3f4d789-7f48-4dd9-9ef5-649b002c8f9c/volumes/kubernetes.io~csi/pvc-a4e72749-a8a1-43d9-9152-5bc1f757c924/mount is taking longer than expected, still waiting for 35.998979117s
[14867.460748] [talos] task unmountPodMounts (2/2): unmounting /var/lib/kubelet/pods/f3f4d789-7f48-4dd9-9ef5-649b002c8f9c/volumes/kubernetes.io~csi/pvc-a4e72749-a8a1-43d9-9152-5bc1f757c924/mount is taking longer than expected, still waiting for 17.999502128s
[14885.461123] [talos] task unmountPodMounts (2/2): unmounting /var/lib/kubelet/pods/f3f4d789-7f48-4dd9-9ef5-649b002c8f9c/volumes/kubernetes.io~csi/pvc-a4e72749-a8a1-43d9-9152-5bc1f757c924/mount with force
[14885.462395] [talos] ignoring unmount error /var/lib/kubelet/pods/f3f4d789-7f48-4dd9-9ef5-649b002c8f9c/volumes/kubernetes.io~csi/pvc-a4e72749-a8a1-43d9-9152-5bc1f757c924/mount: invalid argument
[14885.463529] [talos] task unmountPodMounts (2/2): unmounting /var/run/netns/cni-0888dc71-ba9e-af8a-d322-074f654561e5
[14885.464267] [talos] task unmountPodMounts (2/2): done, 1m30.028862262s
```
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
API server takes care of setting priority for "regular" pods from
priorityClassName, but nothing does that for static pods, so we have to
specify the priority explicitly for static pods.
This fixes the graceful node shutdown (kubelet) to stop non-critical
pods before the api-server and friends (critical pods).
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fix udevd not triggering rules properly.
This also fixes an issue with go-blockdevice not resolving symlinks.
Fixes: #7117
Signed-off-by: Noel Georgi <git@frezbo.dev>
Fixes #7121
Talos pulls some images on its own (without CRI/kubelet) to the `system`
namespace of the CRI containerd. These images are not visible to the
CRI/kubelet, so we need to clean them up manually.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Rename members to machines to be clearer.
Display the correct member count.
Closes siderolabs/talos#7127.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Add startup probes that probe the containers for 60 seconds before switching to liveness probes.
Closes siderolabs/talos#7054.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Hide kube-apiserver, kube-controller-manager and kube-scheduler statuses on the dashboard for the worker nodes, instead of showing them as n/a.
Also display the cluster name as n/a for workers (instead of an empty string), as that information is not available to them.
Closes siderolabs/talos#7103.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Show the config URL template that will be populated when the code is entered. Closes siderolabs/talos#7092.
Clear the form when the tab is exited & do not display "Saved successfully" message when the code is saved, as we navigate to the summary tab afterward anyway. Closes siderolabs/talos#7093.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Updated documentation, what's new, etc.
Also fix some minor UI issues in the dashboard.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Prevent dashboard from crashing when a dead/non-existent node is specified on `talosctl --nodes`.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Use GRUB quoting function to the kernel args passed to Talos.
This fixes passing `${variable}` to `talos.config=` kernel argument.
Also fix a problem with `ONBUILD` being executed for the `imager` image.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
The problem was that 'Succeeded' pod was treated as 'not ready', so that
`MachineStatus` never reached readiness state.
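A minimal sketch of the corrected readiness rule, assuming a hypothetical `countsAsReady` helper with pod phases modeled as plain strings (the actual `MachineStatus` code is not shown here): a pod that ran to completion no longer holds back overall readiness.

```go
package main

import "fmt"

// PodPhase holds Kubernetes pod phase names as plain strings in this sketch.
type PodPhase string

const (
	PodPending   PodPhase = "Pending"
	PodRunning   PodPhase = "Running"
	PodSucceeded PodPhase = "Succeeded"
	PodFailed    PodPhase = "Failed"
)

// countsAsReady is a hypothetical readiness rule: a pod that ran to
// completion (Succeeded) must not block readiness, a running pod needs its
// Ready condition, and anything else is not ready.
func countsAsReady(phase PodPhase, readyCondition bool) bool {
	switch phase {
	case PodSucceeded:
		return true
	case PodRunning:
		return readyCondition
	default:
		return false
	}
}

func main() {
	fmt.Println(countsAsReady(PodSucceeded, false), countsAsReady(PodRunning, false))
}
```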
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes #7041
Rework the DHCP flow so that we don't use `INFORM` requests anymore. The
idea is to try requesting a hostname from the DHCP server first, and if
the hostname is not sent, or it gets overridden in Talos, restart the
DHCP sequence sending the hostname to the DHCP server.
This still avoids sending and requesting a hostname in one request.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes: https://github.com/siderolabs/talos/issues/7017
Should allow external services to detect which user block devices might
need to be wiped during reset.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Network probes are configured with the specs, and provide their output
as a status.
At the moment only platform code can configure network probes.
If any network probes are configured, they affect network.Status
'Connectivity' flag.
Example, create the probe:
```
talosctl -n 172.20.0.3 meta write 0xa '{"probes": [{"interval": "1s", "tcp": {"endpoint": "google.com:80", "timeout": "10s"}}]}'
```
Watch probe status:
```
$ talosctl -n 172.20.0.3 get probe
NODE         NAMESPACE   TYPE          ID                  VERSION   SUCCESS
172.20.0.3   network     ProbeStatus   tcp:google.com:80   5         true
```
With failing probes:
```
$ talosctl -n 172.20.0.3 get probe
NODE         NAMESPACE   TYPE          ID                  VERSION   SUCCESS
172.20.0.3   network     ProbeStatus   tcp:google.com:80   4         true
172.20.0.3   network     ProbeStatus   tcp:google.com:81   1         false
$ talosctl -n 172.20.0.3 get networkstatus
NODE         NAMESPACE   TYPE            ID       VERSION   ADDRESS   CONNECTIVITY   HOSTNAME   ETC
172.20.0.3   network     NetworkStatus   status   5         true      true           true       true
```
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Implement a screen for entering/managing the config `${code}` variable.
Enable this screen only when the platform is `metal` and there is a `${code}` variable in the `talos.config` kernel cmdline URL query.
Additionally, remove the "Delete" button and its functionality from the network config screen to avoid users accidentally deleting PlatformNetworkConfig parts that are not managed by the dashboard.
Add some tests for the form data parsing on the network config screen.
Remove the unnecessary lock on the summary tab - all updates come from the same goroutine.
Closes siderolabs/talos#6993.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Restart udevd when custom rules are added, in case the subsystems
need to be re-triggered.
Fixes: #7001
Signed-off-by: Noel Georgi <git@frezbo.dev>
`talosctl netstat -k` shows the sockets/connections of the host and of all non-host-network pods.
`talosctl netstat namespace/pod` shows the sockets/connections of a specific pod, and the pod name
autocompletes in the shell.
Signed-off-by: Nico Berlee <nico.berlee@on2it.net>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
* Clear the input form and switch to summary tab after the network config is saved.
* Use nodeaddress resource for detecting and displaying IPs. Improve the IP filtering logic.
* Fix the logic of gateway detection. Display all gateways instead of a single one.
* Use hostnamestatus resource to detect the hostname instead of an API call.
* Add hostname entry to the network info section on summary tab (as `HOST`).
* Enable `OUTBOUND` entry in network info section on summary tab.
* Display only the physical network interfaces in the interface dropdown on network config tab.
* Improve form input handling.
* Additional minor fixes & improvements.
Closes siderolabs/talos#6992.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
This adds support for automatically registering node hostnames in DNS by
sending the current hostname to DHCP via option 12. If the current hostname is
updated, issue a new DISCOVER to propagate the update to DHCP (updating the
hostname on lease renewals is not universally supported by DHCP servers). This
addition maintains the previous functionality where the node can also request
its hostname from the DHCP server. The received hostname will be processed and
prioritized as usual by the `network.HostnameSpecController`.
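On the wire, option 12 ("Host Name", RFC 2132) is a plain code/length/value triple with a minimum length of 1. The sketch below hand-encodes it only to show that layout; `encodeHostnameOption` is hypothetical and is not the API of the `insomniacslk/dhcp` library that Talos actually uses.

```go
package main

import (
	"errors"
	"fmt"
)

// optHostName is the DHCP "Host Name" option code (RFC 2132).
const optHostName = 12

// encodeHostnameOption builds the raw option bytes: one byte of code, one
// byte of length, then the hostname itself.
func encodeHostnameOption(hostname string) ([]byte, error) {
	if len(hostname) == 0 || len(hostname) > 255 {
		return nil, errors.New("hostname length must be between 1 and 255 bytes")
	}

	opt := make([]byte, 0, 2+len(hostname))
	opt = append(opt, optHostName, byte(len(hostname)))

	return append(opt, hostname...), nil
}

func main() {
	b, err := encodeHostnameOption("talos-cp-1")
	if err != nil {
		panic(err)
	}
	fmt.Printf("% x\n", b)
}
```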
This change set also contains fixes to make DHCP renewals compliant with RFC
2131, specifically avoiding sending the server identifier and requested IP
address when issuing renewals using a previous offer. This also uncovered
issues and missing features in the upstream `insomniacslk/dhcp` library, the
fixes and improvements for which are now finally merged.
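For reference, RFC 2131 (section 4.3.2) fixes which fields a DHCPREQUEST carries in each client state; the renewal fix above corresponds to the RENEWING/REBINDING row. A small table-like sketch (the type and function names are hypothetical):

```go
package main

import "fmt"

// clientState lists the client states in which RFC 2131 sends a DHCPREQUEST.
type clientState int

const (
	selecting clientState = iota
	initReboot
	renewing
	rebinding
)

// requestFields records which fields a DHCPREQUEST carries in a given state.
type requestFields struct {
	ServerID    bool // option 54, "server identifier"
	RequestedIP bool // option 50, "requested IP address"
	Ciaddr      bool // client's current address in the 'ciaddr' header field
}

// fieldsFor follows RFC 2131, section 4.3.2.
func fieldsFor(s clientState) requestFields {
	switch s {
	case selecting:
		return requestFields{ServerID: true, RequestedIP: true}
	case initReboot:
		return requestFields{RequestedIP: true}
	default: // renewing, rebinding: no server identifier, no requested IP
		return requestFields{Ciaddr: true}
	}
}

func main() {
	fmt.Printf("%+v\n", fieldsFor(renewing))
}
```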
Sending hostname updates has been tested against `dnsmasq` and the built-in
DHCP + DNS services in Windows Server. Hostname retrieval from DHCP and edge
cases with overridden hostnames from different configuration layers have been
extensively tested against `dnsmasq`.
Signed-off-by: Dennis Marttinen <twelho@welho.tech>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Unify getting environment variables, support passing environment
variables via kernel args.
Fixes #6984
See #6999
For META this will be used to pass environment variables to the
installer for ISO images (or PXE booting).
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
The value of the new variable comes from `META`, and it might be set using the
interactive console (not implemented yet, but it will come soon).
I had to refactor the URL expansion implementation:
* simplify things where possible
* provide more unit-tests for smaller units
* handle expansion of all variables in parallel
* allow parallel expansion on multiple variables
I also refactored the download code to properly pass the endpoint
function with a context.
The end result:
* Talos will try to download config for 3 hours before rebooting
* Each attempt which includes URL expansion + download is limited to 3
minutes
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Implement the network config screen with input forms to configure the initial node networking by writing a config to the META partition.
Closes siderolabs/talos#6961.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
The problem was that `GracefulStop()` will hang forever if there is a
running API call. So if there is a running streaming call, the
maintenance service might hang until it is finished.
The problem shows up with 'Upgrade' API in the maintenance mode if there
is a concurrent streaming API call, e.g.:
1. Watch API is running against maintenance mode.
2. Upgrade API is issued, it tries to run the MaintenanceUpgrade
sequence, which tries to take over the Initialize sequence. The
Initialize sequence is canceled, maintenance API service context is
canceled, but the service doesn't terminate, as it's stuck in
`GracefulStop`. The sequence takeover times out because, even though the
sequence is canceled, it hasn't terminated yet.
Sample log:
```
[talos] upgrade request received: "ghcr.io/siderolabs/installer:v1.3.3"
[talos] upgrade failed: failed to acquire lock: timeout
[talos] task loadConfig (1/1): failed: failed to receive config via maintenance service: maintenance service failed: context canceled
[talos] phase config (6/7): failed
[talos] initialize sequence: failed
<stuck here>
```
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Bump golangci-lint and fix up new warnings. Ignore the check for
unused function parameters: it's kind of noisy and makes interface
implementations confusing to read.
Signed-off-by: Noel Georgi <git@frezbo.dev>
Discovered in #6971. On 32-bit architectures, the Go compiler cannot deduce the proper type for those constants
in `fmt.Print(f)` functions (untyped integer constants default to `int`, which is 32 bits wide there). Since we only compare them with `uint32` variables, it makes sense to add proper
types to them.
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
The problem showed up on 'reset' of the Talos node which had multiple
endpoints for other control plane nodes, many of which weren't actually
available.
When 'grpc.WithBlock()' is used, etcd will try to dial the first
endpoint and return an error if the dial fails.
Use noblock mode by default with multiple endpoints, and blocking mode
with a single endpoint.
Pass the context to etcd to properly abort dial operations if the
context gets canceled.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Instead of doing excessive get/list requests, do a watch per node in an infinite retry.
Additionally, refactor the dashboard code to make the various data listener namings more consistent and reorganize the packages.
Closes siderolabs/talos#6960.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
If link has no `Info` field we can't do anything meaningful, so we'll just log and skip.
Also fix race in test.
For #6956
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Fix a data race caused by the metadata field of PlatformNetworkConfig being edited after it was sent to the channel. It caused test failures.
Fix it by setting a copy of the metadata instead.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
A special META key might contain optional platform network config for
the `METAL` platform.
It is completely optional, but if present, it works the same way as in the
clouds: it is applied with low priority (can be overridden with machine
config), but provides some initial defaults for the machine.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This allows writing keys to the META partition.
META contents can be viewed with `talosctl get metakeys`.
There is no real use case for it yet, but the next PRs will introduce
two special keys which can be written:
* platform network config for `metal`
* `${code}` variable
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Implement the new summary dashboard with node info and logs.
Replace the previous metrics dashboard with the new dashboard which has multiple screens for node summary, metrics and editing network config.
Port the old metrics dashboard to the tview library and assign it to be a screen in the new dashboard, accessible by F2 key.
Add a new resource, infos.cluster.talos.dev which contains the cluster name and id of a node.
Disable the network config editor screen in the new dashboard until it is fully implemented with its backend.
Closes siderolabs/talos#4790.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Use a global instance, handle loading/saving META in global context.
Deprecate legacy syslinux ADV, provide an easier interface for
consumers.
Expose META as resources.
Fix the bootloader revert process (it was completely broken for quite a
while :sad:).
This is a first step which mostly does preparation work, real changes
will come in the next PRs:
* add APIs to write to META
* consume META keys for platform network config for `metal`
* custom key for URL `${code}`
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes #6730
`go generate`-based step downloads the upstream manifest, transforms it
to match our requirements, and it is compiled in as the Flannel
manifest.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Launch CoreDNS even if the node is not initialized.
The network is already ready, but the CCM hasn't finished its job yet.
Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes #6817
The original problem wasn't reproducible with `main`, but there was a
set of bugs in the shutdown sequence which was preventing it from
completing successfully, as in the maintenance mode nothing is running
and initialized yet.
Most of the bugs were `nil` pointer dereferences.
Fixed a small issue with final 'RebootError' printed as a failure in the
ACPI shutdown path.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This PR adds the first 12 characters of the container ID to each container in the `talosctl -k containers` output.
That way we can make sure we get the logs from the proper container even if there is a newer one.
Closes #6886
Co-authored-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Use new version of go-kubernetes, and move the `kube-proxy` DaemonSet
update to follow common logic of bootstrap manifests update.
This fixes a confusing behavior when after `k8s-upgrade` the version of
`kube-proxy` is not updated in the machine config.
See https://github.com/siderolabs/go-kubernetes/pull/3
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Exposes the pod IP as the `POD_IP` environment variable via the downward
API in the kube-proxy pod for use in e.g. metrics-bind-addr.
Signed-off-by: Tim Jones <tim.jones@siderolabs.com>
This introduces a new role for Talos API which fills the gap between
`os:reader` and `os:admin` roles.
Fixes #6898
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
When removing a member from `etcd`, the server does a pre-check to make
sure the member is connected to a quorum of other members, and the
remove request might fail. Add a retry to wait for the etcd to be fully
connected before giving up, as some parts of the reset flow already ran.
Also fix an issue which appears in the integration test, when `reset` is
called early in the boot sequence when local etcd hasn't started fully yet.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This PR ensures that we can handle third grub option - "wipe". We will use it in 1.4.
For #6842
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Fixes: https://github.com/siderolabs/talos/issues/6815
Additionally, make it possible to run reset in maintenance mode: to
enable a way for resetting system disk and remove all traces of Talos
from it.
The new reset flow works in a separate sequence, changed disk probe
lookup to check the boot partition instead of the ephemeral one.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Move dashboard package into a common location where both Talos and talosctl can use it.
Add support for overriding stdin, stdout, stderr and ctty in the process runner.
Create a dashboard service which runs the dashboard on /dev/tty2.
Redirect kernel messages to tty1 and switch to tty2 after starting the dashboard on it.
Related to siderolabs/talos#6841, siderolabs/talos#4791.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
- github.com/aws/aws-sdk-go to v1.44.209
- github.com/stretchr/testify to v1.8.2
- github.com/jsimonetti/rtnetlink to v1.3.1
- google.golang.org/genproto to v0.0.0-20230223222841-637eb2293923
- github.com/emicklei/dot to v1.3.1
- github.com/gdamore/tcell/v2 to v2.6.0
- github.com/insomniacslk/dhcp to v0.0.0-20230220063916-5369909a5de7
- github.com/opencontainers/runtime-spec to v1.1.0-rc.1.0.20230215090456-58ec43f9fc39
- github.com/rivo/tview to v0.0.0-20230226195229-47e7db7885b4
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
The `modules.dep` kernel module dependency tree extension root path was
previously created with a permission of `0o700`, which means the Talos
root got a permission of `0o700` when the kernel module tree was re-built
when extensions providing kernel modules were enabled. This means that
any binaries lost the executable permission when run as non-root,
causing an `EACCES` error. Fix by making sure the temporary directory
created for building the kernel modules tree has `0o755` permission
explicitly.
Signed-off-by: Noel Georgi <git@frezbo.dev>
The shared code is going out to the
github.com/siderolabs/go-kubernetes library.
The code will be used in Talos and other projects using the same features.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Load and start machined earlier, in the initialize sequence, so that it is possible to use its API over its unix socket in maintenance mode.
Additionally, do not return features from Version API if a config is not yet available.
Related to siderolabs/talos#4791.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
The netmask can be set as a number.
Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Talos always supported that, but CRI config lacked support for it.
Now with recent containerd the new `_default` host is used as a
fallback, so this re-enables the support and updates the docs.
See https://github.com/containerd/containerd/pull/8065
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes: #6802
Automatically load kernel modules based on hardware info and modules
alias info. udevd would automatically load modules based on HW
information present.
Signed-off-by: Noel Georgi <git@frezbo.dev>
This fixes the issue when the overlay mount target directory was used as
lowerdir for the mount, creating extra folders in the extension.
Fix the issue by adding support for normal overlay mounts to use a
source directory when specified.
Also fixes a small issue where a message was logged when the error was nil.
Signed-off-by: Noel Georgi <git@frezbo.dev>
Not sure how I missed it in the first PR, but that's the only character
which was not quoted properly.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Wait for the network before trying to access the metadata service.
Retry the calls when appropriate (most platforms use `download.Download`
function which does proper retries).
Co-authored-by: Noel Georgi <git@frezbo.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
One case was missing: when network section is present, but value is
omitted.
Fixes #6825
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Use the process wrapper introduced in #6814 to drop capabilities. This change
also means the capabilities are dropped at the process level and not for
PID 1 (machined) as a whole, which allows us to drop capabilities per process.
Signed-off-by: Noel Georgi <git@frezbo.dev>
Use a wrapper for starting processes which can set up proper cgroups and
OOM score, and also drop capabilities for the process; it then calls
`execve`.
The containerd tests are also fixed to support cgroups when
running tests in buildkit. They used to pass previously as we did not
error if cgroup setup failed.
Signed-off-by: Noel Georgi <git@frezbo.dev>
This fixes multiple issues:
* `log.Fatalf` in the machined code leads to kernel panic
* return URL if some expansion fails
* correctly handle destroyed event (wait for the next one)
Fixes #6807
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
One of the fields in the GRUB config - boot arguments - contains
user-controlled input. Talos supports variable expansion in
`talos.config` parameter, and uses `${var}` syntax.
In GRUB config, `}` is a special character, and introduction of `}`
breaks config parsing both for GRUB and Talos.
Correctly escape and unescape special characters.
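A round-trip escaping scheme can be sketched as backslash-prefixing special characters and stripping the backslash on the way back. The character set below is an assumption for illustration only: GRUB's real quoting rules depend on the quoting context, and the actual Talos implementation is not shown here.

```go
package main

import (
	"fmt"
	"strings"
)

// specialChars is an assumed set of characters to escape; treat it as
// illustrative, not as GRUB's exact quoting rules.
const specialChars = `\"${}`

// escape prefixes every special character with a backslash.
func escape(s string) string {
	var b strings.Builder
	for _, r := range s {
		if strings.ContainsRune(specialChars, r) {
			b.WriteByte('\\')
		}
		b.WriteRune(r)
	}
	return b.String()
}

// unescape reverses escape: a backslash makes the next character literal.
func unescape(s string) string {
	var b strings.Builder
	escaped := false
	for _, r := range s {
		if !escaped && r == '\\' {
			escaped = true
			continue
		}
		escaped = false
		b.WriteRune(r)
	}
	return b.String()
}

func main() {
	arg := "talos.config=https://example.com/config?code=${code}"
	fmt.Println(escape(arg))
}
```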
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
The previous `udevd` healthcheck was incomplete: if `udevd` took more
time to start up, the initial `udevadm trigger` would silently fail,
leaving devices not set up properly, as `udevadm trigger` returns an exit code
of zero even if `udevd` is not running. This PR fixes this by first checking
if the `udevd` control socket exists, which is a faster check, then
making sure `udevd` is up by running `udevadm control` command. This
ensures that `udevd` is properly initialized before running any `udevadm
trigger` commands even if `udevd` is restarted/killed.
Signed-off-by: Noel Georgi <git@frezbo.dev>
This excludes it out of the `NodeAddress`.
Needs extra testing to confirm that it actually still works as anchor
IP.
Fixes #6760
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
As the client returns wrapped errors, unwrap them using our own method
which does `errors.As` instead of gRPC one which doesn't do unwrapping.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
The expected format with multiple advertised URLs is:
`name=u1,name=u2`
Previously Talos generated:
`name=u1,u2`
(which is wrong)
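A minimal sketch of producing the correct format, assuming the value is consumed as an etcd-style `name=url` list (as in `--initial-cluster`). The `initialClusterEntry` helper and the sample URLs are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// initialClusterEntry is a hypothetical helper that renders one member with
// several advertised URLs as "name=u1,name=u2" (repeating the name for every
// URL) instead of the incorrect "name=u1,u2".
func initialClusterEntry(name string, urls []string) string {
	parts := make([]string, 0, len(urls))
	for _, u := range urls {
		parts = append(parts, name+"="+u)
	}
	return strings.Join(parts, ",")
}

func main() {
	fmt.Println(initialClusterEntry("cp-1", []string{"https://10.0.0.1:2380", "https://[fd00::1]:2380"}))
}
```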
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
For some reason `go-mod-outdated` didn't work for me, so I had to do
this manually.
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Run `depmod` during install/upgrades when extensions provide kernel
modules and `modules.dep` needs to be re-generated. This also allows
modules of the same name from the kernel to co-exist. Modules in the `extras`
folder take precedence over `in-built` ones.
Signed-off-by: Noel Georgi <git@frezbo.dev>
Fixes #6707
There was a race condition between different parts of the service code:
`Stop` waits for the event which is published before the service is
removed from the `running[id]` map, so if one does `Stop` followed by
`Start` (this is what `services restart` API does), by the time it goes
to `Start` it might be still in the `running[id]` map, so `Start` does
nothing.
Overall this code should be rewritten and simplified, but for now move
sending these "terminal" events out so that by the time the event is
published, the service is stopped and removed from the `running[id]`
map.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This is (still) being used in Talos to handle upgrade rollbacks.
There were multiple problems with this code, and one of them leads to
panic if the tag is written multiple times without deletion:
```
github.com/siderolabs/talos/internal/app/machined/pkg/runtime/v1alpha1/bootloader/adv/syslinux.ADV.SetTagBytes({0xc00175bc00?, 0x1f11dbe?, 0xed4f4d?}, 0x0?, {0xc000afb7f0?, 0x400?, 0x0?})
/src/internal/app/machined/pkg/runtime/v1alpha1/bootloader/adv/syslinux/syslinux.go:125 +0x270
github.com/siderolabs/talos/internal/app/machined/pkg/runtime/v1alpha1/bootloader/adv/syslinux.ADV.SetTag(...)
/src/internal/app/machined/pkg/runtime/v1alpha1/bootloader/adv/syslinux/syslinux.go:95
github.com/siderolabs/talos/cmd/installer/pkg/install.(*Installer).Install(0xc0004374a0, 0x5)
/src/cmd/installer/pkg/install/install.go
```
The `uint8()` conversion was causing overflow and wrong index when ADV
real length is over 255.
Fix multiple writes of the same tag by deleting previous value first.
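The overflow is easy to reproduce: converting a non-constant `int` to `uint8` keeps only the low 8 bits. `truncatedIndex` is a hypothetical stand-in for the buggy conversion; keeping the length in a wider integer type avoids the wrap-around.

```go
package main

import "fmt"

// truncatedIndex is a hypothetical stand-in for the buggy conversion:
// uint8(n) keeps only the low 8 bits of n, so any length above 255 wraps.
func truncatedIndex(n int) uint8 {
	return uint8(n)
}

func main() {
	// 300 = 0x12C; the low byte 0x2C is 44, so the index points far too early.
	fmt.Println(truncatedIndex(300))
}
```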
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Change TCP maximum segment size if it goes through the KubeSpan to match
KubeSpan MTU.
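The clamped MSS is the MTU minus the IP and TCP headers (20 bytes for IPv4 and 40 bytes for IPv6 without extension headers, plus 20 bytes of TCP, assuming no options). A minimal sketch; the 1420 MTU is only an example value, not necessarily the KubeSpan MTU.

```go
package main

import "fmt"

const (
	ipv4HeaderLen = 20 // without options
	ipv6HeaderLen = 40 // fixed header, no extension headers
	tcpHeaderLen  = 20 // without options
)

// mssForMTU returns the largest TCP payload that fits into a single packet on
// a link with the given MTU, assuming no IP or TCP options.
func mssForMTU(mtu int, ipv6 bool) int {
	ipHeader := ipv4HeaderLen
	if ipv6 {
		ipHeader = ipv6HeaderLen
	}
	return mtu - ipHeader - tcpHeaderLen
}

func main() {
	// 1420 is only an example MTU.
	fmt.Println(mssForMTU(1420, false), mssForMTU(1420, true))
}
```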
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
As with #6724, controlplane node kubelet doesn't use control plane
endpoint anymore, run the test on the worker node instead of cp node.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This switches the last usage of Kubernetes controlplane endpoint to use
`localhost` (itself) for controlplane nodes.
Worker nodes still use cluster-wide controlplane endpoint.
This allows controlplane nodes to boot fully even if the controlplane
endpoint (e.g. loadbalancer) doesn't function.
The process of joining etcd still requires either a discovery service or
a proper functioning controlplane endpoint.
With this fix, Talos controlplane nodes can boot successfully without a
loadbalancer being up, while worker nodes obviously won't join.
This improves Talos behavior in single-node clusters when controlplane
endpoint is not available, the node will still boot just fine and
function properly.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
When the sequence fails hard, Talos does automatic reboot, so reflect
this in the machine status properly.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This allows safely recovering from out-of-space quota issues, and performing
defragmentation as needed.
`talosctl etcd status` command provides lots of information about the
cluster health.
See docs for more details.
Fixes #4889
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
These endpoints are used for workers to find the addresses of the
controlplane nodes to connect to `trustd` to issue certificates of
`apid`.
These endpoints today come from two sources:
* discovery service data
* Kubernetes API server endpoints
This PR adds a static entry to the list, based on the Kubernetes control
plane endpoint in the machine config.
E.g. if the loadbalancer is used for the controlplane endpoint, and that
loadbalancer also proxies requests for port 50001 (trustd), this static
endpoint will provide workers with connectivity to trustd even if the
discovery service is disabled, and Kubernetes API is not up.
If this endpoint doesn't provide any trustd API, Talos will still try
other endpoints.
Talos does server certificate validation when calling trustd,
so including malicious endpoints doesn't cause any harm, as malicious
endpoint can't provide a proper server certificate.
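Deriving the static endpoint amounts to keeping the host of the control plane endpoint URL and swapping in the trustd port (50001, as mentioned above). A minimal sketch; `trustdEndpoint` is hypothetical and the example URL is made up.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// trustdPort is the trustd API port mentioned above.
const trustdPort = "50001"

// trustdEndpoint is a hypothetical helper that derives the static trustd
// endpoint from the Kubernetes control plane endpoint URL by keeping the host
// and replacing the port.
func trustdEndpoint(controlPlaneEndpoint string) (string, error) {
	u, err := url.Parse(controlPlaneEndpoint)
	if err != nil {
		return "", err
	}

	host := u.Hostname() // strips the port and IPv6 brackets
	if host == "" {
		return "", fmt.Errorf("no host in %q", controlPlaneEndpoint)
	}

	return net.JoinHostPort(host, trustdPort), nil
}

func main() {
	ep, err := trustdEndpoint("https://cp.example.com:6443")
	if err != nil {
		panic(err)
	}
	fmt.Println(ep)
}
```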
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Introduce IPv6 support for Google Cloud.
It also works when DHCPv6 is enabled,
but the route is received through RA packets, which is not working yet.
Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Bumps tools/pkgs/extras to the latest.
Bumps Go modules.
Enables adaptive capacity for COSI state.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This seems to happen specifically for CRDs, regular Kubernetes resources
have some extra magic.
Fixes #6663
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Make the organization an interface, in preparation for avoiding giving
`system:masters` access to the certificate in the talosctl-generated kubeconfig.
Signed-off-by: Niklas Wik <niklas.wik@nokia.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Talos has no wireless support & wireless kernel drivers,
so disabling it the recommended way might actually reduce power consumption.
It could save ~45 mA:
https://forums.raspberrypi.com/viewtopic.php?t=257144#p1568474
Or 'The WiFi half of the wireless chip will be powered but be held in reset':
https://forums.raspberrypi.com/viewtopic.php?t=343854#p2060246
Either way, it does not hurt and it should be treated the same as bluetooth.
Signed-off-by: Nico Berlee <nico.berlee@on2it.net>
Signed-off-by: Noel Georgi <git@frezbo.dev>
This PR adds two additional checks which are performed during boot sequence and in `talosctl health`. They ensure that nodes have enough memory and disk.
- Boot check will print a warning if memory / disk size is not sufficient.
- Health check will fail if memory / disk size is not sufficient.
Closes #6467
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
This fixes a potential panic which I found in the unit-tests logs.
The 'not found' error is ignored, so an additional check is needed.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
It's only used to detect if resource is `nil` or of incorrect type. Both errors are developer errors, so we should not collect them.
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Currently `.Error()` call is panicking if `watchErr` is nil. Besides - we want to wrap errors the way we can unwrap them.
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
There was inconsistency in the way `/v2` was appended to registry
endpoint path between containerd (CRI) and Talos:
* Talos only appended `/v2` to empty paths
* containerd appended `/v2` if it's not the suffix already
Fix Talos to act same as containerd, and introduce a setting
`overridePath` which stops both Talos and `containerd` from appending
`/v2` (should be required with e.g. Harbor registry mirror).
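The unified rule can be sketched as follows. `registryAPIPath` is hypothetical and only approximates the behavior described above; trimming a trailing `/` before the suffix check is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// registryAPIPath approximates the behavior described above: "/v2" is
// appended unless the path already ends with it, and overridePath keeps the
// configured path untouched. Trimming a trailing "/" first is an assumption.
func registryAPIPath(path string, overridePath bool) string {
	if overridePath {
		return path
	}

	path = strings.TrimSuffix(path, "/")
	if strings.HasSuffix(path, "/v2") {
		return path
	}

	return path + "/v2"
}

func main() {
	for _, p := range []string{"", "/v2", "/mirror"} {
		fmt.Printf("%q -> %q\n", p, registryAPIPath(p, false))
	}
	// Hypothetical mirror path that must be used as-is.
	fmt.Println(registryAPIPath("/v2/proxy-cache", true))
}
```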
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes #6566
This avoids putting all node addresses, which might not be routable
across Kubernetes.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Add talosctl machineconfig patch command which accepts a machine config as input and a list of patches, applying the patches and writing the result to a file or to stdout.
Link `talosctl machineconfig gen` to `talosctl gen config`, so they work the same way.
Closes siderolabs/talos#6562.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Fixes #6553
Talos itself defaults to XFS, so IMA measurements weren't done for Talos's
own filesystems. But many other solutions create ext4 filesystems by
default, or it might be something mounted by other means.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Add support to specify the types of outputs to be generated by talosctl gen config.
Add support for writing a single type of output to stdout instead of a file.
Related to siderolabs/talos#6562.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
This brings many fixes, including a new Watch with support for
Bootstrapped and Errored event types.
`talosctl` from before this change is still compatible, as there's gRPC
API level backwards compatibility versioning.
New client doesn't yet depend on new event types, so it will work
against Talos 1.2.x.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Host Talos mounts machined socket for API access into the installer
container (for upgrades).
Installer runs any check it might need to verify compatibility.
At the moment, the following checks are implemented:
* Talos version (whether upgrade from version X to Y is supported)
* Kubernetes version (whether Kubernetes version X is supported with
Talos Y).
Fixes #6149
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Enabling BTF in the kernel breaks kexec from a pre-BTF kernel (e.g. when
upgrading from 1.2.x to 1.3.x).
As there's no way to detect Talos version in the installer at the
moment, use another way to detect whether BTF is enabled in the Talos
version which is running right now.
Fixes #6443
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This allows multiple `ip=` parameters, and fixes setting DHCP for any
link on the cmdline.
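A minimal sketch of collecting every `ip=` argument (rather than only the first) from a kernel command line, assuming the field order from the kernel's nfsroot documentation (`client-ip:server-ip:gw-ip:netmask:hostname:device:autoconf:...`). `parseIPArgs` and `ipArg` are hypothetical names, not Talos code, and only the fields used here are kept.

```go
package main

import (
	"fmt"
	"strings"
)

// ipArg keeps the fields of one kernel "ip=" argument used in this sketch.
// Field order: client-ip:server-ip:gw-ip:netmask:hostname:device:autoconf:...
type ipArg struct {
	ClientIP, Gateway, Netmask, Device, Autoconf string
}

// parseIPArgs returns every "ip=" argument on the command line, so each
// link can be configured (statically or via DHCP) independently.
func parseIPArgs(cmdline string) []ipArg {
	var result []ipArg
	for _, field := range strings.Fields(cmdline) {
		if !strings.HasPrefix(field, "ip=") {
			continue
		}
		value := strings.TrimPrefix(field, "ip=")
		// Short form such as "ip=dhcp": the whole value is the autoconf method.
		if !strings.Contains(value, ":") {
			result = append(result, ipArg{Autoconf: value})
			continue
		}
		parts := strings.Split(value, ":")
		// Pad so that missing trailing fields read as empty strings.
		for len(parts) < 7 {
			parts = append(parts, "")
		}
		result = append(result, ipArg{
			ClientIP: parts[0],
			Gateway:  parts[2],
			Netmask:  parts[3],
			Device:   parts[5],
			Autoconf: parts[6],
		})
	}
	return result
}

func main() {
	cmdline := "console=ttyS0 ip=172.20.0.2::172.20.0.1:255.255.255.0::eth0:off ip=:::::eth1:dhcp"
	for _, arg := range parseIPArgs(cmdline) {
		fmt.Printf("%+v\n", arg)
	}
}
```

Here `eth0` gets a static address while `eth1` is set to DHCP from the same command line.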
Fixes #6475
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes #5937
This removes external IPs from the set of addresses published by the node
(we now source addresses from 'routed', which excludes external ones). This
is definitely the "right" thing to do, as those addresses are not on the node
itself and can't be routed to the node.
On the other hand, it also removes them from `talosctl get members`, but we
don't have to split this up right now.
For the KubeSpan endpoints, we still use 'all' addresses, as external
IPs are perfect KubeSpan endpoints (WireGuard endpoints).
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This feature allows using only the IPv4 or only the IPv6 stack to reach peers.
It can also help avoid sharing node-specific IPs,
which aren't reachable at all.
Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
This resolves the case when a node is behind NAT, but the KubeSpan port is
forwarded back to the node. The Discovery Service returns the public IP of the
client as it sees it in the incoming request. That address is now
published to the KubeSpan endpoints.
Fixes #6508
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Removed the deprecated arg from the kubelet spec, as the arg is going to be
removed completely in v1.27 (kubelet defaults to remote CRI anyway).
Go modules not updated due to https://github.com/kubernetes/kubernetes/issues/113951
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
We add the `nodeLabels` key to the machine config to allow users to add
node labels to the Kubernetes Node object. A controller
reads the nodeLabels from the machine config and applies them via the
Kubernetes API.
Older versions of talosctl will throw an unknown keys error if `edit mc`
is called on a node with this change.
Fixes #6301
Signed-off-by: Philipp Sauter <philipp.sauter@siderolabs.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Use the boot kernel arg `talos.unified_cgroup_hierarchy=0` to force Talos to
use cgroups v1. Talos still defaults to cgroups v2.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
We add a controller that provides the etcd member ID as a resource
and change the etcd-related commands to support member IDs in addition to
hostnames.
Fixes: #6223
Signed-off-by: Philipp Sauter <philipp.sauter@siderolabs.com>
A node can have the same IPv6 address assigned twice:
* IPv6/64
* IPv6/128
In this case, the node advertises the same IP:PORT endpoint twice,
which adds time to creating/recovering a p2p (KubeSpan) connection.
Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
There's a cyclic dependency on the siderolink library, which imports Talos
machinery back. We will fix that after we get Talos pushed under a new
name.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This is the first step towards replacing all import paths to be based on
`siderolabs/` instead of `talos-systems/`.
All updates contain no functional changes, just refactorings to adapt to
the new path structure.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
It got broken by the changes to the kubelet, which now sources static pods
from an internal HTTP server.
As we don't want it to be broken, and to make health checks better, add
a new check to make sure kubelet reports control plane static pods as
running. This, coupled with the API server check, should make it more
thorough.
Also add logging when static pod definitions are updated (these logs
previously existed for the file-based implementation). These logs are very
helpful for troubleshooting.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This fixes a cosmetic issue of `machined` being the only service which
doesn't have a health check, and showing up as 'health unknown'.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
* support for bonding
* added interface selection by MAC address
* fixed a bug where network configuration from config-drive was not being
applied due to errors when discovering `hostname` and `extIPs` from
OpenStack API.
Signed-off-by: Maxim Makarov <maxpain177@gmail.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
When upgrading from an older version of Talos that uses the static pod manifests
directory to a new version that provides static pods via an internal web server,
we need to make sure that legacy static pods are cleaned up; otherwise
kubelet receives "two" versions of the static pods, which makes it fail
to run them.
The previous cleanup location wasn't working properly, as
`/etc/kubernetes/manifests` exists in the rootfs (and it's empty), while
the actual contents are in `/var`, and they appear only when the respective
overlay mount is done.
The controller tried to clean up on start, saw nothing (it was looking into
the rootfs), then moved on to its other functions. As a result, when the
overlay was mounted, the stale static pods were still there, while the
controller would only make another attempt after a failure, and the next
failure came only once kubelet was already running and had already picked
up those stale definitions.
Fix all of that by moving the cleanup into the sequencer, after the overlayfs mount.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Previously we were serving the static pod files on localhost, which
assumes DNS resolution. We change that to 127.0.0.1.
Signed-off-by: Philipp Sauter <philipp.sauter@siderolabs.com>
We add support for encryption with secretbox. While AESCBC is still
supported, secretbox will take precedence if both are configured.
Secretbox is not the default encryption for new clusters.
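The precedence rule maps naturally onto the provider list of a Kubernetes `EncryptionConfiguration`: Kubernetes encrypts new writes with the first provider and can decrypt with any listed provider. A minimal sketch, assuming a hypothetical `encryptionKeys` struct and assuming an `identity` provider is appended last so unencrypted data stays readable (the commit doesn't say whether it is):

```go
package main

import "fmt"

// encryptionKeys holds the configured key for each provider; an empty
// string means the provider is not configured. Hypothetical type.
type encryptionKeys struct {
	Secretbox string
	AESCBC    string
}

// providerOrder lists providers in EncryptionConfiguration order. Kubernetes
// encrypts new writes with the first provider and can decrypt with any of
// them, so putting secretbox first gives it precedence while data written
// with aescbc stays readable.
func providerOrder(keys encryptionKeys) []string {
	var order []string
	if keys.Secretbox != "" {
		order = append(order, "secretbox")
	}
	if keys.AESCBC != "" {
		order = append(order, "aescbc")
	}
	// Assumption: identity goes last, so data stored before encryption was
	// enabled can still be read.
	return append(order, "identity")
}

func main() {
	fmt.Println(providerOrder(encryptionKeys{Secretbox: "key-a", AESCBC: "key-b"}))
	fmt.Println(providerOrder(encryptionKeys{AESCBC: "key-b"}))
}
```

Keeping `aescbc` in the list after `secretbox` is what lets an existing cluster switch providers without first re-encrypting every secret.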
Fixes: #6362
Signed-off-by: Philipp Sauter <philipp.sauter@siderolabs.com>
Previously, static pod manifests were written to and read from a folder
on the disk. We add a controller that cleans up the default static pod
manifests on the disk and serves them as a PodList manifest via HTTP.
The URL of the manifest is injected into the kubelet. File-based static pod
manifests are still supported and may be enabled by setting the key
kubelet -> enableManifestsDirectory in the machine config.
Fixes #5494
Signed-off-by: Philipp Sauter <philipp.sauter@siderolabs.com>
I had to do several things:
- contextcheck now supports Go 1.18 generics, but I had to disable it because of https://github.com/kkHAIKE/contextcheck/issues/9
- dupword produces too many false positives, so it's also disabled
- revive found all packages which didn't have a documentation comment before. And there are A LOT of them. I updated some of them, but gave up at some point and just added the rest to the exclude rules for now.
- change lint-vulncheck to use the `base` stage as base
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
We have a high timeout for the node labeling retry, while the generated
temporary Kubernetes PKI has a lifetime of 10 minutes, which means that after
10 minutes any call would fail with `Unauthorized`.
Fix that by pulling the PKI generation into the retry loop; this way the
cert is always refreshed, and it also reacts to machine config changes
(e.g. if the endpoint got changed).
This also helps the retries pick up DNS updates and other changes
like that.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Kubernetes always lowercases whatever nodename is given to the kubelet,
so we should do the same; otherwise Talos looks for a `Node` with
uppercase letters, which is never going to be registered.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
KubeSpan can produce packets larger than the MTU of the external interface.
This PR adds the ability to change the MTU through the machine config,
and sets the MTU of the default KubeSpan route.
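To see why the tunnel MTU must be smaller than the link MTU, here is the WireGuard per-packet overhead arithmetic: outer IP header (20 bytes for IPv4, 40 for IPv6), UDP header (8), WireGuard data message header (16) and Poly1305 authentication tag (16). This is the arithmetic behind the commonly used WireGuard MTU of 1420 on a 1500-byte link; `tunnelMTU` is a hypothetical helper, and the MTU defaults Talos uses aren't stated here.

```go
package main

import "fmt"

// Per-packet overhead added when WireGuard encapsulates an inner packet.
const (
	ipv4Header      = 20
	ipv6Header      = 40
	udpHeader       = 8
	wireguardHeader = 16 // type, reserved, receiver index, counter
	poly1305Tag     = 16
)

// tunnelMTU returns the largest inner packet that fits in a single outer
// packet on a link with the given MTU.
func tunnelMTU(linkMTU int, outerIPv6 bool) int {
	ipHeader := ipv4Header
	if outerIPv6 {
		ipHeader = ipv6Header
	}
	return linkMTU - ipHeader - udpHeader - wireguardHeader - poly1305Tag
}

func main() {
	fmt.Println(tunnelMTU(1500, true))  // 1420
	fmt.Println(tunnelMTU(1500, false)) // 1440
	fmt.Println(tunnelMTU(1450, true))  // 1370
}
```

Sizing for the IPv6 outer header is the safe choice when peers may be reached over either address family.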
Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Containers created with `talosctl cluster create` are run with a read-only
filesystem. This more accurately mimics standard Talos.
Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
Introduce a new DHCP operator option to skip the hostname request/response,
and use it on the OpenStack platform.
OpenStack configures the interface with DHCP, while providing a dummy hostname
over DHCP and the proper hostname over metadata. As operators override
platform settings, the DHCP hostname takes precedence over the OpenStack
hostname. As a fix, ignore the DHCP hostname while on OpenStack.
Fixes #6350
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Example: set the interface IP address by MAC:
```
ip=172.20.0.2::172.20.0.1:255.255.255.0::enx001122aabbcc
```
The interface MAC is `00:11:22:aa:bb:cc`.
Source: https://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames/
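The `enx` naming scheme appends the 12 hex digits of the MAC address to `enx`, so the MAC can be recovered from the name. A minimal sketch; `macFromEnxName` is a hypothetical helper, not the Talos parser.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// macFromEnxName recovers the MAC address encoded in a predictable
// interface name of the form "enx" followed by 12 hex digits.
func macFromEnxName(name string) (net.HardwareAddr, error) {
	if !strings.HasPrefix(name, "enx") {
		return nil, fmt.Errorf("%q is not an enx<MAC> interface name", name)
	}
	digits := strings.TrimPrefix(name, "enx")
	if len(digits) != 12 {
		return nil, fmt.Errorf("%q: expected 12 hex digits, got %d", name, len(digits))
	}
	// Re-insert the colons and let net.ParseMAC validate the hex digits.
	pairs := make([]string, 0, 6)
	for i := 0; i < len(digits); i += 2 {
		pairs = append(pairs, digits[i:i+2])
	}
	return net.ParseMAC(strings.Join(pairs, ":"))
}

func main() {
	mac, err := macFromEnxName("enx001122aabbcc")
	if err != nil {
		panic(err)
	}
	fmt.Println(mac) // 00:11:22:aa:bb:cc
}
```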
Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This implements a simple way to upgrade a Talos node running in
maintenance mode (only if Talos is installed, i.e. if `STATE` and
`EPHEMERAL` partitions are wiped).
Upgrade is only available over SideroLink for security reasons.
Upgrade in maintenance mode doesn't support any options, and it works
without machine configuration, so proxy environment variables are not
available, registry mirrors can't be used, and extensions are not
installed.
Fixes #6224
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Add the `AuditPolicyConfigs.kubernetes.talos.dev` resource.
It can be changed through the machine config key `cluster.apiServer.auditPolicy`.
Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
We add a filter to the `talosctl get` command that allows users to
specify a jsonpath filter. Now they can reduce the information that is
printed to only the parts they are interested in.
Fixes #6109
Signed-off-by: Philipp Sauter <philipp.sauter@siderolabs.com>
This commit adds support for building Talos for the
Compute Module 4 and other generic Raspberry Pi
hardware.
Fixes: #6273
Signed-off-by: Kris Reeves <kris@pressbuttonllc.com>
Signed-off-by: Noel Georgi <git@frezbo.dev>
See #6333
Using the permanent address fixes issues with mismatched links after
they get bonded.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Permanent address is only available for physical links, and it might be
different from the 'hardware address': when bonding, 'hardware address'
gets overridden from the bond master, while 'permanent address' still
shows MAC of the interface.
This is part of the fix for the incorrect bonding issue on Equinix Metal.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes #6302
This allows Talos to proceed if some manifest is invalid (or malformed),
while aborting the loop on connection errors (when `kube-apiserver` is not
ready).
This fixes a problem where a single resource could stop all manifests
from being applied, preventing the cluster from bootstrapping.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This fixes an issue introduced in #5879: options should be set the same way
for both `init` and `controlplane` cases.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Don't allow worker nodes to act as apid routers:
* don't try to issue a client certificate for apid on worker nodes
* if a worker node receives incoming connections with `--nodes` set to
one of the node's local addresses, it routes the request to
itself without proxying
The second point allows using `talosctl -e worker -n worker` to connect
directly to the worker if the connection from the control plane is not
available for some reason.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Talos worker nodes use `trustd` API on control plane nodes to issue
certificates for `apid` service. Access to the API is protected with the
Talos join token specified in the machine configuration.
There was no validation of what kind of certificate was requested, so
`trustd` could issue a certificate which is valid for client
authentication with any set of Talos API RBAC roles, including
`os:admin` role allowing full access to the Talos API on control plane
nodes.
See: GHSA-7hgc-php5-77qq
CVE: CVE-2022-36103
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>