Drop the cleanup of Kubernetes manifests as static files (this is only
needed for upgrades from 1.2.x).
Fix Talos handling of cgroup hierarchy: if started in a container in a
non-root cgroup hierarchy, use that to build proper cgroup paths.
Add a test for a simple TinK mode (Talos-in-Kubernetes).
Update the docs.
Fixes #8274
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
SideroLink is a secure channel, so we can allow read access to the resources. This will give us more control of the node via Omni and/or other systems using SideroLink.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
This disables by default (if not specified in the machine config) the
endpoint harvesting for KubeSpan peers.
The idea was to observe WireGuard endpoints as seen by other peers in
the cluster, and add them to the list of endpoints for the node. This
might be helpful only for some special types of NAT, which are
almost never seen in the wild today.
So disable by default, but keep an option to enable it.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Prevent `DNSUpstreamController` from panicking by checking if the `machine` section in the config is `nil`. This is the case when a machine has partial configuration, e.g., when the machine has only a `SideroLinkConfig` in its config.
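
A minimal sketch of the kind of guard described above. The `Config` and `MachineConfig` types and the `upstreams` helper are hypothetical stand-ins, not the real Talos config API; the point is only that a partial config leaves the machine section `nil`, so it has to be checked before use.

```go
package main

import "fmt"

// MachineConfig and Config are hypothetical stand-ins for the real config types.
type MachineConfig struct {
	Nameservers []string
}

type Config struct {
	Machine *MachineConfig // nil when the machine has only a partial config
}

// upstreams returns the configured nameservers, or nil when the machine
// section is absent, instead of dereferencing a nil pointer.
func upstreams(cfg *Config) []string {
	if cfg == nil || cfg.Machine == nil {
		return nil
	}

	return cfg.Machine.Nameservers
}

func main() {
	partial := &Config{} // e.g. only a SideroLinkConfig document is present
	fmt.Println(len(upstreams(partial)))

	full := &Config{Machine: &MachineConfig{Nameservers: []string{"1.1.1.1"}}}
	fmt.Println(upstreams(full))
}
```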
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Talos Linux 1.7.0 will ship with Kubernetes v1.30.0.
Drop some compatibility for Kubernetes < 1.25, as 1.25 is the minimum
supported version now.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Fixes #4525
The previous implementation had several issues:
* etcd concurrency session never closed
* Unlock() with potentially closed context
* unlocking when the upgrade sequence finishes, but this overlaps with the
machine reboot, so there is a chance that it never gets unlocked
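
The first two issues in the list above can be illustrated with a small sketch. The `session` and `locker` interfaces and the fakes are hypothetical, loosely modelled on a lease-backed session and a distributed mutex; this is not the actual fix, only one way to make sure the session is always closed and `Unlock` never runs with a canceled context. The 5-second unlock timeout is an assumption.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// session and locker are hypothetical interfaces for this sketch.
type session interface{ Close() error }

type locker interface {
	Lock(ctx context.Context) error
	Unlock(ctx context.Context) error
}

// withLock runs fn while holding the lock and always releases lock and session.
func withLock(ctx context.Context, s session, l locker, fn func(context.Context) error) (err error) {
	// Always close the session, even when locking fails.
	defer func() { err = errors.Join(err, s.Close()) }()

	if lockErr := l.Lock(ctx); lockErr != nil {
		return lockErr
	}

	defer func() {
		// ctx may already be canceled here, so unlock with a fresh, bounded context.
		unlockCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		defer cancel()

		err = errors.Join(err, l.Unlock(unlockCtx))
	}()

	return fn(ctx)
}

type fakeSession struct{ closed bool }

func (s *fakeSession) Close() error { s.closed = true; return nil }

type fakeLock struct{ held bool }

func (l *fakeLock) Lock(ctx context.Context) error {
	if err := ctx.Err(); err != nil {
		return err
	}

	l.held = true

	return nil
}

func (l *fakeLock) Unlock(ctx context.Context) error {
	if err := ctx.Err(); err != nil {
		return err // a canceled context would leave the lock held
	}

	l.held = false

	return nil
}

// example cancels the caller's context while the lock is held.
func example() (held, closed bool, err error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	s, l := &fakeSession{}, &fakeLock{}
	err = withLock(ctx, s, l, func(context.Context) error {
		cancel()

		return nil
	})

	return l.held, s.closed, err
}

func main() {
	fmt.Println(example())
}
```

Even though the caller's context is canceled inside the callback, the lock is released and the session is closed.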
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
While we decide what to do with #8263 and #8256, this quick fix at least allows us to
see what went wrong.
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
After switching to an RSA service account, machine config generation time is
considerably higher now, so the test might not make it in time.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Also:
* Linux 6.6.14 + XDP enablement
* etcd 3.5.12
Various other bumps for the tools, utilities, and Go modules.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
It was deprecated 16 months ago, time to clean up.
(This is to prepare for the first v1.7 release)
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Fixes #8069
The image age from the CRI is the moment the image was pulled, so if it
was pulled a long time ago, the previous version would nuke the image as
soon as it became unreferenced. The new version allows the image to
stay for the full grace period in case a rollback is requested.
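
One way to get the behavior described above is to count the grace period from the moment an image is first seen unreferenced, rather than from its pull time. The sketch below assumes that approach; the `tracker` type, its method names, and the 4-hour grace period are hypothetical and not taken from the Talos code.

```go
package main

import (
	"fmt"
	"time"
)

// tracker remembers when each image was first seen unreferenced.
type tracker struct {
	grace          time.Duration
	unreferencedAt map[string]time.Time
}

func newTracker(grace time.Duration) *tracker {
	return &tracker{grace: grace, unreferencedAt: map[string]time.Time{}}
}

// shouldRemove reports whether an image may be deleted at time now.
func (t *tracker) shouldRemove(image string, referenced bool, now time.Time) bool {
	if referenced {
		delete(t.unreferencedAt, image) // in use again: reset the clock

		return false
	}

	since, seen := t.unreferencedAt[image]
	if !seen {
		t.unreferencedAt[image] = now // first time seen unreferenced

		return false
	}

	return now.Sub(since) >= t.grace
}

// example checks the same unreferenced image at 0h, 1h and 5h.
func example() []bool {
	start := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	t := newTracker(4 * time.Hour)

	return []bool{
		t.shouldRemove("old-image", false, start),
		t.shouldRemove("old-image", false, start.Add(time.Hour)),
		t.shouldRemove("old-image", false, start.Add(5*time.Hour)),
	}
}

func main() {
	fmt.Println(example())
}
```

The pull time never enters the decision, so an image pulled months ago still gets the full grace period after it stops being referenced.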
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
In the previous implementation, even though `installer.err` was set, it
was never checked 🤦.
The run loop was stolen from the dashboard code.
Fixes #8205
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Fixes #8202
If some mountpoint can't be queried successfully for 'diskfree'
information, don't treat that as an error, and report zero values for
disk usage/size instead.
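
A rough sketch of this behavior, with a hypothetical `diskStats` type and an injected `statFunc` standing in for the real query (which might wrap `statfs(2)`):

```go
package main

import (
	"errors"
	"fmt"
)

// diskStats is a hypothetical result type for this sketch.
type diskStats struct {
	Size, Available uint64
}

// statFunc queries a single mountpoint.
type statFunc func(mountpoint string) (diskStats, error)

// collect returns stats for every mountpoint; a failing query yields zero
// values for that mountpoint instead of aborting the whole call.
func collect(mountpoints []string, stat statFunc) map[string]diskStats {
	out := make(map[string]diskStats, len(mountpoints))

	for _, mp := range mountpoints {
		stats, err := stat(mp)
		if err != nil {
			stats = diskStats{} // report zeros, not an error
		}

		out[mp] = stats
	}

	return out
}

// fakeStat fails for one mountpoint to exercise the fallback.
func fakeStat(mp string) (diskStats, error) {
	if mp == "/broken" {
		return diskStats{}, errors.New("permission denied")
	}

	return diskStats{Size: 100, Available: 40}, nil
}

func main() {
	fmt.Println(collect([]string{"/", "/broken"}, fakeStat))
}
```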
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
This PR adds a new controller, `DNSServerController`, that starts TCP and UDP DNS servers locally. Just like `EtcFileController`, it monitors `ResolverStatusType` and updates the list of destinations from there.
Most of the caching logic is in our "lobotomized" `CoreDNS` fork. We need this fork because the default `CoreDNS` carries
a full Caddy server and various other modules that we don't need in Talos. On our side, we implement
random selection of the actual DNS server and request forwarding.
Closes #7693
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
This is currently a no-op; I just noticed it while looking into another
bug. This should make the intention clearer.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Fixes #8157
This PR contains two fixes, both related to the same problem.
Several routes for different links but same IPv6 destination might exist
at the same time, so route resource ID should handle that. The problem
was that these routes were misreported, causing internal updates for
the same resources multiple times (equal to the number of links).
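
To illustrate the first fix: if the resource ID is derived only from the destination, routes over different links collide. The `routeID` helper and its ID format are hypothetical, shown only to make the collision and its resolution concrete.

```go
package main

import (
	"fmt"
	"strings"
)

// routeID builds an illustrative resource ID. Including the outgoing link
// keeps routes with the same destination on different links distinct.
func routeID(destination, gateway, outLink string) string {
	return strings.Join([]string{outLink, gateway, destination}, "|")
}

func main() {
	// The same link-local prefix is typically present on every link.
	fmt.Println(routeID("fe80::/64", "", "eth0"))
	fmt.Println(routeID("fe80::/64", "", "eth1"))
}
```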
Don't trigger controllers more often than 10 times per second (with a burst of
5) for kernel notifications. This ensures Talos doesn't try to reflect
current state of the network subsystem too often as resources, which
causes excessive CPU usage and might potentially lead to the buffer
overrun under high rate of changes.
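
The "10 per second with a burst of 5" limit is a token bucket. The sketch below is a standard-library-only stand-in written for illustration; it is not the limiter used in Talos, and the `limiter` type and its methods are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// limiter is a minimal token bucket: tokens refill at rate per second,
// up to burst, and each allowed event consumes one token.
type limiter struct {
	rate   float64
	burst  float64
	tokens float64
	last   time.Time
}

func newLimiter(rate float64, burst int, now time.Time) *limiter {
	return &limiter{rate: rate, burst: float64(burst), tokens: float64(burst), last: now}
}

// allow reports whether a notification arriving at time now may trigger a run.
func (l *limiter) allow(now time.Time) bool {
	if elapsed := now.Sub(l.last).Seconds(); elapsed > 0 {
		l.tokens += elapsed * l.rate
		l.last = now
	}

	if l.tokens > l.burst {
		l.tokens = l.burst
	}

	if l.tokens < 1 {
		return false
	}

	l.tokens--

	return true
}

// example feeds 20 notifications at the same instant, then one 150ms later.
func example() (allowedInBurst int, allowedLater bool) {
	start := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	l := newLimiter(10, 5, start)

	for i := 0; i < 20; i++ {
		if l.allow(start) {
			allowedInBurst++
		}
	}

	return allowedInBurst, l.allow(start.Add(150 * time.Millisecond))
}

func main() {
	fmt.Println(example())
}
```

Of 20 simultaneous notifications only the burst of 5 gets through; the rest are dropped until tokens refill.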
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Before this change, KubePrism used a hardcoded "localhost" destination, which Go could resolve to an IPv6 address and
then fail to connect to. This change makes KubePrism use a hardcoded "127.0.0.1" destination, so
it always connects over IPv4.
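
In Go terms, the difference is between dialing `localhost` (subject to name resolution) and dialing a literal IPv4 address on the `tcp4` network. A small sketch; the `localEndpoint` helper and the port used are illustrative assumptions, not the actual KubePrism code.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// localEndpoint returns the network and address to dial. Using "tcp4" with
// a literal 127.0.0.1 avoids any chance of resolving to ::1.
func localEndpoint(port int) (network, address string) {
	return "tcp4", net.JoinHostPort("127.0.0.1", strconv.Itoa(port))
}

func main() {
	network, address := localEndpoint(7445)
	fmt.Println(network, address)

	// A caller would then connect with: net.Dial(network, address)
}
```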
For #8112
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
When the dashboard is used via the CLI through a proxy, e.g., through Omni, node names or IDs can be used in the `--nodes` flag instead of the IPs.
This caused rendering inconsistencies in the dashboard, as some parts of it used the IPs and some used the names passed in the context.
Fix this by collecting all node IPs on dashboard start and mapping these IPs to the respective nodes passed in the `--nodes` flag.
On the dashboard footer, we always display the node names as they are passed in the `--nodes` flag.
As part of it, remove the node list change reactivity from the dashboard, so it will always take the passed nodes as the truth.
The IP to node mapping collection at dashboard startup also solves another issue where the first API call by the dashboard triggered the interactive API authentication (e.g., the OIDC flow). Previously, because the terminal was already switched to the raw mode, it was not possible to authenticate properly.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Add missing attributes to conversion of go-blockdevice disk
to protobuf disk.
Signed-off-by: Jonomir <68125495+Jonomir@users.noreply.github.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
We will use the default IPv6 gateway priority as 2048.
The RA default is 1024, which leads to verbose messages such as 'error adding route: netlink receive: file exists.'
Azure uses DHCPv6 and RA for configuring IPv6 on the node.
The platform sets the default gateway as a fallback in case 'accept_ra' is not set to 2.
Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Add some quirks to make images generated with newer Talos compatible
with images generated by older Talos.
Specifically, reset options were added in Talos 1.4, so we shouldn't
add them for older versions.
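
A version quirk of this kind can be sketched as a small predicate on the target version. The `quirks` type, `parseQuirks`, and `supportsResetOption` are hypothetical names for illustration; only the 1.4 threshold comes from the text above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// quirks captures the target Talos version an image is generated for.
type quirks struct{ major, minor int }

// parseQuirks accepts versions such as "v1.3.5" or "1.4.0".
func parseQuirks(version string) (quirks, error) {
	parts := strings.SplitN(strings.TrimPrefix(version, "v"), ".", 3)
	if len(parts) < 2 {
		return quirks{}, fmt.Errorf("invalid version %q", version)
	}

	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return quirks{}, fmt.Errorf("invalid major in %q: %w", version, err)
	}

	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return quirks{}, fmt.Errorf("invalid minor in %q: %w", version, err)
	}

	return quirks{major: major, minor: minor}, nil
}

// supportsResetOption reports whether the target version understands
// reset options, which were added in Talos 1.4.
func (q quirks) supportsResetOption() bool {
	return q.major > 1 || (q.major == 1 && q.minor >= 4)
}

func main() {
	for _, v := range []string{"v1.3.5", "v1.4.0", "v1.6.2"} {
		q, err := parseQuirks(v)
		fmt.Println(v, q.supportsResetOption(), err)
	}
}
```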
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
This is going to be a multipart effort to finally use safe.* wrappers in the production code.
Quick regexp search shows that there are around 150 direct type assertions on resources (excluding the ones in this commit).
Also - migrate from `interface{}` to `any` and use `slices.Sort*` instead of `sort.*` where possible.
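
To show the general shape of these changes: a generic helper replaces a bare type assertion (which would panic on a mismatch), `any` replaces `interface{}`, and `slices.Sort` replaces `sort.Strings`. The `safeCast` helper below is a hypothetical illustration in the spirit of the safe.* wrappers, not their actual API; it needs Go 1.21 or newer for the `slices` package.

```go
package main

import (
	"fmt"
	"slices"
)

// safeCast returns an error instead of panicking when v is not a T.
func safeCast[T any](v any) (T, error) {
	typed, ok := v.(T)
	if !ok {
		var zero T

		return zero, fmt.Errorf("unexpected type %T, expected %T", v, zero)
	}

	return typed, nil
}

// sortedNames converts untyped values to strings and sorts them.
func sortedNames(values []any) ([]string, error) {
	names := make([]string, 0, len(values))

	for _, v := range values {
		name, err := safeCast[string](v)
		if err != nil {
			return nil, err
		}

		names = append(names, name)
	}

	slices.Sort(names) // previously: sort.Strings(names)

	return names, nil
}

func main() {
	fmt.Println(sortedNames([]any{"worker-2", "control-1"}))
	fmt.Println(sortedNames([]any{"worker-2", 42}))
}
```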
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
When creating an image under non-default mount prefix, it should be
used explicitly when copying SBC files.
See https://github.com/siderolabs/image-factory/issues/65
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Updated lots of documentation with new/updated flows.
Provide What's New for Talos 1.6.0.
Update Troubleshooting guide to cover more steps.
Make Talos 1.6 docs the default.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Fixes #8057
I went back and forth on the way to fix it exactly, and ended up with a
pretty simple version of a fix.
The problem was that discovery service was removing the member at the
initial phase of reset, which actually still requires KubeSpan to be up:
* leaving `etcd` (need to talk to other members)
* stopping pods (might need to talk to Kubernetes API with some CNIs)
Now leaving discovery service happens way later, when network
interactions are no longer required.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
It works well for small clusters, but with bigger clusters it puts too
much load on the discovery service, as it has quadratic complexity in
number of endpoints discovered/reported from each member.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
This might come in handy to distinguish sequences and tasks initiated by a
particular API request.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Reimplement `gopacket.PacketSource.PacketsCtx` as `forEachPacket`.
- Use `ZeroCopyPacketDataSource` instead of `PacketDataSource`. I didn't find any specific reason why `PacketDataSource` exists at all, since `NewPacket` makes a copy internally unless you explicitly tell it not to.
- Use `WillPool` to pool packet buffers. It doesn't fully remove allocations, but it's a safe start.
Send packets back into the pool after we are done with them.
- Pass `Packet` directly to the closure instead of waiting for it on the channel. We don't store this packet anywhere so there is no reason to async this part.
- Drop `time.Sleep` code in `forEachPacket` body.
- Drop `SnapLen` support in client and server since it didn't work anyway (details in the PR).
Closes #7994
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Generate a structured table of contents following the structure of the
config.
Make high-level examples follow the full structure of the config.
Document new multi-doc machine config.
Fixes #8023
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
The core blockdevice library already supported resolving symlinks, so we
just need to get the raw block device name from it and use it
afterwards.
In QEMU provisioner, leave the first (system) disk as virtio (for
performance), and mount user disks as 'ata', which allows `udevd` to
pick up the disk IDs (not available for `virtio`), and use the symlink
path in the tests.
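
As a rough illustration of resolving a disk symlink to its raw device name, here is a standard-library sketch. It does not use the blockdevice library mentioned above; `rawDeviceName` is a hypothetical helper, and the example builds a throwaway directory that mimics a `/dev/disk/by-id` style link.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// rawDeviceName resolves a (possibly symlinked) disk path and returns the
// base name of the device it points to.
func rawDeviceName(path string) (string, error) {
	resolved, err := filepath.EvalSymlinks(path)
	if err != nil {
		return "", err
	}

	return filepath.Base(resolved), nil
}

// example creates a fake device file and a by-id style symlink to it.
func example() (string, error) {
	dir, err := os.MkdirTemp("", "disks")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	device := filepath.Join(dir, "sda")
	if err := os.WriteFile(device, nil, 0o600); err != nil {
		return "", err
	}

	link := filepath.Join(dir, "ata-EXAMPLE_DISK")
	if err := os.Symlink(device, link); err != nil {
		return "", err
	}

	return rawDeviceName(link)
}

func main() {
	fmt.Println(example())
}
```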
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Fixes #7854
Talos runs an emergency handler if the sequence experiences an
unrecoverable failure. The emergency handler was unconditionally
executing "reboot" action if no other action was received (which only
gets received if the sequence completes successfully), so the Shutdown
request might result in a Reboot behavior on error during shutdown
phase.
This is not a pretty fix, but it's hard to deliver the intent from one
part of the code to another right now, so instead use a global variable
which stores default emergency intention, and gets overridden early in
the Shutdown sequence.
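
The approach can be sketched as follows. All names here (`action`, `defaultAction`, `emergencyAction`, `shutdownSequence`) are hypothetical and the real code differs; the sketch only shows a global default that the shutdown sequence overrides early, so a failure during shutdown no longer falls back to a reboot.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

type action int32

const (
	actionReboot action = iota
	actionShutdown
)

// defaultAction is the fallback used by the emergency handler when the
// sequence fails before delivering an action. Its zero value is actionReboot.
var defaultAction atomic.Int32

// emergencyAction picks the received action, or the global default if none arrived.
func emergencyAction(received *action) action {
	if received != nil {
		return *received
	}

	return action(defaultAction.Load())
}

// shutdownSequence overrides the default intention before doing any work.
func shutdownSequence(fail bool) (*action, error) {
	defaultAction.Store(int32(actionShutdown))

	if fail {
		return nil, errors.New("shutdown phase failed")
	}

	done := actionShutdown

	return &done, nil
}

func main() {
	received, err := shutdownSequence(true)
	fmt.Println(err, emergencyAction(received) == actionShutdown)
}
```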
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>