talos

Author	SHA1	Message	Date
Andrey Smirnov	66f3ffdd4a	fix: ensure that Talos runs in a pod (container) Drop the Kubernetes manifests as static files clean up (this is only needed for upgrades from 1.2.x). Fix Talos handling of cgroup hierarchy: if started in container in a non-root cgroup hierarchy, use that to handle proper cgroup paths. Add a test for a simple TinK mode (Talos-in-Kubernetes). Update the docs. Fixes #8274 Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2024-02-20 15:06:48 +04:00
Noel Georgi	9dbc33972a	feat: add basic syslog implementation Add a basic syslog listening on `/dev/log`. Fixes: #8087 Signed-off-by: Noel Georgi <git@frezbo.dev>	2024-02-20 15:02:06 +05:30
Utku Ozdemir	0b7a27e6a1	feat: allow access to all resources over siderolink in maintenance mode SideroLink is a secure channel, so we can allow read access to the resources. This will give us more control of the node via Omni and/or other systems using SideroLink. Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>	2024-02-16 16:39:11 +01:00
Andrey Smirnov	7ee999f8a3	fix: disable KubeSpan endpoint harvesting by default This disables by default (if not specified in the machine config) the endpoint harvesting for KubeSpan peers. The idea was to observe Wireguard endpoints as seen by other peers in the cluster, and add them to the list of endpoints for the node. This might be helpful only in case of some special type of NATs which are almost never seen in the wild today. So disable by default, but keep an option to enable it. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2024-02-16 18:18:33 +04:00
Utku Ozdemir	493bb60f81	fix: correctly handle partial configs in `DNSUpstreamController` Prevent `DNSUpstreamController` from panicking by checking if the `machine` section in the config is `nil`. This is the case when a machine has partial configuration, e.g., when the machine has only a `SideroLinkConfig` in its config. Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>	2024-02-16 10:31:54 +01:00
Andrey Smirnov	1366ce14a8	feat: update Kubernetes to v1.30.0-alpha.2 Talos Linux 1.7.0 will ship with Kubernetes v1.30.0. Drop some compatibility for Kubernetes < 1.25, as 1.25 is the minimum supported version now. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2024-02-15 21:56:56 +04:00
Noel Georgi	15e8bca2b2	feat: support environment in `ExtensionServicesConfig` Support setting extension services environment variables in `ExtensionServiceConfig` document. Refactor `ExtensionServicesConfig` -> `ExtensionServiceConfig` and move extensions config under `runtime` pkg. Fixes: #8271 Signed-off-by: Noel Georgi <git@frezbo.dev>	2024-02-15 20:16:29 +05:30
Matthieu S	3fe82ec461	feat: custom image settings for k8s upgrade Allows to use custom registry/images. Fixes: #8275 Co-authored-by: @g3offrey Signed-off-by: Matthieu STROHL <mstrohl@dive-in-it.com> Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2024-02-15 17:54:01 +04:00
Dmitriy Matrenichev	fa3b933705	chore: replace fmt.Errorf with errors.New where possible This time use `eg` from `x/tools` repo tool to do this. Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2024-02-14 17:39:30 +03:00
Noel Georgi	2f0421b406	fix: run xfs_repair on invalid argument error Run `xfs_repair` for invalid argument error. Part of: #8292 Signed-off-by: Noel Georgi <git@frezbo.dev>	2024-02-13 23:01:33 +05:30
Dmitriy Matrenichev	fa2d34dd88	chore: enable v6 support on the same port Replace `SO_REUSEPORT` with `SO_REUSEPORT`. Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2024-02-13 01:02:27 +03:00
Dmitriy Matrenichev	83e0b0c19a	chore: adjust dns sockets settings Enable some TCP optimization, set minimal TTL, set socket reuse. Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2024-02-12 17:13:03 +03:00
Dmitriy Matrenichev	5324d39167	chore: bump stuff Also fix .golangci.yml file. Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2024-02-09 19:19:25 +03:00
Dmitriy Matrenichev	afa71d6b02	chore: use "handle-like" resource in `DNSResolveCacheController` Rework (and simplify) `DNSResolveCacheController` to use `DNSUpstream` "handle-like" resources. Depends on https://github.com/cosi-project/runtime/pull/400 Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2024-02-08 21:40:57 +03:00
Andrey Smirnov	3f8a85f1b3	fix: unlock the upgrade mutex properly Fixes #4525 The previous implementation had several issues: * etcd concurrency session never closed * Unlock() with potentially closed context * unlocking when upgrade sequence finishes, but this overlaps with the machine reboot, so a chance that it never got unlocked Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2024-02-08 15:50:02 +04:00
Noel Georgi	1e6c8c4dec	feat: extensions services config Support config files for extension services. Fixes: #7791 Co-authored-by: Andrey Smirnov <andrey.smirnov@siderolabs.com> Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com> Signed-off-by: Noel Georgi <git@frezbo.dev>	2024-02-06 17:12:01 +05:30
shurkys	989ca3ade1	feat: add OpenNebula platform support Initial support without documentation. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com> Signed-off-by: shurkys <no@mail.com>	2024-02-05 20:43:47 +04:00
Henno Schooljan	a04cc80154	fix: pass TTL when generating client certificate Pass the TTL to the talosconfig generation function. Signed-off-by: Henno Schooljan <github@sfynx.nl> Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2024-02-05 18:54:16 +04:00
Dmitriy Matrenichev	3fe8c12ca6	fix: add log line about controller runtime failing While we decide what to do with #8263 and #8256 this quickfix at least allows us to see what went wrong Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2024-02-05 17:22:02 +03:00
Andrey Smirnov	ddbabc7e58	fix: use a separate cgroup for each extension service Fixes #8229 Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2024-02-05 17:37:55 +04:00
Saiyam Pathak	4184e617ab	chore: add test for wasmedge runtime extension Add tests for WasmEdge container runtime system extension. Signed-off-by: Saiyam Pathak <saiyam911@gmail.com> Signed-off-by: Noel Georgi <git@frezbo.dev>	2024-02-05 18:18:13 +05:30
Andrey Smirnov	95ea3a6c65	chore: bump timeout in acquire tests With switching to RSA service account, machine config generation time is considerably higher now, so the test might not make it in time. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2024-02-05 15:18:22 +04:00
Andrey Smirnov	2ff81c06bc	feat: update runc 1.1.12, containerd 1.7.13 Also: * Linux 6.6.14 + XDP enablement * etcd 3.5.12 Various other bumps for the tools, utilities, and Go modules. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2024-02-01 17:01:04 +04:00
Andrey Smirnov	9d8cd4d058	chore: drop deprecated method EtcdRemoveMember It was deprecated 16 months ago, time to cleanup. (This is to prepare for the first v1.7 release) Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2024-02-01 15:54:29 +04:00
Andrey Smirnov	17567f19be	fix: take into account the moment seen when cleaning up CRI images Fixes #8069 The image age from the CRI is the moment the image was pulled, so if it was pulled long time ago, the previous version would nuke the image as soon as it is unreferenced. The new version would allow the image to stay for the full grace period in case the rollback is requested. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2024-02-01 14:44:22 +04:00
Andrey Smirnov	593afeea38	fix: run the interactive installer loop to report errors In the previous implementation, even though `installer.err` was set, it was never checked 🤦. The run loop was stolen from the dashboard code. Fixes #8205 Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2024-01-31 19:20:46 +04:00
Andrey Smirnov	87be76b878	fix: be more tolerant to error handling in Mounts API Fixes #8202 If some mountpoint can't be queried successfully for 'diskfree' information, don't treat that as an error, and report zero values for disk usage/size instead. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2024-01-31 18:24:38 +04:00
Dmitriy Matrenichev	ebeef28525	feat: implement local caching dns server This PR adds a new controller - `DNSServerController` that starts tcp and udp dns servers locally. Just like `EtcFileController` it monitors `ResolverStatusType` and updates the list of destinations from there. Most of the caching logic is in our "lobotomized" `CoreDNS` fork. We need this fork because default `CoreDNS` carries full Caddy server and various other modules that we don't need in Talos. On our side we implement random selection of the actual dns and request forwarding. Closes #7693 Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2024-01-29 20:26:38 +03:00
Andrey Smirnov	b44551ccdb	feat: update Linux to 6.6.13 See https://github.com/siderolabs/pkgs/pull/873 Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2024-01-29 16:50:33 +04:00
Andrey Smirnov	d677901b67	feat: implement device selector for 'physical' Closes #8090 Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2024-01-23 15:05:51 +04:00
Andrey Smirnov	c1e45071f0	refactor: use etcd configuration from the EtcdSpec resource This is currently no-op, just noticed that while looking into another bug. This should make the intention more clean. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2024-01-22 16:06:16 +04:00
Andrey Smirnov	474eccdc4c	fix: watch buffer overrun for RouteStatus Fixes #8157 This PR contains two fixes, both related to the same problem. Several routes for different links but same IPv6 destination might exist at the same time, so route resource ID should handle that. The problem was that these routes were mis-reported, internally causing updates for the same resources multiple times (equal to the number of the links). Don't trigger controllers more often than 10 times/second (with burst of 5) for kernel notifications. This ensures Talos doesn't try to reflect current state of the network subsystem too often as resources, which causes excessive CPU usage and might potentially lead to the buffer overrun under high rate of changes. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2024-01-17 19:28:25 +04:00
Andrey Smirnov	9782319c31	fix: support KubePrism settings in Kubernetes Discovery Fixes #8143 Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2024-01-16 20:41:13 +04:00
Dmitriy Matrenichev	f70b47dddc	fix: force KubePrism to connect using IPv4 Before this change KubePrism used hardcoded "localhost" as destination which Go could resolve to IPv6 destination and then fail to connect to. This change forces KubePrism to connect using IPv4 and uses hardcoded "127.0.0.1" destination so it will always use IPv4. For #8112 Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2024-01-15 21:25:05 +03:00
Utku Ozdemir	7fa7362ddc	fix: fix nodes on dashboard footer when node names are used in `--nodes` When the dashboard is used via the CLI through a proxy, e.g., through Omni, node names or IDs can be used in the `--nodes` flag instead of the IPs. This caused rendering inconsistencies in the dashboard, as some parts of it used the IPs and some used the names passed in the context. Fix this by collecting all node IPs on dashboard start, and map these IPs to the respective nodes passed as the `--nodes` flag. On the dashboard footer, we always display the node names as they are passed in the `--nodes` flag. As part of it, remove the node list change reactivity from the dashboard, so it will always take the passed nodes as the truth. The IP to node mapping collection at dashboard startup also solves another issue where the first API call by the dashboard triggered the interactive API authentication (e.g., the OIDC flow). Previously, because the terminal was already switched to the raw mode, it was not possible to authenticate properly. Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>	2024-01-12 12:00:08 +01:00
Jonomir	dea9bda2d0	fix: disk UUID & WWID always empty in `talosctl disks` Add missing attributes to conversion of go-blockdevice disk to protobuf disk. Signed-off-by: Jonomir <68125495+Jonomir@users.noreply.github.com> Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2024-01-11 14:37:39 +04:00
Serge Logvinov	f6926faab5	fix: default priority for ipv6 We will use the default IPv6 gateway priority as 2048. The RA default is 1024, which leads to verbose messages such as 'error adding route: netlink receive: file exists.' Azure uses DHCPv6 and RA for configuring IPv6 on the node. The platform sets the default gateway as a fallback in case 'accept_ra' is not set to 2. Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev> Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-12-29 18:42:23 +04:00
Andrey Smirnov	0a30ef7845	fix: imager should support different Talos versions Add some quirks to make images generated with newer Talos compatible with images generated by older Talos. Specifically, reset options were added in Talos 1.4, so we shouldn't add them for older versions. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-12-22 16:13:34 +04:00
Andrey Smirnov	e6e422b92a	chore: bump dependencies Go modules, tools, etc. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-12-21 19:01:16 +04:00
Dmitriy Matrenichev	59b62398f6	chore: modernize machined/pkg/controllers/k8s This is going to be multipart effort to finally use safe.* wrappers in the production code. Quick regexp search shows that there are around 150 direct type assertions on resources (excluding the ones in this commit). Also - migrate from `interface{}` to `any` and use `slices.Sort` instead of `sort.` where possible. Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2023-12-15 19:33:06 +03:00
Andrey Smirnov	760f793d55	fix: use correct prefix when installing SBC files When creating an image under non-default mount prefix, it should be used explicitly when copying SBC files. See https://github.com/siderolabs/image-factory/issues/65 Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-12-15 19:46:10 +04:00
Noel Georgi	0b94550c42	chore: fix the gvisor test The gvisor test was not using the correct runtimeclass and would have always passed regardless. Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-12-15 20:48:44 +05:30
Andrey Smirnov	d803e40ef2	docs: provide documentation for Talos 1.6 Updated lots of documentation with new/updated flows. Provide What's New for Talos 1.6.0. Update Troubleshooting guide to cover more steps. Make Talos 1.6 docs the default. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-12-15 16:36:57 +04:00
Andrey Smirnov	10c59a6b90	fix: leave discovery service later in the reset sequence Fixes #8057 I went back and forth on the way to fix it exactly, and ended up with a pretty simple version of a fix. The problem was that discovery service was removing the member at the initial phase of reset, which actually still requires KubeSpan to be up: * leaving `etcd` (need to talk to other members) * stopping pods (might need to talk to Kubernetes API with some CNIs) Now leaving discovery service happens way later, when network interactions are no longer required. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-12-13 19:16:12 +04:00
Andrey Smirnov	131a1b1671	fix: add a KubeSpan option to disable extra endpoint harvesting It works well for small clusters, but with bigger clusters it puts too much load on the discovery service, as it has quadratic complexity in number of endpoints discovered/reported from each member. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-12-12 14:07:31 +04:00
Artem Chernyshev	4547ad9afa	feat: send `actor id` to the SideroLink events sink This might come handy to distinguish sequences, tasks initiated by a particular API request. Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>	2023-12-11 21:59:02 +03:00
Dmitriy Matrenichev	6bb1e99aa3	chore: optimize pcap dump Reimplement `gopacket.PacketSource.PacketsCtx` as `forEachPacket`. - Use `ZeroCopyPacketDataSource` instead of `PacketDataSource`. I didn't find any specific reason why `PacketDataSource` exists at all, since `NewPacket` is doing copy inside if you don't explicitly tell it not to. - Use `WillPool` to pool packet buffers. It doesn't fully remove allocations, but it's a safe start. Send packets back into the pool after we are done with them. - Pass `Packet` directly to the closure instead of waiting for it on the channel. We don't store this packet anywhere so there is no reason to async this part. - Drop `time.Sleep` code in `forEachPacket` body. - Drop `SnapLen` support in client and server since it didn't work anyway (details in the PR). Closes #7994 Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2023-12-11 15:44:42 +03:00
Andrey Smirnov	46121c9fec	docs: rework machine config documentation generation Generate a structured table of contents following the structure of the config. Make high-level examples follow the full structure of the config. Document new multi-doc machine config. Fixes #8023 Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-12-08 14:16:40 +04:00
Andrey Smirnov	270604bead	fix: support user disks via symlinks The core blockdevice library already supported resolving symlinks, we just need to get the raw block device name from it, and use it afterwards. In QEMU provisioner, leave the first (system) disk as virtio (for performance), and mount user disks as 'ata', which allows `udevd` to pick up the disk IDs (not available for `virtio`), and use the symlink path in the tests. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-12-05 22:02:56 +04:00
Andrey Smirnov	474fa0480d	fix: store and execute desired action on emergency action Fixes #7854 Talos runs an emergency handler if the sequence experiences an unrecoverable failure. The emergency handler was unconditionally executing "reboot" action if no other action was received (which only gets received if the sequence completes successfully), so the Shutdown request might result in a Reboot behavior on error during shutdown phase. This is not a pretty fix, but it's hard to deliver the intent from one part of the code to another right now, so instead use a global variable which stores default emergency intention, and gets overridden early in the Shutdown sequence. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-12-04 19:51:48 +04:00

2122 Commits