talos

Author	SHA1	Message	Date
Henno Schooljan	a04cc80154	fix: pass TTL when generating client certificate Pass the TTL to the talosconfig generation function. Signed-off-by: Henno Schooljan <github@sfynx.nl> Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2024-02-05 18:54:16 +04:00
Dmitriy Matrenichev	3fe8c12ca6	fix: add log line about controller runtime failing While we decide what to do with #8263 and #8256 this quickfix at least allows us to see what went wrong Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2024-02-05 17:22:02 +03:00
Andrey Smirnov	ddbabc7e58	fix: use a separate cgroup for each extension service Fixes #8229 Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2024-02-05 17:37:55 +04:00
Saiyam Pathak	4184e617ab	chore: add test for wasmedge runtime extension Add tests for WasmEdge container runtime system extension. Signed-off-by: Saiyam Pathak <saiyam911@gmail.com> Signed-off-by: Noel Georgi <git@frezbo.dev>	2024-02-05 18:18:13 +05:30
Andrey Smirnov	95ea3a6c65	chore: bump timeout in acquire tests With switching to RSA service account, machine config generation time is considerably higher now, so the test might not make it in time. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2024-02-05 15:18:22 +04:00
Andrey Smirnov	2ff81c06bc	feat: update runc 1.1.12, containerd 1.7.13 Also: * Linux 6.6.14 + XDP enablement * etcd 3.5.12 Various other bumps for the tools, utilities, and Go modules. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2024-02-01 17:01:04 +04:00
Andrey Smirnov	9d8cd4d058	chore: drop deprecated method EtcdRemoveMember It was deprecated 16 months ago, time to cleanup. (This is to prepare for the first v1.7 release) Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2024-02-01 15:54:29 +04:00
Andrey Smirnov	17567f19be	fix: take into account the moment seen when cleaning up CRI images Fixes #8069 The image age from the CRI is the moment the image was pulled, so if it was pulled a long time ago, the previous version would nuke the image as soon as it is unreferenced. The new version would allow the image to stay for the full grace period in case the rollback is requested. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2024-02-01 14:44:22 +04:00
Andrey Smirnov	593afeea38	fix: run the interactive installer loop to report errors In the previous implementation, even though `installer.err` was set, it was never checked 🤦. The run loop was stolen from the dashboard code. Fixes #8205 Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2024-01-31 19:20:46 +04:00
Andrey Smirnov	87be76b878	fix: be more tolerant to error handling in Mounts API Fixes #8202 If some mountpoint can't be queried successfully for 'diskfree' information, don't treat that as an error, and report zero values for disk usage/size instead. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2024-01-31 18:24:38 +04:00
Dmitriy Matrenichev	ebeef28525	feat: implement local caching dns server This PR adds a new controller - `DNSServerController` that starts tcp and udp dns servers locally. Just like `EtcFileController` it monitors `ResolverStatusType` and updates the list of destinations from there. Most of the caching logic is in our "lobotomized" `CoreDNS` fork. We need this fork because default `CoreDNS` carries full Caddy server and various other modules that we don't need in Talos. On our side we implement random selection of the actual dns and request forwarding. Closes #7693 Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2024-01-29 20:26:38 +03:00
Andrey Smirnov	b44551ccdb	feat: update Linux to 6.6.13 See https://github.com/siderolabs/pkgs/pull/873 Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2024-01-29 16:50:33 +04:00
Andrey Smirnov	d677901b67	feat: implement device selector for 'physical' Closes #8090 Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2024-01-23 15:05:51 +04:00
Andrey Smirnov	c1e45071f0	refactor: use etcd configuration from the EtcdSpec resource This is currently no-op, just noticed that while looking into another bug. This should make the intention more clean. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2024-01-22 16:06:16 +04:00
Andrey Smirnov	474eccdc4c	fix: watch buffer overrun for RouteStatus Fixes #8157 This PR contains two fixes, both related to the same problem. Several routes for different links but same IPv6 destination might exist at the same time, so route resource ID should handle that. The problem was that these routes were mis-reported, causing internal updates for the same resources multiple times (equal to the number of the links). Don't trigger controllers more often than 10 times/second (with burst of 5) for kernel notifications. This ensures Talos doesn't try to reflect current state of the network subsystem too often as resources, which causes excessive CPU usage and might potentially lead to the buffer overrun under high rate of changes. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2024-01-17 19:28:25 +04:00
Andrey Smirnov	9782319c31	fix: support KubePrism settings in Kubernetes Discovery Fixes #8143 Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2024-01-16 20:41:13 +04:00
Dmitriy Matrenichev	f70b47dddc	fix: force KubePrism to connect using IPv4 Before this change KubePrism used hardcoded "localhost" as destination which Go could resolve to IPv6 destination and then fail to connect to. This change forces KubePrism to connect using IPv4 and uses hardcoded "127.0.0.1" destination so it will always use IPv4. For #8112 Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2024-01-15 21:25:05 +03:00
Utku Ozdemir	7fa7362ddc	fix: fix nodes on dashboard footer when node names are used in `--nodes` When the dashboard is used via the CLI through a proxy, e.g., through Omni, node names or IDs can be used in the `--nodes` flag instead of the IPs. This caused rendering inconsistencies in the dashboard, as some parts of it used the IPs and some used the names passed in the context. Fix this by collecting all node IPs on dashboard start, and map these IPs to the respective nodes passed as the `--nodes` flag. On the dashboard footer, we always display the node names as they are passed in the `--nodes` flag. As part of it, remove the node list change reactivity from the dashboard, so it will always take the passed nodes as the truth. The IP to node mapping collection at dashboard startup also solves another issue where the first API call by the dashboard triggered the interactive API authentication (e.g., the OIDC flow). Previously, because the terminal was already switched to the raw mode, it was not possible to authenticate properly. Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>	2024-01-12 12:00:08 +01:00
Jonomir	dea9bda2d0	fix: disk UUID & WWID always empty in `talosctl disks` Add missing attributes to conversion of go-blockdevice disk to protobuf disk. Signed-off-by: Jonomir <68125495+Jonomir@users.noreply.github.com> Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2024-01-11 14:37:39 +04:00
Serge Logvinov	f6926faab5	fix: default priority for ipv6 We will use the default IPv6 gateway priority as 2048. The RA default is 1024, which leads to verbose messages such as 'error adding route: netlink receive: file exists.' Azure uses DHCPv6 and RA for configuring IPv6 on the node. The platform sets the default gateway as a fallback in case 'accept_ra' is not set to 2. Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev> Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-12-29 18:42:23 +04:00
Andrey Smirnov	0a30ef7845	fix: imager should support different Talos versions Add some quirks to make images generated with newer Talos compatible with images generated by older Talos. Specifically, reset options were added in Talos 1.4, so we shouldn't add them for older versions. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-12-22 16:13:34 +04:00
Andrey Smirnov	e6e422b92a	chore: bump dependencies Go modules, tools, etc. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-12-21 19:01:16 +04:00
Dmitriy Matrenichev	59b62398f6	chore: modernize machined/pkg/controllers/k8s This is going to be multipart effort to finally use safe.* wrappers in the production code. Quick regexp search shows that there are around 150 direct type assertions on resources (excluding the ones in this commit). Also - migrate from `interface{}` to `any` and use `slices.Sort` instead of `sort.` where possible. Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2023-12-15 19:33:06 +03:00
Andrey Smirnov	760f793d55	fix: use correct prefix when installing SBC files When creating an image under non-default mount prefix, it should be used explicitly when copying SBC files. See https://github.com/siderolabs/image-factory/issues/65 Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-12-15 19:46:10 +04:00
Noel Georgi	0b94550c42	chore: fix the gvisor test The gvisor test was not using the correct runtimeclass and would have always passed regardless. Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-12-15 20:48:44 +05:30
Andrey Smirnov	d803e40ef2	docs: provide documentation for Talos 1.6 Updated lots of documentation with new/updated flows. Provide What's New for Talos 1.6.0. Update Troubleshooting guide to cover more steps. Make Talos 1.6 docs the default. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-12-15 16:36:57 +04:00
Andrey Smirnov	10c59a6b90	fix: leave discovery service later in the reset sequence Fixes #8057 I went back and forth on the way to fix it exactly, and ended up with a pretty simple version of a fix. The problem was that discovery service was removing the member at the initial phase of reset, which actually still requires KubeSpan to be up: * leaving `etcd` (need to talk to other members) * stopping pods (might need to talk to Kubernetes API with some CNIs) Now leaving discovery service happens way later, when network interactions are no longer required. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-12-13 19:16:12 +04:00
Andrey Smirnov	131a1b1671	fix: add a KubeSpan option to disable extra endpoint harvesting It works well for small clusters, but with bigger clusters it puts too much load on the discovery service, as it has quadratic complexity in number of endpoints discovered/reported from each member. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-12-12 14:07:31 +04:00
Artem Chernyshev	4547ad9afa	feat: send `actor id` to the SideroLink events sink This might come in handy to distinguish sequences, tasks initiated by a particular API request. Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>	2023-12-11 21:59:02 +03:00
Dmitriy Matrenichev	6bb1e99aa3	chore: optimize pcap dump Reimplement `gopacket.PacketSource.PacketsCtx` as `forEachPacket`. - Use `ZeroCopyPacketDataSource` instead of `PacketDataSource`. I didn't find any specific reason why `PacketDataSource` exists at all, since `NewPacket` is doing copy inside if you don't explicitly tell it not to. - Use `WillPool` to pool packet buffers. It doesn't fully remove allocations, but it's a safe start. Send packets back into the pool after we are done with them. - Pass `Packet` directly to the closure instead of waiting for it on the channel. We don't store this packet anywhere so there is no reason to async this part. - Drop `time.Sleep` code in `forEachPacket` body. - Drop `SnapLen` support in client and server since it didn't work anyway (details in the PR). Closes #7994 Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2023-12-11 15:44:42 +03:00
Andrey Smirnov	46121c9fec	docs: rework machine config documentation generation Generate a structured table of contents following the structure of the config. Make high-level examples follow the full structure of the config. Document new multi-doc machine config. Fixes #8023 Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-12-08 14:16:40 +04:00
Andrey Smirnov	270604bead	fix: support user disks via symlinks The core blockdevice library already supported resolving symlinks, we just need to get the raw block device name from it, and use it afterwards. In QEMU provisioner, leave the first (system) disk as virtio (for performance), and mount user disks as 'ata', which allows `udevd` to pick up the disk IDs (not available for `virtio`), and use the symlink path in the tests. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-12-05 22:02:56 +04:00
Andrey Smirnov	474fa0480d	fix: store and execute desired action on emergency action Fixes #7854 Talos runs an emergency handler if the sequence experiences an unrecoverable failure. The emergency handler was unconditionally executing "reboot" action if no other action was received (which only gets received if the sequence completes successfully), so the Shutdown request might result in a Reboot behavior on error during shutdown phase. This is not a pretty fix, but it's hard to deliver the intent from one part of the code to another right now, so instead use a global variable which stores default emergency intention, and gets overridden early in the Shutdown sequence. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-12-04 19:51:48 +04:00
Andrey Smirnov	dbf274ddf7	fix: skip writing the file if the contents haven't changed As the controller reconciles every /etc file present, it might be called multiple times for the same file, even if the actual contents haven't changed. Rewriting the file might lead to some concurrent process seeing incomplete file contents more often than needed. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-12-04 15:58:03 +04:00
Andrey Smirnov	d8a435f0e4	fix: initialize boot assets with defaults early The problem was that bootloaders were correctly picking up defaults for `installer` mode (vs. `imager` mode), but DTB and other SBC stuff wasn't properly initialized, so installing on SBC fails. Now all options are properly initialized with defaults early in the process. Fixes #8009 Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-12-01 17:47:05 +04:00
Andrey Smirnov	c6835de17a	fix: pick etcd advertised addresses from 'current' addresses Fixes #7947 This way etcd advertised address can be picked from the `external IPs` of the machine. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-12-01 17:26:28 +04:00
Andrey Smirnov	e71e3e4161	feat: support extra arguments for `flanneld` Fixes #7754 Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-12-01 16:18:02 +04:00
Andrey Smirnov	36c8ddb5e1	feat: implement ingress firewall rules Fixes #4421 See documentation for details on how to use the feature. With `talosctl cluster create`, firewall can be easily tested with `--with-firewall=accept|block` (default mode). Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-11-30 22:58:16 +04:00
Dmitriy Matrenichev	0b111ecb81	fix: support slices of enums and fix NfTablesConntrackStateMatch We already have the code which supports custom enums, so let's extend it to support custom enums in slices and fix the NfTablesConntrackStateMatch proto definition. Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2023-11-30 00:23:16 +03:00
Andrey Smirnov	9a85217412	feat: improve nftables backend Many changes to the nftables backend which will be used in the follow-up PR with #4421. 1. Add support for chain policy: drop/accept. 2. Properly handle match on all IPs in the set (`0.0.0.0/0` like). 3. Implement conntrack state matching. 4. Implement multiple ifname matching in a single rule. 5. Implement anonymous counters. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-11-29 21:22:47 +04:00
Noel Georgi	f041b26299	chore: add tests for mdadm extension Add tests for mdadm extension. See: https://github.com/siderolabs/extensions/pull/271 Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-11-27 23:18:35 +05:30
Andrey Smirnov	e46e6a312f	feat: implement nftables backend Implement initial set of backend controllers/resources to handle nftables chains/rules etc. Replace the KubeSpan nftables operations with controller-based. See #4421 Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-11-27 21:14:15 +04:00
Dmitriy Matrenichev	ba827bf8b8	chore: support getting multiple endpoints from the `Provision` rpc call The code will rotate through the endpoints, until it reaches the end, and only then it will try to do the provisioning again. Closes #7973 Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2023-11-25 21:38:44 +03:00
Dmitriy Matrenichev	dd45dd06cf	chore: add custom node taints This PR adds support for custom node taints. Refer to `nodeTaints` in the `configuration` for more information. Closes #7581 Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2023-11-25 18:33:18 +03:00
Dmitriy Matrenichev	70d53ee13c	chore: deprecate .persist and .extensions This commit deprecates those things: - Removes the support of `.persist` flag. From now, it should always be enabled or not defined in the config. - Removes the documentation for `.bootloader`. It never worked anyway. - Adds a warning for `.machine.install.extensions`, suggests to use boot-assets. Closes #7972 Closes #7507 Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2023-11-22 20:35:38 +03:00
Noel Georgi	aca8b5e179	fix: ignore kernel command line in container mode Ignore kernel command line for `SideroLink` and `EventsSink` config when running in container mode. Otherwise, when running Talos as a docker container in Talos, it picks up the host kernel cmdline and tries to configure SideroLink/EventsSink. Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-11-21 18:55:37 +05:30
Andrey Smirnov	27d208c26b	feat: implement OAuth2 device flow for machine config Fixes #7939 See documentation in the PR for the description of the feature. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-11-20 14:31:43 +04:00
Noel Georgi	5c8fa2a803	chore: start containerd early in boot Start containerd early in the boot process so system extension services start in maintenance mode. Fixes: #7083 Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-11-16 23:19:33 +05:30
Noel Georgi	0d3c3ed716	feat: support kube scheduler config Support kube-scheduler config. Fixes: #7905 Partially fixes: #7911 Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-11-15 10:15:23 +05:30
Andrey Smirnov	06941b7e5c	fix: allow rootfs propagation configuration for extension services Fixes #7873 Some services which perform mounts inside the container which require mounts to propagate back to the host (e.g. `stargz-snapshotter`) require this configuration setting. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-11-13 21:58:22 +04:00

1955 Commits