talos

Author	SHA1	Message	Date
Noel Georgi	3d2dad4e69	chore: show securitystate on dashboard Show Talos SecurityState and MountStatus on dashboard. Fixes: #7675 Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-09-06 21:46:25 +05:30
Noel Georgi	1eebbce357	chore: add output flag for talosctl config info Add output flag for `talosctl config info`. This allows gathering endpoints programmatically for CI tests. E.g.: ```bash _out/talosctl-linux-amd64 config info --output json | jq '.Contexts[].Endpoints[0]' ``` Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-09-05 21:25:21 +04:00
Noel Georgi	3fbed806c4	chore: add tests for util-linux extensions Add tests for util-linux extensions. Ref: https://github.com/siderolabs/extensions/pull/216 Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-09-05 19:29:50 +05:30
Andrey Smirnov	6058c36023	fix: shorten VLAN link names to fit into the limit of 15 characters Fixes #7679 This should be no-op if the link name is <= 10 chars, but with predictable interface names based on MAC addresses, they have to be shortened to make some space for VLAN ID. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-09-05 14:51:09 +04:00
Andrey Smirnov	9c2f765c86	fix: allow network device selector to match multiple links Fixes #7673 Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-09-04 20:37:04 +04:00
Andrey Smirnov	d91b5b3a31	feat: set environment variables early in the boot Fixes #7696 This allows to set env variables from `talos.environment=` command line arg. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-09-04 15:12:10 +04:00
Andrey Smirnov	c918c0855d	fix: set correct (1 year) talosconfig expiration Fixes #7698 Also fix `talosctl config info` for `talosconfig` without a client certificate (e.g. Omni-generated one). Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-09-04 14:46:28 +04:00
Andrey Smirnov	79bbdf454e	fix: set proper timeouts for KubePrism loadbalancer The default timeouts are very aggressive, and we should use explicit timeouts so that health checks don't run that often. Fixes #7690 Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-09-01 00:16:09 +04:00
Andrey Smirnov	b8fb55d5c2	fix: use a mount prefix when installing a bootloader This is not a problem in general, but when running multiple image generation procedures using the same mount point is a problem. This is a no-op if `MountPrefix` is not set (when installing/upgrading vs. creating an image). Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-08-31 22:21:41 +04:00
Andrey Smirnov	2d3ac925ea	refactor: update NTP spike detector See https://github.com/siderolabs/talos/issues/7080#issuecomment-1696105986 The NTP spike detector code was refactored out of the main NTP code so that it can be unit-tested. I dropped one check which I think is causing false-positives in the spike detector (when NTP offset is higher than the RTT of the best packet received so far). The overall flow resembles the one in systemd-timesync, the current implementation has this check: `6639ac474e/src/timesync/timesyncd-manager.c (L357-L360)` This check was introduced in the initial release, after some refactoring: `3dbc762003 (diff-4aa9995f07bb31b9884d40a7634f5f6d30245dfd26ac27b89cd5fd3bd4eef56aR429-R431)` There is no equivalent of it in the RFC: https://datatracker.ietf.org/doc/html/rfc5905#appendix-A.5.2 Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-08-29 20:56:42 +04:00
Noel Georgi	d03dc7a8af	chore: validate new system extensions Validate the amdgpu and intel-ice firmware extensions. Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-08-25 17:18:19 +05:30
Andrey Smirnov	3c9f7a7de6	chore: re-enable nolintlint and typecheck linters Drop startup/rand.go, as since Go 1.20 `rand.Seed` is done automatically. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-08-25 01:05:41 +04:00
Andrey Smirnov	8670450d28	release(v1.6.0-alpha.0): prepare release This is the official v1.6.0-alpha.0 release. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-08-24 17:09:34 +04:00
Noel Georgi	6778ded29d	feat: add e2e-aws for nvidia extensions Add e2e tests for nvidia Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-08-24 17:43:36 +05:30
Andrey Smirnov	74c07ed714	chore: update Go to 1.21 This fixes a problem in the `RouteSpecController` which is due to a subtle (but correct) change in the behavior in the `stdlib`. Also some small (but should be safe) bumps. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-08-23 22:52:04 +04:00
Andrey Smirnov	c0ea4d7ba5	fix: properly calculate overlap of node address with subnet filters Example: host has address `10.0.0.1/8`, while Kubernetes pod CIDR is `10.244.0.0/16`. These two subnets overlap, but the address `10.0.0.1` isn't contained in the `10.244.0.0/16` subnet. This change fixes the check to make sure address is not contained vs. the address subnet overlaps with the filter. NB: this is still a bad idea to have host network subnet to overlap with Kubernetes pod/service CIDRs. Also refactor the unit-tests to use new (better) ways to do assertions. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-08-23 21:16:58 +04:00
Noel Georgi	d6b2719e2e	chore: drone: move extensions step to a function Move drone extensions integration to a function. This allows us to re-use the code and just depend on a single step rather than explicitly defining all dependencies. Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-08-23 20:56:43 +05:30
Andrey Smirnov	9608ef56dc	chore: allow bridge traffic with DHCP broadcast traffic This is required for https://github.com/siderolabs/sidero/pull/1070, as we need to allow DHCP traffic from Sidero controller running in a VM through the bridge to other VMs. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-08-23 18:37:37 +04:00
Noel Georgi	833895940b	chore: add tests for zfs extension Add tests for ZFS and btrfs extensions. Also fix the e2e-aws cron pipeline. Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-08-23 11:16:25 +05:30
Utku Ozdemir	ea0d6e8c6a	fix: prevent dashboard crashes when process info is not available Processes and their info are not guaranteed to be present on the api-based data gathered by the dashboard. Therefore, we switch to using nil-safe access to the CPU time when rendering the process table. Closes siderolabs/talos#7645. Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>	2023-08-22 12:55:36 +02:00
Andrey Smirnov	e9077a6fb9	feat: filter the hostname to produce nodename Fixes #7615 This extends the previous handling when Talos did `ToLower()` on the hostname to do the full filtering as expected. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-08-22 12:41:57 +04:00
Andrey Smirnov	dc8361c1d5	fix: properly GC images supplied with both tag and digest This is a follow-up fix for #7640 I noticed that image cleanup controller cleans up the images if specified with both tag and digest. The problem was incorrectly building image references in the expected set of images, so they were incorrectly marked as unused. Refactor the code to make the core part testable. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-08-21 21:04:24 +04:00
Andrey Smirnov	b56e8b7d9b	fix: support 'List' type manifests Fixes #7636 This supports `List`-type manifests by unwrapping them into individual objects. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-08-21 16:48:37 +04:00
Andrey Smirnov	574d48e540	fix: use image digest when starting a container First of all, it seems to be "right way", as it makes sure the image is looked up by the digest. Second, it fixes the case when image is specified with both tag and digest (which is not supposed to be the correct ref, but it is used frequently). Talos since 1.5.0 stores images with the following aliases: ``` gcr.io/etcd-development/etcd:v3.5.9 gcr.io/etcd-development/etcd@sha256:8c956d9b0d39745fa574bb4dbacd362ffdc1109479432f54094859d4cf984b17 ghcr.io/siderolabs/kubelet:v1.28.0 ghcr.io/siderolabs/kubelet@sha256:50710f2cd3328c23f57dfc7fb00940d8cfd402315e33fc7cb8184fc660650a5c sha256:50710f2cd3328c23f57dfc7fb00940d8cfd402315e33fc7cb8184fc660650a5c sha256:8c956d9b0d39745fa574bb4dbacd362ffdc1109479432f54094859d4cf984b17 ``` This change pulls the digest format (the last in this list) and uses it to start a container. Fixes #7640 Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-08-21 15:48:59 +04:00
Noel Georgi	6b0373ebef	chore: move bash tests to integration move extensions and secureboot tests to integration. Makes it easier to test. Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-08-17 19:58:35 +05:30
Andrey Smirnov	86c94eff8d	refactor: docgen and config examples Short version is: move from global variables/`init()` function into explicit functions. `docgen` was updated to skip creating any top-level global variables, now `Doc` information is generated on the fly when it is accessed. Talos itself doesn't marshal the configuration often, so in general it should never be accessed for Talos (but will be accessed e.g. for `talosctl`). Machine config examples were changed manually from variables to functions returning a value and moved to a separate file. There are no changes to the output of `talosctl gen config`. There is a small change to the generated documentation, which I believe is a correct one, as previously due to value reuse it was clobbered with other data. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-08-10 14:56:01 +04:00
Andrey Smirnov	ee6d639f6c	fix: match routes on the priority properly Fixes #7592 The problem was a mismatch between a "primary key" (ID) of the `RouteSpec` and the way routes are looked up in the kernel - with two identical routes but different priority Talos would end up in an infinite loop fighting to remove and re-add the same route, as priority never matches. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>	2023-08-10 14:29:47 +04:00
Dmitriy Matrenichev	c4a1ca8d61	chore: remove <-errCh where possible in grpc methods Simplify code by passing error directly into the pipe closer. Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2023-08-07 22:28:58 +03:00
Andrey Smirnov	e0f383598e	chore: clean up the output of the `imager` Use `Progress`, and options to pass around the way messages are written. Fixed some tiny issues in the code, but otherwise no functional changes. To make colored output work with `docker run`, switched back image generation to use volume mount for output (old mode is still functioning, but it's not the default, and it works when docker is not running on the same host). Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-08-07 16:00:14 +04:00
Andrey Smirnov	fb536af4d1	chore: optimize memory usage of `tcell` library on init There are two changes here: * build `machined` binary with `tcell_minimal` tag (which disables loading some parts of the terminfo database), which also affects `apid`, `trustd` and `dashboard` processes, as they run from the same executable; in `dashboard` explicitly import `linux` terminal we're using when the `dashboard` runs on the machine * pass `TCELL_MINIMIZE=1` environment variable to each Talos process which removes 0.5MiB of runewidth allocation for a lookup table See #7578 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-08-04 17:59:18 +04:00
Artem Chernyshev	7d688ccfeb	fix: make encryption config provider default to `luks2` if not set Fixes: https://github.com/siderolabs/talos/issues/7515 Rename `Kind` to `Provider` in the `v1alpha1_provider`. Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>	2023-08-04 12:20:55 +03:00
Dmitriy Matrenichev	80238a05a6	chore: unify semver under `github.com/blang/semver/v4` Currently, we use `github.com/coreos/go-semver/semver` and `github.com/hashicorp/go-version` for version parsing. As we use `github.com/blang/semver/v4` in our other projects, and it has more features, it makes sense to use it across the projects. It also doesn't allocate like crazy in `KubernetesVersion.SupportedWith`. Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2023-08-04 00:29:52 +03:00
Andrey Smirnov	0f1920bdda	chore: provide a resource to peek into Linux clock adjustments This is a follow-up for #7567, which won't be backported to 1.5. This allows to get an output like: ``` $ talosctl -n 172.20.0.5 get adjtimestatus -w NODE * NAMESPACE TYPE ID VERSION OFFSET ESTERROR MAXERROR STATUS SYNC 172.20.0.5 + runtime AdjtimeStatus node 47 -18.14306ms 0s 191.5ms STA_PLL | STA_NANO true 172.20.0.5 runtime AdjtimeStatus node 48 -17.109555ms 0s 206.5ms STA_NANO | STA_PLL true 172.20.0.5 runtime AdjtimeStatus node 49 -16.134923ms 0s 221.5ms STA_NANO | STA_PLL true 172.20.0.5 runtime AdjtimeStatus node 50 -15.21581ms 0s 236.5ms STA_PLL | STA_NANO true ``` Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-08-03 22:06:53 +04:00
Andrey Smirnov	4eab3017b0	fix: calculate log2i properly Fixes #7080 The real bug was off-by-one in `log2i` implementation, other changes are cleanups as `x/sys/unix` package now contains all the constants we need. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-08-03 21:17:58 +04:00
Jared Davenport	bcf2845307	fix: update providerid prefix for aws This PR updates the ProviderID format for aws resources. There seems to be a bug when using Talos CCM (which consumes this value from Talos) because the format is `aws://x/y` (two slashes) vs. the expected `aws:///x/y` (three slashes) that is set with the AWS CCM code [here](`d055109367/pkg/providers/v1/instances.go (L47-L53)`). Setting only two slashes causes important software in the workload cluster to fail, specifically cluster-autoscaler. The regex they use for pulling providerID is [here](`702e9685d6/cluster-autoscaler/cloudprovider/aws/aws_cloud_provider.go (L195)`). Signed-off-by: Spencer Smith <spencer.smith@talos-systems.com>	2023-08-03 10:21:56 -04:00
Andrey Smirnov	793dcedc95	fix: fast-wipe the system disk on talosctl reset Fixes #7558 I see no reason to keep old behavior (removing all partitions on the disk), as it's only compatible with Talos itself. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-08-03 16:28:59 +04:00
Andrey Smirnov	87fe8f1a2a	feat: implement image generation profiles Support full configuration for image generation, including image outputs, support most features (where applicable) for all image output types, unify image generation process. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-08-02 19:13:44 +04:00
Andrei Kvapil	10f958cf41	feat: network configuration improvements on the NoCloud platform * support for bonding * added interface selection by MAC address * added routes management Signed-off-by: Andrei Kvapil <kvapss@gmail.com> Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-08-02 15:03:33 +04:00
Dmitriy Matrenichev	abf3831174	chore: remove `cpu_manager_state` on `cpuManagerPolicy` change After we closed `kubelet`, remove `/var/lib/kubelet/cpu_manager_state` if there are any changes in `cpuManagerPolicy`. We do not add any other safeguards, so it's user responsibility to cordon/drain the node in advance. Also minor fixes in other files. Closes #7504 Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2023-08-01 18:53:04 +03:00
Noel Georgi	68e6b98f7d	feat: add security state resource Add security state resource that describes the state of Talos SecureBoot and PCR signing key fingerprints. The UKI fingerprint is currently not populated. Fixes: #7514 Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-07-31 22:02:08 +05:30
Andrey Smirnov	a17272cdda	chore: update hcloud API SDK to v2 There are no functional changes, but SDK got updated to handle int -> int64 changes. v1 version is only supported to Sep 2023. See https://github.com/hetznercloud/hcloud-go#support Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-07-28 19:00:10 +04:00
Andrey Smirnov	6d71bb8df2	refactor: replace google/gopacket with gopacket/gopacket This new fork seems to be more active. The change itself doesn't fix any memory allocation, but I submitted a PR for gopacket/gopacket: https://github.com/gopacket/gopacket/pull/24 Also fix crazy alloc in `tui/components` (this is only relevant for `talosctl`). Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-07-28 17:34:15 +04:00
Andrey Smirnov	846f37d84c	refactor: drop dependency on vmware/govmomi This module was imported just for a single Go struct (for XML unmarshalling), and it could be easily internalized. The module causes significant allocation on startup: ``` init github.com/vmware/govmomi/vim25/types @23 ms, 1.4 ms clock, 1269864 bytes, 196 allocs ``` Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-07-28 16:49:34 +04:00
Andrey Smirnov	ca0b32c514	refactor: update AWS SDK and http-getter to v2 versions Both are much more modular and pull far fewer dependencies into the Talos tree. This solves the problem with allocations in AWS endpoints on import, and removes a bunch of dependencies. Raw binary size: -10 MiB. Memory usage (not scientific): around -5 MiB for all Talos services. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-07-28 15:30:02 +04:00
Noel Georgi	a3a2aa8ef3	fix: use fast wipe for upgrade As part of bootloader refactoring `go-blockdevice` was used for wiping partitions in #7329, but used standard wipe which could be fast/slow depending on the blockdevice support. Switch to using fast-wipe for partitions. This should not affect `wipe` option in machineconfig. Fixes: #7531 Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-07-27 21:09:06 +05:30
Utku Ozdemir	355681ddab	fix: terminate dashboard gracefully on & switch back to tty1 - Make dashboard SIGTERM-aware - Handle panics on dashboard and terminate it gracefully, so it resets the terminal properly - Switch to TTY2 when it starts and back to TTY1 when it stops. Closes siderolabs/talos#7516. Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>	2023-07-27 16:00:23 +02:00
Andrey Smirnov	544cb4fe7d	refactor: accept partial machine configuration This refactors code to handle partial machine config - only multi-doc without v1alpha1 config. This uses improvements from https://github.com/cosi-project/runtime/pull/300: * where possible, use `TransformController` * use integrated tracker to reduce boilerplate Sometimes fix/rewrite tests where applicable. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-07-27 17:00:42 +04:00
Andrey Smirnov	786e86f5b8	refactor: rewrite the way Talos acquires the machine configuration Fixes #7453 The goal is to make it possible to load some multi-doc configuration from the platform source (or persisted in STATE) before machine acquires full configuration. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-07-24 14:26:42 +04:00
Andrey Smirnov	9ef4e5efca	fix: log explicitly when kubelet has no nodeIP match Fixes #7487 When `.kubelet.nodeIP` filters yield no match, Talos should not start the kubelet, as using empty address list results in `--node-ip=` empty kubelet arg, which makes kubelet pick up "the first" address. Instead, skip updating (creating) the nodeIP and log an explicit warning. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-07-20 00:41:47 +04:00
Andrey Smirnov	6b39c6a4d3	fix: enable compression and bump gRPC max msg size Fixes #7482 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-07-19 22:46:37 +04:00
Noel Georgi	2f2eca8617	chore: basic support for shutdown/poweroff flags This adds basic support for shutdown/poweroff flags. It can distinguish between halt/shutdown/reboot. In the case of Talos, halt/shutdown is the same op. Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-07-19 23:35:32 +05:30
Noel Georgi	59d7d9344b	chore: use machined for `shutdown`, `poweroff` Use the `machined` socket for `shutdown` and `poweroff` aliases. This ensures that worker nodes do not have to wait on apid to start. Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-07-19 21:48:15 +05:30
Dmitriy Matrenichev	2439bfb719	chore: explicitly add timestamps to machined logs We can safely do it on `io.Writer` level, since `log.Logger.Output` (called by `Print|Printf`) pretty much promises that every call to `Write` ends with `\n`. Closes #7439 Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2023-07-19 18:29:17 +03:00
Noel Georgi	14966e718a	fix: skip over tpm2 1.2 devices For rng seed and pcr extend, let's ignore if the device is not TPM2.0 based. Seal/Unseal operations would still error out since it's explicitly user enabled feature. Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-07-18 12:58:45 +05:30
Noel Georgi	166d75fe88	fix: tpm2 encrypt/decrypt flow The previous flow was using TPM PCR 11 values to bound the policy which means TPM cannot unseal when UKI changes. Now it's fixed to use PCR 7 which is bound to the SecureBoot state (SecureBoot status and Certificates). This provides a full chain of trust bound to SecureBoot state and signed PCR signature. Also the code has been refactored to use PolicyCalculator from the TPM library. Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-07-14 23:58:59 +05:30
Dmitriy Matrenichev	5f34f5b41f	chore: rename api load balancer to KubePrism Closes #7432 Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2023-07-14 15:23:53 +03:00
Andrey Smirnov	c8b7095c01	refactor: use tpm2 library to calculate policy hash No real change, just using library to do the work (should be more readable). Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-07-14 15:07:47 +04:00
Andrey Smirnov	53873b8444	refactor: move ukify into Talos code This is an intermediate step to move parts of the `ukify` down to the main Talos source tree, and call it from `talosctl` binary. The next step will be to integrate it into the imager and move `.uki` build out of the Dockerfile. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-07-13 19:14:32 +04:00
Noel Georgi	79365d9bac	feat: tpm2 based disk encryption Support disk encryption using tpm2 and pre-calculated signed PCR values. Fixes: #7266 Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-07-12 20:41:28 +05:30
Andrey Smirnov	06369e8195	fix: retry CRI pod removal, fix upgrade flow in the tests It seems that CRI has a bit of eventual consistency, and it might fail to remove a stopped pod failing that it's still running. Rewrite the upgrade API call in the upgrade test to actually wait for the upgrade to be successful, and fail immediately if it's not successful. This should improve the test stability and it should make it easier to find issues immediately. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-07-12 16:20:10 +04:00
Andrey Smirnov	8017afb107	feat: implement CRI image management and pre-pull on K8s upgrade Fixes #6391 Implement a set of APIs and commands to manage images in the CRI, and pre-pull images on Kubernetes upgrades. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-07-11 19:25:10 +04:00
Andrey Smirnov	1c2f19b367	feat: update Kubernetes to 1.28.0-alpha.4 The Go modules were not tagged for alpha.4, so using alpha.3 tag. Talos 1.5 will ship with Kubernetes 1.28.0. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-07-11 15:40:24 +04:00
Artem Chernyshev	936111ce06	fix: properly set up tls for KMS endpoint The condition was inverted 🤦 Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>	2023-07-10 21:10:02 +03:00
Artem Chernyshev	cb226eec46	fix: rewrite encryption system information flow Pass getter to the key handler instead of already fetched node uuid. Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>	2023-07-10 19:07:46 +03:00
Andrey Smirnov	bd4f89f633	fix: disable dashboard on Azure, GCP and Scaleway Fixes #7416 These platforms don't have video console access. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-07-10 17:05:56 +04:00
Andrey Smirnov	bdb96189fa	refactor: make maintenance service controller-based Fixes #7430 Introduce a set of resources which look similar to other API implementations: CA, certs, cert SANs, etc. Introduce a controller which manages the service based on resource state. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-07-10 15:41:52 +04:00
Andrey Smirnov	d23d04de2a	feat: seed the kernel random pool from the TPM Use the TPM2 feature to provide high-quality random bytes. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-07-07 23:51:11 +04:00
LukasAuerbeck	c81ce8cfb0	feat: support controlplane resources configuration Fixes #7379 Add possibility to configure the controlplane static pod resources via APIServer, ControllerManager and Scheduler configs. Signed-off-by: LukasAuerbeck <17929465+LukasAuerbeck@users.noreply.github.com> Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-07-07 22:44:56 +04:00
Andrey Smirnov	74de562b29	fix: mount hugepages with nosuid + nodev Fixes #7445 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-07-07 21:57:19 +04:00
Artem Chernyshev	ce63abb219	feat: add KMS assisted encryption key handler Talos now supports new type of encryption keys which rely on Sealing/Unsealing randomly generated bytes with a KMS server: ``` systemDiskEncryption: ephemeral: keys: - kms: endpoint: https://1.2.3.4:443 slot: 0 ``` gRPC API definitions and a simple reference implementation of the KMS server can be found in this [repository](https://github.com/siderolabs/kms-client/blob/main/cmd/kms-server/main.go). Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>	2023-07-07 19:02:39 +03:00
Andrey Smirnov	6be5a13d5d	feat: implement machine config documents for event and log streaming Fixes #7228 Add some changes to make Talos accept partial machine configuration without main v1alpha1 config. With this change, it's possible to connect a machine already running with machine configuration (v1alpha1), the following patch will connect to a local SideroLink endpoint: ```yaml apiVersion: v1alpha1 kind: SideroLinkConfig apiUrl: grpc://172.20.0.1:4000/?jointoken=foo --- apiVersion: v1alpha1 kind: KmsgLogConfig name: apiSink url: tcp://[fdae:41e4:649b:9303::1]:4001/ --- apiVersion: v1alpha1 kind: EventSinkConfig endpoint: "[fdae:41e4:649b:9303::1]:8080" ``` Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-07-01 00:22:44 +04:00
James Callahan	c02ada7d95	fix: capabilities including `ALL` should be uppercase Pod security standard requires that ALL is in caps Signed-off-by: James Callahan <james@wavesquid.com> Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-06-29 12:58:30 +05:30
Noel Georgi	cbdf96d461	feat: support environment file for extensions Supports setting `environmentFile` for Talos System Extension Services. Fixes: #7316 Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-06-28 00:21:13 +05:30
Andrey Smirnov	35d6adcb9a	fix: provide stashed META values before installation Previously, if META values were supplied to the Talos ISO via environment variable, they will be written down and available after the install. With this fix, values are also readable and available before the installation runs (in maintenance mode). Most of the PR is refactoring `meta.Value(s)` to be a shared library which is used by the installer/imager and (now) Talos. Also fixes an issue with not returning properly `NotExist` error when META is not yet available as a partition on disk. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-06-27 20:57:43 +04:00
Noel Georgi	bc371ecfda	chore: add `/sbin/shutdown` Some tools like qemu-guest-agent, when run as an extension service, call `/sbin/shutdown` instead of `/sbin/poweroff`. This adds handling for the same. Ref: https://github.com/siderolabs/extensions/pull/173 Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-06-27 16:10:51 +05:30
Utku Ozdemir	0d313b9733	feat: add `reboot-mode` flag to `talosctl upgrade` Allow specifying the reboot mode during upgrades by introducing `--reboot-mode` flag, similar to the `--mode` flag of the reboot command. Closes siderolabs/talos#7302. Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>	2023-06-26 17:37:19 +02:00
Markus Reiter	7ce87f20c3	fix: compare only basename of `os.Args[0]` in machined This makes handling of `exec` more flexible. Signed-off-by: Markus Reiter <me@reitermark.us> Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-06-26 17:42:30 +04:00
Noel Georgi	8daf432b29	chore: bump deps Bump deps. Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-06-22 22:41:08 +05:30
Noel Georgi	e3f3f5794d	feat: implement revert for sd-boot Implement revert for sd-boot. Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-06-22 20:20:31 +05:30
Andrey Smirnov	fe0f46980f	feat: implement secure boot from disk This includes sd-boot handling, EFI variables, etc. There are some TODOs which need to be addressed to make things smooth. Install to disk, upgrades work. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-06-16 20:15:16 +05:30
Dmitriy Matrenichev	445f5ad542	feat: support API server load balancer This commit adds support for API load balancer. Quick way to enable it is during cluster creation using new `api-server-balancer-port` flag (0 by default - disabled). When enabled all API request will be routed across cluster control plane endpoints. Closes #7191 Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2023-06-16 10:09:20 -04:00
Andrey Smirnov	19bc223de8	refactor: bootloader interface, labels Move labels out of the bootloader interface, while moving copying assets into the bootloader interface. GRUB is using one set of assets, `sd-boot` will be using another one. Fix the problem with `bootloader.Probe()` finding boot partition on the host when it runs in a priv container, fixing issues with image creation in the CI. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-06-14 17:33:11 +04:00
Dmitriy Matrenichev	665702ddd3	chore: fix cilium e2e tests `WITH_CONFIG_PATCH_WORKER` check result was overriding any value set in `CONFIG_PATCH_FLAG` variable. Move it to the different variable. Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com> Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-06-14 15:08:31 +04:00
Noel Georgi	71a548d180	chore: generic bootloader implementation This changes the bootloader code to be generic to support multiple bootloader implementations. Signed-off-by: Noel Georgi <git@frezbo.dev> Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-06-13 23:36:20 +04:00
Andrey Smirnov	e9dbc9311b	test: bump versions for upgrade tests As we're getting to 1.5.0, bump versions for upgrade tests. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-06-13 23:17:22 +04:00
Andrey Smirnov	0a99965efb	refactor: replace `uncordonNode` with controllers Fixes #7233 Waiting for node readiness now happens in the `MachineStatus` controller which won't mark the node as ready until Kubernetes `Node` is ready. Handling cordoning/uncordoning happens with help of additional resource in `NodeApplyController`. New controller provides reactive `NodeStatus` resource to see current status of Kubernetes `Node`. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-06-13 21:48:42 +04:00
Dmitriy Matrenichev	c74d937280	chore: bump github.com/cosi-project/runtime Replace resource.Resource with meta.ResourceWithRD where possible. Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2023-06-12 09:49:08 -04:00
Andrey Smirnov	dbaf5c6997	refactor: task `labelControlPlane` into controllers See #7233 The controlplane label is simply injected into existing controller-based node label flow. For controlplane taint default NoScheduleTaint, additional controller & resource was implemented to handle node taints. This also fixes a problem with `allowSchedulingOnControlPlanes` not being reactive to config changes - now it is. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-06-12 15:25:13 +04:00
Dmitriy Matrenichev	3816318b9e	chore: wrap config.Provider in atomic wrapper Because `SetConfig` can be called concurrently with `Config` there is risk of data race, if something goes wrong. Since `config.Provider` is an interface type, it means its size is two machine words. And so in very unpleasant situations it can lead to arbitrary RCE, because interface variable can be in partially updated state. Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2023-06-09 15:05:39 -04:00
Andrey Smirnov	f5e3272fce	refactor: task 'updateBootLoader' as controller Fixes #7232 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-06-09 15:27:48 +04:00
Andrey Smirnov	e7be6ee7c3	refactor: make event log streaming fully reactive I ended up completely rewriting the controller, simplifying the flow (somewhat) so that there's just a single control flow in the controller, while reading from v1alpha1 events is converted to reading from a channel. Fixes #7227 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-06-08 23:13:33 +04:00
Andrey Smirnov	c719aa2316	fix: allow http:// for discovery service URL Fixes #7333 Also fixed the discovery service controller to reconnect the client on config changes (previously it wasn't reactive on e.g. URL changes). Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-06-08 20:28:12 +04:00
Andrey Smirnov	aac441f618	chore: update Go to 1.20.5, bump dependencies Go dependencies, new pkgs, extras, etc. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-06-07 23:40:59 +04:00
Noel Georgi	1c0c7933df	chore: cleanup partition code Cleanup partition code to be explicit about `Format` and `Partition` options. Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-06-08 00:35:09 +05:30
Noel Georgi	e912c0dfcf	chore: use go-blockdevice for zeroing partitions Use the `go-blockdevice` library to zero partitions. Also added a test that writes `ones` to the partition and verifies it is zeroes after zeroing it. Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-06-07 01:12:11 +05:30
Noel Georgi	47986cb79e	chore: unify kexec phase This changes the mounting/unmounting of `BOOT` partition code into `kexecPrepare` phase. Also skips if `BOOT` partition cannot be found. Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-06-06 20:30:59 +05:30
Andrey Smirnov	5dab45e869	refactor: allow kmsg log streaming to be reconfigured on the fly Fixes #7226 This follows same flow as other similar changes - split out logging configuration as a separate resource, source it for now in the cmdline. Rewrite the controller to allow multiple log outputs, add send retries. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-06-06 15:56:24 +04:00
Dmitriy Matrenichev	8a02ecd4cb	chore: add endpoints balancer controller This PR adds support for creating a list of API endpoints (each is a pair of host and port). It gets them from - Machine config cluster endpoint. - Localhost with LocalAPIServerPort if machine is control plane. - netip.Addr[0] and port from affiliates if they are control planes. For #7191 Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2023-06-05 20:47:52 -04:00
Andrey Smirnov	bab484a405	feat: use stable network interface names Use `udevd` rules to create stable interface names. Link controllers should wait for `udevd` to settle down, otherwise link rename will fail (interface should not be UP). Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-06-01 21:29:12 +04:00
Utku Ozdemir	196dfb99b0	fix: do not probe kernel args in dashboard if not needed If the dashboard is run without the "Config URL" screen, do not initialize it, and do not probe the kernel args for the code parameter. Refactor the dashboard to not construct the unused screens at all. Closes siderolabs/talos#7300. Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>	2023-06-01 10:32:43 +02:00
Andrey Smirnov	8c071b5796	fix: skip DHCP RENEW if server IP in the lease is all zeroes RENEW packets are sent unicast, so Talos needs the address of the DHCP server to send RENEW packets to. Fixes #7211 Fixes #7263 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-05-31 22:51:05 +04:00
Andrey Smirnov	badbc51e63	refactor: rewrite code to include preliminary support for multi-doc `config.Container` implements a multi-doc container which implements both `Container` interface (encoding, validation, etc.), and `Config` interface (accessing parts of the config). Refactor `generate` and `bundle` packages to support multi-doc, and provide backwards compatibility. Implement a first (mostly example) machine config document for SideroLink API URL. Many places don't properly support multi-doc yet (e.g. config patches). Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-05-31 18:38:05 +04:00
Andrey Smirnov	a0773f783c	chore: add ukify Go script This is a port of ukify.py and systemd-measure from systemd. This requires no actual TPM to be present to calculate the PCR signatures. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com> Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-05-30 23:33:26 +05:30
Andrey Smirnov	dc6764871c	refactor: move around config interfaces, make RawV1Alpha1 typed See #7230 Refactor more config interfaces, move config accessor interfaces to different package to break the dependency loop. Make `.RawV1Alpha1()` method typed to avoid type assertions everywhere. No functional changes. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-05-23 22:08:58 +04:00
Andrey Smirnov	0bb7e8a5cf	refactor: split config.Provider into Config & Container See #7230 This is a step towards preparing for multi-doc config. Split the `config.Provider` interface into parts which have different implementation: * `config.Config` accesses the config itself, it might be implemented by `v1alpha1.Config` for example * `config.Container` will be a set of config documents, which implement validation, encoding, etc. `Version()` method dropped, as it makes little sense and it was almost not used. `Raw()` method renamed to `RawV1Alpha1()` to support legacy direct access to `v1alpha1.Config`, next PR will refactor more to make it return proper type. There will be many more changes coming up. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-05-23 16:05:16 +04:00
Andrey Smirnov	ff11fd39c7	fix: race with `udevd` and `mountUserDisks` Fixes #7246 The problem was that `udevd` watches via `inotify` any attempts to open blockdevices with 'write' access. Talos was opening with write access, but actually accessing as read-only, so the fix is to open as read-only. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-05-19 22:02:48 +04:00
Andrey Smirnov	10155c390e	feat: enable xfs project quota support, kubelet feature This is controlled with a feature flag which gets enabled automatically for Talos 1.5+. Fixes #7181 If enabled, configures kubelet to use project quotas to track xfs volume usage, which is much more efficient than doing `du` periodically. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-05-19 20:33:39 +04:00
Andrey Smirnov	dd8336c9ee	fix: refresh kubelet self-issued serving certificates Kubelet doesn't refresh self-issued serving certificates, so force it by removing the cert on each restart. Fix the code which was forcing rejoin when the nodename changes, it was broken, as it was checking serving certificate instead of client certificate. It worked by accident when not using controlplane-issued serving certificates. Fixes #7235 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-05-18 22:19:34 +04:00
Andrey Smirnov	bb02dd263c	chore: drop deprecated stuff for Talos 1.5 * drop old resources API, which was deprecated long time ago * use bootstrapped event in `talosctl get --watch` to better align columns in the table output Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-05-18 19:46:37 +04:00
Dmitriy Matrenichev	61cad86731	chore: bump deps - github.com/containerd/typeurl to v2.1.1 - github.com/aws/aws-sdk-go to v1.44.264 - alpine to 3.18.0 - node to 20.2.0-alpine - github.com/containernetworking/plugins to v1.3.0 - github.com/docker/docker to v23.0.6+incompatible - github.com/hetznercloud/hcloud-go to v1.45.1 - github.com/insomniacslk/dhcp to v0.0.0-20230516061539-49801966e6cb - github.com/rivo/tview to v0.0.0-20230511053024-822bd067b165 - tools to v1.5.0-alpha.0-7-gd2dde48 - pkgs to v1.5.0-alpha.0-16-g7958db1 Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2023-05-18 01:07:36 -04:00
Dmitriy Matrenichev	97fffaf78a	chore: use ctest.UpdateWithConflicts instead of plain UpdateWithConflicts More type-safety. Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2023-05-12 20:39:32 -04:00
Dmitriy Matrenichev	45e6e27af7	chore: bump runtime Use new functions and methods from runtime module. Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2023-05-11 17:18:08 -04:00
Noel Georgi	4f720d4653	fix: revert: set rlimit explicitly in wrapperd This reverts commit `a2565f6741`. The fix done in `a2565f67`, was actually a no-op caused by misunderstanding the fix done in Go and backported to [Go 1.20.4](`ecf7e00db8`). The fix gave a false confidence that it was working when it was tested against Talos `main` branch since the PR #7190 bumped `x/sys` package from [v0.7.0 -> v0.8.0](`ecf7e00db8`), the actual change in `x/sys` can be found here at `ff18efa0a3` which meant that when updating Go to 1.20.4 the `x/sys` package should have been updated too. The `x/sys` package changed how the syscall to set the rlimit was called, it got moved into the Go stdlib instead of calling rlimit syscall in the `x/sys` package, which meant a combination of using Go 1.20.4 and an older `x/sys` package means `RLIMIT_NOFILE` value would not be set back to the original value. The Talos 1.4 release branch currently has `x/sys` at [v0.7.0](https://github.com/siderolabs/talos/blob/v1.4.3/go.mod#L133), so the backport would consist of this change along with another commit bumping `x/sys` package to `v0.8.0`. Fixes: #7198 Fixes: #7206 Co-authored-by: Utku Ozdemir <utku.ozdemir@siderolabs.com> Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-05-11 23:38:20 +05:30
Noel Georgi	a2565f6741	fix: set rlimit explicitly in wrapperd Now Go only sets the rlimit for the parent and any fork/exec'ed process gets the rlimit that was the default before fork/exec. Ref: https://github.com/golang/go/issues/46279 This fix got backported to [Go 1.20.4](`ecf7e00db8`) breaking Talos. Talos used to set rlimit in the [`SetRLimit`](https://github.com/siderolabs/talos/blob/v1.4.2/internal/app/machined/pkg/runtime/v1alpha1/v1alpha1_sequencer_tasks.go#L302) sequencer task. This means any process started by `wrapperd` gets the default Rlimit (1024). Fix this by explicitly setting `rlimit` in `wrapperd` before we drop any capabilities. Fixes: #7198 Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-05-10 00:10:17 +05:30
Andrey Smirnov	55ae59a0ad	fix: properly skip/cleanup controlplane configs for workers This bug is pretty cosmetic, but it shows up as a wrong check when performing worker upgrade - Talos pretends it checks e.g. kube-apiserver version which doesn't make sense for workers. There were two bugs in the code: * check for machine type was done against `TypeWorker`, while `MachineType` resource is initially created as `TypeUnknown` * the cleanup code was not implemented As I touched the code, I updated controller and tests to use modern conventions. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-05-05 23:17:27 +04:00
Utku Ozdemir	62c6e9655c	feat: introduce siderolink config resource & reconnect Introduce a new resource, `SiderolinkConfig`, to store SideroLink connection configuration (api endpoint for now). Introduce a controller for this resource which populates it from the Kernel cmdline. Rework the SideroLink `ManagerController` to take this new resource as input and reconfigure the link on changes. Additionally, if the siderolink connection is lost, reconnect to it and reconfigure the links/addresses. Closes siderolabs/talos#7142, siderolabs/talos#7143. Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>	2023-05-05 17:04:34 +02:00
Andrey Smirnov	860002c735	fix: don't reload control plane pods on cert SANs changes Fixes #7159 The change looks big, but it's actually pretty simple inside: the static pods had an annotation which tracks a version of the secrets which forced control plane pods to reload on a change. At the same time `kube-apiserver` can reload certificate inputs automatically from files without restart. So the inputs were split: the dynamic (for kube-apiserver) inputs don't need to be reloaded, so its version is not tracked in static pod annotation, so they don't cause a reload. The previous non-dynamic resource still causes a reload, but it doesn't get updated when e.g. node addresses change. Much more refactoring could be done, as the resource chain is a bit of a mess there, but I wanted to keep number of changes minimal to keep this backportable. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-05-05 16:59:09 +04:00
Andrey Smirnov	d43c61e80f	fix: enforce nolock option for all NFS mounts by default Talos doesn't have `rpc.statd` running, so mounting without locking is the only option. Some places in Kubernetes don't allow to set mount options for NFS, so setting defaults is the only way. Fixes #6582 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-05-04 17:26:36 +04:00
Niklas Wik	339986db9d	fix: inhibit timer to follow kubelet timer Ensure the inhibit timer waits for as long as the kubelet shutdown timers allow. Related to fix of siderolabs#7138 Signed-off-by: Niklas Wik <niklas.wik@nokia.com> Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-05-04 15:08:56 +04:00
Andrey Smirnov	cbf6dc1009	fix: set timeout for unmount calls Fixes #7137 The `umount` syscall might hang "forever" if the underlying network filesystem endpoint is down. To be on the safe side, add a timeout around unmount operations, and try to umount with force as a last resort. Sample log: ``` [14795.458779] [talos] task unmountPodMounts (2/2): unmounting /var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/dbe8d7f58e21d06cbef1ae0849317661eba4e82776722e7db5c65194ad73e916/globalmount/0001-0009-rook-ceph-0000000000000001-1051beb3-8d7a-4291-bf45-5711c13523d1 [14795.459797] [talos] task unmountPodMounts (2/2): unmounting /var/lib/kubelet/pods/f3f4d789-7f48-4dd9-9ef5-649b002c8f9c/volumes/kubernetes.io~csi/pvc-a4e72749-a8a1-43d9-9152-5bc1f757c924/mount [14795.460555] EXT4-fs (rbd0): unmounting filesystem. [14813.461319] [talos] task unmountPodMounts (2/2): unmounting /var/lib/kubelet/pods/f3f4d789-7f48-4dd9-9ef5-649b002c8f9c/volumes/kubernetes.io~csi/pvc-a4e72749-a8a1-43d9-9152-5bc1f757c924/mount is taking longer than expected, still waiting for 1m11.999162834s [14831.460813] [talos] task unmountPodMounts (2/2): unmounting /var/lib/kubelet/pods/f3f4d789-7f48-4dd9-9ef5-649b002c8f9c/volumes/kubernetes.io~csi/pvc-a4e72749-a8a1-43d9-9152-5bc1f757c924/mount is taking longer than expected, still waiting for 53.999567033s [14849.461336] [talos] task unmountPodMounts (2/2): unmounting /var/lib/kubelet/pods/f3f4d789-7f48-4dd9-9ef5-649b002c8f9c/volumes/kubernetes.io~csi/pvc-a4e72749-a8a1-43d9-9152-5bc1f757c924/mount is taking longer than expected, still waiting for 35.998979117s [14867.460748] [talos] task unmountPodMounts (2/2): unmounting /var/lib/kubelet/pods/f3f4d789-7f48-4dd9-9ef5-649b002c8f9c/volumes/kubernetes.io~csi/pvc-a4e72749-a8a1-43d9-9152-5bc1f757c924/mount is taking longer than expected, still waiting for 17.999502128s [14885.461123] [talos] task unmountPodMounts (2/2): unmounting /var/lib/kubelet/pods/f3f4d789-7f48-4dd9-9ef5-649b002c8f9c/volumes/kubernetes.io~csi/pvc-a4e72749-a8a1-43d9-9152-5bc1f757c924/mount with force [14885.462395] [talos] ignoring unmount error /var/lib/kubelet/pods/f3f4d789-7f48-4dd9-9ef5-649b002c8f9c/volumes/kubernetes.io~csi/pvc-a4e72749-a8a1-43d9-9152-5bc1f757c924/mount: invalid argument [14885.463529] [talos] task unmountPodMounts (2/2): unmounting /var/run/netns/cni-0888dc71-ba9e-af8a-d322-074f654561e5 [14885.464267] [talos] task unmountPodMounts (2/2): done, 1m30.028862262s ``` Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-05-03 23:32:23 +04:00
Andrey Smirnov	b58f913d5f	fix: set the static pod priority as values API server takes care of setting priority for "regular" pods from priorityClassName, but nothing does that for static pods, so we have to specify the priority explicitly for static pods. This fixes the graceful node shutdown (kubelet) to stop non-critical pods before the api-server and friends (critical pods). Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-05-02 20:20:20 +04:00
Thomas Perronin	7442ff8b09	chore: fix typos inteface -> interface (docs and tests) Fix typos. Signed-off-by: Thomas Perronin <gecko.splinter@gmail.com> Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-05-01 16:15:08 +04:00
Andrey Smirnov	344746ae2f	fix: bump max inhibit delay to 20 min Fixes #7138 This brings max shutdown period to 20 min that kubelet would accept. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-04-27 17:08:04 +04:00
Noel Georgi	014008ea25	fix: udevd rules trigger Fix udevd not triggering rules properly. This also fixes an issue with go-blockdevice not resolving symlinks. Fixes: #7117 Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-04-26 19:44:05 +05:30
Andrey Smirnov	08ec66c55c	feat: clean up (garbage collect) system images which are not referenced Fixes #7121 Talos pulls some images on its own (without CRI/kubelet) to the `system` namespace of the CRI containerd. These images are not visible to the CRI/kubelet, so we need to clean them up manually. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-04-26 16:47:49 +04:00
Utku Ozdemir	b097efcde2	fix: display correct number of machines on dashboard Rename members to machines to be clearer. Display the correct member count. Closes siderolabs/talos#7127. Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>	2023-04-26 12:14:56 +02:00
Noel Georgi	cad43f0ad3	chore: remove k8s master label Since talos now defaults to k8s 1.27, remove the handling of `master` label for controlplane nodes. Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-04-25 20:48:05 +05:30
Utku Ozdemir	103f0ffdd3	feat: add startup probes to controller-manager and scheduler Add startup probes that probe the containers for 60 seconds before switching to liveness probes. Closes siderolabs/talos#7054. Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>	2023-04-25 15:39:46 +02:00
Utku Ozdemir	2d824b5639	fix: do not show control plane status for workers on dashboard Hide kube-apiserver, kube-controller-manager and kube-scheduler statuses on the dashboard for the worker nodes, instead of showing them as n/a. Also display the cluster name as n/a for workers (instead of an empty string), as that information is not available to them. Closes siderolabs/talos#7103. Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>	2023-04-21 11:57:32 +02:00
Utku Ozdemir	e1d38b6feb	feat: show template URL in dashboard config URL tab Show the config URL template that will be populated when the code is entered. Closes siderolabs/talos#7092. Clear the form when the tab is exited & do not display "Saved successfully" message when the code is saved, as we navigate to the summary tab afterward anyway. Closes siderolabs/talos#7093. Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>	2023-04-19 11:27:47 +02:00
Andrey Smirnov	8689bef5f1	docs: update documentation for Talos 1.4 Updated documentation, what's new, etc. Also fix some minor UI issues in the dashboard. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-04-18 15:09:55 +04:00
Utku Ozdemir	f14928b0a9	fix: fix dashboard crash when a non-existent node is specified Prevent dashboard from crashing when a dead/non-existent node is specified on `talosctl --nodes`. Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>	2023-04-13 16:46:23 +02:00
Andrey Smirnov	3cd1c6bb0b	fix: send 'STOP' event on phase end Previously 'START' was sent for both start and finish. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-04-10 17:56:56 +04:00
Andrey Smirnov	2c55550a66	fix: quote ISO kernel args for GRUB Use GRUB quoting function to the kernel args passed to Talos. This fixes passing `${variable}` to `talos.config=` kernel argument. Also fix a problem with `ONBUILD` being executed for `imager` image. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-04-07 12:29:49 +04:00
Andrey Smirnov	170f73899a	fix: correctly parse static pod phase The problem was that 'Succeeded' pod was treated as 'not ready', so that `MachineStatus` never reached readiness state. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-04-05 18:08:20 +04:00
Andrey Smirnov	eb01edbc8a	fix: rework DHCP flow Fixes #7041 Rework the DHCP flow so that we don't use `INFORM` requests anymore. The idea is to try requesting a hostname from the DHCP server first, and if the hostname is not sent, or it gets overridden in Talos, restart the DHCP sequence sending the hostname to the DHCP server. This still avoids sending and requesting a hostname in one request. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-04-03 17:30:30 +04:00
Thomas Way	7ffabe0f14	feat: support network bond device selectors Fixes https://github.com/siderolabs/talos/issues/6756 Signed-off-by: Thomas Way <thomas@6f.io> Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-03-31 20:29:20 +04:00
Utku Ozdemir	cbab12e3a1	refactor: rename outbound to connectivity on dashboard Rename to be consistent between the `networkstatus` resource and the dashboard. Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>	2023-03-31 15:31:35 +02:00
Artem Chernyshev	07c3c5d59e	feat: return disk subsystem in the `Disks API` Fixes: https://github.com/siderolabs/talos/issues/7017 Should allow external services to detect which user block devices might need to be wiped during reset. Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>	2023-03-31 16:10:59 +03:00
Andrey Smirnov	aa14993539	feat: introduce network probes Network probes are configured with the specs, and provide their output as a status. At the moment only platform code can configure network probes. If any network probes are configured, they affect network.Status 'Connectivity' flag. Example, create the probe: ``` talosctl -n 172.20.0.3 meta write 0xa '{"probes": [{"interval": "1s", "tcp": {"endpoint": "google.com:80", "timeout": "10s"}}]}' ``` Watch probe status: ``` $ talosctl -n 172.20.0.3 get probe NODE NAMESPACE TYPE ID VERSION SUCCESS 172.20.0.3 network ProbeStatus tcp:google.com:80 5 true ``` With failing probes: ``` $ talosctl -n 172.20.0.3 get probe NODE NAMESPACE TYPE ID VERSION SUCCESS 172.20.0.3 network ProbeStatus tcp:google.com:80 4 true 172.20.0.3 network ProbeStatus tcp:google.com:81 1 false $ talosctl -n 172.20.0.3 get networkstatus NODE NAMESPACE TYPE ID VERSION ADDRESS CONNECTIVITY HOSTNAME ETC 172.20.0.3 network NetworkStatus status 5 true true true true ``` Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-03-31 15:20:21 +04:00
Utku Ozdemir	7967ccfc13	feat: add config code entry screen to dashboard Implement a screen for entering/managing the config `${code}` variable. Enable this screen only when the platform is `metal` and there is a `${code}` variable in the `talos.config` kernel cmdline URL query. Additionally, remove the "Delete" button and its functionality from the network config screen to avoid users accidentally deleting PlatformNetworkConfig parts that are not managed by the dashboard. Add some tests for the form data parsing on the network config screen. Remove the unnecessary lock on the summary tab - all updates come from the same goroutine. Closes siderolabs/talos#6993. Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>	2023-03-31 10:33:28 +02:00
Noel Georgi	ddb014cfdc	fix: udevd rules trigger Restart udevd on adding custom rules where in the case the subsystems needs to be re-triggered. Fixes: #7001 Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-03-31 01:59:07 +05:30
Nico Berlee	0af8fe2fb5	feat: netstat pod support talosctl netstat -k shows all host and non-hostnetwork pod sockets/connections. talosctl netstat namespace/pod shows sockets/connections of a specific pod + autocompletes in the shell. Signed-off-by: Nico Berlee <nico.berlee@on2it.net> Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-03-30 23:39:38 +04:00
Utku Ozdemir	aa662ff635	fix: apply small fixes on dashboard * Clear the input form and switch to summary tab after the network config is saved. * Use nodeaddress resource for detecting and displaying IPs. Improve the IP filtering logic. * Fix the logic of gateway detection. Display all gateways instead of a single one. * Use hostnamestatus resource to detect the hostname instead of an API call. * Add hostname entry to the network info section on summary tab (as `HOST`). * Enable `OUTBOUND` entry in network info section on summary tab. * Display only the physical network interfaces in the interface dropdown on network config tab. * Improve form input handling. * Additional minor fixes & improvements. Closes siderolabs/talos#6992. Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>	2023-03-30 09:39:14 +02:00
Andrey Smirnov	188560a334	fix: add a link-scope route if the cmdline gateway is not reachable Fixes #7020 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-03-29 22:25:04 +04:00
Dennis Marttinen	45c5b47a57	feat: dhcpv4: send current hostname, fix spec compliance of renewals This adds support for automatically registering node hostnames in DNS by sending the current hostname to DHCP via option 12. If the current hostname is updated, issue a new DISCOVER to propagate the update to DHCP (updating the hostname on lease renewals is not universally supported by DHCP servers). This addition maintains the previous functionality where the node can also request its hostname from the DHCP server. The received hostname will be processed and prioritized as usual by the `network.HostnameSpecController`. This change set also contains fixes to make DHCP renewals compliant with RFC 2131, specifically avoiding sending the server identifier and requested IP address when issuing renewals using a previous offer. This also uncovered issues and missing features in the upstream `insomniacslk/dhcp` library, the fixes and improvements for which are now finally merged. Sending hostname updates have been tested against `dnsmasq` and the built-in DHCP + DNS services in Windows Server. Hostname retrieval from DHCP and edge cases with overridden hostnames from different configuration layers have been extensively tested against `dnsmasq`. Signed-off-by: Dennis Marttinen <twelho@welho.tech> Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-03-29 21:04:32 +04:00
Andrey Smirnov	ea0e9bdbe4	feat: environment variables via the kernel arguments Unify getting environment variables, support passing environment variables via kernel args. Fixes #6984 See #6999 For META this will be used to pass environment variables to the installer for ISO images (or PXE booting). Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-03-28 16:28:33 +04:00
Andrey Smirnov	9e8603f53b	feat: implement new download URL variable `${code}` New variable value is coming from `META`, and it might be set using the interactive console (not implemented yet, but it will come soon). I had to refactor the URL expansion implementation: * simplify things where possible * provide more unit-tests for smaller units * handle expansion of all variables in parallel * allow parallel expansion on multiple variables Also I refactored download code to support proper passing of endpoint function with context. The end result: * Talos will try to download config for 3 hours before rebooting * Each attempt which includes URL expansion + download is limited to 3 minutes Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-03-24 21:49:36 +04:00
Utku Ozdemir	a7b79ef1be	feat: add network config screen to dashboard Implement the network config screen with input forms to configure the initial node networking by writing a config to the META partition. Closes siderolabs/talos#6961. Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>	2023-03-23 17:29:52 +04:00
Andrey Smirnov	cf2ccc521f	fix: always shutdown maintenance API service The problem was that `GracefulStop()` will hang forever if there is a running API call. So if there is a running streaming call, the maintenance service might hang until it is finished. The problem shows up with 'Upgrade' API in the maintenance mode if there is a concurrent streaming API call, e.g.: 1. Watch API is running against maintenance mode. 2. Upgrade API is issued, it tries to run the MaintenanceUpgrade sequence, which tries to take over the Initialize sequence. The Initialize sequence is canceled, maintenance API service context is canceled, but the service doesn't terminate, as it's stuck in `GracefulStop`. The sequence takeover times out, as even though the sequence is canceled, it hasn't terminated yet. Sample log: ``` [talos] upgrade request received: "ghcr.io/siderolabs/installer:v1.3.3" [talos] upgrade failed: failed to acquire lock: timeout [talos] task loadConfig (1/1): failed: failed to receive config via maintenance service: maintenance service failed: context canceled [talos] phase config (6/7): failed [talos] initialize sequence: failed <stuck here> ``` Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-03-23 16:59:30 +04:00
Noel Georgi	d1a61fd343	chore: bump golangci-lint Bump golangci-lint and fixup new warnings. Ignore check that checks for used function parameters, it's kind of noisy and makes it confusing to read interface implementations. Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-03-22 19:55:38 +05:30
Noel Georgi	36a9a208ec	chore: bump deps Bump deps Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-03-22 16:37:27 +05:30
Noel Georgi	c63cf90e32	feat: update k8s to v1.27.0-beta.0 Update k8s to v1.27.0-beta.0 Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-03-21 23:59:17 +05:30
Dmitriy Matrenichev	b246c90abd	fix: add uint32 to Magic1 and Magic2 Discovered in #6971. Go compiler cannot deduce proper type on 32bit architectures for those constants, in `fmt.Print(f)` functions. Since we only compare them with uint32 variables, it makes sense to add proper types to them. Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2023-03-21 09:57:55 -03:00
Andrey Smirnov	bec89bf6e5	fix: use 'no block' etcd dial with multiple endpoints The problem showed up on 'reset' of the Talos node which had multiple endpoints for other control plane nodes, many of which weren't actually available. When 'grpc.WithBlock()' is used, etcd will try to dial the first endpoint and return an error if the dial fails. Use noblock mode by default with multiple endpoints, and blocking mode with a single endpoint. Pass the context to etcd to properly abort dial operations if the context gets canceled. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-03-21 15:35:31 +04:00
Utku Ozdemir	2dd0964c5f	refactor: use resource watches on dashboard Instead of doing excessive get/list requests, do a watch per node in an infinite retry. Additionally, refactor the dashboard code to make the various data listener namings more consistent and reorganize the packages. Closes siderolabs/talos#6960. Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>	2023-03-17 23:06:35 +01:00
Dmitriy Matrenichev	a14a0aba04	fix: nil pointer exception in syncLink If link has no `Info` field we can't do anything meaningful, so we'll just log and skip. Also fix race in test. For #6956 Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2023-03-17 15:33:20 +04:00
Noel Georgi	cf101e56fb	fix: add `--force` flag for `talosctl gen` Error out if file(s) already exists and warn user to use `--force` to overwrite. Fixes: #6963 Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-03-17 15:07:12 +05:30
Utku Ozdemir	ea2aa06116	fix: fix data race on network config read Fix a data race caused by the metadata field of PlatformNetworkConfig being edited after it was sent to the channel. It caused test failures. Fix it by setting a copy of the metadata instead. Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>	2023-03-17 00:24:22 +01:00
Andrey Smirnov	64e3d24c6b	feat: provide platform network config for 'metal' in META A special META key might contain optional platform network config for the `METAL` platform. It is completely optional, but if present, it works same way as in the clouds: it is applied with low priority (can be overridden with machine config), but provides some initial defaults for the machine. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-03-15 23:54:39 +04:00
Andrey Smirnov	442cb9c1b0	feat: implement APIs to write to META This allows putting keys into the META partition. META contents can be viewed with `talosctl get metakeys`. There is no real use case for it yet, but the next PRs will introduce two special keys which can be written: * platform network config for `metal` * `${code}` variable Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-03-15 22:17:52 +04:00
Utku Ozdemir	9e07832db9	feat: implement summary dashboard Implement the new summary dashboard with node info and logs. Replace the previous metrics dashboard with the new dashboard which has multiple screens for node summary, metrics and editing network config. Port the old metrics dashboard to the tview library and assign it to be a screen in the new dashboard, accessible by F2 key. Add a new resource, infos.cluster.talos.dev which contains the cluster name and id of a node. Disable the network config editor screen in the new dashboard until it is fully implemented with its backend. Closes siderolabs/talos#4790. Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>	2023-03-15 13:13:28 +01:00
Andrey Smirnov	1df841bb54	refactor: change the interface of META Use a global instance, handle loading/saving META in global context. Deprecate legacy syslinux ADV, provide an easier interface for consumers. Expose META as resources. Fix the bootloader revert process (it was completely broken for quite a while :sad:). This is a first step which mostly does preparation work, real changes will come in the next PRs: * add APIs to write to META * consume META keys for platform network config for `metal` * custom key for URL `${code}` Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-03-15 15:43:16 +04:00
Andrey Smirnov	02b0ff35ee	feat: generate Flannel CNI manifest from upstream Fixes #6730 `go generate`-based step downloads the upstream manifest, transforms it to match our requirements, and it is compiled in as the Flannel manifest. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-03-13 20:00:35 +04:00
Serge Logvinov	9948a646d2	feat: coredns node uninitialized toleration Launch CoreDNS even if the node is not initialized. Network is ready already, but CCM didn't finish their job. Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev> Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-03-13 14:29:14 +04:00
Erik Lund	230cfaf803	feat: use network information from guestinfo.metadata Add VMware GuestInfo metadata to network configuration. Fixes #6708 Signed-off-by: Erik Lund Jensen <info@erikjensen.it> Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-03-09 16:51:08 +04:00
Nico Berlee	97048f7c37	feat: netstat in API and client Implements netstat in Talos API and client (talosctl). Signed-off-by: Nico Berlee <nico.berlee@on2it.net> Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-03-09 15:48:30 +04:00
Andrey Smirnov	fda6da6929	fix: successful ACPI shutdown in maintenance mode Fixes #6817 The original problem wasn't reproducible with `main`, but there was a set of bugs in the shutdown sequence which was preventing it from completing successfully, as in the maintenance mode nothing is running and initialized yet. Most of the bugs were `nil` pointer dereferences. Fixed a small issue with final 'RebootError' printed as a failure in the ACPI shutdown path. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-03-07 23:52:02 +04:00
Dmitriy Matrenichev	ebc92f3c1d	chore: add container id to `talosctl -k containers` and `talosctl -k logs` This PR takes the first 12 characters of the container ID and adds them to each container's entry in the `talosctl -k containers` output. That way we can ensure that we get the logs from the proper container even if there is a newer one. Closes #6886 Co-authored-by: Utku Ozdemir <utku.ozdemir@siderolabs.com> Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2023-03-07 13:20:44 +03:00
Dmitriy Matrenichev	22ef81c1e7	feat: add grub option to drop to maintenance mode - [x] Support `talos.experimental.wipe=system:EPHEMERAL,STATE` boot kernel arg - [x] GRUB option to wipe like above - [x] update GRUB library to handle that Closes #6842 Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2023-03-07 12:37:59 +03:00
Dmitriy Matrenichev	e71cc6619b	fix: redo assertHostnames in HostnameMergeSuite.TestMerge Use `rtestutils.AssertResources` for hostnames test. Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2023-03-06 15:09:50 +03:00
Andrey Smirnov	8ea4bfad8f	refactor: improve the kubernetes upgrade flow Use new version of go-kubernetes, and move the `kube-proxy` DaemonSet update to follow common logic of bootstrap manifests update. This fixes a confusing behavior when after `k8s-upgrade` the version of `kube-proxy` is not updated in the machine config. See https://github.com/siderolabs/go-kubernetes/pull/3 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-03-06 15:01:29 +04:00
Tim Jones	061640cccf	feat: add pod ip to kube-proxy spec Exposes the pod IP as the `POD_IP` environment variable via the downward API in the kube-proxy pod for use in e.g. metrics-bind-addr. Signed-off-by: Tim Jones <tim.jones@siderolabs.com>	2023-03-03 12:52:30 +01:00
Andrey Smirnov	337aaba7a7	feat: add 'os:operator' role This introduces a new role for Talos API which fills the gap between `os:reader` and `os:admin` roles. Fixes #6898 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-03-01 16:12:25 +04:00
Andrey Smirnov	40e69af224	fix: improve etcd leave on reset process When removing a member from `etcd`, the server does a pre-check to make sure the member is connected to a quorum of other members, and the remove request might fail. Add a retry to wait for etcd to be fully connected before giving up, as some parts of the reset flow have already run. Also fix an issue which appears in the integration test, when `reset` is called early in the boot sequence when local etcd hasn't started fully yet. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-03-01 14:51:49 +04:00
Dmitriy Matrenichev	638dc9128f	fix: fix "defer" leak in ResetUserDisks Also, print error if we failed to close the device. Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2023-02-28 21:51:37 +03:00
Dmitriy Matrenichev	bfba3677b0	chore: handle grub option - "wipe" This PR ensures that we can handle third grub option - "wipe". We will use it in 1.4. For #6842 Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2023-02-28 21:21:28 +03:00
Artem Chernyshev	b520710810	feat: introduce new flag in reset API that makes Talos reset user disks Fixes: https://github.com/siderolabs/talos/issues/6815 Additionally, make it possible to run reset in maintenance mode: to enable a way for resetting system disk and remove all traces of Talos from it. The new reset flow works in a separate sequence, changed disk probe lookup to check the boot partition instead of the ephemeral one. Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>	2023-02-28 15:10:41 +03:00
Utku Ozdemir	f55f5df739	feat: move dashboard package & run it in tty2 Move dashboard package into a common location where both Talos and talosctl can use it. Add support for overriding stdin, stdout, stderr and ctt in process runner. Create a dashboard service which runs the dashboard on /dev/tty2. Redirect kernel messages to tty1 and switch to tty2 after starting the dashboard on it. Related to siderolabs/talos#6841, siderolabs/talos#4791. Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>	2023-02-28 12:00:25 +01:00
Dmitriy Matrenichev	36e077ead4	chore: bump deps - github.com/aws/aws-sdk-go to v1.44.209 - github.com/stretchr/testify to v1.8.2 - github.com/jsimonetti/rtnetlink to v1.3.1 - google.golang.org/genproto to v0.0.0-20230223222841-637eb2293923 - github.com/emicklei/dot to v1.3.1 - github.com/gdamore/tcell/v2 to v2.6.0 - github.com/insomniacslk/dhcp to v0.0.0-20230220063916-5369909a5de7 - github.com/jsimonetti/rtnetlink to v1.3.1 - github.com/opencontainers/runtime-spec to v1.1.0-rc.1.0.20230215090456-58ec43f9fc39 - github.com/rivo/tview to v0.0.0-20230226195229-47e7db7885b4 Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2023-02-28 00:14:59 +03:00
Noel Georgi	426fe9687d	fix: extension base folder permission The `modules.dep` kernel module dependency tree extension root path was previously created with a permission of `0o700`, which means the Talos root got a permission of `0o700` when the kernel module tree was rebuilt when extensions providing kernel modules were enabled. This means that any binaries lost the executable permission when run as non-root, creating an `EACCES` error. Fix by making sure the temporary directory created for building the kernel modules tree has `0o755` permission explicitly. Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-02-27 19:49:06 +05:30
Andrey Smirnov	230e46e567	refactor: extract parts of kubernetes libraries The shared code is going out to the github.com/siderolabs/go-kubernetes library. The code will be used in Talos and other projects using same features. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-02-22 14:56:49 +04:00
Utku Ozdemir	5ac9f43e45	feat: start machined earlier & in maintenance mode Load & start machined earlier and in initialize sequence, so that it is possible to use its API over its unix socket in maintenance mode. Additionally, do not return features from Version API if a config is not yet available. Related to siderolabs/talos#4791. Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>	2023-02-21 12:21:36 +01:00
Dmitriy Matrenichev	3d55bd80f4	fix: add `--force` flag to `talosctl gen config` Only overwrite existing files if explicitly demanded. Closes #6847 Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2023-02-20 23:44:00 +03:00
Serge Logvinov	660b8874da	feat: cmdline integer netmask Can set netmask as number. Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev> Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-02-20 20:55:56 +04:00
Andrey Smirnov	6e8f13529c	fix: add support for a fallback '*' mirror configuration Talos always supported that, but CRI config lacked support for it. Now with recent containerd the new `_default` host is used as a fallback, so this re-enables the support and updates the docs. See https://github.com/containerd/containerd/pull/8065 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-02-16 23:12:13 +04:00
Artem Chernyshev	dcd4eb1a93	fix: improve error message on single node upgrade Fixes: https://github.com/siderolabs/talos/issues/6828 Propose a solution if the node upgrade fails. Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>	2023-02-16 17:33:04 +03:00
Noel Georgi	2d01480180	feat: automatically load modules based on hw info Fixes: #6802 Automatically load kernel modules based on hardware info and modules alias info. udevd would automatically load modules based on HW information present. Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-02-14 19:57:13 +05:30
Noel Georgi	7b75cd8b94	fix: kernel module dependency tree generation This fixes the issue when the overlay mount target directory was used as lowerdir for the mount, creating extra folders in the extension. Fix the issue by adding support for normal overlay mounts to use a source directory when specified. Also fixes a small issue where messages were logged when the error is nil. Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-02-14 01:07:11 +05:30
Noel Georgi	65d02e5ade	fix: dbus shutdown when it's not initialized If dbus is not started and a shutdown was called talos panics, fix by checking if the mock is nil. Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-02-13 21:12:54 +05:30
Andrey Smirnov	a7079ce85c	fix: quote the ampersand character in GRUB config Not sure how I missed it in the first PR, but that's the only character which was not quoted properly. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-02-13 18:58:34 +04:00
Andrey Smirnov	dcbcf5a93c	fix: wait for network and retry in platform get config funcs Wait for the network before trying to access the metadata service. Retry the calls when appropriate (most platforms use `download.Download` function which does proper retries). Co-authored-by: Noel Georgi <git@frezbo.dev> Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-02-09 21:04:43 +04:00
Andrey Smirnov	e09e106665	fix: default dns domain to 'cluster.local' in local case One case was missing: when network section is present, but value is omitted. Fixes #6825 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-02-08 14:35:28 +04:00
Noel Georgi	cc6e37a47f	feat: use process wrapper for dropping capabilities Use process wrapper introduced in #6814 to drop capabilities. This change also means the capabilities are dropped per process level and not for PID 1 (machined), which allows us to drop capabilities per process. Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-02-07 00:49:56 +05:30
Noel Georgi	5cb2915d8e	feat: use wrapper for starting processes Use a wrapper for starting processes which can setup proper cgroups, OOMscore, and also drop capabilities for the process, then it calls `execve`. The containerd tests are also fixed to support cgroups when running tests in buildkit. They used to pass previously as we did not error if cgroup setup failed. Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-02-03 18:32:09 +05:30
Andrey Smirnov	38a51191e4	fix: correctly expand parameters in the URL This fixes multiple issues: * `log.Fatalf` in the machined code leads to kernel panic * return URL if some expansion fails * correctly handle destroyed event (wait for the next one) Fixes #6807 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-02-02 18:42:45 +04:00
Andrey Smirnov	54f7d4c923	fix: correctly quote and unquote strings in GRUB config One of the fields in the GRUB config - boot arguments - contains user-controlled input. Talos supports variable expansion in `talos.config` parameter, and uses `${var}` syntax. In GRUB config, `}` is a special character, and introduction of `}` breaks config parsing both for GRUB and Talos. Correctly escape and unescape special characters. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-02-02 17:11:22 +04:00
Noel Georgi	590a393de9	fix: udevd healthcheck The previous `udevd` healthcheck was incomplete and if `udevd` took more time to startup the initial `udevadm trigger` would have silently failed failing to setup proper devices. `udevadm trigger` returns an exit code of zero even if `udevd` is not running. This PR fixes by first checking if the `udevd` control socket exists, which is a faster check, then making sure `udevd` is up by running `udevadm control` command. This ensures that `udevd` is properly initialized before running any `udevadm trigger` commands even if `udevd` is restarted/killed. Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-01-30 16:12:41 +05:30
Noel Georgi	812a2877cd	chore: bump deps + renovate cleanup Bump dependencies. Disable renovate for PR's and skip un-needed update checks. Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-01-24 00:42:58 +05:30
Andrey Smirnov	aa9f66c1c8	fix: mark DigitalOcean anchor IP as scope link This excludes it out of the `NodeAddress`. Needs extra testing to confirm that it actually still works as anchor IP. Fixes #6760 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-01-23 20:35:52 +04:00
Andrey Smirnov	3e00571627	fix: unwrap gRPC errors on stop/remove pods check As the client returns wrapped errors, unwrap them using our own method which does `errors.As` instead of gRPC one which doesn't do unwrapping. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-01-23 14:29:04 +04:00
Andrey Smirnov	00e52ae078	fix: build correctly etcd initial cluster URL The supposed format with multiple advertised URLs is: `name=u1,name=u2` Previously Talos generated: `name=u1,u2` (which is wrong) Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-01-20 22:52:47 +04:00
Dmitriy Matrenichev	c5954f4345	chore: bump deps For some reason `go-mod-outdated` didn't work for me, so I had to do this manually. Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2023-01-19 21:40:00 +03:00
Noel Georgi	d4b8b35de7	feat: generate kernel module dependency tree Run `depmod` during install/upgrades when extensions provide kernel modules and `modules.dep` needs to be re-generated. This also allows modules of the same name from the kernel to co-exist. Modules in the `extras` folder take precedence over `in-built` ones. Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-01-19 18:54:10 +05:30
Andrey Smirnov	18122ae73e	fix: service restart (including extension services) Fixes #6707 There was a race condition between different parts of the service code: `Stop` waits for the event which is published before the service is removed from the `running[id]` map, so if one does `Stop` followed by `Start` (this is what `services restart` API does), by the time it goes to `Start` it might be still in the `running[id]` map, so `Start` does nothing. Overall this code should be rewritten and simplified, but for now move out sending these "terminal" events out so that by the time the event is published, the service is stopped and removed from the `running[id]` map. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-01-18 14:52:47 +04:00
Andrey Smirnov	0b65bbfc87	fix: handle overwriting tags in syslinux ADV This is (still) being used in Talos to handle upgrade rollbacks. There were multiple problems with this code, and one of them leads to panic if the tag is written multiple times without deletion: ``` github.com/siderolabs/talos/internal/app/machined/pkg/runtime/v1alpha1/bootloader/adv/syslinux.ADV.SetTagBytes({0xc00175bc00?, 0x1f11dbe?, 0xed4f4d?}, 0x0?, {0xc000afb7f0?, 0x400?, 0x0?}) /src/internal/app/machined/pkg/runtime/v1alpha1/bootloader/adv/syslinux/syslinux.go:125 +0x270 github.com/siderolabs/talos/internal/app/machined/pkg/runtime/v1alpha1/bootloader/adv/syslinux.ADV.SetTag(...) /src/internal/app/machined/pkg/runtime/v1alpha1/bootloader/adv/syslinux/syslinux.go:95 github.com/siderolabs/talos/cmd/installer/pkg/install.(*Installer).Install(0xc0004374a0, 0x5) /src/cmd/installer/pkg/install/install.go ``` The `uint8()` conversion was causing overflow and wrong index when ADV real length is over 255. Fix multiple writes of the same tag by deleting previous value first. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-01-17 23:21:39 +04:00
Serge Logvinov	70d9428a1d	fix: kubespan MSS clamping Change TCP maximum segment size if it goes through the KubeSpan to match KubeSpan MTU. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-01-17 19:02:33 +04:00
Andrey Smirnov	062c7d754b	test: fix integration test on cp endpoint update As with #6724, controlplane node kubelet doesn't use control plane endpoint anymore, run the test on the worker node instead of cp node. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-01-12 15:23:14 +04:00
Andrey Smirnov	0a5a8802e7	feat: use 'localhost' endpoint for controlplane nodes This switches the last usage of Kubernetes controlplane endpoint to use `localhost` (itself) for controlplane nodes. Worker nodes still use cluster-wide controlplane endpoint. This allows controlplane nodes to boot fully even if the controlplane endpoint (e.g. loadbalancer) doesn't function. The process of joining etcd still requires either a discovery service or a proper functioning controlplane endpoint. With this fix, Talos controlplane nodes can boot successfully without a loadbalancer being up, while worker nodes obviously won't join. This improves Talos behavior in single-node clusters when controlplane endpoint is not available, the node will still boot just fine and function properly. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-01-10 20:50:51 +04:00
Andrey Smirnov	29020cb9c7	fix: report fatal sequence errors as reboots When the sequence fails hard, Talos does automatic reboot, so reflect this in the machine status properly. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-01-10 14:24:23 +04:00
Andrey Smirnov	96629d5ba6	feat: implement etcd maintenance commands This allows safely recovering from out-of-space quota issues and performing defragmentation as needed. `talosctl etcd status` command provides lots of information about the cluster health. See docs for more details. Fixes #4889 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-01-03 23:25:28 +04:00
Andrey Smirnov	80fed31940	feat: include Kubernetes controlplane endpoint as one of the endpoints These endpoints are used for workers to find the addresses of the controlplane nodes to connect to `trustd` to issue certificates of `apid`. These endpoints today come from two sources: * discovery service data * Kubernetes API server endpoints This PR adds to the list static entry based on the Kubernetes control plane endpoint in the machine config. E.g. if the loadbalancer is used for the controlplane endpoint, and that loadbalancer also proxies requests for port 50001 (trustd), this static endpoint will provide workers with connectivity to trustd even if the discovery service is disabled, and Kubernetes API is not up. If this endpoint doesn't provide any trustd API, Talos will still try other endpoints. Talos does server certificate validation when calling trustd, so including malicious endpoints doesn't cause any harm, as a malicious endpoint can't provide a proper server certificate. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-01-03 21:33:18 +04:00
Serge Logvinov	80f150ac85	feat: enable ipv6 on gcp Introduce IPv6 to the Google Cloud. It can also work when DHCPv6 is on, but the route is received through RA packets, which is not working. Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev> Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-12-28 14:45:49 +04:00
Serge Logvinov	f6a86ae906	fix: oracle cloud zone The zone definition was misspelled. Native services use an uppercase zone. Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev> Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-12-26 14:49:26 +04:00
Andrey Smirnov	89dbb0ecf0	release(v1.4.0-alpha.0): prepare release This is the official v1.4.0-alpha.0 release. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-12-23 22:32:09 +04:00
Andrey Smirnov	31fb905358	feat: update Linux 6.1.1, containerd 1.6.14 Bumps tools/pkgs/extras to the latest. Bumps Go modules. Enables adaptive capacity for COSI state. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-12-23 20:30:09 +04:00
Andrey Smirnov	a0c0352ddc	fix: send diagnostic output to stderr consistently Fixes #6676 There was a mix of stdout/stderr, move more consistently to stderr. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-12-23 18:41:56 +04:00
Andrey Smirnov	9a5f4c08a2	fix: default the manifest namespace if not set This seems to happen specifically for CRDs, regular Kubernetes resources have some extra magic. Fixes #6663 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-12-22 16:20:46 +04:00
Niklas Wik	34babe858d	chore: make organization selection an interface Make organization an interface in preparation for avoiding giving system:masters access to the talosctl kubeconfig generated certificate. Signed-off-by: Niklas Wik <niklas.wik@nokia.com> Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-12-19 15:12:30 +04:00
Nico Berlee	171aa94679	fix: disable Wireless Lan using dtoverlay Talos has no wireless support & wireless kernel drivers, so disabling it the recommended way might actually reduce power consumption. It could save ~45 mA: https://forums.raspberrypi.com/viewtopic.php?t=257144#p1568474 Or 'The WiFi half of the wireless chip will be powered but be held in reset': https://forums.raspberrypi.com/viewtopic.php?t=343854#p2060246 Either way, it does not hurt and it should be treated the same as bluetooth. Signed-off-by: Nico Berlee <nico.berlee@on2it.net> Signed-off-by: Noel Georgi <git@frezbo.dev>	2022-12-17 01:48:43 +05:30
Dmitriy Matrenichev	eb332cfcb7	feat: add health check for a minimal memory / disk size This PR adds two additional checks which are performed during boot sequence and in `talosctl health`. They ensure that nodes have enough memory and disk. - Boot check will print a warning if memory / disk size is not sufficient. - Health check will fail if memory / disk size is not sufficient. Closes #6467 Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2022-12-10 07:05:08 +03:00
Andrey Smirnov	d04970dfa9	fix: ignore k8s additional addresses if nil This fixes a potential panic which I found in the unit-tests logs. The error 'not found' is ignored, so an additional check is needed. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-12-09 19:25:07 +04:00
Dmitriy Matrenichev	a8ebcca4a9	chore: remove `watchErr` from `metal.getResource` It's only used to detect if resource is `nil` or of incorrect type. Both errors are developer errors, so we should not collect them. Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2022-12-06 22:04:28 +03:00
Dmitriy Matrenichev	1253513bd1	fix: fix nil pointer panic and incorrect error output Currently `.Error()` call is panicking if `watchErr` is nil. Besides - we want to wrap errors the way we can unwrap them. Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2022-12-06 21:03:25 +03:00
Andrey Smirnov	82e8c9e1f6	fix: workaround panic in the kubelet service controller The traceback: ``` user: warning: [2022-12-02T17:31:09.496341098Z]: [talos] controller failed {"component": "controller-runtime", "controller": "k8s.KubeletServiceController", "error": "controller \x5c"k8s.KubeletServiceController\x5c" panicked: runtime error: invalid memory address or nil pointer dereference\x5cn\x5cngoroutine 308 [running]:\x5cnruntime/debug.Stack()\x5cn\x5ct/toolchain/go/src/runtime/debug/stack.go:24 +0x65\x5cngithub.com/cosi-project/runtime/pkg/controller/runtime.(adapter).runOnce.func2()\x5cn\x5ct/.cache/mod/github.com/cosi-project/runtime@v0.1.1/pkg/controller/runtime/adapter.go:403 +0x5d\x5cnpanic({0x2b7b600, 0x536c7c0})\x5cn\x5ct/toolchain/go/src/runtime/panic.go:884 +0x212\x5cngithub.com/talos-systems/talos/internal/app/machined/pkg/controllers/k8s.updateKubeconfig(0xc0000d49b0?)\x5cn\x5ct/src/internal/app/machined/pkg/controllers/k8s/kubelet_service.go:302 +0xb8\x5cngithub.com/talos-systems/talos/internal/app/machined/pkg/controllers/k8s.(KubeletServiceController).Run(0xc000956030, {0x389f7c0, 0xc000808040}, {0x38bce60, 0xc0000dfa80}, 0x0?)\x5cn\x5ct/s... ``` Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-12-06 20:53:30 +04:00
Andrey Smirnov	a505b8909a	fix: update COSI and reset restart backoff on success See https://github.com/cosi-project/runtime/pull/191 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-12-06 17:43:26 +04:00
Andrey Smirnov	5b2960efff	fix: introduce 'overridePath' setting and fix Talos resolver There was inconsistency in the way `/v2` was appended to registry endpoint path between containerd (CRI) and Talos: * Talos only appended `/v2` to empty paths * containerd appended `/v2` if it's not the suffix already Fix Talos to act same as containerd, and introduce a setting `overridePath` which stops both Talos and `containerd` from appending `/v2` (should be required with e.g. Harbor registry mirror). Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-12-05 12:50:53 +04:00
Andrey Smirnov	0219d1124e	fix: use only kube-apiserver endpoints for Talos API access endpoints Fixes #6566 This avoids putting all node addresses which might not be routable across Kubernetes. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-12-02 22:27:55 +04:00
Andrey Smirnov	dc5e0f4af0	fix: report errors to Equinix Metal event API This provides more detailed event for better error analysis. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-12-02 21:24:00 +04:00
Utku Ozdemir	7ab140a94a	feat: add talosctl machineconfig patch command Add talosctl machineconfig patch command which accepts a machine config as input and a list of patches, applying the patches and writing the result to a file or to stdout. Link `talosctl machineconfig gen` to `talosctl gen config`, so they work the same way. Closes siderolabs/talos#6562. Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>	2022-12-02 15:42:48 +01:00
Andrey Smirnov	d3cf061149	fix: ignore many more filesystems in IMA Fixes #6553 Talos itself defaults to XFS, so IMA measurements weren't done for Talos own filesystems. But many other solutions create by default ext4 filesystems, or it might be something mounted by other means. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-12-01 20:16:41 +04:00
Utku Ozdemir	44e2799b8c	feat: add stdout and single config type support to talosctl gen config Add support to specify the types of outputs to be generated by talosctl gen config. Add support for writing a single type of output to stdout instead of a file. Related to siderolabs/talos#6562. Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>	2022-12-01 16:55:22 +01:00
Andrey Smirnov	4cd125d499	fix: correctly handle new watch event types This is a fix after upgrade to COSI v0.2.0. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-12-01 13:53:22 +04:00
Andrey Smirnov	2ebe410e93	feat: update COSI to v0.2.0 This brings many fixes, including a new Watch with support for Bootstrapped and Errored event types. `talosctl` from before this change is still compatible, as there's gRPC API level backwards compatibility versioning. New client doesn't yet depend on new event types, so it will work against Talos 1.2.x. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-11-29 21:21:59 +04:00
Andrey Smirnov	1103c5ad24	feat: implement pre-flight checks in the installer Host Talos mounts machined socket for API access into the installer container (for upgrades). Installer runs any check it might need to verify compatibility. At the moment following checks are implemented: * Talos version (whether upgrade from version X to Y is supported) * Kubernetes version (whether Kubernetes version X is supported with Talos Y). Fixes #6149 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-11-28 13:45:49 +04:00
Andrey Smirnov	4a052eadf3	fix: disable kexec on upgrades from pre-BTF kernel Enabling BTF in the kernel breaks kexec from pre-BTF kernel (e.g. when upgrading from 1.2.x to 1.3.x). As there's no way to detect Talos version in the installer at the moment, use another way to detect whether BTF is enabled in the Talos version which is running right now. Fixes #6443 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-11-24 22:48:39 +04:00
Andrey Smirnov	732c459ecf	fix: parse and apply DHCP settings properly from cmdline This allows multiple `ip=` parameters, and fixes setting DHCP for any link on the cmdline. Fixes #6475 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-11-24 21:47:29 +04:00
Andrey Smirnov	ee7a4777af	chore: bump dependencies Linux 5.15.79, containerd 1.6.10 Other changes come from: * https://github.com/siderolabs/toolchain/pull/57 * https://github.com/siderolabs/tools/pull/244 * https://github.com/siderolabs/pkgs/pull/619 * https://github.com/siderolabs/extras/pull/67 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-11-22 23:47:05 +04:00
Serge Logvinov	a58c3d6699	feat: hcloud location properties Receive region/zone from metadata server. Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev> Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-11-22 18:31:21 +04:00
Andrey Smirnov	c54bea1283	fix: don't publish external IPs as affiliate addresses Fixes #5937 This removes external IPs from a set of addresses published by the node (we source addresses from 'routed' now which excludes external). This is definitely the "right" thing to do, as those addresses are not on the node itself and can't be routed to the node. On the other hand, it also removes them from `talosctl get members`, but we don't have to split this up right now. For the KubeSpan endpoints, we still use 'all' addresses, as external IPs are perfect as KubeSpan endpoints (Wireguard endpoints). Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-11-21 15:02:53 +04:00
Serge Logvinov	e432579d48	feat: kubespan node endpoints filter This feature allows us to use only IPv4 or IPv6 stack to reach the peers. Also, it can help to not share the node-specific IPs, which cannot be accessible at all. Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>	2022-11-18 19:55:42 +04:00
Andrey Smirnov	6430ce1efc	fix: limit SideroLink Wireguard link MTU to 1280 See https://github.com/siderolabs/siderolink/pull/19 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-11-18 00:09:10 +04:00
Andrey Smirnov	aa56aed798	feat: publish discovered public IP as one of the KubeSpan endpoint This resolves a case when a node is behind NAT, but KubeSpan port is forwarded back to the node. Discovery Service returns public IP of the client as it sees from the incoming request. That address is now published to the KubeSpan endpoints. Fixes #6508 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-11-16 17:36:38 +04:00
Andrey Smirnov	9382443baa	feat: update Kubernetes to v1.26.0-rc.0 Removed deprecated arg from the kubelet spec, as the arg is going to be removed completely in v1.27 (kubelet defaults to remote CRI anyways). Go modules not updated due to https://github.com/kubernetes/kubernetes/issues/113951 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-11-16 17:07:06 +04:00
Andrey Smirnov	6ffc381c59	feat: implement CRI configuration customization This is tricky, as containerd itself doesn't merge plugin configuration across multiple files. TOML can't load configuration correctly from concatenated files. Fixes #6390 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-11-16 15:38:44 +04:00
Philipp Sauter	e1e340bdd9	feat: expose Talos node labels as a machine configuration field We add the `nodeLabels` key to the machine config to allow users to add node labels to the kubernetes Node object. A controller reads the nodeLabels from the machine config and applies them via the kubernetes API. Older versions of talosctl will throw an unknown keys error if `edit mc` is called on a node with this change. Fixes #6301 Signed-off-by: Philipp Sauter <philipp.sauter@siderolabs.com> Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-11-15 21:25:40 +04:00
Utku Ozdemir	5bfd7dbfa7	test: fix assertion on reboot test Fix the reboot test assertion after the changed formatting. Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>	2022-11-11 14:42:12 +01:00
Andrey Smirnov	1cfb6188bc	feat: implement support for cgroupsv1 Use boot kernel arg `talos.unified_cgroup_hierarchy=0` to force Talos to use cgroups v1. Talos still defaults to cgroupsv2. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-11-11 15:49:25 +04:00
Andrey Smirnov	3866d0e334	feat: update Kubernetes to v1.26.0-beta.0 See https://github.com/kubernetes/kubernetes/releases/tag/v1.26.0-beta.0 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-11-11 15:06:34 +04:00
Philipp Sauter	4e114ca120	feat: use the etcd member id for etcd operations instead of hostname We add a controller that provides the etcd member id as a resource and change the etcd related commands to support member ids next to hostnames. Fixes: #6223 Signed-off-by: Philipp Sauter <philipp.sauter@siderolabs.com>	2022-11-10 19:17:56 +04:00
Serge Logvinov	06fea24414	feat: expand platform metadata resources * add IPv6 to the ExternalIPs resource. * platformMetadata can define Spot instances. Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev> Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-11-07 18:57:17 +04:00
Serge Logvinov	03a20da9da	fix: filter up duplicate IPs out of NodeAddresses The node can have two IPv6 of the same addresses: * IPv6/64 * IPv6/128 In this case, the node will advertise two of the same IP:PORT endpoints. Which adds more time to create/recover a p2p (kubespan) connection. Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev> Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-11-04 22:52:43 +04:00
Andrey Smirnov	96aa9638f7	chore: rename talos-systems/talos to siderolabs/talos There's a cyclic dependency on siderolink library which imports talos machinery back. We will fix that after we get talos pushed under a new name. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-11-03 16:50:32 +04:00
Andrey Smirnov	30bbf6463a	refactor: use siderolabs/net version with netip.Addr Replace most of `net.IP` usage in Talos with `netip.Addr`, refactor code accordingly. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-11-02 14:21:03 +04:00
Andrey Smirnov	343c55762e	chore: replace talos-systems Go modules with siderolabs This is the first step towards replacing all import paths to be based on `siderolabs/` instead of `talos-systems/`. All updates contain no functional changes, just refactorings to adapt to the new path structure. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-11-01 12:55:40 +04:00
Andrey Smirnov	08e7e49a29	test: update versions for upgrade tests Use the latest releases in each branch. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-11-01 10:40:19 +04:00
Andrey Smirnov	0b41923c36	fix: restore the StaticPodStatus resource It got broken with the changes to the kubelet now sourcing static pods from a HTTP internal server. As we don't want it to be broken, and to make health checks better, add a new check to make sure kubelet reports control plane static pods as running. This coupled with API server check should make it more thorough. Also add logging when static pod definitions are updated (they were previously there for file-based implementation). These logs are very helpful for troubleshooting. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-10-31 18:48:03 +04:00
Andrey Smirnov	1947092ae2	chore: introduce a healthcheck for `machined` service This fixes a cosmetic issue of `machined` being the only service which doesn't have the healthcheck, and showing up as 'health unknown'. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-10-31 18:15:29 +04:00
Andrey Smirnov	3333cd93c8	fix: generate correct Flannel config for IPv6-only clusters See https://github.com/flannel-io/flannel/blob/master/Documentation/configuration.md#ipv6-only Fixes #6427 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-10-31 17:56:04 +04:00
Andrey Smirnov	d7070f5e74	release(v1.3.0-alpha.1): prepare release This is the official v1.3.0-alpha.1 release. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-10-31 16:43:11 +04:00
Maxim Makarov	869f3b5a51	feat: network configuration improvements on the OpenStack platform * support for bonding * added interface selection by MAC address * fixed bug where network configuration from config-drive was not being applied due to errors when discovering `hostname` and `extIPs` from OpenStack API. Signed-off-by: Maxim Makarov <maxpain177@gmail.com> Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-10-28 18:36:22 +04:00
Serge Logvinov	29f2195e13	feat: support exoscale cloud Add Exoscale cloud-init support. Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev> Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-10-28 17:52:55 +04:00
Serge Logvinov	8bfa7ac1d6	feat: platform metadata resource This resource stores common platform metadata information. Such as: * Hostname * Region * Zone * InstanceType (SKU) * InstanceID * ProviderID (CCM cloud native magic string) Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev> Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-10-28 14:32:39 +04:00
Andrey Smirnov	7e50e24c01	fix: properly cleanup legacy static pod manifests directory When upgrading from older version of Talos using static pod manifests directory to new version providing static pods via internal web server, we need to make sure that legacy static pods are cleaned up, otherwise kubelet receives "two" versions of the static pods which makes it fail to run them. The previous cleanup location wasn't working properly, as `/etc/kubernetes/manifests` exists in the rootfs (and it's empty), while actual contents are in `/var`, and they appear only when respective overlay mount is done. The controller tried to clean up on start, saw nothing (looking into rootfs), then started doing other functions. The result was that when overlay was mounted, static pods were still there, while the controller will do next attempt only when it fails, and it fails next time when kubelet is already running, and when it already picked up those stale definitions. Fix all of that by moving cleanup into sequencer after overlayfs mount. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-10-27 23:09:47 +04:00
Philipp Sauter	4ea3b99b52	fix: serve static pod files on 127.0.0.1 instead of localhost Previously we were serving the static pod files on localhost which assumes DNS. We change that to 127.0.0.1. Signed-off-by: Philipp Sauter <philipp.sauter@siderolabs.com>	2022-10-27 13:57:20 +02:00
Philipp Sauter	23842114f0	feat: support encryption with secretbox We add support for encryption with secretbox. While AESCBC is still supported secretbox will take precedence if both are configured. Secretbox is not the default encryption for new clusters. Fixes: #6362 Signed-off-by: Philipp Sauter <philipp.sauter@siderolabs.com>	2022-10-26 19:06:53 +02:00
Andrey Smirnov	d7edd0e2e6	refactor: use go-circular, go-kubeconfig, and go-tail Remove Talos versions, use new extracted Go modules. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-10-25 20:20:44 +04:00
Philipp Sauter	c6e1702eca	feat: use URL-based manifests to present static pods to the kubelet Previously static pod manifests were written to and read from a folder on the disk. We add a controller that cleans up the default static pod manifests on the disk and serves them as a PodList manifest via HTTP. The URL to the manifest is injected into the kubelet. File based static pod manifests are still supported and may be enabled by setting the key kubelet -> enableManifestsDirectory in the machine config. Fixes #5494 Signed-off-by: Philipp Sauter <philipp.sauter@siderolabs.com>	2022-10-25 14:30:19 +02:00
Andrey Smirnov	879e8c0bfe	chore: update kernel with BTF support This pulls in: * https://github.com/siderolabs/pkgs/pull/612 * https://github.com/siderolabs/pkgs/pull/606 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-10-24 17:42:10 +04:00
Tim Jones	e6fba7d3bc	chore: update dependencies Updates: * pkgs v1.3.0-alpha.0-33-g8fe5cbc * tools v1.3.0-alpha.0-20-g3b5f89a * aws-sdk-go v1.44.120 * docker v20.10.20+incompatible * fsnotify v1.6.0 * nftables v0.0.0-20221015190445-4f5cd5826fbd * gen v0.4.0 * grpc-proxy v0.4.0 * spf13/cobra v1.6.0 * u-root v0.10.0 * x/net v0.1.0 * x/sync v0.1.0 * x/sys v0.1.0 * x/term v0.1.0 * x/time v0.1.0 * grpc v1.50.1 * genproto v0.0.0-20221018160656-63c7b68cfc55 * Linux kernel 5.15.74 Signed-off-by: Tim Jones <tim.jones@siderolabs.com>	2022-10-21 15:20:01 +04:00
Dmitriy Matrenichev	93e55b85f2	chore: bump golangci-lint to v1.50.0 I had to do several things: - contextcheck now supports Go 1.18 generics, but I had to disable it because of this https://github.com/kkHAIKE/contextcheck/issues/9 - dupword produces too many false positives, so it's also disabled - revive found all packages which didn't have a documentation comment before. And there are A LOT of them. I updated some of them, but gave up at some point and just added them to exclude rules for now. - change lint-vulncheck to use `base` stage as base Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2022-10-20 18:33:19 +03:00
Andrey Smirnov	aa3d9b4ca6	fix: regenerate cert on node labeling retry We have a high timeout for node labeling retry, while the generated temporary Kubernetes PKI has a lifetime of 10 minutes, which means after 10 minutes any call would fail with `Unauthorized`. Fix that by pulling the PKI generation into the retry loop, this way the cert is always refreshed, and it also reacts to machine config changes (e.g. if the endpoint got changed). Also it helps with the retries if the DNS updates or any other changes like that. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-10-18 22:00:41 +04:00
Andrey Smirnov	021c73c352	fix: lowercase nodename Kubernetes always lowercases whatever nodename is given to the kubelet, so we should do the same, otherwise Talos looks for a `Node` with uppercase letter which is never going to be registered. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-10-18 21:20:12 +04:00
Serge Logvinov	dc70d892a3	fix: support setting KubeSpan link MTU KubeSpan creates packets larger than the MTU of the external interface. This PR adds the ability to change the MTU through machine config, and sets the MTU of the default KubeSpan route. Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev> Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-10-17 14:39:15 +04:00
Andrew Rynhard	b7b1d4fd6a	feat: use readonly containers Containers created with `talosctl cluster create` are run with a read-only filesystem. This more accurately mimics standard Talos. Signed-off-by: Andrew Rynhard <andrew@rynhard.io>	2022-10-11 15:24:38 +00:00
Andrey Smirnov	993743f634	fix: skip hostname via DHCP on OpenStack platform Introduce new DHCP operator option to skip hostname request/response, and use that in OpenStack platform. OpenStack configures interface with DHCP, while providing dummy hostname over DHCP and proper hostname over metadata. As operators override platform settings, DHCP hostname takes over OpenStack hostname. As a fix, ignore DHCP hostname while on OpenStack. Fixes #6350 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-10-10 14:18:46 +04:00
Serge Logvinov	db076e7b5a	feat: pin interface by mac address in cmdline args Example, set interface IP address by MAC: ```cmdline: ip=172.20.0.2::172.20.0.1:255.255.255.0::enx001122aabbcc``` Interface MAC is `00:11:22:aa:bb:cc`. Source: https://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames/ Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev> Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-10-10 13:56:42 +04:00
Andrey Smirnov	06f76bfebb	chore: bump dependencies Update to some dependencies moved to siderolabs/ path. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-10-04 14:47:27 +04:00
Andrey Smirnov	139c62d762	feat: allow upgrades in maintenance mode (only over SideroLink) This implements a simple way to upgrade Talos node running in maintenance mode (only if Talos is installed, i.e. if `STATE` and `EPHEMERAL` partitions are wiped). Upgrade is only available over SideroLink for security reasons. Upgrade in maintenance mode doesn't support any options, and it works without machine configuration, so proxy environment variables are not available, registry mirrors can't be used, and extensions are not installed. Fixes #6224 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-09-30 21:16:15 +04:00
Noel Georgi	48dee48057	feat: support mtu for routes Support setting MTU for routes. Fixes: #6324 Signed-off-by: Noel Georgi <git@frezbo.dev>	2022-09-30 16:38:22 +05:30
Serge Logvinov	18c377a4d1	feat: customize audit policy Add resource `AuditPolicyConfigs.kubernetes.talos.dev`. It can be changed through machine config `cluster.apiServer.auditPolicy` Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev> Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-09-28 13:46:44 +04:00
Noel Georgi	23c9ea46bb	fix: raspberry pi install Fix raspberry pi install. Some fixes were missed from #6388 Signed-off-by: Noel Georgi <git@frezbo.dev>	2022-09-28 01:09:28 +05:30
Philipp Sauter	f17cdee167	feat: jsonpath filter for talosctl get outputs We add a filter to the `talosctl get` command that allows users to specify a jsonpath filter. Now they can reduce the information that is printed to only the parts they are interested in. Fixes #6109 Signed-off-by: Philipp Sauter <philipp.sauter@siderolabs.com>	2022-09-27 20:47:11 +02:00
Noel Georgi	6bd3cca1a8	chore: generic raspberry pi images Use generic Raspberry Pi images. Deprecate the RPi4 specific image. Ref: https://github.com/siderolabs/pkgs/pull/596 Signed-off-by: Noel Georgi <git@frezbo.dev>	2022-09-27 16:39:12 +05:30
Kris Reeves	a0151aa13e	feat: add generic rpi u-boot support This commit adds support for building Talos for the Compute Module 4 and other generic Raspberry Pi hardware. Fixes: #6273 Signed-off-by: Kris Reeves <kris@pressbuttonllc.com> Signed-off-by: Noel Georgi <git@frezbo.dev>	2022-09-26 21:04:07 +05:30
Andrey Smirnov	30f851d093	chore: bump dependencies go-mod-outdated Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-09-26 18:37:38 +04:00
Andrey Smirnov	8b2235c3b6	fix: lookup Equinix Metal bond slaves using 'permanent addr' See #6333 Using permanent address fixes issues with mis-matching the links after they got bonded. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-09-26 18:10:39 +04:00
Andrey Smirnov	0b2767c164	feat: implement 'permanent addr' in link statuses Permanent address is only available for physical links, and it might be different from the 'hardware address': when bonding, 'hardware address' gets overridden from the bond master, while 'permanent address' still shows MAC of the interface. This is part of the fix for incorrect bonding issue on Equinix Metal. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-09-26 14:45:46 +04:00
Dmitriy Matrenichev	fc48849d00	chore: move maps/slices/ordered to gen module Use github.com/siderolabs/gen Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2022-09-21 20:22:43 +03:00
Andrey Smirnov	8b09bd4b04	feat: update Kubernetes to v1.26.0-alpha.1 Talos 1.3.0 will ship with Kubernetes 1.26.0. See https://github.com/kubernetes/kubernetes/releases/tag/v1.26.0-alpha.1 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-09-21 18:42:31 +04:00
Noel Georgi	357b770cb5	fix: cryptsetup delete slot Fix cryptsetup delete slot. Fixes: #6298 Signed-off-by: Noel Georgi <git@frezbo.dev>	2022-09-21 16:37:54 +05:30
Andrey Smirnov	7111288393	fix: continue applying bootstrap manifests on some errors Fixes #6302 This allows Talos to proceed if some manifest is invalid (or malformed), while aborting the loop on connection errors (when `kube-apiserver` is not ready). This fixes a problem where a single resource might stop all manifests from being applied, preventing a cluster bootstrap. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-09-20 22:27:17 +04:00
Andrey Smirnov	472590aa82	chore: return InvalidArgument on invalid config in maintenance mode Follow-up fix for #6258 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-09-15 21:46:48 +04:00
Andrey Smirnov	e5cabd42cc	feat: enable etcd consistency hashcheck This will be only enabled for Talos v1.3.x. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-09-15 21:03:40 +04:00
Andrey Smirnov	015535d905	fix: update discovery client with the redirect fix See https://github.com/siderolabs/discovery-client/pull/4 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-09-15 20:32:33 +04:00
Andrey Smirnov	94b088f02f	fix: set etcd options consistently This fixes an issue introduced in #5879: options should be set same way for both `init` and `controlplane` cases. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-09-14 22:56:26 +04:00
Andrey Smirnov	7b270ff33d	test: fix api controller test Fixing the test to match the implementation. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-09-13 15:26:32 +04:00
Andrey Smirnov	2dadcd6695	fix: stop worker nodes from acting as apid routers Don't allow worker nodes to act as apid routers: * don't try to issue client certificate for apid on worker nodes * if a worker node receives incoming connections with `--nodes` set to one of the local addresses of the node, it routes the request to itself without proxying Second point allows using `talosctl -e worker -n worker` to connect directly to the worker if the connection from the control plane is not available for some reason. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-09-13 15:07:31 +04:00
Andrey Smirnov	9eaf33f3f2	fix: never sign client certificate requests in trustd Talos worker nodes use `trustd` API on control plane nodes to issue certificates for `apid` service. Access to the API is protected with the Talos join token specified in the machine configuration. There was no validation on what kind of request is requested, so `trustd` could issue a certificate which is valid for client authentication with any set of Talos API RBAC roles, including `os:admin` role allowing full access to the Talos API on control plane nodes. See: GHSA-7hgc-php5-77qq CVE: CVE-2022-36103 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2022-09-13 15:06:09 +04:00
Noel Georgi	4367491247	feat: environment vars for extension service This allows setting environment variables for the extension service. Signed-off-by: Noel Georgi <git@frezbo.dev>	2022-09-13 14:06:55 +05:30

2110 Commits