talos

Author	SHA1	Message	Date
Alex Lubbock	ecce29dee9	fix: upgrade-k8s use internal IP first, external IP fallback Currently, upgrade-k8s adds both node internal and external IPs. This commit uses the internal IP if available; external IP is only used as a fallback. Signed-off-by: Alex Lubbock <code@alexlubbock.com> Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-05-31 18:21:27 +04:00
Andrey Smirnov	3c64a5ffba	chore: optimize image generation time Use `pigz` and `--sparse` to compress the assets more efficiently. Also move tasks out of the `setup-ci` step, as it always runs, including for the promoted pipelines. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-05-31 17:47:23 +04:00
Eirik Askheim	2292f36d97	chore: registry.k8s.io for coredns image Replace docker.io with registry.k8s.io for the coredns image. Signed-off-by: Eirik Askheim <eirik@x13.no> Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-05-31 17:06:13 +04:00
Alex Corcoles	f2b258b373	docs: document talosctl version for upgrades Use same version as running cluster. Signed-off-by: Alex Corcoles <alex@corcoles.net> Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-05-31 16:29:53 +04:00
Andrey Smirnov	a0773f783c	chore: add ukify Go script This is a port of ukify.py and systemd-measure from systemd. This requires no actual TPM to be present to calculate the PCR signatures. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com> Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-05-30 23:33:26 +05:30
Andrey Smirnov	b69e38d1ff	chore: bump dependencies New pkgs, Linux 6.1.30, Flannel 0.22.0, Go modules. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-05-29 23:19:44 +04:00
DJAlPee	adce651034	docs: add piraeus/drbd to storage documentation How-To install Piraeus on a Talos cluster Signed-off-by: DJAlPee <DJAlPee@GitHub.com> Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-05-29 16:52:01 +05:30
Alex Corcoles	a982cabe70	docs: link support matrix in k8s update doc Provide a link to explain what versions are supported. Signed-off-by: Alex Corcoles <alex@pdp7.net> Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-05-29 14:58:11 +04:00
Andrey Smirnov	1fb29a56a8	fix: fail quickly if upgrade-k8s is used with multiple nodes Fixes #7283 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-05-29 14:35:29 +04:00
Noel Georgi	51d931c470	chore: faster dev cycle This cleans up `Dockerfile` and `Makefile` targets to be in similar parity with `kres` auto-generated targets. Now `make talosctl` would only build the one for the specific local machine making development easier. Also added a `iso` docker target that builds iso for local development without having to push and pull the imager. (`make local-iso DEST=_out`) Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-05-27 19:10:11 +05:30
Andrey Smirnov	dc6764871c	refactor: move around config interfaces, make RawV1Alpha1 typed See #7230 Refactor more config interfaces, move config accessor interfaces to different package to break the dependency loop. Make `.RawV1Alpha1()` method typed to avoid type assertions everywhere. No functional changes. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-05-23 22:08:58 +04:00
Andrey Smirnov	ea9a97dba3	fix: fall back to external IP when discovering nodes in upgrade-k8s Fixes #7253 Also fix the case that `kube-proxy` version was updated in the machine config in `--dry-run` mode. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-05-23 17:54:35 +04:00
Andrey Smirnov	0bb7e8a5cf	refactor: split config.Provider into Config & Container See #7230 This is a step towards preparing for multi-doc config. Split the `config.Provider` interface into parts which have different implementation: * `config.Config` accesses the config itself, it might be implemented by `v1alpha1.Config` for example * `config.Container` will be a set of config documents, which implement validation, encoding, etc. `Version()` method dropped, as it makes little sense and it was almost not used. `Raw()` method renamed to `RawV1Alpha1()` to support legacy direct access to `v1alpha1.Config`, next PR will refactor more to make it return proper type. There will be many more changes coming up. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-05-23 16:05:16 +04:00
Dmitriy Matrenichev	85d8a16194	chore: bump deps Bump deps Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2023-05-22 16:02:15 -04:00
Spencer Smith	39b7a56f01	chore: use 8GiB instead of 10GiB for cloud images This PR changes the default disk size for cloud images to be 8GiB instead. This was prompted b/c the disk price in azure between tiers is doubled and the cutoff for the tier is 8GiB. Signed-off-by: Spencer Smith <spencer.smith@talos-systems.com>	2023-05-19 20:35:13 -04:00
Andrey Smirnov	ff11fd39c7	fix: race with `udevd` and `mountUserDisks` Fixes #7246 The problem was that `udevd` watches via `inotify` any attempts to open blockdevices with 'write' access. Talos was opening with write access, but actually accessing as read-only, so the fix is to open as read-only. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-05-19 22:02:48 +04:00
Spencer Smith	c3fabb9829	chore: update default image sizes to 10GB for all "cloud" images This PR adds a flag to imager that allows for tweaking the size of the created disk. Additionally, it sets the default value of that created disk to 10GB, as most images are cloud images that fail when uploaded b/c it only picks up a 1GB disk currently. Also adds some processing the makefile to make sure we set the default small value for metal images and SBCs. Signed-off-by: Spencer Smith <spencer.smith@talos-systems.com>	2023-05-19 13:35:39 -04:00
Andrey Smirnov	10155c390e	feat: enable xfs project quota support, kubelet feature This is controlled with a feature flag which gets enabled automatically for Talos 1.5+. Fixes #7181 If enabled, configures kubelet to use project quotas to track xfs volume usage, which is much more efficient than doing `du` periodically. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-05-19 20:33:39 +04:00
Andrey Smirnov	eba8185642	release(v1.5.0-alpha.0): prepare release This is the official v1.5.0-alpha.0 release. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-05-19 18:38:24 +04:00
Andrey Smirnov	383471c3e9	feat: update default Kubernetes to v1.27.2 See https://github.com/kubernetes/kubernetes/releases/v1.27.2 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-05-19 15:14:17 +04:00
Dmitriy Matrenichev	8f68d1abef	chore: bump deps - github.com/benbjohnson/clock to v1.3.5 - cilium-cli to v0.14.3 Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2023-05-18 19:04:05 -04:00
Christian Rolland	e0c1585d30	feat: create azure community gallery image version on release Create Azure Community Gallery Image Version on release: - Add /hack/cloud-image-uploader/azure.go - Upload vhd file to container for all architectures - Create managed disk from vhd file for all architectures - Create image version from managed disk for all architectures - Modify /hack/cloud-image-uploader/main.go - Start Community Gallery processes concurrently with AWS upload - Modify /hack/cloud-image-uploader/options.go - Add additional Options for Community Gallery processes - Modify .drone.jsonnet to use secrets for environment variables - The following secrets need to be created for this to work: - azure_subscription_id - azure_client_id - azure_client_secret - azure_tenant_id Signed-off-by: Christian Rolland <christian.rolland@siderolabs.com> chore: fix linting errors in readme Fix linting errors in readme Signed-off-by: Christian Rolland <christian.rolland@siderolabs.com> chore: fix markdown linting errors Fix markdown linting errors in readme Signed-off-by: Christian Rolland <christian.rolland@siderolabs.com> chore: fix markdown linting errors Fix markdown linting errors in readme Signed-off-by: Christian Rolland <christian.rolland@siderolabs.com> chore: change disk size to match new 10GB cloud image size Change disk size to match 10GB cloud image size Signed-off-by: Christian Rolland <christian.rolland@siderolabs.com>	2023-05-18 17:49:55 -04:00
Andrey Smirnov	dd8336c9ee	fix: refresh kubelet self-issued serving certificates Kubelet doesn't refresh self-issued serving certificates, so force it by removing the cert on each restart. Fix the code which was forcing rejoin when the nodename changes, it was broken, as it was checking serving certificate instead of client certificate. It worked by accident when not using controlplane-issued serving certificates. Fixes #7235 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-05-18 22:19:34 +04:00
Andrey Smirnov	bb02dd263c	chore: drop deprecated stuff for Talos 1.5 * drop old resources API, which was deprecated long time ago * use bootstrapped event in `talosctl get --watch` to better align columns in the table output Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-05-18 19:46:37 +04:00
Dmitriy Matrenichev	61cad86731	chore: bump deps - github.com/containerd/typeurl to v2.1.1 - github.com/aws/aws-sdk-go to v1.44.264 - alpine to 3.18.0 - node to 20.2.0-alpine - github.com/containernetworking/plugins to v1.3.0 - github.com/docker/docker to v23.0.6+incompatible - github.com/hetznercloud/hcloud-go to v1.45.1 - github.com/insomniacslk/dhcp to v0.0.0-20230516061539-49801966e6cb - github.com/rivo/tview to v0.0.0-20230511053024-822bd067b165 - tools to v1.5.0-alpha.0-7-gd2dde48 - pkgs to v1.5.0-alpha.0-16-g7958db1 Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2023-05-18 01:07:36 -04:00
Andrey Smirnov	01dfd3af7d	feat: update etcd to v3.5.9 See https://github.com/etcd-io/etcd/releases/tag/v3.5.9 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-05-15 15:59:23 +04:00
Ricky Sadowski	aa65fbb8a1	chore: update KUBECTL_URL to reflect the community bucket This PR changes the url used in the Makefile from a legacy URL to point to the new community owned download host. Signed-off-by: Ricky Sadowski <richard.j.sadowski@gmail.com> Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-05-15 16:54:00 +05:30
Noel Georgi	cc3128d944	chore: bump kernel to 6.1.28 Bump kernel to 6.1.28 Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-05-15 14:30:02 +05:30
Dmitriy Matrenichev	97fffaf78a	chore: use ctest.UpdateWithConflicts instead of plain UpdateWithConflicts More type-safety. Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2023-05-12 20:39:32 -04:00
Noel Georgi	3b36993b99	fix: rlimit nofile test The test was added at the wrong place. Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-05-12 16:20:52 +05:30
Dmitriy Matrenichev	45e6e27af7	chore: bump runtime Use new functions and methods from runtime module. Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>	2023-05-11 17:18:08 -04:00
Noel Georgi	4f720d4653	fix: revert: set rlimit explicitly in wrapperd This reverts commit a2565f67416e9b9bc22f2d5506df9ea7771c0c8c. The fix done in `a2565f67` was actually a no-op caused by misunderstanding the fix done in Go and backported to [Go 1.20.4](`ecf7e00db8`). The fix gave false confidence that it was working when it was tested against the Talos `main` branch, since PR #7190 bumped the `x/sys` package from [v0.7.0 -> v0.8.0](`ecf7e00db8`). The actual change in `x/sys` can be found at `ff18efa0a3`, which meant that when updating Go to 1.20.4 the `x/sys` package should have been updated too. The `x/sys` package changed how the syscall to set the rlimit was called: it got moved into the Go stdlib instead of calling the rlimit syscall in the `x/sys` package, which meant that a combination of Go 1.20.4 and an older `x/sys` package would not set the `RLIMIT_NOFILE` value back to the original value. The Talos 1.4 release branch currently has `x/sys` at [v0.7.0](https://github.com/siderolabs/talos/blob/v1.4.3/go.mod#L133), so the backport would consist of this change along with another commit bumping the `x/sys` package to `v0.8.0`. Fixes: #7198 Fixes: #7206 Co-authored-by: Utku Ozdemir <utku.ozdemir@siderolabs.com> Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-05-11 23:38:20 +05:30
Noel Georgi	a2565f6741	fix: set rlimit explicitly in wrapperd Now Go only sets the rlimit for the parent and any fork/exec'ed process gets the rlimit that was the default before fork/exec. Ref: https://github.com/golang/go/issues/46279 This fix got backported to [Go 1.20.4](`ecf7e00db8`) breaking Talos. Talos used to set rlimit in the [`SetRLimit`](https://github.com/siderolabs/talos/blob/v1.4.2/internal/app/machined/pkg/runtime/v1alpha1/v1alpha1_sequencer_tasks.go#L302) sequencer task. This means any process started by `wrapperd` gets the default Rlimit (1024). Fix this by explicitly setting `rlimit` in `wrapperd` before we drop any capabilities. Fixes: #7198 Signed-off-by: Noel Georgi <git@frezbo.dev>	2023-05-10 00:10:17 +05:30
Andrey Smirnov	cdfc242b83	chore: re-enable Go buildid With Go 1.20.4 the reproducibility issue is fixed. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-05-08 21:40:42 +04:00
Andrey Smirnov	e67f3f5c54	feat: linux 6.1.27, containerd 1.6.21, go 1.20.4 Plus bunch of other dependencies. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-05-08 20:26:19 +04:00
Andrey Smirnov	55ae59a0ad	fix: properly skip/cleanup controlplane configs for workers This bug is pretty cosmetic, but it shows up as a wrong check when performing worker upgrade - Talos pretends it checks e.g. kube-apiserver version which doesn't make sense for workers. There were two bugs in the code: * check for machine type was done against `TypeWorker`, while `MachineType` resource is initially created as `TypeUnknown` * the cleanup code was not implemented As I touched the code, I updated controller and tests to use modern conventions. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-05-05 23:17:27 +04:00
Andrey Smirnov	64eade9bde	chore: clean up unused constant There is another `KubeProxyImage` which we're actually using. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-05-05 20:18:10 +04:00
Utku Ozdemir	62c6e9655c	feat: introduce siderolink config resource & reconnect Introduce a new resource, `SiderolinkConfig`, to store SideroLink connection configuration (api endpoint for now). Introduce a controller for this resource which populates it from the Kernel cmdline. Rework the SideroLink `ManagerController` to take this new resource as input and reconfigure the link on changes. Additionally, if the siderolink connection is lost, reconnect to it and reconfigure the links/addresses. Closes siderolabs/talos#7142, siderolabs/talos#7143. Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>	2023-05-05 17:04:34 +02:00
Andrey Smirnov	860002c735	fix: don't reload control plane pods on cert SANs changes Fixes #7159 The change looks big, but it's actually pretty simple inside: the static pods had an annotation which tracks a version of the secrets which forced control plane pods to reload on a change. At the same time `kube-apiserver` can reload certificate inputs automatically from files without restart. So the inputs were split: the dynamic (for kube-apiserver) inputs don't need to be reloaded, so its version is not tracked in static pod annotation, so they don't cause a reload. The previous non-dynamic resource still causes a reload, but it doesn't get updated when e.g. node addresses change. There might be many more refactoring done, the resource chain is a bit of a mess there, but I wanted to keep number of changes minimal to keep this backportable. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-05-05 16:59:09 +04:00
Andrey Smirnov	d43c61e80f	fix: enforce nolock option for all NFS mounts by default Talos doesn't have `rpc.statd` running, so mounting without locking is the only option. Some places in Kubernetes don't allow setting mount options for NFS, so setting defaults is the only way. Fixes #6582 Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-05-04 17:26:36 +04:00
Niklas Wik	339986db9d	fix: inhibit timer to follow kubelet timer Ensure to wait as long as possibly given to kubelet shutdown timers. Related to fix of siderolabs#7138 Signed-off-by: Niklas Wik <niklas.wik@nokia.com> Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-05-04 15:08:56 +04:00
Andrey Smirnov	cbf6dc1009	fix: set timeout for unmount calls Fixes #7137 The `umount` syscall might hang "forever" if the underlying network filesystem endpoint is down. To be on the safe side, add a timeout around unmount operations, and try to umount with force as a last resort. Sample log: ``` [14795.458779] [talos] task unmountPodMounts (2/2): unmounting /var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/dbe8d7f58e21d06cbef1ae0849317661eba4e82776722e7db5c65194ad73e916/globalmount/0001-0009-rook-ceph-0000000000000001-1051beb3-8d7a-4291-bf45-5711c13523d1 [14795.459797] [talos] task unmountPodMounts (2/2): unmounting /var/lib/kubelet/pods/f3f4d789-7f48-4dd9-9ef5-649b002c8f9c/volumes/kubernetes.io~csi/pvc-a4e72749-a8a1-43d9-9152-5bc1f757c924/mount [14795.460555] EXT4-fs (rbd0): unmounting filesystem. [14813.461319] [talos] task unmountPodMounts (2/2): unmounting /var/lib/kubelet/pods/f3f4d789-7f48-4dd9-9ef5-649b002c8f9c/volumes/kubernetes.io~csi/pvc-a4e72749-a8a1-43d9-9152-5bc1f757c924/mount is taking longer than expected, still waiting for 1m11.999162834s [14831.460813] [talos] task unmountPodMounts (2/2): unmounting /var/lib/kubelet/pods/f3f4d789-7f48-4dd9-9ef5-649b002c8f9c/volumes/kubernetes.io~csi/pvc-a4e72749-a8a1-43d9-9152-5bc1f757c924/mount is taking longer than expected, still waiting for 53.999567033s [14849.461336] [talos] task unmountPodMounts (2/2): unmounting /var/lib/kubelet/pods/f3f4d789-7f48-4dd9-9ef5-649b002c8f9c/volumes/kubernetes.io~csi/pvc-a4e72749-a8a1-43d9-9152-5bc1f757c924/mount is taking longer than expected, still waiting for 35.998979117s [14867.460748] [talos] task unmountPodMounts (2/2): unmounting /var/lib/kubelet/pods/f3f4d789-7f48-4dd9-9ef5-649b002c8f9c/volumes/kubernetes.io~csi/pvc-a4e72749-a8a1-43d9-9152-5bc1f757c924/mount is taking longer than expected, still waiting for 17.999502128s [14885.461123] [talos] task unmountPodMounts (2/2): unmounting /var/lib/kubelet/pods/f3f4d789-7f48-4dd9-9ef5-649b002c8f9c/volumes/kubernetes.io~csi/pvc-a4e72749-a8a1-43d9-9152-5bc1f757c924/mount with force [14885.462395] [talos] ignoring unmount error /var/lib/kubelet/pods/f3f4d789-7f48-4dd9-9ef5-649b002c8f9c/volumes/kubernetes.io~csi/pvc-a4e72749-a8a1-43d9-9152-5bc1f757c924/mount: invalid argument [14885.463529] [talos] task unmountPodMounts (2/2): unmounting /var/run/netns/cni-0888dc71-ba9e-af8a-d322-074f654561e5 [14885.464267] [talos] task unmountPodMounts (2/2): done, 1m30.028862262s ``` Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-05-03 23:32:23 +04:00
Andrey Smirnov	b58f913d5f	fix: set the static pod priority as values API server takes care of setting priority for "regular" pods from priorityClassName, but nothing does that for static pods, so we have to specify the priority explicitly for static pods. This fixes the graceful node shutdown (kubelet) to stop non-critical pods before the api-server and friends (critical pods). Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-05-02 20:20:20 +04:00
Steve Francis	f8a7a5b6bf	docs: add information about KubeSpan ports and topology Update KubeSpan documentation. Signed-off-by: Steve Francis <steve.francis@talos-systems.com> Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-05-01 17:57:43 +04:00
Steve Francis	2bad74d642	docs: add how to on scaling down Describe scaling down Talos cluster. Signed-off-by: Steve Francis <steve.francis@talos-systems.com> Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-05-01 16:48:13 +04:00
Thomas Perronin	7442ff8b09	chore: fix typos inteface -> interface (docs and tests) Fix typos. Signed-off-by: Thomas Perronin <gecko.splinter@gmail.com> Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-05-01 16:15:08 +04:00
Michael A. Davis	d4e94f7a15	fix: add back required TARGETARCH for installer Adds back in the required TARGETARCH for installer so extensions can be built off installer again as nvidia nonfree extension building was broken. Fixes: #7155 Refs: #7115 Signed-off-by: Michael A. Davis <6325127+mrmichaeladavis@users.noreply.github.com> Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-05-01 15:41:56 +04:00
Andrey Smirnov	e6fffda013	chore: linux 6.1.26, runc 1.1.7 Update to the latest pkgs. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-04-27 20:31:04 +04:00
Andrey Smirnov	344746ae2f	fix: bump max inhibit delay to 20 min Fixes #7138 This brings max shutdown period to 20 min that kubelet would accept. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-04-27 17:08:04 +04:00
Andrey Smirnov	d9bdea2b54	chore: fork docs and compatibility modules for Talos 1.5 Getting ready for the next Talos 1.5.0-alpha.0 release. Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>	2023-04-27 15:36:31 +04:00

4141 Commits