talos

Author	SHA1	Message	Date
Alexey Palazhchenko	37a5edf04a	feat: update Kubernetes to 1.21.0 release See CHANGELOG: https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.21.md Closes #3329. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-04-09 20:08:20 +03:00
Alexey Palazhchenko	30f687b417	fix: document HDMI problem on RPi 4 Closes #3414. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-04-08 14:06:12 -07:00
Alexey Palazhchenko	29da22d063	feat: add config validation warnings Closes #3412. Refs #3413. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-04-08 13:49:58 -07:00
Andrey Smirnov	eee7ad13aa	release(v0.10.0-alpha.2): prepare release This is the official v0.10.0-alpha.2 release. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-04-08 13:03:50 -07:00
Andrey Smirnov	e0650218a6	feat: support etcd recovery from snapshot on bootstrap When a Talos `controlplane` node is waiting for a bootstrap, `etcd` contents can be recovered from a snapshot created with `talosctl etcd snapshot` on a healthy cluster. The bootstrap process goes the same way as before, but the etcd data directory is recovered from the snapshot. This flow enables disaster recovery for the control plane: given that periodic backups are available, destroy the control plane nodes, re-create them with the same config, and bootstrap one node with the saved snapshot to recover the etcd state as of the time of the snapshot. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-04-08 10:15:37 -07:00
Artem Chernyshev	247bd50e05	docs: describe steps to install and boot Talos from the SSD on rockpi4 Describe that gross flow while I still remember it. Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2021-04-07 13:06:58 -07:00
Spencer Smith	e6b4e524ff	test: update CAPA to 0.6.4 This PR pulls in an updated Cluster API AWS (CAPA) version, ensuring the CRDs are closer to what's expected when we patch the CAPA image later in the setup. We will eventually move to 0.6.5 as soon as it's cut. Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>	2021-04-07 14:37:20 -04:00
Andrey Smirnov	28753f6dcb	fix: trim endpoints/nodes from arguments in talosctl config When copy-pasting, extra whitespace might be added around an argument to `talosctl config endpoints/nodes`, which breaks the config, as the endpoint no longer parses as an IP address. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-04-07 11:37:02 -07:00
Alexey Palazhchenko	aca63b8829	docs: fix "DigitalOcean" spelling Refs #3427. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-04-07 09:13:24 -07:00
Andrey Smirnov	33035901ff	fix: revert mark PMBR EFI partition as bootable See talos-systems/go-blockdevice#34 talos-systems/talos#3440 That change broke UEFI boot. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-04-07 07:24:58 -07:00
Andrey Smirnov	fbfd1eb2b1	refactor: pull new version of os-runtime, update code This is mostly refactoring to adapt to the new APIs. There are some small changes which are not immediately user-visible (but are visible when using `talosctl get` to inspect low-level details): * the `extras` namespace is removed (it was a hack to distinguish extra and system manifests) * `Manifests` are managed by two controllers as shared outputs and are now stored in the `controlplane` namespace * `talosctl inspect dependencies` output changed slightly * resources now have `md.owner` set to the name of the controller which manages the resource Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-04-07 06:55:09 -07:00
Alexey Palazhchenko	8737ea716a	feat: allow external cloud providers configuration Closes #3312. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-04-06 22:54:24 -07:00
Andrey Smirnov	3909e2d011	chore: update Go to 1.16.3 See talos-systems/tools#134 talos-systems/pkgs#260 talos-systems/extras#16 Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-04-06 13:53:53 -07:00
Andrey Smirnov	690eb20e97	chore: update blockdevice library for PMBR bootable fix See https://github.com/talos-systems/go-blockdevice/pull/33 Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-04-06 06:14:56 -07:00
Andrey Smirnov	a8761b8e1e	fix: require leader on etcd member operations It's not obvious whether this fix is actually needed, but the failures seen in the tests seem to come down to the added member not being visible in the member list fetched after the add command succeeds. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-04-06 05:36:45 -07:00
Alexey Palazhchenko	3dc84625cb	fix: make both HDMI ports work on RPi 4 Closes #3414. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-04-05 15:25:39 -07:00
Andrey Smirnov	bd5ae1e0b5	fix: add a check for overlay mounts in installer pre-flight checks Overlay mounts in `mountinfo` don't show up as mounts of any particular block device, so the existing check doesn't catch them. This was discovered because our current master can't upgrade due to the overlay mount for `/opt` and the `apid` image in `/opt/apid` (which will be fixed in a separate PR). Without the check, the installer fails on resetting the partition table for the disk, effectively wiping the node (`device or resource busy` error). Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-04-05 14:29:46 -07:00
Andrey Smirnov	df8649cbe6	refactor: download modules before `go generate` This moves things around a bit so that `go generate` is called after modules are downloaded, as `go generate` downloads modules as well. This fixes a race condition which might show up randomly. Spotted by: @AlekSi Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-04-05 11:38:40 -07:00
Andrey Smirnov	39ae0415e9	chore: bump dependencies via dependabot See #3431 #3432 #3433 #3434 Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-04-05 06:16:24 -07:00
Artem Chernyshev	e16d6d3468	fix: publish rockpi4 image to release artifacts Attempt #2. Forgot to add it to .drone.jsonnet also 🤦 Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2021-04-03 18:20:54 -07:00
Artem Chernyshev	39c6dbcc7a	feat: add --config-patch parameter to talosctl gen config Fixes: https://github.com/talos-systems/talos/issues/3410 Same as in `talosctl cluster create`: if specified, an RFC 6902 JSON patch is applied during config generation. Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2021-04-02 10:56:41 -07:00
Andrey Smirnov	e664362cec	feat: add API and command to save etcd snapshot (backup) This adds a simple API and a `talosctl etcd snapshot` command to stream an etcd snapshot from one of the control plane nodes to a local file. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-04-02 09:20:16 -07:00
Andrey Smirnov	61b694b948	fix: create rootfs for system services via /system tmpfs The container rootfs should be writable, as containerd mounts standard filesystems (`/proc` et al.) into it. Using `/opt` as the root of the container filesystem resulted in a problem: Talos overlay-mounts `/opt` on `/var/system`, which means that as long as `apid` is running, `/var` can't be unmounted, which breaks upgrades. So instead use `/system/libexec` as the rootfs for the containers (`/system` is a `tmpfs`), and bind-mount the actual executables (`/sbin/init`, machined) into the rootfs. This fixes upgrades for 0.10. See also #3425 Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-04-02 06:37:29 -07:00
Andrey Smirnov	abc2e17ebb	test: update 0.9.x version in upgrade tests to 0.9.1 Version 0.9.1 contains a fix for concurrent map write on unmount which was frequently breaking our upgrade tests. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-04-02 03:59:36 -07:00
Andrey Smirnov	a1e6415403	fix: retry Kubernetes API errors on cordon/uncordon/etc This extracts the function used in the upgrade/convert flows to retry transient errors into the main `kubernetes` package, expands it to ignore timeout errors, and now uses it to retry errors where applicable in `pkg/kubernetes`. Fixes #3403 Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-04-02 03:51:40 -07:00
Andrey Smirnov	063d1abe9c	fix: print task failure error immediately Errors were not printed in the sequencer itself but by whatever called the sequencer, which means that for fatal failures (say, in the 'upgrade' sequence) the error message was printed by machined after `apid` had been stopped. As a result, the error wasn't visible via `talosctl dmesg`, only on the serial console. This changes the flow to print the task error as soon as the task fails, and removes the 'done' messages in the sequencer if the sequence/phase/task fails (otherwise both 'done' and 'failed' messages are printed, which is confusing). Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-04-02 03:08:26 -07:00
Andrey Smirnov	e039172eda	fix: ignore EOF errors from Kubernetes API when converting control plane During the conversion process, the API server goes down, so we can see lots of network errors, including EOF. Fixes #3404 Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-04-01 10:52:44 -07:00
Branden Cash	7bcb91a433	docs: fix typo for stage flag The docs mentioned the `--staged` flag, but it should be `--stage`. Signed-off-by: Branden Cash <ammmze@gmail.com>	2021-04-01 10:44:46 -07:00
Andrey Smirnov	a43acb2150	feat: bring in Linux 5.10.27, support for 32-bit time syscalls This provides binary compatibility for really old binaries using 32-bit time. See also: talos-systems/pkgs#259 Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-04-01 08:21:37 -07:00
Andrey Smirnov	e2bb5973da	release(v0.10.0-alpha.1): prepare release This is the official v0.10.0-alpha.1 release. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-03-31 23:17:31 +03:00
Andrey Smirnov	8309312a3d	chore: build components with race detector enabled in dev mode This provides a variable to build core Talos components with race detector enabled: `make initramfs WITH_RACE=yes`. Also refactored and DRYed up the build code exposing common build/link flags via the Makefile. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-03-31 10:55:50 -07:00
Andrey Smirnov	7d91258475	test: fix data race in apply config tests Variable `chanErr` was read before waiting for the goroutine to finish. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-03-31 10:46:50 -07:00
Andrey Smirnov	204caf8eb9	test: fix apply-config integration test, bump clusterctl version Tests for the ApplyConfig API relied on the not-really-supported behavior of modifying the config via the `Provider` interface (which was "fixed" in another PR that cleans up such access to the configuration). The clusterctl version was bumped to try to work around strange CAPI bootstrap failures in e2e-capi. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-03-31 09:55:53 -07:00
Artem Chernyshev	d812099df3	fix: address several issues in TUI installer - Table row selection was off by one, so the disk selector wasn't quite working. - Reduce the number of interfaces on the last screen: show only those that have physical addresses (changing some settings for lo0, for example, made the TUI generate incorrect configs) Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2021-03-30 19:00:12 -07:00
Andrey Smirnov	269c9ad098	fix: don't write to config object on access This avoids a data race on config access: the config object might be accessed concurrently, so it should be read-only on access. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-03-30 10:38:02 -07:00
Alexey Palazhchenko	a9451f5712	feat: update Kubernetes to 1.21.0-beta.1 See CHANGELOG: https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.21.md Refs #3329. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-03-30 03:07:03 -07:00
Artem Chernyshev	4b42ced4c2	feat: add ability to disable comments in talosctl gen config Fixes: https://github.com/talos-systems/talos/issues/3384 Instead of a simple `--no-comments` flag, I decided to use a more granular approach which allows disabling examples, docstrings, or both. The command looks like this: `talosctl gen config --with-docs=false --with-examples=false <...>`. Both are enabled by default to provide a better UX for users learning Talos. Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2021-03-29 10:52:14 -07:00
Andrey Smirnov	a0dcfc3d52	fix: workaround race in containerd runner with stdin pipe The containerd API for passing stdin to the container is far from perfect, and it seems to contain a race condition we can't avoid: if `NewTask()` fails, it starts the I/O loop in a goroutine but never stops it. We can't stop it either, since `NewTask()` failed, so to work around this failure, copy stdin into a new reader on each access. This copying shouldn't be a big deal for us, as it's just the machine configuration, which is tiny. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-03-29 10:04:50 -07:00
Andrey Smirnov	2ea20f598a	feat: replace timed with time sync controller This is a complete rewrite of the time sync process. Now the time sync process starts early at boot time, and it adapts to configuration changes: * before config is available, `pool.ntp.org` is used * once config is available, configured time servers are used The controller updates the same time sync resource other controllers had a dependency on, so they have a chance to wait for the time sync event. Talos services which depend on time now wait on the same resource instead of waiting on timed health. New features: * time sync now sticks to a particular time server unless there's an error from that server, in which case the server is changed; this improves time sync accuracy * time sync acts on config changes immediately, so it's possible to reconfigure time sync at any time * there's a new 'epoch' field in time sync resources which allows time-dependent controllers to regenerate certs when there's a big enough jump in time Features to implement later: * apid shouldn't depend on timed; it should be started early and regenerate certs on a time jump * trustd should be updated in the same way Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-03-29 09:29:43 -07:00
Andrey Smirnov	c38a161ade	test: add unit-test for machine config validation Follow-up for #3383 I added the first couple of tests; we should add more as we go through this code. Even with these tests, I found and fixed two more panics. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-03-29 07:32:06 -07:00
Andrey Smirnov	a6106815b7	chore: bump dependencies via dependabot See #3386 #3387 #3388 Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-03-29 06:38:55 -07:00
Alexey Palazhchenko	35598f391d	chore: refactor: extract ClusterConfig Extract ClusterConfig and related types. Make one huge file a bit smaller. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-03-29 05:49:51 -07:00
Artem Chernyshev	032851844f	fix: get rid of data race in encoder and fix concurrent map access Fixes: https://github.com/talos-systems/talos/issues/3377, https://github.com/talos-systems/talos/issues/3380 Fixed the data race in the encoder documentation examples by using `sync.Once`. We only need to generate them once anyway, and then it's not a big deal that we are using the same pointers everywhere, as they're pretty much constant. As for `system.go`, it looks like we actually have concurrent partition unmount operations, so I just added a mutex there. Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>	2021-03-29 01:00:46 -07:00
Andrey Smirnov	4b3580aa57	fix: prevent panic in validate config if `machine.install` is missing Fixes #3382 Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-03-26 15:47:07 -07:00
Alexey Palazhchenko	d7e9f6d6a8	chore: build integration tests with -race Refs https://github.com/talos-systems/talos/issues/3378. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-03-26 10:08:12 -07:00
Alexey Palazhchenko	9f7d67ac71	chore: fix typo Actually share golangci-lint cache. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-03-25 15:14:30 -07:00
Andrey Smirnov	672c970739	fix: allow `convert-k8s --remove-initialized-keys` when K8s cp is down The `--remove-initialized-keys` flag is the last resort to convert the control plane when it is down for whatever reason, so it should work when the control plane is not available. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2021-03-25 14:06:08 -07:00
Alexey Palazhchenko	fb605a0fc5	chore: tweak nolintlint settings Copy from kres manually for now. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-03-25 13:56:16 -07:00
Alexey Palazhchenko	1f5a0c4065	fix: resolve the issue with Kubernetes upgrade Add missing cases, refactoring. Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>	2021-03-25 12:48:28 -07:00
Spencer Smith	74b2b5578c	docs: update AWS docs to ensure instances are tagged This PR updates our AWS docs so that we specify a tag when creating instances. This makes it easier to know which VMs were created as part of this process, as well as quickly spot the init node. Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>	2021-03-25 11:55:19 -04:00

2334 Commits