2306 Commits

Author SHA1 Message Date
Andrey Smirnov
a43acb2150 feat: bring in Linux 5.10.27, support for 32-bit time syscalls
This provides binary compatibility for really old binaries using 32-bit
time.

See also: talos-systems/pkgs#259

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-04-01 08:21:37 -07:00
Andrey Smirnov
e2bb5973da release(v0.10.0-alpha.1): prepare release
This is the official v0.10.0-alpha.1 release.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-31 23:17:31 +03:00
Andrey Smirnov
8309312a3d chore: build components with race detector enabled in dev mode
This provides a variable to build core Talos components with race
detector enabled: `make initramfs WITH_RACE=yes`.

Also refactored and DRYed up the build code exposing common build/link
flags via the Makefile.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-31 10:55:50 -07:00
Andrey Smirnov
7d91258475 test: fix data race in apply config tests
Variable `chanErr` was read before waiting for the goroutine to finish.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-31 10:46:50 -07:00
Andrey Smirnov
204caf8eb9 test: fix apply-config integration test, bump clusterctl version
Tests for the ApplyConfig API were relying on the unsupported behavior
of modifying the config via the `Provider` interface (this was "fixed" in
another PR which cleans up such access to the configuration).

The clusterctl version is bumped to try to work around strange CAPI
bootstrap failures in e2e-capi.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-31 09:55:53 -07:00
Artem Chernyshev
d812099df3 fix: address several issues in TUI installer
- Table row selection was off by one element, so the disk selector wasn't
quite working.
- Reduce the number of interfaces on the last screen: show only those that
have physical addresses (changing some settings for lo0, for example, was
making the TUI generate incorrect configs).

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-03-30 19:00:12 -07:00
Andrey Smirnov
269c9ad098 fix: don't write to config object on access
This avoids data race on config access: config object might be accessed
concurrently and it should be read-only on access.
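
A hedged Go sketch of the idea, assuming the race came from an accessor that filled in defaults lazily: the accessor computes the effective value on each call and never writes to the object. The type `installConfig`, the method `DiskOrDefault`, and the default value are hypothetical, not taken from the Talos code.

```go
package main

import (
	"fmt"
	"sync"
)

// installConfig is a hypothetical config fragment used only for illustration.
type installConfig struct {
	Disk string
}

// defaultDisk is an assumed default, not a value taken from Talos.
const defaultDisk = "/dev/sda"

// DiskOrDefault computes the effective value on every call and never
// writes to the receiver, so concurrent callers only perform reads.
func (c *installConfig) DiskOrDefault() string {
	if c == nil || c.Disk == "" {
		return defaultDisk
	}

	return c.Disk
}

func main() {
	cfg := &installConfig{}

	var wg sync.WaitGroup

	for i := 0; i < 4; i++ {
		wg.Add(1)

		go func() {
			defer wg.Done()

			_ = cfg.DiskOrDefault() // read-only access, safe to run concurrently
		}()
	}

	wg.Wait()

	// The stored field stays empty: the default was never written back.
	fmt.Println(cfg.DiskOrDefault(), cfg.Disk == "")
}
```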

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-30 10:38:02 -07:00
Alexey Palazhchenko
a9451f5712 feat: update Kubernetes to 1.21.0-beta.1
See CHANGELOG:
https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.21.md

Refs #3329.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-03-30 03:07:03 -07:00
Artem Chernyshev
4b42ced4c2 feat: add ability to disable comments in talosctl gen config
Fixes: https://github.com/talos-systems/talos/issues/3384

Instead of adding a simple `--no-comments` flag, we decided to use a more
granular approach which allows disabling examples, docstrings, or both.

Thus the command looks like this:

```bash
talosctl gen config --with-docs=false --with-examples=false <...>
```

Both are enabled by default to provide better UX for users learning
Talos.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-03-29 10:52:14 -07:00
Andrey Smirnov
a0dcfc3d52 fix: workaround race in containerd runner with stdin pipe
The containerd API for passing stdin to the container is far from perfect,
and it seems to contain a race condition we can't avoid: if `NewTask()`
fails, it starts the I/O loop in a goroutine, but never stops it. We
can't stop it either, as `NewTask()` failed, so to work around this
failure, copy the stdin into a new reader on each access.

This copying shouldn't be a big deal for us, as it's just machine
configuration and it's tiny.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-29 10:04:50 -07:00
Andrey Smirnov
2ea20f598a feat: replace timed with time sync controller
This is a complete rewrite of the time sync process.

Now the time sync process starts early at boot time, and it adapts to
configuration changes:

* before config is available, `pool.ntp.org` is used
* once config is available, configured time servers are used

The controller updates the same time sync resource that other controllers
depend on, so they have a chance to wait for the time sync event.

Talos services which depend on time now wait on the same resource instead
of waiting on timed health.

New features:

* time sync now sticks to a particular time server unless that server
returns an error, in which case the server is changed; this improves time
sync accuracy

* time sync acts on config changes immediately, so it's possible to
reconfigure time sync at any time

* there's a new 'epoch' field in time sync resources which allows
time-dependent controllers to regenerate certs when there's a big enough
jump in time

Features to implement later:

* apid shouldn't depend on timed, it should be started early and it
should regenerate certs on time jump

* trustd should be updated in the same way

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-29 09:29:43 -07:00
Andrey Smirnov
c38a161ade test: add unit-test for machine config validation
Follow-up for #3383

I added a couple of initial tests; we should add more as we go through
this code. Even with those tests, I found and fixed two more panics.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-29 07:32:06 -07:00
Andrey Smirnov
a6106815b7 chore: bump dependencies via dependabot
See #3386 #3387 #3388

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-29 06:38:55 -07:00
Alexey Palazhchenko
35598f391d chore: refactor: extract ClusterConfig
Extract ClusterConfig and related types.
Make one huge file a bit smaller.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-03-29 05:49:51 -07:00
Artem Chernyshev
032851844f fix: get rid of data race in encoder and fix concurrent map access
Fixes: https://github.com/talos-systems/talos/issues/3377, https://github.com/talos-systems/talos/issues/3380

Fixed the data race in the encoder documentation examples by using `sync.Once`.
We only need to generate them once anyway, and then it's not a big deal
that we are using the same pointers everywhere as they're pretty much
constant.
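
A minimal Go sketch of the `sync.Once` approach, assuming the examples are built lazily and then shared: the builder runs exactly once even under concurrent calls. The names `docExamples` and `buildCount` and the example data are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

var (
	examplesOnce sync.Once
	examples     map[string]string
	buildCount   int // counts how many times the builder ran (illustration only)
)

// docExamples builds the shared examples on first use and returns the
// same map to every caller afterwards. Callers must treat it as read-only.
func docExamples() map[string]string {
	examplesOnce.Do(func() {
		buildCount++
		examples = map[string]string{"machine.install.disk": "/dev/sda"}
	})

	return examples
}

func main() {
	var wg sync.WaitGroup

	for i := 0; i < 8; i++ {
		wg.Add(1)

		go func() {
			defer wg.Done()

			_ = docExamples()
		}()
	}

	wg.Wait()
	fmt.Println("builder ran", buildCount, "time(s)")
}
```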

As for `system.go`, it looks like we actually have concurrent operations
for partition unmounts, so I just added a mutex there.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-03-29 01:00:46 -07:00
Andrey Smirnov
4b3580aa57 fix: prevent panic in validate config if machine.install is missing
Fixes #3382

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-26 15:47:07 -07:00
Alexey Palazhchenko
d7e9f6d6a8 chore: build integration tests with -race
Refs https://github.com/talos-systems/talos/issues/3378.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-03-26 10:08:12 -07:00
Alexey Palazhchenko
9f7d67ac71 chore: fix typo
Actually share golangci-lint cache.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-03-25 15:14:30 -07:00
Andrey Smirnov
672c970739 fix: allow convert-k8s --remove-initialized-key when K8s cp is down
The flag `--remove-initialized-key` is the last resort for converting the
control plane when it is down for whatever reason, so it should work when
the control plane is not available.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-25 14:06:08 -07:00
Alexey Palazhchenko
fb605a0fc5 chore: tweak nolintlint settings
Copy from kres manually for now.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-03-25 13:56:16 -07:00
Alexey Palazhchenko
1f5a0c4065 fix: resolve the issue with Kubernetes upgrade
Add missing cases, refactoring.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-03-25 12:48:28 -07:00
Spencer Smith
74b2b5578c docs: update AWS docs to ensure instances are tagged
This PR updates our AWS docs so that we specify a tag when creating
instances. This makes it easier to know which VMs were created as part
of this process, as well as quickly spot the init node.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2021-03-25 11:55:19 -04:00
Alexey Palazhchenko
dc21d9b4b0 chore: remove old file
To prevent confusion.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-03-25 08:39:54 -07:00
Andrey Smirnov
966caf7a67 chore: remove unused module replace directives
They were required a long time ago; it doesn't look like we need them now.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-25 08:16:25 -07:00
Spencer Smith
98b22f1e0b feat: show short options in talosctl kubeconfig
This PR just fixes a teeny usability problem I saw yesterday with Steve,
where it's not immediately clear that you don't have to type the entire
word when you encounter an existing context when pulling kubeconfig.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2021-03-25 09:55:24 -04:00
Andrey Smirnov
51139d54d4 chore: cache go modules in the build
This does proper caching for Go modules so that when go.mod/go.sum are
changed, only updated modules are downloaded vs. all of them.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-25 06:54:57 -07:00
Artem Chernyshev
65701aa724 fix: resolve the issue with DHCP lease not being renewed
This is very easily reproduced when you start a node with a dynamic IP.
Normally it should renew the lease after TTL/2, but that doesn't happen, so
the node keeps getting a new IP, one after another.

After looking at packets sent by other clients, found out that they
have `Client IP address` equal to the IP given by the DHCP server.

Additionally, changed DHCP client to send Request packets directly to the DHCP server after getting an offer.
It looks like the DHCP spec states that you should send a unicast request directly to the DHCP server, not a broadcast.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-03-25 06:44:23 -07:00
Andrey Smirnov
711f5b23be fix: config validation: CNI should apply to cp nodes, encryption config
Encryption config should be checked for state partition as well.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-25 02:42:21 -07:00
Spencer Smith
5ff491d968 fix: allow empty list for CNI URLs
This PR fixes a bug where, only when init nodes were used, we were
throwing an error during validation if there were no URLs in the list
for custom CNIs. We actually allow this empty list now so folks can
BYO-CNI.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2021-03-24 11:48:09 -07:00
Spencer Smith
946e74f047 docs: update path for kernel downloads in qemu docs
This PR fixes a docs bug where the name of the kernel and init to
download were incorrect for qemu.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2021-03-24 09:48:12 -07:00
Alexey Palazhchenko
ed272e604e feat: update Kubernetes to 1.21.0-beta.0
See CHANGELOG:
https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.21.md

Refs #3329.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-03-24 07:36:54 -07:00
Andrey Smirnov
b0209fd29d refactor: move networkd, timed APIs to machined, remove routerd
This moves the implementation of the user-facing APIs to machined, and
as all the APIs are now implemented by machined, removes routerd and
adjusts apid to proxy to machined.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-24 00:00:28 -07:00
Artem Chernyshev
6ffabe5169 feat: add ability to find disk by disk properties
Fixes: https://github.com/talos-systems/talos/issues/3323

This doesn't exactly match the udevd-generated `by-<id>` symlinks, but it
should provide a sufficient number of property selectors to pick specific
disks of any kind: SD card, HDD, SSD, NVMe.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-03-23 14:23:02 -07:00
Andrey Smirnov
ac8764702f refactor: move apid, routerd, timed and trustd to single executable
This removes container images for the aforementioned services; they are
now built into the `machined` executable, which launches one service or
another based on `argv[0]`.
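
A minimal Go sketch of dispatching on `argv[0]`, assuming the executable is invoked under the service's name. The function `serviceName` is hypothetical, and falling back to `machined` for any other name is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// serviceName picks the service to run from the name the binary was
// invoked as. Unknown names fall back to machined (an assumption here).
func serviceName(argv0 string) string {
	switch name := filepath.Base(argv0); name {
	case "apid", "routerd", "timed", "trustd":
		return name
	default:
		return "machined"
	}
}

func main() {
	// A real entry point would call the selected service's main function;
	// this sketch only reports the decision.
	fmt.Println("would start:", serviceName(os.Args[0]))
}
```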

Containers are started with rootfs directory which contains only a
single executable file for the service.

This creates rootfs on squashfs for each container in
`/opt/<container>`.

Service `networkd` is not touched as it's handled in #3350.

This removes all the image imports, snapshots and other things which
were associated with the existing way to run containers.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-23 09:48:11 -07:00
Andrey Smirnov
89a4b09fe8 refactor: run networkd as a goroutine in machined
This removes networkd as a separate container and image.

Reasons:

* `networkd` becomes more and more bound into the core flow - now it
interacts with `etcd` for VIPs, so the container needs more and more
mounts/permissions
* it should be easier to COSIfy machined piece by piece if we have it
running in the same process
* initramfs size

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-23 06:22:49 -07:00
Alexey Palazhchenko
f4a6a19cd1 chore: update sonobuoy
To stay current.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-03-23 04:49:44 -07:00
Andrey Smirnov
dc294db16c chore: bump dependencies via dependabot
PRs #3336 #3337 #3338 #3339

Also bump proto tools via talos-systems/tools#133

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-22 13:58:08 -07:00
Andrey Smirnov
2b1641a3b5 docs: add AMIs for Talos 0.9.0
Not all the regions were able to process the request, so the list is a
bit shorter than usual.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-22 11:58:28 -07:00
Andrew Rynhard
79ceb428d4 docs: make v0.9 the default docs
This makes the v0.9 release the default documentation.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2021-03-22 11:20:20 -07:00
Andrey Smirnov
a5b62f4dc2 docs: add documentation for Talos 0.10
Move default docs generation to 0.10 folder.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-22 06:24:39 -07:00
Andrey Smirnov
ce795f1cea fix: command etcd remove-member shouldn't remove etcd data directory
There are two APIs and `talosctl` commands:

* `etcd leave` removes the member from the cluster and removes etcd
data directory for the called node
* `etcd remove-member <node>` removes some other node from the etcd
cluster, but it doesn't affect called node state

This fixes confusing naming of the methods vs. what they're doing.

Fixes #3340

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-22 02:11:06 -07:00
Jorik Jonker
aab49a167b fix: repair zsh completion
It appears that Cobra does not hook the ZSH completion rules. Tools with
working ZSH Cobra completion (helm, kubectl) do so by printing the hook
(`compdef _<completion> <tool>`) themselves.

Fixes #3318

Signed-off-by: Jorik Jonker <jorik@kippendief.biz>
2021-03-21 13:57:42 -07:00
Artem Chernyshev
fc9c416a3c fix: build rockpi4 metal image as part of CI build
Added rockpi_4 to the list of sbcs targets in the Makefile.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-03-19 21:30:36 -07:00
Andrey Smirnov
125b86f4ef fix: upgrade-k8s bug with empty config values and provision script
First, if the config for some component image (e.g. `apiServer`) is empty,
Talos pushes the default image, which is unknown to the script, so verify
that the change is not a no-op, as otherwise the script will hang forever
waiting for the k8s control plane config update.

Second, with bootkube bootstrap it was fine to omit the explicit Kubernetes
version in the upgrade test, but with a Talos-managed control plane that
means that after a Talos upgrade Kubernetes gets upgraded as well (as the
Talos config doesn't contain the K8s version, and defaults are used). This
is not what we actually want to test.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-19 12:05:31 -07:00
Alexey Palazhchenko
8b2d228dc4 chore: add script for starting registry proxies
To avoid copying and pasting it from the documentation every time.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-03-19 09:37:47 -07:00
Alexey Palazhchenko
f7d276b854 chore: remove old osctl reference
One place was missed.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-03-19 08:08:58 -07:00
Alexey Palazhchenko
5b14d6f2b8 chore: fix make help output
`e2e-%` was missing due to bad regex.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-03-19 06:08:43 -07:00
Andrey Smirnov
f0512dfce9 feat: update Kubernetes to 1.20.5
See CHANGELOG:
https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md#changelog-since-v1204

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-19 03:14:46 -07:00
bzub
24cd0a2067 feat: publish talosctl container image
Creates a new container image and corresponding Makefile target.

Signed-off-by: bzub <Bryan.Zubrod@target.com>
2021-03-18 13:25:32 -07:00
Alexey Palazhchenko
6e17102c21 chore: remove unused code
Leftovers of the old blockdevice library.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-03-18 06:43:01 -07:00