188 Commits

Author SHA1 Message Date
Andrey Smirnov
a8dd2ff30d fix: checkpoint controller-manager and scheduler
Default manifests created by bootkube so far were only enabling
pod-checkpointer for kube-apiserver. This seems to have issues with
single-node control plane scenario, when without scheduler and
controller-manager node might fall into `NodeAffinity` state.

See https://github.com/talos-systems/bootkube-plugin/pull/23

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-12-28 11:53:17 -08:00
Artem Chernyshev
7b6c4bcb1f refactor: define default kernel flags in machinery instead of procfs
That change should make Talos updates more straightforward in any
projects that depend on Talos.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2020-12-24 06:50:53 -08:00
Artem Chernyshev
47fb7d26e0 fix: use SetAll instead of AppendAll when building kernel args
SBC should always overwrite default kernel params.
Otherwise we will always get duplicate values for some of them.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2020-12-23 09:09:13 -08:00
Andrey Smirnov
b1d4814308 feat: update Kubernetes to 1.20.1
See https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-12-21 23:52:29 +03:00
Andrey Smirnov
c4624078ce fix: sync RTC in timed, sync time before fetching packet metadata
1. When time is set by timed, also sync RTC with system time, otherwise
time might be off after a reboot.

2. Before fetching Packet (Equinix Metal) metadata over `https://`, do
one-time time sync in the background, as if the clock skew is big
enough, TLS certificate will be considered invalid and boot fails.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-12-18 22:06:44 +03:00
Andrey Smirnov
f90aa613ac fix: don't overwrite PMBR
See https://github.com/talos-systems/go-blockdevice/pull/24

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-12-18 10:21:37 -08:00
Andrey Smirnov
ed42d4a42a fix: bump blockdevice library for 2nd partitione entries copy fix
See https://github.com/talos-systems/go-blockdevice/pull/23

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-12-18 09:15:21 -08:00
Andrey Smirnov
80184393bc feat: update kernel to 5.9.13, new KSPP requirements
Pulls in following changes:

* https://github.com/talos-systems/toolchain/pull/20
* https://github.com/talos-systems/tools/pull/116
* https://github.com/talos-systems/pkgs/pull/214
* https://github.com/talos-systems/pkgs/pull/215
* https://github.com/talos-systems/pkgs/pull/216
* https://github.com/talos-systems/pkgs/pull/217
* https://github.com/talos-systems/go-procfs/pull/4

New empty amd64 images for u-boot & rpi-firmware reduce the size of
amd64 installer image.

For backwards compatibility QEMU provisioner still injects "legacy" KSPP
kernel args into initial boot environment.

Installer correctly upgrades KSPP options when moving from one version
of Talos to another.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-12-10 12:41:58 -08:00
Andrey Smirnov
872e792dbc feat: update Kubernetes to 1.20.0
Official K8s release matching Talos 0.8.0.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-12-09 06:11:48 -08:00
Andrey Smirnov
28d2270067 fix: make reset work again
This bumps go-blockdevice library for the fix
https://github.com/talos-systems/go-blockdevice/pull/22

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-12-04 19:27:41 +03:00
Artem Chernyshev
c7062e3f4d feat: make GenerateConfiguration accept current time as a parameter
If the node time is out of sync, it can generate incorrect
configuration. And maintenance mode does not allow us starting ntp,
because there is no containerd.

By providing current UTC time of the machine where talosctl client is
running, it is possible to force GenerateConfiguration use correct time.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2020-12-03 08:28:11 -08:00
Andrey Smirnov
621968977e feat: update kubernetes to 1.20.0-rc.0
Talos 0.8 is going to ship with K8s 1.20.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-12-02 10:50:58 -08:00
Andrey Smirnov
3951898c76 chore: update module dependencies
Generic bump of dependencies.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-12-01 10:04:02 -08:00
Andrey Smirnov
08c84fe678 feat: stop including K8s version by default in talosctl gen config
Default image versions are kept as commented out examples.

This allows better experience for generating config on amd64 for arm64
servers. (e.g. for RPi).

Without embedded values in the config, Talos is going to use the
defaults which work better.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-30 12:52:53 -08:00
Andrey Smirnov
07f4ed7fb4 feat: upgrade etcd to 3.4.14
No major fixes, just keeping version up to date.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-26 09:14:41 -08:00
Andrey Smirnov
1eac88e470 feat: add support for installing to SBCs
This introduces the notion of a "board" in Talos. A board is an interface that is capable
of modifying the installation in specific ways for a given SBC. This also adds support for the
libretech_all_h3_cc_h5.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-11-26 07:18:25 -08:00
Andrey Smirnov
28ba6e416e feat: update Kubernetes to v1.20.0-beta.2
Talos 0.8 is going to ship with K8s 1.20.x.

Changes to support new `control-plane` label,
upgrade-k8s supports automated fixups for 1.20.

See also: https://github.com/talos-systems/bootkube-plugin/pull/22

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-25 06:39:14 -08:00
Andrey Smirnov
b5173477c8 fix: bump blockdevice library for mmcblk part name fix
See https://github.com/talos-systems/go-blockdevice/pull/17

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-20 05:36:14 -08:00
Artem Chernyshev
b6874ee82a feat: add TUI based talos interactive installer
This is initial commit of the installer.
What's done:
- verifying node availability before starting any operations.
- gathering information about disks on the machine.
- allows setting: install disk, hostname, machine type, installer image,
  kubernetes version, dns domain, cluster-name.
- dumps/merges talosconfig to a file after applying configuration.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2020-11-18 12:34:15 -08:00
Andrey Smirnov
83bb1afcb6 feat: drop to maintenance mode in cloud platforms if userdata is missing
On first boot of Talos, if userdata is missing, Talos is going to drop
into maintenance mode which allows to upload config to the server via
`talosctl apply-config` command.

See also: https://github.com/talos-systems/go-retry/pull/4

Fixes #2780

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-16 11:03:26 -08:00
Andrey Smirnov
df6ad3fa80 feat: upgrade Kubernetes default version to 1.19.4
k8s.io modules don't have 1.19.4 tag yet :(

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-12 08:51:04 -08:00
Andrey Smirnov
58df555580 feat: add example command in maintenance, enforce cert fingerprint
Server in maintenance mode now prints certficate fingerprint and
provides sample talosctl command to upload config to the node.

`talosctl` can optionally enforce server certificate fingerprint.

See also https://github.com/talos-systems/crypto/pull/4

Fixes #2753

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-12 07:36:18 -08:00
Andrew Rynhard
71321214a1 feat: add storage API
This is the initial implementation of a storage API.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-11-11 10:12:25 -08:00
Andrey Smirnov
4c42e22dbf chore: bump version of x/net module
Fixes #2746

This should cover the `x/net/http2` fixes which were backported in Go
1.15.4.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-10 07:09:19 -08:00
Andrew Rynhard
8e40ea46e0 fix: address issues in webconfig
Fixes TLS and invalide redirects.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-11-01 12:03:55 -08:00
Andrey Smirnov
350d75eb46 feat: build talosctl-cni-bundle, use it in talosctl for QEMU
This builds a bundle with CNI plugins for talosctl which is
automatically downloaded by `talosctl` if CNI plugins are missing.

CNI directories are moved by default to the `~/.talos/cni` path.

Also add a bunch of pre-flight checks to the QEMU provisioner to make it
easier to bootstrap the Talos QEMU cluster.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-30 16:30:37 -07:00
Andrey Smirnov
d9b74f0cc6 feat: skip resizing ephemeral partition if not required
This skips writing partition table if partition doesn't have to be
resized (already resized or max size from the beginning).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-30 14:31:10 -07:00
Andrey Smirnov
bc9e0c0dba fix: re-implement upgrade (install) with preserve
For 0.6 -> 0.7 upgrade, in any case config.yaml is preserved and moved
from `/boot` to `/system/state`.

For single node upgrade, `EPHEMERAL` partition is not touched and other
partitions are re-created as needed.

Bump provision tests to 0.6/0.7 upgrades as we get closer to the new
release.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-28 07:25:26 -07:00
Andrey Smirnov
41f92b5d35 feat: wipe disks faster in the installer
See https://github.com/talos-systems/talos/pull/2663 for details.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-22 08:32:29 -07:00
Spencer Smith
8b5406c889 chore: move to newer release of rtnetlink with fn args
This PR makes use of a new merge into the upstream rtnetlink library
that introduces functional args for adding routes.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-10-22 06:56:22 -07:00
Andrey Smirnov
16b6d344de chore: bump module dependencies in go.mod
This covers most of the packages except for those we have to keep on
hold (etcd and grpc because of etcd).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-20 08:09:42 -07:00
Andrey Smirnov
56f1ee37fd feat: upgrade Kubernetes to 1.19.3
Just minor release bump.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-20 05:12:32 -07:00
Andrey Smirnov
4adb613f66 refactor: bring more control to install.Manifest execution
This unifies more code paths under the control of `install.Manifest` vs.
being split across the installer and manifest code.

There should be no functional changes now.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-20 01:08:14 -07:00
Andrey Smirnov
d7f5de62c3 feat: colorize output of cluster health checks
It only gets enabled if output is a terminal. Failures which resolve
themselves are removed from the final output. Small spinner to indicate
progress.

While I was at it, I fixed client-side `talosctl health` when init node
is missing.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-06 07:59:30 -07:00
Andrey Smirnov
018086d1fa refactor: extract blockdevice library
Library `blockdevice` was extracted as `talos-systems/go-blockdevice`,
this PR finalizes the move by removing Talos copy of it.

Some functions around `mkfs`/`growfs` were extracted as `makefs`
package, as they depend on `cmd` package.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-05 11:18:43 -07:00
Andrey Smirnov
16eb47a1a3 feat: use kubeconfig merge in talosctl kubeconfig by default
Kubeconfig merge was completely rewritten to be "smarter":

* automatically apply renames done at previous stages to avoid asking
over and over again (in general should ask just once)

* skip checks if parts of the config match exactly

* allow overwrite as an option

* flexible way to control the output

* activating context in the end

* custom merged context name

Fixes #2578

Fixes #2587

Fixes #2577

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-03 05:36:15 -07:00
Andrey Smirnov
8236822c90 fix: retry image pulling, stop on 404, no duplicate pulls
This uses go-retry feature
(https://github.com/talos-systems/go-retry/pull/3) to print errors being
retried.

If image is not found in the index, abort retries immediately.

Don't pull installer image twice (if already pulled by the validation
code before).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-09-22 07:07:45 -07:00
Seán C McCord
3db5f72b3b fix: validate cluster endpoint
Validates cluster endpoint from v1alpha1 config.

Fixes #2101

Signed-off-by: Seán C McCord <ulexus@gmail.com>
2020-09-17 16:39:14 -07:00
Andrey Smirnov
14ad0674d3 feat: update Flannel to 0.12, support for arm64
This updates bootkube-plugin with PR
https://github.com/talos-systems/bootkube-plugin/pull/21.

Flannel image is chosen based on host architecture.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-09-15 08:04:29 -07:00
Andrey Smirnov
b4341d8780 feat: upgrade kubernetes to 1.19.1
Release notes: https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md#v1191

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-09-11 06:58:12 -07:00
Andrey Smirnov
788cd15c29 test: add e2e test to the provision (upgrade) tests
Add sonobuoy runner code with log fetching on failure. Use hand-picked
set of e2e tests to run: verify basic pod functionality, verify service
connectivity.

Add option `--run-e2e` to the `talosctl health` to run quick e2e test to
verify cluster health.

Add option to run provision tests with custom CNI, run one track of
provision tests with Cilium.

Bump Cilium to 1.8.2.

Talos 0.6 won't uncordon node automatically after upgrade from 0.5, as
0.5 doesn't put annotation. Workaround that in upgrade tests.

Bump upgrade test version to 0.6.0 release.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-09-08 13:26:31 -07:00
Andrew Rynhard
2b84cf1967 feat: upgrade runc to v1.0.0-rc92
This brings in the latest stable version of runc.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-09-04 13:09:37 -07:00
Andrew Rynhard
6a85a47ffa feat: upgrade containerd to v1.4.0
This brings in the latest stable containerd.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-09-04 02:59:08 -07:00
Andrey Smirnov
f6ecf000c9 refactor: extract packages loadbalancer and retry
This removes in-tree packages in favor of:

* github.com/talos-systems/go-retry
* github.com/talos-systems/go-loadbalancer

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-09-02 13:46:22 -07:00
Andrey Smirnov
ac4ab11d36 chore: update k8s modules to 1.19 final version
I think we missed that when updating K8s for the final version.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-09-02 06:25:28 -07:00
Andrey Smirnov
6bbe54dba4 chore: update machinery version in go.mod
The version we had was pre-push to master and it's invalid version:

```
go get github.com/talos-systems/talos@master
go: github.com/talos-systems/talos master => v0.7.0-alpha.0.0.20200818212414-6a7cc0264819
go get: github.com/talos-systems/talos@v0.7.0-alpha.0.0.20200818212414-6a7cc0264819 requires
        github.com/talos-systems/talos/pkg/machinery@v0.0.0-00010101000000-000000000000:
	invalid version: unknown revision 000000000000
```

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-19 13:11:19 -07:00
Andrey Smirnov
6a7cc02648 fix: handle bootkube recover correctly, support recovery from etcd
Bootkube recover process (and `talosctl recover`) was actually
regenerating assets each time `recover` runs forcing control plane to be
at the state when cluster got created. This PR fixes that by running
recover process correctly.

Recovery via etcd was fixed to handle encrypted etcd data:
it follows the way `apiserver` handles encryption at rest, and as at
the moment AES CBC is the only supported encryption method, code simply
follows the same path.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-18 14:24:14 -07:00
Andrey Smirnov
bddd4f1bf6 refactor: move external API packages into machinery/
This moves `pkg/config`, `pkg/client` and `pkg/constants`
under `pkg/machinery` umbrella.

And `pkg/machinery` is published as Go module inside Talos repository.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-17 09:56:14 -07:00
Andrey Smirnov
7875e9499f chore: re-import talos-systems/pkg/crypto/tls
See also https://github.com/talos-systems/crypto/pull/2

This should break dependency of `pkg/client` on `pkg/grpc`.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-17 08:06:38 -07:00
Andrey Smirnov
2697b99b7d refactor: extract pkg/net as github.com/talos-systems/net
This extracts common package as new module/repository.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-14 11:04:50 -07:00