128 Commits

Author SHA1 Message Date
Artem Chernyshev
a83af03730 refactor: update go-blockdevice and restructure disk interaction code
This refactoring is required to simplify the work to be done to support
disk encryption.

This minimizes the number of queries made by the `blockdevice` `probe`
methods.
Where `runtime.Runtime` is available, all required block devices are
now taken from the blockdevice cache stored in `State().Machine().Disk()`.
This opens a way to store encryption settings in the `Partition`
objects.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-01-28 17:42:09 +03:00
Andrey Smirnov
0aaf8fa968 feat: replace bootkube with Talos-managed control plane
Control plane components are running as static pods managed by the
kubelets.

Whole subsystem is managed via resources/controllers from os-runtime.

Many supporting changes/refactoring to enable new code paths.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-01-26 14:22:35 -08:00
Andrey Smirnov
78eecc0574 chore: enable virtio-balloon and monitor in QEMU provisioner
Ballooning is not automatic, but it can be verified via QEMU monitor by
inflating/deflating the balloon inside the VM.

The monitor can be used like this:

```
$ sudo socat - unix-connect:/home/smira/.talos/clusters/talos-default/talos-default-master-1.monitor
QEMU 5.0.0 monitor - type 'help' for more information
(qemu) info status
info status
VM status: running
```

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-01-15 10:36:48 -08:00
Artem Chernyshev
7b6c4bcb1f refactor: define default kernel flags in machinery instead of procfs
This change should make Talos updates more straightforward in any
project that depends on Talos.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2020-12-24 06:50:53 -08:00
Artem Chernyshev
6540e9bf70 feat: support disk image in talosctl cluster create
Fixes: https://github.com/talos-systems/talos/issues/2973

A disk image can now be supplied using the `--disk-image-path` flag.
It may be necessary to enable `--with-apply-config` to bootstrap the
nodes properly.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2020-12-22 17:06:00 +03:00
Andrey Smirnov
80184393bc feat: update kernel to 5.9.13, new KSPP requirements
Pulls in the following changes:

* https://github.com/talos-systems/toolchain/pull/20
* https://github.com/talos-systems/tools/pull/116
* https://github.com/talos-systems/pkgs/pull/214
* https://github.com/talos-systems/pkgs/pull/215
* https://github.com/talos-systems/pkgs/pull/216
* https://github.com/talos-systems/pkgs/pull/217
* https://github.com/talos-systems/go-procfs/pull/4

New empty amd64 images for u-boot & rpi-firmware reduce the size of
amd64 installer image.

For backwards compatibility QEMU provisioner still injects "legacy" KSPP
kernel args into initial boot environment.

Installer correctly upgrades KSPP options when moving from one version
of Talos to another.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-12-10 12:41:58 -08:00
Andrey Smirnov
c5ffe9f4f7 test: add support for mounting ISO in talosctl cluster create
If the disk is empty and an ISO path is given, the QEMU provisioner
mounts the ISO on the first boot.

To drop into maintenance mode:

```
talosctl cluster create --provisioner=qemu --iso-path=./_out/talos-amd64.iso --skip-injecting-config --wait=false
```

Then inject the config, bootstrap the node, and wait for it to come up
(via `talosctl cluster health`).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-12-10 05:55:44 -08:00
Andrey Smirnov
360d887967 fix: prevent endless loop with DHCP requests in networkd
There were two problems:

* `configureInterfaces` was always failing if the interface was already
set up, as the routes already existed

* `renew` was halving the renew interval each time `configureInterface`
failed; the interval starts at (LeaseTime/2) and effectively goes to zero

This led to high networkd CPU usage and a storm of DHCP requests on
the network.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-12-01 08:12:12 -08:00
Andrey Smirnov
1eac88e470 feat: add support for installing to SBCs
This introduces the notion of a "board" in Talos. A board is an interface that is capable
of modifying the installation in specific ways for a given SBC. This also adds support for the
libretech_all_h3_cc_h5.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-11-26 07:18:25 -08:00
Andrey Smirnov
7767a41d4a feat: set interface MTU in DHCP mode even if DHCP is not successful
Fixes #2789

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-19 10:59:21 -08:00
Andrey Smirnov
b2b86a622e fix: remove 'token creds' from maintenance service
This fixes the reverse Go dependency from `pkg/machinery` to `talos`
package.

Add a check to `Dockerfile` to prevent `pkg/machinery/go.mod` getting
out of sync, this should prevent problems in the future.

Fix potential security issue in `token` authorizer to deny requests
without grpc metadata.
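
As a rough illustration of the deny-by-default behaviour, here is a minimal
Go sketch. It models request metadata as a plain `map[string][]string`
instead of importing gRPC, and the `authorize` function and the `token` key
are assumptions made for the example, not the Talos implementation.

```go
package main

import (
	"crypto/subtle"
	"errors"
	"fmt"
)

var errUnauthenticated = errors.New("unauthenticated")

// authorize rejects a request unless its metadata carries exactly one
// matching token. A nil map stands for a request without any metadata,
// which is denied instead of being let through.
func authorize(md map[string][]string, want string) error {
	if md == nil {
		return errUnauthenticated
	}

	values := md["token"] // hypothetical metadata key
	if len(values) != 1 {
		return errUnauthenticated
	}

	// Constant-time comparison avoids leaking the token through timing.
	if subtle.ConstantTimeCompare([]byte(values[0]), []byte(want)) != 1 {
		return errUnauthenticated
	}

	return nil
}

func main() {
	fmt.Println(authorize(nil, "secret"))
	fmt.Println(authorize(map[string][]string{"token": {"secret"}}, "secret"))
}
```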

In provisioner, add support for launching nodes without the config
(config is not delivered to the provisioned nodes).

Breaking change in `pkg/provision`: now `NodeRequest.Type` should be set
to the node type (as config can be missing now).

In `talosctl cluster create` add a flag to skip providing config to the
nodes so that they enter maintenance mode, while the generated configs
are written down to disk (so they can be tweaked and applied easily).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-09 14:10:32 -08:00
Andrey Smirnov
a2efa44663 chore: enable gci linter
Fixes were applied automatically.

Import ordering might be questionable, but it's strict:

* stdlib
* other packages
* same package imports

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-09 08:09:48 -08:00
Andrey Smirnov
8560fb9662 chore: enable nlreturn linter
Most of the fixes were automatically applied.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-09 06:48:07 -08:00
Andrey Smirnov
350d75eb46 feat: build talosctl-cni-bundle, use it in talosctl for QEMU
This builds a bundle with CNI plugins for talosctl which is
automatically downloaded by `talosctl` if CNI plugins are missing.

CNI directories are moved by default to the `~/.talos/cni` path.

Also add a bunch of pre-flight checks to the QEMU provisioner to make it
easier to bootstrap the Talos QEMU cluster.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-30 16:30:37 -07:00
Artem Chernyshev
061b296530 feat: allow specifying user-disks in talosctl cluster create
User disks are supported by the QEMU and Firecracker providers.
They can be defined using the following parameter:
```
--user-disk /mount/path:1GB
```

More than one user disk can be specified.
The same set of user disks will be created for all master and worker nodes.
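
A value of the form `/mount/path:1GB` could be parsed roughly as in the Go
sketch below. The `userDisk` type, the `parseUserDisk` function, the set of
accepted suffixes and the decimal interpretation of units are all
assumptions for illustration; the actual flag handling in `talosctl` may
differ.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// userDisk is a hypothetical representation of one --user-disk value.
type userDisk struct {
	MountPath string
	SizeBytes uint64
}

// parseUserDisk splits "path:size" on the last colon and converts the size
// to bytes, assuming decimal units (1GB = 1e9 bytes).
func parseUserDisk(spec string) (userDisk, error) {
	idx := strings.LastIndex(spec, ":")
	if idx <= 0 || idx == len(spec)-1 {
		return userDisk{}, fmt.Errorf("invalid user disk spec %q, expected /mount/path:size", spec)
	}

	path, size := spec[:idx], strings.ToUpper(spec[idx+1:])

	// Longer suffixes come first so that "GB" is not mistaken for "B".
	units := []struct {
		suffix string
		mult   uint64
	}{{"GB", 1e9}, {"MB", 1e6}, {"KB", 1e3}, {"B", 1}}

	for _, u := range units {
		if !strings.HasSuffix(size, u.suffix) {
			continue
		}

		n, err := strconv.ParseUint(strings.TrimSuffix(size, u.suffix), 10, 64)
		if err != nil {
			return userDisk{}, fmt.Errorf("invalid size in %q: %w", spec, err)
		}

		return userDisk{MountPath: path, SizeBytes: n * u.mult}, nil
	}

	return userDisk{}, fmt.Errorf("size in %q has no known unit", spec)
}

func main() {
	for _, spec := range []string{"/mount/path:1GB", "/var/data:512MB", "broken"} {
		disk, err := parseUserDisk(spec)
		fmt.Println(disk, err)
	}
}
```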

Additionally enable user-disks in qemu e2e test.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2020-10-30 08:44:08 -07:00
Andrey Smirnov
569527e6ed test: potential fix for talosctl cluster destroy being stuck
Missing timeout in shutdown is the only reason I could find for Sfyra
tests being stuck on teardown.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-09 05:10:08 -07:00
Andrey Smirnov
371cbfa7ae feat: implement talos.shutdown=[halt|poweroff] kernel argument
This allows changing the `Shutdown()` API behavior to halt the system
instead of powering it off.

This is useful for the QEMU provisioner, as it doesn't distinguish between
power off and reboot.
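
Reading such an argument from a kernel command line could look like the Go
sketch below. The `shutdownMode` function is hypothetical, and both the
default of `poweroff` and the choice to ignore unknown values are
assumptions; Talos itself parses kernel arguments through its own library.

```go
package main

import (
	"fmt"
	"strings"
)

// shutdownMode scans a kernel command line for talos.shutdown=<mode> and
// returns "halt" or "poweroff". Unknown values are ignored and the
// assumed default "poweroff" is kept.
func shutdownMode(cmdline string) string {
	const prefix = "talos.shutdown="

	mode := "poweroff"

	for _, field := range strings.Fields(cmdline) {
		if !strings.HasPrefix(field, prefix) {
			continue
		}

		if v := strings.TrimPrefix(field, prefix); v == "halt" || v == "poweroff" {
			mode = v
		}
	}

	return mode
}

func main() {
	fmt.Println(shutdownMode("console=ttyS0 talos.shutdown=halt"))
	fmt.Println(shutdownMode("console=ttyS0"))
}
```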

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-08 01:34:44 +03:00
Andrey Smirnov
018086d1fa refactor: extract blockdevice library
Library `blockdevice` was extracted as `talos-systems/go-blockdevice`,
this PR finalizes the move by removing Talos copy of it.

Some functions around `mkfs`/`growfs` were extracted as `makefs`
package, as they depend on `cmd` package.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-05 11:18:43 -07:00
Andrey Smirnov
ff0d4b305a feat: build Talos images/artifacts for amd64/arm64
By default, a build outside of Drone works as before: it builds only the
amd64 version, loads images back into dockerd, etc.

If multiple platforms are used, multi-arch images are built which can't
be exported to docker or to `.tar` image, they're always pushed to the
registry (even for PR builds to our internal CI registry).

Artifacts as files (initramfs, kernel) now have `-arch` suffix:
`vmlinuz-amd64`, `initramfs-amd64.xz`. "Magic" script normalizes output
paths depending on whether single platform or multiple platforms were
given.

VM provisioners accept magic `${ARCH}` in initramfs/kernel paths which
gets replaced by cluster architecture.
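
The placeholder substitution amounts to a plain string replacement. A
minimal Go sketch, where `expandArch` is a hypothetical helper name rather
than the function used by the provisioners:

```go
package main

import (
	"fmt"
	"strings"
)

// expandArch replaces every literal ${ARCH} in path with the given
// architecture; paths without the placeholder are returned unchanged.
func expandArch(path, arch string) string {
	return strings.ReplaceAll(path, "${ARCH}", arch)
}

func main() {
	fmt.Println(expandArch("_out/vmlinuz-${ARCH}", "amd64"))
	fmt.Println(expandArch("_out/initramfs-${ARCH}.xz", "arm64"))
}
```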

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-09-27 10:32:07 -07:00
Andrey Smirnov
7f5ffdacb8 test: implement API for QEMU VM provisioner
Fixes #2515

This implements a simple HTTP API which should cover the same methods as
the IPMI methods in Sidero.

Examples:

```
$ curl http://172.20.0.1:34791/status
{"PoweredOn":false}
```

```
$ curl -X POST http://172.20.0.1:34791/poweroff
```

The API listens on the bridge address; each VM has a unique port, which
can be found in the cluster state as `apiport: NNNN`.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-09-15 15:05:55 -07:00
Andrey Smirnov
5288ac27f3 fix: default endpoint to 127.0.0.1 for Docker/OS X
Docker for OS X doesn't leave any other option, as node IPs are not
routable from the host, and the current default was to use all the
control plane node IPs in a round-robin LB.

Fixes #2495

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-09-04 13:28:41 -07:00
Andrey Smirnov
59adf7315d feat: provide option to run Talos under UEFI in QEMU
This also adds integration pipeline tests for UEFI.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-28 12:51:10 -07:00
Andrey Smirnov
d60adf9e3b test: add support for PXE nodes in qemu provision library
This isn't supposed to be used ever in Talos directly, but rather only
in integration tests for Sidero.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-21 06:49:32 -07:00
Andrey Smirnov
5c6c522994 refactor: extract cluster bootstrapper via API as common component
It should be useful in any project provisioning Talos clusters.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-19 14:32:58 -07:00
Andrey Smirnov
bddd4f1bf6 refactor: move external API packages into machinery/
This moves `pkg/config`, `pkg/client` and `pkg/constants`
under `pkg/machinery` umbrella.

And `pkg/machinery` is published as Go module inside Talos repository.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-17 09:56:14 -07:00
Andrey Smirnov
2697b99b7d refactor: extract pkg/net as github.com/talos-systems/net
This extracts common package as new module/repository.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-14 11:04:50 -07:00
Andrey Smirnov
7226fc8be9 fix: ignore eth0 interface in docker provisioner
This avoids pause on container startup when `networkd` tries to do DHCP
over `eth0` (which fails for obvious reasons). Interfaces are
pre-configured in Docker.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-13 07:56:18 -07:00
Andrey Smirnov
9379cf9ee1 refactor: expose provision as public package
This change is only moving packages and updating import paths.

Goal: expose `internal/pkg/provision` as `pkg/provision` to enable other
projects to import Talos provisioning library.

As cluster checks are almost always required as part of provisioning
process, package `internal/pkg/cluster` was also made public as
`pkg/cluster`.

Other changes were direct dependencies discovered by `importvet` which
were updated.

Public packages (useful, general purpose packages with stable API):

* `internal/pkg/conditions` -> `pkg/conditions`
* `internal/pkg/tail` -> `pkg/tail`

Private packages (used only internally by the provisioning library):

* `internal/pkg/inmemhttp` -> `pkg/provision/internal/inmemhttp`
* `internal/pkg/kernel/vmlinuz` -> `pkg/provision/internal/vmlinuz`
* `internal/pkg/cniutils` -> `pkg/provision/internal/cniutils`

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-12 05:12:05 -07:00