Commit Graph

133 Commits

Author SHA1 Message Date
Andrew Rynhard
6a85a47ffa feat: upgrade containerd to v1.4.0
This brings in the latest stable containerd.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-09-04 02:59:08 -07:00
Andrey Smirnov
f6ecf000c9 refactor: extract packages loadbalancer and retry
This removes in-tree packages in favor of:

* github.com/talos-systems/go-retry
* github.com/talos-systems/go-loadbalancer

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-09-02 13:46:22 -07:00
Andrey Smirnov
ac4ab11d36 chore: update k8s modules to 1.19 final version
I think we missed that when updating K8s for the final version.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-09-02 06:25:28 -07:00
Andrey Smirnov
6a7cc02648 fix: handle bootkube recover correctly, support recovery from etcd
Bootkube recover process (and `talosctl recover`) was actually
regenerating assets each time `recover` runs forcing control plane to be
at the state when cluster got created. This PR fixes that by running
recover process correctly.

Recovery via etcd was fixed to handle encrypted etcd data:
it follows the way `apiserver` handles encryption at rest, and as at
the moment AES CBC is the only supported encryption method, code simply
follows the same path.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-18 14:24:14 -07:00
Andrey Smirnov
7875e9499f chore: re-import talos-systems/pkg/crypto/tls
See also https://github.com/talos-systems/crypto/pull/2

This should break dependency of `pkg/client` on `pkg/grpc`.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-17 08:06:38 -07:00
Andrey Smirnov
2697b99b7d refactor: extract pkg/net as github.com/talos-systems/net
This extracts common package as new module/repository.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-14 11:04:50 -07:00
Andrey Smirnov
52c5911fcd chore: extract pkg/crypto as external module
Package `pkg/crypto` was extracted as `github.com/talos-systems/crypto`
repository and Go module.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-14 06:33:30 -07:00
Andrey Smirnov
7474b8ba52 feat: upgrade etcd to 3.4.10
This upgrades etcd to latest v3.4.x version as smooth upgrade from
version 3.3.22 in 0.6.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-13 07:33:51 -07:00
Andrew Rynhard
849959fefc feat: add dynamic config decoder
This adds the ability to dynamically decode mult-doc YAML files.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-07-30 08:07:14 -07:00
Andrey Smirnov
3926442704 feat: taint master nodes with NoSchedule taint
Fixes #2350

This also brings in a fix for `coredns` tolerations from
https://github.com/talos-systems/bootkube-plugin/pull/19.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-29 14:02:41 -07:00
Andrew Rynhard
1b491d0a66 feat: upgrade Kubernetes to v1.19.0-rc.3
This brings in the latest version of Kubernetes.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-07-29 11:04:50 -07:00
Andrey Smirnov
f23c9111d1 feat: upgrade etcd to 3.3.22 version
Latest version in 3.3 branch is 3.3.23, but it's broken, so we use previous
stable version.

Switch to official etcd gcr.io registry, early support for arm64.

Move `etcd` service to run in system containerd.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-21 09:44:43 -07:00
Andrey Smirnov
4cd6e7e200 refactor: use humanize.Bytes everywhere
This removes dependency on `bytefmt` package.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-20 07:26:33 -07:00
Andrey Smirnov
1a0e1bc393 chore: update module dependencies
Fixes #2316

Simply update dependencies we don't track on version level to be
compatible with Talos components (like etcd or k8s).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-16 12:00:50 -07:00
Andrew Rynhard
0617a10027 feat: upgrade Kubernetes to v1.19.0-rc.0
This brings in the latest version of Kubernetes.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-07-14 13:07:18 -07:00
Artem Chernyshev
8fc352ec4f feat: merge mode in talosctl kubeconfig
New flag `-m` will enable merge mechanism in `talosctl kubeconfig`

Command examples:

```
talosctl kubeconfig -m

talosctl kubeconfig -m ~/.kube/config
```

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2020-07-10 12:39:30 -07:00
Andrey Smirnov
4cc074cdba feat: implement API access to event history
1. Add [xid-based](https://github.com/rs/xid) event IDs. Xids
are sortable and unique enough. Xids also encode event publishing
time with a second precision.

2. Add three ways to look back into event history: based on number of
events, on time and ID. Lookup via ID might be used to restart event
polling in case of broken API connection from the same moment.

3. Reimplement core event buffer with positions which are always
incremented instead of generation+index, this implementation is much
more simple (idea from circular buffer).

4. By default, Events API works the same - it shows no history and
starts streaming new events only.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-08 10:54:50 -07:00
Andrew Rynhard
a5a2d959ed feat: upgrade runc to v1.0.0-rc90
This updates runc to the same version vendored by containerd.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-07-02 13:19:33 -07:00
Spencer Smith
90115bb3ef feat: update kubernetes to 1.19.0-beta.1
This PR brings in all changes necessary to deploy kubernetes 1.19.x.

It relies on an update to our bootkube-plugin project, as well as
implementation of some Image() functions for our various control plane
components, since they are all distinct images and not just hyperkube.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-06-10 15:01:11 -04:00
Andrey Smirnov
d1a4e6ee64 feat: adjust time properly in timed via adjtime()
This should be proper way to adjust time incrementally without causing
jumps one in +/- direction. Time-sensitive services might be confused by
huge jumps.

This also implements timed healh check based on first successful time
sync.

Fixed some random health check related issues in other services.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-06-03 09:39:48 -07:00
Andrew Rynhard
9412e2b478 fix: allow all seccomp profile names
This updates the bootkube plugin that brings in a fix that allows
any seccomp profile name to be used.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-05-19 18:48:22 -07:00
Andrew Rynhard
56d7bf19fe feat: add recovery API
This adds an API for recovering the self-hosted control plane.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-05-04 19:38:30 -07:00
Seán C McCord
c1299d3ff0 feat: allow dual-stack support with bootkube wrapper
Handle dual-stack configurations with the bootkube wrapper.  This uses
the new PodCIDRs and ServiceCIDRs `asset.Config` parameters in bootkube.
It also relies on the bootkube-plugin features for manipulating
kube-proxy config and installing the dual-stack DNS service.

Fixes #2055

Signed-off-by: Seán C McCord <ulexus@gmail.com>
2020-04-28 20:10:58 -07:00
Andrew Rynhard
49307d554d refactor: improve machined
This is a rewrite of machined. It addresses some of the limitations and
complexity in the implementation. This introduces the idea of a
controller. A controller is responsible for managing the runtime, the
sequencer, and a new state type introduced in this PR.

A few highlights are:

- no more event bus
- functional approach to tasks (no more types defined for each task)
  - the task function definition now offers a lot more context, like
    access to raw API requests, the current sequence, a logger, the new
    state interface, and the runtime interface.
- no more panics to handle reboots
- additional initialize and reboot sequences
- graceful gRPC server shutdown on critical errors
- config is now stored at install time to avoid having to download it at
  install time and at boot time
- upgrades now use the local config instead of downloading it
- the upgrade API's preserve option takes precedence over the config's
  install force option

Additionally, this pulls various packes in under machined to make the
code easier to navigate.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-04-28 08:20:55 -07:00
Andrew Rynhard
d34e9f0984 fix: pass dev path to mkfs
This fixes a bug caused by a missing device argument to `mkfs.xfs`.
Without a device, `mkfs.xfs` will error out. Additionally, this ensures
that the installer container is started with the `kmsg` writer that
ensures logs are formatted correctly for `/dev/kmsg`. Without this we
lose a lot of the logs output by the container, one of them being any
error from `mkfs.xfs`

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-04-21 16:30:35 -07:00
Andrew Rynhard
3791fb5cbc refactor: use upstream bootkube
This moves us off of our bootkube fork and makes us of our bootkube
plugin.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-04-21 10:27:01 -07:00
Spencer Smith
3a4eaeeef0 feat: upgrade kubernetes to 1.18
This PR will pull in the latest release of k8s 1.18 so we can start
validating it through our test suite.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-03-26 14:59:43 -04:00
Andrew Rynhard
eba80b453f feat: update bootkube
This brings in the latest changes from our fork of bootkube. One thing
to note is a fix that stops the pod controller cache object.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-03-24 15:53:08 -07:00
Spencer Smith
3485ea9f09 fix: update k8s to 1.17.3
This PR will update k8s to v1.17.3 to address CVEs mentioned in https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!topic/kubernetes-security-announce/2UOlsba2g0s

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-03-23 17:08:52 -07:00
Spencer Smith
2f4ccfda9a fix: respect dns domain from machine config
BREAKING CHANGE: This PR fixes a bug where we were only passing `cluster.local` to the
kubelet configuration. It will also pull in a new version of the
bootkube fork to ensure that custom domains got propogated down to the
API Server certs, as well as the CoreDNS configuration for a cluster.

Existing users should be aware that, if they were previously trying to
use this option in machine configs, that an upgrade will may break
their cluster. It will update a kubelet flag with the new domain, but
CoreDNS and API Server certs will not change since bootkube has already
run. One option may be to change these values manually inside the
Kubernetes cluster. However, it may prove easier to rebuild the cluster
if necessary.

Additionally, this PR also exposes a flag to `osctl config generate`
to allow tweaking this domain value as well.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-03-20 12:28:17 -04:00
Andrey Smirnov
7e136fee67 chore: update Firecracker Go SDK to the official release
This release includes all the fixes we upstreamed before.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-03-18 01:11:46 +03:00
Andrey Smirnov
d5d3035c8c test: enable upgrade tests 0.4.x -> latest
With the fix #1904, it's now possible to upgrade 0.4.x with
`machine.File` extra files (caused by registry mirror for
registry.ci.svc).

Bump resources for upgrade tests in attempt to speed it up.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-02-26 00:09:32 +03:00
Andrew Rynhard
64b5b32732 refactor: use go-procfs
This makes use of the external procfs pacakge that is based on the
pacakge we are removing here.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-02-19 15:58:57 -08:00
Andrew Rynhard
63ca83a02c feat: support sending machine info
This allows users to specify well known query parameters in `talos.config`.
The only supported parameter in this change is `uuid`. This will send
the node's UUID determined from SMBIOS along with the request for the
config.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-02-19 13:15:28 -08:00
Andrew Rynhard
fe7847e0b8 feat: add reboot flag to reset API
This adds the ability to automatically reboot a machine after a reboot.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-02-19 05:10:58 -08:00
Andrey Smirnov
33332f4c74 chore: support bootloader emulation in firecracker provisioner
Firecracker launches tries to open VM disk image before every boot,
parses partition table, finds boot partition, tries to read it as FAT32
filesystem, extracts uncompressed kernel from `bzImage` (firecracker
doesn't support `bzImage` yet), extracts initramfs and passes it to
firecracker binary.

This flow allows for extended tests, e.g. testing installer, upgrade and
downgrade tests, etc.

Bootloader emulation is disabled by default for now, can be enabled via
`--with-bootloader-emulation` flag to `osctl cluster create`.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-02-13 23:21:37 +03:00
Andrey Smirnov
76c2038b13 chore: implement loadbalancer for firecracker provisioner
This PR contains generic simple TCP loadbalancer code, and glue code for
firecracker provisioner to use this loadbalancer.

K8s control plane is passed through the load balancer, and Talos API is
passed only to the init node (for now, as some APIs, including
kubeconfig, don't work with non-init node).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-02-13 23:07:13 +03:00
Andrey Smirnov
4950f35440 chore: use upstream version of Firecracker Go SDK
With all our PRs merged, we can switch back to upstream version. No tag
yet, so we have to follow `master` for now.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-02-04 08:59:39 -08:00
Spencer Smith
e27b0cbfdb chore: update bootkube
This PR updates the talos branch of bootkube to add extraArgs to
bootstrap controlplane components as well.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-01-31 11:38:34 -08:00
Spencer Smith
ff393f8ae3 chore: update bootkube fork
This PR will pull in the latest of our bootkube fork and fix a bug with
extraArgs.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-01-31 09:39:43 -05:00
Andrey Smirnov
fae5e6915d chore: rework firecracker code around upstream Go SDK + PRs
This removes use of private fork with custom `ip=` kernel argument
handling and switches fully to upstream version of it.

Firecracker Go SDK version is `master` + following PRs:

* https://github.com/firecracker-microvm/firecracker-go-sdk/pull/167
* https://github.com/firecracker-microvm/firecracker-go-sdk/pull/177
* https://github.com/firecracker-microvm/firecracker-go-sdk/pull/178

MTU handling support was implemented as well.

Changes:

* hostname to each node is passed via `talos.hostname=` kernel arg
* IP configuration is generated by SDK from CNI result
* fixed bugs with wrong netmask
* nameservers & MTU is passed via Talos config

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-01-29 02:35:15 +03:00
Spencer Smith
aabd46e651 fix: re-enable control plane flags
This PR aims to fix the ability to pass extra flags to control plane
components. This will close #1523

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-01-23 14:28:02 -05:00
Andrew Rynhard
f3623d22b0 refactor: use tls.Config as client credentials
The `client.Creds` struct was not used very often, and made using the
`client.NewClient` function impossible to use in combination with the
`RemoteRenewingFileCertificateProvider`. This modifies
`client.NewClient` to accept a `tls.Config` instead of `client.Creds`,
allowing for the use of `RemoteRenewingFileCertificateProvider` with
`client.NewClient`.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-01-21 17:10:07 -08:00
Spencer Smith
1368bfa451 chore: update bootkube config to include cluster name
This PR will add the new cluster name field to our bootkube options.
This allows for the generated kubeconfig to include the context-name for
the default context.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-01-21 16:56:57 -05:00
Andrey Smirnov
2bf8540855 test: provision Talos clusters via Firecracker VMs
This is initial PR to push the initial code, it has several known
problems which are going to be addressed in follow-up PRs:

1. there's no "cluster destroy", so the only way to stop the VMs is to
`pkill firecracker`

2. provisioner creates state in `/tmp` and never deletes it, that is
required to keep cluster running when `osctl cluster create` finishes

3. doesn't run any controller process around firecracker to support
reboots/CNI cleanup (vethxyz interfaces are lingering on the host as
they're never cleaned up)

The plan is to create some structure in `~/.talos` to manage cluster
state, e.g. `~/.talos/clusters/<name>` which will contain all the
required files (disk images, file sockets, VM logs, etc.). This
directory structure will also work as a way to detect running clusters
and clean them up.

For point number 3, `osctl cluster create` is going to exec lightweight
process to control the firecracker VM process and to simulate VM reboots
if firecracker finishes cleanly (when VM reboots).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-01-16 00:27:08 +03:00
Brad Beam
95666900a7 fix: Update bootkube to include node ready check
This ensures bootkube waits until all pods and nodes are ready before tearing
down the bootstrap control plane.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2020-01-14 16:51:53 -06:00
Andrew Rynhard
5cac4f5f39 fix: set kube-dns labels
This updates the CoreDNS to use the 'kube-dns' naming convention. This
naming convention is used throughout the Kubrnetes documentation. This
also fixing the kube-dns service. The label label selector was wrong,
breaking cluster DNS.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-01-09 18:29:35 -08:00
Andrew Rynhard
79878c1d8d feat: enable DynamicKubeletConfiguration
This moves to using the KubeletConfiguration instead of flags to the
kubelet. It also enables DynamicKubeletConfiguration, which allows users
to configure kubelets using a ConfigMap.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-01-08 16:06:44 -08:00
Brad Beam
0742e5245a feat: Upgrade bootkube
This brings in the changes to run controller manager and scheduler as
a daemonset.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2020-01-08 15:41:37 -08:00
Andrey Smirnov
0081ac5fac refactor: extract Talos cluster provisioner as common code
This extracts Docker Talos cluster provisioner as common code
which might be shared between `osctl cluster` and integration-test.

There should be almost no functional changes.

As proof of concept, abstract cluster readiness checks were implemented
based on provisioned cluster state. It implements same checks as
`basic-integration.sh` in pure Go via Talos/K8s clients.

`conditions` package was promoted from machined-internal to
`internal/pkg` as it is used to run the checks.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-12-27 12:14:19 -08:00