1392 Commits

Author SHA1 Message Date
Andrey Smirnov
cafd33acd8 fix: refresh proxy settings from environment in image resolver
Fixes #1901

This is the same fix as #1680, #1690, but applied to image resolver code.
Default HTTP client can't be used here, as custom TLS client config
might be set on the transport to authenticate to the registry.
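
A minimal sketch of the idea, not the actual resolver code: a dedicated
transport carries the registry TLS config, and its proxy function re-reads
the environment on every request. The names `lookupProxy` and `newTransport`
are hypothetical, and `NO_PROXY` handling is left out for brevity.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"net/url"
	"os"
	"strings"
)

// lookupProxy returns the proxy URL for the given request scheme, reading the
// variables through getenv at call time. NO_PROXY is ignored in this sketch.
func lookupProxy(scheme string, getenv func(string) string) (*url.URL, error) {
	name := "HTTP_PROXY"
	if scheme == "https" {
		name = "HTTPS_PROXY"
	}

	value := getenv(name)
	if value == "" {
		value = getenv(strings.ToLower(name))
	}

	if value == "" {
		return nil, nil // no proxy configured: connect directly
	}

	return url.Parse(value)
}

// newTransport builds a dedicated transport so that a registry-specific TLS
// client config can be attached while proxy settings stay dynamic.
func newTransport(tlsConfig *tls.Config) *http.Transport {
	return &http.Transport{
		TLSClientConfig: tlsConfig,
		Proxy: func(req *http.Request) (*url.URL, error) {
			return lookupProxy(req.URL.Scheme, os.Getenv)
		},
	}
}

func main() {
	os.Setenv("HTTPS_PROXY", "http://proxy.example:3128")

	transport := newTransport(&tls.Config{MinVersion: tls.VersionTLS12})
	req, _ := http.NewRequest(http.MethodGet, "https://registry.example/v2/", nil)

	proxy, err := transport.Proxy(req)
	fmt.Println(proxy, err)
}
```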

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-02-20 09:46:08 -05:00
Niklas Wik
08b1a782cd feat: support proxy in docker buildx
This allows building when an HTTP(S) proxy is required to download content on the build machine.

Signed-off-by: Niklas Wik <niklas.wik@nokia.com>
2020-02-20 05:35:17 -08:00
Andrew Rynhard
9cf217d2c1 fix: default reboot flag to false
We should default to shutting down when resetting.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-02-19 16:14:00 -08:00
Andrew Rynhard
64b5b32732 refactor: use go-procfs
This makes use of the external procfs package that is based on the
package we are removing here.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-02-19 15:58:57 -08:00
Andrew Rynhard
8a3a76f73e fix: add reboot flag to reset command
This exposes the reboot option for the reset API by adding a `--reboot`
flag to the CLI.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-02-19 15:44:10 -08:00
Andrew Rynhard
63ca83a02c feat: support sending machine info
This allows users to specify well known query parameters in `talos.config`.
The only supported parameter in this change is `uuid`. This will send
the node's UUID determined from SMBIOS along with the request for the
config.
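
A rough sketch of how such a parameter could be filled in. It assumes that
the config URL carries a `uuid` query key and that the node UUID has already
been read from SMBIOS; the function name `fillUUID` and the example URL are
hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
)

// fillUUID sets the `uuid` query parameter to nodeUUID when the config URL
// contains that key; URLs without the key are returned unchanged.
func fillUUID(configURL, nodeUUID string) (string, error) {
	u, err := url.Parse(configURL)
	if err != nil {
		return "", err
	}

	q := u.Query()
	if _, ok := q["uuid"]; !ok {
		return configURL, nil
	}

	q.Set("uuid", nodeUUID)
	u.RawQuery = q.Encode()

	return u.String(), nil
}

func main() {
	out, err := fillUUID("https://example.com/config?uuid=", "4c4c4544-0000-1000-8000-000000000000")
	fmt.Println(out, err)
}
```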

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-02-19 13:15:28 -08:00
Andrey Smirnov
afea21bc5a fix: stop firecracker launcher on signal
When inner function was added, `return nil` was not aborting launch
sequence, but rather leading to VM restart. `cluster destroy` still
worked fine, as it removes state directory and launcher exits on
failure.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-02-19 18:04:48 +03:00
Andrew Rynhard
fe7847e0b8 feat: add reboot flag to reset API
This adds the ability to automatically reboot a machine after a reset.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-02-19 05:10:58 -08:00
Spencer Smith
8092362098 fix: fix reset command
This PR will fix the reset command to actually wipe the system disk as
expected.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-02-18 16:18:43 -05:00
Andrey Smirnov
5f330f1f64 chore: push installer & talos images to the CI registry on every build
This enables a way to run the matching installer image in firecracker
tests. New image is used in firecracker tests and bootloader support to
use installed kernel/initramfs, which opens path for upgrade tests.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-02-18 07:32:45 -08:00
Andrew Rynhard
c9a8605f87 chore: move golangci-lint.yaml to .golangci.yml
This allows local runs of golangci-lint to use the default config path.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-02-18 07:10:21 -08:00
Seán C McCord
16594a83a8 fix: allow kubelet to handle multiple service CIDRs
Handle multiple service CIDRs (such as for dual-stack configurations)
cleanly, calculating the DNS service IP for each.
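
A sketch of the per-CIDR calculation. It assumes the DNS service IP is the
10th address of each service CIDR, which is a common Kubernetes convention
and not something this commit message states; `dnsServiceIPs` is a
hypothetical name.

```go
package main

import (
	"fmt"
	"net"
)

// dnsOffset is an assumption: the DNS service is placed at network address + 10.
const dnsOffset = 10

// dnsServiceIPs returns one DNS service IP per service CIDR (IPv4 or IPv6).
func dnsServiceIPs(cidrs []string) ([]string, error) {
	ips := make([]string, 0, len(cidrs))

	for _, cidr := range cidrs {
		_, network, err := net.ParseCIDR(cidr)
		if err != nil {
			return nil, err
		}

		// Copy the network address and add the offset with byte-wise carry.
		ip := make(net.IP, len(network.IP))
		copy(ip, network.IP)

		carry := dnsOffset
		for i := len(ip) - 1; i >= 0 && carry > 0; i-- {
			sum := int(ip[i]) + carry
			ip[i] = byte(sum & 0xff)
			carry = sum >> 8
		}

		if !network.Contains(ip) {
			return nil, fmt.Errorf("CIDR %s is too small for a DNS service IP", cidr)
		}

		ips = append(ips, ip.String())
	}

	return ips, nil
}

func main() {
	ips, err := dnsServiceIPs([]string{"10.96.0.0/12", "fc00:db8:10::/112"})
	fmt.Println(ips, err)
}
```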

Fixes #1888

Signed-off-by: Seán C McCord <ulexus@gmail.com>
2020-02-18 07:10:05 -08:00
Andrey Smirnov
9bfb5f1501 test: fix RebootAllNodes test to reboot all nodes in one call
As calls to the nodes are proxied through `apid` on init node, we can't
reboot all nodes concurrently, as init node might be already down by the
moment any other node is going to be rebooted.

Rewrite the test to reboot all the nodes in a single multi-node
request.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-02-17 14:34:00 -08:00
Andrey Smirnov
491e7e58e0 test: implement RebootAllNodes test
This complements "rolling restart" RebootNodeByNode test by providing
more of a disaster scenario, when all the nodes are restarted at once.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-02-17 13:58:57 -08:00
Andrey Smirnov
638929f319 chore: remove KubernetesVersion from provision request
Not sure how it got into `ClusterRequest`, but we're not using it.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-02-17 13:29:50 -08:00
Andrew Rynhard
5b50456c05 fix: validate install disk
This adds a check that verifies the install disk in metal mode. The
check requires a value, and that the path is valid.
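
The two conditions could look roughly like this. This is a sketch only:
`validateInstallDisk` is a hypothetical name, and "valid path" is assumed to
mean that the path exists on the node.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// validateInstallDisk requires a non-empty disk path that exists.
// A caller would run it only in metal mode.
func validateInstallDisk(disk string) error {
	if disk == "" {
		return errors.New("an install disk is required")
	}

	if _, err := os.Stat(disk); err != nil {
		return fmt.Errorf("install disk %q is not valid: %w", disk, err)
	}

	return nil
}

func main() {
	fmt.Println(validateInstallDisk(""))
	fmt.Println(validateInstallDisk("/dev/does-not-exist"))
}
```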

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-02-17 08:15:27 -05:00
Seán C McCord
1a7175353e fix: PodCIDR, ServiceCIDR should be comma sets
PodCIDRs and ServiceCIDRs are now returned as a comma-delimited set of
their slices, as per the documentation for kube-apiserver and
kube-controller-manager in dual-stack configurations.

Ref: https://kubernetes.io/docs/concepts/services-networking/dual-stack/#enable-ipv4-ipv6-dual-stack

Fixes #1883

Signed-off-by: Seán C McCord <ulexus@gmail.com>
2020-02-17 08:15:10 -05:00
Andrew Rynhard
7f41437ace chore: prepare release v0.4.0-alpha.5
This is the official v0.4.0-alpha.5 release.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-02-15 06:01:03 -08:00
Andrey Smirnov
f51e9a14fe chore: build app container images skipping export to host
Container images for `apid`, `networkd`, etc. are now built inside the
buildkit using the `img` tool. This means that all the dependencies are
now controlled in `buildkit` and many more stages can run in parallel
without problems (overwriting content in `_out/images`).

This also simplifies Drone configuration, as we can let buildkit handle
the dependencies. I also enabled more stages to run in parallel.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-02-14 13:17:25 -08:00
Andrew Rynhard
d57598ebe1 chore: update pkgs
This brings in a number of kernel improvements.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-02-14 12:10:38 -08:00
Andrey Smirnov
e1779ac77c feat: implement registry mirror & config for image pull
When images are pulled by Talos or via CRI plugin, configuration
for each registry is applied. Mirrors allow redirecting pull requests to
either a local registry or a caching registry. Auth & TLS settings enable
authentication and TLS client authentication for non-public registries.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-02-14 00:28:59 +03:00
Andrey Smirnov
33332f4c74 chore: support bootloader emulation in firecracker provisioner
Firecracker launcher tries to open the VM disk image before every boot,
parses partition table, finds boot partition, tries to read it as FAT32
filesystem, extracts uncompressed kernel from `bzImage` (firecracker
doesn't support `bzImage` yet), extracts initramfs and passes it to
firecracker binary.

This flow allows for extended tests, e.g. testing installer, upgrade and
downgrade tests, etc.

Bootloader emulation is disabled by default for now, can be enabled via
`--with-bootloader-emulation` flag to `osctl cluster create`.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-02-13 23:21:37 +03:00
Andrey Smirnov
76c2038b13 chore: implement loadbalancer for firecracker provisioner
This PR contains generic simple TCP loadbalancer code, and glue code for
firecracker provisioner to use this loadbalancer.

K8s control plane is passed through the load balancer, and Talos API is
passed only to the init node (for now, as some APIs, including
kubeconfig, don't work with non-init node).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-02-13 23:07:13 +03:00
Andrew Rynhard
fcaed8b0dd fix: don't proxy gRPC unix connections
The default gRPC dialer honors proxy environment variables, which causes
local unix socket connections to attempt to go through the proxy. This
fixes that by using a custom dialer.
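
A sketch of such a dialer using only the standard library; it is an
illustration, not the code from this change. With grpc-go, a function of
this shape would typically be passed via `grpc.WithContextDialer`. The
helper `echoOverUnix` exists only to exercise the dialer.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"os"
	"path/filepath"
)

// dialUnix connects straight to a unix socket path and never consults
// proxy environment variables.
func dialUnix(ctx context.Context, path string) (net.Conn, error) {
	var d net.Dialer

	return d.DialContext(ctx, "unix", path)
}

// echoOverUnix is a demo helper: it serves one echo connection on a temporary
// socket and sends msg through dialUnix.
func echoOverUnix(msg string) (string, error) {
	dir, err := os.MkdirTemp("", "sock")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	l, err := net.Listen("unix", filepath.Join(dir, "demo.sock"))
	if err != nil {
		return "", err
	}
	defer l.Close()

	go func() {
		conn, err := l.Accept()
		if err != nil {
			return
		}
		defer conn.Close()
		io.Copy(conn, conn) // echo back whatever is received
	}()

	conn, err := dialUnix(context.Background(), l.Addr().String())
	if err != nil {
		return "", err
	}
	defer conn.Close()

	if _, err := conn.Write([]byte(msg)); err != nil {
		return "", err
	}

	buf := make([]byte, len(msg))
	if _, err := io.ReadFull(conn, buf); err != nil {
		return "", err
	}

	return string(buf), nil
}

func main() {
	fmt.Println(echoOverUnix("ping"))
}
```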

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-02-10 05:37:50 -08:00
Seán C McCord
5f3485979a fix: do not add empty netconf
When `ip=dhcp` (as well as a few other conditions), the
buildKernelOptions function returns empty.  In these cases, this empty
network config should not be added to the common list for iteration.

fixes #1869

Signed-off-by: Seán C McCord <ulexus@gmail.com>
2020-02-10 05:26:38 -08:00
Andrew Rynhard
71ddd85fb5 chore: prepare release v0.4.0-alpha.4
This is the official v0.4.0-alpha.4 release.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-02-04 11:16:52 -08:00
Andrew Rynhard
51a359b115 chore: sign .drone.yml
This is required so that we don't have to approve every PR in Drone.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-02-04 11:05:21 -08:00
Spencer Smith
1d73a9e6d1 chore: only run ok-to-test when PR
This PR fixes a quick bug in CI where the ok-to-test step in drone was
running after a merge to master.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-02-04 10:27:46 -08:00
Spencer Smith
c825b83d47 chore: support slash commands in drone
This PR adds the necessary drone step to check for the `ok-to-test`
label before running any testing against a PR.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-02-04 12:57:16 -05:00
Spencer Smith
4fbfd6511b chore: get correct drone status in github actions
This PR fixes a small bug I found yesterday to make sure we're fetching
the latest drone build number always.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-02-04 12:43:37 -05:00
Andrey Smirnov
4950f35440 chore: use upstream version of Firecracker Go SDK
With all our PRs merged, we can switch back to upstream version. No tag
yet, so we have to follow `master` for now.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-02-04 08:59:39 -08:00
Andrey Smirnov
01d696ed10 chore: update golangci-lint-1.23.3
`gomnd` disabled, as it complains about every number used in the code,
and `wsl` became much more thorough.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-02-04 08:56:39 -08:00
Andrey Smirnov
a2dee289d1 test: skip reboot tests
Seems that with a single endpoint k8s is not able to recover (?).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-02-04 08:37:32 -08:00
Andrey Smirnov
e8bb70bf57 chore: use common method to pull etcd image
This was the only place which was still doing direct call to containerd
API, use common method to support retries.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-02-04 06:06:14 -08:00
Brad Beam
a39cd81b8f chore(networkd): Report on errors during interface configuration
This DRYs up the interface configuration and adds in an error channel to capture
any issues that come up from interface configuration. These errors are still
treated as non-fatal, but should provide some additional insight.

Signed-off-by: Brad Beam <brad.beam@b-rad.info>
2020-02-03 12:46:37 -08:00
Andrey Smirnov
afa8a48174 chore: implement reboot test
Reboot test does node-by-node reboots followed by cluster health checks
(same as done by provisioner).

Fixed bug with `Read()` returning `Reader` instead of `ReadCloser`
(minor).

Allowed `bootkube` to be `Skipped` (for rebooted node).

Added support for doing checks via provided client instance.

Implemented generic capabilities to skip tests based on cluster
platform.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-02-03 11:02:43 -08:00
Seán C McCord
dbf408ea58 fix: bind etcd to IPv6 if available
If an IPv6 address is available, etcd should bind to `[::]` instead of
`0.0.0.0`.  This will cause etcd to listen on both IPv4 and IPv6
interfaces.
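
The selection could be sketched as follows. `listenHost` is a hypothetical
helper, and treating any non-IPv4 address as "IPv6 available" is a
simplifying assumption.

```go
package main

import (
	"fmt"
	"net"
)

// listenHost returns "[::]" when at least one of the node addresses is IPv6,
// and "0.0.0.0" otherwise.
func listenHost(addrs []string) string {
	for _, a := range addrs {
		ip := net.ParseIP(a)
		if ip != nil && ip.To4() == nil {
			return "[::]"
		}
	}

	return "0.0.0.0"
}

func main() {
	fmt.Println(listenHost([]string{"10.0.0.5"}))
	fmt.Println(listenHost([]string{"10.0.0.5", "2001:db8::1"}))
}
```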

Additionally, this fixes the SAN list for the etcd certificate
generation to include the FQDN of the host.

Fixes #1842
Fixes #1843

Signed-off-by: Seán C McCord <ulexus@gmail.com>
2020-02-03 11:01:50 -08:00
Brad Beam
e9113537f9 feat(networkd): Make healthcheck perform a check
This implements an actual health check for networkd. We use the ARP table (`ip neighbors`)
to determine if the machine is actively sending traffic. We should see at least one entry
with a REACHABLE/STALE/DELAY state during normal operating conditions.
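
The decision itself can be sketched as a pure function over neighbor states.
How the states are read from the kernel is out of scope here, and
`hasActiveNeighbor` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// hasActiveNeighbor reports whether at least one neighbor entry is in a state
// that indicates recent traffic (REACHABLE, STALE or DELAY).
func hasActiveNeighbor(states []string) bool {
	for _, s := range states {
		switch strings.ToUpper(s) {
		case "REACHABLE", "STALE", "DELAY":
			return true
		}
	}

	return false
}

func main() {
	fmt.Println(hasActiveNeighbor([]string{"FAILED", "STALE"}))
	fmt.Println(hasActiveNeighbor([]string{"FAILED", "INCOMPLETE"}))
}
```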

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2020-02-03 11:01:00 -08:00
Spencer Smith
effd0ee614 chore: enable slash commands in github PRs
This PR will allow us to start building out checks for slash commands,
with /test and /e2e both supported initially. I'll eventually want some
dashes in those commands, but they're not supported in the upstream
regex yet. I'll PR that later.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-02-03 13:55:36 -05:00
Spencer Smith
e27b0cbfdb chore: update bootkube
This PR updates the talos branch of bootkube to add extraArgs to
bootstrap controlplane components as well.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-01-31 11:38:34 -08:00
Spencer Smith
05aad743df chore: update capi-upstream
This PR will bring in the latest v1alpha2-supporting release of the upstream capi provider.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-01-31 11:30:16 -05:00
Andrey Smirnov
0afd0f651b chore: provide provisioned cluster info to integration test
Integration test can optionally consume cluster state as generated by
the call to `osctl cluster create` and use it to discover nodes in
integration tests.

This means that now CLI tests can use that as discovery source, and
API/K8s tests by default as well.

Flat list of nodes is to be replaced by something more complex in the
next iteration, but it's good for this PR.

As a demo, add CLI test with multiple nodes (dmesg).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-01-31 18:21:30 +03:00
Spencer Smith
ff393f8ae3 chore: update bootkube fork
This PR will pull in the latest of our bootkube fork and fix a bug with
extraArgs.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-01-31 09:39:43 -05:00
Tim Gerla
d662956449 docs: add a link to the Talos Systems company site to the OSS site's header
- add a separate link to get to the corporate site
- unify some styles between corp and OSS sites
- minor responsiveness fixes

Signed-off-by: Tim Gerla <tim@gerla.net>
2020-01-30 11:54:27 -08:00
Brad Beam
4593c4f727 fix(networkd): fix ticker leak
Call ticker.Stop() to prevent leak.
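
The general pattern, shown as a standalone sketch (`runPeriodic` is a
hypothetical function, not the networkd code): stop the ticker with `defer`
as soon as it is created, so every return path releases it.

```go
package main

import (
	"fmt"
	"time"
)

// runPeriodic calls fn once per tick, a fixed number of times, and always
// stops the ticker before returning.
func runPeriodic(interval time.Duration, times int, fn func()) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop() // release the ticker's resources when the loop ends

	for i := 0; i < times; i++ {
		<-ticker.C
		fn()
	}
}

func main() {
	count := 0
	runPeriodic(10*time.Millisecond, 3, func() { count++ })
	fmt.Println(count)
}
```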

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2020-01-29 10:05:47 -08:00
Brad Beam
88df1b50b8 feat(networkd): Add health api
This introduces a health/ready api for networkd. This
will allow us to better determine the state of networkd
and allow for some level of monitoring.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2020-01-29 09:09:27 -06:00
Andrey Smirnov
fae5e6915d chore: rework firecracker code around upstream Go SDK + PRs
This removes use of private fork with custom `ip=` kernel argument
handling and switches fully to upstream version of it.

Firecracker Go SDK version is `master` + following PRs:

* https://github.com/firecracker-microvm/firecracker-go-sdk/pull/167
* https://github.com/firecracker-microvm/firecracker-go-sdk/pull/177
* https://github.com/firecracker-microvm/firecracker-go-sdk/pull/178

MTU handling support was implemented as well.

Changes:

* hostname to each node is passed via `talos.hostname=` kernel arg
* IP configuration is generated by SDK from CNI result
* fixed bugs with wrong netmask
* nameservers & MTU are passed via Talos config

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-01-29 02:35:15 +03:00
Brad Beam
defbcf3856 docs(apid): Add apid docs
Describes apid and introduces some workflows to illustrate what apid does.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2020-01-28 11:36:13 -08:00
Andrew Rynhard
f567f8c84d fix: follow symlinks
This fixes the list API to check if the requested path is a symlink, and
to follow the symlink if so.
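
A sketch of the check described above, using the standard library only;
`resolvePath` is a hypothetical helper and not the list API implementation.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolvePath returns the symlink target when path is a symlink, and the
// path itself otherwise.
func resolvePath(path string) (string, error) {
	info, err := os.Lstat(path) // Lstat does not follow the link itself
	if err != nil {
		return "", err
	}

	if info.Mode()&os.ModeSymlink != 0 {
		return filepath.EvalSymlinks(path)
	}

	return path, nil
}

func main() {
	fmt.Println(resolvePath("."))
}
```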

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-01-28 11:16:41 -08:00
Andrew Rynhard
d36b3a50d6 docs: remove invalid field from docs
This removes `extraDiskArgs` from the kubelet configuration field. This
never really was a thing.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-01-28 07:35:27 -08:00