2759 Commits

Author SHA1 Message Date
Andrew Rynhard
2ba0e0ac4a
docs: add KubeSpan documentation
This adds a guide on how to use KubeSpan and a deep dive in the "Learn
More" section.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2021-10-11 19:49:19 -07:00
Andrey Smirnov
997873b6d3
fix: use ECDSA-SHA512 when generating certs for Talos < 0.13
Due to the way our crypto library is implemented, it can't generate a
key from CA with ECDSA-SHA256 on older versions of Talos.

Talos >= 0.13: ECDSA-SHA256 with P-256
Talos < 0.13: ECDSA-SHA512 with P-256
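
A minimal sketch of the version-based selection above, assuming the Talos
version has already been parsed into major and minor numbers.
`signatureAlgorithmFor` is a hypothetical helper, not the function used in
Talos; only the `crypto/x509` constants are real.

```go
package main

import (
	"crypto/x509"
	"fmt"
)

// signatureAlgorithmFor is a hypothetical helper: it returns ECDSA-SHA512
// for Talos versions below 0.13 and ECDSA-SHA256 for everything newer.
func signatureAlgorithmFor(major, minor int) x509.SignatureAlgorithm {
	if major == 0 && minor < 13 {
		return x509.ECDSAWithSHA512
	}

	return x509.ECDSAWithSHA256
}

func main() {
	fmt.Println(signatureAlgorithmFor(0, 12))
	fmt.Println(signatureAlgorithmFor(0, 13))
}
```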

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-10-11 15:19:19 +03:00
Artem Chernyshev
7137166d1d
fix: allow overriding audit-policy-file in kube-apiserver static pod
Otherwise we lock it with our default config.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2021-10-11 11:27:36 +03:00
Andrey Smirnov
8fcd421967
chore: fix integration-qemu-race
We don't need to build for arm64, as the test runs on amd64.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-10-08 21:56:28 +03:00
Andrey Smirnov
91a858b537
fix: sort output of the argument builder
This fixes the instability of some of the internal resources, as they
get regenerated as a result of machine config changes. As map iteration
order is not stable, this might cause unexpected static pod definition
regeneration where the only difference is the flag order.
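
The idea can be sketched as follows: collect the map keys, sort them, and
only then render the flags. `buildArgs` and the `--key=value` format are
assumptions for illustration, not the actual argument builder code.

```go
package main

import (
	"fmt"
	"sort"
)

// buildArgs renders a flag map as a list of "--key=value" strings.
// Sorting the keys first makes the result independent of Go's randomized
// map iteration order. The name and flag format are illustrative.
func buildArgs(flags map[string]string) []string {
	keys := make([]string, 0, len(flags))
	for k := range flags {
		keys = append(keys, k)
	}

	sort.Strings(keys)

	args := make([]string, 0, len(keys))
	for _, k := range keys {
		args = append(args, fmt.Sprintf("--%s=%s", k, flags[k]))
	}

	return args
}

func main() {
	fmt.Println(buildArgs(map[string]string{
		"secure-port":  "6443",
		"bind-address": "0.0.0.0",
	}))
}
```

With sorted keys, two runs over the same map produce identical output, so
a comparison of the old and new definitions no longer reports a change.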

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-10-08 19:31:21 +03:00
Andrey Smirnov
657f7a56b1
fix: use ECDSA-SHA256 signature algorithm for Kubernetes certs
Previously Talos used ECDSA-SHA512 with a P-256 EC key, which is not a
widely supported combination. Use ECDSA-SHA256 instead.

There's no security benefit to using ECDSA-SHA512 with a P-256 key, and
ECDSA-SHA256 with P-256 is officially supported by the TLS 1.3 standard.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-10-08 16:59:43 +03:00
Andrey Smirnov
983d2459e2
feat: suppress logging NTP sync to the console
Only `jump` syncs and sync errors are logged to the console.
Regular `slew` syncs are suppressed (only visible in
`talosctl logs controller-runtime`).

The very first sync is always reported to console.
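
The console logging rule described above can be sketched as a single
predicate. The type and function names below are illustrative assumptions,
not the identifiers used in Talos.

```go
package main

import (
	"errors"
	"fmt"
)

// syncKind distinguishes the two kinds of clock adjustment (illustrative).
type syncKind int

const (
	slew syncKind = iota // gradual adjustment
	jump                 // clock is stepped at once
)

// shouldLogToConsole reports whether a sync result goes to the console:
// the very first sync, any error, and every jump are reported, while
// regular slew syncs are suppressed.
func shouldLogToConsole(kind syncKind, firstSync bool, err error) bool {
	return firstSync || err != nil || kind == jump
}

func main() {
	fmt.Println(shouldLogToConsole(slew, true, nil))
	fmt.Println(shouldLogToConsole(slew, false, nil))
	fmt.Println(shouldLogToConsole(jump, false, nil))
	fmt.Println(shouldLogToConsole(slew, false, errors.New("timeout")))
}
```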

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-10-08 15:15:36 +03:00
Andrey Smirnov
022c7335f3
fix: add interface route if DHCP4 router is not directly routeable
Fixes #4320

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-10-07 22:33:05 +03:00
Andrey Smirnov
66a1579ea7
fix: don't enable 'no new privs' on the system level
This breaks some pods which specifically drop everything but gain
capabilities back via file capabilities (e.g. `nginx-ingress`).

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-10-06 21:43:14 +03:00
Alexey Palazhchenko
423861cf9f
feat: don't drop capabilities if kexec is disabled
It is needed for advanced use cases like Docker-in-Docker, our CI, etc.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@talos-systems.com>
2021-10-06 08:37:25 +00:00
Alexey Palazhchenko
facc8c38a0
docs: fix documentation for cluster discovery
Use the real value in an example.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@talos-systems.com>
2021-10-06 08:08:19 +00:00
Andrey Smirnov
ce65ca4e4a
chore: build using only amd64 builders
Our amd64 CI builders build Talos 3-4 times faster than our arm64
builders.

Our Dockerfile was restructured a while ago to support cross-compilation
on all platforms but CI was still using amd64/arm64 workers, so arm64
part was done on arm64 builders.

As our CI runs on Talos, `binfmt_misc` is not enabled in the kernel, but
buildkit has built-in QEMU emulation layer which works just fine for
those small pieces which actually need to run arm64 binaries on amd64
(mostly `apk add` in the installer container). Interestingly enough,
buildkit QEMU support fails for `ca-certificates` script which runs
after install. At the same time I believe we don't need
`ca-certificates` in the installer, as installer doesn't download
anything from the network, and `ca-certificates` were added a while ago
when installer was actually downloading configuration on its own.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-10-05 23:02:37 +03:00
Andrey Smirnov
e9b0f010d2
chore: update docker image in the pipeline
We use hacked version with a workaround for capability issues with
`--privileged` in Docker.

See moby/moby#42906

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-10-05 21:29:55 +03:00
Andrey Smirnov
5f277713f0
chore: prepare for 0.13-beta release
Update component versions, Go module versions.

Add platform tiers to the support matrix.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-10-01 17:15:31 +03:00
Andrey Smirnov
5e41dd4a65
feat: add an option to configure kubelet node IP based on subnets
Fixes #4243

The idea is to make sure kubelet picks node IP based on filtering by
CIDRs of the node's addresses. The flow is simple - every address is
filtered by subnet and picked if it matches the subnet.
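
A minimal sketch of that flow, using only the standard `net` package.
`filterBySubnets` is a hypothetical name, and the addresses and subnet in
`main` are made-up sample values; the actual Talos implementation may
differ.

```go
package main

import (
	"fmt"
	"net"
)

// filterBySubnets returns the addresses that fall into at least one of
// the given CIDRs, preserving the input order. Hypothetical helper.
func filterBySubnets(addrs, cidrs []string) ([]string, error) {
	subnets := make([]*net.IPNet, 0, len(cidrs))

	for _, c := range cidrs {
		_, subnet, err := net.ParseCIDR(c)
		if err != nil {
			return nil, fmt.Errorf("invalid subnet %q: %w", c, err)
		}

		subnets = append(subnets, subnet)
	}

	var picked []string

	for _, a := range addrs {
		ip := net.ParseIP(a)
		if ip == nil {
			return nil, fmt.Errorf("invalid address %q", a)
		}

		for _, subnet := range subnets {
			if subnet.Contains(ip) {
				picked = append(picked, a)

				break
			}
		}
	}

	return picked, nil
}

func main() {
	ips, err := filterBySubnets(
		[]string{"172.20.0.2", "10.5.0.7", "fdd1:f54:2697:3902::2"},
		[]string{"10.0.0.0/8"},
	)
	fmt.Println(ips, err)
}
```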

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-10-01 15:28:09 +03:00
Alexey Palazhchenko
72e49029e7
chore: allow insecure discovery in debug builds
If Talos is built with `sidero.debug` build tag (`make WITH_DEBUG=1`),
the machine configuration is allowed to use insecure HTTP for the discovery service.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@talos-systems.com>
2021-09-30 17:34:25 +00:00
Andrey Smirnov
d52befd1ac
fix: ignore 404 for AWS external IPs
Also ignore expected errors for other platforms to keep controller from
failing over and over again.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-30 16:50:14 +03:00
Andrey Smirnov
44a63e9a4d
feat: update containerd to 1.5.6
This also updates runc to 1.0.2, libseccomp to 2.5.2.

See also https://github.com/containerd/containerd/releases/tag/v1.5.6

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-30 16:17:42 +03:00
Spencer Smith
0e0fb68478
release(v0.13.0-alpha.3): prepare release
This is the official v0.13.0-alpha.3 release.

Signed-off-by: Spencer Smith <spencer.smith@talos-systems.com>
2021-09-29 18:24:30 -04:00
Andrey Smirnov
4044372e12
feat: harvest discovered endpoints and push them via discovery svc
Fixes #4250

Each KubeSpan peer sees the endpoint of every other KubeSpan peer as that
peer connects. If the peer is behind NAT, the discovered endpoint differs
from the endpoints the node knows about itself (as it punched a hole in
NAT). This discovered endpoint is pushed to the discovery service so that
every other peer can now use that punched hole to talk to the peer.

If the endpoint observed is actually in the list of the endpoints
reported by the peer itself, discovery service will take care of
deduplicating them and suppressing updates.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-29 23:39:35 +03:00
Andrey Smirnov
9a51aa8358
feat: add an option to skip downed peers in KubeSpan
Fixes #4248

This resolves the balance between security and connectivity.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-29 23:06:14 +03:00
Andrey Smirnov
cbbd7c6821
feat: publish node's ExternalIPs as node addresses
This means that ExternalIPs (as presented by the platform) will be
published as `AddressStatus` resource, and transitively as
`NodeAddresses` (which includes cert generation) and as KubeSpan
endpoints (for KubeSpan connectivity in the cloud).

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-29 21:51:36 +03:00
Andrey Smirnov
0f60ef6d38
fix: reset inputs back to initial state in secrets.APIController
This fixes a bug where, after an error generating certificates, the
controller gets into a state in which it cannot read its expected inputs
(NetworkStatus specifically).

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-29 19:57:51 +03:00
Artem Chernyshev
64cb873ec4
feat: override static pods default args by extra Args
Use `argsbuilder` the same way as it's used in services.
Rewrite `kubeProxy` generation code to override default args.

As a consequence of this change, flags no longer have a deterministic
order, as they all come from a single merged map.

Introduced merge policy in the `ArgsBuilder` to deny overrides for some
arguments and do additive merge of others.
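
A sketch of what such a merge policy could look like. The policy names,
the `mergeArgs` function, and the comma-joined additive format are all
assumptions for illustration; they are not the `ArgsBuilder` API.

```go
package main

import "fmt"

// mergePolicy controls how an extra argument interacts with a default.
type mergePolicy int

const (
	overwrite mergePolicy = iota // extra value replaces the default
	denied                       // overriding is rejected with an error
	additive                     // extra value is appended to the default
)

// mergeArgs merges extra args into a copy of defaults. Keys without an
// explicit policy are overwritten. Illustrative sketch only.
func mergeArgs(defaults, extra map[string]string, policies map[string]mergePolicy) (map[string]string, error) {
	merged := make(map[string]string, len(defaults)+len(extra))
	for k, v := range defaults {
		merged[k] = v
	}

	for k, v := range extra {
		switch policies[k] {
		case denied:
			return nil, fmt.Errorf("argument %q cannot be overridden", k)
		case additive:
			if current := merged[k]; current != "" {
				merged[k] = current + "," + v
			} else {
				merged[k] = v
			}
		default:
			merged[k] = v
		}
	}

	return merged, nil
}

func main() {
	merged, err := mergeArgs(
		map[string]string{"feature-gates": "A=true", "bind-address": "0.0.0.0"},
		map[string]string{"feature-gates": "B=true", "bind-address": "127.0.0.1"},
		map[string]mergePolicy{"feature-gates": additive},
	)
	fmt.Println(merged, err)
}
```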

Fixes: https://github.com/talos-systems/talos/issues/4238
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2021-09-29 11:50:40 +03:00
Andrey Smirnov
ecdd7757fb
test: workaround race in the tests with zaptest package
Looks like `zaptest` package when used from the goroutine (like in gRPC
server) results in a potential data race on test tear down.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-28 23:59:00 +03:00
Andrey Smirnov
9c67fde759
release(v0.13.0-alpha.2): prepare release
This is the official v0.13.0-alpha.2 release.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-28 22:36:17 +03:00
Andrey Smirnov
30ae714243
feat: implement integration with Discovery Service
This provides integration layer with discovery service to provide
cluster discovery (and transitively KubeSpan peer discovery).

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-28 20:24:08 +03:00
Serge Logvinov
353d632ae5
feat: add nocloud platform support
* fetch cdrom/net nocloud config
* apply simple network configuration

Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-28 16:32:12 +03:00
Andrey Smirnov
628fbf9b48
chore: update Linux to 5.10.69
See https://github.com/talos-systems/pkgs/pull/336

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-28 15:44:08 +03:00
Andrey Smirnov
62acd62516
fix: check trustd API CA on worker nodes
This distributes API CA (just the certificate, not the key) to the
worker nodes on config generation, and if the CA cert is present on the
worker node, it verifies TLS connection to the trustd with the CA
certificate.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-28 15:14:23 +03:00
Serge Logvinov
ba27bc366f
feat: implement Hetzner Cloud support for virtual (shared) IP
Talos supports automatic virtual IP for the control plane with pure
layer 2 connectivity. Hetzner Cloud API supports assigning Floating IPs
to the nodes. This PR combines existing virtual IP functionality with calls
to HCloud API to move the IP address on HCloud side to the leader node.

The only thing which should be supplied in the machine configuration is
the Hetzner Cloud API token; every other setting is automatically
discovered by Talos.

Talos supports two types of floating IPs:
* external Floating IP for external network
* server alias IP for local networks

The controlplane can have only one alias on the local network interface.

Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-27 23:45:46 +03:00
Alexey Palazhchenko
95f440eaa0
test: add fuzz test for configloader
This PR contains an example of how fuzz tests can be written with Go 1.18.

It also fixes a few panics with invalid configs.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@talos-systems.com>
2021-09-27 18:44:46 +00:00
Alexey Palazhchenko
d2cf021d8f
chore: remove deprecated "join" term
Closes #3910.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@talos-systems.com>
2021-09-27 17:18:22 +00:00
Andrey Smirnov
0e18e2800f
chore: bump dependencies
Some via dependabot, some via go-mod-outdated.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-27 16:35:50 +03:00
Andrey Smirnov
b450b7cef0
chore: deprecate Interfaces and Routes APIs
Fixes #4094

Deprecate old networkd APIs. `talosctl interfaces` and `talosctl routes`
now suggest the commands to use instead to achieve the same task.

TUI installer was updated to stop using Interfaces API.

Those APIs will be completely removed in 0.14.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-27 15:21:02 +03:00
Artem Chernyshev
cddcb9622b
fix: find devices without partition table
This should fix lookup of CD-ROM devices.
Update `go-blockdevice` library to the version with the fix.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2021-09-27 14:49:41 +03:00
Seán C McCord
b1b6d61365
fix: check for existence of dhcp6 FQDN first
Check that dhcpv6.Options.FQDN() is not nil before trying to use it.

This fixes DHCPv6 on GCP.

Signed-off-by: Seán C McCord <ulexus@gmail.com>
2021-09-24 12:58:04 -07:00
Artem Chernyshev
519999b846
fix: use readonly mode when probing devices with All lookup
Update `go-blockdevice` library.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2021-09-23 14:47:52 +03:00
Andrey Smirnov
2b5204200a
feat: enable resource API in the maintenance mode
This basically provides `talosctl get --insecure` in maintenance mode.
Only non-sensitive resources are available (equivalent to having
`os:reader` role in the Talos client certificate).

Changes:

* refactored insecure/maintenance client setup in talosctl
* `LinkStatus` is no longer sensitive as it shows only Wireguard public
key, `LinkSpec` still contains private key for obvious reasons
* maintenance mode injects `os:reader` role implicitly

The motivation behind this PR is to deprecate networkd-era interfaces &
routes APIs which are being used in TUI installer, and we need a
replacement.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-22 21:36:34 +03:00
Artem Chernyshev
452893c260
fix: make probe open blockdevice in readonly mode
Update `go-blockdevice` library.

Readwrite mode doesn't work when there are readonly devices like `iso`.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2021-09-22 18:48:03 +03:00
Andrey Smirnov
96bccdd3b6
test: update CABPT provider to 0.3 release
Testing with new CABPT release.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-22 18:21:56 +03:00
Serge Logvinov
d9eb18bfdd
fix: containerd log symlink
Kubelet creates symlinks from /var/log/containers/<pod>.log to the log file /var/log/pod/<pod-folder>/0.log.
Log senders (like fluentd) usually watch the files matching /var/log/containers/*.log,
so kubelet needs to share the containers folder.

Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-21 15:09:02 +03:00
Seán C McCord
efa7f48e08
docs: quicklinks on landing page
Add quick links for the most important first-time-user docs to the
docs landing page.

Signed-off-by: Seán C McCord <ulexus@gmail.com>
2021-09-20 14:29:21 -07:00
Andrey Smirnov
1cb9f282b5
fix: don't marshal clock with SecretsBundle
This field is not marshalable, as it's technically an interface.

This will be used to save/load SecretsBundle as a whole in the CABPT.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-20 21:48:44 +03:00
Andrey Smirnov
b27c75b30f
release(v0.13.0-alpha.1): prepare release
This is the official v0.13.0-alpha.1 release.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-20 19:51:07 +03:00
Andrey Smirnov
9d803d75bf
chore: bump dependencies and drop firecracker support
Note: Talos can still be run under `Firecracker`; support for
Firecracker was only removed for `talosctl cluster create`.

Reason:

* code is untested/unmaintained, and probably doesn't work correctly
* firecracker Go SDK pulls lots of dependencies and it blocks CNI Go
module update

Bonus: `talosctl-linux-amd64` shrinks by 2 MiB.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-20 17:13:34 +03:00
Andrey Smirnov
50a2410482
feat: add operating system version field to discovery
Fixes #4232

The result:

```
talosctl -n 172.20.0.2 get members
NODE         NAMESPACE   TYPE     ID                       VERSION   HOSTNAME                 MACHINE TYPE   OS                                           ADDRESSES
172.20.0.2   cluster     Member   talos-default-master-1   2         talos-default-master-1   controlplane   Talos (v0.13.0-alpha.0-13-gfdd80a12-dirty)   ["172.20.0.2","fdd1:f54:2697:3902:44f8:92ff:fe2e:1aea"]
172.20.0.2   cluster     Member   talos-default-worker-1   1         talos-default-worker-1   worker         Talos (v0.13.0-alpha.0-13-gfdd80a12-dirty)   ["172.20.0.3","fdd1:f54:2697:3902:d4ba:55ff:fe8a:f551"]
172.20.0.2   cluster     Member   talos-default-worker-2   1         talos-default-worker-2   worker         Talos (v0.13.0-alpha.0-13-gfdd80a12-dirty)   ["172.20.0.4","fdd1:f54:2697:3902:e00d:f4ff:fecf:51c8"]
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-17 15:44:33 +03:00
Andrey Smirnov
085c61b2ec
chore: add a special condition to check for kubeconfig readiness
The problem is that the kubelet kubeconfig gets created early, but the
actual client key and cert files are not written, so controllers spam
with scary errors that the config is not valid. This PR removes those
scary messages as we wait for the kubeconfig to be usable.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-17 00:07:38 +03:00
Andrey Smirnov
21cdd85403
fix: add node address to the list of allowed IPs (kubespan)
This fixes the bug with host networking pods not being able to reach out
to the Kubernetes services.

This also moves any node-to-node networking over to KubeSpan link as
well.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-16 23:29:42 +03:00
Andrey Smirnov
fdd80a1234
feat: add an option to continue booting on NTP timeout
Fixes #4224

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-16 21:34:17 +03:00