140 Commits

Author SHA1 Message Date
Andrey Smirnov
3c9f7a7de6
chore: re-enable nolintlint and typecheck linters
Drop startup/rand.go, as since Go 1.20 `rand.Seed` is done
automatically.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-08-25 01:05:41 +04:00
Dmitriy Matrenichev
c4a1ca8d61
chore: remove <-errCh where possible in grpc methods
Simplify code by passing error directly into the pipe closer.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-08-07 22:28:58 +03:00
Dmitriy Matrenichev
80238a05a6
chore: unify semver under github.com/blang/semver/v4
Currently, we use `github.com/coreos/go-semver/semver` and `github.com/hashicorp/go-version`
for version parsing. As we use `github.com/blang/semver/v4` in our other projects, and it
has more features, it makes sense to use it across the projects. It also doesn't allocate
like crazy in `KubernetesVersion.SupportedWith`.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-08-04 00:29:52 +03:00
Andrey Smirnov
8017afb107
feat: implement CRI image management and pre-pull on K8s upgrade
Fixes #6391

Implement a set of APIs and commands to manage images in the CRI, and
pre-pull images on Kubernetes upgrades.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-11 19:25:10 +04:00
Christian Rolland
e6dde8ffc5
feat: add network chaos to qemu development environment
Add flags for configuring the qemu bridge interface with chaos options:
- network-chaos-enabled
- network-jitter
- network-latency
- network-packet-loss
- network-packet-reorder
- network-packet-corrupt
- network-bandwidth

These flags are used in /pkg/provision/providers/vm/network.go at the end of the CreateNetwork function to first see if the network-chaos-enabled flag is set, and then check if bandwidth is set.  This will allow developers to simulate clusters having a degraded WAN connection in the development environment and testing pipelines.

If bandwidth is not set, it will then enable the other options.
- Note that if bandwidth is set, the other options such as jitter, latency, packet loss, reordering and corruption will not be used.  This is for two reasons:
	- Restriction the bandwidth can often intoduce many of the other issues being set by the other options.
	- Setting the bandwidth uses a separate queuing discipline (Token Bucket Filter) from the other options (Network Emulator) and requires a much more complex configuration using a Heirarchial Token Bucket Filter which cannot be configured at a granular enough level using the vishvananda/netlink library.

Adding both queuing disciplines to the same interface may be an option to look into in the future, but would take more extensive testing and control over many more variables which I believe is out of the scope of this PR.  It is also possible to add custom profiles, but will also take more research to develop common scenarios which combine different options in a realistic manner.

Signed-off-by: Christian Rolland <christian.rolland@siderolabs.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-06-06 20:15:26 +04:00
Dmitriy Matrenichev
8a02ecd4cb
chore: add endpoints balancer controller
This PR adds support for creating a list of API endpoints (each is pair of host and port).

It gets them from
- Machine config cluster endpoint.
- Localhost with LocalAPIServerPort if machine is control panel.
- netip.Addr[0] and port from affiliates if they are control panels.

For #7191

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-06-05 20:47:52 -04:00
Andrey Smirnov
bab484a405
feat: use stable network interface names
Use `udevd` rules to create stable interface names.

Link controllers should wait for `udevd` to settle down, otherwise link
rename will fail (interface should not be UP).

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-06-01 21:29:12 +04:00
Andrey Smirnov
badbc51e63
refactor: rewrite code to include preliminary support for multi-doc
`config.Container` implements a multi-doc container which implements
both `Container` interface (encoding, validation, etc.), and `Conifg`
interface (accessing parts of the config).

Refactor `generate` and `bundle` packages to support multi-doc, and
provide backwards compatibility.

Implement a first (mostly example) machine config document for
SideroLink API URL.

Many places don't properly support multi-doc yet (e.g. config patches).

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-05-31 18:38:05 +04:00
Andrey Smirnov
dc6764871c
refactor: move around config interfaces, make RawV1Alpha1 typed
See #7230

Refactor more config interfaces, move config accessor interfaces
to different package to break the dependency loop.

Make `.RawV1Alpha1()` method typed to avoid type assertions everywhere.

No functional changes.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-05-23 22:08:58 +04:00
Andrey Smirnov
ea9a97dba3
fix: fall back to external IP when discovering nodes in upgrade-k8s
Fixes #7253

Also fix the case that `kube-proxy` version was updated in the machine
config in `--dry-run` mode.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-05-23 17:54:35 +04:00
Andrey Smirnov
0bb7e8a5cf
refactor: split config.Provider into Config & Container
See #7230

This is a step towards preparing for multi-doc config.

Split the `config.Provider` interface into parts which have different
implementation:

* `config.Config` accesses the config itself, it might be implemented by
  `v1alpha1.Config` for example
* `config.Container` will be a set of config documents, which implement
  validation, encoding, etc.

`Version()` method dropped, as it makes little sense and it was almost
not used.

`Raw()` method renamed to `RawV1Alpha1()` to support legacy direct
access to `v1alpha1.Config`, next PR will refactor more to make it
return proper type.

There will be many more changes coming up.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-05-23 16:05:16 +04:00
Dmitriy Matrenichev
45e6e27af7
chore: bump runtime
Use new functions and methods from runtime module.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-05-11 17:18:08 -04:00
Noel Georgi
cad43f0ad3
chore: remove k8s master label
Since talos now defaults to k8s 1.27, remove the handling
of `master` label for controlplane nodes.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-04-25 20:48:05 +05:30
Andrey Smirnov
374ef53853
test: submit verbose flag to e2e tests
Without it the output doesn't match the criteria for the Kubernetes
conformance.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-04-19 14:35:53 +04:00
Noel Georgi
c63cf90e32
feat: update k8s to v1.27.0-beta.0
Update k8s to v1.27.0-beta.0

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-03-21 23:59:17 +05:30
Andrey Smirnov
8ea4bfad8f
refactor: improve the kubernetes upgrade flow
Use new version of go-kubernetes, and move the `kube-proxy` DaemonSet
update to follow common logic of bootstrap manifests update.

This fixes a confusing behavior when after `k8s-upgrade` the version of
`kube-proxy` is not updated in the machine config.

See https://github.com/siderolabs/go-kubernetes/pull/3

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-03-06 15:01:29 +04:00
Noel Georgi
a78281214d
feat: add cilium e2e tests
Add cilium e2e tests. The existing cilium check was very old, update to
latest cilium version and also add a test for KPR strict mode.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-03-03 20:03:25 +05:30
Andrey Smirnov
230e46e567
refactor: extract parts of kubernetes libraries
The shared code is going out to the
github.com/siderolabs/go-kubernetes library.

The code will be used in Talos and other projects using same features.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-02-22 14:56:49 +04:00
Andrey Smirnov
1e1aa84f6c
fix: kubernetes removed resource version check
Not all Kubernetes deprecated resources are same - if the old API
version is deprecated, but new one is available, API server handles
trnasition for us. If some resource is removed completely, we need to
check for it. This reduces number of items to check, and simplifies the
check.

Move the check under the umbrella of the 'upgrade pre-checks', and make
it actually fatal.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-02-10 14:45:09 +04:00
Andrey Smirnov
89dbb0ecf0
release(v1.4.0-alpha.0): prepare release
This is the official v1.4.0-alpha.0 release.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-12-23 22:32:09 +04:00
Andrey Smirnov
2e84d2ab34
chore: update conformance product.yaml
New requirements from CNCF.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-12-16 14:44:34 +04:00
Noel Georgi
faf49218ce
feat: add more checks for K8s upgrade
Add more checks for the Talos Kubernetes upgrade.

The removed api-server resources checks are kept as is, needs to be
moved to the new checks as part of #6599.

Fixes: #6444

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-12-14 19:29:19 +05:30
Dmitriy Matrenichev
eb332cfcb7
feat: add health check for a minimal memory / disk size
This PR adds two additional checks which are performed during boot sequence and in `talosctl health`. They ensure that nodes have enough memory and disk.

- Boot check will print a warning if memory / disk size is not sufficient.
- Health check will fail if memory / disk size is not sufficient.

Closes #6467

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-12-10 07:05:08 +03:00
Andrey Smirnov
54d9032ce2
test: fix log streaming for conformance tests
Global timeout in kubeconfig was cauing log streaming to abort.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-11-18 22:47:36 +04:00
Andrey Smirnov
96aa9638f7
chore: rename talos-systems/talos to siderolabs/talos
There's a cyclic dependency on siderolink library which imports talos
machinery back. We will fix that after we get talos pushed under a new
name.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-11-03 16:50:32 +04:00
Andrey Smirnov
343c55762e
chore: replace talos-systems Go modules with siderolabs
This the first step towards replacing all import paths to be based on
`siderolabs/` instead of `talos-systems/`.

All updates contain no functional changes, just refactorings to adapt to
the new path structure.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-11-01 12:55:40 +04:00
Andrey Smirnov
08e7e49a29
test: update versions for upgrade tests
Use the latest releases in each branch.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-11-01 10:40:19 +04:00
Andrey Smirnov
0b41923c36
fix: restore the StaticPodStatus resource
It got broken with the changes to the kubelet now sourcing static pods
from a HTTP internal server.

As we don't want it to be broken, and to make health checks better, add
a new check to make sure kubelet reports control plane static pods as
running. This coupled with API server check should make it more
thorough.

Also add logging when static pod definitions are updated (they were
previously there for file-based implementation). These logs are very
helpful for troubleshooting.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-10-31 18:48:03 +04:00
Dmitriy Matrenichev
93e55b85f2
chore: bump golangci-lint to v1.50.0
I had to do several things:
- contextcheck now supports Go 1.18 generics, but I had to disable it because of this https://github.com/kkHAIKE/contextcheck/issues/9
- dupword produces to many false positives, so it's also disabled
- revive found all packages which didn't have a documentation comment before. And tehre is A LOT of them. I updated some of them, but gave up at some point and just added them to exclude rules for now.
- change lint-vulncheck to use `base` stage as base

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-10-20 18:33:19 +03:00
Dmitriy Matrenichev
fc48849d00
chore: move maps/slices/ordered to gen module
Use github.com/siderolabs/gen

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-09-21 20:22:43 +03:00
Andrey Smirnov
8b09bd4b04
feat: update Kubernetes to v1.26.0-alpha.1
Talos 1.3.0 will ship with Kubernetes 1.26.0.

See https://github.com/kubernetes/kubernetes/releases/tag/v1.26.0-alpha.1

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-09-21 18:42:31 +04:00
Utku Ozdemir
0847400f72
fix: prevent panic on health check if a member has no IPs
If a member has no IP addresses, prevent cluster health checks from failing with a panic by checking for the length of member IPs and not assuming there's always at least 1 IP.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2022-09-02 15:16:59 +02:00
Utku Ozdemir
0b339a9dc5
feat: track progress of action API calls
Track the progress of the long-running actions `reboot`, `reset`, `upgrade` and `shutdown` on the client side by default, unless `--no-wait=true` is specified.

Use the events API to follow the events using the actor ID of the action and display it using an stderr reporter with a spinner.

Closes siderolabs/talos#5499.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2022-08-29 22:54:40 +02:00
Andrey Smirnov
cdd0f08bc5
feat: check client <> server version in some Talos commands
Talos commands which are sensitive to resource API changes:

* `get`
* `edit`, `patch`
* `upgrade-k8s`

Commands with upcoming changes for actorID:

* `reboot`
* `reset`
* `shutdown`
* `upgrade`

Fixes #6101

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-26 18:37:51 +04:00
Dmitriy Matrenichev
b59ca5810e
chore: move from inet.af/netaddr to net/netip and go4.org/netipx
Closes #6007

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-08-25 17:51:32 +03:00
Andrey Smirnov
9baca49662
refactor: implement COSI resource API for Talos
Overview: deprecate existing Talos resource API, and introduce new COSI
API.

Consequences:

* COSI API can only go via one-2-one proxy (`client.WithNode`)
* client-side API access is way easier with `state.State` wrappers
* lots of small changes on the client side to use new APIs

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-12 22:31:54 +04:00
Noel Georgi
b62b18a972
feat: bump k8s to v1.25.0-beta.0
Bump k8s to v1.25.0-beta.0

Update most kubernetes `master` references to `controlplane`

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-08-10 22:17:53 +05:30
Andrey Smirnov
a6b010a8b4
chore: update Go to 1.19, Linux to 5.15.58
See https://go.dev/doc/go1.19

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-03 17:03:58 +04:00
Andrey Smirnov
0cdf222431
fix: retry Conflict errors when upgrading k8s manifests
Fixes #5985

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-07-29 13:20:04 +04:00
Andrey Smirnov
2b23fabcc1
docs: use SVG image for K8s conformance
It doesn't accept PNG images.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-06-22 22:15:46 +04:00
Andrey Smirnov
b816d0b600
docs: fix the vendor information for Kubernetes conformance tests
As we submit results to Certified Kubernetes, we provide metadata which
should be updated now, and also we lost the logo in our assets.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-06-21 22:25:10 +04:00
Andrey Smirnov
a167a54021
test: fix CLI nodes discovery without provisioner data
When integration tests run without data from Talos provisioner (e.g.
against AWS/GCP), it should work only with `talosconfig` as an input.

This specific flow was missing filling out `infoWrapper` properly.

Clean up things a bit by reducing code duplication.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-06-21 18:42:26 +04:00
Utku Ozdemir
6759fcd4ae
feat: use discovery service on cluster health checks
Query the discovery service to fetch the node list and use the results in health checks. Closes siderolabs#5554.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2022-06-15 16:01:38 +02:00
Utku Ozdemir
8d2be5e315
feat: extend node definition used in health checks
Introduce `cluster.NodeInfo` to represent the basic info about a node which can be used in the health checks. This information, where possible, will be populated by the discovery service in following PRs. Part of siderolabs#5554.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2022-06-13 14:13:42 +02:00
Dmitriy Matrenichev
4dbbf4ac50
chore: add generic methods and use them part #2
Use things from #5702.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-06-09 23:10:02 +08:00
Dmitriy Matrenichev
70fc424099
chore: add generic methods and use them
Things like ToSet, Keys etc...

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-06-09 02:59:23 +08:00
Utku Ozdemir
c19dd1b892
feat: add 'etcd members should be control plane nodes' health check
Add new health check which checks if the etcd members match the control plane nodes. Closes siderolabs#5553.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2022-06-07 10:34:38 +02:00
Dmitriy Matrenichev
bf7a6443ee
feat: add 'etcd membership is consistent across nodes' health check
Add new health check which waits for all etcd members. Closes #5552.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-05-20 21:51:17 +08:00
Andrey Smirnov
5a91f6076d
fix: ignore completed pods in cluster health check
This fixes an error when integration test become stuck with the message
like:

```
waiting for coredns to report ready: some pods are not ready: [coredns-868c687b7-g2z64]
```

After some random sequence of node restarts one of the pods might become
"stuck" in `Completed` state (as it is shown in `kubectl get pods`)
blocking the check, as the pod will never become ready.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-05-16 14:28:25 +03:00
Andrey Smirnov
f1f43131f8
fix: strip 'v' prefix from versions on Kubernetes upgrade
This fixes an issue when `talosctl upgrade-k8s` fails with unhelpful
message if the version is specified as `v1.23.5` vs. `1.23.5`.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-04-22 14:59:12 +03:00