236 Commits

Author SHA1 Message Date
Dmitriy Matrenichev
4dbbf4ac50
chore: add generic methods and use them part #2
Use things from #5702.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-06-09 23:10:02 +08:00
Andrey Smirnov
f2997c0f22
chore: bump dependencies
dependabot + go-mod-outdated

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-06-06 23:27:17 +04:00
Andrey Smirnov
2ae0e3a569
test: add a test for version of Go Talos was built with
This ensures that Talos is in fact built with the Go version we expect.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-05-11 21:51:12 +03:00
Artem Chernyshev
2b03057b91
feat: implement a new mode try in the config manipulation commands
The new mode changes the config for a limited period of time, which
makes it possible to try a configuration and, for example, roll it back
automatically if it doesn't work.

The mode can only be used with changes that can be applied without a
reboot.

In this mode the configuration is not written to disk; it is changed
only in memory.
`--timeout` parameter can be used to customize the rollback delay.
The default timeout is 1 minute.

Any subsequent configuration change aborts try mode, and the last
applied configuration is used.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-04-21 20:31:45 +03:00
Artem Chernyshev
2b9722d1f5
feat: add dry-run flag in apply-config and edit commands
Dry run prints the config diff and the selected application mode without
changing the configuration.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-04-14 19:12:57 +03:00
Andrey Smirnov
fc23c7a595
test: bump versions for upgrade tests
Use 0.14 -> 1.0 -> master.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-03-30 18:59:48 +03:00
Dmitriy Matrenichev
e06e1473b0
feat: update golangci-lint to 1.45.0 and gofumpt to 0.3.0
- Update golangci-lint to 1.45.0
- Update gofumpt to 0.3.0
- Fix gofumpt errors
- Add goimports and format imports since gofumports is removed
- Update Dockerfile
- Fix .golangci.yml configuration
- Fix linting errors

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-03-24 08:14:04 +04:00
Andrey Smirnov
883d401f9f
chore: rename github organization to siderolabs
Go module import paths still use talos-systems; packages use the new
siderolabs name.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-03-23 21:07:46 +03:00
Andrey Smirnov
f477507262
fix: the etcd recovery client and tests
This is a follow-up fix to PR #5129.

1. Correctly catch only expected errors in the tests.
2. Rewind the snapshot each time the upload is retried.
3. Correctly unwrap errors in the `EtcdRecovery` client.
4. Update the `grpc-proxy` library to pass through the EOF error.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-03-22 16:51:36 +03:00
Artem Chernyshev
27af5d41c6
feat: pause the boot process on some failures instead of rebooting
Some failures can be fixed by updating the machine configuration.
Now failures in `userDisks` and `userFiles` no longer send Talos into a
reboot loop; instead, the boot process pauses for 35 minutes.

Additionally, `apid` and `machined` are now started right after
containerd is up and running.

That makes it possible for the operator to connect to the node using
talosctl and fix the config.

Fixes: https://github.com/talos-systems/talos/issues/4669
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-03-21 17:39:45 +03:00
Artem Chernyshev
a50747a64a
fix: align list and diskusage command flags with their Linux analogs
Fixes: https://github.com/talos-systems/talos/issues/3018

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-03-02 22:27:56 +03:00
Andrey Smirnov
09efa62f68
chore: re-enable kexec and default to UEFI booting in tests
Fixes #4947

It turns out there's something related to the boot process in BIOS mode
which leads to initramfs corruption on a later `kexec`.

Booting via GRUB is always successful.

Problem with kexec was confirmed with:

* direct boot via QEMU
* QEMU boot via iPXE (bundled with QEMU)

The root cause is not known, but the only visible difference is the
placement of RAMDISK with UEFI and BIOS boots:

```
[    0.005508] RAMDISK: [mem 0x312dd000-0x34965fff]
```

or:

```
[    0.003821] RAMDISK: [mem 0x711aa000-0x747a7fff]
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-03-02 21:52:18 +03:00
Andrey Smirnov
0da370dfef
test: unlock CABPT/CACPPT provider versions
We should always test the latest versions of our providers.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-02-10 00:14:15 +03:00
Andrey Smirnov
85782faa24
feat: update Kubernetes to 1.23.3
Also bumps some dependencies and updates Talos version we use in the
upgrade tests.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-01-26 17:59:21 +03:00
Artem Chernyshev
2f2bdb26aa
feat: replace flags with --mode in apply, edit and patch commands
Fixes: https://github.com/talos-systems/talos/issues/4588

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-01-13 16:09:53 +03:00
Andrey Smirnov
2f4b9d8d6d
feat: make machine configuration read-only in Talos (almost)
Talos shouldn't try to re-encode the machine config it was provided
with.

So add a `ReadonlyWrapper` around `*v1alpha1.Config` which makes sure
that the raw config object is not available anymore (it's a private field),
but config accessors are available for read-only access.

Another thing that `ReadonlyWrapper` does is preserve the original
`[]byte` encoding of the config, keeping it exactly the same as it was
loaded from a file or read over the network.

Improved `talosctl edit mc` to preserve the config as it was submitted,
and preserve the edits on error from Talos (previously edits were lost).

`ReadonlyWrapper` is not used on the config generation path though: config
there is represented by `*v1alpha1.Config` and can be freely modified.

Why almost? Some parts of Talos (platform code) patch the machine
configuration with new data. We need to fix platforms to provide
networking configuration in a different way, but this will come with
other PRs later.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-12-28 20:12:55 +03:00
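A minimal Go sketch of the `ReadonlyWrapper` idea from 2f4b9d8d6d. It uses JSON via `encoding/json` as a stand-in for the YAML machine config. `machineConfig` and `readonlyConfig` are hypothetical names, not the real Talos types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// machineConfig is a hypothetical stand-in for *v1alpha1.Config.
type machineConfig struct {
	Hostname string `json:"hostname"`
}

// readonlyConfig keeps the decoded config in an unexported field and the raw
// bytes exactly as received; callers outside its package only see accessors.
type readonlyConfig struct {
	cfg *machineConfig
	raw []byte
}

func newReadonlyConfig(raw []byte) (*readonlyConfig, error) {
	var cfg machineConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, fmt.Errorf("decoding config: %w", err)
	}

	// Copy the input so later changes to the caller's slice cannot leak in.
	return &readonlyConfig{cfg: &cfg, raw: append([]byte(nil), raw...)}, nil
}

// Hostname is a read-only accessor.
func (c *readonlyConfig) Hostname() string { return c.cfg.Hostname }

// Bytes returns a copy of the original encoding; it is never re-encoded.
func (c *readonlyConfig) Bytes() []byte { return append([]byte(nil), c.raw...) }

func main() {
	raw := []byte(`{"hostname":   "node-1"}`)

	c, err := newReadonlyConfig(raw)
	if err != nil {
		panic(err)
	}

	fmt.Println(c.Hostname())
	fmt.Println(string(c.Bytes())) // keeps the original spacing
}
```

In real code the wrapper would live in its own package, which is what makes the unexported `cfg` field unreachable. Here everything is in `main` for brevity.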
Andrey Smirnov
d2a7e082c2
test: retry in discovery tests
Sometimes pushing/pulling to the Kubernetes registry is delayed due to
backoff on failed attempts to talk to the API server while the cluster is
still bootstrapping. Work around that by adding retries.

Also disable the kernel module controller in container mode, as it always
fails there.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-12-28 16:55:41 +03:00
Andrey Smirnov
c297d66a13
test: attempt number two on proper retries in CLI time tests
See #4702

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-12-22 18:29:34 +03:00
Andrey Smirnov
17c1474881
test: retry talosctl time call in the tests
As `talosctl time` relies on the default time server set in the config, and
our nodes start with `pool.ntp.org`, the request to the time server
sometimes fails, which fails the tests.

Retry such errors in the tests to avoid spurious failures.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-12-17 20:55:06 +03:00
Noel Georgi
4c96e936ed
docs: add cilium guide
- Add Cilium CNI install guide
- Use Canal CNI for default examples

Fixes #4477

Signed-off-by: Noel Georgi <git@frezbo.dev>
2021-12-16 20:37:03 +05:30
Andrey Smirnov
ec641f7296
fix: use default time servers in time API if none are configured
This fixes a simple bug:

```
$ talosctl -n 172.20.0.2 time
error fetching time: 1 error occurred:
	* 172.20.0.2: rpc error: code = Unknown desc = no time servers configured
```

After the change:

```
$ talosctl -n 172.20.0.2 time
NODE         NTP-SERVER     NODE-TIME                                 NTP-SERVER-TIME
172.20.0.2   pool.ntp.org   2021-12-10 14:25:38.871656717 +0000 UTC   2021-12-10 14:25:38.92119139 +0000 UTC
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-12-10 17:39:36 +03:00
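The fallback in ec641f7296 can be sketched in Go as follows. It assumes a hypothetical `effectiveTimeServers` helper and uses `pool.ntp.org` as the default, the server shown in the output above.

```go
package main

import "fmt"

// defaultTimeServers is the fallback used when the config lists no servers.
var defaultTimeServers = []string{"pool.ntp.org"}

// effectiveTimeServers returns the configured servers, or a copy of the
// defaults when none are configured, instead of failing with
// "no time servers configured".
func effectiveTimeServers(configured []string) []string {
	if len(configured) == 0 {
		return append([]string(nil), defaultTimeServers...)
	}

	return configured
}

func main() {
	fmt.Println(effectiveTimeServers(nil))
	fmt.Println(effectiveTimeServers([]string{"time.cloudflare.com"}))
}
```

Returning a copy means callers cannot accidentally modify the shared default list.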
Andrey Smirnov
97ffa7a645
feat: upgrade kubelet version in talosctl upgrade-k8s
Fixes #4656

As now changes to kubelet configuration can be applied without a reboot,
`talosctl upgrade-k8s` can handle the kubelet upgrades as well.

The gist is simply modifying the machine config and waiting for the `Node`
version to be updated; the rest of the code is required to make the
process reliable.

Also fixed a bug in the API while watching deleted items with
tombstones.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-12-08 21:12:17 +03:00
Andrey Smirnov
64a4f6e77c
test: bump Talos versions in upgrade tests
In preparation for the 0.14-beta.0 release, bump versions in upgrade tests.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-12-06 18:07:24 +03:00
Artem Chernyshev
4f5d9da922
feat: allow overriding KSPP kernel parameters
Fixes: https://github.com/talos-systems/talos/issues/4385

Now sysctls defined in the config can override kernel parameters set by
the defaults controller.
In that case the controller shows a warning that names the overridden
parameter and its new value, and says that overriding it is not recommended.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2021-12-03 18:50:21 +03:00
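A Go sketch of the override rule from 4f5d9da922, assuming sysctls are plain string maps. `mergeSysctls` is hypothetical, and `kernel.kptr_restrict` is used only as an example of a hardened default.

```go
package main

import (
	"fmt"
	"sort"
)

// mergeSysctls applies user-configured sysctls on top of the hardened defaults.
// User values win; each override of a default with a different value yields a warning.
func mergeSysctls(defaults, user map[string]string) (map[string]string, []string) {
	merged := make(map[string]string, len(defaults)+len(user))
	for k, v := range defaults {
		merged[k] = v
	}

	keys := make([]string, 0, len(user))
	for k := range user {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic warning order

	var warnings []string

	for _, k := range keys {
		v := user[k]
		if def, ok := defaults[k]; ok && def != v {
			warnings = append(warnings, fmt.Sprintf(
				"overriding KSPP default %s=%q with %q is not recommended", k, def, v))
		}

		merged[k] = v
	}

	return merged, warnings
}

func main() {
	defaults := map[string]string{"kernel.kptr_restrict": "1"}
	user := map[string]string{"kernel.kptr_restrict": "0", "vm.swappiness": "10"}

	merged, warnings := mergeSysctls(defaults, user)
	fmt.Println(merged)

	for _, w := range warnings {
		fmt.Println("warning:", w)
	}
}
```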
Rohit Dandamudi
7f9922296a
feat: add powercycle mode in reboot
- Fixes #4569
- Updated reboot process sequence
- Updated api.descriptors to avoid proto type change linting error https://github.com/talos-systems/talos/pull/4612#discussion_r758599242

Signed-off-by: Rohit Dandamudi <rohit.dandamudi@siderolabs.com>
2021-12-02 22:40:04 +05:30
Nico Berlee
852bf4a7de
feat: talosctl fish completion support
Generate talosctl completion for fish

Signed-off-by: Nico Berlee <nico.berlee@on2it.net>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-11-23 16:45:16 +03:00
Andrey Smirnov
753a82188f
refactor: move pkg/resources to machinery
Fixes #4420

No functional changes, just moving packages around.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-11-15 19:50:35 +03:00
Alexey Palazhchenko
7462733bcb
chore: update golangci-lint
Fix context propagation.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@talos-systems.com>
2021-11-15 14:55:25 +00:00
Andrey Smirnov
a76f6d69db
feat: allow kubelet to be restarted and provide negative nodeIP subnets
Fixes #4407 fixes #4489

This PR started by enabling simple restart of the `kubelet` service via
services API, but it turned out there's a problem:

When kubelet restarts, CNI is already up, so there's an interface on the
host with the CNI node IP. The code which picks the kubelet node IP finds
it and tries to add it to the list of kubelet node IPs, which completely
breaks kubelet.

The solution was easy: allow node IPs to be filtered out (e.g. we never
want the kubelet node IP to come from the pod CIDR).

But this filtering feature is also useful in other cases, so I added
that as well.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-11-15 15:43:34 +03:00
Andrey Smirnov
d4b0ca21a1
test: retry upgrade mutex lock failures
With recent changes and kexec, Talos upgrades much faster in the tests,
and the mutex is not released properly (#4525).

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-11-12 17:49:46 +03:00
Artem Chernyshev
efbae7857d
fix: use etc folder for du cli tests
Fixes: https://github.com/talos-systems/talos/issues/4382

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2021-11-10 20:10:40 +03:00
Artem Chernyshev
261c497c71
feat: implement talosctl support command
Fixes: https://github.com/talos-systems/talos/issues/4406

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2021-11-08 16:20:50 +03:00
Andrey Smirnov
8329d21114
chore: split polymorphic RootSecret resource into specific types
Fixes #4418

Only one resource (one of the very first ones) was polymorphic: its
actual spec type depends on its ID. This was a bad idea, and it doesn't
work with protobuf specs (as the type <> protobuf relationship can't be
established).

Refactor this by splitting into three separate resource types:
`OSRoot` (OS-level root secrets), `EtcdRoot` (for etcd),
`KubernetesRoot` (for Kubernetes).

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-10-27 19:56:04 +03:00
Andrey Smirnov
b6b78e7fef
test: add cluster discovery integration tests
This verifies that members match cluster state and that both cluster
registries work in sync, producing the same discovery data.

Fixes #4191

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-10-25 21:03:29 +03:00
Andrey Smirnov
38516a5499
test: update Talos versions in upgrade tests
Now 0.13.0 is the last release and 0.12.3 is the one before it.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-10-21 17:36:30 +03:00
Andrey Smirnov
3e100aa977
test: workaround EventsWatch test flakiness
This test sometimes fails with a message like:

```
=== RUN   TestIntegration/api.EventsSuite/TestEventsWatch
    assertion_compare.go:323:
        	Error Trace:	events.go:88
        	Error:      	"0" is not greater than or equal to "14"
        	Test:       	TestIntegration/api.EventsSuite/TestEventsWatch
        	Messages:   	[]
```

I believe the root cause is that delivery of the initial (first) event
might take more than 100ms, so instead of waiting 100ms for each event,
block for 500ms for all events to arrive.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-10-15 12:51:56 +03:00
Andrey Smirnov
b450b7cef0
chore: deprecate Interfaces and Routes APIs
Fixes #4094

Deprecate the old networkd APIs; `talosctl interfaces` and `talosctl routes`
now suggest alternative commands that achieve the same task.

TUI installer was updated to stop using Interfaces API.

Those APIs will be completely removed in 0.14.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-27 15:21:02 +03:00
Andrey Smirnov
d943bb0e28
feat: update Kubernetes to 1.22.2
See https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.22.md

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-16 13:59:51 +03:00
Andrey Smirnov
a059454045
chore: build using Go 1.17
`initramfs` size for amd64 shrinks by 1.3 MiB.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-09-13 22:33:47 +03:00
Andrey Smirnov
950f122c95
chore: update versions in upgrade tests
In preparation for 0.13, start testing upgrades to 0.12.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-08-25 18:02:47 +03:00
Andrey Smirnov
dadaa65d54
feat: print uid/gid for the files in ls -l
This adds information about file ownership to the long listing, which is
sometimes crucial.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2021-08-13 00:10:49 +03:00
Alexey Palazhchenko
09d70b7eaf
feat: update Kubernetes to v1.22.0
Closes #3967.
Closes #3997.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@talos-systems.com>
2021-08-06 09:06:32 -07:00
Alexey Palazhchenko
eea750de2c
chore: rename "join" type to "worker"
Closes #3413.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-07-09 07:10:45 -07:00
Andrey Smirnov
b969e7720e
chore: update references to old protobuf package
This simply uses new protobuf package instead of old one.

The old protobuf package is still used by Talos dependencies.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-07-08 05:34:12 -07:00
Andrey Smirnov
10c28758a4
fix: ignore DeadlineExceeded error correctly on bootstrap
The problem was that the gRPC function `status.Code(err)` doesn't unwrap
errors, while the Talos client returns errors wrapped with
`multierror.Error` and `fmt.Errorf`, so `status.Code` doesn't return the
error code correctly.

Fix that by introducing our own client method which correctly goes over
the chain of wrapped errors.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-07-07 12:02:26 -07:00
Andrey Smirnov
84817f7334
chore: bump Talos version in upgrade tests
Preparing for 0.11 to become the stable release soon.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-06-29 07:24:48 -07:00
Alexey Palazhchenko
2fa54107b2
chore: fix tests for disabled RBAC
This commit also introduces a hidden `--json` flag for the `talosctl version`
command that is not supported and should be reworked in #907.

Refs #3852.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-06-28 13:56:40 -07:00
Alexey Palazhchenko
bbf1c091d4
feat: add RBAC to talosctl version output
Refs #3852.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-06-28 07:10:25 -07:00
Alexey Palazhchenko
ad047a7dee
chore: small RBAC improvements
* `talosctl config new` now sets endpoints in the generated config.
* Avoid duplication of roles in metadata.
* Remove method name prefix handling. All methods should be set explicitly.
* Add tests.

Closes #3421.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-06-25 05:50:38 -07:00
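A Go sketch of exact method-to-role authorization, as implied by dropping method name prefix handling in ad047a7dee. The map entries and the `authorized` helper are hypothetical, and the role and method names are illustrative.

```go
package main

import "fmt"

// allowedRoles maps full gRPC method names to the roles allowed to call them.
// The entries are illustrative. Lookups are exact, with no prefix matching,
// so every method has to be listed explicitly.
var allowedRoles = map[string][]string{
	"/machine.MachineService/Version": {"os:admin", "os:reader"},
	"/machine.MachineService/Reboot":  {"os:admin"},
}

// authorized reports whether any of roles may call method; unlisted methods are denied.
func authorized(method string, roles []string) bool {
	for _, allowed := range allowedRoles[method] {
		for _, r := range roles {
			if r == allowed {
				return true
			}
		}
	}

	return false
}

func main() {
	fmt.Println(authorized("/machine.MachineService/Version", []string{"os:reader"})) // true
	fmt.Println(authorized("/machine.MachineService/Reboot", []string{"os:reader"}))  // false
}
```

Denying unlisted methods by default means a newly added RPC stays inaccessible until someone assigns it roles explicitly.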
Alexey Palazhchenko
3c1b32199d
chore: refactor CLI tests
Use testing.T.TempDir.
Add support for `talosctl --endpoints`.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-06-23 05:49:00 -07:00