3937 Commits

Author SHA1 Message Date
Andrey Smirnov
ea0e9bdbe4
feat: environment variables via the kernel arguments
Unify getting environment variables, support passing environment
variables via kernel args.

Fixes #6984
See #6999

For META this will be used to pass environment variables to the
installer for ISO images (or PXE booting).
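
As a rough illustration of the idea, the sketch below collects environment variables from a kernel command line. It assumes the argument is named `talos.environment` and carries a `KEY=VALUE` payload; the helper `envFromCmdline` is hypothetical and is not the code from this commit.

```go
package main

import (
	"fmt"
	"strings"
)

// envFromCmdline collects KEY=VALUE pairs from every occurrence of the
// (assumed) talos.environment=KEY=VALUE kernel argument.
func envFromCmdline(cmdline string) []string {
	const prefix = "talos.environment="

	var env []string

	for _, arg := range strings.Fields(cmdline) {
		if !strings.HasPrefix(arg, prefix) {
			continue // some other kernel argument
		}

		value := strings.TrimPrefix(arg, prefix)
		if !strings.Contains(value, "=") {
			continue // malformed: no KEY=VALUE payload
		}

		env = append(env, value)
	}

	return env
}

func main() {
	cmdline := "console=ttyS0 talos.platform=metal talos.environment=http_proxy=http://proxy:3128"

	fmt.Println(envFromCmdline(cmdline))
}
```

Repeating the argument yields several variables, in command-line order.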

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-03-28 16:28:33 +04:00
Andrey Smirnov
94c24ca64e
chore: add machine config version contract for v1.4
No changes vs. v1.3, so this is mostly a no-op change just to keep things
consistent.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-03-27 18:09:21 +04:00
Andrey Smirnov
cefa9c3ecb
feat: update Kubernetes to 1.27.0-rc.0
See https://github.com/kubernetes/kubernetes/releases/tag/v1.27.0-rc.0

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-03-27 14:32:54 +04:00
Andrey Smirnov
9e8603f53b
feat: implement new download URL variable ${code}
New variable value is coming from `META`, and it might be set using the
interactive console (not implemented yet, but it will come soon).

I had to refactor the URL expansion implementation:

* simplify things where possible
* provide more unit-tests for smaller units
* handle expansion of all variables in parallel
* allow parallel expansion on multiple variables

Also I refactored download code to support proper passing of endpoint
function with context.

The end result:

* Talos will try to download config for 3 hours before rebooting
* Each attempt which includes URL expansion + download is limited to 3
  minutes

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-03-24 21:49:36 +04:00
Andrey Smirnov
d30cf9c86e
test: fix misprint in e2e scripts
This bug breaks `e2e-extensions`.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-03-24 15:28:18 +04:00
Andrey Smirnov
0d0bb31cf7
fix: use stripped kernel modules
Fixes #6931

See https://github.com/siderolabs/pkgs/pull/690

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-03-23 23:44:33 +04:00
Andrey Smirnov
3583eea983
release(v1.4.0-alpha.3): prepare release
This is the official v1.4.0-alpha.3 release.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-03-23 21:26:22 +04:00
Utku Ozdemir
a7b79ef1be
feat: add network config screen to dashboard
Implement the network config screen with input forms to configure the initial node networking by writing a config to the META partition.

Closes siderolabs/talos#6961.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2023-03-23 17:29:52 +04:00
Andrey Smirnov
cf2ccc521f
fix: always shutdown maintenance API service
The problem was that `GracefulStop()` will hang forever if there is a
running API call. So if there is a running streaming call, the
maintenance service might hang until it is finished.

The problem shows up with 'Upgrade' API in the maintenance mode if there
is a concurrent streaming API call, e.g.:

1. Watch API is running against maintenance mode.
2. Upgrade API is issued, it tries to run the MaintenanceUpgrade
   sequence, which tries to take over the Initialize sequence. The
   Initialize sequence is canceled, maintenance API service context is
   canceled, but the service doesn't terminate, as it's stuck in
   `GracefulStop`. The sequence takeover times out because, even though
   the sequence is canceled, it hasn't terminated yet.

Sample log:

```
[talos] upgrade request received: "ghcr.io/siderolabs/installer:v1.3.3"
[talos] upgrade failed: failed to acquire lock: timeout
[talos] task loadConfig (1/1): failed: failed to receive config via maintenance service: maintenance service failed: context canceled
[talos] phase config (6/7): failed
[talos] initialize sequence: failed
<stuck here>
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-03-23 16:59:30 +04:00
Andrey Smirnov
a0a5db590d
feat: update Flannel to 0.21.4
See https://github.com/flannel-io/flannel/releases/tag/v0.21.4

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-03-22 22:28:50 +04:00
Noel Georgi
d1a61fd343
chore: bump golangci-lint
Bump golangci-lint and fix up new warnings. Ignore the check for
unused function parameters: it's noisy and makes interface
implementations confusing to read.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-03-22 19:55:38 +05:30
Noel Georgi
36a9a208ec
chore: bump deps
Bump deps

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-03-22 16:37:27 +05:30
Noel Georgi
c63cf90e32
feat: update k8s to v1.27.0-beta.0
Update k8s to v1.27.0-beta.0

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-03-21 23:59:17 +05:30
Dmitriy Matrenichev
b246c90abd
fix: add uint32 to Magic1 and Magic2
Discovered in #6971. The Go compiler cannot deduce the proper type for those constants on 32-bit architectures
when they are passed to `fmt.Print(f)` functions. Since we only compare them with uint32 variables, it makes sense
to give them an explicit type.
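
The effect can be shown with a small sketch. The constant names and values below are hypothetical stand-ins, not the real `Magic1` and `Magic2` definitions.

```go
package main

import "fmt"

// Hypothetical magic values. Without the explicit uint32 type, an untyped
// constant passed to a fmt function defaults to int, which is only 32 bits
// wide on 32-bit targets and cannot represent values above math.MaxInt32.
const (
	magic1 uint32 = 0xdeadbeef
	magic2 uint32 = 0xfeedface
)

// isValid compares header fields against the typed constants.
func isValid(a, b uint32) bool {
	return a == magic1 && b == magic2
}

func main() {
	// Compiles on both 32-bit and 64-bit targets because the constants are typed.
	fmt.Printf("%#x %#x\n", magic1, magic2)
	fmt.Println(isValid(0xdeadbeef, 0xfeedface))
}
```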

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-03-21 09:57:55 -03:00
Andrey Smirnov
777c8d6f6e
chore: update COSI to watch aggregated version
This should fix the problem with a storm of update events causing buffer
overruns.

See also 66feeeccd91c8db560ae99a960cf4cc7c92594b9.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-03-21 15:59:17 +04:00
Andrey Smirnov
bec89bf6e5
fix: use 'no block' etcd dial with multiple endpoints
The problem showed up on 'reset' of the Talos node which had multiple
endpoints for other control plane nodes, many of which weren't actually
available.

When 'grpc.WithBlock()' is used, etcd will try to dial the first
endpoint and return an error if the dial fails.

Use noblock mode by default with multiple endpoints, and blocking mode
with a single endpoint.

Pass the context to etcd to properly abort dial operations if the
context gets canceled.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-03-21 15:35:31 +04:00
Andrey Smirnov
28713c2c4d
feat: update Kubernetes to 1.26.3
Mostly to backport to 1.3.x; main should soon be updated to 1.27.x.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-03-20 23:36:11 +04:00
Dzerom Dzenkins
a3cf416475
docs: add InstallConfig ignored notice to doc
Mention that `.machine.install` gets ignored on pre-installed images.

Signed-off-by: Dzerom Dzenkins <dzeri96@proton.me>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-03-20 22:48:34 +04:00
Noel Georgi
df9b851fba
chore: load all external artifacts earlier
Load all external artifacts early in the build process so that the
binaries are available for e2e tests.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-03-20 12:29:24 +05:30
Utku Ozdemir
2dd0964c5f
refactor: use resource watches on dashboard
Instead of doing excessive get/list requests, do a watch per node in an infinite retry.

Additionally, refactor the dashboard code to make the various data listener namings more consistent and reorganize the packages.

Closes siderolabs/talos#6960.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2023-03-17 23:06:35 +01:00
Noel Georgi
9933ebb6aa
chore: fix loaded artifacts file permission
Azure drops file permissions when files are uploaded to or downloaded from
the object store. Make sure all binaries under `_out` have executable permissions.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-03-17 17:44:59 +05:30
Dmitriy Matrenichev
a14a0aba04
fix: nil pointer exception in syncLink
If a link has no `Info` field, we can't do anything meaningful, so we'll just log and skip.
Also fix race in test.

For #6956

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-03-17 15:33:20 +04:00
Noel Georgi
cf101e56fb
fix: add --force flag for talosctl gen
Error out if the file(s) already exist and tell the user to use
`--force` to overwrite.

Fixes: #6963

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-03-17 15:07:12 +05:30
Utku Ozdemir
ea2aa06116
fix: fix data race on network config read
Fix a data race caused by the metadata field of PlatformNetworkConfig being edited after it was sent to the channel. It caused test failures.

Fix it by setting a copy of the metadata instead.
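
The pattern can be sketched as follows. The `metadata` and `platformNetworkConfig` types here are simplified, hypothetical stand-ins for the real resource types.

```go
package main

import "fmt"

// metadata and platformNetworkConfig are simplified stand-ins.
type metadata struct {
	Hostname string
}

type platformNetworkConfig struct {
	Metadata *metadata
}

// publish sends a config that carries its own copy of the metadata, so
// later edits by the sender are never observed by the receiver.
func publish(ch chan<- platformNetworkConfig, md *metadata) {
	mdCopy := *md // shallow copy; nested pointers or slices would need a deep copy

	ch <- platformNetworkConfig{Metadata: &mdCopy}
}

func main() {
	ch := make(chan platformNetworkConfig, 1)
	md := &metadata{Hostname: "node-1"}

	publish(ch, md)

	md.Hostname = "changed-after-send" // no longer affects the sent value

	fmt.Println((<-ch).Metadata.Hostname)
}
```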

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2023-03-17 00:24:22 +01:00
Andrey Smirnov
64e3d24c6b
feat: provide platform network config for 'metal' in META
A special META key might contain optional platform network config for
the `METAL` platform.

It is completely optional, but if present, it works the same way as in the
clouds: it is applied with low priority (can be overridden with machine
config), but provides some initial defaults for the machine.
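
The priority rule can be illustrated with a small sketch in which each setting records the layer it came from and the highest layer wins. The `configLayer` type, the layer names, and `pickHostname` are hypothetical and only model the idea.

```go
package main

import "fmt"

// configLayer orders configuration sources by priority; higher values win.
type configLayer int

const (
	layerPlatform      configLayer = iota // defaults from platform network config
	layerMachineConfig                    // explicit machine config
)

// hostnameSpec is one candidate value together with its source layer.
type hostnameSpec struct {
	Hostname string
	Layer    configLayer
}

// pickHostname returns the hostname from the highest-priority layer.
func pickHostname(specs []hostnameSpec) (string, bool) {
	var (
		best  hostnameSpec
		found bool
	)

	for _, s := range specs {
		if !found || s.Layer > best.Layer {
			best, found = s, true
		}
	}

	return best.Hostname, found
}

func main() {
	fmt.Println(pickHostname([]hostnameSpec{
		{Hostname: "from-meta", Layer: layerPlatform},
	}))

	fmt.Println(pickHostname([]hostnameSpec{
		{Hostname: "from-meta", Layer: layerPlatform},
		{Hostname: "from-machine-config", Layer: layerMachineConfig},
	}))
}
```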

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-03-15 23:54:39 +04:00
Andrey Smirnov
442cb9c1b0
feat: implement APIs to write to META
This allows putting keys into the META partition.

META contents can be viewed with `talosctl get metakeys`.

There is no real use case for it yet, but the next PRs will introduce
two special keys which can be written:

* platform network config for `metal`
* `${code}` variable

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-03-15 22:17:52 +04:00
Utku Ozdemir
9e07832db9
feat: implement summary dashboard
Implement the new summary dashboard with node info and logs.
Replace the previous metrics dashboard with the new dashboard which has multiple screens for node summary, metrics and editing network config.

Port the old metrics dashboard to the tview library and assign it to be a screen in the new dashboard, accessible by F2 key.

Add a new resource, infos.cluster.talos.dev which contains the cluster name and id of a node.

Disable the network config editor screen in the new dashboard until it is fully implemented with its backend.

Closes siderolabs/talos#4790.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2023-03-15 13:13:28 +01:00
Andrey Smirnov
1df841bb54
refactor: change the interface of META
Use a global instance, handle loading/saving META in global context.

Deprecate legacy syslinux ADV, provide an easier interface for
consumers.

Expose META as resources.

Fix the bootloader revert process (it was completely broken for quite a
while :sad:).

This is a first step which mostly does preparation work, real changes
will come in the next PRs:

* add APIs to write to META
* consume META keys for platform network config for `metal`
* custom key for URL `${code}`

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-03-15 15:43:16 +04:00
Spencer Smith
e9962bc3ea
chore: update CI to tag azure buckets
This PR updates CI to remove the immutability policy and tags the azure
"containers" (aka buckets) with a ci=true tag. This will allow us to
handle the deletion of buckets with the cloud-cleaner app.

Signed-off-by: Spencer Smith <spencer.smith@talos-systems.com>
2023-03-13 14:09:06 -04:00
Andrey Smirnov
9f5f5cf9bf
feat: update Flannel to v0.21.3
See https://github.com/flannel-io/flannel/releases/tag/v0.21.3

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-03-13 20:32:26 +04:00
Andrey Smirnov
02b0ff35ee
feat: generate Flannel CNI manifest from upstream
Fixes #6730

`go generate`-based step downloads the upstream manifest, transforms it
to match our requirements, and it is compiled in as the Flannel
manifest.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-03-13 20:00:35 +04:00
Andrey Smirnov
6656d35eca
docs: fix Talos version to use template
Fixes #6944

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-03-13 15:28:27 +04:00
xyhhx
72a6d1d708
docs: update nocloud
Use the correct link to nocloud cloudinit docs.

Signed-off-by: xyhhx <xyhhx@disr.it>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-03-13 14:51:28 +04:00
Serge Logvinov
9948a646d2
feat: coredns node uninitialized toleration
Launch CoreDNS even if the node is not initialized.
The network is already ready, but the CCM hasn't finished its job.

Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-03-13 14:29:14 +04:00
Andrey Smirnov
e03902b546
feat: update Go to 1.20.2
Also bump Linux to 6.1.15.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-03-10 16:41:17 +04:00
Steffen Windoffer
c8f8579f2d
fix: upgrade-k8s to flag should not be required since there is a default
Having a default and still requiring it confuses the user.

Signed-off-by: Steffen Windoffer <steffen@wind0r.de>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-03-09 17:33:21 +04:00
Erik Lund
230cfaf803
feat: use network information from guestinfo.metadata
Add VMware GuestInfo metadata to network configuration.

Fixes #6708

Signed-off-by: Erik Lund Jensen <info@erikjensen.it>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-03-09 16:51:08 +04:00
Nico Berlee
97048f7c37
feat: netstat in API and client
Implements netstat in Talos API and client (talosctl).

Signed-off-by: Nico Berlee <nico.berlee@on2it.net>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-03-09 15:48:30 +04:00
Andrey Smirnov
fda6da6929
fix: successful ACPI shutdown in maintenance mode
Fixes #6817

The original problem wasn't reproducible with `main`, but there was a
set of bugs in the shutdown sequence which was preventing it from
completing successfully, as in the maintenance mode nothing is running
and initialized yet.

Most of the bugs were `nil` pointer dereferences.

Fixed a small issue with final 'RebootError' printed as a failure in the
ACPI shutdown path.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-03-07 23:52:02 +04:00
Seán C McCord
b97e1abaa6
feat: set default image, validate empty image
Adds a default image URL and ensures that an empty image URL is not
sent when calling `talosctl upgrade`.

Fixes #6912

Signed-off-by: Seán C McCord <ulexus@gmail.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-03-07 18:21:54 +04:00
Artem Chernyshev
121220a3b3
chore: bump dependencies via renovate bot
Fixes: https://github.com/siderolabs/talos/pull/6914
Fixes: https://github.com/siderolabs/talos/pull/6915
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2023-03-07 15:58:25 +03:00
Dmitriy Matrenichev
ebc92f3c1d
chore: add container id to talosctl -k containers and talosctl -k logs
This PR takes the first 12 characters of the container ID and adds them to each container's output in `talosctl -k containers`.
That way we can ensure that we get the logs from the proper container even if there is a newer one.
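
Shortening an ID this way is straightforward; a sketch with a hypothetical `shortID` helper:

```go
package main

import "fmt"

// shortID returns the first 12 characters of a container ID, or the whole
// ID when it is shorter. Container IDs are hex strings, so slicing by bytes
// is safe here.
func shortID(id string) string {
	const length = 12

	if len(id) <= length {
		return id
	}

	return id[:length]
}

func main() {
	// Made-up container ID for illustration.
	fmt.Println(shortID("0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"))
	fmt.Println(shortID("abc"))
}
```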

Closes #6886

Co-authored-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-03-07 13:20:44 +03:00
Dmitriy Matrenichev
22ef81c1e7
feat: add grub option to drop to maintenance mode
- [x] Support `talos.experimental.wipe=system:EPHEMERAL,STATE` boot kernel arg
- [x] GRUB option to wipe like above
- [x] update GRUB library to handle that
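
A sketch of how such an argument value might be parsed, assuming the format is `<target>[:<label>,<label>...]`. The `parseWipe` helper and the interpretation of a missing label list are assumptions, not the implementation from this commit.

```go
package main

import (
	"fmt"
	"strings"
)

// parseWipe extracts the wipe target and the optional partition labels from
// a kernel command line, e.g. talos.experimental.wipe=system:EPHEMERAL,STATE.
// The third return value reports whether the argument was present and valid.
func parseWipe(cmdline string) (string, []string, bool) {
	const prefix = "talos.experimental.wipe="

	for _, arg := range strings.Fields(cmdline) {
		if !strings.HasPrefix(arg, prefix) {
			continue
		}

		target, list, hasList := strings.Cut(strings.TrimPrefix(arg, prefix), ":")
		if target == "" {
			return "", nil, false
		}

		if !hasList || list == "" {
			return target, nil, true // no explicit labels given
		}

		return target, strings.Split(list, ","), true
	}

	return "", nil, false
}

func main() {
	fmt.Println(parseWipe("console=tty0 talos.experimental.wipe=system:EPHEMERAL,STATE"))
	fmt.Println(parseWipe("console=tty0"))
}
```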

Closes #6842

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-03-07 12:37:59 +03:00
Andrey Smirnov
642fe0c90c
feat: update pkgs with framebuffer console
This brings in new kernel & containerd, and the kernel has support for
framebuffer console enabled.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-03-06 22:13:33 +04:00
Noel Georgi
69cb414f01
docs: update cilium install instructions
Update cilium install instructions.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-03-06 22:57:39 +05:30
Dmitriy Matrenichev
e71cc6619b
fix: redo assertHostnames in HostnameMergeSuite.TestMerge
Use `rtestutils.AssertResources` for hostnames test.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-03-06 15:09:50 +03:00
Andrey Smirnov
8ea4bfad8f
refactor: improve the kubernetes upgrade flow
Use new version of go-kubernetes, and move the `kube-proxy` DaemonSet
update to follow common logic of bootstrap manifests update.

This fixes a confusing behavior where, after `k8s-upgrade`, the version of
`kube-proxy` is not updated in the machine config.

See https://github.com/siderolabs/go-kubernetes/pull/3

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-03-06 15:01:29 +04:00
Steve Francis
81879fc0ca
docs: add how tos for workloads on control planes, and scaling up
First set of how-tos.

Signed-off-by: Steve Francis <steve.francis@talos-systems.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-03-06 14:02:45 +04:00
Spencer Smith
05b0b721c9
chore: move blob storage to azure for builds
This PR moves blob storage to azure.

Signed-off-by: Spencer Smith <spencer.smith@talos-systems.com>
2023-03-04 15:50:04 -05:00
Noel Georgi
a78281214d
feat: add cilium e2e tests
Add cilium e2e tests. The existing cilium check was very old; update to
the latest cilium version and also add a test for KPR strict mode.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-03-03 20:03:25 +05:30