4120 Commits

Author SHA1 Message Date
Andrey Smirnov
06369e8195
fix: retry CRI pod removal, fix upgrade flow in the tests
It seems that CRI has a bit of eventual consistency, and it might fail
to remove a stopped pod failing that it's still running.

Rewrite the upgrade API call in the upgrade test to actually wait for
the upgrade to be successful, and fail immediately if it's not
successful. This should improve the test stability and it should make
it easier to find issues immediately.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-12 16:20:10 +04:00
Andrey Smirnov
d32dd3a820
chore: update Go to 1.20.6
See https://go.dev/doc/devel/release#go1.20.6

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-12 15:21:26 +04:00
Andrey Smirnov
8017afb107
feat: implement CRI image management and pre-pull on K8s upgrade
Fixes #6391

Implement a set of APIs and commands to manage images in the CRI, and
pre-pull images on Kubernetes upgrades.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-11 19:25:10 +04:00
Andrey Smirnov
1c2f19b367
feat: update Kubernetes to 1.28.0-alpha.4
The Go modules were not tagged for alpha.4, so using alpha.3 tag.

Talos 1.5 will ship with Kubernetes 1.28.0.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-11 15:40:24 +04:00
Noel Georgi
94e9891c1b
chore: bump sd-boot to v254-rc1
Bump sd-boot.
Fix parsing PE executable offsets.
Set the PE file alignment to be 512 bytes.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-07-11 15:52:57 +05:30
Artem Chernyshev
936111ce06
fix: properly set up tls for KMS endpoint
The condition was inverted 🤦

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2023-07-10 21:10:02 +03:00
Artem Chernyshev
cb226eec46
fix: rewrite encryption system information flow
Pass getter to the key handler instead of already fetched node uuid.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2023-07-10 19:07:46 +03:00
Noel Georgi
3206db5289
feat: drop tpm simulator for ukify measure
We do not need a tpm simulator for ukify measure. We can pre-calculate
the values. This also means we can build ukify as a static binary.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-07-10 19:21:32 +05:30
Andrey Smirnov
bd4f89f633
fix: disable dashboard on Azure, GCP and Scaleway
Fixes #7416

These platforms don't have video console access.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-10 17:05:56 +04:00
Andrey Smirnov
bdb96189fa
refactor: make maintenance service controller-based
Fixes #7430

Introduce a set of resources which look similar to other API
implementations: CA, certs, cert SANs, etc.

Introduce a controller which manages the service based on resource
state.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-10 15:41:52 +04:00
Andrey Smirnov
d23d04de2a
feat: seed the kernel random pool from the TPM
Use the TPM2 feature to provide high-quality random bytes.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-07 23:51:11 +04:00
LukasAuerbeck
c81ce8cfb0
feat: support controlplane resources configuration
Fixes #7379

Add possibility to configure the controlplane static pod resources via
APIServer, ControllerManager and Scheduler configs.

Signed-off-by: LukasAuerbeck <17929465+LukasAuerbeck@users.noreply.github.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-07 22:44:56 +04:00
Andrey Smirnov
74de562b29
fix: mount hugepages with nosuid + nodev
Fixes #7445

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-07 21:57:19 +04:00
Artem Chernyshev
ce63abb219
feat: add KMS assisted encryption key handler
Talos now supports new type of encryption keys which rely on Sealing/Unsealing randomly generated bytes with a KMS server:

```
systemDiskEncryption:
  ephemeral:
    keys:
      - kms:
          endpoint: https://1.2.3.4:443
        slot: 0
```
gRPC API definitions and a simple reference implementation of the KMS server can be found in this
[repository](https://github.com/siderolabs/kms-client/blob/main/cmd/kms-server/main.go).

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2023-07-07 19:02:39 +03:00
Andrey Smirnov
dafbe9debd
chore: optimize dockerfile instructions
Use shell here-doc to unify multiple commands into a single layer to
have less layers created.

Use `--link` to pull in pkgs.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-07 17:54:53 +04:00
Andrey Smirnov
a4289e8703
chore: fix CLI docs generation stability
The problem first spotted by Artem, leads to spurious dirty checks.

The sort order was checking wrong (lowered) keys, so the order was
actually random.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-05 19:20:40 +04:00
Andrey Smirnov
2fec8388fc
chore: bump dependencies
Go modules, pkgs, Cilium CLI, CAPI base version.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-05 18:30:54 +04:00
Steve Francis
c1b4262dd6
docs: split simple and more complex getting started guides
Split the documents to provide easier version for the as the starting
guide.

Signed-off-by: Steve Francis <steve.francis@talos-systems.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-04 21:27:42 +04:00
Andrey Smirnov
c9a9f95611
refactor: extract secure boot certificate generation
Fixes #7412

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-03 16:55:02 +04:00
Andrey Smirnov
6be5a13d5d
feat: implement machine config documents for event and log streaming
Fixes #7228

Add some changes to make Talos accept partial machine configuration
without main v1alpha1 config.

With this change, it's possible to connect a machine already running
with machine configuration (v1alpha1), the following patch will connect
to a local SideroLink endpoint:

```yaml
apiVersion: v1alpha1
kind: SideroLinkConfig
apiUrl: grpc://172.20.0.1:4000/?jointoken=foo
---
apiVersion: v1alpha1
kind: KmsgLogConfig
name: apiSink
url: tcp://[fdae:41e4:649b:9303::1]:4001/
---
apiVersion: v1alpha1
kind: EventSinkConfig
endpoint: "[fdae:41e4:649b:9303::1]:8080"
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-01 00:22:44 +04:00
Andrey Smirnov
e241be85ba
fix: properly handle YAML comment stripping for multi-doc
Fixes #7425

The previously used method doesn't handle YAML multi-doc, incorrectly
stripping only the first document and throwing away everything else.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-06-30 15:59:06 +04:00
James Callahan
c02ada7d95
fix: capabilities including ALL should be uppercase
Pod security standard requires that ALL is in caps

Signed-off-by: James Callahan <james@wavesquid.com>
Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-06-29 12:58:30 +05:30
Noel Georgi
cbdf96d461
feat: support environment file for extensions
Supports setting `environmentFile` for Talos System Extension Services.

Fixes: #7316

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-06-28 00:21:13 +05:30
Andrey Smirnov
35d6adcb9a
fix: provide stashed META values before installation
Previously, if META values were supplied to the Talos ISO via
environment variable, they will be written down and available after the
install. With this fix, values are also readable and available before
the installation runs (in maintenance mode).

Most of the PR is refactoring `meta.Value(s)` to be a shared library
which is used by the installer/imager and (now) Talos.

Also fixes an issue with not returning properly `NotExist` error when
META is not yet available as a partition on disk.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-06-27 20:57:43 +04:00
Noel Georgi
258f074490
fix: ukify cert generation
Fix ukify cert generation by setting the proper Subj/CN.

Fixes: #7383

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-06-27 20:31:50 +05:30
Dennis Marttinen
bf3febb7e2
fix: refine OVMF search paths
The static paths for the OVMF firmware are limited, and won't match, for
example, any of the files installed by `edk2-ovmf` on a Fedora 38 system. This
change separates the search paths and filenames, making sure all combinations
are covered when looking for a suitable firmware. Similarly also cleans up the
OVMF vars lookup.

Signed-off-by: Dennis Marttinen <twelho@welho.tech>
Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-06-27 19:16:45 +05:30
Andrey Smirnov
fbebc17f8b
fix: disable LVM backups/archive
Fixes #3129

Talos does not have a good location to keep LVM metadata backups.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-06-27 17:28:18 +04:00
Noel Georgi
e5306ef263
chore: format and cleanup test scripts
This formats and cleanups the test scripts.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-06-27 16:53:40 +05:30
Noel Georgi
bc371ecfda
chore: add /sbin/shutdown
Some tools like qemu-guest-agent when ran as a extension service calls
`/sbin/shutdown` instead of `/sbin/poweroff`. This adds handling for the
same.

Ref: https://github.com/siderolabs/extensions/pull/173

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-06-27 16:10:51 +05:30
Utku Ozdemir
0d313b9733
feat: add reboot-mode flag to talosctl upgrade
Allow specifying the reboot mode during upgrades by introducing `--reboot-mode` flag, similar to the `--mode` flag of the reboot command.

Closes siderolabs/talos#7302.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2023-06-26 17:37:19 +02:00
Markus Reiter
7ce87f20c3
fix: compare only basename of os.Args[0] in machined
This makes handling of `exec` more flexible.

Signed-off-by: Markus Reiter <me@reitermark.us>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-06-26 17:42:30 +04:00
Tim Jones
53389b1e72
feat: auto-enroll secure boot keys
Uses the auto-enrollment feature of sd-boot to enroll required UEFI Secure
Boot keys.

Fixes: #7373

Signed-off-by: Tim Jones <tim.jones@siderolabs.com>
Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-06-24 00:44:56 +05:30
Victor Bajada
d77f0bc7bb
docs: fix broken link to powershell module
fix broken link to powershell module.

Signed-off-by: Victor Bajada <bajada.victor@gmail.com>
Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-06-23 21:07:13 +05:30
Noel Georgi
e1b150a110
release(v1.5.0-alpha.1): prepare release
This is the official v1.5.0-alpha.1 release.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-06-23 00:40:12 +05:30
Noel Georgi
8daf432b29
chore: bump deps
Bump deps.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-06-22 22:41:08 +05:30
Noel Georgi
e3f3f5794d
feat: implement revert for sd-boot
Implement revert for sd-boot.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-06-22 20:20:31 +05:30
Walt Chen
d8b0903d70
docs: vagrant setup document fix
Fix the incorrect yamlpath of controlplane.yaml to be applied to
controlplane node.

Signed-off-by: Walt Chen <godsarmycy@gmail.com>
2023-06-20 23:48:52 +05:30
Andrey Smirnov
fe0f46980f
feat: implement secure boot from disk
This includes sd-boot handling, EFI variables, etc.

There are some TODOs which need to be addressed to make things smooth.

Install to disk, upgrades work.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-06-16 20:15:16 +05:30
Dmitriy Matrenichev
445f5ad542
feat: support API server load balancer
This commit adds support for API load balancer. Quick way to enable it is during cluster creation using new `api-server-balancer-port` flag (0 by default - disabled). When enabled all API request will be routed across
cluster control plane endpoints.

Closes #7191

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-06-16 10:09:20 -04:00
Andrey Smirnov
19bc223de8
refactor: bootloader interface, labels
Move labels out of the bootloader interface, while moving copying assets
into the bootloader interface. GRUB is using one set of assets,
`sd-boot` will be using another one.

Fix the problem with `bootloader.Probe()` finding boot partition on the
host when it runs in a priv container, fixing issues with image creation
in the CI.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-06-14 17:33:11 +04:00
Dmitriy Matrenichev
665702ddd3
chore: fix cilium e2e tests
`WITH_CONFIG_PATCH_WORKER` check result was overriding any value set in `CONFIG_PATCH_FLAG` variable.
Move it to the different variable.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-06-14 15:08:31 +04:00
Noel Georgi
71a548d180
chore: generic boootloader implementation
This changes the bootloader code to be generic to support
multiple bootloader implementations.

Signed-off-by: Noel Georgi <git@frezbo.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-06-13 23:36:20 +04:00
Andrey Smirnov
e9dbc9311b
test: bump versions for upgrade tests
As we're getting to 1.5.0, bump versions for upgrade tests.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-06-13 23:17:22 +04:00
Andrey Smirnov
0a99965efb
refactor: replace uncordonNode with controllers
Fixes #7233

Waiting for node readiness now happens in the `MachineStatus` controller
which won't mark the node as ready until Kubernetes `Node` is ready.

Handling cordoning/uncordining happens with help of additional resource
in `NodeApplyController`.

New controller provides reactive `NodeStatus` resource to see current
status of Kubernetes `Node`.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-06-13 21:48:42 +04:00
Andrey Smirnov
e858bca3a2
test: fix cilium integration tests
Due to a bug (?) cilium tests don't clean up all the deployments & pods,
leaving one pod in 'Pending' state.

Kubernetes e2e tests check for !Running pods in `kube-system` namespace.

Fix by moving cilium tests to a separate namespace.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-06-13 20:41:15 +04:00
Andrey Smirnov
455328d058
fix: allow time skew for generated kubeconfig
The `kubeconfig` can be fetched from one Talos node, while Kubernetes
API request might land on `kube-apiserver` on a different node which
might have time slightly out of sync.

The minimum time diff between the two might lead to `Unauthorized` error
on first use:

```
1 authentication.go:70] "Unable to authenticate the request" err="[x509: certificate has expired or is not yet valid: current time 2023-06-13T15:30:51Z is before 2023-06-13T15:30:52Z, verifying certificate SN=314179687645609956480346926163236202072, SKID=, AKID=E9:9E:A8:1E:0B:6C:8B:AB:1B:2B:7E:17:14:CF:A4:0A:82:6B:42:67 failed: x509: certificate has expired or is not yet valid: current time 2023-06-13T15:30:51Z is before 2023-06-13T15:30:52Z]"
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-06-13 20:03:38 +04:00
Noel Georgi
3ae05648ae
fix: usage of custom kernels
This fixes usage of custom kernel images to copy over the modules info
list and the default set of modules shipped with Talos.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-06-12 22:14:36 +05:30
Andrey Smirnov
0797b0d168
chore: add a pipeline to test cloud-images step without a release
Also uncomment Azure uploader.

Add the Azure environment variables to the Makefile cloud-images step.

Change disk size and tier to 16GiB and tier: P3

Add boolean value to drone pipeline and the cloud images hack will check the value to determine which Azure Compute Gallery to push images to.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Signed-off-by: Christian Rolland <christian.rolland@siderolabs.com>
2023-06-12 19:42:27 +04:00
Utku Ozdemir
e5a36268b6
docs: include allowSchedulingOnControlPlanes on talosctl gen config output
Include a description and a commented-out example for the `cluster.allowSchedulingOnControlPlanes` field on `talosctl gen config ...` output.

Closes siderolabs/talos#7313.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2023-06-12 16:38:24 +02:00
Dmitriy Matrenichev
c74d937280
chore: bump github.com/cosi-project/runtime
Replace resource.Resource with meta.ResourceWithRD where possible.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-06-12 09:49:08 -04:00