3887 Commits

Author SHA1 Message Date
Tim Jones
061640cccf
feat: add pod ip to kube-proxy spec
Exposes the pod IP as the `POD_IP` environment variable via the downward
API in the kube-proxy pod for use in e.g. metrics-bind-addr.

Signed-off-by: Tim Jones <tim.jones@siderolabs.com>
2023-03-03 12:52:30 +01:00
Andrey Smirnov
dea17d7234
feat: update Kubernetes to v1.26.2
See https://github.com/kubernetes/kubernetes/releases/tag/v1.26.2

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-03-01 22:50:54 +04:00
Andrey Smirnov
337aaba7a7
feat: add 'os:operator' role
This introduces a new role for Talos API which fills the gap between
`os:reader` and `os:admin` roles.

Fixes #6898

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-03-01 16:12:25 +04:00
Andrey Smirnov
40e69af224
fix: improve etcd leave on reset process
When removing a member from `etcd`, the server does a pre-check to make
sure the member is connected to a quorum of other members, and the
remove request might fail. Add a retry to wait for the etcd to be fully
connected before giving up, as some parts of the reset flow already ran.

Also fix an issue which appears in the integration test, when `reset` is
called early in the boot sequence when local etcd hasn't started fully yet.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-03-01 14:51:49 +04:00
Dmitriy Matrenichev
638dc9128f
fix: fix "defer" leak in ResetUserDisks
Also, print error if we failed to close the device.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-02-28 21:51:37 +03:00
Dmitriy Matrenichev
bfba3677b0
chore: handle grub option - "wipe"
This PR ensures that we can handle a third GRUB option, "wipe". We will use it in 1.4.

For #6842

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-02-28 21:21:28 +03:00
Andrey Smirnov
594f27d878
release(v1.4.0-alpha.2): prepare release
This is the official v1.4.0-alpha.2 release.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-02-28 18:03:05 +04:00
Artem Chernyshev
b520710810
feat: introduce new flag in reset API that makes Talos reset user disks
Fixes: https://github.com/siderolabs/talos/issues/6815

Additionally, make it possible to run reset in maintenance mode, to
enable resetting the system disk and removing all traces of Talos
from it.

The new reset flow runs in a separate sequence, and the disk probe
lookup now checks the boot partition instead of the ephemeral one.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2023-02-28 15:10:41 +03:00
Utku Ozdemir
f55f5df739
feat: move dashboard package & run it in tty2
Move dashboard package into a common location where both Talos and talosctl can use it.

Add support for overriding stdin, stdout, stderr and ctty in the process runner.

Create a dashboard service which runs the dashboard on /dev/tty2.

Redirect kernel messages to tty1 and switch to tty2 after starting the dashboard on it.

Related to siderolabs/talos#6841, siderolabs/talos#4791.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2023-02-28 12:00:25 +01:00
Dmitriy Matrenichev
36e077ead4
chore: bump deps
- github.com/aws/aws-sdk-go to v1.44.209
- github.com/stretchr/testify to v1.8.2
- github.com/jsimonetti/rtnetlink to v1.3.1
- google.golang.org/genproto to v0.0.0-20230223222841-637eb2293923
- github.com/emicklei/dot to v1.3.1
- github.com/gdamore/tcell/v2 to v2.6.0
- github.com/insomniacslk/dhcp to v0.0.0-20230220063916-5369909a5de7
- github.com/opencontainers/runtime-spec to v1.1.0-rc.1.0.20230215090456-58ec43f9fc39
- github.com/rivo/tview to v0.0.0-20230226195229-47e7db7885b4

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-02-28 00:14:59 +03:00
Noel Georgi
5a01d5fd47
chore: run extension build as downstream
Run extensions build as downstream

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-02-27 20:11:10 +05:30
Noel Georgi
426fe9687d
fix: extension base folder permission
The `modules.dep` kernel module dependency tree extension root path was
previously created with a permission of `0o700`, which means the Talos
root got a permission of `0o700` when the kernel module tree was rebuilt
while extensions providing kernel modules were enabled. As a result,
any binaries lost the executable permission when run as non-root,
causing an `EACCES` error. Fix by making sure the temporary directory
created for building the kernel modules tree has `0o755` permission
explicitly.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-02-27 19:49:06 +05:30
Andrey Smirnov
609d3a8a69
feat: support strategic merge patches on VLAN configuration
Fixes #6884

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-02-27 14:03:11 +04:00
Andrey Smirnov
7e19f32d76
chore: provide version compatibility data for Talos 1.2.x
This provides Kubernetes version compatibility for Talos 1.2.x, so that
we have a unified source of data for Talos >= 1.2.x.

Also bump supported Kubernetes version for Talos 1.4.x to be 1.25-1.27,
as Talos 1.4 is expected to ship with Kubernetes 1.27.
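
This sketch shows how such compatibility data might be consulted. It uses a hypothetical table keyed by Talos `major.minor`. Only the Talos 1.4 row (Kubernetes 1.25 to 1.27) comes from this commit, and other rows are left out rather than guessed.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minorRange is an inclusive range of Kubernetes 1.x minor versions.
type minorRange struct{ min, max int }

// compat maps a Talos "major.minor" to supported Kubernetes minors. Only the
// Talos 1.4 entry comes from the commit message; the layout is hypothetical.
var compat = map[string]minorRange{
	"1.4": {min: 25, max: 27},
}

// parseVersion extracts major and minor from "v1.26.2" or "1.4.0-alpha.2".
func parseVersion(v string) (major, minor int, err error) {
	parts := strings.SplitN(strings.TrimPrefix(v, "v"), ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("malformed version %q", v)
	}
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, fmt.Errorf("malformed version %q: %w", v, err)
	}
	if minor, err = strconv.Atoi(parts[1]); err != nil {
		return 0, 0, fmt.Errorf("malformed version %q: %w", v, err)
	}
	return major, minor, nil
}

// supported reports whether a Kubernetes version is in range for a Talos version.
func supported(talosVersion, k8sVersion string) (bool, error) {
	tMajor, tMinor, err := parseVersion(talosVersion)
	if err != nil {
		return false, err
	}

	r, ok := compat[fmt.Sprintf("%d.%d", tMajor, tMinor)]
	if !ok {
		return false, fmt.Errorf("no compatibility data for Talos %d.%d", tMajor, tMinor)
	}

	kMajor, kMinor, err := parseVersion(k8sVersion)
	if err != nil {
		return false, err
	}

	return kMajor == 1 && kMinor >= r.min && kMinor <= r.max, nil
}

func main() {
	ok, err := supported("v1.4.0-alpha.2", "v1.26.2")
	fmt.Println(ok, err) // true <nil>
}
```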

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-02-23 20:48:11 +04:00
Andrey Smirnov
230e46e567
refactor: extract parts of kubernetes libraries
The shared code is going out to the
github.com/siderolabs/go-kubernetes library.

The code will be used in Talos and other projects using the same features.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-02-22 14:56:49 +04:00
Andrey Smirnov
f3d3f0f262
fix: update go-smbios library with Hyper-V data fix
See https://github.com/siderolabs/go-smbios/pull/15

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-02-21 18:32:27 +04:00
Dmitriy Matrenichev
8711eea962
fix: use passed --context in talosctl config cmd
Use context from command line flags. Also some minor fixes.
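
This sketch shows flag-over-config precedence. It uses a hypothetical, trimmed-down `talosConfig` type rather than the real client config package.

```go
package main

import (
	"errors"
	"fmt"
)

// talosConfig is a hypothetical, trimmed-down client config.
type talosConfig struct {
	Context  string              // context selected in the config file
	Contexts map[string]struct{} // names of defined contexts
}

// selectContext prefers the --context flag value over the config's current
// context, and checks that the chosen context is defined.
func selectContext(cfg talosConfig, flagContext string) (string, error) {
	name := cfg.Context
	if flagContext != "" {
		name = flagContext
	}

	if name == "" {
		return "", errors.New("no context selected")
	}

	if _, ok := cfg.Contexts[name]; !ok {
		return "", fmt.Errorf("context %q is not defined", name)
	}

	return name, nil
}

func main() {
	cfg := talosConfig{Context: "prod", Contexts: map[string]struct{}{"prod": {}, "dev": {}}}
	name, err := selectContext(cfg, "dev")
	fmt.Println(name, err) // dev <nil>
}
```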

Closes #6846

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-02-21 15:00:04 +03:00
Utku Ozdemir
5ac9f43e45
feat: start machined earlier & in maintenance mode
Load & start machined earlier, in the initialize sequence, so that its API can be used over its unix socket in maintenance mode.

Additionally, do not return features from the Version API if a config is not yet available.

Related to siderolabs/talos#4791.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2023-02-21 12:21:36 +01:00
Andrey Smirnov
36ab414a1d
docs: fix the endpoints in the libvirt guide
See #6864

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-02-21 15:00:05 +04:00
Dmitriy Matrenichev
3d55bd80f4
fix: add --force flag to talosctl gen config
Only overwrite existing files if explicitly demanded.

Closes #6847

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-02-20 23:44:00 +03:00
Serge Logvinov
660b8874da
feat: cmdline integer netmask
The netmask on the kernel command line can now be set as a number.

Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-02-20 20:55:56 +04:00
Noel Georgi
1e3daacc48
docs: update nvidia component versions
Update NVIDIA component versions.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-02-17 20:03:17 +05:30
Andrey Smirnov
b5c03a7fab
fix: docker talosctl cluster create provisioner
Recent Docker versions seem to have changed how the API reports container
IP addresses.

Also fix running Talos 1.3 image under talosctl 1.4.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-02-17 16:04:30 +04:00
Andrey Smirnov
6e8f13529c
fix: add support for a fallback '*' mirror configuration
Talos has always supported this, but the CRI config lacked support for it.

Now, with recent containerd, the new `_default` host is used as a
fallback, so this re-enables the support and updates the docs.
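
This sketches the mapping, in which the Talos `*` mirror key is written out as containerd's `_default` host. It assumes containerd's `/etc/containerd/certs.d` hosts directory layout (the path Talos uses may differ) and a hypothetical `hostsDir` helper.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// hostsDir returns the containerd hosts directory for a registry mirror key.
// The "*" catch-all maps to containerd's "_default" host, which containerd
// falls back to when no registry-specific directory exists.
func hostsDir(root, registry string) string {
	if registry == "*" {
		registry = "_default"
	}

	return filepath.Join(root, registry)
}

func main() {
	fmt.Println(hostsDir("/etc/containerd/certs.d", "*"))         // /etc/containerd/certs.d/_default
	fmt.Println(hostsDir("/etc/containerd/certs.d", "docker.io")) // /etc/containerd/certs.d/docker.io
}
```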

See https://github.com/containerd/containerd/pull/8065

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-02-16 23:12:13 +04:00
Artem Chernyshev
dcd4eb1a93
fix: improve error message on single node upgrade
Fixes: https://github.com/siderolabs/talos/issues/6828

Propose a solution if the node upgrade fails.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2023-02-16 17:33:04 +03:00
Noel Georgi
ed5af3f780
chore: bump deps
Bump Go to 1.20.1
Bump containerd to 1.6.18
Bump kernel to 6.1.12
Bump go deps and enable renovate updates for markdown lint tools.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-02-16 19:08:57 +05:30
Dmitriy Matrenichev
0dc6858e5b
chore: bump cosi-project/runtime
And update all `ResourceDefinition` docs and type names. Drop unused functions and names.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-02-15 17:30:02 +03:00
Andrey Smirnov
da2edb9de0
chore: bump dependencies
CoreDNS: v1.10.1

And many other small bumps.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-02-15 17:29:15 +04:00
Andrey Smirnov
e51a110f0e
chore: bump dependencies
Go modules, container images.

Fixup for new COSI version: `ResourceDefinition` signature.

Update for new gRPC version: endpoints interface.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-02-15 15:26:55 +04:00
Noel Georgi
2d01480180
feat: automatically load modules based on hw info
Fixes: #6802

Automatically load kernel modules based on hardware info and module
alias info: udevd now automatically loads modules based on the hardware
information present.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-02-14 19:57:13 +05:30
Noel Georgi
7b75cd8b94
fix: kernel module dependency tree generation
This fixes the issue where the overlay mount target directory was used as
lowerdir for the mount, creating extra folders in the extension.

Fix the issue by adding support for normal overlay mounts to use a
source directory when specified.

Also fixes a small issue where a message was logged even when the error was nil.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-02-14 01:07:11 +05:30
Noel Georgi
65d02e5ade
fix: dbus shutdown when it's not initialized
If dbus was not started and shutdown is called, Talos panics. Fix by
checking if the mock is nil.
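
This sketch shows the nil guard. It uses hypothetical `dbusService`/`fakeBus` types in place of the real D-Bus mock.

```go
package main

import "fmt"

// fakeBus is a hypothetical stand-in for the D-Bus mock.
type fakeBus struct{ stopped bool }

// dbusService owns the mock; mock stays nil until Start is called.
type dbusService struct{ mock *fakeBus }

func (s *dbusService) Start() { s.mock = &fakeBus{} }

// Stop is safe even if Start never ran: without the nil check, writing to
// s.mock.stopped would dereference a nil pointer and panic.
func (s *dbusService) Stop() {
	if s.mock == nil {
		return
	}

	s.mock.stopped = true
}

func main() {
	var s dbusService
	s.Stop() // no panic: D-Bus was never started
	s.Start()
	s.Stop()
	fmt.Println(s.mock.stopped) // true
}
```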

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-02-13 21:12:54 +05:30
Andrey Smirnov
a7079ce85c
fix: quote the ampersand character in GRUB config
Not sure how I missed it in the first PR, but that's the only character
which was not quoted properly.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-02-13 18:58:34 +04:00
Andrey Smirnov
933ba2d820
fix: display correct blockdevice size
See https://github.com/siderolabs/go-blockdevice/pull/67

Fixes #6836

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-02-13 16:55:35 +04:00
Andrey Smirnov
c449cb736b
fix: talosctl reboot command passing mode in wait mode
The reboot mode was not passed correctly in wait mode.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-02-13 16:20:07 +04:00
budimanjojo
34ab0007a6
docs: port is needed for wireguard endpoint
The example of `wireguard.peers[].endpoint` was wrong: the endpoint needs a port.

Signed-off-by: budimanjojo <budimanjojo@gmail.com>
Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-02-13 12:10:17 +05:30
Andrey Smirnov
1e1aa84f6c
fix: kubernetes removed resource version check
Not all deprecated Kubernetes resources are the same: if the old API
version is deprecated but the new one is available, the API server handles
the transition for us. If a resource is removed completely, we need to
check for it. This reduces the number of items to check and simplifies the
check.
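
This sketches a check limited to completely removed resources. It uses a hypothetical table, and the caller supplies object counts per `group/version/resource`. The single sample entry is PodSecurityPolicy, which Kubernetes 1.25 removed without a replacement. Resources that merely moved to a newer API version are not listed, since the API server converts those.

```go
package main

import "fmt"

// removedAPIs lists, per target Kubernetes minor, resources that were removed
// outright (no replacement API). Sample data only.
var removedAPIs = map[string][]string{
	"1.25": {"policy/v1beta1/podsecuritypolicies"},
}

// checkRemoved returns an error if any completely removed resource still has
// objects in the cluster. inUse maps "group/version/resource" to a count.
func checkRemoved(target string, inUse map[string]int) error {
	var found []string

	for _, gvr := range removedAPIs[target] {
		if n := inUse[gvr]; n > 0 {
			found = append(found, fmt.Sprintf("%s (%d objects)", gvr, n))
		}
	}

	if len(found) > 0 {
		return fmt.Errorf("upgrade to %s blocked, resources removed in that version are in use: %v", target, found)
	}

	return nil
}

func main() {
	fmt.Println(checkRemoved("1.25", map[string]int{"policy/v1beta1/podsecuritypolicies": 2}))
}
```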

Move the check under the umbrella of the 'upgrade pre-checks', and make
it actually fatal.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-02-10 14:45:09 +04:00
Andrey Smirnov
dcbcf5a93c
fix: wait for network and retry in platform get config funcs
Wait for the network before trying to access the metadata service.

Retry the calls when appropriate (most platforms use `download.Download`
function which does proper retries).

Co-authored-by: Noel Georgi <git@frezbo.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-02-09 21:04:43 +04:00
Andrey Smirnov
3d7566ec74
test: update Canal CNI manifest URL
With recent changes to Calico website, old URL returns 404.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-02-08 23:20:56 +04:00
Andrey Smirnov
e09e106665
fix: default dns domain to 'cluster.local' in local case
One case was missing: when the network section is present but the value is
omitted.
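
This sketches the defaulting, covering both the missing section and the present-but-empty value. It uses hypothetical `clusterConfig`/`clusterNetwork` types.

```go
package main

import "fmt"

// clusterNetwork is a hypothetical, trimmed-down network config section.
type clusterNetwork struct{ DNSDomain string }

// clusterConfig holds an optional network section.
type clusterConfig struct{ Network *clusterNetwork }

// dnsDomain defaults to "cluster.local" both when the network section is
// absent and when it is present but the domain is empty.
func (c clusterConfig) dnsDomain() string {
	if c.Network == nil || c.Network.DNSDomain == "" {
		return "cluster.local"
	}

	return c.Network.DNSDomain
}

func main() {
	fmt.Println(clusterConfig{}.dnsDomain())                                                   // cluster.local
	fmt.Println(clusterConfig{Network: &clusterNetwork{}}.dnsDomain())                         // cluster.local
	fmt.Println(clusterConfig{Network: &clusterNetwork{DNSDomain: "example.internal"}}.dnsDomain()) // example.internal
}
```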

Fixes #6825

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-02-08 14:35:28 +04:00
Noel Georgi
cc6e37a47f
feat: use process wrapper for dropping capabilities
Use the process wrapper introduced in #6814 to drop capabilities. This change
also means the capabilities are dropped at the per-process level and not for
PID 1 (machined), which allows us to drop capabilities per process.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-02-07 00:49:56 +05:30
Steffen Windoffer
0c6c888745
fix: trackable action flag usage text. --no-wait does not exist
--wait gets set to true

Signed-off-by: Steffen Windoffer <steffen@wind0r.de>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-02-06 15:26:38 +04:00
Noel Georgi
5cb2915d8e
feat: use wrapper for starting processes
Use a wrapper for starting processes which can set up proper cgroups
and the OOM score, and also drop capabilities for the process; it then
calls `execve`.

The containerd tests are also fixed to support cgroups when
running tests in buildkit. It used to pass previously as we did not
error if cgroup setup failed.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-02-03 18:32:09 +05:30
Andrey Smirnov
56d9453261
fix: panic in talosctl cluster show
This might happen with the Docker provisioner if the network is not found.

Fixes #6793

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-02-03 14:45:52 +04:00
Andrey Smirnov
38a51191e4
fix: correctly expand parameters in the URL
This fixes multiple issues:

* `log.Fatalf` in the machined code leads to kernel panic
* return URL if some expansion fails
* correctly handle destroyed event (wait for the next one)
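
This sketches the first two points. It uses Go's `os.Expand` for `${var}` substitution (which also expands bare `$var`) and a caller-supplied value map with keys such as `uuid`. On an unknown variable, the function returns the original URL plus an error instead of calling `log.Fatalf`, which would terminate machined as PID 1 and panic the kernel. The `expandURL` name is hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// expandURL substitutes ${var} references using values. On an unknown
// variable it returns the original URL together with an error, leaving the
// caller to decide what to do instead of exiting the process.
func expandURL(raw string, values map[string]string) (string, error) {
	var missing []string

	expanded := os.Expand(raw, func(name string) string {
		v, ok := values[name]
		if !ok {
			missing = append(missing, name)
		}
		return v
	})

	if len(missing) > 0 {
		return raw, fmt.Errorf("unknown variables in URL: %v", missing)
	}

	return expanded, nil
}

func main() {
	fmt.Println(expandURL("https://example.com/config?u=${uuid}", map[string]string{"uuid": "1234"}))
	fmt.Println(expandURL("https://example.com/${nope}", nil))
}
```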

Fixes #6807

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-02-02 18:42:45 +04:00
Andrey Smirnov
af21860a22
fix: return proper error if download attempts time out
Fixes #6795

This fixes a problem with Talos getting stuck if the download attempts
time out: the returned context.Canceled error was triggering a
different flow which treats a sequence takeover as a special case, while
there is no other sequence to run.

The correct error is a timeout.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-02-02 18:19:04 +04:00
Andrey Smirnov
54f7d4c923
fix: correctly quote and unquote strings in GRUB config
One of the fields in the GRUB config - boot arguments - contains
user-controlled input. Talos supports variable expansion in
`talos.config` parameter, and uses `${var}` syntax.

In the GRUB config, `}` is a special character, and introducing a `}`
breaks config parsing for both GRUB and Talos.

Correctly escape and unescape special characters.
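
This sketches backslash quoting and unquoting. It assumes the escaped set is GRUB's documented metacharacters (`{ } | & $ ; < >`) plus the backslash and both quote characters. The set Talos actually escapes may differ, and `quote`/`unquote` are hypothetical names. Spaces are left alone because they separate kernel arguments.

```go
package main

import (
	"fmt"
	"strings"
)

// grubSpecial holds characters to backslash-escape (assumed set).
const grubSpecial = `\{}|&$;<>"'`

// quote prefixes every special character with a backslash.
func quote(s string) string {
	var b strings.Builder

	for _, r := range s {
		if strings.ContainsRune(grubSpecial, r) {
			b.WriteByte('\\')
		}
		b.WriteRune(r)
	}

	return b.String()
}

// unquote drops each escaping backslash and keeps the character after it.
func unquote(s string) string {
	var b strings.Builder

	escaped := false
	for _, r := range s {
		if r == '\\' && !escaped {
			escaped = true
			continue
		}
		escaped = false
		b.WriteRune(r)
	}

	return b.String()
}

func main() {
	args := "talos.config=https://example.com/c?u=${uuid}&x=1"
	q := quote(args)
	fmt.Println(q)                  // talos.config=https://example.com/c?u=\$\{uuid\}\&x=1
	fmt.Println(unquote(q) == args) // true
}
```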

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-02-02 17:11:22 +04:00
Andrey Smirnov
54cf0672a7
fix: omit zero MTU in the machine config
Fixes #6747

The setting `mtu: 0` always meant "don't touch MTU", but the presence of
such a line is very confusing.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-01-31 15:56:40 +04:00
Sander Maijers
bdc53ac254
docs: add hyperlink to Docker API docs about config.json
This reduces time needed to navigate docs.

Signed-off-by: Sander Maijers <3374183+sanmai-NL@users.noreply.github.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-01-30 23:10:03 +04:00
Andrey Smirnov
b3bc06dd14
chore: bump vtprotobuf to v0.4.0
Use the new generated equality check.

It's not used much in Talos; it appears almost only in the discovery
API client code.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-01-30 20:50:45 +04:00