docs: fork docs for v1.7

Time to start the v1.7 development cycle!

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
This commit is contained in:
Andrey Smirnov 2024-01-18 18:25:34 +04:00
parent 1c2d10cccc
commit fe24139f3c
No known key found for this signature in database
GPG Key ID: FE042E3D4085A811
159 changed files with 38769 additions and 4 deletions

View File

@ -1007,10 +1007,10 @@ RUN protoc \
/protos/time/*.proto
FROM scratch AS docs
COPY --from=docs-build /tmp/configuration/ /website/content/v1.6/reference/configuration/
COPY --from=docs-build /tmp/cli.md /website/content/v1.6/reference/
COPY --from=docs-build /tmp/schemas /website/content/v1.6/schemas/
COPY --from=proto-docs-build /tmp/api.md /website/content/v1.6/reference/
COPY --from=docs-build /tmp/configuration/ /website/content/v1.7/reference/configuration/
COPY --from=docs-build /tmp/cli.md /website/content/v1.7/reference/
COPY --from=docs-build /tmp/schemas /website/content/v1.7/schemas/
COPY --from=proto-docs-build /tmp/api.md /website/content/v1.7/reference/
# The talosctl-cni-bundle builds the CNI bundle for talosctl.

View File

@ -140,6 +140,10 @@ offlineSearch = false
# Enable syntax highlighting and copy buttons on code blocks with Prism
prism_syntax_highlighting = false
[[params.versions]]
url = "/v1.7/"
version = "v1.7 (pre-release)"
[[params.versions]]
url = "/v1.6/"
version = "v1.6 (latest)"

View File

@ -0,0 +1,56 @@
---
title: Welcome
no_list: true
linkTitle: "Documentation"
cascade:
  type: docs
lastRelease: v1.7.0-alpha.0
kubernetesRelease: "1.29.0"
prevKubernetesRelease: "1.28.3"
nvidiaContainerToolkitRelease: "v1.13.5"
nvidiaDriverRelease: "535.129.03"
preRelease: true
---
## Welcome
Welcome to the Talos documentation.
If you are just getting familiar with Talos, we recommend starting here:
- [What is Talos]({{< relref "introduction/what-is-talos" >}}): a quick description of Talos
- [Quickstart]({{< relref "introduction/quickstart" >}}): the fastest way to get a Talos cluster up and running
- [Getting Started]({{< relref "introduction/getting-started" >}}): a long-form, guided tour of getting a full Talos cluster deployed
## Open Source
### Community
- GitHub: [repo](https://github.com/siderolabs/talos)
- Support: Questions, bugs, feature requests [GitHub Discussions](https://github.com/siderolabs/talos/discussions)
- Community Slack: Join our [slack channel](https://slack.dev.talos-systems.io)
- Matrix: Join our Matrix channels:
- Community: [#talos:matrix.org](https://matrix.to/#/#talos:matrix.org)
- Community Support: [#talos-support:matrix.org](https://matrix.to/#/#talos-support:matrix.org)
- Forum: [community](https://groups.google.com/a/siderolabs.com/forum/#!forum/community)
- Twitter: [@SideroLabs](https://twitter.com/talossystems)
- Email: [info@SideroLabs.com](mailto:info@SideroLabs.com)
If you're interested in this project and would like to help in engineering efforts, or have general usage questions, we are happy to have you!
We hold a weekly meeting that all audiences are welcome to attend.
We would appreciate your feedback so that we can make Talos even better!
To do so, you can take our [survey](https://docs.google.com/forms/d/1TUna5YTYGCKot68Y9YN_CLobY6z9JzLVCq1G7DoyNjA/edit).
### Office Hours
- When: Mondays at 16:30 UTC.
- Where: [Google Meet](https://meet.google.com/day-pxhv-zky).
You can subscribe to this meeting by joining the community forum above.
## Enterprise
If you are using Talos in a production setting, and need consulting services to get started or to integrate Talos into your existing environment, we can help.
Sidero Labs, Inc. offers support contracts with SLA (Service Level Agreement)-bound terms for mission-critical environments.
[Learn More](https://www.siderolabs.com/support/)

View File

@ -0,0 +1,4 @@
---
title: "Advanced Guides"
weight: 60
---

View File

@ -0,0 +1,108 @@
---
title: "Advanced Networking"
description: "How to configure advanced networking options on Talos Linux."
aliases:
- ../guides/advanced-networking
---
## Static Addressing
Static addressing consists of specifying `addresses`, `routes` (remember to add your default gateway), and the `interface`.
Most likely you'll also want to define the `nameservers` so you have properly functioning DNS.
```yaml
machine:
  network:
    hostname: talos
    nameservers:
      - 10.0.0.1
    interfaces:
      - interface: eth0
        addresses:
          - 10.0.0.201/8
        mtu: 8765
        routes:
          - network: 0.0.0.0/0
            gateway: 10.0.0.1
      - interface: eth1
        ignore: true
  time:
    servers:
      - time.cloudflare.com
```
## Additional Addresses for an Interface
In some environments you may need to set additional addresses on an interface.
In the following example, we set two additional addresses on the loopback interface.
```yaml
machine:
  network:
    interfaces:
      - interface: lo
        addresses:
          - 192.168.0.21/24
          - 10.2.2.2/24
```
## Bonding
The following example shows how to create a bonded interface.
```yaml
machine:
  network:
    interfaces:
      - interface: bond0
        dhcp: true
        bond:
          mode: 802.3ad
          lacpRate: fast
          xmitHashPolicy: layer3+4
          miimon: 100
          updelay: 200
          downdelay: 200
          interfaces:
            - eth0
            - eth1
```
## Setting Up a Bridge
The following example shows how to set up a bridge between two interfaces with an assigned static address.
```yaml
machine:
  network:
    interfaces:
      - interface: br0
        addresses:
          - 192.168.0.42/24
        bridge:
          stp:
            enabled: true
          interfaces:
            - eth0
            - eth1
```
## VLANs
To set up VLANs on a specific device, use an array of VLANs to add.
The master device may be configured without addressing by setting `dhcp` to `false`.
```yaml
machine:
  network:
    interfaces:
      - interface: eth0
        dhcp: false
        vlans:
          - vlanId: 100
            addresses:
              - "192.168.2.10/28"
            routes:
              - network: 0.0.0.0/0
                gateway: 192.168.2.1
```

View File

@ -0,0 +1,164 @@
---
title: "Air-gapped Environments"
description: "Setting up Talos Linux to work in environments with no internet access."
aliases:
- ../guides/air-gapped
---
In this guide we will create a Talos cluster running in an air-gapped environment with all the required images being pulled from an internal registry.
We will use the [QEMU]({{< relref "../talos-guides/install/local-platforms/qemu" >}}) provisioner available in `talosctl` to create a local cluster, but the same approach could be used to deploy Talos in bigger air-gapped networks.
## Requirements
The following are requirements for this guide:
- Docker 18.03 or greater
- Requirements for the Talos [QEMU]({{< relref "../talos-guides/install/local-platforms/qemu" >}}) cluster
## Identifying Images
In air-gapped environments, access to the public Internet is restricted, so Talos can't pull images from public Docker registries (`docker.io`, `ghcr.io`, etc.)
We need to identify the images required to install and run Talos.
The same strategy can be used for images required by custom workloads running on the cluster.
The `talosctl image default` command provides a list of default images used by the Talos cluster (with default configuration
settings).
To print the list of images, run:
```bash
talosctl image default
```
This list contains images required by a default deployment of Talos.
There might be additional images required for the workloads running on this cluster, and those should be added to this list.
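For example, one way to track the combined list is to write it to a file and append the workload images; the image names below are hypothetical, and the pull/tag/push loops shown later can be adapted to read from this file instead of calling `talosctl image default` directly:

```bash
# start from the default Talos image list
talosctl image default > images.txt
# append extra workload images (hypothetical examples)
echo "docker.io/library/nginx:1.25.3" >> images.txt
echo "quay.io/prometheus/node-exporter:v1.7.0" >> images.txt
```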
## Preparing the Internal Registry
As access to the public registries is restricted, we have to run an internal Docker registry.
In this guide, we will launch the registry on the same machine using Docker:
```bash
$ docker run -d -p 6000:5000 --restart always --name registry-airgapped registry:2
1bf09802bee1476bc463d972c686f90a64640d87dacce1ac8485585de69c91a5
```
This registry will be accepting connections on port 6000 on the host IPs.
The registry is empty by default, so we have to fill it with the images required by Talos.
First, we pull all the images to our local Docker daemon:
```bash
$ for image in `talosctl image default`; do docker pull $image; done
v0.15.1: Pulling from coreos/flannel
Digest: sha256:9a296fbb67790659adc3701e287adde3c59803b7fcefe354f1fc482840cdb3d9
...
```
All images are now stored in the Docker daemon store:
```bash
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
gcr.io/etcd-development/etcd v3.5.3 604d4f022632 6 days ago 181MB
ghcr.io/siderolabs/install-cni v1.0.0-2-gc5d3ab0 4729e54f794d 6 days ago 76MB
...
```
Now we need to re-tag them so that we can push them to our local registry.
We are going to replace the first component of the image name (before the first slash) with our registry endpoint `127.0.0.1:6000`:
```bash
$ for image in `talosctl image default`; do \
docker tag $image `echo $image | sed -E 's#^[^/]+/#127.0.0.1:6000/#'`; \
done
```
As the next step, we push images to the internal registry:
```bash
$ for image in `talosctl image default`; do \
docker push `echo $image | sed -E 's#^[^/]+/#127.0.0.1:6000/#'`; \
done
```
We can now verify that the images are pushed to the registry:
```bash
$ curl http://127.0.0.1:6000/v2/_catalog
{"repositories":["coredns/coredns","coreos/flannel","etcd-development/etcd","kube-apiserver","kube-controller-manager","kube-proxy","kube-scheduler","pause","siderolabs/install-cni","siderolabs/installer","siderolabs/kubelet"]}
```
> Note: images in the registry don't have the registry endpoint prefix anymore.
## Launching Talos in an Air-gapped Environment
For Talos to use the internal registry, we use the registry mirror feature to redirect all image pull requests to the internal registry.
This means that the registry endpoint (as the first component of the image reference) gets ignored, and all pull requests are sent directly to the specified endpoint.
We are going to use a QEMU-based Talos cluster for this guide, but the same approach works with Docker-based clusters as well.
As QEMU-based clusters go through the Talos install process, they better model a real air-gapped environment.
Identify all registry prefixes from `talosctl image default`, for example:
- `docker.io`
- `gcr.io`
- `ghcr.io`
- `registry.k8s.io`
The `talosctl cluster create` command provides conveniences for common configuration options.
The only required flag for this guide is `--registry-mirror <registry>=http://10.5.0.1:6000`, which redirects every pull request to the internal registry; this flag
needs to be repeated for each of the registry prefixes identified above.
The endpoint used is `10.5.0.1`, as this is the default bridge interface address which will be routable from the QEMU VMs (`127.0.0.1` would point to the VM itself).
```bash
$ sudo --preserve-env=HOME talosctl cluster create --provisioner=qemu --install-image=ghcr.io/siderolabs/installer:{{< release >}} \
--registry-mirror docker.io=http://10.5.0.1:6000 \
--registry-mirror gcr.io=http://10.5.0.1:6000 \
--registry-mirror ghcr.io=http://10.5.0.1:6000 \
  --registry-mirror registry.k8s.io=http://10.5.0.1:6000
validating CIDR and reserving IPs
generating PKI and tokens
creating state directory in "/home/user/.talos/clusters/talos-default"
creating network talos-default
creating load balancer
creating dhcpd
creating master nodes
creating worker nodes
waiting for API
...
```
> Note: `--install-image` should match the image which was copied into the internal registry in the previous step.
You can verify that the cluster is air-gapped by inspecting the registry logs: `docker logs -f registry-airgapped`.
## Closing Notes
Running in an air-gapped environment might require additional configuration changes, for example using custom settings for DNS and NTP servers.
When scaling this guide to a bare-metal environment, the following Talos config snippet could be used as an equivalent of the `--registry-mirror` flag above:
```yaml
machine:
  ...
  registries:
    mirrors:
      docker.io:
        endpoints:
          - http://10.5.0.1:6000/
      gcr.io:
        endpoints:
          - http://10.5.0.1:6000/
      ghcr.io:
        endpoints:
          - http://10.5.0.1:6000/
      registry.k8s.io:
        endpoints:
          - http://10.5.0.1:6000/
  ...
```
Other registry implementations can be used in place of the Docker `registry` image used above to run the registry.
If required, auth can be configured for the internal registry (and custom TLS certificates, if needed).
Please see the [pull-through cache guide]({{< relref "../talos-guides/configuration/pull-through-cache" >}}) for an example of using the Harbor container registry with Talos.

View File

@ -0,0 +1,98 @@
---
title: "Building Custom Talos Images"
description: "How to build a custom Talos image from source."
---
There might be several reasons to build Talos images from source:
* verifying the [image integrity]({{< relref "verifying-images" >}})
* building an image with custom configuration
## Checkout Talos Source
```bash
git clone https://github.com/siderolabs/talos.git
```
If building for a specific release, check out the corresponding tag:
```bash
git checkout {{< release >}}
```
## Set up the Build Environment
See [Developing Talos]({{< relref "developing-talos" >}}) for details on setting up the buildkit builder.
## Architectures
By default, Talos builds for `linux/amd64`, but you can customize that by passing the `PLATFORM` variable to `make`:
```bash
make <target> PLATFORM=linux/arm64 # build for arm64 only
make <target> PLATFORM=linux/arm64,linux/amd64 # build for arm64 and amd64, container images will be multi-arch
```
## Customizations
Some of the build parameters can be customized by passing environment variables to `make`, e.g. `GOAMD64=v1` can be used to build
Talos images compatible with old AMD64 CPUs:
```bash
make <target> GOAMD64=v1
```
## Building Kernel and Initramfs
The most basic boot assets can be built with:
```bash
make kernel initramfs
```
Build result will be stored as `_out/vmlinuz-<arch>` and `_out/initramfs-<arch>.xz`.
## Building Container Images
Talos container images should be pushed to the registry as the result of the build process.
The default settings are:
* `IMAGE_REGISTRY` is set to `ghcr.io`
* `USERNAME` is set to `siderolabs` (or the value of the `USERNAME` environment variable, if set)
The image can be pushed to any registry you have access to, but the access credentials should be stored in the `~/.docker/config.json` file (e.g. with `docker login`).
Building and pushing the image can be done with:
```bash
make installer PUSH=true IMAGE_REGISTRY=docker.io USERNAME=<username> # ghcr.io/siderolabs/installer
make imager PUSH=true IMAGE_REGISTRY=docker.io USERNAME=<username> # ghcr.io/siderolabs/imager
```
## Building ISO
The ISO image is built with the help of the `imager` container image; by default, `ghcr.io/siderolabs/imager` with the matching tag will be used:
```bash
make iso
```
The ISO image will be stored as `_out/talos-<arch>.iso`.
If the ISO image should be built with a custom `imager` image, it can be specified with the `IMAGE_REGISTRY`/`USERNAME` variables:
```bash
make iso IMAGE_REGISTRY=docker.io USERNAME=<username>
```
## Building Disk Images
The disk image is built with the help of the `imager` container image; by default, `ghcr.io/siderolabs/imager` with the matching tag will be used:
```bash
make image-metal
```
Available disk images are encoded in the `image-%` target, e.g. `make image-aws`.
As with the ISO image, a custom `imager` image can be specified with the `IMAGE_REGISTRY`/`USERNAME` variables.
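For example, building the AWS disk image with a custom `imager` pushed to your own registry (same variables as above):

```bash
make image-aws IMAGE_REGISTRY=docker.io USERNAME=<username>
```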

View File

@ -0,0 +1,54 @@
---
title: "Customizing the Kernel"
description: "Guide on how to customize the kernel used by Talos Linux."
aliases:
- ../guides/customizing-the-kernel
---
The installer image contains [`ONBUILD`](https://docs.docker.com/engine/reference/builder/#onbuild) instructions that handle the following:
- the decompression and unpacking of `initramfs.xz`
- the unsquashing of the rootfs
- the copying of new rootfs files
- the squashing of the new rootfs
- the packing and compression of the new `initramfs.xz`
When used as a base image, the installer will perform the above steps automatically with the requirement that a `customization` stage be defined in the `Dockerfile`.
Build and push your own kernel:
```sh
git clone https://github.com/talos-systems/pkgs.git
cd pkgs
make kernel-menuconfig USERNAME=_your_github_user_name_
docker login ghcr.io --username _your_github_user_name_
make kernel USERNAME=_your_github_user_name_ PUSH=true
```
Using a multi-stage `Dockerfile` we can define the `customization` stage and build `FROM` the installer image:
```docker
FROM scratch AS customization
# this is needed so that Talos copies base kernel modules info and default modules shipped with Talos
COPY --from=<custom kernel image> /lib/modules /kernel/lib/modules
# this copies over the custom modules
COPY --from=<custom kernel image> /lib/modules /lib/modules
FROM ghcr.io/siderolabs/installer:latest
COPY --from=<custom kernel image> /boot/vmlinuz /usr/install/${TARGETARCH}/vmlinuz
```
When building the image, the `customization` stage will automatically be copied into the rootfs.
The `customization` stage is not limited to a single `COPY` instruction.
In fact, you can do whatever you would like in this stage, but keep in mind that everything in `/` will be copied into the rootfs.
To build the image, run:
```bash
DOCKER_BUILDKIT=0 docker build --build-arg RM="/lib/modules" -t installer:kernel .
```
> Note: buildkit has a bug ([#816](https://github.com/moby/buildkit/issues/816)); to disable it, use `DOCKER_BUILDKIT=0`.
Now that we have a custom installer, we can build Talos for the specific platform we wish to deploy to.

View File

@ -0,0 +1,63 @@
---
title: "Customizing the Root Filesystem"
description: "How to add your own content to the immutable root file system of Talos Linux."
aliases:
- ../guides/customizing-the-root-filesystem
---
The installer image contains [`ONBUILD`](https://docs.docker.com/engine/reference/builder/#onbuild) instructions that handle the following:
- the decompression and unpacking of `initramfs.xz`
- the unsquashing of the rootfs
- the copying of new rootfs files
- the squashing of the new rootfs
- the packing and compression of the new `initramfs.xz`
When used as a base image, the installer will perform the above steps automatically with the requirement that a `customization` stage be defined in the `Dockerfile`.
For example, say we have an image that contains the contents of a library we wish to add to the Talos rootfs.
We need to define a stage with the name `customization`:
```docker
FROM scratch AS customization
COPY --from=<name|index> <src> <dest>
```
Using a multi-stage `Dockerfile` we can define the `customization` stage and build `FROM` the installer image:
```docker
FROM scratch AS customization
COPY --from=<name|index> <src> <dest>
FROM ghcr.io/siderolabs/installer:latest
```
When building the image, the `customization` stage will automatically be copied into the rootfs.
The `customization` stage is not limited to a single `COPY` instruction.
In fact, you can do whatever you would like in this stage, but keep in mind that everything in `/` will be copied into the rootfs.
> Note: `<dest>` is the path, relative to the rootfs, into which you wish to place the contents of `<src>`.
To build the image, run:
```bash
docker build --squash -t <organization>/installer:latest .
```
In the case that you need to perform some cleanup _before_ adding additional files to the rootfs, you can specify the `RM` [build-time variable](https://docs.docker.com/engine/reference/commandline/build/#set-build-time-variables---build-arg):
```bash
docker build --squash --build-arg RM="[<path> ...]" -t <organization>/installer:latest .
```
This will perform a `rm -rf` on the specified paths relative to the rootfs.
> Note: `RM` must be a whitespace delimited list.
The resulting image can be used to:
- generate an image for any of the supported providers
- perform bare-metal installs
- perform upgrades
We will step through common customizations in the remainder of this section.

View File

@ -0,0 +1,334 @@
---
title: "Developing Talos"
description: "Learn how to set up a development environment for local testing and hacking on Talos itself!"
aliases:
- ../learn-more/developing-talos
---
This guide outlines steps and tricks for developing the Talos operating system and related components.
The guide assumes a Linux operating system on the development host.
Some steps might work under macOS, but using Linux is highly advised.
## Prepare
Check out the [Talos repository](https://github.com/siderolabs/talos).
Try running `make help` to see available `make` commands.
You will need Docker and `buildx` installed on the host.
> Note: Usually it is better to install an up-to-date Docker from the Docker apt repositories, e.g. [Ubuntu instructions](https://docs.docker.com/engine/install/ubuntu/).
>
> If `buildx` plugin is not available with OS docker packages, it can be installed [as a plugin from GitHub releases](https://docs.docker.com/buildx/working-with-buildx/#install).
Set up a builder with access to the host network:
```bash
docker buildx create --driver docker-container --driver-opt network=host --name local1 --buildkitd-flags '--allow-insecure-entitlement security.insecure' --use
```
> Note: `network=host` allows buildx builder to access host network, so that it can push to a local container registry (see below).
Make sure the following steps work:
- `make talosctl`
- `make initramfs kernel`
Set up a local docker registry:
```bash
docker run -d -p 5005:5000 \
--restart always \
--name local registry:2
```
Try to build and push an installer image to the local registry:
```bash
make installer IMAGE_REGISTRY=127.0.0.1:5005 PUSH=true
```
Record the image name output in the step above.
> Note: it is also possible to force a stable image tag by using `TAG` variable: `make installer IMAGE_REGISTRY=127.0.0.1:5005 TAG=v1.0.0-alpha.1 PUSH=true`.
## Running Talos cluster
Set up local caching Docker registries (this speeds up Talos cluster boot a lot); the script is in the Talos repo:
```bash
bash hack/start-registry-proxies.sh
```
Start your local cluster with:
```bash
sudo --preserve-env=HOME _out/talosctl-linux-amd64 cluster create \
--provisioner=qemu \
--cidr=172.20.0.0/24 \
--registry-mirror docker.io=http://172.20.0.1:5000 \
--registry-mirror registry.k8s.io=http://172.20.0.1:5001 \
--registry-mirror gcr.io=http://172.20.0.1:5003 \
--registry-mirror ghcr.io=http://172.20.0.1:5004 \
--registry-mirror 127.0.0.1:5005=http://172.20.0.1:5005 \
--install-image=127.0.0.1:5005/siderolabs/installer:<RECORDED HASH from the build step> \
--controlplanes 3 \
--workers 2 \
--with-bootloader=false
```
- `--provisioner` selects QEMU vs. default Docker
- custom `--cidr` to make QEMU cluster use different network than default Docker setup (optional)
- `--registry-mirror` uses the caching proxies set up above to speed up boot time a lot; the last entry adds your local registry (the installer image was pushed to it)
- `--install-image` is the image you built with `make installer` above
- `--controlplanes` & `--workers` configure the cluster size; choose to match your resources; 3 control planes give you an HA control plane; 1 control plane is enough; never use 2 control planes
- `--with-bootloader=false` disables boot from disk (Talos will always boot from `_out/vmlinuz-amd64` and `_out/initramfs-amd64.xz`).
This speeds up the development cycle a lot: there is no need to rebuild the installer and perform an install; rebooting is enough to get new code.
> Note: as the bootloader is not used, it's not necessary to rebuild the `installer` each time (an old image is fine), but sometimes it is needed (when configuration changes are made and the old installer doesn't validate the config).
>
> `talosctl cluster create` derives the Talos machine configuration version from the install image tag, so sometimes early in the development cycle (when the new minor tag is not released yet), the machine config version can be overridden with `--talos-version={{< version >}}`.
If the `--with-bootloader=false` flag is not used, picking up new code changes (in `initramfs`) requires a Talos upgrade (so a new `installer` should be built and pushed).
With the `--with-bootloader=false` flag, Talos always boots from the `initramfs` in the `_out/` directory, so a simple reboot is enough to pick up new code changes.
If the installation flow needs to be tested, `--with-bootloader=false` shouldn't be used; see the sketch below.
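For example, when testing the installation flow with the bootloader enabled, new code can be rolled out to a running node by rebuilding and pushing the installer and performing an upgrade; the image reference below is an assumption and should match what was pushed to the local registry in the build step:

```bash
# rebuild and push the installer with the new code
make installer IMAGE_REGISTRY=127.0.0.1:5005 PUSH=true
# upgrade a node to the freshly built installer image
talosctl -n 172.20.0.2 upgrade --image 127.0.0.1:5005/siderolabs/installer:<tag from the build step>
```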
## Console Logs
Watching console logs is easy with `tail`:
```bash
tail -F ~/.talos/clusters/talos-default/talos-default-*.log
```
## Interacting with Talos
Once `talosctl cluster create` finishes successfully, `talosconfig` and `kubeconfig` will be set up automatically to point to your cluster.
Start playing with `talosctl`:
```bash
talosctl -n 172.20.0.2 version
talosctl -n 172.20.0.3,172.20.0.4 dashboard
talosctl -n 172.20.0.4 get members
```
Same with `kubectl`:
```bash
kubectl get nodes -o wide
```
You can deploy some Kubernetes workloads to the cluster.
You can edit the machine config on the fly with `talosctl edit mc --immediate`; config patches can be applied via `--config-patch` flags; many features also have specific flags in `talosctl cluster create`.
## Quick Reboot
To reboot the whole cluster quickly (e.g. to pick up a change made in the code):
```bash
for socket in ~/.talos/clusters/talos-default/talos-default-*.monitor; do echo "q" | sudo socat - unix-connect:$socket; done
```
Sending `q` to a single socket reboots a single node, as shown below.
> Note: This command performs an immediate reboot (as if the machine was powered down and immediately powered back up); for a normal Talos reboot, use `talosctl reboot`.
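For example, to reboot only the first control plane node (the socket file name depends on the node name, so the name below is an assumption):

```bash
echo "q" | sudo socat - unix-connect:$HOME/.talos/clusters/talos-default/talos-default-controlplane-1.monitor
```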
## Development Cycle
Fast development cycle:
- bring up a cluster
- make code changes
- rebuild `initramfs` with `make initramfs`
- reboot a node to pick up the new `initramfs`
- verify code changes
- more code changes...
Some aspects of Talos development require enabling the bootloader (when working on the `installer` itself); in that case, the quick development cycle is no longer possible, and the cluster should be destroyed and recreated each time.
## Running Integration Tests
If integration tests were changed (or when running them for the first time), first rebuild the integration test binary:
```bash
rm -f _out/integration-test-linux-amd64; make _out/integration-test-linux-amd64
```
Running short tests against QEMU provisioned cluster:
```bash
_out/integration-test-linux-amd64 \
-talos.provisioner=qemu \
-test.v \
-talos.crashdump=false \
-test.short \
-talos.talosctlpath=$PWD/_out/talosctl-linux-amd64
```
The whole test suite can be run by removing the `-test.short` flag.
Specific tests can be run with `-test.run=TestIntegration/api.ResetSuite`.
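For example, to run only the reset API test suite against the QEMU cluster:

```bash
_out/integration-test-linux-amd64 \
    -talos.provisioner=qemu \
    -test.v \
    -talos.crashdump=false \
    -talos.talosctlpath=$PWD/_out/talosctl-linux-amd64 \
    -test.run=TestIntegration/api.ResetSuite
```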
## Build Flavors
`make <something> WITH_RACE=1` enables the Go race detector; Talos runs slower and uses more memory, but data races are detected.
`make <something> WITH_DEBUG=1` enables Go profiling and other debug features, useful for local development.
## Destroying Cluster
```bash
sudo --preserve-env=HOME ../talos/_out/talosctl-linux-amd64 cluster destroy --provisioner=qemu
```
This command stops QEMU and helper processes, tears down bridged network on the host, and cleans up
cluster state in `~/.talos/clusters`.
> Note: if the host machine is rebooted, QEMU instances and helper processes won't be restarted.
> In that case, it's required to clean up files in the `~/.talos/clusters/<cluster-name>` directory manually.
## Optional
Set up cross-build environment with:
```bash
docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
```
> Note: the static qemu binaries which come with Ubuntu 21.10 seem to be broken.
## Unit tests
Unit tests can be run in buildx with `make unit-tests`; on Ubuntu systems, some tests using `loop` devices will fail because Ubuntu uses low-index `loop` devices for snaps.
Most of the unit-tests can be run standalone as well, with regular `go test`, or using IDE integration:
```bash
go test -v ./internal/pkg/circular/
```
This provides a much faster feedback loop, but some tests require either elevated privileges (running as `root`) or additional binaries available only in the Talos `rootfs` (containerd tests).
Running tests as root can be done with the `-exec` flag of `go test`, but this is risky, as the test code has root access and can potentially make undesired changes:
```bash
go test -exec sudo -v ./internal/app/machined/pkg/controllers/network/...
```
## Go Profiling
Build `initramfs` with debug enabled: `make initramfs WITH_DEBUG=1`.
Launch Talos cluster with bootloader disabled, and use `go tool pprof` to capture the profile and show the output in your browser:
```bash
go tool pprof http://172.20.0.2:9982/debug/pprof/heap
```
The IP address `172.20.0.2` is the address of the Talos node, and port `:9982` depends on the Go application to profile:
- 9981: `apid`
- 9982: `machined`
- 9983: `trustd`
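For example, assuming the standard Go `net/http/pprof` endpoints are exposed on these ports, a 30-second CPU profile of `apid` could be captured with:

```bash
go tool pprof "http://172.20.0.2:9981/debug/pprof/profile?seconds=30"
```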
## Testing Air-gapped Environments
There is a hidden `talosctl debug air-gapped` command which launches two components:
- HTTP proxy capable of proxying HTTP and HTTPS requests
- HTTPS server with a self-signed certificate
The command also writes a Talos machine configuration patch that enables the HTTP proxy and adds the self-signed certificate
to the list of trusted certificates:
```shell
$ talosctl debug air-gapped --advertised-address 172.20.0.1
2022/08/04 16:43:14 writing config patch to air-gapped-patch.yaml
2022/08/04 16:43:14 starting HTTP proxy on :8002
2022/08/04 16:43:14 starting HTTPS server with self-signed cert on :8001
```
The `--advertised-address` should match the bridge IP of the Talos node.
The generated machine configuration patch looks like this:
```yaml
machine:
  files:
    - content: |
        -----BEGIN CERTIFICATE-----
        MIIBijCCAS+gAwIBAgIBATAKBggqhkjOPQQDAjAUMRIwEAYDVQQKEwlUZXN0IE9u
        bHkwHhcNMjIwODA0MTI0MzE0WhcNMjIwODA1MTI0MzE0WjAUMRIwEAYDVQQKEwlU
        ZXN0IE9ubHkwWTATBgcqhkjOPQIBBggqhkjOPQMBBwNCAAQfOJdaOFSOI1I+EeP1
        RlMpsDZJaXjFdoo5zYM5VYs3UkLyTAXAmdTi7JodydgLhty0pwLEWG4NUQAEvip6
        EmzTo3IwcDAOBgNVHQ8BAf8EBAMCBaAwHQYDVR0lBBYwFAYIKwYBBQUHAwEGCCsG
        AQUFBwMCMA8GA1UdEwEB/wQFMAMBAf8wHQYDVR0OBBYEFCwxL+BjG0pDwaH8QgKW
        Ex0J2mVXMA8GA1UdEQQIMAaHBKwUAAEwCgYIKoZIzj0EAwIDSQAwRgIhAJoW0z0D
        JwpjFcgCmj4zT1SbBFhRBUX64PHJpAE8J+LgAiEAvfozZG8Or6hL21+Xuf1x9oh4
        /4Hx3jozbSjgDyHOLk4=
        -----END CERTIFICATE-----
      permissions: 0o644
      path: /etc/ssl/certs/ca-certificates
      op: append
  env:
    http_proxy: http://172.20.0.1:8002
    https_proxy: http://172.20.0.1:8002
    no_proxy: 172.20.0.1/24
cluster:
  extraManifests:
    - https://172.20.0.1:8001/debug.yaml
```
The first section appends a self-signed certificate of the HTTPS server to the list of trusted certificates,
followed by the HTTP proxy setup (in-cluster traffic is excluded from the proxy).
The last section adds an extra Kubernetes manifest hosted on the HTTPS server.
The machine configuration patch can now be used to launch a test Talos cluster:
```shell
talosctl cluster create ... --config-patch @air-gapped-patch.yaml
```
The following lines should appear in the output of the `talosctl debug air-gapped` command:
- `CONNECT discovery.talos.dev:443`: the HTTP proxy is used to talk to the discovery service
- `http: TLS handshake error from 172.20.0.2:53512: remote error: tls: bad certificate`: an expected error on the Talos side, as the self-signed cert has not been written to the file yet
- `GET /debug.yaml`: Talos fetches the extra manifest successfully
There might be more output depending on whether registry caches are being used.
## Running Upgrade Integration Tests
Talos has a separate set of provision upgrade tests, which create a cluster on older versions of Talos, perform an upgrade,
and verify that the cluster is still functional.
Build the test binary:
```bash
rm -f _out/integration-test-provision-linux-amd64; make _out/integration-test-provision-linux-amd64
```
Prepare the test artifacts for the upgrade test:
```bash
make release-artifacts
```
Build and push an installer image for the development version of Talos:
```bash
make installer IMAGE_REGISTRY=127.0.0.1:5005 PUSH=true
```
Run the tests (the tests will create the cluster on the older version of Talos, perform an upgrade, and verify that the cluster is still functional):
```bash
sudo --preserve-env=HOME _out/integration-test-provision-linux-amd64 \
-test.v \
-talos.talosctlpath _out/talosctl-linux-amd64 \
-talos.provision.target-installer-registry=127.0.0.1:5005 \
-talos.provision.registry-mirror 127.0.0.1:5005=http://172.20.0.1:5005,docker.io=http://172.20.0.1:5000,registry.k8s.io=http://172.20.0.1:5001,quay.io=http://172.20.0.1:5002,gcr.io=http://172.20.0.1:5003,ghcr.io=http://172.20.0.1:5004 \
-talos.provision.cidr 172.20.0.0/24
```

View File

@ -0,0 +1,148 @@
---
title: "Disaster Recovery"
description: "Procedure for snapshotting etcd database and recovering from catastrophic control plane failure."
aliases:
- ../guides/disaster-recovery
---
The `etcd` database backs the Kubernetes control plane state, so if the `etcd` service is unavailable,
the Kubernetes control plane goes down, and the cluster is not recoverable until `etcd` is recovered.
`etcd` is built around the Raft consensus protocol, so highly-available control plane clusters can tolerate the loss of nodes as long as more than half of the members are running and reachable.
For a three control plane node Talos cluster, this means that the cluster tolerates a failure of any single node,
but losing more than one node at the same time leads to complete loss of service.
Because of that, it is important to take routine backups of `etcd` state to have a snapshot to recover the cluster from
in case of catastrophic failure.
## Backup
### Snapshotting `etcd` Database
Create a consistent snapshot of `etcd` database with `talosctl etcd snapshot` command:
```bash
$ talosctl -n <IP> etcd snapshot db.snapshot
etcd snapshot saved to "db.snapshot" (2015264 bytes)
snapshot info: hash c25fd181, revision 4193, total keys 1287, total size 3035136
```
> Note: filename `db.snapshot` is arbitrary.
This database snapshot can be taken on any healthy control plane node (with IP address `<IP>` in the example above),
as all `etcd` instances contain exactly the same data.
It is recommended to create `etcd` snapshots on a schedule to allow point-in-time recovery using the latest snapshot.
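A minimal sketch of such a scheduled backup, run from an operator machine (the node IP, backup directory, and retention policy are assumptions to adapt to your environment):

```bash
#!/usr/bin/env bash
# take a timestamped etcd snapshot from a healthy control plane node
mkdir -p "$HOME/talos-etcd-backups"
talosctl -n <IP> etcd snapshot "$HOME/talos-etcd-backups/etcd-$(date +%Y%m%d-%H%M%S).snapshot"
```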
### Disaster Database Snapshot
If the `etcd` cluster is not healthy (for example, if quorum has already been lost), the `talosctl etcd snapshot` command might fail.
In that case, copy the database snapshot directly from the control plane node:
```bash
talosctl -n <IP> cp /var/lib/etcd/member/snap/db .
```
This snapshot might not be fully consistent (if the `etcd` process is running), but it allows
for disaster recovery when the latest regular snapshot is not available.
### Machine Configuration
The machine configuration might be required to recover a node after a hardware failure.
Backup Talos node machine configuration with the command:
```bash
talosctl -n IP get mc v1alpha1 -o yaml | yq eval '.spec' -
```
## Recovery
Before starting a disaster recovery procedure, make sure that the `etcd` cluster can't be recovered:
* get `etcd` cluster member list on all healthy control plane nodes with `talosctl -n IP etcd members` command and compare across all members.
* query `etcd` health across control plane nodes with `talosctl -n IP service etcd`.
If the quorum can be restored, restoring it might be a better strategy than performing the full disaster recovery
procedure.
### Latest Etcd Snapshot
Get hold of the latest `etcd` database snapshot.
If a snapshot is not fresh enough, create a database snapshot (see above), even if the `etcd` cluster is unhealthy.
### Init Node
Make sure that there are no control plane nodes with machine type `init`:
```bash
$ talosctl -n <IP1>,<IP2>,... get machinetype
NODE NAMESPACE TYPE ID VERSION TYPE
172.20.0.2 config MachineType machine-type 2 controlplane
172.20.0.4 config MachineType machine-type 2 controlplane
172.20.0.3 config MachineType machine-type 2 controlplane
```
The `init` node type is deprecated and is incompatible with the `etcd` recovery procedure.
An `init` node can be converted to the `controlplane` type with the `talosctl edit mc --mode=staged` command, followed
by a node reboot with the `talosctl reboot` command.
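For example (node IP assumed):

```bash
talosctl -n <IP> edit mc --mode=staged   # change machine.type from init to controlplane
talosctl -n <IP> reboot
```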
### Preparing Control Plane Nodes
If some control plane nodes experienced hardware failure, replace them with new nodes.
Use the machine configuration backup to re-create the nodes with the same secret material and control plane settings
to allow workers to join the recovered control plane.
If a control plane node is up but `etcd` isn't, wipe the node's [EPHEMERAL]({{< relref "../learn-more/architecture/#file-system-partitions" >}}) partition to remove the `etcd`
data directory (make sure a database snapshot is taken before doing this):
```bash
talosctl -n <IP> reset --graceful=false --reboot --system-labels-to-wipe=EPHEMERAL
```
At this point, all control plane nodes should boot up, and `etcd` service should be in the `Preparing` state.
The Kubernetes control plane endpoint should be pointed to the new control plane nodes if there were
changes to the node addresses.
### Recovering from the Backup
Make sure all `etcd` service instances are in `Preparing` state:
```bash
$ talosctl -n <IP> service etcd
NODE 172.20.0.2
ID etcd
STATE Preparing
HEALTH ?
EVENTS [Preparing]: Running pre state (17s ago)
[Waiting]: Waiting for service "cri" to be "up", time sync (18s ago)
[Waiting]: Waiting for service "cri" to be "up", service "networkd" to be "up", time sync (20s ago)
```
Execute the bootstrap command against any control plane node passing the path to the `etcd` database snapshot:
```bash
$ talosctl -n <IP> bootstrap --recover-from=./db.snapshot
recovering from snapshot "./db.snapshot": hash c25fd181, revision 4193, total keys 1287, total size 3035136
```
> Note: if database snapshot was copied out directly from the `etcd` data directory using `talosctl cp`,
> add flag `--recover-skip-hash-check` to skip integrity check on restore.
Talos node should print matching information in the kernel log:
```log
recovering etcd from snapshot: hash c25fd181, revision 4193, total keys 1287, total size 3035136
{"level":"info","msg":"restoring snapshot","path":"/var/lib/etcd.snapshot","wal-dir":"/var/lib/etcd/member/wal","data-dir":"/var/lib/etcd","snap-dir":"/var/li}
{"level":"info","msg":"restored last compact revision","meta-bucket-name":"meta","meta-bucket-name-key":"finishedCompactRev","restored-compact-revision":3360}
{"level":"info","msg":"added member","cluster-id":"a3390e43eb5274e2","local-member-id":"0","added-peer-id":"eb4f6f534361855e","added-peer-peer-urls":["https:/}
{"level":"info","msg":"restored snapshot","path":"/var/lib/etcd.snapshot","wal-dir":"/var/lib/etcd/member/wal","data-dir":"/var/lib/etcd","snap-dir":"/var/lib/etcd/member/snap"}
```
Now the `etcd` service should become healthy on the bootstrap node, the Kubernetes control plane components
should start, and the control plane endpoint should become available.
The remaining control plane nodes join the `etcd` cluster once the control plane endpoint is up.
## Single Control Plane Node Cluster
This guide applies to single control plane clusters as well.
In fact, it is even more important to take regular snapshots of the `etcd` database in the single control plane node
case, as the loss of the control plane node might render the whole cluster unrecoverable without a backup.

View File

@ -0,0 +1,76 @@
---
title: "etcd Maintenance"
description: "Operational instructions for etcd database."
---
`etcd` database backs Kubernetes control plane state, so `etcd` health is critical for Kubernetes availability.
## Space Quota
The `etcd` database space quota is set to 2 GiB by default.
If the database size exceeds the quota, `etcd` will stop operations until the issue is resolved.
This condition can be checked with `talosctl etcd alarm list` command:
```bash
$ talosctl -n <IP> etcd alarm list
NODE MEMBER ALARM
172.20.0.2 a49c021e76e707db NOSPACE
```
If the Kubernetes database contains lots of resources, the space quota can be increased to match the actual usage.
The recommended maximum size is 8 GiB.
To increase the space quota, edit the `etcd` section in the machine configuration:
```yaml
machine:
  etcd:
    extraArgs:
      quota-backend-bytes: 4294967296 # 4 GiB
```
Once the node is rebooted with the new configuration, use `talosctl etcd alarm disarm` to clear the `NOSPACE` alarm.
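For example, after the node comes back up with the increased quota:

```bash
talosctl -n <IP> etcd alarm disarm
```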
## Defragmentation
The `etcd` database can become fragmented over time if there are lots of writes and deletes.
The Kubernetes API server performs automatic compaction of the `etcd` database, which marks deleted space as free and ready to be reused.
However, the space is not actually freed until the database is defragmented.
If the database is heavily fragmented (the in-use/DB size ratio is less than 0.5), defragmentation might improve performance.
If the database runs over the space quota (see above), but the actual in-use database size is small, defragmentation is required to bring the on-disk database size below the limit.
The current database size can be checked with the `talosctl etcd status` command:
```bash
$ talosctl -n <CP1>,<CP2>,<CP3> etcd status
NODE MEMBER DB SIZE IN USE LEADER RAFT INDEX RAFT TERM RAFT APPLIED INDEX LEARNER ERRORS
172.20.0.3 ecebb05b59a776f1 21 MB 6.0 MB (29.08%) ecebb05b59a776f1 53391 4 53391 false
172.20.0.2 a49c021e76e707db 17 MB 4.5 MB (26.10%) ecebb05b59a776f1 53391 4 53391 false
172.20.0.4 eb47fb33e59bf0e2 20 MB 5.9 MB (28.96%) ecebb05b59a776f1 53391 4 53391 false
```
If any of the nodes are over the database size quota, alarms will be printed in the `ERRORS` column.
To defragment the database, run the `talosctl etcd defrag` command:
```bash
talosctl -n <CP1> etcd defrag
```
> Note: defragmentation is a resource-intensive operation, so it is recommended to run it on a single node at a time.
> Defragmentation of a live member blocks the system from reading and writing data while rebuilding its state.
Once the defragmentation is complete, the database size will closely match the in-use size:
```bash
$ talosctl -n <CP1> etcd status
NODE MEMBER DB SIZE IN USE LEADER RAFT INDEX RAFT TERM RAFT APPLIED INDEX LEARNER ERRORS
172.20.0.2 a49c021e76e707db 4.5 MB 4.5 MB (100.00%) ecebb05b59a776f1 56065 4 56065 false
```
## Snapshotting
Regular backups of `etcd` database should be performed to ensure that the cluster can be restored in case of a failure.
This procedure is described in the [disaster recovery]({{< relref "disaster-recovery" >}}) guide.

View File

@ -0,0 +1,198 @@
---
title: "Extension Services"
description: "Use extension services in Talos Linux."
aliases:
- ../learn-more/extension-services
---
Talos provides a way to run additional system services early in the Talos boot process.
Extension services should be included in the Talos root filesystem (e.g. using [system extensions]({{< relref "../talos-guides/configuration/system-extensions" >}})).
Extension services run as privileged containers with an ephemeral root filesystem located in the Talos root filesystem.
Extension services can be used to extend the core features of Talos in a way that is not possible via [static pods]({{< relref "../advanced/static-pods" >}}) or
Kubernetes DaemonSets.
Potential extension services use-cases:
* storage: Open iSCSI, software RAID, etc.
* networking: BGP FRR, etc.
* platform integration: VMWare open VM tools, etc.
## Configuration
On boot, Talos scans the directory `/usr/local/etc/containers` for `*.yaml` files describing the extension services to run.
The format of the extension service config is:
```yaml
name: hello-world
container:
  entrypoint: ./hello-world
  # an optional path to a file containing environment variables
  environmentFile: /var/etc/hello-world/env
  environment:
    - XDG_RUNTIME_DIR=/run
  args:
    - -f
  mounts:
    - # OCI Mount Spec
depends:
  - service: cri
  - path: /run/machined/machined.sock
  - network:
      - addresses
      - connectivity
      - hostname
      - etcfiles
  - time: true
restart: never|always|untilSuccess
```
### `name`
The `name` field sets the service name; valid names match `[a-z0-9-_]+`.
The service container root filesystem path is derived from the `name`: `/usr/local/lib/containers/<name>`.
The extension service will be registered as a Talos service under an `ext-<name>` identifier.
### `container`
* `entrypoint` defines the container entrypoint relative to the container root filesystem (`/usr/local/lib/containers/<name>`)
* `environmentFile` defines the path to a file containing environment variables; the service waits for the file to exist before starting
* `environment` defines the container environment variables; it overrides the variables from `environmentFile`
* `args` defines the additional arguments to pass to the entrypoint
* `mounts` defines the volumes to be mounted into the container root
#### `container.mounts`
The section `mounts` uses the standard OCI spec:
```yaml
- source: /var/log/audit
  destination: /var/log/audit
  type: bind
  options:
    - rshared
    - bind
    - ro
```
All requested directories will be mounted into the extension service container mount namespace.
If the `source` directory doesn't exist in the host filesystem, it will be created (only for writable paths in the Talos root filesystem).
#### `container.security`
The section `security` follows this example:
```yaml
maskedPaths:
  - "/should/be/masked"
readonlyPaths:
  - "/path/that/should/be/readonly"
  - "/another/readonly/path"
writeableRootfs: true
writeableSysfs: true
rootfsPropagation: shared
```
> * The rootfs is readonly by default unless `writeableRootfs: true` is set.
> * The sysfs is readonly by default unless `writeableSysfs: true` is set.
> * Masked paths if not set defaults to [containerd defaults](https://github.com/containerd/containerd/tree/main/oci/spec.go).
Masked paths will be mounted to `/dev/null`.
To set empty masked paths use:
>
> ```yaml
> container:
>   security:
>     maskedPaths: []
> ```
>
> * Read Only paths if not set defaults to [containerd defaults](https://github.com/containerd/containerd/tree/main/oci/spec.go).
Read-only paths will be mounted to `/dev/null`.
To set empty read only paths use:
>
> ```yaml
> container:
>   security:
>     readonlyPaths: []
> ```
>
> * Rootfs propagation is not set by default (container mounts are private).
### `depends`
The `depends` section describes extension service start dependencies: the service will not be started until all dependencies are met.
Available dependencies:
* `service: <name>`: wait for the service `<name>` to be running and healthy
* `path: <path>`: wait for the `<path>` to exist
* `network: [addresses, connectivity, hostname, etcfiles]`: wait for the specified network readiness checks to succeed
* `time: true`: wait for the NTP time sync
### `restart`
The `restart` field defines the service restart policy; it allows configuring either an always-running service or a one-shot service:
* `always`: restart service always
* `never`: start service only once and never restart
* `untilSuccess`: restart failing service, stop restarting on successful run
## Example
Example layout of the Talos root filesystem contents for the extension service:
```text
/
└── usr
    └── local
        ├── etc
        │   └── containers
        │       └── hello-world.yaml
        └── lib
            └── containers
                └── hello-world
                    ├── hello
                    └── config.ini
```
Talos discovers the extension service configuration in `/usr/local/etc/containers/hello-world.yaml`:
```yaml
name: hello-world
container:
  entrypoint: ./hello
  args:
    - --config
    - config.ini
depends:
  - network:
      - addresses
restart: always
```
Talos starts the container for the extension service with container root filesystem at `/usr/local/lib/containers/hello-world`:
```text
/
├── hello
└── config.ini
```
Extension service is registered as `ext-hello-world` in `talosctl services`:
```shell
$ talosctl service ext-hello-world
NODE 172.20.0.5
ID ext-hello-world
STATE Running
HEALTH ?
EVENTS [Running]: Started task ext-hello-world (PID 1100) for container ext-hello-world (2m47s ago)
[Preparing]: Creating service runner (2m47s ago)
[Preparing]: Running pre state (2m47s ago)
[Waiting]: Waiting for service "containerd" to be "up" (2m48s ago)
[Waiting]: Waiting for service "containerd" to be "up", network (2m49s ago)
```
An extension service can be started, restarted and stopped using `talosctl service ext-hello-world start|restart|stop`.
Use `talosctl logs ext-hello-world` to get the logs of the service.
A complete example of an extension service can be found in the [extensions repository](https://github.com/talos-systems/extensions/tree/main/examples/hello-world-service).

View File

@ -0,0 +1,83 @@
---
title: "Machine Configuration OAuth2 Authentication"
description: "How to authenticate Talos machine configuration download (`talos.config=`) on `metal` platform using OAuth."
---
Talos Linux, when running on the `metal` platform, can be configured to authenticate the machine configuration download using the OAuth2 device flow.
The machine configuration is fetched from the URL specified with the `talos.config` kernel argument, and by default this HTTP request is not authenticated.
When OAuth2 authentication is enabled, Talos will authenticate the request using the OAuth device flow first, and then pass the token to the machine configuration download endpoint.
## Prerequisites
Obtain the following information:
* OAuth client ID (mandatory)
* OAuth client secret (optional)
* OAuth device endpoint
* OAuth token endpoint
* OAuth scopes, audience (optional)
* extra Talos variables to send to the device auth endpoint (optional)
## Configuration
Set the following kernel parameters on the initial Talos boot to enable the OAuth flow:
* `talos.config` set to the URL of the machine configuration endpoint (which will be authenticated using OAuth)
* `talos.config.oauth.client_id` set to the OAuth client ID (required)
* `talos.config.oauth.client_secret` set to the OAuth client secret (optional)
* `talos.config.oauth.scope` set to the OAuth scopes (optional, repeat the parameter for multiple scopes)
* `talos.config.oauth.audience` set to the OAuth audience (optional)
* `talos.config.oauth.device_auth_url` set to the OAuth device endpoint (if not set defaults to `talos.config` URL with the path `/device/code`)
* `talos.config.oauth.token_url` set to the OAuth token endpoint (if not set defaults to `talos.config` URL with the path `/token`)
* `talos.config.oauth.extra_variable` set to the extra Talos variables to send to the device auth endpoint (optional, repeat the parameter for multiple variables)
The list of variables supported by the `talos.config.oauth.extra_variable` parameter is the same as the [list of variables]({{< relref "../reference/kernel#talosconfig" >}}) supported by the `talos.config` parameter.
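For example, a complete set of kernel arguments for an OAuth-protected machine configuration endpoint might look like this (the URL and client ID are placeholders):

```text
talos.platform=metal talos.config=https://example.com/config.yaml talos.config.oauth.client_id=<client-id> talos.config.oauth.extra_variable=uuid talos.config.oauth.extra_variable=mac
```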
## Flow
On the initial Talos boot, when machine configuration is not available, Talos will print the following messages:
```text
[talos] downloading config {"component": "controller-runtime", "controller": "config.AcquireController", "platform": "metal"}
[talos] waiting for network to be ready
[talos] [OAuth] starting the authentication device flow with the following settings:
[talos] [OAuth] - client ID: "<REDACTED>"
[talos] [OAuth] - device auth URL: "https://oauth2.googleapis.com/device/code"
[talos] [OAuth] - token URL: "https://oauth2.googleapis.com/token"
[talos] [OAuth] - extra variables: ["uuid" "mac"]
[talos] waiting for variables: [uuid mac]
[talos] waiting for variables: [mac]
[talos] [OAuth] please visit the URL https://www.google.com/device and enter the code <REDACTED>
[talos] [OAuth] waiting for the device to be authorized (expires at 14:46:55)...
```
If the OAuth service provides the complete verification URL, the QR code to scan is also printed to the console:
```text
[talos] [OAuth] or scan the following QR code:
█████████████████████████████████
█████████████████████████████████
████ ▄▄▄▄▄ ██▄▀▀ ▀█ ▄▄▄▄▄ ████
████ █ █ █▄ ▀▄██▄██ █ █ ████
████ █▄▄▄█ ██▀▄██▄ ▀█ █▄▄▄█ ████
████▄▄▄▄▄▄▄█ ▀ █ ▀ █▄█▄▄▄▄▄▄▄████
████ ▀ ▄▄ ▄█ ██▄█ ███▄█▀████
████▀█▄ ▄▄▀▄▄█▀█▄██ ▄▀▄██▄ ▄████
████▄██▀█▄▄▄███▀ ▀█▄▄ ██ █▄ ████
████▄▀▄▄▄ ▄███ ▄ ▀ ▀▀▄▀▄▀█▄ ▄████
████▄█████▄█ █ ██ ▀ ▄▄▄ █▀▀████
████ ▄▄▄▄▄ █ █ ▀█▄█▄ █▄█ █▄ ████
████ █ █ █▄ ▄▀ ▀█▀▄▄▄ ▀█▄████
████ █▄▄▄█ █ ██▄ ▀ ▀███ ▀█▀▄████
████▄▄▄▄▄▄▄█▄▄█▄██▄▄▄▄█▄███▄▄████
█████████████████████████████████
```
Once the authentication flow is complete on the OAuth provider side, Talos will print the following message:
```text
[talos] [OAuth] device authorized
[talos] fetching machine config from: "http://example.com/config.yaml"
[talos] machine config loaded successfully {"component": "controller-runtime", "controller": "config.AcquireController", "sources": ["metal"]}
```

View File

@ -0,0 +1,420 @@
---
title: "Metal Network Configuration"
description: "How to use `META`-based network configuration on Talos `metal` platform."
---
> Note: This is an advanced feature which requires deep understanding of Talos and Linux network configuration.
Talos Linux, when running on a cloud platform (e.g. AWS or Azure), uses the platform-provided metadata server to provide initial network configuration to the node.
When running on bare metal, there is no metadata server, so there are several options to provide initial network configuration (before the machine configuration is acquired):
- use automatic network configuration via DHCP (Talos default)
- use initial boot [kernel command line parameters]({{< relref "../reference/kernel" >}}) to configure networking
- use automatic network configuration via DHCP just enough to fetch the machine configuration, and then use the machine configuration to set the desired advanced configuration
If the DHCP option is available, it is by far the easiest way to configure networking.
The initial boot kernel command line parameters are not very flexible, and they are not persisted after the initial Talos installation.
Starting with version 1.4.0, Talos offers a new option to configure networking on bare metal: `META`-based network configuration.
> Note: `META`-based network configuration is only available on the Talos Linux `metal` platform.
The Talos [dashboard]({{< relref "../talos-guides/interactive-dashboard" >}}) provides a way to configure `META`-based network configuration for a machine using the console, but
it doesn't support all kinds of network configuration.
## Network Configuration Format
Talos `META`-based network configuration is a YAML file with the following format:
```yaml
addresses:
  - address: 147.75.61.43/31
    linkName: bond0
    family: inet4
    scope: global
    flags: permanent
    layer: platform
  - address: 2604:1380:45f2:6c00::1/127
    linkName: bond0
    family: inet6
    scope: global
    flags: permanent
    layer: platform
  - address: 10.68.182.1/31
    linkName: bond0
    family: inet4
    scope: global
    flags: permanent
    layer: platform
links:
  - name: eth0
    up: true
    masterName: bond0
    slaveIndex: 0
    layer: platform
  - name: eth1
    up: true
    masterName: bond0
    slaveIndex: 1
    layer: platform
  - name: bond0
    logical: true
    up: true
    mtu: 0
    kind: bond
    type: ether
    bondMaster:
      mode: 802.3ad
      xmitHashPolicy: layer3+4
      lacpRate: slow
      arpValidate: none
      arpAllTargets: any
      primaryReselect: always
      failOverMac: 0
      miimon: 100
      updelay: 200
      downdelay: 200
      resendIgmp: 1
      lpInterval: 1
      packetsPerSlave: 1
      numPeerNotif: 1
      tlbLogicalLb: 1
      adActorSysPrio: 65535
    layer: platform
routes:
  - family: inet4
    gateway: 147.75.61.42
    outLinkName: bond0
    table: main
    priority: 1024
    scope: global
    type: unicast
    protocol: static
    layer: platform
  - family: inet6
    gateway: '2604:1380:45f2:6c00::'
    outLinkName: bond0
    table: main
    priority: 2048
    scope: global
    type: unicast
    protocol: static
    layer: platform
  - family: inet4
    dst: 10.0.0.0/8
    gateway: 10.68.182.0
    outLinkName: bond0
    table: main
    scope: global
    type: unicast
    protocol: static
    layer: platform
hostnames:
  - hostname: ci-blue-worker-amd64-2
    layer: platform
resolvers: []
timeServers: []
```
Every section is optional, so you can configure only the parts you need.
The format of each section matches the respective network [`*Spec` resource]({{< relref "../learn-more/networking-resources" >}}) `.spec` part, e.g. the `addresses:`
section matches the `.spec` of the `AddressSpec` resource:
```yaml
# talosctl get addressspecs bond0/10.68.182.1/31 -o yaml | yq .spec
address: 10.68.182.1/31
linkName: bond0
family: inet4
scope: global
flags: permanent
layer: platform
```
So one way to prepare the network configuration file is to boot Talos Linux, apply the necessary network configuration using the Talos machine configuration, and grab the resulting
resources from the running Talos instance.
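For example, the relevant `*Spec` resources could be dumped from a configured node with commands like the following (the resource type names are assumed to match the network `*Spec` resources):

```bash
talosctl -n <IP> get addressspecs -o yaml
talosctl -n <IP> get linkspecs -o yaml
talosctl -n <IP> get routespecs -o yaml
talosctl -n <IP> get hostnamespecs -o yaml
```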
In this guide we will briefly cover the most common examples of the network configuration.
### Addresses
The addresses configured are usually routable IP addresses assigned to the machine, so
the `scope:` should be set to `global` and `flags:` to `permanent`.
Additionally, `family:` should be set to either `inet4` or `inet6` depending on the address family.
The `linkName:` property should match the name of the link the address is assigned to; it might be a physical link,
e.g. `en9sp0`, or the name of a logical link, e.g. `bond0`, created in the `links:` section.
Example, IPv4 address:
```yaml
addresses:
  - address: 147.75.61.43/31
    linkName: bond0
    family: inet4
    scope: global
    flags: permanent
    layer: platform
```
Example, IPv6 address:
```yaml
addresses:
- address: 2604:1380:45f2:6c00::1/127
linkName: bond0
family: inet6
scope: global
flags: permanent
layer: platform
```
### Links
For physical network interfaces (links), the most usual configuration is to bring the link up:
```yaml
links:
- name: en9sp0
up: true
layer: platform
```
This will bring the link up, and it will also disable Talos auto-configuration (disables running DHCP on the link).
Another common case is to set a custom MTU:
```yaml
links:
- name: en9sp0
up: true
mtu: 9000
layer: platform
```
The order of the links in the `links:` section is not important.
#### Bonds
For bonded links, there should be a link resource for the bond itself, and a link resource for each enslaved link:
```yaml
links:
- name: bond0
logical: true
up: true
kind: bond
type: ether
bondMaster:
mode: 802.3ad
xmitHashPolicy: layer3+4
lacpRate: slow
arpValidate: none
arpAllTargets: any
primaryReselect: always
failOverMac: 0
miimon: 100
updelay: 200
downdelay: 200
resendIgmp: 1
lpInterval: 1
packetsPerSlave: 1
numPeerNotif: 1
tlbLogicalLb: 1
adActorSysPrio: 65535
layer: platform
- name: eth0
up: true
masterName: bond0
slaveIndex: 0
layer: platform
- name: eth1
up: true
masterName: bond0
slaveIndex: 1
layer: platform
```
The name of the bond can be anything supported by the Linux kernel, but the following properties are important:
- `logical: true` - this is a logical link, not a physical one
- `kind: bond` - this is a bonded link
- `type: ether` - this is an Ethernet link
- `bondMaster:` - defines bond configuration, please see Linux documentation on the available options
For each enslaved link, the following properties are important:
- `masterName: bond0` - the name of the bond this link is enslaved to
- `slaveIndex: 0` - the index of the enslaved link, starting from 0, controls the order of bond slaves
#### VLANs
VLANs are logical links which have a parent link, and a VLAN ID and protocol:
```yaml
links:
- name: bond0.35
logical: true
up: true
kind: vlan
type: ether
parentName: bond0
vlan:
vlanID: 35
vlanProtocol: 802.1ad
```
The name of the VLAN link can be anything supported by the Linux kernel, but the following properties are important:
- `logical: true` - this is a logical link, not a physical one
- `kind: vlan` - this is a VLAN link
- `type: ether` - this is an Ethernet link
- `parentName: bond0` - the name of the parent link
- `vlan:` - defines VLAN configuration: `vlanID` and `vlanProtocol`
### Routes
For route configuration, most of the time `table: main`, `scope: global`, `type: unicast` and `protocol: static` are used.
The most important route fields are:
- `dst:` defines the destination network; if left empty, the route is the default route ("default gateway")
- `gateway:` defines the gateway address
- `priority:` defines the route priority (metric), lower values are preferred for the same `dst:` network
- `outLinkName:` defines the name of the link the route is associated with
- `src:` sets the source address for the route (optional)
Additionally, `family:` should be set to either `inet4` or `inet6` depending on the address family.
Example, IPv6 default gateway:
```yaml
routes:
- family: inet6
gateway: '2604:1380:45f2:6c00::'
outLinkName: bond0
table: main
priority: 2048
scope: global
type: unicast
protocol: static
layer: platform
```
Example, IPv4 route to `10/8` via `10.68.182.0` gateway:
```yaml
routes:
- family: inet4
dst: 10.0.0.0/8
gateway: 10.68.182.0
outLinkName: bond0
table: main
scope: global
type: unicast
protocol: static
layer: platform
```
### Hostnames
Even though the section supports multiple hostnames, only a single one should be used:
```yaml
hostnames:
- hostname: host
domainname: some.org
layer: platform
```
The `domainname:` is optional.
If the hostname is not set, Talos will use a default generated hostname.
### Resolvers
The `resolvers:` section is used to configure DNS resolvers; only a single entry should be used:
```yaml
resolvers:
- dnsServers:
- 8.8.8.8
- 1.1.1.1
layer: platform
```
If `dnsServers:` is not set, Talos will use the default DNS servers.
### Time Servers
The `timeServers:` section is used to configure NTP time servers; only a single entry should be used:
```yaml
timeServers:
- timeServers:
- 169.254.169.254
layer: platform
```
If `timeServers:` is not set, Talos will use the default NTP servers.
## Supplying `META` Network Configuration
Once the network configuration YAML document is ready, it can be supplied to Talos in one of the following ways:
- for a running Talos machine, using Talos API (requires already established network connectivity)
- for Talos disk images, it can be embedded into the image
- for ISO/PXE boot methods, it can be supplied via kernel command line parameters as an environment variable
The metal network configuration is stored in the Talos `META` partition under the key `0xa` (decimal 10).
In this guide we will assume that the prepared network configuration is stored in the file `network.yaml`.
> Note: as JSON is a subset of YAML, the network configuration can also be supplied as a JSON document.
### Supplying Network Configuration to a Running Talos Machine
Use `talosctl` to write the network configuration to a running Talos machine:
```bash
talosctl meta write 0xa "$(cat network.yaml)"
```
### Supplying Network Configuration to a Talos Disk Image
Following the [boot assets]({{< relref "../talos-guides/install/boot-assets" >}}) guide, create a disk image passing the network configuration as a `--meta` flag:
```bash
docker run --rm -t -v $PWD/_out:/out -v /dev:/dev --privileged ghcr.io/siderolabs/imager:{{< release >}} metal --meta "0xa=$(cat network.yaml)"
```
### Supplying Network Configuration to a Talos ISO/PXE Boot
As there is no `META` partition created yet before Talos Linux is installed, `META` values can be set as an environment variable `INSTALLER_META_BASE64` passed to the initial boot of Talos.
The supplied value will be used immediately, and it will also be written to the `META` partition once Talos is installed.
When using `imager` to create the ISO, the `INSTALLER_META_BASE64` environment variable will be automatically generated from the `--meta` flag:
```bash
$ docker run --rm -t -v $PWD/_out:/out ghcr.io/siderolabs/imager:{{< release >}} iso --meta "0xa=$(cat network.yaml)"
...
kernel command line: ... talos.environment=INSTALLER_META_BASE64=MHhhPWZvbw==
```
When PXE booting, the value of `INSTALLER_META_BASE64` should be set manually:
```bash
echo -n "0xa=$(cat network.yaml)" | base64
```
The resulting base64 string should be passed as an environment variable `INSTALLER_META_BASE64` to the initial boot of Talos: `talos.environment=INSTALLER_META_BASE64=<base64-encoded value>`.
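A minimal sketch of producing the full kernel argument in one step (assuming GNU `base64`, where `-w0` disables line wrapping):
```bash
META_B64=$(echo -n "0xa=$(cat network.yaml)" | base64 -w0)
echo "talos.environment=INSTALLER_META_BASE64=${META_B64}"
```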
### Getting Current `META` Network Configuration
Talos exports `META` keys as resources:
```yaml
# talosctl get meta 0x0a -o yaml
...
spec:
value: '{"addresses": ...}'
```
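The stored value can be extracted back out of the resource, for example (the node IP is illustrative, and `yq` is assumed to be available):
```bash
talosctl -n 172.20.0.2 get meta 0x0a -o yaml | yq .spec.value
```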

View File

@ -0,0 +1,150 @@
---
title: "Migrating from Kubeadm"
description: "Migrating Kubeadm-based clusters to Talos."
aliases:
- ../guides/migrating-from-kubeadm
---
It is possible to migrate a cluster that was created using
[kubeadm](https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/) to Talos.
The high-level steps are the following:
1. Collect CA certificates and a bootstrap token from a control plane node.
2. Create a Talos machine config using the CA certificates you collected.
3. Update control plane endpoint in the machine config to point to the existing control plane (i.e. your load balancer address).
4. Boot a new Talos machine and apply the machine config.
5. Verify that the new control plane node is ready.
6. Remove one of the old control plane nodes.
7. Repeat the same steps for all control plane nodes.
8. Verify that all control plane nodes are ready.
9. Repeat the same steps for all worker nodes, using the machine config generated for the workers.
## Remarks on kube-apiserver load balancer
While migrating to Talos, you need to make sure that your kube-apiserver load balancer is in place
and keeps pointing to the correct set of control plane nodes.
This process depends on your load balancer setup.
If you are using an LB that is external to the control plane nodes (e.g. cloud provider LB, F5 BIG-IP, etc.),
you need to make sure that you update the backend IPs of the load balancer to point to the control plane nodes as
you add Talos nodes and remove kubeadm-based ones.
If your load balancing is done on the control plane nodes (e.g. keepalived + haproxy on the control plane nodes),
you can do the following:
1. Add Talos nodes and remove kubeadm-based ones while updating the haproxy backends
to point to the newly added nodes except the last kubeadm-based control plane node.
2. Turn off keepalived to drop the virtual IP used by the kubeadm-based nodes (introduces kube-apiserver downtime).
3. Set up a virtual-IP based new load balancer on the new set of Talos control plane nodes.
Use the previous LB IP as the LB virtual IP.
4. Verify apiserver connectivity over the Talos-managed virtual IP.
5. Migrate the last control-plane node.
## Prerequisites
- Admin access to the kubeadm-based cluster
- Access to the `/etc/kubernetes/pki` directory (e.g. SSH & root permissions)
on the control plane nodes of the kubeadm-based cluster
- Access to kube-apiserver load-balancer configuration
## Step-by-step guide
1. Download `/etc/kubernetes/pki` directory from a control plane node of the kubeadm-based cluster.
2. Create a new join token for the new control plane nodes:
```bash
# inside a control plane node
kubeadm token create --ttl 0
```
3. Create Talos secrets from the PKI directory you downloaded on step 1 and the token you generated on step 2:
```bash
talosctl gen secrets --kubernetes-bootstrap-token <TOKEN> --from-kubernetes-pki <PKI_DIR>
```
4. Create a new Talos config from the secrets:
```bash
talosctl gen config --with-secrets secrets.yaml <CLUSTER_NAME> https://<EXISTING_CLUSTER_LB_IP>
```
5. Collect the information about the kubeadm-based cluster from the kubeadm configmap:
```bash
kubectl get configmap -n kube-system kubeadm-config -oyaml
```
Take note of the following information in the `ClusterConfiguration`:
- `.controlPlaneEndpoint`
- `.networking.dnsDomain`
- `.networking.podSubnet`
- `.networking.serviceSubnet`
6. Replace the following information in the generated `controlplane.yaml`:
- `.cluster.network.cni.name` with `none`
- `.cluster.network.podSubnets[0]` with the value of the `networking.podSubnet` from the previous step
- `.cluster.network.serviceSubnets[0]` with the value of the `networking.serviceSubnet` from the previous step
- `.cluster.network.dnsDomain` with the value of the `networking.dnsDomain` from the previous step
7. Go through the rest of `controlplane.yaml` and `worker.yaml` to customize them according to your needs, especially:
- `.cluster.secretboxEncryptionSecret` should be either removed if you don't currently use `EncryptionConfig` on your `kube-apiserver` or set to the correct value
8. Make sure that, on your current Kubeadm cluster, the first `--service-account-issuer=` parameter in `/etc/kubernetes/manifests/kube-apiserver.yaml` is equal to the value of `.cluster.controlPlane.endpoint` in `controlplane.yaml`.
If it's not, add a new `--service-account-issuer=` parameter with the correct value before your current one in `/etc/kubernetes/manifests/kube-apiserver.yaml` on all of your control plane nodes, and restart the kube-apiserver containers.
9. Bring up a Talos node to be the initial Talos control plane node.
10. Apply the generated `controlplane.yaml` to the Talos control plane node:
```bash
talosctl --nodes <TALOS_NODE_IP> apply-config --insecure --file controlplane.yaml
```
11. Wait until the new control plane node joins the cluster and is ready.
```bash
kubectl get node -owide --watch
```
12. Update your load balancer to point to the new control plane node.
13. Drain the old control plane node you are replacing:
```bash
kubectl drain <OLD_NODE> --delete-emptydir-data --force --ignore-daemonsets --timeout=10m
```
14. Remove the old control plane node from the cluster:
```bash
kubectl delete node <OLD_NODE>
```
15. Destroy the old node:
```bash
# inside the node
sudo kubeadm reset --force
```
16. Repeat the same steps, starting from step 7, for all control plane nodes.
17. Repeat the same steps, starting from step 7, for all worker nodes while applying the `worker.yaml` instead and skipping the LB step:
```bash
talosctl --nodes <TALOS_NODE_IP> apply-config --insecure --file worker.yaml
```
18. Your kubeadm `kube-proxy` configuration may not be compatible with the one generated by Talos, which will make Kubernetes upgrades through Talos impossible (labels may not be the same, and `selector.matchLabels` is an immutable field).
To be sure, export your current kube-proxy DaemonSet manifest and check the labels; they have to be:
```yaml
tier: node
k8s-app: kube-proxy
```
If they are not, modify all the label fields, save the file, delete your current kube-proxy DaemonSet, and apply the one you modified, as sketched below.
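A rough sketch of that flow (the commands and file name are illustrative, not a fixed procedure):
```bash
kubectl -n kube-system get daemonset kube-proxy -o yaml > kube-proxy.yaml
# edit kube-proxy.yaml so that metadata.labels, spec.selector.matchLabels and
# spec.template.metadata.labels all carry: tier: node, k8s-app: kube-proxy
kubectl -n kube-system delete daemonset kube-proxy
kubectl apply -f kube-proxy.yaml
```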

View File

@ -0,0 +1,65 @@
---
title: "Proprietary Kernel Modules"
description: "Adding a proprietary kernel module to Talos Linux"
aliases:
- ../guides/adding-a-proprietary-kernel-module
---
1. Patching and building the kernel image
1. Clone the `pkgs` repository from GitHub and check out the revision corresponding to your version of Talos Linux
```bash
git clone https://github.com/talos-systems/pkgs pkgs && cd pkgs
git checkout v0.8.0
```
2. Clone the Linux kernel and check out the revision that pkgs uses (this can be found in `kernel/kernel-prepare/pkg.yaml` and it will be something like the following: `https://cdn.kernel.org/pub/linux/kernel/v5.x/linux-x.xx.x.tar.xz`)
```bash
git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git && cd linux
git checkout v5.15
```
3. Your module will need to be converted to be in-tree.
The steps for this are different depending on the complexity of the module to port, but generally it would involve moving the module source code into the `drivers` tree and creating a new Makefile and Kconfig.
4. Stage your changes in Git with `git add -A`.
5. Run `git diff --cached --no-prefix > foobar.patch` to generate a patch from your changes.
6. Copy this patch to `kernel/kernel/patches` in the `pkgs` repo.
7. Add a `patch` line in the `prepare` segment of `kernel/kernel/pkg.yaml`:
```bash
patch -p0 < /pkg/patches/foobar.patch
```
8. Build the kernel image.
Make sure you are logged in to `ghcr.io` before running this command, and you can change or omit `PLATFORM` depending on what you want to target.
```bash
make kernel PLATFORM=linux/amd64 USERNAME=your-username PUSH=true
```
9. Make a note of the image name the `make` command outputs.
2. Building the installer image
1. Copy the following into a new `Dockerfile`:
```dockerfile
FROM scratch AS customization
COPY --from=ghcr.io/your-username/kernel:<kernel version> /lib/modules /lib/modules
FROM ghcr.io/siderolabs/installer:<talos version>
COPY --from=ghcr.io/your-username/kernel:<kernel version> /boot/vmlinuz /usr/install/${TARGETARCH}/vmlinuz
```
2. Run the following to build and push the installer:
```bash
INSTALLER_VERSION=<talos version>
IMAGE_NAME="ghcr.io/your-username/talos-installer:$INSTALLER_VERSION"
DOCKER_BUILDKIT=0 docker build --build-arg RM="/lib/modules" -t "$IMAGE_NAME" . && docker push "$IMAGE_NAME"
```
3. Deploying to your cluster
```bash
talosctl upgrade --image ghcr.io/your-username/talos-installer:<talos version> --preserve=true
```

View File

@ -0,0 +1,100 @@
---
title: "Static Pods"
description: "Using Talos Linux to set up static pods in Kubernetes."
aliases:
- ../guides/static-pods
---
## Static Pods
Static pods are run directly by the `kubelet`, bypassing the Kubernetes API server checks and validations.
Most of the time `DaemonSet` is a better alternative to static pods, but some workloads need to run
before the Kubernetes API server is available or might need to bypass security restrictions imposed by the API server.
See [Kubernetes documentation](https://kubernetes.io/docs/tasks/configure-pod-container/static-pod/) for more information on static pods.
## Configuration
Static pod definitions are specified in the Talos machine configuration:
```yaml
machine:
pods:
- apiVersion: v1
kind: Pod
metadata:
name: nginx
spec:
containers:
- name: nginx
image: nginx
```
Talos renders static pod definitions to the `kubelet` manifest directory (`/etc/kubernetes/manifests`), `kubelet` picks up the definition and launches the pod.
Talos accepts changes to the static pod configuration without a reboot.
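For example, a hedged sketch of adding the static pod above to a running node with a machine configuration patch (the node IP and the patch semantics for the `machine.pods` list are assumptions):
```bash
cat > nginx-static-pod.patch <<'EOF'
machine:
  pods:
    - apiVersion: v1
      kind: Pod
      metadata:
        name: nginx
      spec:
        containers:
          - name: nginx
            image: nginx
EOF
talosctl -n 172.20.0.2 patch machineconfig --patch @nginx-static-pod.patch
```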
## Usage
The kubelet mirrors the pod definition to the API server, so static pods can be inspected with `kubectl get pods`, logs can be retrieved with `kubectl logs`, etc.
```bash
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-talos-default-controlplane-2 1/1 Running 0 17s
```
If the API server is not available, the status of the static pod can also be inspected with `talosctl containers --kubernetes`:
```bash
$ talosctl containers --kubernetes
NODE NAMESPACE ID IMAGE PID STATUS
172.20.0.3 k8s.io default/nginx-talos-default-controlplane-2 registry.k8s.io/pause:3.6 4886 SANDBOX_READY
172.20.0.3 k8s.io └─ default/nginx-talos-default-controlplane-2:nginx:4183a7d7a771 docker.io/library/nginx:latest
...
```
Logs of static pods can be retrieved with `talosctl logs --kubernetes`:
```bash
$ talosctl logs --kubernetes default/nginx-talos-default-controlplane-2:nginx:4183a7d7a771
172.20.0.3: 2022-02-10T15:26:01.289208227Z stderr F 2022/02/10 15:26:01 [notice] 1#1: using the "epoll" event method
172.20.0.3: 2022-02-10T15:26:01.2892466Z stderr F 2022/02/10 15:26:01 [notice] 1#1: nginx/1.21.6
172.20.0.3: 2022-02-10T15:26:01.28925723Z stderr F 2022/02/10 15:26:01 [notice] 1#1: built by gcc 10.2.1 20210110 (Debian 10.2.1-6)
```
## Troubleshooting
Talos doesn't perform any validation on the static pod definitions.
If the pod isn't running, use `kubelet` logs (`talosctl logs kubelet`) to find the problem:
```bash
$ talosctl logs kubelet
172.20.0.2: {"ts":1644505520281.427,"caller":"config/file.go:187","msg":"Could not process manifest file","path":"/etc/kubernetes/manifests/talos-default-nginx-gvisor.yaml","err":"invalid pod: [spec.containers: Required value]"}
```
## Resource Definitions
Static pod definitions are available as `StaticPod` resources combined with Talos-generated control plane static pods:
```bash
$ talosctl get staticpods
NODE NAMESPACE TYPE ID VERSION
172.20.0.3 k8s StaticPod default-nginx 1
172.20.0.3 k8s StaticPod kube-apiserver 1
172.20.0.3 k8s StaticPod kube-controller-manager 1
172.20.0.3 k8s StaticPod kube-scheduler 1
```
Talos assigns ID `<namespace>-<name>` to the static pods specified in the machine configuration.
On control plane nodes, the status of the running static pods is available in the `StaticPodStatus` resource:
```bash
$ talosctl get staticpodstatus
NODE NAMESPACE TYPE ID VERSION READY
172.20.0.3 k8s StaticPodStatus default/nginx-talos-default-controlplane-2 2 True
172.20.0.3 k8s StaticPodStatus kube-system/kube-apiserver-talos-default-controlplane-2 2 True
172.20.0.3 k8s StaticPodStatus kube-system/kube-controller-manager-talos-default-controlplane-2 3 True
172.20.0.3 k8s StaticPodStatus kube-system/kube-scheduler-talos-default-controlplane-2 3 True
```

View File

@ -0,0 +1,157 @@
---
title: "Talos API access from Kubernetes"
description: "How to access Talos API from within Kubernetes."
aliases:
- ../guides/talos-api-access-from-k8s
---
In this guide, we will enable the Talos feature to access the Talos API from within Kubernetes.
## Enabling the Feature
Edit the machine configuration to enable the feature, specifying the Kubernetes namespaces from which Talos API
can be accessed and the allowed Talos API roles.
```bash
talosctl -n 172.20.0.2 edit machineconfig
```
Configure the `kubernetesTalosAPIAccess` like the following:
```yaml
spec:
machine:
features:
kubernetesTalosAPIAccess:
enabled: true
allowedRoles:
- os:reader
allowedKubernetesNamespaces:
- default
```
## Injecting Talos ServiceAccount into manifests
Create the following manifest file `deployment.yaml`:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: talos-api-access
spec:
selector:
matchLabels:
app: talos-api-access
template:
metadata:
labels:
app: talos-api-access
spec:
containers:
- name: talos-api-access
image: alpine:3
command:
- sh
- -c
- |
wget -O /usr/local/bin/talosctl https://github.com/siderolabs/talos/releases/download/<talos version>/talosctl-linux-amd64
chmod +x /usr/local/bin/talosctl
while true; talosctl -n 172.20.0.2 version; do sleep 1; done
```
**Note:** make sure that you replace the IP `172.20.0.2` with a valid Talos node IP.
Use the `talosctl inject serviceaccount` command to inject the Talos ServiceAccount into the manifest.
```bash
talosctl inject serviceaccount -f deployment.yaml > deployment-injected.yaml
```
Inspect the generated manifest:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
creationTimestamp: null
name: talos-api-access
spec:
selector:
matchLabels:
app: talos-api-access
strategy: {}
template:
metadata:
creationTimestamp: null
labels:
app: talos-api-access
spec:
containers:
- command:
- sh
- -c
- |
wget -O /usr/local/bin/talosctl https://github.com/siderolabs/talos/releases/download/<talos version>/talosctl-linux-amd64
chmod +x /usr/local/bin/talosctl
while true; talosctl -n 172.20.0.2 version; do sleep 1; done
image: alpine:3
name: talos-api-access
resources: {}
volumeMounts:
- mountPath: /var/run/secrets/talos.dev
name: talos-secrets
tolerations:
- operator: Exists
volumes:
- name: talos-secrets
secret:
secretName: talos-api-access-talos-secrets
status: {}
---
apiVersion: talos.dev/v1alpha1
kind: ServiceAccount
metadata:
name: talos-api-access-talos-secrets
spec:
roles:
- os:reader
---
```
As you can see, your deployment manifest is now injected with the Talos ServiceAccount.
## Testing API Access
Apply the new manifest into `default` namespace:
```bash
kubectl apply -n default -f deployment-injected.yaml
```
Follow the logs of the pods belonging to the deployment:
```bash
kubectl logs -n default -f -l app=talos-api-access
```
You'll see a repeating output similar to the following:
```text
Client:
Tag: <talos version>
SHA: ....
Built:
Go version: go1.18.4
OS/Arch: linux/amd64
Server:
NODE: 172.20.0.2
Tag: <talos version>
SHA: ...
Built:
Go version: go1.18.4
OS/Arch: linux/amd64
Enabled: RBAC
```
This means that the pod can talk to the Talos API of node 172.20.0.2 successfully.

View File

@ -0,0 +1,37 @@
---
title: "Verifying Images"
description: "Verifying Talos container image signatures."
---
Sidero Labs signs the container images generated for the Talos release with [cosign](https://docs.sigstore.dev/cosign/overview/):
* `ghcr.io/siderolabs/installer` (Talos installer)
* `ghcr.io/siderolabs/talos` (Talos image for container runtime)
* `ghcr.io/siderolabs/talosctl` (`talosctl` client packaged as a container image)
* `ghcr.io/siderolabs/imager` (Talos install image generator)
* all [system extension images](https://github.com/siderolabs/extensions/)
## Verifying Container Image Signatures
The `cosign` tool can be used to verify the signatures of the Talos container images:
```bash
$ cosign verify --certificate-identity-regexp '@siderolabs\.com$' --certificate-oidc-issuer https://accounts.google.com ghcr.io/siderolabs/installer:v1.4.0
Verification for ghcr.io/siderolabs/installer:v1.4.0 --
The following checks were performed on each of these signatures:
- The cosign claims were validated
- Existence of the claims in the transparency log was verified offline
- The code-signing certificate was verified using trusted certificate authority certificates
[{"critical":{"identity":{"docker-reference":"ghcr.io/siderolabs/installer"},"image":{"docker-manifest-digest":"sha256:f41795cc88f40eb1bc6b3c638c4a3123f6ef3c90627bfc35c04ebab82581e3ee"},"type":"cosign container image signature"},"optional":{"1.3.6.1.4.1.57264.1.1":"https://accounts.google.com","Bundle":{"SignedEntryTimestamp":"MEQCIERkQpgEnPWnfjUHIWO9QxC9Ute3/xJOc7TO5GUnu59xAiBKcFvrDWHoUYChT0/+gaazTrI+r0/GWSbi+Q+sEQ5AKA==","Payload":{"body":"eyJhcGlWZXJzaW9uIjoiMC4wLjEiLCJraW5kIjoiaGFzaGVkcmVrb3JkIiwic3BlYyI6eyJkYXRhIjp7Imhhc2giOnsiYWxnb3JpdGhtIjoic2hhMjU2IiwidmFsdWUiOiJkYjhjYWUyMDZmODE5MDlmZmI4NjE4ZjRkNjIzM2ZlYmM3NzY5MzliOGUxZmZkMTM1ODA4ZmZjNDgwNjYwNGExIn19LCJzaWduYXR1cmUiOnsiY29udGVudCI6Ik1FVUNJUURQWXhiVG5vSDhJTzBEakRGRE9rNU1HUjRjMXpWMys3YWFjczNHZ2J0TG1RSWdHczN4dVByWUgwQTAvM1BSZmZydDRYNS9nOUtzQVdwdG9JbE9wSDF0NllrPSIsInB1YmxpY0tleSI6eyJjb250ZW50IjoiTFMwdExTMUNSVWRKVGlCRFJWSlVTVVpKUTBGVVJTMHRMUzB0Q2sxSlNVTXhha05EUVd4NVowRjNTVUpCWjBsVlNIbEhaRTFQVEhkV09WbFFSbkJYUVRKb01qSjRVM1ZIZVZGM2QwTm5XVWxMYjFwSmVtb3dSVUYzVFhjS1RucEZWazFDVFVkQk1WVkZRMmhOVFdNeWJHNWpNMUoyWTIxVmRWcEhWakpOVWpSM1NFRlpSRlpSVVVSRmVGWjZZVmRrZW1SSE9YbGFVekZ3WW01U2JBcGpiVEZzV2tkc2FHUkhWWGRJYUdOT1RXcE5kMDVFUlRSTlZHZDZUbXBWTlZkb1kwNU5hazEzVGtSRk5FMVVaekJPYWxVMVYycEJRVTFHYTNkRmQxbElDa3R2V2tsNmFqQkRRVkZaU1V0dldrbDZhakJFUVZGalJGRm5RVVZaUVdKaVkwbDZUVzR3ZERBdlVEZHVUa0pNU0VscU1rbHlORTFQZGpoVVRrVjZUemNLUkVadVRXSldVbGc0TVdWdmExQnVZblJHTVZGMmRWQndTVm95VkV3NFFUUkdSMWw0YldFeGJFTk1kMkk0VEZOVWMzRlBRMEZZYzNkblowWXpUVUUwUndwQk1WVmtSSGRGUWk5M1VVVkJkMGxJWjBSQlZFSm5UbFpJVTFWRlJFUkJTMEpuWjNKQ1owVkdRbEZqUkVGNlFXUkNaMDVXU0ZFMFJVWm5VVlZqYWsweUNrbGpVa1lyTkhOVmRuRk5ia3hsU0ZGMVJIRkdRakZqZDBoM1dVUldVakJxUWtKbmQwWnZRVlV6T1ZCd2VqRlphMFZhWWpWeFRtcHdTMFpYYVhocE5Ga0tXa1E0ZDB0M1dVUldVakJTUVZGSUwwSkRSWGRJTkVWa1dWYzFhMk50VmpWTWJrNTBZVmhLZFdJeldrRmpNbXhyV2xoS2RtSkhSbWxqZVRWcVlqSXdkd3BMVVZsTFMzZFpRa0pCUjBSMmVrRkNRVkZSWW1GSVVqQmpTRTAyVEhrNWFGa3lUblprVnpVd1kzazFibUl5T1c1aVIxVjFXVEk1ZEUxRGMwZERhWE5IQ2tGUlVVSm5OemgzUVZGblJVaFJkMkpoU0ZJd1kwaE5Oa3g1T1doWk1rNTJaRmMxTUdONU5XNWlNamx1WWtkVmRWa3lPWFJOU1VkTFFtZHZja0puUlVVS1FXUmFOVUZuVVVOQ1NIZEZaV2RDTkVGSVdVRXpWREIzWVhOaVNFVlVTbXBIVWpSamJWZGpNMEZ4U2t0WWNtcGxVRXN6TDJnMGNIbG5Remh3TjI4MFFRcEJRVWRJYkdGbVp6Um5RVUZDUVUxQlVucENSa0ZwUVdKSE5tcDZiVUkyUkZCV1dUVXlWR1JhUmtzeGVUSkhZVk5wVW14c1IydHlSRlpRVXpsSmJGTktDblJSU1doQlR6WlZkbnBFYVVOYVFXOXZSU3RLZVdwaFpFdG5hV2xLT1RGS00yb3ZZek5CUTA5clJIcFhOamxaVUUxQmIwZERRM0ZIVTAwME9VSkJUVVFLUVRKblFVMUhWVU5OUVZCSlRUVjJVbVpIY0VGVWNqQTJVR1JDTURjeFpFOXlLMHhFSzFWQ04zbExUVWRMWW10a1UxTnJaMUp5U3l0bGNuZHdVREp6ZGdvd1NGRkdiM2h0WlRkM1NYaEJUM2htWkcxTWRIQnpjazFJZGs5cWFFSmFTMVoxVG14WmRXTkJaMVF4V1VWM1ZuZHNjR2QzYTFWUFdrWjRUemRrUnpONkNtVnZOWFJ3YVdoV1kyTndWMlozUFQwS0xTMHRMUzFGVGtRZ1EwVlNWRWxHU1VOQlZFVXRMUzB0TFFvPSJ9fX19","integratedTime":1681843022,"logIndex":18304044,"logID":"c0d23d6ad406973f9559f3ba2d1ca01f84147d8ffc5b8445c224f98b9591801d"}},"Issuer":"https://accounts.google.com","Subject":"andrey.smirnov@siderolabs.com"}}]
```
The image should be signed using the [cosign keyless flow](https://docs.sigstore.dev/cosign/keyless/) by a Sidero Labs employee with an email from the `siderolabs.com` domain.
## Reproducible Builds
Talos builds for `kernel`, `initramfs`, `talosctl`, ISO image, and container images are reproducible.
So you can verify that the build is the same as the one provided on the [GitHub releases page](https://github.com/siderolabs/talos/releases).
See [building Talos images]({{< relref "building-images" >}}) for more details.
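For example, comparing a locally built ISO against the released one could look like this (the paths are illustrative); a reproducible build should produce byte-for-byte identical artifacts, so the checksums must match:
```bash
sha256sum _out/metal-amd64.iso
sha256sum ~/Downloads/metal-amd64.iso
```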

View File

@ -0,0 +1,4 @@
---
title: "Introduction"
weight: 10
---

View File

@ -0,0 +1,308 @@
---
title: Getting Started
weight: 30
description: "A guide to setting up a Talos Linux cluster."
---
This document will walk you through installing a simple Talos Cluster with a single control plane node and one or more worker nodes, explaining some of the concepts.
> If this is your first use of Talos Linux, we recommend the [Quickstart]({{< relref "quickstart" >}}) first, to quickly create a local virtual cluster in containers on your workstation.
>
>For a production cluster, extra steps are needed - see [Production Notes]({{< relref "prodnotes" >}}).
Regardless of where you run Talos, the steps to create a Kubernetes cluster are:
- boot machines off the Talos Linux image
- define the endpoint for the Kubernetes API and generate your machine configurations
- configure Talos Linux by applying machine configurations to the machines
- configure `talosctl`
- bootstrap Kubernetes
## Prerequisites
### `talosctl`
`talosctl` is a CLI tool which interfaces with the Talos API.
Talos Linux has no SSH access: `talosctl` is the tool you use to interact with the operating system on the machines.
Install `talosctl` before continuing:
```bash
curl -sL https://talos.dev/install | sh
```
> Note: If you boot systems off the ISO, Talos on the ISO image runs in RAM and acts as an installer.
> The version of `talosctl` that is used to create the machine configurations controls the version of Talos Linux that is installed on the machines - NOT the image that the machines are initially booted off.
> For example, booting a machine off the Talos 1.3.7 ISO, but creating the initial configuration with `talosctl` binary of version 1.4.1, will result in a machine running Talos Linux version 1.4.1.
>
> It is advisable to use the same version of `talosctl` as the version of the boot media used.
### Network access
This guide assumes that the systems being installed have outgoing access to the internet, allowing them to pull installer and container images, query NTP, etc.
If needed, see the documentation on [registry proxies]({{< relref "../talos-guides/configuration/pull-through-cache" >}}), local registries, and [airgapped installation]({{< relref "../advanced/air-gapped" >}}).
## Acquire the Talos Linux image and boot machines
The most general way to install Talos Linux is to use the ISO image.
The latest ISO image can be found on the Github [Releases](https://github.com/siderolabs/talos/releases) page:
- X86: [https://github.com/siderolabs/talos/releases/download/{{< release >}}/metal-amd64.iso](https://github.com/siderolabs/talos/releases/download/{{< release >}}/metal-amd64.iso)
- ARM64: [https://github.com/siderolabs/talos/releases/download/{{< release >}}/metal-arm64.iso](https://github.com/siderolabs/talos/releases/download/{{< release >}}/metal-arm64.iso)
When booted from the ISO, Talos will run in RAM and will not install to disk until provided a configuration.
Thus, it is safe to boot any machine from the ISO.
At this point, you should:
- boot one machine off the ISO to be the control plane node
- boot one or more machines off the same ISO to be the workers
### Alternative Booting
For network booting and self-built media, see [Production Notes]({{< relref "prodnotes#alternative-booting" >}}).
There are also platform-specific installation methods, such as pre-built AMIs for AWS - check the relevant [Installation Guides]({{< relref "../talos-guides/install/" >}}).
## Define the Kubernetes Endpoint
In order to configure Kubernetes, Talos needs to know
what the endpoint of the Kubernetes API Server will be.
Because we are only creating a single control plane node in this guide, we can use the control plane node directly as the Kubernetes API endpoint.
Identify the IP address or DNS name of the control plane node that was booted above, and convert it to a fully-qualified HTTPS URL endpoint address for the Kubernetes API Server which (by default) runs on port 6443.
The endpoint should be formatted like:
- `https://192.168.0.2:6443`
- `https://kube.mycluster.mydomain.com:6443`
> NOTE: For a production cluster, you should have three control plane nodes, and have the endpoint allocate traffic to all three - see [Production Notes]({{< relref "prodnotes#control-plane-nodes" >}}).
## Accessing the Talos API
Administrative tasks are performed by calling the Talos API (usually with `talosctl`) on Talos Linux control plane nodes - thus, ensure your control
plane node is directly reachable on TCP port 50000 from the workstation where you run the `talosctl` client.
This may require changing firewall rules or cloud provider access-lists.
For production configurations, see [Production Notes]({{< relref "prodnotes#decide-the-kubernetes-endpoint" >}}).
## Configure Talos Linux
When Talos boots without a configuration, such as when booting off the Talos ISO, it
enters maintenance mode and waits for a configuration to be provided.
> A configuration can be passed in on boot via kernel parameters or metadata servers.
See [Production Notes]({{< relref "prodnotes#configure-talos" >}}).
Unlike traditional Linux, Talos Linux is *not* configured by SSHing to the server and issuing commands.
Instead, the entire state of the machine is defined by a `machine config` file which is passed to the server.
This allows machines to be managed in a declarative way, and lends itself to GitOps and modern operations paradigms.
The state of a machine is completely defined by, and can be reproduced from, the machine configuration file.
To generate the machine configurations for a cluster, run this command on the workstation where you installed `talosctl`:
```sh
talosctl gen config <cluster-name> <cluster-endpoint>
```
`cluster-name` is an arbitrary name, used as a label in your local client configuration.
It should be unique in the configuration on your local workstation.
`cluster-endpoint` is the Kubernetes Endpoint you constructed from the control plane node's IP address or DNS name above.
It should be a complete URL, with `https://`
and port.
For example:
```sh
$ talosctl gen config mycluster https://192.168.0.2:6443
generating PKI and tokens
created /Users/taloswork/controlplane.yaml
created /Users/taloswork/worker.yaml
created /Users/taloswork/talosconfig
```
When you run this command, three files are created in your current
directory:
- `controlplane.yaml`
- `worker.yaml`
- `talosconfig`
The `.yaml` files are Machine Configs: they describe everything from what disk Talos should be installed on, to network settings.
The `controlplane.yaml` file also describes how Talos should form a Kubernetes cluster.
The `talosconfig` file is your local client configuration file, used to connect to and authenticate access to the cluster.
### Controlplane and Worker
The two types of Machine Configs correspond to the two roles of Talos nodes, control plane nodes (which run both the Talos and Kubernetes control planes) and worker nodes (which run the workloads).
The main difference between Controlplane Machine Config files and Worker Machine Config files is that the former contains information about how to form the
Kubernetes cluster.
### Modifying the Machine configs
The generated Machine Configs have defaults that work for most cases.
They use DHCP for interface configuration, and install to `/dev/sda`.
Sometimes, you will need to modify the generated files to work with your systems.
A common case is needing to change the installation disk.
If you try to apply the machine config to a node and get an error like the one below, you need to specify a different installation disk:
```sh
$ talosctl apply-config --insecure -n 192.168.0.2 --file controlplane.yaml
error applying new configuration: rpc error: code = InvalidArgument desc = configuration validation failed: 1 error occurred:
* specified install disk does not exist: "/dev/sda"
```
You can verify which disks your nodes have by using the `talosctl disks --insecure` command.
> Insecure mode is needed at this point as the PKI infrastructure has not yet been set up.
For example, the `talosctl disks` command below shows that the system has a `vda` drive, not an `sda`:
```sh
$ talosctl -n 192.168.0.2 disks --insecure
DEV MODEL SERIAL TYPE UUID WWID MODALIAS NAME SIZE BUS_PATH
/dev/vda - - HDD - - virtio:d00000002v00001AF4 - 69 GB /pci0000:00/0000:00:06.0/virtio2/
```
In this case, you would modify the `controlplane.yaml` and `worker.yaml` files and edit the line:
```yaml
install:
disk: /dev/sda # The disk used for installations.
```
to reflect `vda` instead of `sda`.
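One possible way to make that edit non-interactively is with `yq` (v4 syntax assumed):
```sh
yq -i '.machine.install.disk = "/dev/vda"' controlplane.yaml
yq -i '.machine.install.disk = "/dev/vda"' worker.yaml
```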
> For information on customizing your machine configurations (such as to specify the version of Kubernetes), using [machine configuration patches]({{< relref "../talos-guides/configuration/patching" >}}), or customizing configurations for individual machines (such as setting static IP addresses), see the [Production Notes]({{< relref "prodnotes#customizing-machine-configuration" >}}).
## Understand talosctl, endpoints and nodes
It is important to understand the concept of `endpoints` and `nodes`.
In short: `endpoints` are where `talosctl` *sends* commands to, but the command *operates* on the specified `nodes`.
The endpoint will forward the command to the nodes, if needed.
### Endpoints
Endpoints are the IP addresses of control plane nodes, to which the `talosctl` client directly talks.
Endpoints automatically proxy requests destined to another node in the cluster.
This means that you only need access to the control plane nodes in order to manage the rest of the cluster.
You can pass in `--endpoints <Control Plane IP Address>` or `-e <Control Plane IP Address>` to the current `talosctl` command.
In this tutorial setup, the endpoint will always be the single control plane node.
### Nodes
Nodes are the target(s) you wish to perform the operation on.
> When specifying nodes, the IPs and/or hostnames are *as seen by the endpoint servers*, not as from the client.
> This is because all connections are proxied through the endpoints.
You may provide `-n` or `--nodes` to any `talosctl` command to supply the node or (comma-separated) nodes on which you wish to perform the operation.
For example, to see the containers running on node 192.168.0.200, by routing the `containers` command through the control plane endpoint 192.168.0.2:
```bash
talosctl -e 192.168.0.2 -n 192.168.0.200 containers
```
To see the etcd logs on *both* nodes 192.168.0.10 and 192.168.0.11:
```bash
talosctl -e 192.168.0.2 -n 192.168.0.10,192.168.0.11 logs etcd
```
For a more in-depth discussion of Endpoints and Nodes, please see [talosctl]({{< relref "../learn-more/talosctl" >}}).
### Apply Configuration
To apply the Machine Configs, you need to know the machines' IP addresses.
Talos prints the IP addresses of the machines on the console during the boot process:
```log
[4.605369] [talos] task loadConfig (1/1): this machine is reachable at:
[4.607358] [talos] task loadConfig (1/1): 192.168.0.2
```
If you do not have console access, the IP address may also be discoverable from your DHCP server.
Once you have the IP address, you can then apply the correct configuration.
Apply the `controlplane.yaml` file to the control plane node, and the `worker.yaml` file to all the worker node(s).
```sh
talosctl apply-config --insecure \
--nodes 192.168.0.2 \
--file controlplane.yaml
```
The `--insecure` flag is necessary because the PKI infrastructure has not yet been made available to the node.
Note: the connection *will* be encrypted, but not authenticated.
When using the `--insecure` flag, it is not necessary to specify an endpoint.
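Applying the worker configuration follows the same pattern (the worker address below is just an example):
```sh
talosctl apply-config --insecure \
    --nodes 192.168.0.3 \
    --file worker.yaml
```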
### Default talosconfig configuration file
You reference which configuration file to use with the `--talosconfig` parameter:
```sh
talosctl --talosconfig=./talosconfig \
--nodes 192.168.0.2 -e 192.168.0.2 version
```
Note that `talosctl` comes with tooling to help you integrate and merge this configuration into the default `talosctl` configuration file.
See [Production Notes]({{< relref "prodnotes#default-configuration-file" >}}) for more information.
While getting started, a common mistake is referencing a configuration context for a different cluster, resulting in authentication or connection failures.
Thus it is recommended to explicitly pass in the configuration file while becoming familiar with Talos Linux.
## Kubernetes Bootstrap
Bootstrapping your Kubernetes cluster with Talos is as simple as calling `talosctl bootstrap` on your control plane node:
```sh
talosctl bootstrap --nodes 192.168.0.2 --endpoints 192.168.0.2 \
--talosconfig=./talosconfig
```
>The bootstrap operation should only be called **ONCE** on a **SINGLE** control plane node.
(If you have multiple control plane nodes, it doesn't matter which one you issue the bootstrap command against.)
At this point, Talos will form an `etcd` cluster, and start the Kubernetes control plane components.
After a few moments, you will be able to download your Kubernetes client configuration and get started:
```sh
talosctl kubeconfig --nodes 192.168.0.2 --endpoints 192.168.0.2
```
Running this command will add (merge) your new cluster into your local Kubernetes configuration.
If you would prefer the configuration to *not* be merged into your default Kubernetes configuration file, pass in a filename:
```sh
talosctl kubeconfig alternative-kubeconfig --nodes 192.168.0.2 --endpoints 192.168.0.2
```
You should now be able to connect to Kubernetes and see your nodes:
```sh
kubectl get nodes
```
And use talosctl to explore your cluster:
```sh
talosctl --nodes 192.168.0.2 --endpoints 192.168.0.2 health \
--talosconfig=./talosconfig
talosctl --nodes 192.168.0.2 --endpoints 192.168.0.2 dashboard \
--talosconfig=./talosconfig
```
For a list of all the commands and operations that `talosctl` provides, see the [CLI reference]({{< relref "../reference/cli/#talosctl" >}}).

View File

@ -0,0 +1,369 @@
---
title: Production Clusters
weight: 30
description: "Recommendations for setting up a Talos Linux cluster in production."
---
This document explains recommendations for running Talos Linux in production.
## Acquire the installation image
### Alternative Booting
For network booting and self-built media, you can use the published kernel and initramfs images:
- X86: [vmlinuz-amd64](https://github.com/siderolabs/talos/releases/download/{{< release >}}/vmlinuz-amd64) [initramfs-amd64.xz](https://github.com/siderolabs/talos/releases/download/{{< release >}}/initramfs-amd64.xz)
- ARM64: [vmlinuz-arm64](https://github.com/siderolabs/talos/releases/download/{{< release >}}/vmlinuz-arm64) [initramfs-arm64.xz](https://github.com/siderolabs/talos/releases/download/{{< release >}}/initramfs-arm64.xz)
Note that to use alternate booting, there are a number of required kernel parameters.
Please see the [kernel]({{< relref "../reference/kernel" >}}) docs for more information.
## Control plane nodes
For a production, highly available Kubernetes cluster, it is recommended to use three control plane nodes.
Using five nodes can provide greater fault tolerance, but imposes more replication overhead and can result in worse performance.
Boot all three control plane nodes at this point.
They will boot Talos Linux, and come up in maintenance mode, awaiting a configuration.
## Decide the Kubernetes Endpoint
The Kubernetes API Server endpoint, in order to be highly available, should be configured in a way that uses all available control plane nodes.
There are three common ways to do this: using a load-balancer, using Talos Linux's built in VIP functionality, or using multiple DNS records.
### Dedicated Load-balancer
If you are using a cloud provider or have your own load-balancer
(such as HAProxy, Nginx reverse proxy, or an F5 load-balancer), a dedicated load balancer is a natural choice.
Create an appropriate frontend for the endpoint, listening on TCP port 6443, and point the backends at the addresses of each of the Talos control plane nodes.
Your Kubernetes endpoint will be the IP address or DNS name of the load balancer front end, with the port appended (e.g. https://myK8s.mydomain.io:6443).
> Note: an HTTP load balancer can't be used, as the Kubernetes API server does TLS termination and mutual TLS authentication.
### Layer 2 VIP Shared IP
Talos has integrated support for serving Kubernetes from a shared/virtual IP address.
This requires Layer 2 connectivity between control plane nodes.
Choose an unused IP address on the same subnet as the control plane nodes for the VIP.
For instance, if your control plane node IPs are:
- 192.168.0.10
- 192.168.0.11
- 192.168.0.12
you could choose the IP `192.168.0.15` as your VIP IP address.
(Make sure that `192.168.0.15` is not used by any other machine and is excluded from DHCP ranges.)
Once chosen, form the full HTTPS URL from this IP:
```url
https://192.168.0.15:6443
```
If you create a DNS record for this IP, note you will need to use the IP address itself, not the DNS name, to configure the shared IP (`machine.network.interfaces[].vip.ip`) in the Talos configuration.
After the machine configurations are generated, you will want to edit the `controlplane.yaml` file to activate the VIP:
```yaml
machine:
network:
interfaces:
- interface: enp2s0
dhcp: true
vip:
ip: 192.168.0.15
```
For more information about using a shared IP, see the related
[Guide]({{< relref "../talos-guides/network/vip" >}})
### DNS records
Add multiple A or AAAA records (one for each control plane node) to a DNS name.
For instance, you could add:
```dns
kube.cluster1.mydomain.com IN A 192.168.0.10
kube.cluster1.mydomain.com IN A 192.168.0.11
kube.cluster1.mydomain.com IN A 192.168.0.12
```
where the IP addresses are those of the control plane nodes.
Then, your endpoint would be:
```url
https://kube.cluster1.mydomain.com:6443
```
## Multihoming
If your machines are multihomed, i.e., they have more than one IPv4 and/or IPv6 address other than loopback, then additional configuration is required.
A point to note is that the machines may become multihomed via privileged workloads.
### Multihoming and etcd
The `etcd` cluster needs to establish a mesh of connections among the members.
It is done using the so-called advertised address - each node learns the others' addresses as they are advertised.
It is crucial that these IP addresses are stable, i.e., that each node always advertises the same IP address.
Moreover, it is beneficial to control them to establish the correct routes between the members and, e.g., avoid congested paths.
In Talos, these addresses are controlled using the `cluster.etcd.advertisedSubnets` configuration key.
### Multihoming and kubelets
Stable IP addressing for kubelets (i.e., nodeIP) is not strictly necessary but highly recommended as it ensures that, e.g., kube-proxy and CNI routing take the desired routes.
Analogously to etcd, for kubelets this is controlled via `machine.kubelet.nodeIP.validSubnets`.
### Example
Let's assume that we have a cluster with two networks:
- public network
- private network `192.168.0.0/16`
We want to use the private network for etcd and kubelet communication:
```yaml
machine:
kubelet:
nodeIP:
validSubnets:
- 192.168.0.0/16
#...
cluster:
etcd:
advertisedSubnets: # listenSubnets defaults to advertisedSubnets if not set explicitly
- 192.168.0.0/16
```
This way we ensure that the `etcd` cluster will use the private network for communication and the kubelets will use the private network for communication with the control plane.
## Load balancing the Talos API
The `talosctl` tool provides built-in client-side load-balancing across control plane nodes, so usually you do not need to configure a load balancer for the Talos API.
However, if the control plane nodes are *not* directly reachable from the workstation where you run `talosctl`, then configure a load balancer to forward TCP port 50000 to the control plane nodes.
> Note: Because the Talos Linux API uses gRPC and mutual TLS, it cannot be proxied by an HTTP/S proxy, but only by a TCP load balancer.
If you create a load balancer to forward the Talos API calls, the load balancer IP or hostname will be used as the `endpoint` for `talosctl`.
Add the load balancer IP or hostname to the `.machine.certSANs` field of the machine configuration file.
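For example, a sketch of appending a hypothetical load balancer name to the certificate SANs of an already generated control plane configuration with `yq` (v4 syntax assumed):
```sh
yq -i '.machine.certSANs += ["talos-api.mydomain.com"]' controlplane.yaml
```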
> Do *not* use Talos Linux's built in VIP function for accessing the Talos API.
> In the event of an error in `etcd`, the VIP will not function, and you will not be able to access the Talos API to recover.
## Configure Talos
In many installation methods, a configuration can be passed in on boot.
For example, Talos can be booted with the `talos.config` kernel
argument set to an HTTP(s) URL from which it should receive its
configuration.
Where a PXE server is available, this is much more efficient than
manually configuring each node.
If you do use this method, note that Talos requires a number of other
kernel commandline parameters.
See [required kernel parameters]({{< relref "../reference/kernel" >}}).
Similarly, if creating [EC2 kubernetes clusters]({{< relref "../talos-guides/install/cloud-platforms/aws/" >}}), the configuration file can be passed in as `--user-data` to the `aws ec2 run-instances` command.
See generally the [Installation Guide]({{< relref "../talos-guides/install" >}}) for the platform being deployed.
### Separating out secrets
When generating the configuration files for a Talos Linux cluster, it is recommended to start with generating a secrets bundle which should be saved in a secure location.
This bundle can be used to generate machine or client configurations at any time:
```sh
talosctl gen secrets -o secrets.yaml
```
> The `secrets.yaml` can also be extracted from the existing controlplane machine configuration with
> `talosctl gen secrets --from-controlplane-config controlplane.yaml -o secrets.yaml` command.
Now, we can generate the machine configuration for each node:
```sh
talosctl gen config --with-secrets secrets.yaml <cluster-name> <cluster-endpoint>
```
Here, `cluster-name` is an arbitrary name for the cluster, used
in your local client configuration as a label.
It should be unique in the configuration on your local workstation.
The `cluster-endpoint` is the Kubernetes Endpoint you
selected from above.
This is the Kubernetes API URL, and it should be a complete URL, with `https://`
and port.
(The default port is `6443`, but you may have configured your load balancer to forward a different port.)
For example:
```sh
$ talosctl gen config --with-secrets secrets.yaml my-cluster https://192.168.64.15:6443
generating PKI and tokens
created controlplane.yaml
created worker.yaml
created talosconfig
```
### Customizing Machine Configuration
The generated machine configuration provides sane defaults for most cases, but can be modified to fit specific needs.
Some machine configuration options are available as flags for the `talosctl gen config` command,
for example setting a specific Kubernetes version:
```sh
talosctl gen config --with-secrets secrets.yaml --kubernetes-version 1.25.4 my-cluster https://192.168.64.15:6443
```
Other modifications are done with [machine configuration patches]({{< relref "../talos-guides/configuration/patching" >}}).
Machine configuration patches can be applied with `talosctl gen config` command:
```sh
talosctl gen config --with-secrets secrets.yaml --config-patch-control-plane @cni.patch my-cluster https://192.168.64.15:6443
```
> Note: `@cni.patch` means that the patch is read from a file named `cni.patch`.
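As an illustration, a possible `cni.patch` that disables the default CNI (so a custom CNI can be deployed later) might look like this:
```sh
cat > cni.patch <<'EOF'
cluster:
  network:
    cni:
      name: none
EOF
```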
#### Machine Configs as Templates
Individual machines may need different settings: for instance, each may have a
different [static IP address]({{< relref "../advanced/advanced-networking/#static-addressing" >}}).
When different files are needed for machines of the same type, there are two supported flows:
1. Use the `talosctl gen config` command to generate a template, and then patch
the template for each machine with `talosctl machineconfig patch`.
2. Generate each machine configuration file separately with `talosctl gen config` while applying patches.
For example, given a machine configuration patch which sets the static machine hostname:
```yaml
# worker1.patch
machine:
network:
hostname: worker1
```
Either of the following commands will generate a worker machine configuration file with the hostname set to `worker1`:
```bash
$ talosctl gen config --with-secrets secrets.yaml my-cluster https://192.168.64.15:6443
created /Users/taloswork/controlplane.yaml
created /Users/taloswork/worker.yaml
created /Users/taloswork/talosconfig
$ talosctl machineconfig patch worker.yaml --patch @worker1.patch --output worker1.yaml
```
```sh
talosctl gen config --with-secrets secrets.yaml --config-patch-worker @worker1.patch --output-types worker -o worker1.yaml my-cluster https://192.168.64.15:6443
```
### Apply Configuration while validating the node identity
If you have console access you can extract the server certificate fingerprint and use it for an additional layer of validation:
```sh
talosctl apply-config --insecure \
--nodes 192.168.0.2 \
--cert-fingerprint xA9a1t2dMxB0NJ0qH1pDzilWbA3+DK/DjVbFaJBYheE= \
--file cp0.yaml
```
Using the fingerprint allows you to be sure you are sending the configuration to the correct machine, but is completely optional.
After the configuration is applied to a node, it will reboot.
Repeat this process for each of the nodes in your cluster.
## Further details about talosctl, endpoints and nodes
### Endpoints
When passed multiple endpoints, `talosctl` will automatically load balance requests to, and fail over between, all endpoints.
You can pass in `--endpoints <IP Address1>,<IP Address2>` as a comma separated list of IP/DNS addresses to the current `talosctl` command.
You can also set the `endpoints` in your `talosconfig`, by calling `talosctl config endpoint <IP Address1> <IP Address2>`.
Note: these are space separated, not comma separated.
As an example, if the IP addresses of our control plane nodes are:
- 192.168.0.2
- 192.168.0.3
- 192.168.0.4
We would set those in the `talosconfig` with:
```sh
talosctl --talosconfig=./talosconfig \
config endpoint 192.168.0.2 192.168.0.3 192.168.0.4
```
### Nodes
The node is the target you wish to perform the API call on.
It is possible to set a default set of nodes in the `talosconfig` file, but our recommendation is to explicitly pass in the node or nodes to be operated on with each `talosctl` command.
For a more in-depth discussion of Endpoints and Nodes, please see [talosctl]({{< relref "../learn-more/talosctl" >}}).
### Default configuration file
You can reference which configuration file to use directly with the `--talosconfig` parameter:
```sh
talosctl --talosconfig=./talosconfig \
--nodes 192.168.0.2 version
```
However, `talosctl` comes with tooling to help you integrate and merge this configuration into the default `talosctl` configuration file.
This is done with the `merge` option.
```sh
talosctl config merge ./talosconfig
```
This will merge your new `talosconfig` into the default configuration file (`$XDG_CONFIG_HOME/talos/config.yaml`), creating it if necessary.
Like Kubernetes, the `talosconfig` configuration file has multiple "contexts", which correspond to multiple clusters.
The `<cluster-name>` you chose above will be used as the context name.
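For example, you can list the defined contexts and switch the default one (the context name below is hypothetical):
```sh
talosctl config contexts
talosctl config context my-cluster
```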
## Kubernetes Bootstrap
Bootstrap your Kubernetes cluster by calling the `bootstrap` command against any of your control plane nodes (or the load balancer, if used for the Talos API endpoint):
```sh
talosctl bootstrap --nodes 192.168.0.2
```
>The bootstrap operation should only be called **ONCE** and only on a **SINGLE** control plane node!
At this point, Talos will form an `etcd` cluster, generate all of the core Kubernetes assets, and start the Kubernetes control plane components.
After a few moments, you will be able to download your Kubernetes client configuration and get started:
```sh
talosctl kubeconfig
```
Running this command will add (merge) your new cluster into your local Kubernetes configuration.
If you would prefer the configuration to *not* be merged into your default Kubernetes configuration file, pass in a filename:
```sh
talosctl kubeconfig alternative-kubeconfig
```
You should now be able to connect to Kubernetes and see your nodes:
```sh
kubectl get nodes
```
And use talosctl to explore your cluster:
```sh
talosctl -n <NODEIP> dashboard
```
For a list of all the commands and operations that `talosctl` provides, see the [CLI reference]({{< relref "../reference/cli/#talosctl" >}}).

View File

@ -0,0 +1,54 @@
---
title: Quickstart
weight: 20
description: "A short guide on setting up a simple Talos Linux cluster locally with Docker."
---
## Local Docker Cluster
The easiest way to try Talos is by using the CLI (`talosctl`) to create a cluster on a machine with `docker` installed.
### Prerequisites
#### `talosctl`
Download `talosctl`:
```bash
curl -sL https://talos.dev/install | sh
```
#### `kubectl`
Download `kubectl` via one of methods outlined in the [documentation](https://kubernetes.io/docs/tasks/tools/install-kubectl/).
### Create the Cluster
Now run the following:
```bash
talosctl cluster create
```
You can explore using Talos API commands:
```bash
talosctl dashboard --nodes 10.5.0.2
```
Verify that you can reach Kubernetes:
```bash
$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
talos-default-controlplane-1 Ready master 115s v{{< k8s_release >}} 10.5.0.2 <none> Talos ({{< release >}}) <host kernel> containerd://1.5.5
talos-default-worker-1 Ready <none> 115s v{{< k8s_release >}} 10.5.0.3 <none> Talos ({{< release >}}) <host kernel> containerd://1.5.5
```
### Destroy the Cluster
When you are all done, remove the cluster:
```bash
talosctl cluster destroy
```

View File

@ -0,0 +1,52 @@
---
title: Support Matrix
weight: 60
description: "Table of supported Talos Linux versions and respective platforms."
---
| Talos Version | 1.7 | 1.6 |
|----------------------------------------------------------------------------------------------------------------|------------------------------------|------------------------------------|
| Release Date | 2024-04-15 (TBD) | 2023-12-15 (1.6.0) |
| End of Community Support | 1.8.0 release (2024-08-15, TBD) | 1.7.0 release (2024-04-15) |
| Enterprise Support | [offered by Sidero Labs Inc.](https://www.siderolabs.com/support/) | [offered by Sidero Labs Inc.](https://www.siderolabs.com/support/) |
| Kubernetes | 1.30, 1.29, 1.28, 1.27, 1.26, 1.25 | 1.29, 1.28, 1.27, 1.26, 1.25, 1.24 |
| Architecture | amd64, arm64 | amd64, arm64 |
| **Platforms** | | |
| - cloud | AWS, GCP, Azure, Digital Ocean, Exoscale, Hetzner, OpenStack, Oracle Cloud, Scaleway, Vultr, Upcloud | AWS, GCP, Azure, Digital Ocean, Exoscale, Hetzner, OpenStack, Oracle Cloud, Scaleway, Vultr, Upcloud |
| - bare metal | x86: BIOS, UEFI, SecureBoot; arm64: UEFI, SecureBoot; boot: ISO, PXE, disk image | x86: BIOS, UEFI; arm64: UEFI; boot: ISO, PXE, disk image |
| - virtualized | VMware, Hyper-V, KVM, Proxmox, Xen | VMware, Hyper-V, KVM, Proxmox, Xen |
| - SBCs | Banana Pi M64, Jetson Nano, Libre Computer Board ALL-H3-CC, Nano Pi R4S, Pine64, Pine64 Rock64, Radxa ROCK Pi 4c, Raspberry Pi 4B, Raspberry Pi Compute Module 4 | Banana Pi M64, Jetson Nano, Libre Computer Board ALL-H3-CC, Nano Pi R4S, Pine64, Pine64 Rock64, Radxa ROCK Pi 4c, Raspberry Pi 4B, Raspberry Pi Compute Module 4 |
| - local | Docker, QEMU | Docker, QEMU |
| **Cluster API** | | |
| [CAPI Bootstrap Provider Talos](https://github.com/siderolabs/cluster-api-bootstrap-provider-talos) | >= 0.6.3 | >= 0.6.3 |
| [CAPI Control Plane Provider Talos](https://github.com/siderolabs/cluster-api-control-plane-provider-talos) | >= 0.5.4 | >= 0.5.4 |
| [Sidero](https://www.sidero.dev/) | >= 0.6.2 | >= 0.6.2 |
## Platform Tiers
* Tier 1: Automated tests, high-priority fixes.
* Tier 2: Tested from time to time, medium-priority bugfixes.
* Tier 3: Not tested by core Talos team, community tested.
### Tier 1
* Metal
* AWS
* GCP
### Tier 2
* Azure
* Digital Ocean
* OpenStack
* VMWare
### Tier 3
* Exoscale
* Hetzner
* nocloud
* Oracle Cloud
* Scaleway
* Vultr
* Upcloud

View File

@ -0,0 +1,71 @@
---
title: System Requirements
weight: 40
description: "Hardware requirements for running Talos Linux."
---
## Minimum Requirements
<table class="table-auto">
<thead>
<tr>
<th class="px-4 py-2">Role</th>
<th class="px-4 py-2">Memory</th>
<th class="px-4 py-2">Cores</th>
<th class="px-4 py-2">System Disk</th>
</tr>
</thead>
<tbody>
<tr>
<td class="border px-4 py-2">Control Plane</td>
<td class="border px-4 py-2">2 GiB</td>
<td class="border px-4 py-2">2</td>
<td class="border px-4 py-2">10 GiB</td>
</tr>
<tr class="bg-gray-100">
<td class="border px-4 py-2">Worker</td>
<td class="border px-4 py-2">1 GiB</td>
<td class="border px-4 py-2">1</td>
<td class="border px-4 py-2">10 GiB</td>
</tr>
</tbody>
</table>
## Recommended
<table class="table-auto">
<thead>
<tr>
<th class="px-4 py-2">Role</th>
<th class="px-4 py-2">Memory</th>
<th class="px-4 py-2">Cores</th>
<th class="px-4 py-2">System Disk</th>
</tr>
</thead>
<tbody>
<tr>
<td class="border px-4 py-2">Control Plane</td>
<td class="border px-4 py-2">4 GiB</td>
<td class="border px-4 py-2">4</td>
<td class="border px-4 py-2">100 GiB</td>
</tr>
<tr class="bg-gray-100">
<td class="border px-4 py-2">Worker</td>
<td class="border px-4 py-2">2 GiB</td>
<td class="border px-4 py-2">2</td>
<td class="border px-4 py-2">100 GiB</td>
</tr>
</tbody>
</table>
These requirements are similar to those of Kubernetes.
## Storage
Talos Linux itself only requires less than 100 MB of disk space, but the EPHEMERAL partition is used to store pulled images, container work directories, and so on.
Thus a minimum of 10 GiB of disk space is required.
100 GiB is recommended.
Note, however, that Talos Linux assumes complete control of the disk it is installed on, so that it can manage the partition table for image-based upgrades; you cannot partition the rest of the disk for use by workloads.
Thus it is recommended to install Talos Linux on a small, dedicated disk - using a terabyte-sized SSD for the Talos install disk would be wasteful.
Sidero Labs recommends having separate disks (apart from the Talos install disk) to be used for storage.
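As a sketch, an additional disk can be configured and mounted for workload data via the machine configuration (the device name and mount point below are placeholders):
```yaml
machine:
  disks:
    - device: /dev/sdb
      partitions:
        - mountpoint: /var/mnt/storage
```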

View File

@ -0,0 +1,426 @@
---
title: "Troubleshooting"
description: "Troubleshoot control plane and other failures for Talos Linux clusters."
aliases:
- ../guides/troubleshooting-control-plane
- ../advanced/troubleshooting-control-plane
---
<!-- markdownlint-disable MD026 -->
In this guide we assume that Talos is configured with default features enabled, such as [Discovery Service]({{< relref "../talos-guides/discovery" >}}) and [KubePrism]({{< relref "../kubernetes-guides/configuration/kubeprism" >}}).
If these features are disabled, some of the troubleshooting steps may not apply or may need to be adjusted.
This guide is structured so that it can be followed step-by-step; skip the sections which are not relevant to your issue.
## Network Configuration
As Talos Linux is an API-based operating system, it is important to have networking configured so that the API can be accessed.
Some information can be gathered from the [Interactive Dashboard]({{< relref "../talos-guides/interactive-dashboard" >}}), which is available on the machine console.
When running in the cloud, networking should be configured automatically.
When running on bare metal, it may need more specific configuration; see the [networking `metal` configuration guide]({{< relref "../talos-guides/install/bare-metal-platforms/network-config" >}}).
## Talos API
The Talos API runs on [port 50000]({{< relref "../learn-more/talos-network-connectivity" >}}).
Control plane nodes should always serve the Talos API, while worker nodes require access to the control plane nodes to issue TLS certificates for the workers.
### Firewall Issues
Make sure that the firewall is not blocking port 50000, and [communication]({{< relref "../learn-more/talos-network-connectivity" >}}) on ports 50000/50001 inside the cluster.
### Client Configuration Issues
Make sure to use the correct `talosconfig` client configuration file matching your cluster.
See [getting started]({{< relref "./getting-started" >}}) for more information.
The most common issue is that `talosctl gen config` writes `talosconfig` to the file in the current directory, while `talosctl` by default picks up the configuration from the default location (`~/.talos/config`).
The path to the configuration file can be specified with `--talosconfig` flag to `talosctl`.
### Conflict on Kubernetes and Host Subnets
If `talosctl` returns an error saying that certificate IPs are empty, it might be due to a conflict between Kubernetes and host subnets.
The Talos API runs on the host network, but it automatically excludes Kubernetes pod & service subnets from the usable set of addresses.
Talos default machine configuration specifies the following Kubernetes pod and service subnet IPv4 CIDRs: `10.244.0.0/16` and `10.96.0.0/12`.
If the host network is configured with one of these subnets, change the machine configuration to use a different subnet.
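If you need to do so, the pod and service subnets can be adjusted in the machine configuration before the cluster is created; a sketch with illustrative values:
```yaml
cluster:
  network:
    podSubnets:
      - 172.16.0.0/16
    serviceSubnets:
      - 172.17.0.0/16
```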
### Wrong Endpoints
The `talosctl` CLI connects to the Talos API via the specified endpoints, which should be a list of control plane machine addresses.
The client will automatically retry on other endpoints if there are unavailable endpoints.
Worker nodes should not be used as the endpoint, as they are not able to forward requests to other nodes.
The [VIP]({{< relref "../talos-guides/network/vip" >}}) should never be used as Talos API endpoint.
### TCP Loadbalancer
When using a TCP loadbalancer, make sure the loadbalancer endpoint is included in the `.machine.certSANs` list in the machine configuration.
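For example (the loadbalancer name below is a placeholder):
```yaml
machine:
  certSANs:
    - talos-lb.example.com
```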
## System Requirements
If minimum [system requirements]({{< relref "./system-requirements" >}}) are not met, this might manifest itself in various ways, such as random failures when starting services, or failures to pull images from the container registry.
## Running Health Checks
Talos Linux provides a set of basic health checks with `talosctl health` command which can be used to check the health of the cluster.
In the default mode, `talosctl health` uses information from the [discovery]({{< relref "../talos-guides/discovery" >}}) to get the information about cluster members.
This can be overridden with command line flags `--control-plane-nodes` and `--worker-nodes`.
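For example, to run health checks against an explicitly specified set of nodes (the addresses below are placeholders):
```bash
talosctl -n 172.20.0.2 health \
  --control-plane-nodes 172.20.0.2,172.20.0.3,172.20.0.4 \
  --worker-nodes 172.20.0.5,172.20.0.6
```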
## Gathering Logs
While the logs and state of the system can be queried via the Talos API, it is often useful to gather the logs from all nodes in the cluster, and analyze them offline.
The `talosctl support` command can be used to gather logs and other information from the nodes specified with `--nodes` flag (multiple nodes are supported).
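For example (addresses are placeholders; by default the collected data is written to a `support.zip` archive in the current directory):
```bash
talosctl -n 172.20.0.2,172.20.0.3 support
```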
## Discovery and Cluster Membership
Talos Linux uses [Discovery Service]({{< relref "../talos-guides/discovery" >}}) to discover other nodes in the cluster.
The list of members on each machine should be consistent: `talosctl -n <IP> get members`.
### Some Members are Missing
Ensure connectivity to the discovery service (default is `discovery.talos.dev:443`), and that the discovery registry is not disabled.
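The discovery settings live in the machine configuration; a sketch of the defaults looks roughly like this:
```yaml
cluster:
  discovery:
    enabled: true
    registries:
      service:
        disabled: false
```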
### Duplicate Members
Don't use the same base secrets to generate machine configuration for multiple clusters, as some secrets are used to identify members of the same cluster.
So if the same machine configuration (or secrets) are used to repeatedly create and destroy clusters, the discovery service will see the same nodes as members of different clusters.
### Removed Members are Still Present
Talos Linux removes itself from the discovery service when it is [reset]({{< relref "../talos-guides/resetting-a-machine" >}}).
If the machine was not reset, it might show up as a member of the cluster for the maximum TTL of the discovery service (30 minutes), and after that it will be automatically removed.
## `etcd` Issues
`etcd` is the distributed key-value store used by Kubernetes to store its state.
Talos Linux provides automation to manage `etcd` members running on control plane nodes.
If `etcd` is not healthy, the Kubernetes API server will not be able to function correctly.
It is always recommended to run an odd number of `etcd` members: a cluster of 3 or more members can tolerate the failure of a minority of its members while still maintaining quorum.
Common troubleshooting steps:
- check `etcd` service state with `talosctl -n IP service etcd` for each control plane node
- check `etcd` membership on each control plane node with `talosctl -n IP etcd member list`
- check `etcd` logs with `talosctl -n IP logs etcd`
- check `etcd` alarms with `talosctl -n IP etcd alarm list`
### All `etcd` Services are Stuck in `Pre` State
Make sure that a single member was [bootstrapped]({{< relref "./getting-started#kubernetes-bootstrap" >}}).
Check that the machine is able to pull the `etcd` container image, check `talosctl dmesg` for messages starting with `retrying:` prefix.
### Some `etcd` Services are Stuck in `Pre` State
Make sure traffic is not blocked on port 2380 between controlplane nodes.
Check that `etcd` quorum is not lost.
Check that all control plane nodes are reported in `talosctl get members` output.
### `etcd` Reports an Alarm
See [etcd maintenance]({{< relref "../advanced/etcd-maintenance" >}}) guide.
### `etcd` Quorum is Lost
See [disaster recovery]({{< relref "../advanced/disaster-recovery" >}}) guide.
### Other Issues
`etcd` will only run on control plane nodes.
If a node is designated as a worker node, you should not expect `etcd` to be running on it.
When a node boots for the first time, the `etcd` data directory (`/var/lib/etcd`) is empty, and it will only be populated when `etcd` is launched.
If the `etcd` service is crashing and restarting, check its logs with `talosctl -n <IP> logs etcd`.
The most common reasons for crashes are:
- wrong arguments passed via `extraArgs` in the configuration;
- booting Talos on a non-empty disk with a previous Talos installation, so that `/var/lib/etcd` contains data from the old cluster.
## `kubelet` and Kubernetes Node Issues
The `kubelet` service should be running on all Talos nodes, and it is responsible for running Kubernetes pods,
static pods (including control plane components), and registering the node with the Kubernetes API server.
If the `kubelet` doesn't run on a control plane node, it will block the control plane components from starting.
The node will not be registered in Kubernetes until the Kubernetes API server is up and initial Kubernetes manifests are applied.
### `kubelet` is not running
Check that `kubelet` image is available (`talosctl image ls --namespace system`).
Check `kubelet` logs with `talosctl -n IP logs kubelet` for startup errors:
- make sure Kubernetes version is [supported]({{< relref "./support-matrix" >}}) with this Talos release
- make sure `kubelet` extra arguments and extra configuration supplied with the Talos machine configuration are valid
### Talos Complains about Node Not Found
The `kubelet` hasn't yet registered the node with the Kubernetes API server: this is expected during initial cluster bootstrap, and the error will go away once the node is registered.
If the message persists, check Kubernetes API health.
The Kubernetes controller manager (`kube-controller-manager`) is responsible for monitoring the certificate
signing requests (CSRs) and issuing certificates for each of them.
The `kubelet` is responsible for generating and submitting the CSRs for its
associated node.
The state of any CSRs can be checked with `kubectl get csr`:
```bash
$ kubectl get csr
NAME AGE SIGNERNAME REQUESTOR CONDITION
csr-jcn9j 14m kubernetes.io/kube-apiserver-client-kubelet system:bootstrap:q9pyzr Approved,Issued
csr-p6b9q 14m kubernetes.io/kube-apiserver-client-kubelet system:bootstrap:q9pyzr Approved,Issued
csr-sw6rm 14m kubernetes.io/kube-apiserver-client-kubelet system:bootstrap:q9pyzr Approved,Issued
csr-vlghg 14m kubernetes.io/kube-apiserver-client-kubelet system:bootstrap:q9pyzr Approved,Issued
```
### `kubectl get nodes` Reports Wrong Internal IP
Configure the correct internal IP address with [`.machine.kubelet.nodeIP`]({{< relref "../reference/configuration/v1alpha1/config#Config.machine.kubelet.nodeIP" >}}).
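For example, to restrict the kubelet to pick an address from a specific subnet (the subnet below is a placeholder):
```yaml
machine:
  kubelet:
    nodeIP:
      validSubnets:
        - 192.168.0.0/24
```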
### `kubectl get nodes` Reports Wrong External IP
Talos Linux doesn't manage the external IP, it is managed with the Kubernetes Cloud Controller Manager.
### `kubectl get nodes` Reports Wrong Node Name
By default, the Kubernetes node name is derived from the hostname.
Update the hostname using the machine configuration, cloud configuration, or via DHCP server.
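A sketch of setting the hostname via the machine configuration (the hostname itself is a placeholder):
```yaml
machine:
  network:
    hostname: worker-1
```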
### Node Is Not Ready
A Node in Kubernetes is marked as `Ready` only once its CNI is up.
It takes a minute or two for the CNI images to be pulled and for the CNI to start.
If the node is stuck in this state for too long, check CNI pods and logs with `kubectl`.
Usually, CNI-related resources are created in `kube-system` namespace.
For example, for the default Talos Flannel CNI:
```bash
$ kubectl -n kube-system get pods
NAME READY STATUS RESTARTS AGE
...
kube-flannel-25drx 1/1 Running 0 23m
kube-flannel-8lmb6 1/1 Running 0 23m
kube-flannel-gl7nx 1/1 Running 0 23m
kube-flannel-jknt9 1/1 Running 0 23m
...
```
### Duplicate/Stale Nodes
Talos Linux doesn't remove Kubernetes nodes automatically, so if a node is removed from the cluster, it will still be present in Kubernetes.
Remove the node from Kubernetes with `kubectl delete node <node-name>`.
### Talos Complains about Certificate Errors on `kubelet` API
This error might appear during initial cluster bootstrap, and it will go away once the Kubernetes API server is up and the node is registered.
In the default configuration, `kubelet` issues a self-signed server certificate, but when the `rotate-server-certificates` feature is enabled,
`kubelet` requests its serving certificate via the `kube-apiserver`.
Make sure the `kubelet` CSR is approved by the Kubernetes API server.
In either case, this error is not critical, as it only affects reporting of the pod status to Talos Linux.
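When the `rotate-server-certificates` feature is enabled, pending kubelet serving CSRs can be inspected and approved manually, for example:
```bash
kubectl get csr
kubectl certificate approve <csr-name>
```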
## Kubernetes Control Plane
The Kubernetes control plane consists of the following components:
- `kube-apiserver` - the Kubernetes API server
- `kube-controller-manager` - the Kubernetes controller manager
- `kube-scheduler` - the Kubernetes scheduler
Optionally, `kube-proxy` runs as a DaemonSet to provide pod-to-service communication.
`coredns` provides name resolution for the cluster.
CNI is not part of the control plane, but it is required for Kubernetes pods using pod networking.
Troubleshooting should always start with `kube-apiserver`, and then proceed to other components.
Talos Linux configures `kube-apiserver` to talk to the `etcd` running on the same node, so `etcd` must be healthy before `kube-apiserver` can start.
The `kube-controller-manager` and `kube-scheduler` are configured to talk to the `kube-apiserver` on the same node, so they will not start until `kube-apiserver` is healthy.
### Control Plane Static Pods
Talos should generate the static pod definitions for the Kubernetes control plane
as resources:
```bash
$ talosctl -n <IP> get staticpods
NODE NAMESPACE TYPE ID VERSION
172.20.0.2 k8s StaticPod kube-apiserver 1
172.20.0.2 k8s StaticPod kube-controller-manager 1
172.20.0.2 k8s StaticPod kube-scheduler 1
```
Talos should report that the static pod definitions are rendered for the `kubelet`:
```bash
$ talosctl -n <IP> dmesg | grep 'rendered new'
172.20.0.2: user: warning: [2023-04-26T19:17:52.550527204Z]: [talos] rendered new static pod {"component": "controller-runtime", "controller": "k8s.StaticPodServerController", "id": "kube-apiserver"}
172.20.0.2: user: warning: [2023-04-26T19:17:52.552186204Z]: [talos] rendered new static pod {"component": "controller-runtime", "controller": "k8s.StaticPodServerController", "id": "kube-controller-manager"}
172.20.0.2: user: warning: [2023-04-26T19:17:52.554607204Z]: [talos] rendered new static pod {"component": "controller-runtime", "controller": "k8s.StaticPodServerController", "id": "kube-scheduler"}
```
If the static pod definitions are not rendered, check `etcd` and `kubelet` service health (see above)
and the controller runtime logs (`talosctl logs controller-runtime`).
### Control Plane Pod Status
Initially the `kube-apiserver` component will not be running, and it takes some time before it becomes fully up
during bootstrap (the image should be pulled from the Internet, etc.).
The status of the control plane components on each of the control plane nodes can be checked with `talosctl containers -k`:
```bash
$ talosctl -n <IP> containers --kubernetes
NODE NAMESPACE ID IMAGE PID STATUS
172.20.0.2 k8s.io kube-system/kube-apiserver-talos-default-controlplane-1 registry.k8s.io/pause:3.2 2539 SANDBOX_READY
172.20.0.2 k8s.io └─ kube-system/kube-apiserver-talos-default-controlplane-1:kube-apiserver:51c3aad7a271 registry.k8s.io/kube-apiserver:v{{< k8s_release >}} 2572 CONTAINER_RUNNING
```
The logs of the control plane components can be checked with `talosctl logs --kubernetes` (or with `-k` as a shorthand):
```bash
talosctl -n <IP> logs -k kube-system/kube-apiserver-talos-default-controlplane-1:kube-apiserver:51c3aad7a271
```
If a control plane component reports an error on startup, check that:
- the Kubernetes version is [supported]({{< relref "./support-matrix" >}}) with this Talos release
- extra arguments and extra configuration supplied with the Talos machine configuration are valid
### Kubernetes Bootstrap Manifests
As part of the bootstrap process, Talos injects bootstrap manifests into Kubernetes API server.
There are two kinds of these manifests: system manifests built into Talos and extra manifests downloaded (custom CNI, extra manifests in the machine config):
```bash
$ talosctl -n <IP> get manifests
NODE NAMESPACE TYPE ID VERSION
172.20.0.2 controlplane Manifest 00-kubelet-bootstrapping-token 1
172.20.0.2 controlplane Manifest 01-csr-approver-role-binding 1
172.20.0.2 controlplane Manifest 01-csr-node-bootstrap 1
172.20.0.2 controlplane Manifest 01-csr-renewal-role-binding 1
172.20.0.2 controlplane Manifest 02-kube-system-sa-role-binding 1
172.20.0.2 controlplane Manifest 03-default-pod-security-policy 1
172.20.0.2 controlplane Manifest 05-https://docs.projectcalico.org/manifests/calico.yaml 1
172.20.0.2 controlplane Manifest 10-kube-proxy 1
172.20.0.2 controlplane Manifest 11-core-dns 1
172.20.0.2 controlplane Manifest 11-core-dns-svc 1
172.20.0.2 controlplane Manifest 11-kube-config-in-cluster 1
```
Details of each manifest can be queried by adding `-o yaml`:
```bash
$ talosctl -n <IP> get manifests 01-csr-approver-role-binding --namespace=controlplane -o yaml
node: 172.20.0.2
metadata:
namespace: controlplane
type: Manifests.kubernetes.talos.dev
id: 01-csr-approver-role-binding
version: 1
phase: running
spec:
- apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: system-bootstrap-approve-node-client-csr
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: system:certificates.k8s.io:certificatesigningrequests:nodeclient
subjects:
- apiGroup: rbac.authorization.k8s.io
kind: Group
name: system:bootstrappers
```
### Other Control Plane Components
Once the Kubernetes API server is up, issues with other control plane components can be investigated with `kubectl`:
```shell
kubectl get nodes -o wide
kubectl get pods -o wide --all-namespaces
kubectl describe pod -n NAMESPACE POD
kubectl logs -n NAMESPACE POD
```
## Kubernetes API
The Kubernetes API client configuration (`kubeconfig`) can be retrieved using Talos API with `talosctl -n <IP> kubeconfig` command.
Talos Linux itself mostly doesn't depend on the Kubernetes API endpoint for the cluster, but the Kubernetes API endpoint should be configured
correctly for external access to the cluster.
### Kubernetes Control Plane Endpoint
The Kubernetes control plane endpoint is the single canonical URL by which the
Kubernetes API is accessed.
Especially with high-availability (HA) control planes, this endpoint may point to a load balancer or a DNS name which may
have multiple `A` and `AAAA` records.
Like Talos' own API, the Kubernetes API uses mutual TLS, client
certs, and a common Certificate Authority (CA).
Unlike general-purpose websites, there is no need for an upstream CA, so tools
such as cert-manager or Let's Encrypt, or commercially validated TLS certificates, are not required.
Encryption, however, _is_, and hence the URL scheme will always be `https://`.
By default, the Kubernetes API server in Talos runs on port 6443.
As such, the control plane endpoint URLs for Talos will almost always be of the form
`https://endpoint:6443`.
(The port is required, since it is not the `https` default of `443`.)
The `endpoint` above may be a DNS name or IP address, but it should be
directed to the _set_ of all controlplane nodes, as opposed to a
single one.
As mentioned above, this can be achieved by a number of strategies, including:
- an external load balancer
- DNS records
- Talos-builtin shared IP ([VIP]({{< relref "../talos-guides/network/vip" >}}))
- BGP peering of a shared IP (such as with [kube-vip](https://kube-vip.io))
Using a DNS name here is a good idea, since it can be combined with any of the other options while offering
a layer of abstraction.
It allows the underlying IP addresses to change without impacting the
canonical URL.
Unlike most services in Kubernetes, the API server runs with host networking,
meaning that it shares the network namespace with the host.
This means you can use the IP address(es) of the host to refer to the Kubernetes
API server.
For availability of the API, it is important that any load balancer be aware of
the health of the backend API servers, to minimize disruptions during
common node operations like reboots and upgrades.
## Miscellaneous
### Checking Controller Runtime Logs
Talos runs a set of [controllers]({{< relref "../learn-more/controllers-resources" >}}) which operate on resources to build and support machine operations.
Some debugging information can be queried from the controller logs with `talosctl logs controller-runtime`:
```bash
talosctl -n <IP> logs controller-runtime
```
Controllers continuously run a reconcile loop, so at any time, they may be starting, failing, or restarting.
This is expected behavior.
If there are no new messages in the `controller-runtime` log, it means that the controllers have successfully finished reconciling, and that the current system state is the desired system state.

View File

@ -0,0 +1,9 @@
---
title: What's New in Talos 1.7.0
weight: 50
description: "List of new and shiny features in Talos Linux."
---
See also [upgrade notes]({{< relref "../../talos-guides/upgrading-talos/">}}) for important changes.
TBD

View File

@ -0,0 +1,28 @@
---
title: What is Talos?
weight: 10
description: "A quick introduction in to what Talos is and why it should be used."
---
Talos is a container optimized Linux distro; a reimagining of Linux for distributed systems such as Kubernetes.
It is designed to be as minimal as possible while still maintaining practicality.
For these reasons, Talos has a number of features unique to it:
- it is immutable
- it is atomic
- it is ephemeral
- it is minimal
- it is secure by default
- it is managed via a single declarative configuration file and gRPC API
Talos can be deployed on container, cloud, virtualized, and bare metal platforms.
## Why Talos
In having less, Talos offers more.
Security.
Efficiency.
Resiliency.
Consistency.
All of these areas are improved simply by having less.

View File

@ -0,0 +1,5 @@
---
title: "Kubernetes Guides"
weight: 30
description: "Management of a Kubernetes Cluster hosted by Talos Linux"
---

View File

@ -0,0 +1,5 @@
---
title: "Configuration"
weight: 10
description: "How to configure components of the Kubernetes cluster itself."
---

View File

@ -0,0 +1,281 @@
---
title: "Ceph Storage cluster with Rook"
description: "Guide on how to create a simple Ceph storage cluster with Rook for Kubernetes"
aliases:
- ../../guides/configuring-ceph-with-rook
---
## Preparation
Talos Linux reserves an entire disk for the OS installation, so machines with multiple available disks are needed for a reliable Ceph cluster with Rook and Talos Linux.
Rook requires that the block devices or partitions used by Ceph have no partitions or formatted filesystems before use.
Rook also requires a minimum Kubernetes version of `v1.16` and Helm `v3.0` for installation of charts.
It is highly recommended that the [Rook Ceph overview](https://rook.io/docs/rook/v1.8/ceph-storage.html) is read and understood before deploying a Ceph cluster with Rook.
## Installation
Creating a Ceph cluster with Rook requires two steps: first, the Rook Operator needs to be installed, which can be done with a Helm Chart.
The example below installs the Rook Operator into the `rook-ceph` namespace, which is the default for a Ceph cluster with Rook.
```shell
$ helm repo add rook-release https://charts.rook.io/release
"rook-release" has been added to your repositories
$ helm install --create-namespace --namespace rook-ceph rook-ceph rook-release/rook-ceph
W0327 17:52:44.277830 54987 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0327 17:52:44.612243 54987 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
NAME: rook-ceph
LAST DEPLOYED: Sun Mar 27 17:52:42 2022
NAMESPACE: rook-ceph
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
The Rook Operator has been installed. Check its status by running:
kubectl --namespace rook-ceph get pods -l "app=rook-ceph-operator"
Visit https://rook.io/docs/rook/latest for instructions on how to create and configure Rook clusters
Important Notes:
- You must customize the 'CephCluster' resource in the sample manifests for your cluster.
- Each CephCluster must be deployed to its own namespace, the samples use `rook-ceph` for the namespace.
- The sample manifests assume you also installed the rook-ceph operator in the `rook-ceph` namespace.
- The helm chart includes all the RBAC required to create a CephCluster CRD in the same namespace.
- Any disk devices you add to the cluster in the 'CephCluster' must be empty (no filesystem and no partitions).
```
Once that is complete, the Ceph cluster can be installed with the official Helm Chart.
The Chart can be installed with default values, which will attempt to use all nodes in the Kubernetes cluster and all unused disks on each node for Ceph storage, and will make block storage, object storage, and a shared filesystem available.
Generally more specific node/device/cluster configuration is used, and the [Rook documentation](https://rook.io/docs/rook/v1.8/ceph-cluster-crd.html) explains all the available options in detail.
For this example the defaults will be adequate.
```shell
$ helm install --create-namespace --namespace rook-ceph rook-ceph-cluster --set operatorNamespace=rook-ceph rook-release/rook-ceph-cluster
NAME: rook-ceph-cluster
LAST DEPLOYED: Sun Mar 27 18:12:46 2022
NAMESPACE: rook-ceph
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
The Ceph Cluster has been installed. Check its status by running:
kubectl --namespace rook-ceph get cephcluster
Visit https://rook.github.io/docs/rook/latest/ceph-cluster-crd.html for more information about the Ceph CRD.
Important Notes:
- You can only deploy a single cluster per namespace
- If you wish to delete this cluster and start fresh, you will also have to wipe the OSD disks using `sfdisk`
```
Now that the Ceph cluster configuration has been created, the Rook operator needs time to install the Ceph cluster and bring all the components online.
The progression of the Ceph cluster state can be followed with the following command.
```shell
$ watch kubectl --namespace rook-ceph get cephcluster rook-ceph
Every 2.0s: kubectl --namespace rook-ceph get cephcluster rook-ceph
NAME DATADIRHOSTPATH MONCOUNT AGE PHASE MESSAGE HEALTH EXTERNAL
rook-ceph /var/lib/rook 3 57s Progressing Configuring Ceph Mons
```
Depending on the size of the Ceph cluster and the availability of resources, the Ceph cluster should become available, and with it the storage classes that can be used for Kubernetes Persistent Volumes.
```shell
$ kubectl --namespace rook-ceph get cephcluster rook-ceph
NAME DATADIRHOSTPATH MONCOUNT AGE PHASE MESSAGE HEALTH EXTERNAL
rook-ceph /var/lib/rook 3 40m Ready Cluster created successfully HEALTH_OK
$ kubectl get storageclass
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
ceph-block (default) rook-ceph.rbd.csi.ceph.com Delete Immediate true 77m
ceph-bucket rook-ceph.ceph.rook.io/bucket Delete Immediate false 77m
ceph-filesystem rook-ceph.cephfs.csi.ceph.com Delete Immediate true 77m
```
## Talos Linux Considerations
It is important to note that a Rook Ceph cluster saves cluster information directly onto the node (by default `dataDirHostPath` is set to `/var/lib/rook`).
If running only a single `mon` instance, cluster management is a little more involved: any time a Talos Linux node is reconfigured or upgraded, the partition that stores the `/var` [file system]({{< relref "../../learn-more/architecture#the-file-system" >}}) is wiped, unless the `--preserve` option of [`talosctl upgrade`]({{< relref "../../reference/cli#talosctl-upgrade" >}}) is used to keep it intact.
By default, Rook configures Ceph to have 3 `mon` instances, in which case the data stored in `dataDirHostPath` can be regenerated from the other `mon` instances.
So when performing maintenance on a Talos Linux node with a Rook Ceph cluster (e.g. upgrading the Talos Linux version), it is imperative that care be taken to maintain the health of the Ceph cluster.
Before upgrading, you should always check the health status of the Ceph cluster to ensure that it is healthy.
```shell
$ kubectl --namespace rook-ceph get cephclusters.ceph.rook.io rook-ceph
NAME DATADIRHOSTPATH MONCOUNT AGE PHASE MESSAGE HEALTH EXTERNAL
rook-ceph /var/lib/rook 3 98m Ready Cluster created successfully HEALTH_OK
```
If it is, you can begin the upgrade process for the Talos Linux node, during which time the Ceph cluster will become unhealthy as the node is reconfigured.
Before performing any other action on the Talos Linux nodes, the Ceph cluster must return to a healthy status.
```shell
$ talosctl upgrade --nodes 172.20.15.5 --image ghcr.io/talos-systems/installer:v0.14.3
NODE ACK STARTED
172.20.15.5 Upgrade request received 2022-03-27 20:29:55.292432887 +0200 CEST m=+10.050399758
$ kubectl --namespace rook-ceph get cephclusters.ceph.rook.io
NAME DATADIRHOSTPATH MONCOUNT AGE PHASE MESSAGE HEALTH EXTERNAL
rook-ceph /var/lib/rook 3 99m Progressing Configuring Ceph Mgr(s) HEALTH_WARN
$ kubectl --namespace rook-ceph wait --timeout=1800s --for=jsonpath='{.status.ceph.health}=HEALTH_OK' rook-ceph
cephcluster.ceph.rook.io/rook-ceph condition met
```
The above steps need to be performed for each Talos Linux node undergoing maintenance, one at a time.
## Cleaning Up
### Rook Ceph Cluster Removal
Removing a Rook Ceph cluster requires a few steps, starting with signalling to Rook that the Ceph cluster is really being destroyed.
Then all Persistent Volumes (and Claims) backed by the Ceph cluster must be deleted, followed by the Storage Classes and the Ceph storage types.
```shell
$ kubectl --namespace rook-ceph patch cephcluster rook-ceph --type merge -p '{"spec":{"cleanupPolicy":{"confirmation":"yes-really-destroy-data"}}}'
cephcluster.ceph.rook.io/rook-ceph patched
$ kubectl delete storageclasses ceph-block ceph-bucket ceph-filesystem
storageclass.storage.k8s.io "ceph-block" deleted
storageclass.storage.k8s.io "ceph-bucket" deleted
storageclass.storage.k8s.io "ceph-filesystem" deleted
$ kubectl --namespace rook-ceph delete cephblockpools ceph-blockpool
cephblockpool.ceph.rook.io "ceph-blockpool" deleted
$ kubectl --namespace rook-ceph delete cephobjectstore ceph-objectstore
cephobjectstore.ceph.rook.io "ceph-objectstore" deleted
$ kubectl --namespace rook-ceph delete cephfilesystem ceph-filesystem
cephfilesystem.ceph.rook.io "ceph-filesystem" deleted
```
Once that is complete, the Ceph cluster itself can be removed, along with the Rook Ceph cluster Helm chart installation.
```shell
$ kubectl --namespace rook-ceph delete cephcluster rook-ceph
cephcluster.ceph.rook.io "rook-ceph" deleted
$ helm --namespace rook-ceph uninstall rook-ceph-cluster
release "rook-ceph-cluster" uninstalled
```
If needed, the Rook Operator can also be removed along with all the Custom Resource Definitions that it created.
```shell
$ helm --namespace rook-ceph uninstall rook-ceph
W0328 12:41:14.998307 147203 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
These resources were kept due to the resource policy:
[CustomResourceDefinition] cephblockpools.ceph.rook.io
[CustomResourceDefinition] cephbucketnotifications.ceph.rook.io
[CustomResourceDefinition] cephbuckettopics.ceph.rook.io
[CustomResourceDefinition] cephclients.ceph.rook.io
[CustomResourceDefinition] cephclusters.ceph.rook.io
[CustomResourceDefinition] cephfilesystemmirrors.ceph.rook.io
[CustomResourceDefinition] cephfilesystems.ceph.rook.io
[CustomResourceDefinition] cephfilesystemsubvolumegroups.ceph.rook.io
[CustomResourceDefinition] cephnfses.ceph.rook.io
[CustomResourceDefinition] cephobjectrealms.ceph.rook.io
[CustomResourceDefinition] cephobjectstores.ceph.rook.io
[CustomResourceDefinition] cephobjectstoreusers.ceph.rook.io
[CustomResourceDefinition] cephobjectzonegroups.ceph.rook.io
[CustomResourceDefinition] cephobjectzones.ceph.rook.io
[CustomResourceDefinition] cephrbdmirrors.ceph.rook.io
[CustomResourceDefinition] objectbucketclaims.objectbucket.io
[CustomResourceDefinition] objectbuckets.objectbucket.io
release "rook-ceph" uninstalled
$ kubectl delete crds cephblockpools.ceph.rook.io cephbucketnotifications.ceph.rook.io cephbuckettopics.ceph.rook.io \
cephclients.ceph.rook.io cephclusters.ceph.rook.io cephfilesystemmirrors.ceph.rook.io \
cephfilesystems.ceph.rook.io cephfilesystemsubvolumegroups.ceph.rook.io \
cephnfses.ceph.rook.io cephobjectrealms.ceph.rook.io cephobjectstores.ceph.rook.io \
cephobjectstoreusers.ceph.rook.io cephobjectzonegroups.ceph.rook.io cephobjectzones.ceph.rook.io \
cephrbdmirrors.ceph.rook.io objectbucketclaims.objectbucket.io objectbuckets.objectbucket.io
customresourcedefinition.apiextensions.k8s.io "cephblockpools.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephbucketnotifications.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephbuckettopics.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephclients.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephclusters.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephfilesystemmirrors.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephfilesystems.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephfilesystemsubvolumegroups.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephnfses.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephobjectrealms.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephobjectstores.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephobjectstoreusers.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephobjectzonegroups.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephobjectzones.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephrbdmirrors.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "objectbucketclaims.objectbucket.io" deleted
customresourcedefinition.apiextensions.k8s.io "objectbuckets.objectbucket.io" deleted
```
### Talos Linux Rook Metadata Removal
If the Rook Operator is cleanly removed following the above process, the node metadata and disks should be clean and ready to be re-used.
In the case of an unclean cluster removal, there may still be a few instances of metadata stored on the system disk, as well as the partition information on the storage disks.
First, the node metadata needs to be removed; make sure to update `nodeName` with the actual name of a storage node that needs cleaning, and `path` with the Rook `dataDirHostPath` configuration set when installing the chart.
The following will need to be repeated for each node used in the Rook Ceph cluster.
```shell
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: disk-clean
spec:
restartPolicy: Never
nodeName: <storage-node-name>
volumes:
- name: rook-data-dir
hostPath:
path: <dataDirHostPath>
containers:
- name: disk-clean
image: busybox
securityContext:
privileged: true
volumeMounts:
- name: rook-data-dir
mountPath: /node/rook-data
command: ["/bin/sh", "-c", "rm -rf /node/rook-data/*"]
EOF
pod/disk-clean created
$ kubectl wait --timeout=900s --for=jsonpath='{.status.phase}=Succeeded' pod disk-clean
pod/disk-clean condition met
$ kubectl delete pod disk-clean
pod "disk-clean" deleted
```
Lastly, the disks themselves need the partition and filesystem data wiped before they can be reused.
Again, the following has to be repeated for each node **and** disk used in the Rook Ceph cluster, updating `nodeName` and `of=` in the `command` as needed.
```shell
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: disk-wipe
spec:
restartPolicy: Never
nodeName: <storage-node-name>
containers:
- name: disk-wipe
image: busybox
securityContext:
privileged: true
command: ["/bin/sh", "-c", "dd if=/dev/zero bs=1M count=100 oflag=direct of=<device>"]
EOF
pod/disk-wipe created
$ kubectl wait --timeout=900s --for=jsonpath='{.status.phase}=Succeeded' pod disk-wipe
pod/disk-wipe condition met
$ kubectl delete pod disk-wipe
pod "disk-wipe" deleted
```

View File

@ -0,0 +1,45 @@
---
title: "Deploying Metrics Server"
description: "In this guide you will learn how to set up metrics-server."
aliases:
- ../../guides/deploy-metrics-server
---
Metrics Server enables use of the [Horizontal Pod Autoscaler](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/) and [Vertical Pod Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler).
It does this by gathering metrics data from the kubelets in a cluster.
By default, the certificates in use by the kubelets will not be recognized by metrics-server.
This can be solved by either configuring metrics-server to do no validation of the TLS certificates, or by modifying the kubelet configuration to rotate its certificates and use ones that will be recognized by metrics-server.
## Node Configuration
To enable kubelet certificate rotation, all nodes should have the following Machine Config snippet:
```yaml
machine:
kubelet:
extraArgs:
rotate-server-certificates: true
```
## Install During Bootstrap
We will want to ensure that new certificates for the kubelets are approved automatically.
This can easily be done with the [Kubelet Serving Certificate Approver](https://github.com/alex1989hu/kubelet-serving-cert-approver), which will automatically approve the Certificate Signing Requests generated by the kubelets.
We can have Kubelet Serving Certificate Approver and metrics-server installed on the cluster automatically during bootstrap by adding the following snippet to the Cluster Config of the node that will be handling the bootstrap process:
```yaml
cluster:
extraManifests:
- https://raw.githubusercontent.com/alex1989hu/kubelet-serving-cert-approver/main/deploy/standalone-install.yaml
- https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
```
## Install After Bootstrap
If you choose not to use `extraManifests` to install Kubelet Serving Certificate Approver and metrics-server during bootstrap, you can install them once the cluster is online using `kubectl`:
```sh
kubectl apply -f https://raw.githubusercontent.com/alex1989hu/kubelet-serving-cert-approver/main/deploy/standalone-install.yaml
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
```

View File

@ -0,0 +1,51 @@
---
title: "KubePrism"
description: "Enabling in-cluster highly-available controlplane endpoint."
---
Kubernetes pods running in CNI mode can use the `kubernetes.default.svc` service endpoint to access the Kubernetes API server,
while pods running in host networking mode can only use the external cluster endpoint to access the Kubernetes API server.
Kubernetes controlplane components run in host networking mode, and it is critical for them to be able to access the Kubernetes API server;
the same applies to CNI components when the CNI requires access to the Kubernetes API.
The external cluster endpoint might be unavailable due to misconfiguration or network issues, or it might have higher latency than the internal endpoint.
A failure to access the Kubernetes API server might cause a series of issues in the cluster: pods are not scheduled, service IPs stop working, etc.
The KubePrism feature solves this problem by enabling an in-cluster highly-available controlplane endpoint on every node in the cluster.
## Video Walkthrough
To see a live demo of this writeup, see the video below:
<iframe width="560" height="315" src="https://www.youtube.com/embed/VNRE64R5akM" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
## Enabling KubePrism
As of Talos 1.6, KubePrism is enabled by default with port 7445.
> Note: the `port` specified should be available on every node in the cluster.
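If you need to change the port or disable the feature entirely, this can be done via the machine configuration; a sketch of the default settings:
```yaml
machine:
  features:
    kubePrism:
      enabled: true
      port: 7445
```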
## How it works
Talos spins up a TCP loadbalancer on every machine on the `localhost` on the specified port which automatically picks up one of the endpoints:
* the external cluster endpoint as specified in the machine configuration
* for controlplane machines: `https://localhost:<api-server-local-port>` (`https://localhost:6443` in the default configuration)
* `https://<controlplane-address>:<api-server-port>` for every controlplane machine (based on the information from [Cluster Discovery]({{< relref "../../talos-guides/discovery" >}}))
KubePrism automatically filters out unhealthy (or unreachable) endpoints, and prefers lower-latency endpoints over higher-latency endpoints.
Talos automatically reconfigures `kubelet`, `kube-scheduler` and `kube-controller-manager` to use the KubePrism endpoint.
The `kube-proxy` manifest is also reconfigured to use the KubePrism endpoint by default, but when enabling KubePrism for a running cluster the manifest should be updated
with the `talosctl upgrade-k8s` command.
When using CNI components that require access to the Kubernetes API server, the KubePrism endpoint should be passed to the CNI configuration (e.g. Cilium, Calico CNIs).
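For example, with Cilium installed via Helm, the KubePrism endpoint can be passed roughly like this (a sketch; chart values may differ between Cilium versions):
```bash
helm install cilium cilium/cilium \
  --namespace kube-system \
  --set k8sServiceHost=localhost \
  --set k8sServicePort=7445
```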
## Notes
As the list of endpoints for KubePrism includes the external cluster endpoint, KubePrism in the worst case scenario will behave the same as the external cluster endpoint.
For controlplane nodes, the KubePrism should pick up the `localhost` endpoint of the `kube-apiserver`, minimizing the latency.
Worker nodes might use direct address of the controlplane endpoint if the latency is lower than the latency of the external cluster endpoint.
The KubePrism listen endpoint is bound to the `localhost` address, so it can't be used from outside the cluster.

View File

@ -0,0 +1,35 @@
---
title: "Local Storage"
description: "Using local storage for Kubernetes workloads."
---
Using local storage for Kubernetes workloads implies that the pod will be bound to the node where the local storage is available.
Local storage is not replicated, so in case of a machine failure contents of the local storage will be lost.
> Note: when using the `EPHEMERAL` Talos partition (`/var`), make sure to use the `--preserve` flag while performing upgrades, otherwise you risk losing data.
## `hostPath` mounts
The simplest way to use local storage is to use `hostPath` mounts.
When using `hostPath` mounts, make sure the root directory of the mount is mounted into the `kubelet` container:
```yaml
machine:
kubelet:
extraMounts:
- destination: /var/mnt
type: bind
source: /var/mnt
options:
- bind
- rshared
- rw
```
Both `EPHEMERAL` partition and user disks can be used for `hostPath` mounts.
## Local Path Provisioner
[Local Path Provisioner](https://github.com/rancher/local-path-provisioner) can be used to dynamically provision local storage.
Make sure to update its configuration to use a path under `/var`, e.g. `/var/local-path-provisioner` as the root path for the local storage.
(In Talos Linux, the default Local Path Provisioner path `/opt/local-path-provisioner` is read-only.)
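The chosen directory also needs to be mounted into the `kubelet` container, similar to the `hostPath` example above; a sketch assuming `/var/local-path-provisioner` as the root path:
```yaml
machine:
  kubelet:
    extraMounts:
      - destination: /var/local-path-provisioner
        type: bind
        source: /var/local-path-provisioner
        options:
          - bind
          - rshared
          - rw
```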

View File

@ -0,0 +1,178 @@
---
title: "Pod Security"
description: "Enabling Pod Security Admission plugin to configure Pod Security Standards."
aliases:
- ../../guides/pod-security
---
Kubernetes deprecated [Pod Security Policy](https://kubernetes.io/docs/concepts/policy/pod-security-policy/) as of v1.21, and it was removed in v1.25.
Pod Security Policy was replaced with [Pod Security Admission](https://kubernetes.io/docs/concepts/security/pod-security-admission/), which is enabled by default
starting with Kubernetes v1.23.
Talos Linux by default enables and configures the Pod Security Admission plugin to enforce [Pod Security Standards](https://kubernetes.io/docs/concepts/security/pod-security-standards/) with the
`baseline` profile enforced by default, with the exception of the `kube-system` namespace, which uses the `privileged` profile.
Some applications (e.g. Prometheus node exporter or storage solutions) require more relaxed Pod Security Standards, which can be configured by either updating the Pod Security Admission plugin configuration,
or by using the `pod-security.kubernetes.io/enforce` label on the namespace level:
```shell
kubectl label namespace NAMESPACE-NAME pod-security.kubernetes.io/enforce=privileged
```
## Configuration
Talos provides default Pod Security Admission in the machine configuration:
```yaml
apiVersion: pod-security.admission.config.k8s.io/v1alpha1
kind: PodSecurityConfiguration
defaults:
enforce: "baseline"
enforce-version: "latest"
audit: "restricted"
audit-version: "latest"
warn: "restricted"
warn-version: "latest"
exemptions:
usernames: []
runtimeClasses: []
namespaces: [kube-system]
```
This is a cluster-wide configuration for the Pod Security Admission plugin:
* by default `baseline` [Pod Security Standard](https://kubernetes.io/docs/concepts/security/pod-security-standards/) profile is enforced
* more strict `restricted` profile is not enforced, but API server warns about found issues
This default policy can be modified by updating the generated machine configuration before the cluster is created or on the fly by using the `talosctl` CLI utility.
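For example, the machine configuration could be patched to exempt an additional namespace (a sketch; the extra namespace is illustrative):
```yaml
cluster:
  apiServer:
    admissionControl:
      - name: PodSecurity
        configuration:
          apiVersion: pod-security.admission.config.k8s.io/v1alpha1
          kind: PodSecurityConfiguration
          defaults:
            enforce: "baseline"
            enforce-version: "latest"
            audit: "restricted"
            audit-version: "latest"
            warn: "restricted"
            warn-version: "latest"
          exemptions:
            usernames: []
            runtimeClasses: []
            namespaces:
              - kube-system
              - rook-ceph
```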
Verify current admission plugin configuration with:
```shell
$ talosctl get admissioncontrolconfigs.kubernetes.talos.dev admission-control -o yaml
node: 172.20.0.2
metadata:
namespace: controlplane
type: AdmissionControlConfigs.kubernetes.talos.dev
id: admission-control
version: 1
owner: config.K8sControlPlaneController
phase: running
created: 2022-02-22T20:28:21Z
updated: 2022-02-22T20:28:21Z
spec:
config:
- name: PodSecurity
configuration:
apiVersion: pod-security.admission.config.k8s.io/v1alpha1
defaults:
audit: restricted
audit-version: latest
enforce: baseline
enforce-version: latest
warn: restricted
warn-version: latest
exemptions:
namespaces:
- kube-system
runtimeClasses: []
usernames: []
kind: PodSecurityConfiguration
```
## Usage
Create a deployment that satisfies the `baseline` policy but gives warnings on `restricted` policy:
```shell
$ kubectl create deployment nginx --image=nginx
Warning: would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "nginx" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "nginx" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "nginx" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "nginx" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
deployment.apps/nginx created
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-85b98978db-j68l8 1/1 Running 0 2m3s
```
Create a daemonset which fails to meet requirements of the `baseline` policy:
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
labels:
app: debug-container
name: debug-container
namespace: default
spec:
revisionHistoryLimit: 10
selector:
matchLabels:
app: debug-container
template:
metadata:
creationTimestamp: null
labels:
app: debug-container
spec:
containers:
- args:
- "360000"
command:
- /bin/sleep
image: ubuntu:latest
imagePullPolicy: IfNotPresent
name: debug-container
resources: {}
securityContext:
privileged: true
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
dnsPolicy: ClusterFirstWithHostNet
hostIPC: true
hostPID: true
hostNetwork: true
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 30
updateStrategy:
rollingUpdate:
maxSurge: 0
maxUnavailable: 1
type: RollingUpdate
```
```shell
$ kubectl apply -f debug.yaml
Warning: would violate PodSecurity "restricted:latest": host namespaces (hostNetwork=true, hostPID=true, hostIPC=true), privileged (container "debug-container" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (container "debug-container" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "debug-container" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "debug-container" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "debug-container" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
daemonset.apps/debug-container created
```
Daemonset `debug-container` gets created, but no pods are scheduled:
```shell
$ kubectl get ds
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
debug-container 0 0 0 0 0 <none> 34s
```
Pod Security Admission plugin errors are in the daemonset events:
```shell
$ kubectl describe ds debug-container
...
Warning FailedCreate 92s daemonset-controller Error creating: pods "debug-container-kwzdj" is forbidden: violates PodSecurity "baseline:latest": host namespaces (hostNetwork=true, hostPID=true, hostIPC=true), privileged (container "debug-container" must not set securityContext.privileged=true)
```
Pod Security Admission configuration can also be overridden on a namespace level:
```shell
$ kubectl label ns default pod-security.kubernetes.io/enforce=privileged
namespace/default labeled
$ kubectl get ds
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
debug-container 2 2 0 2 0 <none> 4s
```
As the enforce policy was updated to `privileged` for the `default` namespace, the `debug-container` daemonset is now successfully running.

View File

@ -0,0 +1,203 @@
---
title: "Replicated Local Storage"
description: "Using local storage with OpenEBS Jiva"
aliases:
- ../../guides/storage
---
If you want to use replicated storage leveraging disk space from a local disk with Talos Linux installed, OpenEBS Jiva is a great option.
This requires installing the `iscsi-tools` [system extension]({{< relref "../../talos-guides/configuration/system-extensions" >}}).
Since OpenEBS Jiva is a replicated storage, it's recommended to have at least three nodes where sufficient local disk space is available.
This documentation will walk through installing OpenEBS Jiva via the official Helm chart.
Since Talos is different from standard Operating Systems, the OpenEBS components need a little tweaking after the Helm installation.
Refer to the OpenEBS Jiva [documentation](https://github.com/openebs/jiva-operator/blob/develop/docs/quickstart.md) if you need further customization.
> NB: Also note that the Talos nodes need to be upgraded with `--preserve` set while running OpenEBS Jiva, otherwise you risk losing data.
> Even though it's possible to recover data from other replicas if the node is wiped during an upgrade, this can require extra operational knowledge to recover, so it's highly recommended to use `--preserve` to avoid data loss.
## Preparing the nodes
Create the [boot assets]({{< relref "../../talos-guides/install/boot-assets" >}}) which includes the `iscsi-tools` system extensions (or create a custom installer and perform a machine upgrade if Talos is already installed).
Create a machine config patch with the contents below and save as `patch.yaml`
```yaml
machine:
kubelet:
extraMounts:
- destination: /var/openebs/local
type: bind
source: /var/openebs/local
options:
- bind
- rshared
- rw
```
Apply the machine config to all the nodes using talosctl:
```bash
talosctl -e <endpoint ip/hostname> -n <node ip/hostname> patch mc -p @patch.yaml
```
The extension status can be verified by running the following command:
```bash
talosctl -e <endpoint ip/hostname> -n <node ip/hostname> get extensions
```
An output similar to below can be observed:
```text
NODE NAMESPACE TYPE ID VERSION NAME VERSION
192.168.20.61 runtime ExtensionStatus 000.ghcr.io-siderolabs-iscsi-tools-v0.1.1 1 iscsi-tools v0.1.1
```
The service status can be checked by running the following command:
```bash
talosctl -e <endpoint ip/hostname> -n <node ip/hostname> services
```
You should see that the `ext-tgtd` and the `ext-iscsid` services are running.
```text
NODE SERVICE STATE HEALTH LAST CHANGE LAST EVENT
192.168.20.51 apid Running OK 64h57m15s ago Health check successful
192.168.20.51 containerd Running OK 64h57m23s ago Health check successful
192.168.20.51 cri Running OK 64h57m20s ago Health check successful
192.168.20.51 etcd Running OK 64h55m29s ago Health check successful
192.168.20.51 ext-iscsid Running ? 64h57m19s ago Started task ext-iscsid (PID 4040) for container ext-iscsid
192.168.20.51 ext-tgtd Running ? 64h57m19s ago Started task ext-tgtd (PID 3999) for container ext-tgtd
192.168.20.51 kubelet Running OK 38h14m10s ago Health check successful
192.168.20.51 machined Running ? 64h57m29s ago Service started as goroutine
192.168.20.51 trustd Running OK 64h57m19s ago Health check successful
192.168.20.51 udevd Running OK 64h57m21s ago Health check successful
```
## Install OpenEBS Jiva
```bash
helm repo add openebs-jiva https://openebs.github.io/jiva-operator
helm repo update
helm upgrade --install --create-namespace --namespace openebs --version 3.2.0 openebs-jiva openebs-jiva/jiva
```
This will create a storage class named `openebs-jiva-csi-default` which can be used for workloads.
The storage class named `openebs-hostpath` is used by Jiva to create persistent volumes backed by local storage, which are then used for replicated storage by the Jiva controller.
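You can confirm that both storage classes exist by running:

```bash
kubectl get storageclass
```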
## Patching the Namespace
When using the default Pod Security Admission configuration created by Talos, you need the following labels on your namespace:
```yaml
pod-security.kubernetes.io/audit: privileged
pod-security.kubernetes.io/enforce: privileged
pod-security.kubernetes.io/warn: privileged
```
or via kubectl:
```bash
kubectl label ns openebs pod-security.kubernetes.io/audit=privileged pod-security.kubernetes.io/enforce=privileged pod-security.kubernetes.io/warn=privileged
```
## Number of Replicas
By default, Jiva uses 3 replicas. If your cluster consists of fewer nodes, consider setting `defaultPolicy.replicas` to the number of nodes in your cluster, e.g. 2, as shown in the sketch below.
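For example, on a two-node cluster the value could be passed to the Helm release used above (a sketch; adjust the replica count to match your cluster):

```bash
helm upgrade --install --create-namespace --namespace openebs --version 3.2.0 \
  --set defaultPolicy.replicas=2 \
  openebs-jiva openebs-jiva/jiva
```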
## Patching the Jiva installation
Since Jiva assumes `iscsid` to be running natively on the host and not as a Talos [extension service]({{< relref "../../advanced/extension-services.md" >}}), we need to modify the CSI node daemonset to enable it to find the PID of the `iscsid` service.
The default config map used by Jiva also needs to be modified so that it can execute `iscsiadm` commands inside the PID namespace of the `iscsid` service.
Start by creating a configmap definition named `config.yaml` as below:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
labels:
app.kubernetes.io/managed-by: pulumi
name: openebs-jiva-csi-iscsiadm
namespace: openebs
data:
iscsiadm: |
#!/bin/sh
iscsid_pid=$(pgrep iscsid)
nsenter --mount="/proc/${iscsid_pid}/ns/mnt" --net="/proc/${iscsid_pid}/ns/net" -- /usr/local/sbin/iscsiadm "$@"
```
Replace the existing config map with the above config map by running the following command:
```bash
kubectl --namespace openebs apply --filename config.yaml
```
Now we need to update the Jiva CSI daemonset to run with `hostPID: true` so it can find the PID of the `iscsid` service, by running the following command:
```bash
kubectl --namespace openebs patch daemonset openebs-jiva-csi-node --type=json --patch '[{"op": "add", "path": "/spec/template/spec/hostPID", "value": true}]'
```
## Testing a simple workload
In order to test the Jiva installation, let's first create a PVC referencing the `openebs-jiva-csi-default` storage class:
```yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: example-jiva-csi-pvc
spec:
storageClassName: openebs-jiva-csi-default
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 4Gi
```
and then create a deployment using the above PVC:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: fio
spec:
selector:
matchLabels:
name: fio
replicas: 1
strategy:
type: Recreate
rollingUpdate: null
template:
metadata:
labels:
name: fio
spec:
containers:
- name: perfrunner
image: openebs/tests-fio
command: ["/bin/bash"]
args: ["-c", "while true ;do sleep 50; done"]
volumeMounts:
- mountPath: /datadir
name: fio-vol
volumes:
- name: fio-vol
persistentVolumeClaim:
claimName: example-jiva-csi-pvc
```
You can clean up the test resources by running the following command:
```bash
kubectl delete deployment fio
kubectl delete pvc example-jiva-csi-pvc
```
@ -0,0 +1,118 @@
---
title: "Seccomp Profiles"
description: "Using custom Seccomp Profiles with Kubernetes workloads."
aliases:
- ../../guides/pod-security
---
Seccomp stands for secure computing mode and has been a feature of the Linux kernel since version 2.6.12.
It can be used to sandbox the privileges of a process, restricting the calls it is able to make from userspace into the kernel.
Refer to the [Kubernetes Seccomp Guide](https://kubernetes.io/docs/tutorials/security/seccomp/) for more details.
In this guide we are going to configure a custom Seccomp Profile that logs all syscalls made by the workload.
## Preparing the nodes
Create a machine config patch with the contents below and save it as `patch.yaml`:
```yaml
machine:
seccompProfiles:
- name: audit.json
value:
defaultAction: SCMP_ACT_LOG
```
Apply the machine config to all the nodes using talosctl:
```bash
talosctl -e <endpoint ip/hostname> -n <node ip/hostname> patch mc -p @patch.yaml
```
This would create a seccomp profile named `audit.json` on the node at `/var/lib/kubelet/seccomp/profiles`.
The profiles can be used by Kubernetes pods by specifying the pod `securityContext` as below:
```yaml
spec:
securityContext:
seccompProfile:
type: Localhost
localhostProfile: profiles/audit.json
```
> Note that the `localhostProfile` uses the name of the profile created under the `profiles` directory,
> so make sure to use the path as `profiles/<profile-name.json>`.
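For example, a minimal pod manifest referencing the profile could look like the sketch below (the pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: seccomp-audit-example
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      localhostProfile: profiles/audit.json
  containers:
    - name: example
      image: busybox:1.36
      command: ["sleep", "3600"]
```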
The creation of the profile on the node can be verified by running the following command:
```bash
talosctl -e <endpoint ip/hostname> -n <node ip/hostname> get seccompprofiles
```
An output similar to below can be observed:
```text
NODE NAMESPACE TYPE ID VERSION
10.5.0.3 cri SeccompProfile audit.json 1
```
The content of the seccomp profile can be viewed by running the below command:
```bash
talosctl -e <endpoint ip/hostname> -n <node ip/hostname> read /var/lib/kubelet/seccomp/profiles/audit.json
```
An output similar to below can be observed:
```text
{"defaultAction":"SCMP_ACT_LOG"}
```
## Create a Kubernetes workload that uses the custom Seccomp Profile
Here we'll be using an example workload from the Kubernetes [documentation](https://kubernetes.io/docs/tutorials/security/seccomp/).
First open up a second terminal and run the following talosctl command so that we can view the syscalls being logged in real time:
```bash
talosctl -e <endpoint ip/hostname> -n <node ip/hostname> dmesg --follow --tail
```
Now deploy the example workload from the Kubernetes documentation:
```bash
kubectl apply -f https://k8s.io/examples/pods/security/seccomp/ga/audit-pod.yaml
```
Once the pod starts running, the terminal running the `talosctl dmesg` command from above should show logs similar to the following:
```text
10.5.0.3: kern: info: [2022-07-28T11:49:42.489473063Z]: cni0: port 1(veth32488a86) entered blocking state
10.5.0.3: kern: info: [2022-07-28T11:49:42.490852063Z]: cni0: port 1(veth32488a86) entered disabled state
10.5.0.3: kern: info: [2022-07-28T11:49:42.492470063Z]: device veth32488a86 entered promiscuous mode
10.5.0.3: kern: info: [2022-07-28T11:49:42.503105063Z]: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
10.5.0.3: kern: info: [2022-07-28T11:49:42.503944063Z]: IPv6: ADDRCONF(NETDEV_CHANGE): veth32488a86: link becomes ready
10.5.0.3: kern: info: [2022-07-28T11:49:42.504764063Z]: cni0: port 1(veth32488a86) entered blocking state
10.5.0.3: kern: info: [2022-07-28T11:49:42.505423063Z]: cni0: port 1(veth32488a86) entered forwarding state
10.5.0.3: kern: warning: [2022-07-28T11:49:44.873616063Z]: kauditd_printk_skb: 14 callbacks suppressed
10.5.0.3: kern: notice: [2022-07-28T11:49:44.873619063Z]: audit: type=1326 audit(1659008985.445:25): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=2784 comm="runc:[2:INIT]" exe="/" sig=0 arch=c000003e syscall=3 compat=0 ip=0x55ec0657bd3b code=0x7ffc0000
10.5.0.3: kern: notice: [2022-07-28T11:49:44.876609063Z]: audit: type=1326 audit(1659008985.445:26): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=2784 comm="runc:[2:INIT]" exe="/" sig=0 arch=c000003e syscall=3 compat=0 ip=0x55ec0657bd3b code=0x7ffc0000
10.5.0.3: kern: notice: [2022-07-28T11:49:44.878789063Z]: audit: type=1326 audit(1659008985.449:27): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=2784 comm="runc:[2:INIT]" exe="/" sig=0 arch=c000003e syscall=257 compat=0 ip=0x55ec0657bdaa code=0x7ffc0000
10.5.0.3: kern: notice: [2022-07-28T11:49:44.886693063Z]: audit: type=1326 audit(1659008985.461:28): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=2784 comm="runc:[2:INIT]" exe="/" sig=0 arch=c000003e syscall=202 compat=0 ip=0x55ec06532b43 code=0x7ffc0000
10.5.0.3: kern: notice: [2022-07-28T11:49:44.888764063Z]: audit: type=1326 audit(1659008985.461:29): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=2784 comm="runc:[2:INIT]" exe="/" sig=0 arch=c000003e syscall=202 compat=0 ip=0x55ec06532b43 code=0x7ffc0000
10.5.0.3: kern: notice: [2022-07-28T11:49:44.891009063Z]: audit: type=1326 audit(1659008985.461:30): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=2784 comm="runc:[2:INIT]" exe="/" sig=0 arch=c000003e syscall=1 compat=0 ip=0x55ec0657bd3b code=0x7ffc0000
10.5.0.3: kern: notice: [2022-07-28T11:49:44.893162063Z]: audit: type=1326 audit(1659008985.461:31): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=2784 comm="runc:[2:INIT]" exe="/" sig=0 arch=c000003e syscall=3 compat=0 ip=0x55ec0657bd3b code=0x7ffc0000
10.5.0.3: kern: notice: [2022-07-28T11:49:44.895365063Z]: audit: type=1326 audit(1659008985.461:32): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=2784 comm="runc:[2:INIT]" exe="/" sig=0 arch=c000003e syscall=39 compat=0 ip=0x55ec066eb68b code=0x7ffc0000
10.5.0.3: kern: notice: [2022-07-28T11:49:44.898306063Z]: audit: type=1326 audit(1659008985.461:33): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=2784 comm="runc:[2:INIT]" exe="/" sig=0 arch=c000003e syscall=59 compat=0 ip=0x55ec0657be16 code=0x7ffc0000
10.5.0.3: kern: notice: [2022-07-28T11:49:44.901518063Z]: audit: type=1326 audit(1659008985.473:34): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=2784 comm="http-echo" exe="/http-echo" sig=0 arch=c000003e syscall=158 compat=0 ip=0x455f35 code=0x7ffc0000
```
## Cleanup
You can clean up the test resources by running the following command:
```bash
kubectl delete pod audit-pod
```
@ -0,0 +1,196 @@
---
title: "Storage"
description: "Setting up storage for a Kubernetes cluster"
aliases:
- ../../guides/storage
---
In Kubernetes, using storage in the right way is well-facilitated by the API.
However, unless you are running in a major public cloud, that API may not be hooked up to anything.
This frequently sends users down a rabbit hole of researching all the various options for storage backends for their platform, for Kubernetes, and for their workloads.
There are a _lot_ of options out there, and it can be fairly bewildering.
For Talos, we try to limit the options somewhat to make the decision-making easier.
## Public Cloud
If you are running on a major public cloud, use their block storage.
It is easy and automatic.
## Storage Clusters
> **Sidero Labs** recommends having separate disks (apart from the Talos install disk) to be used for storage.
Redundancy, scaling capabilities, reliability, speed, maintenance load, and ease of use are all factors you must consider when managing your own storage.
Running a storage cluster can be a very good choice when managing your own storage, and there are two projects we recommend, depending on your situation.
If you need vast amounts of storage composed of more than a dozen or so disks, we recommend you use Rook to manage Ceph.
Also, if you need _both_ mount-once _and_ mount-many capabilities, Ceph is your answer.
Ceph also bundles in an S3-compatible object store.
The down side of Ceph is that there are a lot of moving parts.
> Please note that _most_ people should _never_ use mount-many semantics.
> NFS is pervasive because it is old and easy, _not_ because it is a good idea.
> While it may seem like a convenience at first, there are all manner of locking, performance, change control, and reliability concerns inherent in _any_ mount-many situation, so we **strongly** recommend you avoid this method.
If your storage needs are small enough to not need Ceph, use Mayastor.
### Rook/Ceph
[Ceph](https://ceph.io) is the grandfather of open source storage clusters.
It is big, has a lot of pieces, and will do just about anything.
It scales better than almost any other system out there, open source or proprietary, being able to easily add and remove storage over time with no downtime, safely and easily.
It comes bundled with RadosGW, an S3-compatible object store; CephFS, a NFS-like clustered filesystem; and RBD, a block storage system.
With the help of [Rook](https://rook.io), the vast majority of the complexity of Ceph is hidden away by a very robust operator, allowing you to control almost everything about your Ceph cluster from fairly simple Kubernetes CRDs.
So if Ceph is so great, why not use it for everything?
Ceph can be rather slow for small clusters.
It relies heavily on CPUs and massive parallelisation to provide good cluster performance, so if you don't have much of those dedicated to Ceph, it is not going to be well-optimised for you.
Also, if your cluster is small, just running Ceph may eat up a significant amount of the resources you have available.
Troubleshooting Ceph can be difficult if you do not understand its architecture.
There are lots of acronyms and the documentation assumes a fair level of knowledge.
There are very good tools for inspection and debugging, but this is still frequently seen as a concern.
### Mayastor
[Mayastor](https://github.com/openebs/Mayastor) is an OpenEBS project built in Rust utilising the modern NVMEoF system.
(Despite the name, Mayastor does _not_ require you to have NVME drives.)
It is fast and lean but still cluster-oriented and cloud native.
Unlike most of the other OpenEBS projects, it is _not_ built on the ancient iSCSI system.
Unlike Ceph, Mayastor is _just_ a block store.
It focuses on block storage and does it well.
It is much less complicated to set up than Ceph, but you probably wouldn't want to use it for more than a few dozen disks.
Mayastor is new, maybe _too_ new.
If you're looking for something well-tested and battle-hardened, this is not it.
However, if you're looking for something lean, future-oriented, and simpler than Ceph, it might be a great choice.
#### Video Walkthrough
To see a live demo of this section, see the video below:
<iframe width="560" height="315" src="https://www.youtube.com/embed/q86Kidk81xE" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
#### Prep Nodes
Either during initial cluster creation or on running worker nodes, several machine config values should be edited.
(This information is gathered from the Mayastor [documentation](https://mayastor.gitbook.io/introduction/quickstart/preparing-the-cluster).)
We need to set the `vm.nr_hugepages` sysctl and add `openebs.io/engine=mayastor` labels to the nodes which are meant to be storage nodes.
This can be done with `talosctl patch machineconfig` or via config patches during `talosctl gen config`.
Some examples are shown below: modify as needed.
First create a config patch file named `mayastor-patch.yaml` with the following contents:
```yaml
- op: add
path: /machine/sysctls
value:
vm.nr_hugepages: "1024"
- op: add
path: /machine/nodeLabels
value:
openebs.io/engine: mayastor
```
Using `talosctl gen config`:
```bash
talosctl gen config my-cluster https://mycluster.local:6443 --config-patch @mayastor-patch.yaml
```
Patching an existing node:
```bash
talosctl patch --mode=no-reboot machineconfig -n <node ip> --patch @mayastor-patch.yaml
```
> Note: If you are adding/updating the `vm.nr_hugepages` on a node which already had the `openebs.io/engine=mayastor` label set, you'd need to restart kubelet so that it picks up the new value, by issuing the following command:
```bash
talosctl -n <node ip> service kubelet restart
```
#### Deploy Mayastor
Continue setting up [Mayastor](https://mayastor.gitbook.io/introduction/quickstart/deploy-mayastor) using the official documentation.
### Piraeus / LINSTOR
* [Piraeus-Operator](https://piraeus.io/)
* [LINSTOR](https://linbit.com/drbd/)
* [DRBD Extension](https://github.com/siderolabs/extensions#storage)
#### Install Piraeus Operator V2
There is already a how-to for Talos: [Link](https://github.com/piraeusdatastore/piraeus-operator/blob/v2/docs/how-to/talos.md)
#### Create first storage pool and PVC
Before proceeding, install linstor plugin for kubectl:
https://github.com/piraeusdatastore/kubectl-linstor
Or use [krew](https://krew.sigs.k8s.io/): `kubectl krew install linstor`
```sh
# Create device pool on a blank (no partition table!) disk on node01
kubectl linstor physical-storage create-device-pool --pool-name nvme_lvm_pool LVM node01 /dev/nvme0n1 --storage-pool nvme_pool
```
Create a storage class definition named `piraeus-sc.yml`:
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: simple-nvme
parameters:
csi.storage.k8s.io/fstype: xfs
linstor.csi.linbit.com/autoPlace: "3"
linstor.csi.linbit.com/storagePool: nvme_pool
provisioner: linstor.csi.linbit.com
volumeBindingMode: WaitForFirstConsumer
```
```sh
# Create storage class
kubectl apply -f piraeus-sc.yml
```
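With the storage class in place, a `PersistentVolumeClaim` can reference it (a minimal sketch; the claim name and size are placeholders):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-linstor-pvc
spec:
  storageClassName: simple-nvme
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```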
## NFS
NFS is an old pack animal long past its prime.
NFS is slow, has all kinds of bottlenecks involving contention, distributed locking, single points of service, and more.
However, it is supported by a wide variety of systems.
You don't want to use it unless you have to, but unfortunately, that "have to" is too frequent.
The NFS client is part of the [`kubelet` image](https://github.com/talos-systems/kubelet) maintained by the Talos team.
This means that the version installed in your running `kubelet` is the version of NFS supported by Talos.
You can reduce some of the contention problems by parceling Persistent Volumes from separate underlying directories.
## Object storage
Ceph comes with an S3-compatible object store, but there are other options, as
well.
These can often be built on top of other storage backends.
For instance, you may have your block storage running with Mayastor but assign a
Pod a large Persistent Volume to serve your object store.
One of the most popular open source add-on object stores is [MinIO](https://min.io/).
## Others (iSCSI)
The most common remaining systems involve iSCSI in one form or another.
These include the original OpenEBS, Rancher's Longhorn, and many proprietary systems.
iSCSI in Linux is facilitated by [open-iscsi](https://github.com/open-iscsi/open-iscsi).
This system was designed long before containers caught on, and it is not well
suited to the task, especially when coupled with a read-only host operating
system.
iSCSI support in Talos is provided via the [iscsi-tools](https://github.com/siderolabs/extensions/pkgs/container/iscsi-tools) [system extension]({{< relref "../../talos-guides/configuration/system-extensions" >}}).
The extension enables compatibility with OpenEBS Jiva - refer to the [local storage]({{< relref "replicated-local-storage-with-openebs-jiva.md" >}}) installation guide for more information.
@ -0,0 +1,259 @@
---
title: "iSCSI Storage with Synology CSI"
description: "Automatically provision iSCSI volumes on a Synology NAS with the synology-csi driver."
aliases:
- ../../guides/synology-csi
---
## Background
Synology is a company that specializes in Network Attached Storage (NAS) devices.
They provide a number of features within a simple web OS, including an LDAP server, Docker support, and (perhaps most relevant to this guide) function as an iSCSI host.
The focus of this guide is to allow a Kubernetes cluster running on Talos to provision Kubernetes storage (both dynamic and static) on a Synology NAS using a direct integration, rather than relying on an intermediary layer like Rook/Ceph or Mayastor.
This guide assumes a very basic familiarity with iSCSI terminology (LUN, iSCSI target, etc.).
## Prerequisites
* Synology NAS running DSM 7.0 or above
* Provisioned Talos cluster running Kubernetes v1.20 or above
* (Optional) Both [Volume Snapshot CRDs](https://github.com/kubernetes-csi/external-snapshotter/tree/v4.0.0/client/config/crd) and the [common snapshot controller](https://github.com/kubernetes-csi/external-snapshotter/tree/v4.0.0/deploy/kubernetes/snapshot-controller) must be installed in your Kubernetes cluster if you want to use the **Snapshot** feature
## Setting up the Synology user account
The `synology-csi` controller interacts with your NAS in two different ways: via the API and via the iSCSI protocol.
Actions such as creating a new iSCSI target or deleting an old one are accomplished via the Synology API, and require administrator access.
On the other hand, mounting the disk to a pod and reading from / writing to it will utilize iSCSI.
Because you can only authenticate with one account per DSM configured, that account needs to have admin privileges.
In order to minimize access in the case of these credentials being compromised, you should configure the account with the least possible amount of access and explicitly specify "No Access" on all volumes when configuring the user permissions.
## Setting up the Synology CSI
> Note: this guide is paraphrased from the Synology CSI [readme](https://github.com/zebernst/synology-csi-talos).
> Please consult the readme for more in-depth instructions and explanations.
Clone the git repository.
```bash
git clone https://github.com/zebernst/synology-csi-talos.git
```
While Synology provides some automated scripts to deploy the CSI driver, they can be finicky, especially when making changes to the source code.
We will be configuring and deploying things manually in this guide.
The relevant files we will be touching are in the following locations:
```text
.
├── Dockerfile
├── Makefile
├── config
│ └── client-info-template.yml
└── deploy
└── kubernetes
└── v1.20
├── controller.yml
├── csi-driver.yml
├── namespace.yml
├── node.yml
├── snapshotter
│ ├── snapshotter.yaml
│ └── volume-snapshot-class.yml
└── storage-class.yml
```
### Configure connection info
Use `config/client-info-template.yml` as an example to configure the connection information for DSM.
You can specify **one or more** storage systems on which the CSI volumes will be created.
See below for an example:
```yaml
---
clients:
- host: 192.168.1.1 # ipv4 address or domain of the DSM
port: 5000 # port for connecting to the DSM
https: false # set this true to use https. you need to specify the port to DSM HTTPS port as well
username: username # username
password: password # password
```
Create a Kubernetes secret using the client information config file.
```bash
kubectl create secret -n synology-csi generic client-info-secret --from-file=config/client-info.yml
```
Note that if you rename the secret to something other than `client-info-secret`, make sure you update the corresponding references in the deployment manifests as well.
### Build the Talos-compatible image
Modify the `Makefile` so that the image is built and tagged under your GitHub Container Registry username:
```makefile
REGISTRY_NAME=ghcr.io/<username>
```
When you run `make docker-build` or `make docker-build-multiarch`, it will push the resulting image to `ghcr.io/<username>/synology-csi:v1.1.0`.
Ensure that you find and change any reference to `synology/synology-csi:v1.1.0` to point to your newly-pushed image within the deployment manifests.
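One way to update all of these references at once is with standard shell tools (a sketch; review the resulting changes before applying the manifests):

```bash
grep -rl 'synology/synology-csi:v1.1.0' deploy/kubernetes/v1.20 \
  | xargs sed -i 's|synology/synology-csi:v1.1.0|ghcr.io/<username>/synology-csi:v1.1.0|g'
```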
### Configure the CSI driver
By default, the deployment manifests include one storage class and one volume snapshot class.
See below for examples:
```yaml
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
annotations:
storageclass.kubernetes.io/is-default-class: "false"
name: syno-storage
provisioner: csi.san.synology.com
parameters:
fsType: 'ext4'
dsm: '192.168.1.1'
location: '/volume1'
reclaimPolicy: Retain
allowVolumeExpansion: true
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
name: syno-snapshot
annotations:
storageclass.kubernetes.io/is-default-class: "false"
driver: csi.san.synology.com
deletionPolicy: Delete
parameters:
description: 'Kubernetes CSI'
```
It can be useful to configure multiple different StorageClasses.
For example, a popular strategy is to create two nearly identical StorageClasses, with one configured with `reclaimPolicy: Retain` and the other with `reclaimPolicy: Delete`.
Alternately, a workload may require a specific filesystem, such as `ext4`.
If a Synology NAS is going to be the most common way to configure storage on your cluster, it can be convenient to add the `storageclass.kubernetes.io/is-default-class: "true"` annotation to one of your StorageClasses.
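For example, a second StorageClass that deletes backing LUNs when claims are removed and acts as the cluster default could look like this (a sketch derived from the class shown above; adjust `dsm` and `location` for your NAS):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: syno-storage-delete
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: csi.san.synology.com
parameters:
  fsType: 'ext4'
  dsm: '192.168.1.1'
  location: '/volume1'
reclaimPolicy: Delete
allowVolumeExpansion: true
```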
The following table details the configurable parameters for the Synology StorageClass.
| Name | Type | Description | Default | Supported protocols |
| ------------------------------------------------ | ------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ------- | ------------------- |
| *dsm* | string | The IPv4 address of your DSM, which must be included in the `client-info.yml` for the CSI driver to log in to DSM | - | iSCSI, SMB |
| *location* | string | The location (/volume1, /volume2, ...) on DSM where the LUN for *PersistentVolume* will be created | - | iSCSI, SMB |
| *fsType* | string | The formatting file system of the *PersistentVolumes* when you mount them on the pods. This parameter only works with iSCSI. For SMB, the fsType is always cifs. | `ext4` | iSCSI |
| *protocol* | string | The backing storage protocol. Enter iscsi to create LUNs or smb to create shared folders on DSM. | `iscsi` | iSCSI, SMB |
| *csi.storage.k8s.io/node-stage-secret-name* | string | The name of node-stage-secret. Required if DSM shared folder is accessed via SMB. | - | SMB |
| *csi.storage.k8s.io/node-stage-secret-namespace* | string | The namespace of node-stage-secret. Required if DSM shared folder is accessed via SMB. | - | SMB |
The VolumeSnapshotClass can be similarly configured with the following parameters:
| Name | Type | Description | Default | Supported protocols |
| ------------- | ------ | -------------------------------------------- | ------- | ------------------- |
| *description* | string | The description of the snapshot on DSM | - | iSCSI |
| *is_locked* | string | Whether you want to lock the snapshot on DSM | `false` | iSCSI, SMB |
### Apply YAML manifests
Once you have created the desired StorageClass(es) and VolumeSnapshotClass(es), the final step is to apply the Kubernetes manifests against the cluster.
The easiest way to apply them all at once is to create a `kustomization.yaml` file in the same directory as the manifests and use Kustomize to apply:
```bash
kubectl apply -k path/to/manifest/directory
```
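A minimal `kustomization.yaml` for the manifests listed earlier might look like the sketch below (paths are relative to `deploy/kubernetes/v1.20`; omit the snapshotter entries if you are not using snapshots):

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - namespace.yml
  - csi-driver.yml
  - controller.yml
  - node.yml
  - storage-class.yml
  - snapshotter/snapshotter.yaml
  - snapshotter/volume-snapshot-class.yml
```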
Alternately, you can apply each manifest one-by-one:
```bash
kubectl apply -f <file>
```
## Run performance tests
In order to test the provisioning, mounting, and performance of using a Synology NAS as Kubernetes persistent storage, use the following command:
```bash
kubectl apply -f speedtest.yaml
```
Content of speedtest.yaml ([source](https://github.com/phnmnl/k8s-volume-test))
```yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: test-claim
spec:
# storageClassName: syno-storage
accessModes:
- ReadWriteMany
resources:
requests:
storage: 5G
---
apiVersion: batch/v1
kind: Job
metadata:
name: read
spec:
template:
metadata:
name: read
labels:
app: speedtest
job: read
spec:
containers:
- name: read
image: ubuntu:xenial
command: ["dd","if=/mnt/pv/test.img","of=/dev/null","bs=8k"]
volumeMounts:
- mountPath: "/mnt/pv"
name: test-volume
volumes:
- name: test-volume
persistentVolumeClaim:
claimName: test-claim
restartPolicy: Never
---
apiVersion: batch/v1
kind: Job
metadata:
name: write
spec:
template:
metadata:
name: write
labels:
app: speedtest
job: write
spec:
containers:
- name: write
image: ubuntu:xenial
command: ["dd","if=/dev/zero","of=/mnt/pv/test.img","bs=1G","count=1","oflag=dsync"]
volumeMounts:
- mountPath: "/mnt/pv"
name: test-volume
volumes:
- name: test-volume
persistentVolumeClaim:
claimName: test-claim
restartPolicy: Never
```
If these two jobs complete successfully, use the following commands to get the results of the speed tests:
```bash
# Pod logs for read test:
kubectl logs -l app=speedtest,job=read
# Pod logs for write test:
kubectl logs -l app=speedtest,job=write
```
When you're satisfied with the results of the test, delete the artifacts created from the speedtest:
```bash
kubectl delete -f speedtest.yaml
```
@ -0,0 +1,5 @@
---
title: "Network"
weight: 20
description: "Managing the Kubernetes cluster networking"
---
@ -0,0 +1,293 @@
---
title: "Deploying Cilium CNI"
description: "In this guide you will learn how to set up Cilium CNI on Talos."
aliases:
- ../../guides/deploying-cilium
---
> Cilium can be installed either via the `cilium` cli or using `helm`.
This documentation will outline installing Cilium CNI v1.14.0 on Talos in six different ways.
Adhering to Talos principles we'll deploy Cilium with IPAM mode set to Kubernetes, and using the `cgroupv2` and `bpffs` mounts that Talos already provides.
As Talos does not allow Kubernetes workloads to load kernel modules, the `SYS_MODULE` capability needs to be dropped from the Cilium default set of values; this override can be seen in the Helm/Cilium CLI install commands.
Each method can either install Cilium using kube proxy (default) or without: [Kubernetes Without kube-proxy](https://docs.cilium.io/en/v1.14/network/kubernetes/kubeproxy-free/)
In this guide we assume that [KubePrism]({{< relref "../configuration/kubeprism" >}}) is enabled and configured to use the port 7445.
## Machine config preparation
When generating the machine config for a node set the CNI to none.
For example using a config patch:
Create a `patch.yaml` file with the following contents:
```yaml
cluster:
network:
cni:
name: none
```
```bash
talosctl gen config \
my-cluster https://mycluster.local:6443 \
--config-patch @patch.yaml
```
Or if you want to deploy Cilium without kube-proxy, you also need to disable kube proxy:
Create a `patch.yaml` file with the following contents:
```yaml
cluster:
network:
cni:
name: none
proxy:
disabled: true
```
```bash
talosctl gen config \
my-cluster https://mycluster.local:6443 \
--config-patch @patch.yaml
```
### Installation using Cilium CLI
> Note: It is recommended to template the Cilium manifest using Helm and include it as part of the Talos machine config, but if you want to install Cilium using the Cilium CLI, you can follow the steps below.
Install the [Cilium CLI](https://docs.cilium.io/en/v1.13/gettingstarted/k8s-install-default/#install-the-cilium-cli) following the steps here.
#### With kube-proxy
```bash
cilium install \
--helm-set=ipam.mode=kubernetes \
--helm-set=kubeProxyReplacement=disabled \
--helm-set=securityContext.capabilities.ciliumAgent="{CHOWN,KILL,NET_ADMIN,NET_RAW,IPC_LOCK,SYS_ADMIN,SYS_RESOURCE,DAC_OVERRIDE,FOWNER,SETGID,SETUID}" \
--helm-set=securityContext.capabilities.cleanCiliumState="{NET_ADMIN,SYS_ADMIN,SYS_RESOURCE}" \
--helm-set=cgroup.autoMount.enabled=false \
--helm-set=cgroup.hostRoot=/sys/fs/cgroup
```
#### Without kube-proxy
```bash
cilium install \
--helm-set=ipam.mode=kubernetes \
--helm-set=kubeProxyReplacement=true \
--helm-set=securityContext.capabilities.ciliumAgent="{CHOWN,KILL,NET_ADMIN,NET_RAW,IPC_LOCK,SYS_ADMIN,SYS_RESOURCE,DAC_OVERRIDE,FOWNER,SETGID,SETUID}" \
--helm-set=securityContext.capabilities.cleanCiliumState="{NET_ADMIN,SYS_ADMIN,SYS_RESOURCE}" \
--helm-set=cgroup.autoMount.enabled=false \
--helm-set=cgroup.hostRoot=/sys/fs/cgroup \
--helm-set=k8sServiceHost=localhost \
--helm-set=k8sServicePort=7445
```
### Installation using Helm
Refer to [Installing with Helm](https://docs.cilium.io/en/v1.13/installation/k8s-install-helm/) for more information.
First we'll need to add the helm repo for Cilium.
```bash
helm repo add cilium https://helm.cilium.io/
helm repo update
```
### Method 1: Helm install
After applying the machine config and bootstrapping, Talos will appear to hang on phase 18/19 with the message `retrying error: node not ready`.
This happens because nodes in Kubernetes are only marked as ready once the CNI is up.
As there is no CNI defined, the boot process is pending and will reboot the node to retry after 10 minutes; this is expected behavior.
During this window you can install Cilium manually by running the following:
```bash
helm install \
cilium \
cilium/cilium \
--version 1.14.0 \
--namespace kube-system \
--set ipam.mode=kubernetes \
--set=kubeProxyReplacement=disabled \
--set=securityContext.capabilities.ciliumAgent="{CHOWN,KILL,NET_ADMIN,NET_RAW,IPC_LOCK,SYS_ADMIN,SYS_RESOURCE,DAC_OVERRIDE,FOWNER,SETGID,SETUID}" \
--set=securityContext.capabilities.cleanCiliumState="{NET_ADMIN,SYS_ADMIN,SYS_RESOURCE}" \
--set=cgroup.autoMount.enabled=false \
--set=cgroup.hostRoot=/sys/fs/cgroup
```
Or if you want to deploy Cilium without kube-proxy, also set some extra parameters:
```bash
helm install \
cilium \
cilium/cilium \
--version 1.14.0 \
--namespace kube-system \
--set ipam.mode=kubernetes \
--set=kubeProxyReplacement=true \
--set=securityContext.capabilities.ciliumAgent="{CHOWN,KILL,NET_ADMIN,NET_RAW,IPC_LOCK,SYS_ADMIN,SYS_RESOURCE,DAC_OVERRIDE,FOWNER,SETGID,SETUID}" \
--set=securityContext.capabilities.cleanCiliumState="{NET_ADMIN,SYS_ADMIN,SYS_RESOURCE}" \
--set=cgroup.autoMount.enabled=false \
--set=cgroup.hostRoot=/sys/fs/cgroup \
--set=k8sServiceHost=localhost \
--set=k8sServicePort=7445
```
After Cilium is installed the boot process should continue and complete successfully.
### Method 2: Helm manifests install
Instead of directly installing Cilium you can instead first generate the manifest and then apply it:
```bash
helm template \
cilium \
cilium/cilium \
--version 1.14.0 \
--namespace kube-system \
--set ipam.mode=kubernetes \
--set=kubeProxyReplacement=disabled \
--set=securityContext.capabilities.ciliumAgent="{CHOWN,KILL,NET_ADMIN,NET_RAW,IPC_LOCK,SYS_ADMIN,SYS_RESOURCE,DAC_OVERRIDE,FOWNER,SETGID,SETUID}" \
--set=securityContext.capabilities.cleanCiliumState="{NET_ADMIN,SYS_ADMIN,SYS_RESOURCE}" \
--set=cgroup.autoMount.enabled=false \
--set=cgroup.hostRoot=/sys/fs/cgroup > cilium.yaml
kubectl apply -f cilium.yaml
```
Without kube-proxy:
```bash
helm template \
cilium \
cilium/cilium \
--version 1.14.0 \
--namespace kube-system \
--set ipam.mode=kubernetes \
--set=kubeProxyReplacement=true \
--set=securityContext.capabilities.ciliumAgent="{CHOWN,KILL,NET_ADMIN,NET_RAW,IPC_LOCK,SYS_ADMIN,SYS_RESOURCE,DAC_OVERRIDE,FOWNER,SETGID,SETUID}" \
--set=securityContext.capabilities.cleanCiliumState="{NET_ADMIN,SYS_ADMIN,SYS_RESOURCE}" \
--set=cgroup.autoMount.enabled=false \
--set=cgroup.hostRoot=/sys/fs/cgroup \
--set=k8sServiceHost=localhost \
--set=k8sServicePort=7445 > cilium.yaml
kubectl apply -f cilium.yaml
```
### Method 3: Helm manifests hosted install
After generating `cilium.yaml` using `helm template`, instead of applying this manifest directly during the Talos boot window (before the reboot timeout), you can host this file somewhere and patch the machine config to apply the manifest automatically during bootstrap.
To do this, patch your machine configuration to include this config instead of the above:
Create a `patch.yaml` file with the following contents:
```yaml
cluster:
network:
cni:
name: custom
urls:
- https://server.yourdomain.tld/some/path/cilium.yaml
```
```bash
talosctl gen config \
my-cluster https://mycluster.local:6443 \
--config-patch @patch.yaml
```
However, beware of the fact that the Helm-generated Cilium manifest contains sensitive key material.
As such you should definitely not host this somewhere publicly accessible.
### Method 4: Helm manifests inline install
A more secure option would be to include the `helm template` output manifest inside the machine configuration.
The machine config should be generated with CNI set to `none`.
Create a `patch.yaml` file with the following contents:
```yaml
cluster:
network:
cni:
name: none
```
```bash
talosctl gen config \
my-cluster https://mycluster.local:6443 \
--config-patch @patch.yaml
```
If deploying Cilium with `kube-proxy` disabled, you can also include the following:
Create a `patch.yaml` file with the following contents:
```yaml
cluster:
network:
cni:
name: none
proxy:
disabled: true
machine:
features:
kubePrism:
enabled: true
port: 7445
```
```bash
talosctl gen config \
my-cluster https://mycluster.local:6443 \
--config-patch @patch.yaml
```
To do so, patch this into your machine configuration:
``` yaml
inlineManifests:
- name: cilium
contents: |
--
# Source: cilium/templates/cilium-agent/serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: "cilium"
namespace: kube-system
---
# Source: cilium/templates/cilium-operator/serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
-> Your cilium.yaml file will be pretty long....
```
This will install the Cilium manifests at just the right time during bootstrap.
Beware though:
- Changing the namespace when templating with Helm does not generate a manifest containing the yaml to create that namespace.
  As the inline manifest is processed from top to bottom, make sure to manually put the namespace yaml at the start of the inline manifest (see the sketch after this list).
- Only add the Cilium inline manifest to the control plane nodes machine configuration.
- Make sure all control plane nodes have an identical configuration.
- If you delete any of the generated resources they will be restored whenever a control plane node reboots.
- As a safety measure, Talos only creates missing resources from inline manifests; it never deletes or updates anything.
- If you need to update a manifest make sure to first edit all control plane machine configurations and then run `talosctl upgrade-k8s` as it will take care of updating inline manifests.
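As a sketch of the first point above, a namespace document placed at the top of the inline manifest could look like this (only needed if you template Cilium into a custom namespace instead of `kube-system`):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: cilium # hypothetical custom namespace; match the namespace passed to `helm template`
```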
## Known issues
- There are some gotchas when using Talos and Cilium on the Google cloud platform when using internal load balancers.
For more details: [GCP ILB support / support scope local routes to be configured](https://github.com/siderolabs/talos/issues/4109)
## Other things to know
- Talos has full kernel module support for eBPF, See:
- [Cilium System Requirements](https://docs.cilium.io/en/v1.14/operations/system_requirements/)
- [Talos Kernel Config AMD64](https://github.com/siderolabs/pkgs/blob/main/kernel/build/config-amd64)
- [Talos Kernel Config ARM64](https://github.com/siderolabs/pkgs/blob/main/kernel/build/config-arm64)
@ -0,0 +1,354 @@
---
title: "Upgrading Kubernetes"
description: "Guide on how to upgrade the Kubernetes cluster from Talos Linux."
aliases:
- guides/upgrading-kubernetes
---
This guide covers upgrading Kubernetes on Talos Linux clusters.
For a list of Kubernetes versions compatible with each Talos release, see the [Support Matrix]({{< relref "../introduction/support-matrix" >}}).
For upgrading the Talos Linux operating system, see [Upgrading Talos]({{< relref "../talos-guides/upgrading-talos" >}}).
## Video Walkthrough
To see a demo of this process, watch this video:
<iframe width="560" height="315" src="https://www.youtube.com/embed/uOKveKbD8MQ" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
## Automated Kubernetes Upgrade
The recommended method to upgrade Kubernetes is to use the `talosctl upgrade-k8s` command.
This will automatically update the components needed to upgrade Kubernetes safely.
Upgrading Kubernetes is non-disruptive to the cluster workloads.
To trigger a Kubernetes upgrade, issue a command specifying the version of Kubernetes to upgrade to, such as:
`talosctl --nodes <controlplane node> upgrade-k8s --to {{< k8s_release >}}`
Note that the `--nodes` parameter specifies the control plane node to send the API call to, but all members of the cluster will be upgraded.
To check what will be upgraded you can run `talosctl upgrade-k8s` with the `--dry-run` flag:
```bash
$ talosctl --nodes <controlplane node> upgrade-k8s --to {{< k8s_release >}} --dry-run
WARNING: found resources which are going to be deprecated/migrated in the version {{< k8s_release >}}
RESOURCE COUNT
validatingwebhookconfigurations.v1beta1.admissionregistration.k8s.io 4
mutatingwebhookconfigurations.v1beta1.admissionregistration.k8s.io 3
customresourcedefinitions.v1beta1.apiextensions.k8s.io 25
apiservices.v1beta1.apiregistration.k8s.io 54
leases.v1beta1.coordination.k8s.io 4
automatically detected the lowest Kubernetes version {{< k8s_prev_release >}}
checking for resource APIs to be deprecated in version {{< k8s_release >}}
discovered controlplane nodes ["172.20.0.2" "172.20.0.3" "172.20.0.4"]
discovered worker nodes ["172.20.0.5" "172.20.0.6"]
updating "kube-apiserver" to version "{{< k8s_release >}}"
> "172.20.0.2": starting update
> update kube-apiserver: v{{< k8s_prev_release >}} -> {{< k8s_release >}}
> skipped in dry-run
> "172.20.0.3": starting update
> update kube-apiserver: v{{< k8s_prev_release >}} -> {{< k8s_release >}}
> skipped in dry-run
> "172.20.0.4": starting update
> update kube-apiserver: v{{< k8s_prev_release >}} -> {{< k8s_release >}}
> skipped in dry-run
updating "kube-controller-manager" to version "{{< k8s_release >}}"
> "172.20.0.2": starting update
> update kube-controller-manager: v{{< k8s_prev_release >}} -> {{< k8s_release >}}
> skipped in dry-run
> "172.20.0.3": starting update
<snip>
updating manifests
> apply manifest Secret bootstrap-token-3lb63t
> apply skipped in dry run
> apply manifest ClusterRoleBinding system-bootstrap-approve-node-client-csr
> apply skipped in dry run
<snip>
```
To upgrade Kubernetes from v{{< k8s_prev_release >}} to v{{< k8s_release >}} run:
```bash
$ talosctl --nodes <controlplane node> upgrade-k8s --to {{< k8s_release >}}
automatically detected the lowest Kubernetes version {{< k8s_prev_release >}}
checking for resource APIs to be deprecated in version {{< k8s_release >}}
discovered controlplane nodes ["172.20.0.2" "172.20.0.3" "172.20.0.4"]
discovered worker nodes ["172.20.0.5" "172.20.0.6"]
updating "kube-apiserver" to version "{{< k8s_release >}}"
> "172.20.0.2": starting update
> update kube-apiserver: v{{< k8s_prev_release >}} -> {{< k8s_release >}}
> "172.20.0.2": machine configuration patched
> "172.20.0.2": waiting for API server state pod update
< "172.20.0.2": successfully updated
> "172.20.0.3": starting update
> update kube-apiserver: v{{< k8s_prev_release >}} -> {{< k8s_release >}}
<snip>
```
This command runs in several phases:
1. Images for new Kubernetes components are pre-pulled to the nodes to minimize downtime and test for image availability.
2. Every control plane node machine configuration is patched with the new image version for each control plane component.
Talos renders new static pod definitions on the configuration update which is picked up by the kubelet.
The command waits for the change to propagate to the API server state.
3. The command updates the `kube-proxy` daemonset with the new image version.
4. On every node in the cluster, the `kubelet` version is updated.
The command then waits for the `kubelet` service to be restarted and become healthy.
The update is verified by checking the `Node` resource state.
5. Kubernetes bootstrap manifests are re-applied to the cluster.
Updated bootstrap manifests might come with a new Talos version (e.g. CoreDNS version update), or might be the result of machine configuration change.
> Note: The `upgrade-k8s` command never deletes any resources from the cluster: they should be deleted manually.
If the command fails for any reason, it can be safely restarted to continue the upgrade process from the moment of the failure.
## Manual Kubernetes Upgrade
Kubernetes can be upgraded manually by following the steps outlined below.
They are equivalent to the steps performed by the `talosctl upgrade-k8s` command.
### Kubeconfig
In order to edit the control plane, you need a working `kubectl` config.
If you don't already have one, you can get one by running:
```bash
talosctl --nodes <controlplane node> kubeconfig
```
### API Server
Patch machine configuration using `talosctl patch` command:
```bash
$ talosctl -n <CONTROL_PLANE_IP_1> patch mc --mode=no-reboot -p '[{"op": "replace", "path": "/cluster/apiServer/image", "value": "registry.k8s.io/kube-apiserver:v{{< k8s_release >}}"}]'
patched mc at the node 172.20.0.2
```
The JSON patch might need to be adjusted if the current machine configuration is missing the `.cluster.apiServer.image` key.
Also the machine configuration can be edited manually with `talosctl -n <IP> edit mc --mode=no-reboot`.
Capture the new version of `kube-apiserver` config with:
```bash
$ talosctl -n <CONTROL_PLANE_IP_1> get kcpc kube-apiserver -o yaml
node: 172.20.0.2
metadata:
namespace: config
type: KubernetesControlPlaneConfigs.config.talos.dev
id: kube-apiserver
version: 5
phase: running
spec:
image: registry.k8s.io/kube-apiserver:v{{< k8s_release >}}
cloudProvider: ""
controlPlaneEndpoint: https://172.20.0.1:6443
etcdServers:
- https://127.0.0.1:2379
localPort: 6443
serviceCIDR: 10.96.0.0/12
extraArgs: {}
extraVolumes: []
```
In this example, the new version is `5`.
Wait for the new pod definition to propagate to the API server state (replace `talos-default-controlplane-1` with the node name):
```bash
$ kubectl get pod -n kube-system -l k8s-app=kube-apiserver --field-selector spec.nodeName=talos-default-controlplane-1 -o jsonpath='{.items[0].metadata.annotations.talos\.dev/config\-version}'
5
```
Check that the pod is running:
```bash
$ kubectl get pod -n kube-system -l k8s-app=kube-apiserver --field-selector spec.nodeName=talos-default-controlplane-1
NAME READY STATUS RESTARTS AGE
kube-apiserver-talos-default-controlplane-1 1/1 Running 0 16m
```
Repeat this process for every control plane node, verifying that state propagated successfully between each node update.
### Controller Manager
Patch machine configuration using `talosctl patch` command:
```bash
$ talosctl -n <CONTROL_PLANE_IP_1> patch mc --mode=no-reboot -p '[{"op": "replace", "path": "/cluster/controllerManager/image", "value": "registry.k8s.io/kube-controller-manager:v{{< k8s_release >}}"}]'
patched mc at the node 172.20.0.2
```
The JSON patch might need to be adjusted if the current machine configuration is missing the `.cluster.controllerManager.image` key.
Capture the new version of the `kube-controller-manager` config with:
```bash
$ talosctl -n <CONTROL_PLANE_IP_1> get kcpc kube-controller-manager -o yaml
node: 172.20.0.2
metadata:
namespace: config
type: KubernetesControlPlaneConfigs.config.talos.dev
id: kube-controller-manager
version: 3
phase: running
spec:
image: registry.k8s.io/kube-controller-manager:v{{< k8s_release >}}
cloudProvider: ""
podCIDR: 10.244.0.0/16
serviceCIDR: 10.96.0.0/12
extraArgs: {}
extraVolumes: []
```
In this example, the new version is `3`.
Wait for the new pod definition to propagate to the API server state (replace `talos-default-controlplane-1` with the node name):
```bash
$ kubectl get pod -n kube-system -l k8s-app=kube-controller-manager --field-selector spec.nodeName=talos-default-controlplane-1 -o jsonpath='{.items[0].metadata.annotations.talos\.dev/config\-version}'
3
```
Check that the pod is running:
```bash
$ kubectl get pod -n kube-system -l k8s-app=kube-controller-manager --field-selector spec.nodeName=talos-default-controlplane-1
NAME READY STATUS RESTARTS AGE
kube-controller-manager-talos-default-controlplane-1 1/1 Running 0 35m
```
Repeat this process for every control plane node, verifying that state propagated successfully between each node update.
### Scheduler
Patch machine configuration using `talosctl patch` command:
```bash
$ talosctl -n <CONTROL_PLANE_IP_1> patch mc --mode=no-reboot -p '[{"op": "replace", "path": "/cluster/scheduler/image", "value": "registry.k8s.io/kube-scheduler:v{{< k8s_release >}}"}]'
patched mc at the node 172.20.0.2
```
The JSON patch might need to be adjusted if the current machine configuration is missing the `.cluster.scheduler.image` key.
Capture the new version of the `kube-scheduler` config with:
```bash
$ talosctl -n <CONTROL_PLANE_IP_1> get kcpc kube-scheduler -o yaml
node: 172.20.0.2
metadata:
namespace: config
type: KubernetesControlPlaneConfigs.config.talos.dev
id: kube-scheduler
version: 3
phase: running
spec:
image: registry.k8s.io/kube-scheduler:v{{< k8s_release >}}
extraArgs: {}
extraVolumes: []
```
In this example, the new version is `3`.
Wait for the new pod definition to propagate to the API server state (replace `talos-default-controlplane-1` with the node name):
```bash
$ kubectl get pod -n kube-system -l k8s-app=kube-scheduler --field-selector spec.nodeName=talos-default-controlplane-1 -o jsonpath='{.items[0].metadata.annotations.talos\.dev/config\-version}'
3
```
Check that the pod is running:
```bash
$ kubectl get pod -n kube-system -l k8s-app=kube-scheduler --field-selector spec.nodeName=talos-default-controlplane-1
NAME READY STATUS RESTARTS AGE
kube-scheduler-talos-default-controlplane-1 1/1 Running 0 39m
```
Repeat this process for every control plane node, verifying that state propagated successfully between each node update.
### Proxy
In the proxy's `DaemonSet`, change:
```yaml
kind: DaemonSet
...
spec:
...
template:
...
spec:
containers:
- name: kube-proxy
image: registry.k8s.io/kube-proxy:v{{< k8s_release >}}
tolerations:
- ...
```
to:
```yaml
kind: DaemonSet
...
spec:
...
template:
...
spec:
containers:
- name: kube-proxy
image: registry.k8s.io/kube-proxy:v{{< k8s_release >}}
tolerations:
- ...
- key: node-role.kubernetes.io/control-plane
operator: Exists
effect: NoSchedule
```
To edit the `DaemonSet`, run:
```bash
kubectl edit daemonsets -n kube-system kube-proxy
```
### Bootstrap Manifests
Bootstrap manifests can be retrieved in a format which works for `kubectl` with the following command:
```bash
talosctl -n <controlplane IP> get manifests -o yaml | yq eval-all '.spec | .[] | splitDoc' - > manifests.yaml
```
Diff the manifests with the cluster:
```bash
kubectl diff -f manifests.yaml
```
Apply the manifests:
```bash
kubectl apply -f manifests.yaml
```
> Note: if some bootstrap resources were removed, they have to be removed from the cluster manually.
### kubelet
For every node, patch machine configuration with new kubelet version, wait for the kubelet to restart with new version:
```bash
$ talosctl -n <IP> patch mc --mode=no-reboot -p '[{"op": "replace", "path": "/machine/kubelet/image", "value": "ghcr.io/siderolabs/kubelet:v{{< k8s_release >}}"}]'
patched mc at the node 172.20.0.2
```
Once `kubelet` restarts with the new configuration, confirm upgrade with `kubectl get nodes <name>`:
```bash
$ kubectl get nodes talos-default-controlplane-1
NAME STATUS ROLES AGE VERSION
talos-default-controlplane-1 Ready control-plane 123m v{{< k8s_release >}}
```
@ -0,0 +1,4 @@
---
title: "Learn More"
weight: 80
---
@ -0,0 +1,56 @@
---
title: "Architecture"
weight: 20
description: "Learn the system architecture of Talos Linux itself."
---
Talos is designed to be **atomic** in _deployment_ and **modular** in _composition_.
It is atomic in that the entirety of Talos is distributed as a
single, self-contained image, which is versioned, signed, and immutable.
It is modular in that it is composed of many separate components
which have clearly defined gRPC interfaces which facilitate internal flexibility
and external operational guarantees.
All of the main Talos components communicate with each other by gRPC, through a socket on the local machine.
This imposes a clear separation of concerns and ensures that changes over time which affect the interoperation of components are a part of the public git record.
The benefit is that each component may be iterated and changed as its needs dictate, so long as the external API is controlled.
This is a key component in reducing coupling and maintaining modularity.
## File system partitions
Talos uses these partitions with the following labels:
1. **EFI** - stores EFI boot data.
1. **BIOS** - used for GRUB's second stage boot.
1. **BOOT** - used for the boot loader, stores initramfs and kernel data.
1. **META** - stores metadata about the Talos node, such as node IDs.
1. **STATE** - stores machine configuration, node identity data for cluster discovery, and KubeSpan info.
1. **EPHEMERAL** - stores ephemeral state information, mounted at `/var`.
## The File System
One of the unique design decisions in Talos is the layout of the root file system.
There are three "layers" to the Talos root file system.
At its core the rootfs is a read-only squashfs.
The squashfs is then mounted as a loop device into memory.
This provides Talos with an immutable base.
The next layer is a set of `tmpfs` file systems for runtime specific needs.
Aside from the standard pseudo file systems such as `/dev`, `/proc`, `/run`, `/sys` and `/tmp`, a special `/system` is created for internal needs.
One reason for this is that we need special files such as `/etc/hosts`, and `/etc/resolv.conf` to be writable (remember that the rootfs is read-only).
For example, at boot Talos will write `/system/etc/hosts` and then bind mount it over `/etc/hosts`.
This means that instead of making all of `/etc` writable, Talos only makes very specific files writable under `/etc`.
All files under `/system` are completely recreated on each boot.
For files and directories that need to persist across boots, Talos creates `overlayfs` file systems.
The `/etc/kubernetes` directory is a good example of this.
Directories like this are `overlayfs` backed by an XFS file system mounted at `/var`.
The `/var` directory is owned by Kubernetes with the exception of the above `overlayfs` file systems.
This directory is writable and used by `etcd` (in the case of control plane nodes), the kubelet, and the CRI (containerd).
Its content survives machine reboots, but it is wiped and lost on machine upgrades and resets, unless the
`--preserve` option of [`talosctl upgrade`]({{< relref "../reference/cli#talosctl-upgrade" >}}) or the
`--system-labels-to-wipe` option of [`talosctl reset`]({{< relref "../reference/cli#talosctl-reset" >}})
is used.
@ -0,0 +1,120 @@
---
title: "Components"
weight: 40
description: "Understand the system components that make up Talos Linux."
---
In this section, we discuss the various components that underpin Talos.
## Components
Talos Linux and Kubernetes are tightly integrated.
![Talos Linux and Kubernetes components](/images/components.drawio.svg)
In the following, the focus is on the Talos Linux specific components.
| Component | Description |
| ---------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| apid | When interacting with Talos, the gRPC API endpoint you interact with directly is provided by `apid`. `apid` acts as the gateway for all component interactions and forwards the requests to `machined`. |
| containerd | An industry-standard container runtime with an emphasis on simplicity, robustness, and portability. To learn more, see the [containerd website](https://containerd.io). |
| machined | Talos replacement for the traditional Linux init-process. Specially designed to run Kubernetes and does not allow starting arbitrary user services. |
| kernel | The Linux kernel included with Talos is configured according to the recommendations outlined in the [Kernel Self Protection Project](http://kernsec.org/wiki/index.php/Kernel_Self_Protection_Project). |
| trustd | To run and operate a Kubernetes cluster, a certain level of trust is required. Based on the concept of a 'Root of Trust', `trustd` is a simple daemon responsible for establishing trust within the system. |
| udevd | Implementation of `eudev` into `machined`. `eudev` is Gentoo's fork of udev, systemd's device file manager for the Linux kernel. It manages device nodes in /dev and handles all user space actions when adding or removing devices. To learn more, see the [Gentoo Wiki](https://wiki.gentoo.org/wiki/Eudev). |
### apid
When interacting with Talos, the gRPC API endpoint you will interact with directly is `apid`.
Apid acts as the gateway for all component interactions.
Apid provides a mechanism to route requests to the appropriate destination when running on a control plane node.
We'll use some examples below to illustrate what `apid` is doing.
When a user wants to interact with a Talos component via `talosctl`, there are two flags that control the interaction with `apid`.
The `-e | --endpoints` flag specifies which Talos node (via `apid`) should handle the connection.
Typically this is a public-facing server.
The `-n | --nodes` flag specifies which Talos node(s) should respond to the request.
If `--nodes` is omitted, the first endpoint will be used.
> Note: Typically, there will be an `endpoint` already defined in the Talos config file.
> Optionally, `nodes` can be included here as well.
For example, if a user wants to interact with `machined`, a command like `talosctl -e cluster.talos.dev memory` may be used.
```bash
$ talosctl -e cluster.talos.dev memory
NODE TOTAL USED FREE SHARED BUFFERS CACHE AVAILABLE
cluster.talos.dev 7938 1768 2390 145 53 3724 6571
```
In this case, `talosctl` is interacting with `apid` running on `cluster.talos.dev` and forwarding the request to the `machined` api.
If we wanted to extend our example to retrieve `memory` from another node in our cluster, we could use the command `talosctl -e cluster.talos.dev -n node02 memory`.
```bash
$ talosctl -e cluster.talos.dev -n node02 memory
NODE TOTAL USED FREE SHARED BUFFERS CACHE AVAILABLE
node02 7938 1768 2390 145 53 3724 6571
```
The `apid` instance on `cluster.talos.dev` receives the request and forwards it to `apid` running on `node02`, which forwards the request to the `machined` API.
We can further extend our example to retrieve `memory` for all nodes in our cluster by appending additional `-n node` flags or using a comma-separated list of nodes (`-n node01,node02,node03`):
```bash
$ talosctl -e cluster.talos.dev -n node01 -n node02 -n node03 memory
NODE TOTAL USED FREE SHARED BUFFERS CACHE AVAILABLE
node01 7938 871 4071 137 49 2945 7042
node02 257844 14408 190796 18138 49 52589 227492
node03 257844 1830 255186 125 49 777 254556
```
The `apid` instance on `cluster.talos.dev` receives the request and forwards it to `node01`, `node02`, and `node03`, which then forward the request to their local `machined` API.
### containerd
[Containerd](https://github.com/containerd/containerd) provides the container runtime to launch workloads on Talos and Kubernetes.
Talos services are namespaced under the `system` namespace in containerd, whereas the Kubernetes services are namespaced under the `k8s.io` namespace.
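As an illustration, the containers in each namespace can be listed over the Talos API; the `-k` flag switches from the `system` namespace to the `k8s.io` namespace (`<NODE-IP>` is a placeholder):

```bash
# List Talos system containers (containerd `system` namespace):
talosctl -n <NODE-IP> containers
# List Kubernetes-managed containers (containerd `k8s.io` namespace):
talosctl -n <NODE-IP> containers -k
```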
### machined
A common theme throughout the design of Talos is minimalism.
We believe strongly in the UNIX philosophy that each program should do one job well.
The `init` included in Talos is one example of this, and we are calling it "`machined`".
We wanted to create a focused `init` that had one job - run Kubernetes.
To that extent, `machined` is relatively static in that it does not allow for arbitrary user-defined services.
Only the services necessary to run Kubernetes and manage the node are available.
This includes:
- containerd
- etcd
- [kubelet](https://kubernetes.io/docs/concepts/overview/components/)
- networkd
- trustd
- udevd
The `machined` process handles all machine configuration, API handling, resource and controller management.
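A quick way to see the fixed set of services `machined` manages on a node (a sketch; `<NODE-IP>` is a placeholder):

```bash
# Show the services machined runs and their health status:
talosctl -n <NODE-IP> services
```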
### kernel
The Linux kernel included with Talos is configured according to the recommendations outlined in the Kernel Self Protection Project ([KSSP](http://kernsec.org/wiki/index.php/Kernel_Self_Protection_Project)).
### trustd
Security is one of the highest priorities within Talos.
To run and operate a Kubernetes cluster, a certain level of trust is required.
For example, orchestrating the bootstrap of a highly available control plane requires sensitive PKI data distribution.
To that end, we created `trustd`.
Based on a Root of Trust concept, `trustd` is a simple daemon responsible for establishing trust within the system.
Once trust is established, various methods become available to the trustee.
For example, it can accept a write request from another node to place a file on disk.
Additional methods and capabilities will be added to the `trustd` component to support new functionality in the rest of the Talos environment.
### udevd
Udevd handles the kernel device notifications and sets up the necessary links in `/dev`.

View File

@ -0,0 +1,145 @@
---
title: "Control Plane"
weight: 50
description: "Understand the Kubernetes Control Plane."
---
This guide provides information about the Kubernetes control plane, and details on how Talos runs and bootstraps the Kubernetes control plane.
<!-- markdownlint-disable MD026 -->
## What is a control plane node?
A control plane node is a node which:
- runs etcd, the Kubernetes database
- runs the Kubernetes control plane
- kube-apiserver
- kube-controller-manager
- kube-scheduler
- serves as an administrative proxy to the worker nodes
These nodes are critical to the operation of your cluster.
Without control plane nodes, Kubernetes will not respond to changes in the
system, and certain central services may not be available.
Talos nodes which have `.machine.type` of `controlplane` are control plane nodes.
(check via `talosctl get member`)
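As a quick check, the machine type can also be read directly from the resource API (a sketch; `<NODE-IP>` is a placeholder):

```bash
# Inspect the machine type resource of the queried node (controlplane or worker):
talosctl -n <NODE-IP> get machinetypes -o yaml
```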
Control plane nodes are tainted by default to prevent workloads from being scheduled onto them.
This is both to protect the control plane from workloads consuming resources and starving the control plane processes, and also to reduce the risk of a vulnerability exposing the control plane's credentials to a workload.
## The Control Plane and Etcd
A critical design concept of Kubernetes (and Talos) is the `etcd` database.
Properly managed (which Talos Linux does), `etcd` should never have split-brain or noticeable downtime.
In order to do this, `etcd` maintains the concept of "membership" and of
"quorum".
To perform any operation, read or write, the database requires
quorum.
That is, a majority of members must agree on the current leader, and absenteeism (members that are down, or not reachable)
counts as a negative.
For example, if there are three members, at least two out
of the three must agree on the current leader.
If two disagree or fail to answer, the `etcd` database will lock itself
until quorum is achieved in order to protect the integrity of
the data.
This design means that having two controlplane nodes is _worse_ than having only one, because if _either_ goes down, your database will lock (and the chance of one of two nodes going down is greater than the chance of just a single node going down).
Similarly, a 4-node etcd cluster is worse than a 3-node etcd cluster - a 4-node cluster requires 3 nodes to be up to achieve quorum (in order to have a majority), while the 3-node cluster requires 2 nodes:
i.e. both can tolerate a single node failure and keep running - but the chance of a node failing in a 4-node cluster is higher than in a 3-node cluster.
Another note about etcd: due to the need to replicate data amongst members, performance of etcd _decreases_ as the cluster scales.
A 5-node cluster can commit about 5% fewer writes per second than a 3-node cluster running on the same hardware.
## Recommendations for your control plane
- Run your clusters with three or five control plane nodes.
Three is enough for most use cases.
Five will give you better availability (in that it can tolerate two simultaneous node failures), but will cost more, both in the number of nodes required and because each node may need more hardware resources to offset the performance degradation seen in larger clusters.
- Implement good monitoring and put processes in place to deal with a failed node in a timely manner (and test them!)
- Even with robust monitoring and procedures for replacing failed nodes in place, backup etcd and your control plane node configuration to guard against unforeseen disasters.
- Monitor the performance of your etcd clusters.
If etcd performance is slow, vertically scale the nodes, not the number of nodes.
- If a control plane node fails, remove it first, then add the replacement node.
(This ensures that the failed node does not "vote" when adding in the new node, minimizing the chances of a quorum violation.)
- If replacing a node that has not failed, add the new one, then remove the old.
## Bootstrapping the Control Plane
Every new cluster must be bootstrapped only once, which is achieved by telling a single control plane node to initiate the bootstrap.
Bootstrapping itself does not do anything with Kubernetes.
Bootstrapping only tells `etcd` to form a cluster, so don't judge the success of
a bootstrap by the failure of Kubernetes to start.
Kubernetes relies on `etcd`, so bootstrapping is _required_, but it is not
_sufficient_ for Kubernetes to start.
If your Kubernetes cluster fails to form for other reasons (say, a bad
configuration option or an unavailable container registry), you do NOT need to
bootstrap again as long as the bootstrap API call returned successfully:
just fix the config or let Kubernetes retry.
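For reference, a sketch of the bootstrap call itself (endpoint and node addresses are placeholders):

```bash
# Bootstrap etcd on exactly one control plane node, exactly once per cluster:
talosctl -e <ENDPOINT> -n <CONTROLPLANE-IP> bootstrap
```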
### High-level Overview
Talos cluster bootstrap flow:
1. The `etcd` service is started on control plane nodes.
Instances of `etcd` on control plane nodes build the `etcd` cluster.
2. The `kubelet` service is started.
3. Control plane components are started as static pods via the `kubelet`, and the `kube-apiserver` component connects to the local (running on the same node) `etcd` instance.
4. The `kubelet` issues a client certificate using the bootstrap token via the control plane endpoint (handled by `kube-apiserver` and `kube-controller-manager`).
5. The `kubelet` registers the node in the API server.
6. Kubernetes control plane schedules pods on the nodes.
### Cluster Bootstrapping
All nodes start the `kubelet` service.
The `kubelet` tries to contact the control plane endpoint, but as it is not up yet, it keeps retrying.
One of the control plane nodes is chosen as the bootstrap node, and promoted using the bootstrap API (`talosctl bootstrap`).
The bootstrap node initiates the `etcd` bootstrap process by initializing `etcd` as the first member of the cluster.
> Once `etcd` is bootstrapped, the bootstrap node has no special role and acts the same way as other control plane nodes.
The `etcd` services on non-bootstrap nodes try to get the `Endpoints` resource via the control plane endpoint, but that request fails as the control plane endpoint is not up yet.
As soon as `etcd` is up on the bootstrap node, static pod definitions for the Kubernetes control plane components (`kube-apiserver`, `kube-controller-manager`, `kube-scheduler`) are rendered to disk.
The `kubelet` service on the bootstrap node picks up the static pod definitions and starts the Kubernetes control plane components.
As soon as `kube-apiserver` is launched, the control plane endpoint comes up.
The bootstrap node acquires an `etcd` mutex and injects the bootstrap manifests into the API server.
The bootstrap manifests specify the Kubernetes join token and enable kubelet CSR auto-approval.
The `kubelet` services on all nodes are now able to issue client certificates for themselves and register their nodes with the API server.
Other bootstrap manifests specify additional resources critical for Kubernetes operations (e.g. CNI, PSP, etc.).
The `etcd` service on non-bootstrap nodes is now able to discover other members of the `etcd` cluster via the Kubernetes `Endpoints` resource.
The `etcd` cluster is now formed and consists of all control plane nodes.
All control plane nodes render static pod manifests for the control plane components.
Each node now runs a full set of components to make the control plane HA.
The `kubelet` service on worker nodes is now able to issue the client certificate and register itself with the API server.
### Scaling Up the Control Plane
When new nodes are added to the control plane, the process is the same as the bootstrap process above: the `etcd` service discovers existing members of the control plane via the
control plane endpoint, joins the `etcd` cluster, and the control plane components are scheduled on the node.
### Scaling Down the Control Plane
Scaling down the control plane involves removing a node from the cluster.
The most critical part is making sure that the node which is being removed leaves the etcd cluster.
The recommended way to do this is to use:
- `talosctl -n IP.of.node.to.remove reset`
- `kubectl delete node`
When using the `talosctl reset` command, the targeted control plane node leaves the `etcd` cluster as part of the reset sequence, and its disks are erased.
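Putting it together, a sketch of the scale-down sequence (the node address and Kubernetes node name below are hypothetical):

```bash
# Gracefully remove the node from etcd and wipe its disks:
talosctl -n 172.20.0.4 reset --graceful --reboot
# Then remove the corresponding Node object from Kubernetes:
kubectl delete node talos-controlplane-4
```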
### Upgrading Talos on Control Plane Nodes
When a control plane node is upgraded, Talos leaves `etcd`, wipes the system disk, installs a new version of itself, and reboots.
The upgraded node then joins the `etcd` cluster on reboot.
So upgrading a control plane node is equivalent to scaling down the control plane node followed by scaling up with a new version of Talos.

View File

@ -0,0 +1,230 @@
---
title: "Controllers and Resources"
weight: 60
description: "Discover how Talos Linux uses the concepts on Controllers and Resources."
---
<!-- markdownlint-disable MD038 -->
Talos implements concepts of *resources* and *controllers* to facilitate internal operations of the operating system.
Talos resources and controllers are very similar to Kubernetes resources and controllers, but there are some differences.
The content of this document is not required to operate Talos, but it is useful for troubleshooting.
Starting with Talos 0.9, most of the Kubernetes control plane bootstrapping and operations are implemented via controllers and resources, which allows Talos to be reactive to configuration changes and environment changes (e.g. time sync).
## Resources
A resource captures a piece of system state.
Each resource belongs to a "Type" which defines resource contents.
Resource state can be split in two parts:
* metadata: a fixed set of fields describing the resource - namespace, type, ID, etc.
* spec: the contents of the resource (depends on the resource type).
A resource is uniquely identified by (`namespace`, `type`, `id`).
Namespaces provide a way to avoid conflicts on duplicate resource IDs.
At the time of this writing, all resources are local to the node and stored in memory.
So on every reboot the resource state is rebuilt from scratch (the only exception is the `MachineConfig` resource, which reflects the current machine config).
## Controllers
Controllers run as independent lightweight threads in Talos.
The goal of the controller is to reconcile the state based on inputs and eventually update outputs.
A controller can have any number of resource types (and namespaces) as inputs.
In other words, it watches specified resources for changes and reconciles when these changes occur.
A controller might also have additional inputs: running reconcile on schedule, watching `etcd` keys, etc.
A controller has a single output: a set of resources of fixed type in a fixed namespace.
Only one controller can manage a given resource type in a given namespace, so conflicts are avoided.
## Querying Resources
The Talos CLI tool `talosctl` provides read-only access to the resource API, which includes getting a specific resource,
listing resources, and watching for changes.
Talos stores resources describing resource types and namespaces in `meta` namespace:
```bash
$ talosctl get resourcedefinitions
NODE NAMESPACE TYPE ID VERSION
172.20.0.2 meta ResourceDefinition bootstrapstatuses.v1alpha1.talos.dev 1
172.20.0.2 meta ResourceDefinition etcdsecrets.secrets.talos.dev 1
172.20.0.2 meta ResourceDefinition kubernetescontrolplaneconfigs.config.talos.dev 1
172.20.0.2 meta ResourceDefinition kubernetessecrets.secrets.talos.dev 1
172.20.0.2 meta ResourceDefinition machineconfigs.config.talos.dev 1
172.20.0.2 meta ResourceDefinition machinetypes.config.talos.dev 1
172.20.0.2 meta ResourceDefinition manifests.kubernetes.talos.dev 1
172.20.0.2 meta ResourceDefinition manifeststatuses.kubernetes.talos.dev 1
172.20.0.2 meta ResourceDefinition namespaces.meta.cosi.dev 1
172.20.0.2 meta ResourceDefinition resourcedefinitions.meta.cosi.dev 1
172.20.0.2 meta ResourceDefinition rootsecrets.secrets.talos.dev 1
172.20.0.2 meta ResourceDefinition secretstatuses.kubernetes.talos.dev 1
172.20.0.2 meta ResourceDefinition services.v1alpha1.talos.dev 1
172.20.0.2 meta ResourceDefinition staticpods.kubernetes.talos.dev 1
172.20.0.2 meta ResourceDefinition staticpodstatuses.kubernetes.talos.dev 1
172.20.0.2 meta ResourceDefinition timestatuses.v1alpha1.talos.dev 1
```
```bash
$ talosctl get namespaces
NODE NAMESPACE TYPE ID VERSION
172.20.0.2 meta Namespace config 1
172.20.0.2 meta Namespace controlplane 1
172.20.0.2 meta Namespace meta 1
172.20.0.2 meta Namespace runtime 1
172.20.0.2 meta Namespace secrets 1
```
Most of the time the namespace flag (`--namespace`) can be omitted, as the `ResourceDefinition` contains a default
namespace which is used if no namespace is given:
```bash
$ talosctl get resourcedefinitions resourcedefinitions.meta.cosi.dev -o yaml
node: 172.20.0.2
metadata:
namespace: meta
type: ResourceDefinitions.meta.cosi.dev
id: resourcedefinitions.meta.cosi.dev
version: 1
phase: running
spec:
type: ResourceDefinitions.meta.cosi.dev
displayType: ResourceDefinition
aliases:
- resourcedefinitions
- resourcedefinition
- resourcedefinitions.meta
- resourcedefinitions.meta.cosi
- rd
- rds
printColumns: []
defaultNamespace: meta
```
The resource definition also contains type aliases which can be used interchangeably with the canonical resource name:
```bash
$ talosctl get ns config
NODE NAMESPACE TYPE ID VERSION
172.20.0.2 meta Namespace config 1
```
### Output
The `talosctl get` command supports the following output modes:
* `table` (default) prints resource list as a table
* `yaml` prints pretty formatted resources with details, including full metadata spec.
This format carries most details from the backend resource (e.g. comments in `MachineConfig` resource)
* `json` prints the same information as `yaml`, though some additional details (e.g. comments) might be lost.
This format is useful for automated processing with tools like `jq`.
### Watching Changes
If flag `--watch` is appended to the `talosctl get` command, the command switches to watch mode.
If a list of resources was requested, `talosctl` prints the initial contents of the list and then appends resource information for every change:
```bash
$ talosctl get svc -w
NODE * NAMESPACE TYPE ID VERSION RUNNING HEALTHY
172.20.0.2 + runtime Service timed 2 true true
172.20.0.2 + runtime Service trustd 2 true true
172.20.0.2 + runtime Service udevd 2 true true
172.20.0.2 - runtime Service timed 2 true true
172.20.0.2 + runtime Service timed 1 true false
172.20.0.2 runtime Service timed 2 true true
```
Column `*` specifies event type:
* `+` is created
* `-` is deleted
* ` ` is updated
In YAML/JSON output, field `event` is added to the resource representation to describe the event type.
### Examples
Getting machine config:
```bash
$ talosctl get machineconfig -o yaml
node: 172.20.0.2
metadata:
namespace: config
type: MachineConfigs.config.talos.dev
id: v1alpha1
version: 2
phase: running
spec:
version: v1alpha1 # Indicates the schema used to decode the contents.
debug: false # Enable verbose logging to the console.
persist: true # Indicates whether to pull the machine config upon every boot.
# Provides machine specific configuration options.
...
```
Getting control plane static pod statuses:
```bash
$ talosctl get staticpodstatus
NODE NAMESPACE TYPE ID VERSION READY
172.20.0.2 controlplane StaticPodStatus kube-system/kube-apiserver-talos-default-controlplane-1 3 True
172.20.0.2 controlplane StaticPodStatus kube-system/kube-controller-manager-talos-default-controlplane-1 3 True
172.20.0.2 controlplane StaticPodStatus kube-system/kube-scheduler-talos-default-controlplane-1 4 True
```
Getting static pod definition for `kube-apiserver`:
```bash
$ talosctl get sp kube-apiserver -n 172.20.0.2 -o yaml
node: 172.20.0.2
metadata:
namespace: controlplane
type: StaticPods.kubernetes.talos.dev
id: kube-apiserver
version: 3
phase: running
finalizers:
- k8s.StaticPodStatus("kube-apiserver")
spec:
apiVersion: v1
kind: Pod
metadata:
annotations:
talos.dev/config-version: "1"
talos.dev/secrets-version: "2"
...
```
## Inspecting Controller Dependencies
Talos can report current dependencies between controllers and resources for debugging purposes:
```bash
$ talosctl inspect dependencies
digraph {
n1[label="config.K8sControlPlaneController",shape="box"];
n3[label="config.MachineTypeController",shape="box"];
n2[fillcolor="azure2",label="config:KubernetesControlPlaneConfigs.config.talos.dev",shape="note",style="filled"];
...
```
This outputs a graph in the `graphviz` format which can be rendered to a PNG with the command:
```bash
talosctl inspect dependencies | dot -T png > deps.png
```
![Controller Dependencies](/images/controller-dependencies-v2.png)
The graph can be enhanced by replacing resource types with actual resource instances:
```bash
talosctl inspect dependencies --with-resources | dot -T png > deps.png
```
![Controller Dependencies with Resources](/images/controller-dependencies-with-resources-v2.png)

View File

@ -0,0 +1,72 @@
---
title: "FAQs"
weight: 999
description: "Frequently Asked Questions about Talos Linux."
---
<!-- markdownlint-disable MD026 -->
## How is Talos different from other container optimized Linux distros?
Talos integrates tightly with Kubernetes, and is not meant to be a general-purpose operating system.
The most important difference is that Talos is fully controlled by an API via a gRPC interface, instead of an ordinary shell.
We don't ship SSH, and there is no console access.
Removing components such as these has allowed us to dramatically reduce the footprint of Talos, and in turn, improve a number of other areas like security, predictability, reliability, and consistency across platforms.
It's a big change from how operating systems have been managed in the past, but we believe that API-driven OSes are the future.
## Why no shell or SSH?
Since Talos is fully API-driven, all maintenance and debugging operations are possible via the OS API.
We would like for Talos users to start thinking about what a "machine" is in the context of a Kubernetes cluster.
That is, that a Kubernetes _cluster_ can be thought of as one massive machine, and the _nodes_ are merely additional, undifferentiated resources.
We don't want humans to focus on the _nodes_, but rather on the _machine_ that is the Kubernetes cluster.
Should an issue arise at the node level, `talosctl` should provide the necessary tooling to assist in the identification, debugging, and remediation of the issue.
However, the API is based on the Principle of Least Privilege, and exposes only a limited set of methods.
We envision Talos being a great place for the application of [control theory](https://en.wikipedia.org/wiki/Control_theory) in order to provide a self-healing platform.
## Why the name "Talos"?
Talos was an automaton created by the Greek God of the forge to protect the island of Crete.
He would patrol the coast and enforce laws throughout the land.
We felt it was a fitting name for a security focused operating system designed to run Kubernetes.
## Why does Talos rely on a separate configuration from Kubernetes?
The `talosconfig` file contains client credentials to access the Talos Linux API.
Sometimes Kubernetes might be down for a number of reasons (etcd issues, misconfiguration, etc.), while Talos API access will always be available.
The Talos API is a way to access the operating system and fix issues, e.g. fixing access to Kubernetes.
When Talos Linux is running fine, using the Kubernetes APIs (via `kubeconfig`) is all you should need to deploy and manage Kubernetes workloads.
## How does Talos handle certificates?
During the machine config generation process, Talos generates a set of certificate authorities (CAs) that remains valid for 10 years.
Talos is responsible for managing certificates for `etcd`, Talos API (`apid`), node certificates (`kubelet`), and other components.
It also handles the automatic rotation of server-side certificates.
However, client certificates such as `talosconfig` and `kubeconfig` are the user's responsibility, and by default, they have a validity period of 1 year.
To renew the `talosconfig` certificate, follow [this process]({{< relref "../talos-guides/configuration/managing-pki" >}}).
To renew the `kubeconfig`, use the `talosctl kubeconfig` command; the time-to-live (TTL) is defined in the [configuration]({{< relref "../reference/configuration/#adminkubeconfigconfig" >}}).
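For example, a minimal sketch of renewing the `kubeconfig` (the node address is a placeholder):

```bash
# Fetch an admin kubeconfig with a freshly issued client certificate:
talosctl -n <NODE-IP> kubeconfig
```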
## How can I set the timezone of my Talos Linux clusters?
Talos doesn't support timezones, and will always run in UTC.
This ensures consistency of log timestamps for all Talos Linux clusters, simplifying debugging.
Your containers can run with any timezone configuration you desire, but the timezone of Talos Linux is not configurable.
## How do I see Talos kernel configuration?
### Using Talos API
The current kernel config can be read with `talosctl -n <NODE> read /proc/config.gz`.
For example:
```shell
talosctl -n NODE read /proc/config.gz | zgrep E1000
```
### Using GitHub
For `amd64`, see https://github.com/siderolabs/pkgs/blob/main/kernel/build/config-amd64.
Use the appropriate branch to see the kernel config matching your Talos release.

View File

@ -0,0 +1,177 @@
---
title: "Image Factory"
weight: 55
description: "Image Factory generates customized Talos Linux images based on configured schematics."
---
The Image Factory provides a way to download Talos Linux artifacts.
Artifacts can be generated with customizations defined by a "schematic".
A schematic can be applied to any of the versions of Talos Linux offered by the Image Factory to produce a "model".
The following assets are provided:
* ISO
* `kernel`, `initramfs`, and kernel command line
* UKI
* disk images in various formats (e.g. AWS, GCP, VMware, etc.)
* installer container images
The supported frontends are:
* HTTP
* PXE
* Container Registry
The official instance of Image Factory is available at https://factory.talos.dev.
See [Boot Assets]({{< relref "../talos-guides/install/boot-assets#image-factory" >}}) for an example of how to use the Image Factory to boot and upgrade Talos on different platforms.
Full API documentation for the Image Factory is available at [GitHub](https://github.com/siderolabs/image-factory#readme).
## Schematics
Schematics are YAML files that define customizations to be applied to a Talos Linux image.
Schematics can be applied to any of the versions of Talos Linux offered by the Image Factory to produce a "model", which is a Talos Linux image with the customizations applied.
Schematics are content-addressable, that is, the content of the schematic is used to generate a unique ID.
The schematic should be uploaded to the Image Factory first, and then the ID can be used to reference the schematic in a model.
Schematics can be generated using the [Image Factory UI](#ui), or using the Image Factory API:
```yaml
customization:
extraKernelArgs: # optional
- vga=791
meta: # optional, allows to set initial Talos META
- key: 0xa
value: "{}"
systemExtensions: # optional
officialExtensions: # optional
- siderolabs/gvisor
- siderolabs/amd-ucode
```
The "vanilla" schematic is:
```yaml
customization:
```
and has an ID of `376567988ad370138ad8b2698212367b8edcb69b5fd68c80be1f2ec7d603b4ba`.
The schematic can be applied by uploading it to the Image Factory:
```shell
curl -X POST --data-binary @schematic.yaml https://factory.talos.dev/schematics
```
As the schematic is content-addressable, the same schematic can be uploaded multiple times, and the Image Factory will return the same ID.
## Models
Models are Talos Linux images with customizations applied.
The inputs to generate a model are:
* schematic ID
* Talos Linux version
* model type (e.g. ISO, UKI, etc.)
* architecture (e.g. amd64, arm64)
* various model type specific options (e.g. disk image format, disk image size, etc.)
## Frontends
Image Factory provides several frontends to retrieve models:
* HTTP frontend to download models (e.g. download an ISO or a disk image)
* PXE frontend to boot bare-metal machines (PXE script references kernel/initramfs from HTTP frontend)
* Registry frontend to fetch customized `installer` images (for initial Talos Linux installation and upgrades)
The links to different models are available in the [Image Factory UI](#ui), and a full list of possible models is documented at [GitHub](https://github.com/siderolabs/image-factory#readme).
In this guide we will provide a list of examples:
* amd64 ISO (for Talos {{< release >}}, "vanilla" schematic) [https://factory.talos.dev/image/376567988ad370138ad8b2698212367b8edcb69b5fd68c80be1f2ec7d603b4ba/{{< release >}}/metal-amd64.iso](https://factory.talos.dev/image/376567988ad370138ad8b2698212367b8edcb69b5fd68c80be1f2ec7d603b4ba/{{< release >}}/metal-amd64.iso)
* arm64 AWS image (for Talos {{< release >}}, "vanilla" schematic) [https://factory.talos.dev/image/376567988ad370138ad8b2698212367b8edcb69b5fd68c80be1f2ec7d603b4ba/{{< release >}}/aws-arm64.raw.xz](https://factory.talos.dev/image/376567988ad370138ad8b2698212367b8edcb69b5fd68c80be1f2ec7d603b4ba/{{< release >}}/aws-arm64.raw.xz)
* amd64 PXE boot script (for Talos {{< release >}}, "vanilla" schematic) [https://pxe.factory.talos.dev/pxe/376567988ad370138ad8b2698212367b8edcb69b5fd68c80be1f2ec7d603b4ba/{{< release >}}/metal-amd64](https://pxe.factory.talos.dev/pxe/376567988ad370138ad8b2698212367b8edcb69b5fd68c80be1f2ec7d603b4ba/{{< release >}}/metal-amd64)
* Talos `installer` image (for Talos {{< release >}}, "vanilla" schematic, architecture is detected automatically): `factory.talos.dev/installer/376567988ad370138ad8b2698212367b8edcb69b5fd68c80be1f2ec7d603b4ba:{{< release >}}`
The `installer` image can be used to install Talos Linux on a bare-metal machine, or to upgrade an existing Talos Linux installation.
As the Talos version and schematic ID can be changed via an upgrade process, the `installer` image can be used to upgrade to any version of Talos Linux or to replace the set of installed system extensions.
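As an illustration, the HTTP frontend assets can be downloaded directly, e.g. the "vanilla" amd64 ISO referenced above:

```bash
# Download the "vanilla" amd64 ISO from the Image Factory HTTP frontend:
curl -LO https://factory.talos.dev/image/376567988ad370138ad8b2698212367b8edcb69b5fd68c80be1f2ec7d603b4ba/{{< release >}}/metal-amd64.iso
```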
## UI
The Image Factory UI is available at https://factory.talos.dev.
The UI provides a way to list supported Talos Linux versions, the system extensions available for each release, and a way to generate a schematic based on the selected system extensions.
The UI operations are equivalent to API operations.
## Find Schematic ID from Talos Installation
Image Factory always appends a "virtual" system extension whose version matches the schematic ID used to generate the model.
So, for any running Talos Linux instance, the schematic ID can be found by looking at the list of system extensions:
```shell
$ talosctl get extensions
NAMESPACE TYPE ID VERSION NAME VERSION
runtime ExtensionStatus 0 1 schematic 376567988ad370138ad8b2698212367b8edcb69b5fd68c80be1f2ec7d603b4ba
```
## Restrictions
Some models don't include every customization of the schematic:
* `installer` and `initramfs` images only support system extensions (kernel args and META are ignored)
* `kernel` assets don't depend on the schematic
Other models have full support for all customizations:
* any disk image format
* ISO, PXE boot script
When installing Talos Linux using ISO/PXE boot, Talos is installed on the disk using the `installer` image, so the `installer` image in the machine configuration
should use the same schematic as the ISO/PXE boot image.
Some system extensions are not available for all Talos Linux versions, so an attempt to generate a model with an unsupported system extension will fail.
The list of supported Talos versions and the system extensions supported by each version is available in the [Image Factory UI](#ui) and the [API](https://github.com/siderolabs/image-factory#readme).
## Under the Hood
Image Factory is based on the Talos `imager` container which provides both the Talos base boot assets, and the ability to generate custom assets based on a configuration.
Image Factory manages a set of `imager` container images to acquire base Talos Linux boot assets (`kernel`, `initramfs`), a set of Talos Linux system extension images, and a set of schematics.
When a model is requested, Image Factory uses the `imager` container to generate the requested assets based on the schematic and the Talos Linux version.
## Security
Image Factory verifies signatures of all source container images fetched:
* `imager` container images (base boot assets)
* `extensions` system extensions catalogs
* `installer` container images (base installer layer)
* Talos Linux system extension images
Internally, Image Factory caches generated boot assets and signs all cached images using a private key.
Image Factory verifies the signature of the cached images before serving them to clients.
Image Factory signs generated `installer` images, and verifies the signature of the `installer` images before serving them to clients.
Image Factory does not provide a way to list all schematics, as schematics may contain sensitive information (e.g. private kernel boot arguments).
As the schematic ID is content-addressable, it is not possible to guess the ID of a schematic without knowing the content of the schematic.
## Running your own Image Factory
Image Factory can be deployed on-premises to provide in-house asset generation.
Image Factory requires the following components:
* an OCI registry to store schematics (private)
* an OCI registry to store cached assets (private)
* an OCI registry to store `installer` images (should allow public read-only access)
* a container image signing key: ECDSA P-256 private key in PEM format
Image Factory is configured using command line flags; use `--help` to see the list of available flags.
Image Factory should be configured to use proper authentication to push to the OCI registries:
* by mounting proper credentials via `~/.docker/config.json`
* by supplying `GITHUB_TOKEN` (for `ghcr.io`)
Image Factory performs HTTP redirects to the public registry endpoint for `installer` images, so the public endpoint
should be available to Talos Linux machines to pull the `installer` images.

View File

@ -0,0 +1,98 @@
---
title: "Knowledge Base"
weight: 1999
description: "Recipes for common configuration tasks with Talos Linux."
---
## Disabling `GracefulNodeShutdown` on a node
Talos Linux enables the Kubernetes [Graceful Node Shutdown](https://kubernetes.io/docs/concepts/architecture/nodes/#graceful-node-shutdown) feature by default.
To disable this feature, modify the `kubelet` part of the machine configuration:
```yaml
machine:
kubelet:
extraArgs:
feature-gates: GracefulNodeShutdown=false
extraConfig:
shutdownGracePeriod: 0s
shutdownGracePeriodCriticalPods: 0s
```
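A hedged example of applying the snippet above to a running node as a machine configuration patch (the patch file name is hypothetical):

```bash
# Apply the snippet saved as a patch file to the node's machine configuration:
talosctl -n <NODE-IP> patch machineconfig --patch @graceful-shutdown.yaml
```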
## Generating Talos Linux ISO image with custom kernel arguments
Pass additional kernel arguments using the `--extra-kernel-arg` flag:
```shell
$ docker run --rm -i ghcr.io/siderolabs/imager:{{< release >}} iso --arch amd64 --tar-to-stdout --extra-kernel-arg console=ttyS1 --extra-kernel-arg console=tty0 | tar xz
2022/05/25 13:18:47 copying /usr/install/amd64/vmlinuz to /mnt/boot/vmlinuz
2022/05/25 13:18:47 copying /usr/install/amd64/initramfs.xz to /mnt/boot/initramfs.xz
2022/05/25 13:18:47 creating grub.cfg
2022/05/25 13:18:47 creating ISO
```
The ISO will be output to the file `talos-<arch>.iso` in the current directory.
## Logging Kubernetes audit logs with Loki
If you use the `loki-stack` Helm chart to gather logs from the Kubernetes cluster, you can use the following Helm values to configure `loki-stack` to collect the Kubernetes API server audit logs:
```yaml
promtail:
extraArgs:
- -config.expand-env
# this is required so that the promtail process can read the kube-apiserver audit logs written as `nobody` user
containerSecurityContext:
capabilities:
add:
- DAC_READ_SEARCH
extraVolumes:
- name: audit-logs
hostPath:
path: /var/log/audit/kube
extraVolumeMounts:
- name: audit-logs
mountPath: /var/log/audit/kube
readOnly: true
config:
snippets:
extraScrapeConfigs: |
- job_name: auditlogs
static_configs:
- targets:
- localhost
labels:
job: auditlogs
host: ${HOSTNAME}
__path__: /var/log/audit/kube/*.log
```
## Setting the CPU scaling governor
While it's possible to set the [CPU scaling governor](https://kernelnewbies.org/Linux_5.9#CPU_Frequency_scaling) via `.machine.sysfs`, it's sometimes cumbersome to set it for all CPUs individually.
A more elegant approach is to set it via a kernel command line parameter.
This also means the option is applied early in the boot process.
It can be set in the machine configuration via the snippet below:
```yaml
machine:
install:
extraKernelArgs:
- cpufreq.default_governor=performance
```
> Note: Talos needs to be upgraded for the `extraKernelArgs` to take effect.
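After the upgrade, the active governor can be verified over the Talos API (a sketch; the sysfs path is the standard Linux location):

```bash
# Read the scaling governor currently in effect for the first CPU:
talosctl -n <NODE-IP> read /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
```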
## Disable `admissionControl` on control plane nodes
Talos Linux enables admission control in the API Server by default.
Although it is not recommended from a security point of view, admission control can be removed by patching your control plane machine configuration:
```bash
talosctl gen config \
my-cluster https://mycluster.local:6443 \
--config-patch-control-plane '[{"op": "remove", "path": "/cluster/apiServer/admissionControl"}]'
```

View File

@ -0,0 +1,207 @@
---
title: "KubeSpan"
weight: 100
description: "Understand more about KubeSpan for Talos Linux."
---
## WireGuard Peer Discovery
The key pieces of information needed for WireGuard generally are:
- the public key of the host you wish to connect to
- an IP address and port of the host you wish to connect to
The latter is really only required of _one_ side of the pair.
Once traffic is received, that information is learned and updated by WireGuard automatically.
Kubernetes, though, also needs to know which traffic goes to which WireGuard peer.
Because this information may be dynamic, we need a way to keep this information up to date.
If we already have a connection to Kubernetes, it's fairly easy: we can just keep that information in Kubernetes.
Otherwise, we have to have some way to discover it.
Talos Linux implements a multi-tiered approach to gathering this information.
Each tier can operate independently, but the amalgamation of the mechanisms produces a more robust set of connection criteria.
These mechanisms are:
- an external service
- a Kubernetes-based system
See [discovery service]({{< relref "../talos-guides/discovery" >}}) to learn more about the external service.
The Kubernetes-based system utilizes annotations on Kubernetes Nodes which describe each node's public key and local addresses.
On top of this, KubeSpan can optionally route Pod subnets.
This is usually taken care of by the CNI, but there are many situations where the CNI cannot do this itself across networks.
## NAT, Multiple Routes, Multiple IPs
One of the difficulties in communicating across networks is that there is often not a single address and port which can identify a connection for each node on the system.
For instance, a node sitting on the same network might see its peer as `192.168.2.10`, but a node across the internet may see it as `2001:db8:1ef1::10`.
We need to be able to handle any number of addresses and ports, and we also need to have a mechanism to _try_ them.
WireGuard only allows us to select one at a time.
KubeSpan implements a controller which continuously discovers and rotates these IP:port pairs until a connection is established.
It then starts trying again if that connection ever fails.
## Packet Routing
After we have established a WireGuard connection, we have to make sure that the right packets get sent to the WireGuard interface.
WireGuard supplies a convenient facility for tagging packets which come from _it_, which is great.
But in our case, we need to be able to allow traffic which both does _not_ come from WireGuard and _also_ is not destined for another Kubernetes node to flow through the normal mechanisms.
Unlike many corporate or privacy-oriented VPNs, we need to allow general internet traffic to flow normally.
Also, as our cluster grows, this set of IP addresses can become quite large and quite dynamic.
This would be very cumbersome and slow in `iptables`.
Luckily, the kernel supplies a convenient mechanism by which to define this arbitrarily large set of IP addresses: IP sets.
Talos collects all of the IPs and subnets which are considered "in-cluster" and maintains these in the kernel as an IP set.
Now that we have the IP set defined, we need to tell the kernel how to use it.
The traditional way of doing this would be to use `iptables`.
However, there is a big problem with IPTables.
It is a common namespace in which any number of other pieces of software may dump things.
We have no surety that what we add will not be wiped out by something else (from Kubernetes itself, to the CNI, to some workload application), be rendered unusable by higher-priority rules, or just generally cause trouble and conflicts.
Instead, we use a three-pronged system which is both more foundational and less centralised.
NFTables offers a separately namespaced, decentralised way of marking packets for later processing based on IP sets.
Instead of a common set of well-known tables, NFTables uses hooks into the kernel's netfilter system, which are less vulnerable to being usurped, bypassed, or a source of interference than IPTables, but which are rendered down by the kernel to the same underlying XTables system.
Our NFTables system is where we store the IP sets.
Any packet which enters the system, either by forward from inside Kubernetes or by generation from the host itself, is compared against a hash table of this IP set.
If it is matched, it is marked for later processing by our next stage.
This is a high-performance system which exists fully in the kernel and which ultimately becomes an eBPF program, so it scales well to hundreds of nodes.
The next stage is the kernel router's route rules.
These are defined as a common ordered list of operations for the whole operating system, but they are intended to be tightly constrained and are rarely used by applications in any case.
The rules we add are very simple: if a packet is marked by our NFTables system, send it to an alternate routing table.
This leads us to our third and final stage of packet routing.
We have a custom routing table with two rules:
- send all IPv4 traffic to the WireGuard interface
- send all IPv6 traffic to the WireGuard interface
So in summary, we:
- mark packets destined for Kubernetes applications or Kubernetes nodes
- send marked packets to a special routing table
- send anything which is sent to that routing table through the WireGuard interface
This gives us an isolated, resilient, tolerant, and non-invasive way to route Kubernetes traffic safely, automatically, and transparently through WireGuard across almost any set of network topologies.
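With KubeSpan enabled, the resulting peer state can be inspected via the resource API (a sketch; the node address is a placeholder):

```bash
# Show discovered KubeSpan peers and their connection state:
talosctl -n <NODE-IP> get kubespanpeerstatuses
```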
## Design Decisions
### Routing
Routing for Wireguard is a touch complicated when the set of possible peer
endpoints includes at least one member of the set of _destinations_.
That is, packets from Wireguard to a peer endpoint should not be sent to
Wireguard, lest a loop be created.
In order to handle this situation, Wireguard provides the ability to mark
packets which it generates, so their routing can be handled separately.
In our case, though, we actually want the inverse of this: we want to route
Wireguard packets however the normal networking routes and rules say they should
be routed, while packets destined for the other side of Wireguard Peers should
be forced into Wireguard interfaces.
While IP Rules allow you to invert matches, they do not support matching based
on IP sets.
That means, to use simple rules, we would have to add a rule for
each destination, which could reach into hundreds or thousands of rules to
manage.
This is not really much of a performance issue, but it is a management
issue, since it is expected that we would not be the only manager of rules in
the system, and rules offer no facility to tag for ownership.
IP Sets are supported by IPTables, and we could integrate there.
However, IPTables exists in a global namespace, which makes it fragile having
multiple parties manipulating it.
The newer NFTables replacement for IPTables, though, allows users to
independently hook into various points of XTables, keeping all such rules and
sets independent.
This means that regardless of what CNIs or other user-side routing rules may do,
our KubeSpan setup will not be messed up.
Therefore, we utilise NFTables (which natively supports IP sets and owner
grouping) instead, to mark matching traffic which should be sent to the
Wireguard interface.
This way, we can keep all of our KubeSpan set logic in one place, allowing us to
use a single `ip rule` match on our fwmark,
sending the matched packets to a separate routing table
with one rule: default to the WireGuard interface.
So we have three components:
1. A routing table for Wireguard-destined packets
2. An NFTables table which defines the set of destinations; packets to those destinations
   will be marked with our firewall mark.
- Hook into PreRouting (type Filter)
- Hook into Outgoing (type Route)
3. One IP Rule which sends packets marked with our firewall mark to our Wireguard
routing table.
### Routing Table
The routing table (number 180 by default) is simple, containing a single route for each family: send everything through the Wireguard interface.
### NFTables
The logic inside NFTables is fairly simple.
First, everything is compiled into a single table: `talos_kubespan`.
Next, two chains are set up: one for the `prerouting` hook (`kubespan_prerouting`)
and the other for the `outgoing` hook (`kubespan_outgoing`).
We define two sets of target IP prefixes: one for IPv6 (`kubespan_targets_ipv6`)
and the other for IPv4 (`kubespan_targets_ipv4`).
Last, we add rules to each chain which basically specify:
1. If the packet is marked as _from_ Wireguard, just accept it and terminate
the chain.
2. If the packet matches an IP in either of the target IP sets, mark that
packet with the _to_ Wireguard mark.
### Rules
There are two route rules defined: one to match IPv6 packets and the other to
match IPv4 packets.
These rules say the same thing for each: if the packet is marked that it should
go _to_ Wireguard, send it to the Wireguard
routing table.
### Firewall Mark
KubeSpan is using only two bits of the firewall mark with the mask `0x00000060`.
> Note: if other software on the node is using the bits `0x60` of the firewall mark, this
> might cause conflicts and break KubeSpan.
>
> At the time of writing, it was confirmed that Calico CNI is using bits `0xffff0000` and
> Cilium CNI is using bits `0xf00`, so KubeSpan is compatible with both.
> Flannel CNI uses `0x4000` mask, so it is also compatible.
In the routing rules table, we match on the mark `0x40` with the mask `0x60`:
```text
32500: from all fwmark 0x40/0x60 lookup 180
```
In the NFTables table, we match with the same mask `0x60` and we set the mask by only modifying
bits from the `0x60` mask:
```text
meta mark & 0x00000060 == 0x00000020 accept
ip daddr @kubespan_targets_ipv4 meta mark set meta mark & 0xffffffdf | 0x00000040 accept
ip6 daddr @kubespan_targets_ipv6 meta mark set meta mark & 0xffffffdf | 0x00000040 accept
```

View File

@ -0,0 +1,434 @@
---
title: "Networking Resources"
weight: 70
description: "Delve deeper into networking of Talos Linux."
---
The Talos network configuration subsystem is powered by [COSI]({{< relref "controllers-resources" >}}).
Talos translates network configuration from multiple sources (machine configuration, cloud metadata, automatic network configuration such as DHCP) into COSI resources.
Network configuration and network state can be inspected using the `talosctl get` command.
Network machine configuration can be modified using the `talosctl edit mc` command (also the variants `talosctl patch mc` and `talosctl apply-config`) without a reboot.
As API access requires network connection, [`--mode=try`]({{< relref "../talos-guides/configuration/editing-machine-configuration" >}})
can be used to test the configuration with automatic rollback to avoid losing network access to the node.
## Resources
There are six basic network configuration items in Talos:
* `Address` (IP address assigned to the interface/link);
* `Route` (route to a destination);
* `Link` (network interface/link configuration);
* `Resolver` (list of DNS servers);
* `Hostname` (node hostname and domainname);
* `TimeServer` (list of NTP servers).
Each network configuration item has two counterparts:
* `*Status` (e.g. `LinkStatus`) describes the current state of the system (Linux kernel state);
* `*Spec` (e.g. `LinkSpec`) defines the desired configuration.
| Resource | Status | Spec |
|--------------------|------------------------|----------------------|
| `Address` | `AddressStatus` | `AddressSpec` |
| `Route` | `RouteStatus` | `RouteSpec` |
| `Link` | `LinkStatus` | `LinkSpec` |
| `Resolver` | `ResolverStatus` | `ResolverSpec` |
| `Hostname` | `HostnameStatus` | `HostnameSpec` |
| `TimeServer` | `TimeServerStatus` | `TimeServerSpec` |
Status resources have aliases with the `Status` suffix removed, so for example
`AddressStatus` is also available as `Address`.
Talos networking controllers reconcile the state so that `*Status` equals the desired `*Spec`.
## Observing State
The current network configuration state can be observed by querying `*Status` resources via
`talosctl`:
```sh
$ talosctl get addresses
NODE NAMESPACE TYPE ID VERSION ADDRESS LINK
172.20.0.2 network AddressStatus eth0/172.20.0.2/24 1 172.20.0.2/24 eth0
172.20.0.2 network AddressStatus eth0/fe80::9804:17ff:fe9d:3058/64 2 fe80::9804:17ff:fe9d:3058/64 eth0
172.20.0.2 network AddressStatus flannel.1/10.244.4.0/32 1 10.244.4.0/32 flannel.1
172.20.0.2 network AddressStatus flannel.1/fe80::10b5:44ff:fe62:6fb8/64 2 fe80::10b5:44ff:fe62:6fb8/64 flannel.1
172.20.0.2 network AddressStatus lo/127.0.0.1/8 1 127.0.0.1/8 lo
172.20.0.2 network AddressStatus lo/::1/128 1 ::1/128 lo
```
In the output there are addresses set up by Talos (e.g. `eth0/172.20.0.2/24`) and
addresses set up by other facilities (e.g. `flannel.1/10.244.4.0/32` set up by CNI).
Talos networking controllers watch the kernel state and update resources
accordingly.
Additional details about the address can be accessed via the YAML output:
```yaml
# talosctl get address eth0/172.20.0.2/24 -o yaml
node: 172.20.0.2
metadata:
namespace: network
type: AddressStatuses.net.talos.dev
id: eth0/172.20.0.2/24
version: 1
owner: network.AddressStatusController
phase: running
created: 2021-06-29T20:23:18Z
updated: 2021-06-29T20:23:18Z
spec:
address: 172.20.0.2/24
local: 172.20.0.2
broadcast: 172.20.0.255
linkIndex: 4
linkName: eth0
family: inet4
scope: global
flags: permanent
```
Resources can be watched for changes with the `--watch` flag to see how configuration changes over time.
Other networking status resources can be inspected with `talosctl get routes`, `talosctl get links`, etc.
For example:
```sh
$ talosctl get resolvers
NODE NAMESPACE TYPE ID VERSION RESOLVERS
172.20.0.2 network ResolverStatus resolvers 2 ["8.8.8.8","1.1.1.1"]
```
```yaml
# talosctl get links -o yaml
node: 172.20.0.2
metadata:
namespace: network
type: LinkStatuses.net.talos.dev
id: eth0
version: 2
owner: network.LinkStatusController
phase: running
created: 2021-06-29T20:23:18Z
updated: 2021-06-29T20:23:18Z
spec:
index: 4
type: ether
linkIndex: 0
flags: UP,BROADCAST,RUNNING,MULTICAST,LOWER_UP
hardwareAddr: 4e:95:8e:8f:e4:47
broadcastAddr: ff:ff:ff:ff:ff:ff
mtu: 1500
queueDisc: pfifo_fast
operationalState: up
kind: ""
slaveKind: ""
driver: virtio_net
linkState: true
speedMbit: 4294967295
port: Other
duplex: Unknown
```
## Inspecting Configuration
The desired networking configuration is combined from multiple sources and presented
as `*Spec` resources:
```sh
$ talosctl get addressspecs
NODE NAMESPACE TYPE ID VERSION
172.20.0.2 network AddressSpec eth0/172.20.0.2/24 2
172.20.0.2 network AddressSpec lo/127.0.0.1/8 2
172.20.0.2 network AddressSpec lo/::1/128 2
```
These `AddressSpecs` are applied to the Linux kernel to reach the desired state.
If, for example, an `AddressSpec` is removed, the address is removed from the Linux network interface as well.
`*Spec` resources can't be manipulated directly; they are generated automatically by Talos
from multiple configuration sources (see the sections below for details).
If a `*Spec` resource is queried in YAML format, some additional information is available:
```yaml
# talosctl get addressspecs eth0/172.20.0.2/24 -o yaml
node: 172.20.0.2
metadata:
namespace: network
type: AddressSpecs.net.talos.dev
id: eth0/172.20.0.2/24
version: 2
owner: network.AddressMergeController
phase: running
created: 2021-06-29T20:23:18Z
updated: 2021-06-29T20:23:18Z
finalizers:
- network.AddressSpecController
spec:
address: 172.20.0.2/24
linkName: eth0
family: inet4
scope: global
flags: permanent
layer: operator
```
An important field is `layer`, which describes the configuration layer this spec comes from: in this case it is generated by a network operator (see below), namely the DHCPv4 operator.
## Configuration Merging
Spec resources described in the previous section show the final merged configuration state,
while the initial specs are put into a separate unmerged namespace, `network-config`.
Spec resources in the `network-config` namespace are merged with conflict resolution to produce the final merged representation in the `network` namespace.
Let's take `HostnameSpec` as an example.
The final merged representation is:
```yaml
# talosctl get hostnamespec -o yaml
node: 172.20.0.2
metadata:
namespace: network
type: HostnameSpecs.net.talos.dev
id: hostname
version: 2
owner: network.HostnameMergeController
phase: running
created: 2021-06-29T20:23:18Z
updated: 2021-06-29T20:23:18Z
finalizers:
- network.HostnameSpecController
spec:
hostname: talos-default-controlplane-1
domainname: ""
layer: operator
```
We can see that the final configuration for the hostname is `talos-default-controlplane-1`.
And this is the hostname that was actually applied.
This can be verified by querying a `HostnameStatus` resource:
```sh
$ talosctl get hostnamestatus
NODE NAMESPACE TYPE ID VERSION HOSTNAME DOMAINNAME
172.20.0.2 network HostnameStatus hostname 1 talos-default-controlplane-1
```
Initial configuration for the hostname in the `network-config` namespace is:
```yaml
# talosctl get hostnamespec -o yaml --namespace network-config
node: 172.20.0.2
metadata:
namespace: network-config
type: HostnameSpecs.net.talos.dev
id: default/hostname
version: 2
owner: network.HostnameConfigController
phase: running
created: 2021-06-29T20:23:18Z
updated: 2021-06-29T20:23:18Z
spec:
hostname: talos-172-20-0-2
domainname: ""
layer: default
---
node: 172.20.0.2
metadata:
namespace: network-config
type: HostnameSpecs.net.talos.dev
id: dhcp4/eth0/hostname
version: 1
owner: network.OperatorSpecController
phase: running
created: 2021-06-29T20:23:18Z
updated: 2021-06-29T20:23:18Z
spec:
hostname: talos-default-controlplane-1
domainname: ""
layer: operator
```
We can see that there are two specs for the hostname:
* one from the `default` configuration layer which defines the hostname as `talos-172-20-0-2` (a default derived from the default node address);
* another one from the layer `operator` that defines the hostname as `talos-default-controlplane-1` (DHCP).
Talos merges these two specs into a final `HostnameSpec` based on the configuration layer and merge rules.
Here is the order of precedence from low to high:
* `default` (defaults provided by Talos);
* `cmdline` (from the kernel command line);
* `platform` (driven by the cloud provider);
* `operator` (various dynamic configuration options: DHCP, Virtual IP, etc);
* `configuration` (derived from the machine configuration).
So in our example the `operator` layer `HostnameSpec` overrides the `default` layer producing the final hostname `talos-default-controlplane-1`.
The merge process applies to all six core networking specs.
For each spec, the `layer` controls the merge behavior.
If multiple configuration specs appear at the same layer, they are merged together if possible; otherwise the merge result
is stable but not defined (e.g. if DHCP on multiple interfaces provides two different hostnames for the node).
`LinkSpecs` are merged across layers, so for example, machine configuration for the interface MTU overrides an MTU set by the DHCP server.
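To see merging in action, a sketch comparing the per-layer and the merged view of the same spec (the node address is a placeholder):

```bash
# Unmerged per-layer specs (one entry per configuration layer):
talosctl -n <NODE-IP> get hostnamespecs --namespace network-config
# Final merged spec applied by the controllers:
talosctl -n <NODE-IP> get hostnamespecs
```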
## Network Operators
Network operators provide dynamic network configuration which can change over time as the node is running:
* DHCPv4
* DHCPv6
* Virtual IP
Network operators produce specs for addresses, routes, links, etc., which are then merged and applied according to the rules described above.
Operators are configured with `OperatorSpec` resources which describe when operators
should run and additional configuration for the operator:
```yaml
# talosctl get operatorspecs -o yaml
node: 172.20.0.2
metadata:
namespace: network
type: OperatorSpecs.net.talos.dev
id: dhcp4/eth0
version: 1
owner: network.OperatorConfigController
phase: running
created: 2021-06-29T20:23:18Z
updated: 2021-06-29T20:23:18Z
spec:
operator: dhcp4
linkName: eth0
requireUp: true
dhcp4:
routeMetric: 1024
```
`OperatorSpec` resources are generated by Talos mostly based on the machine configuration.
The DHCP4 operator is created automatically for all physical network links which are not configured explicitly via the kernel command line or the machine configuration.
This also means that on the first boot, without a machine configuration, a DHCP request is made on all physical network interfaces by default.
Specs generated by operators are prefixed with the operator ID (`dhcp4/eth0` in the example above) in the unmerged `network-config` namespace:
```sh
$ talosctl -n 172.20.0.2 get addressspecs --namespace network-config
NODE NAMESPACE TYPE ID VERSION
172.20.0.2 network-config AddressSpec dhcp4/eth0/eth0/172.20.0.2/24 1
```
## Other Network Resources
There are some additional resources describing the network subsystem state.
The `NodeAddress` resource presents node addresses excluding link-local and loopback addresses:
```sh
$ talosctl get nodeaddresses
NODE NAMESPACE TYPE ID VERSION ADDRESSES
10.100.2.23 network NodeAddress accumulative 6 ["10.100.2.23","147.75.98.173","147.75.195.143","192.168.95.64","2604:1380:1:ca00::17"]
10.100.2.23 network NodeAddress current 5 ["10.100.2.23","147.75.98.173","192.168.95.64","2604:1380:1:ca00::17"]
10.100.2.23 network NodeAddress default 1 ["10.100.2.23"]
```
* `default` is the node default address;
* `current` is the set of addresses a node currently has;
* `accumulative` is the set of addresses a node had over time (it might include virtual IPs which are not owned by the node at the moment).
`NodeAddress` resources are used to pick up the default address for `etcd` peer URL, to populate SANs field in the generated certificates, etc.
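A single resource can also be fetched by its ID; for example, to get just the default address (output is illustrative):
```sh
$ talosctl -n 10.100.2.23 get nodeaddresses default
NODE          NAMESPACE   TYPE          ID        VERSION   ADDRESSES
10.100.2.23   network     NodeAddress   default   1         ["10.100.2.23"]
```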
Another important resource is `Nodename` which provides `Node` name in Kubernetes:
```sh
$ talosctl get nodename
NODE NAMESPACE TYPE ID VERSION NODENAME
10.100.2.23 controlplane Nodename nodename 1 infra-green-cp-mmf7v
```
Depending on the machine configuration `nodename` might be just a hostname or the FQDN of the node.
`NetworkStatus` aggregates the current state of the network configuration:
```yaml
# talosctl get networkstatus -o yaml
node: 10.100.2.23
metadata:
namespace: network
type: NetworkStatuses.net.talos.dev
id: status
version: 5
owner: network.StatusController
phase: running
created: 2021-06-24T18:56:00Z
updated: 2021-06-24T18:56:02Z
spec:
addressReady: true
connectivityReady: true
hostnameReady: true
etcFilesReady: true
```
## Network Controllers
For each of the six basic resource types, there are several controllers:
* `*StatusController` populates `*Status` resources observing the Linux kernel state.
* `*ConfigController` produces the initial unmerged `*Spec` resources in the `network-config` namespace based on defaults, kernel command line, and machine configuration.
* `*MergeController` merges `*Spec` resources into the final representation in the `network` namespace.
* `*SpecController` applies merged `*Spec` resources to the kernel state.
For the network operators:
* `OperatorConfigController` produces `OperatorSpec` resources based on machine configuration and defaults.
* `OperatorSpecController` runs network operators watching `OperatorSpec` resources and producing various `*Spec` resources in the `network-config` namespace.
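One way to observe this pipeline end-to-end is to compare the unmerged specs, the merged specs, and the observed status for the same resource type (resource type and node IP below are illustrative):
```sh
# unmerged specs produced by the config controllers and operators
talosctl -n 172.20.0.2 get addressspecs --namespace network-config

# merged specs (the final desired state)
talosctl -n 172.20.0.2 get addressspecs --namespace network

# observed state of the Linux kernel
talosctl -n 172.20.0.2 get addressstatuses
```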
## Configuration Sources
There are several configuration sources for the network configuration, which are described in this section.
### Defaults
* `lo` interface is assigned addresses `127.0.0.1/8` and `::1/128`;
* hostname is set to `talos-<IP>`, where `IP` is the default node address;
* resolvers are set to `8.8.8.8`, `1.1.1.1`;
* time servers are set to `pool.ntp.org`;
* DHCP4 operator is run on any physical interface which is not configured explicitly.
### Cmdline
The kernel [command line]({{< relref "../reference/kernel" >}}) is parsed for the following options:
* `ip=` option is parsed for node IP, default gateway, hostname, DNS servers, NTP servers;
* `bond=` option is parsed for bonding interfaces and their options;
* `talos.hostname=` option is used to set node hostname;
* `talos.network.interface.ignore=` can be used to make Talos skip network interface configuration completely.
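For example, a hypothetical kernel command line combining a static address with an ignored interface could look like this (all values are illustrative):
```text
ip=172.20.0.2::172.20.0.1:255.255.255.0:controlplane-1:eth0:off:1.1.1.1:8.8.8.8 talos.network.interface.ignore=eth1
```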
### Platform
Platform configuration delivers cloud environment-specific options (e.g. the hostname).
Platform configuration is specific to the environment metadata: for example, on Equinix Metal, Talos automatically
configures public and private IPs, routing, link bonding, and the hostname.
Platform configuration is cached across reboots in `/system/state/platform-network.yaml`.
### Operator
Network operators provide configuration for all basic resource types.
### Machine Configuration
The machine configuration is parsed for link configuration, addresses, routes, hostname,
resolvers and time servers.
Any changes to `.machine.network` configuration can be applied in immediate mode.
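A minimal sketch of the relevant `.machine.network` section (interface names and addresses are only examples):
```yaml
machine:
  network:
    hostname: controlplane-1
    nameservers:
      - 1.1.1.1
      - 8.8.8.8
    interfaces:
      - interface: eth0
        dhcp: false
        addresses:
          - 172.20.0.2/24
        routes:
          - network: 0.0.0.0/0
            gateway: 172.20.0.1
```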
## Network Configuration Debugging
Most of the network controller operations and failures are logged to the kernel console;
additional logs at the `debug` level are available with the `talosctl logs controller-runtime` command.
If the network configuration can't be established and the API is not available, `debug` level
logs can be sent to the console with the `debug: true` option in the machine configuration.
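A sketch of enabling this via the machine configuration (the `debug` field lives at the top level of the `v1alpha1` document):
```yaml
# v1alpha1 machine configuration (excerpt)
debug: true # send debug-level logs to the console
```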

View File

@ -0,0 +1,110 @@
---
title: Philosophy
weight: 10
description: "Learn about the philosophy behind the need for Talos Linux."
---
## Distributed
Talos is intended to be operated in a distributed manner: it is built for a high-availability dataplane _first_.
Its `etcd` cluster is built in an ad-hoc manner, with each appointed node joining on its own directive (with proper security validations enforced, of course).
Like Kubernetes, workloads are intended to be distributed across any number of compute nodes.
There should be no single points of failure, and the level of required coordination is as low as each platform allows.
## Immutable
Talos takes immutability very seriously.
Talos itself, even when installed on a disk, always runs from a SquashFS image, meaning that even if a directory is mounted to be writable, the image itself is never modified.
All images are signed and delivered as single, versioned files.
We can always run integrity checks on our image to verify that it has not been modified.
While Talos does allow a few, highly-controlled write points to the filesystem, we strive to make them as non-unique and non-critical as possible.
We call the writable partition the "ephemeral" partition precisely because we want to make sure none of us ever uses it for unique, non-replicated, non-recreatable data.
Thus, if all else fails, we can always wipe the disk and get back up and running.
## Minimal
We are always trying to reduce Talos' footprint.
Because nearly the entire OS is built from scratch in Go, we are
in a good position.
We have no shell.
We have no SSH.
We have none of the GNU utilities, not even a rollup tool such as busybox.
Everything in Talos is there because it is necessary, and
nothing is included which isn't.
As a result, the OS right now produces a SquashFS image size of less than **80 MB**.
## Ephemeral
Everything Talos writes to its disk is either replicated or reconstructable.
Since the controlplane is highly available, the loss of any node will cause
neither service disruption nor loss of data.
No writes are even allowed to the vast majority of the filesystem.
We even call the writable partition "ephemeral" to keep this idea always in
focus.
## Secure
Talos has always been designed with security in mind.
With its immutability, its minimalism, its signing, and its component-based design, we are
able to simply bypass huge classes of vulnerabilities.
Moreover, because of the way we have designed Talos, we are able to take
advantage of a number of additional settings, such as the recommendations of the Kernel Self Protection Project (kspp) and completely disabling dynamic modules.
There are no passwords in Talos.
All networked communication is encrypted and key-authenticated.
The Talos certificates are short-lived and automatically-rotating.
Kubernetes is always constructed with its own separate PKI structure which is
enforced.
## Declarative
Everything which can be configured in Talos is done through a single YAML
manifest.
There is no scripting and no procedural steps.
Everything is defined by the one declarative YAML file.
This configuration includes that of both Talos itself and the Kubernetes which
it forms.
This is achievable because Talos is tightly focused to do one thing: run
Kubernetes, in the easiest, most secure, most reliable way it can.
## Not based on X distro
Talos Linux _isn't_ based on any other distribution.
We think of ourselves as the second generation of
container-optimised operating systems, where things like CoreOS, Flatcar, and Rancher represent the first generation (but the technology is not derived from any of those).
Talos Linux is actually a ground-up rewrite of the userspace, from PID 1.
We run the Linux kernel, but everything downstream of that is our own custom
code, written in Go, rigorously-tested, and published as an immutable,
integrated image.
The Linux kernel launches what we call `machined`, for instance, not `systemd`.
There is no `systemd` on our system.
There are no GNU utilities, no shell, no SSH, no packages, nothing you could associate with
any other distribution.
## An Operating System designed for Kubernetes
Technically, Talos Linux installs to a computer like any other operating system.
_Unlike_ other operating systems, Talos is not meant to run alone, on a
single machine.
A design goal of Talos Linux is eliminating the management
of individual nodes as much as possible.
In order to do that, Talos Linux operates as a cluster of machines, with lots of
checking and coordination between them, at all levels.
There is only a cluster.
Talos is meant to do one thing: maintain a Kubernetes cluster, and it does this
very, very well.
The entirety of the configuration of any machine is specified by a single
configuration file, which can often be the _same_ configuration file used
across _many_ machines.
Much like a biological system, if some component misbehaves, just cut it out and
let a replacement grow.
Rebuilds of Talos are remarkably fast, whether they be new machines, upgrades,
or reinstalls.
Never get hung up on an individual machine.

View File

@ -0,0 +1,23 @@
---
title: "Process Capabilities"
weight: 105
description: "Understand the Linux process capabilities restrictions with Talos Linux."
---
Linux defines a set of [process capabilities](https://man7.org/linux/man-pages/man7/capabilities.7.html) that can be used to fine-tune the process permissions.
For security reasons, Talos Linux restricts any process from gaining the following capabilities:
* `CAP_SYS_MODULE` (loading kernel modules)
* `CAP_SYS_BOOT` (rebooting the system)
This means that no process, including privileged Kubernetes pods, is able to gain these capabilities.
If you see the following error when starting a pod, make sure its spec doesn't request any of the capabilities listed above:
```text
Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: unable to apply caps: operation not permitted: unknown
```
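For example, a hypothetical pod spec like the following would fail to start on Talos Linux because it requests a restricted capability:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: module-loader
spec:
  containers:
    - name: loader
      image: alpine:3.19
      command: ["sleep", "infinity"]
      securityContext:
        capabilities:
          add:
            - SYS_MODULE # restricted by Talos Linux, the container will fail to start
```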
> Note: even with the `CAP_SYS_MODULE` capability, Linux kernel module loading is restricted by requiring a valid signature.
> Talos Linux creates a throw-away signing key during the kernel build, so it's not possible to build/sign a kernel module for Talos Linux outside of the build process.

View File

@ -0,0 +1,74 @@
---
title: "Network Connectivity"
weight: 80
description: "Description of the Networking Connectivity needed by Talos Linux"
aliases:
- ../guides/configuring-network-connectivity
---
## Configuring Network Connectivity
The simplest way to deploy Talos is by ensuring that all the remote components of the system (`talosctl`, the control plane nodes, and worker nodes) all have layer 2 connectivity.
This is not always possible, however, so this page lays out the minimal network access that is required to configure and operate a Talos cluster.
> Note: These are the ports required by Talos specifically, and should be configured _in addition_ to the ports required by Kubernetes.
> See the [kubernetes docs](https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/#check-required-ports) for information on the ports used by Kubernetes itself.
### Control plane node(s)
<table class="table-auto">
<thead>
<tr>
<th class="px-4 py-2">Protocol</th>
<th class="px-4 py-2">Direction</th>
<th class="px-4 py-2">Port Range</th>
<th class="px-4 py-2">Purpose</th>
<th class="px-4 py-2">Used By</th>
</tr>
</thead>
<tbody>
<tr>
<td class="border px-4 py-2">TCP</td>
<td class="border px-4 py-2">Inbound</td>
<td class="border px-4 py-2">50000*</td>
<td class="border px-4 py-2"><a href="../../learn-more/components/#apid">apid</a></td>
<td class="border px-4 py-2">talosctl, control plane nodes</td>
</tr>
<tr>
<td class="border px-4 py-2">TCP</td>
<td class="border px-4 py-2">Inbound</td>
<td class="border px-4 py-2">50001*</td>
<td class="border px-4 py-2"><a href="../../learn-more/components/#trustd">trustd</a></td>
<td class="border px-4 py-2">Worker nodes</td>
</tr>
</tbody>
</table>
> Ports marked with a `*` are not currently configurable, but that may change in the future.
> [Follow along here](https://github.com/siderolabs/talos/issues/1836).
### Worker node(s)
<table class="table-auto">
<thead>
<tr>
<th class="px-4 py-2">Protocol</th>
<th class="px-4 py-2">Direction</th>
<th class="px-4 py-2">Port Range</th>
<th class="px-4 py-2">Purpose</th>
<th class="px-4 py-2">Used By</th>
</tr>
</thead>
<tbody>
<tr>
<td class="border px-4 py-2">TCP</td>
<td class="border px-4 py-2">Inbound</td>
<td class="border px-4 py-2">50000*</td>
<td class="border px-4 py-2"><a href="../../learn-more/components/#apid">apid</a></td>
<td class="border px-4 py-2">Control plane nodes</td>
</tr>
</tbody>
</table>
> Ports marked with a `*` are not currently configurable, but that may change in the future.
> [Follow along here](https://github.com/siderolabs/talos/issues/1836).

View File

@ -0,0 +1,62 @@
---
title: "talosctl"
weight: 110
description: "The design and use of the Talos Linux control application."
---
The `talosctl` tool acts as a reference implementation for the Talos API, but it also provides a number of
conveniences for working with Talos and its clusters.
### Video Walkthrough
To see some live examples of talosctl usage, view the following video:
<iframe width="560" height="315" src="https://www.youtube.com/embed/pl0l_K_3Y6o" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
## Client Configuration
Talosctl configuration is located in `$XDG_CONFIG_HOME/talos/config.yaml` if `$XDG_CONFIG_HOME` is defined.
Otherwise it is in `$HOME/.talos/config`.
The location can always be overridden by the `TALOSCONFIG` environment variable or the `--talosconfig` parameter.
Like `kubectl`, `talosctl` uses the concept of configuration contexts, so any number of Talos clusters can be managed with a single configuration file.
It also comes with some intelligent tooling to manage the merging of new contexts into the config.
The default operation is a non-destructive merge, where if a context of the same name already exists in the file, the context to be added is renamed by appending an index number.
You can easily overwrite instead, as well.
See the `talosctl config help` for more information.
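A few illustrative context-management commands (the context name here is made up):
```bash
# merge a newly generated talosconfig into the default client configuration
talosctl config merge ./talosconfig

# list available contexts
talosctl config contexts

# switch to another context
talosctl config context my-cluster
```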
## Endpoints and Nodes
![Endpoints and Nodes](/images/endpoints-and-nodes.png)
`endpoints` are the communication endpoints to which the client directly talks.
These can be load balancers, DNS hostnames, a list of IPs, etc.
If multiple endpoints are specified, the client will automatically load
balance and fail over between them.
It is recommended that these point to the set of control plane nodes, either directly or through a load balancer.
Each endpoint will automatically proxy requests destined to another node through it, so it is not necessary to change the endpoint configuration just because you wish to talk to a different node within the cluster.
Endpoints _do_, however, need to be members of the same Talos cluster as the target node, because these proxied connections rely on certificate-based authentication.
The `node` is the target node on which you wish to perform the API call.
While you can configure the target node (or even set of target nodes) inside the `talosctl` configuration file, it is recommended not to do so, but to explicitly declare the target node(s) using the `-n` or `--nodes` command-line parameter.
> When specifying nodes, their IPs and/or hostnames are as seen by the endpoint servers, not as from the client.
> This is because all connections are proxied first through the endpoints.
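For example, with the endpoints stored in the client configuration and the target node given on the command line (IPs are illustrative):
```bash
# point the client at the control plane nodes once
talosctl config endpoint 172.20.0.2 172.20.0.3 172.20.0.4

# target any node in the cluster through those endpoints
talosctl -n 172.20.0.10 version
```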
## Kubeconfig
The configuration for accessing a Talos Kubernetes cluster is obtained with `talosctl`.
By default, `talosctl` will safely merge the cluster into the default kubeconfig.
Like `talosctl` itself, in the event of a naming conflict, the new context name will be index-appended before insertion.
The `--force` option can be used to overwrite instead.
You can also specify an alternate path by supplying it as a positional parameter.
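Typical usage might look like this (paths and flags shown are illustrative):
```bash
# merge the cluster into the default kubeconfig
talosctl -n 172.20.0.2 kubeconfig

# write to an explicit path, overwriting an existing file or context on conflict
talosctl -n 172.20.0.2 kubeconfig --force ./kubeconfig
```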
Thus, like Talos clusters themselves, `talosctl` makes it easy to manage any
number of Kubernetes clusters from the same workstation.
## Commands
Please see the [CLI reference]({{< relref "../reference/cli" >}}) for the entire list of commands which are available from `talosctl`.

View File

@ -0,0 +1,4 @@
---
title: "Reference"
weight: 70
---

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,28 @@
---
title: Configuration
description: Talos Linux machine configuration reference.
---
A Talos Linux machine is fully configured via a single YAML file called the *machine configuration*.
The file might contain one or more configuration documents separated by `---` (three dashes) lines.
At the moment, the majority of the configuration options are within the [v1alpha1]({{< relref "./v1alpha1" >}}) document, so
this is the only mandatory document in the configuration file.
Configuration documents might be named (contain a `name:` field) or unnamed.
Unnamed documents can be supplied to the machine configuration file only once, while named documents can be supplied multiple times with unique names.
The `v1alpha1` document has its own (legacy) structure, while every other document has the following set of fields:
```yaml
apiVersion: v1alpha1 # version of the document
kind: NetworkRuleConfig # type of document
name: rule1 # only for named documents
```
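A sketch of a configuration file combining the `v1alpha1` document with a named document (contents are illustrative and abbreviated):
```yaml
version: v1alpha1 # the legacy v1alpha1 document
machine:
  type: controlplane
  # ... the rest of the machine configuration ...
cluster:
  # ... cluster configuration ...
---
apiVersion: v1alpha1
kind: KmsgLogConfig
name: remote-log
url: tcp://192.168.3.7:3478/
```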
This section contains the configuration reference; to learn more about Talos Linux machine configuration management, please see:
* [quick guide to configuration generation]({{< relref "../../introduction/getting-started#configure-talos-linux" >}})
* [configuration management in production]({{< relref "../../introduction/prodnotes#configure-talos" >}})
* [configuration patches]({{< relref "../../talos-guides/configuration/patching" >}})
* [editing live machine configuration]({{< relref "../../talos-guides/configuration/editing-machine-configuration" >}})

View File

@ -0,0 +1,8 @@
---
description: |
Package network provides network machine configuration documents.
title: network
---
<!-- markdownlint-disable -->

View File

@ -0,0 +1,31 @@
---
description: NetworkDefaultActionConfig is an ingress firewall default action configuration document.
title: NetworkDefaultActionConfig
---
<!-- markdownlint-disable -->
{{< highlight yaml >}}
apiVersion: v1alpha1
kind: NetworkDefaultActionConfig
ingress: accept # Default action for all not explicitly configured ingress traffic: accept or block.
{{< /highlight >}}
| Field | Type | Description | Value(s) |
|-------|------|-------------|----------|
|`ingress` |DefaultAction |Default action for all not explicitly configured ingress traffic: accept or block. |`accept`<br />`block`<br /> |

View File

@ -0,0 +1,90 @@
---
description: NetworkRuleConfig is a network firewall rule config document.
title: NetworkRuleConfig
---
<!-- markdownlint-disable -->
{{< highlight yaml >}}
apiVersion: v1alpha1
kind: NetworkRuleConfig
name: ingress-apid # Name of the config document.
# Port selector defines which ports and protocols on the host are affected by the rule.
portSelector:
# Ports defines a list of port ranges or single ports.
ports:
- 50000
protocol: tcp # Protocol defines traffic protocol (e.g. TCP or UDP).
# Ingress defines which source subnets are allowed to access the host ports/protocols defined by the `portSelector`.
ingress:
- subnet: 192.168.0.0/16 # Subnet defines a source subnet.
{{< /highlight >}}
| Field | Type | Description | Value(s) |
|-------|------|-------------|----------|
|`name` |string |Name of the config document. | |
|`portSelector` |<a href="#NetworkRuleConfig.portSelector">RulePortSelector</a> |Port selector defines which ports and protocols on the host are affected by the rule. | |
|`ingress` |<a href="#NetworkRuleConfig.ingress.">[]IngressRule</a> |Ingress defines which source subnets are allowed to access the host ports/protocols defined by the `portSelector`. | |
## portSelector {#NetworkRuleConfig.portSelector}
RulePortSelector is a port selector for the network rule.
| Field | Type | Description | Value(s) |
|-------|------|-------------|----------|
|`ports` |PortRanges |<details><summary>Ports defines a list of port ranges or single ports.</summary>The port ranges are inclusive, and should not overlap.</details> <details><summary>Show example(s)</summary>{{< highlight yaml >}}
ports:
- 80
- 443
{{< /highlight >}}{{< highlight yaml >}}
ports:
- 1200-1299
- 8080
{{< /highlight >}}</details> | |
|`protocol` |Protocol |Protocol defines traffic protocol (e.g. TCP or UDP). |`tcp`<br />`udp`<br />`icmp`<br />`icmpv6`<br /> |
## ingress[] {#NetworkRuleConfig.ingress.}
IngressRule is an ingress rule.
| Field | Type | Description | Value(s) |
|-------|------|-------------|----------|
|`subnet` |Prefix |Subnet defines a source subnet. <details><summary>Show example(s)</summary>{{< highlight yaml >}}
subnet: 10.3.4.0/24
{{< /highlight >}}{{< highlight yaml >}}
subnet: 2001:db8::/32
{{< /highlight >}}{{< highlight yaml >}}
subnet: 1.3.4.5/32
{{< /highlight >}}</details> | |
|`except` |Prefix |Except defines a source subnet to exclude from the rule, it gets excluded from the `subnet`. | |

View File

@ -0,0 +1,8 @@
---
description: |
Package runtime provides runtime machine configuration documents.
title: runtime
---
<!-- markdownlint-disable -->

View File

@ -0,0 +1,33 @@
---
description: EventSinkConfig is an event sink config document.
title: EventSinkConfig
---
<!-- markdownlint-disable -->
{{< highlight yaml >}}
apiVersion: v1alpha1
kind: EventSinkConfig
endpoint: 192.168.10.3:3247 # The endpoint for the event sink as 'host:port'.
{{< /highlight >}}
| Field | Type | Description | Value(s) |
|-------|------|-------------|----------|
|`endpoint` |string |The endpoint for the event sink as 'host:port'. <details><summary>Show example(s)</summary>{{< highlight yaml >}}
endpoint: 10.3.7.3:2810
{{< /highlight >}}</details> | |

View File

@ -0,0 +1,35 @@
---
description: KmsgLogConfig is a kmsg log config document.
title: KmsgLogConfig
---
<!-- markdownlint-disable -->
{{< highlight yaml >}}
apiVersion: v1alpha1
kind: KmsgLogConfig
name: remote-log # Name of the config document.
url: tcp://192.168.3.7:3478/ # The URL encodes the log destination.
{{< /highlight >}}
| Field | Type | Description | Value(s) |
|-------|------|-------------|----------|
|`name` |string |Name of the config document. | |
|`url` |URL |<details><summary>The URL encodes the log destination.</summary>The scheme must be tcp:// or udp://.<br />The path must be empty.<br />The port is required.</details> <details><summary>Show example(s)</summary>{{< highlight yaml >}}
url: udp://10.3.7.3:2810
{{< /highlight >}}</details> | |

View File

@ -0,0 +1,8 @@
---
description: |
Package siderolink provides SideroLink machine configuration documents.
title: siderolink
---
<!-- markdownlint-disable -->

View File

@ -0,0 +1,33 @@
---
description: SideroLinkConfig is a SideroLink connection machine configuration document.
title: SideroLinkConfig
---
<!-- markdownlint-disable -->
{{< highlight yaml >}}
apiVersion: v1alpha1
kind: SideroLinkConfig
apiUrl: https://siderolink.api/join?token=secret # SideroLink API URL to connect to.
{{< /highlight >}}
| Field | Type | Description | Value(s) |
|-------|------|-------------|----------|
|`apiUrl` |URL |SideroLink API URL to connect to. <details><summary>Show example(s)</summary>{{< highlight yaml >}}
apiUrl: https://siderolink.api/join?token=secret
{{< /highlight >}}</details> | |

View File

@ -0,0 +1,14 @@
---
description: |
Package v1alpha1 contains definition of the `v1alpha1` configuration document.
Even though the machine configuration in Talos Linux is multi-document, at the moment
this configuration document contains most of the configuration options.
It is expected that new configuration options will be added as new documents, and existing ones
migrated to their own documents.
title: v1alpha1
---
<!-- markdownlint-disable -->

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,246 @@
---
title: "Kernel"
description: "Linux kernel reference."
---
## Commandline Parameters
Talos supports a number of kernel commandline parameters. Some are required for
it to operate. Others are optional and useful in certain circumstances.
Several of these are enforced by the Kernel Self Protection Project [KSPP](https://kernsec.org/wiki/index.php/Kernel_Self_Protection_Project/Recommended_Settings).
**Required** parameters:
* `talos.platform`: can be one of `aws`, `azure`, `container`, `digitalocean`, `equinixMetal`, `gcp`, `hcloud`, `metal`, `nocloud`, `openstack`, `oracle`, `scaleway`, `upcloud`, `vmware` or `vultr`
* `slab_nomerge`: required by KSPP
* `pti=on`: required by KSPP
**Recommended** parameters:
* `init_on_alloc=1`: advised by KSPP, enabled by default in kernel config
* `init_on_free=1`: advised by KSPP, enabled by default in kernel config
### Available Talos-specific parameters
#### `ip`
Initial configuration of the interface, routes, DNS, NTP servers (multiple `ip=` kernel parameters are accepted).
Full documentation is available in the [Linux kernel docs](https://www.kernel.org/doc/Documentation/filesystems/nfs/nfsroot.txt).
`ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>:<dns0-ip>:<dns1-ip>:<ntp0-ip>`
Talos will use the configuration supplied via the kernel parameter as the initial network configuration.
This parameter is useful in the environments where DHCP doesn't provide IP addresses or when default DNS and NTP servers should be overridden
before loading machine configuration.
Partial configuration can be applied as well, e.g. `ip=:::::::<dns0-ip>:<dns1-ip>:<ntp0-ip>` sets only the DNS and NTP servers.
IPv6 addresses can be specified by enclosing them in the square brackets, e.g. `ip=[2001:db8::a]:[2001:db8::b]:[fe80::1]::controlplane1:eth1::[2001:4860:4860::6464]:[2001:4860:4860::64]:[2001:4860:4806::]`.
`<netmask>` can use either an IP address notation (IPv4: `255.255.255.0`, IPv6: `[ffff:ffff:ffff:ffff::0]`), or simply a number of one bits in the netmask (`24`).
`<device>` can be traditional interface naming scheme `eth0, eth1` or `enx<MAC>`, example: `enx78e7d1ea46da`
DHCP can be enabled by setting `<autoconf>` to `dhcp`, example: `ip=:::::eth0.3:dhcp`.
Alternative syntax is `ip=eth0.3:dhcp`.
#### `bond`
Bond interface configuration.
Full documentation is available in the [Dracut kernel docs](https://man7.org/linux/man-pages/man7/dracut.cmdline.7.html).
`bond=<bondname>:<bondslaves>:<options>:<mtu>`
Talos will use the `bond=` kernel parameter if supplied to set the initial bond configuration.
This parameter is useful in environments where the switch ports are suspended if the machine doesn't set up a LACP bond.
If only the bond name is supplied, the bond will be created with `eth0` and `eth1` as slaves and the bond mode set to `balance-rr`.
All these below configurations are equivalent:
* `bond=bond0`
* `bond=bond0:`
* `bond=bond0::`
* `bond=bond0:::`
* `bond=bond0:eth0,eth1`
* `bond=bond0:eth0,eth1:balance-rr`
An example of a bond configuration with all options specified:
`bond=bond1:eth3,eth4:mode=802.3ad,xmit_hash_policy=layer2+3:1450`
This will create a bond interface named `bond1` with `eth3` and `eth4` as slaves and set the bond mode to `802.3ad`, the transmit hash policy to `layer2+3` and bond interface MTU to 1450.
#### `vlan`
The interface vlan configuration.
Full documentation is available in the [Dracut kernel docs](https://man7.org/linux/man-pages/man7/dracut.cmdline.7.html).
Talos will use the `vlan=` kernel parameter if supplied to set the initial vlan configuration.
This parameter is useful in environments where the switch ports are VLAN tagged with no native VLAN.
Only one vlan can be configured at this stage.
An example of a vlan configuration including static ip configuration:
`vlan=eth0.100:eth0 ip=172.20.0.2::172.20.0.1:255.255.255.0::eth0.100:::::`
This will create a vlan interface named `eth0.100` with `eth0` as the underlying interface and set the vlan id to 100 with static IP 172.20.0.2/24 and 172.20.0.1 as default gateway.
#### `net.ifnames=0`
Disable the predictable network interface names by specifying `net.ifnames=0` on the kernel command line.
#### `panic`
The amount of time to wait after a panic before a reboot is issued.
Talos will always reboot if it encounters an unrecoverable error.
However, when collecting debug information, it may reboot too quickly for
humans to read the logs.
This option allows the user to delay the reboot to give time to collect debug
information from the console screen.
A value of `0` disables automatic rebooting entirely.
#### `talos.config`
The URL at which the machine configuration data may be found (only for `metal` platform, with the kernel parameter `talos.platform=metal`).
This parameter supports variable substitution inside URL query values for the following case-insensitive placeholders:
* `${uuid}` the SMBIOS UUID
* `${serial}` the SMBIOS Serial Number
* `${mac}` the MAC address of the first network interface attaining link state `up`
* `${hostname}` the hostname of the machine
The following example
`http://example.com/metadata?h=${hostname}&m=${mac}&s=${serial}&u=${uuid}`
may translate to
`http://example.com/metadata?h=myTestHostname&m=52%3A2f%3Afd%3Adf%3Afc%3Ac0&s=0OCZJ19N65&u=40dcbd19-3b10-444e-bfff-aaee44a51fda`
For backwards compatibility we insert the system UUID into the query parameter `uuid` if its value is empty. As in
`http://example.com/metadata?uuid=` => `http://example.com/metadata?uuid=40dcbd19-3b10-444e-bfff-aaee44a51fda`
##### `metal-iso`
When the kernel parameter `talos.config=metal-iso` is set, Talos will attempt to load the machine configuration from any block device with a filesystem label of `metal-iso`.
Talos will look for a file named `config.yaml` in the root of the filesystem.
For example, such ISO filesystem can be created with:
```sh
mkdir iso/
cp config.yaml iso/
mkisofs -joliet -rock -volid 'metal-iso' -output config.iso iso/
```
#### `talos.config.auth.*`
Kernel parameters prefixed with `talos.config.auth.` are used to configure [OAuth2 authentication for the machine configuration]({{< relref "../advanced/machine-config-oauth" >}}).
#### `talos.platform`
The platform name on which Talos will run.
Valid options are:
* `aws`
* `azure`
* `container`
* `digitalocean`
* `equinixMetal`
* `gcp`
* `hcloud`
* `metal`
* `nocloud`
* `openstack`
* `oracle`
* `scaleway`
* `upcloud`
* `vmware`
* `vultr`
#### `talos.board`
The board name, if Talos is being used on an ARM64 SBC.
Supported boards are:
* `bananapi_m64`: Banana Pi M64
* `libretech_all_h3_cc_h5`: Libre Computer ALL-H3-CC
* `rock64`: Pine64 Rock64
* ...
#### `talos.hostname`
The hostname to be used.
The hostname is generally specified in the machine config.
However, in some cases, the DHCP server needs to know the hostname
before the machine configuration has been acquired.
Unless specifically required, the machine configuration should be used
instead.
#### `talos.shutdown`
The type of shutdown to use when Talos is told to shutdown.
Valid options are:
* `halt`
* `poweroff`
#### `talos.network.interface.ignore`
A network interface which should be ignored and not configured by Talos.
Before a configuration is applied (early on each boot), Talos attempts to
configure each network interface by DHCP.
If there are many network interfaces on the machine which have link but no
DHCP server, this can add significant boot delays.
This option may be specified multiple times for multiple network interfaces.
#### `talos.experimental.wipe`
Resets the disk before starting up the system.
Valid options are:
* `system` resets system disk.
* `system:EPHEMERAL,STATE` resets ephemeral and state partitions. Doing this reverts Talos into maintenance mode.
#### `talos.unified_cgroup_hierarchy`
Talos defaults to always using the unified cgroup hierarchy (`cgroupsv2`), but `cgroupsv1`
can be forced with `talos.unified_cgroup_hierarchy=0`.
> Note: `cgroupsv1` is deprecated and it should be used only for compatibility with workloads which don't support `cgroupsv2` yet.
#### `talos.dashboard.disabled`
By default, Talos redirects kernel logs to virtual console `/dev/tty1` and starts the dashboard on `/dev/tty2`,
then switches to the dashboard tty.
If you set `talos.dashboard.disabled=1`, this behavior will be disabled.
Kernel logs will be sent to the currently active console and the dashboard will not be started.
It is set to `1` by default on SBCs.
#### `talos.environment`
Each value of the argument sets a default environment variable.
The expected format is `key=value`.
Example:
```text
talos.environment=http_proxy=http://proxy.example.com:8080 talos.environment=https_proxy=http://proxy.example.com:8080
```

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,5 @@
---
title: Talos Linux Guides
weight: 20
description: "Documentation on how to manage Talos Linux"
---

View File

@ -0,0 +1,5 @@
---
title: "Configuration"
weight: 20
description: "Guides on how to configure Talos Linux machines"
---

View File

@ -0,0 +1,23 @@
---
title: "Custom Certificate Authorities"
description: "How to supply custom certificate authorities"
aliases:
- ../../guides/configuring-certificate-authorities
---
## Appending the Certificate Authority
Put into each machine the PEM encoded certificate:
```yaml
machine:
...
files:
- content: |
-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----
permissions: 0644
path: /etc/ssl/certs/ca-certificates
op: append
```

View File

@ -0,0 +1,65 @@
---
title: "Containerd"
description: "Customize Containerd Settings"
aliases:
- ../../guides/configuring-containerd
---
The base containerd configuration expects to merge in any additional configs present in `/etc/cri/conf.d/20-customization.part`.
## Examples
### Exposing Metrics
Patch the machine config by adding the following:
```yaml
machine:
files:
- content: |
[metrics]
address = "0.0.0.0:11234"
path: /etc/cri/conf.d/20-customization.part
op: create
```
Once the server reboots, metrics are now available:
```bash
$ curl ${IP}:11234/v1/metrics
# HELP container_blkio_io_service_bytes_recursive_bytes The blkio io service bytes recursive
# TYPE container_blkio_io_service_bytes_recursive_bytes gauge
container_blkio_io_service_bytes_recursive_bytes{container_id="0677d73196f5f4be1d408aab1c4125cf9e6c458a4bea39e590ac779709ffbe14",device="/dev/dm-0",major="253",minor="0",namespace="k8s.io",op="Async"} 0
container_blkio_io_service_bytes_recursive_bytes{container_id="0677d73196f5f4be1d408aab1c4125cf9e6c458a4bea39e590ac779709ffbe14",device="/dev/dm-0",major="253",minor="0",namespace="k8s.io",op="Discard"} 0
...
...
```
### Pause Image
This change is often required for air-gapped environments, as the `containerd` CRI plugin holds a reference to the `pause` image which is used
to create pods, and this reference can't be controlled with Kubernetes pod definitions.
```yaml
machine:
files:
- content: |
[plugins]
[plugins."io.containerd.grpc.v1.cri"]
sandbox_image = "registry.k8s.io/pause:3.8"
path: /etc/cri/conf.d/20-customization.part
op: create
```
Now the `pause` image is set to `registry.k8s.io/pause:3.8`:
```bash
$ talosctl containers --kubernetes
NODE NAMESPACE ID IMAGE PID STATUS
172.20.0.5 k8s.io kube-system/kube-flannel-6hfck registry.k8s.io/pause:3.8 1773 SANDBOX_READY
172.20.0.5 k8s.io └─ kube-system/kube-flannel-6hfck:install-cni:bc39fec3cbac ghcr.io/siderolabs/install-cni:v1.3.0-alpha.0-2-gb155fa0 0 CONTAINER_EXITED
172.20.0.5 k8s.io └─ kube-system/kube-flannel-6hfck:install-config:5c3989353b98 ghcr.io/siderolabs/flannel:v0.20.1 0 CONTAINER_EXITED
172.20.0.5 k8s.io └─ kube-system/kube-flannel-6hfck:kube-flannel:116c67b50da8 ghcr.io/siderolabs/flannel:v0.20.1 2092 CONTAINER_RUNNING
172.20.0.5 k8s.io kube-system/kube-proxy-xp7jq registry.k8s.io/pause:3.8 1780 SANDBOX_READY
172.20.0.5 k8s.io └─ kube-system/kube-proxy-xp7jq:kube-proxy:84fc77c59e17 registry.k8s.io/kube-proxy:v1.26.0-alpha.3 1843 CONTAINER_RUNNING
```

View File

@ -0,0 +1,196 @@
---
title: "Disk Encryption"
description: "Guide on using system disk encryption"
aliases:
- ../../guides/disk-encryption
---
It is possible to enable encryption for system disks at the OS level.
Currently, only [STATE]({{< relref "../../learn-more/architecture/#file-system-partitions" >}}) and [EPHEMERAL]({{< relref "../../learn-more/architecture/#file-system-partitions" >}}) partitions can be encrypted.
STATE contains the most sensitive node data: secrets and certs.
The EPHEMERAL partition may contain sensitive workload data.
Data is encrypted using LUKS2, which is provided by the Linux kernel modules and `cryptsetup` utility.
The operating system will run additional setup steps when encryption is enabled.
If the disk encryption is enabled for the STATE partition, the system will:
- Save STATE encryption config as JSON in the META partition.
- Before mounting the STATE partition, load encryption configs either from the machine config or from the META partition.
Note that the machine config is always preferred over the META one.
- Before mounting the STATE partition, format and encrypt it.
This occurs only if the STATE partition is empty and has no filesystem.
If the disk encryption is enabled for the EPHEMERAL partition, the system will:
- Get the encryption config from the machine config.
- Before mounting the EPHEMERAL partition, encrypt and format it.
This occurs only if the EPHEMERAL partition is empty and has no filesystem.
Talos Linux supports four encryption methods, which can be combined together for a single partition:
- `static` - encrypt with the static passphrase (weakest protection, for `STATE` partition encryption it means that the passphrase will be stored in the `META` partition).
- `nodeID` - encrypt with the key derived from the node UUID (weak, it is designed to protect against data being leaked or recovered from a drive that has been removed from a Talos Linux node).
- `kms` - encrypt using key sealed with network KMS (strong, but requires network access to decrypt the data.)
- `tpm` - encrypt with the key derived from the TPM (strong, when used with [SecureBoot]({{< relref "../install/bare-metal-platforms/secureboot" >}})).
> Note: `nodeID` encryption is not designed to protect against attacks where physical access to the machine, including the drive, is available.
> It uses the hardware characteristics of the machine in order to decrypt the data, so drives that have been removed, or recycled from a cloud environment or attached to a different virtual machine, will maintain their protection and encryption.
## Configuration
Disk encryption is disabled by default.
To enable disk encryption you should modify the machine configuration with the following options:
```yaml
machine:
...
systemDiskEncryption:
ephemeral:
provider: luks2
keys:
- nodeID: {}
slot: 0
state:
provider: luks2
keys:
- nodeID: {}
slot: 0
```
### Encryption Keys
> Note: What the LUKS2 docs call "keys" are, in reality, a passphrase.
> When this passphrase is added, LUKS2 runs argon2 to create an actual key from that passphrase.
LUKS2 supports up to 32 encryption keys and it is possible to specify all of them in the machine configuration.
Talos always tries to sync the keys list defined in the machine config with the actual keys defined for the LUKS2 partition.
So when you update the keys list, keep at least one key unchanged so that it can be used for key management.
When you define a key you should specify the key kind and the `slot`:
```yaml
machine:
...
state:
keys:
- nodeID: {} # key kind
slot: 1
ephemeral:
keys:
- static:
passphrase: supersecret
slot: 0
```
Note that the key order does not determine which key slot is used.
Every key must always have a slot defined.
### Encryption Key Kinds
Talos supports the following kinds of keys:
- `nodeID` which is generated using the node UUID and the partition label (note that if the node UUID is not really random it will fail the entropy check).
- `static` which you define right in the configuration.
- `kms` which is sealed with the network KMS.
- `tpm` which is sealed using the TPM and protected with SecureBoot.
> Note: Use static keys only for the EPHEMERAL partition, and only if your STATE partition is encrypted.
> A static key for the STATE partition would be stored in the META partition, which is not encrypted.
### Key Rotation
In order to completely rotate keys, it is necessary to run `talosctl apply-config` a couple of times, since at least one working key must always be maintained while the other keys around it are changed.
So, for example, first add a new key:
```yaml
machine:
...
ephemeral:
keys:
- static:
passphrase: oldkey
slot: 0
- static:
passphrase: newkey
slot: 1
...
```
Run:
```bash
talosctl apply-config -n <node> -f config.yaml
```
Then remove the old key:
```yaml
machine:
...
ephemeral:
keys:
- static:
passphrase: newkey
slot: 1
...
```
Run:
```bash
talosctl apply-config -n <node> -f config.yaml
```
## Going from Unencrypted to Encrypted and Vice Versa
### Ephemeral Partition
There is no in-place encryption support for the partitions right now, so to avoid losing data only empty partitions can be encrypted.
As such, migration from unencrypted to encrypted needs some additional handling, especially around explicitly wiping partitions.
- `apply-config` should be called with `--mode=staged`.
- Partition should be wiped after `apply-config`, but before the reboot.
Edit your machine config and add the encryption configuration:
```bash
vim config.yaml
```
Apply the configuration with `--mode=staged`:
```bash
talosctl apply-config -f config.yaml -n <node ip> --mode=staged
```
Wipe the partition you're going to encrypt:
```bash
talosctl reset --system-labels-to-wipe EPHEMERAL -n <node ip> --reboot=true
```
That's it!
After you run the last command, the partition will be wiped and the node will reboot.
During the next boot the system will encrypt the partition.
### State Partition
Wiping the STATE partition will make the node lose its configuration, so the previous flow will not work.
Instead, the flow should be to first wipe the STATE partition:
```bash
talosctl reset --system-labels-to-wipe STATE -n <node ip> --reboot=true
```
The node will enter maintenance mode; then run `apply-config` with the `--insecure` flag:
```bash
talosctl apply-config --insecure -n <node ip> -f config.yaml
```
After installation is complete the node should encrypt the STATE partition.

View File

@ -0,0 +1,157 @@
---
title: "Editing Machine Configuration"
description: "How to edit and patch Talos machine configuration, with reboot, immediately, or stage update on reboot."
aliases:
- ../../guides/editing-machine-configuration
---
Talos node state is fully defined by [machine configuration]({{< relref "../../reference/configuration" >}}).
Initial configuration is delivered to the node at bootstrap time, but configuration can be updated while the node is running.
There are three `talosctl` commands which facilitate machine configuration updates:
* `talosctl apply-config` to apply configuration from the file
* `talosctl edit machineconfig` to launch an editor with existing node configuration, make changes and apply configuration back
* `talosctl patch machineconfig` to apply automated machine configuration via JSON patch
Each of these commands can operate in one of several modes:
* apply change in automatic mode (default): reboot if the change can't be applied without a reboot, otherwise apply the change immediately
* apply change with a reboot (`--mode=reboot`): update configuration, reboot Talos node to apply configuration change
* apply change immediately (`--mode=no-reboot` flag): change is applied immediately without a reboot, fails if the change contains any fields that can not be updated without a reboot
* apply change on next reboot (`--mode=staged`): change is staged to be applied after a reboot, but node is not rebooted
* apply change with automatic revert (`--mode=try`): change is applied immediately (if not possible, returns an error), and reverts it automatically in 1 minute if no configuration update is applied
* apply change in the interactive mode (`--mode=interactive`; only for `talosctl apply-config`): launches TUI based interactive installer
> Note: applying a change on the next reboot (`--mode=staged`) doesn't modify the current node configuration, so the next call to
> `talosctl edit machineconfig --mode=staged` will not see the staged changes
Additionally, there is also `talosctl get machineconfig -o yaml`, which retrieves the current node configuration API resource and contains the machine configuration in the `.spec` field.
It can be used to modify the configuration locally before being applied to the node.
The list of config changes allowed to be applied immediately in Talos {{< release >}}:
* `.debug`
* `.cluster`
* `.machine.time`
* `.machine.certCANs`
* `.machine.install` (configuration is only applied during install/upgrade)
* `.machine.network`
* `.machine.nodeLabels`
* `.machine.sysfs`
* `.machine.sysctls`
* `.machine.logging`
* `.machine.controlplane`
* `.machine.kubelet`
* `.machine.pods`
* `.machine.kernel`
* `.machine.registries` (CRI containerd plugin will not pick up the registry authentication settings without a reboot)
* `.machine.features.kubernetesTalosAPIAccess`
### `talosctl apply-config`
This command is traditionally used to submit initial machine configuration generated by `talosctl gen config` to the node.
It can also be used to apply configuration to running nodes.
The initial YAML for this is typically obtained using `talosctl get machineconfig -o yaml | yq eval .spec >machs.yaml`.
(We must use [`yq`](https://github.com/mikefarah/yq) because for historical reasons, `get` returns the configuration as a full resource, while `apply-config` only accepts the raw machine config directly.)
Example:
```bash
talosctl -n <IP> apply-config -f config.yaml
```
Command `apply-config` can also be invoked as `apply machineconfig`:
```bash
talosctl -n <IP> apply machineconfig -f config.yaml
```
Applying machine configuration immediately (without a reboot):
```bash
talosctl -n IP apply machineconfig -f config.yaml --mode=no-reboot
```
Starting the interactive installer:
```bash
talosctl -n IP apply machineconfig --mode=interactive
```
> Note: when a Talos node is running in maintenance mode, it's necessary to provide the `--insecure` (`-i`) flag to connect to the API and apply the config.
### `talosctl edit machineconfig`
The `talosctl edit` command loads the current machine configuration from the node and launches the configured editor to modify the config.
If the config hasn't been changed in the editor (or if the updated config is empty), the update is not applied.
> Note: Talos uses environment variables `TALOS_EDITOR`, `EDITOR` to pick up the editor preference.
> If environment variables are missing, `vi` editor is used by default.
Example:
```bash
talosctl -n <IP> edit machineconfig
```
Configuration can be edited for multiple nodes if multiple IP addresses are specified:
```bash
talosctl -n <IP1>,<IP2>,... edit machineconfig
```
Applying machine configuration change immediately (without a reboot):
```bash
talosctl -n <IP> edit machineconfig --mode=no-reboot
```
### `talosctl patch machineconfig`
The `talosctl patch` command works similarly to `talosctl edit` - it loads the current machine configuration, but instead of launching the configured editor it applies a set of [JSON patches](http://jsonpatch.com/) to the configuration and writes the result back to the node.
Example: updating the kubelet version (in auto mode):
```bash
$ talosctl -n <IP> patch machineconfig -p '[{"op": "replace", "path": "/machine/kubelet/image", "value": "ghcr.io/siderolabs/kubelet:v{{< k8s_release >}}"}]'
patched mc at the node <IP>
```
Updating kube-apiserver version in immediate mode (without a reboot):
```bash
$ talosctl -n <IP> patch machineconfig --mode=no-reboot -p '[{"op": "replace", "path": "/cluster/apiServer/image", "value": "registry.k8s.io/kube-apiserver:v{{< k8s_release >}}"}]'
patched mc at the node <IP>
```
A patch might be applied to multiple nodes when multiple IPs are specified:
```bash
talosctl -n <IP1>,<IP2>,... patch machineconfig -p '[{...}]'
```
Patches can also be sourced from files using `@file` syntax:
```bash
talosctl -n <IP> patch machineconfig -p @kubelet-patch.json -p @manifest-patch.json
```
It might be easier to store patches in YAML format vs. the default JSON format.
Talos can detect file format automatically:
```yaml
# kubelet-patch.yaml
- op: replace
path: /machine/kubelet/image
value: ghcr.io/siderolabs/kubelet:v{{< k8s_release >}}
```
```bash
talosctl -n <IP> patch machineconfig -p @kubelet-patch.yaml
```
### Recovering from Node Boot Failures
If a Talos node fails to boot because of wrong configuration (for example, control plane endpoint is incorrect), configuration can be updated to fix the issue.

View File

@ -0,0 +1,413 @@
---
title: "Logging"
description: "Dealing with Talos Linux logs."
aliases:
- ../../guiides/logging
---
## Viewing logs
Kernel messages can be retrieved with `talosctl dmesg` command:
```sh
$ talosctl -n 172.20.1.2 dmesg
172.20.1.2: kern: info: [2021-11-10T10:09:37.662764956Z]: Command line: init_on_alloc=1 slab_nomerge pti=on consoleblank=0 nvme_core.io_timeout=4294967295 printk.devkmsg=on ima_template=ima-ng ima_appraise=fix ima_hash=sha512 console=ttyS0 reboot=k panic=1 talos.shutdown=halt talos.platform=metal talos.config=http://172.20.1.1:40101/config.yaml
[...]
```
Service logs can be retrieved with `talosctl logs` command:
```sh
$ talosctl -n 172.20.1.2 services
NODE SERVICE STATE HEALTH LAST CHANGE LAST EVENT
172.20.1.2 apid Running OK 19m27s ago Health check successful
172.20.1.2 containerd Running OK 19m29s ago Health check successful
172.20.1.2 cri Running OK 19m27s ago Health check successful
172.20.1.2 etcd Running OK 19m22s ago Health check successful
172.20.1.2 kubelet Running OK 19m20s ago Health check successful
172.20.1.2 machined Running ? 19m30s ago Service started as goroutine
172.20.1.2 trustd Running OK 19m27s ago Health check successful
172.20.1.2 udevd Running OK 19m28s ago Health check successful
$ talosctl -n 172.20.1.2 logs machined
172.20.1.2: [talos] task setupLogger (1/1): done, 106.109µs
172.20.1.2: [talos] phase logger (1/7): done, 564.476µs
[...]
```
Container logs for Kubernetes pods can be retrieved with `talosctl logs -k` command:
```sh
$ talosctl -n 172.20.1.2 containers -k
NODE NAMESPACE ID IMAGE PID STATUS
172.20.1.2 k8s.io kube-system/kube-flannel-dk6d5 registry.k8s.io/pause:3.6 1329 SANDBOX_READY
172.20.1.2 k8s.io └─ kube-system/kube-flannel-dk6d5:install-cni:f1d4cf68feb9 ghcr.io/siderolabs/install-cni:v0.7.0-alpha.0-1-g2bb2efc 0 CONTAINER_EXITED
172.20.1.2 k8s.io └─ kube-system/kube-flannel-dk6d5:install-config:bc39fec3cbac quay.io/coreos/flannel:v0.13.0 0 CONTAINER_EXITED
172.20.1.2 k8s.io └─ kube-system/kube-flannel-dk6d5:kube-flannel:5c3989353b98 quay.io/coreos/flannel:v0.13.0 1610 CONTAINER_RUNNING
172.20.1.2 k8s.io kube-system/kube-proxy-gfkqj registry.k8s.io/pause:3.5 1311 SANDBOX_READY
172.20.1.2 k8s.io └─ kube-system/kube-proxy-gfkqj:kube-proxy:ad5e8ddc7e7f registry.k8s.io/kube-proxy:v{{< k8s_release >}} 1379 CONTAINER_RUNNING
$ talosctl -n 172.20.1.2 logs -k kube-system/kube-proxy-gfkqj:kube-proxy:ad5e8ddc7e7f
172.20.1.2: 2021-11-30T19:13:20.567825192Z stderr F I1130 19:13:20.567737 1 server_others.go:138] "Detected node IP" address="172.20.0.3"
172.20.1.2: 2021-11-30T19:13:20.599684397Z stderr F I1130 19:13:20.599613 1 server_others.go:206] "Using iptables Proxier"
[...]
```
## Sending logs
### Service logs
You can enable sending service logs in the machine configuration:
```yaml
machine:
logging:
destinations:
- endpoint: "udp://127.0.0.1:12345/"
format: "json_lines"
- endpoint: "tcp://host:5044/"
format: "json_lines"
```
Several destinations can be specified.
Supported protocols are UDP and TCP.
The only currently supported format is `json_lines`:
```json
{
"msg": "[talos] apply config request: immediate true, on reboot false",
"talos-level": "info",
"talos-service": "machined",
"talos-time": "2021-11-10T10:48:49.294858021Z"
}
```
Messages are newline-separated when sent over TCP.
Over UDP messages are sent with one message per packet.
`msg`, `talos-level`, `talos-service`, and `talos-time` fields are always present; there may be additional fields.
### Kernel logs
Kernel log delivery can be enabled with the `talos.logging.kernel` kernel command line argument, which can be specified
in the `.machine.installer.extraKernelArgs`:
```yaml
machine:
install:
extraKernelArgs:
- talos.logging.kernel=tcp://host:5044/
```
Kernel log delivery can also be configured using the [document]({{< relref "../../reference/configuration/runtime/kmsglogconfig.md" >}}) in the machine configuration:
```yaml
apiVersion: v1alpha1
kind: KmsgLogConfig
name: remote-log
url: tcp://host:5044/
```
Kernel log destination is specified in the same way as service log endpoint.
The only supported format is `json_lines`.
Sample message:
```json
{
"clock":6252819, // time relative to the kernel boot time
"facility":"user",
"msg":"[talos] task startAllServices (1/1): waiting for 6 services\n",
"priority":"warning",
"seq":711,
"talos-level":"warn", // Talos-translated `priority` into common logging level
"talos-time":"2021-11-26T16:53:21.3258698Z" // Talos-translated `clock` using current time
}
```
> `extraKernelArgs` in the machine configuration are only applied on Talos upgrades, not just by applying the config.
> (Upgrading to the same version is fine).
### Filebeat example
One way to forward logs to other log collection services is to send
them to a [Filebeat](https://www.elastic.co/beats/filebeat) instance running in the
cluster itself (in the host network), which takes care of forwarding them to
other endpoints (applying any necessary transformations).
If [Elastic Cloud on Kubernetes](https://www.elastic.co/elastic-cloud-kubernetes)
is being used, the following Beat (custom resource) configuration might be
helpful:
```yaml
apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
name: talos
spec:
type: filebeat
version: 7.15.1
elasticsearchRef:
name: talos
config:
filebeat.inputs:
- type: "udp"
host: "127.0.0.1:12345"
processors:
- decode_json_fields:
fields: ["message"]
target: ""
- timestamp:
field: "talos-time"
layouts:
- "2006-01-02T15:04:05.999999999Z07:00"
- drop_fields:
fields: ["message", "talos-time"]
- rename:
fields:
- from: "msg"
to: "message"
daemonSet:
updateStrategy:
rollingUpdate:
maxUnavailable: 100%
podTemplate:
spec:
dnsPolicy: ClusterFirstWithHostNet
hostNetwork: true
securityContext:
runAsUser: 0
containers:
- name: filebeat
ports:
- protocol: UDP
containerPort: 12345
hostPort: 12345
```
The input configuration ensures that messages and timestamps are extracted properly.
Refer to the Filebeat documentation on how to forward logs to other outputs.
Also note the `hostNetwork: true` in the `daemonSet` configuration.
This ensures filebeat uses the host network, and listens on `127.0.0.1:12345`
(UDP) on every machine, which can then be specified as a logging endpoint in
the machine configuration.
### Fluent-bit example
First, we'll create a values file for the `fluent-bit` Helm chart.
```yaml
# fluentd-bit.yaml
podAnnotations:
fluentbit.io/exclude: 'true'
extraPorts:
- port: 12345
containerPort: 12345
protocol: TCP
name: talos
config:
service: |
[SERVICE]
Flush 5
Daemon Off
Log_Level warn
Parsers_File custom_parsers.conf
inputs: |
[INPUT]
Name tcp
Listen 0.0.0.0
Port 12345
Format json
Tag talos.*
[INPUT]
Name tail
Alias kubernetes
Path /var/log/containers/*.log
Parser containerd
Tag kubernetes.*
[INPUT]
Name tail
Alias audit
Path /var/log/audit/kube/*.log
Parser audit
Tag audit.*
filters: |
[FILTER]
Name kubernetes
Alias kubernetes
Match kubernetes.*
Kube_Tag_Prefix kubernetes.var.log.containers.
Use_Kubelet Off
Merge_Log On
Merge_Log_Trim On
Keep_Log Off
K8S-Logging.Parser Off
K8S-Logging.Exclude On
Annotations Off
Labels On
[FILTER]
Name modify
Match kubernetes.*
Add source kubernetes
Remove logtag
customParsers: |
[PARSER]
Name audit
Format json
Time_Key requestReceivedTimestamp
Time_Format %Y-%m-%dT%H:%M:%S.%L%z
[PARSER]
Name containerd
Format regex
Regex ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[^ ]*) (?<log>.*)$
Time_Key time
Time_Format %Y-%m-%dT%H:%M:%S.%L%z
outputs: |
[OUTPUT]
Name stdout
Alias stdout
Match *
Format json_lines
# If you wish to ship directly to Loki from Fluentbit,
# Uncomment the following output, updating the Host with your Loki DNS/IP info as necessary.
# [OUTPUT]
# Name loki
# Match *
# Host loki.loki.svc
# Port 3100
# Labels job=fluentbit
# Auto_Kubernetes_Labels on
daemonSetVolumes:
- name: varlog
hostPath:
path: /var/log
daemonSetVolumeMounts:
- name: varlog
mountPath: /var/log
tolerations:
- operator: Exists
effect: NoSchedule
```
Next, we will add the helm repo for FluentBit, and deploy it to the cluster.
```shell
helm repo add fluent https://fluent.github.io/helm-charts
helm upgrade -i --namespace=kube-system -f fluentd-bit.yaml fluent-bit fluent/fluent-bit
```
Now we need to find the service IP.
```shell
$ kubectl -n kube-system get svc -l app.kubernetes.io/name=fluent-bit
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
fluent-bit ClusterIP 10.200.0.138 <none> 2020/TCP,5170/TCP 108m
```
Finally, we will change the Talos log destination with the command `talosctl edit mc`.
```yaml
machine:
logging:
destinations:
- endpoint: "tcp://10.200.0.138:5170"
format: "json_lines"
```
This example configuration was well tested with the Cilium CNI, and it should work with iptables/IPVS-based CNI plugins too.
### Vector example
[Vector](https://vector.dev) is a lightweight observability pipeline ideal for a Kubernetes environment.
It can ingest (source) logs from multiple sources, perform remapping on the logs (transform), and forward the resulting pipeline to multiple destinations (sinks).
As it is an end-to-end platform, it can be run as a single-deployment 'aggregator' as well as a DaemonSet of 'agents' running on each node.
As Talos can be configured as above to send logs to a destination, we can run Vector as an Aggregator, and forward both kernel and service logs to a UDP socket in-cluster.
Below is an excerpt of a source/sink setup for Talos, with a 'sink' destination of an in-cluster [Grafana Loki](https://grafana.com/oss/loki/) log aggregation service.
As Loki can create labels from the log input, we have set up the Loki sink to create labels based on the host IP, service and facility of the inbound logs.
Note that a method of exposing the Vector service will be required, which may vary depending on your setup; a LoadBalancer is a good option.
```yaml
role: "Stateless-Aggregator"
# Sources
sources:
talos_kernel_logs:
address: 0.0.0.0:6050
type: socket
mode: udp
max_length: 102400
decoding:
codec: json
host_key: __host
talos_service_logs:
address: 0.0.0.0:6051
type: socket
mode: udp
max_length: 102400
decoding:
codec: json
host_key: __host
# Sinks
sinks:
talos_kernel:
type: loki
inputs:
- talos_kernel_logs_xform
endpoint: http://loki.system-monitoring:3100
encoding:
codec: json
except_fields:
- __host
batch:
max_bytes: 1048576
out_of_order_action: rewrite_timestamp
labels:
hostname: >-
{{`{{ __host }}`}}
facility: >-
{{`{{ facility }}`}}
talos_service:
type: loki
inputs:
- talos_service_logs_xform
endpoint: http://loki.system-monitoring:3100
encoding:
codec: json
except_fields:
- __host
batch:
max_bytes: 400000
out_of_order_action: rewrite_timestamp
labels:
hostname: >-
{{`{{ __host }}`}}
service: >-
{{`{{ "talos-service" }}`}}
```
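On the Talos side, a machine configuration sketch pointing service logs at the Vector service port and kernel logs at the kernel port might look like the following (`vector.example.com` is a placeholder for however you exposed the aggregator):
```yaml
machine:
  logging:
    destinations:
      - endpoint: "udp://vector.example.com:6051/"
        format: "json_lines"
  install:
    extraKernelArgs:
      - talos.logging.kernel=udp://vector.example.com:6050/
```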
View File
@ -0,0 +1,72 @@
---
title: "Managing Talos PKI"
description: "How to manage Public Key Infrastructure"
aliases:
- ../../guides/managing-pki
---
## Generating New Client Configuration
### Using Controlplane Node
If you have a valid (not expired) `talosconfig` with `os:admin` role,
a new client configuration file can be generated with `talosctl config new` against
any controlplane node:
```shell
talosctl -n CP1 config new talosconfig-reader --roles os:reader --crt-ttl 24h
```
A specific [role]({{< relref "rbac" >}}) and certificate lifetime can be specified.
### From Secrets Bundle
If a secrets bundle (`secrets.yaml` from `talosctl gen secrets`) was saved while
[generating machine configuration]({{< relref "../../introduction/getting-started/#configure-talos" >}}):
```shell
talosctl gen config --with-secrets secrets.yaml --output-types talosconfig -o talosconfig <cluster-name> https://<cluster-endpoint>
```
> Note: `<cluster-name>` and `<cluster-endpoint>` arguments don't matter, as they are not used for `talosconfig`.
### From Control Plane Machine Configuration
In order to create a new key pair for client configuration, you will need the root Talos API CA.
The base64 encoded CA can be found in the control plane node's configuration file.
Save the CA public key and CA private key as `ca.crt` and `ca.key` respectively:
```shell
yq eval .machine.ca.crt controlplane.yaml | base64 -d > ca.crt
yq eval .machine.ca.key controlplane.yaml | base64 -d > ca.key
```
Now, run the following commands to generate a certificate:
```bash
talosctl gen key --name admin
talosctl gen csr --key admin.key --ip 127.0.0.1
talosctl gen crt --ca ca --csr admin.csr --name admin
```
Put the base64-encoded files into the respective fields of the `talosconfig`:
```yaml
context: mycluster
contexts:
mycluster:
endpoints:
- CP1
- CP2
ca: <base64-encoded ca.crt>
crt: <base64-encoded admin.crt>
key: <base64-encoded admin.key>
```
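The base64-encoded values can be produced, for example, with GNU coreutils (`-w0` disables line wrapping; on other platforms the flags may differ):
```shell
base64 -w0 ca.crt
base64 -w0 admin.crt
base64 -w0 admin.key
```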
## Renewing an Expired Administrator Certificate
By default, the admin `talosconfig` certificate is valid for 365 days, while cluster CAs are valid for 10 years.
In order to prevent the admin `talosconfig` from expiring, renew the client configuration before expiration using the `talosctl config new` command described above.
If the `talosconfig` is expired or lost, you can still generate a new one from either the `secrets.yaml`
secrets bundle or the control plane node's configuration file, using the methods described above.
View File
@ -0,0 +1,37 @@
---
title: "NVIDIA Fabric Manager"
description: "In this guide we'll follow the procedure to enable NVIDIA Fabric Manager."
aliases:
- ../../guides/nvidia-fabricmanager
---
NVIDIA GPUs with NVLink support (e.g. A100) also need the [nvidia-fabricmanager](https://github.com/siderolabs/extensions/pkgs/container/nvidia-fabricmanager) system extension enabled in addition to the [NVIDIA drivers]({{< relref "nvidia-gpu" >}}).
For more information on Fabric Manager, refer to https://docs.nvidia.com/datacenter/tesla/fabric-manager-user-guide/index.html.
The published versions of the NVIDIA fabricmanager system extension are available [here](https://github.com/siderolabs/extensions/pkgs/container/nvidia-fabricmanager).
> The `nvidia-fabricmanager` extension version has to match the NVIDIA driver version in use.
## Enabling the NVIDIA fabricmanager system extension
Create the [boot assets]({{< relref "../install/boot-assets" >}}) or a custom installer and perform a machine upgrade that includes the following system extensions:
```text
ghcr.io/siderolabs/nvidia-open-gpu-kernel-modules:{{< nvidia_driver_release >}}-{{< release >}}
ghcr.io/siderolabs/nvidia-container-toolkit:{{< nvidia_driver_release >}}-{{< nvidia_container_toolkit_release >}}
ghcr.io/siderolabs/nvidia-fabricmanager:{{< nvidia_driver_release >}}
```
Patch the machine configuration to load the required modules:
```yaml
machine:
kernel:
modules:
- name: nvidia
- name: nvidia_uvm
- name: nvidia_drm
- name: nvidia_modeset
sysctls:
net.core.bpf_jit_harden: 1
```
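After the upgrade completes, the extension and its service can be checked with `talosctl` (extension-provided services are typically listed with an `ext-` prefix):
```bash
talosctl get extensions
talosctl services
```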
View File
@ -0,0 +1,147 @@
---
title: "NVIDIA GPU (Proprietary drivers)"
description: "In this guide we'll follow the procedure to support NVIDIA GPU using proprietary drivers on Talos."
aliases:
- ../../guides/nvidia-gpu-proprietary
---
> Enabling NVIDIA GPU support on Talos is bound by [NVIDIA EULA](https://www.nvidia.com/en-us/drivers/nvidia-license/).
> The Talos published NVIDIA drivers are bound to a specific Talos release.
> The extension versions also need to be updated when upgrading Talos.
We will be using the following NVIDIA system extensions:
- `nonfree-kmod-nvidia`
- `nvidia-container-toolkit`
> To build an NVIDIA driver version not published by Sidero Labs, follow the instructions [here]({{< relref "../../../v1.4/talos-guides/configuration/nvidia-gpu-proprietary" >}}).
Create the [boot assets]({{< relref "../install/boot-assets" >}}) which includes the system extensions mentioned above (or create a custom installer and perform a machine upgrade if Talos is already installed).
> Make sure the driver version matches for both the `nonfree-kmod-nvidia` and `nvidia-container-toolkit` extensions.
> The `nonfree-kmod-nvidia` extension is versioned as `<nvidia-driver-version>-<talos-release-version>` and the `nvidia-container-toolkit` extension is versioned as `<nvidia-driver-version>-<nvidia-container-toolkit-version>`.
## Enabling the NVIDIA modules and the system extension
Patch Talos machine configuration using the patch `gpu-worker-patch.yaml`:
```yaml
machine:
kernel:
modules:
- name: nvidia
- name: nvidia_uvm
- name: nvidia_drm
- name: nvidia_modeset
sysctls:
net.core.bpf_jit_harden: 1
```
Now apply the patch to all Talos nodes in the cluster that have NVIDIA GPUs installed:
```bash
talosctl patch mc --patch @gpu-worker-patch.yaml
```
The NVIDIA modules should be loaded and the system extension should be installed.
This can be confirmed by running:
```bash
talosctl read /proc/modules
```
which should produce an output similar to below:
```text
nvidia_uvm 1146880 - - Live 0xffffffffc2733000 (PO)
nvidia_drm 69632 - - Live 0xffffffffc2721000 (PO)
nvidia_modeset 1142784 - - Live 0xffffffffc25ea000 (PO)
nvidia 39047168 - - Live 0xffffffffc00ac000 (PO)
```
```bash
talosctl get extensions
```
which should produce an output similar to below:
```text
NODE NAMESPACE TYPE ID VERSION NAME VERSION
172.31.41.27 runtime ExtensionStatus 000.ghcr.io-frezbo-nvidia-container-toolkit-510.60.02-v1.9.0 1 nvidia-container-toolkit 510.60.02-v1.9.0
```
```bash
talosctl read /proc/driver/nvidia/version
```
which should produce an output similar to below:
```text
NVRM version: NVIDIA UNIX x86_64 Kernel Module 510.60.02 Wed Mar 16 11:24:05 UTC 2022
GCC version: gcc version 11.2.0 (GCC)
```
## Deploying NVIDIA device plugin
First, we need to create the `RuntimeClass`.
Apply the following manifest to create a runtime class that uses the extension:
```yaml
---
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
name: nvidia
handler: nvidia
```
Install the NVIDIA device plugin:
```bash
helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo update
helm install nvidia-device-plugin nvdp/nvidia-device-plugin --version=0.13.0 --set=runtimeClassName=nvidia
```
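Once the device plugin pods are running, the GPUs should be advertised to the scheduler as `nvidia.com/gpu` resources; a quick check (a sketch):
```bash
kubectl describe nodes | grep nvidia.com/gpu
```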
## (Optional) Setting the default runtime class as `nvidia`
> Do note that this will set the default runtime class to `nvidia` for all pods scheduled on the node.
Create a patch yaml `nvidia-default-runtimeclass.yaml` to update the machine config similar to below:
```yaml
- op: add
path: /machine/files
value:
- content: |
[plugins]
[plugins."io.containerd.grpc.v1.cri"]
[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "nvidia"
path: /etc/cri/conf.d/20-customization.part
op: create
```
Now apply the patch to all Talos nodes in the cluster that have NVIDIA GPUs installed:
```bash
talosctl patch mc --patch @nvidia-default-runtimeclass.yaml
```
### Testing the runtime class
> Note the `spec.runtimeClassName` being explicitly set to `nvidia` in the pod spec.
Run the following command to test the runtime class:
```bash
kubectl run \
nvidia-test \
--restart=Never \
-ti --rm \
--image nvcr.io/nvidia/cuda:12.1.0-base-ubuntu22.04 \
--overrides '{"spec": {"runtimeClassName": "nvidia"}}' \
nvidia-smi
```
View File
@ -0,0 +1,146 @@
---
title: "NVIDIA GPU (OSS drivers)"
description: "In this guide we'll follow the procedure to support NVIDIA GPU using OSS drivers on Talos."
aliases:
- ../../guides/nvidia-gpu
---
> Enabling NVIDIA GPU support on Talos is bound by [NVIDIA EULA](https://www.nvidia.com/en-us/drivers/nvidia-license/).
> The Talos published NVIDIA OSS drivers are bound to a specific Talos release.
> The extension versions also need to be updated when upgrading Talos.
We will be using the following NVIDIA OSS system extensions:
- `nvidia-open-gpu-kernel-modules`
- `nvidia-container-toolkit`
Create the [boot assets]({{< relref "../install/boot-assets" >}}) which includes the system extensions mentioned above (or create a custom installer and perform a machine upgrade if Talos is already installed).
> Make sure the driver version matches for both the `nvidia-open-gpu-kernel-modules` and `nvidia-container-toolkit` extensions.
> The `nvidia-open-gpu-kernel-modules` extension is versioned as `<nvidia-driver-version>-<talos-release-version>` and the `nvidia-container-toolkit` extension is versioned as `<nvidia-driver-version>-<nvidia-container-toolkit-version>`.
## Enabling the NVIDIA OSS modules
Patch Talos machine configuration using the patch `gpu-worker-patch.yaml`:
```yaml
machine:
kernel:
modules:
- name: nvidia
- name: nvidia_uvm
- name: nvidia_drm
- name: nvidia_modeset
sysctls:
net.core.bpf_jit_harden: 1
```
Now apply the patch to all Talos nodes in the cluster that have NVIDIA GPUs installed:
```bash
talosctl patch mc --patch @gpu-worker-patch.yaml
```
The NVIDIA modules should be loaded and the system extension should be installed.
This can be confirmed by running:
```bash
talosctl read /proc/modules
```
which should produce an output similar to below:
```text
nvidia_uvm 1146880 - - Live 0xffffffffc2733000 (PO)
nvidia_drm 69632 - - Live 0xffffffffc2721000 (PO)
nvidia_modeset 1142784 - - Live 0xffffffffc25ea000 (PO)
nvidia 39047168 - - Live 0xffffffffc00ac000 (PO)
```
```bash
talosctl get extensions
```
which should produce an output similar to below:
```text
NODE NAMESPACE TYPE ID VERSION NAME VERSION
172.31.41.27 runtime ExtensionStatus 000.ghcr.io-siderolabs-nvidia-container-toolkit-515.65.01-v1.10.0 1 nvidia-container-toolkit 515.65.01-v1.10.0
172.31.41.27 runtime ExtensionStatus 000.ghcr.io-siderolabs-nvidia-open-gpu-kernel-modules-515.65.01-v1.2.0 1 nvidia-open-gpu-kernel-modules 515.65.01-v1.2.0
```
```bash
talosctl read /proc/driver/nvidia/version
```
which should produce an output similar to below:
```text
NVRM version: NVIDIA UNIX x86_64 Kernel Module 515.65.01 Wed Mar 16 11:24:05 UTC 2022
GCC version: gcc version 12.2.0 (GCC)
```
## Deploying NVIDIA device plugin
First, we need to create the `RuntimeClass`.
Apply the following manifest to create a runtime class that uses the extension:
```yaml
---
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
name: nvidia
handler: nvidia
```
Install the NVIDIA device plugin:
```bash
helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo update
helm install nvidia-device-plugin nvdp/nvidia-device-plugin --version=0.13.0 --set=runtimeClassName=nvidia
```
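Once the device plugin pods are running, the GPUs should be advertised to the scheduler as `nvidia.com/gpu` resources; a quick check (a sketch):
```bash
kubectl describe nodes | grep nvidia.com/gpu
```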
## (Optional) Setting the default runtime class as `nvidia`
> Do note that this will set the default runtime class to `nvidia` for all pods scheduled on the node.
Create a patch yaml `nvidia-default-runtimeclass.yaml` to update the machine config similar to below:
```yaml
- op: add
path: /machine/files
value:
- content: |
[plugins]
[plugins."io.containerd.grpc.v1.cri"]
[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "nvidia"
path: /etc/cri/conf.d/20-customization.part
op: create
```
Now apply the patch to all Talos nodes in the cluster that have NVIDIA GPUs installed:
```bash
talosctl patch mc --patch @nvidia-default-runtimeclass.yaml
```
### Testing the runtime class
> Note the `spec.runtimeClassName` being explicitly set to `nvidia` in the pod spec.
Run the following command to test the runtime class:
```bash
kubectl run \
nvidia-test \
--restart=Never \
-ti --rm \
--image nvcr.io/nvidia/cuda:12.1.0-base-ubuntu22.04 \
--overrides '{"spec": {"runtimeClassName": "nvidia"}}' \
nvidia-smi
```
View File
@ -0,0 +1,342 @@
---
title: "Configuration Patches"
description: "In this guide, we'll patch the generated machine configuration."
---
Talos generates machine configuration for two types of machines: controlplane and worker machines.
Many configuration options can be adjusted using `talosctl gen config` but not all of them.
Configuration patching allows modifying machine configuration to fit it for the cluster or a specific machine.
## Configuration Patch Formats
Talos supports two configuration patch formats:
- strategic merge patches
- RFC6902 (JSON patches)
Strategic merge patches are the easiest to use, but JSON patches allow more precise configuration adjustments.
> Note: Talos 1.5+ supports [multi-document machine configuration]({{< relref "../../reference/configuration" >}}).
> JSON patches don't support multi-document machine configuration, while strategic merge patches do.
### Strategic Merge patches
Strategic merge patches look like incomplete machine configuration files:
```yaml
machine:
network:
hostname: worker1
```
When applied to the machine configuration, the patch gets merged with the respective section of the machine configuration:
```yaml
machine:
network:
interfaces:
- interface: eth0
addresses:
- 10.0.0.2/24
hostname: worker1
```
In general, machine configuration contents are merged with the contents of the strategic merge patch, with strategic merge patch
values overriding machine configuration values.
There are some special rules:
- If the field value is a list, the patch value is appended to the list, with the following exceptions:
- values of the fields `cluster.network.podSubnets` and `cluster.network.serviceSubnets` are overwritten on merge
- `network.interfaces` section is merged with the value in the machine config if there is a match on `interface:` or `deviceSelector:` keys
- `network.interfaces.vlans` section is merged with the value in the machine config if there is a match on the `vlanId:` key
- `cluster.apiServer.auditPolicy` value is replaced on merge
When patching a [multi-document machine configuration]({{< relref "../../reference/configuration" >}}), the following rules apply:
- for each document in the patch, the document is merged with the respective document in the machine configuration (matching by `kind`, `apiVersion` and `name` for named documents)
- if the patch document doesn't exist in the machine configuration, it is appended to the machine configuration
The strategic merge patch itself might be a multi-document YAML, and each document will be applied as a patch to the base machine configuration.
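For example, a multi-document strategic merge patch might set the hostname in the main `v1alpha1` config and add a named `KmsgLogConfig` document in one go (a sketch):
```yaml
machine:
  network:
    hostname: worker1
---
apiVersion: v1alpha1
kind: KmsgLogConfig
name: remote-log
url: tcp://host:5044/
```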
### RFC6902 (JSON Patches)
[JSON patches](https://jsonpatch.com/) can be written either in JSON or YAML format.
A proper JSON patch requires an `op` field that depends on the machine configuration contents: whether the path already exists or not.
For example, the strategic merge patch from the previous section can be written either as:
```yaml
- op: replace
path: /machine/network/hostname
value: worker1
```
or:
```yaml
- op: add
path: /machine/network/hostname
value: worker1
```
The correct `op` depends on whether the `/machine/network/hostname` section exists already in the machine config or not.
## Examples
### Machine Network
Base machine configuration:
```yaml
# ...
machine:
network:
interfaces:
- interface: eth0
dhcp: false
addresses:
- 192.168.10.3/24
```
The goal is to add a virtual IP `192.168.10.50` to the `eth0` interface and add another interface `eth1` with DHCP enabled.
<!-- markdownlint-disable MD007 -->
<!-- markdownlint-disable MD032 -->
<!-- markdownlint-disable MD025 -->
{{< tabpane lang="yaml" >}}
{{< tab header="Strategic merge patch" >}}
machine:
network:
interfaces:
- interface: eth0
vip:
ip: 192.168.10.50
- interface: eth1
dhcp: true
{{< /tab >}}
{{< tab header="JSON patch" >}}
- op: add
path: /machine/network/interfaces/0/vip
value:
ip: 192.168.10.50
- op: add
path: /machine/network/interfaces/-
value:
interface: eth1
dhcp: true
{{< /tab >}}
{{< /tabpane >}}
Patched machine configuration:
```yaml
machine:
network:
interfaces:
- interface: eth0
dhcp: false
addresses:
- 192.168.10.3/24
vip:
ip: 192.168.10.50
- interface: eth1
dhcp: true
```
### Cluster Network
Base machine configuration:
```yaml
cluster:
network:
dnsDomain: cluster.local
podSubnets:
- 10.244.0.0/16
serviceSubnets:
- 10.96.0.0/12
```
The goal is to update pod and service subnets and disable default CNI (Flannel).
{{< tabpane lang="yaml" >}}
{{< tab header="Strategic merge patch" >}}
cluster:
network:
podSubnets:
- 192.168.0.0/16
serviceSubnets:
- 192.0.0.0/12
cni:
name: none
{{< /tab >}}
{{< tab header="JSON patch" >}}
- op: replace
path: /cluster/network/podSubnets
value:
- 192.168.0.0/16
- op: replace
path: /cluster/network/serviceSubnets
value:
- 192.0.0.0/12
- op: add
path: /cluster/network/cni
value:
name: none
{{< /tab >}}
{{< /tabpane >}}
Patched machine configuration:
```yaml
cluster:
network:
dnsDomain: cluster.local
podSubnets:
- 192.168.0.0/16
serviceSubnets:
- 192.0.0.0/12
cni:
name: none
```
### Kubelet
Base machine configuration:
```yaml
# ...
machine:
kubelet: {}
```
The goal is to set the `kubelet` node IP to come from the subnet `192.168.10.0/24`.
{{< tabpane lang="yaml" >}}
{{< tab header="Strategic merge patch" >}}
machine:
kubelet:
nodeIP:
validSubnets:
- 192.168.10.0/24
{{< /tab >}}
{{< tab header="JSON patch" >}}
- op: add
path: /machine/kubelet/nodeIP
value:
validSubnets:
- 192.168.10.0/24
{{< /tab >}}
{{< /tabpane >}}
Patched machine configuration:
```yaml
machine:
kubelet:
nodeIP:
validSubnets:
- 192.168.10.0/24
```
### Admission Control: Pod Security Policy
Base machine configuration:
```yaml
cluster:
apiServer:
admissionControl:
- name: PodSecurity
configuration:
apiVersion: pod-security.admission.config.k8s.io/v1alpha1
defaults:
audit: restricted
audit-version: latest
enforce: baseline
enforce-version: latest
warn: restricted
warn-version: latest
exemptions:
namespaces:
- kube-system
runtimeClasses: []
usernames: []
kind: PodSecurityConfiguration
```
The goal is to add an exemption for the namespace `rook-ceph`.
{{< tabpane lang="yaml" >}}
{{< tab header="Strategic merge patch" >}}
cluster:
apiServer:
admissionControl:
- name: PodSecurity
configuration:
exemptions:
namespaces:
- rook-ceph
{{< /tab >}}
{{< tab header="JSON patch" >}}
- op: add
path: /cluster/apiServer/admissionControl/0/configuration/exemptions/namespaces/-
value: rook-ceph
{{< /tab >}}
{{< /tabpane >}}
Patched machine configuration:
```yaml
cluster:
apiServer:
admissionControl:
- name: PodSecurity
configuration:
apiVersion: pod-security.admission.config.k8s.io/v1alpha1
defaults:
audit: restricted
audit-version: latest
enforce: baseline
enforce-version: latest
warn: restricted
warn-version: latest
exemptions:
namespaces:
- kube-system
- rook-ceph
runtimeClasses: []
usernames: []
kind: PodSecurityConfiguration
```
## Configuration Patching with `talosctl` CLI
Several `talosctl` commands accept config patches as command-line flags.
Config patches might be passed either as an inline value or as a reference to a file with `@file.patch` syntax:
```shell
talosctl ... --patch '[{"op": "add", "path": "/machine/network/hostname", "value": "worker1"}]' --patch @file.patch
```
If multiple config patches are specified, they are applied in the order of appearance.
The format of the patch (JSON patch or strategic merge patch) is detected automatically.
Talos machine configuration can be patched at the moment of generation with `talosctl gen config`:
```shell
talosctl gen config test-cluster https://172.20.0.1:6443 --config-patch @all.yaml --config-patch-control-plane @cp.yaml --config-patch-worker @worker.yaml
```
Generated machine configuration can also be patched after the fact with `talosctl machineconfig patch`
```shell
talosctl machineconfig patch worker.yaml --patch @patch.yaml -o worker1.yaml
```
Machine configuration on the running Talos node can be patched with `talosctl patch`:
```shell
talosctl patch mc --nodes 172.20.0.2 --patch @patch.yaml
```
View File
@ -0,0 +1,171 @@
---
title: Pull Through Image Cache
description: "How to set up local transparent container images caches."
aliases:
- ../../guides/configuring-pull-through-cache
---
In this guide we will create a set of local caching Docker registry proxies to minimize local cluster startup time.
When running Talos locally, pulling images from container registries might take a significant amount of time.
We spin up local caching pass-through registries to cache images and configure a local Talos cluster to use those proxies.
A similar approach might be used to run Talos in production in air-gapped environments.
It can be also used to verify that all the images are available in local registries.
## Video Walkthrough
To see a live demo of this writeup, see the video below:
<iframe width="560" height="315" src="https://www.youtube.com/embed/PRiQJR9Q33s" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
## Requirements
The following are requirements for creating the set of caching proxies:
- Docker 18.03 or greater
- Local cluster requirements for either [docker]({{< relref "../install/local-platforms/docker" >}}) or [QEMU]({{< relref "../install/local-platforms/qemu" >}}).
## Launch the Caching Docker Registry Proxies
Talos pulls from `docker.io`, `registry.k8s.io`, `gcr.io`, and `ghcr.io` by default.
If your configuration is different, you might need to modify the commands below:
```bash
docker run -d -p 5000:5000 \
-e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io \
--restart always \
--name registry-docker.io registry:2
docker run -d -p 5001:5000 \
-e REGISTRY_PROXY_REMOTEURL=https://registry.k8s.io \
--restart always \
--name registry-registry.k8s.io registry:2
docker run -d -p 5003:5000 \
-e REGISTRY_PROXY_REMOTEURL=https://gcr.io \
--restart always \
--name registry-gcr.io registry:2
docker run -d -p 5004:5000 \
-e REGISTRY_PROXY_REMOTEURL=https://ghcr.io \
--restart always \
--name registry-ghcr.io registry:2
```
> Note: Proxies are started as docker containers, and they're automatically configured to start with the Docker daemon.
As a registry container can only handle a single upstream Docker registry, we launch a container per upstream, each on its own
host port (5000, 5001, 5003, and 5004).
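A quick way to confirm all four proxies are up (a sketch):
```bash
docker ps --filter "name=registry-"
```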
## Using Caching Registries with `QEMU` Local Cluster
With a [QEMU]({{< relref "../install/local-platforms/qemu" >}}) local cluster, a bridge interface is created on the host.
As registry containers expose their ports on the host, we can use bridge IP to direct proxy requests.
```bash
sudo talosctl cluster create --provisioner qemu \
--registry-mirror docker.io=http://10.5.0.1:5000 \
--registry-mirror registry.k8s.io=http://10.5.0.1:5001 \
--registry-mirror gcr.io=http://10.5.0.1:5003 \
--registry-mirror ghcr.io=http://10.5.0.1:5004
```
The Talos local cluster should now start pulling via caching registries.
This can be verified via registry logs, e.g. `docker logs -f registry-docker.io`.
The first time the cluster boots, images are pulled and cached, so the next cluster boot should be much faster.
> Note: `10.5.0.1` is the bridge IP with the default network (`10.5.0.0/24`); if using a custom `--cidr`, the value should be adjusted accordingly.
## Using Caching Registries with `docker` Local Cluster
With a [docker]({{< relref "../install/local-platforms/docker" >}}) local cluster we can use the docker bridge IP; the default value for that IP is `172.17.0.1`.
On Linux, the docker bridge address can be inspected with `ip addr show docker0`.
```bash
talosctl cluster create --provisioner docker \
--registry-mirror docker.io=http://172.17.0.1:5000 \
--registry-mirror registry.k8s.io=http://172.17.0.1:5001 \
--registry-mirror gcr.io=http://172.17.0.1:5003 \
--registry-mirror ghcr.io=http://172.17.0.1:5004
```
## Machine Configuration
The caching registries can be configured via machine configuration [patch]({{< relref "patching" >}}), equivalent to the command line flags above:
```yaml
machine:
registries:
mirrors:
docker.io:
endpoints:
- http://10.5.0.1:5000
gcr.io:
endpoints:
- http://10.5.0.1:5003
ghcr.io:
endpoints:
- http://10.5.0.1:5004
registry.k8s.io:
endpoints:
- http://10.5.0.1:5001
```
## Cleaning Up
To cleanup, run:
```bash
docker rm -f registry-docker.io
docker rm -f registry-registry.k8s.io
docker rm -f registry-gcr.io
docker rm -f registry-ghcr.io
```
> Note: Removing docker registry containers also removes the image cache.
> So if you plan to use caching registries, keep the containers running.
## Using Harbor as a Caching Registry
[Harbor](https://goharbor.io/) is an open source container registry that can be used as a caching proxy.
Harbor supports configuring multiple upstream registries, so it can be used to cache multiple registries at once behind a single endpoint.
![Harbor Endpoints](/images/harbor-endpoints.png)
![Harbor Projects](/images/harbor-projects.png)
As Harbor puts a registry name in the pull image path, we need to set `overridePath: true` to prevent Talos and containerd from appending `/v2` to the path.
```yaml
machine:
registries:
mirrors:
docker.io:
endpoints:
- http://harbor/v2/proxy-docker.io
overridePath: true
ghcr.io:
endpoints:
- http://harbor/v2/proxy-ghcr.io
overridePath: true
gcr.io:
endpoints:
- http://harbor/v2/proxy-gcr.io
overridePath: true
registry.k8s.io:
endpoints:
- http://harbor/v2/proxy-registry.k8s.io
overridePath: true
```
The Harbor external endpoint (`http://harbor` in this example) can be configured with authentication or custom TLS:
```yaml
machine:
registries:
config:
harbor:
auth:
username: admin
password: password
```
View File
@ -0,0 +1,52 @@
---
title: "Role-based access control (RBAC)"
description: "Set up RBAC on the Talos Linux API."
aliases:
- ../../guides/rbac
---
Talos v0.11 introduced initial support for role-based access control (RBAC).
This guide will explain what that is and how to enable it without losing access to the cluster.
## RBAC in Talos
Talos uses certificates to authorize users.
The certificate subject's organization field is used to encode user roles.
There is a set of predefined roles that allow access to different [API methods]({{< relref "../../reference/api" >}}):
* `os:admin` grants access to all methods;
* `os:operator` grants everything `os:reader` role does, plus additional methods: rebooting, shutting down, etcd backup, etcd alarm management, and so on;
* `os:reader` grants access to "safe" methods (for example, that includes the ability to list files, but does not include the ability to read files content);
* `os:etcd:backup` grants access to [`/machine.MachineService/EtcdSnapshot`]({{< relref "../../reference/api#machine.EtcdSnapshotRequest" >}}) method.
Roles in the current `talosconfig` can be checked with the following command:
```sh
$ talosctl config info
[...]
Roles: os:admin
[...]
```
RBAC is enabled by default in new clusters created with `talosctl` v0.11+ and disabled otherwise.
## Enabling RBAC
First, both the Talos cluster and `talosctl` tool should be [upgraded]({{< relref "../upgrading-talos" >}}).
Then the `talosctl config new` command should be used to generate a new client configuration with the `os:admin` role.
Additional configurations and certificates for different roles can be generated by passing the `--roles` flag:
```sh
talosctl config new --roles=os:reader reader
```
That command will create a new client configuration file `reader` with a new certificate with `os:reader` role.
After that, RBAC should be enabled in the machine configuration:
```yaml
machine:
features:
rbac: true
```
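One way to apply and verify the change (a sketch; the patch syntax is covered in the configuration patching guide):
```sh
talosctl patch mc --nodes <node IP> --patch '{"machine": {"features": {"rbac": true}}}'
# verify that the reduced-privilege configuration generated earlier still works
talosctl --talosconfig reader version
```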
View File
@ -0,0 +1,105 @@
---
title: "System Extensions"
description: "Customizing the Talos Linux immutable root file system."
aliases:
- ../../guides/system-extensions
---
System extensions allow extending the Talos root filesystem, which enables a variety of features, such as including custom
container runtimes, loading additional firmware, etc.
System extensions are only activated during the installation or upgrade of Talos Linux.
With system extensions installed, the Talos root filesystem is still immutable and read-only.
## Installing System Extensions
> Note: installing system extensions via the `.machine.install` section of the machine configuration is now deprecated.
Starting with Talos v1.5.0, Talos supports generation of boot media with system extensions included, which removes the need to rebuild
the `initramfs.xz` on the machine itself during the installation or upgrade.
There are several kinds of boot assets that Talos can generate:
* initial boot assets (ISO, PXE, etc.) that are used to boot the machine
* disk images that have Talos pre-installed
* `installer` container images that can be used to install or upgrade Talos on a machine (installation happens when booted from ISO or PXE)
Depending on the nature of the system extension (e.g. network device driver or `containerd` plugin), it may be necessary to include the extension in
both initial boot assets and disk images/`installer`, or just the `installer`.
The process of generating boot assets with extensions included is described in the [boot assets guide]({{< relref "../install/boot-assets" >}}).
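As a reference point, an Image Factory schematic that bakes an extension into the generated assets is a short YAML document along these lines (a sketch; see the boot assets guide for the full workflow):
```yaml
customization:
  systemExtensions:
    officialExtensions:
      - siderolabs/gvisor
```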
### Example: Booting from an ISO
Let's assume the NVIDIA extension is required on a bare metal machine which is going to be booted from an ISO.
As the NVIDIA extension is not required for the initial boot and install step, it is sufficient to include the extension in the `installer` image only.
1. Use a generic Talos ISO to boot the machine.
2. Prepare a custom `installer` container image with NVIDIA extension included, push the image to a registry.
3. Ensure that machine configuration field `.machine.install.image` points to the custom `installer` image.
4. Boot the machine using the ISO, apply the machine configuration.
5. Talos pulls a custom installer image from the registry (containing NVIDIA extension), installs Talos on the machine, and reboots.
When it's time to upgrade Talos, generate a custom `installer` container for a new version of Talos, push it to a registry, and perform upgrade
pointing to the custom `installer` image.
### Example: Disk Image
Let's assume the NVIDIA extension is required on an AWS VM.
1. Prepare an AWS disk image with NVIDIA extension included.
2. Upload the image to AWS, register it as an AMI.
3. Use the AMI to launch a VM.
4. Talos boots with NVIDIA extension included.
When it's time to upgrade Talos, either repeat steps 1-4 to replace the VM with a new AMI, or,
as in the previous example, generate a custom `installer` and use it to upgrade Talos in-place.
## Authoring System Extensions
A Talos system extension is a container image with the [specific folder structure](https://github.com/siderolabs/extensions#readme).
System extensions can be built and managed using any tool that produces container images, e.g. `docker build`.
Sidero Labs maintains a [repository of system extensions](https://github.com/siderolabs/extensions).
## Resource Definitions
Use `talosctl get extensions` to get a list of system extensions:
```bash
$ talosctl get extensions
NODE NAMESPACE TYPE ID VERSION NAME VERSION
172.20.0.2 runtime ExtensionStatus 000.ghcr.io-talos-systems-gvisor-54b831d 1 gvisor 20220117.0-v1.0.0
172.20.0.2 runtime ExtensionStatus 001.ghcr.io-talos-systems-intel-ucode-54b831d 1 intel-ucode microcode-20210608-v1.0.0
```
Use YAML or JSON format to see additional details about the extension:
```bash
$ talosctl -n 172.20.0.2 get extensions 001.ghcr.io-talos-systems-intel-ucode-54b831d -o yaml
node: 172.20.0.2
metadata:
namespace: runtime
type: ExtensionStatuses.runtime.talos.dev
id: 001.ghcr.io-talos-systems-intel-ucode-54b831d
version: 1
owner: runtime.ExtensionStatusController
phase: running
created: 2022-02-10T18:25:04Z
updated: 2022-02-10T18:25:04Z
spec:
image: 001.ghcr.io-talos-systems-intel-ucode-54b831d.sqsh
metadata:
name: intel-ucode
version: microcode-20210608-v1.0.0
author: Spencer Smith
description: |
This system extension provides Intel microcode binaries.
compatibility:
talos:
version: '>= v1.0.0'
```
## Example: gVisor
See [readme of the gVisor extension](https://github.com/siderolabs/extensions/tree/main/container-runtime/gvisor#readme).
View File
@ -0,0 +1,158 @@
---
title: "Discovery Service"
description: "Talos Linux Node discovery services"
aliases:
- ../../guides/discovery
---
Talos Linux includes node-discovery capabilities that depend on a discovery registry.
This allows you to see the members of your cluster, and the associated IP addresses of the nodes.
```bash
talosctl get members
NODE NAMESPACE TYPE ID VERSION HOSTNAME MACHINE TYPE OS ADDRESSES
10.5.0.2 cluster Member talos-default-controlplane-1 1 talos-default-controlplane-1 controlplane Talos (v1.2.3) ["10.5.0.2"]
10.5.0.2 cluster Member talos-default-worker-1 1 talos-default-worker-1 worker Talos (v1.2.3) ["10.5.0.3"]
```
There are currently two supported discovery services: a Kubernetes registry (which stores data in the cluster's etcd service) and an external registry service.
Sidero Labs runs a public external registry service, which is enabled by default.
The Kubernetes registry service is disabled by default.
The advantage of the external registry service is that it is not dependent on etcd, and thus can inform you of cluster membership even when Kubernetes is down.
## Video Walkthrough
To see a live demo of Cluster Discovery, see the video below:
<iframe width="560" height="315" src="https://www.youtube.com/embed/GCBTrHhjawY" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
## Registries
Peers are aggregated from enabled registries.
By default, Talos will use the `service` registry, while the `kubernetes` registry is disabled.
To disable a registry, set `disabled` to `true` (this option is the same for all registries):
For example, to disable the `service` registry:
```yaml
cluster:
discovery:
enabled: true
registries:
service:
disabled: true
```
Disabling all registries effectively disables member discovery.
> Note: An enabled discovery service is required for [KubeSpan]({{< relref "../talos-guides/network/kubespan/" >}}) to function correctly.
The `Kubernetes` registry uses Kubernetes `Node` resource data and additional Talos annotations:
```sh
$ kubectl describe node <nodename>
Annotations: cluster.talos.dev/node-id: Utoh3O0ZneV0kT2IUBrh7TgdouRcUW2yzaaMl4VXnCd
networking.talos.dev/assigned-prefixes: 10.244.0.0/32,10.244.0.1/24
networking.talos.dev/self-ips: 172.20.0.2,fd83:b1f7:fcb5:2802:8c13:71ff:feaf:7c94
...
```
The `Service` registry by default uses a public external Discovery Service to exchange encrypted information about cluster members.
> Note: Talos supports operations when the Discovery Service is disabled, but some features will rely on Kubernetes API availability to discover
> controlplane endpoints, so in case of a failure a disabled Discovery Service makes troubleshooting much harder.
## Discovery Service
Sidero Labs maintains a public discovery service at `https://discovery.talos.dev/` whereby cluster members use a shared key that is globally unique to coordinate basic connection information (i.e. the set of possible "endpoints", or IP:port pairs).
We call this data "affiliate data."
> Note: If KubeSpan is enabled the data has the addition of the WireGuard public key.
Data sent to the discovery service is encrypted with AES-GCM encryption and endpoint data is separately encrypted with AES in ECB mode so that endpoints coming from different sources can be deduplicated server-side.
Each node submits its own data, plus the endpoints it sees from other peers, to the discovery service.
The discovery service aggregates the data, deduplicates the endpoints, and sends updates to each connected peer.
Each peer receives information back from the discovery service, decrypts it and uses it to drive KubeSpan and cluster discovery.
Data is stored in memory only.
The cluster ID is used as a key to select the affiliates (so that different clusters see different affiliates).
To summarize, the discovery service knows the client version, cluster ID, the number of affiliates, some encrypted data for each affiliate, and a list of encrypted endpoints.
The discovery service doesn't see the actual node information; it only stores and updates encrypted blobs.
Discovery data is encrypted and decrypted by the clients, the cluster members.
The discovery service does not have the encryption key.
The discovery service may, with a commercial license, be operated by your organization and can be [downloaded here](https://github.com/siderolabs/discovery-service).
In order for nodes to communicate with the discovery service, they must be able to connect to it on TCP port 443.
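For a self-hosted deployment, the discovery endpoint can be overridden in the machine configuration (a sketch; `discovery.example.internal` is a placeholder):
```yaml
cluster:
  discovery:
    enabled: true
    registries:
      service:
        endpoint: https://discovery.example.internal/
```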
## Resource Definitions
Talos provides resources that can be used to introspect the discovery and KubeSpan features.
### Discovery
#### Identities
The node's unique identity (base62 encoded random 32 bytes) can be obtained with:
> Note: Using base62 allows the ID to be URL encoded without having to use the ambiguous URL-encoding version of base64.
```sh
$ talosctl get identities -o yaml
...
spec:
nodeId: Utoh3O0ZneV0kT2IUBrh7TgdouRcUW2yzaaMl4VXnCd
```
Node identity is used as the unique `Affiliate` identifier.
Node identity resource is preserved in the [STATE]({{< relref "../learn-more/architecture/#file-system-partitions" >}}) partition in `node-identity.yaml` file.
Node identity is preserved across reboots and upgrades, but it is regenerated if the node is reset (wiped).
#### Affiliates
An affiliate is a proposed member: the node has the same cluster ID and secret.
```sh
$ talosctl get affiliates
ID VERSION HOSTNAME MACHINE TYPE ADDRESSES
2VfX3nu67ZtZPl57IdJrU87BMjVWkSBJiL9ulP9TCnF 2 talos-default-controlplane-2 controlplane ["172.20.0.3","fd83:b1f7:fcb5:2802:986b:7eff:fec5:889d"]
6EVq8RHIne03LeZiJ60WsJcoQOtttw1ejvTS6SOBzhUA 2 talos-default-worker-1 worker ["172.20.0.5","fd83:b1f7:fcb5:2802:cc80:3dff:fece:d89d"]
NVtfu1bT1QjhNq5xJFUZl8f8I8LOCnnpGrZfPpdN9WlB 2 talos-default-worker-2 worker ["172.20.0.6","fd83:b1f7:fcb5:2802:2805:fbff:fe80:5ed2"]
Utoh3O0ZneV0kT2IUBrh7TgdouRcUW2yzaaMl4VXnCd 4 talos-default-controlplane-1 controlplane ["172.20.0.2","fd83:b1f7:fcb5:2802:8c13:71ff:feaf:7c94"]
b3DebkPaCRLTLLWaeRF1ejGaR0lK3m79jRJcPn0mfA6C 2 talos-default-controlplane-3 controlplane ["172.20.0.4","fd83:b1f7:fcb5:2802:248f:1fff:fe5c:c3f"]
```
One of the `Affiliates` with the `ID` matching node identity is populated from the node data, other `Affiliates` are pulled from the registries.
Enabled discovery registries run in parallel and discovered data is merged to build the list presented above.
Details about data coming from each registry can be queried from the `cluster-raw` namespace:
```sh
$ talosctl get affiliates --namespace=cluster-raw
ID VERSION HOSTNAME MACHINE TYPE ADDRESSES
k8s/2VfX3nu67ZtZPl57IdJrU87BMjVWkSBJiL9ulP9TCnF 3 talos-default-controlplane-2 controlplane ["172.20.0.3","fd83:b1f7:fcb5:2802:986b:7eff:fec5:889d"]
k8s/6EVq8RHIne03LeZiJ60WsJcoQOtttw1ejvTS6SOBzhUA 2 talos-default-worker-1 worker ["172.20.0.5","fd83:b1f7:fcb5:2802:cc80:3dff:fece:d89d"]
k8s/NVtfu1bT1QjhNq5xJFUZl8f8I8LOCnnpGrZfPpdN9WlB 2 talos-default-worker-2 worker ["172.20.0.6","fd83:b1f7:fcb5:2802:2805:fbff:fe80:5ed2"]
k8s/b3DebkPaCRLTLLWaeRF1ejGaR0lK3m79jRJcPn0mfA6C 3 talos-default-controlplane-3 controlplane ["172.20.0.4","fd83:b1f7:fcb5:2802:248f:1fff:fe5c:c3f"]
service/2VfX3nu67ZtZPl57IdJrU87BMjVWkSBJiL9ulP9TCnF 23 talos-default-controlplane-2 controlplane ["172.20.0.3","fd83:b1f7:fcb5:2802:986b:7eff:fec5:889d"]
service/6EVq8RHIne03LeZiJ60WsJcoQOtttw1ejvTS6SOBzhUA 26 talos-default-worker-1 worker ["172.20.0.5","fd83:b1f7:fcb5:2802:cc80:3dff:fece:d89d"]
service/NVtfu1bT1QjhNq5xJFUZl8f8I8LOCnnpGrZfPpdN9WlB 20 talos-default-worker-2 worker ["172.20.0.6","fd83:b1f7:fcb5:2802:2805:fbff:fe80:5ed2"]
service/b3DebkPaCRLTLLWaeRF1ejGaR0lK3m79jRJcPn0mfA6C 14 talos-default-controlplane-3 controlplane ["172.20.0.4","fd83:b1f7:fcb5:2802:248f:1fff:fe5c:c3f"]
```
Each `Affiliate` ID is prefixed with `k8s/` for data coming from the Kubernetes registry and with `service/` for data coming from the discovery service.
#### Members
A member is an affiliate that has been approved to join the cluster.
The members of the cluster can be obtained with:
```sh
$ talosctl get members
ID VERSION HOSTNAME MACHINE TYPE OS ADDRESSES
talos-default-controlplane-1 2 talos-default-controlplane-1 controlplane Talos ({{< release >}}) ["172.20.0.2","fd83:b1f7:fcb5:2802:8c13:71ff:feaf:7c94"]
talos-default-controlplane-2 1 talos-default-controlplane-2 controlplane Talos ({{< release >}}) ["172.20.0.3","fd83:b1f7:fcb5:2802:986b:7eff:fec5:889d"]
talos-default-controlplane-3 1 talos-default-controlplane-3 controlplane Talos ({{< release >}}) ["172.20.0.4","fd83:b1f7:fcb5:2802:248f:1fff:fe5c:c3f"]
talos-default-worker-1 1 talos-default-worker-1 worker Talos ({{< release >}}) ["172.20.0.5","fd83:b1f7:fcb5:2802:cc80:3dff:fece:d89d"]
talos-default-worker-2 1 talos-default-worker-2 worker Talos ({{< release >}}) ["172.20.0.6","fd83:b1f7:fcb5:2802:2805:fbff:fe80:5ed2"]
```
View File
@ -0,0 +1,5 @@
---
title: "How Tos"
weight: 30
description: "How to guide for common tasks in Talos Linux"
---
View File
@ -0,0 +1,14 @@
---
title: "How to manage certificate lifetimes with Talos Linux"
aliases:
---
Talos Linux automatically manages and rotates all server side certs for etcd, Kubernetes, and the Talos API.
Note however that the kubelet needs to be restarted at least once a year in order for the certificates to be rotated.
Any upgrade/reboot of the node will suffice for this effect.
Client certs (`talosconfig` and `kubeconfig`) are the user's responsibility.
Each time you download the `kubeconfig` file from a Talos Linux cluster, the client certificate is regenerated, giving you a `kubeconfig` which is valid for a year.
The `talosconfig` file should be renewed at least once a year, using the `talosctl config new` command.
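For example, a fresh admin client configuration valid for one year might be generated with (a sketch; adjust the role and TTL to your needs):
```bash
talosctl -n <controlplane IP> config new talosconfig-admin --roles os:admin --crt-ttl 8760h
```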
View File
@ -0,0 +1,17 @@
---
title: "How to scale down a Talos cluster"
description: "How to remove nodes from a Talos Linux cluster."
aliases:
---
To remove nodes from a Talos Linux cluster:
- `talosctl -n <IP.of.node.to.remove> reset`
- `kubectl delete node <nodename>`
The command [`talosctl reset`]({{< relref "../../reference/cli/#talosctl-reset" >}}) will cordon and drain the node, have it leave the `etcd` cluster if required, and then erase its disks and power down the system.
This command will also remove the node from registration with the discovery service, so it will no longer show up in `talosctl get members`.
It is still necessary to remove the node from Kubernetes, as noted above.
View File
@ -0,0 +1,27 @@
---
title: "How to scale up a Talos cluster"
description: "How to add more nodes to a Talos Linux cluster."
aliases:
---
To add more nodes to a Talos Linux cluster, follow the same procedure as when initially creating the cluster:
- boot the new machines to install Talos Linux
- apply the `worker.yaml` or `controlplane.yaml` configuration files to the new machines
You need the `controlplane.yaml` and `worker.yaml` that were created when you initially deployed your cluster.
These contain the certificates that enable new machines to join.
Once the new machine has booted and you have its IP address, you can then apply the correct configuration for each machine you are adding, either `worker` or `controlplane`.
```bash
talosctl apply-config --insecure \
--nodes [NODE IP] \
--file controlplane.yaml
```
The insecure flag is necessary because the PKI infrastructure has not yet been made available to the node.
You do not need to bootstrap the new node.
Regardless of whether you are adding a control plane or worker node, it will now join the cluster in its role.
View File
@ -0,0 +1,19 @@
---
title: "How to enable workers on your control plane nodes"
description: "How to enable workers on your control plane nodes."
aliases:
---
By default, Talos Linux taints control plane nodes so that workloads are not schedulable on them.
In order to allow workloads to run on the control plane nodes (useful for single node clusters, or non-production clusters), follow the procedure below.
Modify the MachineConfig for the controlplane nodes to add `allowSchedulingOnControlPlanes: true`:
```yaml
cluster:
allowSchedulingOnControlPlanes: true
```
This may be done via editing the `controlplane.yaml` file before it is applied to the control plane nodes, by [editing the machine config]({{< relref "../configuration/editing-machine-configuration" >}}), or by [patching the machine config]({{< relref "../configuration/patching">}}).
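For example, applying the setting to a running control plane node as a patch might look like (a sketch):
```bash
talosctl patch mc --nodes <controlplane IP> --patch '[{"op": "add", "path": "/cluster/allowSchedulingOnControlPlanes", "value": true}]'
```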
View File
@ -0,0 +1,5 @@
---
title: "Installation"
weight: 10
description: "How to install Talos Linux on various platforms"
---
View File
@ -0,0 +1,5 @@
---
title: "Bare Metal Platforms"
weight: 20
description: "Installation of Talos Linux on various bare-metal platforms."
---
View File
@ -0,0 +1,172 @@
---
title: "Digital Rebar"
description: "In this guide we will create an Kubernetes cluster with 1 worker node, and 2 controlplane nodes using an existing digital rebar deployment."
aliases:
- ../../../bare-metal-platforms/digital-rebar
---
## Prerequisites
- 3 nodes (please see [hardware requirements]({{< relref "../../../introduction/system-requirements/" >}}))
- Loadbalancer
- Digital Rebar Server
- Talosctl access (see [talosctl setup]({{< relref "../../../introduction/getting-started/#talosctl" >}}))
## Creating a Cluster
In this guide we will create a Kubernetes cluster with 1 worker node and 2 controlplane nodes.
We assume an existing Digital Rebar deployment, and some familiarity with iPXE.
We leave it up to the user to decide if they would like to use static networking, or DHCP.
The setup and configuration of DHCP will not be covered.
### Create the Machine Configuration Files
#### Generating Base Configurations
Using the DNS name of the load balancer, generate the base configuration files for the Talos machines:
```bash
$ talosctl gen config talos-k8s-metal-tutorial https://<load balancer IP or DNS>:<port>
created controlplane.yaml
created worker.yaml
created talosconfig
```
> The loadbalancer is used to distribute the load across multiple controlplane nodes.
> This isn't covered in detail, because we assume some load balancing knowledge beforehand.
> If you think this should be added to the docs, please [create an issue](https://github.com/siderolabs/talos/issues).
At this point, you can modify the generated configs to your liking.
Optionally, you can specify `--config-patch` with RFC6902 jsonpatch which will be applied during the config generation.
#### Validate the Configuration Files
```bash
$ talosctl validate --config controlplane.yaml --mode metal
controlplane.yaml is valid for metal mode
$ talosctl validate --config worker.yaml --mode metal
worker.yaml is valid for metal mode
```
#### Publishing the Machine Configuration Files
Digital Rebar has a built-in fileserver, which means we can use this feature to expose the Talos configuration files.
We will place `controlplane.yaml`, and `worker.yaml` into Digital Rebar file server by using the `drpcli` tools.
Copy the generated files from the step above into your Digital Rebar installation.
```bash
drpcli file upload <file>.yaml as <file>.yaml
```
Replace `<file>` with `controlplane` or `worker`.
### Download the boot files
Download a recent version of `boot.tar.gz` from [GitHub](https://github.com/siderolabs/talos/releases/).
Upload it to Digital Rebar:
```bash
$ drpcli isos upload boot.tar.gz as talos.tar.gz
{
"Path": "talos.tar.gz",
"Size": 96470072
}
```
We have some Digital Rebar [example files](https://github.com/siderolabs/talos/tree/master/hack/test/digitalrebar/) in the Git repo you can use to provision Digital Rebar with drpcli.
To apply these configs you need to create them, and then apply them as follows:
```bash
$ drpcli bootenvs create talos
{
"Available": true,
"BootParams": "",
"Bundle": "",
"Description": "",
"Documentation": "",
"Endpoint": "",
"Errors": [],
"Initrds": [],
"Kernel": "",
"Meta": {},
"Name": "talos",
"OS": {
"Codename": "",
"Family": "",
"IsoFile": "",
"IsoSha256": "",
"IsoUrl": "",
"Name": "",
"SupportedArchitectures": {},
"Version": ""
},
"OnlyUnknown": false,
"OptionalParams": [],
"ReadOnly": false,
"RequiredParams": [],
"Templates": [],
"Validated": true
}
```
```bash
drpcli bootenvs update talos - < bootenv.yaml
```
> You need to do this for all files in the example directory.
> If you don't have access to the `drpcli` tools, you can also use the web interface.
It's important that the bootenv has a corresponding SHA256 hash matching the uploaded `boot.tar.gz`.
#### Bootenv BootParams
We're using some of Digital Rebar's built-in templating to make sure the machine gets the correct role assigned.
`talos.platform=metal talos.config={{ .ProvisionerURL }}/files/{{.Param \"talos/role\"}}.yaml"`
This is why we also include a `params.yaml` in the example directory to make sure the role is set to one of the following:
- controlplane
- worker
The `{{.Param \"talos/role\"}}` then gets populated with one of the above roles.
### Boot the Machines
In the UI of Digital Rebar you need to select the machines you want to provision.
Once selected, you need to assign the following:
- Profile
- Workflow
This will provision the Stage and Bootenv with the Talos values.
Once this is done, you can boot the machine.
### Bootstrap Etcd
To configure `talosctl` we will need the first control plane node's IP:
Set the `endpoints` and `nodes`:
```bash
talosctl --talosconfig talosconfig config endpoint <control plane 1 IP>
talosctl --talosconfig talosconfig config node <control plane 1 IP>
```
Bootstrap `etcd`:
```bash
talosctl --talosconfig talosconfig bootstrap
```
### Retrieve the `kubeconfig`
At this point we can retrieve the admin `kubeconfig` by running:
```bash
talosctl --talosconfig talosconfig kubeconfig .
```
View File
@ -0,0 +1,173 @@
---
title: "Equinix Metal"
description: "Creating Talos clusters with Equinix Metal."
aliases:
- ../../../bare-metal-platforms/equinix-metal
---
You can create a Talos Linux cluster on Equinix Metal in a variety of ways, such as through the EM web UI, the `metal` command line tool, or through PXE booting.
Talos Linux is a supported OS install option on Equinix Metal, so it's an easy process.
Regardless of the method, the process is:
* Create a DNS entry for your Kubernetes endpoint.
* Generate the configurations using `talosctl`.
* Provision your machines on Equinix Metal.
* Push the configurations to your servers (if not done as part of the machine provisioning).
* configure your Kubernetes endpoint to point to the newly created control plane nodes
* bootstrap the cluster
## Define the Kubernetes Endpoint
There are a variety of ways to create an HA endpoint for the Kubernetes cluster.
Some of the ways are:
* DNS
* Load Balancer
* BGP
Whatever way is chosen, it should result in an IP address/DNS name that routes traffic to all the control plane nodes.
We do not know the control plane node IP addresses at this stage, but we should define the endpoint DNS entry so that we can use it in creating the cluster configuration.
After the nodes are provisioned, we can use their addresses to create the endpoint A records, or bind them to the load balancer, etc.
## Create the Machine Configuration Files
### Generating Configurations
Using the DNS name of the loadbalancer defined above, generate the base configuration files for the Talos machines:
```bash
$ talosctl gen config talos-k8s-em-tutorial https://<load balancer IP or DNS>:<port>
created controlplane.yaml
created worker.yaml
created talosconfig
```
> The `port` used above should be 6443, unless your load balancer maps a different port to port 6443 on the control plane nodes.
### Validate the Configuration Files
```bash
talosctl validate --config controlplane.yaml --mode metal
talosctl validate --config worker.yaml --mode metal
```
> Note: Validation of the install disk could potentially fail as validation
> is performed on your local machine and the specified disk may not exist.
### Passing in the configuration as User Data
You can use the metadata service provided by Equinix Metal to pass in the machine configuration.
It is required to add a shebang to the top of the configuration file.
<!-- textlint-disable one-sentence-per-line -->
The convention we use is `#!talos`.
<!-- textlint-enable one-sentence-per-line -->
## Provision the machines in Equinix Metal
### Using the Equinix Metal UI
Simply select the location and type of machines in the Equinix Metal web interface.
Select Talos as the Operating System, then select the number of servers to create, and name them (in lowercase only.)
Under *optional settings*, you can optionally paste in the contents of `controlplane.yaml` that was generated, above (ensuring you add a first line of `#!talos`).
You can repeat this process to create machines of different types for control plane and worker nodes (although you would pass in `worker.yaml` for the worker nodes, as user data).
If you did not pass in the machine configuration as User Data, you need to provide it to each machine, with the following command:
`talosctl apply-config --insecure --nodes <Node IP> --file ./controlplane.yaml`
### Creating a Cluster via the Equinix Metal CLI
This guide assumes you have a working API token and the [Equinix Metal CLI](https://github.com/equinix/metal-cli/) installed.
Because Talos Linux is a supported operating system, Talos Linux machines can be provisioned directly via the CLI, using the `-O talos_v1` parameter (for Operating System).
<!-- textlint-disable one-sentence-per-line -->
> Note: Ensure you have prepended `#!talos` to the `controlplane.yaml` file.
<!-- textlint-enable one-sentence-per-line -->
```bash
metal device create \
--project-id $PROJECT_ID \
--facility $FACILITY \
--operating-system "talos_v1" \
  --plan $PLAN \
  --hostname $HOSTNAME \
--userdata-file controlplane.yaml
```
e.g. `metal device create -p <projectID> -f da11 -O talos_v1 -P c3.small.x86 -H steve.test.11 --userdata-file ./controlplane.yaml`
Repeat this to create each control plane node desired: there should usually be three for an HA cluster.
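For example, a minimal sketch of creating three control plane nodes in a loop; the project ID, facility, plan, and hostnames below are placeholders you would replace, and it assumes the `metal` CLI is already authenticated:
```bash
# Placeholder values -- replace with your own project, facility, and plan.
PROJECT_ID="<project UUID>"
FACILITY="da11"
PLAN="c3.small.x86"

# Create three control plane nodes for an HA cluster.
for i in 1 2 3; do
  metal device create \
    --project-id "$PROJECT_ID" \
    --facility "$FACILITY" \
    --operating-system "talos_v1" \
    --plan "$PLAN" \
    --hostname "talos-cp-${i}" \
    --userdata-file controlplane.yaml
done
```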
### Network Booting via iPXE
Talos Linux can be PXE-booted on Equinix Metal using [Image Factory]({{< relref "../../../learn-more/image-factory" >}}) with the `equinixMetal` platform, e.g.
`https://pxe.factory.talos.dev/pxe/376567988ad370138ad8b2698212367b8edcb69b5fd68c80be1f2ec7d603b4ba/{{< release >}}/equinixMetal-amd64` (this URL references the default schematic and `amd64` architecture).
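For convenience, you might export that URL as the `$PXE_SERVER` variable used in the commands below; this is a sketch assuming the default schematic shown above (adjust the schematic ID if you use a custom one):
```bash
# iPXE script URL served by Image Factory (default schematic, amd64).
PXE_SERVER="https://pxe.factory.talos.dev/pxe/376567988ad370138ad8b2698212367b8edcb69b5fd68c80be1f2ec7d603b4ba/{{< release >}}/equinixMetal-amd64"
```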
#### Create the Control Plane Nodes
```bash
metal device create \
--project-id $PROJECT_ID \
--facility $FACILITY \
--ipxe-script-url $PXE_SERVER \
--operating-system "custom_ipxe" \
  --plan $PLAN \
  --hostname $HOSTNAME \
--userdata-file controlplane.yaml
```
> Note: Repeat this to create each control plane node desired: there should usually be three for an HA cluster.
#### Create the Worker Nodes
```bash
metal device create \
--project-id $PROJECT_ID \
--facility $FACILITY \
--ipxe-script-url $PXE_SERVER \
--operating-system "custom_ipxe" \
  --plan $PLAN \
  --hostname $HOSTNAME \
--userdata-file worker.yaml
```
## Update the Kubernetes endpoint
Now that our control plane nodes have been created and we know their IP addresses, we can associate them with the Kubernetes endpoint.
Configure your load balancer to route traffic to these nodes, or add `A` records to your DNS entry for the endpoint, for each control plane node.
e.g.
```bash
host endpoint.mydomain.com
endpoint.mydomain.com has address 145.40.90.201
endpoint.mydomain.com has address 147.75.109.71
endpoint.mydomain.com has address 145.40.90.177
```
## Bootstrap Etcd
Set the `endpoints` and `nodes` for `talosctl`:
```bash
talosctl --talosconfig talosconfig config endpoint <control plane 1 IP>
talosctl --talosconfig talosconfig config node <control plane 1 IP>
```
Bootstrap `etcd`:
```bash
talosctl --talosconfig talosconfig bootstrap
```
This only needs to be issued to one control plane node.
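Optionally, you can wait for the cluster to come up before proceeding; a minimal sketch using the built-in health checks (this may take a few minutes while the control plane starts):
```bash
# Run Talos cluster health checks against the configured node.
talosctl --talosconfig talosconfig health
```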
## Retrieve the `kubeconfig`
At this point we can retrieve the admin `kubeconfig` by running:
```bash
talosctl --talosconfig talosconfig kubeconfig .
```
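The command above writes a file named `kubeconfig` to the current directory; a quick sanity check that the cluster is reachable might look like:
```bash
# List the cluster nodes using the retrieved kubeconfig.
kubectl --kubeconfig ./kubeconfig get nodes
```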


@ -0,0 +1,21 @@
---
title: "ISO"
description: "Booting Talos on bare-metal with ISO."
---
Talos can be installed on a bare-metal machine using an ISO image.
ISO images for `amd64` and `arm64` architectures are available on the [Talos releases page](https://github.com/siderolabs/talos/releases/latest/).
Talos doesn't install itself to disk when booted from an ISO until the machine configuration is applied.
Please follow the [getting started guide]({{< relref "../../../introduction/getting-started" >}}) for the generic steps on how to install Talos.
> Note: If there is already a Talos installation on the disk, the machine will boot into that installation when booting from a Talos ISO.
> The boot order should prefer disk over ISO, or the ISO should be removed after the installation to make Talos boot from disk.
See [kernel parameters reference]({{< relref "../../../reference/kernel" >}}) for the list of kernel parameters supported by Talos.
There are two flavors of ISO images available:
* `metal-<arch>.iso` supports booting on BIOS and UEFI systems (for x86, UEFI only for arm64)
* `metal-<arch>-secureboot.iso` supports booting only on UEFI systems in SecureBoot mode (via [Image Factory]({{< relref "../../../learn-more/image-factory" >}}))
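As a sketch, the `amd64` ISO could be downloaded from the latest GitHub release with `curl` (the asset name is assumed to follow the `metal-<arch>.iso` pattern above):
```bash
# Download the amd64 ISO from the latest Talos release (asset name assumed).
curl -LO https://github.com/siderolabs/talos/releases/latest/download/metal-amd64.iso
```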
