docs: fork docs for Talos 1.2

Now master generates docs for the future v1.2.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Andrey Smirnov 2022-06-10 22:21:39 +04:00
parent a0dd010a87
commit 90bf34fed9
GPG Key ID: 7B26396447AB6DFD
116 changed files with 20686 additions and 6 deletions

View File

@ -781,9 +781,9 @@ RUN protoc \
/protos/time/*.proto
FROM scratch AS docs
COPY --from=docs-build /tmp/configuration.md /website/content/v1.1/reference/
COPY --from=docs-build /tmp/cli.md /website/content/v1.1/reference/
COPY --from=proto-docs-build /tmp/api.md /website/content/v1.1/reference/
COPY --from=docs-build /tmp/configuration.md /website/content/v1.2/reference/
COPY --from=docs-build /tmp/cli.md /website/content/v1.2/reference/
COPY --from=proto-docs-build /tmp/api.md /website/content/v1.2/reference/
# The talosctl-cni-bundle builds the CNI bundle for talosctl.

View File

@ -135,6 +135,10 @@ offlineSearch = false
# Enable syntax highlighting and copy buttons on code blocks with Prism
prism_syntax_highlighting = false
[[params.versions]]
url = "/v1.2/"
version = "v1.2 (pre-release)"
[[params.versions]]
url = "/v1.1/"
version = "v1.1 (pre-release)"

View File

@ -5,7 +5,7 @@ linkTitle: "Documentation"
cascade:
  type: docs
  preRelease: true
  lastRelease: v1.1.0-beta.1
  lastRelease: v1.1.0-beta.2
  kubernetesRelease: "1.24.1"
  prevKubernetesRelease: "1.23.5"
---

View File

@ -6,8 +6,8 @@ description: "Table of supported Talos Linux versions and respective platforms."
| Talos Version | 1.1 | 1.0 |
|----------------------------------------------------------------------------------------------------------------|------------------------------------|------------------------------------|
| Release Date | 2022-06-01, TBD | 2022-03-29 (1.0.0) |
| End of Community Support | 1.2.0 release (2022-09-01, TBD) | 1.1.0 release (2022-06-01, TBD) |
| Release Date | 2022-06-24, TBD | 2022-03-29 (1.0.0) |
| End of Community Support | 1.2.0 release (2022-09-01, TBD) | 1.1.0 release (2022-06-24, TBD) |
| Enterprise Support | [offered by Sidero Labs Inc.](https://www.siderolabs.com/support/) | [offered by Sidero Labs Inc.](https://www.siderolabs.com/support/) |
| Kubernetes | 1.24, 1.23, 1.22 | 1.23, 1.22, 1.21 |
| Architecture | amd64, arm64 | amd64, arm64 |

View File

@ -0,0 +1,54 @@
---
title: Welcome
no_list: true
linkTitle: "Documentation"
cascade:
  type: docs
  preRelease: true
  lastRelease: v1.2.0-alpha.0
  kubernetesRelease: "1.24.1"
  prevKubernetesRelease: "1.24.1"
---
## Welcome
Welcome to the Talos documentation.
If you are just getting familiar with Talos, we recommend starting here:
- [What is Talos]({{< relref "introduction/what-is-talos" >}}): a quick description of Talos
- [Quickstart]({{< relref "introduction/quickstart" >}}): the fastest way to get a Talos cluster up and running
- [Getting Started]({{< relref "introduction/getting-started" >}}): a long-form, guided tour of getting a full Talos cluster deployed
## Open Source
### Community
- GitHub: [repo](https://github.com/siderolabs/talos)
- Slack: Join our [slack channel](https://slack.dev.talos-systems.io)
- Matrix: Join our Matrix channels:
- Community: [#talos:matrix.org](https://matrix.to/#/#talos:matrix.org)
- Support: [#talos-support:matrix.org](https://matrix.to/#/#talos-support:matrix.org)
- Support: Questions, bugs, feature requests [GitHub Discussions](https://github.com/siderolabs/talos/discussions)
- Forum: [community](https://groups.google.com/a/siderolabs.com/forum/#!forum/community)
- Twitter: [@SideroLabs](https://twitter.com/talossystems)
- Email: [info@SideroLabs.com](mailto:info@SideroLabs.com)
If you're interested in this project and would like to help in engineering efforts, or have general usage questions, we are happy to have you!
We hold a weekly meeting that all audiences are welcome to attend.
We would appreciate your feedback so that we can make Talos even better!
To do so, you can take our [survey](https://docs.google.com/forms/d/1TUna5YTYGCKot68Y9YN_CLobY6z9JzLVCq1G7DoyNjA/edit).
### Office Hours
- When: Mondays at 16:30 UTC.
- Where: [Google Meet](https://meet.google.com/day-pxhv-zky).
You can subscribe to this meeting by joining the community forum above.
## Enterprise
If you are using Talos in a production setting, and need consulting services to get started or to integrate Talos into your existing environment, we can help.
Sidero Labs, Inc. offers support contracts with SLA (Service Level Agreement)-bound terms for mission-critical environments.
[Learn More](https://www.siderolabs.com/support/)

View File

@ -0,0 +1,4 @@
---
title: "Advanced Guides"
weight: 60
---

View File

@ -0,0 +1,89 @@
---
title: "Advanced Networking"
description: "How to configure advanced networking options on Talos Linux."
aliases:
- ../guides/advanced-networking
---
## Static Addressing
Static addressing comprises specifying `addresses`, `routes` (remember to add your default gateway), and `interface`.
Most likely you'll also want to define the `nameservers` so you have properly functioning DNS.
```yaml
machine:
  network:
    hostname: talos
    nameservers:
      - 10.0.0.1
    interfaces:
      - interface: eth0
        addresses:
          - 10.0.0.201/8
        mtu: 8765
        routes:
          - network: 0.0.0.0/0
            gateway: 10.0.0.1
      - interface: eth1
        ignore: true
  time:
    servers:
      - time.cloudflare.com
```
## Additional Addresses for an Interface
In some environments you may need to set additional addresses on an interface.
In the following example, we set two additional addresses on the loopback interface.
```yaml
machine:
  network:
    interfaces:
      - interface: lo
        addresses:
          - 192.168.0.21/24
          - 10.2.2.2/24
```
## Bonding
The following example shows how to create a bonded interface.
```yaml
machine:
  network:
    interfaces:
      - interface: bond0
        dhcp: true
        bond:
          mode: 802.3ad
          lacpRate: fast
          xmitHashPolicy: layer3+4
          miimon: 100
          updelay: 200
          downdelay: 200
          interfaces:
            - eth0
            - eth1
```
## VLANs
To set up VLANs on a specific device, use an array of VLANs to add.
The master device may be configured without addressing by setting `dhcp` to `false`.
```yaml
machine:
  network:
    interfaces:
      - interface: eth0
        dhcp: false
        vlans:
          - vlanId: 100
            addresses:
              - "192.168.2.10/28"
            routes:
              - network: 0.0.0.0/0
                gateway: 192.168.2.1
```

View File

@ -0,0 +1,167 @@
---
title: "Air-gapped Environments"
description: "Setting up Talos Linux to work in environments with no internet access."
aliases:
- ../guides/air-gapped
---
In this guide we will create a Talos cluster running in an air-gapped environment with all the required images being pulled from an internal registry.
We will use the [QEMU]({{< relref "../talos-guides/install/local-platforms/qemu" >}}) provisioner available in `talosctl` to create a local cluster, but the same approach could be used to deploy Talos in bigger air-gapped networks.
## Requirements
The following are requirements for this guide:
- Docker 18.03 or greater
- Requirements for the Talos [QEMU]({{< relref "../talos-guides/install/local-platforms/qemu" >}}) cluster
## Identifying Images
In air-gapped environments, access to the public Internet is restricted, so Talos can't pull images from public Docker registries (`docker.io`, `ghcr.io`, etc.).
We need to identify the images required to install and run Talos.
The same strategy can be used for images required by custom workloads running on the cluster.
The `talosctl images` command provides a list of default images used by the Talos cluster (with default configuration
settings).
To print the list of images, run:
```bash
talosctl images
```
This list contains images required by a default deployment of Talos.
There might be additional images required for the workloads running on this cluster, and those should be added to this list.
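For example, a minimal sketch of building up such a combined list (the extra `nginx` image is just an illustration of a workload image):
```bash
# save the default Talos image list and append any extra workload images
talosctl images > images.txt
echo "docker.io/library/nginx:1.23.0" >> images.txt
```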
## Preparing the Internal Registry
As access to the public registries is restricted, we have to run an internal Docker registry.
In this guide, we will launch the registry on the same machine using Docker:
```bash
$ docker run -d -p 6000:5000 --restart always --name registry-airgapped registry:2
1bf09802bee1476bc463d972c686f90a64640d87dacce1ac8485585de69c91a5
```
This registry will be accepting connections on port 6000 on the host IPs.
The registry is empty by default, so we have to fill it with the images required by Talos.
First, we pull all the images to our local Docker daemon:
```bash
$ for image in `talosctl images`; do docker pull $image; done
v0.15.1: Pulling from coreos/flannel
Digest: sha256:9a296fbb67790659adc3701e287adde3c59803b7fcefe354f1fc482840cdb3d9
...
```
All images are now stored in the Docker daemon store:
```bash
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
gcr.io/etcd-development/etcd v3.5.3 604d4f022632 6 days ago 181MB
ghcr.io/siderolabs/install-cni v1.0.0-2-gc5d3ab0 4729e54f794d 6 days ago 76MB
...
```
Now we need to re-tag them so that we can push them to our local registry.
We are going to replace the first component of the image name (before the first slash) with our registry endpoint `127.0.0.1:6000`:
```bash
$ for image in `talosctl images`; do \
docker tag $image `echo $image | sed -E 's#^[^/]+/#127.0.0.1:6000/#'`; \
done
```
As the next step, we push images to the internal registry:
```bash
$ for image in `talosctl images`; do \
docker push `echo $image | sed -E 's#^[^/]+/#127.0.0.1:6000/#'`; \
done
```
We can now verify that the images are pushed to the registry:
```bash
$ curl http://127.0.0.1:6000/v2/_catalog
{"repositories":["coredns/coredns","coreos/flannel","etcd-development/etcd","kube-apiserver","kube-controller-manager","kube-proxy","kube-scheduler","pause","siderolabs/install-cni","siderolabs/installer","siderolabs/kubelet"]}
```
> Note: images in the registry don't have the registry endpoint prefix anymore.
## Launching Talos in an Air-gapped Environment
For Talos to use the internal registry, we use the registry mirror feature to redirect all image pull requests to the internal registry.
This means that the registry endpoint (as the first component of the image reference) gets ignored, and all pull requests are sent directly to the specified endpoint.
We are going to use a QEMU-based Talos cluster for this guide, but the same approach works with Docker-based clusters as well.
As QEMU-based clusters go through the Talos install process, they model a real air-gapped environment better.
Identify all registry prefixes from `talosctl images`, for example:
- `docker.io`
- `gcr.io`
- `ghcr.io`
- `k8s.gcr.io`
- `quay.io`
The `talosctl cluster create` command provides conveniences for common configuration options.
The only required flag for this guide is `--registry-mirror <registry>=http://10.5.0.1:6000`, which redirects every pull request to the internal registry; this flag
needs to be repeated for each of the registry prefixes identified above.
The endpoint used is `10.5.0.1`, as this is the default bridge interface address, which will be routable from the QEMU VMs (`127.0.0.1` would point to the VM itself).
```bash
$ sudo -E talosctl cluster create --provisioner=qemu --install-image=ghcr.io/siderolabs/installer:{{< release >}} \
--registry-mirror docker.io=http://10.5.0.1:6000 \
--registry-mirror gcr.io=http://10.5.0.1:6000 \
--registry-mirror ghcr.io=http://10.5.0.1:6000 \
--registry-mirror k8s.gcr.io=http://10.5.0.1:6000 \
--registry-mirror quay.io=http://10.5.0.1:6000
validating CIDR and reserving IPs
generating PKI and tokens
creating state directory in "/home/user/.talos/clusters/talos-default"
creating network talos-default
creating load balancer
creating dhcpd
creating master nodes
creating worker nodes
waiting for API
...
```
> Note: `--install-image` should match the image which was copied into the internal registry in the previous step.
You can verify that the cluster is air-gapped by inspecting the registry logs: `docker logs -f registry-airgapped`.
## Closing Notes
Running in an air-gapped environment might require additional configuration changes, for example using custom settings for DNS and NTP servers.
When scaling this guide to a bare-metal environment, the following Talos config snippet can be used as an equivalent of the `--registry-mirror` flags above:
```yaml
machine:
  ...
  registries:
    mirrors:
      docker.io:
        endpoints:
          - http://10.5.0.1:6000/
      gcr.io:
        endpoints:
          - http://10.5.0.1:6000/
      ghcr.io:
        endpoints:
          - http://10.5.0.1:6000/
      k8s.gcr.io:
        endpoints:
          - http://10.5.0.1:6000/
      quay.io:
        endpoints:
          - http://10.5.0.1:6000/
  ...
```
Other Docker registry implementations can be used in place of the `registry` image used above.
If required, authentication can be configured for the internal registry (and custom TLS certificates, if needed).
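For example, a sketch of running the same `registry:2` image with basic authentication enabled, assuming an `htpasswd` file was generated beforehand into `./auth/htpasswd` (paths and names are illustrative):
```bash
# run the internal registry with htpasswd-based basic auth
docker run -d -p 6000:5000 --restart always --name registry-airgapped \
  -v "$PWD/auth:/auth" \
  -e REGISTRY_AUTH=htpasswd \
  -e "REGISTRY_AUTH_HTPASSWD_REALM=Registry Realm" \
  -e REGISTRY_AUTH_HTPASSWD_PATH=/auth/htpasswd \
  registry:2
```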

View File

@ -0,0 +1,51 @@
---
title: "Customizing the Kernel"
description: "Guide on how to customize the kernel used by Talos Linux."
aliases:
- ../guides/customizing-the-kernel
---
The installer image contains [`ONBUILD`](https://docs.docker.com/engine/reference/builder/#onbuild) instructions that handle the following:
- the decompression and unpacking of the `initramfs.xz`
- the unsquashing of the rootfs
- the copying of new rootfs files
- the squashing of the new rootfs
- and the packing and compression of the new `initramfs.xz`
When used as a base image, the installer will perform the above steps automatically with the requirement that a `customization` stage be defined in the `Dockerfile`.
Build and push your own kernel:
```sh
git clone https://github.com/talos-systems/pkgs.git
cd pkgs
make kernel-menuconfig USERNAME=_your_github_user_name_
docker login ghcr.io --username _your_github_user_name_
make kernel USERNAME=_your_github_user_name_ PUSH=true
```
Using a multi-stage `Dockerfile` we can define the `customization` stage and build `FROM` the installer image:
```docker
FROM scratch AS customization
COPY --from=<custom kernel image> /lib/modules /lib/modules
FROM ghcr.io/siderolabs/installer:latest
COPY --from=<custom kernel image> /boot/vmlinuz /usr/install/${TARGETARCH}/vmlinuz
```
When building the image, the `customization` stage will automatically be copied into the rootfs.
The `customization` stage is not limited to a single `COPY` instruction.
In fact, you can do whatever you would like in this stage, but keep in mind that everything in `/` will be copied into the rootfs.
To build the image, run:
```bash
DOCKER_BUILDKIT=0 docker build --build-arg RM="/lib/modules" -t installer:kernel .
```
> Note: buildkit has a bug [#816](https://github.com/moby/buildkit/issues/816); to disable it, use `DOCKER_BUILDKIT=0`.
Now that we have a custom installer we can build Talos for the specific platform we wish to deploy to.

View File

@ -0,0 +1,63 @@
---
title: "Customizing the Root Filesystem"
description: "How to add your own content to the immutable root file system of Talos Linux."
aliases:
- ../guides/customizing-the-root-filesystem
---
The installer image contains [`ONBUILD`](https://docs.docker.com/engine/reference/builder/#onbuild) instructions that handle the following:
- the decompression and unpacking of the `initramfs.xz`
- the unsquashing of the rootfs
- the copying of new rootfs files
- the squashing of the new rootfs
- and the packing and compression of the new `initramfs.xz`
When used as a base image, the installer will perform the above steps automatically with the requirement that a `customization` stage be defined in the `Dockerfile`.
For example, say we have an image that contains the contents of a library we wish to add to the Talos rootfs.
We need to define a stage with the name `customization`:
```docker
FROM scratch AS customization
COPY --from=<name|index> <src> <dest>
```
Using a multi-stage `Dockerfile` we can define the `customization` stage and build `FROM` the installer image:
```docker
FROM scratch AS customization
COPY --from=<name|index> <src> <dest>
FROM ghcr.io/siderolabs/installer:latest
```
When building the image, the `customization` stage will automatically be copied into the rootfs.
The `customization` stage is not limited to a single `COPY` instruction.
In fact, you can do whatever you would like in this stage, but keep in mind that everything in `/` will be copied into the rootfs.
> Note: `<dest>` is the path, relative to the rootfs, at which you wish to place the contents of `<src>`.
To build the image, run:
```bash
docker build --squash -t <organization>/installer:latest .
```
In the case that you need to perform some cleanup _before_ adding additional files to the rootfs, you can specify the `RM` [build-time variable](https://docs.docker.com/engine/reference/commandline/build/#set-build-time-variables---build-arg):
```bash
docker build --squash --build-arg RM="[<path> ...]" -t <organization>/installer:latest .
```
This will perform a `rm -rf` on the specified paths relative to the rootfs.
> Note: `RM` must be a whitespace delimited list.
The resulting image can be used to:
- generate an image for any of the supported providers
- perform bare-metal installs
- perform upgrades (see the sketch below)
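For example, a sketch of the upgrade case, reusing the image name from the build command above (the node IP is a placeholder, and the image is assumed to have been pushed to a registry reachable by the node):
```bash
# upgrade a node in place using the customized installer image
talosctl --nodes <IP> upgrade --image <organization>/installer:latest
```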
We will step through common customizations in the remainder of this section.

View File

@ -0,0 +1,235 @@
---
title: "Developing Talos"
description: "Learn how to set up a development environment for local testing and hacking on Talos itself!"
aliases:
- ../learn-more/developing-talos
---
This guide outlines steps and tricks to develop the Talos operating system and related components.
The guide assumes a Linux operating system on the development host.
Some steps might work under macOS, but using Linux is highly advised.
## Prepare
Check out the [Talos repository](https://github.com/siderolabs/talos).
Try running `make help` to see available `make` commands.
You will need Docker and `buildx` installed on the host.
> Note: Usually it is better to install an up-to-date Docker from the Docker apt repositories, e.g. [Ubuntu instructions](https://docs.docker.com/engine/install/ubuntu/).
>
> If `buildx` plugin is not available with OS docker packages, it can be installed [as a plugin from GitHub releases](https://docs.docker.com/buildx/working-with-buildx/#install).
Set up a builder with access to the host network:
```bash
docker buildx create --driver docker-container --driver-opt network=host --name local1 --buildkitd-flags '--allow-insecure-entitlement security.insecure' --use
```
> Note: `network=host` allows buildx builder to access host network, so that it can push to a local container registry (see below).
Make sure the following steps work:
- `make talosctl`
- `make initramfs kernel`
Set up a local docker registry:
```bash
docker run -d -p 5005:5000 \
--restart always \
--name local registry:2
```
Try to build an installer image and push it to the local registry:
```bash
make installer IMAGE_REGISTRY=127.0.0.1:5005 PUSH=true
```
Record the image name output in the step above.
> Note: it is also possible to force a stable image tag by using `TAG` variable: `make installer IMAGE_REGISTRY=127.0.0.1:5005 TAG=v1.0.0-alpha.1 PUSH=true`.
## Running Talos cluster
Set up local caching docker registries (this speeds up Talos cluster boot a lot); the script is in the Talos repo:
```bash
bash hack/start-registry-proxies.sh
```
Start your local cluster with:
```bash
sudo -E _out/talosctl-linux-amd64 cluster create \
--provisioner=qemu \
--cidr=172.20.0.0/24 \
--registry-mirror docker.io=http://172.20.0.1:5000 \
--registry-mirror k8s.gcr.io=http://172.20.0.1:5001 \
--registry-mirror quay.io=http://172.20.0.1:5002 \
--registry-mirror gcr.io=http://172.20.0.1:5003 \
--registry-mirror ghcr.io=http://172.20.0.1:5004 \
--registry-mirror 127.0.0.1:5005=http://172.20.0.1:5005 \
--install-image=127.0.0.1:5005/siderolabs/installer:<RECORDED HASH from the build step> \
--masters 3 \
--workers 2 \
--with-bootloader=false
```
- `--provisioner` selects QEMU vs. default Docker
- custom `--cidr` to make QEMU cluster use different network than default Docker setup (optional)
- `--registry-mirror` uses the caching proxies set up above to speed up boot time a lot; the last one adds your local registry (the installer image was pushed to it)
- `--install-image` is the image you built with `make installer` above
- `--masters` & `--workers` configure the cluster size; choose to match your resources; 3 masters give you an HA control plane; 1 master is enough; never use 2 masters
- `--with-bootloader=false` disables boot from disk (Talos will always boot from `_out/vmlinuz-amd64` and `_out/initramfs-amd64.xz`).
This speeds up the development cycle a lot: there is no need to rebuild the installer and perform an install; rebooting is enough to get new code.
> Note: as the boot loader is not used, it's not necessary to rebuild the `installer` each time (an old image is fine), but sometimes it is needed (when configuration changes are made and the old installer doesn't validate the config).
>
> `talosctl cluster create` derives Talos machine configuration version from the install image tag, so sometimes early in the development cycle (when new minor tag is not released yet), machine config version can be overridden with `--talos-version={{< version >}}`.
If the `--with-bootloader=false` flag is not used, a Talos upgrade (with a newly built `installer` image) is required for the Talos cluster to pick up new code changes (in the `initramfs`).
With the `--with-bootloader=false` flag, Talos always boots from the `initramfs` in the `_out/` directory, so a simple reboot is enough to pick up new code changes.
If the installation flow needs to be tested, `--with-bootloader=false` shouldn't be used.
## Console Logs
Watching console logs is easy with `tail`:
```bash
tail -F ~/.talos/clusters/talos-default/talos-default-*.log
```
## Interacting with Talos
Once `talosctl cluster create` finishes successfully, `talosconfig` and `kubeconfig` will be set up automatically to point to your cluster.
Start playing with `talosctl`:
```bash
talosctl -n 172.20.0.2 version
talosctl -n 172.20.0.3,172.20.0.4 dashboard
talosctl -n 172.20.0.4 get members
```
Same with `kubectl`:
```bash
kubectl get nodes -o wide
```
You can deploy some Kubernetes workloads to the cluster.
You can edit the machine config on the fly with `talosctl edit mc --immediate`; config patches can be applied via the `--config-patch` flag, and many features have specific flags in `talosctl cluster create`.
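For example, a sketch of passing a config patch at cluster creation time (the patch value shown is only an illustration; any valid JSON patch against the generated machine configuration works):
```bash
# create a cluster with a machine config patch applied to all generated configs
sudo -E _out/talosctl-linux-amd64 cluster create \
  --provisioner=qemu \
  --config-patch '[{"op": "replace", "path": "/cluster/allowSchedulingOnMasters", "value": true}]'
```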
## Quick Reboot
To reboot whole cluster quickly (e.g. to pick up a change made in the code):
```bash
for socket in ~/.talos/clusters/talos-default/talos-default-*.monitor; do echo "q" | sudo socat - unix-connect:$socket; done
```
Sending `q` to a single socket reboots a single node.
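For example, a sketch for a single node (the socket file name is an example; pick the socket of the node you want to reboot):
```bash
# reboot only one node by sending "q" to its monitor socket
echo "q" | sudo socat - unix-connect:$HOME/.talos/clusters/talos-default/talos-default-master-1.monitor
```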
> Note: This command performs an immediate reboot (as if the machine was powered down and immediately powered back up); for a normal Talos reboot, use `talosctl reboot`.
## Development Cycle
Fast development cycle:
- bring up a cluster
- make code changes
- rebuild `initramfs` with `make initramfs`
- reboot a node to pick new `initramfs`
- verify code changes
- more code changes...
Some aspects of Talos development require enabling the bootloader (when working on the `installer` itself); in that case, the quick development cycle is no longer possible, and the cluster should be destroyed and recreated each time.
## Running Integration Tests
If integration tests were changed (or when running them for the first time), first rebuild the integration test binary:
```bash
rm -f _out/integration-test-linux-amd64; make _out/integration-test-linux-amd64
```
Running short tests against QEMU provisioned cluster:
```bash
_out/integration-test-linux-amd64 \
-talos.provisioner=qemu \
-test.v \
-talos.crashdump=false \
-test.short \
-talos.talosctlpath=$PWD/_out/talosctl-linux-amd64
```
The whole test suite can be run by removing the `-test.short` flag.
Specific tests can be run with `-test.run=TestIntegration/api.ResetSuite`.
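For example, to run just that suite with the same flags as the short-test invocation above:
```bash
# run a single integration test suite against the QEMU cluster
_out/integration-test-linux-amd64 \
  -talos.provisioner=qemu \
  -test.v \
  -talos.crashdump=false \
  -test.run=TestIntegration/api.ResetSuite \
  -talos.talosctlpath=$PWD/_out/talosctl-linux-amd64
```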
## Build Flavors
`make <something> WITH_RACE=1` enables the Go race detector; Talos runs slower and uses more memory, but data races are detected.
`make <something> WITH_DEBUG=1` enables Go profiling and other debug features, useful for local development.
## Destroying Cluster
```bash
sudo -E ../talos/_out/talosctl-linux-amd64 cluster destroy --provisioner=qemu
```
This command stops QEMU and helper processes, tears down bridged network on the host, and cleans up
cluster state in `~/.talos/clusters`.
> Note: if the host machine is rebooted, QEMU instances and helper processes won't be started back up.
> In that case, it's required to clean up files in the `~/.talos/clusters/<cluster-name>` directory manually.
## Optional
Set up cross-build environment with:
```bash
docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
```
> Note: the static qemu binaries which come with Ubuntu 21.10 seem to be broken.
## Unit tests
Unit tests can be run in buildx with `make unit-tests`; on Ubuntu systems, some tests using `loop` devices will fail because Ubuntu uses low-index `loop` devices for snaps.
Most of the unit-tests can be run standalone as well, with regular `go test`, or using IDE integration:
```bash
go test -v ./internal/pkg/circular/
```
This provides a much faster feedback loop, but some tests require either elevated privileges (running as `root`) or additional binaries available only in the Talos `rootfs` (containerd tests).
Running tests as root can be done with the `-exec` flag to `go test`, but this is risky, as the test code has root access and can potentially make undesired changes:
```bash
go test -exec sudo -v ./internal/app/machined/pkg/controllers/network/...
```
## Go Profiling
Build `initramfs` with debug enabled: `make initramfs WITH_DEBUG=1`.
Launch Talos cluster with bootloader disabled, and use `go tool pprof` to capture the profile and show the output in your browser:
```bash
go tool pprof http://172.20.0.2:9982/debug/pprof/heap
```
The IP address `172.20.0.2` is the address of the Talos node, and port `:9982` depends on the Go application to profile:
- 9981: `apid`
- 9982: `machined`
- 9983: `trustd`

View File

@ -0,0 +1,149 @@
---
title: "Disaster Recovery"
description: "Procedure for snapshotting etcd database and recovering from catastrophic control plane failure."
aliases:
- ../guides/disaster-recovery
---
The `etcd` database backs the Kubernetes control plane state, so if the `etcd` service is unavailable,
the Kubernetes control plane goes down, and the cluster is not recoverable until `etcd` is recovered with its contents.
The `etcd` consistency model builds around the consensus protocol Raft, so for highly-available control plane clusters,
loss of one control plane node doesn't impact cluster health.
In general, `etcd` stays up as long as a sufficient number of nodes to maintain quorum are up.
For a three control plane node Talos cluster, this means that the cluster tolerates a failure of any single node,
but losing more than one node at the same time leads to complete loss of service.
Because of that, it is important to take routine backups of the `etcd` state to have a snapshot to recover the cluster from
in case of catastrophic failure.
## Backup
### Snapshotting `etcd` Database
Create a consistent snapshot of `etcd` database with `talosctl etcd snapshot` command:
```bash
$ talosctl -n <IP> etcd snapshot db.snapshot
etcd snapshot saved to "db.snapshot" (2015264 bytes)
snapshot info: hash c25fd181, revision 4193, total keys 1287, total size 3035136
```
> Note: filename `db.snapshot` is arbitrary.
This database snapshot can be taken on any healthy control plane node (with IP address `<IP>` in the example above),
as all `etcd` instances contain exactly the same data.
It is recommended to configure `etcd` snapshots to be created on some schedule to allow point-in-time recovery using the latest snapshot.
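A minimal sketch of a snapshot command suitable for running from cron or any other scheduler on an operator machine that has `talosctl`, a valid `talosconfig`, and network access to the node (the node IP is a placeholder):
```bash
# take a timestamped etcd snapshot; schedule this command to run periodically
talosctl -n 172.20.0.2 etcd snapshot "db-$(date +%F-%H%M).snapshot"
```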
### Disaster Database Snapshot
If `etcd` cluster is not healthy, the `talosctl etcd snapshot` command might fail.
In that case, copy the database snapshot directly from the control plane node:
```bash
talosctl -n <IP> cp /var/lib/etcd/member/snap/db .
```
This snapshot might not be fully consistent (if the `etcd` process is running), but it allows
for disaster recovery when latest regular snapshot is not available.
### Machine Configuration
Machine configuration might be required to recover the node after hardware failure.
Back up the Talos node machine configuration with the command:
```bash
talosctl -n IP get mc v1alpha1 -o yaml | yq eval '.spec' -
```
## Recovery
Before starting a disaster recovery procedure, make sure that `etcd` cluster can't be recovered:
* get `etcd` cluster member list on all healthy control plane nodes with `talosctl -n IP etcd members` command and compare across all members.
* query `etcd` health across control plane nodes with `talosctl -n IP service etcd`.
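For example (node IPs are placeholders):
```bash
# compare etcd member lists and service health across all control plane nodes
talosctl -n <IP1>,<IP2>,<IP3> etcd members
talosctl -n <IP1>,<IP2>,<IP3> service etcd
```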
If the quorum can be restored, restoring quorum might be a better strategy than performing full disaster recovery
procedure.
### Latest Etcd Snapshot
Get hold of the latest `etcd` database snapshot.
If a snapshot is not fresh enough, create a database snapshot (see above), even if the `etcd` cluster is unhealthy.
### Init Node
Make sure that there are no control plane nodes with machine type `init`:
```bash
$ talosctl -n <IP1>,<IP2>,... get machinetype
NODE NAMESPACE TYPE ID VERSION TYPE
172.20.0.2 config MachineType machine-type 2 controlplane
172.20.0.4 config MachineType machine-type 2 controlplane
172.20.0.3 config MachineType machine-type 2 controlplane
```
Nodes with the `init` type are incompatible with the `etcd` recovery procedure.
An `init` node can be converted to the `controlplane` type with the `talosctl edit mc --mode=staged` command, followed
by a node reboot with the `talosctl reboot` command.
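For example (the node IP is a placeholder):
```bash
# change machine.type from init to controlplane in the editor, then reboot to apply the staged change
talosctl -n <IP> edit mc --mode=staged
talosctl -n <IP> reboot
```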
### Preparing Control Plane Nodes
If some control plane nodes experienced hardware failure, replace them with new nodes.
Use machine configuration backup to re-create the nodes with the same secret material and control plane settings
to allow workers to join the recovered control plane.
If a control plane node is healthy but `etcd` isn't, wipe the node's [EPHEMERAL]({{< relref "../learn-more/architecture/#file-system-partitions" >}}) partition to remove the `etcd`
data directory (make sure a database snapshot is taken before doing this):
```bash
talosctl -n <IP> reset --graceful=false --reboot --system-labels-to-wipe=EPHEMERAL
```
At this point, all control plane nodes should boot up, and `etcd` service should be in the `Preparing` state.
Kubernetes control plane endpoint should be pointed to the new control plane nodes if there were
any changes to the node addresses.
### Recovering from the Backup
Make sure all `etcd` service instances are in `Preparing` state:
```bash
$ talosctl -n <IP> service etcd
NODE 172.20.0.2
ID etcd
STATE Preparing
HEALTH ?
EVENTS [Preparing]: Running pre state (17s ago)
[Waiting]: Waiting for service "cri" to be "up", time sync (18s ago)
[Waiting]: Waiting for service "cri" to be "up", service "networkd" to be "up", time sync (20s ago)
```
Execute the bootstrap command against any control plane node passing the path to the `etcd` database snapshot:
```bash
$ talosctl -n <IP> bootstrap --recover-from=./db.snapshot
recovering from snapshot "./db.snapshot": hash c25fd181, revision 4193, total keys 1287, total size 3035136
```
> Note: if database snapshot was copied out directly from the `etcd` data directory using `talosctl cp`,
> add flag `--recover-skip-hash-check` to skip integrity check on restore.
Talos node should print matching information in the kernel log:
```log
recovering etcd from snapshot: hash c25fd181, revision 4193, total keys 1287, total size 3035136
{"level":"info","msg":"restoring snapshot","path":"/var/lib/etcd.snapshot","wal-dir":"/var/lib/etcd/member/wal","data-dir":"/var/lib/etcd","snap-dir":"/var/li}
{"level":"info","msg":"restored last compact revision","meta-bucket-name":"meta","meta-bucket-name-key":"finishedCompactRev","restored-compact-revision":3360}
{"level":"info","msg":"added member","cluster-id":"a3390e43eb5274e2","local-member-id":"0","added-peer-id":"eb4f6f534361855e","added-peer-peer-urls":["https:/}
{"level":"info","msg":"restored snapshot","path":"/var/lib/etcd.snapshot","wal-dir":"/var/lib/etcd/member/wal","data-dir":"/var/lib/etcd","snap-dir":"/var/lib/etcd/member/snap"}
```
Now the `etcd` service should become healthy on the bootstrap node, the Kubernetes control plane components
should start, and the control plane endpoint should become available.
The remaining control plane nodes join the `etcd` cluster once the control plane endpoint is up.
## Single Control Plane Node Cluster
This guide applies to single control plane clusters as well.
In fact, it is even more important to take regular snapshots of the `etcd` database in the single control plane node
case, as the loss of the control plane node might render the whole cluster irrecoverable without a backup.

View File

@ -0,0 +1,189 @@
---
title: "Extension Services"
description: "Use extension services in Talos Linux."
aliases:
- ../learn-more/extension-services
---
Talos provides a way to run additional system services early in the Talos boot process.
Extension services should be included into the Talos root filesystem (e.g. using [system extensions]({{< relref "../talos-guides/configuration/system-extensions" >}})).
Extension services run as privileged containers with ephemeral root filesystem located in the Talos root filesystem.
Extension services can be used to extend the core features of Talos in a way that is not possible via [static pods]({{< relref "../advanced/static-pods" >}}) or
Kubernetes DaemonSets.
Potential extension services use-cases:
* storage: Open iSCSI, software RAID, etc.
* networking: BGP FRR, etc.
* platform integration: VMWare open VM tools, etc.
## Configuration
On boot, Talos scans the directory `/usr/local/etc/containers` for `*.yaml` files describing the extension services to run.
Format of the extension service config:
```yaml
name: hello-world
container:
  entrypoint: ./hello-world
  args:
    - -f
  mounts:
    - # OCI Mount Spec
depends:
  - service: cri
  - path: /run/machined/machined.sock
  - network:
      - addresses
      - connectivity
      - hostname
      - etcfiles
  - time: true
restart: never|always|untilSuccess
```
### `name`
Field `name` sets the service name, valid names are `[a-z0-9-_]+`.
The service container root filesystem path is derived from the `name`: `/usr/local/lib/containers/<name>`.
The extension service will be registered as a Talos service under an `ext-<name>` identifier.
### `container`
* `entrypoint` defines the container entrypoint relative to the container root filesystem (`/usr/local/lib/containers/<name>`)
* `args` defines the additional arguments to pass to the entrypoint
* `mounts` defines the volumes to be mounted into the container root
#### `container.mounts`
The section `mounts` uses the standard OCI spec:
```yaml
- source: /var/log/audit
  destination: /var/log/audit
  type: bind
  options:
    - rshared
    - bind
    - ro
```
All requested directories will be mounted into the extension service container mount namespace.
If the `source` directory doesn't exist in the host filesystem, it will be created (only for writable paths in the Talos root filesystem).
#### `container.security`
The section `security` follows this example:
```yaml
maskedPaths:
  - "/should/be/masked"
readonlyPaths:
  - "/path/that/should/be/readonly"
  - "/another/readonly/path"
writeableRootfs: true
writeableSysfs: true
```
> * The rootfs is read-only by default unless `writeableRootfs: true` is set.
> * The sysfs is read-only by default unless `writeableSysfs: true` is set.
> * Masked paths, if not set, default to the [containerd defaults](https://github.com/containerd/containerd/tree/main/oci/spec.go).
>   Masked paths will be mounted to `/dev/null`.
>   To set empty masked paths use:
>
>   ```yaml
>   container:
>     security:
>       maskedPaths: []
>   ```
>
> * Read-only paths, if not set, default to the [containerd defaults](https://github.com/containerd/containerd/tree/main/oci/spec.go).
>   Read-only paths will be mounted as read-only.
>   To set empty read-only paths use:
>
>   ```yaml
>   container:
>     security:
>       readonlyPaths: []
>   ```
### `depends`
The `depends` section describes extension service start dependencies: the service will not be started until all dependencies are met.
Available dependencies:
* `service: <name>`: wait for the service `<name>` to be running and healthy
* `path: <path>`: wait for the `<path>` to exist
* `network: [addresses, connectivity, hostname, etcfiles]`: wait for the specified network readiness checks to succeed
* `time: true`: wait for the NTP time sync
### `restart`
Field `restart` defines the service restart policy; it allows configuring either an always-running service or a one-shot service:
* `always`: restart service always
* `never`: start service only once and never restart
* `untilSuccess`: restart failing service, stop restarting on successful run
## Example
Example layout of the Talos root filesystem contents for the extension service:
```text
/
└── usr
    └── local
        ├── etc
        │   └── containers
        │       └── hello-world.yaml
        └── lib
            └── containers
                └── hello-world
                    ├── hello
                    └── config.ini
```
Talos discovers the extension service configuration in `/usr/local/etc/containers/hello-world.yaml`:
```yaml
name: hello-world
container:
  entrypoint: ./hello
  args:
    - --config
    - config.ini
depends:
  - network:
      - addresses
restart: always
```
Talos starts the container for the extension service with container root filesystem at `/usr/local/lib/containers/hello-world`:
```text
/
├── hello
└── config.ini
```
Extension service is registered as `ext-hello-world` in `talosctl services`:
```shell
$ talosctl service ext-hello-world
NODE 172.20.0.5
ID ext-hello-world
STATE Running
HEALTH ?
EVENTS [Running]: Started task ext-hello-world (PID 1100) for container ext-hello-world (2m47s ago)
[Preparing]: Creating service runner (2m47s ago)
[Preparing]: Running pre state (2m47s ago)
[Waiting]: Waiting for service "containerd" to be "up" (2m48s ago)
[Waiting]: Waiting for service "containerd" to be "up", network (2m49s ago)
```
An extension service can be started, restarted and stopped using `talosctl service ext-hello-world start|restart|stop`.
Use `talosctl logs ext-hello-world` to get the logs of the service.
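For example, with the `hello-world` service from above (the node IP matches the output shown):
```bash
# restart the extension service and follow its logs
talosctl -n 172.20.0.5 service ext-hello-world restart
talosctl -n 172.20.0.5 logs -f ext-hello-world
```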
Complete example of the extension service can be found in the [extensions repository](https://github.com/talos-systems/extensions/tree/main/examples/hello-world-service).

View File

@ -0,0 +1,65 @@
---
title: "Proprietary Kernel Modules"
description: "Adding a proprietary kernel module to Talos Linux"
aliases:
- ../guides/adding-a-proprietary-kernel-module
---
1. Patching and building the kernel image
1. Clone the `pkgs` repository from GitHub and check out the revision corresponding to your version of Talos Linux
```bash
git clone https://github.com/talos-systems/pkgs pkgs && cd pkgs
git checkout v0.8.0
```
2. Clone the Linux kernel and check out the revision that pkgs uses (this can be found in `kernel/kernel-prepare/pkg.yaml` and it will be something like the following: `https://cdn.kernel.org/pub/linux/kernel/v5.x/linux-x.xx.x.tar.xz`)
```bash
git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git && cd linux
git checkout v5.15
```
3. Your module will need to be converted to be in-tree.
The steps for this are different depending on the complexity of the module to port, but generally it would involve moving the module source code into the `drivers` tree and creating a new Makefile and Kconfig.
4. Stage your changes in Git with `git add -A`.
5. Run `git diff --cached --no-prefix > foobar.patch` to generate a patch from your changes.
6. Copy this patch to `kernel/kernel/patches` in the `pkgs` repo.
7. Add a `patch` line in the `prepare` segment of `kernel/kernel/pkg.yaml`:
```bash
patch -p0 < /pkg/patches/foobar.patch
```
8. Build the kernel image.
Make sure you are logged in to `ghcr.io` before running this command, and you can change or omit `PLATFORM` depending on what you want to target.
```bash
make kernel PLATFORM=linux/amd64 USERNAME=your-username PUSH=true
```
9. Make a note of the image name the `make` command outputs.
2. Building the installer image
1. Copy the following into a new `Dockerfile`:
```dockerfile
FROM scratch AS customization
COPY --from=ghcr.io/your-username/kernel:<kernel version> /lib/modules /lib/modules
FROM ghcr.io/siderolabs/installer:<talos version>
COPY --from=ghcr.io/your-username/kernel:<kernel version> /boot/vmlinuz /usr/install/${TARGETARCH}/vmlinuz
```
2. Run the following to build and push the installer:
```bash
INSTALLER_VERSION=<talos version>
IMAGE_NAME="ghcr.io/your-username/talos-installer:$INSTALLER_VERSION"
DOCKER_BUILDKIT=0 docker build --build-arg RM="/lib/modules" -t "$IMAGE_NAME" . && docker push "$IMAGE_NAME"
```
3. Deploying to your cluster
```bash
talosctl upgrade --image ghcr.io/your-username/talos-installer:<talos version> --preserve=true
```

View File

@ -0,0 +1,100 @@
---
title: "Static Pods"
description: "Using Talos Linux to set up static pods in Kubernetes."
aliases:
- ../guides/static-pods
---
## Static Pods
Static pods are run directly by the `kubelet`, bypassing the Kubernetes API server checks and validations.
Most of the time a `DaemonSet` is a better alternative to static pods, but some workloads need to run
before the Kubernetes API server is available or might need to bypass security restrictions imposed by the API server.
See [Kubernetes documentation](https://kubernetes.io/docs/tasks/configure-pod-container/static-pod/) for more information on static pods.
## Configuration
Static pod definitions are specified in the Talos machine configuration:
```yaml
machine:
  pods:
    - apiVersion: v1
      kind: Pod
      metadata:
        name: nginx
      spec:
        containers:
          - name: nginx
            image: nginx
```
Talos renders static pod definitions to the `kubelet` manifest directory (`/etc/kubernetes/manifests`), `kubelet` picks up the definition and launches the pod.
Talos accepts changes to the static pod configuration without a reboot.
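For example, a sketch of updating the static pod list on a running node (the node IP is a placeholder); the kubelet picks up the rendered manifest change without a reboot:
```bash
# open the machine config in an editor, modify the machine.pods section, and apply immediately
talosctl -n <IP> edit mc --immediate
```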
## Usage
The kubelet mirrors the pod definition to the API server state, so static pods can be inspected with `kubectl get pods`, logs can be retrieved with `kubectl logs`, etc.
```bash
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-talos-default-master-2 1/1 Running 0 17s
```
If the API server is not available, status of the static pod can also be inspected with `talosctl containers --kubernetes`:
```bash
$ talosctl containers --kubernetes
NODE NAMESPACE ID IMAGE PID STATUS
172.20.0.3 k8s.io default/nginx-talos-default-master-2 k8s.gcr.io/pause:3.6 4886 SANDBOX_READY
172.20.0.3 k8s.io └─ default/nginx-talos-default-master-2:nginx docker.io/library/nginx:latest
...
```
Logs of static pods can be retrieved with `talosctl logs --kubernetes`:
```bash
$ talosctl logs --kubernetes default/nginx-talos-default-master-2:nginx
172.20.0.3: 2022-02-10T15:26:01.289208227Z stderr F 2022/02/10 15:26:01 [notice] 1#1: using the "epoll" event method
172.20.0.3: 2022-02-10T15:26:01.2892466Z stderr F 2022/02/10 15:26:01 [notice] 1#1: nginx/1.21.6
172.20.0.3: 2022-02-10T15:26:01.28925723Z stderr F 2022/02/10 15:26:01 [notice] 1#1: built by gcc 10.2.1 20210110 (Debian 10.2.1-6)
```
## Troubleshooting
Talos doesn't perform any validation on the static pod definitions.
If the pod isn't running, use `kubelet` logs (`talosctl logs kubelet`) to find the problem:
```bash
$ talosctl logs kubelet
172.20.0.2: {"ts":1644505520281.427,"caller":"config/file.go:187","msg":"Could not process manifest file","path":"/etc/kubernetes/manifests/talos-default-nginx-gvisor.yaml","err":"invalid pod: [spec.containers: Required value]"}
```
## Resource Definitions
Static pod definitions are available as `StaticPod` resources combined with Talos-generated control plane static pods:
```bash
$ talosctl get staticpods
NODE NAMESPACE TYPE ID VERSION
172.20.0.3 k8s StaticPod default-nginx 1
172.20.0.3 k8s StaticPod kube-apiserver 1
172.20.0.3 k8s StaticPod kube-controller-manager 1
172.20.0.3 k8s StaticPod kube-scheduler 1
```
Talos assigns the ID `<namespace>-<name>` to the static pods specified in the machine configuration.
On control plane nodes, the status of the running static pods is available in the `StaticPodStatus` resource:
```bash
$ talosctl get staticpodstatus
NODE NAMESPACE TYPE ID VERSION READY
172.20.0.3 k8s StaticPodStatus default/nginx-talos-default-master-2 2 True
172.20.0.3 k8s StaticPodStatus kube-system/kube-apiserver-talos-default-master-2 2 True
172.20.0.3 k8s StaticPodStatus kube-system/kube-controller-manager-talos-default-master-2 3 True
172.20.0.3 k8s StaticPodStatus kube-system/kube-scheduler-talos-default-master-2 3 True
```

View File

@ -0,0 +1,487 @@
---
title: "Troubleshooting Control Plane"
description: "Troubleshoot control plane failures for running cluster and bootstrap process."
aliases:
- ../guides/troubleshooting-control-plane
---
<!-- markdownlint-disable MD026 -->
This guide is written as a series of topics and detailed answers for each topic.
It starts with the basics of the control plane and goes into Talos specifics.
In this guide, we assume that the Talos client config is available and Talos API access works.
Kubernetes client configuration can be pulled from control plane nodes with `talosctl -n <IP> kubeconfig`
(this command works before Kubernetes is fully booted).
### What is a control plane node?
A control plane node is a node which:
- runs etcd, the Kubernetes database
- runs the Kubernetes control plane
- kube-apiserver
- kube-controller-manager
- kube-scheduler
- serves as an administrative proxy to the worker nodes
These nodes are critical to the operation of your cluster.
Without control plane nodes, Kubernetes will not respond to changes in the
system, and certain central services may not be available.
Talos nodes which have `.machine.type` of `controlplane` are control plane nodes.
Control plane nodes are tainted by default to prevent workloads from being scheduled to control plane nodes.
### How many control plane nodes should be deployed?
Because control plane nodes are so important, it is important that they be
deployed with redundancy to ensure consistent, reliable operation of the cluster
during upgrades, reboots, hardware failures, and other such events.
This is also known as high-availability or just HA.
Non-HA clusters are sometimes used as test clusters, CI clusters, or in specific scenarios
which warrant the loss of redundancy, but they should almost never be used in production.
Maintaining the proper count of control plane nodes is also critical.
The etcd database operates on the principles of membership and quorum, so
membership should always be an odd number, and there is exponentially-increasing
overhead for each additional member.
Therefore, the number of control plane nodes should almost always be 3.
In some particularly large or distributed clusters, the count may be 5, but this
is very rare.
See [this document]({{< relref "../learn-more/concepts#control-planes-are-not-linear-replicas" >}}) on the topic for more information.
### What is the control plane endpoint?
The Kubernetes control plane endpoint is the single canonical URL by which the
Kubernetes API is accessed.
Especially with high-availability (HA) control planes, it is common that this endpoint may not point to the Kubernetes API server
directly, but may instead point to a load balancer or a DNS name which may
have multiple `A` and `AAAA` records.
Like Talos' own API, the Kubernetes API is constructed with mutual TLS, client
certs, and a common Certificate Authority (CA).
Unlike general-purpose websites, there is no need for an upstream CA, so tools
such as cert-manager, services such as Let's Encrypt, or purchased products such
as validated TLS certificates are not required.
Encryption, however, _is_ required, and hence the URL scheme will always be `https://`.
By default, the Kubernetes API server in Talos runs on port 6443.
As such, the control plane endpoint URLs for Talos will almost always be of the form
`https://endpoint:6443`, noting that the port, since it is not the `https`
default of `443`, is _required_.
The `endpoint` above may be a DNS name or IP address, but it should
ultimately be directed to the _set_ of all control plane nodes, as opposed to a
single one.
As mentioned above, this can be achieved by a number of strategies, including:
- an external load balancer
- DNS records
- Talos-builtin shared IP ([VIP]({{< relref "../talos-guides/network/vip" >}}))
- BGP peering of a shared IP (such as with [kube-vip](https://kube-vip.io))
Using a DNS name here is usually a good idea, it being the most flexible
option, since it allows the combination with any _other_ option, while offering
a layer of abstraction.
It allows the underlying IP addresses to change over time without impacting the
canonical URL.
Unlike most services in Kubernetes, the API server runs with host networking,
meaning that it shares the network namespace with the host.
This means you can use the IP address(es) of the host to refer to the Kubernetes
API server.
For availability of the API, it is important that any load balancer be aware of
the health of the backend API servers.
This makes a load balancer-based system valuable to minimize disruptions during
common node lifecycle operations like reboots and upgrades.
It is critical that the control plane endpoint works correctly during the cluster bootstrap phase, as nodes discover
each other using the control plane endpoint.
### kubelet is not running on control plane node
The `kubelet` service should be running on control plane nodes as soon as networking is configured:
```bash
$ talosctl -n <IP> service kubelet
NODE 172.20.0.2
ID kubelet
STATE Running
HEALTH OK
EVENTS [Running]: Health check successful (2m54s ago)
[Running]: Health check failed: Get "http://127.0.0.1:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused (3m4s ago)
[Running]: Started task kubelet (PID 2334) for container kubelet (3m6s ago)
[Preparing]: Creating service runner (3m6s ago)
[Preparing]: Running pre state (3m15s ago)
[Waiting]: Waiting for service "timed" to be "up" (3m15s ago)
[Waiting]: Waiting for service "cri" to be "up", service "timed" to be "up" (3m16s ago)
[Waiting]: Waiting for service "cri" to be "up", service "networkd" to be "up", service "timed" to be "up" (3m18s ago)
```
If the `kubelet` is not running, it may be due to invalid configuration.
Check `kubelet` logs with the `talosctl logs` command:
```bash
$ talosctl -n <IP> logs kubelet
172.20.0.2: I0305 20:45:07.756948 2334 controller.go:101] kubelet config controller: starting controller
172.20.0.2: I0305 20:45:07.756995 2334 controller.go:267] kubelet config controller: ensuring filesystem is set up correctly
172.20.0.2: I0305 20:45:07.757000 2334 fsstore.go:59] kubelet config controller: initializing config checkpoints directory "/etc/kubernetes/kubelet/store"
```
### etcd is not running
By far the most likely cause of `etcd` not running is because the cluster has
not yet been bootstrapped or because bootstrapping is currently in progress.
The `talosctl bootstrap` command must be run manually and only _once_ per
cluster, and this step is commonly missed.
Once a node is bootstrapped, it will start `etcd` and, over the course of a
minute or two (depending on the download speed of the control plane nodes), the
other control plane nodes should discover it and join themselves to the cluster.
Also, `etcd` will only run on control plane nodes.
If a node is designated as a worker node, you should not expect `etcd` to be
running on it.
When a node boots for the first time, the `etcd` data directory (`/var/lib/etcd`) is empty, and it will only be populated when `etcd` is launched.
If `etcd` is not running, check service `etcd` state:
```bash
$ talosctl -n <IP> service etcd
NODE 172.20.0.2
ID etcd
STATE Running
HEALTH OK
EVENTS [Running]: Health check successful (3m21s ago)
[Running]: Started task etcd (PID 2343) for container etcd (3m26s ago)
[Preparing]: Creating service runner (3m26s ago)
[Preparing]: Running pre state (3m26s ago)
[Waiting]: Waiting for service "cri" to be "up", service "networkd" to be "up", service "timed" to be "up" (3m26s ago)
```
If the service is stuck in the `Preparing` state on the bootstrap node, it might be related to a slow network: at this stage
Talos pulls the `etcd` image from the container registry.
If the `etcd` service is crashing and restarting, check its logs with `talosctl -n <IP> logs etcd`.
The most common reasons for crashes are:
- wrong arguments passed via `extraArgs` in the configuration;
- booting Talos on a non-empty disk with a previous Talos installation, where `/var/lib/etcd` contains data from the old cluster.
### etcd is not running on non-bootstrap control plane node
The `etcd` service on control plane nodes which were not the target of the cluster bootstrap will wait until the bootstrapped control plane node has completed.
The bootstrap and discovery processes may take a few minutes to complete.
As soon as the bootstrapped node starts its Kubernetes control plane components, `kubectl get endpoints` will return the IP of bootstrapped control plane node.
At this point, the other control plane nodes will start their `etcd` services, join the cluster, and then start their own Kubernetes control plane components.
### Kubernetes static pod definitions are not generated
Talos should write the static pod definitions for the Kubernetes control plane
in `/etc/kubernetes/manifests`:
```bash
$ talosctl -n <IP> ls /etc/kubernetes/manifests
NODE NAME
172.20.0.2 .
172.20.0.2 talos-kube-apiserver.yaml
172.20.0.2 talos-kube-controller-manager.yaml
172.20.0.2 talos-kube-scheduler.yaml
```
If the static pod definitions are not rendered, check `etcd` and `kubelet` service health (see above)
and the controller runtime logs (`talosctl logs controller-runtime`).
### Talos prints error `an error on the server ("") has prevented the request from succeeding`
This is expected during initial cluster bootstrap and sometimes after a reboot:
```bash
[ 70.093289] [talos] task labelNodeAsMaster (1/1): starting
[ 80.094038] [talos] retrying error: an error on the server ("") has prevented the request from succeeding (get nodes talos-default-master-1)
```
Initially, the `kube-apiserver` component is not running yet, and it takes some time before it becomes fully up
during bootstrap (the image has to be pulled from the Internet, etc.).
Once the control plane endpoint is up, Talos should continue with its boot
process.
If Talos doesn't proceed, it may be due to a configuration issue.
In any case, the status of the control plane components on each control plane nodes can be checked with `talosctl containers -k`:
```bash
$ talosctl -n <IP> containers --kubernetes
NODE NAMESPACE ID IMAGE PID STATUS
172.20.0.2 k8s.io kube-system/kube-apiserver-talos-default-master-1 k8s.gcr.io/pause:3.2 2539 SANDBOX_READY
172.20.0.2 k8s.io └─ kube-system/kube-apiserver-talos-default-master-1:kube-apiserver k8s.gcr.io/kube-apiserver:v{{< k8s_release >}} 2572 CONTAINER_RUNNING
```
If `kube-apiserver` shows as `CONTAINER_EXITED`, it might have exited due to configuration error.
Logs can be checked with `talosctl logs --kubernetes` (or with `-k` as a shorthand):
```bash
$ talosctl -n <IP> logs -k kube-system/kube-apiserver-talos-default-master-1:kube-apiserver
172.20.0.2: 2021-03-05T20:46:13.133902064Z stderr F 2021/03/05 20:46:13 Running command:
172.20.0.2: 2021-03-05T20:46:13.133933824Z stderr F Command env: (log-file=, also-stdout=false, redirect-stderr=true)
172.20.0.2: 2021-03-05T20:46:13.133938524Z stderr F Run from directory:
172.20.0.2: 2021-03-05T20:46:13.13394154Z stderr F Executable path: /usr/local/bin/kube-apiserver
...
```
### Talos prints error `nodes "talos-default-master-1" not found`
This error means that `kube-apiserver` is up and the control plane endpoint is healthy, but the `kubelet` hasn't received
its client certificate yet, and it wasn't able to register itself to Kubernetes.
The Kubernetes controller manager (`kube-controller-manager`) is responsible for monitoring the certificate
signing requests (CSRs) and issuing certificates for each of them.
The kubelet is responsible for generating and submitting the CSRs for its
associated node.
For the `kubelet` to get its client certificate, then, the Kubernetes control plane
must be healthy:
- the API server is running and available at the Kubernetes control plane
endpoint URL
- the controller manager is running and a leader has been elected
The states of any CSRs can be checked with `kubectl get csr`:
```bash
$ kubectl get csr
NAME AGE SIGNERNAME REQUESTOR CONDITION
csr-jcn9j 14m kubernetes.io/kube-apiserver-client-kubelet system:bootstrap:q9pyzr Approved,Issued
csr-p6b9q 14m kubernetes.io/kube-apiserver-client-kubelet system:bootstrap:q9pyzr Approved,Issued
csr-sw6rm 14m kubernetes.io/kube-apiserver-client-kubelet system:bootstrap:q9pyzr Approved,Issued
csr-vlghg 14m kubernetes.io/kube-apiserver-client-kubelet system:bootstrap:q9pyzr Approved,Issued
```
### Talos prints error `node not ready`
A Node in Kubernetes is marked as `Ready` only once its CNI is up.
It takes a minute or two for the CNI images to be pulled and for the CNI to start.
If the node is stuck in this state for too long, check CNI pods and logs with `kubectl`.
Usually, CNI-related resources are created in `kube-system` namespace.
For example, for Talos default Flannel CNI:
```bash
$ kubectl -n kube-system get pods
NAME READY STATUS RESTARTS AGE
...
kube-flannel-25drx 1/1 Running 0 23m
kube-flannel-8lmb6 1/1 Running 0 23m
kube-flannel-gl7nx 1/1 Running 0 23m
kube-flannel-jknt9 1/1 Running 0 23m
...
```
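If any of the CNI pods are not `Running`, their logs usually point at the root cause (the pod name below is taken from the example listing above):

```bash
# Inspect a misbehaving CNI pod; events from `describe` often show image
# pull or scheduling problems, while logs show runtime errors.
kubectl -n kube-system describe pod kube-flannel-25drx
kubectl -n kube-system logs kube-flannel-25drx
```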
### Talos prints error `x509: certificate signed by unknown authority`
The full error might look like:
```bash
x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
```
Usually, this occurs because the control plane endpoint points to a different
cluster than the client certificate was generated for.
If a node was recycled between clusters, make sure it was properly wiped between
uses.
If your client has multiple configurations, make sure you are using the `talosconfig` that matches the
cluster you are talking to.
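As a minimal sketch, the client contexts can be listed to confirm the right configuration is selected, and a node that is being moved between clusters can be wiped before reuse (the node IP is an example):

```bash
# List talosctl client contexts and check which one is currently active.
talosctl config contexts

# Wipe a recycled node so that the old cluster's PKI and etcd data do not
# leak into the new cluster (this is destructive).
talosctl -n 172.20.0.2 reset --graceful=false --reboot
```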
### etcd is running on bootstrap node, but stuck in `pre` state on non-bootstrap nodes
Please see question `etcd is not running on non-bootstrap control plane node`.
### Checking `kube-controller-manager` and `kube-scheduler`
If the control plane endpoint is up, the status of the pods can be ascertained with `kubectl`:
```bash
$ kubectl get pods -n kube-system -l k8s-app=kube-controller-manager
NAME READY STATUS RESTARTS AGE
kube-controller-manager-talos-default-master-1 1/1 Running 0 28m
kube-controller-manager-talos-default-master-2 1/1 Running 0 28m
kube-controller-manager-talos-default-master-3 1/1 Running 0 28m
```
If the control plane endpoint is not yet up, the container status of the control plane components can be queried with
`talosctl containers --kubernetes`:
```bash
$ talosctl -n <IP> c -k
NODE NAMESPACE ID IMAGE PID STATUS
...
172.20.0.2 k8s.io kube-system/kube-controller-manager-talos-default-master-1 k8s.gcr.io/pause:3.2 2547 SANDBOX_READY
172.20.0.2 k8s.io └─ kube-system/kube-controller-manager-talos-default-master-1:kube-controller-manager k8s.gcr.io/kube-controller-manager:v{{< k8s_release >}} 2580 CONTAINER_RUNNING
172.20.0.2 k8s.io kube-system/kube-scheduler-talos-default-master-1 k8s.gcr.io/pause:3.2 2638 SANDBOX_READY
172.20.0.2 k8s.io └─ kube-system/kube-scheduler-talos-default-master-1:kube-scheduler k8s.gcr.io/kube-scheduler:v{{< k8s_release >}} 2670 CONTAINER_RUNNING
...
```
If some of the containers are not running, it could be that the image is still being pulled.
Otherwise, the process might be crashing.
The logs can be checked with `talosctl logs --kubernetes <containerID>`:
```bash
$ talosctl -n <IP> logs -k kube-system/kube-controller-manager-talos-default-master-1:kube-controller-manager
172.20.0.3: 2021-03-09T13:59:34.291667526Z stderr F 2021/03/09 13:59:34 Running command:
172.20.0.3: 2021-03-09T13:59:34.291702262Z stderr F Command env: (log-file=, also-stdout=false, redirect-stderr=true)
172.20.0.3: 2021-03-09T13:59:34.291707121Z stderr F Run from directory:
172.20.0.3: 2021-03-09T13:59:34.291710908Z stderr F Executable path: /usr/local/bin/kube-controller-manager
172.20.0.3: 2021-03-09T13:59:34.291719163Z stderr F Args (comma-delimited): /usr/local/bin/kube-controller-manager,--allocate-node-cidrs=true,--cloud-provider=,--cluster-cidr=10.244.0.0/16,--service-cluster-ip-range=10.96.0.0/12,--cluster-signing-cert-file=/system/secrets/kubernetes/kube-controller-manager/ca.crt,--cluster-signing-key-file=/system/secrets/kubernetes/kube-controller-manager/ca.key,--configure-cloud-routes=false,--kubeconfig=/system/secrets/kubernetes/kube-controller-manager/kubeconfig,--leader-elect=true,--root-ca-file=/system/secrets/kubernetes/kube-controller-manager/ca.crt,--service-account-private-key-file=/system/secrets/kubernetes/kube-controller-manager/service-account.key,--profiling=false
172.20.0.3: 2021-03-09T13:59:34.293870359Z stderr F 2021/03/09 13:59:34 Now listening for interrupts
172.20.0.3: 2021-03-09T13:59:34.761113762Z stdout F I0309 13:59:34.760982 10 serving.go:331] Generated self-signed cert in-memory
...
```
### Checking controller runtime logs
Talos runs a set of controllers which operate on resources to build and support the Kubernetes control plane.
Some debugging information can be queried from the controller logs with `talosctl logs controller-runtime`:
```bash
$ talosctl -n <IP> logs controller-runtime
172.20.0.2: 2021/03/09 13:57:11 secrets.EtcdController: controller starting
172.20.0.2: 2021/03/09 13:57:11 config.MachineTypeController: controller starting
172.20.0.2: 2021/03/09 13:57:11 k8s.ManifestApplyController: controller starting
172.20.0.2: 2021/03/09 13:57:11 v1alpha1.BootstrapStatusController: controller starting
172.20.0.2: 2021/03/09 13:57:11 v1alpha1.TimeStatusController: controller starting
...
```
Controllers continuously run a reconcile loop, so at any time, they may be starting, failing, or restarting.
This is expected behavior.
Things to look for:
- `v1alpha1.BootstrapStatusController: bootkube initialized status not found`: the control plane is not self-hosted, running with static pods.
- `k8s.KubeletStaticPodController: writing static pod "/etc/kubernetes/manifests/talos-kube-apiserver.yaml"`: static pod definitions were rendered successfully.
- `k8s.ManifestApplyController: controller failed: error creating mapping for object /v1/Secret/bootstrap-token-q9pyzr: an error on the server ("") has prevented the request from succeeding`: the control plane endpoint is not up yet, so bootstrap manifests can't be injected; the controller is going to retry.
- `k8s.KubeletStaticPodController: controller failed: error refreshing pod status: error fetching pod status: an error on the server ("Authorization error (user=apiserver-kubelet-client, verb=get, resource=nodes, subresource=proxy)") has prevented the request from succeeding`: the kubelet hasn't been able to contact `kube-apiserver` yet to push pod status; the controller is going to retry.
- `k8s.ManifestApplyController: created rbac.authorization.k8s.io/v1/ClusterRole/psp:privileged`: one of the bootstrap manifests was applied successfully.
- `secrets.KubernetesController: controller failed: missing cluster.aggregatorCA secret`: Talos is running with a 0.8 configuration; if the cluster was upgraded from 0.8, this is expected, and the conversion process will fix the machine config automatically. If this cluster was bootstrapped with version 0.9, the machine configuration should be regenerated with the 0.9 `talosctl`.
If there are no new messages in the `controller-runtime` log, it means that the controllers have successfully finished reconciling, and that the current system state is the desired system state.
### Checking static pod definitions
Talos generates static pod definitions for the `kube-apiserver`, `kube-controller-manager`, and `kube-scheduler`
components based on its machine configuration.
These definitions can be checked as resources with `talosctl get staticpods`:
```bash
$ talosctl -n <IP> get staticpods -o yaml
node: 172.20.0.2
metadata:
namespace: controlplane
type: StaticPods.kubernetes.talos.dev
id: kube-apiserver
version: 2
phase: running
finalizers:
- k8s.StaticPodStatus("kube-apiserver")
spec:
apiVersion: v1
kind: Pod
metadata:
annotations:
talos.dev/config-version: "1"
talos.dev/secrets-version: "1"
creationTimestamp: null
labels:
k8s-app: kube-apiserver
tier: control-plane
name: kube-apiserver
namespace: kube-system
...
```
The status of the static pods can be queried with `talosctl get staticpodstatus`:
```bash
$ talosctl -n <IP> get staticpodstatus
NODE NAMESPACE TYPE ID VERSION READY
172.20.0.2 controlplane StaticPodStatus kube-system/kube-apiserver-talos-default-master-1 1 True
172.20.0.2 controlplane StaticPodStatus kube-system/kube-controller-manager-talos-default-master-1 1 True
172.20.0.2 controlplane StaticPodStatus kube-system/kube-scheduler-talos-default-master-1 1 True
```
The most important status field is `READY`, which is the last column printed.
The complete status can be fetched by adding the `-o yaml` flag.
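For example, to inspect the full status of a single static pod (the ID is taken from the listing above):

```bash
talosctl -n <IP> get staticpodstatus kube-system/kube-apiserver-talos-default-master-1 -o yaml
```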
### Checking bootstrap manifests
As part of the bootstrap process, Talos injects bootstrap manifests into the Kubernetes API server.
There are two kinds of these manifests: system manifests built into Talos, and extra manifests downloaded from URLs (a custom CNI, extra manifests listed in the machine config):
```bash
$ talosctl -n <IP> get manifests
NODE NAMESPACE TYPE ID VERSION
172.20.0.2 controlplane Manifest 00-kubelet-bootstrapping-token 1
172.20.0.2 controlplane Manifest 01-csr-approver-role-binding 1
172.20.0.2 controlplane Manifest 01-csr-node-bootstrap 1
172.20.0.2 controlplane Manifest 01-csr-renewal-role-binding 1
172.20.0.2 controlplane Manifest 02-kube-system-sa-role-binding 1
172.20.0.2 controlplane Manifest 03-default-pod-security-policy 1
172.20.0.2 controlplane Manifest 05-https://docs.projectcalico.org/manifests/calico.yaml 1
172.20.0.2 controlplane Manifest 10-kube-proxy 1
172.20.0.2 controlplane Manifest 11-core-dns 1
172.20.0.2 controlplane Manifest 11-core-dns-svc 1
172.20.0.2 controlplane Manifest 11-kube-config-in-cluster 1
```
Details of each manifest can be queried by adding `-o yaml`:
```bash
$ talosctl -n <IP> get manifests 01-csr-approver-role-binding --namespace=controlplane -o yaml
node: 172.20.0.2
metadata:
namespace: controlplane
type: Manifests.kubernetes.talos.dev
id: 01-csr-approver-role-binding
version: 1
phase: running
spec:
- apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: system-bootstrap-approve-node-client-csr
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: system:certificates.k8s.io:certificatesigningrequests:nodeclient
subjects:
- apiGroup: rbac.authorization.k8s.io
kind: Group
name: system:bootstrappers
```
### Worker node is stuck with `apid` health check failures
Control plane nodes have enough secret material to generate `apid` server certificates, but worker nodes
depend on control plane `trustd` services to generate certificates.
Worker nodes wait for their `kubelet` to join the cluster.
Then the Talos `apid` queries the Kubernetes endpoints via control plane
endpoint to find `trustd` endpoints.
They then use `trustd` to request and receive their certificate.
So, if `apid` health checks are failing on a worker node (see the example checks below):
- make sure the control plane endpoint is healthy
- check that the worker node `kubelet` has joined the cluster
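A minimal sketch of those checks, assuming `<endpoint>` is the Kubernetes control plane endpoint and `<controlplane-IP>` is any control plane node:

```bash
# The control plane endpoint should answer on port 6443 (the /version path
# is usually readable without authentication).
curl -k https://<endpoint>:6443/version

# The worker should appear in the node list once its kubelet has joined.
kubectl get nodes

# trustd must be healthy on the control plane nodes to issue worker
# certificates.
talosctl -n <controlplane-IP> service trustd
```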

View File

@ -0,0 +1,4 @@
---
title: "Introduction"
weight: 10
---

View File

@ -0,0 +1,463 @@
---
title: Getting Started
weight: 30
description: "A guide to setting up a Talos Linux cluster on multiple machines."
---
This document will walk you through installing a full Talos Cluster.
You may wish to try the [Quickstart]({{< relref "quickstart" >}}) first, to quickly create a local virtual cluster on your workstation.
Regardless of where you run Talos, there is a pattern to deploying it.
In general you need to:
- acquire the installation image
- decide on the endpoint for Kubernetes
- optionally create a load balancer
- configure Talos
- configure `talosctl`
- bootstrap Kubernetes
## Prerequisites
### `talosctl`
`talosctl` is a CLI tool which interfaces with the Talos API in
an easy manner.
It also includes a number of useful options for creating and managing clusters.
You should install `talosctl` before continuing:
#### `amd64`
```bash
curl -Lo /usr/local/bin/talosctl https://github.com/siderolabs/talos/releases/download/{{< release >}}/talosctl-$(uname -s | tr "[:upper:]" "[:lower:]")-amd64
chmod +x /usr/local/bin/talosctl
```
#### `arm64`
For `linux` and `darwin` operating systems `talosctl` is also available for the `arm64` processor architecture.
```bash
curl -Lo /usr/local/bin/talosctl https://github.com/siderolabs/talos/releases/download/{{< release >}}/talosctl-$(uname -s | tr "[:upper:]" "[:lower:]")-arm64
chmod +x /usr/local/bin/talosctl
```
## Acquire the installation image
The easiest way to install Talos is to use the ISO image.
The latest ISO image can be found on the GitHub [Releases](https://github.com/siderolabs/talos/releases) page:
- X86: [https://github.com/siderolabs/talos/releases/download/{{< release >}}/talos-amd64.iso](https://github.com/siderolabs/talos/releases/download/{{< release >}}/talos-amd64.iso)
- ARM64: [https://github.com/siderolabs/talos/releases/download/{{< release >}}/talos-arm64.iso](https://github.com/siderolabs/talos/releases/download/{{< release >}}/talos-arm64.iso)
When booted from the ISO, Talos will run in RAM, and it will not install itself
until it is provided a configuration.
Thus, it is safe to boot the ISO onto any machine.
### Alternative Booting
For network booting and self-built media, you can use the published kernel and initramfs images:
- X86: [vmlinuz-amd64](https://github.com/siderolabs/talos/releases/download/{{< release >}}/vmlinuz-amd64) [initramfs-amd64.xz](https://github.com/siderolabs/talos/releases/download/{{< release >}}/initramfs-amd64.xz)
- ARM64: [vmlinuz-arm64](https://github.com/siderolabs/talos/releases/download/{{< release >}}/vmlinuz-arm64) [initramfs-arm64.xz](https://github.com/siderolabs/talos/releases/download/{{< release >}}/initramfs-arm64.xz)
Note that to use alternate booting, there are a number of required kernel parameters.
Please see the [kernel]({{< relref "../reference/kernel" >}}) docs for more information.
## Decide the Kubernetes Endpoint
In order to configure Kubernetes and bootstrap the cluster, Talos needs to know
what the endpoint (DNS name or IP address) of the Kubernetes API Server will be.
The endpoint should be the fully-qualified HTTP(S) URL for the Kubernetes API
Server, which (by default) runs on port 6443 using HTTPS.
Thus, the format of the endpoint may be something like:
- `https://192.168.0.10:6443`
- `https://kube.mycluster.mydomain.com:6443`
- `https://[2001:db8:1234::80]:6443`
Because the Kubernetes controlplane is meant to be highly
available, we must also choose how to bind the API server endpoint to the servers
themselves.
There are three common ways to do this:
### Dedicated Load-balancer
If you are using a cloud provider or have your own load-balancer available (such
as HAProxy, nginx reverse proxy, or an F5 load-balancer), using
a dedicated load balancer is a natural choice.
Create an appropriate frontend matching the endpoint, and point the backends at each of the addresses of the Talos controlplane nodes.
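As an illustration only, a minimal HAProxy TCP frontend/backend for such an endpoint might look like the sketch below; the IP addresses, names, and configuration path are assumptions and must be adapted to your environment:

```bash
# Hypothetical HAProxy configuration fragment for a Kubernetes API
# endpoint backed by three Talos controlplane nodes.
cat >> /etc/haproxy/haproxy.cfg <<'EOF'
frontend kubernetes-api
    bind *:6443
    mode tcp
    default_backend talos-controlplane

backend talos-controlplane
    mode tcp
    balance roundrobin
    option tcp-check
    server cp0 192.168.0.10:6443 check
    server cp1 192.168.0.11:6443 check
    server cp2 192.168.0.12:6443 check
EOF
```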
### Layer 2 Shared IP
Talos has integrated support for serving Kubernetes from a shared (sometimes
called "virtual") IP address.
This method relies on OSI Layer 2 connectivity between controlplane Talos nodes.
In this case, we choose an IP address on the same subnet as the Talos
controlplane nodes which is not otherwise assigned to any machine.
For instance, if your controlplane node IPs are:
- 192.168.0.10
- 192.168.0.11
- 192.168.0.12
you could choose the IP `192.168.0.15` as your shared IP address.
Just make sure that `192.168.0.15` is not used by any other machine and that your DHCP server
will not serve it to any other machine.
Once chosen, form the full HTTPS URL from this IP:
```url
https://192.168.0.15:6443
```
You are free to set a DNS record to this IP address to identify the Kubernetes API endpoint, but you will need to use the IP address itself, not the DNS name, to configure the shared IP (`machine.network.interfaces[].vip.ip`) in the Talos configuration.
For more information about using a shared IP, see the related
[Guide]({{< relref "../talos-guides/network/vip" >}})
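A minimal sketch of the corresponding machine configuration fragment, assuming the controlplane nodes use the interface `eth0` and the shared IP chosen above (the interface name is an assumption and must match your hardware):

```bash
# Write a patch file with the shared IP; this fragment is later merged
# into controlplane.yaml for the controlplane nodes (see "Configure Talos"
# below).
cat > vip-patch.yaml <<'EOF'
machine:
  network:
    interfaces:
      - interface: eth0
        dhcp: true
        vip:
          ip: 192.168.0.15
EOF
```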
### DNS records
If neither of the other methods work for you, you can use DNS records to
provide a measure of redundancy.
In this case, you would add multiple A or AAAA records (one for each controlplane node) to a DNS name.
For instance, you could add:
```dns
kube.cluster1.mydomain.com IN A 192.168.0.10
kube.cluster1.mydomain.com IN A 192.168.0.11
kube.cluster1.mydomain.com IN A 192.168.0.12
```
Then, your endpoint would be:
```url
https://kube.cluster1.mydomain.com:6443
```
## Decide how to access the Talos API
Since Talos is entirely API-driven, Talos comes with a number of mechanisms to make accessing the API easier.
Controlplane nodes can proxy requests for worker nodes.
This means that you only need access to the controlplane nodes in order to access
the rest of the network.
This is useful for security (your worker nodes do not need to have
public IPs or be otherwise connected to the Internet), and it also makes working
with highly-variable clusters easier, since you only need to know the
controlplane nodes in advance.
Even better, the `talosctl` tool will automatically load balance requests and fail over
between all of your controlplane nodes, so long as it is informed of the
controlplane node IPs.
This means you need to tell your client (`talosctl`) how to communicate with the controlplane nodes, which is done by defining the `endpoints`.
In general, it is recommended that these point to the set of control plane
nodes, either directly or through a reverse proxy or load balancer, similarly to accessing the Kubernetes API.
The difference is that the Talos API listens on port `50000/tcp`.
Whichever way you wish to access the Talos API, be sure to note the IP(s) or
hostname(s) so that you can configure your `talosctl` tool's `endpoints` below.
**NOTE**: The [Virtual IP]({{< relref "../talos-guides/network/vip.md" >}}) method is not recommended when accessing the Talos API as it requires etcd to be bootstrapped and functional.
This can make debugging any issues via the Talos API more difficult as issues with Talos configuration may result in etcd not achieving quorum, and therefore the Virtual IP not being available.
In this case setting the endpoints to the IP or hostnames of the control plane nodes themselves is preferred.
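As an optional sanity check before configuring the client, you can confirm that the Talos API port is reachable on a controlplane node (the IP is an example):

```bash
# apid listens on TCP port 50000, even while the node is still in
# maintenance mode waiting for a configuration.
nc -zv 192.168.0.10 50000
```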
## Configure Talos
When Talos boots without a configuration, such as when using the Talos ISO, it
enters a limited maintenance mode and waits for a configuration to be provided.
Alternatively, the Talos installer can be booted with the `talos.config` kernel
commandline argument set to an HTTP(s) URL from which it should receive its
configuration.
In cases where a PXE server is available, this is much more efficient than
manually configuring each node.
If you do use this method, just note that Talos does require a number of other
kernel commandline parameters.
See the [required kernel parameters]({{< relref "../reference/kernel" >}}) for more information.
In either case, we need to generate the configuration which is to be provided.
Luckily, the `talosctl` tool comes with a configuration generator for exactly
this purpose.
```sh
talosctl gen config "cluster-name" "cluster-endpoint"
```
Here, `cluster-name` is an arbitrary name for the cluster which will be used
in your local client configuration as a label.
It does not affect anything in the cluster itself, but it should be unique in the configuration on your local workstation.
The `cluster-endpoint` is where you insert the Kubernetes Endpoint you
selected from above.
This is the Kubernetes API URL, and it should be a complete URL, with `https://`
and port.
(The default port is `6443`.)
When you run this command, you will receive a number of files in your current
directory:
- `controlplane.yaml`
- `worker.yaml`
- `talosconfig`
The `.yaml` files are what we call Machine Configs.
They are installed onto the Talos servers, and they provide their complete configuration,
describing everything from what disk Talos should be installed to, to what
sysctls to set, to what network settings it should have.
In the case of the `controlplane.yaml`, it even describes how Talos should form its Kubernetes cluster.
The `talosconfig` file (which is also YAML) is your local client configuration
file.
### Controlplane and Worker
The two types of Machine Configs correspond to the two roles of Talos nodes.
The Controlplane Machine Config describes the configuration of a Talos server on
which the Kubernetes Controlplane should run.
The Worker Machine Config describes everything else: workload servers.
The main difference between Controlplane Machine Config files and Worker Machine
Config files is that the former contains information about how to form the
Kubernetes cluster.
### Templates
The generated files can be thought of as templates.
Individual machines may need specific settings (for instance, each may have a
different static IP address).
When different files are needed for machines of the same type, simply
copy the source template (`controlplane.yaml` or `worker.yaml`) and make whatever
modifications need to be done.
For instance, if you had three controlplane nodes and three worker nodes, you
may do something like this:
```bash
for i in $(seq 0 2); do
  cp controlplane.yaml cp$i.yaml
done

for i in $(seq 0 2); do
  cp worker.yaml w$i.yaml
done
```
In cases where there is no special configuration needed, you may use the same
file for each machine of the same type.
### Apply Configuration
After you have generated each machine's Machine Config, you need to load them
into the machines themselves.
For that, you need to know their IP addresses.
If you have access to the console or console logs of the machines, you can read
them to find the IP address(es).
Talos will print them out during the boot process:
```log
[ 4.605369] [talos] task loadConfig (1/1): this machine is reachable at:
[ 4.607358] [talos] task loadConfig (1/1): 192.168.0.2
[ 4.608766] [talos] task loadConfig (1/1): server certificate fingerprint:
[ 4.611106] [talos] task loadConfig (1/1): xA9a1t2dMxB0NJ0qH1pDzilWbA3+DK/DjVbFaJBYheE=
[ 4.613822] [talos] task loadConfig (1/1):
[ 4.614985] [talos] task loadConfig (1/1): upload configuration using talosctl:
[ 4.616978] [talos] task loadConfig (1/1): talosctl apply-config --insecure --nodes 192.168.0.2 --file <config.yaml>
[ 4.620168] [talos] task loadConfig (1/1): or apply configuration using talosctl interactive installer:
[ 4.623046] [talos] task loadConfig (1/1): talosctl apply-config --insecure --nodes 192.168.0.2 --mode=interactive
[ 4.626365] [talos] task loadConfig (1/1): optionally with node fingerprint check:
[ 4.628692] [talos] task loadConfig (1/1): talosctl apply-config --insecure --nodes 192.168.0.2 --cert-fingerprint 'xA9a1t2dMxB0NJ0qH1pDzilWbA3+DK/DjVbFaJBYheE=' --file <config.yaml>
```
If you do not have console access, the IP address may also be discoverable from
your DHCP server.
Once you have the IP address, you can then apply the correct configuration.
```sh
talosctl apply-config --insecure \
--nodes 192.168.0.2 \
--file cp0.yaml
```
The insecure flag is necessary at this point because the PKI infrastructure has
not yet been made available to the node.
Note that the connection _will_ be encrypted; it is just unauthenticated.
If you have console access, though, you can extract the server
certificate fingerprint and use it for an additional layer of validation:
```sh
talosctl apply-config --insecure \
--nodes 192.168.0.2 \
--cert-fingerprint xA9a1t2dMxB0NJ0qH1pDzilWbA3+DK/DjVbFaJBYheE= \
--file cp0.yaml
```
Using the fingerprint allows you to be sure you are sending the configuration to
the right machine, but it is completely optional.
After the configuration is applied to a node, it will reboot.
You may repeat this process for each of the nodes in your cluster.
## Configure your talosctl client
Now that the nodes are running Talos with its full PKI security suite, you need
to use that PKI to talk to the machines.
That means configuring your client, and that is what that `talosconfig` file is for.
### Endpoints
Endpoints are the communication endpoints to which the client directly talks.
These can be load balancers, DNS hostnames, a list of IPs, etc.
In general, it is recommended that these point to the set of control plane
nodes, either directly or through a reverse proxy or load balancer.
Each endpoint will automatically proxy requests destined to another node through
it, so it is not necessary to change the endpoint configuration just because you
wish to talk to a different node within the cluster.
Endpoints _do_, however, need to be members of the same Talos cluster as the
target node, because these proxied connections rely on certificate-based
authentication.
We need to set the `endpoints` in your `talosconfig`.
`talosctl` will automatically load balance and fail over among the endpoints,
so no external load balancer or DNS abstraction is required
(though you are free to use them).
As an example, if the IP addresses of our controlplane nodes are:
- 192.168.0.2
- 192.168.0.3
- 192.168.0.4
We would set those in the `talosconfig` with:
```sh
talosctl --talosconfig=./talosconfig \
config endpoint 192.168.0.2 192.168.0.3 192.168.0.4
```
### Nodes
The node is the target node on which you wish to perform the API call.
Keep in mind, when specifying nodes, their IPs and/or hostnames are *as seen by the endpoint servers*, not as from the client.
This is because all connections are proxied through the endpoints.
Some people also like to set a default set of nodes in the `talosconfig`.
This can be done in the same manner, replacing `endpoint` with `node`.
If you do this, however, know that you could easily reboot the wrong machine
by forgetting to declare the right one explicitly.
Worse, if you set several nodes as defaults, you could, with one `talosctl upgrade`
command, upgrade your whole cluster at the same time.
It's a powerful tool, and with that comes great responsibility.
The author of this document generally sets a single controlplane node to be the
default node, which provides the most flexible default operation while limiting
the scope of the disaster should a command be entered erroneously:
```sh
talosctl --talosconfig=./talosconfig \
config node 192.168.0.2
```
You may simply provide `-n` or `--nodes` to any `talosctl` command to
supply the node or (comma-delimited) nodes on which you wish to perform the
operation.
Supplying the commandline parameter will override any default nodes
in the configuration file.
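For example, to run a single command against one specific node, overriding any defaults (the IP is one of the example controlplane addresses above):

```bash
talosctl -n 192.168.0.4 version
```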
To verify default node(s) you're currently configured to use, you can run:
```bash
$ talosctl version
Client:
...
Server:
NODE: <node>
...
```
For a more in-depth discussion of Endpoints and Nodes, please see
[talosctl]({{< relref "../learn-more/talosctl" >}}).
### Default configuration file
You _can_ reference which configuration file to use directly with the `--talosconfig` parameter:
```sh
talosctl --talosconfig=./talosconfig \
--nodes 192.168.0.2 version
```
However, `talosctl` comes with tooling to help you integrate and merge this
configuration into the default `talosctl` configuration file.
This is done with the `merge` option.
```sh
talosctl config merge ./talosconfig
```
This will merge your new `talosconfig` into the default configuration file
(`$XDG_CONFIG_HOME/talos/config.yaml`), creating it if necessary.
Like Kubernetes, the `talosconfig` configuration file has multiple "contexts"
which correspond to multiple clusters.
The `<cluster-name>` you chose above will be used as the context name.
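The merged contexts can be listed, and the active one switched, as follows (the context name is whatever cluster name you chose during `talosctl gen config`):

```bash
talosctl config contexts
talosctl config context <cluster-name>
```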
## Kubernetes Bootstrap
All of your machines are configured, and your `talosctl` client is set up.
Now, you are ready to bootstrap your Kubernetes cluster.
If that sounds daunting, you haven't used Talos before.
Bootstrapping your Kubernetes cluster with Talos is as simple as:
```sh
talosctl bootstrap --nodes 192.168.0.2
```
**IMPORTANT**: the bootstrap operation should only be called **ONCE** and only on a **SINGLE**
controlplane node!
The IP can be any of your controlplane nodes (or the load balancer, if you have
one), and the command should only be issued once.
At this point, Talos will form an `etcd` cluster, generate all of the core
Kubernetes assets, and start the Kubernetes controlplane components.
After a few moments, you will be able to download your Kubernetes client
configuration and get started:
```sh
talosctl kubeconfig
```
Running this command will add (merge) your new cluster into your local Kubernetes
configuration in the same way as `talosctl config merge` merged the Talos client
configuration into your local Talos client configuration file.
If you would prefer for the configuration to _not_ be merged into your default
Kubernetes configuration file, simply tell it a filename:
```sh
talosctl kubeconfig alternative-kubeconfig
```
If all goes well, you should now be able to connect to Kubernetes and see your
nodes:
```sh
kubectl get nodes
```

View File

@ -0,0 +1,70 @@
---
title: Quickstart
weight: 20
description: "A short guide on setting up a simple Talos Linux cluster locally with Docker."
---
There are two easy ways to try out Talos Linux.
Instructions for each are detailed below.
## Katacoda Sandbox
First, you can explore a sandbox environment hosted on Katacoda.
This approach has the benefit of having no prerequisites and being a bit more guided, so you can quickly learn how to interact with a cluster.
Please visit Katacoda [here](https://katacoda.com/siderolabs/scenarios/talos-intro) to try it out.
## Local Docker Cluster
Another easy way to try Talos is by using the CLI (`talosctl`) to create a cluster on a machine with `docker` installed.
### Prerequisites
#### `talosctl`
Download `talosctl`:
##### `amd64`
```bash
curl -Lo /usr/local/bin/talosctl https://github.com/siderolabs/talos/releases/download/{{< release >}}/talosctl-$(uname -s | tr "[:upper:]" "[:lower:]")-amd64
chmod +x /usr/local/bin/talosctl
```
##### `arm64`
For `linux` and `darwin` operating systems `talosctl` is also available for the `arm64` processor architecture.
```bash
curl -Lo /usr/local/bin/talosctl https://github.com/siderolabs/talos/releases/download/{{< release >}}/talosctl-$(uname -s | tr "[:upper:]" "[:lower:]")-arm64
chmod +x /usr/local/bin/talosctl
```
#### `kubectl`
Download `kubectl` via one of the methods outlined in the [documentation](https://kubernetes.io/docs/tasks/tools/install-kubectl/).
### Create the Cluster
Now run the following:
```bash
talosctl cluster create
```
Verify that you can reach Kubernetes:
```bash
$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
talos-default-master-1 Ready master 115s v{{< k8s_release >}} 10.5.0.2 <none> Talos ({{< release >}}) <host kernel> containerd://1.5.5
talos-default-worker-1 Ready <none> 115s v{{< k8s_release >}} 10.5.0.3 <none> Talos ({{< release >}}) <host kernel> containerd://1.5.5
```
### Destroy the Cluster
When you are all done, remove the cluster:
```bash
talosctl cluster destroy
```

View File

@ -0,0 +1,53 @@
---
title: Support Matrix
weight: 60
description: "Table of supported Talos Linux versions and respective platforms."
---
| Talos Version | 1.2 | 1.1 |
|----------------------------------------------------------------------------------------------------------------|------------------------------------|------------------------------------|
| Release Date | 2022-09-01, TBD | 2022-06-24 (1.1.0) |
| End of Community Support | 1.3.0 release (2022-12-01, TBD) | 1.2.0 release (2022-09-01, TBD) |
| Enterprise Support | [offered by Sidero Labs Inc.](https://www.siderolabs.com/support/) | [offered by Sidero Labs Inc.](https://www.siderolabs.com/support/) |
| Kubernetes | 1.24, 1.23, 1.22 | 1.24, 1.23, 1.22 |
| Architecture | amd64, arm64 | amd64, arm64 |
| **Platforms** | | |
| - cloud | AWS, GCP, Azure, Digital Ocean, Hetzner, OpenStack, Oracle Cloud, Scaleway, Vultr, Upcloud | AWS, GCP, Azure, Digital Ocean, Hetzner, OpenStack, Scaleway, Vultr, Upcloud |
| - bare metal | x86: BIOS, UEFI; arm64: UEFI; boot: ISO, PXE, disk image | x86: BIOS, UEFI; arm64: UEFI; boot: ISO, PXE, disk image |
| - virtualized | VMware, Hyper-V, KVM, Proxmox, Xen | VMware, Hyper-V, KVM, Proxmox, Xen |
| - SBCs | Banana Pi M64, Jetson Nano, Libre Computer Board ALL-H3-CC, Pine64, Pine64 Rock64, Radxa ROCK Pi 4c, Raspberry Pi 4B | Raspberry Pi 4, Banana Pi M64, Pine64, and others |
| - local | Docker, QEMU | Docker, QEMU |
| **Cluster API** | | |
| [CAPI Bootstrap Provider Talos](https://github.com/siderolabs/cluster-api-bootstrap-provider-talos) | >= 0.5.3 | >= 0.5.3 |
| [CAPI Control Plane Provider Talos](https://github.com/siderolabs/cluster-api-control-plane-provider-talos) | >= 0.4.5 | >= 0.4.5 |
| [Sidero](https://www.sidero.dev/) | >= 0.5.0 | >= 0.5.0 |
| **UI** | | |
| [Theila](https://github.com/siderolabs/theila) | ✓ | ✓ |
## Platform Tiers
Tier 1: Automated tests, high-priority fixes.
Tier 2: Tested from time to time, medium-priority bugfixes.
Tier 3: Not tested by core Talos team, community tested.
### Tier 1
* Metal
* AWS
* GCP
### Tier 2
* Azure
* Digital Ocean
* OpenStack
* VMWare
### Tier 3
* Hetzner
* nocloud
* Oracle Cloud
* Scaleway
* Vultr
* Upcloud

View File

@ -0,0 +1,55 @@
---
title: System Requirements
weight: 40
description: "Hardware requirements for running Talos Linux."
---
## Minimum Requirements
<table class="table-auto">
<thead>
<tr>
<th class="px-4 py-2">Role</th>
<th class="px-4 py-2">Memory</th>
<th class="px-4 py-2">Cores</th>
</tr>
</thead>
<tbody>
<tr>
<td class="border px-4 py-2">Init/Control Plane</td>
<td class="border px-4 py-2">2GB</td>
<td class="border px-4 py-2">2</td>
</tr>
<tr class="bg-gray-100">
<td class="border px-4 py-2">Worker</td>
<td class="border px-4 py-2">1GB</td>
<td class="border px-4 py-2">1</td>
</tr>
</tbody>
</table>
## Recommended
<table class="table-auto">
<thead>
<tr>
<th class="px-4 py-2">Role</th>
<th class="px-4 py-2">Memory</th>
<th class="px-4 py-2">Cores</th>
</tr>
</thead>
<tbody>
<tr>
<td class="border px-4 py-2">Init/Control Plane</td>
<td class="border px-4 py-2">4GB</td>
<td class="border px-4 py-2">4</td>
</tr>
<tr class="bg-gray-100">
<td class="border px-4 py-2">Worker</td>
<td class="border px-4 py-2">2GB</td>
<td class="border px-4 py-2">2</td>
</tr>
</tbody>
</table>
These requirements are similar to those of Kubernetes.

View File

@ -0,0 +1,7 @@
---
title: What's New in Talos 1.2
weight: 50
description: "List of new and shiny features in Talos Linux."
---
TBD

View File

@ -0,0 +1,28 @@
---
title: What is Talos?
weight: 10
description: "A quick introduction in to what Talos is and why it should be used."
---
Talos is a container optimized Linux distro; a reimagining of Linux for distributed systems such as Kubernetes.
It is designed to be as minimal as possible while still maintaining practicality.
For these reasons, Talos has a number of features unique to it:
- it is immutable
- it is atomic
- it is ephemeral
- it is minimal
- it is secure by default
- it is managed via a single declarative configuration file and gRPC API
Talos can be deployed on container, cloud, virtualized, and bare metal platforms.
## Why Talos
In having less, Talos offers more.
Security.
Efficiency.
Resiliency.
Consistency.
All of these areas are improved simply by having less.

View File

@ -0,0 +1,5 @@
---
title: "Kubernetes Guides"
weight: 30
description: "Management of a Kubernetes Cluster hosted by Talos Linux"
---

View File

@ -0,0 +1,5 @@
---
title: "Configuration"
weight: 10
description: "How to configure components of the Kubernetes cluster itself."
---

View File

@ -0,0 +1,281 @@
---
title: "Ceph Storage cluster with Rook"
description: "Guide on how to create a simple Ceph storage cluster with Rook for Kubernetes"
aliases:
- ../../guides/configuring-ceph-with-rook
---
## Preparation
Talos Linux reserves an entire disk for the OS installation, so machines with multiple available disks are needed for a reliable Ceph cluster with Rook and Talos Linux.
Rook requires that the block devices or partitions used by Ceph have no partitions or formatted filesystems before use.
Rook also requires a minimum Kubernetes version of `v1.16` and Helm `v3.0` for installation of charts.
It is highly recommended that the [Rook Ceph overview](https://rook.io/docs/rook/v1.8/ceph-storage.html) is read and understood before deploying a Ceph cluster with Rook.
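As a quick, optional sanity check, the disks available on each storage node can be listed with `talosctl` before installing Rook (the node IP is an example); any disk intended for Ceph should appear here and carry no existing partitions or filesystems:

```bash
talosctl -n 172.20.15.5 disks
```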
## Installation
Creating a Ceph cluster with Rook requires two steps; first the Rook Operator needs to be installed which can be done with a Helm Chart.
The example below installs the Rook Operator into the `rook-ceph` namespace, which is the default for a Ceph cluster with Rook.
```shell
$ helm repo add rook-release https://charts.rook.io/release
"rook-release" has been added to your repositories
$ helm install --create-namespace --namespace rook-ceph rook-ceph rook-release/rook-ceph
W0327 17:52:44.277830 54987 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0327 17:52:44.612243 54987 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
NAME: rook-ceph
LAST DEPLOYED: Sun Mar 27 17:52:42 2022
NAMESPACE: rook-ceph
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
The Rook Operator has been installed. Check its status by running:
kubectl --namespace rook-ceph get pods -l "app=rook-ceph-operator"
Visit https://rook.io/docs/rook/latest for instructions on how to create and configure Rook clusters
Important Notes:
- You must customize the 'CephCluster' resource in the sample manifests for your cluster.
- Each CephCluster must be deployed to its own namespace, the samples use `rook-ceph` for the namespace.
- The sample manifests assume you also installed the rook-ceph operator in the `rook-ceph` namespace.
- The helm chart includes all the RBAC required to create a CephCluster CRD in the same namespace.
- Any disk devices you add to the cluster in the 'CephCluster' must be empty (no filesystem and no partitions).
```
Once that is complete, the Ceph cluster can be installed with the official Helm Chart.
The Chart can be installed with default values, which will attempt to use all nodes in the Kubernetes cluster, and all unused disks on each node for Ceph storage, and make available block storage, object storage, as well as a shared filesystem.
Generally more specific node/device/cluster configuration is used, and the [Rook documentation](https://rook.io/docs/rook/v1.8/ceph-cluster-crd.html) explains all the available options in detail.
For this example the defaults will be adequate.
```shell
$ helm install --create-namespace --namespace rook-ceph rook-ceph-cluster --set operatorNamespace=rook-ceph rook-release/rook-ceph-cluster
NAME: rook-ceph-cluster
LAST DEPLOYED: Sun Mar 27 18:12:46 2022
NAMESPACE: rook-ceph
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
The Ceph Cluster has been installed. Check its status by running:
kubectl --namespace rook-ceph get cephcluster
Visit https://rook.github.io/docs/rook/latest/ceph-cluster-crd.html for more information about the Ceph CRD.
Important Notes:
- You can only deploy a single cluster per namespace
- If you wish to delete this cluster and start fresh, you will also have to wipe the OSD disks using `sfdisk`
```
Now that the Ceph cluster configuration has been created, the Rook operator needs time to install the Ceph cluster and bring all the components online.
The progression of the Ceph cluster state can be followed with the following command.
```shell
$ watch kubectl --namespace rook-ceph get cephcluster rook-ceph
Every 2.0s: kubectl --namespace rook-ceph get cephcluster rook-ceph
NAME DATADIRHOSTPATH MONCOUNT AGE PHASE MESSAGE HEALTH EXTERNAL
rook-ceph /var/lib/rook 3 57s Progressing Configuring Ceph Mons
```
Depending on the size of the Ceph cluster and the availability of resources, the Ceph cluster should become available, and with it the storage classes that can be used with Kubernetes Persistent Volumes.
```shell
$ kubectl --namespace rook-ceph get cephcluster rook-ceph
NAME DATADIRHOSTPATH MONCOUNT AGE PHASE MESSAGE HEALTH EXTERNAL
rook-ceph /var/lib/rook 3 40m Ready Cluster created successfully HEALTH_OK
$ kubectl get storageclass
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
ceph-block (default) rook-ceph.rbd.csi.ceph.com Delete Immediate true 77m
ceph-bucket rook-ceph.ceph.rook.io/bucket Delete Immediate false 77m
ceph-filesystem rook-ceph.cephfs.csi.ceph.com Delete Immediate true 77m
```
## Talos Linux Considerations
It is important to note that a Rook Ceph cluster saves cluster information directly onto the node (by default `dataDirHostPath` is set to `/var/lib/rook`).
If running only a single `mon` instance, cluster management is a little bit more involved, as any time a Talos Linux node is reconfigured or upgraded, the partition that stores the `/var` [file system]({{< relref "../../learn-more/architecture#the-file-system" >}}) is wiped, but the `--preserve` option of [`talosctl upgrade`]({{< relref "../../reference/cli#talosctl-upgrade" >}}) will ensure that doesn't happen.
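For the single-`mon` case, a hedged example of such an upgrade, using the node address and installer image from the example further below:

```bash
# --preserve keeps the contents of the EPHEMERAL partition (/var, and with
# it dataDirHostPath) across the upgrade.
talosctl upgrade --nodes 172.20.15.5 \
  --image ghcr.io/talos-systems/installer:v0.14.3 \
  --preserve
```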
By default, Rook configures Ceph to have 3 `mon` instances, in which case the data stored in `dataDirHostPath` can be regenerated from the other `mon` instances.
So when performing maintenance on a Talos Linux node with a Rook Ceph cluster (e.g. upgrading the Talos Linux version), it is imperative that care be taken to maintain the health of the Ceph cluster.
Before upgrading, you should always check the health status of the Ceph cluster to ensure that it is healthy.
```shell
$ kubectl --namespace rook-ceph get cephclusters.ceph.rook.io rook-ceph
NAME DATADIRHOSTPATH MONCOUNT AGE PHASE MESSAGE HEALTH EXTERNAL
rook-ceph /var/lib/rook 3 98m Ready Cluster created successfully HEALTH_OK
```
If it is, you can begin the upgrade process for the Talos Linux node, during which time the Ceph cluster will become unhealthy as the node is reconfigured.
Before performing any other action on the Talos Linux nodes, the Ceph cluster must return to a healthy status.
```shell
$ talosctl upgrade --nodes 172.20.15.5 --image ghcr.io/talos-systems/installer:v0.14.3
NODE ACK STARTED
172.20.15.5 Upgrade request received 2022-03-27 20:29:55.292432887 +0200 CEST m=+10.050399758
$ kubectl --namespace rook-ceph get cephclusters.ceph.rook.io
NAME DATADIRHOSTPATH MONCOUNT AGE PHASE MESSAGE HEALTH EXTERNAL
rook-ceph /var/lib/rook 3 99m Progressing Configuring Ceph Mgr(s) HEALTH_WARN
$ kubectl --namespace rook-ceph wait --timeout=1800s --for=jsonpath='{.status.ceph.health}=HEALTH_OK' cephclusters.ceph.rook.io rook-ceph
cephcluster.ceph.rook.io/rook-ceph condition met
```
The above steps need to be performed for each Talos Linux node undergoing maintenance, one at a time.
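A sketch of that per-node flow, assuming three storage nodes and the installer image from the example above (node IPs are placeholders); in practice you may also want to confirm each node has finished rebooting, for example with `talosctl health`, before checking Ceph:

```bash
for node in 172.20.15.5 172.20.15.6 172.20.15.7; do
  # Upgrade one node at a time...
  talosctl upgrade --nodes "${node}" --image ghcr.io/talos-systems/installer:v0.14.3

  # ...and wait for Ceph to report HEALTH_OK again before moving on.
  kubectl --namespace rook-ceph wait --timeout=1800s \
    --for=jsonpath='{.status.ceph.health}=HEALTH_OK' cephclusters.ceph.rook.io rook-ceph
done
```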
## Cleaning Up
### Rook Ceph Cluster Removal
Removing a Rook Ceph cluster requires a few steps, starting with signalling to Rook that the Ceph cluster is really being destroyed.
Then all Persistent Volumes (and Claims) backed by the Ceph cluster must be deleted, followed by the Storage Classes and the Ceph storage types.
```shell
$ kubectl --namespace rook-ceph patch cephcluster rook-ceph --type merge -p '{"spec":{"cleanupPolicy":{"confirmation":"yes-really-destroy-data"}}}'
cephcluster.ceph.rook.io/rook-ceph patched
$ kubectl delete storageclasses ceph-block ceph-bucket ceph-filesystem
storageclass.storage.k8s.io "ceph-block" deleted
storageclass.storage.k8s.io "ceph-bucket" deleted
storageclass.storage.k8s.io "ceph-filesystem" deleted
$ kubectl --namespace rook-ceph delete cephblockpools ceph-blockpool
cephblockpool.ceph.rook.io "ceph-blockpool" deleted
$ kubectl --namespace rook-ceph delete cephobjectstore ceph-objectstore
cephobjectstore.ceph.rook.io "ceph-objectstore" deleted
$ kubectl --namespace rook-ceph delete cephfilesystem ceph-filesystem
cephfilesystem.ceph.rook.io "ceph-filesystem" deleted
```
Once that is complete, the Ceph cluster itself can be removed, along with the Rook Ceph cluster Helm chart installation.
```shell
$ kubectl --namespace rook-ceph delete cephcluster rook-ceph
cephcluster.ceph.rook.io "rook-ceph" deleted
$ helm --namespace rook-ceph uninstall rook-ceph-cluster
release "rook-ceph-cluster" uninstalled
```
If needed, the Rook Operator can also be removed along with all the Custom Resource Definitions that it created.
```shell
$ helm --namespace rook-ceph uninstall rook-ceph
W0328 12:41:14.998307 147203 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
These resources were kept due to the resource policy:
[CustomResourceDefinition] cephblockpools.ceph.rook.io
[CustomResourceDefinition] cephbucketnotifications.ceph.rook.io
[CustomResourceDefinition] cephbuckettopics.ceph.rook.io
[CustomResourceDefinition] cephclients.ceph.rook.io
[CustomResourceDefinition] cephclusters.ceph.rook.io
[CustomResourceDefinition] cephfilesystemmirrors.ceph.rook.io
[CustomResourceDefinition] cephfilesystems.ceph.rook.io
[CustomResourceDefinition] cephfilesystemsubvolumegroups.ceph.rook.io
[CustomResourceDefinition] cephnfses.ceph.rook.io
[CustomResourceDefinition] cephobjectrealms.ceph.rook.io
[CustomResourceDefinition] cephobjectstores.ceph.rook.io
[CustomResourceDefinition] cephobjectstoreusers.ceph.rook.io
[CustomResourceDefinition] cephobjectzonegroups.ceph.rook.io
[CustomResourceDefinition] cephobjectzones.ceph.rook.io
[CustomResourceDefinition] cephrbdmirrors.ceph.rook.io
[CustomResourceDefinition] objectbucketclaims.objectbucket.io
[CustomResourceDefinition] objectbuckets.objectbucket.io
release "rook-ceph" uninstalled
$ kubectl delete crds cephblockpools.ceph.rook.io cephbucketnotifications.ceph.rook.io cephbuckettopics.ceph.rook.io \
cephclients.ceph.rook.io cephclusters.ceph.rook.io cephfilesystemmirrors.ceph.rook.io \
cephfilesystems.ceph.rook.io cephfilesystemsubvolumegroups.ceph.rook.io \
cephnfses.ceph.rook.io cephobjectrealms.ceph.rook.io cephobjectstores.ceph.rook.io \
cephobjectstoreusers.ceph.rook.io cephobjectzonegroups.ceph.rook.io cephobjectzones.ceph.rook.io \
cephrbdmirrors.ceph.rook.io objectbucketclaims.objectbucket.io objectbuckets.objectbucket.io
customresourcedefinition.apiextensions.k8s.io "cephblockpools.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephbucketnotifications.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephbuckettopics.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephclients.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephclusters.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephfilesystemmirrors.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephfilesystems.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephfilesystemsubvolumegroups.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephnfses.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephobjectrealms.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephobjectstores.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephobjectstoreusers.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephobjectzonegroups.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephobjectzones.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "cephrbdmirrors.ceph.rook.io" deleted
customresourcedefinition.apiextensions.k8s.io "objectbucketclaims.objectbucket.io" deleted
customresourcedefinition.apiextensions.k8s.io "objectbuckets.objectbucket.io" deleted
```
### Talos Linux Rook Metadata Removal
If the Rook Operator is cleanly removed following the above process, the node metadata and disks should be clean and ready to be re-used.
In the case of an unclean cluster removal, there may still be a few instances of metadata stored on the system disk, as well as the partition information on the storage disks.
First, the node metadata needs to be removed. Make sure to update the `nodeName` with the actual name of a storage node that needs cleaning, and `path` with the Rook configuration `dataDirHostPath` set when installing the chart.
The following will need to be repeated for each node used in the Rook Ceph cluster.
```shell
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: disk-clean
spec:
restartPolicy: Never
nodeName: <storage-node-name>
volumes:
- name: rook-data-dir
hostPath:
path: <dataDirHostPath>
containers:
- name: disk-clean
image: busybox
securityContext:
privileged: true
volumeMounts:
- name: rook-data-dir
mountPath: /node/rook-data
command: ["/bin/sh", "-c", "rm -rf /node/rook-data/*"]
EOF
pod/disk-clean created
$ kubectl wait --timeout=900s --for=jsonpath='{.status.phase}=Succeeded' pod disk-clean
pod/disk-clean condition met
$ kubectl delete pod disk-clean
pod "disk-clean" deleted
```
Lastly, the disks themselves need the partition and filesystem data wiped before they can be reused.
Again, the following has to be repeated for each node **and** disk used in the Rook Ceph cluster, updating `nodeName` and `of=` in the `command` as needed.
```shell
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: disk-wipe
spec:
restartPolicy: Never
nodeName: <storage-node-name>
containers:
- name: disk-wipe
image: busybox
securityContext:
privileged: true
command: ["/bin/sh", "-c", "dd if=/dev/zero bs=1M count=100 oflag=direct of=<device>"]
EOF
pod/disk-wipe created
$ kubectl wait --timeout=900s --for=jsonpath='{.status.phase}=Succeeded' pod disk-wipe
pod/disk-wipe condition met
$ kubectl delete pod disk-wipe
pod "disk-wipe" deleted
```

View File

@ -0,0 +1,47 @@
---
title: "Cluster Endpoint"
description: "How to explicitly set up an endpoint for the cluster API"
aliases:
- ../../guides/configuring-the-cluster-endpoint
---
In this section, we will step through the configuration of a Talos based Kubernetes cluster.
There are three major components we will configure:
- `apid` and `talosctl`
- the master nodes
- the worker nodes
Talos enforces a high level of security by using mutual TLS for authentication and authorization.
We recommend that the configuration of Talos be performed by a cluster owner.
A cluster owner should be a person of authority within an organization, perhaps a director, manager, or senior member of a team.
They are responsible for storing the root CA, and distributing the PKI for authorized cluster administrators.
### Recommended settings
Talos runs great out of the box, but if you tweak some minor settings it will make your life
a lot easier in the future.
This is not a requirement, but rather a document to explain some key settings.
#### Endpoint
To configure the `talosctl` endpoint, it is recommended you use a resolvable DNS name.
This way, if you decide to upgrade to a multi-controlplane cluster, you only have to add the IP address to the hostname configuration.
The configuration can either be done on a load balancer, or simply through DNS.
For example:
> This is in the config file for the cluster e.g. controlplane.yaml and worker.yaml.
> for more details, please see: [v1alpha1 endpoint configuration]({{< relref "../../reference/configuration#controlplaneconfig" >}})
```yaml
.....
cluster:
controlPlane:
endpoint: https://endpoint.example.local:6443
.....
```
If you have a DNS name as the endpoint, you can upgrade your Talos cluster to multiple controlplanes in the future (if you don't have a multi-controlplane setup from the start).
Using a DNS name generates the corresponding certificates (Kubernetes and Talos) for the correct hostname.
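For example, machine configurations for such a cluster could be generated directly against the DNS-based endpoint shown above (the cluster name is arbitrary):

```bash
talosctl gen config example-cluster https://endpoint.example.local:6443
```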

View File

@ -0,0 +1,45 @@
---
title: "Deploying Metrics Server"
description: "In this guide you will learn how to set up metrics-server."
aliases:
- ../../guides/deploy-metrics-server
---
Metrics Server enables use of the [Horizontal Pod Autoscaler](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/) and [Vertical Pod Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler).
It does this by gathering metrics data from the kubelets in a cluster.
By default, the certificates in use by the kubelets will not be recognized by metrics-server.
This can be solved by either configuring metrics-server to do no validation of the TLS certificates, or by modifying the kubelet configuration to rotate its certificates and use ones that will be recognized by metrics-server.
## Node Configuration
To enable kubelet certificate rotation, all nodes should have the following Machine Config snippet:
```yaml
machine:
kubelet:
extraArgs:
rotate-server-certificates: true
```
## Install During Bootstrap
We will want to ensure that new certificates for the kubelets are approved automatically.
This can easily be done with the [Kubelet Serving Certificate Approver](https://github.com/alex1989hu/kubelet-serving-cert-approver), which will automatically approve the Certificate Signing Requests generated by the kubelets.
We can have Kubelet Serving Certificate Approver and metrics-server installed on the cluster automatically during bootstrap by adding the following snippet to the Cluster Config of the node that will be handling the bootstrap process:
```yaml
cluster:
extraManifests:
- https://raw.githubusercontent.com/alex1989hu/kubelet-serving-cert-approver/main/deploy/standalone-install.yaml
- https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
```
## Install After Bootstrap
If you choose not to use `extraManifests` to install Kubelet Serving Certificate Approver and metrics-server during bootstrap, you can install them once the cluster is online using `kubectl`:
```sh
kubectl apply -f https://raw.githubusercontent.com/alex1989hu/kubelet-serving-cert-approver/main/deploy/standalone-install.yaml
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
```
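Regardless of the installation method, metrics should start flowing within a minute or two once the kubelet serving certificates have been approved, which can be verified with:

```bash
kubectl top nodes
```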

View File

@ -0,0 +1,118 @@
---
title: "Discovery"
description: "How to use Talos Linux cluster discovery"
aliases:
- ../../guides/discovery
---
## Video Walkthrough
To see a live demo of Cluster Discovery, see the video below:
<iframe width="560" height="315" src="https://www.youtube.com/embed/GCBTrHhjawY" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
## Registries
Peers are aggregated from a number of optional registries.
By default, Talos will use the `kubernetes` and `service` registries.
Either one can be disabled.
To disable a registry, set `disabled` to `true` (this option is the same for all registries):
For example, to disable the `service` registry:
```yaml
cluster:
discovery:
enabled: true
registries:
service:
disabled: true
```
Disabling all registries effectively disables member discovery altogether.
> Talos supports the `kubernetes` and `service` registries.
`Kubernetes` registry uses Kubernetes `Node` resource data and additional Talos annotations:
```sh
$ kubectl describe node <nodename>
Annotations: cluster.talos.dev/node-id: Utoh3O0ZneV0kT2IUBrh7TgdouRcUW2yzaaMl4VXnCd
networking.talos.dev/assigned-prefixes: 10.244.0.0/32,10.244.0.1/24
networking.talos.dev/self-ips: 172.20.0.2,fd83:b1f7:fcb5:2802:8c13:71ff:feaf:7c94
...
```
`Service` registry uses external [Discovery Service]({{< relref "../../learn-more/discovery/" >}}) to exchange encrypted information about cluster members.
## Resource Definitions
Talos provides seven resources that can be used to introspect the new discovery and KubeSpan features.
### Discovery
#### Identities
The node's unique identity (base62 encoded random 32 bytes) can be obtained with:
> Note: Using base62 allows the ID to be URL encoded without having to use the ambiguous URL-encoding version of base64.
```sh
$ talosctl get identities -o yaml
...
spec:
nodeId: Utoh3O0ZneV0kT2IUBrh7TgdouRcUW2yzaaMl4VXnCd
```
Node identity is used as the unique `Affiliate` identifier.
Node identity resource is preserved in the [STATE]({{< relref "../../learn-more/architecture/#file-system-partitions" >}}) partition in `node-identity.yaml` file.
Node identity is preserved across reboots and upgrades, but it is regenerated if the node is reset (wiped).
#### Affiliates
An affiliate is a proposed member: a node which has the same cluster ID and secret.
```sh
$ talosctl get affiliates
ID VERSION HOSTNAME MACHINE TYPE ADDRESSES
2VfX3nu67ZtZPl57IdJrU87BMjVWkSBJiL9ulP9TCnF 2 talos-default-master-2 controlplane ["172.20.0.3","fd83:b1f7:fcb5:2802:986b:7eff:fec5:889d"]
6EVq8RHIne03LeZiJ60WsJcoQOtttw1ejvTS6SOBzhUA 2 talos-default-worker-1 worker ["172.20.0.5","fd83:b1f7:fcb5:2802:cc80:3dff:fece:d89d"]
NVtfu1bT1QjhNq5xJFUZl8f8I8LOCnnpGrZfPpdN9WlB 2 talos-default-worker-2 worker ["172.20.0.6","fd83:b1f7:fcb5:2802:2805:fbff:fe80:5ed2"]
Utoh3O0ZneV0kT2IUBrh7TgdouRcUW2yzaaMl4VXnCd 4 talos-default-master-1 controlplane ["172.20.0.2","fd83:b1f7:fcb5:2802:8c13:71ff:feaf:7c94"]
b3DebkPaCRLTLLWaeRF1ejGaR0lK3m79jRJcPn0mfA6C 2 talos-default-master-3 controlplane ["172.20.0.4","fd83:b1f7:fcb5:2802:248f:1fff:fe5c:c3f"]
```
One of the `Affiliates` with the `ID` matching node identity is populated from the node data, other `Affiliates` are pulled from the registries.
Enabled discovery registries run in parallel and discovered data is merged to build the list presented above.
Details about data coming from each registry can be queried from the `cluster-raw` namespace:
```sh
$ talosctl get affiliates --namespace=cluster-raw
ID VERSION HOSTNAME MACHINE TYPE ADDRESSES
k8s/2VfX3nu67ZtZPl57IdJrU87BMjVWkSBJiL9ulP9TCnF 3 talos-default-master-2 controlplane ["172.20.0.3","fd83:b1f7:fcb5:2802:986b:7eff:fec5:889d"]
k8s/6EVq8RHIne03LeZiJ60WsJcoQOtttw1ejvTS6SOBzhUA 2 talos-default-worker-1 worker ["172.20.0.5","fd83:b1f7:fcb5:2802:cc80:3dff:fece:d89d"]
k8s/NVtfu1bT1QjhNq5xJFUZl8f8I8LOCnnpGrZfPpdN9WlB 2 talos-default-worker-2 worker ["172.20.0.6","fd83:b1f7:fcb5:2802:2805:fbff:fe80:5ed2"]
k8s/b3DebkPaCRLTLLWaeRF1ejGaR0lK3m79jRJcPn0mfA6C 3 talos-default-master-3 controlplane ["172.20.0.4","fd83:b1f7:fcb5:2802:248f:1fff:fe5c:c3f"]
service/2VfX3nu67ZtZPl57IdJrU87BMjVWkSBJiL9ulP9TCnF 23 talos-default-master-2 controlplane ["172.20.0.3","fd83:b1f7:fcb5:2802:986b:7eff:fec5:889d"]
service/6EVq8RHIne03LeZiJ60WsJcoQOtttw1ejvTS6SOBzhUA 26 talos-default-worker-1 worker ["172.20.0.5","fd83:b1f7:fcb5:2802:cc80:3dff:fece:d89d"]
service/NVtfu1bT1QjhNq5xJFUZl8f8I8LOCnnpGrZfPpdN9WlB 20 talos-default-worker-2 worker ["172.20.0.6","fd83:b1f7:fcb5:2802:2805:fbff:fe80:5ed2"]
service/b3DebkPaCRLTLLWaeRF1ejGaR0lK3m79jRJcPn0mfA6C 14 talos-default-master-3 controlplane ["172.20.0.4","fd83:b1f7:fcb5:2802:248f:1fff:fe5c:c3f"]
```
Each `Affiliate` ID is prefixed with `k8s/` for data coming from the Kubernetes registry and with `service/` for data coming from the discovery service.
#### Members
A member is an affiliate that has been approved to join the cluster.
The members of the cluster can be obtained with:
```sh
$ talosctl get members
ID VERSION HOSTNAME MACHINE TYPE OS ADDRESSES
talos-default-master-1 2 talos-default-master-1 controlplane Talos ({{< release >}}) ["172.20.0.2","fd83:b1f7:fcb5:2802:8c13:71ff:feaf:7c94"]
talos-default-master-2 1 talos-default-master-2 controlplane Talos ({{< release >}}) ["172.20.0.3","fd83:b1f7:fcb5:2802:986b:7eff:fec5:889d"]
talos-default-master-3 1 talos-default-master-3 controlplane Talos ({{< release >}}) ["172.20.0.4","fd83:b1f7:fcb5:2802:248f:1fff:fe5c:c3f"]
talos-default-worker-1 1 talos-default-worker-1 worker Talos ({{< release >}}) ["172.20.0.5","fd83:b1f7:fcb5:2802:cc80:3dff:fece:d89d"]
talos-default-worker-2 1 talos-default-worker-2 worker Talos ({{< release >}}) ["172.20.0.6","fd83:b1f7:fcb5:2802:2805:fbff:fe80:5ed2"]
```

View File

@ -0,0 +1,181 @@
---
title: "Pod Security"
description: "Enabling Pod Security Admission plugin to configure Pod Security Standards."
aliases:
- ../../guides/pod-security
---
Kubernetes deprecated [Pod Security Policy](https://kubernetes.io/docs/concepts/policy/pod-security-policy/) as of v1.21, and it is
going to be removed in v1.25.
Pod Security Policy was replaced with [Pod Security Admission](https://kubernetes.io/docs/concepts/security/pod-security-admission/).
Pod Security Admission is alpha in v1.22 (requires a feature gate) and beta in v1.23 (enabled by default).
In this guide we are going to enable and configure Pod Security Admission in Talos.
## Configuration
Prepare the following machine configuration patch and store it in `pod-security-patch.yaml`:
```yaml
- op: add
path: /cluster/apiServer/admissionControl
value:
- name: PodSecurity
configuration:
apiVersion: pod-security.admission.config.k8s.io/v1alpha1
kind: PodSecurityConfiguration
defaults:
enforce: "baseline"
enforce-version: "latest"
audit: "restricted"
audit-version: "latest"
warn: "restricted"
warn-version: "latest"
exemptions:
usernames: []
runtimeClasses: []
namespaces: [kube-system]
```
This is a cluster-wide configuration for the Pod Security Admission plugin:
* by default, the `baseline` [Pod Security Standard](https://kubernetes.io/docs/concepts/security/pod-security-standards/) profile is enforced
* the stricter `restricted` profile is not enforced, but the API server warns about any violations it finds
Generate Talos machine configuration applying the patch above:
```shell
talosctl gen config cluster1 https://<IP>:6443/ --config-patch-control-plane @pod-security-patch.yaml
```
Deploy Talos using the generated machine configuration.
Verify current admission plugin configuration with:
```shell
$ talosctl get kubernetescontrolplaneconfigs apiserver-admission-control -o yaml
node: 172.20.0.2
metadata:
namespace: config
type: KubernetesControlPlaneConfigs.config.talos.dev
id: apiserver-admission-control
version: 1
owner: config.K8sControlPlaneController
phase: running
created: 2022-02-22T20:28:21Z
updated: 2022-02-22T20:28:21Z
spec:
config:
- name: PodSecurity
configuration:
apiVersion: pod-security.admission.config.k8s.io/v1alpha1
defaults:
audit: restricted
audit-version: latest
enforce: baseline
enforce-version: latest
warn: restricted
warn-version: latest
exemptions:
namespaces:
- kube-system
runtimeClasses: []
usernames: []
kind: PodSecurityConfiguration
```
## Usage
Create a deployment that satisfies the `baseline` policy but gives warnings on the `restricted` policy:
```shell
$ kubectl create deployment nginx --image=nginx
Warning: would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "nginx" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "nginx" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "nginx" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "nginx" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
deployment.apps/nginx created
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-85b98978db-j68l8 1/1 Running 0 2m3s
```
Create a daemonset which fails to meet the requirements of the `baseline` policy:
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
labels:
app: debug-container
name: debug-container
namespace: default
spec:
revisionHistoryLimit: 10
selector:
matchLabels:
app: debug-container
template:
metadata:
creationTimestamp: null
labels:
app: debug-container
spec:
containers:
- args:
- "360000"
command:
- /bin/sleep
image: ubuntu:latest
imagePullPolicy: IfNotPresent
name: debug-container
resources: {}
securityContext:
privileged: true
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
dnsPolicy: ClusterFirstWithHostNet
hostIPC: true
hostPID: true
hostNetwork: true
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 30
updateStrategy:
rollingUpdate:
maxSurge: 0
maxUnavailable: 1
type: RollingUpdate
```
```shell
$ kubectl apply -f debug.yaml
Warning: would violate PodSecurity "restricted:latest": host namespaces (hostNetwork=true, hostPID=true, hostIPC=true), privileged (container "debug-container" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (container "debug-container" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "debug-container" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "debug-container" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "debug-container" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
daemonset.apps/debug-container created
```
Daemonset `debug-container` gets created, but no pods are scheduled:
```shell
$ kubectl get ds
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
debug-container 0 0 0 0 0 <none> 34s
```
Pod Security Admission plugin errors are in the daemonset events:
```shell
$ kubectl describe ds debug-container
...
Warning FailedCreate 92s daemonset-controller Error creating: pods "debug-container-kwzdj" is forbidden: violates PodSecurity "baseline:latest": host namespaces (hostNetwork=true, hostPID=true, hostIPC=true), privileged (container "debug-container" must not set securityContext.privileged=true)
```
Pod Security Admission configuration can also be overridden on a namespace level:
```shell
$ kubectl label ns default pod-security.kubernetes.io/enforce=privileged
namespace/default labeled
$ kubectl get ds
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
debug-container 2 2 0 2 0 <none> 4s
```
As the enforce policy for the `default` namespace was updated to `privileged`, the `debug-container` daemonset is now running successfully.
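To undo the override and return the namespace to the cluster-wide configuration, the label can be removed again (the trailing dash is the standard `kubectl` syntax for removing a label):
```bash
# Remove the namespace-level override; the cluster-wide PodSecurity
# configuration applies to the default namespace again.
kubectl label ns default pod-security.kubernetes.io/enforce-
```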

View File

@ -0,0 +1,141 @@
---
title: "Storage"
description: "Setting up storage for a Kubernetes cluster"
aliases:
- ../../guides/storage
---
In Kubernetes, using storage in the right way is well-facilitated by the API.
However, unless you are running in a major public cloud, that API may not be hooked up to anything.
This frequently sends users down a rabbit hole of researching all the various options for storage backends for their platform, for Kubernetes, and for their workloads.
There are a _lot_ of options out there, and it can be fairly bewildering.
For Talos, we try to limit the options somewhat to make the decision-making easier.
## Public Cloud
If you are running on a major public cloud, use their block storage.
It is easy and automatic.
## Storage Clusters
> **Sidero Labs** recommends having separate disks (apart from the Talos install disk) for storage.
Redundancy, scaling capabilities, reliability, speed, maintenance load, and ease of use are all factors you must consider when managing your own storage.
Running a storage cluster can be a very good choice when managing your own storage, and there are two projects we recommend, depending on your situation.
If you need vast amounts of storage composed of more than a dozen or so disks, we recommend you use Rook to manage Ceph.
Also, if you need _both_ mount-once _and_ mount-many capabilities, Ceph is your answer.
Ceph also bundles in an S3-compatible object store.
The downside of Ceph is that there are a lot of moving parts.
> Please note that _most_ people should _never_ use mount-many semantics.
> NFS is pervasive because it is old and easy, _not_ because it is a good idea.
> While it may seem like a convenience at first, there are all manner of locking, performance, change control, and reliability concerns inherent in _any_ mount-many situation, so we **strongly** recommend you avoid this method.
If your storage needs are small enough to not need Ceph, use Mayastor.
### Rook/Ceph
[Ceph](https://ceph.io) is the grandfather of open source storage clusters.
It is big, has a lot of pieces, and will do just about anything.
It scales better than almost any other system out there, open source or proprietary, being able to easily add and remove storage over time with no downtime, safely and easily.
It comes bundled with RadosGW, an S3-compatible object store; CephFS, a NFS-like clustered filesystem; and RBD, a block storage system.
With the help of [Rook](https://rook.io), the vast majority of the complexity of Ceph is hidden away by a very robust operator, allowing you to control almost everything about your Ceph cluster from fairly simple Kubernetes CRDs.
So if Ceph is so great, why not use it for everything?
Ceph can be rather slow for small clusters.
It relies heavily on CPUs and massive parallelisation to provide good cluster performance, so if you don't have a lot of CPU capacity dedicated to Ceph, it is not going to be well-optimised for you.
Also, if your cluster is small, just running Ceph may eat up a significant amount of the resources you have available.
Troubleshooting Ceph can be difficult if you do not understand its architecture.
There are lots of acronyms and the documentation assumes a fair level of knowledge.
There are very good tools for inspection and debugging, but this is still frequently seen as a concern.
### Mayastor
[Mayastor](https://github.com/openebs/Mayastor) is an OpenEBS project built in Rust utilising the modern NVMEoF system.
(Despite the name, Mayastor does _not_ require you to have NVME drives.)
It is fast and lean but still cluster-oriented and cloud native.
Unlike most of the other OpenEBS projects, it is _not_ built on the ancient iSCSI system.
Unlike Ceph, Mayastor is _just_ a block store.
It focuses on block storage and does it well.
It is much less complicated to set up than Ceph, but you probably wouldn't want to use it for more than a few dozen disks.
Mayastor is new, maybe _too_ new.
If you're looking for something well-tested and battle-hardened, this is not it.
However, if you're looking for something lean, future-oriented, and simpler than Ceph, it might be a great choice.
#### Video Walkthrough
To see a live demo of this section, see the video below:
<iframe width="560" height="315" src="https://www.youtube.com/embed/q86Kidk81xE" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
#### Prep Nodes
Either during initial cluster creation or on running worker nodes, several machine config values should be edited.
(This information is gathered from the Mayastor [documentation](https://mayastor.gitbook.io/introduction/quickstart/preparing-the-cluster).)
We need to set the `vm.nr_hugepages` sysctl and add `openebs.io/engine=mayastor` labels to the nodes which are meant to be storage nodes.
This can be done with `talosctl patch machineconfig` or via config patches during `talosctl gen config`.
Some examples are shown below: modify as needed.
Using gen config
```bash
talosctl gen config my-cluster https://mycluster.local:6443 --config-patch '[{"op": "add", "path": "/machine/sysctls", "value": {"vm.nr_hugepages": "1024"}}, {"op": "add", "path": "/machine/kubelet/extraArgs", "value": {"node-labels": "openebs.io/engine=mayastor"}}]'
```
Patching an existing node
```bash
talosctl patch --mode=no-reboot machineconfig -n <node ip> --patch '[{"op": "add", "path": "/machine/sysctls", "value": {"vm.nr_hugepages": "1024"}}, {"op": "add", "path": "/machine/kubelet/extraArgs", "value": {"node-labels": "openebs.io/engine=mayastor"}}]'
```
> Note: If you are adding or updating the `vm.nr_hugepages` value on a node which already had the `openebs.io/engine=mayastor` label set, you need to restart the kubelet so that it picks up the new value, by issuing the following command:
```bash
talosctl -n <node ip> service kubelet restart
```
#### Deploy Mayastor
Continue setting up [Mayastor](https://mayastor.gitbook.io/introduction/quickstart/deploy-mayastor) using the official documentation.
## NFS
NFS is an old pack animal long past its prime.
NFS is slow, has all kinds of bottlenecks involving contention, distributed locking, single points of service, and more.
However, it is supported by a wide variety of systems.
You don't want to use it unless you have to, but unfortunately, that "have to" is too frequent.
The NFS client is part of the [`kubelet` image](https://github.com/talos-systems/kubelet) maintained by the Talos team.
This means that the version installed in your running `kubelet` is the version of NFS supported by Talos.
You can reduce some of the contention problems by parceling Persistent Volumes from separate underlying directories.
## Object storage
Ceph comes with an S3-compatible object store, but there are other options, as
well.
These can often be built on top of other storage backends.
For instance, you may have your block storage running with Mayastor but assign a
Pod a large Persistent Volume to serve your object store.
One of the most popular open source add-on object stores is [MinIO](https://min.io/).
## Others (iSCSI)
The most common remaining systems involve iSCSI in one form or another.
These include the original OpenEBS, Rancher's Longhorn, and many proprietary systems.
Unfortunately, Talos does _not_ support iSCSI-based systems.
iSCSI in Linux is facilitated by [open-iscsi](https://github.com/open-iscsi/open-iscsi).
This system was designed long before containers caught on, and it is not well
suited to the task, especially when coupled with a read-only host operating
system.
One day, we hope to work out a solution for facilitating iSCSI-based systems, but this is not yet available.

View File

@ -0,0 +1,5 @@
---
title: "Network"
weight: 20
description: "Managing the Kubernetes cluster networking"
---

View File

@ -0,0 +1,208 @@
---
title: "Deploying Cilium CNI"
description: "In this guide you will learn how to set up Cilium CNI on Talos."
aliases:
- ../../guides/deploying-cilium
---
From v1.9 onwards, Cilium no longer provides a one-liner install manifest that can be used to install Cilium on a node via `kubectl apply -f` or passed in as an extra URL in the `urls` part of the Talos machine configuration.
> Installing Cilium the new way via the `cilium` CLI is broken, so we'll be using `helm` to install Cilium.
For more information: [Install with CLI fails, works with Helm](https://github.com/cilium/cilium-cli/issues/505)
Refer to [Installing with Helm](https://docs.cilium.io/en/v1.11/gettingstarted/k8s-install-helm/) for more information.
First we'll need to add the helm repo for Cilium.
```bash
helm repo add cilium https://helm.cilium.io/
helm repo update
```
This documentation will outline installing Cilium CNI v1.11.2 on Talos in four different ways.
Adhering to Talos principles we'll deploy Cilium with IPAM mode set to Kubernetes.
Each method can either install Cilium using kube proxy (default) or without: [Kubernetes Without kube-proxy](https://docs.cilium.io/en/v1.11/gettingstarted/kubeproxy-free/)
## Machine config preparation
When generating the machine config for a node set the CNI to none.
For example using a config patch:
```bash
talosctl gen config \
my-cluster https://mycluster.local:6443 \
--config-patch '[{"op":"add", "path": "/cluster/network/cni", "value": {"name": "none"}}]'
```
Or if you want to deploy Cilium in strict mode without kube-proxy, you also need to disable kube proxy:
```bash
talosctl gen config \
my-cluster https://mycluster.local:6443 \
--config-patch '[{"op": "add", "path": "/cluster/proxy", "value": {"disabled": true}}, {"op":"add", "path": "/cluster/network/cni", "value": {"name": "none"}}]'
```
## Method 1: Helm install
After applying the machine config and bootstrapping, Talos will appear to hang on phase 18/19 with the message: `retrying error: node not ready`.
This happens because nodes in Kubernetes are only marked as ready once the CNI is up.
As there is no CNI defined, the boot process is pending and will reboot the node after 10 minutes to retry; this is expected behavior.
During this window you can install Cilium manually by running the following:
```bash
helm install cilium cilium/cilium \
--version 1.11.2 \
--namespace kube-system \
--set ipam.mode=kubernetes
```
Or, if you want to deploy Cilium in strict mode without kube-proxy, also set some extra parameters:
```bash
export KUBERNETES_API_SERVER_ADDRESS=<>
export KUBERNETES_API_SERVER_PORT=6443
helm install cilium cilium/cilium \
--version 1.11.2 \
--namespace kube-system \
--set ipam.mode=kubernetes \
--set kubeProxyReplacement=strict \
--set k8sServiceHost="${KUBERNETES_API_SERVER_ADDRESS}" \
--set k8sServicePort="${KUBERNETES_API_SERVER_PORT}"
```
After Cilium is installed the boot process should continue and complete successfully.
## Method 2: Helm manifests install
Instead of directly installing Cilium you can instead first generate the manifest and then apply it:
```bash
helm template cilium cilium/cilium \
--version 1.11.2 \
--namespace kube-system \
--set ipam.mode=kubernetes > cilium.yaml
kubectl apply -f cilium.yaml
```
Without kube-proxy:
```bash
export KUBERNETES_API_SERVER_ADDRESS=<>
export KUBERNETES_API_SERVER_PORT=6443
helm template cilium cilium/cilium \
--version 1.11.2 \
--namespace kube-system \
--set ipam.mode=kubernetes \
--set kubeProxyReplacement=strict \
--set k8sServiceHost="${KUBERNETES_API_SERVER_ADDRESS}" \
--set k8sServicePort="${KUBERNETES_API_SERVER_PORT}" > cilium.yaml
kubectl apply -f cilium.yaml
```
## Method 3: Helm manifests hosted install
After generating `cilium.yaml` using `helm template`, instead of applying this manifest directly during the Talos boot window (before the reboot timeout), you can also host the file somewhere and patch the machine config to apply the manifest automatically during bootstrap.
To do this, patch your machine configuration to include this config instead of the above:
```bash
talosctl gen config \
my-cluster https://mycluster.local:6443 \
--config-patch '[{"op":"add", "path": "/cluster/network/cni", "value": {"name": "custom", "urls": ["https://server.yourdomain.tld/some/path/cilium.yaml"]}}]'
```
This results in a config that looks like this:
``` yaml
name: custom # Name of CNI to use.
# URLs containing manifests to apply for the CNI.
urls:
- https://server.yourdomain.tld/some/path/cilium.yaml
```
However, beware that the Helm-generated Cilium manifest contains sensitive key material.
As such, you should definitely not host it anywhere publicly accessible.
## Method 4: Helm manifests inline install
A more secure option would be to include the `helm template` output manifest inside the machine configuration.
The machine config should be generated with the CNI set to `none`:
```bash
talosctl gen config \
my-cluster https://mycluster.local:6443 \
--config-patch '[{"op":"add", "path": "/cluster/network/cni", "value": {"name": "none"}}]'
```
If deploying Cilium with `kube-proxy` disabled, you can also include the following:
```bash
talosctl gen config \
my-cluster https://mycluster.local:6443 \
--config-patch '[{"op": "add", "path": "/cluster/proxy", "value": {"disabled": true}}, {"op":"add", "path": "/cluster/network/cni", "value": {"name": "none"}}]'
```
To do so, patch this into your machine configuration:
``` yaml
inlineManifests:
- name: cilium
contents: |
--
# Source: cilium/templates/cilium-agent/serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: "cilium"
namespace: kube-system
---
# Source: cilium/templates/cilium-operator/serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
-> Your cilium.yaml file will be pretty long....
```
This will install the Cilium manifests at just the right time during bootstrap.
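Indenting a long `cilium.yaml` by hand is error-prone; one possible way to produce the `inlineManifests` fragment (a shell sketch of our own, not part of the official workflow) is:
```bash
# Build a machine-config fragment embedding cilium.yaml as an inline manifest.
# The 8-space indent keeps the embedded YAML inside the `contents: |` block scalar.
{
  echo 'cluster:'
  echo '  inlineManifests:'
  echo '    - name: cilium'
  echo '      contents: |'
  sed 's/^/        /' cilium.yaml
} > cilium-inline.yaml
```
The resulting `cilium-inline.yaml` can then be merged into the control plane machine configuration.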
Beware though:
- Changing the namespace when templating with Helm does not generate a manifest containing the yaml to create that namespace.
As the inline manifest is processed from top to bottom make sure to manually put the namespace yaml at the start of the inline manifest.
- Only add the Cilium inline manifest to the control plane nodes machine configuration.
- Make sure all control plane nodes have an identical configuration.
- If you delete any of the generated resources they will be restored whenever a control plane node reboots.
- As a safety measure, Talos only creates missing resources from inline manifests; it never deletes or updates anything.
- If you need to update a manifest make sure to first edit all control plane machine configurations and then run `talosctl upgrade-k8s` as it will take care of updating inline manifests.
## Known issues
- Currently there is an interaction between a KubeSpan-enabled Talos cluster and Cilium that results in the cluster going down during bootstrap after applying the Cilium manifests.
For more details: [Kubespan and Cilium compatiblity: etcd is failing](https://github.com/siderolabs/talos/issues/4836)
- There are some gotchas when using Talos and Cilium on the Google cloud platform when using internal load balancers.
For more details: [GCP ILB support / support scope local routes to be configured](https://github.com/siderolabs/talos/issues/4109)
- Some kernel values changed by kube-proxy are not set to good defaults when running Cilium's kube-proxy replacement.
For more details: [Kernel default values (sysctl)](https://github.com/siderolabs/talos/issues/4654)
## Other things to know
- Talos has full kernel module support for eBPF; see:
- [Cilium System Requirements](https://docs.cilium.io/en/v1.11/operations/system_requirements/)
- [Talos Kernel Config AMD64](https://github.com/talos-systems/pkgs/blob/master/kernel/build/config-amd64)
- [Talos Kernel Config ARM64](https://github.com/talos-systems/pkgs/blob/master/kernel/build/config-arm64)
- Talos also includes the modules:
- `CONFIG_NETFILTER_XT_TARGET_TPROXY=m`
- `CONFIG_NETFILTER_XT_TARGET_CT=m`
- `CONFIG_NETFILTER_XT_MATCH_MARK=m`
- `CONFIG_NETFILTER_XT_MATCH_SOCKET=m`
This allows you to set `--set enableXTSocketFallback=false` on the helm install/template command preventing Cilium from disabling the `ip_early_demux` kernel feature.
This will win back some performance.

View File

@ -0,0 +1,190 @@
---
title: "KubeSpan"
description: "Learn to use KubeSpan to connect Talos Linux machines securely across networks."
aliases:
- ../../guides/kubespan
---
KubeSpan is a feature of Talos that automates the setup and maintenance of a full mesh [WireGuard](https://www.wireguard.com) network for your cluster, giving you the ability to operate hybrid Kubernetes clusters that can span the edge, datacenter, and cloud.
Management of keys and discovery of peers can be completely automated for a zero-touch experience that makes it simple and easy to create hybrid clusters.
KubeSpan consists of client code in Talos Linux, as well as a discovery service that enables clients to securely find each other.
Sidero Labs operates a free Discovery Service, but the discovery service may be operated by your organization and can be [downloaded here](https://github.com/siderolabs/discovery-service).
## Video Walkthrough
To learn more about KubeSpan, see the video below:
<iframe width="560" height="315" src="https://www.youtube.com/embed/lPl3u9BN7j4" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
To see a live demo of KubeSpan, see one of the videos below:
<iframe width="560" height="315" src="https://www.youtube.com/embed/RRk8gYzRHJg" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
<iframe width="560" height="315" src="https://www.youtube.com/embed/sBKIFLhC9MQ" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
## Enabling
### Creating a New Cluster
To generate configuration files for a new cluster, we can use the `--with-kubespan` flag in `talosctl gen config`.
This will enable peer discovery and KubeSpan.
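For example (the cluster name and endpoint below are placeholders):
```bash
talosctl gen config my-cluster https://mycluster.local:6443 --with-kubespan
```
The generated machine configuration will then contain sections like the following: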
```yaml
...
# Provides machine specific network configuration options.
network:
# Configures KubeSpan feature.
kubespan:
enabled: true # Enable the KubeSpan feature.
...
# Configures cluster member discovery.
discovery:
enabled: true # Enable the cluster membership discovery feature.
# Configure registries used for cluster member discovery.
registries:
# Kubernetes registry uses Kubernetes API server to discover cluster members and stores additional information
kubernetes: {}
# Service registry is using an external service to push and pull information about cluster members.
service: {}
...
# Provides cluster specific configuration options.
cluster:
id: yui150Ogam0pdQoNZS2lZR-ihi8EWxNM17bZPktJKKE= # Globally unique identifier for this cluster.
secret: dAmFcyNmDXusqnTSkPJrsgLJ38W8oEEXGZKM0x6Orpc= # Shared secret of cluster.
```
> The default discovery service is an external service hosted for free by Sidero Labs.
> The default value is `https://discovery.talos.dev/`.
> Contact Sidero Labs if you need to run this service privately.
### Upgrading an Existing Cluster
In order to enable KubeSpan for an existing cluster, upgrade to the latest version of Talos ({{< release >}}).
Once your cluster is upgraded, the configuration of each node must contain the globally unique identifier, the shared secret for the cluster, and have KubeSpan and discovery enabled.
> Note: Discovery can be used without KubeSpan, but KubeSpan requires at least one discovery registry.
#### Talos v0.11 or Less
If you are migrating from Talos v0.11 or less, we need to generate a cluster ID and secret.
To generate an `id`:
```sh
$ openssl rand -base64 32
EUsCYz+oHNuBppS51P9aKSIOyYvIPmbZK944PWgiyMQ=
```
To generate a `secret`:
```sh
$ openssl rand -base64 32
AbdsWjY9i797kGglghKvtGdxCsdllX9CemLq+WGVeaw=
```
Now, update the configuration of each node in the cluster with the generated `id` and `secret`.
You should end up with the addition of something like this (your `id` and `secret` should be different):
```yaml
cluster:
id: EUsCYz+oHNuBppS51P9aKSIOyYvIPmbZK944PWgiyMQ=
secret: AbdsWjY9i797kGglghKvtGdxCsdllX9CemLq+WGVeaw=
```
> Note: This can be applied in immediate mode (no reboot required).
#### Talos v0.12 or More
Enable `kubespan` and `discovery`.
```yaml
machine:
network:
kubespan:
enabled: true
cluster:
discovery:
enabled: true
```
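On a running node this could be applied without a reboot using a config patch, for example (a sketch only; adjust the ops and paths to match your existing configuration so you don't overwrite other `network` or `discovery` settings):
```bash
# Hedged example: enable KubeSpan and discovery on a running node.
talosctl -n <node ip> patch machineconfig --mode=no-reboot \
  --patch '[
    {"op": "add", "path": "/machine/network/kubespan", "value": {"enabled": true}},
    {"op": "add", "path": "/cluster/discovery", "value": {"enabled": true}}
  ]'
```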
## Resource Definitions
### KubeSpanIdentities
A node's WireGuard identities can be obtained with:
```sh
$ talosctl get kubespanidentities -o yaml
...
spec:
address: fd83:b1f7:fcb5:2802:8c13:71ff:feaf:7c94/128
subnet: fd83:b1f7:fcb5:2802::/64
privateKey: gNoasoKOJzl+/B+uXhvsBVxv81OcVLrlcmQ5jQwZO08=
publicKey: NzW8oeIH5rJyY5lefD9WRoHWWRr/Q6DwsDjMX+xKjT4=
```
Talos automatically configures a unique IPv6 address for each node within the cluster-specific IPv6 ULA prefix.
A WireGuard private key is generated for the node; the private key never leaves the node, while the public key is published through the cluster discovery.
`KubeSpanIdentity` is persisted across reboots and upgrades in the [STATE]({{< relref "../../learn-more/architecture/#file-system-partitions" >}}) partition in the file `kubespan-identity.yaml`.
### KubeSpanPeerSpecs
A node's WireGuard peers can be obtained with:
```sh
$ talosctl get kubespanpeerspecs
ID VERSION LABEL ENDPOINTS
06D9QQOydzKrOL7oeLiqHy9OWE8KtmJzZII2A5/FLFI= 2 talos-default-master-2 ["172.20.0.3:51820"]
THtfKtfNnzJs1nMQKs5IXqK0DFXmM//0WMY+NnaZrhU= 2 talos-default-master-3 ["172.20.0.4:51820"]
nVHu7l13uZyk0AaI1WuzL2/48iG8af4WRv+LWmAax1M= 2 talos-default-worker-2 ["172.20.0.6:51820"]
zXP0QeqRo+CBgDH1uOBiQ8tA+AKEQP9hWkqmkE/oDlc= 2 talos-default-worker-1 ["172.20.0.5:51820"]
```
The peer ID is the WireGuard public key.
`KubeSpanPeerSpecs` are built from the cluster discovery data.
### KubeSpanPeerStatuses
The status of a node's WireGuard peers can be obtained with:
```sh
$ talosctl get kubespanpeerstatuses
ID VERSION LABEL ENDPOINT STATE RX TX
06D9QQOydzKrOL7oeLiqHy9OWE8KtmJzZII2A5/FLFI= 63 talos-default-master-2 172.20.0.3:51820 up 15043220 17869488
THtfKtfNnzJs1nMQKs5IXqK0DFXmM//0WMY+NnaZrhU= 62 talos-default-master-3 172.20.0.4:51820 up 14573208 18157680
nVHu7l13uZyk0AaI1WuzL2/48iG8af4WRv+LWmAax1M= 60 talos-default-worker-2 172.20.0.6:51820 up 130072 46888
zXP0QeqRo+CBgDH1uOBiQ8tA+AKEQP9hWkqmkE/oDlc= 60 talos-default-worker-1 172.20.0.5:51820 up 130044 46556
```
KubeSpan peer status includes the following information:
* the actual endpoint used for peer communication
* link state:
* `unknown`: the endpoint was just changed, link state is not known yet
* `up`: there is a recent handshake from the peer
* `down`: there is no handshake from the peer
* number of bytes sent/received over the WireGuard link with the peer
If the connection state goes `down`, Talos will cycle through the available endpoints until it finds one which works.
Peer status information is updated every 30 seconds.
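Since the information refreshes periodically, it can be convenient to watch the resource for changes (assuming your `talosctl` version supports the `--watch` flag for `get`):
```bash
# Stream updates to the peer statuses instead of fetching them once.
talosctl get kubespanpeerstatuses --watch
```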
### KubeSpanEndpoints
A node's WireGuard endpoints (peer addresses) can be obtained with:
```sh
$ talosctl get kubespanendpoints
ID VERSION ENDPOINT AFFILIATE ID
06D9QQOydzKrOL7oeLiqHy9OWE8KtmJzZII2A5/FLFI= 1 172.20.0.3:51820 2VfX3nu67ZtZPl57IdJrU87BMjVWkSBJiL9ulP9TCnF
THtfKtfNnzJs1nMQKs5IXqK0DFXmM//0WMY+NnaZrhU= 1 172.20.0.4:51820 b3DebkPaCRLTLLWaeRF1ejGaR0lK3m79jRJcPn0mfA6C
nVHu7l13uZyk0AaI1WuzL2/48iG8af4WRv+LWmAax1M= 1 172.20.0.6:51820 NVtfu1bT1QjhNq5xJFUZl8f8I8LOCnnpGrZfPpdN9WlB
zXP0QeqRo+CBgDH1uOBiQ8tA+AKEQP9hWkqmkE/oDlc= 1 172.20.0.5:51820 6EVq8RHIne03LeZiJ60WsJcoQOtttw1ejvTS6SOBzhUA
```
The endpoint ID is the base64 encoded WireGuard public key.
The observed endpoints are submitted back to the discovery service (if enabled) so that other peers can try additional endpoints to establish the connection.

View File

@ -0,0 +1,349 @@
---
title: "Upgrading Kubernetes"
description: "Guide on how to upgrade the Kubernetes cluster from Talos Linux."
aliases:
- guides/upgrading-kubernetes
---
This guide covers upgrading Kubernetes on Talos Linux clusters.
For upgrading the Talos Linux operating system, see [Upgrading Talos]({{< relref "../talos-guides/upgrading-talos" >}})
## Video Walkthrough
To see a demo of this process, watch this video:
<iframe width="560" height="315" src="https://www.youtube.com/embed/uOKveKbD8MQ" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
## Automated Kubernetes Upgrade
The recommended method to upgrade Kubernetes is to use the `talosctl upgrade-k8s` command.
This will automatically update the components needed to upgrade Kubernetes safely.
Upgrading Kubernetes is non-disruptive to the cluster workloads.
To trigger a Kubernetes upgrade, issue a command specifying the version of Kubernetes to upgrade to, such as:
`talosctl --nodes <master node> upgrade-k8s --to {{< k8s_release >}}`
Note that the `--nodes` parameter specifies the control plane node to send the API call to, but all members of the cluster will be upgraded.
To check what will be upgraded you can run `talosctl upgrade-k8s` with the `--dry-run` flag:
```bash
$ talosctl --nodes <master node> upgrade-k8s --to {{< k8s_release >}} --dry-run
WARNING: found resources which are going to be deprecated/migrated in the version {{< k8s_release >}}
RESOURCE COUNT
validatingwebhookconfigurations.v1beta1.admissionregistration.k8s.io 4
mutatingwebhookconfigurations.v1beta1.admissionregistration.k8s.io 3
customresourcedefinitions.v1beta1.apiextensions.k8s.io 25
apiservices.v1beta1.apiregistration.k8s.io 54
leases.v1beta1.coordination.k8s.io 4
automatically detected the lowest Kubernetes version {{< k8s_prev_release >}}
checking for resource APIs to be deprecated in version {{< k8s_release >}}
discovered master nodes ["172.20.0.2" "172.20.0.3" "172.20.0.4"]
discovered worker nodes ["172.20.0.5" "172.20.0.6"]
updating "kube-apiserver" to version "{{< k8s_release >}}"
> "172.20.0.2": starting update
> update kube-apiserver: v{{< k8s_prev_release >}} -> {{< k8s_release >}}
> skipped in dry-run
> "172.20.0.3": starting update
> update kube-apiserver: v{{< k8s_prev_release >}} -> {{< k8s_release >}}
> skipped in dry-run
> "172.20.0.4": starting update
> update kube-apiserver: v{{< k8s_prev_release >}} -> {{< k8s_release >}}
> skipped in dry-run
updating "kube-controller-manager" to version "{{< k8s_release >}}"
> "172.20.0.2": starting update
> update kube-controller-manager: v{{< k8s_prev_release >}} -> {{< k8s_release >}}
> skipped in dry-run
> "172.20.0.3": starting update
<snip>
updating manifests
> apply manifest Secret bootstrap-token-3lb63t
> apply skipped in dry run
> apply manifest ClusterRoleBinding system-bootstrap-approve-node-client-csr
> apply skipped in dry run
<snip>
```
To upgrade Kubernetes from v{{< k8s_prev_release >}} to v{{< k8s_release >}} run:
```bash
$ talosctl --nodes <master node> upgrade-k8s --to {{< k8s_release >}}
automatically detected the lowest Kubernetes version {{< k8s_prev_release >}}
checking for resource APIs to be deprecated in version {{< k8s_release >}}
discovered master nodes ["172.20.0.2" "172.20.0.3" "172.20.0.4"]
discovered worker nodes ["172.20.0.5" "172.20.0.6"]
updating "kube-apiserver" to version "{{< k8s_release >}}"
> "172.20.0.2": starting update
> update kube-apiserver: v{{< k8s_prev_release >}} -> {{< k8s_release >}}
> "172.20.0.2": machine configuration patched
> "172.20.0.2": waiting for API server state pod update
< "172.20.0.2": successfully updated
> "172.20.0.3": starting update
> update kube-apiserver: v{{< k8s_prev_release >}} -> {{< k8s_release >}}
<snip>
```
This command runs in several phases:
1. Every control plane node machine configuration is patched with the new image version for each control plane component.
Talos renders new static pod definitions on the configuration update which is picked up by the kubelet.
The command waits for the change to propagate to the API server state.
2. The command updates the `kube-proxy` daemonset with the new image version.
3. On every node in the cluster, the `kubelet` version is updated.
The command then waits for the `kubelet` service to be restarted and become healthy.
The update is verified by checking the `Node` resource state.
4. Kubernetes bootstrap manifests are re-applied to the cluster.
Updated bootstrap manifests might come with a new Talos version (e.g. CoreDNS version update), or might be the result of machine configuration change.
Note: The `upgrade-k8s` command never deletes any resources from the cluster: they should be deleted manually.
If the command fails for any reason, it can be safely restarted to continue the upgrade process from the moment of the failure.
## Manual Kubernetes Upgrade
Kubernetes can be upgraded manually by following the steps outlined below.
They are equivalent to the steps performed by the `talosctl upgrade-k8s` command.
### Kubeconfig
In order to edit the control plane, you need a working `kubectl` config.
If you don't already have one, you can get one by running:
```bash
talosctl --nodes <master node> kubeconfig
```
### API Server
Patch machine configuration using `talosctl patch` command:
```bash
$ talosctl -n <CONTROL_PLANE_IP_1> patch mc --mode=no-reboot -p '[{"op": "replace", "path": "/cluster/apiServer/image", "value": "k8s.gcr.io/kube-apiserver:v{{< k8s_release >}}"}]'
patched mc at the node 172.20.0.2
```
The JSON patch might need to be adjusted if the current machine configuration is missing the `.cluster.apiServer.image` key.
Also the machine configuration can be edited manually with `talosctl -n <IP> edit mc --mode=no-reboot`.
Capture the new version of `kube-apiserver` config with:
```bash
$ talosctl -n <CONTROL_PLANE_IP_1> get kcpc kube-apiserver -o yaml
node: 172.20.0.2
metadata:
namespace: config
type: KubernetesControlPlaneConfigs.config.talos.dev
id: kube-apiserver
version: 5
phase: running
spec:
image: k8s.gcr.io/kube-apiserver:v{{< k8s_release >}}
cloudProvider: ""
controlPlaneEndpoint: https://172.20.0.1:6443
etcdServers:
- https://127.0.0.1:2379
localPort: 6443
serviceCIDR: 10.96.0.0/12
extraArgs: {}
extraVolumes: []
```
In this example, the new version is `5`.
Wait for the new pod definition to propagate to the API server state (replace `talos-default-master-1` with the node name):
```bash
$ kubectl get pod -n kube-system -l k8s-app=kube-apiserver --field-selector spec.nodeName=talos-default-master-1 -o jsonpath='{.items[0].metadata.annotations.talos\.dev/config\-version}'
5
```
Check that the pod is running:
```bash
$ kubectl get pod -n kube-system -l k8s-app=kube-apiserver --field-selector spec.nodeName=talos-default-master-1
NAME READY STATUS RESTARTS AGE
kube-apiserver-talos-default-master-1 1/1 Running 0 16m
```
Repeat this process for every control plane node, verifying that state propagated successfully between each node update.
### Controller Manager
Patch machine configuration using `talosctl patch` command:
```bash
$ talosctl -n <CONTROL_PLANE_IP_1> patch mc --mode=no-reboot -p '[{"op": "replace", "path": "/cluster/controllerManager/image", "value": "k8s.gcr.io/kube-controller-manager:v{{< k8s_release >}}"}]'
patched mc at the node 172.20.0.2
```
The JSON patch might need to be adjusted if the current machine configuration is missing the `.cluster.controllerManager.image` key.
Capture the new version of the `kube-controller-manager` config with:
```bash
$ talosctl -n <CONTROL_PLANE_IP_1> get kcpc kube-controller-manager -o yaml
node: 172.20.0.2
metadata:
namespace: config
type: KubernetesControlPlaneConfigs.config.talos.dev
id: kube-controller-manager
version: 3
phase: running
spec:
image: k8s.gcr.io/kube-controller-manager:v{{< k8s_release >}}
cloudProvider: ""
podCIDR: 10.244.0.0/16
serviceCIDR: 10.96.0.0/12
extraArgs: {}
extraVolumes: []
```
In this example, the new version is `3`.
Wait for the new pod definition to propagate to the API server state (replace `talos-default-master-1` with the node name):
```bash
$ kubectl get pod -n kube-system -l k8s-app=kube-controller-manager --field-selector spec.nodeName=talos-default-master-1 -o jsonpath='{.items[0].metadata.annotations.talos\.dev/config\-version}'
3
```
Check that the pod is running:
```bash
$ kubectl get pod -n kube-system -l k8s-app=kube-controller-manager --field-selector spec.nodeName=talos-default-master-1
NAME READY STATUS RESTARTS AGE
kube-controller-manager-talos-default-master-1 1/1 Running 0 35m
```
Repeat this process for every control plane node, verifying that state propagated successfully between each node update.
### Scheduler
Patch machine configuration using `talosctl patch` command:
```bash
$ talosctl -n <CONTROL_PLANE_IP_1> patch mc --mode=no-reboot -p '[{"op": "replace", "path": "/cluster/scheduler/image", "value": "k8s.gcr.io/kube-scheduler:v{{< k8s_release >}}"}]'
patched mc at the node 172.20.0.2
```
The JSON patch might need to be adjusted if the current machine configuration is missing the `.cluster.scheduler.image` key.
Capture the new version of the `kube-scheduler` config with:
```bash
$ talosctl -n <CONTROL_PLANE_IP_1> get kcpc kube-scheduler -o yaml
node: 172.20.0.2
metadata:
namespace: config
type: KubernetesControlPlaneConfigs.config.talos.dev
id: kube-scheduler
version: 3
phase: running
spec:
image: k8s.gcr.io/kube-scheduler:v{{< k8s_release >}}
extraArgs: {}
extraVolumes: []
```
In this example, the new version is `3`.
Wait for the new pod definition to propagate to the API server state (replace `talos-default-master-1` with the node name):
```bash
$ kubectl get pod -n kube-system -l k8s-app=kube-scheduler --field-selector spec.nodeName=talos-default-master-1 -o jsonpath='{.items[0].metadata.annotations.talos\.dev/config\-version}'
3
```
Check that the pod is running:
```bash
$ kubectl get pod -n kube-system -l k8s-app=kube-scheduler --field-selector spec.nodeName=talos-default-master-1
NAME READY STATUS RESTARTS AGE
kube-scheduler-talos-default-master-1 1/1 Running 0 39m
```
Repeat this process for every control plane node, verifying that state propagated successfully between each node update.
### Proxy
In the proxy's `DaemonSet`, change:
```yaml
kind: DaemonSet
...
spec:
...
template:
...
spec:
containers:
- name: kube-proxy
image: k8s.gcr.io/kube-proxy:v{{< k8s_release >}}
tolerations:
- ...
```
to:
```yaml
kind: DaemonSet
...
spec:
...
template:
...
spec:
containers:
- name: kube-proxy
image: k8s.gcr.io/kube-proxy:v{{< k8s_release >}}
tolerations:
- ...
- key: node-role.kubernetes.io/control-plane
operator: Exists
effect: NoSchedule
```
To edit the `DaemonSet`, run:
```bash
kubectl edit daemonsets -n kube-system kube-proxy
```
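Alternatively, just the image can be bumped with `kubectl set image` (a sketch of an equivalent shortcut, not the documented procedure; the toleration change shown above still has to be made with `kubectl edit`):
```bash
# Update only the kube-proxy container image in the daemonset.
kubectl set image daemonset/kube-proxy -n kube-system \
  kube-proxy=k8s.gcr.io/kube-proxy:v{{< k8s_release >}}
```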
### Bootstrap Manifests
Bootstrap manifests can be retrieved in a format which works for `kubectl` with the following command:
```bash
talosctl -n <master IP> get manifests -o yaml | yq eval-all '.spec | .[] | splitDoc' - > manifests.yaml
```
Diff the manifests with the cluster:
```bash
kubectl diff -f manifests.yaml
```
Apply the manifests:
```bash
kubectl apply -f manifests.yaml
```
> Note: if some bootstrap resources were removed, they have to be removed from the cluster manually.
### kubelet
For every node, patch the machine configuration with the new kubelet version and wait for the kubelet to restart with the new version:
```bash
$ talosctl -n <IP> patch mc --mode=no-reboot -p '[{"op": "replace", "path": "/machine/kubelet/image", "value": "ghcr.io/siderolabs/kubelet:v{{< k8s_release >}}"}]'
patched mc at the node 172.20.0.2
```
Once `kubelet` restarts with the new configuration, confirm upgrade with `kubectl get nodes <name>`:
```bash
$ kubectl get nodes talos-default-master-1
NAME STATUS ROLES AGE VERSION
talos-default-master-1 Ready control-plane,master 123m v{{< k8s_release >}}
```

View File

@ -0,0 +1,4 @@
---
title: "Learn More"
weight: 80
---

View File

@ -0,0 +1,52 @@
---
title: "Architecture"
weight: 20
description: "Learn the system architecture of Talos Linux itself."
---
Talos is designed to be **atomic** in _deployment_ and **modular** in _composition_.
It is atomic in that the entirety of Talos is distributed as a
single, self-contained image, which is versioned, signed, and immutable.
It is modular in that it is composed of many separate components
which have clearly defined gRPC interfaces which facilitate internal flexibility
and external operational guarantees.
All of the main Talos components communicate with each other by gRPC, through a socket on the local machine.
This imposes a clear separation of concerns and ensures that changes over time which affect the interoperation of components are a part of the public git record.
The benefit is that each component may be iterated and changed as its needs dictate, so long as the external API is controlled.
This is a key component in reducing coupling and maintaining modularity.
## File system partitions
Talos uses these partitions with the following labels:
1. **EFI** - stores EFI boot data.
1. **BIOS** - used for GRUB's second stage boot.
1. **BOOT** - used for the boot loader, stores initramfs and kernel data.
1. **META** - stores metadata about the Talos node, such as node IDs.
1. **STATE** - stores the machine configuration and node identity data for cluster discovery and KubeSpan.
1. **EPHEMERAL** - stores ephemeral state information, mounted at `/var`.
## The File System
One of the unique design decisions in Talos is the layout of the root file system.
There are three "layers" to the Talos root file system.
At its core the rootfs is a read-only squashfs.
The squashfs is then mounted as a loop device into memory.
This provides Talos with an immutable base.
The next layer is a set of `tmpfs` file systems for runtime specific needs.
Aside from the standard pseudo file systems such as `/dev`, `/proc`, `/run`, `/sys` and `/tmp`, a special `/system` is created for internal needs.
One reason for this is that we need special files such as `/etc/hosts` and `/etc/resolv.conf` to be writable (remember that the rootfs is read-only).
For example, at boot Talos will write `/system/etc/hosts` and then bind mount it over `/etc/hosts`.
This means that instead of making all of `/etc` writable, Talos only makes very specific files writable under `/etc`.
All files under `/system` are completely recreated on each boot.
For files and directories that need to persist across boots, Talos creates `overlayfs` file systems.
The `/etc/kubernetes` directory is a good example of this.
Directories like this are `overlayfs` backed by an XFS file system mounted at `/var`.
The `/var` directory is owned by Kubernetes with the exception of the above `overlayfs` file systems.
This directory is writable and used by `etcd` (in the case of control plane nodes), the kubelet, and the CRI (containerd).
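This layout can be inspected on a running node with `talosctl` (a quick sketch; the exact output varies by machine):
```bash
# Show the mounted file systems, including the squashfs rootfs, tmpfs mounts, and /var.
talosctl -n <node ip> mounts

# List the writable files Talos maintains under /system/etc.
talosctl -n <node ip> list /system/etc
```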

View File

@ -0,0 +1,124 @@
---
title: "Components"
weight: 40
description: "Understand the system components that make up Talos Linux."
---
In this section, we discuss the various components that underpin Talos.
## Components
| Component | Description |
| ------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| apid | When interacting with Talos, the gRPC API endpoint you interact with directly is provided by `apid`. `apid` acts as the gateway for all component interactions and forwards the requests to `machined`. |
| containerd | An industry-standard container runtime with an emphasis on simplicity, robustness, and portability. To learn more, see the [containerd website](https://containerd.io). |
| machined | Talos replacement for the traditional Linux init-process. Specially designed to run Kubernetes and does not allow starting arbitrary user services. |
| networkd | Handles all of the host level network configuration. The configuration is defined under the `networking` key |
| kernel | The Linux kernel included with Talos is configured according to the recommendations outlined in the [Kernel Self Protection Project](http://kernsec.org/wiki/index.php/Kernel_Self_Protection_Project). |
| trustd | To run and operate a Kubernetes cluster, a certain level of trust is required. Based on the concept of a 'Root of Trust', `trustd` is a simple daemon responsible for establishing trust within the system. |
| udevd | Implementation of `eudev` into `machined`. `eudev` is Gentoo's fork of udev, systemd's device file manager for the Linux kernel. It manages device nodes in /dev and handles all user space actions when adding or removing devices. To learn more, see the [Gentoo Wiki](https://wiki.gentoo.org/wiki/Eudev). |
### apid
When interacting with Talos, the gRPC API endpoint you will interact with directly is `apid`.
Apid acts as the gateway for all component interactions.
Apid provides a mechanism to route requests to the appropriate destination when running on a control plane node.
We'll use some examples below to illustrate what `apid` is doing.
When a user wants to interact with a Talos component via `talosctl`, there are two flags that control the interaction with `apid`.
The `-e | --endpoints` flag specifies which Talos node ( via `apid` ) should handle the connection.
Typically this is a public-facing server.
The `-n | --nodes` flag specifies which Talos node(s) should respond to the request.
If `--nodes` is omitted, the first endpoint will be used.
> Note: Typically, there will be an `endpoint` already defined in the Talos config file.
> Optionally, `nodes` can be included here as well.
For example, if a user wants to interact with `machined`, a command like `talosctl -e cluster.talos.dev memory` may be used.
```bash
$ talosctl -e cluster.talos.dev memory
NODE TOTAL USED FREE SHARED BUFFERS CACHE AVAILABLE
cluster.talos.dev 7938 1768 2390 145 53 3724 6571
```
In this case, `talosctl` is interacting with `apid` running on `cluster.talos.dev` and forwarding the request to the `machined` api.
If we wanted to extend our example to retrieve `memory` from another node in our cluster, we could use the command `talosctl -e cluster.talos.dev -n node02 memory`.
```bash
$ talosctl -e cluster.talos.dev -n node02 memory
NODE TOTAL USED FREE SHARED BUFFERS CACHE AVAILABLE
node02 7938 1768 2390 145 53 3724 6571
```
The `apid` instance on `cluster.talos.dev` receives the request and forwards it to `apid` running on `node02`, which forwards the request to the `machined` api.
We can further extend our example to retrieve `memory` for all nodes in our cluster by appending additional `-n node` flags or using a comma separated list of nodes ( `-n node01,node02,node03` ):
```bash
$ talosctl -e cluster.talos.dev -n node01 -n node02 -n node03 memory
NODE TOTAL USED FREE SHARED BUFFERS CACHE AVAILABLE
node01 7938 871 4071 137 49 2945 7042
node02 257844 14408 190796 18138 49 52589 227492
node03 257844 1830 255186 125 49 777 254556
```
The `apid` instance on `cluster.talos.dev` receives the request and forwards it to `node01`, `node02`, and `node03`, which then forwards the request to their local `machined` api.
### containerd
[Containerd](https://github.com/containerd/containerd) provides the container runtime to launch workloads on Talos and Kubernetes.
Talos services are namespaced under the `system` namespace in containerd, whereas the Kubernetes services are namespaced under the `k8s.io` namespace.
### machined
A common theme throughout the design of Talos is minimalism.
We believe strongly in the UNIX philosophy that each program should do one job well.
The `init` included in Talos is one example of this, and we are calling it "`machined`".
We wanted to create a focused `init` that had one job - run Kubernetes.
To that extent, `machined` is relatively static in that it does not allow for arbitrary user-defined services.
Only the services necessary to run Kubernetes and manage the node are available.
This includes:
- containerd
- [kubelet](https://kubernetes.io/docs/concepts/overview/components/)
- networkd
- trustd
- udevd
### networkd
Networkd handles all of the host level network configuration.
The configuration is defined under the `networking` key.
By default, we attempt to issue a DHCP request for every interface on the server.
This can be overridden by supplying one of the following kernel arguments:
- `talos.network.interface.ignore` - specify a list of interfaces to skip discovery on
- `ip` - `ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>:<dns0-ip>:<dns1-ip>:<ntp0-ip>` as documented in the [kernel here](https://www.kernel.org/doc/Documentation/filesystems/nfs/nfsroot.txt)
- ex, `ip=10.0.0.99:::255.0.0.0:control-1:eth0:off:10.0.0.1`
### kernel
The Linux kernel included with Talos is configured according to the recommendations outlined in the Kernel Self Protection Project ([KSPP](http://kernsec.org/wiki/index.php/Kernel_Self_Protection_Project)).
### trustd
Security is one of the highest priorities within Talos.
To run a Kubernetes cluster, a certain level of trust is required to operate a cluster.
For example, orchestrating the bootstrap of a highly available control plane requires sensitive PKI data distribution.
To that end, we created `trustd`.
Based on a Root of Trust concept, `trustd` is a simple daemon responsible for establishing trust within the system.
Once trust is established, various methods become available to the trustee.
For example, it can accept a write request from another node to place a file on disk.
Additional methods and capabilities will be added to the `trustd` component to support new functionality in the rest of the Talos environment.
### udevd
Udevd handles the kernel device notifications and sets up the necessary links in `/dev`.

View File

@ -0,0 +1,143 @@
---
title: "Concepts"
weight: 30
description: "Summary of Talos Linux."
---
When people come across Talos, they frequently want a nice, bite-sized summary
of it.
This is surprisingly difficult when Talos represents such a
fundamentally-rethought operating system.
## Not based on X distro
A useful way to summarize an operating system is to say that it is based on X, but focused on Y.
For instance, Mint was originally based on Ubuntu, but focused on Gnome 2 (instead of, at the time, Unity).
Or maybe something like Raspbian is based on Debian, but it is focused on the Raspberry Pi.
CentOS is RHEL, but made license-free.
Talos Linux _isn't_ based on any other distribution.
We often think of ourselves as being the second-generation of
container-optimised operating systems, where things like CoreOS, Flatcar, and Rancher represent the first generation, but that implies heredity where there is none.
Talos Linux is actually a ground-up rewrite of the userspace, from PID 1.
We run the Linux kernel, but everything downstream of that is our own custom
code, written in Go, rigorously-tested, and published as an immutable,
integrated, cohesive image.
The Linux kernel launches what we call `machined`, for instance, not `systemd`.
There is no `systemd` on our system.
There are no GNU utilities, no shell, no SSH, no packages, nothing you could associate with
any other distribution.
We don't even have a build toolchain in the normal sense of the word.
## Not for individual use
Technically, Talos Linux installs to a computer much as other operating systems.
_Unlike_ other operating systems, Talos is not meant to run alone, on a
single machine.
Talos Linux comes with tooling from the very foundation to form clusters, even
before Kubernetes comes into play.
A design goal of Talos Linux is eliminating the management
of individual nodes as much as possible.
In order to do that, Talos Linux operates as a cluster of machines, with lots of
checking and coordination between them, at all levels.
Break from your mind the idea of running an application on a computer.
There are no individual computers.
There is only a cluster.
Talos is meant to do one thing: maintain a Kubernetes cluster, and it does this
very, very well.
The entirety of the configuration of any machine is specified by a single,
simple configuration file, which can often be the _same_ configuration file used
across _many_ machines.
Much like a biological system, if some component misbehaves, just cut it out and
let a replacement grow.
Rebuilds of Talos are remarkably fast, whether they be new machines, upgrades,
or reinstalls.
Never get hung up on an individual machine.
## Control Planes are not linear replicas
People familiar with traditional relational database replication often
overlook a critical design concept of the Kubernetes (and Talos) database:
`etcd`.
Unlike linear replicas, which have dedicated masters and slaves/replicas, `etcd`
is highly dynamic.
The `master` (leader) in an `etcd` cluster is entirely transient.
This means fail-overs are handled easily, and usually without operators even noticing.
This _also_ means that the operational architecture is fundamentally different.
When properly managed (as Talos Linux does), `etcd` should never have split brain
and should never encounter noticeable downtime.
In order to do this, though, `etcd` maintains the concept of "membership" and of
"quorum".
In order to perform _any_ operation, read _or_ write, the database requires
quorum to be sustained.
That is, a _strict_ majority must agree on the current leader, and absenteeism
counts as a negative.
In other words, if there are three registered members (voters), at least two out
of the three must be actively asserting that the current master _is_ the master.
If any two disagree or even fail to answer, the `etcd` database will lock itself
until quorum is again achieved in order to protect itself and the integrity of
the data.
This is fantastically important for handling distributed systems and the various
types of contention which may arise.
This design means, however, that having an incorrect number of members can be
devastating.
Having only two controlplane nodes, for instance, is mostly _worse_ than having
only one, because if _either_ goes down, your entire database will lock.
You would be better off just making periodic snapshots of the data and restoring
it when necessary.
Another common situation occurs when replacing controlplane nodes.
If you have three controlplane nodes and replace one by adding the new node before removing the old member, you will not have three
members; you will have four, and one of those will never be available again.
Thus, if _any_ of your three remaining nodes goes down, your database will lock,
because only two out of the four members will be available: four nodes is
_worse_ than three nodes!
So it is critical that controlplane members which are replaced be removed.
Luckily, the Talos API makes this easy.
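As a sketch of that workflow (the node address and hostname below are placeholders, and the old node is assumed to be already offline), the stale member can be removed via the `etcd` API on a surviving node; if the node being replaced is still running, `talosctl reset` on it accomplishes the same thing as part of the reset sequence:

```bash
# List the current etcd members to find the stale one.
talosctl -n 172.20.0.2 etcd members

# Remove the member belonging to the replaced node by its hostname.
talosctl -n 172.20.0.2 etcd remove-member talos-cp-old
```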
## Bootstrap once
In the old days, Talos Linux had the idea of an `init` node.
The `init` node was a "special" controlplane node which was designated as the
founder of the cluster.
It was the first, was guaranteed to be the elector, and was authorised to create
a cluster...
even if one already existed.
This made the formation of a cluster really easy, but it had a lot of
downsides.
Mostly, these related to rebuilding or replacing that `init` node:
you could easily end up with a split-brain scenario in which you had two different clusters:
a single-node cluster and a two-node cluster.
Needless to say, this was an unhappy arrangement.
Fortunately, `init` nodes are gone, but that means that the critical operation
of forming a cluster is a manual process.
It's an _easy_ process, consisting of a single API call, but it can be a
confusing one, until you understand what it does.
Every new cluster must be bootstrapped exactly and only once.
This means you do NOT bootstrap each node in a cluster, not even each
controlplane node.
You bootstrap only a _single_ controlplane node, because you are bootstrapping the
_cluster_, not the node.
It doesn't matter _which_ controlplane node is told to bootstrap, but it must be
a controlplane node, and it must be only one.
Bootstrapping is _fast_ and sure.
Even if your Kubernetes cluster fails to form for other reasons (say, a bad
configuration option or unavailable container repository), if the bootstrap API
call returns successfully, you do NOT need to bootstrap again:
just fix the config or let Kubernetes retry.
Bootstrapping itself does not do anything with Kubernetes.
Bootstrapping only tells `etcd` to form a cluster, so don't judge the success of
a bootstrap by the failure of Kubernetes to start.
Kubernetes relies on `etcd`, so bootstrapping is _required_, but it is not
_sufficient_ for Kubernetes to start.
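As a concrete sketch (the node address is a placeholder for one of your control plane nodes), the whole operation is a single API call made through `talosctl`:

```bash
# Bootstrap etcd on exactly one control plane node, exactly once per cluster.
talosctl bootstrap --nodes 172.20.0.2 --endpoints 172.20.0.2 --talosconfig=./talosconfig
```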

View File

@ -0,0 +1,68 @@
---
title: "Control Plane"
weight: 50
description: "Understand the Kubernetes Control Plane."
---
This guide provides details on how Talos runs and bootstraps the Kubernetes control plane.
### High-level Overview
Talos cluster bootstrap flow:
1. The `etcd` service is started on control plane nodes.
Instances of `etcd` on control plane nodes build the `etcd` cluster.
2. The `kubelet` service is started.
3. Control plane components are started as static pods via the `kubelet`, and the `kube-apiserver` component connects to the local (running on the same node) `etcd` instance.
4. The `kubelet` obtains a client certificate using the bootstrap token via the control plane endpoint (handled by `kube-apiserver` and `kube-controller-manager`).
5. The `kubelet` registers the node in the API server.
6. The Kubernetes control plane schedules pods on the nodes.
### Cluster Bootstrapping
All nodes start the `kubelet` service.
The `kubelet` tries to contact the control plane endpoint, but as it is not up yet, it keeps retrying.
One of the control plane nodes is chosen as the bootstrap node.
The node's type can be either `init` or `controlplane`, where the `controlplane` type is promoted using the bootstrap API (`talosctl bootstrap`).
The bootstrap node initiates the `etcd` bootstrap process by initializing `etcd` as the first member of the cluster.
> Note: there should be only one bootstrap node for the cluster lifetime.
> Once `etcd` is bootstrapped, the bootstrap node has no special role and acts the same way as other control plane nodes.
The `etcd` services on non-bootstrap nodes try to get the `Endpoints` resource via the control plane endpoint, but the request fails, as the control plane endpoint is not up yet.
As soon as `etcd` is up on the bootstrap node, static pod definitions for the Kubernetes control plane components (`kube-apiserver`, `kube-controller-manager`, `kube-scheduler`) are rendered to disk.
The `kubelet` service on the bootstrap node picks up the static pod definitions and starts the Kubernetes control plane components.
As soon as `kube-apiserver` is launched, the control plane endpoint comes up.
The bootstrap node acquires an `etcd` mutex and injects the bootstrap manifests into the API server.
The bootstrap manifests specify the Kubernetes join token and enable kubelet CSR auto-approval.
The `kubelet` service on each node is now able to obtain a client certificate for itself and register its node with the API server.
Other bootstrap manifests specify additional resources critical for Kubernetes operation (e.g. CNI, PSP, etc.).
The `etcd` service on non-bootstrap nodes is now able to discover other members of the `etcd` cluster via the Kubernetes `Endpoints` resource.
The `etcd` cluster is now formed and consists of all control plane nodes.
All control plane nodes render static pod manifests for the control plane components.
Each node now runs a full set of components to make the control plane HA.
The `kubelet` service on worker nodes is now able to obtain its client certificate and register the node with the API server.
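The progress of this sequence can be followed from `talosctl`; a minimal sketch, with the node address as a placeholder:

```bash
# Check that the etcd service is running and healthy on a control plane node.
talosctl -n 172.20.0.2 service etcd

# List etcd members as control plane nodes join the cluster.
talosctl -n 172.20.0.2 etcd members

# Watch the control plane static pods come up.
talosctl -n 172.20.0.2 get staticpodstatus --watch
```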
### Scaling Up the Control Plane
When new nodes are added to the control plane, the process is the same as the bootstrap process above: the `etcd` service discovers existing members of the control plane via the
control plane endpoint, joins the `etcd` cluster, and the control plane components are scheduled on the node.
### Scaling Down the Control Plane
Scaling down the control plane involves removing a node from the cluster.
The most critical part is making sure that the node which is being removed leaves the `etcd` cluster.
When using the `talosctl reset` command, the targeted control plane node leaves the `etcd` cluster as part of the reset sequence.
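For illustration (the node address is a placeholder), a graceful reset of the node being removed looks like this:

```bash
# Gracefully remove a control plane node: it leaves etcd as part of the reset
# sequence, wipes its ephemeral data, and reboots.
talosctl -n 172.20.0.4 reset --graceful --reboot
```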
### Upgrading Control Plane Nodes
When a control plane node is upgraded, Talos leaves `etcd`, wipes the system disk, installs a new version of itself, and reboots.
The upgraded node then joins the `etcd` cluster on reboot.
So upgrading a control plane node is equivalent to scaling down the control plane node followed by scaling up with a new version of Talos.
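A sketch of upgrading a single control plane node (the node address and installer tag are placeholders):

```bash
# Upgrade one control plane node; it leaves etcd, reinstalls itself, reboots, and rejoins.
talosctl -n 172.20.0.2 upgrade --image ghcr.io/siderolabs/installer:v1.2.0
```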

View File

@ -0,0 +1,230 @@
---
title: "Controllers and Resources"
weight: 60
description: "Discover how Talos Linux uses the concepts on Controllers and Resources."
---
<!-- markdownlint-disable MD038 -->
Talos implements concepts of *resources* and *controllers* to facilitate internal operations of the operating system.
Talos resources and controllers are very similar to Kubernetes resources and controllers, but there are some differences.
The content of this document is not required to operate Talos, but it is useful for troubleshooting.
Starting with Talos 0.9, most of the Kubernetes control plane bootstrapping and operations are implemented via controllers and resources, which allows Talos to react to configuration changes and environment changes (e.g. time sync).
## Resources
A resource captures a piece of system state.
Each resource belongs to a "Type" which defines resource contents.
Resource state can be split into two parts:
* metadata: a fixed set of fields describing the resource - namespace, type, ID, etc.
* spec: the contents of the resource (depends on the resource type).
A resource is uniquely identified by the tuple (`namespace`, `type`, `id`).
Namespaces provide a way to avoid conflicts on duplicate resource IDs.
As of this writing, all resources are local to the node and stored in memory,
so on every reboot the resource state is rebuilt from scratch (the only exception is the `MachineConfig` resource, which reflects the current machine config).
## Controllers
Controllers run as independent lightweight threads in Talos.
The goal of the controller is to reconcile the state based on inputs and eventually update outputs.
A controller can have any number of resource types (and namespaces) as inputs.
In other words, it watches specified resources for changes and reconciles when these changes occur.
A controller might also have additional inputs: running reconcile on schedule, watching `etcd` keys, etc.
A controller has a single output: a set of resources of fixed type in a fixed namespace.
Only one controller can manage a given resource type in a given namespace, so conflicts are avoided.
## Querying Resources
The Talos CLI tool `talosctl` provides read-only access to the resource API, which includes getting a specific resource,
listing resources, and watching for changes.
Talos stores resources describing resource types and namespaces in the `meta` namespace:
```bash
$ talosctl get resourcedefinitions
NODE NAMESPACE TYPE ID VERSION
172.20.0.2 meta ResourceDefinition bootstrapstatuses.v1alpha1.talos.dev 1
172.20.0.2 meta ResourceDefinition etcdsecrets.secrets.talos.dev 1
172.20.0.2 meta ResourceDefinition kubernetescontrolplaneconfigs.config.talos.dev 1
172.20.0.2 meta ResourceDefinition kubernetessecrets.secrets.talos.dev 1
172.20.0.2 meta ResourceDefinition machineconfigs.config.talos.dev 1
172.20.0.2 meta ResourceDefinition machinetypes.config.talos.dev 1
172.20.0.2 meta ResourceDefinition manifests.kubernetes.talos.dev 1
172.20.0.2 meta ResourceDefinition manifeststatuses.kubernetes.talos.dev 1
172.20.0.2 meta ResourceDefinition namespaces.meta.cosi.dev 1
172.20.0.2 meta ResourceDefinition resourcedefinitions.meta.cosi.dev 1
172.20.0.2 meta ResourceDefinition rootsecrets.secrets.talos.dev 1
172.20.0.2 meta ResourceDefinition secretstatuses.kubernetes.talos.dev 1
172.20.0.2 meta ResourceDefinition services.v1alpha1.talos.dev 1
172.20.0.2 meta ResourceDefinition staticpods.kubernetes.talos.dev 1
172.20.0.2 meta ResourceDefinition staticpodstatuses.kubernetes.talos.dev 1
172.20.0.2 meta ResourceDefinition timestatuses.v1alpha1.talos.dev 1
```
```bash
$ talosctl get namespaces
NODE NAMESPACE TYPE ID VERSION
172.20.0.2 meta Namespace config 1
172.20.0.2 meta Namespace controlplane 1
172.20.0.2 meta Namespace meta 1
172.20.0.2 meta Namespace runtime 1
172.20.0.2 meta Namespace secrets 1
```
Most of the time the namespace flag (`--namespace`) can be omitted, as each `ResourceDefinition` contains a default
namespace which is used if no namespace is given:
```bash
$ talosctl get resourcedefinitions resourcedefinitions.meta.cosi.dev -o yaml
node: 172.20.0.2
metadata:
namespace: meta
type: ResourceDefinitions.meta.cosi.dev
id: resourcedefinitions.meta.cosi.dev
version: 1
phase: running
spec:
type: ResourceDefinitions.meta.cosi.dev
displayType: ResourceDefinition
aliases:
- resourcedefinitions
- resourcedefinition
- resourcedefinitions.meta
- resourcedefinitions.meta.cosi
- rd
- rds
printColumns: []
defaultNamespace: meta
```
The resource definition also contains type aliases which can be used interchangeably with the canonical resource name:
```bash
$ talosctl get ns config
NODE NAMESPACE TYPE ID VERSION
172.20.0.2 meta Namespace config 1
```
### Output
The `talosctl get` command supports the following output modes:
* `table` (default) prints resource list as a table
* `yaml` prints pretty-formatted resources with details, including the full metadata spec.
This format carries most details from the backend resource (e.g. comments in the `MachineConfig` resource).
* `json` prints the same information as `yaml`, though some additional details (e.g. comments) might be lost.
This format is useful for automated processing with tools like `jq`.
### Watching Changes
If the `--watch` flag is appended to the `talosctl get` command, the command switches to watch mode.
If a list of resources was requested, `talosctl` prints the initial contents of the list and then appends resource information for every change:
```bash
$ talosctl get svc -w
NODE * NAMESPACE TYPE ID VERSION RUNNING HEALTHY
172.20.0.2 + runtime Service timed 2 true true
172.20.0.2 + runtime Service trustd 2 true true
172.20.0.2 + runtime Service udevd 2 true true
172.20.0.2 - runtime Service timed 2 true true
172.20.0.2 + runtime Service timed 1 true false
172.20.0.2 runtime Service timed 2 true true
```
The `*` column specifies the event type:
* `+` is created
* `-` is deleted
* ` ` is updated
In YAML/JSON output, the `event` field is added to the resource representation to describe the event type.
### Examples
Getting machine config:
```bash
$ talosctl get machineconfig -o yaml
node: 172.20.0.2
metadata:
namespace: config
type: MachineConfigs.config.talos.dev
id: v1alpha1
version: 2
phase: running
spec:
version: v1alpha1 # Indicates the schema used to decode the contents.
debug: false # Enable verbose logging to the console.
persist: true # Indicates whether to pull the machine config upon every boot.
# Provides machine specific configuration options.
...
```
Getting control plane static pod statuses:
```bash
$ talosctl get staticpodstatus
NODE NAMESPACE TYPE ID VERSION READY
172.20.0.2 controlplane StaticPodStatus kube-system/kube-apiserver-talos-default-master-1 3 True
172.20.0.2 controlplane StaticPodStatus kube-system/kube-controller-manager-talos-default-master-1 3 True
172.20.0.2 controlplane StaticPodStatus kube-system/kube-scheduler-talos-default-master-1 4 True
```
Getting static pod definition for `kube-apiserver`:
```bash
$ talosctl get sp kube-apiserver -n 172.20.0.2 -o yaml
node: 172.20.0.2
metadata:
namespace: controlplane
type: StaticPods.kubernetes.talos.dev
id: kube-apiserver
version: 3
phase: running
finalizers:
- k8s.StaticPodStatus("kube-apiserver")
spec:
apiVersion: v1
kind: Pod
metadata:
annotations:
talos.dev/config-version: "1"
talos.dev/secrets-version: "2"
...
```
## Inspecting Controller Dependencies
Talos can report current dependencies between controllers and resources for debugging purposes:
```bash
$ talosctl inspect dependencies
digraph {
n1[label="config.K8sControlPlaneController",shape="box"];
n3[label="config.MachineTypeController",shape="box"];
n2[fillcolor="azure2",label="config:KubernetesControlPlaneConfigs.config.talos.dev",shape="note",style="filled"];
...
```
This outputs the graph in `graphviz` format, which can be rendered to a PNG with the following command:
```bash
talosctl inspect dependencies | dot -T png > deps.png
```
![Controller Dependencies](/images/controller-dependencies-v2.png)
The graph can be enhanced by replacing resource types with actual resource instances:
```bash
talosctl inspect dependencies --with-resources | dot -T png > deps.png
```
![Controller Dependencies with Resources](/images/controller-dependencies-with-resources-v2.png)

View File

@ -0,0 +1,21 @@
---
title: "Discovery"
weight: 90
description: "Discover how Sidero Labs implements Talos node discovery."
---
We maintain a public discovery service whereby members of your cluster can use a shared key that is globally unique to coordinate the most basic connection information (i.e. the set of possible "endpoints", or IP:port pairs).
We call this data "affiliate data."
> Note: If KubeSpan is enabled, the data also includes the WireGuard public key.
Before sending data to the discovery service, Talos will encrypt the affiliate data with AES-GCM encryption and separately encrypt endpoints with AES in ECB mode so that endpoints coming from different sources can be deduplicated server-side.
Each node submits its own encrypted data, plus the endpoints it sees from other peers, to the discovery service.
The discovery service aggregates the data, deduplicates the endpoints, and sends updates to each connected peer.
Each peer receives information back about other affiliates from the discovery service, decrypts it and uses it to drive KubeSpan and cluster discovery.
The discovery service has no persistence.
Data is stored in memory only with a TTL set by the clients (i.e. Talos).
The cluster ID is used as a key to select the affiliates (so that different clusters see different affiliates).
To summarize, the discovery service knows the client version, cluster ID, the number of affiliates, some encrypted data for each affiliate, and a list of encrypted endpoints.
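The affiliate data a node has learned can be inspected locally; a minimal sketch, assuming cluster discovery is enabled and with the node address as a placeholder:

```bash
# Affiliates discovered from all registries (Kubernetes and/or the discovery service).
talosctl -n 172.20.0.2 get affiliates
```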

View File

@ -0,0 +1,39 @@
---
title: "FAQs"
weight: 999
description: "Frequently Asked Questions about Talos Linux."
---
<!-- markdownlint-disable MD026 -->
## How is Talos different from other container optimized Linux distros?
Talos shares a lot of attributes with other distros, but there are some important differences.
Talos integrates tightly with Kubernetes, and is not meant to be a general-purpose operating system.
The most important difference is that Talos is fully controlled by an API via a gRPC interface, instead of an ordinary shell.
We don't ship SSH, and there is no console access.
Removing components such as these has allowed us to dramatically reduce the footprint of Talos, and in turn, improve a number of other areas like security, predictability, reliability, and consistency across platforms.
It's a big change from how operating systems have been managed in the past, but we believe that API-driven OSes are the future.
## Why no shell or SSH?
Since Talos is fully API-driven, all maintenance and debugging operations should be possible via the OS API.
We would like for Talos users to start thinking about what a "machine" is in the context of a Kubernetes cluster.
That is, that a Kubernetes _cluster_ can be thought of as one massive machine, and the _nodes_ are merely additional, undifferentiated resources.
We don't want humans to focus on the _nodes_, but rather on the _machine_ that is the Kubernetes cluster.
Should an issue arise at the node level, `talosctl` should provide the necessary tooling to assist in the identification, debugging, and remediation of the issue.
However, the API is based on the Principle of Least Privilege, and exposes only a limited set of methods.
We envision Talos being a great place for the application of [control theory](https://en.wikipedia.org/wiki/Control_theory) in order to provide a self-healing platform.
## Why the name "Talos"?
Talos was an automaton created by the Greek god of the forge to protect the island of Crete.
He would patrol the coast and enforce laws throughout the land.
We felt it was a fitting name for a security focused operating system designed to run Kubernetes.
## Why does Talos rely on a separate configuration from Kubernetes?
The `talosconfig` file contains client credentials to access the Talos Linux API.
Sometimes Kubernetes might be down for a number of reasons (etcd issues, misconfiguration, etc.), while Talos API access will always be available.
The Talos API is a way to access the operating system and fix issues, e.g. fixing access to Kubernetes.
When Talos Linux is running fine, using the Kubernetes APIs (via `kubeconfig`) is all you should need to deploy and manage Kubernetes workloads.

View File

@ -0,0 +1,35 @@
---
title: "Knowledge Base"
weight: 1999
description: "Recipes for common configuration tasks with Talos Linux."
---
## Disabling `GracefulNodeShutdown` on a node
Talos Linux enables the [Graceful Node Shutdown](https://kubernetes.io/docs/concepts/architecture/nodes/#graceful-node-shutdown) Kubernetes feature by default.
To disable this feature, modify the `kubelet` part of the machine configuration with:
```yaml
machine:
kubelet:
extraArgs:
feature-gates: GracefulNodeShutdown=false
extraConfig:
shutdownGracePeriod: 0s
shutdownGracePeriodCriticalPods: 0s
```
## Generating Talos Linux ISO image with custom kernel arguments
Pass additional kernel arguments using the `--extra-kernel-arg` flag:
```shell
$ docker run --rm -i ghcr.io/siderolabs/imager:{{< release >}} iso --arch amd64 --tar-to-stdout --extra-kernel-arg console=ttyS1 --extra-kernel-arg console=tty0 | tar xz
2022/05/25 13:18:47 copying /usr/install/amd64/vmlinuz to /mnt/boot/vmlinuz
2022/05/25 13:18:47 copying /usr/install/amd64/initramfs.xz to /mnt/boot/initramfs.xz
2022/05/25 13:18:47 creating grub.cfg
2022/05/25 13:18:47 creating ISO
```
The ISO will be output to the file `talos-<arch>.iso` in the current directory.

View File

@ -0,0 +1,100 @@
---
title: "KubeSpan"
weight: 100
description: "Understand more about KubeSpan for Talos Linux."
---
## WireGuard Peer Discovery
The key pieces of information needed for WireGuard generally are:
- the public key of the host you wish to connect to
- an IP address and port of the host you wish to connect to
The latter is really only required of _one_ side of the pair.
Once traffic is received, that information is known and updated by WireGuard automatically and internally.
For Kubernetes, though, this is not quite sufficient.
Kubernetes also needs to know which traffic goes to which WireGuard peer.
Because this information may be dynamic, we need a way to be able to constantly keep this information up to date.
If we have a functional connection to Kubernetes otherwise, it's fairly easy: we can just keep that information in Kubernetes.
Otherwise, we have to have some way to discover it.
In our solution, we have a multi-tiered approach to gathering this information.
Each tier can operate independently, but the amalgamation of the tiers produces a more robust set of connection criteria.
For this discussion, we will point out two of these tiers:
- an external service
- a Kubernetes-based system
See [discovery service]({{< relref "discovery" >}}) to learn more about the external service.
The Kubernetes-based system utilises annotations on Kubernetes Nodes which describe each node's public key and local addresses.
On top of this, we also route Pod subnets.
This is often (maybe even usually) taken care of by the CNI, but there are many situations where the CNI is unable to do this itself across networks.
So we also scrape the Kubernetes Node resource to discover its `podCIDRs`.
## NAT, Multiple Routes, Multiple IPs
One of the difficulties in communicating across networks is that there is often not a single address and port which can identify a connection for each node on the system.
For instance, a node sitting on the same network might see its peer as `192.168.2.10`, but a node across the internet may see it as `2001:db8:1ef1::10`.
We need to be able to handle any number of addresses and ports, and we also need to have a mechanism to _try_ them.
WireGuard only allows us to select one at a time.
For our implementation, then, we have built a controller which continuously discovers and rotates these IP:port pairs until a connection is established.
It then starts trying again if that connection ever fails.
## Packet Routing
After we have established a WireGuard connection, our work is not done.
We still have to make sure that the right packets get sent to the WireGuard interface.
WireGuard supplies a convenient facility for tagging packets which come from _it_, which is great.
But in our case, we need to be able to allow traffic which both does _not_ come from WireGuard and _also_ is not destined for another Kubernetes node to flow through the normal mechanisms.
Unlike many corporate or privacy-oriented VPNs, we need to allow general internet traffic to flow normally.
Also, as our cluster grows, this set of IP addresses can become quite large and quite dynamic.
This would be very cumbersome and slow in `iptables`.
Luckily, the kernel supplies a convenient mechanism by which to define this arbitrarily large set of IP addresses: IP sets.
Talos collects all of the IPs and subnets which are considered "in-cluster" and maintains these in the kernel as an IP set.
Now that we have the IP set defined, we need to tell the kernel how to use it.
The traditional way of doing this would be to use `iptables`.
However, there is a big problem with `iptables`: it is a common namespace in which any number of other pieces of software may dump things.
We have no assurance that what we add will not be wiped out by something else (from Kubernetes itself, to the CNI, to some workload application), be rendered unusable by higher-priority rules, or just generally cause trouble and conflicts.
Instead, we use a three-pronged system which is both more foundational and less centralised.
NFTables offers a separately namespaced, decentralised way of marking packets for later processing based on IP sets.
Instead of a common set of well-known tables, NFTables uses hooks into the kernel's netfilter system, which are less vulnerable to being usurped, bypassed, or a source of interference than IPTables, but which are rendered down by the kernel to the same underlying XTables system.
Our NFTables system is where we store the IP sets.
Any packet which enters the system, either by forward from inside Kubernetes or by generation from the host itself, is compared against a hash table of this IP set.
If it is matched, it is marked for later processing by our next stage.
This is a high-performance system which exists fully in the kernel and which ultimately becomes an eBPF program, so it scales well to hundreds of nodes.
The next stage is the kernel router's route rules.
These are defined as a common ordered list of operations for the whole operating system, but they are intended to be tightly constrained and are rarely used by applications in any case.
The rules we add are very simple: if a packet is marked by our NFTables system, send it to an alternate routing table.
This leads us to our third and final stage of packet routing.
We have a custom routing table with two rules:
- send all IPv4 traffic to the WireGuard interface
- send all IPv6 traffic to the WireGuard interface
So in summary, we:
- mark packets destined for Kubernetes applications or Kubernetes nodes
- send marked packets to a special routing table
- send anything which is sent to that routing table through the WireGuard interface
This gives us an isolated, resilient, tolerant, and non-invasive way to route Kubernetes traffic safely, automatically, and transparently through WireGuard across almost any set of network topologies.
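KubeSpan itself is enabled via the `.machine.network.kubespan` section of the machine configuration; once it is running, the state described above can be observed with `talosctl`. A minimal sketch, with the node address as a placeholder:

```bash
# The node's own KubeSpan (WireGuard) identity.
talosctl -n 172.20.0.2 get kubespanidentities

# The WireGuard peers KubeSpan has discovered and their connection state.
talosctl -n 172.20.0.2 get kubespanpeerstatuses
```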

View File

@ -0,0 +1,434 @@
---
title: "Networking Resources"
weight: 70
description: "Delve deeper into networking of Talos Linux."
---
The Talos network configuration subsystem is powered by [COSI]({{< relref "controllers-resources" >}}).
Talos translates network configuration from multiple sources (machine configuration, cloud metadata, automatic network configuration such as DHCP) into COSI resources.
Network configuration and network state can be inspected using the `talosctl get` command.
Network machine configuration can be modified using the `talosctl edit mc` command (or its variants `talosctl patch mc` and `talosctl apply-config`) without a reboot.
As API access requires a network connection, [`--mode=try`]({{< relref "../talos-guides/configuration/editing-machine-configuration" >}})
can be used to test the configuration with automatic rollback to avoid losing network access to the node.
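For example (the node address is a placeholder), a network change can be trialed with automatic rollback:

```bash
# Open the machine config in an editor and apply the change in "try" mode:
# it is rolled back automatically unless it is confirmed within the timeout.
talosctl -n 172.20.0.2 edit mc --mode=try
```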
## Resources
There are six basic network configuration items in Talos:
* `Address` (IP address assigned to the interface/link);
* `Route` (route to a destination);
* `Link` (network interface/link configuration);
* `Resolver` (list of DNS servers);
* `Hostname` (node hostname and domainname);
* `TimeServer` (list of NTP servers).
Each network configuration item has two counterparts:
* `*Status` (e.g. `LinkStatus`) describes the current state of the system (Linux kernel state);
* `*Spec` (e.g. `LinkSpec`) defines the desired configuration.
| Resource | Status | Spec |
|--------------------|------------------------|----------------------|
| `Address` | `AddressStatus` | `AddressSpec` |
| `Route` | `RouteStatus` | `RouteSpec` |
| `Link` | `LinkStatus` | `LinkSpec` |
| `Resolver` | `ResolverStatus` | `ResolverSpec` |
| `Hostname` | `HostnameStatus` | `HostnameSpec` |
| `TimeServer` | `TimeServerStatus` | `TimeServerSpec` |
Status resources have aliases with the `Status` suffix removed, so for example
`AddressStatus` is also available as `Address`.
Talos networking controllers reconcile the state so that `*Status` equals the desired `*Spec`.
## Observing State
The current network configuration state can be observed by querying `*Status` resources via
`talosctl`:
```sh
$ talosctl get addresses
NODE NAMESPACE TYPE ID VERSION ADDRESS LINK
172.20.0.2 network AddressStatus eth0/172.20.0.2/24 1 172.20.0.2/24 eth0
172.20.0.2 network AddressStatus eth0/fe80::9804:17ff:fe9d:3058/64 2 fe80::9804:17ff:fe9d:3058/64 eth0
172.20.0.2 network AddressStatus flannel.1/10.244.4.0/32 1 10.244.4.0/32 flannel.1
172.20.0.2 network AddressStatus flannel.1/fe80::10b5:44ff:fe62:6fb8/64 2 fe80::10b5:44ff:fe62:6fb8/64 flannel.1
172.20.0.2 network AddressStatus lo/127.0.0.1/8 1 127.0.0.1/8 lo
172.20.0.2 network AddressStatus lo/::1/128 1 ::1/128 lo
```
In the output there are addresses set up by Talos (e.g. `eth0/172.20.0.2/24`) and
addresses set up by other facilities (e.g. `flannel.1/10.244.4.0/32` set up by CNI).
Talos networking controllers watch the kernel state and update resources
accordingly.
Additional details about the address can be accessed via the YAML output:
```yaml
# talosctl get address eth0/172.20.0.2/24 -o yaml
node: 172.20.0.2
metadata:
namespace: network
type: AddressStatuses.net.talos.dev
id: eth0/172.20.0.2/24
version: 1
owner: network.AddressStatusController
phase: running
created: 2021-06-29T20:23:18Z
updated: 2021-06-29T20:23:18Z
spec:
address: 172.20.0.2/24
local: 172.20.0.2
broadcast: 172.20.0.255
linkIndex: 4
linkName: eth0
family: inet4
scope: global
flags: permanent
```
Resources can be watched for changes with the `--watch` flag to see how configuration changes over time.
Other networking status resources can be inspected with `talosctl get routes`, `talosctl get links`, etc.
For example:
```sh
$ talosctl get resolvers
NODE NAMESPACE TYPE ID VERSION RESOLVERS
172.20.0.2 network ResolverStatus resolvers 2 ["8.8.8.8","1.1.1.1"]
```
```yaml
# talosctl get links -o yaml
node: 172.20.0.2
metadata:
namespace: network
type: LinkStatuses.net.talos.dev
id: eth0
version: 2
owner: network.LinkStatusController
phase: running
created: 2021-06-29T20:23:18Z
updated: 2021-06-29T20:23:18Z
spec:
index: 4
type: ether
linkIndex: 0
flags: UP,BROADCAST,RUNNING,MULTICAST,LOWER_UP
hardwareAddr: 4e:95:8e:8f:e4:47
broadcastAddr: ff:ff:ff:ff:ff:ff
mtu: 1500
queueDisc: pfifo_fast
operationalState: up
kind: ""
slaveKind: ""
driver: virtio_net
linkState: true
speedMbit: 4294967295
port: Other
duplex: Unknown
```
## Inspecting Configuration
The desired networking configuration is combined from multiple sources and presented
as `*Spec` resources:
```sh
$ talosctl get addressspecs
NODE NAMESPACE TYPE ID VERSION
172.20.0.2 network AddressSpec eth0/172.20.0.2/24 2
172.20.0.2 network AddressSpec lo/127.0.0.1/8 2
172.20.0.2 network AddressSpec lo/::1/128 2
```
These `AddressSpecs` are applied to the Linux kernel to reach the desired state.
If, for example, an `AddressSpec` is removed, the address is removed from the Linux network interface as well.
`*Spec` resources can't be manipulated directly; they are generated automatically by Talos
from multiple configuration sources (see the section below for details).
If a `*Spec` resource is queried in YAML format, some additional information is available:
```yaml
# talosctl get addressspecs eth0/172.20.0.2/24 -o yaml
node: 172.20.0.2
metadata:
namespace: network
type: AddressSpecs.net.talos.dev
id: eth0/172.20.0.2/24
version: 2
owner: network.AddressMergeController
phase: running
created: 2021-06-29T20:23:18Z
updated: 2021-06-29T20:23:18Z
finalizers:
- network.AddressSpecController
spec:
address: 172.20.0.2/24
linkName: eth0
family: inet4
scope: global
flags: permanent
layer: operator
```
An important field is the `layer` field, which describes the configuration layer this spec comes from: in this case, it was generated by a network operator (see below), specifically the DHCPv4 operator.
## Configuration Merging
The spec resources described in the previous section show the final merged configuration state,
while the initial specs are placed in a separate unmerged namespace, `network-config`.
Spec resources in the `network-config` namespace are merged with conflict resolution to produce the final merged representation in the `network` namespace.
Let's take `HostnameSpec` as an example.
The final merged representation is:
```yaml
# talosctl get hostnamespec -o yaml
node: 172.20.0.2
metadata:
namespace: network
type: HostnameSpecs.net.talos.dev
id: hostname
version: 2
owner: network.HostnameMergeController
phase: running
created: 2021-06-29T20:23:18Z
updated: 2021-06-29T20:23:18Z
finalizers:
- network.HostnameSpecController
spec:
hostname: talos-default-master-1
domainname: ""
layer: operator
```
We can see that the final configuration for the hostname is `talos-default-master-1`.
And this is the hostname that was actually applied.
This can be verified by querying a `HostnameStatus` resource:
```sh
$ talosctl get hostnamestatus
NODE NAMESPACE TYPE ID VERSION HOSTNAME DOMAINNAME
172.20.0.2 network HostnameStatus hostname 1 talos-default-master-1
```
Initial configuration for the hostname in the `network-config` namespace is:
```yaml
# talosctl get hostnamespec -o yaml --namespace network-config
node: 172.20.0.2
metadata:
namespace: network-config
type: HostnameSpecs.net.talos.dev
id: default/hostname
version: 2
owner: network.HostnameConfigController
phase: running
created: 2021-06-29T20:23:18Z
updated: 2021-06-29T20:23:18Z
spec:
hostname: talos-172-20-0-2
domainname: ""
layer: default
---
node: 172.20.0.2
metadata:
namespace: network-config
type: HostnameSpecs.net.talos.dev
id: dhcp4/eth0/hostname
version: 1
owner: network.OperatorSpecController
phase: running
created: 2021-06-29T20:23:18Z
updated: 2021-06-29T20:23:18Z
spec:
hostname: talos-default-master-1
domainname: ""
layer: operator
```
We can see that there are two specs for the hostname:
* one from the `default` configuration layer which defines the hostname as `talos-172-20-0-2` (a default derived from the default node address);
* another one from the layer `operator` that defines the hostname as `talos-default-master-1` (DHCP).
Talos merges these two specs into a final `HostnameSpec` based on the configuration layer and merge rules.
Here is the order of precedence from low to high:
* `default` (defaults provided by Talos);
* `cmdline` (from the kernel command line);
* `platform` (driven by the cloud provider);
* `operator` (various dynamic configuration options: DHCP, Virtual IP, etc);
* `configuration` (derived from the machine configuration).
So in our example the `operator` layer `HostnameSpec` overrides the `default` layer producing the final hostname `talos-default-master-1`.
The merge process applies to all six core networking specs.
For each spec, the `layer` controls the merge behavior.
If multiple configuration specs appear at the same layer, they are merged together if possible; otherwise, the merge result
is stable but not defined (e.g. if DHCP on multiple interfaces provides two different hostnames for the node).
`LinkSpecs` are merged across layers, so for example, machine configuration for the interface MTU overrides an MTU set by the DHCP server.
## Network Operators
Network operators provide dynamic network configuration which can change over time as the node is running:
* DHCPv4
* DHCPv6
* Virtual IP
Network operators produce specs for addresses, routes, links, etc., which are then merged and applied according to the rules described above.
Operators are configured with `OperatorSpec` resources which describe when operators
should run and additional configuration for the operator:
```yaml
# talosctl get operatorspecs -o yaml
node: 172.20.0.2
metadata:
namespace: network
type: OperatorSpecs.net.talos.dev
id: dhcp4/eth0
version: 1
owner: network.OperatorConfigController
phase: running
created: 2021-06-29T20:23:18Z
updated: 2021-06-29T20:23:18Z
spec:
operator: dhcp4
linkName: eth0
requireUp: true
dhcp4:
routeMetric: 1024
```
`OperatorSpec` resources are generated by Talos mostly based on the machine configuration.
The DHCP4 operator is created automatically for all physical network links which are not configured explicitly via the kernel command line or the machine configuration.
This also means that on the first boot, without a machine configuration, a DHCP request is made on all physical network interfaces by default.
Specs generated by operators are prefixed with the operator ID (`dhcp4/eth0` in the example above) in the unmerged `network-config` namespace:
```sh
$ talosctl -n 172.20.0.2 get addressspecs --namespace network-config
NODE NAMESPACE TYPE ID VERSION
172.20.0.2 network-config AddressSpec dhcp4/eth0/eth0/172.20.0.2/24 1
```
## Other Network Resources
There are some additional resources describing the network subsystem state.
The `NodeAddress` resource presents node addresses excluding link-local and loopback addresses:
```sh
$ talosctl get nodeaddresses
NODE NAMESPACE TYPE ID VERSION ADDRESSES
10.100.2.23 network NodeAddress accumulative 6 ["10.100.2.23","147.75.98.173","147.75.195.143","192.168.95.64","2604:1380:1:ca00::17"]
10.100.2.23 network NodeAddress current 5 ["10.100.2.23","147.75.98.173","192.168.95.64","2604:1380:1:ca00::17"]
10.100.2.23 network NodeAddress default 1 ["10.100.2.23"]
```
* `default` is the node default address;
* `current` is the set of addresses a node currently has;
* `accumulative` is the set of addresses a node had over time (it might include virtual IPs which are not owned by the node at the moment).
`NodeAddress` resources are used to pick up the default address for `etcd` peer URL, to populate SANs field in the generated certificates, etc.
Another important resource is `Nodename`, which provides the `Node` name in Kubernetes:
```sh
$ talosctl get nodename
NODE NAMESPACE TYPE ID VERSION NODENAME
10.100.2.23 controlplane Nodename nodename 1 infra-green-cp-mmf7v
```
Depending on the machine configuration, the `nodename` might be just a hostname or the FQDN of the node.
`NetworkStatus` aggregates the current state of the network configuration:
```yaml
# talosctl get networkstatus -o yaml
node: 10.100.2.23
metadata:
namespace: network
type: NetworkStatuses.net.talos.dev
id: status
version: 5
owner: network.StatusController
phase: running
created: 2021-06-24T18:56:00Z
updated: 2021-06-24T18:56:02Z
spec:
addressReady: true
connectivityReady: true
hostnameReady: true
etcFilesReady: true
```
## Network Controllers
For each of the six basic resource types, there are several controllers:
* `*StatusController` populates `*Status` resources observing the Linux kernel state.
* `*ConfigController` produces the initial unmerged `*Spec` resources in the `network-config` namespace based on defaults, kernel command line, and machine configuration.
* `*MergeController` merges `*Spec` resources into the final representation in the `network` namespace.
* `*SpecController` applies merged `*Spec` resources to the kernel state.
For the network operators:
* `OperatorConfigController` produces `OperatorSpec` resources based on the machine configuration and defaults.
* `OperatorSpecController` runs network operators watching `OperatorSpec` resources and producing various `*Spec` resources in the `network-config` namespace.
## Configuration Sources
There are several configuration sources for the network configuration, which are described in this section.
### Defaults
* `lo` interface is assigned addresses `127.0.0.1/8` and `::1/128`;
* hostname is set to `talos-<IP>`, where `IP` is the default node address;
* resolvers are set to `8.8.8.8`, `1.1.1.1`;
* time servers are set to `pool.ntp.org`;
* DHCP4 operator is run on any physical interface which is not configured explicitly.
### Cmdline
The kernel [command line]({{< relref "../reference/kernel" >}}) is parsed for the following options:
* `ip=` option is parsed for node IP, default gateway, hostname, DNS servers, NTP servers;
* `bond=` option is parsed for bonding interfaces and their options;
* `talos.hostname=` option is used to set node hostname;
* `talos.network.interface.ignore=` can be used to make Talos skip network interface configuration completely.
### Platform
Platform configuration delivers cloud environment-specific options (e.g. the hostname).
Platform configuration is specific to the environment metadata: for example, on Equinix Metal, Talos automatically
configures public and private IPs, routing, link bonding, and the hostname.
Platform configuration is cached across reboots in `/system/state/platform-network.yaml`.
### Operator
Network operators provide configuration for all basic resource types.
### Machine Configuration
The machine configuration is parsed for link configuration, addresses, routes, hostname,
resolvers and time servers.
Any changes to `.machine.network` configuration can be applied in immediate mode.
## Network Configuration Debugging
Most of the network controller operations and failures are logged to the kernel console;
additional logs with `debug` level are available with the `talosctl logs controller-runtime` command.
If the network configuration can't be established and the API is not available, `debug` level
logs can be sent to the console with the `debug: true` option in the machine configuration.
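A sketch of pulling those controller logs for a single node (the node address and the filter pattern are placeholders):

```bash
# Stream controller runtime logs from a node and filter for network address events.
talosctl -n 172.20.0.2 logs controller-runtime -f | grep -i address
```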

View File

@ -0,0 +1,73 @@
---
title: Philosophy
weight: 10
description: "Learn about the philosophy behind the need for Talos Linux."
---
## Distributed
Talos is intended to be operated in a distributed manner.
That is, it is built for a high-availability dataplane _first_.
Its `etcd` cluster is built in an ad-hoc manner, with each appointed node joining on its own directive (with proper security validations enforced, of course).
As with Kubernetes itself, workloads are intended to be distributed across any number of compute nodes.
There should be no single points of failure, and the level of required coordination is as low as each platform allows.
## Immutable
Talos takes immutability very seriously.
Talos itself, even when installed on a disk, always runs from a SquashFS image, meaning that even if a directory is mounted to be writable, the image itself is never modified.
All images are signed and delivered as single, versioned files.
We can always run integrity checks on our image to verify that it has not been modified.
While Talos does allow a few, highly-controlled write points to the filesystem, we strive to make them as non-unique and non-critical as possible.
In fact, we call the writable partition the "ephemeral" partition precisely because we want to make sure none of us ever uses it for unique, non-replicated, non-recreatable data.
Thus, if all else fails, we can always wipe the disk and get back up and running.
## Minimal
We are always trying to reduce Talos' footprint and keep it small.
Because nearly the entire OS is built from scratch in Go, we are already
starting out in a good position.
We have no shell.
We have no SSH.
We have none of the GNU utilities, not even a rollup tool such as busybox.
Everything which is included in Talos is there because it is necessary, and
nothing is included which isn't.
As a result, the OS right now produces a SquashFS image size of less than **80 MB**.
## Ephemeral
Everything Talos writes to its disk is either replicated or reconstructable.
Since the controlplane is high availability, the loss of any node will cause
neither service disruption nor loss of data.
No writes are even allowed to the vast majority of the filesystem.
We even call the writable partition "ephemeral" to keep this idea always in
focus.
## Secure
Talos has always been designed with security in mind.
With its immutability, its minimalism, its signing, and its component-based design, we are
able to simply bypass huge classes of vulnerabilities.
Moreover, because of the way we have designed Talos, we are able to take
advantage of a number of additional settings, such as the recommendations of the Kernel Self Protection Project (KSPP) and the complete disablement of dynamic modules.
There are no passwords in Talos.
All networked communication is encrypted and key-authenticated.
The Talos certificates are short-lived and automatically-rotating.
Kubernetes is always constructed with its own separate PKI structure which is
enforced.
## Declarative
Everything which can be configured in Talos is done so through a single YAML
manifest.
There is no scripting and no procedural steps.
Everything is defined by the one declarative YAML file.
This configuration includes that of both Talos itself and the Kubernetes which
it forms.
This is achievable because Talos is tightly focused to do one thing: run
kubernetes, in the easiest, most secure, most reliable way it can.

View File

@ -0,0 +1,74 @@
---
title: "Network Connectivity"
weight: 80
description: "Description of the Networking Connectivity needed by Talos Linux"
aliases:
- ../guides/configuring-network-connectivity
---
## Configuring Network Connectivity
The simplest way to deploy Talos is to ensure that all the remote components of the system (`talosctl`, the control plane nodes, and worker nodes) have layer 2 connectivity.
This is not always possible, however, so this page lays out the minimal network access that is required to configure and operate a Talos cluster.
> Note: These are the ports required for Talos specifically, and should be configured _in addition_ to the ports required by Kubernetes.
> See the [kubernetes docs](https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/#check-required-ports) for information on the ports used by kubernetes itself.
### Control plane node(s)
<table class="table-auto">
<thead>
<tr>
<th class="px-4 py-2">Protocol</th>
<th class="px-4 py-2">Direction</th>
<th class="px-4 py-2">Port Range</th>
<th class="px-4 py-2">Purpose</th>
<th class="px-4 py-2">Used By</th>
</tr>
</thead>
<tbody>
<tr>
<td class="border px-4 py-2">TCP</td>
<td class="border px-4 py-2">Inbound</td>
<td class="border px-4 py-2">50000*</td>
<td class="border px-4 py-2"><a href="../../learn-more/components/#apid">apid</a></td>
<td class="border px-4 py-2">talosctl</td>
</tr>
<tr>
<td class="border px-4 py-2">TCP</td>
<td class="border px-4 py-2">Inbound</td>
<td class="border px-4 py-2">50001*</td>
<td class="border px-4 py-2"><a href="../../learn-more/components/#trustd">trustd</a></td>
<td class="border px-4 py-2">Control plane nodes, worker nodes</td>
</tr>
</tbody>
</table>
> Ports marked with a `*` are not currently configurable, but that may change in the future.
> [Follow along here](https://github.com/siderolabs/talos/issues/1836).
### Worker node(s)
<table class="table-auto">
<thead>
<tr>
<th class="px-4 py-2">Protocol</th>
<th class="px-4 py-2">Direction</th>
<th class="px-4 py-2">Port Range</th>
<th class="px-4 py-2">Purpose</th>
<th class="px-4 py-2">Used By</th>
</tr>
</thead>
<tbody>
<tr>
<td class="border px-4 py-2">TCP</td>
<td class="border px-4 py-2">Inbound</td>
<td class="border px-4 py-2">50000*</td>
<td class="border px-4 py-2"><a href="../../learn-more/components/#apid">apid</a></td>
<td class="border px-4 py-2">Control plane nodes</td>
</tr>
</tbody>
</table>
> Ports marked with a `*` are not currently configurable, but that may change in the future.
> [Follow along here](https://github.com/siderolabs/talos/issues/1836).

View File

@ -0,0 +1,63 @@
---
title: "talosctl"
weight: 110
description: "The design and use of the Talos Linux control application."
---
The `talosctl` tool packs a lot of power into a small package.
It acts as a reference implementation for the Talos API, but it also handles a lot of
conveniences for the use of Talos and its clusters.
### Video Walkthrough
To see some live examples of talosctl usage, view the following video:
<iframe width="560" height="315" src="https://www.youtube.com/embed/pl0l_K_3Y6o" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
## Client Configuration
Talosctl configuration is located in `$XDG_CONFIG_HOME/talos/config.yaml` if `$XDG_CONFIG_HOME` is defined.
Otherwise it is in `$HOME/.talos/config`.
The location can always be overridden by the `TALOSCONFIG` environment variable or the `--talosconfig` parameter.
Like `kubectl`, `talosctl` uses the concept of configuration contexts, so any number of Talos clusters can be managed with a single configuration file.
Unlike `kubectl`, it also comes with some intelligent tooling to manage the merging of new contexts into the config.
The default operation is a non-destructive merge, where if a context of the same name already exists in the file, the context to be added is renamed by appending an index number.
You can easily overwrite instead, as well.
See the `talosctl config help` for more information.
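For instance, merging a freshly generated client configuration and switching between contexts might look like this (the file path and context name are placeholders):

```bash
# Merge a new cluster context into ~/.talos/config (or $TALOSCONFIG, if set).
talosctl config merge ./talosconfig

# List the available contexts and switch to one of them.
talosctl config contexts
talosctl config context my-cluster
```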
## Endpoints and Nodes
![Endpoints and Nodes](/images/endpoints-and-nodes.png)
The `endpoints` are the communication endpoints to which the client directly talks.
These can be load balancers, DNS hostnames, a list of IPs, etc.
Further, if multiple endpoints are specified, the client will automatically load
balance and fail over between them.
In general, it is recommended that these point to the set of control plane nodes, either directly or through a reverse proxy or load balancer.
Each endpoint will automatically proxy requests destined to another node through it, so it is not necessary to change the endpoint configuration just because you wish to talk to a different node within the cluster.
Endpoints _do_, however, need to be members of the same Talos cluster as the target node, because these proxied connections rely on certificate-based authentication.
The `node` is the target node on which you wish to perform the API call.
While you can configure the target node (or even a set of target nodes) inside the `talosctl` configuration file, it is often useful to simply and explicitly declare the target node(s) using the `-n` or `--nodes` command-line parameter.
Keep in mind, when specifying nodes that their IPs and/or hostnames are as seen by the endpoint servers, not as from the client.
This is because all connections are proxied first through the endpoints.
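To illustrate (all addresses are placeholders), the endpoints can point at the control plane while the target node is any member of the cluster:

```bash
# Talk to the worker 172.20.0.5 through the control plane endpoints.
talosctl --endpoints 172.20.0.2,172.20.0.3 --nodes 172.20.0.5 version

# The same endpoints and nodes can be stored in the client configuration instead.
talosctl config endpoint 172.20.0.2 172.20.0.3
talosctl config node 172.20.0.5
```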
## Kubeconfig
The configuration for accessing a Talos Kubernetes cluster is obtained with `talosctl`.
By default, `talosctl` will safely merge the cluster into the default kubeconfig.
Like `talosctl` itself, in the event of a naming conflict, the new context name will be index-appended before insertion.
The `--force` option can be used to overwrite instead.
You can also specify an alternate path by supplying it as a positional parameter.
Thus, like Talos clusters themselves, `talosctl` makes it easy to manage any
number of Kubernetes clusters from the same workstation.
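For example (the node address and output path are placeholders):

```bash
# Merge the cluster credentials into ~/.kube/config (index-appended on naming conflicts).
talosctl -n 172.20.0.2 kubeconfig

# Or write them to an alternate path instead.
talosctl -n 172.20.0.2 kubeconfig ./kubeconfig
```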
## Commands
Please see the [CLI reference]({{< relref "../reference/cli" >}}) for the entire list of commands which are available from `talosctl`.

View File

@ -0,0 +1,4 @@
---
title: "Reference"
weight: 70
---

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,145 @@
---
title: "Kernel"
description: "Linux kernel reference."
---
## Commandline Parameters
Talos supports a number of kernel command line parameters.
Some are required for it to operate.
Others are optional and useful in certain circumstances.
Several of these are recommended by the Kernel Self Protection Project [KSPP](https://kernsec.org/wiki/index.php/Kernel_Self_Protection_Project/Recommended_Settings).
**Required** parameters:
- `talos.config`: the HTTP(S) URL at which the machine configuration data can be found
- `talos.platform`: can be one of `aws`, `azure`, `container`, `digitalocean`, `gcp`, `metal`, `equinixMetal`, or `vmware`
- `init_on_alloc=1`: required by KSPP
- `slab_nomerge`: required by KSPP
- `pti=on`: required by KSPP
**Recommended** parameters:
- `init_on_free=1`: advised by KSPP if minimizing stale data lifetime is
important
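Putting these together, a kernel command line for a bare-metal boot might look roughly like the following (the configuration URL and console device are placeholders):

```
talos.platform=metal talos.config=https://config.example.com/machine.yaml init_on_alloc=1 init_on_free=1 slab_nomerge pti=on console=tty0
```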
### Available Talos-specific parameters
#### `ip`
Initial configuration of the interface, routes, DNS, NTP servers.
Full documentation is available in the [Linux kernel docs](https://www.kernel.org/doc/Documentation/filesystems/nfs/nfsroot.txt).
`ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>:<dns0-ip>:<dns1-ip>:<ntp0-ip>`
Talos will use the configuration supplied via the kernel parameter as the initial network configuration.
This parameter is useful in the environments where DHCP doesn't provide IP addresses or when default DNS and NTP servers should be overridden
before loading machine configuration.
Partial configuration can be applied as well, e.g. `ip=<:::::::<dns0-ip>:<dns1-ip>:<ntp0-ip>` sets only the DNS and NTP servers.
IPv6 addresses can be specified by enclosing them in the square brackets, e.g. `ip=[2001:db8::a]:[2001:db8::b]:[fe80::1]::master1:eth1::[2001:4860:4860::6464]:[2001:4860:4860::64]:[2001:4860:4806::]`.
#### `bond`
Bond interface configuration.
Full documentation is available in the [Dracut kernel docs](https://man7.org/linux/man-pages/man7/dracut.cmdline.7.html).
`bond=<bondname>:<bondslaves>:<options>:<mtu>`
Talos will use the `bond=` kernel parameter if supplied to set the initial bond configuration.
This parameter is useful in environments where the switch ports are suspended if the machine doesn't set up a LACP bond.
If only the bond name is supplied, the bond will be created with `eth0` and `eth1` as slaves and the bond mode set to `balance-rr`.
All of the configurations below are equivalent:
* `bond=bond0`
* `bond=bond0:`
* `bond=bond0::`
* `bond=bond0:::`
* `bond=bond0:eth0:eth1`
* `bond=bond0:eth0:eth1:balance-rr`
An example of a bond configuration with all options specified:
`bond=bond1:eth3,eth4:mode=802.3ad,xmit_hash_policy=layer2+3:1450`
This will create a bond interface named `bond1` with `eth3` and `eth4` as slaves and set the bond mode to `802.3ad`, the transmit hash policy to `layer2+3` and bond interface MTU to 1450.
#### `panic`
The amount of time to wait after a panic before a reboot is issued.
Talos will always reboot if it encounters an unrecoverable error.
However, when collecting debug information, it may reboot too quickly for
humans to read the logs.
This option allows the user to delay the reboot to give time to collect debug
information from the console screen.
A value of `0` disables automatic rebooting entirely.
#### `talos.config`
The URL at which the machine configuration data may be found.
#### `talos.platform`
The platform name on which Talos will run.
Valid options are:
- `aws`
- `azure`
- `container`
- `digitalocean`
- `gcp`
- `metal`
- `equinixMetal`
- `vmware`
#### `talos.board`
The board name, if Talos is being used on an ARM64 SBC.
Supported boards are:
- `bananapi_m64`: Banana Pi M64
- `libretech_all_h3_cc_h5`: Libre Computer ALL-H3-CC
- `rock64`: Pine64 Rock64
- `rpi_4`: Raspberry Pi 4, Model B
#### `talos.hostname`
The hostname to be used.
The hostname is generally specified in the machine config.
However, in some cases, the DHCP server needs to know the hostname
before the machine configuration has been acquired.
Unless specifically required, the machine configuration should be used
instead.
#### `talos.shutdown`
The type of shutdown to use when Talos is told to shutdown.
Valid options are:
- `halt`
- `poweroff`
#### `talos.network.interface.ignore`
A network interface which should be ignored and not configured by Talos.
Before a configuration is applied (early on each boot), Talos attempts to
configure each network interface by DHCP.
If there are many network interfaces on the machine which have link but no
DHCP server, this can add significant boot delays.
This option may be specified multiple times for multiple network interfaces.
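For example: `talos.network.interface.ignore=eth2 talos.network.interface.ignore=eth3`.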
#### `talos.experimental.wipe`
Resets the disk before starting up the system.
Valid options are:
- `system` resets system disk.
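For example: `talos.experimental.wipe=system`.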

View File

@ -0,0 +1,10 @@
---
title: "Platform"
description: "Visualization of the bootstrap process on bare metal machines."
---
### Metal
Below is an image to visualize the process of bootstrapping nodes.
<img src="/images/metal-overview.png" width="950">

View File

@ -0,0 +1,5 @@
---
title: Talos Linux Guides
weight: 20
description: "Documentation on how to manage Talos Linux"
---

View File

@ -0,0 +1,5 @@
---
title: "Configuration"
weight: 20
description: "Guides on how to configure Talos Linux machines"
---

View File

@ -0,0 +1,23 @@
---
title: "Custom Certificate Authorities"
description: "How to supply custom certificate authorities"
aliases:
- ../../guides/configuring-certificate-authorities
---
## Appending the Certificate Authority
Put into each machine the PEM encoded certificate:
```yaml
machine:
  ...
  files:
    - content: |
        -----BEGIN CERTIFICATE-----
        ...
        -----END CERTIFICATE-----
      permissions: 0644
      path: /etc/ssl/certs/ca-certificates
      op: append
```

View File

@ -0,0 +1,35 @@
---
title: "Containerd"
description: "Customize Containerd Settings"
aliases:
- ../../guides/configuring-containerd
---
The base containerd configuration expects to merge in any additional configs present in `/var/cri/conf.d/*.toml`.
## An example of exposing metrics
Into each machine config, add the following:
```yaml
machine:
  ...
  files:
    - content: |
        [metrics]
          address = "0.0.0.0:11234"
      path: /var/cri/conf.d/metrics.toml
      op: create
```
Create the cluster as usual and see that metrics are now present on this port:
```bash
$ curl 127.0.0.1:11234/v1/metrics
# HELP container_blkio_io_service_bytes_recursive_bytes The blkio io service bytes recursive
# TYPE container_blkio_io_service_bytes_recursive_bytes gauge
container_blkio_io_service_bytes_recursive_bytes{container_id="0677d73196f5f4be1d408aab1c4125cf9e6c458a4bea39e590ac779709ffbe14",device="/dev/dm-0",major="253",minor="0",namespace="k8s.io",op="Async"} 0
container_blkio_io_service_bytes_recursive_bytes{container_id="0677d73196f5f4be1d408aab1c4125cf9e6c458a4bea39e590ac779709ffbe14",device="/dev/dm-0",major="253",minor="0",namespace="k8s.io",op="Discard"} 0
...
...
```

View File

@ -0,0 +1,181 @@
---
title: "Disk Encryption"
description: "Guide on using system disk encryption"
aliases:
- ../../guides/disk-encryption
---
It is possible to enable encryption for system disks at the OS level.
As of this writing, only [STATE]({{< relref "../../learn-more/architecture/#file-system-partitions" >}}) and [EPHEMERAL]({{< relref "../../learn-more/architecture/#file-system-partitions" >}}) partitions can be encrypted.
STATE contains the most sensitive node data: secrets and certs.
EPHEMERAL partition may contain some sensitive workload data.
Data is encrypted using LUKS2, which is provided by the Linux kernel modules and `cryptsetup` utility.
The operating system will run additional setup steps when encryption is enabled.
If the disk encryption is enabled for the STATE partition, the system will:
- Save STATE encryption config as JSON in the META partition.
- Before mounting the STATE partition, load encryption configs either from the machine config or from the META partition.
Note that the machine config is always preferred over the META one.
- Before mounting the STATE partition, format and encrypt it.
This occurs only if the STATE partition is empty and has no filesystem.
If the disk encryption is enabled for the EPHEMERAL partition, the system will:
- Get the encryption config from the machine config.
- Before mounting the EPHEMERAL partition, encrypt and format it.
This occurs only if the EPHEMERAL partition is empty and has no filesystem.
## Configuration
Right now this encryption is disabled by default.
To enable disk encryption you should modify the machine configuration with the following options:
```yaml
machine:
  ...
  systemDiskEncryption:
    ephemeral:
      keys:
        - nodeID: {}
          slot: 0
    state:
      keys:
        - nodeID: {}
          slot: 0
```
### Encryption Keys
> Note: What the LUKS2 docs call "keys" are, in reality, passphrases.
> When such a passphrase is added, LUKS2 runs argon2 to derive an actual key from it.
LUKS2 supports up to 32 encryption keys and it is possible to specify all of them in the machine configuration.
Talos always tries to sync the keys list defined in the machine config with the actual keys defined for the LUKS2 partition.
So if you update the list of keys, keep at least one existing key unchanged so that it can be used for key management.
When you define a key you should specify the key kind and the `slot`:
```yaml
machine:
  ...
  state:
    keys:
      - nodeID: {} # key kind
        slot: 1
  ephemeral:
    keys:
      - static:
          passphrase: supersecret
        slot: 0
```
Note that the order of keys does not determine which key slot is used.
Every key must always have a slot defined.
### Encryption Key Kinds
Talos supports two kinds of keys:
- `nodeID` which is generated using the node UUID and the partition label (note that if the node UUID is not really random it will fail the entropy check).
- `static` which you define right in the configuration.
> Note: Use static keys only for the EPHEMERAL partition, and only if your STATE partition is encrypted.
> A static key defined for the STATE partition would be stored in the META partition, which is not encrypted.
### Key Rotation
It is necessary to do `talosctl apply-config` a couple of times to rotate keys, since there is a need to always maintain a single working key while changing the other keys around it.
So, for example, first add a new key:
```yaml
machine:
  ...
  ephemeral:
    keys:
      - static:
          passphrase: oldkey
        slot: 0
      - static:
          passphrase: newkey
        slot: 1
  ...
```
Run:
```bash
talosctl apply-config -n <node> -f config.yaml
```
Then remove the old key:
```yaml
machine:
  ...
  ephemeral:
    keys:
      - static:
          passphrase: newkey
        slot: 1
  ...
```
Run:
```bash
talosctl apply-config -n <node> -f config.yaml
```
## Going from Unencrypted to Encrypted and Vice Versa
### Ephemeral Partition
There is no in-place encryption support for the partitions right now, so to avoid losing any data only empty partitions can be encrypted.
As such, migration from unencrypted to encrypted needs some additional handling, especially around explicitly wiping partitions.
- `apply-config` should be called with `--mode=staged`.
- Partition should be wiped after `apply-config`, but before the reboot.
Edit your machine config and add the encryption configuration:
```bash
vim config.yaml
```
Apply the configuration with `--mode=staged`:
```bash
talosctl apply-config -f config.yaml -n <node ip> --mode=staged
```
Wipe the partition you're going to encrypt:
```bash
talosctl reset --system-labels-to-wipe EPHEMERAL -n <node ip> --reboot=true
```
That's it!
After you run the last command, the partition will be wiped and the node will reboot.
During the next boot the system will encrypt the partition.
### State Partition
Calling wipe against the STATE partition will make the node lose the config, so the previous flow is not going to work.
The flow should be to first wipe the STATE partition:
```bash
talosctl reset --system-labels-to-wipe STATE -n <node ip> --reboot=true
```
Node will enter into maintenance mode, then run `apply-config` with `--insecure` flag:
```bash
talosctl apply-config --insecure -n <node ip> -f config.yaml
```
After installation is complete the node should encrypt the STATE partition.

View File

@ -0,0 +1,154 @@
---
title: "Editing Machine Configuration"
description: "How to edit and patch Talos machine configuration, with reboot, immediately, or stage update on reboot."
aliases:
- ../../guides/editing-machine-configuration
---
Talos node state is fully defined by [machine configuration]({{< relref "../../reference/configuration" >}}).
Initial configuration is delivered to the node at bootstrap time, but configuration can be updated while the node is running.
> Note: Be sure that config is persisted so that configuration updates are not overwritten on reboots.
> Configuration persistence was enabled by default since Talos 0.5 (`persist: true` in machine configuration).
There are three `talosctl` commands which facilitate machine configuration updates:
* `talosctl apply-config` to apply configuration from the file
* `talosctl edit machineconfig` to launch an editor with existing node configuration, make changes and apply configuration back
* `talosctl patch machineconfig` to apply automated machine configuration via JSON patch
Each of these commands can operate in one of four modes:
* apply change in automatic mode (default): reboot if the change can't be applied without a reboot, otherwise apply the change immediately
* apply change with a reboot (`--mode=reboot`): update configuration, reboot Talos node to apply configuration change
* apply change immediately (`--mode=no-reboot` flag): change is applied immediately without a reboot, fails if the change contains any fields that can not be updated without a reboot
* apply change on next reboot (`--mode=staged`): change is staged to be applied after a reboot, but node is not rebooted
* apply change in the interactive mode (`--mode=interactive`; only for `talosctl apply-config`): launches TUI based interactive installer
> Note: applying change on next reboot (`--mode=staged`) doesn't modify current node configuration, so next call to
> `talosctl edit machineconfig --mode=staged` will not see changes
Additionally, there is also `talosctl get machineconfig`, which retrieves the current node configuration API resource and contains the machine configuration in the `.spec` field.
It can be used to modify the configuration locally before being applied to the node.
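For example, a sketch of saving the current configuration locally (the node IP and file name are placeholders):
```bash
# Fetch the machineconfig resource; the .spec field holds the machine configuration itself
talosctl -n <IP> get machineconfig -o yaml > machineconfig.yaml
```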
The list of config changes allowed to be applied immediately in Talos {{< release >}}:
* `.debug`
* `.cluster`
* `.machine.time`
* `.machine.certCANs`
* `.machine.install` (configuration is only applied during install/upgrade)
* `.machine.network`
* `.machine.sysfs`
* `.machine.sysctls`
* `.machine.logging`
* `.machine.controlplane`
* `.machine.kubelet`
* `.machine.pods`
* `.machine.kernel`
* `.machine.registries` (CRI containerd plugin will not pick up the registry authentication settings without a reboot)
### `talosctl apply-config`
This command is mostly used to submit initial machine configuration to the node (generated by `talosctl gen config`).
It can be used to apply new configuration from the file to the running node as well, but most of the time it's not convenient, as it doesn't operate on the current node machine configuration.
Example:
```bash
talosctl -n <IP> apply-config -f config.yaml
```
Command `apply-config` can also be invoked as `apply machineconfig`:
```bash
talosctl -n <IP> apply machineconfig -f config.yaml
```
Applying machine configuration immediately (without a reboot):
```bash
talosctl -n IP apply machineconfig -f config.yaml --mode=no-reboot
```
Starting the interactive installer:
```bash
talosctl -n IP apply machineconfig --mode=interactive
```
> Note: when a Talos node is running in the maintenance mode it's necessary to provide `--insecure (-i)` flag to connect to the API and apply the config.
### `talosctl edit machineconfig`
Command `talosctl edit` loads the current machine configuration from the node and launches the configured editor to modify the config.
If the config hasn't been changed in the editor (or if the updated config is empty), the update is not applied.
> Note: Talos uses environment variables `TALOS_EDITOR`, `EDITOR` to pick up the editor preference.
> If environment variables are missing, `vi` editor is used by default.
Example:
```bash
talosctl -n <IP> edit machineconfig
```
Configuration can be edited for multiple nodes if multiple IP addresses are specified:
```bash
talosctl -n <IP1>,<IP2>,... edit machineconfig
```
Applying machine configuration change immediately (without a reboot):
```bash
talosctl -n <IP> edit machineconfig --mode=no-reboot
```
### `talosctl patch machineconfig`
Command `talosctl patch` works similarly to the `talosctl edit` command - it loads the current machine configuration, but instead of launching the configured editor it applies a set of [JSON patches](http://jsonpatch.com/) to the configuration and writes the result back to the node.
Example, updating kubelet version (in auto mode):
```bash
$ talosctl -n <IP> patch machineconfig -p '[{"op": "replace", "path": "/machine/kubelet/image", "value": "ghcr.io/siderolabs/kubelet:v{{< k8s_release >}}"}]'
patched mc at the node <IP>
```
Updating kube-apiserver version in immediate mode (without a reboot):
```bash
$ talosctl -n <IP> patch machineconfig --mode=no-reboot -p '[{"op": "replace", "path": "/cluster/apiServer/image", "value": "k8s.gcr.io/kube-apiserver:v{{< k8s_release >}}"}]'
patched mc at the node <IP>
```
A patch might be applied to multiple nodes when multiple IPs are specified:
```bash
talosctl -n <IP1>,<IP2>,... patch machineconfig -p '[{...}]'
```
Patches can also be sourced from files using `@file` syntax:
```bash
talosctl -n <IP> patch machineconfig -p @kubelet-patch.json -p @manifest-patch.json
```
It might be easier to store patches in YAML format vs. the default JSON format.
Talos can detect file format automatically:
```yaml
# kubelet-patch.yaml
- op: replace
path: /machine/kubelet/image
value: ghcr.io/siderolabs/kubelet:v{{< k8s_release >}}
```
```bash
talosctl -n <IP> patch machineconfig -p @kubelet-patch.yaml
```
### Recovering from Node Boot Failures
If a Talos node fails to boot because of wrong configuration (for example, control plane endpoint is incorrect), configuration can be updated to fix the issue.

View File

@ -0,0 +1,401 @@
---
title: "Logging"
description: "Dealing with Talos Linux logs."
aliases:
- ../../guides/logging
---
## Viewing logs
Kernel messages can be retrieved with `talosctl dmesg` command:
```sh
$ talosctl -n 172.20.1.2 dmesg
172.20.1.2: kern: info: [2021-11-10T10:09:37.662764956Z]: Command line: init_on_alloc=1 slab_nomerge pti=on consoleblank=0 nvme_core.io_timeout=4294967295 random.trust_cpu=on printk.devkmsg=on ima_template=ima-ng ima_appraise=fix ima_hash=sha512 console=ttyS0 reboot=k panic=1 talos.shutdown=halt talos.platform=metal talos.config=http://172.20.1.1:40101/config.yaml
[...]
```
Service logs can be retrieved with `talosctl logs` command:
```sh
$ talosctl -n 172.20.1.2 services
NODE SERVICE STATE HEALTH LAST CHANGE LAST EVENT
172.20.1.2 apid Running OK 19m27s ago Health check successful
172.20.1.2 containerd Running OK 19m29s ago Health check successful
172.20.1.2 cri Running OK 19m27s ago Health check successful
172.20.1.2 etcd Running OK 19m22s ago Health check successful
172.20.1.2 kubelet Running OK 19m20s ago Health check successful
172.20.1.2 machined Running ? 19m30s ago Service started as goroutine
172.20.1.2 trustd Running OK 19m27s ago Health check successful
172.20.1.2 udevd Running OK 19m28s ago Health check successful
$ talosctl -n 172.20.1.2 logs machined
172.20.1.2: [talos] task setupLogger (1/1): done, 106.109µs
172.20.1.2: [talos] phase logger (1/7): done, 564.476µs
[...]
```
Container logs for Kubernetes pods can be retrieved with `talosctl logs -k` command:
```sh
$ talosctl -n 172.20.1.2 containers -k
NODE NAMESPACE ID IMAGE PID STATUS
172.20.1.2 k8s.io kube-system/kube-flannel-dk6d5 k8s.gcr.io/pause:3.5 1329 SANDBOX_READY
172.20.1.2 k8s.io └─ kube-system/kube-flannel-dk6d5:install-cni ghcr.io/siderolabs/install-cni:v0.7.0-alpha.0-1-g2bb2efc 0 CONTAINER_EXITED
172.20.1.2 k8s.io └─ kube-system/kube-flannel-dk6d5:install-config quay.io/coreos/flannel:v0.13.0 0 CONTAINER_EXITED
172.20.1.2 k8s.io └─ kube-system/kube-flannel-dk6d5:kube-flannel quay.io/coreos/flannel:v0.13.0 1610 CONTAINER_RUNNING
172.20.1.2 k8s.io kube-system/kube-proxy-gfkqj k8s.gcr.io/pause:3.5 1311 SANDBOX_READY
172.20.1.2 k8s.io └─ kube-system/kube-proxy-gfkqj:kube-proxy k8s.gcr.io/kube-proxy:v{{< k8s_release >}} 1379 CONTAINER_RUNNING
$ talosctl -n 172.20.1.2 logs -k kube-system/kube-proxy-gfkqj:kube-proxy
172.20.1.2: 2021-11-30T19:13:20.567825192Z stderr F I1130 19:13:20.567737 1 server_others.go:138] "Detected node IP" address="172.20.0.3"
172.20.1.2: 2021-11-30T19:13:20.599684397Z stderr F I1130 19:13:20.599613 1 server_others.go:206] "Using iptables Proxier"
[...]
```
## Sending logs
### Service logs
You can enable sending service logs in the machine configuration:
```yaml
machine:
  logging:
    destinations:
      - endpoint: "udp://127.0.0.1:12345/"
        format: "json_lines"
      - endpoint: "tcp://host:5044/"
        format: "json_lines"
```
Several destinations can be specified.
Supported protocols are UDP and TCP.
The only currently supported format is `json_lines`:
```json
{
"msg": "[talos] apply config request: immediate true, on reboot false",
"talos-level": "info",
"talos-service": "machined",
"talos-time": "2021-11-10T10:48:49.294858021Z"
}
```
Messages are newline-separated when sent over TCP.
Over UDP messages are sent with one message per packet.
`msg`, `talos-level`, `talos-service`, and `talos-time` fields are always present; there may be additional fields.
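As a quick local check (a sketch, assuming an OpenBSD-style netcat; flags differ between netcat variants), you can listen on the configured UDP port and watch the JSON lines arrive:
```bash
# Listen for service logs sent to udp://127.0.0.1:12345/
nc -u -l 12345
```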
### Kernel logs
Kernel log delivery can be enabled with the `talos.logging.kernel` kernel command line argument, which can be specified
in `.machine.install.extraKernelArgs`:
```yaml
machine:
  install:
    extraKernelArgs:
      - talos.logging.kernel=tcp://host:5044/
```
Kernel log destination is specified in the same way as service log endpoint.
The only supported format is `json_lines`.
Sample message:
```json
{
"clock":6252819, // time relative to the kernel boot time
"facility":"user",
"msg":"[talos] task startAllServices (1/1): waiting for 6 services\n",
"priority":"warning",
"seq":711,
"talos-level":"warn", // Talos-translated `priority` into common logging level
"talos-time":"2021-11-26T16:53:21.3258698Z" // Talos-translated `clock` using current time
}
```
### Filebeat example
One way to forward logs to other log collection services is to send
them to a [Filebeat](https://www.elastic.co/beats/filebeat) running in the
cluster itself (in the host network), which takes care of forwarding them to
other endpoints (and applying any necessary transformations).
If [Elastic Cloud on Kubernetes](https://www.elastic.co/elastic-cloud-kubernetes)
is being used, the following Beat (custom resource) configuration might be
helpful:
```yaml
apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: talos
spec:
  type: filebeat
  version: 7.15.1
  elasticsearchRef:
    name: talos
  config:
    filebeat.inputs:
      - type: "udp"
        host: "127.0.0.1:12345"
        processors:
          - decode_json_fields:
              fields: ["message"]
              target: ""
          - timestamp:
              field: "talos-time"
              layouts:
                - "2006-01-02T15:04:05.999999999Z07:00"
          - drop_fields:
              fields: ["message", "talos-time"]
          - rename:
              fields:
                - from: "msg"
                  to: "message"
  daemonSet:
    updateStrategy:
      rollingUpdate:
        maxUnavailable: 100%
    podTemplate:
      spec:
        dnsPolicy: ClusterFirstWithHostNet
        hostNetwork: true
        securityContext:
          runAsUser: 0
        containers:
          - name: filebeat
            ports:
              - protocol: UDP
                containerPort: 12345
                hostPort: 12345
```
The input configuration ensures that messages and timestamps are extracted properly.
Refer to the Filebeat documentation on how to forward logs to other outputs.
Also note the `hostNetwork: true` in the `daemonSet` configuration.
This ensures filebeat uses the host network, and listens on `127.0.0.1:12345`
(UDP) on every machine, which can then be specified as a logging endpoint in
the machine configuration.
### Fluent-bit example
First, we'll create a values file for the `fluent-bit` Helm chart.
```yaml
# fluentd-bit.yaml
podAnnotations:
  fluentbit.io/exclude: 'true'
extraPorts:
  - port: 12345
    containerPort: 12345
    protocol: TCP
    name: talos
config:
  service: |
    [SERVICE]
      Flush         5
      Daemon        Off
      Log_Level     warn
      Parsers_File  custom_parsers.conf
  inputs: |
    [INPUT]
      Name          tcp
      Listen        0.0.0.0
      Port          12345
      Format        json
      Tag           talos.*
    [INPUT]
      Name          tail
      Alias         kubernetes
      Path          /var/log/containers/*.log
      Parser        containerd
      Tag           kubernetes.*
    [INPUT]
      Name          tail
      Alias         audit
      Path          /var/log/audit/kube/*.log
      Parser        audit
      Tag           audit.*
  filters: |
    [FILTER]
      Name                kubernetes
      Alias               kubernetes
      Match               kubernetes.*
      Kube_Tag_Prefix     kubernetes.var.log.containers.
      Use_Kubelet         Off
      Merge_Log           On
      Merge_Log_Trim      On
      Keep_Log            Off
      K8S-Logging.Parser  Off
      K8S-Logging.Exclude On
      Annotations         Off
      Labels              On
    [FILTER]
      Name                modify
      Match               kubernetes.*
      Add                 source kubernetes
      Remove              logtag
  customParsers: |
    [PARSER]
      Name                audit
      Format              json
      Time_Key            requestReceivedTimestamp
      Time_Format         %Y-%m-%dT%H:%M:%S.%L%z
    [PARSER]
      Name                containerd
      Format              regex
      Regex               ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[^ ]*) (?<log>.*)$
      Time_Key            time
      Time_Format         %Y-%m-%dT%H:%M:%S.%L%z
  outputs: |
    [OUTPUT]
      Name    stdout
      Alias   stdout
      Match   *
      Format  json_lines
    # If you wish to ship directly to Loki from Fluentbit,
    # Uncomment the following output, updating the Host with your Loki DNS/IP info as necessary.
    # [OUTPUT]
    # Name loki
    # Match *
    # Host loki.loki.svc
    # Port 3100
    # Labels job=fluentbit
    # Auto_Kubernetes_Labels on
daemonSetVolumes:
  - name: varlog
    hostPath:
      path: /var/log
daemonSetVolumeMounts:
  - name: varlog
    mountPath: /var/log
tolerations:
  - operator: Exists
    effect: NoSchedule
```
Next, we will add the helm repo for FluentBit, and deploy it to the cluster.
```shell
helm repo add fluent https://fluent.github.io/helm-charts
helm upgrade -i --namespace=kube-system -f fluentd-bit.yaml fluent-bit fluent/fluent-bit
```
Now we need to find the service IP.
```shell
$ kubectl -n kube-system get svc -l app.kubernetes.io/name=fluent-bit
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
fluent-bit ClusterIP 10.200.0.138 <none> 2020/TCP,5170/TCP 108m
```
Finally, we will change the Talos log destination with the command `talosctl edit mc`.
```yaml
machine:
  logging:
    destinations:
      - endpoint: "tcp://10.200.0.138:5170"
        format: "json_lines"
```
This example configuration was well tested with Cilium CNI, and it should work with iptables/ipvs based CNI plugins too.
### Vector example
[Vector](https://vector.dev) is a lightweight observability pipeline ideal for a Kubernetes environment.
It can ingest (source) logs from multiple sources, perform remapping on the logs (transform), and forward the resulting pipeline to multiple destinations (sinks).
As it is an end-to-end platform, it can be run as a single-deployment 'aggregator' as well as a replicaSet of 'Agents' that run on each node.
As Talos can be set up as above to send logs to a destination, we can run Vector as an Aggregator, and forward both kernel and service logs to a UDP socket in-cluster.
Below is an excerpt of a source/sink setup for Talos, with a 'sink' destination of an in-cluster [Grafana Loki](https://grafana.com/oss/loki/) log aggregation service.
As Loki can create labels from the log input, we have set up the Loki sink to create labels based on the host IP, service and facility of the inbound logs.
Note that a method of exposing the Vector service will be required which may vary depending on your setup - a LoadBalancer is a good option.
```yaml
role: "Stateless-Aggregator"
# Sources
sources:
talos_kernel_logs:
address: 0.0.0.0:6050
type: socket
mode: udp
max_length: 102400
decoding:
codec: json
host_key: __host
talos_service_logs:
address: 0.0.0.0:6051
type: socket
mode: udp
max_length: 102400
decoding:
codec: json
host_key: __host
# Sinks
sinks:
talos_kernel:
type: loki
inputs:
- talos_kernel_logs_xform
endpoint: http://loki.system-monitoring:3100
encoding:
codec: json
except_fields:
- __host
batch:
max_bytes: 1048576
out_of_order_action: rewrite_timestamp
labels:
hostname: >-
{{`{{ __host }}`}}
facility: >-
{{`{{ facility }}`}}
talos_service:
type: loki
inputs:
- talos_service_logs_xform
endpoint: http://loki.system-monitoring:3100
encoding:
codec: json
except_fields:
- __host
batch:
max_bytes: 400000
out_of_order_action: rewrite_timestamp
labels:
hostname: >-
{{`{{ __host }}`}}
service: >-
{{`{{ "talos-service" }}`}}
```

View File

@ -0,0 +1,51 @@
---
title: "Managing PKI"
description: "How to manage Public Key Infrastructure"
aliases:
- ../../guides/managing-pki
---
## Generating an Administrator Key Pair
In order to create a key pair, you will need the root CA.
Save the CA public key and CA private key as `ca.crt` and `ca.key` respectively.
Now, run the following commands to generate a certificate:
```bash
talosctl gen key --name admin
talosctl gen csr --key admin.key --ip 127.0.0.1
talosctl gen crt --ca ca --csr admin.csr --name admin
```
Now, base64 encode `admin.crt`, and `admin.key`:
```bash
cat admin.crt | base64
cat admin.key | base64
```
You can now set the `crt` and `key` fields in the `talosconfig` to the base64 encoded strings.
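A sketch of the resulting `talosconfig` (the context name and endpoint are placeholders):
```yaml
context: admin
contexts:
  admin:
    endpoints:
      - <control plane IP>
    ca: <base64 encoded ca.crt>
    crt: <base64 encoded admin.crt>
    key: <base64 encoded admin.key>
```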
## Renewing an Expired Administrator Certificate
In order to renew the certificate, you will need the root CA, and the admin private key.
The base64 encoded key can be found in any of the control plane nodes' configuration files.
Where exactly it is will depend on the specific version of the configuration file you are using.
Save the CA public key, CA private key, and admin private key as `ca.crt`, `ca.key`, and `admin.key` respectively.
Now, run the following commands to generate a certificate:
```bash
talosctl gen csr --key admin.key --ip 127.0.0.1
talosctl gen crt --ca ca --csr admin.csr --name admin
```
You should see `admin.crt` in your current directory.
Now, base64 encode `admin.crt`:
```bash
cat admin.crt | base64
```
You can now set the certificate in the `talosconfig` to the base64 encoded string.

View File

@ -0,0 +1,232 @@
---
title: "NVIDIA GPU"
description: "In this guide we'll follow the procedure to support NVIDIA GPU on Talos."
aliases:
- ../../guides/nvidia-gpu
---
> Enabling NVIDIA GPU support on Talos is bound by [NVIDIA EULA](https://www.nvidia.com/en-us/drivers/nvidia-license/)
> Talos GPU support is an **alpha** feature.
These are the steps to enable NVIDIA support in Talos:
- Talos pre-installed on a node with an NVIDIA GPU installed.
- Building a custom Talos installer image with the NVIDIA modules.
- Building the NVIDIA container toolkit system extension, which allows registering a custom runtime with containerd.
- Upgrading Talos with the custom installer and enabling the NVIDIA modules and the system extension.
Both these components require that the user build and maintain their own Talos installer image and the NVIDIA container toolkit [Talos System Extension]({{< relref "system-extensions" >}}).
## Prerequisites
This guide assumes the user has access to a container registry with `push` permissions, Docker installed on the build machine, and that the Talos host has `pull` access to the container registry.
Set the local registry and username environment variables:
```bash
export USERNAME=<username>
export REGISTRY=<registry>
```
For example:
```bash
export USERNAME=talos-user
export REGISTRY=ghcr.io
```
> The examples below will use the sample variables set above.
Modify accordingly for your environment.
## Building the installer image
Start by cloning the [pkgs](https://github.com/siderolabs/pkgs) repository.
Now run the following command to build and push the custom Talos kernel image and the NVIDIA kmod image, with the NVIDIA kernel modules signed by the kernel built along with it.
```bash
make kernel nonfree-kmod-nvidia PLATFORM=linux/amd64 PUSH=true
```
> Replace the platform with `linux/arm64` if building for ARM64
Now we need to create a custom Talos installer image.
Start by creating a `Dockerfile` with the following content:
```Dockerfile
FROM scratch as customization
COPY --from=ghcr.io/talos-user/nonfree-kmod-nvidia:{{< release >}}-nvidia /lib/modules /lib/modules
FROM ghcr.io/siderolabs/installer:{{< release >}}
COPY --from=ghcr.io/talos-user/kernel:{{< release >}}-nvidia /boot/vmlinuz /usr/install/${TARGETARCH}/vmlinuz
```
Now build the image and push it to the registry.
```bash
DOCKER_BUILDKIT=0 docker build --squash --build-arg RM="/lib/modules" -t ghcr.io/talos-user/installer:{{< release >}}-nvidia .
docker push ghcr.io/talos-user/installer:{{< release >}}-nvidia
```
> Note: buildkit has a bug [#816](https://github.com/moby/buildkit/issues/816), to disable it use DOCKER_BUILDKIT=0
## Building the system extension
Start by cloning the [extensions](https://github.com/siderolabs/extensions) repository.
Now run the following command to build and push the system extension.
```bash
make nvidia-container-toolkit PLATFORM=linux/amd64 PUSH=true TAG=510.60.02-v1.9.0
```
> Replace the platform with `linux/arm64` if building for ARM64
## Upgrading Talos and enabling the NVIDIA modules and the system extension
> Make sure to use `talosctl` version {{< release >}} or later
First create a patch yaml `gpu-worker-patch.yaml` to update the machine config similar to below:
```yaml
- op: add
  path: /machine/install/extensions
  value:
    - image: ghcr.io/talos-user/nvidia-container-toolkit:510.60.02-v1.9.0
- op: add
  path: /machine/kernel
  value:
    modules:
      - name: nvidia
      - name: nvidia_uvm
      - name: nvidia_drm
      - name: nvidia_modeset
- op: add
  path: /machine/sysctls
  value:
    net.core.bpf_jit_harden: 1
```
Now apply the patch to all Talos nodes in the cluster that have NVIDIA GPUs installed:
```bash
talosctl patch mc --patch @gpu-worker-patch.yaml
```
Now we can proceed to upgrading Talos with the installer built previously:
```bash
talosctl upgrade --image=ghcr.io/talos-user/installer:{{< release >}}-nvidia
```
Once the node reboots, the NVIDIA modules should be loaded and the system extension should be installed.
This can be confirmed by running:
```bash
talosctl read /proc/modules
```
which should produce an output similar to below:
```text
nvidia_uvm 1146880 - - Live 0xffffffffc2733000 (PO)
nvidia_drm 69632 - - Live 0xffffffffc2721000 (PO)
nvidia_modeset 1142784 - - Live 0xffffffffc25ea000 (PO)
nvidia 39047168 - - Live 0xffffffffc00ac000 (PO)
```
```bash
talosctl get extensions
```
which should produce an output similar to below:
```text
NODE NAMESPACE TYPE ID VERSION NAME VERSION
172.31.41.27 runtime ExtensionStatus 000.ghcr.io-frezbo-nvidia-container-toolkit-510.60.02-v1.9.0 1 nvidia-container-toolkit 510.60.02-v1.9.0
```
```bash
talosctl read /proc/driver/nvidia/version
```
which should produce an output similar to below:
```text
NVRM version: NVIDIA UNIX x86_64 Kernel Module 510.60.02 Wed Mar 16 11:24:05 UTC 2022
GCC version: gcc version 11.2.0 (GCC)
```
## Deploying NVIDIA device plugin
First we need to create the `RuntimeClass`
Apply the following manifest to create a runtime class that uses the extension:
```yaml
---
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia
```
Install the NVIDIA device plugin:
```bash
helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo update
helm install nvidia-device-plugin nvdp/nvidia-device-plugin --version=0.11.0 --set=runtimeClassName=nvidia
```
Apply the following manifest to run CUDA pod via nvidia runtime:
```bash
cat <<EOF | kubectl apply -f -
---
apiVersion: v1
kind: Pod
metadata:
  name: gpu-operator-test
spec:
  restartPolicy: OnFailure
  runtimeClassName: nvidia
  containers:
    - name: cuda-vector-add
      image: "nvidia/samples:vectoradd-cuda11.6.0"
      resources:
        limits:
          nvidia.com/gpu: 1
EOF
```
The status can be viewed by running:
```bash
kubectl get pods
```
which should produce an output similar to below:
```text
NAME READY STATUS RESTARTS AGE
gpu-operator-test 0/1 Completed 0 13s
```
```bash
kubectl logs gpu-operator-test
```
which should produce an output similar to below:
```text
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
```

View File

@ -0,0 +1,113 @@
---
title: Pull Through Image Cache
description: "How to set up local transparent container images caches."
aliases:
- ../../guides/configuring-pull-through-cache
---
In this guide we will create a set of local caching Docker registry proxies to minimize local cluster startup time.
When running Talos locally, pulling images from Docker registries might take a significant amount of time.
We spin up local caching pass-through registries to cache images and configure a local Talos cluster to use those proxies.
A similar approach might be used to run Talos in production in air-gapped environments.
It can be also used to verify that all the images are available in local registries.
## Video Walkthrough
To see a live demo of this writeup, see the video below:
<iframe width="560" height="315" src="https://www.youtube.com/embed/PRiQJR9Q33s" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
## Requirements
The following are requirements for creating the set of caching proxies:
- Docker 18.03 or greater
- Local cluster requirements for either [docker]({{< relref "../install/local-platforms/docker" >}}) or [QEMU]({{< relref "../install/local-platforms/qemu" >}}).
## Launch the Caching Docker Registry Proxies
Talos pulls from `docker.io`, `k8s.gcr.io`, `quay.io`, `gcr.io`, and `ghcr.io` by default.
If your configuration is different, you might need to modify the commands below:
```bash
docker run -d -p 5000:5000 \
-e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io \
--restart always \
--name registry-docker.io registry:2
docker run -d -p 5001:5000 \
-e REGISTRY_PROXY_REMOTEURL=https://k8s.gcr.io \
--restart always \
--name registry-k8s.gcr.io registry:2
docker run -d -p 5002:5000 \
-e REGISTRY_PROXY_REMOTEURL=https://quay.io \
--restart always \
--name registry-quay.io registry:2.5
docker run -d -p 5003:5000 \
-e REGISTRY_PROXY_REMOTEURL=https://gcr.io \
--restart always \
--name registry-gcr.io registry:2
docker run -d -p 5004:5000 \
-e REGISTRY_PROXY_REMOTEURL=https://ghcr.io \
--restart always \
--name registry-ghcr.io registry:2
```
> Note: Proxies are started as docker containers, and they're automatically configured to start with Docker daemon.
> Please note that the `quay.io` proxy doesn't support the recent Docker image schema, so we run an older registry image version (2.5).
As a registry container can only handle a single upstream Docker registry, we launch a container per upstream, each on its own
host port (5000, 5001, 5002, 5003 and 5004).
## Using Caching Registries with `QEMU` Local Cluster
With a [QEMU]({{< relref "../install/local-platforms/qemu" >}}) local cluster, a bridge interface is created on the host.
As registry containers expose their ports on the host, we can use bridge IP to direct proxy requests.
```bash
sudo talosctl cluster create --provisioner qemu \
--registry-mirror docker.io=http://10.5.0.1:5000 \
--registry-mirror k8s.gcr.io=http://10.5.0.1:5001 \
--registry-mirror quay.io=http://10.5.0.1:5002 \
--registry-mirror gcr.io=http://10.5.0.1:5003 \
--registry-mirror ghcr.io=http://10.5.0.1:5004
```
The Talos local cluster should now start pulling via caching registries.
This can be verified via registry logs, e.g. `docker logs -f registry-docker.io`.
The first time cluster boots, images are pulled and cached, so next cluster boot should be much faster.
> Note: `10.5.0.1` is a bridge IP with default network (`10.5.0.0/24`), if using custom `--cidr`, value should be adjusted accordingly.
## Using Caching Registries with `docker` Local Cluster
With a [docker]({{< relref "../install/local-platforms/docker" >}}) local cluster we can use docker bridge IP, default value for that IP is `172.17.0.1`.
On Linux, the docker bridge address can be inspected with `ip addr show docker0`.
```bash
talosctl cluster create --provisioner docker \
--registry-mirror docker.io=http://172.17.0.1:5000 \
--registry-mirror k8s.gcr.io=http://172.17.0.1:5001 \
--registry-mirror quay.io=http://172.17.0.1:5002 \
--registry-mirror gcr.io=http://172.17.0.1:5003 \
--registry-mirror ghcr.io=http://172.17.0.1:5004
```
## Cleaning Up
To cleanup, run:
```bash
docker rm -f registry-docker.io
docker rm -f registry-k8s.gcr.io
docker rm -f registry-quay.io
docker rm -f registry-gcr.io
docker rm -f registry-ghcr.io
```
> Note: Removing docker registry containers also removes the image cache.
> So if you plan to use caching registries, keep the containers running.

View File

@ -0,0 +1,51 @@
---
title: "Role-based access control (RBAC)"
description: "Set up RBAC on the Talos Linux API."
aliases:
- ../../guides/rbac
---
Talos v0.11 introduced initial support for role-based access control (RBAC).
This guide will explain what that is and how to enable it without losing access to the cluster.
## RBAC in Talos
Talos uses certificates to authorize users.
The certificate subject's organization field is used to encode user roles.
There is a set of predefined roles that allow access to different [API methods]({{< relref "../../reference/api" >}}):
* `os:admin` grants access to all methods;
* `os:reader` grants access to "safe" methods (for example, that includes the ability to list files, but does not include the ability to read file contents);
* `os:etcd:backup` grants access to [`/machine.MachineService/EtcdSnapshot`]({{< relref "../../reference/api#machine.EtcdSnapshotRequest" >}}) method.
Roles in the current `talosconfig` can be checked with the following command:
```sh
$ talosctl config info
[...]
Roles: os:admin
[...]
```
RBAC is enabled by default in new clusters created with `talosctl` v0.11+ and disabled otherwise.
## Enabling RBAC
First, both the Talos cluster and `talosctl` tool should be [upgraded]({{< relref "../upgrading-talos" >}}).
Then the `talosctl config new` command should be used to generate a new client configuration with the `os:admin` role.
Additional configurations and certificates for different roles can be generated by passing `--roles` flag:
```sh
talosctl config new --roles=os:reader reader
```
That command will create a new client configuration file `reader` with a new certificate with `os:reader` role.
After that, RBAC should be enabled in the machine configuration:
```yaml
machine:
  features:
    rbac: true
```
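As a sketch, the same setting can be applied to a running node with a config patch (assuming the `features` section already exists in the config; the node IP is a placeholder):
```bash
talosctl -n <IP> patch machineconfig -p '[{"op": "add", "path": "/machine/features/rbac", "value": true}]'
```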

View File

@ -0,0 +1,88 @@
---
title: "System Extensions"
description: "Customizing the Talos Linux immutable root file system."
aliases:
- ../../guides/system-extensions
---
System extensions allow extending the Talos root filesystem, which enables a variety of features, such as including custom
container runtimes, loading additional firmware, etc.
System extensions are only activated during the installation or upgrade of Talos Linux.
With system extensions installed, the Talos root filesystem is still immutable and read-only.
## Configuration
System extensions are configured in the `.machine.install` section:
```yaml
machine:
  install:
    extensions:
      - image: ghcr.io/siderolabs/gvisor:33f613e
```
During the initial install (e.g. when PXE booting or booting from an ISO), Talos will pull down container images for system extensions,
validate them, and include them into the Talos `initramfs` image.
System extensions will be activated on boot and overlaid on top of the Talos root filesystem.
In order to update the system extensions for a running instance, update `.machine.install.extensions` and upgrade Talos.
(Note: upgrading to the same version of Talos is fine).
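For example, a sketch of triggering such an upgrade (the installer image shown is the stock one; adjust the reference if you use a custom installer):
```bash
talosctl -n <IP> upgrade --image=ghcr.io/siderolabs/installer:{{< release >}}
```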
## Building a Talos Image with System Extensions
System extensions can be installed into the Talos disk image (e.g. AWS AMI or VMWare OVF) by running the following command to generate the image
from the Talos source tree:
```sh
make image-metal IMAGER_SYSTEM_EXTENSIONS="ghcr.io/siderolabs/amd-ucode:20220411 ghcr.io/siderolabs/gvisor:20220405.0-v1.0.0-10-g82b41ad"
```
## Authoring System Extensions
A Talos system extension is a container image with the [specific folder structure](https://github.com/siderolabs/extensions#readme).
System extensions can be built and managed using any tool that produces container images, e.g. `docker build`.
Sidero Labs maintains a [repository of system extensions](https://github.com/siderolabs/extensions).
## Resource Definitions
Use `talosctl get extensions` to get a list of system extensions:
```bash
$ talosctl get extensions
NODE NAMESPACE TYPE ID VERSION NAME VERSION
172.20.0.2 runtime ExtensionStatus 000.ghcr.io-talos-systems-gvisor-54b831d 1 gvisor 20220117.0-v1.0.0
172.20.0.2 runtime ExtensionStatus 001.ghcr.io-talos-systems-intel-ucode-54b831d 1 intel-ucode microcode-20210608-v1.0.0
```
Use YAML or JSON format to see additional details about the extension:
```bash
$ talosctl -n 172.20.0.2 get extensions 001.ghcr.io-talos-systems-intel-ucode-54b831d -o yaml
node: 172.20.0.2
metadata:
namespace: runtime
type: ExtensionStatuses.runtime.talos.dev
id: 001.ghcr.io-talos-systems-intel-ucode-54b831d
version: 1
owner: runtime.ExtensionStatusController
phase: running
created: 2022-02-10T18:25:04Z
updated: 2022-02-10T18:25:04Z
spec:
image: 001.ghcr.io-talos-systems-intel-ucode-54b831d.sqsh
metadata:
name: intel-ucode
version: microcode-20210608-v1.0.0
author: Spencer Smith
description: |
This system extension provides Intel microcode binaries.
compatibility:
talos:
version: '>= v1.0.0'
```
## Example: gVisor
See [readme of the gVisor extension](https://github.com/siderolabs/extensions/tree/main/container-runtime/gvisor#readme).

View File

@ -0,0 +1,5 @@
---
title: "Installation"
weight: 10
description: "How to install Talos Linux on various platforms"
---

View File

@ -0,0 +1,5 @@
---
title: "Bare Metal Platforms"
weight: 20
description: "Installation of Talos Linux on various bare-metal platforms."
---

View File

@ -0,0 +1,174 @@
---
title: "Digital Rebar"
description: "In this guide we will create an Kubernetes cluster with 1 worker node, and 2 controlplane nodes using an existing digital rebar deployment."
aliases:
- ../../../bare-metal-platforms/digital-rebar
---
## Prerequisites
- 3 nodes (please see [hardware requirements]({{< relref "../../../introduction/system-requirements/" >}}))
- Loadbalancer
- Digital Rebar Server
- Talosctl access (see [talosctl setup]({{< relref "../../../introduction/getting-started/#talosctl" >}}))
## Creating a Cluster
In this guide we will create a Kubernetes cluster with 1 worker node and 2 controlplane nodes.
We assume an existing Digital Rebar deployment, and some familiarity with iPXE.
We leave it up to the user to decide if they would like to use static networking, or DHCP.
The setup and configuration of DHCP will not be covered.
### Create the Machine Configuration Files
#### Generating Base Configurations
Using the DNS name of the load balancer, generate the base configuration files for the Talos machines:
```bash
$ talosctl gen config talos-k8s-metal-tutorial https://<load balancer IP or DNS>:<port>
created controlplane.yaml
created worker.yaml
created talosconfig
```
> The load balancer is used to distribute the load across multiple controlplane nodes.
> This isn't covered in detail, because we assume some load balancing knowledge beforehand.
> If you think this should be added to the docs, please [create an issue](https://github.com/siderolabs/talos/issues).
At this point, you can modify the generated configs to your liking.
Optionally, you can specify `--config-patch` with RFC6902 jsonpatch which will be applied during the config generation.
#### Validate the Configuration Files
```bash
$ talosctl validate --config controlplane.yaml --mode metal
controlplane.yaml is valid for metal mode
$ talosctl validate --config worker.yaml --mode metal
worker.yaml is valid for metal mode
```
#### Publishing the Machine Configuration Files
Digital Rebar has a built-in file server, which means we can use this feature to expose the Talos configuration files.
We will place `controlplane.yaml` and `worker.yaml` into the Digital Rebar file server by using the `drpcli` tools.
Copy the generated files from the step above into your Digital Rebar installation.
```bash
drpcli file upload <file>.yaml as <file>.yaml
```
Replace `<file>` with `controlplane` or `worker`.
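For example:
```bash
drpcli file upload controlplane.yaml as controlplane.yaml
drpcli file upload worker.yaml as worker.yaml
```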
### Download the boot files
Download a recent version of `boot.tar.gz` from [GitHub](https://github.com/siderolabs/talos/releases/).
Upload it to Digital Rebar:
```bash
$ drpcli isos upload boot.tar.gz as talos.tar.gz
{
"Path": "talos.tar.gz",
"Size": 96470072
}
```
We have some Digital Rebar [example files](https://github.com/siderolabs/talos/tree/master/hack/test/digitalrebar/) in the Git repo you can use to provision Digital Rebar with `drpcli`.
To apply these configs you need to create them, and then apply them as follows:
```bash
$ drpcli bootenvs create talos
{
"Available": true,
"BootParams": "",
"Bundle": "",
"Description": "",
"Documentation": "",
"Endpoint": "",
"Errors": [],
"Initrds": [],
"Kernel": "",
"Meta": {},
"Name": "talos",
"OS": {
"Codename": "",
"Family": "",
"IsoFile": "",
"IsoSha256": "",
"IsoUrl": "",
"Name": "",
"SupportedArchitectures": {},
"Version": ""
},
"OnlyUnknown": false,
"OptionalParams": [],
"ReadOnly": false,
"RequiredParams": [],
"Templates": [],
"Validated": true
}
```
```bash
drpcli bootenvs update talos - < bootenv.yaml
```
> You need to do this for all files in the example directory.
> If you don't have access to the `drpcli` tools you can also use the web interface.
It's important that the bootenv contains a SHA256 hash matching the uploaded `boot.tar.gz`.
#### Bootenv BootParams
We're using some of Digital Rebar's built-in templating to make sure the machine gets the correct role assigned.
`talos.platform=metal talos.config={{ .ProvisionerURL }}/files/{{.Param \"talos/role\"}}.yaml"`
This is why we also include a `params.yaml` in the example directory to make sure the role is set to one of the following:
- controlplane
- worker
The `{{.Param \"talos/role\"}}` then gets populated with one of the above roles.
### Boot the Machines
In the UI of Digital Rebar you need to select the machines you want to provision.
Once selected, you need to assign the following:
- Profile
- Workflow
This will provision the Stage and Bootenv with the Talos values.
Once this is done, you can boot the machine.
To understand the boot process, we have a higher level overview located at [metal overview]({{< relref "../../../reference/platform" >}}).
### Bootstrap Etcd
To configure `talosctl` we will need the first control plane node's IP:
Set the `endpoints` and `nodes`:
```bash
talosctl --talosconfig talosconfig config endpoint <control plane 1 IP>
talosctl --talosconfig talosconfig config node <control plane 1 IP>
```
Bootstrap `etcd`:
```bash
talosctl --talosconfig talosconfig bootstrap
```
### Retrieve the `kubeconfig`
At this point we can retrieve the admin `kubeconfig` by running:
```bash
talosctl --talosconfig talosconfig kubeconfig .
```

View File

@ -0,0 +1,127 @@
---
title: "Equinix Metal"
description: "Creating Talos cluster using Equinix Metal."
aliases:
- ../../../bare-metal-platforms/equinix-metal
---
## Prerequisites
This guide assumes the user has a working API token, the [Equinix Metal CLI](https://github.com/equinix/metal-cli/) installed, and some familiarity with the CLI.
## Network Booting
To install Talos on a server, a working TFTP and iPXE server are needed.
How this is done varies and is left as an exercise for the user.
In general this requires a Talos kernel vmlinuz and initramfs.
These assets can be downloaded from a given [release](https://github.com/siderolabs/talos/releases).
## Special Considerations
### PXE Boot Kernel Parameters
The following is a list of kernel parameters required by Talos:
- `talos.platform`: set this to `equinixMetal`
- `init_on_alloc=1`: required by KSPP
- `slab_nomerge`: required by KSPP
- `pti=on`: required by KSPP
### User Data
<!-- textlint-disable one-sentence-per-line -->
To configure Talos you can use the metadata service provided by Equinix Metal.
It is required to add a shebang to the top of the configuration file.
The shebang is arbitrary in the case of Talos, and the convention we use is `#!talos`.
<!-- textlint-enable one-sentence-per-line -->
## Creating a Cluster via the Equinix Metal CLI
### Control Plane Endpoint
The strategy used for an HA cluster varies and is left as an exercise for the user.
Some of the known ways are:
- DNS
- Load Balancer
- BGP
### Create the Machine Configuration Files
#### Generating Base Configurations
Using the DNS name of the loadbalancer created earlier, generate the base configuration files for the Talos machines:
```bash
$ talosctl gen config talos-k8s-aws-tutorial https://<load balancer IP or DNS>:<port>
created controlplane.yaml
created worker.yaml
created talosconfig
```
Now add the required shebang (e.g. `#!talos`) at the top of `controlplane.yaml` and `worker.yaml`.
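The top of each file should then look something like this (a sketch; the rest of the generated config follows unchanged):
```yaml
#!talos
version: v1alpha1
machine:
  ...
```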
At this point, you can modify the generated configs to your liking.
Optionally, you can specify `--config-patch` with RFC6902 jsonpatch which will be applied during the config generation.
#### Validate the Configuration Files
```bash
talosctl validate --config controlplane.yaml --mode metal
talosctl validate --config worker.yaml --mode metal
```
> Note: Validation of the install disk could potentially fail as the validation
> is performed on your local machine and the specified disk may not exist.
#### Create the Control Plane Nodes
```bash
metal device create \
  --project-id $PROJECT_ID \
  --facility $FACILITY \
  --ipxe-script-url $PXE_SERVER \
  --operating-system "custom_ipxe" \
  --plan $PLAN \
  --hostname $HOSTNAME \
  --userdata-file controlplane.yaml
```
> Note: The above should be invoked at least twice in order for `etcd` to form quorum.
#### Create the Worker Nodes
```bash
metal device create \
  --project-id $PROJECT_ID \
  --facility $FACILITY \
  --ipxe-script-url $PXE_SERVER \
  --operating-system "custom_ipxe" \
  --plan $PLAN \
  --hostname $HOSTNAME \
  --userdata-file worker.yaml
```
### Bootstrap Etcd
Set the `endpoints` and `nodes`:
```bash
talosctl --talosconfig talosconfig config endpoint <control plane 1 IP>
talosctl --talosconfig talosconfig config node <control plane 1 IP>
```
Bootstrap `etcd`:
```bash
talosctl --talosconfig talosconfig bootstrap
```
### Retrieve the `kubeconfig`
At this point we can retrieve the admin `kubeconfig` by running:
```bash
talosctl --talosconfig talosconfig kubeconfig .
```

View File

@ -0,0 +1,176 @@
---
title: "Matchbox"
description: "In this guide we will create an HA Kubernetes cluster with 3 worker nodes using an existing load balancer and matchbox deployment."
aliases:
- ../../../bare-metal-platforms/matchbox
---
## Creating a Cluster
In this guide we will create an HA Kubernetes cluster with 3 worker nodes.
We assume an existing load balancer, matchbox deployment, and some familiarity with iPXE.
We leave it up to the user to decide if they would like to use static networking, or DHCP.
The setup and configuration of DHCP will not be covered.
### Create the Machine Configuration Files
#### Generating Base Configurations
Using the DNS name of the load balancer, generate the base configuration files for the Talos machines:
```bash
$ talosctl gen config talos-k8s-metal-tutorial https://<load balancer IP or DNS>:<port>
created controlplane.yaml
created worker.yaml
created talosconfig
```
At this point, you can modify the generated configs to your liking.
Optionally, you can specify `--config-patch` with RFC6902 jsonpatch which will be applied during the config generation.
#### Validate the Configuration Files
```bash
$ talosctl validate --config controlplane.yaml --mode metal
controlplane.yaml is valid for metal mode
$ talosctl validate --config worker.yaml --mode metal
worker.yaml is valid for metal mode
```
#### Publishing the Machine Configuration Files
In bare-metal setups it is up to the user to provide the configuration files over HTTP(S).
A special kernel parameter (`talos.config`) must be used to inform Talos about _where_ it should retrieve its configuration file.
To keep things simple we will place `controlplane.yaml` and `worker.yaml` into Matchbox's `assets` directory.
This directory is automatically served by Matchbox.
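For example (assuming the default Matchbox assets path used below):
```bash
cp controlplane.yaml worker.yaml /var/lib/matchbox/assets/
```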
### Create the Matchbox Configuration Files
The profiles we will create will reference `vmlinuz`, and `initramfs.xz`.
Download these files from the [release](https://github.com/siderolabs/talos/releases) of your choice, and place them in `/var/lib/matchbox/assets`.
#### Profiles
##### Control Plane Nodes
```json
{
  "id": "control-plane",
  "name": "control-plane",
  "boot": {
    "kernel": "/assets/vmlinuz",
    "initrd": ["/assets/initramfs.xz"],
    "args": [
      "initrd=initramfs.xz",
      "init_on_alloc=1",
      "slab_nomerge",
      "pti=on",
      "console=tty0",
      "console=ttyS0",
      "printk.devkmsg=on",
      "talos.platform=metal",
      "talos.config=http://matchbox.talos.dev/assets/controlplane.yaml"
    ]
  }
}
```
> Note: Be sure to change `http://matchbox.talos.dev` to the endpoint of your matchbox server.
##### Worker Nodes
```json
{
  "id": "default",
  "name": "default",
  "boot": {
    "kernel": "/assets/vmlinuz",
    "initrd": ["/assets/initramfs.xz"],
    "args": [
      "initrd=initramfs.xz",
      "init_on_alloc=1",
      "slab_nomerge",
      "pti=on",
      "console=tty0",
      "console=ttyS0",
      "printk.devkmsg=on",
      "talos.platform=metal",
      "talos.config=http://matchbox.talos.dev/assets/worker.yaml"
    ]
  }
}
```
#### Groups
Now, create the following groups, and ensure that the `selector`s are accurate for your specific setup.
```json
{
  "id": "control-plane-1",
  "name": "control-plane-1",
  "profile": "control-plane",
  "selector": {
    ...
  }
}
```
```json
{
  "id": "control-plane-2",
  "name": "control-plane-2",
  "profile": "control-plane",
  "selector": {
    ...
  }
}
```
```json
{
  "id": "control-plane-3",
  "name": "control-plane-3",
  "profile": "control-plane",
  "selector": {
    ...
  }
}
```
```json
{
  "id": "default",
  "name": "default",
  "profile": "default"
}
```
### Boot the Machines
Now that we have our configuration files in place, boot all the machines.
Talos will come up on each machine, grab its configuration file, and bootstrap itself.
### Bootstrap Etcd
Set the `endpoints` and `nodes`:
```bash
talosctl --talosconfig talosconfig config endpoint <control plane 1 IP>
talosctl --talosconfig talosconfig config node <control plane 1 IP>
```
Bootstrap `etcd`:
```bash
talosctl --talosconfig talosconfig bootstrap
```
### Retrieve the `kubeconfig`
At this point we can retrieve the admin `kubeconfig` by running:
```bash
talosctl --talosconfig talosconfig kubeconfig .
```

View File

@ -0,0 +1,10 @@
---
title: "Sidero"
description: "Sidero is a project created by the Talos team that has native support for Talos."
aliases:
- ../../../bare-metal-platforms/sidero
---
Sidero Metal is a project created by the Talos team that provides a bare metal installer for Cluster API, and that has native support for Talos Linux.
It can be easily installed using clusterctl.
The best way to get started with Sidero Metal is to visit the [website](https://www.sidero.dev/).

View File

@ -0,0 +1,5 @@
---
title: "Cloud Platforms"
weight: 40
description: "Installation of Talos Linux on many cloud platforms."
---

View File

@ -0,0 +1,269 @@
---
title: "AWS"
description: "Creating a cluster via the AWS CLI."
aliases:
- ../../../cloud-platforms/aws
---
## Official AMI Images
The official AMI image ID can be found in the `cloud-images.json` file attached to the Talos release:
```bash
curl -sL https://github.com/siderolabs/talos/releases/download/{{< release >}}/cloud-images.json | \
jq -r '.[] | select(.region == "us-east-1") | select (.arch == "amd64") | .id'
```
Replace `us-east-1` and `amd64` in the line above with the desired region and architecture.
## Creating a Cluster via the AWS CLI
In this guide we will create an HA Kubernetes cluster with 3 worker nodes.
We assume an existing VPC, and some familiarity with AWS.
If you need more information on AWS specifics, please see the [official AWS documentation](https://docs.aws.amazon.com).
### Create the Subnet
```bash
aws ec2 create-subnet \
--region $REGION \
--vpc-id $VPC \
--cidr-block ${CIDR_BLOCK}
```
### Create the AMI
#### Prepare the Import Prerequisites
##### Create the S3 Bucket
```bash
aws s3api create-bucket \
--bucket $BUCKET \
--create-bucket-configuration LocationConstraint=$REGION \
--acl private
```
##### Create the `vmimport` Role
In order to create an AMI, ensure that the `vmimport` role exists as described in the [official AWS documentation](https://docs.aws.amazon.com/vm-import/latest/userguide/vmie_prereqs.html#vmimport-role).
Note that the role should be associated with the S3 bucket we created above.
##### Create the Image Snapshot
First, download the AWS image from a Talos release:
```bash
curl -L https://github.com/siderolabs/talos/releases/download/{{< release >}}/aws-amd64.tar.gz | tar -xv
```
Copy the RAW disk to S3 and import it as a snapshot:
```bash
aws s3 cp disk.raw s3://$BUCKET/talos-aws-tutorial.raw
aws ec2 import-snapshot \
--region $REGION \
--description "Talos kubernetes tutorial" \
--disk-container "Format=raw,UserBucket={S3Bucket=$BUCKET,S3Key=talos-aws-tutorial.raw}"
```
Save the `ImportTaskId` from the output (e.g. in an `IMPORT_TASK_ID` variable); we will use it to check on the import status and to retrieve the resulting `SnapshotId` once the import is done.
To check on the status of the import, run:
```bash
aws ec2 describe-import-snapshot-tasks \
--region $REGION \
--import-task-ids $IMPORT_TASK_ID
```
Once the `SnapshotTaskDetail.Status` indicates `completed`, we can register the image.
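As a convenience, the snapshot ID can be captured directly into a variable once the task has completed (a sketch; `$IMPORT_TASK_ID` is assumed to hold the task ID saved earlier):
```bash
SNAPSHOT=$(aws ec2 describe-import-snapshot-tasks \
  --region $REGION \
  --import-task-ids $IMPORT_TASK_ID \
  --query 'ImportSnapshotTasks[0].SnapshotTaskDetail.SnapshotId' \
  --output text)
```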
##### Register the Image
```bash
aws ec2 register-image \
--region $REGION \
--block-device-mappings "DeviceName=/dev/xvda,VirtualName=talos,Ebs={DeleteOnTermination=true,SnapshotId=$SNAPSHOT,VolumeSize=4,VolumeType=gp2}" \
--root-device-name /dev/xvda \
--virtualization-type hvm \
--architecture x86_64 \
--ena-support \
--name talos-aws-tutorial-ami
```
We now have an AMI we can use to create our cluster.
Save the AMI ID, as we will need it when we create EC2 instances.
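If you did not capture the `ImageId` from the `register-image` output, you can look it up afterwards by the image name (a convenience sketch; the `AMI` variable matches the one used when creating instances below):
```bash
AMI=$(aws ec2 describe-images \
  --region $REGION \
  --filters "Name=name,Values=talos-aws-tutorial-ami" \
  --query 'Images[0].ImageId' \
  --output text)
```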
### Create a Security Group
```bash
aws ec2 create-security-group \
--region $REGION \
--group-name talos-aws-tutorial-sg \
--description "Security Group for EC2 instances to allow ports required by Talos"
```
Using the security group ID from above, allow all internal traffic within the same security group:
```bash
aws ec2 authorize-security-group-ingress \
--region $REGION \
--group-name talos-aws-tutorial-sg \
--protocol all \
--port 0 \
--source-group $SECURITY_GROUP
```
and expose the Talos and Kubernetes APIs:
```bash
aws ec2 authorize-security-group-ingress \
--region $REGION \
--group-name talos-aws-tutorial-sg \
--protocol tcp \
--port 6443 \
--cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress \
--region $REGION \
--group-name talos-aws-tutorial-sg \
--protocol tcp \
--port 50000-50001 \
--cidr 0.0.0.0/0
```
### Create a Load Balancer
```bash
aws elbv2 create-load-balancer \
--region $REGION \
--name talos-aws-tutorial-lb \
--type network --subnets $SUBNET
```
Take note of the DNS name and ARN.
We will need these soon.
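If you prefer to capture them into variables for the later steps (a convenience sketch; `LOAD_BALANCER_ARN` matches the variable used when creating the listener below):
```bash
LOAD_BALANCER_ARN=$(aws elbv2 describe-load-balancers \
  --region $REGION \
  --names talos-aws-tutorial-lb \
  --query 'LoadBalancers[0].LoadBalancerArn' \
  --output text)
LOAD_BALANCER_DNS=$(aws elbv2 describe-load-balancers \
  --region $REGION \
  --names talos-aws-tutorial-lb \
  --query 'LoadBalancers[0].DNSName' \
  --output text)
```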
### Create the Machine Configuration Files
#### Generating Base Configurations
Using the DNS name of the load balancer created earlier, generate the base configuration files for the Talos machines:
```bash
$ talosctl gen config talos-k8s-aws-tutorial https://<load balancer IP or DNS>:<port> --with-examples=false --with-docs=false
created controlplane.yaml
created worker.yaml
created talosconfig
```
Note that, without the `--with-examples=false --with-docs=false` flags, the generated configs are too large for the AWS user data field (which is limited to 16 KB).
At this point, you can modify the generated configs to your liking.
Optionally, you can specify `--config-patch` with RFC6902 jsonpatch which will be applied during the config generation.
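For example, a patch can be supplied inline at generation time; the patch below (enabling the external cloud provider) is purely illustrative and is not required for this tutorial:
```bash
talosctl gen config talos-k8s-aws-tutorial https://<load balancer IP or DNS>:<port> \
  --with-examples=false --with-docs=false \
  --config-patch '[{"op": "add", "path": "/cluster/externalCloudProvider", "value": {"enabled": true}}]'
```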
#### Validate the Configuration Files
```bash
$ talosctl validate --config controlplane.yaml --mode cloud
controlplane.yaml is valid for cloud mode
$ talosctl validate --config worker.yaml --mode cloud
worker.yaml is valid for cloud mode
```
### Create the EC2 Instances
> Note: There is a known issue that prevents Talos from running on T2 instance types.
> Please use T3 if you need burstable instance types.
#### Create the Control Plane Nodes
```bash
CP_COUNT=1
while [[ "$CP_COUNT" -lt 4 ]]; do
aws ec2 run-instances \
--region $REGION \
--image-id $AMI \
--count 1 \
--instance-type t3.small \
--user-data file://controlplane.yaml \
--subnet-id $SUBNET \
--security-group-ids $SECURITY_GROUP \
--associate-public-ip-address \
--tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=talos-aws-tutorial-cp-$CP_COUNT}]"
((CP_COUNT++))
done
```
> Make a note of the resulting `PrivateIpAddress` of each control plane node for later use.
#### Create the Worker Nodes
```bash
aws ec2 run-instances \
--region $REGION \
--image-id $AMI \
--count 3 \
--instance-type t3.small \
--user-data file://worker.yaml \
--subnet-id $SUBNET \
--security-group-ids $SECURITY_GROUP \
--tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=talos-aws-tutorial-worker}]"
```
### Configure the Load Balancer
```bash
aws elbv2 create-target-group \
--region $REGION \
--name talos-aws-tutorial-tg \
--protocol TCP \
--port 6443 \
--target-type ip \
--vpc-id $VPC
```
Now, using the target group's ARN and the **PrivateIpAddress** of the instances that you created:
```bash
aws elbv2 register-targets \
--region $REGION \
--target-group-arn $TARGET_GROUP_ARN \
--targets Id=$CP_NODE_1_IP Id=$CP_NODE_2_IP Id=$CP_NODE_3_IP
```
Using the ARNs of the load balancer and target group from previous steps, create the listener:
```bash
aws elbv2 create-listener \
--region $REGION \
--load-balancer-arn $LOAD_BALANCER_ARN \
--protocol TCP \
--port 443 \
--default-actions Type=forward,TargetGroupArn=$TARGET_GROUP_ARN
```
### Bootstrap Etcd
Set the `endpoints` and `nodes`:
```bash
talosctl --talosconfig talosconfig config endpoint <control plane 1 IP>
talosctl --talosconfig talosconfig config node <control plane 1 IP>
```
Bootstrap `etcd`:
```bash
talosctl --talosconfig talosconfig bootstrap
```
### Retrieve the `kubeconfig`
At this point we can retrieve the admin `kubeconfig` by running:
```bash
talosctl --talosconfig talosconfig kubeconfig .
```

View File

@ -0,0 +1,294 @@
---
title: "Azure"
description: "Creating a cluster via the CLI on Azure."
aliases:
- ../../../cloud-platforms/azure
---
## Creating a Cluster via the CLI
In this guide we will create an HA Kubernetes cluster with 1 worker node.
We assume existing [Blob Storage](https://docs.microsoft.com/en-us/azure/storage/blobs/), and some familiarity with Azure.
If you need more information on Azure specifics, please see the [official Azure documentation](https://docs.microsoft.com/en-us/azure/).
### Environment Setup
We'll make use of the following environment variables throughout the setup.
Edit the variables below with your correct information.
```bash
# Storage account to use
export STORAGE_ACCOUNT="StorageAccountName"
# Storage container to upload to
export STORAGE_CONTAINER="StorageContainerName"
# Resource group name
export GROUP="ResourceGroupName"
# Location
export LOCATION="centralus"
# Get storage account connection string based on info above
export CONNECTION=$(az storage account show-connection-string \
-n $STORAGE_ACCOUNT \
-g $GROUP \
-o tsv)
```
### Create the Image
First, download the Azure image from a [Talos release](https://github.com/siderolabs/talos/releases).
Once downloaded, untar with `tar -xvf /path/to/azure-amd64.tar.gz`
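For example, to download and extract the amd64 image for the current release:
```bash
curl -LO https://github.com/siderolabs/talos/releases/download/{{< release >}}/azure-amd64.tar.gz
tar -xvf azure-amd64.tar.gz
```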
#### Upload the VHD
Once you have pulled down the image, you can upload it to blob storage with:
```bash
az storage blob upload \
--connection-string $CONNECTION \
--container-name $STORAGE_CONTAINER \
-f /path/to/extracted/talos-azure.vhd \
-n talos-azure.vhd
```
#### Register the Image
Now that the image is present in our blob storage, we'll register it.
```bash
az image create \
--name talos \
--source https://$STORAGE_ACCOUNT.blob.core.windows.net/$STORAGE_CONTAINER/talos-azure.vhd \
--os-type linux \
-g $GROUP
```
### Network Infrastructure
#### Virtual Networks and Security Groups
Once the image is prepared, we'll want to work through setting up the network.
Issue the following to create a network security group and add rules to it.
```bash
# Create vnet
az network vnet create \
--resource-group $GROUP \
--location $LOCATION \
--name talos-vnet \
--subnet-name talos-subnet
# Create network security group
az network nsg create -g $GROUP -n talos-sg
# Client -> apid
az network nsg rule create \
-g $GROUP \
--nsg-name talos-sg \
-n apid \
--priority 1001 \
--destination-port-ranges 50000 \
--direction inbound
# Trustd
az network nsg rule create \
-g $GROUP \
--nsg-name talos-sg \
-n trustd \
--priority 1002 \
--destination-port-ranges 50001 \
--direction inbound
# etcd
az network nsg rule create \
-g $GROUP \
--nsg-name talos-sg \
-n etcd \
--priority 1003 \
--destination-port-ranges 2379-2380 \
--direction inbound
# Kubernetes API Server
az network nsg rule create \
-g $GROUP \
--nsg-name talos-sg \
-n kube \
--priority 1004 \
--destination-port-ranges 6443 \
--direction inbound
```
#### Load Balancer
We will create a public ip, load balancer, and a health check that we will use for our control plane.
```bash
# Create public ip
az network public-ip create \
--resource-group $GROUP \
--name talos-public-ip \
--allocation-method static
# Create lb
az network lb create \
--resource-group $GROUP \
--name talos-lb \
--public-ip-address talos-public-ip \
--frontend-ip-name talos-fe \
--backend-pool-name talos-be-pool
# Create health check
az network lb probe create \
--resource-group $GROUP \
--lb-name talos-lb \
--name talos-lb-health \
--protocol tcp \
--port 6443
# Create lb rule for 6443
az network lb rule create \
--resource-group $GROUP \
--lb-name talos-lb \
--name talos-6443 \
--protocol tcp \
--frontend-ip-name talos-fe \
--frontend-port 6443 \
--backend-pool-name talos-be-pool \
--backend-port 6443 \
--probe-name talos-lb-health
```
#### Network Interfaces
In Azure, we have to pre-create the NICs for our control plane so that they can be associated with our load balancer.
```bash
for i in $( seq 0 1 2 ); do
# Create public IP for each nic
az network public-ip create \
--resource-group $GROUP \
--name talos-controlplane-public-ip-$i \
--allocation-method static
# Create nic
az network nic create \
--resource-group $GROUP \
--name talos-controlplane-nic-$i \
--vnet-name talos-vnet \
--subnet talos-subnet \
--network-security-group talos-sg \
--public-ip-address talos-controlplane-public-ip-$i \
--lb-name talos-lb \
--lb-address-pools talos-be-pool
done
# NOTES:
# Talos can detect PublicIPs automatically if PublicIP SKU is Basic.
# Use `--sku Basic` to set SKU to Basic.
```
### Cluster Configuration
With our networking bits set up, we'll fetch the IP of our load balancer and create our configuration files.
```bash
LB_PUBLIC_IP=$(az network public-ip show \
--resource-group $GROUP \
--name talos-public-ip \
--query [ipAddress] \
--output tsv)
talosctl gen config talos-k8s-azure-tutorial https://${LB_PUBLIC_IP}:6443
```
### Compute Creation
We are now ready to create our azure nodes.
Azure allows you to pass the Talos machine configuration to the virtual machine at bootstrap time via
the `user-data` or `custom-data` methods.
Talos supports only the `custom-data` method; the machine configuration is available to the VM only on the first boot.
```bash
# Create availability set
az vm availability-set create \
--name talos-controlplane-av-set \
-g $GROUP
# Create the controlplane nodes
for i in $( seq 0 1 2 ); do
az vm create \
--name talos-controlplane-$i \
--image talos \
--custom-data ./controlplane.yaml \
-g $GROUP \
--admin-username talos \
--generate-ssh-keys \
--verbose \
--boot-diagnostics-storage $STORAGE_ACCOUNT \
--os-disk-size-gb 20 \
--nics talos-controlplane-nic-$i \
--availability-set talos-controlplane-av-set \
--no-wait
done
# Create worker node
az vm create \
--name talos-worker-0 \
--image talos \
--vnet-name talos-vnet \
--subnet talos-subnet \
--custom-data ./worker.yaml \
-g $GROUP \
--admin-username talos \
--generate-ssh-keys \
--verbose \
--boot-diagnostics-storage $STORAGE_ACCOUNT \
--nsg talos-sg \
--os-disk-size-gb 20 \
--no-wait
# NOTES:
# `--admin-username` and `--generate-ssh-keys` are required by the az cli,
# but are not actually used by talos
# `--os-disk-size-gb` is the backing disk for Kubernetes and any workload containers
# `--boot-diagnostics-storage` is to enable console output which may be necessary
# for troubleshooting
```
### Bootstrap Etcd
You should now be able to interact with your cluster with `talosctl`.
First, we need to discover the public IP of our first control plane node.
```bash
CONTROL_PLANE_0_IP=$(az network public-ip show \
--resource-group $GROUP \
--name talos-controlplane-public-ip-0 \
--query [ipAddress] \
--output tsv)
```
Set the `endpoints` and `nodes`:
```bash
talosctl --talosconfig talosconfig config endpoint $CONTROL_PLANE_0_IP
talosctl --talosconfig talosconfig config node $CONTROL_PLANE_0_IP
```
Bootstrap `etcd`:
```bash
talosctl --talosconfig talosconfig bootstrap
```
### Retrieve the `kubeconfig`
At this point we can retrieve the admin `kubeconfig` by running:
```bash
talosctl --talosconfig talosconfig kubeconfig .
```

View File

@ -0,0 +1,159 @@
---
title: "DigitalOcean"
description: "Creating a cluster via the CLI on DigitalOcean."
aliases:
- ../../../cloud-platforms/digitalocean
---
## Creating a Cluster via the CLI
In this guide we will create an HA Kubernetes cluster with 1 worker node.
We assume an existing [Space](https://www.digitalocean.com/docs/spaces/), and some familiarity with DigitalOcean.
If you need more information on DigitalOcean specifics, please see the [official DigitalOcean documentation](https://www.digitalocean.com/docs/).
### Create the Image
First, download the DigitalOcean image from a Talos release.
Extract the archive to get the `disk.raw` file, then compress it with `gzip` to produce `disk.raw.gz`.
Using an upload method of your choice (`doctl` does not have Spaces support), upload the image to a Space.
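A sketch of these steps, assuming the release asset is named `digital-ocean-amd64.tar.gz` and that the AWS CLI has been configured with Spaces credentials (any S3-compatible client works):
```bash
tar -xvf digital-ocean-amd64.tar.gz
gzip disk.raw
# the object must be publicly readable so that the image import can fetch it
aws s3 cp disk.raw.gz s3://talos-tutorial/disk.raw.gz \
  --acl public-read \
  --endpoint-url https://$REGION.digitaloceanspaces.com
```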
Now, create an image using the URL of the uploaded image:
```bash
doctl compute image create \
--region $REGION \
--image-description talos-digital-ocean-tutorial \
--image-url https://talos-tutorial.$REGION.digitaloceanspaces.com/disk.raw.gz \
Talos
```
Save the image ID.
We will need it when creating droplets.
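If you need to look the image ID up later, something like the following should work:
```bash
doctl compute image list-user --format ID,Name
```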
### Create a Load Balancer
```bash
doctl compute load-balancer create \
--region $REGION \
--name talos-digital-ocean-tutorial-lb \
--tag-name talos-digital-ocean-tutorial-control-plane \
--health-check protocol:tcp,port:6443,check_interval_seconds:10,response_timeout_seconds:5,healthy_threshold:5,unhealthy_threshold:3 \
--forwarding-rules entry_protocol:tcp,entry_port:443,target_protocol:tcp,target_port:6443
```
We will need the IP of the load balancer.
Using the ID of the load balancer, run:
```bash
doctl compute load-balancer get --format IP <load balancer ID>
```
Save it, as we will need it in the next step.
### Create the Machine Configuration Files
#### Generating Base Configurations
Using the IP address (or DNS name) of the load balancer created earlier, generate the base configuration files for the Talos machines:
```bash
$ talosctl gen config talos-k8s-digital-ocean-tutorial https://<load balancer IP or DNS>:<port>
created controlplane.yaml
created worker.yaml
created talosconfig
```
At this point, you can modify the generated configs to your liking.
Optionally, you can specify `--config-patch` with RFC6902 jsonpatch which will be applied during the config generation.
#### Validate the Configuration Files
```bash
$ talosctl validate --config controlplane.yaml --mode cloud
controlplane.yaml is valid for cloud mode
$ talosctl validate --config worker.yaml --mode cloud
worker.yaml is valid for cloud mode
```
### Create the Droplets
#### Create the Control Plane Nodes
Run the following commands to create three control plane nodes:
```bash
doctl compute droplet create \
--region $REGION \
--image <image ID> \
--size s-2vcpu-4gb \
--enable-private-networking \
--tag-names talos-digital-ocean-tutorial-control-plane \
--user-data-file controlplane.yaml \
--ssh-keys <ssh key fingerprint> \
talos-control-plane-1
doctl compute droplet create \
--region $REGION \
--image <image ID> \
--size s-2vcpu-4gb \
--enable-private-networking \
--tag-names talos-digital-ocean-tutorial-control-plane \
--user-data-file controlplane.yaml \
--ssh-keys <ssh key fingerprint> \
talos-control-plane-2
doctl compute droplet create \
--region $REGION \
--image <image ID> \
--size s-2vcpu-4gb \
--enable-private-networking \
--tag-names talos-digital-ocean-tutorial-control-plane \
--user-data-file controlplane.yaml \
--ssh-keys <ssh key fingerprint> \
talos-control-plane-3
```
> Note: Although SSH is not used by Talos, DigitalOcean still requires that an SSH key be associated with the droplet.
> Create a dummy key that can be used to satisfy this requirement.
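For example, a throwaway key can be generated and registered with `doctl` (a sketch; the key is never used by Talos):
```bash
ssh-keygen -t ed25519 -f ./dummy-key -N "" -C "dummy key for talos droplets"
doctl compute ssh-key import talos-dummy-key --public-key-file ./dummy-key.pub
doctl compute ssh-key list   # note the fingerprint of the imported key
```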
#### Create the Worker Nodes
Run the following to create a worker node:
```bash
doctl compute droplet create \
--region $REGION \
--image <image ID> \
--size s-2vcpu-4gb \
--enable-private-networking \
--user-data-file worker.yaml \
--ssh-keys <ssh key fingerprint> \
talos-worker-1
```
### Bootstrap Etcd
To configure `talosctl` we will need the first control plane node's IP:
```bash
doctl compute droplet get --format PublicIPv4 <droplet ID>
```
Set the `endpoints` and `nodes`:
```bash
talosctl --talosconfig talosconfig config endpoint <control plane 1 IP>
talosctl --talosconfig talosconfig config node <control plane 1 IP>
```
Bootstrap `etcd`:
```bash
talosctl --talosconfig talosconfig bootstrap
```
### Retrieve the `kubeconfig`
At this point we can retrieve the admin `kubeconfig` by running:
```bash
talosctl --talosconfig talosconfig kubeconfig .
```

View File

@ -0,0 +1,424 @@
---
title: "GCP"
description: "Creating a cluster via the CLI on Google Cloud Platform."
aliases:
- ../../../cloud-platforms/gcp
---
## Creating a Cluster via the CLI
In this guide, we will create an HA Kubernetes cluster in GCP with 1 worker node.
We will assume an existing [Cloud Storage bucket](https://cloud.google.com/storage/docs/creating-buckets), and some familiarity with Google Cloud.
If you need more information on Google Cloud specifics, please see the [official Google documentation](https://cloud.google.com/docs/).
[jq](https://stedolan.github.io/jq/) and [talosctl]({{< relref "../../../introduction/quickstart#talosctl" >}}) also need to be installed.
## Manual Setup
### Environment Setup
We'll make use of the following environment variables throughout the setup.
Edit the variables below with your correct information.
```bash
# Storage bucket to use
export STORAGE_BUCKET="StorageBucketName"
# Region
export REGION="us-central1"
```
### Create the Image
First, download the Google Cloud image from a Talos [release](https://github.com/siderolabs/talos/releases).
These images are called `gcp-$ARCH.tar.gz`.
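For example, to download the amd64 image for the current release:
```bash
curl -LO https://github.com/siderolabs/talos/releases/download/{{< release >}}/gcp-amd64.tar.gz
```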
#### Upload the Image
Once you have downloaded the image, you can upload it to your storage bucket with:
```bash
gsutil cp /path/to/gcp-amd64.tar.gz gs://$STORAGE_BUCKET
```
#### Register the image
Now that the image is present in our bucket, we'll register it.
```bash
gcloud compute images create talos \
--source-uri=gs://$STORAGE_BUCKET/gcp-amd64.tar.gz \
--guest-os-features=VIRTIO_SCSI_MULTIQUEUE
```
### Network Infrastructure
#### Load Balancers and Firewalls
Once the image is prepared, we'll want to work through setting up the network.
Issue the following to create a firewall, load balancer, and their required components.
`130.211.0.0/22` and `35.191.0.0/16` are the GCP [Load Balancer IP ranges](https://cloud.google.com/load-balancing/docs/health-checks#fw-rule)
```bash
# Create Instance Group
gcloud compute instance-groups unmanaged create talos-ig \
--zone $REGION-b
# Create port for IG
gcloud compute instance-groups set-named-ports talos-ig \
--named-ports tcp6443:6443 \
--zone $REGION-b
# Create health check
gcloud compute health-checks create tcp talos-health-check --port 6443
# Create backend
gcloud compute backend-services create talos-be \
--global \
--protocol TCP \
--health-checks talos-health-check \
--timeout 5m \
--port-name tcp6443
# Add instance group to backend
gcloud compute backend-services add-backend talos-be \
--global \
--instance-group talos-ig \
--instance-group-zone $REGION-b
# Create tcp proxy
gcloud compute target-tcp-proxies create talos-tcp-proxy \
--backend-service talos-be \
--proxy-header NONE
# Create LB IP
gcloud compute addresses create talos-lb-ip --global
# Forward 443 from LB IP to tcp proxy
gcloud compute forwarding-rules create talos-fwd-rule \
--global \
--ports 443 \
--address talos-lb-ip \
--target-tcp-proxy talos-tcp-proxy
# Create firewall rule for health checks
gcloud compute firewall-rules create talos-controlplane-firewall \
--source-ranges 130.211.0.0/22,35.191.0.0/16 \
--target-tags talos-controlplane \
--allow tcp:6443
# Create firewall rule to allow talosctl access
gcloud compute firewall-rules create talos-controlplane-talosctl \
--source-ranges 0.0.0.0/0 \
--target-tags talos-controlplane \
--allow tcp:50000
```
### Cluster Configuration
With our networking bits set up, we'll fetch the IP of our load balancer and create our configuration files.
```bash
LB_PUBLIC_IP=$(gcloud compute forwarding-rules describe talos-fwd-rule \
--global \
--format json \
| jq -r .IPAddress)
talosctl gen config talos-k8s-gcp-tutorial https://${LB_PUBLIC_IP}:443
```
Additionally, you can specify `--config-patch` with RFC6902 jsonpatch which will be applied during the config generation.
### Compute Creation
We are now ready to create our GCP nodes.
```bash
# Create the control plane nodes.
for i in $( seq 0 1 2 ); do
gcloud compute instances create talos-controlplane-$i \
--image talos \
--zone $REGION-b \
--tags talos-controlplane \
--boot-disk-size 20GB \
--metadata-from-file=user-data=./controlplane.yaml
done
# Add control plane nodes to instance group
for i in $( seq 0 1 2 ); do
gcloud compute instance-groups unmanaged add-instances talos-ig \
--zone $REGION-b \
--instances talos-controlplane-$i
done
# Create worker
gcloud compute instances create talos-worker-0 \
--image talos \
--zone $REGION-b \
--boot-disk-size 20GB \
--metadata-from-file=user-data=./worker.yaml
```
### Bootstrap Etcd
You should now be able to interact with your cluster with `talosctl`.
First, we need to discover the public IP of our first control plane node.
```bash
CONTROL_PLANE_0_IP=$(gcloud compute instances describe talos-controlplane-0 \
--zone $REGION-b \
--format json \
| jq -r '.networkInterfaces[0].accessConfigs[0].natIP')
```
Set the `endpoints` and `nodes`:
```bash
talosctl --talosconfig talosconfig config endpoint $CONTROL_PLANE_0_IP
talosctl --talosconfig talosconfig config node $CONTROL_PLANE_0_IP
```
Bootstrap `etcd`:
```bash
talosctl --talosconfig talosconfig bootstrap
```
### Retrieve the `kubeconfig`
At this point we can retrieve the admin `kubeconfig` by running:
```bash
talosctl --talosconfig talosconfig kubeconfig .
```
### Cleanup
```bash
# cleanup VM's
gcloud compute instances delete \
talos-worker-0 \
talos-controlplane-0 \
talos-controlplane-1 \
talos-controlplane-2
# cleanup firewall rules
gcloud compute firewall-rules delete \
talos-controlplane-talosctl \
talos-controlplane-firewall
# cleanup forwarding rules
gcloud compute forwarding-rules delete \
talos-fwd-rule
# cleanup addresses
gcloud compute addresses delete \
talos-lb-ip
# cleanup proxies
gcloud compute target-tcp-proxies delete \
talos-tcp-proxy
# cleanup backend services
gcloud compute backend-services delete \
talos-be
# cleanup health checks
gcloud compute health-checks delete \
talos-health-check
# cleanup unmanaged instance groups
gcloud compute instance-groups unmanaged delete \
talos-ig
# cleanup Talos image
gcloud compute images delete \
talos
```
## Using GCP Deployment manager
Using the GCP deployment manager automatically creates a Google Storage bucket and uploads the Talos image to it.
Once the deployment is complete, the generated `talosconfig` and `kubeconfig` files are uploaded to the bucket.
By default, this setup creates a three-node control plane and a single worker in `us-west1-b`.
First, we need to create a folder to store our deployment manifests and perform all subsequent operations from that folder.
```bash
mkdir -p talos-gcp-deployment
cd talos-gcp-deployment
```
### Getting the deployment manifests
We need to download two deployment manifests (and optionally a third for the cloud controller manager) from the Talos GitHub repository.
```bash
curl -fsSLO "https://raw.githubusercontent.com/siderolabs/talos/master/website/content/{{< version >}}/cloud-platforms/gcp/config.yaml"
curl -fsSLO "https://raw.githubusercontent.com/siderolabs/talos/master/website/content/{{< version >}}/cloud-platforms/gcp/talos-ha.jinja"
# if using ccm
curl -fsSLO "https://raw.githubusercontent.com/siderolabs/talos/master/website/content/{{< version >}}/cloud-platforms/gcp/gcp-ccm.yaml"
```
### Updating the config
Now we need to update the local `config.yaml` file with any required changes, such as the default zone, Talos version, machine sizes, node counts, etc.
An example `config.yaml` file is shown below:
```yaml
imports:
- path: talos-ha.jinja
resources:
- name: talos-ha
type: talos-ha.jinja
properties:
zone: us-west1-b
talosVersion: v0.13.2
externalCloudProvider: false
controlPlaneNodeCount: 5
controlPlaneNodeType: n1-standard-1
workerNodeCount: 3
workerNodeType: n1-standard-1
outputs:
- name: bucketName
value: $(ref.talos-ha.bucketName)
```
#### Enabling external cloud provider
Note: The `externalCloudProvider` property is set to `false` by default.
The [manifest](https://raw.githubusercontent.com/siderolabs/talos/master/website/content/{{< version >}}/cloud-platforms/gcp/gcp-ccm.yaml#L256) used for deploying the CCM (cloud controller manager) currently uses the GCP CCM provided by OpenShift, since there are no public images for the [ccm](https://github.com/kubernetes/cloud-provider-gcp) yet.
> Since the routes controller is disabled while deploying the CCM, the CNI pods need to be restarted after the CCM deployment is complete to remove the `node.kubernetes.io/network-unavailable` taint.
See [Nodes network-unavailable taint not removed after installing ccm](https://github.com/kubernetes/cloud-provider-gcp/issues/291) for more information.
Use a custom-built image for the CCM deployment if required.
### Creating the deployment
Now we are ready to create the deployment.
Confirm with `y` for any prompts.
Run the following command to create the deployment:
```bash
# use a unique name for the deployment, resources are prefixed with the deployment name
export DEPLOYMENT_NAME="<deployment name>"
gcloud deployment-manager deployments create "${DEPLOYMENT_NAME}" --config config.yaml
```
### Retrieving the outputs
First we need to get the deployment outputs.
```bash
# first get the outputs
OUTPUTS=$(gcloud deployment-manager deployments describe "${DEPLOYMENT_NAME}" --format json | jq '.outputs[]')
BUCKET_NAME=$(jq -r '. | select(.name == "bucketName").finalValue' <<< "${OUTPUTS}")
# used when cloud controller is enabled
SERVICE_ACCOUNT=$(jq -r '. | select(.name == "serviceAccount").finalValue' <<< "${OUTPUTS}")
PROJECT=$(jq -r '. | select(.name == "project").finalValue' <<< "${OUTPUTS}")
```
Note: If the cloud controller manager is enabled, the commands below need to be run to allow the CCM service account to access cloud resources:
```bash
gcloud projects add-iam-policy-binding \
"${PROJECT}" \
--member "serviceAccount:${SERVICE_ACCOUNT}" \
--role roles/iam.serviceAccountUser
gcloud projects add-iam-policy-binding \
"${PROJECT}" \
--member serviceAccount:"${SERVICE_ACCOUNT}" \
--role roles/compute.admin
gcloud projects add-iam-policy-binding \
"${PROJECT}" \
--member serviceAccount:"${SERVICE_ACCOUNT}" \
--role roles/compute.loadBalancerAdmin
```
### Downloading the talosconfig and kubeconfig
In addition to the `talosconfig` and `kubeconfig` files, the storage bucket contains the `controlplane.yaml` and `worker.yaml` files used to join additional nodes to the cluster.
```bash
gsutil cp "gs://${BUCKET_NAME}/generated/talosconfig" .
gsutil cp "gs://${BUCKET_NAME}/generated/kubeconfig" .
```
### Deploying the cloud controller manager
```bash
kubectl \
--kubeconfig kubeconfig \
--namespace kube-system \
apply \
--filename gcp-ccm.yaml
# wait for the ccm to be up
kubectl \
--kubeconfig kubeconfig \
--namespace kube-system \
rollout status \
daemonset cloud-controller-manager
```
If the cloud controller manager is enabled, we need to restart the CNI pods to remove the `node.kubernetes.io/network-unavailable` taint.
```bash
# restart the CNI pods, in this case flannel
kubectl \
--kubeconfig kubeconfig \
--namespace kube-system \
rollout restart \
daemonset kube-flannel
# wait for the pods to be restarted
kubectl \
--kubeconfig kubeconfig \
--namespace kube-system \
rollout status \
daemonset kube-flannel
```
### Check cluster status
```bash
kubectl \
--kubeconfig kubeconfig \
get nodes
```
### Cleanup deployment
Warning: This will delete the deployment and all resources associated with it.
Run the following if the cloud controller manager is enabled:
```bash
gcloud projects remove-iam-policy-binding \
"${PROJECT}" \
--member "serviceAccount:${SERVICE_ACCOUNT}" \
--role roles/iam.serviceAccountUser
gcloud projects remove-iam-policy-binding \
"${PROJECT}" \
--member serviceAccount:"${SERVICE_ACCOUNT}" \
--role roles/compute.admin
gcloud projects remove-iam-policy-binding \
"${PROJECT}" \
--member serviceAccount:"${SERVICE_ACCOUNT}" \
--role roles/compute.loadBalancerAdmin
```
Now we can finally remove the deployment:
```bash
# delete the objects in the bucket first
gsutil -m rm -r "gs://${BUCKET_NAME}"
gcloud deployment-manager deployments delete "${DEPLOYMENT_NAME}" --quiet
```

View File

@ -0,0 +1,21 @@
imports:
- path: talos-ha.jinja
resources:
- name: talos-ha
type: talos-ha.jinja
properties:
zone: us-west1-b
talosVersion: v0.13.3
externalCloudProvider: false
controlPlaneNodeCount: 3
controlPlaneNodeType: n1-standard-1
workerNodeCount: 1
workerNodeType: n1-standard-1
outputs:
- name: bucketName
value: $(ref.talos-ha.bucketName)
- name: serviceAccount
value: $(ref.talos-ha.serviceAccount)
- name: project
value: $(ref.talos-ha.project)

View File

@ -0,0 +1,276 @@
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: cloud-controller-manager
namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: system:cloud-provider
rules:
- apiGroups:
- ""
resources:
- events
verbs:
- create
- patch
- update
- apiGroups:
- ""
resources:
- services/status
verbs:
- patch
- update
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: system:cloud-provider
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: system:cloud-provider
subjects:
- kind: ServiceAccount
name: cloud-provider
namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: system:cloud-controller-manager
rules:
- apiGroups:
- ""
- events.k8s.io
resources:
- events
verbs:
- create
- patch
- update
- apiGroups:
- coordination.k8s.io
resources:
- leases
verbs:
- create
- apiGroups:
- coordination.k8s.io
resourceNames:
- cloud-controller-manager
resources:
- leases
verbs:
- get
- update
- apiGroups:
- ""
resources:
- endpoints
- serviceaccounts
verbs:
- create
- get
- update
- apiGroups:
- ""
resources:
- nodes
verbs:
- get
- update
- apiGroups:
- ""
resources:
- namespaces
verbs:
- get
- apiGroups:
- ""
resources:
- nodes/status
verbs:
- patch
- update
- apiGroups:
- ""
resources:
- secrets
verbs:
- create
- delete
- get
- update
- apiGroups:
- "authentication.k8s.io"
resources:
- tokenreviews
verbs:
- create
- apiGroups:
- "*"
resources:
- "*"
verbs:
- list
- watch
- apiGroups:
- ""
resources:
- serviceaccounts/token
verbs:
- create
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: system:controller:cloud-node-controller
rules:
- apiGroups:
- ""
resources:
- events
verbs:
- create
- patch
- update
- apiGroups:
- ""
resources:
- nodes
verbs:
- get
- list
- update
- delete
- patch
- apiGroups:
- ""
resources:
- nodes/status
verbs:
- get
- list
- update
- delete
- patch
- apiGroups:
- ""
resources:
- pods
verbs:
- list
- delete
- apiGroups:
- ""
resources:
- pods/status
verbs:
- list
- delete
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: cloud-controller-manager:apiserver-authentication-reader
namespace: kube-system
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: extension-apiserver-authentication-reader
subjects:
- apiGroup: ""
kind: ServiceAccount
name: cloud-controller-manager
namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: system:cloud-controller-manager
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: system:cloud-controller-manager
subjects:
- kind: ServiceAccount
name: cloud-controller-manager
namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: system:controller:cloud-node-controller
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: system:controller:cloud-node-controller
subjects:
- kind: ServiceAccount
name: cloud-node-controller
namespace: kube-system
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: cloud-controller-manager
namespace: kube-system
labels:
tier: control-plane
k8s-app: cloud-controller-manager
spec:
selector:
matchLabels:
k8s-app: cloud-controller-manager
updateStrategy:
type: RollingUpdate
template:
metadata:
labels:
tier: control-plane
k8s-app: cloud-controller-manager
spec:
nodeSelector:
node-role.kubernetes.io/master: ""
tolerations:
- key: node.cloudprovider.kubernetes.io/uninitialized
value: "true"
effect: NoSchedule
- key: node-role.kubernetes.io/master
effect: NoSchedule
securityContext:
seccompProfile:
type: RuntimeDefault
runAsUser: 65521
runAsNonRoot: true
priorityClassName: system-node-critical
hostNetwork: true
serviceAccountName: cloud-controller-manager
containers:
- name: cloud-controller-manager
image: quay.io/openshift/origin-gcp-cloud-controller-manager:4.10.0
resources:
requests:
cpu: 50m
command:
- /bin/gcp-cloud-controller-manager
args:
- --bind-address=127.0.0.1
- --cloud-provider=gce
- --use-service-account-credentials
- --configure-cloud-routes=false
- --allocate-node-cidrs=false
- --controllers=*,-nodeipam
livenessProbe:
httpGet:
host: 127.0.0.1
port: 10258
path: /healthz
scheme: HTTPS
initialDelaySeconds: 15
timeoutSeconds: 15

View File

@ -0,0 +1,282 @@
resources:
- type: storage.v1.bucket
name: {{ env["deployment"] }}-talos-assets
- name: create-talos-artifact
action: gcp-types/cloudbuild-v1:cloudbuild.projects.builds.create
metadata:
runtimePolicy:
- CREATE
properties:
steps:
- name: gcr.io/cloud-builders/curl
args:
- -fSLO
- https://github.com/siderolabs/talos/releases/download/{{ properties["talosVersion"] }}/gcp-amd64.tar.gz
- name: gcr.io/cloud-builders/gsutil
args:
- -m
- cp
- gcp-amd64.tar.gz
- gs://$(ref.{{ env["deployment"] }}-talos-assets.name)/gcp-amd64.tar.gz
timeout: 120s
- type: compute.v1.image
name: {{ env["deployment"] }}-talos-image
metadata:
dependsOn:
- create-talos-artifact
properties:
rawDisk:
source: https://storage.cloud.google.com/$(ref.{{ env["deployment"] }}-talos-assets.name)/gcp-amd64.tar.gz
sourceType: RAW
description: Talos image
family: talos
- type: compute.v1.instanceGroup
name: {{ env["deployment"] }}-talos-ig
properties:
zone: {{ properties["zone"] }}
description: Talos instance group
namedPorts:
- name: tcp6443
port: 6443
- type: compute.v1.healthCheck
name: {{ env["deployment"] }}-talos-healthcheck
properties:
description: Talos health check
type: TCP
tcpHealthCheck:
port: 6443
- type: compute.v1.backendService
name: {{ env["deployment"] }}-talos-backend
properties:
description: Talos backend service
protocol: TCP
healthChecks:
- $(ref.{{ env["deployment"] }}-talos-healthcheck.selfLink)
timeoutSec: 300
backends:
- description: Talos backend
group: $(ref.{{ env["deployment"] }}-talos-ig.selfLink)
portName: tcp6443
- type: compute.v1.targetTcpProxy
name: {{ env["deployment"] }}-talos-tcp-proxy
properties:
description: Talos TCP proxy
service: $(ref.{{ env["deployment"] }}-talos-backend.selfLink)
proxyHeader: NONE
- type: compute.v1.globalAddress
name: {{ env["deployment"] }}-talos-lb-ip
properties:
description: Talos LoadBalancer IP
- type: compute.v1.globalForwardingRule
name: talos-fwd-rule
properties:
description: Talos Forwarding rule
target: $(ref.{{ env["deployment"] }}-talos-tcp-proxy.selfLink)
IPAddress: $(ref.{{ env["deployment"] }}-talos-lb-ip.address)
IPProtocol: TCP
portRange: 443
- type: compute.v1.firewall
name: {{ env["deployment"] }}-talos-controlplane-firewall
properties:
description: Talos controlplane firewall
sourceRanges:
- 130.211.0.0/22
- 35.191.0.0/16
targetTags:
- talos-controlplane
allowed:
- IPProtocol: TCP
ports:
- 6443
- type: compute.v1.firewall
name: {{ env["deployment"] }}-talos-controlplane-talosctl
properties:
description: Talos controlplane talosctl firewall
sourceRanges:
- 0.0.0.0/0
targetTags:
- talos-controlplane
- talos-workers
allowed:
- IPProtocol: TCP
ports:
- 50000
{% if properties["externalCloudProvider"] %}
- type: gcp-types/iam-v1:projects.serviceAccounts
name: {{ env["deployment"] }}-ccm-sa
properties:
displayName: Cloud Controller Manager
accountId: {{ env["deployment"] }}-ccm-sa
{% endif %}
{% for index in range(properties["controlPlaneNodeCount"]) %}
- type: compute.v1.instance
name: {{ env["deployment"] }}-talos-controlplane-{{ index }}
properties:
zone: {{ properties["zone"] }}
machineType: zones/{{ properties["zone"] }}/machineTypes/{{ properties["controlPlaneNodeType"] }}
{% if properties["externalCloudProvider"] %}
serviceAccounts:
- email: $(ref.{{ env["deployment"] }}-ccm-sa.email)
scopes:
- https://www.googleapis.com/auth/compute
{% endif %}
tags:
items:
- talos-controlplane
- {{ env["deployment"] }}-talos-controlplane-{{ index }} # required for cloud controller-manager
disks:
- deviceName: boot
type: PERSISTENT
boot: true
autoDelete: true
initializeParams:
diskSizeGb: 20
sourceImage: $(ref.{{ env["deployment"] }}-talos-image.selfLink)
networkInterfaces:
- network: global/networks/default
accessConfigs:
- name: External NAT
type: ONE_TO_ONE_NAT
{% endfor %}
{% for index in range(properties["workerNodeCount"]) %}
- type: compute.v1.instance
name: {{ env["deployment"] }}-talos-worker-{{ index }}
properties:
zone: {{ properties["zone"] }}
machineType: zones/{{ properties["zone"] }}/machineTypes/{{ properties["workerNodeType"] }}
{% if properties["externalCloudProvider"] %}
serviceAccounts:
- email: $(ref.{{ env["deployment"] }}-ccm-sa.email)
scopes:
- https://www.googleapis.com/auth/compute
{% endif %}
tags:
items:
- talos-workers
- {{ env["deployment"] }}-talos-worker-{{ index }} # required for cloud controller-manager
disks:
- deviceName: boot
type: PERSISTENT
boot: true
autoDelete: true
initializeParams:
diskSizeGb: 20
sourceImage: $(ref.{{ env["deployment"] }}-talos-image.selfLink)
networkInterfaces:
- network: global/networks/default
accessConfigs:
- name: External NAT
type: ONE_TO_ONE_NAT
{% endfor %}
- name: {{ env["deployment"] }}-talos-ig-members
action: gcp-types/compute-v1:compute.instanceGroups.addInstances
properties:
zone: {{ properties["zone"] }}
instanceGroup: $(ref.{{ env["deployment"] }}-talos-ig.name)
instances:
{% for index in range(properties["controlPlaneNodeCount"]) %}
- instance: $(ref.{{ env["deployment"] }}-talos-controlplane-{{ index }}.selfLink)
{% endfor %}
- name: generate-config-and-bootstrap
action: gcp-types/cloudbuild-v1:cloudbuild.projects.builds.create
metadata:
runtimePolicy:
- CREATE
properties:
steps:
- name: gcr.io/cloud-builders/curl
args:
- -fSLO
- https://github.com/siderolabs/talos/releases/download/{{ properties["talosVersion"] }}/talosctl-linux-amd64
- name: alpine
args:
- /bin/sh
- -ec
- |
chmod +x talosctl-linux-amd64 && \
mv talosctl-linux-amd64 /usr/local/bin/talosctl && \
mkdir -p generated && \
talosctl gen config \
{{ env["deployment"] }} \
https://$(ref.{{ env["deployment"] }}-talos-lb-ip.address):443 \
{% if properties["externalCloudProvider"] %} --config-patch '[{"op": "add", "path": "/cluster/externalCloudProvider", "value": {"enabled": true}}]' \{% endif %}
--output-dir generated/ && \
{% for index in range(properties["controlPlaneNodeCount"]) %}
echo "applying config for {{ env["deployment"] }}-talos-controlplane-{{ index }}" && \
talosctl apply-config \
--insecure \
--nodes $(ref.{{ env["deployment"] }}-talos-controlplane-{{ index }}.networkInterfaces[0].accessConfigs[0].natIP) \
--endpoints $(ref.{{ env["deployment"] }}-talos-controlplane-{{ index }}.networkInterfaces[0].accessConfigs[0].natIP) \
--file generated/controlplane.yaml && \
{% endfor %}
{% for index in range(properties["workerNodeCount"]) %}
echo "applying config for {{ env["deployment"] }}-talos-worker-{{ index }}" && \
talosctl apply-config \
--insecure \
--nodes $(ref.{{ env["deployment"] }}-talos-worker-{{ index }}.networkInterfaces[0].accessConfigs[0].natIP) \
--endpoints $(ref.{{ env["deployment"] }}-talos-worker-{{ index }}.networkInterfaces[0].accessConfigs[0].natIP) \
--file generated/worker.yaml && \
{% endfor %}
# wait before bootstrapping
wait_count=120
until nc -vzw 3 $(ref.{{ env["deployment"] }}-talos-controlplane-0.networkInterfaces[0].accessConfigs[0].natIP) 50000; do
echo "Waiting for talos-controlplane-0 to be ready for bootstrap"
wait_count=$((wait_count=wait_count-1))
if [ "${wait_count}" -eq 0 ]; then
echo "Timeout waiting for talos-controlplane-0 to be ready for bootstrap"
# if failed just reset
{% for index in range(properties["controlPlaneNodeCount"]) %}
echo "resetting config for {{ env["deployment"] }}-talos-controlplane-{{ index }}" && \
talosctl reset \
--talosconfig generated/talosconfig\
--nodes $(ref.{{ env["deployment"] }}-talos-controlplane-{{ index }}.networkInterfaces[0].accessConfigs[0].natIP) \
--endpoints $(ref.{{ env["deployment"] }}-talos-controlplane-{{ index }}.networkInterfaces[0].accessConfigs[0].natIP) \
--graceful=false \
--system-labels-to-wipe SYSTEM \
--system-labels-to-wipe EPHEMERAL && \
{% endfor %}
{% for index in range(properties["workerNodeCount"]) %}
talosctl reset \
--talosconfig generated/talosconfig\
--nodes $(ref.{{ env["deployment"] }}-talos-worker-{{ index }}.networkInterfaces[0].accessConfigs[0].natIP) \
--endpoints $(ref.{{ env["deployment"] }}-talos-worker-{{ index }}.networkInterfaces[0].accessConfigs[0].natIP) \
--graceful=false \
--system-labels-to-wipe SYSTEM \
--system-labels-to-wipe EPHEMERAL && \
{% endfor %}
exit 1
fi
done && \
talosctl \
--talosconfig generated/talosconfig \
--nodes $(ref.{{ env["deployment"] }}-talos-controlplane-0.networkInterfaces[0].accessConfigs[0].natIP) \
--endpoints $(ref.{{ env["deployment"] }}-talos-controlplane-0.networkInterfaces[0].accessConfigs[0].natIP) \
bootstrap && \
talosctl \
--talosconfig generated/talosconfig \
--nodes $(ref.{{ env["deployment"] }}-talos-controlplane-0.networkInterfaces[0].accessConfigs[0].natIP) \
--endpoints $(ref.{{ env["deployment"] }}-talos-controlplane-0.networkInterfaces[0].accessConfigs[0].natIP) \
kubeconfig generated/ && \
talosctl \
--talosconfig generated/talosconfig \
config endpoint \
{% for index in range(properties["controlPlaneNodeCount"]) %}$(ref.{{ env["deployment"] }}-talos-controlplane-{{ index }}.networkInterfaces[0].accessConfigs[0].natIP){% if not loop.last %} {% endif %}{% endfor %} {% for index in range(properties["workerNodeCount"]) %}$(ref.{{ env["deployment"] }}-talos-worker-{{ index }}.networkInterfaces[0].accessConfigs[0].natIP){% if not loop.last %} {% endif %}{% endfor %} && \
talosctl \
--talosconfig generated/talosconfig \
config node \
{% for index in range(properties["controlPlaneNodeCount"]) %}$(ref.{{ env["deployment"] }}-talos-controlplane-{{ index }}.networkInterfaces[0].accessConfigs[0].natIP){% if not loop.last %} {% endif %}{% endfor %} {% for index in range(properties["workerNodeCount"]) %}$(ref.{{ env["deployment"] }}-talos-worker-{{ index }}.networkInterfaces[0].accessConfigs[0].natIP){% if not loop.last %} {% endif %}{% endfor %}
- name: gcr.io/cloud-builders/gsutil
args:
- -m
- cp
- -r
- generated
- gs://$(ref.{{ env["deployment"] }}-talos-assets.name)/
timeout: 360s
outputs:
- name: bucketName
value: $(ref.{{ env["deployment"] }}-talos-assets.name)
- name: serviceAccount
value: $(ref.{{ env["deployment"] }}-ccm-sa.email)
- name: project
value: {{ env["project"] }}

View File

@ -0,0 +1,248 @@
---
title: "Hetzner"
description: "Creating a cluster via the CLI (hcloud) on Hetzner."
aliases:
- ../../../cloud-platforms/hetzner
---
## Upload image
Hetzner Cloud does not support uploading custom images.
You can email their support to get a Talos ISO uploaded by following [issues:3599](https://github.com/siderolabs/talos/issues/3599#issuecomment-841172018), or you can prepare an image snapshot yourself.
There are two options to prepare your own image:
1. Run an instance in rescue mode and replace the system OS with the Talos image
2. Use [HashiCorp Packer](https://www.packer.io/docs/builders/hetzner-cloud) to prepare an image
### Rescue mode
Create a new Server in the Hetzner console.
Enable the Hetzner Rescue System for this server and reboot.
Upon a reboot, the server will boot a special minimal Linux distribution designed for repair and reinstall.
Once running, log in to the server using `ssh` and prepare the system disk by doing the following:
```bash
# Check that you are in rescue mode
df
### Result is like:
# udev 987432 0 987432 0% /dev
# 213.133.99.101:/nfs 308577696 247015616 45817536 85% /root/.oldroot/nfs
# overlay 995672 8340 987332 1% /
# tmpfs 995672 0 995672 0% /dev/shm
# tmpfs 398272 572 397700 1% /run
# tmpfs 5120 0 5120 0% /run/lock
# tmpfs 199132 0 199132 0% /run/user/0
# Download the Talos image
cd /tmp
wget -O /tmp/talos.raw.xz https://github.com/siderolabs/talos/releases/download/v0.13.0/hcloud-amd64.raw.xz
# Replace system
xz -d -c /tmp/talos.raw.xz | dd of=/dev/sda && sync
# shutdown the instance
shutdown -h now
```
To make sure the disk content is consistent, it is recommended to shut the server down before taking an image (snapshot).
Once it is shut down, simply create an image (snapshot) from the console.
You can now use this snapshot to run Talos on the cloud.
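If you prefer the CLI, a rough equivalent with `hcloud` (assuming the server is named `talos-image-builder`) would be:
```bash
hcloud server poweroff talos-image-builder
hcloud server create-image --type snapshot --description "talos system disk" talos-image-builder
```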
### Packer
Install [Packer](https://learn.hashicorp.com/tutorials/packer/get-started-install-cli) on the local machine.
Create a config file for packer to use:
```hcl
# hcloud.pkr.hcl
packer {
required_plugins {
hcloud = {
version = ">= 1.0.0"
source = "github.com/hashicorp/hcloud"
}
}
}
variable "talos_version" {
type = string
default = "v0.13.0"
}
locals {
image = "https://github.com/siderolabs/talos/releases/download/${var.talos_version}/hcloud-amd64.raw.xz"
}
source "hcloud" "talos" {
rescue = "linux64"
image = "debian-11"
location = "hel1"
server_type = "cx11"
ssh_username = "root"
snapshot_name = "talos system disk"
snapshot_labels = {
type = "infra",
os = "talos",
version = "${var.talos_version}",
}
}
build {
sources = ["source.hcloud.talos"]
provisioner "shell" {
inline = [
"apt-get install -y wget",
"wget -O /tmp/talos.raw.xz ${local.image}",
"xz -d -c /tmp/talos.raw.xz | dd of=/dev/sda && sync",
]
}
}
```
Create a new image by issuing the commands shown below.
Note that to create a new API token for your Project, switch into the Hetzner Cloud Console, choose a Project, go to Access → Security, and create a new token.
```bash
# First, set the API token
export HCLOUD_TOKEN=${TOKEN}
# Upload image
packer init .
packer build .
# Save the image ID
export IMAGE_ID=<image-id-in-packer-output>
```
After doing this, you can find the snapshot in the console interface.
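You can also locate the snapshot from the CLI using the labels set in the packer template above:
```bash
hcloud image list --selector os=talos
```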
## Creating a Cluster via the CLI
This section assumes you have the [hcloud CLI utility](https://community.hetzner.com/tutorials/howto-hcloud-cli) installed on your local machine.
```bash
# Set hcloud context and api key
hcloud context create talos-tutorial
```
### Create a Load Balancer
Create a load balancer by issuing the commands shown below.
Save the IP/DNS name, as this info will be used in the next step.
```bash
hcloud load-balancer create --name controlplane --network-zone eu-central --type lb11 --label 'type=controlplane'
### Result is like:
# LoadBalancer 484487 created
# IPv4: 49.12.X.X
# IPv6: 2a01:4f8:X:X::1
hcloud load-balancer add-service controlplane \
--listen-port 6443 --destination-port 6443 --protocol tcp
hcloud load-balancer add-target controlplane \
--label-selector 'type=controlplane'
```
### Create the Machine Configuration Files
#### Generating Base Configurations
Using the IP/DNS name of the load balancer created earlier, generate the base configuration files for the Talos machines by issuing:
```bash
$ talosctl gen config talos-k8s-hcloud-tutorial https://<load balancer IP or DNS>:6443
created controlplane.yaml
created worker.yaml
created talosconfig
```
At this point, you can modify the generated configs to your liking.
Optionally, you can specify `--config-patch` with RFC6902 jsonpatches which will be applied during the config generation.
#### Validate the Configuration Files
Validate any edited machine configs with:
```bash
$ talosctl validate --config controlplane.yaml --mode cloud
controlplane.yaml is valid for cloud mode
$ talosctl validate --config worker.yaml --mode cloud
worker.yaml is valid for cloud mode
```
### Create the Servers
We can now create our servers.
Note that you can find ```IMAGE_ID``` in the snapshot section of the console: ```https://console.hetzner.cloud/projects/$PROJECT_ID/servers/snapshots```.
#### Create the Control Plane Nodes
Create the control plane nodes with:
```bash
export IMAGE_ID=<your-image-id>
hcloud server create --name talos-control-plane-1 \
--image ${IMAGE_ID} \
--type cx21 --location hel1 \
--label 'type=controlplane' \
--user-data-from-file controlplane.yaml
hcloud server create --name talos-control-plane-2 \
--image ${IMAGE_ID} \
--type cx21 --location fsn1 \
--label 'type=controlplane' \
--user-data-from-file controlplane.yaml
hcloud server create --name talos-control-plane-3 \
--image ${IMAGE_ID} \
--type cx21 --location nbg1 \
--label 'type=controlplane' \
--user-data-from-file controlplane.yaml
```
#### Create the Worker Nodes
Create the worker nodes with the following command, repeating (and incrementing the name counter) as many times as desired.
```bash
hcloud server create --name talos-worker-1 \
--image ${IMAGE_ID} \
--type cx21 --location hel1 \
--label 'type=worker' \
--user-data-from-file worker.yaml
```
### Bootstrap Etcd
To configure `talosctl` we will need the first control plane node's IP.
This can be found by issuing:
```bash
hcloud server list | grep talos-control-plane
```
Set the `endpoints` and `nodes` for your talosconfig with:
```bash
talosctl --talosconfig talosconfig config endpoint <control-plane-1-IP>
talosctl --talosconfig talosconfig config node <control-plane-1-IP>
```
Bootstrap `etcd` on the first control plane node with:
```bash
talosctl --talosconfig talosconfig bootstrap
```
### Retrieve the `kubeconfig`
At this point we can retrieve the admin `kubeconfig` by running:
```bash
talosctl --talosconfig talosconfig kubeconfig .
```

View File

@ -0,0 +1,127 @@
---
title: "Nocloud"
description: "Creating a cluster via the CLI using qemu."
aliases:
- ../../../cloud-platforms/nocloud
---
Talos supports the [nocloud](https://cloudinit.readthedocs.io/en/latest/topics/datasources/nocloud.html) data source implementation.
There are two ways to configure a Talos server with the `nocloud` platform:
* via SMBIOS "serial number" option
* using CDROM or USB-flash filesystem
### SMBIOS Serial Number
This method requires the network connection to be up (e.g. via DHCP).
Configuration is delivered from the HTTP server.
```text
ds=nocloud-net;s=http://10.10.0.1/configs/;h=HOSTNAME
```
After the network initialization is complete, Talos fetches:
* the machine config from `http://10.10.0.1/configs/user-data`
* the network config (if available) from `http://10.10.0.1/configs/network-config`
#### SMBIOS: QEMU
Add the following flag to `qemu` command line when starting a VM:
```bash
qemu-system-x86_64 \
...\
-smbios "type=1,serial=ds=nocloud-net;s=http://10.10.0.1/configs/"
```
#### SMBIOS: Proxmox
Set the source machine config through the serial number in the Proxmox GUI.
<img src="/images/no-cloud/proxmox-smbios.png" width="920px">
Proxmox stores the VM config at `/etc/pve/qemu-server/$ID.conf` (`$ID` is the VM ID number of the virtual machine); there you will see something like:
```conf
...
smbios1: uuid=ceae4d10,serial=ZHM9bm9jbG91ZC1uZXQ7cz1odHRwOi8vMTAuMTAuMC4xL2NvbmZpZ3Mv,base64=1
...
```
Here `serial` holds the base64-encoded version of `ds=nocloud-net;s=http://10.10.0.1/configs/`.
### CDROM/USB
Talos can also get the machine config from locally attached storage, without any prior network connection being established.
You can provide configs to the server via files on a VFAT or ISO9660 filesystem.
The filesystem volume label must be ```cidata``` or ```CIDATA```.
#### Example: QEMU
Create and prepare Talos machine config:
```bash
export CONTROL_PLANE_IP=192.168.1.10
talosctl gen config talos-nocloud https://$CONTROL_PLANE_IP:6443 --output-dir _out
```
Prepare cloud-init configs:
```bash
mkdir -p iso
mv _out/controlplane.yaml iso/user-data
echo "local-hostname: controlplane-1" > iso/meta-data
cat > iso/network-config << EOF
version: 1
config:
- type: physical
name: eth0
mac_address: "52:54:00:12:34:00"
subnets:
- type: static
address: 192.168.1.10
netmask: 255.255.255.0
gateway: 192.168.1.254
EOF
```
Create the cloud-init ISO image:
```bash
cd iso && genisoimage -output cidata.iso -V cidata -r -J user-data meta-data network-config
```
Start the VM:
```bash
qemu-system-x86_64 \
...
-cdrom iso/cidata.iso \
...
```
#### Example: Proxmox
Proxmox can create cloud-init disk [for you](https://pve.proxmox.com/wiki/Cloud-Init_Support).
Edit the cloud-init config information in Proxmox as follows, substituting your own information as necessary:
<img src="/images/no-cloud/proxmox-cloudinit.png" width="600px">
and then update the `cicustom` parameter in `/etc/pve/qemu-server/$ID.conf`:
```config
cicustom: user=local:snippets/master-1.yml
ipconfig0: ip=192.168.1.10/24,gw=192.168.10.254
nameserver: 1.1.1.1
searchdomain: local
```
> Note: `snippets/master-1.yml` is the Talos machine config.
It is usually located at `/var/lib/vz/snippets/master-1.yml`.
This file must be placed at that path manually, as Proxmox does not support snippet uploading via API/GUI.
Click the `Regenerate Image` button after making the above changes.

View File

@ -0,0 +1,148 @@
---
title: "Openstack"
description: "Creating a cluster via the CLI on Openstack."
aliases:
- ../../../cloud-platforms/openstack
---
## Creating a Cluster via the CLI
In this guide, we will create an HA Kubernetes cluster in Openstack with 1 worker node.
We assume some familiarity with Openstack.
If you need more information on Openstack specifics, please see the [official Openstack documentation](https://docs.openstack.org).
### Environment Setup
You should have an existing openrc file.
This file will provide environment variables necessary to talk to your Openstack cloud.
See [here](https://docs.openstack.org/newton/user-guide/common/cli-set-environment-variables-using-openstack-rc.html) for instructions on fetching this file.
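Once you have it, source the file and verify that authentication works (a quick sanity check; the filename is an assumption):
```bash
source ./openrc.sh
openstack token issue
```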
### Create the Image
First, download the Openstack image from a Talos [release](https://github.com/siderolabs/talos/releases).
These images are called `openstack-$ARCH.tar.gz`.
Untar this file with `tar -xvf openstack-$ARCH.tar.gz`.
The resulting file will be called `disk.raw`.
#### Upload the Image
Once you have the image, you can upload it to Openstack with:
```bash
openstack image create --public --disk-format raw --file disk.raw talos
```
### Network Infrastructure
#### Load Balancer and Network Ports
Once the image is prepared, you will need to work through setting up the network.
Issue the following to create a load balancer, the necessary network ports for each control plane node, and associations between the two.
Create the load balancer:
```bash
# Create load balancer, updating vip-subnet-id if necessary
openstack loadbalancer create --name talos-control-plane --vip-subnet-id public
# Create listener
openstack loadbalancer listener create --name talos-control-plane-listener --protocol TCP --protocol-port 6443 talos-control-plane
# Pool and health monitoring
openstack loadbalancer pool create --name talos-control-plane-pool --lb-algorithm ROUND_ROBIN --listener talos-control-plane-listener --protocol TCP
openstack loadbalancer healthmonitor create --delay 5 --max-retries 4 --timeout 10 --type TCP talos-control-plane-pool
```
Create the ports:
```bash
# Create ports for control plane nodes, updating network name if necessary
openstack port create --network shared talos-control-plane-1
openstack port create --network shared talos-control-plane-2
openstack port create --network shared talos-control-plane-3
# Create floating IPs for the ports, so that you will have talosctl connectivity to each control plane
openstack floating ip create --port talos-control-plane-1 public
openstack floating ip create --port talos-control-plane-2 public
openstack floating ip create --port talos-control-plane-3 public
```
> Note: Take note of the private and public IPs associated with each of these ports, as they will be used in the next step.
> Additionally, take note of the port ID, as it will be used in server creation.
Associate the ports' private IPs with the load balancer:
```bash
# Create members for each port IP, updating subnet-id and address as necessary.
openstack loadbalancer member create --subnet-id shared-subnet --address <PRIVATE IP OF talos-control-plane-1 PORT> --protocol-port 6443 talos-control-plane-pool
openstack loadbalancer member create --subnet-id shared-subnet --address <PRIVATE IP OF talos-control-plane-2 PORT> --protocol-port 6443 talos-control-plane-pool
openstack loadbalancer member create --subnet-id shared-subnet --address <PRIVATE IP OF talos-control-plane-3 PORT> --protocol-port 6443 talos-control-plane-pool
```
#### Security Groups
This example uses the default security group in Openstack.
Ports have been opened to ensure that connectivity from both inside and outside the group is possible.
You will want to allow, at a minimum, ports 6443 (Kubernetes API server) and 50000 (Talos API) from external sources.
It is also recommended to allow communication over all ports from within the subnet.
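As a rough sketch (assuming the group is literally named `default`; adjust the group name and source ranges for your environment):
```bash
# Allow the Kubernetes API server and the Talos API from external sources.
openstack security group rule create --ingress --protocol tcp --dst-port 6443 --remote-ip 0.0.0.0/0 default
openstack security group rule create --ingress --protocol tcp --dst-port 50000 --remote-ip 0.0.0.0/0 default
```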
### Cluster Configuration
With our networking set up, we'll fetch the IP of our load balancer and create our configuration files.
```bash
LB_PUBLIC_IP=$(openstack loadbalancer show talos-control-plane -f json | jq -r .vip_address)
talosctl gen config talos-k8s-openstack-tutorial https://${LB_PUBLIC_IP}:6443
```
Additionally, you can specify `--config-patch` with an RFC 6902 JSON patch, which will be applied during config generation.
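For example, a hypothetical patch that overrides the install disk (the path `/machine/install/disk` exists in the generated config, so a `replace` operation applies cleanly; the disk name is an assumption for virtio-based instances):
```bash
talosctl gen config talos-k8s-openstack-tutorial https://${LB_PUBLIC_IP}:6443 \
  --config-patch '[{"op": "replace", "path": "/machine/install/disk", "value": "/dev/vda"}]'
```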
### Compute Creation
We are now ready to create our Openstack nodes.
Create control plane:
```bash
# Create control plane nodes 1 through 3.
for i in $( seq 1 3 ); do
openstack server create talos-control-plane-$i --flavor m1.small --nic port-id=talos-control-plane-$i --image talos --user-data /path/to/controlplane.yaml
done
```
Create worker:
```bash
# Update network name as necessary.
openstack server create talos-worker-1 --flavor m1.small --network shared --image talos --user-data /path/to/worker.yaml
```
> Note: This step can be repeated to add more workers.
### Bootstrap Etcd
You should now be able to interact with your cluster with `talosctl`.
We will use one of the floating IPs we allocated earlier.
It does not matter which one.
Set the `endpoints` and `nodes`:
```bash
talosctl --talosconfig talosconfig config endpoint <control plane 1 IP>
talosctl --talosconfig talosconfig config node <control plane 1 IP>
```
Bootstrap `etcd`:
```bash
talosctl --talosconfig talosconfig bootstrap
```
### Retrieve the `kubeconfig`
At this point we can retrieve the admin `kubeconfig` by running:
```bash
talosctl --talosconfig talosconfig kubeconfig .
```

View File

@ -0,0 +1,195 @@
---
title: "Oracle"
description: "Creating a cluster via the CLI (oci) on OracleCloud.com."
aliases:
- ../../../cloud-platforms/oracle
---
## Upload image
Oracle Cloud does not currently provide an official Talos image, so you can use the [Bring Your Own Image (BYOI)](https://docs.oracle.com/en-us/iaas/Content/Compute/References/bringyourownimage.htm) approach.
Once the image is uploaded, set the `Boot volume type` to `Paravirtualized` mode.
Oracle Cloud provides a highly available NTP service; it can be enabled in the Talos machine config with:
```yaml
machine:
time:
servers:
- 169.254.169.254
```
## Creating a Cluster via the CLI
Log in to the [console](https://www.oracle.com/cloud/) and open the Cloud Shell.
### Create a network
```bash
export cidr_block=10.0.0.0/16
export subnet_block=10.0.0.0/24
export compartment_id=<substitute-value-of-compartment_id> # https://docs.cloud.oracle.com/en-us/iaas/tools/oci-cli/latest/oci_cli_docs/cmdref/network/vcn/create.html#cmdoption-compartment-id
export vcn_id=$(oci network vcn create --cidr-block $cidr_block --display-name talos-example --compartment-id $compartment_id --query data.id --raw-output)
export rt_id=$(oci network subnet create --cidr-block $subnet_block --display-name kubernetes --compartment-id $compartment_id --vcn-id $vcn_id --query data.route-table-id --raw-output)
export ig_id=$(oci network internet-gateway create --compartment-id $compartment_id --is-enabled true --vcn-id $vcn_id --query data.id --raw-output)
oci network route-table update --rt-id $rt_id --route-rules "[{\"cidrBlock\":\"0.0.0.0/0\",\"networkEntityId\":\"$ig_id\"}]" --force
# disable firewall
export sl_id=$(oci network vcn list --compartment-id $compartment_id --query 'data[0]."default-security-list-id"' --raw-output)
oci network security-list update --security-list-id $sl_id --egress-security-rules '[{"destination": "0.0.0.0/0", "protocol": "all", "isStateless": false}]' --ingress-security-rules '[{"source": "0.0.0.0/0", "protocol": "all", "isStateless": false}]' --force
```
### Create a Load Balancer
Create a load balancer by issuing the commands shown below.
Save the IP/DNS name, as this info will be used in the next step.
```bash
export subnet_id=$(oci network subnet list --compartment-id=$compartment_id --display-name kubernetes --query data[0].id --raw-output)
export network_load_balancer_id=$(oci nlb network-load-balancer create --compartment-id $compartment_id --display-name controlplane-lb --subnet-id $subnet_id --is-preserve-source-destination false --is-private false --query data.id --raw-output)
cat <<EOF > talos-health-checker.json
{
"intervalInMillis": 10000,
"port": 50000,
"protocol": "TCP"
}
EOF
oci nlb backend-set create --health-checker file://talos-health-checker.json --name talos --network-load-balancer-id $network_load_balancer_id --policy TWO_TUPLE --is-preserve-source false
oci nlb listener create --default-backend-set-name talos --name talos --network-load-balancer-id $network_load_balancer_id --port 50000 --protocol TCP
cat <<EOF > controlplane-health-checker.json
{
"intervalInMillis": 10000,
"port": 6443,
"protocol": "HTTPS",
"returnCode": 200,
"urlPath": "/readyz"
}
EOF
oci nlb backend-set create --health-checker file://controlplane-health-checker.json --name controlplane --network-load-balancer-id $network_load_balancer_id --policy TWO_TUPLE --is-preserve-source false
oci nlb listener create --default-backend-set-name controlplane --name controlplane --network-load-balancer-id $network_load_balancer_id --port 6443 --protocol TCP
# Save the external IP
oci nlb network-load-balancer list --compartment-id $compartment_id --display-name controlplane-lb --query 'data.items[0]."ip-addresses"'
```
### Create the Machine Configuration Files
#### Generating Base Configurations
Using the IP/DNS name of the loadbalancer created earlier, generate the base configuration files for the Talos machines by issuing:
```bash
$ talosctl gen config talos-k8s-oracle-tutorial https://<load balancer IP or DNS>:6443 --additional-sans <load balancer IP or DNS>
created controlplane.yaml
created worker.yaml
created talosconfig
```
At this point, you can modify the generated configs to your liking.
Optionally, you can specify `--config-patch` with RFC 6902 JSON patches, which will be applied during config generation.
#### Validate the Configuration Files
Validate any edited machine configs with:
```bash
$ talosctl validate --config controlplane.yaml --mode cloud
controlplane.yaml is valid for cloud mode
$ talosctl validate --config worker.yaml --mode cloud
worker.yaml is valid for cloud mode
```
### Create the Servers
#### Create the Control Plane Nodes
Create the control plane nodes with:
```bash
export shape='VM.Standard.A1.Flex'
export subnet_id=$(oci network subnet list --compartment-id=$compartment_id --display-name kubernetes --query data[0].id --raw-output)
export image_id=$(oci compute image list --compartment-id $compartment_id --shape $shape --operating-system Talos --limit 1 --query data[0].id --raw-output)
export availability_domain=$(oci iam availability-domain list --compartment-id=$compartment_id --query data[0].name --raw-output)
export network_load_balancer_id=$(oci nlb network-load-balancer list --compartment-id $compartment_id --display-name controlplane-lb --query 'data.items[0].id' --raw-output)
cat <<EOF > shape.json
{
"memoryInGBs": 4,
"ocpus": 1
}
EOF
export instance_id=$(oci compute instance launch --shape $shape --shape-config file://shape.json --availability-domain $availability_domain --compartment-id $compartment_id --image-id $image_id --subnet-id $subnet_id --display-name controlplane-1 --private-ip 10.0.0.11 --assign-public-ip true --launch-options '{"networkType":"PARAVIRTUALIZED"}' --user-data-file controlplane.yaml --query 'data.id' --raw-output)
oci nlb backend create --backend-set-name talos --network-load-balancer-id $network_load_balancer_id --port 50000 --target-id $instance_id
oci nlb backend create --backend-set-name controlplane --network-load-balancer-id $network_load_balancer_id --port 6443 --target-id $instance_id
export instance_id=$(oci compute instance launch --shape $shape --shape-config file://shape.json --availability-domain $availability_domain --compartment-id $compartment_id --image-id $image_id --subnet-id $subnet_id --display-name controlplane-2 --private-ip 10.0.0.12 --assign-public-ip true --launch-options '{"networkType":"PARAVIRTUALIZED"}' --user-data-file controlplane.yaml --query 'data.id' --raw-output)
oci nlb backend create --backend-set-name talos --network-load-balancer-id $network_load_balancer_id --port 50000 --target-id $instance_id
oci nlb backend create --backend-set-name controlplane --network-load-balancer-id $network_load_balancer_id --port 6443 --target-id $instance_id
export instance_id=$(oci compute instance launch --shape $shape --shape-config file://shape.json --availability-domain $availability_domain --compartment-id $compartment_id --image-id $image_id --subnet-id $subnet_id --display-name controlplane-3 --private-ip 10.0.0.13 --assign-public-ip true --launch-options '{"networkType":"PARAVIRTUALIZED"}' --user-data-file controlplane.yaml --query 'data.id' --raw-output)
oci nlb backend create --backend-set-name talos --network-load-balancer-id $network_load_balancer_id --port 50000 --target-id $instance_id
oci nlb backend create --backend-set-name controlplane --network-load-balancer-id $network_load_balancer_id --port 6443 --target-id $instance_id
```
#### Create the Worker Nodes
Create the worker nodes with the following command, repeating (and incrementing the name counter) as many times as desired.
```bash
export subnet_id=$(oci network subnet list --compartment-id=$compartment_id --display-name kubernetes --query data[0].id --raw-output)
export image_id=$(oci compute image list --compartment-id $compartment_id --operating-system Talos --limit 1 --query data[0].id --raw-output)
export availability_domain=$(oci iam availability-domain list --compartment-id=$compartment_id --query data[0].name --raw-output)
export shape='VM.Standard.E2.1.Micro'
oci compute instance launch --shape $shape --availability-domain $availability_domain --compartment-id $compartment_id --image-id $image_id --subnet-id $subnet_id --display-name worker-1 --assign-public-ip true --user-data-file worker.yaml
oci compute instance launch --shape $shape --availability-domain $availability_domain --compartment-id $compartment_id --image-id $image_id --subnet-id $subnet_id --display-name worker-2 --assign-public-ip true --user-data-file worker.yaml
oci compute instance launch --shape $shape --availability-domain $availability_domain --compartment-id $compartment_id --image-id $image_id --subnet-id $subnet_id --display-name worker-3 --assign-public-ip true --user-data-file worker.yaml
```
### Bootstrap Etcd
To configure `talosctl` we will need the first control plane node's IP.
This can be found by issuing:
```bash
export instance_id=$(oci compute instance list --compartment-id $compartment_id --display-name controlplane-1 --query 'data[0].id' --raw-output)
oci compute instance list-vnics --instance-id $instance_id --query 'data[0]."private-ip"' --raw-output
```
Set the `endpoints` and `nodes` for your talosconfig with:
```bash
talosctl --talosconfig talosconfig config endpoint <load balancer IP or DNS>
talosctl --talosconfig talosconfig config node <control-plane-1-IP>
```
Bootstrap `etcd` on the first control plane node with:
```bash
talosctl --talosconfig talosconfig bootstrap
```
### Retrieve the `kubeconfig`
At this point we can retrieve the admin `kubeconfig` by running:
```bash
talosctl --talosconfig talosconfig kubeconfig .
```

View File

@ -0,0 +1,8 @@
---
title: "Scaleway"
description: "Creating a cluster via the CLI (scw) on scaleway.com."
aliases:
- ../../../cloud-platforms/scaleway
---
Talos is known to work on scaleway.com; however, it is currently undocumented.

View File

@ -0,0 +1,8 @@
---
title: "UpCloud"
description: "Creating a cluster via the CLI (upctl) on UpCloud.com."
aliases:
- ../../../cloud-platforms/upcloud
---
Talos is known to work on UpCloud.com; however, it is currently undocumented.

View File

@ -0,0 +1,151 @@
---
title: "Vultr"
description: "Creating a cluster via the CLI (vultr-cli) on Vultr.com."
aliases:
- ../../../cloud-platforms/vultr
---
## Creating a Cluster using the Vultr CLI
This guide will demonstrate how to create a highly-available Kubernetes cluster with one worker using the Vultr cloud provider.
[Vultr](https://www.vultr.com/) has a very well documented REST API, and an open-source [CLI](https://github.com/vultr/vultr-cli) tool for interacting with the API, which will be used in this guide.
Make sure to follow the installation and authentication instructions for the `vultr-cli` tool.
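As a rough sketch, `vultr-cli` picks up a personal API key from the environment (creating the key in the Vultr customer portal is assumed):
```bash
# Export the API key so vultr-cli can authenticate, then run a quick sanity check.
export VULTR_API_KEY=<your-api-key>
vultr-cli account
```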
### Upload image
The first step is to make the Talos ISO available to Vultr by uploading the latest release of the ISO to the Vultr ISO server.
```bash
vultr-cli iso create --url https://github.com/siderolabs/talos/releases/download/{{< release >}}/talos-amd64.iso
```
Make a note of the `ID` in the output; it will be needed later when creating the instances.
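If the ID scrolls by, it can typically be recovered by listing the uploaded ISOs:
```bash
# List the ISOs on the account; note the ID of the Talos ISO (used as $TALOS_ISO_ID below).
vultr-cli iso list
```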
### Create a Load Balancer
A load balancer is needed to serve as the Kubernetes endpoint for the cluster.
```bash
vultr-cli load-balancer create \
--region $REGION \
--label "Talos Kubernetes Endpoint" \
--port 6443 \
--protocol tcp \
--check-interval 10 \
--response-timeout 5 \
--healthy-threshold 5 \
--unhealthy-threshold 3 \
--forwarding-rules frontend_protocol:tcp,frontend_port:443,backend_protocol:tcp,backend_port:6443
```
Make a note of the `ID` of the load balancer from the output of the above command; it will be needed after the control plane instances are created.
```bash
vultr-cli load-balancer get $LOAD_BALANCER_ID | grep ^IP
```
Make a note of the `IP` address; it will be needed later when generating the configuration.
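A small sketch of stashing these values as shell variables (the variable names are arbitrary, but they match the placeholders used in the rest of this guide):
```bash
export LOAD_BALANCER_ID=<ID from the create command>
export LOAD_BALANCER_ADDRESS=<IP from the get command>
```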
### Create the Machine Configuration
#### Generate Base Configuration
Using the IP address (or DNS name if one was created) of the load balancer created above, generate the machine configuration files for the new cluster.
```bash
talosctl gen config talos-kubernetes-vultr https://$LOAD_BALANCER_ADDRESS
```
Once generated, the machine configuration can be modified as necessary for the new cluster, for instance updating disk installation, or adding SANs for the certificates.
#### Validate the Configuration Files
```bash
talosctl validate --config controlplane.yaml --mode cloud
talosctl validate --config worker.yaml --mode cloud
```
### Create the Nodes
#### Create the Control Plane Nodes
First, a control plane needs to be created; the example below creates 3 instances in a loop.
The instance type (noted by the `--plan vc2-2c-4gb` argument) in the example is for a minimum-spec control plane node, and should be updated to suit the cluster being created.
```bash
for id in $(seq 3); do
vultr-cli instance create \
--plan vc2-2c-4gb \
--region $REGION \
--iso $TALOS_ISO_ID \
--host talos-k8s-cp${id} \
--label "Talos Kubernetes Control Plane" \
--tags talos,kubernetes,control-plane
done
```
Make a note of the instance `ID`s, as they are needed to attach the instances to the load balancer created earlier.
```bash
vultr-cli load-balancer update $LOAD_BALANCER_ID --instances $CONTROL_PLANE_1_ID,$CONTROL_PLANE_2_ID,$CONTROL_PLANE_3_ID
```
Once the nodes are booted and waiting in maintenance mode, the machine configuration can be applied to each one in turn.
```bash
talosctl --talosconfig talosconfig apply-config --insecure --nodes $CONTROL_PLANE_1_ADDRESS --file controlplane.yaml
talosctl --talosconfig talosconfig apply-config --insecure --nodes $CONTROL_PLANE_2_ADDRESS --file controlplane.yaml
talosctl --talosconfig talosconfig apply-config --insecure --nodes $CONTROL_PLANE_3_ADDRESS --file controlplane.yaml
```
#### Create the Worker Nodes
Now worker nodes can be created and configured in a similar way to the control plane nodes, the difference being mainly in the machine configuration file.
Note that, as with the control plane nodes, the instance type (here set by `--plan vc2-1c-1gb`) should be changed to match the actual cluster requirements.
```bash
for id in $(seq 1); do
vultr-cli instance create \
--plan vc2-1c-1gb \
--region $REGION \
--iso $TALOS_ISO_ID \
--host talos-k8s-worker${id} \
--label "Talos Kubernetes Worker" \
--tags talos,kubernetes,worker
done
```
Once the worker is booted and in maintenance mode, the machine configuration can be applied in the following manner.
```bash
talosctl --talosconfig talosconfig apply-config --insecure --nodes $WORKER_1_ADDRESS --file worker.yaml
```
### Bootstrap etcd
Once all the cluster nodes are correctly configured, the cluster can be bootstrapped to become functional.
It is important that the `talosctl bootstrap` command be executed only once and against only a single control plane node.
```bash
talosctl --talosconfig talosconfig bootstrap --endpoints $CONTROL_PLANE_1_ADDRESS --nodes $CONTROL_PLANE_1_ADDRESS
```
### Configure Endpoints and Nodes
While the cluster goes through the bootstrapping process and begins to self-manage, the `talosconfig` can be updated with the [endpoints and nodes]({{< relref "../../../learn-more/talosctl#endpoints-and-nodes" >}}).
```bash
talosctl --talosconfig talosconfig config endpoints $CONTROL_PLANE_1_ADDRESS $CONTROL_PLANE_2_ADDRESS $CONTROL_PLANE_3_ADDRESS
talosctl --talosconfig talosconfig config nodes $CONTROL_PLANE_1_ADDRESS $CONTROL_PLANE_2_ADDRESS $CONTROL_PLANE_3_ADDRESS $WORKER_1_ADDRESS
```
### Retrieve the `kubeconfig`
Finally, with the cluster fully running, the administrative `kubeconfig` can be retrieved from the Talos API to be saved locally.
```bash
talosctl --talosconfig talosconfig kubeconfig .
```
Now the `kubeconfig` can be used by any of the usual Kubernetes tools to interact with the Talos-based Kubernetes cluster as normal.

View File

@ -0,0 +1,5 @@
---
title: "Local Platforms"
weight: 50
description: "Installation of Talos Linux on local platforms, helpful for testing and developing."
---

View File

@ -0,0 +1,55 @@
---
title: Docker
description: "Creating Talos Kubernetes cluster using Docker."
aliases:
- ../../../local-platforms/docker
---
In this guide we will create a Kubernetes cluster in Docker, using a containerized version of Talos.
Running Talos in Docker is intended to be used in CI pipelines, and local testing when you need a quick and easy cluster.
Furthermore, if you are running Talos in production, it provides an excellent way for developers to develop against the same version of Talos.
## Requirements
The following are requirements for running Talos in Docker:
- Docker 18.03 or greater
- a recent version of [`talosctl`](https://github.com/siderolabs/talos/releases)
## Caveats
Because Talos runs in a container, certain APIs are not available.
For example, `upgrade`, `reset`, and similar APIs don't apply in container mode.
Further, when running Docker on a Mac, VIPs are not supported due to networking limitations.
## Create the Cluster
Creating a local cluster is as simple as:
```bash
talosctl cluster create --wait
```
Once the above finishes successfully, your talosconfig (`~/.talos/config`) will be configured to point to the new cluster.
> Note: Startup can take a minute or more before the cluster is available.
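If you need something other than the defaults, `talosctl cluster create` accepts flags to size the cluster; a sketch (adjust counts to taste):
```bash
# Create a local Docker-based cluster with two worker nodes instead of one.
talosctl cluster create --wait --workers 2
```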
Finally, we just need to specify which nodes you want to communicate with using `talosctl`.
`talosctl` can operate on one or all of the nodes in the cluster; this makes cluster-wide commands much easier.
`talosctl config nodes 10.5.0.2 10.5.0.3`
## Using the Cluster
Once the cluster is available, you can make use of `talosctl` and `kubectl` to interact with the cluster.
For example, to view current running containers, run `talosctl containers` for a list of containers in the `system` namespace, or `talosctl containers -k` for the `k8s.io` namespace.
To view the logs of a container, use `talosctl logs <container>` or `talosctl logs -k <container>`.
## Cleaning Up
To cleanup, run:
```bash
talosctl cluster destroy
```

View File

@ -0,0 +1,301 @@
---
title: QEMU
description: "Creating Talos Kubernetes cluster using QEMU VMs."
aliases:
- ../../../local-platforms/qemu
---
In this guide we will create a Kubernetes cluster using QEMU.
<img src="/images/qemu.png">
## Video Walkthrough
To see a live demo of this writeup, see the video below:
<iframe width="560" height="315" src="https://www.youtube.com/embed/UzQ8Hl_TfF8" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
## Requirements
- Linux
- a kernel with
- KVM enabled (`/dev/kvm` must exist)
- `CONFIG_NET_SCH_NETEM` enabled
- `CONFIG_NET_SCH_INGRESS` enabled
- at least `CAP_SYS_ADMIN` and `CAP_NET_ADMIN` capabilities
- QEMU
- `bridge`, `static` and `firewall` CNI plugins from the [standard CNI plugins](https://github.com/containernetworking/cni), and `tc-redirect-tap` CNI plugin from the [awslabs tc-redirect-tap](https://github.com/awslabs/tc-redirect-tap) installed to `/opt/cni/bin` (installed automatically by `talosctl`)
- iptables
- `/var/run/netns` directory should exist
## Installation
### How to get QEMU
Install QEMU with your operating system package manager.
For example, on Ubuntu for x86:
```bash
apt install qemu-system-x86 qemu-kvm
```
### Install talosctl
You can download `talosctl` and all required binaries via
[github.com/siderolabs/talos/releases](https://github.com/siderolabs/talos/releases)
```bash
curl https://github.com/siderolabs/talos/releases/download/<version>/talosctl-<platform>-<arch> -L -o talosctl
```
For example version `{{< release >}}` for `linux` platform:
```bash
curl https://github.com/siderolabs/talos/releases/download/{{< release >}}/talosctl-linux-amd64 -L -o talosctl
sudo cp talosctl /usr/local/bin
sudo chmod +x /usr/local/bin/talosctl
```
## Install Talos kernel and initramfs
The QEMU provisioner depends on the Talos kernel (`vmlinuz`) and initramfs (`initramfs.xz`).
These files can be downloaded from the Talos release:
```bash
mkdir -p _out/
curl https://github.com/siderolabs/talos/releases/download/<version>/vmlinuz-<arch> -L -o _out/vmlinuz-<arch>
curl https://github.com/siderolabs/talos/releases/download/<version>/initramfs-<arch>.xz -L -o _out/initramfs-<arch>.xz
```
For example version `{{< release >}}`:
```bash
curl https://github.com/siderolabs/talos/releases/download/{{< release >}}/vmlinuz-amd64 -L -o _out/vmlinuz-amd64
curl https://github.com/siderolabs/talos/releases/download/{{< release >}}/initramfs-amd64.xz -L -o _out/initramfs-amd64.xz
```
## Create the Cluster
Before creating a cluster for the first time, create the root state directory as your user so that you can inspect the logs as a non-root user:
```bash
mkdir -p ~/.talos/clusters
```
Create the cluster:
```bash
sudo -E talosctl cluster create --provisioner qemu
```
Before the first cluster is created, `talosctl` will download the CNI bundle for VM provisioning and install it to the `~/.talos/cni` directory.
Once the above finishes successfully, your talosconfig (`~/.talos/config`) will be configured to point to the new cluster, and the `kubeconfig` will be
downloaded and merged into the default kubectl config location (`~/.kube/config`).
The cluster provisioning process can be optimized with [registry pull-through caches]({{< relref "../../configuration/pull-through-cache" >}}).
## Using the Cluster
Once the cluster is available, you can make use of `talosctl` and `kubectl` to interact with the cluster.
For example, to view current running containers, run `talosctl -n 10.5.0.2 containers` for a list of containers in the `system` namespace, or `talosctl -n 10.5.0.2 containers -k` for the `k8s.io` namespace.
To view the logs of a container, use `talosctl -n 10.5.0.2 logs <container>` or `talosctl -n 10.5.0.2 logs -k <container>`.
A bridge interface will be created, and assigned the default IP 10.5.0.1.
Each node will be directly accessible on the subnet specified at cluster creation time.
A loadbalancer runs on 10.5.0.1 by default, which handles loadbalancing for the Kubernetes APIs.
You can see a summary of the cluster state by running:
```bash
$ talosctl cluster show --provisioner qemu
PROVISIONER qemu
NAME talos-default
NETWORK NAME talos-default
NETWORK CIDR 10.5.0.0/24
NETWORK GATEWAY 10.5.0.1
NETWORK MTU 1500
NODES:
NAME TYPE IP CPU RAM DISK
talos-default-master-1 Init 10.5.0.2 1.00 1.6 GB 4.3 GB
talos-default-master-2 ControlPlane 10.5.0.3 1.00 1.6 GB 4.3 GB
talos-default-master-3 ControlPlane 10.5.0.4 1.00 1.6 GB 4.3 GB
talos-default-worker-1 Worker 10.5.0.5 1.00 1.6 GB 4.3 GB
```
## Cleaning Up
To cleanup, run:
```bash
sudo -E talosctl cluster destroy --provisioner qemu
```
> Note: If the host machine is rebooted before destroying the cluster, you may need to manually remove `~/.talos/clusters/talos-default`.
## Manual Clean Up
The `talosctl cluster destroy` command depends heavily on the cluster's state directory,
which contains all information related to the cluster, including the PIDs and networks associated with the cluster nodes.
If you have deleted the state folder by mistake, or you would like to clean up
the environment by hand, here are the steps to do it manually:
### Remove VM Launchers
Find the process of `talosctl qemu-launch`:
```bash
ps -elf | grep 'talosctl qemu-launch'
```
To remove the VM launchers manually, execute:
```bash
sudo kill -s SIGTERM <PID>
```
Example output, where VMs are running with PIDs **157615** and **157617**:
```bash
ps -elf | grep '[t]alosctl qemu-launch'
0 S root 157615 2835 0 80 0 - 184934 - 07:53 ? 00:00:00 talosctl qemu-launch
0 S root 157617 2835 0 80 0 - 185062 - 07:53 ? 00:00:00 talosctl qemu-launch
sudo kill -s SIGTERM 157615
sudo kill -s SIGTERM 157617
```
### Stopping VMs
Find the process of `qemu-system`:
```bash
ps -elf | grep 'qemu-system'
```
To stop the VMs manually, execute:
```bash
sudo kill -s SIGTERM <PID>
```
Example output showing the running `qemu-system` processes and their PIDs:
```bash
ps -elf | grep qemu-system
2 S root 1061663 1061168 26 80 0 - 1786238 - 14:05 ? 01:53:56 qemu-system-x86_64 -m 2048 -drive format=raw,if=virtio,file=/home/username/.talos/clusters/talos-default/bootstrap-master.disk -smp cpus=2 -cpu max -nographic -netdev tap,id=net0,ifname=tap0,script=no,downscript=no -device virtio-net-pci,netdev=net0,mac=1e:86:c6:b4:7c:c4 -device virtio-rng-pci -no-reboot -boot order=cn,reboot-timeout=5000 -smbios type=1,uuid=7ec0a73c-826e-4eeb-afd1-39ff9f9160ca -machine q35,accel=kvm
2 S root 1061663 1061170 67 80 0 - 621014 - 21:23 ? 00:00:07 qemu-system-x86_64 -m 2048 -drive format=raw,if=virtio,file=/homeusername/.talos/clusters/talos-default/pxe-1.disk -smp cpus=2 -cpu max -nographic -netdev tap,id=net0,ifname=tap0,script=no,downscript=no -device virtio-net-pci,netdev=net0,mac=36:f3:2f:c3:9f:06 -device virtio-rng-pci -no-reboot -boot order=cn,reboot-timeout=5000 -smbios type=1,uuid=ce12a0d0-29c8-490f-b935-f6073ab916a6 -machine q35,accel=kvm
sudo kill -s SIGTERM 1061663
sudo kill -s SIGTERM 1061663
```
### Remove load balancer
Find the process of `talosctl loadbalancer-launch`:
```bash
ps -elf | grep 'talosctl loadbalancer-launch'
```
To remove the LB manually, execute:
```bash
sudo kill -s SIGTERM <PID>
```
Example output, where the load balancer is running with PID **157609**:
```bash
ps -elf | grep '[t]alosctl loadbalancer-launch'
4 S root 157609 2835 0 80 0 - 184998 - 07:53 ? 00:00:07 talosctl loadbalancer-launch --loadbalancer-addr 10.5.0.1 --loadbalancer-upstreams 10.5.0.2
sudo kill -s SIGTERM 157609
```
### Remove DHCP server
Find the process of `talosctl dhcpd-launch`:
```bash
ps -elf | grep 'talosctl dhcpd-launch'
```
To remove the DHCP server manually, execute:
```bash
sudo kill -s SIGTERM <PID>
```
Example output, where the DHCP server is running with PID **157609**:
```bash
ps -elf | grep '[t]alosctl dhcpd-launch'
4 S root 157609 2835 0 80 0 - 184998 - 07:53 ? 00:00:07 talosctl dhcpd-launch --state-path /home/username/.talos/clusters/talos-default --addr 10.5.0.1 --interface talosbd9c32bc
sudo kill -s SIGTERM 157609
```
### Remove network
This is the trickier part, especially if you have already deleted the state folder.
If you have not, the bridge name is recorded in `state.yaml` in the
`~/.talos/clusters/<cluster-name>` directory.
```bash
sudo cat ~/.talos/clusters/<cluster-name>/state.yaml | grep bridgename
bridgename: talos<uuid>
```
If you only had one cluster, it will be the interface named
`talos<uuid>`:
```bash
46: talos<uuid>: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
link/ether a6:72:f4:0a:d3:9c brd ff:ff:ff:ff:ff:ff
inet 10.5.0.1/24 brd 10.5.0.255 scope global talos17c13299
valid_lft forever preferred_lft forever
inet6 fe80::a472:f4ff:fe0a:d39c/64 scope link
valid_lft forever preferred_lft forever
```
To remove this interface:
```bash
sudo ip link del talos<uuid>
```
### Remove state directory
To remove the state directory execute:
```bash
sudo rm -Rf /home/$USER/.talos/clusters/<cluster-name>
```
## Troubleshooting
### Logs
Inspect the logs directory:
```bash
sudo cat ~/.talos/clusters/<cluster-name>/*.log
```
Logs are saved under `<cluster-name>-<role>-<node-id>.log`.
For example, for a cluster named **k8s**:
```bash
ls -la ~/.talos/clusters/k8s | grep log
-rw-r--r--. 1 root root 69415 Apr 26 20:58 k8s-master-1.log
-rw-r--r--. 1 root root 68345 Apr 26 20:58 k8s-worker-1.log
-rw-r--r--. 1 root root 24621 Apr 26 20:59 lb.log
```
Inspect logs during the installation:
```bash
tail -f ~/.talos/clusters/<cluster-name>/*.log
```

View File

@ -0,0 +1,192 @@
---
title: VirtualBox
description: "Creating Talos Kubernetes cluster using VurtualBox VMs."
aliases:
- ../../../local-platforms/virtualbox
---
In this guide we will create a Kubernetes cluster using VirtualBox.
## Video Walkthrough
To see a live demo of this writeup, visit YouTube here:
<iframe width="560" height="315" src="https://www.youtube.com/embed/bIszwavcBiU" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
## Installation
### How to Get VirtualBox
Install VirtualBox with your operating system package manager or from the [website](https://www.virtualbox.org/).
For example, on Ubuntu for x86:
```bash
apt install virtualbox
```
### Install talosctl
You can download `talosctl` via
[github.com/siderolabs/talos/releases](https://github.com/siderolabs/talos/releases)
```bash
curl https://github.com/siderolabs/talos/releases/download/<version>/talosctl-<platform>-<arch> -L -o talosctl
```
For example version `{{< release >}}` for `linux` platform:
```bash
curl https://github.com/siderolabs/talos/releases/download/{{< release >}}/talosctl-linux-amd64 -L -o talosctl
sudo cp talosctl /usr/local/bin
sudo chmod +x /usr/local/bin/talosctl
```
### Download ISO Image
In order to install Talos in VirtualBox, you will need the ISO image from the Talos release page.
You can download `talos-amd64.iso` via
[github.com/siderolabs/talos/releases](https://github.com/siderolabs/talos/releases)
```bash
mkdir -p _out/
curl https://github.com/siderolabs/talos/releases/download/<version>/talos-<arch>.iso -L -o _out/talos-<arch>.iso
```
For example version `{{< release >}}` for `linux` platform:
```bash
mkdir -p _out/
curl https://github.com/siderolabs/talos/releases/download/{{< release >}}/talos-amd64.iso -L -o _out/talos-amd64.iso
```
## Create VMs
Start by creating a new VM by clicking the "New" button in the VirtualBox UI:
<img src="/images/vbox-guide/new-vm.png" width="500px">
Supply a name for this VM, and specify the Type and Version:
<img src="/images/vbox-guide/vm-name.png" width="500px">
Edit the memory to supply at least 2GB of RAM for the VM:
<img src="/images/vbox-guide/vm-memory.png" width="500px">
Proceed through the disk settings, keeping the defaults.
You can increase the disk space if desired.
Once created, select the VM and hit "Settings":
<img src="/images/vbox-guide/edit-settings.png" width="500px">
In the "System" section, supply at least 2 CPUs:
<img src="/images/vbox-guide/edit-cpu.png" width="500px">
In the "Network" section, switch the network "Attached To" section to "Bridged Adapter":
<img src="/images/vbox-guide/edit-nic.png" width="500px">
Finally, in the "Storage" section, select the optical drive and, on the right, select the ISO by browsing your filesystem:
<img src="/images/vbox-guide/add-iso.png" width="500px">
Repeat this process for a second VM to use as a worker node.
You can also repeat this for any additional nodes desired.
## Start Control Plane Node
Once the VMs have been created and updated, start the VM that will be the first control plane node.
This VM will boot the ISO image specified earlier and enter "maintenance mode".
Once the machine has entered maintenance mode, there will be a console log that details the IP address that the node received.
Take note of this IP address, which will be referred to as `$CONTROL_PLANE_IP` for the rest of this guide.
If you wish to export this IP as a bash variable, simply issue a command like `export CONTROL_PLANE_IP=1.2.3.4`.
<img src="/images/vbox-guide/maintenance-mode.png" width="500px">
## Generate Machine Configurations
With the IP address above, you can now generate the machine configurations to use for installing Talos and Kubernetes.
Issue the following command, updating the output directory, cluster name, and control plane IP as you see fit:
```bash
talosctl gen config talos-vbox-cluster https://$CONTROL_PLANE_IP:6443 --output-dir _out
```
This will create several files in the `_out` directory: `controlplane.yaml`, `worker.yaml`, and `talosconfig`.
## Create Control Plane Node
Using the `controlplane.yaml` generated above, you can now apply this config using talosctl.
Issue:
```bash
talosctl apply-config --insecure --nodes $CONTROL_PLANE_IP --file _out/controlplane.yaml
```
You should now see some action in the VirtualBox console for this VM.
Talos will be installed to disk, the VM will reboot, and then Talos will configure the Kubernetes control plane on this VM.
> Note: This process can be repeated multiple times to create an HA control plane.
## Create Worker Node
Create at least a single worker node using a process similar to the control plane creation above.
Start the worker node VM and wait for it to enter "maintenance mode".
Take note of the worker node's IP address, which will be referred to as `$WORKER_IP`.
Issue:
```bash
talosctl apply-config --insecure --nodes $WORKER_IP --file _out/worker.yaml
```
> Note: This process can be repeated multiple times to add additional workers.
## Using the Cluster
Once the cluster is available, you can make use of `talosctl` and `kubectl` to interact with the cluster.
For example, to view current running containers, run `talosctl containers` for a list of containers in the `system` namespace, or `talosctl containers -k` for the `k8s.io` namespace.
To view the logs of a container, use `talosctl logs <container>` or `talosctl logs -k <container>`.
First, configure talosctl to talk to your control plane node by issuing the following, updating paths and IPs as necessary:
```bash
export TALOSCONFIG="_out/talosconfig"
talosctl config endpoint $CONTROL_PLANE_IP
talosctl config node $CONTROL_PLANE_IP
```
### Bootstrap Etcd
Set the `endpoints` and `nodes`:
```bash
talosctl --talosconfig $TALOSCONFIG config endpoint <control plane 1 IP>
talosctl --talosconfig $TALOSCONFIG config node <control plane 1 IP>
```
Bootstrap `etcd`:
```bash
talosctl --talosconfig $TALOSCONFIG bootstrap
```
### Retrieve the `kubeconfig`
At this point we can retrieve the admin `kubeconfig` by running:
```bash
talosctl --talosconfig $TALOSCONFIG kubeconfig .
```
You can then use kubectl in this fashion:
```bash
kubectl get nodes
```
## Cleaning Up
To cleanup, simply stop and delete the virtual machines from the VirtualBox UI.

View File

@ -0,0 +1,5 @@
---
title: "Single Board Computers"
weight: 55
description: "Installation of Talos Linux on single-board computers."
---

View File

@ -0,0 +1,59 @@
---
title: "Banana Pi M64"
description: "Installing Talos on Banana Pi M64 SBC using raw disk image."
aliases:
- ../../../single-board-computers/bananapi_m64
---
## Prerequisites
You will need
- `talosctl`
- an SD card
Download the latest `talosctl`.
```bash
curl -Lo /usr/local/bin/talosctl https://github.com/siderolabs/talos/releases/download/{{< release >}}/talosctl-$(uname -s | tr "[:upper:]" "[:lower:]")-amd64
chmod +x /usr/local/bin/talosctl
```
## Download the Image
Download the image and decompress it:
```bash
curl -LO https://github.com/siderolabs/talos/releases/download/{{< release >}}/metal-bananapi_m64-arm64.img.xz
xz -d metal-bananapi_m64-arm64.img.xz
```
## Writing the Image
The path to your SD card can be found using `fdisk` on Linux or `diskutil` on macOS.
In this example, we will assume `/dev/mmcblk0`.
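For example, to list candidate devices (be careful to pick the SD card and not a system disk):
```bash
# On Linux:
sudo fdisk -l
# On macOS:
diskutil list
```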
Now `dd` the image to your SD card:
```bash
sudo dd if=metal-bananapi_m64-arm64.img of=/dev/mmcblk0 conv=fsync bs=4M
```
## Bootstrapping the Node
Insert the SD card into your board, turn it on, and wait for the console to show you the instructions for bootstrapping the node.
Follow the instructions in the console output to connect to the interactive installer:
```bash
talosctl apply-config --insecure --mode=interactive --nodes <node IP or DNS name>
```
Once the interactive installation is applied, the cluster will form and you can then use `kubectl`.
## Retrieve the `kubeconfig`
Retrieve the admin `kubeconfig` by running:
```bash
talosctl kubeconfig
```

View File

@ -0,0 +1,119 @@
---
title: "Jetson Nano"
description: "Installing Talos on Jetson Nano SBC using raw disk image."
aliases:
- ../../../single-board-computers/jetson_nano
---
## Prerequisites
You will need
- `talosctl`
- an SD card/USB drive
- [crane CLI](https://github.com/google/go-containerregistry/releases)
Download the latest `talosctl`.
```bash
curl -Lo /usr/local/bin/talosctl https://github.com/siderolabs/talos/releases/download/{{< release >}}/talosctl-$(uname -s | tr "[:upper:]" "[:lower:]")-amd64
chmod +x /usr/local/bin/talosctl
```
## Flashing the firmware to on-board SPI flash
> Flashing the firmware only needs to be done once.
We will use the [R32.7.2 release](https://developer.nvidia.com/embedded/l4t/r32_release_v7.1/t210/jetson-210_linux_r32.7.2_aarch64.tbz2) for the Jetson Nano.
Most of the instructions are similar to this [doc](https://nullr0ute.com/2020/11/installing-fedora-on-the-nvidia-jetson-nano/) except that we will be using an upstream version of `u-boot` with patches from NVIDIA's u-boot so that USB boot also works.
Before flashing we need the following:
- A USB-A to micro USB cable
- A jumper wire to enable recovery mode
- An HDMI monitor to view the logs if the USB serial adapter is not available
- A USB to Serial adapter with 3.3V TTL (optional)
- A 5V DC barrel jack
If you're planning to use the serial console, follow the documentation [here](https://www.jetsonhacks.com/2019/04/19/jetson-nano-serial-console/).
First start by downloading the Jetson Nano L4T release.
```bash
curl -SLO https://developer.nvidia.com/embedded/l4t/r32_release_v7.1/t210/jetson-210_linux_r32.7.2_aarch64.tbz2
```
Next we will extract the L4T release and replace the `u-boot` binary with the patched version.
```bash
tar xf jetson-210_linux_r32.7.2_aarch64.tbz2
cd Linux_for_Tegra
crane --platform=linux/arm64 export ghcr.io/siderolabs/u-boot:v1.1.0-alpha.0-42-gcd05ae8 - | tar xf - --strip-components=1 -C bootloader/t210ref/p3450-0000/ jetson_nano/u-boot.bin
```
Next we will flash the firmware to the Jetson Nano SPI flash.
In order to do that we need to put the Jetson Nano into Force Recovery Mode (FRC).
We will use the instructions from [here](https://developer.download.nvidia.com/embedded/L4T/r32_Release_v4.4/r32_Release_v4.4-GMC3/T210/l4t_quick_start_guide.txt)
- Ensure that the Jetson Nano is powered off.
There is no need for the SD card/USB storage/network cable to be connected
- Connect the micro USB cable to the micro USB port on the Jetson Nano, don't plug the other end to the PC yet
- Enable Force Recovery Mode (FRC) by placing a jumper across the FRC pins on the Jetson Nano
- For board revision *A02*, these are pins `3` and `4` of header `J40`
- For board revision *B01*, these are pins `9` and `10` of header `J50`
- Place another jumper across `J48` to enable power from the DC jack and connect the Jetson Nano to the DC jack `J25`
- Now connect the other end of the micro USB cable to the PC and remove the jumper wire from the FRC pins
Now the Jetson Nano is in Force Recovery Mode (FRC) and can be confirmed by running the following command
```bash
lsusb | grep -i "nvidia"
```
Now we can move on to flashing the firmware.
```bash
sudo ./flash.sh p3448-0000-max-spi external
```
This will flash the firmware to the Jetson Nano SPI flash, and you'll see a lot of output.
If you've connected the serial console, you'll also see the progress there.
Once the flashing is done, you can disconnect the USB cable and power off the Jetson Nano.
## Download the Image
Download the image and decompress it:
```bash
curl -LO https://github.com/siderolabs/talos/releases/download/{{< release >}}/metal-jetson_nano-arm64.img.xz
xz -d metal-jetson_nano-arm64.img.xz
```
## Writing the Image
Now `dd` the image to your SD card/USB storage:
```bash
sudo dd if=metal-jetson_nano-arm64.img of=/dev/mmcblk0 conv=fsync bs=4M status=progress
```
> Replace `/dev/mmcblk0` with the name of your SD card/USB storage.
## Bootstrapping the Node
Insert the SD card/USB storage into your board, turn it on, and wait for the console to show you the instructions for bootstrapping the node.
Follow the instructions in the console output to connect to the interactive installer:
```bash
talosctl apply-config --insecure --mode=interactive --nodes <node IP or DNS name>
```
Once the interactive installation is applied, the cluster will form and you can then use `kubectl`.
## Retrieve the `kubeconfig`
Retrieve the admin `kubeconfig` by running:
```bash
talosctl kubeconfig
```

View File

@ -0,0 +1,59 @@
---
title: "Libre Computer Board ALL-H3-CC"
description: "Installing Talos on Libre Computer Board ALL-H3-CC SBC using raw disk image."
aliases:
- ../../../single-board-computers/libretech_all_h3_cc_h5
---
## Prerequisites
You will need
- `talosctl`
- an SD card
Download the latest `talosctl`.
```bash
curl -Lo /usr/local/bin/talosctl https://github.com/siderolabs/talos/releases/download/{{< release >}}/talosctl-$(uname -s | tr "[:upper:]" "[:lower:]")-amd64
chmod +x /usr/local/bin/talosctl
```
## Download the Image
Download the image and decompress it:
```bash
curl -LO https://github.com/siderolabs/talos/releases/download/{{< release >}}/metal-libretech_all_h3_cc_h5-arm64.img.xz
xz -d metal-libretech_all_h3_cc_h5-arm64.img.xz
```
## Writing the Image
The path to your SD card can be found using `fdisk` on Linux or `diskutil` on macOS.
In this example, we will assume `/dev/mmcblk0`.
Now `dd` the image to your SD card:
```bash
sudo dd if=metal-libretech_all_h3_cc_h5-arm64.img of=/dev/mmcblk0 conv=fsync bs=4M
```
## Bootstrapping the Node
Insert the SD card into your board, turn it on, and wait for the console to show you the instructions for bootstrapping the node.
Follow the instructions in the console output to connect to the interactive installer:
```bash
talosctl apply-config --insecure --mode=interactive --nodes <node IP or DNS name>
```
Once the interactive installation is applied, the cluster will form and you can then use `kubectl`.
## Retrieve the `kubeconfig`
Retrieve the admin `kubeconfig` by running:
```bash
talosctl kubeconfig
```

View File

@ -0,0 +1,59 @@
---
title: "Pine64"
description: "Installing Talos on a Pine64 SBC using raw disk image."
aliases:
- ../../../single-board-computers/pine64
---
## Prerequisites
You will need
- `talosctl`
- an SD card
Download the latest `talosctl`.
```bash
curl -Lo /usr/local/bin/talosctl https://github.com/siderolabs/talos/releases/download/{{< release >}}/talosctl-$(uname -s | tr "[:upper:]" "[:lower:]")-amd64
chmod +x /usr/local/bin/talosctl
```
## Download the Image
Download the image and decompress it:
```bash
curl -LO https://github.com/siderolabs/talos/releases/download/{{< release >}}/metal-pine64-arm64.img.xz
xz -d metal-pine64-arm64.img.xz
```
## Writing the Image
The path to your SD card can be found using `fdisk` on Linux or `diskutil` on macOS.
In this example, we will assume `/dev/mmcblk0`.
Now `dd` the image to your SD card:
```bash
sudo dd if=metal-pine64-arm64.img of=/dev/mmcblk0 conv=fsync bs=4M
```
## Bootstrapping the Node
Insert the SD card into your board, turn it on, and wait for the console to show you the instructions for bootstrapping the node.
Follow the instructions in the console output to connect to the interactive installer:
```bash
talosctl apply-config --insecure --mode=interactive --nodes <node IP or DNS name>
```
Once the interactive installation is applied, the cluster will form and you can then use `kubectl`.
## Retrieve the `kubeconfig`
Retrieve the admin `kubeconfig` by running:
```bash
talosctl kubeconfig
```

View File

@ -0,0 +1,59 @@
---
title: "Pine64 Rock64"
description: "Installing Talos on Pine64 Rock64 SBC using raw disk image."
aliases:
- ../../../single-board-computers/rock64
---
## Prerequisites
You will need
- `talosctl`
- an SD card
Download the latest `talosctl`.
```bash
curl -Lo /usr/local/bin/talosctl https://github.com/siderolabs/talos/releases/download/{{< release >}}/talosctl-$(uname -s | tr "[:upper:]" "[:lower:]")-amd64
chmod +x /usr/local/bin/talosctl
```
## Download the Image
Download the image and decompress it:
```bash
curl -LO https://github.com/siderolabs/talos/releases/download/{{< release >}}/metal-rock64-arm64.img.xz
xz -d metal-rock64-arm64.img.xz
```
## Writing the Image
The path to your SD card can be found using `fdisk` on Linux or `diskutil` on macOS.
In this example, we will assume `/dev/mmcblk0`.
Now `dd` the image to your SD card:
```bash
sudo dd if=metal-rock64-arm64.img of=/dev/mmcblk0 conv=fsync bs=4M
```
## Bootstrapping the Node
Insert the SD card into your board, turn it on, and wait for the console to show you the instructions for bootstrapping the node.
Follow the instructions in the console output to connect to the interactive installer:
```bash
talosctl apply-config --insecure --mode=interactive --nodes <node IP or DNS name>
```
Once the interactive installation is applied, the cluster will form and you can then use `kubectl`.
## Retrieve the `kubeconfig`
Retrieve the admin `kubeconfig` by running:
```bash
talosctl kubeconfig
```

View File

@ -0,0 +1,98 @@
---
title: "Radxa ROCK PI 4"
description: "Installing Talos on Radxa ROCK PI 4a/4b SBC using raw disk image."
aliases:
- ../../../single-board-computers/rockpi_4c
---
## Prerequisites
You will need
- `talosctl`
- an SD card, eMMC, USB drive, or NVMe drive
Download the latest `talosctl`.
```bash
curl -Lo /usr/local/bin/talosctl https://github.com/siderolabs/talos/releases/download/{{< release >}}/talosctl-$(uname -s | tr "[:upper:]" "[:lower:]")-amd64
chmod +x /usr/local/bin/talosctl
```
## Download the Image
Download the image and decompress it:
```bash
curl -LO https://github.com/siderolabs/talos/releases/download/{{< release >}}/metal-rockpi_4-arm64.img.xz
xz -d metal-rockpi_4-arm64.img.xz
```
## Writing the Image
The path to your SD card/eMMC/USB/NVMe drive can be found using `fdisk` on Linux or `diskutil` on macOS.
In this example, we will assume `/dev/mmcblk0`.
Now `dd` the image to your SD card:
```bash
sudo dd if=metal-rockpi_4-arm64.img of=/dev/mmcblk0 conv=fsync bs=4M
```
The user has two options to proceed:
- booting from an SD card or eMMC
- booting from a USB or NVMe drive (requires the RockPi board to have SPI flash)
### Booting from SD card or eMMC
Insert the SD card into the board, turn it on, and proceed to [bootstrapping the node](#bootstrapping-the-node).
### Booting from USB or NVMe
This requires flashing the RockPi SPI flash with u-boot.
You will need access to the [crane CLI](https://github.com/google/go-containerregistry/releases), a spare SD card, and optionally access to the [RockPi serial console](https://wiki.radxa.com/Rockpi4/dev/serial-console).
- Flash the Rock PI 4c variant of [Debian](https://wiki.radxa.com/Rockpi4/downloads) to the SD card.
- Boot into the Debian image.
- Check that `/dev/mtdblock0` exists, otherwise the following command will silently fail; verify with e.g. `lsblk`.
- Download the u-boot image from the Talos u-boot container:
```bash
mkdir _out
crane --platform=linux/arm64 export ghcr.io/siderolabs/u-boot:v1.1.0-alpha.0-19-g6691342 - | tar xf - --strip-components=1 -C _out rockpi_4/rkspi_loader.img
sudo dd if=_out/rkspi_loader.img of=/dev/mtdblock0 bs=4K
```
- Optionally, you can also write the Talos image to the SSD drive right from your Rock PI board:
```bash
curl -LO https://github.com/siderolabs/talos/releases/download/{{< release >}}/metal-rockpi_4-arm64.img.xz
xz -d metal-rockpi_4-arm64.img.xz
sudo dd if=metal-rockpi_4-arm64.img of=/dev/nvme0n1
```
- Remove the SD card and reboot.
After these steps, Talos will boot from the NVMe/USB drive and enter maintenance mode.
Proceed to [bootstrapping the node](#bootstrapping-the-node).
## Bootstrapping the Node
Wait for the console to show you the instructions for bootstrapping the node.
Follow the instructions in the console output to connect to the interactive installer:
```bash
talosctl apply-config --insecure --mode=interactive --nodes <node IP or DNS name>
```
Once the interactive installation is applied, the cluster will form and you can then use `kubectl`.
## Retrieve the `kubeconfig`
Retrieve the admin `kubeconfig` by running:
```bash
talosctl kubeconfig
```

View File

@ -0,0 +1,96 @@
---
title: "Radxa ROCK PI 4C"
description: "Installing Talos on Radxa ROCK PI 4c SBC using raw disk image."
---
## Prerequisites
You will need
- `talosctl`
- an SD card, eMMC, USB drive, or NVMe drive
Download the latest `talosctl`.
```bash
curl -Lo /usr/local/bin/talosctl https://github.com/siderolabs/talos/releases/download/{{< release >}}/talosctl-$(uname -s | tr "[:upper:]" "[:lower:]")-amd64
chmod +x /usr/local/bin/talosctl
```
## Download the Image
Download the image and decompress it:
```bash
curl -LO https://github.com/siderolabs/talos/releases/download/{{< release >}}/metal-rockpi_4c-arm64.img.xz
xz -d metal-rockpi_4c-arm64.img.xz
```
## Writing the Image
The path to your SD card/eMMC/USB/NVMe drive can be found using `fdisk` on Linux or `diskutil` on macOS.
In this example, we will assume `/dev/mmcblk0`.
Now `dd` the image to your SD card:
```bash
sudo dd if=metal-rockpi_4c-arm64.img of=/dev/mmcblk0 conv=fsync bs=4M
```
The user has two options to proceed:
- booting from an SD card or eMMC
- booting from a USB or NVMe drive (requires the RockPi board to have SPI flash)
### Booting from SD card or eMMC
Insert the SD card into the board, turn it on, and proceed to [bootstrapping the node](#bootstrapping-the-node).
### Booting from USB or NVMe
This requires flashing the RockPi SPI flash with u-boot.
You will need access to the [crane CLI](https://github.com/google/go-containerregistry/releases), a spare SD card, and optionally access to the [RockPi serial console](https://wiki.radxa.com/Rockpi4/dev/serial-console).
- Flash the Rock PI 4c variant of [Debian](https://wiki.radxa.com/Rockpi4/downloads) to the SD card.
- Boot into the Debian image.
- Check that `/dev/mtdblock0` exists, otherwise the following command will silently fail; verify with e.g. `lsblk`.
- Download the u-boot image from the Talos u-boot container:
```bash
mkdir _out
crane --platform=linux/arm64 export ghcr.io/siderolabs/u-boot:v1.1.0-alpha.0-19-g6691342 - | tar xf - --strip-components=1 -C _out rockpi_4c/rkspi_loader.img
sudo dd if=_out/rkspi_loader.img of=/dev/mtdblock0 bs=4K
```
- Optionally, you can also write the Talos image to the SSD drive right from your Rock PI board:
```bash
curl -LO https://github.com/siderolabs/talos/releases/download/{{< release >}}/metal-rockpi_4c-arm64.img.xz
xz -d metal-rockpi_4c-arm64.img.xz
sudo dd if=metal-rockpi_4c-arm64.img of=/dev/nvme0n1
```
- Remove the SD card and reboot.
After these steps, Talos will boot from the NVMe/USB drive and enter maintenance mode.
Proceed to [bootstrapping the node](#bootstrapping-the-node).
## Bootstrapping the Node
Wait for the console to show you the instructions for bootstrapping the node.
Follow the instructions in the console output to connect to the interactive installer:
```bash
talosctl apply-config --insecure --mode=interactive --nodes <node IP or DNS name>
```
Once the interactive installation is applied, the cluster will form and you can then use `kubectl`.
## Retrieve the `kubeconfig`
Retrieve the admin `kubeconfig` by running:
```bash
talosctl kubeconfig
```

Some files were not shown because too many files have changed in this diff Show More