3506 Commits

Author SHA1 Message Date
Eng Zer Jun
fb058a7c92
test: use T.TempDir to create temporary test directory
This commit replaces `ioutil.TempDir` with `t.TempDir` in tests. The
directory created by `t.TempDir` is automatically removed when the test
and all its subtests complete.

Prior to this commit, a temporary directory created using `ioutil.TempDir`
had to be removed manually by calling `os.RemoveAll`, which was omitted
in some tests. The error handling boilerplate, e.g.
	defer func() {
		if err := os.RemoveAll(dir); err != nil {
			t.Fatal(err)
		}
	}()
is also tedious, but `t.TempDir` handles this for us nicely.

Reference: https://pkg.go.dev/testing#T.TempDir
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-03 16:31:55 +04:00
Andrey Smirnov
6fc38bae69
fix: iterate over etcd members endpoints for member promotion
This uses all available (potential) etcd endpoints, which includes the
member being promoted as well. We avoid failures by iterating over the
list of endpoints on each attempt to make sure each and every endpoint
is tried.
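
A minimal sketch of the iteration described above, assuming a generic `op` callback in place of the real etcd promotion call. The names `tryEndpoints`, `withRetry` and `errUnavailable` are hypothetical and are not taken from the Talos code; the endpoint addresses are made up.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnavailable stands in for whatever error a real client would return.
var errUnavailable = errors.New("endpoint unavailable")

// tryEndpoints calls op against each endpoint in order and returns the first
// endpoint for which op succeeds.
func tryEndpoints(endpoints []string, op func(endpoint string) error) (string, error) {
	lastErr := errors.New("no endpoints to try")

	for _, ep := range endpoints {
		err := op(ep)
		if err == nil {
			return ep, nil
		}

		lastErr = fmt.Errorf("%s: %w", ep, err)
	}

	return "", lastErr
}

// withRetry makes up to attempts passes over the whole endpoint list, so every
// endpoint is tried on every attempt.
func withRetry(attempts int, endpoints []string, op func(endpoint string) error) (string, error) {
	lastErr := errors.New("no attempts made")

	for i := 0; i < attempts; i++ {
		ep, err := tryEndpoints(endpoints, op)
		if err == nil {
			return ep, nil
		}

		lastErr = err
	}

	return "", lastErr
}

func main() {
	endpoints := []string{"10.0.0.1:2379", "10.0.0.2:2379", "10.0.0.3:2379"}

	// Pretend only the last endpoint is reachable.
	op := func(ep string) error {
		if ep != "10.0.0.3:2379" {
			return errUnavailable
		}

		return nil
	}

	ep, err := withRetry(3, endpoints, op)
	fmt.Println(ep, err)
}
```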

Part of #5889

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-02 00:16:33 +04:00
Andrey Smirnov
c70b692fb3
fix: update default address if removed from the host
This fixes a case when some IP which became default at some point was
removed completely from the node. In that case Talos should set default
address to another address, as having default IP not on the node doesn't
make much sense.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-01 23:19:04 +04:00
Utku Ozdemir
cf620d4733
feat: read talosconfig from secrets directory
Similar to the way kubectl reads kubeconfig, we attempt to load the talosconfig file from multiple locations. If the file exists under `/var/run/secrets/talos.dev/config`, we load it with higher priority before falling back to `~/.talos/config`. This allows talosctl to access the Talos API from inside a pod when talosconfig is mounted into `/var/run/secrets/talos.dev/config`, similar to the way Kubernetes service account tokens work.
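
The lookup order described above could be sketched roughly as follows. This is an illustration only: `fileExists` and `firstExisting` are hypothetical helpers, and the real talosctl loading logic may handle more cases.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// fileExists reports whether path exists and is not a directory.
func fileExists(path string) bool {
	info, err := os.Stat(path)

	return err == nil && !info.IsDir()
}

// firstExisting returns the first candidate for which exists reports true.
// The exists callback is injected so the ordering logic can be checked
// without touching the filesystem.
func firstExisting(candidates []string, exists func(string) bool) (string, bool) {
	for _, p := range candidates {
		if exists(p) {
			return p, true
		}
	}

	return "", false
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		home = "." // fall back to the working directory for this sketch
	}

	// Highest priority first: the in-pod secrets path, then the per-user file.
	candidates := []string{
		"/var/run/secrets/talos.dev/config",
		filepath.Join(home, ".talos", "config"),
	}

	if p, ok := firstExisting(candidates, fileExists); ok {
		fmt.Println("using talosconfig:", p)
	} else {
		fmt.Println("no talosconfig found")
	}
}
```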

Part of siderolabs/talos#5980.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2022-08-01 18:56:57 +02:00
Eirik Askheim
1ad8e6122c
fix: keep entire vlan id when parsing cmdline
Only the last digit was kept.

Signed-off-by: Eirik Askheim <eirik@x13.no>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-01 19:52:15 +04:00
Andrey Smirnov
fe2ee3b100
feat: implement MachineStatus resource
Fixes #5789

Example:

```yaml
spec:
    stage: running
    status:
        ready: false
        unmetConditions:
            - name: staticPods
              reason: kube-system/kube-controller-manager-talos-default-master-1 not ready, kube-system/kube-scheduler-talos-default-master-1 not ready
```

As events (CLI doesn't show full contents):

```
172.20.0.2   cbhf2l6f9lrs738hehfg   talos/runtime/machine.MachineStatusEvent   BOOTING   ready: false, unmet conditions: [time network services]
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-01 18:36:10 +04:00
Andrey Smirnov
670d274c45
chore: bump dependencies
Dependabot + go-mod-outdated

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-08-01 17:37:43 +04:00
Tommy Botten Jensen
08d2612e07
docs: bond devices are comma separated
Update kernel arguments bond doc.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-07-29 20:51:35 +04:00
Dmitriy Matrenichev
c3c3e14db5
chore: add gotagsrewrite tool and use it to add tags to resources
This commit adds gotagsrewrite tool, which is used to add `protobuf:"<n>"` tags to structs with //gotagsrewrite:gen comment. This will be used in conjunction with github.com/siderolabs/protoenc.
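
As a rough illustration of what a tagged struct looks like, the sketch below declares a hypothetical `ExampleSpec` carrying `protobuf:"<n>"` tags and reads them back with the standard `reflect` package. The struct, its fields and the `protobufTags` helper are assumptions for illustration; they are not generated by or part of gotagsrewrite or protoenc.

```go
package main

import (
	"fmt"
	"reflect"
)

// ExampleSpec is a hypothetical resource spec; the tags below show the shape
// of the tags described in the commit message.
//
//gotagsrewrite:gen
type ExampleSpec struct {
	Hostname string `protobuf:"1"`
	Ready    bool   `protobuf:"2"`
	Note     string // deliberately untagged
}

// protobufTags returns the `protobuf` tag value of every tagged field of a
// struct (or pointer to struct), keyed by field name.
func protobufTags(v interface{}) map[string]string {
	tags := map[string]string{}

	t := reflect.TypeOf(v)
	if t == nil {
		return tags
	}

	if t.Kind() == reflect.Ptr {
		t = t.Elem()
	}

	if t.Kind() != reflect.Struct {
		return tags
	}

	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)

		if n, ok := f.Tag.Lookup("protobuf"); ok {
			tags[f.Name] = n
		}
	}

	return tags
}

func main() {
	tags := protobufTags(ExampleSpec{})
	fmt.Println(tags["Hostname"], tags["Ready"], len(tags))
}
```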

Closes #5941

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-07-29 14:51:02 +03:00
Andrey Smirnov
2e790526f7
refactor: make apid stop gracefully and be stopped late
This fixes the apid and machined shutdown sequences to gracefully stop the
gRPC server with a timeout.

Sequences are also restructured to stop apid/machined as late as
possible, allowing access to the node while a long sequence is running
(e.g. upgrade or reset).
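
A common way to express "graceful stop with timeout" is sketched below. It assumes only a small `stopper` interface with `GracefulStop` and `Stop` methods (both exist on `*grpc.Server`); `stopWithTimeout` and `fakeServer` are hypothetical names, and this is not the code from the commit.

```go
package main

import (
	"fmt"
	"time"
)

// stopper captures the two shutdown methods used here.
type stopper interface {
	GracefulStop()
	Stop()
}

// stopWithTimeout starts a graceful stop and falls back to a hard stop if the
// graceful one has not finished within timeout. It reports whether the
// graceful stop completed in time.
func stopWithTimeout(s stopper, timeout time.Duration) bool {
	done := make(chan struct{})

	go func() {
		defer close(done)
		s.GracefulStop()
	}()

	timer := time.NewTimer(timeout)
	defer timer.Stop()

	select {
	case <-done:
		return true
	case <-timer.C:
		s.Stop()

		return false
	}
}

// fakeServer simulates a server that needs drain time to finish in-flight
// requests; Stop interrupts the drain.
type fakeServer struct {
	drain   time.Duration
	stopped chan struct{}
}

func (f *fakeServer) GracefulStop() {
	timer := time.NewTimer(f.drain)
	defer timer.Stop()

	select {
	case <-timer.C:
	case <-f.stopped:
	}
}

func (f *fakeServer) Stop() { close(f.stopped) }

func main() {
	fast := &fakeServer{drain: 10 * time.Millisecond, stopped: make(chan struct{})}
	fmt.Println("graceful:", stopWithTimeout(fast, time.Second))

	slow := &fakeServer{drain: time.Minute, stopped: make(chan struct{})}
	fmt.Println("graceful:", stopWithTimeout(slow, 50*time.Millisecond))
}
```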

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-07-29 14:52:04 +04:00
Andrey Smirnov
0cdf222431
fix: retry Conflict errors when upgrading k8s manifests
Fixes #5985

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-07-29 13:20:04 +04:00
Andrey Smirnov
1db097f509
release(v1.2.0-alpha.1): prepare release
This is the official v1.2.0-alpha.1 release.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-07-28 21:43:44 +04:00
Noel Georgi
5ac4947b63
feat: enable default seccomp profile for kubelet
Enable the default seccomp profile provided by the container runtime

Fixes: #5293

Ref: https://kubernetes.io/docs/tutorials/security/seccomp/

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-07-28 21:45:49 +05:30
Artem Chernyshev
e5994ff7a7
fix: skip ResetDuringBoot test if the Cluster config is unknown
And improve retry logic in the test.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-07-28 15:57:58 +03:00
Artem Chernyshev
8028e10749
fix: wait for boot done when rebooting a node in the integration tests
We shouldn't start cluster healthcheck until boot sequence is done.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-07-27 23:58:43 +03:00
Artem Chernyshev
ae1bec59e9
feat: allow running only one sequence at a time
Fix the `Talos` sequencer to run only a single sequence at a time.
Sequence priorities were updated to match the table:

| requested (rows) / running (columns)               | boot | reboot | reset | upgrade |
|----------------------------------------------------|------|--------|-------|---------|
| reboot                                             | Y    | Y      | Y     | N       |
| reset                                              | Y    | N      | N     | N       |
| upgrade                                            | Y    | N      | N     | N       |

One addition: `WithTakeover` is still there. If set, priority is ignored.

This is mainly used for `Shutdown` sequence invocation and when applying
config with reboot enabled.
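
The table can be read as a lookup, sketched below. Two assumptions are made here: `Y` is taken to mean that the requested sequence is allowed to proceed while the given sequence is running, and a request is always allowed when nothing is running. The `allowed` map and `canStart` function are hypothetical and are not the sequencer's actual API.

```go
package main

import "fmt"

// allowed[requested][running] mirrors the table: true corresponds to "Y".
var allowed = map[string]map[string]bool{
	"reboot":  {"boot": true, "reboot": true, "reset": true, "upgrade": false},
	"reset":   {"boot": true, "reboot": false, "reset": false, "upgrade": false},
	"upgrade": {"boot": true, "reboot": false, "reset": false, "upgrade": false},
}

// canStart reports whether the requested sequence may start while running is
// in progress. An empty running value means no sequence is active. With
// takeover set, the priority table is ignored.
func canStart(requested, running string, takeover bool) bool {
	if running == "" || takeover {
		return true
	}

	// Combinations missing from the table default to false.
	return allowed[requested][running]
}

func main() {
	fmt.Println(canStart("reboot", "reset", false))
	fmt.Println(canStart("reset", "upgrade", false))
	fmt.Println(canStart("reset", "upgrade", true))
}
```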

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2022-07-27 17:21:36 +03:00
Andrey Smirnov
ec05aee040
fix: correctly unwrap errors when streaming
When message is sent via the proxy, `metadata.error` carries only string
representation which can't be unmarshalled back into an `error` which we
can match against. A similar fix was already done for "unary" responses,
but we missed the streaming case.

This fixes a spurious failure in integration tests when calling
`talosctl pcap --duration 1s`.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-07-26 23:52:37 +04:00
Dmitriy Matrenichev
7c7f2d8c3b
feat: refactor disk size matcher to be compatible with DeepEqual
Replace Matcher field with Matcher method and store Op and size data directly in InstallDiskSizeMatcher.

Closes #5860.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-07-26 17:10:11 +03:00
Andrey Smirnov
3addea83b9
feat: introduce support for Talos API access from Kubernetes
This is a first step: providing a service to access Talos API.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-07-26 00:02:19 +04:00
Matthew Richardson
34d3a41643
docs: add missing <> to relref
Fixing small issue in syntax.

Signed-off-by: Matthew Richardson <M.Richardson@ed.ac.uk>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-07-25 23:20:51 +04:00
Andrey Smirnov
c4d2d20c41
fix: enable stable hostnames for worker configs as well
This fixes a small bug with stable hostnames when they were only enabled
for control plane nodes.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-07-25 22:30:44 +04:00
Noel Georgi
0326bac1f9
chore: bump kernel to 5.15.57
Bump kernel to [5.15.57](https://github.com/siderolabs/pkgs/pull/539)

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-07-25 21:16:18 +05:30
Andrey Smirnov
86820c33f1
chore: bump dependencies
dependabot + go-mod-outdated

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-07-25 18:14:49 +04:00
Andrey Smirnov
6e7dfeeb38
fix: data race in packet capture (part 2)
The `PacketSource` interface is racy, as it provides a channel to read
packets from, while packets are read in an (invisible) goroutine, so
closing the capture handle creates a data race with reading.

Unwrap that goroutine into an explicit loop to avoid the race.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-07-25 15:24:32 +04:00
AMet
c11e1dae70
docs: fix spelling and grammar errors
Fix spelling and grammar errors

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-07-23 14:51:45 +05:30
Dmitriy Matrenichev
30f7851d2a
chore: bump golangci-lint from 1.45.2 to 1.47.2
Minor linter upgrade.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-07-22 17:49:44 +03:00
Dmitriy Matrenichev
2cce9112d1
chore: bump goimports from 0.1.10 to 0.1.11
Minor linter upgrade.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2022-07-22 02:07:39 +03:00
Noel Georgi
18756c7ff6
fix: folder permissions of overlay mounted folders
Set the correct permissions for the overlay mounted folders. This issue
was identified from #5948

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-07-22 01:38:09 +05:30
Utku Ozdemir
47c35dc474
feat: set stable default hostname based on machine-id
Use machine-id as the source for the default hostname (e.g. `talos-2gd-76y`) instead of DHCP-assigned IP (e.g. `talos-172-20-0-2`). This way, DHCP IP changes won't impact the hostname. Defaults to true for Talos version >=1.2.

Closes siderolabs/talos#5896.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2022-07-21 19:37:28 +02:00
Noel Georgi
1ed3df295c
chore: support glibc apps extension spec
Update extension spec to support glibc standard path.

Ref: https://github.com/siderolabs/extensions/pull/49

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-07-21 22:09:56 +05:30
Andrey Smirnov
a2aea97263
fix: write etcd PKI files in a controller
Instead of writing PKI "once" around the startup time, keep writing PKI
files as the certificates get updated. `etcd` is able to reload
certificates, so we should keep updating them e.g. if the hostname/IPs
change over time.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-07-21 18:37:45 +04:00
Utku Ozdemir
bb4abc0961
fix: regenerate kubelet certs when hostname changes
Clear the kubelet certificates and kubeconfig when hostname changes so that on next start, kubelet goes through the bootstrap process and new certificates are generated and the node is joined to the cluster with the new name.

Fixes siderolabs/talos#5834.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2022-07-21 01:54:15 +02:00
Noel Georgi
d650afb6cd
chore: fix typo in powercycle
Fix typo in `powercycle`

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-07-20 21:33:17 +05:30
Andrey Smirnov
644e803adf
fix: use masks and different firewall mark for KubeSpan
Fixes #4836

Firewall mark is `uint32` attached to the packet in the Linux kernel
(it's not transmitted on the wire). This is a shared value for all
networking software, so multiple components might attempt to set and
match on the firewall mark.

Cilium and Calico CNIs are using firewall marks internally, but they
touch only some bits of the firewall mark.

The way KubeSpan was implemented before this PR, it was doing direct
match on the firewall mark, and setting the whole `uint32`, so it comes
into conflict with any other networking component using firewall marks.

The other problem was that firewall mark 0x51820 (0x51821) was too
"wide" touching random bits of the 32-bit value for no good reason.

So this change contains two fixes:

* make firewall mark exactly a single bit (we use bits `0x20` and `0x40`
  now)
* match and mark packets with the mask (don't touch bits outside of the
  mask when setting the mark and ignore bits outside of the mask when
  matching on the mark).
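
The masked set and match operations can be sketched on plain `uint32` values as follows. The bit values `0x20` and `0x40` come from the description above; combining them into one mask, the constant names, and the example mark `0x0200` for "another component" are assumptions made for illustration.

```go
package main

import "fmt"

const (
	markA    uint32 = 0x20          // first single-bit mark
	markB    uint32 = 0x40          // second single-bit mark
	markMask uint32 = markA | markB // assumed mask covering both bits
)

// setMark writes mark into current, touching only the bits covered by mask.
func setMark(current, mark, mask uint32) uint32 {
	return (current &^ mask) | (mark & mask)
}

// matchMark compares current against mark, ignoring bits outside mask.
func matchMark(current, mark, mask uint32) bool {
	return current&mask == mark&mask
}

func main() {
	// Bits set earlier by some other networking component (hypothetical value).
	other := uint32(0x0200)

	marked := setMark(other, markA, markMask)
	fmt.Printf("%#x\n", marked) // 0x220: the other component's bit survives

	fmt.Println(matchMark(marked, markA, markMask)) // true
	fmt.Println(matchMark(marked, markB, markMask)) // false
	fmt.Println(marked == markA)                    // false: a whole-value match would miss
}
```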

This was tested successfully with both Cilium CNI (default config +
`ipam.mode=kubernetes`) and Calico CNI (default config).

One thing to note is that for KubeSpan and Talos it's important to make
sure that `podSubnets` in the machine config match CNI setting for
`podCIDRs`.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-07-20 16:05:56 +04:00
Andrey Smirnov
80444a43d9
fix: remove data race in pcap capture
The capture handle should be closed in the same goroutine as packet
reading.

Fix a spurious error which might appear in `talosctl pcap`.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-07-20 01:10:59 +04:00
Spencer Smith
04a45dff28
docs: remove katacoda links
This PR removes katacoda links since katacoda is dead now :(

Signed-off-by: Spencer Smith <spencer.smith@talos-systems.com>
2022-07-19 12:25:15 -04:00
Andrey Smirnov
065b59276c
feat: implement packet capture API
This uses the `go-packet` library with native bindings for the packet
capture (without `libpcap`). This is not the most performant way, but it
allows us to avoid CGo.

There is a problem with converting network filter expressions (like
`tcp port 3222`) into BPF instructions: this is only available in C
libraries, but there's a workaround with `tcpdump`.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-07-19 01:23:09 +04:00
Andrey Smirnov
7c006cabc7
feat: update Kubernetes to 1.24.3
See https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md#changelog-since-v1242

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-07-18 22:10:34 +04:00
Andrey Smirnov
551290195c
chore: bump dependencies
dependabot + go-mod-outdated

Kubernetes 1.24.3 will go as a separate PR.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-07-18 21:22:01 +04:00
Andrey Smirnov
1677bcc4b2
fix: skip bond itself when matching interface (Equinix Metal)
This fixes a problem when platform network configuration might have
already been applied from the cached on-disk representation, and in that
case e.g. `bond0` MAC is the same as `eth0`, so Talos might mistakenly pick
up `bond0` as a slave to itself instead of `eth0`.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-07-18 21:03:48 +04:00
Andrey Smirnov
f1c2b5c558
feat: implement strategic merge patching for API server admission config
The testcase explains it better, but tl;dr is that this allows
strategic merge patching e.g. for the Pod Security configuration.

Fixes #5895

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-07-18 20:15:26 +04:00
Nico Berlee
be98cb82b5
feat: follow KEP-2568 non-root enhancements
KEP-2568: https://github.com/kubernetes/enhancements/tree/master/keps/sig-cluster-lifecycle/kubeadm/2568-kubeadm-non-root-control-plane

Deviation:
 - example sets UID/GID in container context, it's safer to do this in pod context

Signed-off-by: Nico Berlee <nico.berlee@on2it.net>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-07-18 18:34:13 +04:00
Utku Ozdemir
87ea1d9611
fix: update kubelet kubeconfig when cluster control plane endpoint changes
Overwrite cluster's server URL in the kubeconfig file used by kubelet when the cluster control plane endpoint is changed in machineconfig, so that kubelet doesn't lose connectivity to kube-apiserver.

Closes siderolabs/talos#4470.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2022-07-16 14:19:25 +02:00
Utku Ozdemir
a75fe7600d
feat: gen secrets from kubernetes pki dir
This PR adds the ability to generate `secrets.yaml` (`talosctl gen secrets`) using a Kubernetes PKI directory path (e.g. `/etc/kubernetes/pki`) as input. It also introduces the flag `--kubernetes-bootstrap-token` to set a static Kubernetes bootstrap token in the generated `secrets.yaml` file instead of a randomly-generated one. Closes siderolabs/talos#5894.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2022-07-16 13:06:32 +02:00
Utku Ozdemir
a1d7b535ad
docs: add kubeadm migration guide
Document how to migrate from kubeadm-based clusters to Talos.

Part of siderolabs/talos#5832

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2022-07-16 12:50:27 +02:00
zebernst
9e0c56581e
docs: guide for setting up synology-csi driver
Guide for synology-csi driver

Signed-off-by: Noel Georgi <git@frezbo.dev>
2022-07-15 23:54:34 +05:30
Andrey Smirnov
f0b8eea5e5
refactor: remove bootstrap sequence
Refactor things to remove the bootstrap sequence, this should help with
the task of sequencer concurrency changes and immediate API feedback.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2022-07-15 20:24:07 +04:00
Utku Ozdemir
89c7da8991
docs: add documentation for vagrant & libvirt
Documentation of running Talos in a Vagrant environment with the libvirt provider.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2022-07-15 16:54:30 +02:00
Tim Jones
014b85fdcb
docs: improve talos kubernetes upgrade note
Improve wording on Talos upgrade vs Kubernetes upgrades.

Signed-off-by: Tim Jones <tim.jones@siderolabs.com>
2022-07-15 16:08:18 +02:00
Spencer Smith
88bb017ed0
docs: remove old docs from site
This PR removes pre-v0.10 docs from the drop down. They will remain in
the content so folks can still read them if needed.

Signed-off-by: Spencer Smith <spencer.smith@talos-systems.com>
2022-07-14 20:52:35 -04:00