201 Commits

Author SHA1 Message Date
Andrey Smirnov
7a68504b6b
feat: support rotating Kubernetes CA
Fixes #8440

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-04-01 22:08:02 +04:00
Noel Georgi
bac366e43e
chore: add ExtraInfo field for extensions
Add an extra field to extensions to store arbitrary info.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-04-01 19:30:29 +05:30
Dmitry Sharshakov
9456489147
feat: support hardware watchdog timers
Only enabled when activated by config, disabled on shutdown/reboot

Fixes #8284

Signed-off-by: Dmitry Sharshakov <dmitry.sharshakov@siderolabs.com>
Signed-off-by: Dmitry Sharshakov <d3dx12.xx@gmail.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-03-25 18:19:39 +03:00
Andrey Smirnov
8eacc4ba80
feat: support rotation of Talos API CA
This allows to roll all nodes to use a new CA, to refresh it, or e.g.
when the `talosconfig` was exposed accidentally.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-03-22 12:16:47 +04:00
Andrey Smirnov
89fc68b459
fix: service lifecycle issues
The core change is moving the context out of the `ServiceRunner` struct
to be a local variable, and using a channel to notify about shutdown
events.

Add more synchronization between Run and the moment service started to
avoid mis-identifying not running (yet) service as successfully finished.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Co-authored-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-03-19 18:11:13 +04:00
Andrey Smirnov
15beb14780
feat: implement blockdevice watch controller
This controller combines kobject events, and scan of `/sys/block` to
build a consistent list of available block devices, updating resources
as the blockdevice changes.

Based on these resources the next step can run probe on the blockdevices
as they change to present a consistent view of filesystems/partitions.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-03-18 18:28:40 +04:00
Dmitriy Matrenichev
06e3bc0cbd
feat: implement Siderolink wireguard over GRPC
For #8064

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-03-18 15:38:13 +03:00
Dmitriy Matrenichev
32e0877607
chore: print all available logs containers in logs command completions
This is a small quality of life improvement that allows `logs` subcommand to suggest all available logs.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-03-11 17:48:01 +03:00
Noel Georgi
15e8bca2b2
feat: support environment in ExtensionServicesConfig
Support setting extension services environment variables in
`ExtensionServiceConfig` document.

Refactor `ExtensionServicesConfig` -> `ExtensionServiceConfig` and move extensions config under `runtime` pkg.

Fixes: #8271

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-02-15 20:16:29 +05:30
Dmitriy Matrenichev
5324d39167
chore: bump stuff
Also fix .golangci.yml file.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-02-09 19:19:25 +03:00
Dmitriy Matrenichev
afa71d6b02
chore: use "handle-like" resource in DNSResolveCacheController
Rework (and simplify) `DNSResolveCacheController` to use `DNSUpstream` "handle-like" resources.

Depends on https://github.com/cosi-project/runtime/pull/400

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-02-08 21:40:57 +03:00
Noel Georgi
1e6c8c4dec
feat: extensions services config
Support config files for extension services.

Fixes: #7791

Co-authored-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-02-06 17:12:01 +05:30
Andrey Smirnov
9d8cd4d058
chore: drop deprecated method EtcdRemoveMember
It was deprecated 16 months ago, time to cleanup.

(This is to prepare for the first v1.7 release)

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-02-01 15:54:29 +04:00
Dmitriy Matrenichev
ebeef28525
feat: implement local caching dns server
This PR adds a new controller - `DNSServerController` that starts tcp and udp dns servers locally. Just like `EtcFileController` it monitors `ResolverStatusType` and updates the list of destinations from there.

Most of the caching logic is in our "lobotomized" "`CoreDNS` fork. We need this fork because default `CoreDNS` carries
full Caddy server and various other modules that we don't need in Talos. On our side we implement
random selection of the actual dns and request forwarding.

Closes #7693

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-01-29 20:26:38 +03:00
Andrey Smirnov
131a1b1671
fix: add a KubeSpan option to disable extra endpoint harvesting
It works well for small clusters, but with bigger clusters it puts too
much load on the discovery service, as it has quadratic complexity in
number of endpoints discovered/reported from each member.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-12 14:07:31 +04:00
Dmitriy Matrenichev
6bb1e99aa3
chore: optimize pcap dump
Reimplement `gopacket.PacketSource.PacketsCtx` as `forEachPacket`.

- Use `ZeroCopyPacketDataSource` instead of `PacketDataSource`. I didn't find any specific reason why `PacketDataSource` exists at all, since `NewPacket` is doing copy inside if you don't explicitly tell it not to.
- Use `WillPool` to pool packet buffers. It doesn't fully remove allocations, but it's a safe start.
  Send packets back into the pool after we are done with them.
- Pass `Packet` directly to the closure instead of waiting for it on the channel. We don't store this packet anywhere so there is no reason to async this part.
- Drop `time.Sleep` code in `forEachPacket` body.
- Drop `SnapLen` support in client and server since it didn't work anyway (details in the PR).

Closes #7994

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-12-11 15:44:42 +03:00
Andrey Smirnov
e71e3e4161
feat: support extra arguments for flanneld
Fixes #7754

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-01 16:18:02 +04:00
Dmitriy Matrenichev
0b111ecb81
fix: support slices of enums and fix NfTablesConntrackStateMatch
We already have the code which supports custom enums, so let's extend it to support custom enums in slices and
fix the NfTablesConntrackStateMatch proto definition.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-11-30 00:23:16 +03:00
Andrey Smirnov
9a85217412
feat: improve nftables backend
Many changes to the nftables backend which will be used in the follow-up
PR with #4421.

1. Add support for chain policy: drop/accept.
2. Properly handle match on all IPs in the set (`0.0.0.0/0` like).
3. Implement conntrack state matching.
4. Implement multiple ifname matching in a single rule.
5. Implement anonymous counters.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-11-29 21:22:47 +04:00
Andrey Smirnov
db4e2539d4
feat: update Kubernetes 1.29.0-rc.1 and other bumps
Bump Go modules, final tools and semi-final pkgs.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-11-29 18:29:52 +04:00
Andrey Smirnov
e46e6a312f
feat: implement nftables backend
Implement initial set of backend controllers/resources to handle
nftables chains/rules etc.

Replace the KubeSpan nftables operations with controller-based.

See #4421

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-11-27 21:14:15 +04:00
Dmitriy Matrenichev
70d53ee13c
chore: deprecate .persist and .extensions
This commit deprecates those things:
- Removes the support of `.persist` flag. From now, it should always be enabled or not defined in the config.
- Removes the documentation for `.bootloader`. It never worked anyway.
- Adds a warning for `.machine.install.extensions`, suggests to use boot-assets.

Closes #7972
Closes #7507

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-11-22 20:35:38 +03:00
Oscar Utbult
020a0eb63e
docs: fix table formatting for bootstraprequest
Fixes formatting for https://www.talos.dev/v1.6/reference/api/#bootstraprequest

Signed-off-by: Oscar Utbult <oscar.utbult@gmail.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-11-20 20:11:43 +04:00
Oscar Utbult
de6caf5348
docs: fix table formatting for machineservice api
Fixes formatting for https://www.talos.dev/v1.6/reference/api/#machineservice

Signed-off-by: Oscar Utbult <oscar.utbult@gmail.com>
2023-11-20 17:13:10 +04:00
Noel Georgi
0d3c3ed716
feat: support kube scheduler config
Support kube-scheduler config.

Fixes: #7905
Partially fixes: #7911

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-11-15 10:15:23 +05:30
Noel Georgi
4f1ad16c76
feat: support kubelet credentialprovider config
Support configuring kubelet credential provider config.

Partially fixes: #7911

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-11-13 19:40:43 +05:30
Dmitriy Matrenichev
6eade3d5ef
chore: add ability to rewrite uuids and set unique tokens for Talos
This PR does those things:
- It allows API calls `MetaWrite` and `MetaRead` in maintenance mode.
- SystemInformation resource now waits for available META
- SystemInformation resource now overwrites UUID from META if there is an override
- META now supports "UUID override" and "unique token" keys
- ProvisionRequest now includes unique token and Talos version

For #7694

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-11-10 18:17:54 +03:00
Andrey Smirnov
2b548ad0d9
feat: update containerd to 1.7.x
Also update Linux and other pkgs.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-09-28 16:33:57 +04:00
Artem Chernyshev
2960f93baa
feat: add readonly information to the disks API response
Forward device readonly info from `go-blockdevice` library.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2023-09-12 18:09:59 +03:00
Noel Georgi
3d2dad4e69
chore: show securtiystate on dashboard
Show Talos SecurityState and MountStatus on dashboard.

Fixes: #7675

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-09-06 21:46:25 +05:30
Andrey Smirnov
676db97684
docs: fork docs for Talos 1.6
Create a copy of documentation for Talos 1.6.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-08-17 19:37:38 +04:00
Andrey Smirnov
0f1920bdda
chore: provide a resource to peek into Linux clock adjustments
This is a follow-up for #7567, which won't be backported to 1.5.

This allows to get an output like:

```
$ talosctl -n 172.20.0.5 get adjtimestatus -w
NODE         *   NAMESPACE TYPE            ID     VERSION   OFFSET        ESTERROR   MAXERROR   STATUS               SYNC
172.20.0.5   +   runtime   AdjtimeStatus   node   47        -18.14306ms   0s         191.5ms    STA_PLL | STA_NANO   true
172.20.0.5       runtime   AdjtimeStatus   node   48        -17.109555ms  0s         206.5ms    STA_NANO | STA_PLL   true
172.20.0.5       runtime   AdjtimeStatus   node   49        -16.134923ms  0s         221.5ms    STA_NANO | STA_PLL   true
172.20.0.5       runtime   AdjtimeStatus   node   50        -15.21581ms   0s         236.5ms    STA_PLL | STA_NANO   true
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-08-03 22:06:53 +04:00
Noel Georgi
68e6b98f7d
feat: add security state resource
Add security state resource that describes the state of Talos SecureBoot
and PCR signing key fingerprints.

The UKI fingerprint is currently not populated.

Fixes: #7514

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-07-31 22:02:08 +05:30
Dmitriy Matrenichev
5f34f5b41f
chore: rename api load balancer to KubePrism
Closes #7432

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-07-14 15:23:53 +03:00
Andrey Smirnov
8017afb107
feat: implement CRI image management and pre-pull on K8s upgrade
Fixes #6391

Implement a set of APIs and commands to manage images in the CRI, and
pre-pull images on Kubernetes upgrades.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-11 19:25:10 +04:00
Andrey Smirnov
bdb96189fa
refactor: make maintenance service controller-based
Fixes #7430

Introduce a set of resources which look similar to other API
implementations: CA, certs, cert SANs, etc.

Introduce a controller which manages the service based on resource
state.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-10 15:41:52 +04:00
LukasAuerbeck
c81ce8cfb0
feat: support controlplane resources configuration
Fixes #7379

Add possibility to configure the controlplane static pod resources via
APIServer, ControllerManager and Scheduler configs.

Signed-off-by: LukasAuerbeck <17929465+LukasAuerbeck@users.noreply.github.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-07-07 22:44:56 +04:00
Artem Chernyshev
ce63abb219
feat: add KMS assisted encryption key handler
Talos now supports new type of encryption keys which rely on Sealing/Unsealing randomly generated bytes with a KMS server:

```
systemDiskEncryption:
  ephemeral:
    keys:
      - kms:
          endpoint: https://1.2.3.4:443
        slot: 0
```
gRPC API definitions and a simple reference implementation of the KMS server can be found in this
[repository](https://github.com/siderolabs/kms-client/blob/main/cmd/kms-server/main.go).

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2023-07-07 19:02:39 +03:00
Utku Ozdemir
0d313b9733
feat: add reboot-mode flag to talosctl upgrade
Allow specifying the reboot mode during upgrades by introducing `--reboot-mode` flag, similar to the `--mode` flag of the reboot command.

Closes siderolabs/talos#7302.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2023-06-26 17:37:19 +02:00
Dmitriy Matrenichev
445f5ad542
feat: support API server load balancer
This commit adds support for API load balancer. Quick way to enable it is during cluster creation using new `api-server-balancer-port` flag (0 by default - disabled). When enabled all API request will be routed across
cluster control plane endpoints.

Closes #7191

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-06-16 10:09:20 -04:00
Andrey Smirnov
0a99965efb
refactor: replace uncordonNode with controllers
Fixes #7233

Waiting for node readiness now happens in the `MachineStatus` controller
which won't mark the node as ready until Kubernetes `Node` is ready.

Handling cordoning/uncordining happens with help of additional resource
in `NodeApplyController`.

New controller provides reactive `NodeStatus` resource to see current
status of Kubernetes `Node`.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-06-13 21:48:42 +04:00
Andrey Smirnov
dbaf5c6997
refactor: task labelControlPlane into controllers
See #7233

The controlplane label is simply injected into existing controller-based
node label flow.

For controlplane taint default NoScheduleTaint, additional controller &
resource was implemented to handle node taints.

This also fixes a problem with `allowSchedulingOnControlPlanes` not
being reactive to config changes - now it is.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-06-12 15:25:13 +04:00
Andrey Smirnov
e7be6ee7c3
refactor: make event log streaming fully reactive
I ended up completely rewriting the controller, simplifying the flow
(somewhat) so that there's just a single control flow in the controller,
while reading from v1alpha1 events is converted to reading from a
channel.

Fixes #7227

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-06-08 23:13:33 +04:00
Andrey Smirnov
5dab45e869
refactor: allow kmsg log streaming to be reconfigured on the fly
Fixes #7226

This follows same flow as other similar changes - split out logging
configuration as a separate resource, source it for now in the cmdline.

Rewrite the controller to allow multiple log outputs, add send retries.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-06-06 15:56:24 +04:00
Dmitriy Matrenichev
8a02ecd4cb
chore: add endpoints balancer controller
This PR adds support for creating a list of API endpoints (each is pair of host and port).

It gets them from
- Machine config cluster endpoint.
- Localhost with LocalAPIServerPort if machine is control panel.
- netip.Addr[0] and port from affiliates if they are control panels.

For #7191

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-06-05 20:47:52 -04:00
Andrey Smirnov
bab484a405
feat: use stable network interface names
Use `udevd` rules to create stable interface names.

Link controllers should wait for `udevd` to settle down, otherwise link
rename will fail (interface should not be UP).

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-06-01 21:29:12 +04:00
Andrey Smirnov
10155c390e
feat: enable xfs project quota support, kubelet feature
This is controlled with a feature flag which gets enabled automatically
for Talos 1.5+.

Fixes #7181

If enabled, configures kubelet to use project quotas to track xfs volume
usage, which is much more efficient than doing `du` periodically.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-05-19 20:33:39 +04:00
Andrey Smirnov
bb02dd263c
chore: drop deprecated stuff for Talos 1.5
* drop old resources API, which was deprecated long time ago
* use bootstrapped event in `talosctl get --watch` to better align
  columns in the table output

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-05-18 19:46:37 +04:00
Utku Ozdemir
62c6e9655c
feat: introduce siderolink config resource & reconnect
Introduce a new resource, `SiderolinkConfig`, to store SideroLink connection configuration (api endpoint for now).

Introduce a controller for this resource which populates it from the Kernel cmdline.

Rework the SideroLink `ManagerController` to take this new resource as input and reconfigure the link on changes.

Additionally, if the siderolink connection is lost, reconnect to it and reconfigure the links/addresses.

Closes siderolabs/talos#7142, siderolabs/talos#7143.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2023-05-05 17:04:34 +02:00
Andrey Smirnov
860002c735
fix: don't reload control plane pods on cert SANs changes
Fixes #7159

The change looks big, but it's actually pretty simple inside: the static
pods had an annotation which tracks a version of the secrets which
forced control plane pods to reload on a change. At the same time
`kube-apiserver` can reload certificate inputs automatically from files
without restart.

So the inputs were split: the dynamic (for kube-apiserver) inputs don't
need to be reloaded, so its version is not tracked in static pod
annotation, so they don't cause a reload. The previous non-dynamic
resource still causes a reload, but it doesn't get updated when e.g.
node addresses change.

There might be many more refactoring done, the resource chain is a bit
of a mess there, but I wanted to keep number of changes minimal to keep
this backportable.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
2023-05-05 16:59:09 +04:00