1927 Commits

Author SHA1 Message Date
Artem Chernyshev
4547ad9afa
feat: send actor id to the SideroLink events sink
This might come in handy to distinguish sequences and tasks initiated by a
particular API request.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2023-12-11 21:59:02 +03:00
Dmitriy Matrenichev
6bb1e99aa3
chore: optimize pcap dump
Reimplement `gopacket.PacketSource.PacketsCtx` as `forEachPacket`.

- Use `ZeroCopyPacketDataSource` instead of `PacketDataSource`. I didn't find any specific reason for `PacketDataSource` to exist at all, since `NewPacket` copies the data internally unless explicitly told not to.
- Use `WillPool` to pool packet buffers. It doesn't fully remove allocations, but it's a safe start.
  Send packets back into the pool after we are done with them.
- Pass `Packet` directly to the closure instead of waiting for it on the channel. We don't store this packet anywhere, so there is no reason to make this part asynchronous.
- Drop `time.Sleep` code in `forEachPacket` body.
- Drop `SnapLen` support in client and server since it didn't work anyway (details in the PR).

Closes #7994

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-12-11 15:44:42 +03:00
Andrey Smirnov
46121c9fec
docs: rework machine config documentation generation
Generate a structured table of contents following the structure of the
config.

Make high-level examples follow the full structure of the config.

Document new multi-doc machine config.

Fixes #8023

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-08 14:16:40 +04:00
Andrey Smirnov
270604bead
fix: support user disks via symlinks
The core blockdevice library already supported resolving symlinks; we
just need to get the raw block device name from it and use it
afterwards.

In QEMU provisioner, leave the first (system) disk as virtio (for
performance), and mount user disks as 'ata', which allows `udevd` to
pick up the disk IDs (not available for `virtio`), and use the symlink
path in the tests.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-05 22:02:56 +04:00
Andrey Smirnov
474fa0480d
fix: store and execute desired action on emergency action
Fixes #7854

Talos runs an emergency handler if the sequence experiences an
unrecoverable failure. The emergency handler was unconditionally
executing the "reboot" action if no other action was received (an action
is only received if the sequence completes successfully), so a Shutdown
request might result in a reboot if an error occurred during the shutdown
phase.

This is not a pretty fix, but it's hard to deliver the intent from one
part of the code to another right now, so instead use a global variable
which stores the default emergency intention and gets overridden early in
the Shutdown sequence.
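The global-intent approach can be sketched like this. All names here (`Action`, `defaultEmergencyAction`, `startShutdownSequence`, `emergencyHandler`) are hypothetical and only illustrate the idea: the default is reboot, and the Shutdown sequence overrides it before any phase that could fail.

```go
package main

import "sync/atomic"

// Action is a hypothetical enumeration of what the emergency handler does.
type Action int32

const (
	ActionReboot Action = iota
	ActionShutdown
)

// defaultEmergencyAction holds the intent used if a sequence fails.
// Its zero value corresponds to ActionReboot.
var defaultEmergencyAction atomic.Int32

// startShutdownSequence records the shutdown intent early, before any
// phase that might fail.
func startShutdownSequence() {
	defaultEmergencyAction.Store(int32(ActionShutdown))
	// ... shutdown phases would run here ...
}

// emergencyHandler runs on unrecoverable failure and honors the stored intent.
func emergencyHandler() Action {
	return Action(defaultEmergencyAction.Load())
}
```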

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-04 19:51:48 +04:00
Andrey Smirnov
dbf274ddf7
fix: skip writing the file if the contents haven't changed
As the controller reconciles every /etc file present, it might be called
multiple times for the same file, even if the actual contents haven't
changed.

Rewriting the file might lead to some concurrent process seeing
incomplete file contents more often than needed.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-04 15:58:03 +04:00
Andrey Smirnov
d8a435f0e4
fix: initialize boot assets with defaults early
The problem was that bootloaders were correctly picking up defaults for
`installer` mode (vs. `imager` mode), but DTB and other SBC-specific settings
weren't properly initialized, so installation on SBCs failed.

Now all options are properly initialized with defaults early in the
process.

Fixes #8009

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-01 17:47:05 +04:00
Andrey Smirnov
c6835de17a
fix: pick etcd advertised addresses from 'current' addresses
Fixes #7947

This way, the etcd advertised address can be picked from the `external IPs`
of the machine.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-01 17:26:28 +04:00
Andrey Smirnov
e71e3e4161
feat: support extra arguments for flanneld
Fixes #7754

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-12-01 16:18:02 +04:00
Andrey Smirnov
36c8ddb5e1
feat: implement ingress firewall rules
Fixes #4421

See documentation for details on how to use the feature.

With `talosctl cluster create`, the firewall can easily be tested with
`--with-firewall=accept|block` (default mode).

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-11-30 22:58:16 +04:00
Dmitriy Matrenichev
0b111ecb81
fix: support slices of enums and fix NfTablesConntrackStateMatch
We already have the code which supports custom enums, so let's extend it to support custom enums in slices and
fix the NfTablesConntrackStateMatch proto definition.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-11-30 00:23:16 +03:00
Andrey Smirnov
9a85217412
feat: improve nftables backend
Many changes to the nftables backend which will be used in the follow-up
PR with #4421.

1. Add support for chain policy: drop/accept.
2. Properly handle match on all IPs in the set (`0.0.0.0/0` like).
3. Implement conntrack state matching.
4. Implement multiple ifname matching in a single rule.
5. Implement anonymous counters.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-11-29 21:22:47 +04:00
Noel Georgi
f041b26299
chore: add tests for mdadm extension
Add tests for mdadm extension.

See: https://github.com/siderolabs/extensions/pull/271

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-11-27 23:18:35 +05:30
Andrey Smirnov
e46e6a312f
feat: implement nftables backend
Implement initial set of backend controllers/resources to handle
nftables chains/rules etc.

Replace the KubeSpan nftables operations with controller-based ones.

See #4421

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-11-27 21:14:15 +04:00
Dmitriy Matrenichev
ba827bf8b8
chore: support getting multiple endpoints from the Provision rpc call
The code rotates through the endpoints until it reaches the end, and only then tries provisioning again.

Closes #7973

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-11-25 21:38:44 +03:00
Dmitriy Matrenichev
dd45dd06cf
chore: add custom node taints
This PR adds support for custom node taints. Refer to `nodeTaints` in the `configuration` for more information.

Closes #7581

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-11-25 18:33:18 +03:00
Dmitriy Matrenichev
70d53ee13c
chore: deprecate .persist and .extensions
This commit deprecates the following:
- Removes support for the `.persist` flag. From now on, it should always be enabled or left undefined in the config.
- Removes the documentation for `.bootloader`, which never worked anyway.
- Adds a warning for `.machine.install.extensions`, suggesting boot-assets instead.

Closes #7972
Closes #7507

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-11-22 20:35:38 +03:00
Noel Georgi
aca8b5e179
fix: ignore kernel command line in container mode
Ignore the kernel command line for `SideroLink` and `EventsSink` config when
running in container mode. Otherwise, when running Talos as a docker
container inside Talos, it picks up the host kernel cmdline and tries to
configure SideroLink/EventsSink.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-11-21 18:55:37 +05:30
Andrey Smirnov
27d208c26b
feat: implement OAuth2 device flow for machine config
Fixes #7939

See documentation in the PR for the description of the feature.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-11-20 14:31:43 +04:00
Noel Georgi
5c8fa2a803
chore: start containerd early in boot
Start containerd early in the boot process so that system extension
services start in maintenance mode.

Fixes: #7083

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-11-16 23:19:33 +05:30
Noel Georgi
0d3c3ed716
feat: support kube scheduler config
Support kube-scheduler config.

Fixes: #7905
Partially fixes: #7911

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-11-15 10:15:23 +05:30
Andrey Smirnov
06941b7e5c
fix: allow rootfs propagation configuration for extension services
Fixes #7873

Some services that perform mounts inside the container and require those
mounts to propagate back to the host (e.g. `stargz-snapshotter`) need
this configuration setting.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-11-13 21:58:22 +04:00
Noel Georgi
4f1ad16c76
feat: support kubelet credentialprovider config
Support configuring kubelet credential provider config.

Partially fixes: #7911

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-11-13 19:40:43 +05:30
Andrey Smirnov
f38eaaab87
feat: rework secureboot and PCR signing key
Support different providers, not only static file paths.

Drop `pcr-signing-key-public.pem` file, as we generate it on the fly
now.

See https://github.com/siderolabs/image-factory/issues/19

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-11-10 21:14:21 +04:00
Dmitriy Matrenichev
6eade3d5ef
chore: add ability to rewrite uuids and set unique tokens for Talos
This PR does the following:
- It allows API calls `MetaWrite` and `MetaRead` in maintenance mode.
- SystemInformation resource now waits for META to become available
- SystemInformation resource now overwrites UUID from META if there is an override
- META now supports "UUID override" and "unique token" keys
- ProvisionRequest now includes unique token and Talos version

For #7694

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2023-11-10 18:17:54 +03:00
Andrey Smirnov
e9c7ac17a9
fix: set max msg recv size when proxying
Previously, a fix was deployed in the Talos API client, but when the
request passes through `apid`, we need to make sure that the proxy doesn't
reject large responses.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-11-09 21:08:54 +04:00
Andrey Smirnov
e22ab440d7
feat: update Linux 6.1.61, containerd 1.7.8, runc 1.1.10
Bump tools/pkgs/extras.

Update Go dependencies.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-11-09 20:17:28 +04:00
Noel Georgi
75d3987c05
chore: drop sha1 from generated pcr json
Drop `sha1` algorithm from expected PCR json calculation.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-11-07 22:11:33 +05:30
Andrey Smirnov
87c40da6cc
fix: proper logging in machined on startup
Move `setupLogging` inside the controller, so that logger is set up
correctly before Talos starts printing first messages.

This fixes an inconsistency where the first messages were printed using
the "default" logger, and only after that was the proper logger set up,
with the message format matching the kernel log.

Also move `waitForUSBDelay` into the sequencer after `udevd` was
started (this is when blockdevices including USB ones are discovered).

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-11-07 17:09:18 +04:00
Andrey Smirnov
a54da5f641
fix: image build for nanopi_4s
Path was missing a slash.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-11-07 16:50:22 +04:00
Andrey Smirnov
6f3cd05935
refactor: update packet capture to use 'afpacket' interface
First of all, this interface is way more performant than `pcap`
interface. It is Linux-specific, but we don't care in Talos Linux :)

Second, this drops the dependency of `machined` on the `gopacket/layers`
package, which has huge issues with memory allocations and startup time.

This cuts around 20MiB of process RSS for all Talos processes.
(`talosctl` still requires this `gopacket/layers` library for decoding
packets).

Fixes #7880

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-11-07 15:52:04 +04:00
Andrey Smirnov
813442dd7a
fix: don't validate machine.install if installed
As Talos doesn't consume `.machine.install` if already installed, there
is no point in validating it once already installed.

This fixes a problem users often run into: after a reboot/upgrade, the
system disk blockdevice name changes due to the kernel upgrade or just
unpredictable device discovery. Talos failed to boot as it couldn't
validate the machine config, even though it was already installed, so the
actual blockdevice name doesn't matter.
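The idea can be sketched as a validation step that short-circuits once the system is installed. `InstallConfig` and `validateInstall` are hypothetical stand-ins for `.machine.install` and its validator, not the actual Talos types.

```go
package main

import (
	"errors"
	"os"
)

// InstallConfig is a hypothetical stand-in for `.machine.install`.
type InstallConfig struct {
	Disk string
}

// validateInstall checks the install disk only when Talos is not yet
// installed; once installed, the section is never consumed, so a renamed
// blockdevice must not block boot.
func validateInstall(cfg InstallConfig, installed bool) error {
	if installed {
		return nil
	}

	if cfg.Disk == "" {
		return errors.New("install disk is required")
	}

	// The disk must exist at install time.
	if _, err := os.Stat(cfg.Disk); err != nil {
		return err
	}

	return nil
}
```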

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-11-03 15:08:42 +04:00
Andrey Smirnov
807a9950ac
fix: use custom Talos/kernel version when generating UKI
See https://github.com/siderolabs/image-factory/issues/44

Instead of using constants, use proper Talos version and kernel version
discovered from the image.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-11-03 11:14:02 +04:00
Andrey Smirnov
2e78513e16
refactor: drop the dependency link platform -> network ctrl
This leads to lots of unnecessary imports, as the chain from the network
controllers is pretty long.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-11-01 21:56:49 +04:00
Andrey Smirnov
6dc776b8aa
fix: when writing to META in the installer/imager, use fixed name
Use fixed partition name instead of trying to auto-discover by label.

Auto-discovery by label might hit completely wrong blockdevice.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-11-01 20:34:41 +04:00
Andrey Smirnov
cbe6e7622d
fix: generate images for SBCs using imager
See https://github.com/siderolabs/image-factory/issues/43

Two fixes:

* pass path to the dtb, uboot and rpi-firmware explicitly
* include dtb, uboot and rpi-firmware into arm64 installer image when
  generated via imager (regular arm64 installer was fine)

(The generation of SBC images was not broken for Talos itself, but only
when used via Image Factory).

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-10-30 13:46:58 +04:00
Utku Ozdemir
5dff164f1c
fix: fix error output of cli action tracker
Before we started a reboot/shutdown/reset/upgrade action with the action tracker (`--wait`), we were setting a flag to prevent cobra from printing the returned error from the command.

This was to prevent the error from being printed twice, as the reporter of the action tracker already prints any errors that occurred during the action execution.

But if the error happens too early - i.e. before we even started the status printer goroutine, then that error wouldn't be printed at all, as we have suppressed the errors.

This PR moves the suppression flag to be set after the status printer is started - so we still do not double-print the errors, but neither do we suppress any early-stage error from being printed.

Closes siderolabs/talos#7900.

Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
2023-10-27 21:16:54 +02:00
Artem Chernyshev
ffa5e05cb9
fix: make Talos work on Rockpi 4c boards again
Suppress `efivars` `ENODEV` errors: skip mount and proceed with boot
sequence.

Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
2023-10-25 13:51:38 +03:00
Andrey Smirnov
8eba4c5999
feat: generate secrets bundle from the machine config
This allows "recovering" secrets if the machine config was generated
without explicitly saving the secrets bundle first.

Fixes #7895

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-10-25 13:44:14 +04:00
Nico Berlee
a009f5c60c
fix: accept sysctl paths with dots
Fixes #7878

Signed-off-by: Nico Berlee <nico.berlee@on2it.net>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-10-20 21:16:15 +04:00
Nico Berlee
4919f6ee22
feat: add GOMEMLIMIT to shipped manifests with memory limits
This commit integrates the GOMEMLIMIT environment variable into shipped K8S
manifests when resources.limits.memory is defined. It is set to 95% of the
memory limit to optimize the performance of the Go garbage collector,
mitigating the risk of OOMKills in containerized environments.
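The 95% rule is simple arithmetic on the memory limit in bytes. A minimal sketch (the helper name `goMemLimit` is hypothetical); `GOMEMLIMIT` accepts a plain byte count, and integer division rounds down.

```go
package main

import "strconv"

// goMemLimit returns a GOMEMLIMIT value equal to 95% of the container
// memory limit in bytes, rounded down.
func goMemLimit(limitBytes int64) string {
	return strconv.FormatInt(limitBytes*95/100, 10)
}
```

For example, a 1 GiB limit (1073741824 bytes) yields `GOMEMLIMIT=1020054732`, leaving roughly 5% headroom for memory the Go runtime does not account for.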

When custom resources were configured for the controller-manager or
scheduler in the machine config, they were accepted but ignored.

This commit adds Resources to NewControlPlaneSchedulerController and
NewControlPlaneControllerManagerController so that machine config resources
are applied.

Fixes #7874

Signed-off-by: Nico Berlee <nico.berlee@on2it.net>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-10-20 20:41:40 +04:00
Andrey Smirnov
9dfae8467d
chore: update dependencies
Containerd 1.7.7, Linux 6.1.58.

Fixes #7859

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-10-17 17:41:38 +04:00
Serge Logvinov
38ce3c827a
feat: nocloud prefer mac address
Use MAC address over network interface name.

Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-10-16 17:35:58 +04:00
Andrey Smirnov
c3e4182000
refactor: use COSI runtime with new controller runtime DB
See https://github.com/cosi-project/runtime/pull/336

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-10-12 19:44:44 +04:00
Serge Logvinov
0ff7350abe
fix: oracle integration fixes
* Set static gateway IPv6 if it possible.
  Some cni do not work properly with ipv6, so we will fix it.
* Disable talos dashboard.

Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-10-12 17:51:50 +04:00
Andrey Smirnov
f9639fb531
test: fix 'talosctl gen' tests
There were weird hacks put into the tests, while each test already runs
in a temporary directory as 'working directory', so no hacks are needed.

Moreover, using fixed `/tmp/...` paths leads to test failures, as CI
runs docker & QEMU tests in parallel, so they conflict with each other.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-10-12 16:24:02 +04:00
Andrei Kvapil
6142d87a0f
feat: hostname configuration improvements on the NoCloud platform
* support for local-hostname parameter
* support for hostnames passed via user-data (for Proxmox VE)

Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-10-12 15:45:25 +04:00
Andrey Smirnov
7bb205ebe2
fix: don't use runtime-specs Mount struct in machine config
First of all, it breaks our backwards compatibility promises and
documentation generation, as upstream `specs.Mount` might change at any
time.

The issue was that containerd 1.7.x brings in a new `specs.Mount` which
contains extra fields without `omitempty` for YAML, so machinery always
generates them, which confuses old Talos versions.

Use a copy of the upstream struct with proper YAML tags, and also
provide a special trick to make sure if the upstream struct changes, we
have a chance to update our copy of the struct.
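One way such a guard can work (an assumption; the commit does not spell out its exact trick) is a compile-time struct conversion. Go allows converting between struct types whose fields have identical names, types and order, ignoring tags, so the conversion stops compiling as soon as the upstream struct gains or changes a field. `upstreamMount` below is a hypothetical stand-in for `specs.Mount`.

```go
package main

// upstreamMount is a hypothetical stand-in for the upstream specs.Mount.
type upstreamMount struct {
	Destination string   `json:"destination"`
	Type        string   `json:"type,omitempty"`
	Source      string   `json:"source,omitempty"`
	Options     []string `json:"options,omitempty"`
}

// Mount is the local copy carrying our own YAML tags.
type Mount struct {
	Destination string   `yaml:"destination"`
	Type        string   `yaml:"type,omitempty"`
	Source      string   `yaml:"source,omitempty"`
	Options     []string `yaml:"options,omitempty"`
}

// This conversion compiles only while both structs have identical fields
// (tags are ignored), so an upstream change breaks the build and prompts
// updating the copy.
var _ = upstreamMount(Mount{})
```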

This also fixes the docs and JSON schema.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-10-11 23:06:19 +04:00
Thomas Way
b87092ab69
fix: handle secure boot state policy pcr digest error
This does not fix the underlying digest mismatch issue, but does handle the error and should provide
further insight into issues (if present).

Refs: #7828

Signed-off-by: Thomas Way <thomas@6f.io>
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2023-10-09 18:24:56 +04:00
Thomas Way
336aee0fdb
fix: use tpm2 hash algorithm constants and allow non-SHA-256 PCRs
The conversion from TPM 2 hash algorithm to Go crypto algorithm will fail for
uncommon algorithms like SM3-256. This can be avoided by checking the constants
directly, rather than converting them. It should also be fine to allow some
non-SHA-256 PCRs.
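Checking the algorithm constants directly can be sketched as follows. The numeric IDs are the TPM 2.0 algorithm identifiers from the TCG Algorithm Registry; the helper names are hypothetical. An algorithm with no Go `crypto.Hash` equivalent, such as SM3-256, is simply skipped instead of causing a conversion error.

```go
package main

// TPM 2.0 algorithm identifiers (TCG Algorithm Registry).
const (
	algSHA1   uint16 = 0x0004
	algSHA256 uint16 = 0x000B
	algSHA384 uint16 = 0x000C
	algSHA512 uint16 = 0x000D
	algSM3256 uint16 = 0x0012
)

// isSHA256Bank compares the TPM algorithm ID directly, with no conversion
// to a Go crypto.Hash that could fail.
func isSHA256Bank(alg uint16) bool {
	return alg == algSHA256
}

// selectSHA256Bank returns the index of the SHA-256 bank among the
// reported PCR banks, ignoring any other algorithms.
func selectSHA256Bank(banks []uint16) (int, bool) {
	for i, alg := range banks {
		if isSHA256Bank(alg) {
			return i, true
		}
	}

	return -1, false
}
```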

Fixes: #7810

Signed-off-by: Thomas Way <thomas@6f.io>
Signed-off-by: Noel Georgi <git@frezbo.dev>
2023-10-04 01:02:20 +05:30