Commit Graph

686 Commits

Author SHA1 Message Date
Andrey Smirnov
f20a6900db
fix: json logging panic
Fixes #9466

There are two fixes:

* fix the actual panic via https://github.com/siderolabs/go-circular/pull/5
* prevent similar issues in the future by installing a panic handler
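
As a rough illustration of the second point, a panic handler can be sketched as a wrapper that recovers and reports the panic as an error. This is only a sketch under assumptions: the `safely` helper and its signature are hypothetical and are not taken from the actual change.

```go
package main

import (
	"fmt"
)

// safely runs fn and converts a panic inside it into an error,
// so a bug in one code path does not take down the whole process.
// The name and signature are hypothetical, for illustration only.
func safely(fn func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered from panic: %v", r)
		}
	}()

	return fn()
}

func main() {
	err := safely(func() error {
		var buf []byte

		_ = buf[3] // index out of range on an empty slice: panics at runtime

		return nil
	})

	fmt.Println("error:", err)
}
```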

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 5853bb0ea4d6a65635086bdef617d6d0800cabd0)
2024-10-25 18:28:36 +04:00
Andrey Smirnov
9f62fe96ce
feat: update pkgs and Kubernetes
Linux: 6.6.58
containerd: 2.0.0-rc.6
runc: 1.2.0
Kubernetes: 1.31.2

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-10-25 18:20:16 +04:00
Andrey Smirnov
4d279c65f4
fix: volume encryption with failing keyslots
Fix the flow in which a failing key slot leads to repeated attempts to
open the volume even though it is already open and the actual failure
was in syncing other keys.

Refactor the code to get rid of variable assignment in the outer block
from closures.

Fixes #9415

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 0a2b4556c55eda27536ee563f60bcf5d69379479)
2024-10-08 14:49:32 +04:00
Andrey Smirnov
070defad15
fix: update grpc-go to the latest patch release
See https://github.com/grpc/grpc-go/releases/tag/v1.66.3

Specifically, it fixes stream failures; I wonder if those are what makes
the support script flaky.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 6affbd3182ebe0209ed5433c534062b7ad672b6a)
2024-10-08 14:49:08 +04:00
Andrey Smirnov
5f4515f306
fix: prevent file descriptors leaks to child processes
See #9412

I'll keep the issue open to track upstream PR status and remove replace
directives.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit a341bdb0640294a07939670919c56cbfa7a861c4)
2024-10-08 14:48:19 +04:00
Andrey Smirnov
01e580bddb
feat: update Go 1.22.8, Linux, pkgs
Bring in latest tools and packages.

Linux 6.6.54

containerd v2.0.0-rc.5

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-10-08 14:48:10 +04:00
Andrey Smirnov
8fb2f24b4a
fix: update blockdevice library to v2.0.2
Fixes #9350

Actual fix is https://github.com/siderolabs/go-blockdevice/pull/111

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 9b77698cf2ff64c6f6d198d05c2012ab7fa858be)
2024-09-23 15:14:38 +04:00
Andrey Smirnov
920d8c8297
fix: audit and fix cgroup reservations
Fixes: #7081

Review all reservations and limits set, test under stress load (using
both memory and CPU).

The goal: system components (Talos itself) and runtime (kubelet, CRI)
should survive under extreme resource starvation (workloads consuming
all CPU/memory).

Uses #9337 to visualize changes, but doesn't depend on it.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 6b15ca19cd1291b8a245d72d5153827945cad037)
2024-09-21 13:26:47 +04:00
Andrey Smirnov
bd91675121
test: add a test for inline machine config trusted roots
Run SideroLink API server via TLS with self-signed certificate, inject
that certificate into Talos via `talos.config.inline=`.

Fix a couple of places where our special TLS root CA provider, which
supports reloading on the fly, was not used.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 8d6884a8e28e1bfa29f9a479e0f7179819cf70cd)
2024-09-13 12:53:04 +04:00
Andrey Smirnov
073ba25855
feat: update default Kubernetes version to 1.31.1
See https://github.com/kubernetes/kubernetes/releases/tag/v1.31.1

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 869f8379f2317175901e8cb3deec4b800e7ab603)
2024-09-13 12:48:57 +04:00
Andrey Smirnov
5eb5ff532d
feat: update etcd to 3.5.16
See https://github.com/etcd-io/etcd/releases/tag/v3.5.16

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit 5c6277d171eea58878ce4fcb4d2fdb7154333ae7)
2024-09-13 12:41:35 +04:00
Noel Georgi
361283401e
chore: version specific kube-scheduler health checks
Use K8s version specific kube-scheduler health checks.

Ref: https://github.com/siderolabs/go-kubernetes/pull/17

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-09-06 19:47:47 +05:30
Andrey Smirnov
bcaf63628b
feat: update dependencies
Update to final tools, pkgs, extras.

Bump Go dependencies.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-09-06 15:51:05 +04:00
Andrey Smirnov
b453385bd9
feat: support volume configuration, provisioning, etc
This implements the first round of changes, replacing the volume backend
with the new implementation, while keeping most of the external
interfaces intact.

See #8367

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-08-30 18:32:34 +04:00
Andrey Smirnov
be2ebf6b4d
chore: bump dependencies
Update tools, pkgs, extras, Go dependencies, Go tools, etc.

Linux 6.6.47 and containerd 2.0.0-rc.4.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-08-29 20:44:37 +04:00
Dmitry Sharshakov
4834a61a8e
feat: report SELinux labels
This will be useful for debugging the SELinux implementation. Also make the API report other xattrs to support further development, such as IMA/EVM.

Signed-off-by: Dmitry Sharshakov <dmitry.sharshakov@siderolabs.com>
2024-08-26 16:19:38 +03:00
Andrey Smirnov
5b4b64979e
fix: bump go-smbios for broken SMBIOS tables
See https://github.com/siderolabs/go-smbios/issues/16

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-08-23 16:01:34 +04:00
Andrey Smirnov
9e348ef350
feat: update Kubernetes to 1.31.0
See https://github.com/kubernetes/kubernetes/releases/tag/v1.31.0

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-08-14 15:56:11 +04:00
Andrey Smirnov
eba5dafb9e
fix: add dns-resolve-cache to the support bundle
See https://github.com/siderolabs/go-talos-support/pull/4

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-08-12 22:06:16 +04:00
Dmitriy Matrenichev
beb9602e35
chore: bump github.com/docker/docker to v27.1.1+incompatible
Security fix (we are not affected).

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-08-12 10:47:18 +03:00
Dmitriy Matrenichev
622d66a98f
chore: bump deps
Bump stuff

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-08-09 11:59:03 +03:00
Andrey Smirnov
c9f1dece5d
feat: update Kubernetes to 1.31.0-rc.1
See https://github.com/kubernetes/kubernetes/releases/tag/v1.31.0-rc.1

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-08-06 19:20:13 +04:00
Andrey Smirnov
e02bd20933
feat: update Kubernetes to 1.31.0-rc.0
Also bump PKGS to the latest.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-08-05 17:18:02 +04:00
Andrey Smirnov
19aa44c549
fix: generate kubeconfig using proper types
Generating YAML using text templates is going to stop working because of
proper escaping.

Also fix an unrelated issue with the `cloud.google.com/go` module, which
got split into submodules that now conflict with each other.

Fixes #7180

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-29 22:03:29 +04:00
Noel Georgi
3ce5492f85
feat: runc memfd-bind service
Add a `runc-memfd-bind` service so that runc binary is not copied for
every `runc` invocation.

Fixes: #9007.

Signed-off-by: Noel Georgi <git@frezbo.dev>
2024-07-29 19:02:59 +05:30
Andrey Smirnov
b333ec07d9
feat: update etcd to 3.5.15, Flannel to 0.25.5
* https://github.com/flannel-io/flannel/releases/tag/v0.25.5
* https://github.com/etcd-io/etcd/releases/tag/v3.5.15

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-23 20:00:25 +04:00
Andrey Smirnov
407347a7a0
feat: update Kubernetes to 1.31.0-beta.0
See https://github.com/kubernetes/kubernetes/releases/tag/v1.31.0-beta.0

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-17 14:41:14 +04:00
Andrey Smirnov
c288ace7b1
fix: be smarter when merging DNS resolver config
Fixes #8690

Consider the following scenario (e.g. OpenStack): platform issues a
correct list of DNS servers, which includes both IPv4 and IPv6
resolvers, and configures DHCPv4 on the interface.

DHCPv4 returns a set of IPv4 resolvers (as it can't return IPv6 ones),
and this list completely overrides the list from the platform, wiping
out the IPv6 resolvers completely.

With this change, the merge process is smarter: for example, it
preserves IPv6 resolvers if the next layer provides no resolvers for
IPv6.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-16 21:20:27 +04:00
Dmitriy Matrenichev
fbde9c556f
chore: bump deps
Bump github.com/siderolabs/grpc-proxy to v0.4.1 and replace deprecated calls to `grpc.CustomCodec`.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-07-09 20:01:13 +03:00
Andrey Smirnov
3bab15214d
feat: update Kubernetes to 1.31.0-alpha.3
Fixes #8911

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-09 17:49:06 +04:00
Dmitriy Matrenichev
076f3c4f20
chore: improve link spec controller code
`SortBonds` function bothered me since the last time I refactored this part.

We always know that it only accepts `network.LinkSpec`s, but we accepted a slice of untyped Resources because
this is what the `List` method returns. Now we can do better, since `safe.List` now supports the `Swap` method.

We can utilize `sort.Interface` and pass `safe.List` directly to `SortBonds`.
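
The idea can be sketched roughly as follows. Everything here is an assumption for illustration: `linkSpec` and `typedList` are simplified stand-ins for `network.LinkSpec` and `safe.List` (whose real APIs are not reproduced), and the ordering in `Less` is made up.

```go
package main

import (
	"fmt"
	"sort"
)

// linkSpec is a hypothetical, heavily simplified stand-in for network.LinkSpec.
type linkSpec struct {
	Name       string
	BondMaster string // non-empty if the link is enslaved to a bond
}

// typedList is a hypothetical stand-in for a typed resource list
// that already provides Len, Get and Swap.
type typedList[T any] struct{ items []T }

func (l typedList[T]) Len() int      { return len(l.items) }
func (l typedList[T]) Get(i int) T   { return l.items[i] }
func (l typedList[T]) Swap(i, j int) { l.items[i], l.items[j] = l.items[j], l.items[i] }

// byBond completes sort.Interface by adding Less on top of the typed list.
type byBond struct{ typedList[linkSpec] }

// Less orders links by bond master first, then by name (assumed ordering).
func (b byBond) Less(i, j int) bool {
	x, y := b.Get(i), b.Get(j)
	if x.BondMaster != y.BondMaster {
		return x.BondMaster < y.BondMaster
	}

	return x.Name < y.Name
}

// sortBonds sorts the typed list in place, without converting to untyped resources.
func sortBonds(l typedList[linkSpec]) { sort.Sort(byBond{l}) }

func main() {
	l := typedList[linkSpec]{items: []linkSpec{
		{Name: "eth1", BondMaster: "bond0"},
		{Name: "eth0", BondMaster: "bond0"},
		{Name: "lo"},
	}}

	sortBonds(l)

	for i := 0; i < l.Len(); i++ {
		fmt.Println(l.Get(i).Name)
	}
}
```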

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-07-05 16:39:27 +03:00
Andrey Smirnov
3d35e54683
chore: update hydrophone library
My PR https://github.com/kubernetes-sigs/hydrophone/pull/198 got merged
upstream, so drop local workaround.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-05 14:42:47 +04:00
Andrey Smirnov
52611a90d8
feat: update Kubernetes to v1.30.2
See https://github.com/kubernetes/kubernetes/releases/tag/v1.30.2

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-02 15:54:34 +04:00
Andrey Smirnov
b4c871e4b7
chore: bump dependencies
Update Go modules and other dependencies.

Fix linting of the Dockerfile.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-07-02 14:46:51 +04:00
Andrey Smirnov
6f6a5d1057
chore: upgrade to rtnetlink/v2 library
The v1 version is no longer supported.

The major change is the decoding of link data, but we're not using it,
as we have had our own decoders/encoders for a long time.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-13 19:20:48 +04:00
Andrey Smirnov
1fb8453c2d
chore: update Go modules
Azure SDK has a CVE, bump other modules.

Update `hydrophone` with my fixes which got merged upstream.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-12 15:56:58 +04:00
Andrey Smirnov
7fcb521a6a
feat: use hydrophone instead of sonobuoy
Fixes #8790

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-11 16:51:45 +04:00
Dmitriy Matrenichev
c6f90d0149
chore: replace sync.Map with concurrent.HashTrieMap
Also bump `cosi-project/runtime` to v0.4.4

Closes #8851

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-06-10 20:45:47 +03:00
Andrey Smirnov
7cbdce73f7
fix: detect CD devices, fix user disks wipe test
Detect CD devices, and set size to 0 for CD without media.

In user disk wipe tests, skip device mapper devices and CD-ROM.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-10 18:00:06 +04:00
Dmitriy Matrenichev
26cf566dc8
chore: bump our coredns fork
Update from github.com/coredns/coredns v1.11.2 to v1.11.3 and apply our changes.

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-06-07 22:18:28 +03:00
Andrey Smirnov
f07b79f4a8
feat: provide disk detection based on new blockdevices
Uses siderolabs/go-blockdevice/v2 for all the hard parts, and
provides a new resource `Disk` which describes all disks in the system.

An additional resource `SystemDisk` always points to the system disk
(based on the location of the `META` partition).

The `Disks` API (and `talosctl disks`) is now a view into the same data
as `talosctl get disks`, kept for backwards compatibility.

The QEMU provisioner can now create extra disks of various types (IDE,
AHCI, SCSI, NVMe), which allows detection to be tested properly.

The new resource will be the foundation for volume provisioning (to pick
up the disk to provision the volume on).

Example:

```
talosctl -n 172.20.0.5 get disks
NODE         NAMESPACE   TYPE   ID        VERSION   SIZE          READ ONLY   TRANSPORT   ROTATIONAL   WWID                                                               MODEL            SERIAL
172.20.0.5   runtime     Disk   loop0     1         65568768      true
172.20.0.5   runtime     Disk   nvme0n1   1         10485760000   false       nvme                     nvme.1b36-6465616462656566-51454d55204e564d65204374726c-00000001   QEMU NVMe Ctrl   deadbeef
172.20.0.5   runtime     Disk   sda       1         10485760000   false       virtio      true                                                                            QEMU HARDDISK
172.20.0.5   runtime     Disk   sdb       1         10485760000   false       sata        true         t10.ATA     QEMU HARDDISK                           QM00013        QEMU HARDDISK
172.20.0.5   runtime     Disk   sdc       1         10485760000   false       sata        true         t10.ATA     QEMU HARDDISK                           QM00001        QEMU HARDDISK
172.20.0.5   runtime     Disk   vda       1         12884901888   false       virtio      true
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-07 20:18:32 +04:00
Andrey Smirnov
7c9a14383e
fix: volume discovery improvements
Use shared locks, discover more partitions, some other small changes.

Re-enable the flaky test.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-06 19:45:40 +04:00
Andrey Smirnov
41f92e0ba4
chore: update Go to 1.22.4, other updates
Bump go modules, adjust the code.

New linter warnings.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-06-05 20:59:52 +04:00
Andrey Smirnov
4dd0aa7120
feat: implement PCI device bus enumeration
Fixes #8826

From the QEMU VM:

```shell
$ talosctl -n 172.20.0.5 get pcidevice
NODE         NAMESPACE   TYPE        ID             VERSION   CLASS                       SUBCLASS                    VENDOR              PRODUCT
172.20.0.5   hardware    PCIDevice   0000:00:00.0   1         Bridge                      Host bridge                 Intel Corporation   82G33/G31/P35/P31 Express DRAM Controller
172.20.0.5   hardware    PCIDevice   0000:00:01.0   1         Display controller          VGA compatible controller
172.20.0.5   hardware    PCIDevice   0000:00:02.0   1         Network controller          Ethernet controller         Red Hat, Inc.       Virtio network device
172.20.0.5   hardware    PCIDevice   0000:00:03.0   1         Unclassified device                                     Red Hat, Inc.       Virtio RNG
172.20.0.5   hardware    PCIDevice   0000:00:04.0   1         Unclassified device                                     Red Hat, Inc.       Virtio memory balloon
172.20.0.5   hardware    PCIDevice   0000:00:05.0   1         Communication controller    Communication controller    Red Hat, Inc.       Virtio console
172.20.0.5   hardware    PCIDevice   0000:00:06.0   1         Generic system peripheral   System peripheral           Intel Corporation   6300ESB Watchdog Timer
172.20.0.5   hardware    PCIDevice   0000:00:07.0   1         Mass storage controller     SCSI storage controller     Red Hat, Inc.       Virtio block device
172.20.0.5   hardware    PCIDevice   0000:00:1f.0   1         Bridge                      ISA bridge                  Intel Corporation   82801IB (ICH9) LPC Interface Controller
172.20.0.5   hardware    PCIDevice   0000:00:1f.2   1         Mass storage controller     SATA controller             Intel Corporation   82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode]
172.20.0.5   hardware    PCIDevice   0000:00:1f.3   1         Serial bus controller       SMBus                       Intel Corporation   82801I (ICH9 Family) SMBus Controller
```

```yaml
node: 172.20.0.5
metadata:
    namespace: hardware
    type: PCIDevices.hardware.talos.dev
    id: 0000:00:1f.3
    version: 1
    owner: hardware.PCIDevicesController
    phase: running
    created: 2024-05-30T12:09:05Z
    updated: 2024-05-30T12:09:05Z
spec:
    class: Serial bus controller
    subclass: SMBus
    vendor: Intel Corporation
    product: 82801I (ICH9 Family) SMBus Controller
    class_id: "0x0c"
    subclass_id: "0x05"
    vendor_id: "0x8086"
    product_id: "0x2930"
```

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-31 20:56:16 +04:00
Dmitriy Matrenichev
3367ded9fe
fix: correct time adjustment in time.SyncController
github.com/beevik/ntp v1.4.2 contains an overflow bug [^1]. v1.4.3 fixes it, so let's use it.

Closes #8828

[^1]: 235effe749

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-05-30 14:22:17 +03:00
Andrey Smirnov
0359c8537c
chore: unify toml packages being used
Drop the BurntSushi one, and use /v2 of the pelletier package.
There is an indirect use of v1 which should hopefully go away once we
move away from sonobuoy.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-29 21:22:56 +04:00
Andrey Smirnov
c2b19dcb97
chore: move to containerd 2.0 API
Lots of module moves/renames.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-24 21:48:55 +04:00
Andrey Smirnov
01ea82053e
fix: time sync over NTP from future era
Logs:

```
[    7.127481] [talos] adjusting time (jump) by -205704h26m36.111961385s via 162.159.200.1, state TIME_OK, status STA_NANO {"component": "controller-runtime", "controller":t}
```

Fix: https://github.com/beevik/ntp/pull/47

Fixes: #8771

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-24 14:49:27 +04:00
Dmitriy Matrenichev
e7bd9cd2bb
fix: decrease maximum negative ttl for dns responses
The maximum negative ttl (ttl for non-existent domain responses) was set to 1 hour, which is
too long. This PR decreases the maximum negative ttl to 10 seconds.

Also update CoreDNS module while we are at it.

Closes #8631

Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
2024-05-21 23:20:42 +03:00
Andrey Smirnov
d4307043ff
fix: update go-tail library to fix 'short read' error
See https://github.com/siderolabs/go-tail/pull/2

It seems to pop up more with compressed logs, but it makes sense to fix
regardless.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
2024-05-20 20:44:43 +04:00