This is a follow-up fix to PR #5129.
1. Correctly catch only expected errors in the tests.
2. Rewind the snapshot each time the upload is retried.
3. Correctly unwrap errors in the `EtcdRecovery` client.
4. Update the `grpc-proxy` library to pass through the EOF error.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Default `--type` is `devices`, so trigger explicitly on both `devices`
and `subsystems` and use `add` action to mock initial events better.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This fixes the case where udev rules are first created in the machine
config and then removed from the config.
As the file is on the overlayfs, it persists over reboots, so we need to
write it every time we boot Talos.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Some failures can be fixed by updating the machine configuration.
Now failures in `userDisks` and `userFiles` no longer make Talos enter a
reboot loop; instead, Talos pauses for 35 minutes.
Additionally, `apid` and `machined` are now started right after
containerd is up and running.
That makes it possible for the operator to connect to the node using
talosctl and fix the config.
Fixes: https://github.com/talos-systems/talos/issues/4669
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
They should cause no harm, as every extension is an image on its own, so
hardlinks are only possible between files within a single image.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
I'm not sure how I haven't noticed that before, but it is easily
reproducible with a virtual IP moving between the nodes: Talos incorrectly
assumes that pod IPs might be valid kubelet node IPs, and this might
lead to unexpected results if the kubelet node IP is picked from within
the pod CIDR.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
The problem is that Virtual IP operator configuration might require
accessing platform metadata server (e.g. on Equinix Metal), while
regular operator sets up critical operators like DHCP.
The issue observed on Equinix Metal without the split:
* on initial boot, DHCP is set up on `eth2`
* platform network configuration is fetched and `bond0` configuration is
created
* node IP is assigned both to `eth2` and `bond0`, while `eth2` is a
slave to `bond0`
* networking is broken
* operator config controller is stuck trying to fetch EM VIP
configuration: as the network is broken, it fails to do so, but retries
for 3 minutes (in `download.Download`)
* network is broken for 3 minutes until `OperatorConfig` controller is
unblocked and cleans up DHCP operator for `eth2` as it should
The issue here is that DHCP operator setup is much more tricky on one
hand (depends on link status, other configuration items, etc.), while
VIP operator depends on DHCP operator setup, as it needs outbound
networking.
By splitting the controllers, we split the flows and remove
dependencies.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
We should skip the checks on container platforms, as Talos has no way to
enforce conditions on the host kernel.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Add a mock D-Bus daemon and a mock logind implementation over D-Bus.
Kubelet gets a handle to the D-Bus socket, connects over it to our
logind mock and negotiates shutdown activities.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This is more of a band-aid than a real fix. As this should be
backported to `release-1.0`, I tried to avoid making big changes.
The race condition: controller correctly watches network state and
issues etcd certs as needed, but the service `etcd` writes down PKI
files from the resource just once early on startup. In this case there's
a chance that wrong PKI gets written to disk leaving etcd with
incomplete certs.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
With system extensions, size of the `initramfs` might increase
significantly. With 1000 MiB `/boot`, as we store `A` and `B` boot
directories, we have 500 MiB for each Talos boot (size of the kernel and
initramfs).
Fixes #5096
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
When IPv6 is disabled entirely, we should not try to set `accept_ra`,
since it does not exist.
This performs a check before adding the default kernel parameter.
Fixes #5087
Signed-off-by: Seán C McCord <ulexus@gmail.com>
They were discovered when we tagged version 1.0.0:
* wrong deprecated version
* incompatibility in extension compatibility checks
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes #4947
It turns out there's something related to boot process in BIOS mode
which leads to initramfs corruption on later `kexec`.
Booting via GRUB is always successful.
Problem with kexec was confirmed with:
* direct boot via QEMU
* QEMU boot via iPXE (bundled with QEMU)
The root cause is not known, but the only visible difference is the
placement of RAMDISK with UEFI and BIOS boots:
```
[ 0.005508] RAMDISK: [mem 0x312dd000-0x34965fff]
```
or:
```
[ 0.003821] RAMDISK: [mem 0x711aa000-0x747a7fff]
```
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Set memory/cpu resource reservation for system processes.
This helps system processes allocate memory when the node is under
memory pressure.
Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Set default route to metaserver, which exists only on eth0 interface.
Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes #5003
This implements a way to configure API server admission plugins via
Talos machine configuration.
If Pod Security admission is enabled, default cluster-wide policy is
generated which enforces baseline policy.
Policy can be overridden per-namespace.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes #4694
User services run alongside Talos system services.
Every user service container root filesystem should be already present
in the Talos root filesystem.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This feature allows us to use a low source port (<1024) to make HTTP calls.
Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Newest version of github.com/mdlayher/arp backed by the improved
https://github.com/mdlayher/packet package. There's no stable release
of arp yet but I'd like to get back around to that now that I'm stabilizing underlying pieces.
Signed-off-by: Matt Layher <mdlayher@gmail.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Instead of bundling the apiserver audit logs with the rest of the
apiserver logs, we should store them separately to file, assuring
reasonable defaults for retention and rotation.
Fixes #5000
Signed-off-by: Seán C McCord <ulexus@gmail.com>
This solves a case when lower layer (platform) defines `bond0` as
logical interface properly, and upper layer (configuration) defines only
some part of the config (e.g. VIP).
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
netaddr.Netmask replaces the source IP with the network address of its subnet:
10.1.2.3/24 -> 10.1.2.0/24
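The masking effect can be reproduced with the standard library. Note that this sketch uses `net/netip` and its `Prefix.Masked` method rather than the `netaddr` package named above, and `networkOf` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/netip"
)

// networkOf parses a CIDR string and returns the prefix with all host bits
// cleared, which is the masking effect described above.
func networkOf(cidr string) (netip.Prefix, error) {
	p, err := netip.ParsePrefix(cidr)
	if err != nil {
		return netip.Prefix{}, err
	}

	return p.Masked(), nil
}

func main() {
	p, err := networkOf("10.1.2.3/24")
	if err != nil {
		fmt.Println("error:", err)

		return
	}

	fmt.Println(p) // prints "10.1.2.0/24"
}
```

Code that needs the original host address has to keep the unmasked prefix, since the host bits are lost after masking.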
Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
netaddr.Netmask replaces the source IP with the network address of its subnet:
10.1.2.3/24 -> 10.1.2.0/24
Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes #4727
On worker nodes, static pods are injected, but status can't be monitored
by Talos. On control plane nodes full status is available via
`StaticPodStatus`.
Pod definition is left as `Unstructured` in the machine configuration,
and no specific validation is performed to avoid pulling in Kubernetes
libraries into Talos machinery package.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This PR changes the most common tweaks:
* inotify is used to reload config files when they change
* `tcp_keepalive_*` helps to refresh the state of TCP connections
Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
netaddr.Netmask replaces the source IP with the network address of its subnet:
10.1.2.3/24 -> 10.1.2.0/24
Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes #4816
This changes the way system extensions are packaged into the squashfs
images: `/lib/firmware` is now moved out of the future squashfs images
and becomes part of `initramfs` to make firmware available in the early
boot.
Talos will also bind-mount `/lib/firmware` into the rootfs, so the
firmware stays available there as well.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
See #4947
The goal is to disable kexec temporarily to move on with the system
extensions, and to find the root cause and fix kexec before the next
release.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Enables `accept_ra = 2` when IPv6 forwarding is enabled.
When IPv6 forwarding is enabled, the default `accept_ra = 1` no longer
functions.
This is intentional on the part of the kernel developers, because routers
generally should not accept router advertisements (they supply their own).
However, in the case of a machine running Kubernetes, while IP
forwarding is enabled, the machine is still treated more as an end node
than a router.
It is common for a Kubernetes node to be configured via SLAAC and
therefore to expect to receive router advertisements, while at the same
time, IP forwarding must be enabled to handle container communication.
Fixes #3841
Signed-off-by: Seán C McCord <ulexus@gmail.com>
While we use properly generated certs, a client can (according to STIG
242379) downgrade to accepting self-signed certificates unless `auto-tls`
is explicitly disabled.
This patch sets `auto-tls` to `false`, preventing the downgrade.
Signed-off-by: Seán C McCord <ulexus@gmail.com>
Shutdown sequence was refactored to support draining and force mode, but
other invocations of the shutdown sequence haven't been updated.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
They were supported internally, but never properly exposed in the
machine configuration.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Cordon & drain a node when the Shutdown message is received.
Also adds a '--force' option to the shutdown command in case the control
plane is unresponsive.
Signed-off-by: Tim Jones <timniverse@gmail.com>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>