When a Talos `controlplane` node is waiting for bootstrap, the `etcd`
contents can be recovered from a snapshot created with
`talosctl etcd snapshot` on a healthy cluster.
The bootstrap process proceeds the same way as before, except that the
etcd data directory is recovered from the snapshot.
This flow enables disaster recovery for the control plane: given that
periodic backups are available, destroy control plane nodes, re-create
them with the same config, and bootstrap one node with the saved
snapshot to recover etcd state at the time of the snapshot.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This PR pulls in an updated cluster api aws version, ensuring the CRDs
are closer to what's expected when we patch the CAPA image later in the
setup. We will eventually move to 0.6.5 as soon as it's cut.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
When copy-pasting, extra whitespace might end up around an argument to
`talosctl config endpoints/nodes`, which breaks the config because the
endpoint no longer parses as an IP address.
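A minimal sketch of the idea, not the actual `talosctl` code: the helper
name `normalizeEndpoints` and the sample address are hypothetical. It
shows that an address padded with whitespace is rejected by
`net.ParseIP`, while the trimmed value parses.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// normalizeEndpoints is a hypothetical helper (an assumption for this
// sketch): it trims whitespace around each argument before it is stored.
func normalizeEndpoints(args []string) []string {
	out := make([]string, 0, len(args))
	for _, arg := range args {
		out = append(out, strings.TrimSpace(arg))
	}
	return out
}

func main() {
	raw := " 172.20.0.2 " // sample value, as it might arrive from a copy-paste

	// The padded value does not parse as an IP address.
	fmt.Println(net.ParseIP(raw) != nil)

	// After trimming, the same value parses.
	for _, ep := range normalizeEndpoints([]string{raw}) {
		fmt.Println(net.ParseIP(ep) != nil)
	}
}
```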
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This is mostly refactoring to adapt to the new APIs.
There are some small changes which are not user-visible immediately (but
visible when using `talosctl get` to inspect low-level details):
* `extras` namespace is removed, it was a hack to distinguish extra and
system manifests
* `Manifests` are managed by two controllers as shared outputs, stored
in the `controlplane` namespace now
* `talosctl inspect dependencies` output got slightly changed
* resources now have `md.owner` set to the controller name which manages
the resource
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
It is not obvious whether we actually need this fix, but the failures
I've seen in the tests seem to come from the added member not being
visible in the member list fetched after the add command succeeds.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Overlay mounts in `mountinfo` don't show up as mounts for any particular
block device, so the existing check doesn't catch them.
This was discovered because our current master can't upgrade due to the
overlay mount for `/opt` and the `apid` image in `/opt/apid` (which will
be fixed in a separate PR).
Without the check, the installer fails while resetting the partition
table for the disk, effectively wiping the node (`device or resource
busy` error).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This moves things around a bit so that `go generate` is called after
modules are generated, as `go generate` downloads modules as well.
This fixes a race condition which might show up randomly.
Spotted by: @AlekSi
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Fixes: https://github.com/talos-systems/talos/issues/3410
Works the same way as in `talosctl cluster create`: an RFC 6902 JSON
patch is applied during config generation if specified.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
This adds a simple API and a `talosctl etcd snapshot` command to stream
a snapshot of etcd from one of the control plane nodes to a local file.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
The container rootfs should be writable, as containerd mounts standard
filesystems (`/proc` et al.).
When `/opt` was used as the root of the container filesystem, this
resulted in a problem: Talos overlay-mounts `/opt` on `/var/system`,
which means that as long as `apid` is running, `/var` can't be
unmounted, which breaks upgrades.
So instead use `/system/libexec` as the rootfs for the containers
(`/system` is `tmpfs`), and bind-mount the actual executable
(`/sbin/init`, machined) into the rootfs.
This fixes upgrades for 0.10.
See also #3425
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Version 0.9.1 contains a fix for concurrent map write on unmount which
was frequently breaking our upgrade tests.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This extracts the function which was used in the upgrade/convert flows
to retry transient errors into the main `kubernetes` package and expands
it to ignore timeout errors. It is now used to retry errors where
applicable in `pkg/kubernetes`.
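A rough sketch of what such a retry helper could look like. The names
`isTransient`, `retryTransient`, and `timeoutErr` are hypothetical, and
the classification (EOF plus timeouts) is an assumption for
illustration rather than the actual `pkg/kubernetes` logic. A real
helper would likely also wait between attempts.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net"
)

// timeoutErr is a stand-in error for the demo; it satisfies net.Error.
type timeoutErr struct{}

func (timeoutErr) Error() string   { return "i/o timeout" }
func (timeoutErr) Timeout() bool   { return true }
func (timeoutErr) Temporary() bool { return true }

// isTransient reports whether err looks worth retrying. Treating EOF and
// timeouts as transient is an assumption made for this sketch.
func isTransient(err error) bool {
	if err == nil {
		return false
	}
	if errors.Is(err, io.EOF) || errors.Is(err, io.ErrUnexpectedEOF) {
		return true
	}
	var netErr net.Error
	return errors.As(err, &netErr) && netErr.Timeout()
}

// retryTransient calls fn up to attempts times. It returns immediately on
// success or on a non-transient error, and retries transient ones.
func retryTransient(attempts int, fn func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil || !isTransient(err) {
			return err
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	err := retryTransient(5, func() error {
		calls++
		if calls < 3 {
			return timeoutErr{} // fail the first two calls with a timeout
		}
		return nil
	})
	fmt.Println(calls, err)
}
```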
Fixes #3403
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
The way processing works is that errors are not printed in the
sequencer; instead, whatever called the sequencer prints the error.
This means that for fatal failures, say in the 'upgrade' sequence, the
error message is printed by machined after `apid` is stopped, so the
error isn't visible via `talosctl dmesg`, only on the serial console.
This changes the flow to print the task error as soon as the task fails,
and removes 'done' messages in the sequencer if a sequence/phase/task
fails (otherwise it has both 'done' and 'failed' messages, which is
confusing).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
During the conversion process, the API server goes down, so we can see
lots of network errors, including EOF.
Fixes #3404
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This provides binary compatibility for really old binaries using 32-bit
time.
See also: talos-systems/pkgs#259
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This provides a variable to build core Talos components with race
detector enabled: `make initramfs WITH_RACE=yes`.
Also refactored and DRYed up the build code exposing common build/link
flags via the Makefile.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Tests for the ApplyConfig API were relying on the not really supported
behavior of modifying config via the `Provider` interface (and it was
"fixed" in another PR which cleans up such access to the configuration).
Cluster version bumped to try to work around strange CAPI bootstrap
failures in e2e-capi.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
- Table row selection was off by one element, so the disk selector
wasn't quite working.
- Reduce the number of interfaces on the last screen: show only the ones
that have physical addresses (changing some settings for lo0, for
example, was making the TUI generate incorrect configs)
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
This avoids data race on config access: config object might be accessed
concurrently and it should be read-only on access.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Fixes: https://github.com/talos-systems/talos/issues/3384
Instead of adding a simple `--no-comments` flag, decided to use a more
granular approach which allows disabling examples, docstrings, or both.
Thus the command looks like this:
```bash
talosctl gen config --with-docs=false --with-examples=false <...>
```
Both are enabled by default to provide better UX for users learning
Talos.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
The containerd API for passing stdin to the container is far from
perfect, and it seems to contain a race condition we can't avoid: if
`NewTask()` fails, it starts the I/O loop in a goroutine, but never
stops it. We can't stop it either, as `NewTask()` failed, so to work
around this failure, copy the stdin into a new reader on each access.
This copying shouldn't be a big deal for us, as it's just machine
configuration and it's tiny.
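A minimal sketch of the "new reader on each access" idea, under the
assumption that the stdin payload fits in memory. The `stdinSource` type
and its methods are hypothetical names, not the actual Talos or
containerd API. Each call hands out an independent reader, so a leaked
I/O goroutine draining an old reader can't affect the next attempt.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// stdinSource is a hypothetical type for this sketch: it keeps the stdin
// payload in memory and hands out independent readers over it.
type stdinSource struct {
	data []byte
}

// newStdinSource copies the payload so later changes to the caller's
// slice don't leak into the readers.
func newStdinSource(data []byte) *stdinSource {
	return &stdinSource{data: append([]byte(nil), data...)}
}

// Reader returns a fresh reader positioned at the start of the payload.
func (s *stdinSource) Reader() io.Reader {
	return bytes.NewReader(s.data)
}

func main() {
	src := newStdinSource([]byte("version: v1alpha1\n"))

	// Two attempts each get the full payload, regardless of how much
	// the previous reader consumed.
	for attempt := 1; attempt <= 2; attempt++ {
		payload, err := io.ReadAll(src.Reader())
		if err != nil {
			fmt.Println("read failed:", err)
			return
		}
		fmt.Printf("attempt %d: %d bytes\n", attempt, len(payload))
	}
}
```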
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This is a complete rewrite of the time sync process.
Now the time sync process starts early at boot time, and it adapts to
configuration changes:
* before config is available, `pool.ntp.org` is used
* once config is available, configured time servers are used
The controller updates the same time sync resource that other
controllers already depend on, so they have a chance to wait for the
time sync event.
Talos services which depend on time now wait on the same resource
instead of waiting on timed health.
New features:
* time sync now sticks to a particular time server and switches to
another one only when that server returns an error; this improves time
sync accuracy
* time sync acts on config changes immediately, so it's possible to
reconfigure time sync at any time
* there's a new 'epoch' field in time sync resources which allows
time-dependent controllers to regenerate certs when there's a big enough
jump in time
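The "sticky server" behavior from the first item above could be sketched
roughly as follows. The `stickyPicker` type is hypothetical, and moving
to the next server in round-robin order is an assumption; the commit
message doesn't say how the replacement server is chosen. The sketch
also assumes a non-empty server list.

```go
package main

import "fmt"

// stickyPicker is a hypothetical type for this sketch. Round-robin
// rotation on failure is an assumption, not the documented behavior.
type stickyPicker struct {
	servers []string // must be non-empty
	current int
}

// Current returns the server in use; it stays the same until a failure
// is reported.
func (p *stickyPicker) Current() string {
	return p.servers[p.current]
}

// Failed records a failed query and moves on to the next server.
func (p *stickyPicker) Failed() {
	p.current = (p.current + 1) % len(p.servers)
}

func main() {
	// Server names are placeholders for the demo.
	p := &stickyPicker{servers: []string{"ntp-a.example", "ntp-b.example"}}

	fmt.Println(p.Current()) // successful queries keep using the same server
	fmt.Println(p.Current())

	p.Failed() // only an error triggers a switch
	fmt.Println(p.Current())
}
```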
Features to implement later:
* apid shouldn't depend on timed, it should be started early and it
should regenerate certs on time jump
* trustd should be updated in same way
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Follow-up for #3383
I added a couple of initial tests; we should add more as we go through
this code. Even with those tests, I found and fixed two more panics.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Fixes: https://github.com/talos-systems/talos/issues/3377, https://github.com/talos-systems/talos/issues/3380
Fixed the data race in the encoder documentation examples by using `sync.Once`.
We only need to generate them once anyways and then it's not a big deal
that we are using the same pointers everywhere as they're pretty much
constant.
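A minimal sketch of the `sync.Once` pattern described here. The
`example` type, `getExamples`, and the sample values are hypothetical
stand-ins, not the actual encoder code. The shared values are built
exactly once, and every caller gets the same pointers.

```go
package main

import (
	"fmt"
	"sync"
)

// example is a hypothetical stand-in for a documentation example value.
type example struct {
	Name  string
	Value string
}

var (
	examplesOnce sync.Once
	examples     []*example
	buildCount   int // counts how many times the builder ran, for the demo
)

// getExamples builds the shared examples exactly once, even when called
// concurrently, and returns the same pointers to every caller.
func getExamples() []*example {
	examplesOnce.Do(func() {
		buildCount++
		examples = []*example{{Name: "hostname", Value: "worker-1"}}
	})
	return examples
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = getExamples()
		}()
	}
	wg.Wait()

	fmt.Println(buildCount, getExamples()[0].Name)
}
```

Sharing pointers is only safe here because callers treat the examples as
read-only.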
As for `system.go`, it looks like we actually have concurrent partition
unmount operations, so I just added a mutex there.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
The `--remove-initialized-key` option is the last resort for converting
the control plane when it is down for whatever reason, so it should work
when the control plane is not available.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This PR updates our AWS docs so that we specify a tag when creating
instances. This makes it easier to know which VMs were created as part
of this process, as well as quickly spot the init node.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>