The extra disks functionality was completely broken. One fundamental
issue was that we were attempting to create and mount the partitions
before the system disk was created. This moves the extra disks tasks to
the correct part of the boot sequence. This also adds a simple check that
refuses to operate on a disk if any partitions are found.
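A minimal sketch of such a guard, assuming the caller has already listed the
partitions found on the disk. The name `ensureNoPartitions` and its signature
are hypothetical and are not taken from the actual change.

```go
package main

import "fmt"

// ensureNoPartitions is a hypothetical guard: it returns an error when the
// disk already carries partitions, so the caller never repartitions or
// formats a disk that is in use.
func ensureNoPartitions(disk string, partitions []string) error {
	if len(partitions) > 0 {
		return fmt.Errorf("refusing to operate on %s: found %d existing partition(s)", disk, len(partitions))
	}
	return nil
}

func main() {
	// An empty disk passes the check.
	fmt.Println(ensureNoPartitions("/dev/sdb", nil))
	// A disk with an existing partition is rejected.
	fmt.Println(ensureNoPartitions("/dev/sdc", []string{"/dev/sdc1"}))
}
```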
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This installs default middleware to recover from panics (converting them to
errors) in all the gRPC servers.
A slight refactoring was needed to allow that, since gRPC accepts unary and
stream interceptors only once.
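The idea of converting a panic into an error can be sketched without the gRPC
dependency. The `handler` type below is a hypothetical stand-in for a unary
RPC handler, and `withRecovery` is an illustrative name, not the middleware
used in the change.

```go
package main

import "fmt"

// handler is a hypothetical stand-in for a unary RPC handler.
type handler func(req string) (string, error)

// withRecovery wraps h so that a panic inside h is returned to the caller
// as an ordinary error instead of crashing the server.
func withRecovery(h handler) handler {
	return func(req string) (resp string, err error) {
		defer func() {
			if r := recover(); r != nil {
				resp = ""
				err = fmt.Errorf("recovered from panic: %v", r)
			}
		}()
		return h(req)
	}
}

func main() {
	h := withRecovery(func(req string) (string, error) {
		panic("boom")
	})
	_, err := h("ping")
	fmt.Println(err) // prints: recovered from panic: boom
}
```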
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This brings in the recent updates to protoc-gen-proxy to allow support
for proxying streaming api requests. We artificially limit it to only the first
target specified in the list while we work through what multi target stream
support looks like.
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
This PR adds in the necessary manifests and fixes to deploy AWS clusters
as part of e2e testing.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
This adds support for specifying nameservers in the config.
When I was adding tests I noticed the netconf code for setting
the MTU caused a panic. Given how we retrieve the data (device-centric)
in the static addressing method, I think this is safe to remove.
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
Also refactored `integration-test` build as a generic step to be shared
by basic-integration and e2e-integration steps.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This allows the config.Debug setting to control container output to allow better troubleshooting.
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
Fixes #1419
This is required to avoid later startup failures while trying to connect
to etcd if it hasn't actually bootstrapped.
This health check only verifies connectivity; it performs no quorum/leader
checks, as those depend on the cluster state in general.
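A connectivity-only check can be sketched as a plain TCP dial. This is an
illustration under assumptions: the function name `checkConnectivity` is
hypothetical, 127.0.0.1:2379 is etcd's conventional client address, and the
real health check may go through an etcd client instead.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// checkConnectivity succeeds if a TCP connection to addr can be opened
// within timeout. It deliberately does not look at quorum or leader state.
func checkConnectivity(addr string, timeout time.Duration) error {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return fmt.Errorf("etcd not reachable at %s: %w", addr, err)
	}
	return conn.Close()
}

func main() {
	// Assumption: etcd listens on its conventional client port locally.
	if err := checkConnectivity("127.0.0.1:2379", 2*time.Second); err != nil {
		fmt.Println("unhealthy:", err)
		return
	}
	fmt.Println("healthy")
}
```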
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Just a small nit: as all the services share the same package, a global
variable with a generic name might lead to fun collisions.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This fixes a long standing issue with upgrading the init node. We
currently have no way of knowing whether the init node should join an
existing etcd cluster, or create a new one. This makes use of the node's
metadata to determine if the node has already created the etcd cluster.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This introduces the notion of metadata for a node. In this initial pass
there are only two fields. A timestamp to indicate when the install was
performed, and a field to indicate if the install was performed as part
of an upgrade.
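The two fields could look roughly like the sketch below. The type name, field
names, and JSON encoding are assumptions made for illustration; the message
above only states that a timestamp and an upgrade indicator exist.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Metadata is a hypothetical shape for the node metadata described above.
type Metadata struct {
	// Timestamp records when the install was performed.
	Timestamp time.Time `json:"timestamp"`
	// Upgraded records whether the install ran as part of an upgrade.
	Upgraded bool `json:"upgraded"`
}

// newMetadata stamps the metadata with the current time in UTC.
func newMetadata(upgraded bool) Metadata {
	return Metadata{Timestamp: time.Now().UTC(), Upgraded: upgraded}
}

// encode serializes the metadata so it can be written to disk.
func encode(m Metadata) (string, error) {
	b, err := json.Marshal(m)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	s, err := encode(newMetadata(true))
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(s)
}
```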
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This starts with a very simple test for `osctl version`, using regexps because
the output of the command depends heavily on the current version.
We might rely more on 'golden' output matches for other commands.
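A regexp-based assertion might look like the following sketch. The sample
output format (a `Tag:` line carrying a semver-style tag) is an assumption
made for illustration and is not copied from real `osctl version` output.

```go
package main

import (
	"fmt"
	"regexp"
)

// tagLine matches a line such as "Tag: v0.3.0-alpha.1" without pinning the
// exact version. The "Tag:" label is an assumed output format.
var tagLine = regexp.MustCompile(`(?m)^\s*Tag:\s+v\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?\s*$`)

// looksLikeVersion reports whether out contains a plausible version tag line.
func looksLikeVersion(out string) bool {
	return tagLine.MatchString(out)
}

func main() {
	sample := "Client:\n\tTag:         v0.3.0-alpha.1\n" // hypothetical output
	fmt.Println(looksLikeVersion(sample))
}
```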
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This PR will re-enable e2e testing by using the new Cluster API
bootstrap provider and various infra providers.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
Since APId/gRPC connections should never go through a proxy, we will explicitly exclude
the proxy environment variables from apid.
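One way to drop proxy settings from an environment list is sketched below.
The exact variables excluded by the change are not listed in the message, so
the set used here (HTTP_PROXY, HTTPS_PROXY, NO_PROXY, matched
case-insensitively) is an assumption, as is the helper name.

```go
package main

import (
	"fmt"
	"strings"
)

// proxyVars holds the conventional proxy variable names (assumed set).
var proxyVars = map[string]bool{
	"HTTP_PROXY":  true,
	"HTTPS_PROXY": true,
	"NO_PROXY":    true,
}

// withoutProxyEnv returns env ("KEY=value" entries) with the proxy
// variables removed, leaving every other entry in its original order.
func withoutProxyEnv(env []string) []string {
	out := make([]string, 0, len(env))
	for _, kv := range env {
		name, _, _ := strings.Cut(kv, "=")
		if proxyVars[strings.ToUpper(name)] {
			continue
		}
		out = append(out, kv)
	}
	return out
}

func main() {
	env := []string{"PATH=/bin", "http_proxy=http://proxy:3128", "HTTPS_PROXY=http://proxy:3128"}
	fmt.Println(withoutProxyEnv(env)) // prints: [PATH=/bin]
}
```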
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
Without the host network namespace, networkd and ntpd didn't work properly. NTP failed to
start up because it couldn't reach the NTP servers, and networkd failed to configure
the interfaces and display interface information.
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
This is just the first step and the core foundation.
It can be used like:
```
make integration.test
osctl cluster create
build/integration.test -test.v
```
This should run the test against the Docker instance.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
The name `helper` isn't very good. This renames it to Client. A new func,
NewForConfig, was also added; it allows the creation of the client from an
arbitrary Kubernetes REST config.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This verifies that all etcd members are running before performing an
upgrade. Without this we run the risk of destroying the etcd cluster.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
We should use 127.0.0.1 only in special cases (like when bootstrapping
the cluster). There is the potential that the local etcd member is
unhealthy and/or not responsive. This adds a function for creating an etcd
client configured with all control plane node IPs in order to better
handle this case.
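Building the endpoint list for such a client can be sketched as follows. The
helper name is hypothetical and port 2379 is etcd's conventional client port;
the resulting slice is the kind of value one would hand to an etcd client
configuration as its endpoints.

```go
package main

import (
	"fmt"
	"net"
)

// etcdClientPort is etcd's conventional client port (an assumption here).
const etcdClientPort = "2379"

// endpointsFor turns control plane node IPs into host:port endpoints.
// net.JoinHostPort brackets IPv6 addresses as required.
func endpointsFor(ips []string) []string {
	eps := make([]string, 0, len(ips))
	for _, ip := range ips {
		eps = append(eps, net.JoinHostPort(ip, etcdClientPort))
	}
	return eps
}

func main() {
	fmt.Println(endpointsFor([]string{"10.0.0.1", "10.0.0.2", "fd00::1"}))
}
```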
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
We should add an etcd member only if it has not already been added. When
a control plane node is rebooted, or down for whatever reason, when it
comes back up it will attempt to add itself again. When it does so, the
cluster is unhealthy because the node was down. A feature
of etcd called "strict-reconfig-check" prevents any member adds when the
cluster is unhealthy since doing so would cause the cluster to lose
quorum.
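The "add only if absent" rule can be sketched against a plain member list.
The `member` type and `needsAdd` helper are hypothetical; a real
implementation would obtain the list from the etcd cluster API before
deciding.

```go
package main

import "fmt"

// member is a hypothetical, trimmed-down view of an etcd cluster member.
type member struct {
	Name    string
	PeerURL string
}

// needsAdd reports whether a node with the given peer URL still has to be
// added, i.e. it is not already part of the member list.
func needsAdd(members []member, peerURL string) bool {
	for _, m := range members {
		if m.PeerURL == peerURL {
			return false
		}
	}
	return true
}

func main() {
	members := []member{{Name: "cp-1", PeerURL: "https://10.0.0.1:2380"}}
	// A rebooted node that is already a member must not add itself again.
	fmt.Println(needsAdd(members, "https://10.0.0.1:2380")) // prints: false
	fmt.Println(needsAdd(members, "https://10.0.0.2:2380")) // prints: true
}
```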
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This moves the Kubeconfig api endpoint to machined and consolidates the
"read a file" code into machined. This also changes Kubeconfig to
use the CopyOut method, which makes Kubeconfig a streaming gRPC call.
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
We need to stop etcd earlier in the upgrade sequence to prevent machined
from trying to restart it after leaving the etcd cluster. We also need
to remove the data-dir since all the data becomes invalid once we leave
the etcd cluster.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
Using the CRI seems to be more dependable in ensuring that we don't hit
EBUSY when trying to reset the system disk after stopping all
containers.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This adds an extra phase to the upgrade sequence that ensures we don't hit
EBUSY when attempting to delete the ephemeral partition. This is crucial
because if we fail to do so, the disk does not have a bootloader and we
effectively destroy the machine. It works by attempting to open the block
device with O_EXCL: if the block device is in use by the system (e.g., mounted),
open() fails with the error EBUSY.
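The O_EXCL probe can be sketched as below. The helper name `isBusy` is
hypothetical, the device path in `main` is only an example, and the EBUSY
behavior applies to block devices on Linux.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"syscall"
)

// isBusy opens path with O_EXCL. For a block device on Linux the open fails
// with EBUSY while the device is in use (for example, mounted). Any other
// error is returned to the caller unchanged.
func isBusy(path string) (bool, error) {
	f, err := os.OpenFile(path, os.O_RDONLY|os.O_EXCL, 0)
	if err != nil {
		if errors.Is(err, syscall.EBUSY) {
			return true, nil
		}
		return false, err
	}
	return false, f.Close()
}

func main() {
	// Example path only; probing a real device usually requires root.
	busy, err := isBusy("/dev/sda")
	if err != nil {
		fmt.Println("probe failed:", err)
		return
	}
	fmt.Println("busy:", busy)
}
```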
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
There is no need for these packages to be in the base image. This moves
to installing them using ONBUILD.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This moves to using the retry package for retrying NTP queries. It also
adds some additional logging that is useful when NTP queries fail.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
There are cases where we can see EBUSY when attempting to use the BLKPG
ioctl. The recommendation seems to be to retry when this happens.
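A retry loop for this case might look like the sketch below. It does not use
the project's retry package; `retryOnBusy` and `errBusy` are hypothetical
names, and the attempt count and delay are arbitrary.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
	"time"
)

// errBusy aliases EBUSY so callers and tests can refer to it directly.
var errBusy error = syscall.EBUSY

// retryOnBusy calls op up to attempts times, sleeping delay between tries,
// and retries only while op fails with EBUSY. Success and any other error
// are returned immediately.
func retryOnBusy(attempts int, delay time.Duration, op func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 0; i < attempts; i++ {
		err = op()
		if !errors.Is(err, syscall.EBUSY) {
			return err
		}
		if i < attempts-1 {
			time.Sleep(delay)
		}
	}
	return fmt.Errorf("still busy after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	err := retryOnBusy(5, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errBusy // simulate a device that is briefly busy
		}
		return nil
	})
	fmt.Println(calls, err) // prints: 3 <nil>
}
```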
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
Since dmesg is not streamed, it becomes difficult to debug issues with
machined. This fixes that by setting up the logging of machined to go to
/dev/kmsg and to a log file.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This adds a timestamp to /boot/installed. It can be useful for
determining the last known successful install.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This addresses an issue caused by containers that refuse to exit with
SIGTERM. After sending SIGTERM, we send SIGKILL after a timeout of one minute.
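The escalation can be sketched independently of any container runtime. The
`proc` type and `stop` function are hypothetical; the callbacks stand in for
delivering SIGTERM and SIGKILL, and the grace period in `main` is shortened
from the one minute mentioned above so the example finishes quickly.

```go
package main

import (
	"fmt"
	"time"
)

// proc is a hypothetical handle on a running task.
type proc struct {
	terminate func() error     // asks the task to exit (e.g. SIGTERM)
	kill      func() error     // forces the task to exit (e.g. SIGKILL)
	done      <-chan struct{}  // closed once the task has exited
}

// stop asks the task to terminate and escalates to kill if it has not
// exited within grace. It reports whether the kill was needed.
func stop(p proc, grace time.Duration) (killed bool, err error) {
	if err := p.terminate(); err != nil {
		return false, err
	}
	timer := time.NewTimer(grace)
	defer timer.Stop()
	select {
	case <-p.done:
		return false, nil
	case <-timer.C:
	}
	if err := p.kill(); err != nil {
		return true, err
	}
	<-p.done
	return true, nil
}

func main() {
	done := make(chan struct{})
	stubborn := proc{
		terminate: func() error { return nil }, // ignores the polite request
		kill:      func() error { close(done); return nil },
		done:      done,
	}
	killed, err := stop(stubborn, 50*time.Millisecond)
	fmt.Println(killed, err) // prints: true <nil>
}
```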
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
Trying to be smart about whether or not an install is being performed
as part of an upgrade has proven to be error prone. This moves to
performing installs with explicit args.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>