IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
The structure of the controllers is really similar to addresses and
routes:
* `LinkSpec` resource describes desired link state
* `LinkConfig` controller generates `LinkSpecs` based on machine
configuration and kernel cmdline
* `LinkMerge` controller merges multiple configuration sources into a
single `LinkSpec` paying attention to the config layer priority
* `LinkSpec` controller applies the specs to the kernel state
Controller `LinkStatus` (which was implemented before) watches the
kernel state and publishes current link status.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Enable logging using default development config with some fine tuning.
Additionally, now `info` and below logs go to kmsg.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
This pulls in a newer version of smbios so that we can detect lower
smbios version and handle endianness if necessary.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
Fixes#3538
See also talos-systems/pkgs#276
As new containerd is now Go module-based, it pulls many more
dependencies if simply imported in `go.mod`, so I had to replace the
reference to the constant in `pkg/machinery/` to `containerd` volume
with simple value to avoid pulling Kubernetes dependencies into
`pkg/machinery`.
Also updates the kernel to include PR talos-systems/pkgs#275 for AES-NI
support.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This controller queries addresses of all the interfaces in the system
and presents them as resources. The idea is that can be a source for
many decisions - e.g. whether network is ready (physical interface has
scope global address assigned).
This is also good for debugging purposes.
Examples:
```
$ talosctl -n 172.20.0.2 get addresses
NODE NAMESPACE TYPE ID VERSION
172.20.0.2 network AddressStatus cni0/10.244.0.1/24 1
172.20.0.2 network AddressStatus cni0/fe80::9c87:cdff:fe8e:5fdc/64 2
172.20.0.2 network AddressStatus eth0/172.20.0.2/24 1
172.20.0.2 network AddressStatus eth0/fe80::ac1b:9cff:fe19:6b47/64 2
172.20.0.2 network AddressStatus flannel.1/10.244.0.0/32 1
172.20.0.2 network AddressStatus flannel.1/fe80::440b:67ff:fe99:c18f/64 2
172.20.0.2 network AddressStatus lo/127.0.0.1/8 1
172.20.0.2 network AddressStatus lo/::1/128 1
172.20.0.2 network AddressStatus veth178e9b31/fe80::6040:1dff:fe5b:ae1a/64 2
172.20.0.2 network AddressStatus vethb0b96a94/fe80::2473:86ff:fece:1954/64 2
```
```
$ talosctl -n 172.20.0.2 get addresses -o yaml eth0/172.20.0.2/24
node: 172.20.0.2
metadata:
namespace: network
type: AddressStatuses.net.talos.dev
id: eth0/172.20.0.2/24
version: 1
owner: network.AddressStatusController
phase: running
spec:
address: 172.20.0.2/24
local: 172.20.0.2
broadcast: 172.20.0.255
linkIndex: 4
linkName: eth0
family: inet4
scope: global
flags: permanent
```
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
The OVF environment is a way to supply guestinfo to guests. It is
a datastructure (XML) put in `extraConfig` (commonly referred to as
`guestinfo`) under the key `ovfenv`.
This OVF env is said to be the proper way to supply customization data
to guests (ie, not through `extraConfig`), and on some platforms (eg,
vCD), it is even the only option.
This change also enables the actual OVF transport in the OVA.
Signed-off-by: Jorik Jonker <jorik.jonker@eu.equinix.com>
When Talos `controlplane` node is waiting for a bootstrap, `etcd`
contents can be recovered from a snapshot created with
`talosctl etcd snapshot` on a healthy cluster.
Bootstrap process goes same way as before, but the etcd data directory
is recovered from the snapshot.
This flow enables disaster recovery for the control plane: given that
periodic backups are available, destroy control plane nodes, re-create
them with the same config, and bootstrap one node with the saved
snapshot to recover etcd state at the time of the snapshot.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This is mostly refactoring to adapt to the new APIs.
There are some small changes which are not user-visible immediately (but
visible when using `talosctl get` to inspect low-level details):
* `extras` namespace is removed, it was a hack to distinguish extra and
system manifests
* `Manifests` are managed by two controllers as shared outputs, stored
in the `controlplane` namespace now
* `talosctl inspect dependencies` output got slightly changed
* resources now have `md.owner` set to the controller name which manages
the resource
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Fixes: https://github.com/talos-systems/talos/issues/3323
Not exactly matching with udevd generated `by-<id>` symlinks, but should
provide sufficient amount of property selectors to be able to pick
specific disks for any kind of disk: sd card, hdd, ssd, nvme.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
`os-runtime` now writes `yaml` block as raw yaml bytes instead of
decoding it into `yaml.Node` and encoding that `yaml.Node` back to YAML.
The reason is that `go-yaml` comments decoder can't really handle
comment placement properly, so it messes up indents here and there.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
This adds support for `-o json` (easier to use `jq` to query additional
data), and prints event name in `--watch` mode.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
See https://github.com/talos-systems/os-runtime/pull/12 for new mnaming
conventions.
No functional changes.
Additionally implements printing extra columns in `talosctl get xyz`.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This adds a VIP (virtual IP) option to the network configuration of an
interface, which will allow a set of nodes to share a floating IP
address among them. For now, this is restricted to control plane use
and only a single shared IP is supported.
Fixes#3111
Signed-off-by: Seán C McCord <ulexus@gmail.com>
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Fixes: https://github.com/talos-systems/talos/issues/3209
Using parts of `kubectl` package to run the editor.
Also using the same approach as in `kubectl edit` command:
- add commented section to the top of the file with the description.
- if the config has errors, display validation errors in the commented
section at the top of the file.
- retry apply config until it succeeds.
- abort if no changes were detected or if the edited file is empty.
Patch currently supports jsonpatch only and can read it either from the
file or from the inline argument.
https://asciinema.org/a/wPawpctjoCFbJZKo2z2ATDXeC
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
This version is finally using working `go.mod` files and tags, so no
more hacks with imports, and allows us to bump `grpc` library to the
latest version (I also did for this PR).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Critical bug (I believe) was that drain code entered the loop to evict
the pod after wait for pod to be deleted returned success effectively
evicting pod once again once it got rescheduled to a different node.
Add a global timeout to prevent draining code from running forever.
Filter more pod types which shouldn't be ever drained.
Fixes#3124
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
State partition encryption support adds a new section to the machine config.
And a new step to the sequencer flow which saves encryption
configuration object as json serialized value in the META partition.
Everything else is the same as is for the ephemeral partition.
Additionally enabled state partition encryption in the disk encryption
integration tests.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
This uses API in `os-runtime` to pull the initial list of resources +
updates for resource by type.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Filesystem creation step is moved on the later stage: when Talos mounts
the partition for the first time.
Now it checks if the partition doesn't have any filesystem and formats
it right before mounting.
Additionally refactored mount options a bit:
- replaced separate options with a set of binary flags.
- implemented pre-mount and post-unmount hooks.
And fixed typos in couple of places and increased timeout for `apid ready`.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
Modify provision library to support multiple IPs, CIDRs, gateways, which
can be IPv4/IPv6. Based on IP types, enable services in the cluster to
run DHCPv4/DHCPv6 in the test environment.
There's outstanding bug left with routes not being properly set up in
the cluster so, IPs are not properly routable, but DHCPv6 works and IPs
are allocated (validates DHCPv6 client).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
ECDSA keys are smaller which decreases Talos config size, they are more
efficient in terms of key generation, signing, etc., so it makes boot
performance better (and config generation as well).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This fixes spurious race conditions when user disks are partitioned
and formatted in `mountUserDisks` task. While this task runs, `udevd` is
running to allow various `/dev/` symlinks to be used for user disks.
At the same time `udevd` might trigger syscall `BLKRRPART` at any time
concurrently with Talos which leads to a race on kernel side when Talos
tries to update kernel partition table while kernel does it on its own
as a result of `udevd` call.
As part of the fix, `RereadPartitionTable()` calls were removed (they
trigger `BLKRRPART` and they're not needed as Talos updates partition
table on its own).
Some cleanups to make sure blockdevice is open/closed just in matching
pairs (no lingering open blockdevice instances). This is import for
`WithExclusiveLock()` calls, as it would lead to a deadlock if previous
blockdevice instance is not closed.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This refactoring is required to simplify the work to be done to support
disk encryption.
Tried to minimize amount of queries done by `blockdevice` `probe`
methods.
Instead, where we have `runtime.Runtime` we get all required blockdevices
there from blockdevice cache stored in `State().Machine().Disk()`.
This opens a way to store encryption settings in the `Partition`
objects.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
Control plane components are running as static pods managed by the
kubelets.
Whole subsystem is managed via resources/controllers from os-runtime.
Many supporting changes/refactoring to enable new code paths.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This brings in `os-runtime` package and exposes resources with first
iteration of read-only API.
Two Talos resources (and one controller) are implemented:
* legacy.Service resource tracks Talos 'service' `RUNNING` state
* config.V1Alpha1 stores current runtime config
Glue point between existing runtime and new os-runtime based runtime is
in `v1alpha2` implementation and `V1Alpha2()` sub-interfaces of existing
`Runtime`, `State`, `Controller` interfaces.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This the first iteration of Wireguard network support.
What was done:
- kernel was updated to enable Wireguard kernel module.
- changed networkd to support creating Wireguard device type.
- used wgctrl to configure wireguard.
- updated `talosctl cluster create` to support generating Wireguard
network configuration automatically by just specifying the network cidr.
- added docs about Wireguard support/how to use it.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
Fixes#3011
See also https://github.com/talos-systems/go-procfs/pull/8
We don't want to allow all the kernel args to be overridden, as this
might compromise KSPP, but we would rather allow some args to be
overridden explicitly.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Default manifests created by bootkube so far were only enabling
pod-checkpointer for kube-apiserver. This seems to have issues with
single-node control plane scenario, when without scheduler and
controller-manager node might fall into `NodeAffinity` state.
See https://github.com/talos-systems/bootkube-plugin/pull/23
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
That change should make Talos updates more straightforward in any
projects that depend on Talos.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
SBC should always overwrite default kernel params.
Otherwise we will always get duplicate values for some of them.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
If the node time is out of sync, it can generate incorrect
configuration. And maintenance mode does not allow us starting ntp,
because there is no containerd.
By providing current UTC time of the machine where talosctl client is
running, it is possible to force GenerateConfiguration use correct time.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
This PR moves the configpatcher as a package under machinery. It also
reworks the existing function to specify that it's explicitly for JSON
6902 patching so we can add more patch types if desired later on.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
Default image versions are kept as commented out examples.
This allows better experience for generating config on amd64 for arm64
servers. (e.g. for RPi).
Without embedded values in the config, Talos is going to use the
defaults which work better.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This introduces the notion of a "board" in Talos. A board is an interface that is capable
of modifying the installation in specific ways for a given SBC. This also adds support for the
libretech_all_h3_cc_h5.
Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
Talos 0.8 is going to ship with K8s 1.20.x.
Changes to support new `control-plane` label,
upgrade-k8s supports automated fixups for 1.20.
See also: https://github.com/talos-systems/bootkube-plugin/pull/22
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This is initial commit of the installer.
What's done:
- verifying node availability before starting any operations.
- gathering information about disks on the machine.
- allows setting: install disk, hostname, machine type, installer image,
kubernetes version, dns domain, cluster-name.
- dumps/merges talosconfig to a file after applying configuration.
Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
On first boot of Talos, if userdata is missing, Talos is going to drop
into maintenance mode which allows to upload config to the server via
`talosctl apply-config` command.
See also: https://github.com/talos-systems/go-retry/pull/4Fixes#2780
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Server in maintenance mode now prints certficate fingerprint and
provides sample talosctl command to upload config to the node.
`talosctl` can optionally enforce server certificate fingerprint.
See also https://github.com/talos-systems/crypto/pull/4Fixes#2753
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This builds a bundle with CNI plugins for talosctl which is
automatically downloaded by `talosctl` if CNI plugins are missing.
CNI directories are moved by default to the `~/.talos/cni` path.
Also add a bunch of pre-flight checks to the QEMU provisioner to make it
easier to bootstrap the Talos QEMU cluster.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This skips writing partition table if partition doesn't have to be
resized (already resized or max size from the beginning).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
For 0.6 -> 0.7 upgrade, in any case config.yaml is preserved and moved
from `/boot` to `/system/state`.
For single node upgrade, `EPHEMERAL` partition is not touched and other
partitions are re-created as needed.
Bump provision tests to 0.6/0.7 upgrades as we get closer to the new
release.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This PR makes use of a new merge into the upstream rtnetlink library
that introduces functional args for adding routes.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
This covers most of the packages except for those we have to keep on
hold (etcd and grpc because of etcd).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This unifies more code paths under the control of `install.Manifest` vs.
being split across the installer and manifest code.
There should be no functional changes now.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Library `blockdevice` was extracted as `talos-systems/go-blockdevice`,
this PR finalizes the move by removing Talos copy of it.
Some functions around `mkfs`/`growfs` were extracted as `makefs`
package, as they depend on `cmd` package.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Kubeconfig merge was completely rewritten to be "smarter":
* automatically apply renames done at previous stages to avoid asking
over and over again (in general should ask just once)
* skip checks if parts of the config match exactly
* allow overwrite as an option
* flexible way to control the output
* activating context in the end
* custom merged context name
Fixes#2578Fixes#2587Fixes#2577
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Adds the ability to apply (replace) an existing node configuration with
a new one via the Machine API.
Fixes#2345
Signed-off-by: Seán C McCord <ulexus@gmail.com>
This uses go-retry feature
(https://github.com/talos-systems/go-retry/pull/3) to print errors being
retried.
If image is not found in the index, abort retries immediately.
Don't pull installer image twice (if already pulled by the validation
code before).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Add sonobuoy runner code with log fetching on failure. Use hand-picked
set of e2e tests to run: verify basic pod functionality, verify service
connectivity.
Add option `--run-e2e` to the `talosctl health` to run quick e2e test to
verify cluster health.
Add option to run provision tests with custom CNI, run one track of
provision tests with Cilium.
Bump Cilium to 1.8.2.
Talos 0.6 won't uncordon node automatically after upgrade from 0.5, as
0.5 doesn't put annotation. Workaround that in upgrade tests.
Bump upgrade test version to 0.6.0 release.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Bootkube recover process (and `talosctl recover`) was actually
regenerating assets each time `recover` runs forcing control plane to be
at the state when cluster got created. This PR fixes that by running
recover process correctly.
Recovery via etcd was fixed to handle encrypted etcd data:
it follows the way `apiserver` handles encryption at rest, and as at
the moment AES CBC is the only supported encryption method, code simply
follows the same path.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Package `pkg/crypto` was extracted as `github.com/talos-systems/crypto`
repository and Go module.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Latest version in 3.3 branch is 3.3.23, but it's broken, so we use previous
stable version.
Switch to official etcd gcr.io registry, early support for arm64.
Move `etcd` service to run in system containerd.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Fixes#2316
Simply update dependencies we don't track on version level to be
compatible with Talos components (like etcd or k8s).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
1. Add [xid-based](https://github.com/rs/xid) event IDs. Xids
are sortable and unique enough. Xids also encode event publishing
time with a second precision.
2. Add three ways to look back into event history: based on number of
events, on time and ID. Lookup via ID might be used to restart event
polling in case of broken API connection from the same moment.
3. Reimplement core event buffer with positions which are always
incremented instead of generation+index, this implementation is much
more simple (idea from circular buffer).
4. By default, Events API works the same - it shows no history and
starts streaming new events only.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This PR brings in all changes necessary to deploy kubernetes 1.19.x.
It relies on an update to our bootkube-plugin project, as well as
implementation of some Image() functions for our various control plane
components, since they are all distinct images and not just hyperkube.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
This should be proper way to adjust time incrementally without causing
jumps one in +/- direction. Time-sensitive services might be confused by
huge jumps.
This also implements timed healh check based on first successful time
sync.
Fixed some random health check related issues in other services.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This updates the bootkube plugin that brings in a fix that allows
any seccomp profile name to be used.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
Handle dual-stack configurations with the bootkube wrapper. This uses
the new PodCIDRs and ServiceCIDRs `asset.Config` parameters in bootkube.
It also relies on the bootkube-plugin features for manipulating
kube-proxy config and installing the dual-stack DNS service.
Fixes#2055
Signed-off-by: Seán C McCord <ulexus@gmail.com>
This is a rewrite of machined. It addresses some of the limitations and
complexity in the implementation. This introduces the idea of a
controller. A controller is responsible for managing the runtime, the
sequencer, and a new state type introduced in this PR.
A few highlights are:
- no more event bus
- functional approach to tasks (no more types defined for each task)
- the task function definition now offers a lot more context, like
access to raw API requests, the current sequence, a logger, the new
state interface, and the runtime interface.
- no more panics to handle reboots
- additional initialize and reboot sequences
- graceful gRPC server shutdown on critical errors
- config is now stored at install time to avoid having to download it at
install time and at boot time
- upgrades now use the local config instead of downloading it
- the upgrade API's preserve option takes precedence over the config's
install force option
Additionally, this pulls various packes in under machined to make the
code easier to navigate.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This fixes a bug caused by a missing device argument to `mkfs.xfs`.
Without a device, `mkfs.xfs` will error out. Additionally, this ensures
that the installer container is started with the `kmsg` writer that
ensures logs are formatted correctly for `/dev/kmsg`. Without this we
lose a lot of the logs output by the container, one of them being any
error from `mkfs.xfs`
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This PR will pull in the latest release of k8s 1.18 so we can start
validating it through our test suite.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
This brings in the latest changes from our fork of bootkube. One thing
to note is a fix that stops the pod controller cache object.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
BREAKING CHANGE: This PR fixes a bug where we were only passing `cluster.local` to the
kubelet configuration. It will also pull in a new version of the
bootkube fork to ensure that custom domains got propogated down to the
API Server certs, as well as the CoreDNS configuration for a cluster.
Existing users should be aware that, if they were previously trying to
use this option in machine configs, that an upgrade will may break
their cluster. It will update a kubelet flag with the new domain, but
CoreDNS and API Server certs will not change since bootkube has already
run. One option may be to change these values manually inside the
Kubernetes cluster. However, it may prove easier to rebuild the cluster
if necessary.
Additionally, this PR also exposes a flag to `osctl config generate`
to allow tweaking this domain value as well.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
With the fix#1904, it's now possible to upgrade 0.4.x with
`machine.File` extra files (caused by registry mirror for
registry.ci.svc).
Bump resources for upgrade tests in attempt to speed it up.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This makes use of the external procfs pacakge that is based on the
pacakge we are removing here.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This allows users to specify well known query parameters in `talos.config`.
The only supported parameter in this change is `uuid`. This will send
the node's UUID determined from SMBIOS along with the request for the
config.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
Firecracker launches tries to open VM disk image before every boot,
parses partition table, finds boot partition, tries to read it as FAT32
filesystem, extracts uncompressed kernel from `bzImage` (firecracker
doesn't support `bzImage` yet), extracts initramfs and passes it to
firecracker binary.
This flow allows for extended tests, e.g. testing installer, upgrade and
downgrade tests, etc.
Bootloader emulation is disabled by default for now, can be enabled via
`--with-bootloader-emulation` flag to `osctl cluster create`.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This PR contains generic simple TCP loadbalancer code, and glue code for
firecracker provisioner to use this loadbalancer.
K8s control plane is passed through the load balancer, and Talos API is
passed only to the init node (for now, as some APIs, including
kubeconfig, don't work with non-init node).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
With all our PRs merged, we can switch back to upstream version. No tag
yet, so we have to follow `master` for now.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This PR updates the talos branch of bootkube to add extraArgs to
bootstrap controlplane components as well.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
This PR aims to fix the ability to pass extra flags to control plane
components. This will close#1523
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
The `client.Creds` struct was not used very often, and made using the
`client.NewClient` function impossible to use in combination with the
`RemoteRenewingFileCertificateProvider`. This modifies
`client.NewClient` to accept a `tls.Config` instead of `client.Creds`,
allowing for the use of `RemoteRenewingFileCertificateProvider` with
`client.NewClient`.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This PR will add the new cluster name field to our bootkube options.
This allows for the generated kubeconfig to include the context-name for
the default context.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
This is initial PR to push the initial code, it has several known
problems which are going to be addressed in follow-up PRs:
1. there's no "cluster destroy", so the only way to stop the VMs is to
`pkill firecracker`
2. provisioner creates state in `/tmp` and never deletes it, that is
required to keep cluster running when `osctl cluster create` finishes
3. doesn't run any controller process around firecracker to support
reboots/CNI cleanup (vethxyz interfaces are lingering on the host as
they're never cleaned up)
The plan is to create some structure in `~/.talos` to manage cluster
state, e.g. `~/.talos/clusters/<name>` which will contain all the
required files (disk images, file sockets, VM logs, etc.). This
directory structure will also work as a way to detect running clusters
and clean them up.
For point number 3, `osctl cluster create` is going to exec lightweight
process to control the firecracker VM process and to simulate VM reboots
if firecracker finishes cleanly (when VM reboots).
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This ensures bootkube waits until all pods and nodes are ready before tearing
down the bootstrap control plane.
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
This updates the CoreDNS to use the 'kube-dns' naming convention. This
naming convention is used throughout the Kubrnetes documentation. This
also fixing the kube-dns service. The label label selector was wrong,
breaking cluster DNS.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This moves to using the KubeletConfiguration instead of flags to the
kubelet. It also enables DynamicKubeletConfiguration, which allows users
to configure kubelets using a ConfigMap.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This extracts Docker Talos cluster provisioner as common code
which might be shared between `osctl cluster` and integration-test.
There should be almost no functional changes.
As proof of concept, abstract cluster readiness checks were implemented
based on provisioned cluster state. It implements same checks as
`basic-integration.sh` in pure Go via Talos/K8s clients.
`conditions` package was promoted from machined-internal to
`internal/pkg` as it is used to run the checks.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This brings in an updated library along with some tweaks on our side to allow for
better decision making when it comes to the scope of routes. This also fixes an
issue where multiple configuration definitions for an interface were not properly
merged and instead were overwritten.
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
This adds support for downloading the machine config over TFTP. This
will allow users to avoid having to setup an HTTP server, and use
whatever they are using for PXE.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
Primarily doc/constant changes.
Added additionnal bits to `docs` target in makefile to generate osctl
docs as well as config files. Explicitly define a HOME variable so we
get consistent home directories for talosconfig variables in our docs.
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
This pulls in an update from our bootkube fork that adds security
hardening to the control plane. The following was changed:
- API server now uses an EncryptionConfig for encrypting secrets
- API server now has an audit policy
- Profiling was disabled on all control plane components
- PodSecurityPolicy is enabled
- API server TLS cipher suites were set to the recommended ciphers by CIS
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
When we upgrade a node, we kill off all pods before performing a fresh
install. The issue with this is that we run the risk of killing the CNI
pod before we finish killing all other pods, leaving the CRI unable to
teardown the pod's networking. This works around that by first killing
any pods running without host networking so that the CNI can do its'
job, and then removing the remaining pods.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This PR will allow users to specify one or many URLs for CNI so that
they can bypass bootkube deploying flannel and bring their own. Will
close#1593
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
This moves to using Webhook mode instead of the default AlwaysAllow for
kubelet API authorization.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This brings in changes from upstream bootkube. It fixes an issue with
the pod-checkpointer that would cause the pod-checkpointer to fail if
the kubelet's read-only port were disabled. It also adds a dedicated
certificate for the API server's `kubelet-client-*` args, which will allow the
usage of the `authentication-token-webhook` flag in the kubelet.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This replaces codegen version of apid proxying with
talos-systems/grpc-proxy based version. Proxying is transparent, it
doesn't require exact information about methods and response types. It
requires some common layout response to enhance it properly with node
metadata or errors.
There should be no signifcant changes to the API with the previous
version, but it's worth mentioning a few changes:
1. grpc.ClientConn is established just once per upstream (either local
service or remote apid instance).
2. When called without `-t` (`targets`), apid proxies immediately down
to local service skipping proxying to itself (as before), which results
in empty node metadata in response (before it had local node IP). Might
revert this later to proxy to itself (?).
3. Streaming APIs are now fully supported with multiple targets, but
message definition doesn't contain `ResponseMetadata`, so streaming APIs
are broken now with targets (needs a fix).
4. Errors are now returned as responses with `Error` field set in
`ResponseMetadata`, this requires client library update and `osctl` to
handle it properly.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This brings in a patched version of the pod-checkpointer. It fixes a bug
that prevented the static pod-checkpointer from being scheduled,
preventing recovery of the control plane on a reboot of all control
plane nodes.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This moves to using our official bootkube repo. The latest changes in
the branch we are using enables the aggregation layer. This should fix
our conformance.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This includes a healthy refactor of the networkd code as well.
- Move netlink functionality to nic package
- Networkd facilitates the orchestration of the underlying interface configuration
- Networkd now stores the state of each interface configuration. This
should allow us to expose this information via api in the future.
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
This PR moves to using the full URL for endpoint instead of trying to
hardcode 6443 in various places like we were doing.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
This removes the github.com/pkg/errors package in favor of the official
error wrapping in go 1.13.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
When creating docker based clusters, we need to use `InternalIP` for
kubelet connections. The default is
`Hostname,InternalDNS,InternalIP,ExternalDNS,ExternalIP`, but
`Hostname` doesn't work in docker because we don't depend on docker for
DNS.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
In order for other projects to make use of our APIs, they must not
reside underneath the internal directory. This moves the protobuf
definitions to a top-level "api" directory and scopes them according to
their domain. This change also removes generated code from the gitignore
file so that users don't have to generate the code themseleves.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
New release comes with bugfixes (we got some of them integrated for
not tagged release), and few interesting new assertions, including
`Eventually` for polling.
See: https://github.com/stretchr/testify/milestone/2?closed=1
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
Use circular buffer instead of (unlimited) `bytes.Buffer` to limit
amount of stderr output captured. If command being run produces too much
output on stderr, this might consume too much RAM.
Use `pkg/cmd` to run command in `udevd` service. This should allow
easier udevd integration.
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This PR will upgrade to the latest beta of v1.16 in order to get us
closer to catching the v1.16.0 release as soon as it drops.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
This is a major rewrite of our network subsystem.
- This changes networkd to run as a standalone app versus internal goroutine
- This changes out the netlink package with the more idiomatic netlink/rtnetlink
packages
- This changes the initial network bootstrap/discovery from using a single
interface to attempting to bring up all interfaces
- This moves us back on to the upstream dhcp library
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
This is not ideal, but it works. We essentially need to start using
replace statements in order to pull in the modules we need.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
This change aims to standardize the boot process. It introduces the
concept of a phase, which is comprised of tasks. Phases are ran in serial and
the tasks that make up a phase are ran concurrently.
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
Decided to combine two very small changes (which I'm now grumpy at myself for doing).
First, we'll update the toolchain image versions to allow for the use of a new containerd and runc. Also updated go.mod and go.sum to make use of newer containerd version. Closes#743 and #744.
Second, I added the bit of logic to osctl config generate to determine the working directory and let the user know that we created the various yaml files there. Closes#760.
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
* feat(osctl): Automatic sizing of top window
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
* feat(osctl): Format top output in proper columns
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
* feat(osctl): Add sort by cpu/rss options
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
* feat(osctl): Add ability to run once (no gui)
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>