Commit Graph

495 Commits

Author SHA1 Message Date
Andrey Smirnov
5811f4dda1 feat: implement link (interface) controllers
The structure of the controllers is really similar to addresses and
routes:

* `LinkSpec` resource describes desired link state
* `LinkConfig` controller generates `LinkSpecs` based on machine
configuration and kernel cmdline
* `LinkMerge` controller merges multiple configuration sources into a
single `LinkSpec` paying attention to the config layer priority
* `LinkSpec` controller applies the specs to the kernel state

Controller `LinkStatus` (which was implemented before) watches the
kernel state and publishes current link status.
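
The merge step can be illustrated with a minimal sketch. It assumes that each spec carries a numeric config layer and that the highest layer wins outright; the layer names, the `LinkSpec` fields and `merge_link_specs` are hypothetical and are not taken from the Talos code:

```python
from dataclasses import dataclass
from enum import IntEnum


class ConfigLayer(IntEnum):
    """Hypothetical config layers; a higher value wins a conflict."""
    DEFAULT = 0
    CMDLINE = 1
    MACHINE_CONFIG = 2


@dataclass(frozen=True)
class LinkSpec:
    # Illustrative fields only; the real resource is not shown in this commit.
    name: str
    up: bool
    mtu: int
    layer: ConfigLayer


def merge_link_specs(specs):
    """Keep one spec per link name, preferring the highest config layer."""
    merged = {}
    for spec in specs:
        current = merged.get(spec.name)
        if current is None or spec.layer > current.layer:
            merged[spec.name] = spec
    return merged
```

A real merge controller may combine individual fields rather than pick a whole spec; this sketch only shows how a layer priority resolves a conflict.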

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-06-01 09:36:25 -07:00
Alexey Palazhchenko
c036b94948 chore: bump dependencies
Closes #3699, #3668, #3698, #3697, #3696, #3695, #3694, #3693, #3692.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-05-31 06:12:06 -07:00
Artem Chernyshev
76dbfb3699 feat: add ability to mark MBR partition bootable
Fixes: https://github.com/talos-systems/talos/issues/3532

Machine install section now has `markMBRBootable` option.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-05-27 12:44:50 -07:00
Alexey Palazhchenko
e0f5b1e20a chore: split mgmt/gen.go into several files
No functional changes in this PR, to make future PRs easier.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-05-26 12:54:48 -07:00
Artem Chernyshev
723597657a feat: enable GORACE=halt_on_panic=1 in machined binary
Fixes: https://github.com/talos-systems/talos/issues/3533

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-05-25 11:24:07 -07:00
Artem Chernyshev
1db301edf6 feat: switch controller-runtime to zap.Logger
Enable logging using default development config with some fine tuning.
Additionally, now `info` and below logs go to kmsg.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-05-25 02:15:31 -07:00
Andrey Smirnov
59cfd312c1 chore: bump dependencies via dependabot
There were some upstream code changes in etcd, some code got moved
around.

PRs #3651 #3652 #3653 #3654 #3655 #3655 #3656 #3657 #3658
    #3659 #3660 #3661 #3662 #3663

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-05-24 12:15:15 -07:00
Andrey Smirnov
04ddda962f feat: update containerd to 1.5.2, runc to 1.0.0-rc95
This also updates libseccomp and adds support for the `netxen` network card.

This addresses [CVE-2021-30465](https://github.com/opencontainers/runc/security/advisories/GHSA-c3xm-pvg7-gh7r).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-05-19 15:24:28 -07:00
Andrey Smirnov
6bc6658b51 feat: update containerd to 1.5.1
See https://github.com/containerd/containerd/releases/tag/v1.5.1

Also brings Talos kernel with Geneve encapsulation for Openvswitch (see
talos-systems/pkgs#278).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-05-17 10:33:49 -07:00
Andrey Smirnov
c6567fae9c chore: dependabot updates
PRs #3622 #3623 #3624 #3625 #3627 #3628

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-05-17 07:46:24 -07:00
Alexey Palazhchenko
c81cfb2167 chore: allow building with debug handlers
Refs #3534.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-05-13 02:20:15 -07:00
Spencer Smith
c9651673b9 feat: update go-smbios library
This pulls in a newer version of smbios so that we can detect lower
smbios version and handle endianness if necessary.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2021-05-12 10:49:59 -07:00
Andrey Smirnov
95c656fb72 feat: update containerd to 1.5.0, runc to 1.0.0-rc94
Fixes #3538

See also talos-systems/pkgs#276

As new containerd is now Go module-based, it pulls many more
dependencies if simply imported in `go.mod`, so I had to replace the
reference to the constant in `pkg/machinery/` to `containerd` volume
with simple value to avoid pulling Kubernetes dependencies into
`pkg/machinery`.

Also updates the kernel to include PR talos-systems/pkgs#275 for AES-NI
support.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-05-11 14:43:27 -07:00
Andrey Smirnov
db9c35b570 feat: implement AddressStatusController
This controller queries addresses of all the interfaces in the system
and presents them as resources. The idea is that this can be a source for
many decisions - e.g. whether network is ready (physical interface has
scope global address assigned).

This is also good for debugging purposes.

Examples:

```
$ talosctl -n 172.20.0.2 get addresses
NODE         NAMESPACE   TYPE            ID                                          VERSION
172.20.0.2   network     AddressStatus   cni0/10.244.0.1/24                          1
172.20.0.2   network     AddressStatus   cni0/fe80::9c87:cdff:fe8e:5fdc/64           2
172.20.0.2   network     AddressStatus   eth0/172.20.0.2/24                          1
172.20.0.2   network     AddressStatus   eth0/fe80::ac1b:9cff:fe19:6b47/64           2
172.20.0.2   network     AddressStatus   flannel.1/10.244.0.0/32                     1
172.20.0.2   network     AddressStatus   flannel.1/fe80::440b:67ff:fe99:c18f/64      2
172.20.0.2   network     AddressStatus   lo/127.0.0.1/8                              1
172.20.0.2   network     AddressStatus   lo/::1/128                                  1
172.20.0.2   network     AddressStatus   veth178e9b31/fe80::6040:1dff:fe5b:ae1a/64   2
172.20.0.2   network     AddressStatus   vethb0b96a94/fe80::2473:86ff:fece:1954/64   2
```

```
$ talosctl -n 172.20.0.2 get addresses -o yaml eth0/172.20.0.2/24
node: 172.20.0.2
metadata:
    namespace: network
    type: AddressStatuses.net.talos.dev
    id: eth0/172.20.0.2/24
    version: 1
    owner: network.AddressStatusController
    phase: running
spec:
    address: 172.20.0.2/24
    local: 172.20.0.2
    broadcast: 172.20.0.255
    linkIndex: 4
    linkName: eth0
    family: inet4
    scope: global
    flags: permanent
```
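
As a rough illustration of the "network is ready" decision mentioned above, the sketch below checks whether any physical link has a global-scope address. The `linkName` and `scope` keys follow the YAML output shown here; `network_ready`, the `physical_links` argument and the `host` scope for `lo` are assumptions made for the example:

```python
def network_ready(addresses, physical_links):
    """Return True if any physical link has a global-scope address."""
    return any(
        addr["linkName"] in physical_links and addr["scope"] == "global"
        for addr in addresses
    )


# Illustrative data shaped like the `spec` section above.
addresses = [
    {"linkName": "lo", "address": "127.0.0.1/8", "scope": "host"},
    {"linkName": "eth0", "address": "172.20.0.2/24", "scope": "global"},
]
```

How a consumer decides which links are physical is left out; here the caller passes that set in explicitly.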

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-05-11 13:32:17 -07:00
Andrey Smirnov
1cf011a809 chore: bump dependencies via dependabot
See PRs #3596 #3593 #3592

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-05-11 11:20:23 -07:00
Artem Chernyshev
e3f407a1df fix: properly pass disk type selector from config to matcher
Also updated go-blockdevice library that makes disk type string case
insensitive.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-05-11 09:35:40 -07:00
Artem Chernyshev
0e8de04698 fix: update go-blockdevice to fix disk type detection
Otherwise it never detected mmc and nvme devices as such.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-05-07 07:33:11 -07:00
Andrey Smirnov
e54b6b7a3d chore: update dependencies via dependabot
PRs #3568 #3567 #3566

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-05-04 14:06:35 -07:00
Andrey Smirnov
f2caed0df5 chore: use extracted talos-systems/go-kmsg library
This change uses extracted go-kmsg library (see
https://github.com/talos-systems/go-kmsg/pull/1).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-05-04 10:40:28 -07:00
Alexey Palazhchenko
aeec99d824 chore: remove temporary fork
PR was merged by upstream.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-04-28 09:57:23 -07:00
Andrey Smirnov
a01b1d22d9 chore: bump dependencies via dependabot
PRs #3530 #3543 #3544 #3545 #3546 #3547 #3548 #3549 #3550 #3551 #3552

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-04-27 05:33:07 -07:00
Andrey Smirnov
d540a4a471 fix: bump crypto library for the CSR verification fix
See https://github.com/talos-systems/crypto/pull/11

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-04-27 05:30:56 -07:00
Alexey Palazhchenko
38037131cd chore: update wgctrl dependency
Use fork until upstream is fixed.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-04-27 02:48:29 -07:00
Andrey Smirnov
05cbe250c8 chore: bump dependencies via dependabot
PRs #3503 #3504 #3505

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-04-19 06:44:47 -07:00
Jorik Jonker
8b8542e3b5 feat: add support for reading OVF data on VMWare
The OVF environment is a way to supply guestinfo to guests. It is
a data structure (XML) put in `extraConfig` (commonly referred to as
`guestinfo`) under the key `ovfenv`.

This OVF env is said to be the proper way to supply customization data
to guests (i.e., not through `extraConfig`), and on some platforms (e.g.,
vCD), it is even the only option.

This change also enables the actual OVF transport in the OVA.

Signed-off-by: Jorik Jonker <jorik.jonker@eu.equinix.com>
2021-04-13 16:16:44 +03:00
Andrey Smirnov
d24df8f844 chore: re-import talos-systems/os-runtime as cosi-project/runtime
No changes, just import path change (as project got moved).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-04-12 07:44:24 -07:00
Andrey Smirnov
ef24fd6a01 chore: bump dependencies via dependabot
See #3464 #3465 #3466

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-04-12 06:32:17 -07:00
Alexey Palazhchenko
37a5edf04a feat: update Kubernetes to 1.21.0 release
See CHANGELOG:
https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.21.md

Closes #3329.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-04-09 20:08:20 +03:00
Andrey Smirnov
e0650218a6 feat: support etcd recovery from snapshot on bootstrap
When Talos `controlplane` node is waiting for a bootstrap, `etcd`
contents can be recovered from a snapshot created with
`talosctl etcd snapshot` on a healthy cluster.

The bootstrap process works the same way as before, but the etcd data
directory is recovered from the snapshot.

This flow enables disaster recovery for the control plane: given that
periodic backups are available, destroy control plane nodes, re-create
them with the same config, and bootstrap one node with the saved
snapshot to recover etcd state at the time of the snapshot.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-04-08 10:15:37 -07:00
Andrey Smirnov
33035901ff fix: revert mark PMBR EFI partition as bootable
See talos-systems/go-blockdevice#34 talos-systems/talos#3440

That change broke UEFI boot.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-04-07 07:24:58 -07:00
Andrey Smirnov
fbfd1eb2b1 refactor: pull new version of os-runtime, update code
This is mostly refactoring to adapt to the new APIs.

There are some small changes which are not user-visible immediately (but
visible when using `talosctl get` to inspect low-level details):

* `extras` namespace is removed, it was a hack to distinguish extra and
system manifests
* `Manifests` are managed by two controllers as shared outputs, stored
in the `controlplane` namespace now
* `talosctl inspect dependencies` output got slightly changed
* resources now have `md.owner` set to the controller name which manages
the resource

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-04-07 06:55:09 -07:00
Andrey Smirnov
690eb20e97 chore: update blockdevice library for PMBR bootable fix
See https://github.com/talos-systems/go-blockdevice/pull/33

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-04-06 06:14:56 -07:00
Andrey Smirnov
39ae0415e9 chore: bump dependencies via dependabot
See #3431 #3432 #3433 #3434

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-04-05 06:16:24 -07:00
Alexey Palazhchenko
a9451f5712 feat: update Kubernetes to 1.21.0-beta.1
See CHANGELOG:
https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.21.md

Refs #3329.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-03-30 03:07:03 -07:00
Andrey Smirnov
a6106815b7 chore: bump dependencies via dependabot
See #3386 #3387 #3388

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-29 06:38:55 -07:00
Andrey Smirnov
966caf7a67 chore: remove unused module replace directives
They were required a long time ago; it doesn't look like we need them now.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-25 08:16:25 -07:00
Alexey Palazhchenko
ed272e604e feat: update Kubernetes to 1.21.0-beta.0
See CHANGELOG:
https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.21.md

Refs #3329.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-03-24 07:36:54 -07:00
Artem Chernyshev
6ffabe5169 feat: add ability to find disk by disk properties
Fixes: https://github.com/talos-systems/talos/issues/3323

Not exactly matching with udevd generated `by-<id>` symlinks, but should
provide sufficient amount of property selectors to be able to pick
specific disks for any kind of disk: sd card, hdd, ssd, nvme.
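
A property selector of this kind can be sketched as follows. The property names (`type`, `model`, `size`) and the `Disk` type are hypothetical and chosen only for illustration; the actual selector fields are not listed in this commit message:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Disk:
    # Hypothetical property set for illustration.
    path: str
    type: str   # e.g. "sd", "hdd", "ssd", "nvme"
    model: str
    size: int   # bytes


def matches(disk, selector):
    """A disk matches when every property named in the selector agrees."""
    for key, want in selector.items():
        got = getattr(disk, key)
        if isinstance(want, str) and isinstance(got, str):
            # Compare strings case-insensitively.
            if got.lower() != want.lower():
                return False
        elif got != want:
            return False
    return True


def find_disks(disks, selector):
    """Return every disk that satisfies all selector properties."""
    return [disk for disk in disks if matches(disk, selector)]
```

An empty selector matches every disk, so a caller would normally require at least one property before acting on the result.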

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-03-23 14:23:02 -07:00
Andrey Smirnov
dc294db16c chore: bump dependencies via dependabot
PRs #3336 #3337 #3338 #3339

Also bump proto tools via talos-systems/tools#133

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-22 13:58:08 -07:00
Andrey Smirnov
f0512dfce9 feat: update Kubernetes to 1.20.5
See CHANGELOG:
https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md#changelog-since-v1204

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-19 03:14:46 -07:00
Alexey Palazhchenko
08271ba931 chore: use Go 1.16 language version
It affects some language features and go subcommands.
https://golang.org/ref/mod#go-mod-file-go

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-03-17 06:20:39 -07:00
Artem Chernyshev
e31790f6f5 fix: properly format spec comments in the resources
`os-runtime` now writes `yaml` block as raw yaml bytes instead of
decoding it into `yaml.Node` and encoding that `yaml.Node` back to YAML.
The reason is that `go-yaml` comments decoder can't really handle
comment placement properly, so it messes up indents here and there.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-03-15 14:21:57 -07:00
Andrey Smirnov
d4d77882e3 chore: update dependencies via dependabot
See #3301 #3302

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-15 06:12:03 -07:00
Andrey Smirnov
f4ca6e9a6e feat: update containerd to version 1.4.4
See https://github.com/containerd/containerd/releases/tag/v1.4.4

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-05 11:00:21 -08:00
Andrey Smirnov
db3785b930 fix: align partition start to the physical sector size
See https://github.com/talos-systems/go-blockdevice/pull/31

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-05 06:54:12 -08:00
Andrey Smirnov
49a23bbde8 chore: bump Go module dependencies
This bumps all the dependencies that can be bumped with minor fixups in
the code.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-03 18:45:12 +03:00
Andrey Smirnov
40a2e4d4fa feat: support JSON output in talosctl get, event types
This adds support for `-o json` (easier to use `jq` to query additional
data), and prints event name in `--watch` mode.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-03 06:18:14 -08:00
Andrey Smirnov
60aa011c7a feat: rename namespaces, resources, types etc
See https://github.com/talos-systems/os-runtime/pull/12 for new naming
conventions.

No functional changes.

Additionally implements printing extra columns in `talosctl get xyz`.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-03-02 13:34:15 -08:00
Andrey Smirnov
d7cdc8cc15 feat: implement simple layer 2 shared IP for CP
This adds a VIP (virtual IP) option to the network configuration of an
interface, which will allow a set of nodes to share a floating IP
address among them.  For now, this is restricted to control plane use
and only a single shared IP is supported.

Fixes #3111

Signed-off-by: Seán C McCord <ulexus@gmail.com>
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-02-26 14:14:34 -08:00
Artem Chernyshev
041620c852 feat: implement talosctl edit and patch config commands
Fixes: https://github.com/talos-systems/talos/issues/3209

Using parts of `kubectl` package to run the editor.
Also using the same approach as in `kubectl edit` command:
- add commented section to the top of the file with the description.
- if the config has errors, display validation errors in the commented
section at the top of the file.
- retry apply config until it succeeds.
- abort if no changes were detected or if the edited file is empty.

Patch currently supports jsonpatch only and can read it either from the
file or from the inline argument.
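
To make the jsonpatch format concrete, here is a minimal sketch of applying an RFC 6902 patch to a nested config. It handles only `add`, `replace` and `remove` on dictionaries, and the `machine.install.disk` path is used purely as an illustrative example; `apply_patch` is a hypothetical helper, not the code used by `talosctl`:

```python
import copy
import json


def _resolve(doc, pointer):
    """Split a JSON Pointer into (parent container, final key)."""
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON Pointer: {pointer!r}")
    # RFC 6901 unescaping: "~1" first, then "~0".
    parts = [p.replace("~1", "/").replace("~0", "~") for p in pointer[1:].split("/")]
    parent = doc
    for part in parts[:-1]:
        parent = parent[part]
    return parent, parts[-1]


def apply_patch(doc, patch):
    """Apply 'add', 'replace' and 'remove' operations to nested dicts."""
    result = copy.deepcopy(doc)  # never mutate the caller's config
    for op in patch:
        parent, key = _resolve(result, op["path"])
        if op["op"] == "add":
            parent[key] = op["value"]
        elif op["op"] == "replace":
            if key not in parent:
                raise KeyError(op["path"])
            parent[key] = op["value"]
        elif op["op"] == "remove":
            del parent[key]
        else:
            raise ValueError(f"unsupported op: {op['op']}")
    return result


# An inline patch, as it might be passed on the command line.
inline = '[{"op": "replace", "path": "/machine/install/disk", "value": "/dev/nvme0n1"}]'
patched = apply_patch({"machine": {"install": {"disk": "/dev/sda"}}}, json.loads(inline))
```

List indices and the `move`, `copy` and `test` operations are deliberately omitted to keep the sketch short.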

https://asciinema.org/a/wPawpctjoCFbJZKo2z2ATDXeC

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-02-26 02:00:20 +03:00
Andrey Smirnov
953ce643ab feat: bump etcd client library to 3.5.0-alpha.0
This version is finally using working `go.mod` files and tags, so no
more hacks with imports, and it allows us to bump the `grpc` library to
the latest version (which I also did in this PR).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-02-25 10:36:15 -08:00
Andrey Smirnov
779ac74a08 fix: improve the drain function
The critical bug (I believe) was that the drain code re-entered the eviction
loop after the wait for pod deletion returned success, effectively evicting
the pod a second time once it had been rescheduled to a different node.

Add a global timeout to prevent draining code from running forever.

Filter more pod types which shouldn't be ever drained.
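
The intended control flow can be sketched as below: each pod is evicted once, the wait for deletion never triggers another eviction, and a single deadline bounds the whole drain. `FakeCluster` is a hypothetical in-memory stand-in for the Kubernetes API, and `drain` is an illustration rather than the actual Talos code:

```python
import time


class FakeCluster:
    """In-memory stand-in for the Kubernetes API (hypothetical)."""

    def __init__(self, pods):
        self.pods = set(pods)
        self.evictions = []

    def evict(self, pod):
        self.evictions.append(pod)
        self.pods.discard(pod)

    def exists(self, pod):
        return pod in self.pods


def drain(cluster, pods, timeout=5.0, poll=0.01, skip=lambda pod: False):
    """Evict each pod once, then wait for its deletion under a global deadline."""
    deadline = time.monotonic() + timeout
    for pod in pods:
        if skip(pod):
            continue  # pod types that should never be drained
        cluster.evict(pod)
        # Wait for deletion only; a successful wait must not evict again.
        while cluster.exists(pod):
            if time.monotonic() >= deadline:
                raise TimeoutError(f"drain timed out waiting for {pod}")
            time.sleep(poll)
```

The `skip` predicate stands in for the pod filter mentioned above; which pod types are filtered is not specified here.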

Fixes #3124

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-02-25 07:02:24 -08:00
Andrey Smirnov
4e19b597ab test: add integration test with Canal CNI and reset API
Canal CNI is known to be trying to reach out to k8s control plane on pod
teardown.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-02-24 11:34:02 -08:00
Andrey Smirnov
e9fc54f6e3 feat: update Kubernetes to 1.20.3
https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md#changelog-since-v1202

Also updates pkgs for:

* talos-systems/pkgs#238 (raspberrypi-firmware update)
* talos-systems/pkgs#242 (Linux 5.10.17 + init_on_free=0)

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-02-19 05:22:34 -08:00
Artem Chernyshev
54d6a45217 feat: add state encryption support
State partition encryption support adds a new section to the machine config
and a new step to the sequencer flow, which saves the encryption
configuration object as a JSON-serialized value in the META partition.

Everything else is the same as for the ephemeral partition.
Additionally enabled state partition encryption in the disk encryption
integration tests.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-02-18 06:55:22 -08:00
Andrey Smirnov
e5bd35ae3c feat: add resource watch API + CLI
This uses API in `os-runtime` to pull the initial list of resources +
updates for resource by type.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-02-17 13:24:47 -08:00
Artem Chernyshev
02b3719df9 feat: skip filesystem for state and ephemeral partitions in the installer
The filesystem creation step is moved to a later stage: when Talos mounts
the partition for the first time.
It now checks whether the partition has a filesystem and, if it has none,
formats it right before mounting.

Additionally refactored mount options a bit:
- replaced separate options with a set of binary flags.
- implemented pre-mount and post-unmount hooks.
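
The combination of format-on-first-mount, flag-based options and hooks can be sketched as follows. All names (`MountFlags`, `Partition`, `mount`, `unmount`) and the `xfs` default are hypothetical and serve only to illustrate the flow described above:

```python
from dataclasses import dataclass, field
from enum import IntFlag
from typing import Optional


class MountFlags(IntFlag):
    """Hypothetical option bits, combined with |."""
    NONE = 0
    READ_ONLY = 1
    RESIZE = 2
    SKIP_IF_MOUNTED = 4


@dataclass
class Partition:
    label: str
    filesystem: Optional[str] = None  # None means no filesystem was detected
    mounted: bool = False
    log: list = field(default_factory=list)


def mount(part, flags=MountFlags.NONE, fstype="xfs", pre_mount=()):
    """Run pre-mount hooks, format only if needed, then mount."""
    if MountFlags.SKIP_IF_MOUNTED in flags and part.mounted:
        return
    for hook in pre_mount:
        hook(part)
    if part.filesystem is None:
        # First mount: the installer no longer creates the filesystem.
        part.filesystem = fstype
        part.log.append("format")
    part.mounted = True
    part.log.append("mount")


def unmount(part, post_unmount=()):
    """Unmount, then run post-unmount hooks."""
    part.mounted = False
    part.log.append("unmount")
    for hook in post_unmount:
        hook(part)
```

Because the format decision depends only on whether a filesystem is present, a second mount of the same partition skips formatting.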

Also fixed typos in a couple of places and increased the timeout for `apid ready`.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-02-17 09:37:21 -08:00
Artem Chernyshev
f96548e165 refactor: extract go-cmd into a separate library
To be used in the `go-blockdevice` library.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-02-16 10:31:20 -08:00
Andrey Smirnov
7f3dca8e4c test: add support for IPv6 in talosctl cluster create
Modify provision library to support multiple IPs, CIDRs, gateways, which
can be IPv4/IPv6. Based on IP types, enable services in the cluster to
run DHCPv4/DHCPv6 in the test environment.

There's an outstanding bug with routes not being properly set up in
the cluster, so IPs are not properly routable, but DHCPv6 works and IPs
are allocated (which validates the DHCPv6 client).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-02-09 13:28:53 -08:00
Andrey Smirnov
2277ce8abe feat: move to ECDSA keys for all Kubernetes/etcd certs and keys
ECDSA keys are smaller which decreases Talos config size, they are more
efficient in terms of key generation, signing, etc., so it makes boot
performance better (and config generation as well).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-02-02 13:25:00 -08:00
Andrey Smirnov
8974b529af chore: bump dependencies (via dependabot)
See #3072, #3073, #3074, #3076, #3077, #3078

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-02-01 05:48:34 -08:00
Andrey Smirnov
18db20dbc2 fix: open blockdevices with exclusive flock for partitioning
This fixes spurious race conditions when user disks are partitioned
and formatted in `mountUserDisks` task. While this task runs, `udevd` is
running to allow various `/dev/` symlinks to be used for user disks.
At the same time `udevd` might trigger syscall `BLKRRPART` at any time
concurrently with Talos which leads to a race on kernel side when Talos
tries to update kernel partition table while kernel does it on its own
as a result of `udevd` call.

As part of the fix, `RereadPartitionTable()` calls were removed (they
trigger `BLKRRPART` and they're not needed as Talos updates partition
table on its own).

Some cleanups to make sure blockdevice is open/closed just in matching
pairs (no lingering open blockdevice instances). This is important for
`WithExclusiveLock()` calls, as it would lead to a deadlock if previous
blockdevice instance is not closed.
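
The effect of a lingering handle can be demonstrated with a small sketch that uses `flock(2)` on a temporary file instead of a real block device (Linux or another Unix is assumed). `open_exclusive` is a hypothetical helper, and the non-blocking flag is used so that the conflict surfaces as an error instead of a hang:

```python
import fcntl
import os
import tempfile


def open_exclusive(path):
    """Open path and take a non-blocking exclusive flock on it."""
    fd = os.open(path, os.O_RDWR)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        raise
    return fd


def demo():
    """Show that a lingering handle blocks a second exclusive open."""
    with tempfile.NamedTemporaryFile() as tmp:
        first = open_exclusive(tmp.name)
        try:
            try:
                second = open_exclusive(tmp.name)
            except BlockingIOError:
                blocked = True
            else:
                os.close(second)
                blocked = False
        finally:
            os.close(first)  # closing the descriptor releases the lock
        # With the first handle closed, the lock can be taken again.
        os.close(open_exclusive(tmp.name))
        return blocked
```

With a blocking `flock` call the second open would wait forever instead of failing, which is the deadlock described above.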

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-01-28 09:11:39 -08:00
Artem Chernyshev
a83af03730 refactor: update go-blockdevice and restructure disk interaction code
This refactoring is required to simplify the work to be done to support
disk encryption.

Tried to minimize amount of queries done by `blockdevice` `probe`
methods.
Instead, where we have `runtime.Runtime` we get all required blockdevices
there from blockdevice cache stored in `State().Machine().Disk()`.
This opens a way to store encryption settings in the `Partition`
objects.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-01-28 17:42:09 +03:00
Andrey Smirnov
0aaf8fa968 feat: replace bootkube with Talos-managed control plane
Control plane components are running as static pods managed by the
kubelets.

Whole subsystem is managed via resources/controllers from os-runtime.

Many supporting changes/refactoring to enable new code paths.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-01-26 14:22:35 -08:00
Andrey Smirnov
11863dd74d feat: implement resource API in Talos
This brings in `os-runtime` package and exposes resources with first
iteration of read-only API.

Two Talos resources (and one controller) are implemented:

* legacy.Service resource tracks Talos 'service' `RUNNING` state
* config.V1Alpha1 stores current runtime config

Glue point between existing runtime and new os-runtime based runtime is
in `v1alpha2` implementation and `V1Alpha2()` sub-interfaces of existing
`Runtime`, `State`, `Controller` interfaces.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-01-19 11:45:46 -08:00
Andrey Smirnov
d71ac4c4ff feat: update Kubernetes to 1.20.2
Minor point release, official changelog:

https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-01-15 09:06:18 -08:00
Artem Chernyshev
9883d0af19 feat: support Wireguard networking
This is the first iteration of Wireguard network support.
What was done:
- kernel was updated to enable Wireguard kernel module.
- changed networkd to support creating Wireguard device type.
- used wgctrl to configure wireguard.
- updated `talosctl cluster create` to support generating Wireguard
network configuration automatically by just specifying the network cidr.
- added docs about Wireguard support/how to use it.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2021-01-14 15:51:14 +03:00
Alexey Palazhchenko
275ca76c5b chore: update protobuf, grpc-go, prototool
To stay current.

Signed-off-by: Alexey Palazhchenko <alexey.palazhchenko@gmail.com>
2021-01-11 08:52:58 -08:00
Andrey Smirnov
d19486afaa fix: allow 'console' argument in kernel args to be always overridden
Fixes #3011

See also https://github.com/talos-systems/go-procfs/pull/8

We don't want to allow all the kernel args to be overridden, as this
might compromise KSPP, but we would rather allow some args to be
overridden explicitly.
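
A minimal sketch of such an allow-list, assuming kernel args are held as a mapping from key to values. `OVERRIDABLE` and `merge_kernel_args` are hypothetical names, and the default args in the usage note are only illustrative:

```python
OVERRIDABLE = {"console"}  # assumption: only these keys may be replaced


def merge_kernel_args(defaults, user):
    """Merge user args into defaults; only allow-listed keys are overridden."""
    merged = {key: list(values) for key, values in defaults.items()}
    for key, values in user.items():
        if key in merged and key not in OVERRIDABLE:
            continue  # keep the hardened default
        merged[key] = list(values)
    return merged
```

For example, with defaults `{"console": ["tty0"], "pti": ["on"]}` and user args `{"console": ["ttyS0"], "pti": ["off"]}`, the console is replaced while `pti` keeps its default.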

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2021-01-08 08:08:34 -08:00
Andrey Smirnov
a8dd2ff30d fix: checkpoint controller-manager and scheduler
Default manifests created by bootkube so far were only enabling
pod-checkpointer for kube-apiserver. This seems to cause issues in the
single-node control plane scenario: without the scheduler and
controller-manager, the node might fall into the `NodeAffinity` state.

See https://github.com/talos-systems/bootkube-plugin/pull/23

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-12-28 11:53:17 -08:00
Artem Chernyshev
7b6c4bcb1f refactor: define default kernel flags in machinery instead of procfs
That change should make Talos updates more straightforward in any
projects that depend on Talos.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2020-12-24 06:50:53 -08:00
Artem Chernyshev
47fb7d26e0 fix: use SetAll instead of AppendAll when building kernel args
SBC should always overwrite default kernel params.
Otherwise we will always get duplicate values for some of them.
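
The difference between the two behaviours can be sketched on a plain list of `key=value` strings. The function names mirror `SetAll` and `AppendAll`, but the bodies are an assumption about their semantics, not the library's implementation:

```python
def _key(arg):
    """Return the part of a kernel arg before the first '='."""
    return arg.split("=", 1)[0]


def append_all(cmdline, args):
    """Append every argument, keeping whatever is already present."""
    return list(cmdline) + list(args)


def set_all(cmdline, args):
    """Drop existing values for each key in args, then add the new ones."""
    keys = {_key(arg) for arg in args}
    kept = [arg for arg in cmdline if _key(arg) not in keys]
    return kept + list(args)
```

Appending a board-specific `console=` to defaults that already contain one leaves two `console=` entries, while the set variant leaves exactly one.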

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2020-12-23 09:09:13 -08:00
Andrey Smirnov
b1d4814308 feat: update Kubernetes to 1.20.1
See https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-12-21 23:52:29 +03:00
Andrey Smirnov
f90aa613ac fix: don't overwrite PMBR
See https://github.com/talos-systems/go-blockdevice/pull/24

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-12-18 10:21:37 -08:00
Andrey Smirnov
ed42d4a42a fix: bump blockdevice library for 2nd partition entries copy fix
See https://github.com/talos-systems/go-blockdevice/pull/23

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-12-18 09:15:21 -08:00
Andrey Smirnov
80184393bc feat: update kernel to 5.9.13, new KSPP requirements
Pulls in following changes:

* https://github.com/talos-systems/toolchain/pull/20
* https://github.com/talos-systems/tools/pull/116
* https://github.com/talos-systems/pkgs/pull/214
* https://github.com/talos-systems/pkgs/pull/215
* https://github.com/talos-systems/pkgs/pull/216
* https://github.com/talos-systems/pkgs/pull/217
* https://github.com/talos-systems/go-procfs/pull/4

New empty amd64 images for u-boot & rpi-firmware reduce the size of
amd64 installer image.

For backwards compatibility QEMU provisioner still injects "legacy" KSPP
kernel args into initial boot environment.

Installer correctly upgrades KSPP options when moving from one version
of Talos to another.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-12-10 12:41:58 -08:00
Andrey Smirnov
872e792dbc feat: update Kubernetes to 1.20.0
Official K8s release matching Talos 0.8.0.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-12-09 06:11:48 -08:00
Andrey Smirnov
28d2270067 fix: make reset work again
This bumps go-blockdevice library for the fix
https://github.com/talos-systems/go-blockdevice/pull/22

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-12-04 19:27:41 +03:00
Artem Chernyshev
c7062e3f4d feat: make GenerateConfiguration accept current time as a parameter
If the node time is out of sync, it can generate incorrect
configuration. Maintenance mode does not allow us to start ntp,
because there is no containerd.

By providing the current UTC time of the machine where the talosctl client
is running, it is possible to force GenerateConfiguration to use the correct
time.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2020-12-03 08:28:11 -08:00
Spencer Smith
ed31056d91 feat: introduce configpatcher package in machinery
This PR moves the configpatcher as a package under machinery. It also
reworks the existing function to specify that it's explicitly for JSON
6902 patching so we can add more patch types if desired later on.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-12-02 20:49:38 -05:00
Andrey Smirnov
621968977e feat: update kubernetes to 1.20.0-rc.0
Talos 0.8 is going to ship with K8s 1.20.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-12-02 10:50:58 -08:00
Andrey Smirnov
3951898c76 chore: update module dependencies
Generic bump of dependencies.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-12-01 10:04:02 -08:00
Andrey Smirnov
08c84fe678 feat: stop including K8s version by default in talosctl gen config
Default image versions are kept as commented out examples.

This allows a better experience when generating config on amd64 for arm64
servers (e.g. for RPi).

Without embedded values in the config, Talos is going to use the
defaults which work better.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-30 12:52:53 -08:00
Andrey Smirnov
07f4ed7fb4 feat: upgrade etcd to 3.4.14
No major fixes, just keeping version up to date.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-26 09:14:41 -08:00
Andrey Smirnov
1eac88e470 feat: add support for installing to SBCs
This introduces the notion of a "board" in Talos. A board is an interface that is capable
of modifying the installation in specific ways for a given SBC. This also adds support for the
libretech_all_h3_cc_h5.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-11-26 07:18:25 -08:00
Andrey Smirnov
28ba6e416e feat: update Kubernetes to v1.20.0-beta.2
Talos 0.8 is going to ship with K8s 1.20.x.

Changes to support new `control-plane` label,
upgrade-k8s supports automated fixups for 1.20.

See also: https://github.com/talos-systems/bootkube-plugin/pull/22

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-25 06:39:14 -08:00
Andrey Smirnov
b5173477c8 fix: bump blockdevice library for mmcblk part name fix
See https://github.com/talos-systems/go-blockdevice/pull/17

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-20 05:36:14 -08:00
Artem Chernyshev
b6874ee82a feat: add TUI based talos interactive installer
This is the initial commit of the installer.
What's done:
- verifying node availability before starting any operations.
- gathering information about disks on the machine.
- allows setting: install disk, hostname, machine type, installer image,
  kubernetes version, dns domain, cluster-name.
- dumps/merges talosconfig to a file after applying configuration.

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2020-11-18 12:34:15 -08:00
Andrey Smirnov
83bb1afcb6 feat: drop to maintenance mode in cloud platforms if userdata is missing
On first boot of Talos, if userdata is missing, Talos is going to drop
into maintenance mode, which allows uploading config to the server via
`talosctl apply-config` command.

See also: https://github.com/talos-systems/go-retry/pull/4

Fixes #2780

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-16 11:03:26 -08:00
Andrey Smirnov
df6ad3fa80 feat: upgrade Kubernetes default version to 1.19.4
k8s.io modules don't have 1.19.4 tag yet :(

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-12 08:51:04 -08:00
Andrey Smirnov
58df555580 feat: add example command in maintenance, enforce cert fingerprint
Server in maintenance mode now prints certificate fingerprint and
provides sample talosctl command to upload config to the node.

`talosctl` can optionally enforce server certificate fingerprint.
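
Fingerprint enforcement can be sketched as below. The fingerprint format (base64 of the SHA-256 digest of the DER-encoded certificate) is an assumption made for the example, and both function names are hypothetical; the format actually printed by the server is not stated here:

```python
import base64
import hashlib
import hmac


def fingerprint(cert_der):
    """Base64 of the SHA-256 digest of the DER certificate (assumed format)."""
    digest = hashlib.sha256(cert_der).digest()
    return base64.b64encode(digest).decode("ascii")


def verify_fingerprint(cert_der, expected):
    """Constant-time comparison against the fingerprint the operator copied."""
    return hmac.compare_digest(fingerprint(cert_der), expected)
```

A client would run this check on the certificate presented during the TLS handshake and refuse to send the config when it fails.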

See also https://github.com/talos-systems/crypto/pull/4

Fixes #2753

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-12 07:36:18 -08:00
Andrew Rynhard
71321214a1 feat: add storage API
This is the initial implementation of a storage API.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-11-11 10:12:25 -08:00
Andrey Smirnov
4c42e22dbf chore: bump version of x/net module
Fixes #2746

This should cover the `x/net/http2` fixes which were backported in Go
1.15.4.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-11-10 07:09:19 -08:00
Andrew Rynhard
8e40ea46e0 fix: address issues in webconfig
Fixes TLS and invalid redirects.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-11-01 12:03:55 -08:00
Andrey Smirnov
350d75eb46 feat: build talosctl-cni-bundle, use it in talosctl for QEMU
This builds a bundle with CNI plugins for talosctl which is
automatically downloaded by `talosctl` if CNI plugins are missing.

CNI directories are moved by default to the `~/.talos/cni` path.

Also add a bunch of pre-flight checks to the QEMU provisioner to make it
easier to bootstrap the Talos QEMU cluster.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-30 16:30:37 -07:00
Andrey Smirnov
d9b74f0cc6 feat: skip resizing ephemeral partition if not required
This skips writing partition table if partition doesn't have to be
resized (already resized or max size from the beginning).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-30 14:31:10 -07:00
Andrey Smirnov
a32c0a78f7 docs: improve the config reference documentation
Lots of small changes, changing layout, adding back references,
propagating examples, etc.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-29 18:41:46 -07:00
Andrey Smirnov
bc9e0c0dba fix: re-implement upgrade (install) with preserve
For 0.6 -> 0.7 upgrade, in any case config.yaml is preserved and moved
from `/boot` to `/system/state`.

For single node upgrade, `EPHEMERAL` partition is not touched and other
partitions are re-created as needed.

Bump provision tests to 0.6/0.7 upgrades as we get closer to the new
release.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-28 07:25:26 -07:00
Andrey Smirnov
41f92b5d35 feat: wipe disks faster in the installer
See https://github.com/talos-systems/talos/pull/2663 for details.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-22 08:32:29 -07:00
Spencer Smith
8b5406c889 chore: move to newer release of rtnetlink with fn args
This PR makes use of a new merge into the upstream rtnetlink library
that introduces functional args for adding routes.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-10-22 06:56:22 -07:00
Andrey Smirnov
16b6d344de chore: bump module dependencies in go.mod
This covers most of the packages except for those we have to keep on
hold (etcd and grpc because of etcd).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-20 08:09:42 -07:00
Andrey Smirnov
56f1ee37fd feat: upgrade Kubernetes to 1.19.3
Just minor release bump.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-20 05:12:32 -07:00
Andrey Smirnov
4adb613f66 refactor: bring more control to install.Manifest execution
This unifies more code paths under the control of `install.Manifest` vs.
being split across the installer and manifest code.

There should be no functional changes now.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-20 01:08:14 -07:00
Andrey Smirnov
018086d1fa refactor: extract blockdevice library
Library `blockdevice` was extracted as `talos-systems/go-blockdevice`,
this PR finalizes the move by removing Talos copy of it.

Some functions around `mkfs`/`growfs` were extracted as `makefs`
package, as they depend on `cmd` package.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-05 11:18:43 -07:00
Andrey Smirnov
16eb47a1a3 feat: use kubeconfig merge in talosctl kubeconfig by default
Kubeconfig merge was completely rewritten to be "smarter":

* automatically apply renames done at previous stages to avoid asking
over and over again (in general should ask just once)

* skip checks if parts of the config match exactly

* allow overwrite as an option

* flexible way to control the output

* activating context in the end

* custom merged context name

Fixes #2578

Fixes #2587

Fixes #2577

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-10-03 05:36:15 -07:00
Seán C McCord
ff92d2a14b feat: add ApplyConfiguration API
Adds the ability to apply (replace) an existing node configuration with
a new one via the Machine API.

Fixes #2345

Signed-off-by: Seán C McCord <ulexus@gmail.com>
2020-09-29 14:44:06 -07:00
Andrey Smirnov
8236822c90 fix: retry image pulling, stop on 404, no duplicate pulls
This uses go-retry feature
(https://github.com/talos-systems/go-retry/pull/3) to print errors being
retried.

If image is not found in the index, abort retries immediately.

Don't pull installer image twice (if already pulled by the validation
code before).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-09-22 07:07:45 -07:00
Seán C McCord
3db5f72b3b fix: validate cluster endpoint
Validates cluster endpoint from v1alpha1 config.

Fixes #2101

Signed-off-by: Seán C McCord <ulexus@gmail.com>
2020-09-17 16:39:14 -07:00
Andrey Smirnov
14ad0674d3 feat: update Flannel to 0.12, support for arm64
This updates bootkube-plugin with PR
https://github.com/talos-systems/bootkube-plugin/pull/21.

Flannel image is chosen based on host architecture.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-09-15 08:04:29 -07:00
Andrey Smirnov
b4341d8780 feat: upgrade kubernetes to 1.19.1
Release notes: https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md#v1191

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-09-11 06:58:12 -07:00
Andrey Smirnov
788cd15c29 test: add e2e test to the provision (upgrade) tests
Add sonobuoy runner code with log fetching on failure. Use hand-picked
set of e2e tests to run: verify basic pod functionality, verify service
connectivity.

Add option `--run-e2e` to the `talosctl health` to run quick e2e test to
verify cluster health.

Add option to run provision tests with custom CNI, run one track of
provision tests with Cilium.

Bump Cilium to 1.8.2.

Talos 0.6 won't uncordon node automatically after upgrade from 0.5, as
0.5 doesn't set the annotation. Work around that in upgrade tests.

Bump upgrade test version to 0.6.0 release.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-09-08 13:26:31 -07:00
Andrew Rynhard
2b84cf1967 feat: upgrade runc to v1.0.0-rc92
This brings in the latest stable version of runc.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-09-04 13:09:37 -07:00
Andrew Rynhard
6a85a47ffa feat: upgrade containerd to v1.4.0
This brings in the latest stable containerd.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-09-04 02:59:08 -07:00
Andrey Smirnov
f6ecf000c9 refactor: extract packages loadbalancer and retry
This removes in-tree packages in favor of:

* github.com/talos-systems/go-retry
* github.com/talos-systems/go-loadbalancer

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-09-02 13:46:22 -07:00
Andrey Smirnov
ac4ab11d36 chore: update k8s modules to 1.19 final version
I think we missed that when updating K8s for the final version.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-09-02 06:25:28 -07:00
Andrey Smirnov
6a7cc02648 fix: handle bootkube recover correctly, support recovery from etcd
Bootkube recover process (and `talosctl recover`) was actually
regenerating assets each time `recover` runs forcing control plane to be
at the state when cluster got created. This PR fixes that by running
recover process correctly.

Recovery via etcd was fixed to handle encrypted etcd data:
it follows the way `apiserver` handles encryption at rest, and as at
the moment AES CBC is the only supported encryption method, code simply
follows the same path.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-18 14:24:14 -07:00
Andrey Smirnov
7875e9499f chore: re-import talos-systems/pkg/crypto/tls
See also https://github.com/talos-systems/crypto/pull/2

This should break dependency of `pkg/client` on `pkg/grpc`.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-17 08:06:38 -07:00
Andrey Smirnov
2697b99b7d refactor: extract pkg/net as github.com/talos-systems/net
This extracts common package as new module/repository.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-14 11:04:50 -07:00
Andrey Smirnov
52c5911fcd chore: extract pkg/crypto as external module
Package `pkg/crypto` was extracted as `github.com/talos-systems/crypto`
repository and Go module.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-14 06:33:30 -07:00
Andrey Smirnov
7474b8ba52 feat: upgrade etcd to 3.4.10
This upgrades etcd to latest v3.4.x version as smooth upgrade from
version 3.3.22 in 0.6.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-08-13 07:33:51 -07:00
Andrew Rynhard
849959fefc feat: add dynamic config decoder
This adds the ability to dynamically decode multi-doc YAML files.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-07-30 08:07:14 -07:00
Andrey Smirnov
3926442704 feat: taint master nodes with NoSchedule taint
Fixes #2350

This also brings in a fix for `coredns` tolerations from
https://github.com/talos-systems/bootkube-plugin/pull/19.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-29 14:02:41 -07:00
Andrew Rynhard
1b491d0a66 feat: upgrade Kubernetes to v1.19.0-rc.3
This brings in the latest version of Kubernetes.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-07-29 11:04:50 -07:00
Andrey Smirnov
f23c9111d1 feat: upgrade etcd to 3.3.22 version
Latest version in 3.3 branch is 3.3.23, but it's broken, so we use previous
stable version.

Switch to official etcd gcr.io registry, early support for arm64.

Move `etcd` service to run in system containerd.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-21 09:44:43 -07:00
Andrey Smirnov
4cd6e7e200 refactor: use humanize.Bytes everywhere
This removes dependency on `bytefmt` package.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-20 07:26:33 -07:00
Andrey Smirnov
1a0e1bc393 chore: update module dependencies
Fixes #2316

Simply update dependencies we don't track on version level to be
compatible with Talos components (like etcd or k8s).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-16 12:00:50 -07:00
Andrew Rynhard
0617a10027 feat: upgrade Kubernetes to v1.19.0-rc.0
This brings in the latest version of Kubernetes.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-07-14 13:07:18 -07:00
Artem Chernyshev
8fc352ec4f feat: merge mode in talosctl kubeconfig
New flag `-m` will enable merge mechanism in `talosctl kubeconfig`

Command examples:

```
talosctl kubeconfig -m

talosctl kubeconfig -m ~/.kube/config
```

Signed-off-by: Artem Chernyshev <artem.0xD2@gmail.com>
2020-07-10 12:39:30 -07:00
Andrey Smirnov
4cc074cdba feat: implement API access to event history
1. Add [xid-based](https://github.com/rs/xid) event IDs. Xids
are sortable and unique enough. Xids also encode event publishing
time with a second precision.

2. Add three ways to look back into event history: based on number of
events, on time and ID. Lookup via ID might be used to restart event
polling in case of broken API connection from the same moment.

3. Reimplement core event buffer with positions which are always
incremented instead of generation+index, this implementation is much
more simple (idea from circular buffer).

4. By default, Events API works the same - it shows no history and
starts streaming new events only.
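
The idea behind point 3 can be sketched as below: a ring buffer addressed by a write position that only ever grows, so "the last N events" is plain arithmetic on positions. The `ring` type and its methods are hypothetical names for illustration and are not the code in this PR.

```go
package main

import "fmt"

// ring keeps the most recent len(buf) events.
// pos counts every event ever published and is never decremented.
type ring struct {
	buf []string
	pos int64
}

func newRing(capacity int) *ring {
	return &ring{buf: make([]string, capacity)}
}

// publish stores the event in the slot derived from the current position.
func (r *ring) publish(ev string) {
	r.buf[r.pos%int64(len(r.buf))] = ev
	r.pos++
}

// tail returns up to n of the most recent events, oldest first.
func (r *ring) tail(n int) []string {
	if n <= 0 {
		return nil
	}
	start := r.pos - int64(n)
	// Events older than one full buffer have been overwritten.
	if oldest := r.pos - int64(len(r.buf)); start < oldest {
		start = oldest
	}
	if start < 0 {
		start = 0
	}
	out := make([]string, 0, r.pos-start)
	for p := start; p < r.pos; p++ {
		out = append(out, r.buf[p%int64(len(r.buf))])
	}
	return out
}

func main() {
	r := newRing(3)
	for _, ev := range []string{"a", "b", "c", "d", "e"} {
		r.publish(ev)
	}
	fmt.Println(r.tail(2))  // two newest events
	fmt.Println(r.tail(10)) // capped at what the buffer still holds
}
```

Because positions never wrap, there is no generation counter to keep in sync with an index; only the slot lookup uses the modulo.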

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-07-08 10:54:50 -07:00
Andrew Rynhard
a5a2d959ed feat: upgrade runc to v1.0.0-rc90
This updates runc to the same version vendored by containerd.

Signed-off-by: Andrew Rynhard <andrew@rynhard.io>
2020-07-02 13:19:33 -07:00
Spencer Smith
90115bb3ef feat: update kubernetes to 1.19.0-beta.1
This PR brings in all changes necessary to deploy kubernetes 1.19.x.

It relies on an update to our bootkube-plugin project, as well as
implementation of some Image() functions for our various control plane
components, since they are all distinct images and not just hyperkube.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-06-10 15:01:11 -04:00
Andrey Smirnov
d1a4e6ee64 feat: adjust time properly in timed via adjtime()
This should be the proper way to adjust time incrementally without causing
jumps in either direction. Time-sensitive services might be confused by
huge jumps.

This also implements timed health check based on first successful time
sync.

Fixed some random health check related issues in other services.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-06-03 09:39:48 -07:00
Andrew Rynhard
9412e2b478 fix: allow all seccomp profile names
This updates the bootkube plugin that brings in a fix that allows
any seccomp profile name to be used.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-05-19 18:48:22 -07:00
Andrew Rynhard
56d7bf19fe feat: add recovery API
This adds an API for recovering the self-hosted control plane.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-05-04 19:38:30 -07:00
Seán C McCord
c1299d3ff0 feat: allow dual-stack support with bootkube wrapper
Handle dual-stack configurations with the bootkube wrapper.  This uses
the new PodCIDRs and ServiceCIDRs `asset.Config` parameters in bootkube.
It also relies on the bootkube-plugin features for manipulating
kube-proxy config and installing the dual-stack DNS service.

Fixes #2055

Signed-off-by: Seán C McCord <ulexus@gmail.com>
2020-04-28 20:10:58 -07:00
Andrew Rynhard
49307d554d refactor: improve machined
This is a rewrite of machined. It addresses some of the limitations and
complexity in the implementation. This introduces the idea of a
controller. A controller is responsible for managing the runtime, the
sequencer, and a new state type introduced in this PR.

A few highlights are:

- no more event bus
- functional approach to tasks (no more types defined for each task)
  - the task function definition now offers a lot more context, like
    access to raw API requests, the current sequence, a logger, the new
    state interface, and the runtime interface.
- no more panics to handle reboots
- additional initialize and reboot sequences
- graceful gRPC server shutdown on critical errors
- config is now stored at install time to avoid having to download it at
  install time and at boot time
- upgrades now use the local config instead of downloading it
- the upgrade API's preserve option takes precedence over the config's
  install force option

Additionally, this pulls various packages in under machined to make the
code easier to navigate.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-04-28 08:20:55 -07:00
Andrew Rynhard
d34e9f0984 fix: pass dev path to mkfs
This fixes a bug caused by a missing device argument to `mkfs.xfs`.
Without a device, `mkfs.xfs` will error out. Additionally, this ensures
that the installer container is started with the `kmsg` writer that
ensures logs are formatted correctly for `/dev/kmsg`. Without this we
lose a lot of the logs output by the container, one of them being any
error from `mkfs.xfs`

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-04-21 16:30:35 -07:00
Andrew Rynhard
3791fb5cbc refactor: use upstream bootkube
This moves us off of our bootkube fork and makes use of our bootkube
plugin.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-04-21 10:27:01 -07:00
Spencer Smith
3a4eaeeef0 feat: upgrade kubernetes to 1.18
This PR will pull in the latest release of k8s 1.18 so we can start
validating it through our test suite.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-03-26 14:59:43 -04:00
Andrew Rynhard
eba80b453f feat: update bootkube
This brings in the latest changes from our fork of bootkube. One thing
to note is a fix that stops the pod controller cache object.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-03-24 15:53:08 -07:00
Spencer Smith
3485ea9f09 fix: update k8s to 1.17.3
This PR will update k8s to v1.17.3 to address CVEs mentioned in https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!topic/kubernetes-security-announce/2UOlsba2g0s

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-03-23 17:08:52 -07:00
Spencer Smith
2f4ccfda9a fix: respect dns domain from machine config
BREAKING CHANGE: This PR fixes a bug where we were only passing `cluster.local` to the
kubelet configuration. It will also pull in a new version of the
bootkube fork to ensure that custom domains get propagated down to the
API Server certs, as well as the CoreDNS configuration for a cluster.

Existing users should be aware that, if they were previously trying to
use this option in machine configs, an upgrade may break
their cluster. It will update a kubelet flag with the new domain, but
CoreDNS and API Server certs will not change since bootkube has already
run. One option may be to change these values manually inside the
Kubernetes cluster. However, it may prove easier to rebuild the cluster
if necessary.

Additionally, this PR also exposes a flag to `osctl config generate`
to allow tweaking this domain value as well.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-03-20 12:28:17 -04:00
Andrey Smirnov
7e136fee67 chore: update Firecracker Go SDK to the official release
This release includes all the fixes we upstreamed before.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-03-18 01:11:46 +03:00
Andrey Smirnov
d5d3035c8c test: enable upgrade tests 0.4.x -> latest
With the fix #1904, it's now possible to upgrade 0.4.x with
`machine.File` extra files (caused by registry mirror for
registry.ci.svc).

Bump resources for upgrade tests in an attempt to speed them up.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-02-26 00:09:32 +03:00
Andrew Rynhard
64b5b32732 refactor: use go-procfs
This makes use of the external procfs package that is based on the
package we are removing here.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-02-19 15:58:57 -08:00
Andrew Rynhard
63ca83a02c feat: support sending machine info
This allows users to specify well known query parameters in `talos.config`.
The only supported parameter in this change is `uuid`. This will send
the node's UUID determined from SMBIOS along with the request for the
config.
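
As a rough sketch of the behaviour described above, the config URL could be rewritten like this before the download. It assumes that the presence of a `uuid` key in the `talos.config` URL is what triggers the substitution; the function name `withMachineInfo`, the example host, and the UUID value are all hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
)

// withMachineInfo fills in the uuid query parameter, but only if the
// URL already declares it (assumed opt-in behaviour).
func withMachineInfo(raw, uuid string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	q := u.Query()
	if _, ok := q["uuid"]; ok {
		q.Set("uuid", uuid)
		u.RawQuery = q.Encode()
	}
	return u.String(), nil
}

func main() {
	// The UUID would come from SMBIOS on a real node; it is hardcoded here.
	out, err := withMachineInfo("http://example.com/config?uuid=", "abc-123")
	fmt.Println(out, err)

	// URLs without the parameter are left untouched.
	out, err = withMachineInfo("http://example.com/config", "abc-123")
	fmt.Println(out, err)
}
```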

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-02-19 13:15:28 -08:00
Andrew Rynhard
fe7847e0b8 feat: add reboot flag to reset API
This adds the ability to automatically reboot a machine after a reset.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-02-19 05:10:58 -08:00
Andrey Smirnov
33332f4c74 chore: support bootloader emulation in firecracker provisioner
Firecracker launcher tries to open VM disk image before every boot,
parses partition table, finds boot partition, tries to read it as FAT32
filesystem, extracts uncompressed kernel from `bzImage` (firecracker
doesn't support `bzImage` yet), extracts initramfs and passes it to
firecracker binary.

This flow allows for extended tests, e.g. testing installer, upgrade and
downgrade tests, etc.

Bootloader emulation is disabled by default for now, can be enabled via
`--with-bootloader-emulation` flag to `osctl cluster create`.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-02-13 23:21:37 +03:00
Andrey Smirnov
76c2038b13 chore: implement loadbalancer for firecracker provisioner
This PR contains generic simple TCP loadbalancer code, and glue code for
firecracker provisioner to use this loadbalancer.

K8s control plane is passed through the load balancer, and Talos API is
passed only to the init node (for now, as some APIs, including
kubeconfig, don't work with non-init node).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-02-13 23:07:13 +03:00
Andrey Smirnov
4950f35440 chore: use upstream version of Firecracker Go SDK
With all our PRs merged, we can switch back to upstream version. No tag
yet, so we have to follow `master` for now.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-02-04 08:59:39 -08:00
Spencer Smith
e27b0cbfdb chore: update bootkube
This PR updates the talos branch of bootkube to add extraArgs to
bootstrap controlplane components as well.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-01-31 11:38:34 -08:00
Spencer Smith
ff393f8ae3 chore: update bootkube fork
This PR will pull in the latest of our bootkube fork and fix a bug with
extraArgs.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-01-31 09:39:43 -05:00
Andrey Smirnov
fae5e6915d chore: rework firecracker code around upstream Go SDK + PRs
This removes use of private fork with custom `ip=` kernel argument
handling and switches fully to upstream version of it.

Firecracker Go SDK version is `master` + following PRs:

* https://github.com/firecracker-microvm/firecracker-go-sdk/pull/167
* https://github.com/firecracker-microvm/firecracker-go-sdk/pull/177
* https://github.com/firecracker-microvm/firecracker-go-sdk/pull/178

MTU handling support was implemented as well.

Changes:

* hostname to each node is passed via `talos.hostname=` kernel arg
* IP configuration is generated by SDK from CNI result
* fixed bugs with wrong netmask
* nameservers & MTU is passed via Talos config

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-01-29 02:35:15 +03:00
Spencer Smith
aabd46e651 fix: re-enable control plane flags
This PR aims to fix the ability to pass extra flags to control plane
components. This will close #1523

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-01-23 14:28:02 -05:00
Andrew Rynhard
f3623d22b0 refactor: use tls.Config as client credentials
The `client.Creds` struct was not used very often, and made the
`client.NewClient` function impossible to use in combination with the
`RemoteRenewingFileCertificateProvider`. This modifies
`client.NewClient` to accept a `tls.Config` instead of `client.Creds`,
allowing for the use of `RemoteRenewingFileCertificateProvider` with
`client.NewClient`.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-01-21 17:10:07 -08:00
Spencer Smith
1368bfa451 chore: update bootkube config to include cluster name
This PR will add the new cluster name field to our bootkube options.
This allows for the generated kubeconfig to include the context-name for
the default context.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-01-21 16:56:57 -05:00
Andrey Smirnov
2bf8540855 test: provision Talos clusters via Firecracker VMs
This is initial PR to push the initial code, it has several known
problems which are going to be addressed in follow-up PRs:

1. there's no "cluster destroy", so the only way to stop the VMs is to
`pkill firecracker`

2. provisioner creates state in `/tmp` and never deletes it, that is
required to keep cluster running when `osctl cluster create` finishes

3. doesn't run any controller process around firecracker to support
reboots/CNI cleanup (vethxyz interfaces are lingering on the host as
they're never cleaned up)

The plan is to create some structure in `~/.talos` to manage cluster
state, e.g. `~/.talos/clusters/<name>` which will contain all the
required files (disk images, file sockets, VM logs, etc.). This
directory structure will also work as a way to detect running clusters
and clean them up.

For point number 3, `osctl cluster create` is going to exec lightweight
process to control the firecracker VM process and to simulate VM reboots
if firecracker finishes cleanly (when VM reboots).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-01-16 00:27:08 +03:00
Brad Beam
95666900a7 fix: Update bootkube to include node ready check
This ensures bootkube waits until all pods and nodes are ready before tearing
down the bootstrap control plane.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2020-01-14 16:51:53 -06:00
Andrew Rynhard
5cac4f5f39 fix: set kube-dns labels
This updates the CoreDNS to use the 'kube-dns' naming convention. This
naming convention is used throughout the Kubernetes documentation. This
also fixes the kube-dns service. The label selector was wrong,
breaking cluster DNS.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-01-09 18:29:35 -08:00
Andrew Rynhard
79878c1d8d feat: enable DynamicKubeletConfiguration
This moves to using the KubeletConfiguration instead of flags to the
kubelet. It also enables DynamicKubeletConfiguration, which allows users
to configure kubelets using a ConfigMap.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-01-08 16:06:44 -08:00
Brad Beam
0742e5245a feat: Upgrade bootkube
This brings in the changes to run controller manager and scheduler as
a daemonset.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2020-01-08 15:41:37 -08:00
Andrey Smirnov
0081ac5fac refactor: extract Talos cluster provisioner as common code
This extracts Docker Talos cluster provisioner as common code
which might be shared between `osctl cluster` and integration-test.

There should be almost no functional changes.

As proof of concept, abstract cluster readiness checks were implemented
based on provisioned cluster state. It implements same checks as
`basic-integration.sh` in pure Go via Talos/K8s clients.

`conditions` package was promoted from machined-internal to
`internal/pkg` as it is used to run the checks.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-12-27 12:14:19 -08:00
Brad Beam
da88d7bcb3 fix(networkd): Make better route scoping decisions
This brings in an updated library along with some tweaks on our side to allow for
better decision making when it comes to the scope of routes. This also fixes an
issue where multiple configuration definitions for an interface were not properly
merged and instead were overwritten.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-12-23 09:43:14 -08:00
Andrey Smirnov
ecf68ab417 chore: pull in latest version of grpc-proxy
This pulls in talos-systems/grpc-proxy#5.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-12-21 13:02:31 -08:00
Andrew Rynhard
31baa14e36 feat: add support for tftp download
This adds support for downloading the machine config over TFTP. This
will allow users to avoid having to setup an HTTP server, and use
whatever they are using for PXE.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-12-18 09:28:38 -08:00
Andrey Smirnov
c24ce2fd5f feat: humanize timestamp and size in osctl list output
Fixes #1565

Examples:

```
$ osctl list -l
MODE          SIZE(B)   LASTMOD           NAME
drwxr-xr-x    4096      Dec 17 16:37:19   .
-rwxr-xr-x    0         Dec 17 16:37:19   .dockerenv
drwxr-xr-x    4096      Dec 17 16:35:20   bin
drwxr-xr-x    4096      Dec 17 16:37:20   boot
drwxr-xr-x    5480      Dec 17 16:37:19   dev
drwxr-xr-x    4096      Dec 17 16:37:19   etc
drwxr-xr-x    4096      Dec 17 16:35:19   lib
drwxr-xr-x    4096      Dec 17 16:35:21   mnt
drwxr-xr-x    4096      Dec 17 16:39:17   opt
dr-xr-xr-x    0         Dec 17 16:37:19   proc
drwxr-x---    4096      Dec  5 06:39:44   root
drwxr-xr-x    4096      Dec 17 16:39:06   run
drwxr-xr-x    4096      Dec 17 16:35:20   sbin
dr-xr-xr-x    0         Dec 17 16:37:19   sys
dtrwxrwxrwx   4096      Dec 17 16:38:05   tmp
drwxr-xr-x    4096      Dec 17 16:35:21   usr
drwxr-xr-x    4096      Dec 17 16:37:19   var
```

```
$ osctl list -lH
MODE          SIZE(B)   LASTMOD          NAME
drwxr-xr-x    4.1 kB    18 minutes ago   .
-rwxr-xr-x    0 B       18 minutes ago   .dockerenv
drwxr-xr-x    4.1 kB    20 minutes ago   bin
drwxr-xr-x    4.1 kB    18 minutes ago   boot
drwxr-xr-x    5.5 kB    18 minutes ago   dev
drwxr-xr-x    4.1 kB    18 minutes ago   etc
drwxr-xr-x    4.1 kB    20 minutes ago   lib
drwxr-xr-x    4.1 kB    20 minutes ago   mnt
drwxr-xr-x    4.1 kB    16 minutes ago   opt
dr-xr-xr-x    0 B       18 minutes ago   proc
drwxr-x---    4.1 kB    1 week ago       root
drwxr-xr-x    4.1 kB    16 minutes ago   run
drwxr-xr-x    4.1 kB    20 minutes ago   sbin
dr-xr-xr-x    0 B       18 minutes ago   sys
dtrwxrwxrwx   4.1 kB    17 minutes ago   tmp
drwxr-xr-x    4.1 kB    20 minutes ago   usr
drwxr-xr-x    4.1 kB    18 minutes ago   var
```
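
The size column with `-H` follows SI units (powers of 1000). A small standard-library sketch of that kind of formatting is shown below; `humanBytes` is a hypothetical helper written for illustration, not the library function used by `osctl`, and its rounding for larger values may differ.

```go
package main

import "fmt"

// humanBytes formats a byte count using SI (1000-based) units.
func humanBytes(n uint64) string {
	if n < 1000 {
		return fmt.Sprintf("%d B", n)
	}
	units := []string{"kB", "MB", "GB", "TB", "PB", "EB"}
	v := float64(n) / 1000
	i := 0
	for v >= 1000 && i < len(units)-1 {
		v /= 1000
		i++
	}
	return fmt.Sprintf("%.1f %s", v, units[i])
}

func main() {
	for _, n := range []uint64{0, 4096, 5480} {
		fmt.Printf("%-6d %s\n", n, humanBytes(n))
	}
}
```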

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-12-17 23:21:28 +03:00
Andrew Rynhard
2c4c4c8d47 chore: update client-go
This updates client-go to use the new tagging from upstream Kubernetes.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-12-10 17:15:56 -08:00
Brad Beam
9584b47cd7 feat: Upgrade kubernetes to 1.17.0
Primarily doc/constant changes.

Added additional bits to `docs` target in makefile to generate osctl
docs as well as config files. Explicitly define a HOME variable so we
get consistent home directories for talosconfig variables in our docs.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-12-10 16:03:35 -08:00
Andrew Rynhard
09fbe2d9ad feat: add security hardening settings
This pulls in an update from our bootkube fork that adds security
hardening to the control plane. The following was changed:

- API server now uses an EncryptionConfig for encrypting secrets
- API server now has an audit policy
- Profiling was disabled on all control plane components
- PodSecurityPolicy is enabled
- API server TLS cipher suites were set to the recommended ciphers by CIS

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-12-09 15:26:26 -08:00
Andrew Rynhard
fa515b8117 fix: kill POD network mode pods first on upgrades
When we upgrade a node, we kill off all pods before performing a fresh
install. The issue with this is that we run the risk of killing the CNI
pod before we finish killing all other pods, leaving the CRI unable to
tear down the pod's networking. This works around that by first killing
any pods running without host networking so that the CNI can do its
job, and then removing the remaining pods.
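
The ordering can be sketched as a stable partition: pods on the pod network come first, host-network pods (which may include the CNI itself) come last. The `pod` struct and `removalOrder` function are hypothetical and only illustrate the ordering, not the CRI calls.

```go
package main

import (
	"fmt"
	"sort"
)

// pod is a hypothetical, minimal view of a running pod sandbox.
type pod struct {
	Name        string
	HostNetwork bool
}

// removalOrder returns a copy of pods with pod-network pods first and
// host-network pods last, keeping the original order within each group.
func removalOrder(pods []pod) []pod {
	out := append([]pod(nil), pods...)
	sort.SliceStable(out, func(i, j int) bool {
		return !out[i].HostNetwork && out[j].HostNetwork
	})
	return out
}

func main() {
	pods := []pod{
		{Name: "cni", HostNetwork: true},
		{Name: "app", HostNetwork: false},
		{Name: "kube-proxy", HostNetwork: true},
		{Name: "web", HostNetwork: false},
	}
	for _, p := range removalOrder(pods) {
		fmt.Println(p.Name, "hostNetwork:", p.HostNetwork)
	}
}
```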

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-12-09 13:45:31 -08:00
Spencer Smith
92b5bd9b2b feat: allow ability to specify custom CNIs
This PR will allow users to specify one or many URLs for CNI so that
they can bypass bootkube deploying flannel and bring their own. Will
close #1593

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-12-06 15:27:36 -05:00
Andrew Rynhard
66f1355b10 chore: update containerd client version
This aligns the containerd version we use as a client with the version
of the daemon.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-12-05 13:48:03 -08:00
Andrew Rynhard
21c4aa8aa6 feat: enable webhook authorization mode
This moves to using Webhook mode instead of the default AlwaysAllow for
kubelet API authorization.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-12-01 17:34:00 -08:00
Andrew Rynhard
6ab9877b72 chore: update bootkube
This brings in changes from upstream bootkube. It fixes an issue with
the pod-checkpointer that would cause the pod-checkpointer to fail if
the kubelet's read-only port were disabled. It also adds a dedicated
certificate for the API server's `kubelet-client-*` args, which will allow the
usage of the `authentication-token-webhook` flag in the kubelet.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-12-01 14:00:50 -08:00
Andrey Smirnov
5b7bea2471 feat: use grpc-proxy in apid
This replaces codegen version of apid proxying with
talos-systems/grpc-proxy based version. Proxying is transparent, it
doesn't require exact information about methods and response types. It
requires some common layout response to enhance it properly with node
metadata or errors.

There should be no significant changes to the API with the previous
version, but it's worth mentioning a few changes:

1. grpc.ClientConn is established just once per upstream (either local
service or remote apid instance).

2. When called without `-t` (`targets`), apid proxies immediately down
to local service skipping proxying to itself (as before), which results
in empty node metadata in response (before it had local node IP). Might
revert this later to proxy to itself (?).

3. Streaming APIs are now fully supported with multiple targets, but
message definition doesn't contain `ResponseMetadata`, so streaming APIs
are broken now with targets (needs a fix).

4. Errors are now returned as responses with `Error` field set in
`ResponseMetadata`, this requires client library update and `osctl` to
handle it properly.
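
As a rough illustration of item 1 (one connection per upstream), the sketch below caches a connection per target and dials only on first use. It is an assumption-laden stand-in, not the apid code: `conn` is a hypothetical placeholder for `*grpc.ClientConn`, and `connCache`, `get`, and the addresses are invented for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// conn is a hypothetical stand-in for *grpc.ClientConn.
type conn struct{ target string }

// connCache hands out one connection per upstream target.
type connCache struct {
	mu    sync.Mutex
	conns map[string]*conn
	dials int // number of times a new connection was created
}

// get returns the cached connection for target, creating it on first use.
func (c *connCache) get(target string) *conn {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.conns == nil {
		c.conns = make(map[string]*conn)
	}
	if cn, ok := c.conns[target]; ok {
		return cn
	}
	c.dials++
	cn := &conn{target: target} // a real proxy would dial the upstream here
	c.conns[target] = cn
	return cn
}

func main() {
	var cache connCache
	a := cache.get("10.0.0.2:50000")
	b := cache.get("10.0.0.2:50000")
	cache.get("10.0.0.3:50000")
	fmt.Println(a == b, cache.dials)
}
```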

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-11-29 22:57:25 +03:00
Andrew Rynhard
aaefcbd891 fix: recover control plane on reboot
This brings in a patched version of the pod-checkpointer. It fixes a bug
that prevented the static pod-checkpointer from being scheduled,
preventing recovery of the control plane on a reboot of all control
plane nodes.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-11-27 10:47:51 -08:00
Andrew Rynhard
48d5aac0fc feat: enable aggregation layer
This moves to using our official bootkube repo. The latest changes in
the branch we are using enables the aggregation layer. This should fix
our conformance.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-11-27 08:40:58 -08:00
Brad Beam
119bf3e7bb feat(networkd): Add support for bonding
This includes a healthy refactor of the networkd code as well.
- Move netlink functionality to nic package
- Networkd facilitates the orchestration of the underlying interface configuration
- Networkd now stores the state of each interface configuration. This
  should allow us to expose this information via api in the future.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-11-26 20:08:31 -08:00
Andrew Rynhard
127fa54a76 docs: add docs command to osctl
This allows osctl to generate documentation for itself.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-11-13 17:14:53 -08:00
Andrew Rynhard
9218fa8b21 fix: upgrade rtnetlink package
This fixes an issue with our initial networking setup. The latest
package version is needed.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-11-11 18:17:06 -08:00
Brad Beam
bc6582e118 chore: Move back to official procfs repo
Code was merged upstream and a release cut, so we don't need to use my fork anymore.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-11-11 12:51:10 -06:00
Brad Beam
531e7d8144 feat: Add meminfo api
Add ability to retrieve node memory stats ( /proc/meminfo ).
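
For readers unfamiliar with the format, here is a minimal standard-library sketch of parsing `/proc/meminfo`-style text. It is not the implementation behind this API; `parseMeminfo` and the sample values are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseMeminfo parses /proc/meminfo-style text into a map of field name to
// numeric value. Values are returned as written in the file (usually kB).
func parseMeminfo(text string) (map[string]uint64, error) {
	stats := make(map[string]uint64)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		name, rest, ok := strings.Cut(sc.Text(), ":")
		if !ok {
			continue // skip lines without a "Name:" prefix
		}
		fields := strings.Fields(rest)
		if len(fields) == 0 {
			continue
		}
		v, err := strconv.ParseUint(fields[0], 10, 64)
		if err != nil {
			return nil, fmt.Errorf("parsing %s: %w", name, err)
		}
		stats[name] = v
	}
	return stats, sc.Err()
}

func main() {
	// Made-up sample input in the usual meminfo layout.
	sample := "MemTotal:       16384256 kB\nMemFree:         1234567 kB\nHugePages_Total:       0\n"
	stats, err := parseMeminfo(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(stats["MemTotal"], stats["MemFree"], stats["HugePages_Total"])
}
```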

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-11-10 21:02:43 -06:00
Andrew Rynhard
3c6d0135d0 feat: upgrade Kubernetes to 1.16.2
This brings in 1.16.2 modules and bumps the default hyperkube image.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-30 06:35:12 -07:00
Spencer Smith
d0111fe617 feat: allow specification of full url for endpoint
This PR moves to using the full URL for endpoint instead of trying to
hardcode 6443 in various places like we were doing.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-10-16 13:45:05 -04:00
Andrew Rynhard
d430a37e46 refactor: use go 1.13 error wrapping
This removes the github.com/pkg/errors package in favor of the official
error wrapping in go 1.13.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-15 22:20:50 -07:00
Andrew Rynhard
d9287cdfb5 fix: set kubelet-preferred-address-types to prioritize InternalIP
When creating docker based clusters, we need to use `InternalIP` for
kubelet connections. The default is
`Hostname,InternalDNS,InternalIP,ExternalDNS,ExternalIP`, but
`Hostname` doesn't work in docker because we don't depend on docker for
DNS.
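
To illustrate how an ordered preference list like this selects an address, the sketch below picks the first address whose type appears earliest in the list. It is a simplified assumption about the selection logic, with a hypothetical `address` type and made-up values, not the Kubernetes implementation.

```go
package main

import "fmt"

// address is a hypothetical, simplified node address entry.
type address struct {
	Type  string
	Value string
}

// preferredAddress returns the address whose type appears earliest in the
// preference list, or false if no address matches any preferred type.
func preferredAddress(addrs []address, preferred []string) (string, bool) {
	for _, typ := range preferred {
		for _, a := range addrs {
			if a.Type == typ {
				return a.Value, true
			}
		}
	}
	return "", false
}

func main() {
	addrs := []address{
		{Type: "Hostname", Value: "master-1"},
		{Type: "InternalIP", Value: "10.5.0.2"},
	}
	// With InternalIP listed first, the IP wins over the hostname.
	v, _ := preferredAddress(addrs, []string{"InternalIP", "ExternalIP", "Hostname"})
	fmt.Println(v)
}
```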

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-09 09:38:13 -07:00
Andrew Rynhard
9ff31cd5d9 fix: update bootkube fork to fix pod-checkpointer
This brings in an updated version of our fork so that pod-checkpointer
will run properly.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-08 18:39:04 -07:00
Andrew Rynhard
b29391f0be feat: use bootkube for cluster creation
This replaces kubeadm with bootkube.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-07 17:17:57 -07:00
Andrew Rynhard
c44f7669e5 feat: allow Kubernetes version to be configured
This allows users to specify which version of Kubernetes to use.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-27 17:12:27 -07:00
Andrew Rynhard
82c706a0fb feat: upgrade Kubernetes to v1.16.0
Brings in Kubernetes v1.16.0.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-19 20:19:29 -07:00
Andrew Rynhard
6efd6fbe08 chore: move gRPC API to public
In order for other projects to make use of our APIs, they must not
reside underneath the internal directory. This moves the protobuf
definitions to a top-level "api" directory and scopes them according to
their domain. This change also removes generated code from the gitignore
file so that users don't have to generate the code themselves.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-19 08:55:13 -07:00
Andrew Rynhard
ab4e058489 feat: upgrade Kubernetes to v1.16.0-rc.2
This brings in the release candidate for Kubernetes v1.16.0.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-16 14:56:55 -07:00
Andrey Smirnov
c2176ee0fa chore: update github.com/stretchr/testify library to 1.4.0
The new release comes with bug fixes (we had already picked up some of
them via an untagged release) and a few interesting new assertions,
including `Eventually` for polling.

See: https://github.com/stretchr/testify/milestone/2?closed=1

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-09-16 19:06:47 +03:00
Andrew Rynhard
75746266ce feat: upgrade Kubernetes to v1.16.0-rc.1
This brings in the latest RC of 1.16.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-12 20:20:48 -07:00
Andrey Smirnov
3012851208 fix(machined): limit max stderr output, use pkg/cmd consistently
Use circular buffer instead of (unlimited) `bytes.Buffer` to limit
amount of stderr output captured. If command being run produces too much
output on stderr, this might consume too much RAM.
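
A minimal sketch of the idea, assuming a fixed-capacity ring that keeps only the most recent bytes. The `ringBuffer` type here is hypothetical and written for illustration; it is not the buffer used in this commit.

```go
package main

import "fmt"

// ringBuffer is an io.Writer that keeps only the most recent bytes written,
// up to a fixed capacity, so memory use stays bounded.
type ringBuffer struct {
	data  []byte
	start int // index of the oldest retained byte
	size  int // number of retained bytes
}

func newRingBuffer(capacity int) *ringBuffer {
	return &ringBuffer{data: make([]byte, capacity)}
}

// Write never fails and always reports len(p) bytes as written, even when
// older bytes are overwritten.
func (r *ringBuffer) Write(p []byte) (int, error) {
	n := len(p)
	if len(r.data) == 0 {
		return n, nil
	}
	if len(p) > len(r.data) {
		p = p[len(p)-len(r.data):] // only the tail can survive
	}
	for _, b := range p {
		end := (r.start + r.size) % len(r.data)
		r.data[end] = b
		if r.size < len(r.data) {
			r.size++
		} else {
			r.start = (r.start + 1) % len(r.data) // overwrite the oldest byte
		}
	}
	return n, nil
}

// Bytes returns the retained bytes in the order they were written.
func (r *ringBuffer) Bytes() []byte {
	out := make([]byte, 0, r.size)
	for i := 0; i < r.size; i++ {
		out = append(out, r.data[(r.start+i)%len(r.data)])
	}
	return out
}

func main() {
	rb := newRingBuffer(5)
	fmt.Fprint(rb, "hello world")
	fmt.Fprint(rb, "!!")
	fmt.Printf("%s\n", rb.Bytes())
}
```

Because the type satisfies `io.Writer`, a value like this could be assigned to the `Stderr` field of an `exec.Cmd` in place of a `bytes.Buffer`.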

Use `pkg/cmd` to run command in `udevd` service. This should allow
easier udevd integration.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-09-02 19:01:15 -07:00
Spencer Smith
739e232896 feat: upgrade kubernetes to v1.16.0-beta.1
This PR will upgrade to the latest beta of v1.16 in order to get us
closer to catching the v1.16.0 release as soon as it drops.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-08-27 13:25:33 -04:00
Andrew Rynhard
0bdaff1a90 feat: perform upgrades via container
This moves to performing upgrades via a container.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-27 09:44:50 -07:00
Brad Beam
cdc989ddda refactor(networkd): Switch from rtnetlink to rtnl
Gives a better abstraction on rtnetlink interaction

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-21 13:24:51 -05:00
Brad Beam
313c118ad0 refactor(networkd): Replace networkd with a standalone app
This is a major rewrite of our network subsystem.

- This changes networkd to run as a standalone app versus internal goroutine
- This changes out the netlink package with the more idiomatic netlink/rtnetlink
  packages
- This changes the initial network bootstrap/discovery from using a single
  interface to attempting to bring up all interfaces
- This moves us back on to the upstream dhcp library

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-21 13:24:51 -05:00
Andrew Rynhard
09693a26c9 chore: update go modules to use Kubernetes v1.16.0-alpha.3
This is not ideal, but it works. We essentially need to start using
replace statements in order to pull in the modules we need.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-14 15:34:09 -07:00
Andrew Rynhard
e63c882b89 refactor: split machined into phases
This change aims to standardize the boot process. It introduces the
concept of a phase, which is composed of tasks. Phases are run serially and
the tasks that make up a phase are run concurrently.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-07-29 12:40:03 -07:00
Andrew Rynhard
0ec17e4169 feat: run rootfs from squashfs
This change moves the rootfs to a squashfs image.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-07-25 08:38:31 -07:00
Spencer Smith
ff9934cfe2 chore: update toolchain version and output created config files
Decided to combine two very small changes (which I'm now grumpy at myself for doing).

First, we'll update the toolchain image versions to allow for the use of a new containerd and runc. Also updated go.mod and go.sum to make use of newer containerd version. Closes #743 and #744.

Second, I added the bit of logic to osctl config generate to determine the working directory and let the user know that we created the various yaml files there. Closes #760.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-07-05 17:59:25 -04:00
Andrey Smirnov
ab95261bd8 chore: update stretchr/testify to master version (#832)
This fixes the panic stack traces not being printed, e.g.
https://github.com/stretchr/testify/issues/771

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-07-04 10:04:52 -07:00
Andrey Smirnov
237e903f91 feat(osd): implement CRI inspector for containers (#817)
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-07-02 15:48:00 -07:00
Andrew Rynhard
85afe4f828
feat: use eudev for udevd (#780)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-06-25 19:25:57 -07:00
Andrew Rynhard
ebc725afa6
feat: add support for upgrading init nodes (#761)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-06-24 15:25:32 -07:00
Seán C. McCord
81163cefb4 feat(osd): extend Routes API (#756)
Signed-off-by: Seán C McCord <ulexus@gmail.com>
2019-06-22 08:03:13 -07:00
Andrey Smirnov
ce1103d227 chore: tidy modules and verify module tidiness on build (#757)
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-06-21 21:18:08 -07:00
Andrew Rynhard
b330d3b778
feat: leave etcd before upgrading (#702)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-05-31 10:59:12 -07:00
Andrew Rynhard
ea4d3c4f66
feat: add bootstrap token package (#657)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-05-15 14:54:20 -07:00
Andrew Rynhard
92fb18e3ea
feat: use github.com/mdlayher/kobject (#653)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-05-15 11:18:08 -07:00
Brad Beam
0b33280915
feat(init): Add upgrade endpoint (#623)
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-05-13 15:15:25 -05:00
Andrew Rynhard
08789a0b8c
feat: update toolchain (#628)
This toolchain brings in:
- Linux v4.19.40
- Musl v1.1.22
- Golang v1.12.5
- Protobuf v3.7.1
- Golang-protobuf v1.3.1

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-05-09 05:36:13 -07:00
Brad Beam
a5d31d97ff
feat: Validate userdata (#593)
* feat: Validate userdata

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-05-02 13:10:16 -05:00
Andrew Rynhard
9b4fec0fa8
feat(osctl): add ability to create docker based clusters (#584)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-04-28 12:06:03 -07:00
Andrew Rynhard
20662217a2
feat: add ability to generate userdata secrets (#581)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-04-26 20:56:40 -07:00
Brad Beam
1a5be8da47
osctl top enhancements (#568)
* feat(osctl): Automatic sizing of top window

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>

* feat(osctl): Format top output in proper columns

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>

* feat(osctl): Add sort by cpu/rss options

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>

* feat(osctl): Add ability to run once (no gui)

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-04-24 16:44:57 -05:00
Brad Beam
3f358b12ae
feat(osctl): Add osctl top (#560)
Also adds pkg/proc as the backing package for top data

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-04-23 21:25:41 -05:00
Andrew Rynhard
a106e42657
feat: upgrade containerd to v1.2.6 (#532)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-04-14 10:25:03 -07:00
Andrew Rynhard
ae9e6ac282
feat: upgrade Kubernetes to v1.14.1 (#530)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-04-14 07:20:34 -07:00
Andrey Smirnov
c24f1531cb chore: refactor container image import code to avoid panics (#518)
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-04-10 09:57:03 -07:00
Andrew Rynhard
57bf1abaf5
chore: upgrade DHCP package (#481)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-04-03 19:51:19 -07:00
Andrew Rynhard
e18b5086a9
chore: update org to new name (#480)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-04-03 18:29:21 -07:00
Andrew Rynhard
50253b806a
feat: upgrade Kubernetes to v1.14.0 (#466)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-03-28 17:39:26 -07:00
Andrew Rynhard
30774fc3f0
feat: upgrade containerd to v1.2.5 (#463)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-03-24 20:52:27 -07:00
Andrew Rynhard
2e9a7ec0c5
feat: add power off functionality (#462)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-03-24 20:21:41 -07:00
Brad Beam
3693cff14f feat: add basic ntp implementation (#459)
Signed-off-by: Brad Beam <brad.beam@b-rad.info>
2019-03-23 15:58:13 -07:00
Andrew Rynhard
7a93c97b98
chore: update go modules (#429)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-03-04 18:36:09 -08:00
Spencer Smith
ee232b8f9a feat: add DHCP client (#427)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-02-27 07:58:37 -08:00
Andrew Rynhard
9e947c3fa5
feat: add automated PKI for joining nodes (#406)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-02-23 23:17:56 -08:00
Andrew Rynhard
b963f5a982
feat: upgrade containerd to v1.2.4 (#395)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-02-19 21:51:38 -08:00
Spencer Smith
85e35d30b4 feat: add gcloud integration (#385)
Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-02-19 08:41:41 -08:00
Andrew Rynhard
1219ae7934
feat: upgrade Kubernetes to v1.13.3 (#335)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-02-05 20:11:39 -08:00
Andrew Rynhard
97f874b6f9
chore: add travis config (#321)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-01-23 15:48:02 -08:00
Andrew Rynhard
16416da446
chore: update go packages (#324)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-01-22 23:40:04 -08:00
Andrew Rynhard
5cadd83aea
feat: upgrade Kubernetes to v1.13.2 (#319)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-01-18 07:47:11 -08:00
Andrew Rynhard
a2b2e7e50c
feat: upgrade containerd to v1.2.2 (#318)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-01-18 07:14:06 -08:00
Andrew Rynhard
94b011c724
refactor: use containerd exported defaults (#310)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-01-15 20:29:13 -08:00
Andrew Rynhard
25fca3d68d
feat: import core service containers from local store (#309)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-01-15 18:46:41 -08:00
Andrew Rynhard
ee226dddac
chore: enforce commit and license policies (#304)
Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-01-13 16:10:49 -08:00
Andrew Rynhard
42b722b0eb
feat: add filesystem probing library (#298)
2018-12-24 07:42:30 -08:00
Andrew Rynhard
72eb1b34f5
chore: use buildkit for builds (#295)
2018-12-19 22:22:05 -08:00
Andrew Rynhard
9191fcdb67
chore: remove unused go module files (#159)
2018-10-14 20:24:36 -07:00
Andrew Rynhard
92ef60222e
feat(*): upgrade all core components (#153)
This commit introduces the following upgrades:
    Linux: v4.18.13
    Containerd: v1.2.0-rc.1
    Kubernetes: v1.12.1
    Docker: v18.06.1-ce
    Sonobuoy: v0.12.0
2018-10-13 09:30:17 -07:00