605 Commits

Author SHA1 Message Date
Andrew Rynhard
10b6202c4f refactor: improve metal platform
This brings in a few minor improvements to the metal platform. The first
is to use talos.config=metal-iso to indicate that the machine's config
can be found in an ISO image. The second is a fix to ensure that /mnt
exists.

This adds support for creating more than one node using the qemu-boot.sh
script.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-14 22:05:56 -07:00
Andrew Rynhard
fef151748b feat: use the unified pkgs repo artifacts
This moves to using a single revision of pkgs. It includes a few
changes:

- kernel with KVM host support
- containerd v1.3.0

This change brings in a kernel with host KVM support. This will allow us
to use VMs within Talos for things like integrations tests. This also
allows users to do things with KVM as they see fit.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-14 07:18:17 -07:00
Spencer Smith
ee1b256e0f feat: add external IP discovery for azure
This PR will add the ability to query metadata servers in azure to fetch
external IPs. Needed to ensure certs get generated with proper cert SANs

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-10-10 16:57:44 -04:00
Andrew Rynhard
92de30715e feat: add retry package
This package provides a consistent way for us to retry arbitrary logic.
It provides the following backoff algorithms:

- exponential
- linear
- constant

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-10 13:11:02 -07:00
Andrey Smirnov
c2cb0f9778 chore: enable 'wsl' linter and fix all the issues
I wish there were less of them :)

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-10-10 01:16:29 +03:00
Andrey Smirnov
bb5f5cc754 chore: bump golangci-lint to 1.20
Memory usage reduced around 8-10x: now it stays stable at 1GB.

I disabled some of the new linters, and one rule which is violated a
lot.

I might make sense to go back and enable `wsl` fixing all the issues
(leaving that for another PR).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-10-09 22:21:08 +03:00
Andrew Rynhard
b29391f0be feat: use bootkube for cluster creation
This replaces kubeadm with bootkube.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-07 17:17:57 -07:00
Andrew Rynhard
4ae8186107 feat: add configurator interface
This moves from translating a config into an internal config
representation, to using an interface. The idea is that an interface
gives us stronger compile time checks, and will prevent us from having to copy
from on struct to another. As long as a concrete type implements the
Configurator interface, it can be used to provide instructions to Talos.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-04 07:53:09 -07:00
Brad Beam
3ba04cb67b feat: Discover platform external addresses
This introduces the functionality for discovering external addresses configured on an intance.
This allows us to automatically append these external addresses to our certificate SANs so we can
access the machines from these addresses without having to know about them ahead of time.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-10-01 20:34:16 -05:00
Andrew Rynhard
6ec5cb02cb refactor: decouple grpc client and userdata code
This detangles the gRPC client code from the userdata code. The
motivation behind this is to make creating clients more simple and not
dependent on our configuration format.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-26 14:18:53 -07:00
Andrew Rynhard
607d68008c feat: use kubeadm to distribute Kubernetes PKI
This removes the trustd-based PKI distribution method in favor of
kubeadm's method.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-25 11:13:07 -07:00
Andrew Rynhard
f244673856 feat: write audit policy instead of using trustd
This changes the controlplane logic to write the audit policy to disk
from a common template instead of using trustd to distribute it.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-25 10:12:31 -07:00
Andrew Rynhard
4ff8824182 feat: add aescbcEncryptionSecret field to machine config
This change allows us to generate the EncryptionConfig on each
controlplane node. The benefit is that we no longer need to distibute
the EncryptionConfig via trustd.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-25 09:41:20 -07:00
Andrew Rynhard
8f10647d3f fix: set extra kernel args for all platforms
This change ensures that the installer has access to the machine config
so that it can set the extra kernel arguments when installing.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-23 11:50:13 -07:00
Andrew Rynhard
82c706a0fb feat: upgrade Kubernetes to v1.16.0
Brings in Kubernetes v1.16.0.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-19 20:19:29 -07:00
Andrew Rynhard
ab4e058489 feat: upgrade Kubernetes to v1.16.0-rc.2
This brings in the release candidate for Kubernetes v1.16.0.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-16 14:56:55 -07:00
Andrey Smirnov
7d8c40e3aa chore: randomize containerd namespace in tests
Looks like containerd creates shim file sockets in Linux abstract
namespace which are fixed (don't depend on containerd root directory)
and depend on container namespace and id. So if two containerd instances
on the same host run same namespace/id pair, that is going to create a
conflict on that shim filesocket.

Avoid that by randomizing namespace name. CRI tests should be fine as
namespace is fixed, but container ID is random.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-09-13 23:56:40 +03:00
Andrew Rynhard
75746266ce feat: upgrade Kubernetes to v1.16.0-rc.1
This brings in the latest RC of 1.16.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-12 20:20:48 -07:00
Andrew Rynhard
5ee554128e chore: move from gofumpt to gofumports
The gofumports does everything that gofumpt does with the addition of
formatting imports. This change proposes the use of the `-local` flag so
that we can have imports separated in the following order:

- standard library
- third party
- Talos specific

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-12 07:49:12 -07:00
Andrew Rynhard
2955428850 chore: format code with gofumpt
The gofumpt linter is a stricter drop-in replacement for gofmt. The
rules are ones that I strongly agree with and I think it would be better
if we added this linter instead of nit picking every PR.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-11 11:03:29 -07:00
Andrew Rynhard
bf16b1e916 chore: remove invalid TODO
This TODO no longer applies. We have setteled on a fixed boot size. This
also removes variables no longer needed.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-10 10:53:36 -07:00
Andrew Rynhard
298ddc8f49 fix: enable slub_debug=P
This is the last KSPP kernel parameter we need to be compliant with KSPP
guidelines.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-10 10:53:19 -07:00
Spencer Smith
473df84cf6 fix: move to per-platform console setup
This PR will make sure that each platform gets the console settings it
needs by setting them as extra flags in the makefile. This should ensure
that we have console logs flowing properly for each cloud.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-09-10 07:50:34 -07:00
Seán C McCord
bcb6a2d3a5 fix: prepend custom options for kernel commandline
Added a decomposition option to the kernel.NewDefaultCmdline() so that
the Defaults can be added _after_ constructing a custom commandline.
This is then implemented for `osctl install`.

Fixes #1128

Signed-off-by: Seán C McCord <ulexus@gmail.com>
2019-09-08 16:58:49 -07:00
Seán C McCord
f7ad24ec4f feat: allow network interface to be ignored
Added a property to userdata to allow a network interface to be ignored,
such that Talos will perform no operations on it (including DHCP).

Also added kernel commandline parameter (talos.network.interface.ignore)
to specify a network interface should be ignored.

Also allows chaining of kernel cmdline parameter Contains() where the
parameter in question does not exist.

Fixes #1124

Signed-off-by: Seán C McCord <ulexus@gmail.com>
2019-09-07 16:33:52 -07:00
Andrew Rynhard
db78ed93ec fix: set default install image
This sets the default install image just before installation. It was
erroneously placed in the boot verification.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-04 11:48:23 -07:00
Andrew Rynhard
d098785a17 chore: remove local upgrade functionality
We have no need for this anymore since installs and upgrades are now
completely handled in a container.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-27 18:44:18 -07:00
Andrew Rynhard
d4770d41ad feat: run installs via container
This moves to performing installs via a container.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-27 15:01:20 -05:00
Andrew Rynhard
0bdaff1a90 feat: perform upgrades via container
This moves to performing upgrades via a container.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-27 09:44:50 -07:00
Spencer Smith
6f8e089271 chore: use kubeadm v1beta2 structs everywhere
This PR will move to using the external kubeadm v1beta2 structs for our
code base. This will hopefully allow for more stable integrations with
kubeadm in the long term, as well as solve some needs we have in the
machine config rewrite.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-08-26 12:07:36 -04:00
Andrew Rynhard
be8f58c15d feat: add overlay task
This adds a well defined task for handling all overlay mount points that
are required by the system.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-25 10:47:54 -07:00
Andrew Rynhard
1eb02875c2 feat: use BLKPG ioctl for partition events
This moves to using BLKPG ioctl instead of BLKRRPART. BLKRRPART is older
and more sensitive to EBUSY errors. BLKPG has the potential to minimize
the changes of encountering an EBUSY error when manipulating partition
tables.

In looking at a comparison between BLKPG and BLKRRPART, it seems that
both have their pros and cons. Eventually a combination of the two may
serve us better, but for now I think BLKPG will get us further.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-25 07:55:24 -07:00
Brad Beam
313c118ad0 refactor(networkd): Replace networkd with a standalone app
This is a major rewrite of our network subsystem.

- This changes networkd to run as a standalone app versus internal goroutine
- This changes out the netlink package with the more idiomatic netlink/rtnetlink
  packages
- This changes the initial network bootstrap/discovery from using a single
  interface to attempting to bring up all interfaces
- This moves us back on to the upstream dhcp library

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-21 13:24:51 -05:00
Andrew Rynhard
0af1eba159 refactor: add more runtime modes
In order to DRY up all installation methods and mount methods, this PR
introduces a few more runtime modes. The modes are then used to
determine the strategy for creating and or mounting the paritions.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-19 20:23:45 -07:00
Andrew Rynhard
2e65cff3ce feat: mount /sys/fs/bpf
The BPF filesystem is required to pin BPF objects.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-18 07:37:08 -07:00
Andrew Rynhard
e305acac20 feat: add standardized command runner
This adds a command runner function that can be used everywhere we need
to exec a binary. It adds addtional logic around error handling that
will allow for viewing errors in the case of a failed command.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-17 03:38:36 -07:00
Andrew Rynhard
6940aaf233 fix: verify installation definition
This fixes the possibility of panicing on a nil pointer by running the
verification steps earlier.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-16 09:58:12 -07:00
Brad Beam
46c283b6c9 chore: Disable rate limited kmessage
This should allow for better troubleshooting during early boot/startup

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-16 09:36:52 -07:00
Andrew Rynhard
a116145c1b feat: rename DATA partition to EPHEMERAL
This changes the data partition name to something more appropriate. We
chose ephemeral to make it very clear that the disk should not be used
for application data.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-15 08:00:22 -07:00
Andrew Rynhard
09693a26c9 chore: update go modules to use Kubernetes v1.16.0-alpha.3
This is not ideal, but it works. We essentially need to start using
replace statements in order to pull in the modules we need.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-14 15:34:09 -07:00
Seán C McCord
6d22744eca fix: store PartitionName when on NVMe disk
Fixes #978

Signed-off-by: Seán C McCord <ulexus@gmail.com>
2019-08-10 17:10:01 -07:00
Brad Beam
da1f73249f fix(machined): Clean up installation process
This also includes a fix for #955 which had the unintended side effect
of breaking image creation ( since it would attempt to grow the filesystem
always ).

The refactor standardizes around looking for the DATA and ESP labels to
discover any existing installations/filesystems. If none are found, an
installation will proceed -- for both image creation and bare metal.
During bootup, the DATA partition will always attempt to expand/grow.

This also introduces a new phase to verify the installation through the
existance of /boot/installed ( migrated from install stage ).

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-08 22:10:14 -05:00
Brad Beam
53b1330c44 fix(initramfs): Allow data partition to grow
This fix ensures that we always grow the data partition during an installation.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-07 09:11:02 -05:00
Andrew Rynhard
90c91807bd refactor: restructure the project layout
This change moves packages into more appropriate places.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-01 22:19:42 -07:00
Andrew Rynhard
a9c4a95a4b fix: mount the owned partitions in cloud platforms
This adds the logic for mounting the owned block device and resizing the
ephemeral partition for cloud platforms.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-01 21:48:23 -07:00
Andrew Rynhard
ca35b85300 refactor: improve installation reliability
This change aims to make installations more unified and reliable. It
introduces the concept of a mountpoint manager that is capable of
mounting, unmounting, and moving a set of mountpoints in the correct
order.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-01 11:44:40 -07:00
Spencer Smith
bc5fe085bd fix: set mtu value regardless of interface state
This PR will fix a bug we encountered in GCE, where the interface was
already up and the MTU value wasn't getting set.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-07-31 15:02:02 -04:00
Andrew Rynhard
e63c882b89 refactor: split machined into phases
This change aims to standardize the boot process. It introduces the
concept of a phase, which is comprised of tasks. Phases are ran in serial and
the tasks that make up a phase are ran concurrently.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-07-29 12:40:03 -07:00
Andrey Smirnov
f56a9d5b96 chore: implement first version of CRI runner
It runs containers via CRI interface in a pod sandbox. This is the very
first version:  I tried not to introduce any changes to common runner
interface.

There should be some CRI-speficic options for the runner (like polling
interval, as it doesn't have nice `Wait()` API), plus my plan so far is
to use OCI as the common layer for container options, so that we can
analyze OCI and translate to CRI (when possible, return errors when
option is not implemented).

CRI interface doesn't have a concept of 'unpacking' an image, so we
probably need to unpack via containerd API (or any other
runtime-specific API) by targeting CRI namespace.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-07-26 21:07:46 +03:00
Andrew Rynhard
b7a9acbe88 refactor: move setup logic into machined
The responsibility of init should only be to mount the rootfs. This
change moves Talos specific logic into machined. This will allow us to
define a version of Talos in a single binary instead of split across
two. This will enable cleaner upgrades and helps make the codebase
easier to reason about.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-07-26 07:48:49 -07:00