797 Commits

Author SHA1 Message Date
Andrew Rynhard
fd25c019bf chore: fix qemu-boot.sh
Fixes a typo that caused the switch statement to not match Linux
environments.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-24 13:24:24 -07:00
Andrew Rynhard
f5f6c29e99 chore: add QEMU script
This script will help in low-level development.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-24 00:56:12 -07:00
Seán C McCord
7b217c79d7 feat: allow specification of additional API SANs
Adds a handler for specification of additional subject alt names (SANs) for
the API Server when generating a new cluster configuration using
`osctl`.

Fixes #800

Signed-off-by: Seán C McCord <ulexus@gmail.com>
2019-08-21 16:25:54 -07:00
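As an illustration of what additional SANs mean for a certificate, the sketch below sorts a list of user-supplied names into IP and DNS entries and places them in a self-signed certificate using Go's standard `crypto/x509` package. It is not the `osctl` implementation; `splitSANs`, `selfSigned`, the common name, and the sample addresses are hypothetical.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"net"
	"time"
)

// splitSANs (hypothetical helper) separates raw SAN strings into IP
// addresses and DNS names, since x509 stores them in different fields.
func splitSANs(sans []string) (ips []net.IP, dns []string) {
	for _, s := range sans {
		if ip := net.ParseIP(s); ip != nil {
			ips = append(ips, ip)
		} else {
			dns = append(dns, s)
		}
	}
	return ips, dns
}

// selfSigned (hypothetical helper) creates a throwaway self-signed
// certificate that carries the given SANs.
func selfSigned(sans []string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	ips, dns := splitSANs(sans)
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "example-apiserver"}, // assumed name
		NotBefore:    time.Now(),
		NotAfter:     time.Now().Add(24 * time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
		IPAddresses:  ips,
		DNSNames:     dns,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	cert, err := selfSigned([]string{"api.example.com", "10.0.0.5", "2001:db8::5"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("DNS SANs:", cert.DNSNames)
	fmt.Println("IP SANs: ", cert.IPAddresses)
}
```

A client connecting by any name or address that is not listed in the SANs would fail hostname verification, which is why extra entries are needed for load balancers or external endpoints.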
Brad Beam
cdc989ddda refactor(networkd): Switch from rtnetlink to rtnl
Gives a better abstraction on rtnetlink interaction

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-21 13:24:51 -05:00
Brad Beam
313c118ad0 refactor(networkd): Replace networkd with a standalone app
This is a major rewrite of our network subsystem.

- This changes networkd to run as a standalone app versus internal goroutine
- This changes out the netlink package with the more idiomatic netlink/rtnetlink
  packages
- This changes the initial network bootstrap/discovery from using a single
  interface to attempting to bring up all interfaces
- This moves us back on to the upstream dhcp library

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-21 13:24:51 -05:00
Andrew Rynhard
0af1eba159 refactor: add more runtime modes
In order to DRY up all installation methods and mount methods, this PR
introduces a few more runtime modes. The modes are then used to
determine the strategy for creating and/or mounting the partitions.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-19 20:23:45 -07:00
Andrew Rynhard
794c7231f5 feat: run dedicated instance of containerd for system services
In order to facilitate upgrades and resets that are capable of
manipulating the system block device, we need to run an instance of
containerd that has zero dependencies on the disk. We run containerd
purely in memory for running system services.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-19 12:32:59 -07:00
Andrew Rynhard
060498ec87 chore: disable CIS benchmarks
These are failing with false positives. Disable for now so that we can
run our conformance tests.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-19 11:04:15 -07:00
Andrew Rynhard
2e65cff3ce feat: mount /sys/fs/bpf
The BPF filesystem is required to pin BPF objects.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-18 07:37:08 -07:00
Seán C McCord
cb1210719a fix: enclose target in quotes
Fixes issue #1049

Signed-off-by: Seán C McCord <ulexus@gmail.com>
2019-08-17 21:19:10 -07:00
Brad Beam
ec0f188309 fix(machined): Remove host mounts for specific CNI providers
We shouldn't need these anymore.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-17 20:20:45 -07:00
Brad Beam
03228c7401 chore(ci): Only push latest tags if branch is master.
Should prevent flakes when we merge fixes on release branches where they unintentionally
get tagged as `latest`.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-17 16:03:01 -05:00
Brad Beam
af47edf1ad chore: Make losetup atomic during installation
This should fix a race condition where two independent image creation steps
run `losetup -f` and discover the same 'next available' loopback device and
attempt to use it.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-17 15:23:42 -05:00
Brad Beam
046a8a4ba5 chore: Fix reread error value on retry
Should prevent a flake with returning an error when
it actually succeeded.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-17 12:58:22 -07:00
Andrew Rynhard
8c73c38b8a chore: enforce one sentence per line in Markdown files
This is widely considered best practice; we should enforce it.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-17 10:15:27 -07:00
Andrew Rynhard
7970f977b7 chore: add markdownlint
This will give us a standard tool for linting Markdown files.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-17 03:53:52 -07:00
Andrew Rynhard
e305acac20 feat: add standardized command runner
This adds a command runner function that can be used everywhere we need
to exec a binary. It adds additional logic around error handling that
will allow for viewing errors in the case of a failed command.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-17 03:38:36 -07:00
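The idea of a single command runner that surfaces errors can be sketched with Go's standard `os/exec` package. This is a minimal illustration, not the function added in this commit; the name `run` and the error format are assumptions, and the demo in `main` assumes a POSIX `sh` is on the PATH.

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
)

// run (hypothetical helper) executes name with args and returns stdout.
// On failure, the returned error wraps the exec error and includes
// whatever the command wrote to stderr, so the cause is visible.
func run(name string, args ...string) (string, error) {
	var stdout, stderr bytes.Buffer

	cmd := exec.Command(name, args...)
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr

	if err := cmd.Run(); err != nil {
		return stdout.String(), fmt.Errorf("%s failed: %w: %s",
			name, err, bytes.TrimSpace(stderr.Bytes()))
	}
	return stdout.String(), nil
}

func main() {
	out, err := run("sh", "-c", "echo hello")
	fmt.Printf("out=%q err=%v\n", out, err)

	// A failing command: the stderr text ends up in the error message.
	_, err = run("sh", "-c", "echo 'no such device' >&2; exit 3")
	fmt.Println("err:", err)
}
```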
Brad Beam
a68cac0a94 chore: Retry reread partition table if EBUSY
Should help make it more robust

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-17 03:27:13 -07:00
Brad Beam
0dc5551510 docs: Add Azure docs
Add getting started guide for Azure.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-16 14:19:47 -07:00
Brad Beam
cf64847772 refactor(proxyd): Update multilisteners to use error chan.
This cleans up the multiple listener implementation.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-16 12:21:02 -05:00
Brad Beam
801db9b9b9 chore: Add log message for userdata backoff.
This should make it clearer what is going on when the download fails.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-16 10:02:24 -07:00
Andrew Rynhard
6940aaf233 fix: verify installation definition
This fixes the possibility of panicking on a nil pointer by running the
verification steps earlier.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-16 09:58:12 -07:00
Andrew Rynhard
1c7e86ce5c fix: name the serde functions appropriately
This fixes the names of the Serde functions to be descriptive of what
they are actually doing. The serialize and deserialize ideas were
flipped.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-16 09:48:13 -07:00
Brad Beam
76a9c15044 feat: Add gRPC server for ntp
Part of the API refactor; this introduces a gRPC server for ntp.
This makes it possible to query node time and check time against
specific ntp servers.

This refactor also moves the ntp functionality into a sub package for
better project organization.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-16 09:46:43 -07:00
Spencer Smith
9d759df9bd chore: move to smaller azure instance type
This PR will save us a little dinero over the course of running e2e
builds in azure. It's only a couple cents per hour difference, but will
shave off a fair amount over the course of a month.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-08-16 09:46:17 -07:00
Brad Beam
46c283b6c9 chore: Disable rate limited kmessage
This should allow for better troubleshooting during early boot/startup

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-16 09:36:52 -07:00
Brad Beam
6745e6b3bc fix(gpt): Fix partition naming to be >8 characters
There was a bug in the offset calculation for the last field (offset at 72 instead of 128)
that caused a truncation of the partition name field to only allow for 8 characters/16 bytes
(UTF-16 = 2 bytes/character). This last field isn't part of the GPT spec, so we are dropping it.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-16 11:09:37 -05:00
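For reference, the partition name field in a standard 128-byte GPT entry occupies bytes 56 through 127, which is 72 bytes or 36 UTF-16LE code units. The sketch below encodes and decodes a name in that field. The layout constants follow the UEFI specification rather than this commit, and `encodeName` and `decodeName` are hypothetical helpers, not the functions in the gpt package.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unicode/utf16"
)

const (
	entrySize  = 128 // standard GPT partition entry size in bytes
	nameOffset = 56  // name field starts after GUIDs, LBAs, and attributes
	nameLength = 72  // 36 UTF-16LE code units
)

// encodeName (hypothetical helper) writes name as UTF-16LE into the
// name field of a GPT entry, zero-padding the remainder.
func encodeName(entry []byte, name string) error {
	if len(entry) < entrySize {
		return fmt.Errorf("entry too short: %d bytes", len(entry))
	}
	units := utf16.Encode([]rune(name))
	if len(units)*2 > nameLength {
		return fmt.Errorf("name %q exceeds %d UTF-16 units", name, nameLength/2)
	}
	field := entry[nameOffset : nameOffset+nameLength]
	for i := range field {
		field[i] = 0
	}
	for i, u := range units {
		binary.LittleEndian.PutUint16(field[2*i:], u)
	}
	return nil
}

// decodeName (hypothetical helper) reads the name back, stopping at
// the first NUL code unit. The entry must be at least entrySize bytes.
func decodeName(entry []byte) string {
	field := entry[nameOffset : nameOffset+nameLength]
	units := make([]uint16, 0, nameLength/2)
	for i := 0; i < nameLength; i += 2 {
		u := binary.LittleEndian.Uint16(field[i:])
		if u == 0 {
			break
		}
		units = append(units, u)
	}
	return string(utf16.Decode(units))
}

func main() {
	entry := make([]byte, entrySize)
	if err := encodeName(entry, "EPHEMERAL"); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(decodeName(entry))
}
```

A nine-character name such as `EPHEMERAL` needs 18 bytes, so it would not survive a field that was truncated to 16 bytes.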
Brad Beam
70a478895f feat(proxyd): Add gRPC server
Part of the API refactor; this introduces a gRPC server for proxyd
to expose some of the internal state.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-15 16:35:03 -05:00
Andrew Rynhard
a116145c1b feat: rename DATA partition to EPHEMERAL
This changes the data partition name to something more appropriate. We
chose ephemeral to make it very clear that the disk should not be used
for application data.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-15 08:00:22 -07:00
Andrew Rynhard
92452ab981 chore: remove sonobuoy spinner
This is only slowing down the build since we use a remote DB for drone.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-15 05:15:20 -07:00
Brad Beam
249acda74a feat: Allow hostname to be specified in userdata
This sets up the ability to define hostname via userdata. I don't expect
this will get used publicly much, but provides a mechanism to convey
the hostname from various sources internally.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-14 22:41:27 -05:00
Andrew Rynhard
48109e9757 chore: apply manifests when init node is ready
If we wait for all masters to check in before applying the PSP, we run
the risk of kube-proxy failing to start for a long period of time.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-14 20:28:34 -07:00
Andrew Rynhard
15408963ac chore: update tools image
This image upgrades golang to v1.12.8 to address security
vulnerabilities.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-14 19:07:52 -07:00
Andrew Rynhard
09693a26c9 chore: update go modules to use Kubernetes v1.16.0-alpha.3
This is not ideal, but it works. We essentially need to start using
replace statements in order to pull in the modules we need.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-14 15:34:09 -07:00
Andrew Rynhard
582298ac0b feat: upgrade Linux to v5.2.8
Updates the image that the kernel is pulled from.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-14 12:07:10 -07:00
Andrew Rynhard
f18ecca50c chore: use go runner in sonobuoy
This is the recommended fix for waiting on conformance results. Sonobuoy
is returning early even though the --wait flag is specified.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-13 22:26:03 -07:00
Spencer Smith
57d22ef1bb chore: enable floating IP creation in e2e tests
This PR will edit the manifests for e2e so that we can take advantage of https://github.com/talos-systems/cluster-api-provider-talos/pull/47

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-08-13 15:23:28 -07:00
Seán C. McCord
c5aef55f89 chore: add kernel parameters doc for bare-metal
Initial doc for kernel parameters.
2019-08-13 04:59:45 -07:00
Andrew Rynhard
caa0354fe9 chore: fix drone clone
In order to use promotion against pull requests to trigger things like
E2E, we need to update the default clone logic. The issue is that a
promotion is assumed to be run against a build that has been merged. In
our case, we need to promote builds that are not necessarily merged.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-12 20:33:29 -07:00
Andrew Rynhard
1956504bd4 chore: fix default pipeline
This prevents the default pipeline from running on releases. It also
ensures that the push step is executed on a release.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-12 17:45:26 -07:00
Andrew Rynhard
e8355f07a0 chore: fix release pipeline
We should only use the "tag" event and remove the promotion event. It
seems like we can't have both.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-12 17:24:12 -07:00
Andrew Rynhard
8726cdb772 chore: prepare release v0.2.0-alpha.6
This is the official v0.2.0-alpha.6 release.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-12 16:04:14 -07:00
Andrew Rynhard
a420b85b07 chore: run unique E2E tests
In order to run more than one instance of E2E testing at a time, we need
to ensure that all resources are unique to the run.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-12 14:14:08 -07:00
Andrew Rynhard
57db8a77b7 chore: exclude promotion event
We need to exclude the promotion event in a number of places.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-12 11:43:38 -07:00
Seán C McCord
ae77d6e053 fix: format IPv6 host entries properly
This reworks a bunch of the formatting for the userdata generation to
output a cleaner talos config when using IPv6 masters and `osctl config
generate`.

Please note that this changes the scope of concern for master indexing,
keeping `osctl` blissfully unaware of the master-reference chaining.
All it does is report the index of the master it is trying to generate.
The generator itself handles the reference chaining.

Fixes #916, fixes #917, and fixes #918

Signed-off-by: Seán C McCord <ulexus@gmail.com>
2019-08-12 11:35:38 -07:00
Andrew Rynhard
142500ce3e fix(proxyd): print bootstrap backend dial errors
This prints any error that occurs when dialing the bootstrap backend.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-11 15:12:09 -07:00
Seán C McCord
fd76d90028 fix(proxyd): do not pre-bracket IPv6 backend addrs
Fixes #996

Signed-off-by: Seán C McCord <ulexus@gmail.com>
2019-08-11 15:00:22 -07:00
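The pitfall behind this kind of fix can be shown with Go's standard `net.JoinHostPort`, which adds brackets itself whenever the host contains a colon. This is a minimal sketch and not the proxyd code; `backendAddr`, the addresses, and the port are hypothetical.

```go
package main

import (
	"fmt"
	"net"
)

// backendAddr (hypothetical helper) joins a bare IP and a port.
// net.JoinHostPort brackets IPv6 literals on its own, so the host
// must be passed in without brackets.
func backendAddr(ip string, port int) string {
	return net.JoinHostPort(ip, fmt.Sprint(port))
}

func main() {
	fmt.Println(backendAddr("10.0.0.1", 6443))    // IPv4: no brackets needed
	fmt.Println(backendAddr("2001:db8::1", 6443)) // IPv6: brackets added once

	// Pre-bracketing the host leads to a doubly bracketed address
	// that cannot be dialed.
	fmt.Println(backendAddr("[2001:db8::1]", 6443))
}
```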
Andrew Rynhard
ad79e8dfcf feat: remove the machine config on reset
This will remove the machine config on a reset so that a new machine
config will be downloaded and used on a reboot.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-11 12:51:55 -07:00
Andrew Rynhard
ac54a3cb86 chore: add ability to promote to a release
Although the GitHub release plugin requires a tag and will fail on a
promotion, this is still useful as it will allow us to mimic a release
before we tag.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-11 11:51:53 -07:00
Andrew Rynhard
2ee769d19e chore: add image test step
Instead of building platform specific images in the default pipeline, we
should build just one image as part of our basic testing to make sure
installations work as expected.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-11 10:51:33 -07:00