752 Commits

Author SHA1 Message Date
Andrew Rynhard
142500ce3e fix(proxyd): print bootstrap backend dial errors
This prints any error that occurs when dialing the bootstrap backend.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-11 15:12:09 -07:00
Seán C McCord
fd76d90028 fix(proxyd): do not pre-bracket IPv6 backend addrs
Fixes #996

Signed-off-by: Seán C McCord <ulexus@gmail.com>
2019-08-11 15:00:22 -07:00
Andrew Rynhard
ad79e8dfcf feat: remove the machine config on reset
This will remove the machine config on a reset so that a new machine
config will be downloaded and used on a reboot.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-11 12:51:55 -07:00
Andrew Rynhard
ac54a3cb86 chore: add ability to promote to a release
Although the GitHub release plugin requires a tag and will fail on a
promotion, this is still useful as it will allow us to mimic a release
before we tag.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-11 11:51:53 -07:00
Andrew Rynhard
2ee769d19e chore: add image test step
Instead of building platform specific images in the default pipeline, we
should build just one image as part of our basic testing to make sure
installations work as expected.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-11 10:51:33 -07:00
Andrew Rynhard
c34ce3a4ed chore: reenable AMI publishing
This was removed during the refactor of our Drone file.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-11 10:07:57 -07:00
Andrew Rynhard
817380bad6 chore: refactor the Jsonnet file
This change improves the drone jsonnet file by making it more DRY and
structuring it in a way that makes it much easier to follow.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-11 09:23:30 -07:00
Seán C McCord
63cfd8a405 fix(proxyd): wrap Dial addresses
Handle IPv6 addresses in proxyd frontend.

Fixes #988

Signed-off-by: Seán C McCord <ulexus@gmail.com>
2019-08-10 23:00:28 -07:00
Seán C McCord
7691bb060c fix: enable IPv6 forwarding
Fixes #985

Signed-off-by: Seán C McCord <ulexus@gmail.com>
2019-08-10 22:39:56 -07:00
Seán C McCord
5210bf489f fix: enclose address in brackets gRPC client
When talking to an IPv6 address for a gRPC server, enclose the IPv6
address in brackets.

Also fixes backwards implementation of IPv4/IPv6 test.

Fixes #983

Signed-off-by: Seán C McCord <ulexus@gmail.com>
2019-08-10 19:02:39 -07:00
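
A minimal sketch of the bracket handling described in 5210bf489f, assuming the address is assembled with the standard library. The helper names `dialTarget` and `isIPv6` are hypothetical and not taken from the Talos source; `net.JoinHostPort` and `net.ParseIP` are real standard library calls.

```go
package main

import (
	"fmt"
	"net"
)

// dialTarget builds a host:port string suitable for dialing.
// net.JoinHostPort wraps the host in brackets when it contains a colon,
// which covers IPv6 literals; IPv4 addresses and hostnames pass through.
// (Hypothetical helper, for illustration only.)
func dialTarget(host string, port int) string {
	return net.JoinHostPort(host, fmt.Sprint(port))
}

// isIPv6 reports whether s parses as an IP address that has no IPv4 form.
// (Hypothetical helper, for illustration only.)
func isIPv6(s string) bool {
	ip := net.ParseIP(s)
	return ip != nil && ip.To4() == nil
}

func main() {
	fmt.Println(dialTarget("10.0.0.1", 50000))
	fmt.Println(dialTarget("fd00::1", 50000))
	fmt.Println(isIPv6("fd00::1"), isIPv6("10.0.0.1"))
}
```

Note that passing a host that is already bracketed to `net.JoinHostPort` yields doubled brackets, so the host should be bracketed in exactly one place.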
Seán C McCord
d0ff28a8c7 fix: enclose server address in brackets if IPv6
Fixes #980

Signed-off-by: Seán C McCord <ulexus@gmail.com>
2019-08-10 17:42:17 -07:00
Seán C McCord
6d22744eca fix: store PartitionName when on NVMe disk
Fixes #978

Signed-off-by: Seán C McCord <ulexus@gmail.com>
2019-08-10 17:10:01 -07:00
Andrew Rynhard
620efe52ef chore: fix push step dependencies
We should wait until basic integration is done.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-10 03:52:29 -07:00
Andrey Smirnov
ae54f7e40d fix: stalls in local Docker cluster boot
The problem was triggered by the udevd trigger; the root cause is not
clear, but the workaround is to disable it in container mode.

Implement CPU/mem limits for `osctl cluster create`, apply defaults,
bump defaults for cicd.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-08-10 13:31:47 +03:00
Andrew Rynhard
aadbad44f0 docs: add project layout standards
This document describes our official layout standards. It serves as a
guide for contributors to help them in organizing code and placing
packages into the correct locations.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-09 23:16:08 -07:00
Andrew Rynhard
b965239672 chore: fix clone logic
This is another attempt at fixing the clone logic to make it work when
building the master branch.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-09 23:04:43 -07:00
Andrew Rynhard
217b7e2f9d chore: fix broken clone
This fixes an issue with cloning the master branch caused by git
refusing to fetch into the current branch.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-09 22:44:54 -07:00
Andrew Rynhard
8786916fd0 chore: build drone YAML via jsonnet
This PR aims to DRY the drone config file by using Jsonnet to generate
it.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-09 22:30:37 -07:00
Andrew Rynhard
449c14c391 chore: remove GitHub action workflow
We are suddenly starting to see 403 errors when conform is run in a
GitHub action. This PR removes the action.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-09 22:27:03 -07:00
Brad Beam
e60a57e186 chore: Fix up adhoc e2e tests
- Wait a little after cluster comes up
- Change interaction with CONFORMANCE variable to work around
  set -eou pipefail restrictions
- Set sonobuoy runner version to latest to work with alpha
  version

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-09 13:55:14 -05:00
Andrey Smirnov
2227d1f6b6 chore: add race-enabled test run
As the Go race detector doesn't work under musl libc, use the stock
glibc-based golang container.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-08-09 17:12:34 +03:00
Brad Beam
da1f73249f fix(machined): Clean up installation process
This also includes a fix for #955 which had the unintended side effect
of breaking image creation ( since it would attempt to grow the filesystem
always ).

The refactor standardizes around looking for the DATA and ESP labels to
discover any existing installations/filesystems. If none are found, an
installation will proceed -- for both image creation and bare metal.
During bootup, the DATA partition will always attempt to expand/grow.

This also introduces a new phase to verify the installation through the
existence of /boot/installed ( migrated from install stage ).

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-08 22:10:14 -05:00
Andrew Rynhard
3383e72d37 chore: remove machined from rootfs target
The machined dependency is not needed in the rootfs target. The
dependency is handled by buildkit.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-08 12:26:30 -07:00
Brad Beam
bfc1646cd9 chore(ci): Add e2e promotion pipeline
Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-08 11:27:57 -05:00
Spencer Smith
c03e4f850c chore: re-add github actions
This PR will hopefully re-enable the github actions for conform to work
as expected. 🤞

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-08-08 11:56:54 -04:00
Spencer Smith
e88d908f07 chore: delete github actions temporarily
This PR will drop the .github directory in an effort to clean things up
so we can add it back and get conform acting correctly.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-08-08 11:51:58 -04:00
Andrew Rynhard
1df4690db3 chore: set docker server entrypoint to dockerd to avoid TLS generation
As of the latest DIND images, TLS certificates are generated by default.
This change bypasses the TLS generation.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-07 15:10:57 -07:00
Spencer Smith
eea33a2254 chore: enable CIS testing in conformance runs
This PR will run through the kube-bench tests as part of our nightly
conformance runs

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-08-07 17:06:03 -04:00
Spencer Smith
902577b4dc feat: upgrade kubernetes to v1.16.0-alpha.3
This PR updates the kubernetes version constant, as well as pulls in the
new kubeadm image with the last alpha of v1.16.0 baked in. Additionally,
moves the CNI daemon sets to apps/v1, since they're now out of beta.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-08-07 16:05:07 -04:00
Spencer Smith
9e02c77c0a chore: add azure e2e testing
This PR will allow us to run an azure e2e test in parallel with our
current GCE implementation.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-08-07 12:16:32 -04:00
Brad Beam
53b1330c44 fix(initramfs): Allow data partition to grow
This fix ensures that we always grow the data partition during an installation.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-07 09:11:02 -05:00
Spencer Smith
ec3c77d863 feat: bump k8s version to v1.15.2
This PR will bump the hyperkube version so that we've got fixes for some
pretty critical CVEs: CVE-2019-11247 and CVE-2019-11249

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-08-06 15:56:18 -04:00
Andrey Smirnov
80f2d62958 chore: stabilize one more health test
Same approach: attempt more retries to fight general slowness/resource
starvation.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-08-06 02:45:00 +03:00
Andrew Rynhard
719afb56bd chore: prepare release v0.2.0-alpha.5
This is the official v0.2.0-alpha.5 release.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-05 12:04:45 -07:00
Andrey Smirnov
2f0698def2 chore: stabilize health test
It was failing randomly because the Sleep was not long enough for the
desired condition to be reached.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-08-02 14:04:03 -07:00
Andrey Smirnov
8362f58e7a chore: fix data race in goroutine runner
Discovered with `go test -race`:

```
WARNING: DATA RACE
Read at 0x00c0000cf2f8 by goroutine 25:
  github.com/talos-systems/talos/internal/app/machined/pkg/system/runner/goroutine.(*goroutineRunner).Stop()
      /home/smira/Documents/autonomy/talos/internal/app/machined/pkg/system/runner/goroutine/goroutine.go:111 +0x3e
  github.com/talos-systems/talos/internal/app/machined/pkg/system/runner/goroutine_test.(*GoroutineSuite).TestStop()
      /home/smira/Documents/autonomy/talos/internal/app/machined/pkg/system/runner/goroutine/goroutine_test.go:115 +0x345
  runtime.call32()
      /usr/local/go/src/runtime/asm_amd64.s:519 +0x3a
  reflect.Value.Call()
      /usr/local/go/src/reflect/value.go:308 +0xc0
  github.com/stretchr/testify/suite.Run.func2()
      /home/smira/Documents/go/pkg/mod/github.com/stretchr/testify@v1.3.1-0.20190311161405-34c6fa2dc709/suite/suite.go:133 +0x2ec
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:865 +0x163

Previous write at 0x00c0000cf2f8 by goroutine 26:
  github.com/talos-systems/talos/internal/app/machined/pkg/system/runner/goroutine.(*goroutineRunner).Run()
      /home/smira/Documents/autonomy/talos/internal/app/machined/pkg/system/runner/goroutine/goroutine.go:65 +0xcb
  github.com/talos-systems/talos/internal/app/machined/pkg/system/runner/goroutine_test.(*GoroutineSuite).TestStop.func3()
      /home/smira/Documents/autonomy/talos/internal/app/machined/pkg/system/runner/goroutine/goroutine_test.go:104 +0x4a
```

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-08-02 14:03:18 -07:00
Andrey Smirnov
37c1703f06 chore: add tests for event.Bus
Small tests to make sure code works as expected.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-08-02 14:02:18 -07:00
Andrey Smirnov
71640662e0 chore(init): rearrange phase handling to push shutdown to main
This re-arranges phases a bit so that shutdown actions are pushed back
to the top-level main.go of machined.

A small, rudimentary event.Bus is introduced to facilitate event passing
(shutdown/restart) between various machined components and main.go. This
might not be the best implementation, just something to allow this
message passing without global variables or such.

Machined API was refactored to run as goroutine service.

ACPI & signal handlers re-built as phase tasks, and activated for
non-container, container modes respectively.

As part of the fix, now `docker stop` triggers correct shutdown of Talos
(not a big deal, but good for testing).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-08-02 08:42:12 -07:00
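
To illustrate the kind of message passing that 71640662e0 describes, here is a minimal channel-based bus. All names (`Bus`, `Event`, `EventType`, `Subscribe`, `Notify`) are assumptions made for this sketch; the commit message does not show the real event.Bus API, which may differ.

```go
package main

import (
	"fmt"
	"sync"
)

// EventType and Event are hypothetical; the real types are not shown in the commit.
type EventType int

const (
	Shutdown EventType = iota
	Reboot
)

type Event struct {
	Type EventType
}

// Bus fans events out to subscribers without any global variables.
type Bus struct {
	mu   sync.RWMutex
	subs []chan Event
}

// Subscribe returns a buffered channel that receives later events.
func (b *Bus) Subscribe() <-chan Event {
	ch := make(chan Event, 1)
	b.mu.Lock()
	b.subs = append(b.subs, ch)
	b.mu.Unlock()
	return ch
}

// Notify delivers e to every subscriber without blocking; a subscriber
// whose buffer is already full misses the event.
func (b *Bus) Notify(e Event) {
	b.mu.RLock()
	defer b.mu.RUnlock()
	for _, ch := range b.subs {
		select {
		case ch <- e:
		default:
		}
	}
}

func main() {
	bus := &Bus{}
	events := bus.Subscribe()

	// A component deep in the program requests shutdown...
	go bus.Notify(Event{Type: Shutdown})

	// ...and main reacts to it.
	if e := <-events; e.Type == Shutdown {
		fmt.Println("shutting down")
	}
}
```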
Andrew Rynhard
90c91807bd refactor: restructure the project layout
This change moves packages into more appropriate places.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-01 22:19:42 -07:00
Andrew Rynhard
a9c4a95a4b fix: mount the owned partitions in cloud platforms
This adds the logic for mounting the owned block device and resizing the
ephemeral partition for cloud platforms.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-01 21:48:23 -07:00
Andrew Rynhard
ca35b85300 refactor: improve installation reliability
This change aims to make installations more unified and reliable. It
introduces the concept of a mountpoint manager that is capable of
mounting, unmounting, and moving a set of mountpoints in the correct
order.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-01 11:44:40 -07:00
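
A rough sketch of the mountpoint manager idea from ca35b85300. It assumes that "the correct order" means mounting in declaration order (parents before children) and unmounting in reverse; the commit does not spell this out. `Mountpoint`, `Manager`, and the injected `mount`/`unmount` functions are hypothetical, and no real mount syscalls are made.

```go
package main

import "fmt"

// Mountpoint is a hypothetical description of a single mount.
type Mountpoint struct {
	Source, Target string
}

// Manager holds an ordered set of mountpoints. The mount and unmount
// functions are injected so the sketch stays free of real syscalls.
type Manager struct {
	points  []Mountpoint
	mount   func(Mountpoint) error
	unmount func(Mountpoint) error
}

// MountAll mounts in declaration order, so parents come before children.
func (m *Manager) MountAll() error {
	for _, p := range m.points {
		if err := m.mount(p); err != nil {
			return fmt.Errorf("mount %s: %w", p.Target, err)
		}
	}
	return nil
}

// UnmountAll unmounts in reverse order, so children go before parents.
func (m *Manager) UnmountAll() error {
	for i := len(m.points) - 1; i >= 0; i-- {
		if err := m.unmount(m.points[i]); err != nil {
			return fmt.Errorf("unmount %s: %w", m.points[i].Target, err)
		}
	}
	return nil
}

func main() {
	trace := func(verb string) func(Mountpoint) error {
		return func(p Mountpoint) error {
			fmt.Println(verb, p.Target)
			return nil
		}
	}
	m := &Manager{
		points: []Mountpoint{
			{Source: "/dev/sda1", Target: "/boot"},
			{Source: "/dev/sda2", Target: "/var"},
		},
		mount:   trace("mount"),
		unmount: trace("unmount"),
	}
	_ = m.MountAll()
	_ = m.UnmountAll()
}
```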
Andrey Smirnov
9c63f4ed0a feat(init): implement complete API for service lifecycle (start/stop)
It is now possible to `start`/`stop`/`restart` any service via `osctl`
commands.

There are some changes in `ServiceRunner` to support re-use (re-entering
running state). `Services` singleton now tracks service running state to
avoid calling `Start()` on already running `ServiceRunner` instance.
Method `Start()` was renamed to `LoadAndStart()` to break up service
loading (adding to the list of services) and actual service start.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-08-01 11:16:57 -07:00
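
The split between loading and starting described in 9c63f4ed0a might look roughly like the following. This is a sketch under assumptions: the `Services` type here only tracks a running flag per service ID, and the method signatures are invented for illustration rather than copied from the Talos source.

```go
package main

import (
	"fmt"
	"sync"
)

// Services tracks which loaded services are currently running.
// (Hypothetical simplification of the singleton named in the commit.)
type Services struct {
	mu    sync.Mutex
	state map[string]bool // service ID -> running
}

func NewServices() *Services {
	return &Services{state: map[string]bool{}}
}

// Load adds the service to the list without starting it.
func (s *Services) Load(id string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.state[id]; !ok {
		s.state[id] = false
	}
}

// Start marks a loaded service as running. It reports false when the
// service was already running, so the caller can skip a second start.
func (s *Services) Start(id string) (bool, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	running, ok := s.state[id]
	if !ok {
		return false, fmt.Errorf("service %q is not loaded", id)
	}
	if running {
		return false, nil
	}
	s.state[id] = true
	return true, nil
}

// Stop marks a loaded service as stopped so it can be started again.
func (s *Services) Stop(id string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.state[id]; ok {
		s.state[id] = false
	}
}

// LoadAndStart combines both steps.
func (s *Services) LoadAndStart(id string) (bool, error) {
	s.Load(id)
	return s.Start(id)
}

func main() {
	svcs := NewServices()
	fmt.Println(svcs.LoadAndStart("example")) // first start
	fmt.Println(svcs.Start("example"))        // already running, skipped
	svcs.Stop("example")
	fmt.Println(svcs.Start("example")) // restart after stop
}
```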
Andrew Rynhard
91ac1d7a8c chore: run CI jobs on CI nodes
This adds a node selector to our drone jobs that runs the jobs on
dedicated CI nodes.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-01 08:33:02 -07:00
Spencer Smith
38dfddbab3 feat: break up osctl cluster create and basic/e2e tests
This PR will break cluster create apart from the other steps in
integration tests. It will allow us to run the cluster create, then use
it for parallel e2e builds in different cloud environments.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-08-01 10:55:24 -04:00
Andrew Rynhard
835d72b74a fix: create overlay mounts after install
Without running the install task first, /var is read-only. This causes
the overlay phase to fail as it tries to create /var/system.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-01 06:35:12 -07:00
Andrey Smirnov
3024c26a55 chore: update dockerfile/buildkit versions
New buildkit release: https://github.com/moby/buildkit/releases/tag/v0.6.0

New release was published for buildkit's dockerfile:
https://github.com/moby/buildkit/releases/tag/dockerfile%2F1.1.2-experimental,
so we can stick to release version now.

These releases include fixes/implementation for `RUN --security=insecure`.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-08-01 01:05:42 +03:00
Andrey Smirnov
084378ac04 fix(init): flip concurrency of tasks/services, fix small issues
Phases should run sequentially, while tasks concurrently in a phase.

There are two potential issues fixed:

1. `result` multierror was updated inside goroutine without any
synchronization, so this is a data race
2. a panic inside a task/phase runner might happen, and as an unhandled
panic in a goroutine aborts the whole process, this might lead to a
system halt as 'machined' exits

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-07-31 14:21:07 -07:00
Spencer Smith
bc5fe085bd fix: set mtu value regardless of interface state
This PR will fix a bug we encountered in GCE, where the interface was
already up and the MTU value wasn't getting set.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-07-31 15:02:02 -04:00
Andrey Smirnov
ac963ad7e1 feat(osctl): allow configurable number of masters to cluster create
This allows running tiny Talos clusters (which is sometimes nice for
local testing), e.g. with just a single master and zero workers.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-07-30 15:32:16 -07:00
Andrew Rynhard
e2e5236f62 chore: prepare release v0.1.0
This is the official v0.1.0 release.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-07-29 21:17:10 -07:00