1091 Commits

Author SHA1 Message Date
Andrew Rynhard
38692847b3 refactor: pass runtime to initializer
By passing the runtime to the initializer we can flex on install options
better.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-11-03 09:23:37 -08:00
Andrew Rynhard
326702925a refactor: align platform names with kernel args
This aligns platform names with the talos.platform kernel arg.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-11-03 09:13:07 -08:00
Andrew Rynhard
ce911c02da refactor: use etcd package
This DRYs things up by using the etcd package for client creation.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-11-01 21:02:44 -07:00
Brad Beam
4653745acd fix(osd): Add additional capabilities for osd
This adds `CAP_DAC_READ_SEARCH`, `CAP_DAC_OVERRIDE`, and `CAP_SYSLOG`
capabilities to osd.  This fixes the ability to read dmesg and kubeconfig.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-11-01 20:45:43 -07:00
Andrew Rynhard
5abbb9b041 fix: Avoid running bootkube on reboots
Since bootkube should only be run once, we need a way to determine if it
has already been run. This makes use of etcd to store a key-value pair
indicating that the cluster has been initialized.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-11-01 15:20:43 -07:00
Tim Gerla
c3a0302f17 docs: various layout and responsiveness fixes
- adjust ul margin to keep the bullets inside the content area
- fix a few docs page responsiveness problems on small screens
- adjust the layout of the logo relative to the docs sidebar
- clean up some vestigial CSS classes

Signed-off-by: Tim Gerla <tim@gerla.net>
2019-11-01 05:58:15 -07:00
Andrew Rynhard
dc3870453b feat: create cluster with default PSP
This adds a default PSP that is applied upon bootstrapping the cluster.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-31 22:50:35 -07:00
Andrew Rynhard
a3dc6adec1 chore: remove unused files
This removes unused files in hack.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-31 22:46:38 -07:00
Andrew Rynhard
03a26f5836 chore: prepare release v0.3.0-alpha.5
This is the official v0.3.0-alpha.5 release.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-31 15:35:41 -07:00
Andrew Rynhard
7cd9ba588c chore: remove RAW disk
We need to remove this so that it is not published in a release.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-31 15:35:06 -07:00
Andrew Rynhard
6764170d1a docs: remove v0.2 docs
The v0.2 docs are inaccurate, and in general just bad. Since we made so
many breaking changes in v0.3 I think it's better we just hit the reset
button and stick to v0.3 going forward.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-31 14:59:17 -07:00
Andrew Rynhard
96513ac397 docs: fix list-style-position
This sets the list-style-position to inside by default, and overrides
the landing page to use outside. This way we only need to maintain the
CSS for the landing page and not all the other potential places we would
want inside in the future.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-31 14:57:08 -07:00
Andrew Rynhard
2cad745292 docs: add customization guide
This adds a section on customizing Talos.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-31 14:47:17 -07:00
Andrew Rynhard
d39658a9ed docs: add VMware docs to menu
This adds the VMware docs to the sidebar menu and also touches up the
wording a bit.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-31 12:19:33 -07:00
Brad Beam
ca76ccd4af feat: Add support for creating VMware images
This PR adds support for generating VMware compatible images as an ova.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-10-31 13:39:54 -05:00
Andrew Rynhard
6e03adad06 docs: add troubleshooting guide on common PKI scenarios
This adds a "Troubleshooting" section to the documentation along with a
guide on generating a certificate. This covers the scenario when a
user's certificate has expired.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-30 15:54:29 -07:00
Andrew Rynhard
6a61b3a1b2 docs: add note on CRNG initialization
This adds a note on the usage of random.trust_cpu to get around slow
boot times due to low entropy.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-30 14:12:31 -07:00
Andrew Rynhard
82e43e0570 feat: use Ed25519 public-key signature system
This replaces ECDSA with Ed25519. Ed25519 is considered to be safer and
more trustworthy than ECDSA NIST curves.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-30 10:32:53 -07:00
Andrew Rynhard
3c6d0135d0 feat: upgrade Kubernetes to 1.16.2
This brings in 1.16.2 modules and bumps the default hyperkube image.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-30 06:35:12 -07:00
Andrew Rynhard
41619f9016 feat: lock down container permissions
This removes the default privileged mode that all containers were
started with and adds the required capabilities on a per-service basis.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-29 11:50:37 -07:00
Andrew Rynhard
9933fc0fba fix: check if endpoint is nil
This fixes a panic by checking if the cluster endpoint is nil before
using it.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-29 07:44:16 -07:00
Andrew Rynhard
f26a4ce040 chore: update pkgs SHA
This brings in pkgs that have been built using a prefix of /usr.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-29 07:10:25 -07:00
Andrew Rynhard
d70e7e3ccd chore: prepare release v0.3.0-alpha.4
This is the official v0.3.0-alpha.4 release.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-28 15:45:17 -07:00
Andrew Rynhard
73d76307b0 chore: add Digital Ocean image to release
This will add a step to build the Digital Ocean image.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-28 15:37:34 -07:00
Andrew Rynhard
362134e2f2 docs: fix Digital Ocean docs
Fixes small typos, and errors in the tutorial.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-28 15:34:57 -07:00
Andrew Rynhard
2459ca14da fix: add cluster endpoint to certificate SANs
The cluster endpoint needs to be specified in the API server's
certificate SANs.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-28 13:53:25 -07:00
Tim Gerla
8610a3f387 docs: more whitespace, wording, and responsiveness changes
- tweak whitespace between sections
- fix the top menu for small screens
- fix the terminal overlapping on small screens
- tweak wording on a few of the bullet points
- clean up the display of the "certified" logo on small screens
- clean up the "features" grid on medium/large screens

Signed-off-by: Tim Gerla <tim@gerla.net>
2019-10-28 10:50:01 -07:00
Andrew Rynhard
0d1c5ac305 feat: add support for Digital Ocean
This adds a Digital Ocean platform implementation.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-28 10:46:15 -07:00
Sekerin Evgeniy
c6e1e6f28f feat: Add retry on get kubeconfig
This implements a retry when getting the kubeconfig.

Signed-off-by: Sekerin Evgeniy <sekerin.e.a@gmail.com>
2019-10-28 09:59:30 -07:00
Brad Beam
457c6416a6 feat: Add network api to apid
This extends apid to include the network api.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-10-28 04:21:48 -07:00
Brad Beam
ee24e42319 feat: Add time api to apid
This extends apid to cover the time api.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-10-25 14:35:14 -07:00
Andrey Smirnov
d3d011c8d2 chore: replace /* */ comments with // comments in license header
This fixes issues with `// +build` directives not being recognized in
source files.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-10-25 14:15:17 -07:00
Brad Beam
6de32dd30b fix: Fix osctl version output
This broke when we introduced the apid changes.

```
Client:
Tag:         v0.3.0-alpha.3-3-gc3e353aa-dirty
SHA:         c3e353a-dirty
Built:
Go version:  go1.13.3
OS/Arch:     linux/amd64

Server:
NODE:        10.5.0.3
Tag:         v0.3.0-alpha.3-3-gc3e353aa-dirty
SHA:         c3e353a-dirty
Built:
Go version:  go1.13.3
OS/Arch:     linux/amd64

NODE:        10.5.0.2
Tag:         v0.3.0-alpha.3-3-gc3e353aa-dirty
SHA:         c3e353a-dirty
Built:
Go version:  go1.13.3
OS/Arch:     linux/amd64
```

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-10-25 15:41:06 -05:00
Andrey Smirnov
c3e353aa45 chore: bump tools/pkgs for toolchain refactor
This also pulls in Go 1.13.3

See talos-systems/toolchain#8, talos-systems/tools#82,
https://github.com/talos-systems/pkgs/pull/69

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-10-25 21:59:41 +03:00
Brad Beam
573cce8d18 feat: Add APId
This PR introduces APId. This service replaces the frontend functionality
previously provided by OSD. The main driver for this is twofold:

1. Create a single purpose application to expose the talos api

2. Make use of code generation to DRY api changes

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-10-25 13:02:33 -05:00
Andrew Rynhard
a9fe8beb0f chore: fix markdown lint error
Dollar signs should only be used when the command has output.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-25 10:26:36 -07:00
Andrew Rynhard
f0e669330e chore: prepare release v0.3.0-alpha.3
This is the official v0.3.0-alpha.3 release.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-25 07:59:08 -07:00
Spencer Smith
b615418e11 fix: append localhost to cert sans if docker platform
This PR fixes a bug on mac with the localhost not making it into cert
sans when doing `osctl cluster create`. Now that they're present, we're
able to use kubectl again.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-10-25 10:54:54 -04:00
Tim Gerla
b324217802 docs: responsiveness fixes and wording changes
- Most of the landing page is responsive on small/medium screens now. There are still
some bugs around the ascii cinema.
- Some wording tweaks, mostly I removed words to make things more concise. Feel free
to edit my edits.
- Simplified a couple of HTML constructs.
- Expanded the "features" section into two rows with a placeholder image for the 6th item.
Happy for feedback.

Signed-off-by: Tim Gerla <tim@gerla.net>
2019-10-24 15:43:23 -07:00
Andrew Rynhard
edf4ace611 docs: update getting started guide
Things have changed since v0.2. This is a refresh to make the getting
started guide up to date.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-23 15:23:04 -07:00
Andrew Rynhard
7a4b4d42b5 docs: add v0.3 AWS guide
This adds documentation for v0.3 AWS users.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-23 15:20:07 -07:00
Spencer Smith
d8db2bc65b feat: detect gzipped machine configs
This PR will add the ability for talos to detect if the machine config
that it downloads from the platform is a gzipped file. If so, it will
unzip it and overwrite the byte slice that gets written to disk.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-10-23 13:30:53 -07:00
Andrew Rynhard
bccaa36b44 fix: create external IP failures as non-fatal
There are use cases where a Talos node will not be publicly accessible.
This treats platform external IP errors as non-fatal.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-23 13:15:58 -07:00
Andrey Smirnov
f48830e7db chore: attempt to avoid containerd shim socket conflicts in tests
I can't say how exactly those conflicts happen in the tests, but I tried
to randomize more container IDs and namespace names (which both feed
into final abstract unix socket path).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-10-23 19:33:55 +03:00
Andrey Smirnov
d44f10f094 chore: attempt to fix test hanging with reaper enabled
This is not a 100% fix, as I can't reproduce the tests hanging in a local
environment, but the idea is the following:

1. `reaper.Start()` started reaper loop in a goroutine which starts with
subscribing to `SIGCHLD`.
2. `reaper.Start()` just spawned goroutine never waiting for it.
3. if after `reaper.Start()` reaper goroutine never runs, but process is
created in the test and it terminates, `SIGCHLD` will be ignored and
reaper will never wake up to reap the child.
4. process test hangs as it waits for reaper to reap the child and
return its exit status.

Sample failures:

```
=== RUN   TestProcessSuite/runReaper=true/TestRunLogs
2019/10/15 14:17:41 state Running: Process Process(["/bin/sh" "-c" "echo -n \"Test 1\nTest 2\n\""]) started with PID 11802
coverage: 60.0% of statements
panic: test timed out after 10m0s
```

```
=== RUN   TestCmdSuite/runReaper=true/TestRun
true
coverage: 71.4% of statements
panic: test timed out after 10m0s
```

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-10-22 16:59:42 -07:00
Andrey Smirnov
8a80712d9a chore: fix containerd test hanging
The problem was that if container fails to start, it never reaches
'StateRunning' and test hangs waiting for that state. Assertion doesn't
abort whole test (it only aborts goroutine it was called from), so this
doesn't help.

Fix that by signalling back if some containers fail to start.

This is not a fix, but it should expose the actual failure happening in
this test.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-10-22 16:59:21 -07:00
Andrey Smirnov
b11d708fdc chore: make service_runner_test less flaky
This replaces `time.Sleep()` waits with calls to `retry.Constant` to
wait for a specific condition to be reached.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-10-23 01:20:22 +03:00
Andrey Smirnov
b83b9d2892 chore: fix flaky constant retry test
Failure:

```
--- FAIL: Test_constantRetryer_Retry (7.00s)
    --- FAIL: Test_constantRetryer_Retry/test_expected_number_of_retries (2.00s)
        constant_test.go:168: expected count of 2, got 3
```

The problem is that retry interval (1s) perfectly aligns with timeout
(2s), so depending on which timer fires first, function might be called
two or three times. Fix that by extending timeout a bit so it fits one
more run and not more.

P.S. This test might be still flaky under load if function doesn't have
a chance to run (starvation). Proper fix is to use fake time in the
tests.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-10-23 00:39:22 +03:00
Andrey Smirnov
811fd6706a chore: make Slack notifications more fancy
Uses some examples found on the Internet :)

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-10-22 14:23:25 -07:00
Brad Beam
251ab16e07 feat: Add node metadata wrapper to machine api
- Added common.proto to host NodeMetadata
- go_package names were fixed up so imports are generated with the proper
  package names
- fixed up build work (dockerfile) to prevent copying the previously
  generated go proto files. This fixes a bug where we could incorrectly
  copy the previously generated protobuf instead of a new one generated
  at an incorrect location/name/etc.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-10-22 14:42:34 -05:00