talos

Author	SHA1	Message	Date
Andrew Rynhard	d4770d41ad	feat: run installs via container This moves to performing installs via a container. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-08-27 15:01:20 -05:00
Spencer Smith	739e232896	feat: upgrade kubernetes to v1.16.0-beta.1 This PR will upgrade to the latest beta of v1.16 in order to get us closer to catching the v1.16.0 release as soon as it drops. Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>	2019-08-27 13:25:33 -04:00
Brad Beam	f028d29d31	chore: Increase timers for healthchecks We've seen some instances where the initial delay is not long enough (containerd) as well as a period of every second increases the log size for services like proxyd which log incoming connections. Signed-off-by: Brad Beam <brad.beam@talos-systems.com>	2019-08-27 09:54:05 -07:00
Andrew Rynhard	0bdaff1a90	feat: perform upgrades via container This moves to performing upgrades via a container. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-08-27 09:44:50 -07:00
Spencer Smith	f85750cdca	feat: generate and use v1 machine configs This PR will implement the v1 machine config proposal. This will allow for a streamlined config for talos nodes. Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>	2019-08-26 19:36:14 -04:00
Andrew Rynhard	43e20217e8	feat: add ability to pass data on event bus We need to support eventing with associated data. This moves the event bus to an observer design pattern that allows observers to register for specific events, and to receive the associated data. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-08-26 13:27:02 -07:00
Spencer Smith	6f8e089271	chore: use kubeadm v1beta2 structs everywhere This PR will move to using the external kubeadm v1beta2 structs for our code base. This will hopefully allow for more stable integrations with kubeadm in the long term, as well as solve some needs we have in the machine config rewrite. Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>	2019-08-26 12:07:36 -04:00
Brad Beam	692571bdec	feat(networkd): Add grpc endpoint Allows us to list routes and interface details Signed-off-by: Brad Beam <brad.beam@talos-systems.com>	2019-08-25 19:48:08 -07:00
Brad Beam	d36007fb29	feat(osd): Add ntpd client Allows us to access ntp api Signed-off-by: Brad Beam <brad.beam@talos-systems.com>	2019-08-25 13:38:34 -07:00
Andrew Rynhard	9eaa2d8140	feat: add sequencer interface This adds an interface that can be used to describe boot, shutdown, and upgrade events in a set of phases. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-08-25 12:59:42 -07:00
Andrew Rynhard	be8f58c15d	feat: add overlay task This adds a well defined task for handling all overlay mount points that are required by the system. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-08-25 10:47:54 -07:00
Andrew Rynhard	1eb02875c2	feat: use BLKPG ioctl for partition events This moves to using BLKPG ioctl instead of BLKRRPART. BLKRRPART is older and more sensitive to EBUSY errors. BLKPG has the potential to minimize the chances of encountering an EBUSY error when manipulating partition tables. In looking at a comparison between BLKPG and BLKRRPART, it seems that both have their pros and cons. Eventually a combination of the two may serve us better, but for now I think BLKPG will get us further. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-08-25 07:55:24 -07:00
Brad Beam	cdc989ddda	refactor(networkd): Switch from rtnetlink to rtnl Gives a better abstraction on rtnetlink interaction Signed-off-by: Brad Beam <brad.beam@talos-systems.com>	2019-08-21 13:24:51 -05:00
Brad Beam	313c118ad0	refactor(networkd): Replace networkd with a standalone app This is a major rewrite of our network subsystem. - This changes networkd to run as a standalone app versus internal goroutine - This changes out the netlink package with the more idiomatic netlink/rtnetlink packages - This changes the initial network bootstrap/discovery from using a single interface to attempting to bring up all interfaces - This moves us back on to the upstream dhcp library Signed-off-by: Brad Beam <brad.beam@talos-systems.com>	2019-08-21 13:24:51 -05:00
Andrew Rynhard	0af1eba159	refactor: add more runtime modes In order to DRY up all installation methods and mount methods, this PR introduces a few more runtime modes. The modes are then used to determine the strategy for creating and/or mounting the partitions. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-08-19 20:23:45 -07:00
Andrew Rynhard	794c7231f5	feat: run dedicated instance of containerd for system services In order to facilitate upgrades and resets that are capable of manipulating the system block device, we need to run an instance of containerd that has zero dependencies on the disk. We run containerd purely in memory for running system services. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-08-19 12:32:59 -07:00
Andrew Rynhard	2e65cff3ce	feat: mount /sys/fs/bpf The BPF filesystem is required to pin BPF objects. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-08-18 07:37:08 -07:00
Brad Beam	ec0f188309	fix(machined): Remove host mounts for specific CNI providers We shouldn't need these anymore Signed-off-by: Brad Beam <brad.beam@talos-systems.com>	2019-08-17 20:20:45 -07:00
Andrew Rynhard	e305acac20	feat: add standardized command runner This adds a command runner function that can be used everywhere we need to exec a binary. It adds additional logic around error handling that will allow for viewing errors in the case of a failed command. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-08-17 03:38:36 -07:00
Brad Beam	cf64847772	refactor(proxyd): Update multilisteners to use error chan. This cleans up the multiple listener implementation. Signed-off-by: Brad Beam <brad.beam@talos-systems.com>	2019-08-16 12:21:02 -05:00
Andrew Rynhard	6940aaf233	fix: verify installation definition This fixes the possibility of panicking on a nil pointer by running the verification steps earlier. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-08-16 09:58:12 -07:00
Brad Beam	76a9c15044	feat: Add gRPC server for ntp Part of the API refactor; this introduces a gRPC server for ntp. This allows the ability to query node time and check time against specific ntp servers. This refactor also moves the ntp functionality into a sub package for better project organization. Signed-off-by: Brad Beam <brad.beam@talos-systems.com>	2019-08-16 09:46:43 -07:00
Brad Beam	46c283b6c9	chore: Disable rate limited kmessage This should allow for better troubleshooting during early boot/startup Signed-off-by: Brad Beam <brad.beam@talos-systems.com>	2019-08-16 09:36:52 -07:00
Brad Beam	70a478895f	feat(proxyd): Add gRPC server Part of the API refactor; this introduces a gRPC server for proxyd to expose some of the internal state. Signed-off-by: Brad Beam <brad.beam@talos-systems.com>	2019-08-15 16:35:03 -05:00
Andrew Rynhard	a116145c1b	feat: rename DATA partition to EPHEMERAL This changes the data partition name to something more appropriate. We chose ephemeral to make it very clear that the disk should not be used for application data. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-08-15 08:00:22 -07:00
Brad Beam	249acda74a	feat: Allow hostname to be specified in userdata This sets up the ability to define hostname via userdata. I don't expect this will get used publicly much, but it provides a mechanism to convey the hostname from various sources internally. Signed-off-by: Brad Beam <brad.beam@talos-systems.com>	2019-08-14 22:41:27 -05:00
Andrew Rynhard	09693a26c9	chore: update go modules to use Kubernetes v1.16.0-alpha.3 This is not ideal, but it works. We essentially need to start using replace statements in order to pull in the modules we need. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-08-14 15:34:09 -07:00
Andrew Rynhard	142500ce3e	fix(proxyd): print bootstrap backend dial errors This prints any error that occurs when dialing the bootstrap backend. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-08-11 15:12:09 -07:00
Seán C McCord	fd76d90028	fix(proxyd): do not pre-bracket IPv6 backend addrs Fixes #996 Signed-off-by: Seán C McCord <ulexus@gmail.com>	2019-08-11 15:00:22 -07:00
Andrew Rynhard	ad79e8dfcf	feat: remove the machine config on reset This will remove the machine config on a reset so that a new machine config will be downloaded and used on a reboot. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-08-11 12:51:55 -07:00
Seán C McCord	63cfd8a405	fix(proxyd): wrap Dial addresses Handle IPv6 addresses in proxyd frontend. Fixes #988 Signed-off-by: Seán C McCord <ulexus@gmail.com>	2019-08-10 23:00:28 -07:00
Seán C McCord	7691bb060c	fix: enable IPv6 forwarding Fixes #985 Signed-off-by: Seán C McCord <ulexus@gmail.com>	2019-08-10 22:39:56 -07:00
Seán C McCord	6d22744eca	fix: store PartitionName when on NVMe disk Fixes #978 Signed-off-by: Seán C McCord <ulexus@gmail.com>	2019-08-10 17:10:01 -07:00
Andrey Smirnov	ae54f7e40d	fix: stalls in local Docker cluster boot Problem was triggered by udevd trigger, root cause is not clear, but workaround is to disable it for container mode. Implement CPU/mem limits for `osctl cluster create`, apply defaults, bump defaults for cicd. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-08-10 13:31:47 +03:00
Brad Beam	da1f73249f	fix(machined): Clean up installation process This also includes a fix for #955 which had the unintended side effect of breaking image creation (since it would always attempt to grow the filesystem). The refactor standardizes around looking for the DATA and ESP labels to discover any existing installations/filesystems. If none are found, an installation will proceed -- for both image creation and bare metal. During bootup, the DATA partition will always attempt to expand/grow. This also introduces a new phase to verify the installation through the existence of /boot/installed (migrated from install stage). Signed-off-by: Brad Beam <brad.beam@talos-systems.com>	2019-08-08 22:10:14 -05:00
Brad Beam	53b1330c44	fix(initramfs): Allow data partition to grow This fix ensures that we always grow the data partition during an installation. Signed-off-by: Brad Beam <brad.beam@talos-systems.com>	2019-08-07 09:11:02 -05:00
Andrey Smirnov	80f2d62958	chore: stabilize one more health test Same approach: attempt more retries to fight general slowness/resource starvation. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-08-06 02:45:00 +03:00
Andrey Smirnov	2f0698def2	chore: stabilize health test It was failing randomly due to Sleep being insufficient for the desired condition being reached. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-08-02 14:04:03 -07:00
Andrey Smirnov	8362f58e7a	chore: fix data race in goroutine runner Discovered with `go test -race`: ``` WARNING: DATA RACE Read at 0x00c0000cf2f8 by goroutine 25: github.com/talos-systems/talos/internal/app/machined/pkg/system/runner/goroutine.(goroutineRunner).Stop() /home/smira/Documents/autonomy/talos/internal/app/machined/pkg/system/runner/goroutine/goroutine.go:111 +0x3e github.com/talos-systems/talos/internal/app/machined/pkg/system/runner/goroutine_test.(GoroutineSuite).TestStop() /home/smira/Documents/autonomy/talos/internal/app/machined/pkg/system/runner/goroutine/goroutine_test.go:115 +0x345 runtime.call32() /usr/local/go/src/runtime/asm_amd64.s:519 +0x3a reflect.Value.Call() /usr/local/go/src/reflect/value.go:308 +0xc0 github.com/stretchr/testify/suite.Run.func2() /home/smira/Documents/go/pkg/mod/github.com/stretchr/testify@v1.3.1-0.20190311161405-34c6fa2dc709/suite/suite.go:133 +0x2ec testing.tRunner() /usr/local/go/src/testing/testing.go:865 +0x163 Previous write at 0x00c0000cf2f8 by goroutine 26: github.com/talos-systems/talos/internal/app/machined/pkg/system/runner/goroutine.(goroutineRunner).Run() /home/smira/Documents/autonomy/talos/internal/app/machined/pkg/system/runner/goroutine/goroutine.go:65 +0xcb github.com/talos-systems/talos/internal/app/machined/pkg/system/runner/goroutine_test.(GoroutineSuite).TestStop.func3() /home/smira/Documents/autonomy/talos/internal/app/machined/pkg/system/runner/goroutine/goroutine_test.go:104 +0x4a ``` Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-08-02 14:03:18 -07:00
Andrey Smirnov	37c1703f06	chore: add tests for event.Bus Small tests to make sure code works as expected. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-08-02 14:02:18 -07:00
Andrey Smirnov	71640662e0	chore(init): rearrange phase handling to push shutdown to main This re-arranges phases a bit so that shutdown actions are pushed back to the top-level main.go of machined. Small rudimentary event.Bus is introduced to facilitate event passing (shutdown/restart) between various machined components and main.go. This might not be the best implementation, just something to allow this message passing without global variables or such. Machined API was refactored to run as goroutine service. ACPI & signal handlers re-built as phase tasks, and activated for non-container, container modes respectively. As part of the fix, now `docker stop` triggers correct shutdown of Talos (not a big deal, but good for testing). Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-08-02 08:42:12 -07:00
Andrew Rynhard	90c91807bd	refactor: restructure the project layout This change moves packages into more appropriate places. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-08-01 22:19:42 -07:00
Andrew Rynhard	a9c4a95a4b	fix: mount the owned partitions in cloud platforms This adds the logic for mounting the owned block device and resizing the ephemeral partition for cloud platforms. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-08-01 21:48:23 -07:00
Andrew Rynhard	ca35b85300	refactor: improve installation reliability This change aims to make installations more unified and reliable. It introduces the concept of a mountpoint manager that is capable of mounting, unmounting, and moving a set of mountpoints in the correct order. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-08-01 11:44:40 -07:00
Andrey Smirnov	9c63f4ed0a	feat(init): implement complete API for service lifecycle (start/stop) It is now possible to `start`/`stop`/`restart` any service via `osctl` commands. There are some changes in `ServiceRunner` to support re-use (re-entering running state). `Services` singleton now tracks service running state to avoid calling `Start()` on already running `ServiceRunner` instance. Method `Start()` was renamed to `LoadAndStart()` to break up service loading (adding to the list of service) and actual service start. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-08-01 11:16:57 -07:00
Andrew Rynhard	835d72b74a	fix: create overlay mounts after install Without running the install task first, /var is read-only. This causes the overlay phase to fail as it tries to create /var/system. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-08-01 06:35:12 -07:00
Andrey Smirnov	084378ac04	fix(init): flip concurrency of tasks/services, fix small issues Phases should run sequentially, while tasks concurrently in a phase. There are two potential issues fixed: 1. `result` multierror was updated inside goroutine without any synchronization, so this is a data race 2. panic inside task/phase runner might happen and as unhandled panic in a goroutine aborts whole process, this might lead to a system halt as the 'machined' exits Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-07-31 14:21:07 -07:00
Spencer Smith	bc5fe085bd	fix: set mtu value regardless of interface state This PR will fix a bug we encountered in GCE, where the interface was already up and the MTU value wasn't getting set. Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>	2019-07-31 15:02:02 -04:00
Andrew Rynhard	e63c882b89	refactor: split machined into phases This change aims to standardize the boot process. It introduces the concept of a phase, which is comprised of tasks. Phases are run in serial and the tasks that make up a phase are run concurrently. Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>	2019-07-29 12:40:03 -07:00
Andrey Smirnov	f56a9d5b96	chore: implement first version of CRI runner It runs containers via CRI interface in a pod sandbox. This is the very first version: I tried not to introduce any changes to common runner interface. There should be some CRI-specific options for the runner (like polling interval, as it doesn't have nice `Wait()` API), plus my plan so far is to use OCI as the common layer for container options, so that we can analyze OCI and translate to CRI (when possible, return errors when option is not implemented). CRI interface doesn't have a concept of 'unpacking' an image, so we probably need to unpack via containerd API (or any other runtime-specific API) by targeting CRI namespace. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>	2019-07-26 21:07:46 +03:00
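
Commits e63c882b89 and 084378ac04 above describe a boot model in which phases run sequentially, tasks within a phase run concurrently, the collected errors are updated under synchronization, and a panic inside a task is contained instead of aborting the process. The Go sketch below illustrates that idea only; it is not the machined implementation. The names `Task`, `Phase`, `runTask`, and `RunPhases` are hypothetical, and stopping at the first failed phase is an assumption that the commit messages do not state.

```go
package main

import (
	"fmt"
	"sync"
)

// Task is a hypothetical unit of work within a phase.
type Task func() error

// Phase is a hypothetical named group of tasks.
type Phase struct {
	Name  string
	Tasks []Task
}

// runTask invokes a task and converts a panic into an error, so a
// misbehaving task cannot take the whole process down.
func runTask(t Task) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("task panicked: %v", r)
		}
	}()
	return t()
}

// RunPhases runs phases one after another. Tasks inside a phase run
// concurrently, and their errors are appended under a mutex to avoid
// a data race. Assumption: execution stops at the first failed phase.
func RunPhases(phases []Phase) error {
	for _, p := range phases {
		var (
			wg   sync.WaitGroup
			mu   sync.Mutex
			errs []error
		)
		for _, t := range p.Tasks {
			wg.Add(1)
			go func(t Task) {
				defer wg.Done()
				if err := runTask(t); err != nil {
					mu.Lock()
					errs = append(errs, err)
					mu.Unlock()
				}
			}(t)
		}
		wg.Wait() // all tasks of this phase finish before the next phase starts
		if len(errs) > 0 {
			return fmt.Errorf("phase %q: %d task(s) failed: %v", p.Name, len(errs), errs)
		}
	}
	return nil
}

func main() {
	err := RunPhases([]Phase{
		{Name: "first", Tasks: []Task{func() error { return nil }}},
		{Name: "second", Tasks: []Task{func() error { panic("boom") }}},
	})
	fmt.Println(err)
}
```

The recover lives in `runTask` and not in the goroutine body so that each task gets its own error value, which keeps the mutex-protected section limited to the slice append.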


263 Commits