IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Add startup probes that probe the containers for 60 seconds before switching to liveness probes.
Closessiderolabs/talos#7054.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Hide kube-apiserver, kube-controller-manager and kube-scheduler statuses on the dashboard for the worker nodes, instead of showing them as n/a.
Also display the cluster name as n/a for workers (instead of an empty string), as that information is not available to them.
Closessiderolabs/talos#7103.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Show the config URL template that will be populated when the code is entered. Closessiderolabs/talos#7092.
Clear the form when the tab is exited & do not display "Saved successfully" message when the code is saved, as we navigate to the summary tab afterward anyway. Closessiderolabs/talos#7093.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Updated documentation, what's new, etc.
Also fix some minor UI issues in the dashboard.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Prevent dashboard from crashing when a dead/non-existent node is specified on `talosctl --nodes`.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Use GRUB quoting function to the kernel args passed to Talos.
This fixes passing `${variable}` to `talos.config=` kernel argument.
Also fix a problem with `ONBUILD` being exected for `imager` image.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
The problem was that 'Succeeded' pod was treated as 'not ready', so that
`MachineStatus` never reached readiness state.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes#7041
Rework the DHCP flow so that we don't use `INFORM` requests anymore. The
idea is to try requesting a hostname from the DHCP server first, and if
the hostname is not send, or it gets overridden in Talos, restart the
DHCP sequence sending the hostname to the DHCP server.
This still avoids sending and requesting a hostname in one request.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes: https://github.com/siderolabs/talos/issues/7017
Should allow external services to detect which user block devices might
need to be wiped during reset.
Signed-off-by: Artem Chernyshev <artem.chernyshev@talos-systems.com>
Network probes are configured with the specs, and provide their output
as a status.
At the moment only platform code can configure network probes.
If any network probes are configured, they affect network.Status
'Connectivity' flag.
Example, create the probe:
```
talosctl -n 172.20.0.3 meta write 0xa '{"probes": [{"interval": "1s", "tcp": {"endpoint": "google.com:80", "timeout": "10s"}}]}'
```
Watch probe status:
```
$ talosctl -n 172.20.0.3 get probe
NODE NAMESPACE TYPE ID VERSION SUCCESS
172.20.0.3 network ProbeStatus tcp:google.com:80 5 true
```
With failing probes:
```
$ talosctl -n 172.20.0.3 get probe
NODE NAMESPACE TYPE ID VERSION SUCCESS
172.20.0.3 network ProbeStatus tcp:google.com:80 4 true
172.20.0.3 network ProbeStatus tcp:google.com:81 1 false
$ talosctl -n 172.20.0.3 get networkstatus
NODE NAMESPACE TYPE ID VERSION ADDRESS CONNECTIVITY HOSTNAME ETC
172.20.0.3 network NetworkStatus status 5 true true true true
```
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Implement a screen for entering/managing the config `${code}` variable.
Enable this screen only when the platform is `metal` and there is a `${code}` variable in the `talos.config` kernel cmdline URL query.
Additionally, remove the "Delete" button and its functionality from the network config screen to avoid users accidentally deleting PlatformNetworkConfig parts that are not managed by the dashboard.
Add some tests for the form data parsing on the network config screen.
Remove the unnecessary lock on the summary tab - all updates come from the same goroutine.
Closessiderolabs/talos#6993.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Restart udevd on adding custom rules where in the case the subsystems
needs to be re-triggered.
Fixes: #7001
Signed-off-by: Noel Georgi <git@frezbo.dev>
talosctl netstat -k show all host and non-hostnetwork pods sockets/connections.
talosctl netstat namespace/pod shows sockets/connections of a specific pod +
autocompletes in the shell.
Signed-off-by: Nico Berlee <nico.berlee@on2it.net>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
* Clear the input form and switch to summary tab after the network config is saved.
* Use nodeaddress resource for detecting and displaying IPs. Improve the IP filtering logic.
* Fix the logic of gateway detection. Display all gateways instead of a single one.
* Use hostnamestatus resource to detect the hostname instead of an API call.
* Add hostname entry to the network info section on summary tab (as `HOST`).
* Enable `OUTBOUND` entry in network info section on summary tab.
* Display only the physical network interfaces in the interface dropdown on network config tab.
* Improve form input handling.
* Additional minor fixes & improvements.
Closessiderolabs/talos#6992.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
This adds support for automatically registering node hostnames in DNS by
sending the current hostname to DHCP via option 12. If the current hostname is
updated, issue a new DISCOVER to propagate the update to DHCP (updating the
hostname on lease renewals is not universally supported by DHCP servers). This
addition maintains the previous functionality where the node can also request
its hostname from the DHCP server. The received hostname will be processed and
prioritized as usual by the `network.HostnameSpecController`.
This change set also contains fixes to make DHCP renewals compliant with RFC
2131, specifically avoiding sending the server identifier and requested IP
address when issuing renewals using a previous offer. This also uncovered
issues and missing features in the upstream `insomniacslk/dhcp` library, the
fixes and improvements for which are now finally merged.
Sending hostname updates have been tested against `dnsmasq` and the built-in
DHCP + DNS services in Windows Server. Hostname retrieval from DHCP and edge
cases with overridden hostnames from different configuration layers have been
extensively tested against `dnsmasq`.
Signed-off-by: Dennis Marttinen <twelho@welho.tech>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Unify getting environment variables, support passing environment
variables via kernel args.
Fixes#6984
See #6999
For META this will be used to pass environment variables to the
installer for ISO images (or PXE booting).
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
New variable value is coming from `META`, and it might be set using the
interactive console (not implemented yet, but it will come soon).
I had to refactor the URL expansion implementation:
* simplify things where possible
* provide more unit-tests for smaller units
* handle expansion of all variables in parallel
* allow parallel expansion on multiple variables
Also I refactored download code to support proper passing of endpoint
function with context.
The end result:
* Talos will try to download config for 3 hours before rebooting
* Each attempt which includes URL expansion + download is limited to 3
minutes
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Implement the network config screen with input forms to configure the initial node networking by writing a config to the META partition.
Closessiderolabs/talos#6961.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
The problem was that `GracefulStop()` will hang forever if there is a
running API call. So if there is a running streaming call, the
maintenance service might hang until it is finished.
The problem shows up with 'Upgrade' API in the maintenance mode if there
is a concurrent streaming API call, e.g.:
1. Watch API is running against maintenance mode.
2. Upgrade API is issued, it tries to run the MaintenanceUpgrade
sequence, which tries to take over the Initialize sequence. The
Initialize sequence is canceled, maintenance API service context is
canceled, but the service doesn't terminate, as it's stuck in
`GracefulStop`. The sequence take over times out, as even the
sequence is canceled, it hasn't terminated yet.
Sample log:
```
[talos] upgrade request received: "ghcr.io/siderolabs/installer:v1.3.3"
[talos] upgrade failed: failed to acquire lock: timeout
[talos] task loadConfig (1/1): failed: failed to receive config via maintenance service: maintenance service failed: context canceled
[talos] phase config (6/7): failed
[talos] initialize sequence: failed
<stuck here>
```
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Bump golangci-lint and fixup new warnings. Ignore check that checks for
used function parameters, it's kind of noisy and makes it confusing to
read interface implementations.
Signed-off-by: Noel Georgi <git@frezbo.dev>
Discovered in #6971. Go compiler cannot deduce proper type on 32bit architectures for those constants,
in `fmt.Print(f)` functions. Since we only compare them with uint32 variables, it makes sense to add proper
types to them.
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
The problem showed up on 'reset' of the Talos node which had multiple
endpoints for other control plane nodes, many of which weren't actually
available.
When 'grpc.WithBlock()' is used, etcd will try to dial the first
endpoint and return an error if the dial fails.
Use noblock mode by default with multiple endpoints, and blocking mode
with a single endpoint.
Pass the context to etcd to properly abort dial operations if the
context get canceled.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Instead of doing excessive get/list requests, do a watch per node in an infinite retry.
Additionally, refactor the dashboard code to make the various data listener namings more consistent and reorganize the packages.
Closessiderolabs/talos#6960.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
If link has no `Info` field we can't do anything meaningful, so we'll just log and skip.
Also fix race in test.
For #6956
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Fix a data race caused by the metadata field of PlatformNetworkConfig being edited after it was sent to the channel. It caused test failures.
Fix it by setting a copy of the metadata instead.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
A special META key might contain optional platform network config for
the `METAL` platform.
It is completely optional, but if present, it works same way as in the
clouds: it is applied with low priority (can be overridden with machine
config), but provides some initial defaults for the machine.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This allows to put keys to META partition.
META contents can be viewed with `talosctl get metakeys`.
There is not real usecase for it yet, but the next PRs will introduce
two special keys which can be written:
* platform network config for `metal`
* `${code}` variable
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Implement the new summary dashboard with node info and logs.
Replace the previous metrics dashboard with the new dashboard which has multiple screens for node summary, metrics and editing network config.
Port the old metrics dashboard to the tview library and assign it to be a screen in the new dashboard, accessible by F2 key.
Add a new resource, infos.cluster.talos.dev which contains the cluster name and id of a node.
Disable the network config editor screen in the new dashboard until it is fully implemented with its backend.
Closessiderolabs/talos#4790.
Signed-off-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Use a global instance, handle loading/saving META in global context.
Deprecate legacy syslinux ADV, provide an easier interface for
consumers.
Expose META as resources.
Fix the bootloader revert process (it was completely broken for quite a
while :sad:).
This is a first step which mostly does preparation work, real changes
will come in the next PRs:
* add APIs to write to META
* consume META keys for platform network config for `metal`
* custom key for URL `${code}`
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes#6730
`go generate`-based step downloads the upstream manifest, transforms it
to match our requirements, and it is compiled in as the Flannel
manifest.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Launch CoreDNS even if the node is not initialized.
Network is ready already, but CCM didn't finish their job.
Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Fixes#6817
The original problem wasn't reproducible with `main`, but there was a
set of bugs in the shutdown sequence which was preventing it from
completing successfully, as in the maintenance mode nothing is running
and initialized yet.
Most of the bugs were `nil` pointer dereferences.
Fixed a small issue with final 'RebootError' printed as a failure in the
ACPI shutdown path.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This PR adds first 12 symbols from container ID and adds them to `talosctl -k containers` each container output.
That way we can ensure that we get the logs from proper container even if there is a newer one.
Closes#6886
Co-authored-by: Utku Ozdemir <utku.ozdemir@siderolabs.com>
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>
Use new version of go-kubernetes, and move the `kube-proxy` DaemonSet
update to follow common logic of bootstrap manifests update.
This fixes a confusing behavior when after `k8s-upgrade` the version of
`kube-proxy` is not updated in the machine config.
See https://github.com/siderolabs/go-kubernetes/pull/3
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
Exposes the pod IP as the `POD_IP` environment variable via the downward
API in the kube-proxy pod for use in e.g. metrics-bind-addr.
Signed-off-by: Tim Jones <tim.jones@siderolabs.com>
This introduces a new role for Talos API which fills the gap between
`os:reader` and `os:admin` roles.
Fixes#6898
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
When removing a member from `etcd`, the server does a pre-check to make
sure the member is connected to a quorum of other members, and the
remove request might fail. Add a retry to wait for the etcd to be fully
connected before giving up, as some parts of the reset flow alrady ran.
Also fix an issue which appears in the integration test, when `reset` is
called early in the boot sequence when local etcd hasn't started fully yet.
Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
This PR ensures that we can handle third grub option - "wipe". We will use it in 1.4.
For #6842
Signed-off-by: Dmitriy Matrenichev <dmitry.matrenichev@siderolabs.com>