3.0 migration guide (#15099)
* docs: 2 to 3 migration guide
  Signed-off-by: Jan Fajerski <jfajersk@redhat.com>
* docs/stability: add 3.0 section
  Signed-off-by: Jan Fajerski <jfajersk@redhat.com>
* docs/migration: details on enabling legacy name validation
  Signed-off-by: Owen Williams <owen.williams@grafana.com>
* migration: add log format and `le` normalization
  Signed-off-by: Jan Fajerski <jfajersk@redhat.com>
* migration: add new enable_http2 default for remote write
  Signed-off-by: Jan Fajerski <jfajersk@redhat.com>
---------
Signed-off-by: Jan Fajerski <jfajersk@redhat.com>
Signed-off-by: Owen Williams <owen.williams@grafana.com>
Co-authored-by: Owen Williams <owen.williams@grafana.com>
parent 24a10528ac
commit f131cdd4c5
@ -3,198 +3,198 @@ title: Migration
|
||||
sort_rank: 10
|
||||
---
|
||||
|
||||
# Prometheus 2.0 migration guide
|
||||
# Prometheus 3.0 migration guide
|
||||
|
||||
In line with our [stability promise](https://prometheus.io/blog/2016/07/18/prometheus-1-0-released/#fine-print),
|
||||
the Prometheus 2.0 release contains a number of backwards incompatible changes.
|
||||
This document offers guidance on migrating from Prometheus 1.8 to Prometheus 2.0 and newer versions.
|
||||
In line with our [stability promise](https://prometheus.io/docs/prometheus/latest/stability/),
|
||||
the Prometheus 3.0 release contains a number of backwards incompatible changes.
|
||||
This document offers guidance on migrating from Prometheus 2.x to Prometheus 3.0 and newer versions.
|
||||
|
||||
## Flags
|
||||
|
||||
The format of Prometheus command line flags has changed. Instead of a
|
||||
single dash, all flags now use a double dash. Common flags (`--config.file`,
|
||||
`--web.listen-address` and `--web.external-url`) remain but
|
||||
almost all storage-related flags have been removed.
|
||||
- The following feature flags have been removed and they have been added to the
|
||||
default behavior of Prometheus v3:
|
||||
- `promql-at-modifier`
|
||||
- `promql-negative-offset`
|
||||
- `remote-write-receiver`
|
||||
- `new-service-discovery-manager`
|
||||
- `expand-external-labels`
|
||||
Environment variable references `${var}` or `$var` in external label values
|
||||
are replaced according to the values of the current environment variables.
|
||||
References to undefined variables are replaced by the empty string.
|
||||
The `$` character can be escaped by using `$$`.
|
||||
- `no-default-scrape-port`
|
||||
Prometheus v3 will no longer add ports to scrape targets according to the
|
||||
specified scheme. Targets will now appear in labels as configured.
|
||||
If you rely on scrape targets like
|
||||
`https://example.com/metrics` or `http://example.com/metrics` to be
|
||||
represented as `https://example.com/metrics:443` and
|
||||
`http://example.com/metrics:80` respectively, add them to your target URLs.
|
||||
- `agent`
|
||||
Instead, use the dedicated `--agent` CLI flag.
|
||||
|
||||
Some notable flags which have been removed:
|
||||
Prometheus v3 will log a warning if you continue to pass these to
|
||||
`--enable-feature`.
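
As a rough illustration of two of the now-default behaviors above, the following configuration sketch shows environment variable expansion in external labels and an explicitly configured scrape port. The variable name `CLUSTER`, the label names, the job name and the target are placeholders chosen for this example, not values from Prometheus itself.

```yaml
global:
  external_labels:
    # Replaced with the value of the CLUSTER environment variable at load time
    # (CLUSTER is a hypothetical variable name).
    cluster: "${CLUSTER}"
    # "$$" escapes the dollar sign, so this label value is the literal "$tier".
    note: "$$tier"

scrape_configs:
  - job_name: example  # hypothetical job name
    scheme: https
    static_configs:
      # The port is written out explicitly, because v3 no longer adds a
      # default port based on the scheme.
      - targets: ["example.com:443"]
```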
|
||||
|
||||
- `-alertmanager.url` In Prometheus 2.0, the command line flags for configuring
|
||||
a static Alertmanager URL have been removed. Alertmanager must now be
|
||||
discovered via service discovery, see [Alertmanager service discovery](#alertmanager-service-discovery).
|
||||
## Configuration
|
||||
|
||||
- `-log.format` In Prometheus 2.0 logs can only be streamed to standard error.
|
||||
|
||||
- `-query.staleness-delta` has been renamed to `--query.lookback-delta`; Prometheus
|
||||
2.0 introduces a new mechanism for handling staleness, see [staleness](querying/basics.md#staleness).
|
||||
|
||||
- `-storage.local.*` Prometheus 2.0 introduces a new storage engine; as such all
|
||||
flags relating to the old engine have been removed. For information on the
|
||||
new engine, see [Storage](#storage).
|
||||
|
||||
- `-storage.remote.*` Prometheus 2.0 has removed the deprecated remote
|
||||
storage flags, and will fail to start if they are supplied. To write to
|
||||
InfluxDB, Graphite, or OpenTSDB use the relevant storage adapter.
|
||||
|
||||
## Alertmanager service discovery
|
||||
|
||||
Alertmanager service discovery was introduced in Prometheus 1.4, allowing Prometheus
|
||||
to dynamically discover Alertmanager replicas using the same mechanism as scrape
|
||||
targets. In Prometheus 2.0, the command line flags for static Alertmanager config
|
||||
have been removed, so the following command line flag:
|
||||
|
||||
```
|
||||
./prometheus -alertmanager.url=http://alertmanager:9093/
|
||||
```
|
||||
|
||||
Would be replaced with the following in the `prometheus.yml` config file:
|
||||
|
||||
```yaml
|
||||
alerting:
|
||||
alertmanagers:
|
||||
- static_configs:
|
||||
- targets:
|
||||
- alertmanager:9093
|
||||
```
|
||||
|
||||
You can also use all the usual Prometheus service discovery integrations and
|
||||
relabeling in your Alertmanager configuration. This snippet instructs
|
||||
Prometheus to search for Kubernetes pods, in the `default` namespace, with the
|
||||
label `name: alertmanager` and with a non-empty port.
|
||||
|
||||
```yaml
|
||||
alerting:
|
||||
alertmanagers:
|
||||
- kubernetes_sd_configs:
|
||||
- role: pod
|
||||
tls_config:
|
||||
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
|
||||
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
|
||||
relabel_configs:
|
||||
- source_labels: [__meta_kubernetes_pod_label_name]
|
||||
regex: alertmanager
|
||||
action: keep
|
||||
- source_labels: [__meta_kubernetes_namespace]
|
||||
regex: default
|
||||
action: keep
|
||||
- source_labels: [__meta_kubernetes_pod_container_port_number]
|
||||
regex:
|
||||
action: drop
|
||||
```
|
||||
|
||||
## Recording rules and alerts
|
||||
|
||||
The format for configuring alerting and recording rules has been changed to YAML.
|
||||
An example of a recording rule and alert in the old format:
|
||||
|
||||
```
|
||||
job:request_duration_seconds:histogram_quantile99 =
|
||||
histogram_quantile(0.99, sum by (le, job) (rate(request_duration_seconds_bucket[1m])))
|
||||
|
||||
ALERT FrontendRequestLatency
|
||||
IF job:request_duration_seconds:histogram_quantile99{job="frontend"} > 0.1
|
||||
FOR 5m
|
||||
ANNOTATIONS {
|
||||
summary = "High frontend request latency",
|
||||
}
|
||||
```
|
||||
|
||||
Would look like this:
|
||||
|
||||
```yaml
|
||||
groups:
|
||||
- name: example.rules
|
||||
rules:
|
||||
- record: job:request_duration_seconds:histogram_quantile99
|
||||
expr: histogram_quantile(0.99, sum by (le, job) (rate(request_duration_seconds_bucket[1m])))
|
||||
- alert: FrontendRequestLatency
|
||||
expr: job:request_duration_seconds:histogram_quantile99{job="frontend"} > 0.1
|
||||
for: 5m
|
||||
annotations:
|
||||
summary: High frontend request latency
|
||||
```
|
||||
|
||||
To help with the change, the `promtool` tool has a mode to automate the rules conversion. Given a `.rules` file, it will output a `.rules.yml` file in the
|
||||
new format. For example:
|
||||
|
||||
```
|
||||
$ promtool update rules example.rules
|
||||
```
|
||||
|
||||
You will need to use `promtool` from [Prometheus 2.5](https://github.com/prometheus/prometheus/releases/tag/v2.5.0) as later versions no longer contain the above subcommand.
|
||||
|
||||
## Storage
|
||||
|
||||
The data format in Prometheus 2.0 has completely changed and is not backwards
|
||||
compatible with 1.8 and older versions. To retain access to your historic monitoring data we
|
||||
recommend you run a non-scraping Prometheus instance running at least version
|
||||
1.8.1 in parallel with your Prometheus 2.0 instance, and have the new server
|
||||
read existing data from the old one via the remote read protocol.
|
||||
|
||||
Your Prometheus 1.8 instance should be started with the following flags and an
|
||||
config file containing only the `external_labels` setting (if any):
|
||||
|
||||
```
|
||||
$ ./prometheus-1.8.1.linux-amd64/prometheus -web.listen-address ":9094" -config.file old.yml
|
||||
```
|
||||
|
||||
Prometheus 2.0 can then be started (on the same machine) with the following flags:
|
||||
|
||||
```
|
||||
$ ./prometheus-2.0.0.linux-amd64/prometheus --config.file prometheus.yml
|
||||
```
|
||||
|
||||
Where `prometheus.yml` contains in addition to your full existing configuration, the stanza:
|
||||
|
||||
```yaml
|
||||
remote_read:
|
||||
- url: "http://localhost:9094/api/v1/read"
|
||||
```
|
||||
- The scrape job level configuration option `scrape_classic_histograms` has been
|
||||
renamed to `always_scrape_classic_histograms`. If you use the
|
||||
`--enable-feature=native-histograms` feature flag to ingest native histograms
|
||||
and you also want to ingest classic histograms that an endpoint might expose
|
||||
along with native histograms, be sure to add this configuration or change your
|
||||
configuration from the old name.
|
||||
- The default of `http_config.enable_http2` in `remote_write` items has been
|
||||
changed to `false`. In Prometheus v2 the remote write http client would
|
||||
default to use http2. In order to parallelize multiple remote write queues
|
||||
across multiple sockets, it is preferable not to default to http2.
|
||||
If you prefer to use http2 for remote write you must now set
|
||||
`http_config.enable_http2: true` in your `remote_write` configuration section.
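
A minimal sketch of both configuration changes follows. It assumes that the HTTP client options, including `enable_http2`, are written directly on the `remote_write` item, which is how the Prometheus configuration reference lists them; the job name, target and remote write URL are placeholders.

```yaml
scrape_configs:
  - job_name: example  # hypothetical job name
    # Renamed from scrape_classic_histograms in Prometheus v2.
    always_scrape_classic_histograms: true
    static_configs:
      - targets: ["example.com:9090"]

remote_write:
  - url: "https://remote-storage.example.com/api/v1/write"  # placeholder URL
    # Opt back in to HTTP/2, since the default is now false.
    enable_http2: true
```

Validating the resulting file with `promtool check config` before rolling it out is a reasonable precaution.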
|
||||
|
||||
## PromQL
|
||||
|
||||
The following features have been removed from PromQL:
|
||||
- The `.` pattern in regular expressions in PromQL matches newline characters.
|
||||
With this change, a regular expression like `.*` matches strings that include
|
||||
`\n`. This applies to matchers in queries and relabel configs. For example the
|
||||
following regular expressions now match the accompanying strings, whereas in
|
||||
Prometheus v2 these combinations didn't match.
|
||||
|
||||
- `drop_common_labels` function - the `without` aggregation modifier should be used
|
||||
instead.
|
||||
- `keep_common` aggregation modifier - the `by` modifier should be used instead.
|
||||
- `count_scalar` function - use cases are better handled by `absent()` or correct
|
||||
propagation of labels in operations.
|
||||
| Regex | Additional matches |
|
||||
| ----- | ------ |
|
||||
| ".*" | "foo\n", "Foo\nBar" |
|
||||
| "foo.?bar" | "foo\nbar" |
|
||||
| "foo.+bar" | "foo\nbar" |
|
||||
|
||||
See [issue #3060](https://github.com/prometheus/prometheus/issues/3060) for more
|
||||
details.
|
||||
If you want Prometheus v3 to behave like v2 did, you will have to change your
|
||||
regular expressions by replacing all `.` patterns with `[^\n]`, e.g.
|
||||
`foo[^\n]*`.
|
||||
- Lookback and range selectors are left open and right closed (previously left
|
||||
closed and right closed). This change affects queries when the evaluation time
|
||||
perfectly aligns with the sample timestamps. For example assume querying a
|
||||
timeseries with evenly spaced samples exactly 1 minute apart. Before Prometheus
|
||||
3.x, a range query with `5m` would usually return 5 samples. But if the query
|
||||
evaluation aligns perfectly with a scrape, it would return 6 samples. In
|
||||
Prometheus 3.x queries like this will always return 5 samples.
|
||||
This change will likely have few effects in everyday use, except for some subquery
|
||||
use cases.
|
||||
Query front-ends that align queries usually align subqueries to multiples of
|
||||
the step size. These subqueries will likely be affected.
|
||||
Tests are more likely to be affected. To fix those, either adjust the expected
|
||||
number of samples or extend the range by less than one sample interval.
|
||||
- The `holt_winters` function has been renamed to `double_exponential_smoothing`
|
||||
and is now guarded by the `promql-experimental-functions` feature flag.
|
||||
If you want to keep using `holt_winters`, you have to do both of these things:
|
||||
- Rename `holt_winters` to `double_exponential_smoothing` in your queries.
|
||||
- Pass `--enable-feature=promql-experimental-functions` in your Prometheus
|
||||
CLI invocation.
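
The PromQL changes above could look as follows in a rule file. This is only a sketch: the group name, record names and the metrics `queue_length` and `events_total` are hypothetical, and the smoothing factors are arbitrary. The first rule only loads if Prometheus is started with `--enable-feature=promql-experimental-functions`.

```yaml
groups:
  - name: v3-migration-examples  # hypothetical group name
    rules:
      # Previously written as holt_winters(queue_length[10m], 0.3, 0.3).
      - record: job:queue_length:smoothed
        expr: double_exponential_smoothing(queue_length[10m], 0.3, 0.3)
      # Keeps the v2 matching behavior of "foo.*" by excluding newlines
      # from the wildcard explicitly.
      - record: job:events:foo_single_line
        expr: 'count by (job) (events_total{message=~"foo[^\n]*"})'
```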
|
||||
|
||||
## Scrape protocols
|
||||
Prometheus v3 is more strict concerning the Content-Type header received when
|
||||
scraping. Prometheus v2 would default to the standard Prometheus text protocol
|
||||
if the target being scraped did not specify a Content-Type header or if the
|
||||
header was unparsable or unrecognised. This could lead to incorrect data being
|
||||
parsed in the scrape. Prometheus v3 will now fail the scrape in such cases.
|
||||
|
||||
If a scrape target is not providing the correct Content-Type header the
|
||||
fallback protocol can be specified using the `fallback_scrape_protocol`
|
||||
parameter. See the [Prometheus scrape_config documentation](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config).
|
||||
|
||||
This is a breaking change as scrapes that may have succeeded with Prometheus v2
|
||||
may now fail if this fallback protocol is not specified.
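
For a target that omits or mangles its Content-Type header, a scrape configuration along these lines may restore the v2 behavior for that job. The job name and target are placeholders, and `PrometheusText0.0.4` is chosen here on the assumption that the target actually serves the classic Prometheus text format.

```yaml
scrape_configs:
  - job_name: legacy-exporter  # hypothetical job name
    # Protocol to assume when the Content-Type header is missing,
    # unparsable or unrecognised.
    fallback_scrape_protocol: PrometheusText0.0.4
    static_configs:
      - targets: ["legacy-exporter.example.com:9100"]
```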
|
||||
|
||||
## Miscellaneous
|
||||
|
||||
### Prometheus non-root user
|
||||
### TSDB format and downgrade
|
||||
The TSDB format has been changed in Prometheus v2.55 in preparation for changes
|
||||
to the index format. Consequently, a Prometheus v3 TSDB can only be read by a
|
||||
Prometheus v2.55 or newer.
|
||||
Before upgrading to Prometheus v3 please upgrade to v2.55 first and confirm
|
||||
Prometheus works as expected. Only then continue with the upgrade to v3.
|
||||
|
||||
The Prometheus Docker image is now built to [run Prometheus
|
||||
as a non-root user](https://github.com/prometheus/prometheus/pull/2859). If you
|
||||
want the Prometheus UI/API to listen on a low port number (say, port 80), you'll
|
||||
need to override it. For Kubernetes, you would use the following YAML:
|
||||
### TSDB Storage contract
|
||||
TSDB compatible storage is now expected to return results matching the specified
|
||||
selectors. This might impact some third party implementations, most likely
|
||||
implementing `remote_read`.
|
||||
This contract is not explicitly enforced, but can cause undefined behavior.
|
||||
|
||||
### UTF-8 names
|
||||
Prometheus v3 supports UTF-8 in metric and label names. This means metric and
|
||||
label names can change after upgrading according to what is exposed by
|
||||
endpoints. Furthermore, metric and label names that would have previously been
|
||||
flagged as invalid no longer will be.
|
||||
|
||||
Users wishing to preserve the original validation behavior can update their
|
||||
Prometheus YAML configuration to specify the legacy validation scheme:
|
||||
|
||||
```yaml
|
||||
global:
|
||||
metric_name_validation_scheme: legacy
|
||||
```
|
||||
|
||||
Or on a per-scrape basis:
|
||||
|
||||
```yaml
|
||||
scrape_configs:
|
||||
- job_name: job1
|
||||
metric_name_validation_scheme: utf8
|
||||
- job_name: job2
|
||||
metric_name_validation_scheme: legacy
|
||||
```
|
||||
|
||||
### Log message format
|
||||
Prometheus v3 has adopted `log/slog` over the previous `go-kit/log`. This
|
||||
results in a change of log message format. An example of the old log format is:
|
||||
```
|
||||
ts=2024-10-23T22:01:06.074Z caller=main.go:627 level=info msg="No time or size retention was set so using the default time retention" duration=15d
|
||||
ts=2024-10-23T22:01:06.074Z caller=main.go:671 level=info msg="Starting Prometheus Server" mode=server version="(version=, branch=, revision=91d80252c3e528728b0f88d254dd720f6be07cb8-modified)"
|
||||
ts=2024-10-23T22:01:06.074Z caller=main.go:676 level=info build_context="(go=go1.23.0, platform=linux/amd64, user=, date=, tags=unknown)"
|
||||
ts=2024-10-23T22:01:06.074Z caller=main.go:677 level=info host_details="(Linux 5.15.0-124-generic #134-Ubuntu SMP Fri Sep 27 20:20:17 UTC 2024 x86_64 gigafips (none))"
|
||||
```
|
||||
|
||||
A similar sequence in the new log format looks like this:
|
||||
```
|
||||
time=2024-10-24T00:03:07.542+02:00 level=INFO source=/home/user/go/src/github.com/prometheus/prometheus/cmd/prometheus/main.go:640 msg="No time or size retention was set so using the default time retention" duration=15d
|
||||
time=2024-10-24T00:03:07.542+02:00 level=INFO source=/home/user/go/src/github.com/prometheus/prometheus/cmd/prometheus/main.go:681 msg="Starting Prometheus Server" mode=server version="(version=, branch=, revision=7c7116fea8343795cae6da42960cacd0207a2af8)"
|
||||
time=2024-10-24T00:03:07.542+02:00 level=INFO source=/home/user/go/src/github.com/prometheus/prometheus/cmd/prometheus/main.go:686 msg="operational information" build_context="(go=go1.23.0, platform=linux/amd64, user=, date=, tags=unknown)" host_details="(Linux 5.15.0-124-generic #134-Ubuntu SMP Fri Sep 27 20:20:17 UTC 2024 x86_64 gigafips (none))" fd_limits="(soft=1048576, hard=1048576)" vm_limits="(soft=unlimited, hard=unlimited)"
|
||||
```
|
||||
|
||||
### `le` and `quantile` label values
|
||||
In Prometheus v3, the values of the `le` label of classic histograms and the
|
||||
`quantile` label of summaries are normalized upon ingestion. In Prometheus v2
|
||||
the value of these labels depended on the scrape protocol (protobuf vs text
|
||||
format) in some situations. This led to label values changing based on the
|
||||
scrape protocol. E.g. a metric exposed as `my_classic_hist{le="1"}` would be
|
||||
ingested as `my_classic_hist{le="1"}` via the text format, but as
|
||||
`my_classic_hist{le="1.0"}` via protobuf. This changed the identity of the
|
||||
metric and caused problems when querying the metric.
|
||||
In Prometheus v3 these label values will always be normalized to a float-like
|
||||
representation. I.e. the above example will always result in
|
||||
`my_classic_hist{le="1.0"}` being ingested into Prometheus, no matter via which
|
||||
protocol. The effect of this change is that alerts, recording rules and
|
||||
dashboards that directly reference label values as whole numbers such as
|
||||
`le="1"` will stop working.
|
||||
|
||||
Ways to deal with this change either globally or on a per metric basis:
|
||||
|
||||
- Fix references to integer `le` and `quantile` label values, but otherwise do
|
||||
nothing and accept that some queries that span the transition time will produce
|
||||
inaccurate or unexpected results.
|
||||
_This is the recommended solution._
|
||||
- Use `metric_relabel_configs` to retain the old labels when scraping targets.
|
||||
This should **only** be applied to metrics that currently produce such labels.
|
||||
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
kind: Pod
|
||||
metadata:
|
||||
name: security-context-demo-2
|
||||
spec:
|
||||
securityContext:
|
||||
runAsUser: 0
|
||||
...
|
||||
metric_relabel_configs:
|
||||
- source_labels:
|
||||
- quantile
|
||||
target_label: quantile
|
||||
regex: (\d+)\.0+
|
||||
- source_labels:
|
||||
- le
|
||||
- __name__
|
||||
target_label: le
|
||||
regex: (\d+)\.0+;.*_bucket
|
||||
```
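
For the first option, fixing references can be as simple as switching to the normalized value. Where a query also has to cover data ingested before the upgrade, a regex matcher that accepts both spellings is one possible stopgap. The rule file below is an illustrative sketch only: the group and record names are hypothetical and `my_classic_hist_bucket` follows the example metric used above.

```yaml
groups:
  - name: le-normalization-examples  # hypothetical group name
    rules:
      # Reference the normalized value that v3 ingests.
      - record: job:my_classic_hist_bucket:rate5m_le1
        expr: 'sum by (job) (rate(my_classic_hist_bucket{le="1.0"}[5m]))'
      # Matches both le="1" (written by v2) and le="1.0" (written by v3).
      - record: job:my_classic_hist_bucket:rate5m_le1_both
        expr: 'sum by (job) (rate(my_classic_hist_bucket{le=~"1(\\.0)?"}[5m]))'
```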
|
||||
|
||||
See [Configure a Security Context for a Pod or Container](https://kubernetes.io/docs/tasks/configure-pod-container/security-context/)
|
||||
for more details.
|
||||
# Prometheus 2.0 migration guide
|
||||
|
||||
If you're using Docker, then the following snippet would be used:
|
||||
|
||||
```
|
||||
docker run -p 9090:9090 prom/prometheus:latest
|
||||
```
|
||||
|
||||
### Prometheus lifecycle
|
||||
|
||||
If you use the Prometheus `/-/reload` HTTP endpoint to [automatically reload your
|
||||
Prometheus config when it changes](configuration/configuration.md),
|
||||
these endpoints are disabled by default for security reasons in Prometheus 2.0.
|
||||
To enable them, set the `--web.enable-lifecycle` flag.
|
||||
For the Prometheus 1.8 to 2.0 migration, please refer to the [Prometheus v2.55 documentation](https://prometheus.io/docs/prometheus/2.55/migration/).
|
||||
|
@ -9,7 +9,7 @@ Prometheus promises API stability within a major version, and strives to avoid
|
||||
breaking changes for key features. Some features, which are cosmetic, still
|
||||
under development, or depend on 3rd party services, are not covered by this.
|
||||
|
||||
Things considered stable for 2.x:
|
||||
Things considered stable for 3.x:
|
||||
|
||||
* The query language and data model
|
||||
* Alerting and recording rules
|
||||
@ -18,21 +18,25 @@ Things considered stable for 2.x:
|
||||
* Configuration file format (minus the service discovery remote read/write, see below)
|
||||
* Rule/alert file format
|
||||
* Console template syntax and semantics
|
||||
* Remote write sending, per the [1.0 specification](https://prometheus.io/docs/concepts/remote_write_spec/).
|
||||
* Remote write sending, per the [1.0 specification](https://prometheus.io/docs/concepts/remote_write_spec/) and receiving
|
||||
* Agent mode
|
||||
* OTLP receiver endpoint
|
||||
|
||||
Things considered unstable for 2.x:
|
||||
Things considered unstable for 3.x:
|
||||
|
||||
* Any feature listed as experimental or subject to change, including:
|
||||
* The [`holt_winters` PromQL function](https://github.com/prometheus/prometheus/issues/2458)
|
||||
* Remote write receiving, remote read and the remote read endpoint
|
||||
* The [`double_exponential_smoothing` PromQL function](https://github.com/prometheus/prometheus/issues/2458)
|
||||
* Remote read and the remote read endpoint
|
||||
* Server-side HTTPS and basic authentication
|
||||
* Service discovery integrations, with the exception of `static_configs` and `file_sd_configs`
|
||||
* Service discovery integrations, with the exception of `static_configs`, `file_sd_configs` and `http_sd_configs`
|
||||
* Go APIs of packages that are part of the server
|
||||
* HTML generated by the web UI
|
||||
* The metrics in the /metrics endpoint of Prometheus itself
|
||||
* Exact on-disk format. Potential changes however, will be forward compatible and transparently handled by Prometheus
|
||||
* The format of the logs
|
||||
|
||||
Prometheus 2.x stability guarantees can be found [in the 2.x documentation](https://prometheus.io/docs/prometheus/2.55/stability/).
|
||||
|
||||
As long as you are not using any features marked as experimental/unstable, an
|
||||
upgrade within a major version can usually be performed without any operational
|
||||
adjustments and very little risk that anything will break. Any breaking changes
|
||||
|