3.0 migration guide (#15099)

* docs: 2 to 3 migration guide

Signed-off-by: Jan Fajerski <jfajersk@redhat.com>

* docs/stability: add 3.0 section

Signed-off-by: Jan Fajerski <jfajersk@redhat.com>

* docs/migration: details on enabling legacy name validation

Signed-off-by: Owen Williams <owen.williams@grafana.com>

* migration: add log format and `le` normalization

Signed-off-by: Jan Fajerski <jfajersk@redhat.com>

* migration: add new enable_http2 default for remote write

Signed-off-by: Jan Fajerski <jfajersk@redhat.com>

---------

Signed-off-by: Jan Fajerski <jfajersk@redhat.com>
Signed-off-by: Owen Williams <owen.williams@grafana.com>
Co-authored-by: Owen Williams <owen.williams@grafana.com>
Jan Fajerski 2024-10-25 12:30:13 +02:00 committed by GitHub
parent 24a10528ac
commit f131cdd4c5
GPG Key ID: B5690EEEBB952194
2 changed files with 184 additions and 180 deletions


@@ -3,198 +3,198 @@ title: Migration
 sort_rank: 10
 ---
-# Prometheus 2.0 migration guide
+# Prometheus 3.0 migration guide
-In line with our [stability promise](https://prometheus.io/blog/2016/07/18/prometheus-1-0-released/#fine-print),
-the Prometheus 2.0 release contains a number of backwards incompatible changes.
-This document offers guidance on migrating from Prometheus 1.8 to Prometheus 2.0 and newer versions.
+In line with our [stability promise](https://prometheus.io/docs/prometheus/latest/stability/),
+the Prometheus 3.0 release contains a number of backwards incompatible changes.
+This document offers guidance on migrating from Prometheus 2.x to Prometheus 3.0 and newer versions.
 ## Flags
-The format of Prometheus command line flags has changed. Instead of a
-single dash, all flags now use a double dash. Common flags (`--config.file`,
-`--web.listen-address` and `--web.external-url`) remain but
-almost all storage-related flags have been removed.
+- The following feature flags have been removed; their behavior is now the
+  default in Prometheus v3:
+  - `promql-at-modifier`
+  - `promql-negative-offset`
+  - `remote-write-receiver`
+  - `new-service-discovery-manager`
+  - `expand-external-labels`
+    Environment variable references `${var}` or `$var` in external label values
+    are replaced according to the values of the current environment variables.
+    References to undefined variables are replaced by the empty string.
+    The `$` character can be escaped by using `$$`.
+  - `no-default-scrape-port`
+    Prometheus v3 will no longer add ports to scrape targets according to the
+    specified scheme. Targets will now appear in labels as configured.
+    If you rely on scrape targets like
+    `https://example.com/metrics` or `http://example.com/metrics` to be
+    represented as `https://example.com:443/metrics` and
+    `http://example.com:80/metrics` respectively, add the ports to your target URLs.
+  - `agent`
+    Instead, use the dedicated `--agent` CLI flag.
-Some notable flags which have been removed:
+  Prometheus v3 will log a warning if you continue to pass these to
+  `--enable-feature`.
-- `-alertmanager.url` In Prometheus 2.0, the command line flags for configuring
-  a static Alertmanager URL have been removed. Alertmanager must now be
-  discovered via service discovery, see [Alertmanager service discovery](#alertmanager-service-discovery).
+## Configuration
-- `-log.format` In Prometheus 2.0 logs can only be streamed to standard error.
-- `-query.staleness-delta` has been renamed to `--query.lookback-delta`; Prometheus
-  2.0 introduces a new mechanism for handling staleness, see [staleness](querying/basics.md#staleness).
-- `-storage.local.*` Prometheus 2.0 introduces a new storage engine; as such all
-  flags relating to the old engine have been removed. For information on the
-  new engine, see [Storage](#storage).
-- `-storage.remote.*` Prometheus 2.0 has removed the deprecated remote
-  storage flags, and will fail to start if they are supplied. To write to
-  InfluxDB, Graphite, or OpenTSDB use the relevant storage adapter.
-## Alertmanager service discovery
-Alertmanager service discovery was introduced in Prometheus 1.4, allowing Prometheus
-to dynamically discover Alertmanager replicas using the same mechanism as scrape
-targets. In Prometheus 2.0, the command line flags for static Alertmanager config
-have been removed, so the following command line flag:
-```
-./prometheus -alertmanager.url=http://alertmanager:9093/
-```
-Would be replaced with the following in the `prometheus.yml` config file:
-```yaml
-alerting:
-  alertmanagers:
-  - static_configs:
-    - targets:
-      - alertmanager:9093
-```
-You can also use all the usual Prometheus service discovery integrations and
-relabeling in your Alertmanager configuration. This snippet instructs
-Prometheus to search for Kubernetes pods, in the `default` namespace, with the
-label `name: alertmanager` and with a non-empty port.
-```yaml
-alerting:
-  alertmanagers:
-  - kubernetes_sd_configs:
-      - role: pod
-    tls_config:
-      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
-    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
-    relabel_configs:
-    - source_labels: [__meta_kubernetes_pod_label_name]
-      regex: alertmanager
-      action: keep
-    - source_labels: [__meta_kubernetes_namespace]
-      regex: default
-      action: keep
-    - source_labels: [__meta_kubernetes_pod_container_port_number]
-      regex:
-      action: drop
-```
-## Recording rules and alerts
-The format for configuring alerting and recording rules has been changed to YAML.
-An example of a recording rule and alert in the old format:
-```
-job:request_duration_seconds:histogram_quantile99 =
-  histogram_quantile(0.99, sum by (le, job) (rate(request_duration_seconds_bucket[1m])))
-ALERT FrontendRequestLatency
-  IF job:request_duration_seconds:histogram_quantile99{job="frontend"} > 0.1
-  FOR 5m
-  ANNOTATIONS {
-    summary = "High frontend request latency",
-  }
-```
-Would look like this:
-```yaml
-groups:
-- name: example.rules
-  rules:
-  - record: job:request_duration_seconds:histogram_quantile99
-    expr: histogram_quantile(0.99, sum by (le, job) (rate(request_duration_seconds_bucket[1m])))
-  - alert: FrontendRequestLatency
-    expr: job:request_duration_seconds:histogram_quantile99{job="frontend"} > 0.1
-    for: 5m
-    annotations:
-      summary: High frontend request latency
-```
-To help with the change, the `promtool` tool has a mode to automate the rules conversion. Given a `.rules` file, it will output a `.rules.yml` file in the
-new format. For example:
-```
-$ promtool update rules example.rules
-```
-You will need to use `promtool` from [Prometheus 2.5](https://github.com/prometheus/prometheus/releases/tag/v2.5.0) as later versions no longer contain the above subcommand.
-## Storage
-The data format in Prometheus 2.0 has completely changed and is not backwards
-compatible with 1.8 and older versions. To retain access to your historic monitoring data we
-recommend you run a non-scraping Prometheus instance running at least version
-1.8.1 in parallel with your Prometheus 2.0 instance, and have the new server
-read existing data from the old one via the remote read protocol.
-Your Prometheus 1.8 instance should be started with the following flags and an
-config file containing only the `external_labels` setting (if any):
-```
-$ ./prometheus-1.8.1.linux-amd64/prometheus -web.listen-address ":9094" -config.file old.yml
-```
-Prometheus 2.0 can then be started (on the same machine) with the following flags:
-```
-$ ./prometheus-2.0.0.linux-amd64/prometheus --config.file prometheus.yml
-```
-Where `prometheus.yml` contains in addition to your full existing configuration, the stanza:
-```yaml
-remote_read:
-  - url: "http://localhost:9094/api/v1/read"
-```
+- The scrape job level configuration option `scrape_classic_histograms` has been
+  renamed to `always_scrape_classic_histograms`. If you use the
+  `--enable-feature=native-histograms` feature flag to ingest native histograms
+  and you also want to ingest classic histograms that an endpoint might expose
+  along with native histograms, be sure to add this configuration or change your
+  configuration from the old name.
+- The default of `http_config.enable_http2` in `remote_write` items has been
+  changed to `false`. In Prometheus v2 the remote write HTTP client would
+  default to HTTP/2. In order to parallelize multiple remote write queues
+  across multiple sockets, it's preferable not to default to HTTP/2.
+  If you prefer to use HTTP/2 for remote write, you must now set
+  `http_config.enable_http2: true` in your `remote_write` configuration section.
 ## PromQL
-The following features have been removed from PromQL:
+- The `.` pattern in regular expressions in PromQL matches newline characters.
+  With this change, a regular expression like `.*` matches strings that include
+  `\n`. This applies to matchers in queries and relabel configs. For example, the
+  following regular expressions now match the accompanying strings, whereas in
+  Prometheus v2 these combinations didn't match.
-- `drop_common_labels` function - the `without` aggregation modifier should be used
-  instead.
-- `keep_common` aggregation modifier - the `by` modifier should be used instead.
-- `count_scalar` function - use cases are better handled by `absent()` or correct
-  propagation of labels in operations.
+  | Regex | Additional matches |
+  | ----- | ------ |
+  | ".*" | "foo\n", "Foo\nBar" |
+  | "foo.?bar" | "foo\nbar" |
+  | "foo.+bar" | "foo\nbar" |
-See [issue #3060](https://github.com/prometheus/prometheus/issues/3060) for more
-details.
+  If you want Prometheus v3 to behave like v2 did, you will have to change your
+  regular expressions by replacing all `.` patterns with `[^\n]`, e.g.
+  `foo[^\n]*`.
+- Lookback and range selectors are left open and right closed (previously left
+  closed and right closed). This change affects queries when the evaluation time
+  perfectly aligns with the sample timestamps. For example, assume querying a
+  time series with evenly spaced samples exactly 1 minute apart. Before Prometheus
+  3.x, a range query with `5m` would mostly return 5 samples. But if the query
+  evaluation aligned perfectly with a scrape, it would return 6 samples. In
+  Prometheus 3.x, queries like this will always return 5 samples.
+  This change likely has few effects on everyday use, except for some subquery
+  use cases.
+  Query front-ends that align queries usually align subqueries to multiples of
+  the step size. These subqueries will likely be affected.
+  Tests are more likely to be affected. To fix them, either adjust the expected
+  number of samples or extend the range by less than one sample interval.
+- The `holt_winters` function has been renamed to `double_exponential_smoothing`
+  and is now guarded by the `promql-experimental-functions` feature flag.
+  If you want to keep using `holt_winters`, you have to do both of these things:
+  - Rename `holt_winters` to `double_exponential_smoothing` in your queries.
+  - Pass `--enable-feature=promql-experimental-functions` in your Prometheus
+    CLI invocation.
+## Scrape protocols
+Prometheus v3 is stricter concerning the Content-Type header received when
+scraping. Prometheus v2 would default to the standard Prometheus text protocol
+if the target being scraped did not specify a Content-Type header or if the
+header was unparsable or unrecognised. This could lead to incorrect data being
+parsed in the scrape. Prometheus v3 will now fail the scrape in such cases.
+If a scrape target is not providing the correct Content-Type header, the
+fallback protocol can be specified using the `fallback_scrape_protocol`
+parameter. See the [Prometheus scrape_config documentation](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config).
+This is a breaking change, as scrapes that may have succeeded with Prometheus v2
+may now fail if this fallback protocol is not specified.
 ## Miscellaneous
-### Prometheus non-root user
+### TSDB format and downgrade
+The TSDB format has been changed in Prometheus v2.55 in preparation for changes
+to the index format. Consequently, a Prometheus v3 TSDB can only be read by
+Prometheus v2.55 or newer.
+Before upgrading to Prometheus v3, please upgrade to v2.55 first and confirm
+Prometheus works as expected. Only then continue with the upgrade to v3.
-The Prometheus Docker image is now built to [run Prometheus
-as a non-root user](https://github.com/prometheus/prometheus/pull/2859). If you
-want the Prometheus UI/API to listen on a low port number (say, port 80), you'll
-need to override it. For Kubernetes, you would use the following YAML:
+### TSDB Storage contract
+TSDB compatible storage is now expected to return results matching the specified
+selectors. This might impact some third party implementations, most likely
+those implementing `remote_read`.
+This contract is not explicitly enforced, but violating it can cause undefined behavior.
+### UTF-8 names
+Prometheus v3 supports UTF-8 in metric and label names. This means metric and
+label names can change after upgrading according to what is exposed by
+endpoints. Furthermore, metric and label names that would have previously been
+flagged as invalid no longer will be.
+Users wishing to preserve the original validation behavior can update their
+Prometheus YAML configuration to specify the legacy validation scheme:
+```yaml
+global:
+  metric_name_validation_scheme: legacy
+```
+Or on a per-scrape basis:
+```yaml
+scrape_configs:
+  - job_name: job1
+    metric_name_validation_scheme: utf8
+  - job_name: job2
+    metric_name_validation_scheme: legacy
+```
+### Log message format
+Prometheus v3 has adopted `log/slog` over the previous `go-kit/log`. This
+results in a change of log message format. An example of the old log format is:
+```
+ts=2024-10-23T22:01:06.074Z caller=main.go:627 level=info msg="No time or size retention was set so using the default time retention" duration=15d
+ts=2024-10-23T22:01:06.074Z caller=main.go:671 level=info msg="Starting Prometheus Server" mode=server version="(version=, branch=, revision=91d80252c3e528728b0f88d254dd720f6be07cb8-modified)"
+ts=2024-10-23T22:01:06.074Z caller=main.go:676 level=info build_context="(go=go1.23.0, platform=linux/amd64, user=, date=, tags=unknown)"
+ts=2024-10-23T22:01:06.074Z caller=main.go:677 level=info host_details="(Linux 5.15.0-124-generic #134-Ubuntu SMP Fri Sep 27 20:20:17 UTC 2024 x86_64 gigafips (none))"
+```
+A similar sequence in the new log format looks like this:
+```
+time=2024-10-24T00:03:07.542+02:00 level=INFO source=/home/user/go/src/github.com/prometheus/prometheus/cmd/prometheus/main.go:640 msg="No time or size retention was set so using the default time retention" duration=15d
+time=2024-10-24T00:03:07.542+02:00 level=INFO source=/home/user/go/src/github.com/prometheus/prometheus/cmd/prometheus/main.go:681 msg="Starting Prometheus Server" mode=server version="(version=, branch=, revision=7c7116fea8343795cae6da42960cacd0207a2af8)"
+time=2024-10-24T00:03:07.542+02:00 level=INFO source=/home/user/go/src/github.com/prometheus/prometheus/cmd/prometheus/main.go:686 msg="operational information" build_context="(go=go1.23.0, platform=linux/amd64, user=, date=, tags=unknown)" host_details="(Linux 5.15.0-124-generic #134-Ubuntu SMP Fri Sep 27 20:20:17 UTC 2024 x86_64 gigafips (none))" fd_limits="(soft=1048576, hard=1048576)" vm_limits="(soft=unlimited, hard=unlimited)"
+```
+### `le` and `quantile` label values
+In Prometheus v3, the values of the `le` label of classic histograms and the
+`quantile` label of summaries are normalized upon ingestion. In Prometheus v2
+the value of these labels depended on the scrape protocol (protobuf vs text
+format) in some situations. This led to label values changing based on the
+scrape protocol. E.g. a metric exposed as `my_classic_hist{le="1"}` would be
+ingested as `my_classic_hist{le="1"}` via the text format, but as
+`my_classic_hist{le="1.0"}` via protobuf. This changed the identity of the
+metric and caused problems when querying the metric.
+In Prometheus v3 these label values will always be normalized to a float-like
+representation. I.e. the above example will always result in
+`my_classic_hist{le="1.0"}` being ingested into Prometheus, regardless of the
+scrape protocol. The effect of this change is that alerts, recording rules and
+dashboards that directly reference label values as whole numbers such as
+`le="1"` will stop working.
+Ways to deal with this change either globally or on a per metric basis:
+- Fix references to integer `le`, `quantile` label values, but otherwise do
+  nothing and accept that some queries that span the transition time will produce
+  inaccurate or unexpected results.
+  _This is the recommended solution._
+- Use `metric_relabel_configs` to retain the old labels when scraping targets.
+  This should **only** be applied to metrics that currently produce such labels.
 ```yaml
-apiVersion: v1
-kind: Pod
-metadata:
-  name: security-context-demo-2
-spec:
-  securityContext:
-    runAsUser: 0
-...
+metric_relabel_configs:
+  - source_labels:
+      - quantile
+    target_label: quantile
+    regex: (\d+)\.0+
+  - source_labels:
+      - le
+      - __name__
+    target_label: le
+    regex: (\d+)\.0+;.*_bucket
 ```
-See [Configure a Security Context for a Pod or Container](https://kubernetes.io/docs/tasks/configure-pod-container/security-context/)
-for more details.
+# Prometheus 2.0 migration guide
-If you're using Docker, then the following snippet would be used:
-```
-docker run -p 9090:9090 prom/prometheus:latest
-```
-### Prometheus lifecycle
-If you use the Prometheus `/-/reload` HTTP endpoint to [automatically reload your
-Prometheus config when it changes](configuration/configuration.md),
-these endpoints are disabled by default for security reasons in Prometheus 2.0.
-To enable them, set the `--web.enable-lifecycle` flag.
+For the Prometheus 1.8 to 2.0 migration guide, please refer to the [Prometheus v2.55 documentation](https://prometheus.io/docs/prometheus/2.55/migration/).
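The PromQL hunk of the migration guide above states that `.` in regular expressions now also matches newline characters. The Go program below is a minimal illustration, not Prometheus source code. Prometheus regexes are fully anchored. The program assumes that v2 compiled them as `^(?:…)$` and that v3 additionally sets the `s` flag, as `^(?s:…)$`. `matchV2` and `matchV3` are hypothetical helper names. The program reproduces the table rows and shows that the suggested `[^\n]` rewrite keeps the v2 result.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchV2 models a v2-style matcher (an assumption for illustration): the
// pattern is anchored at both ends and `.` does not match "\n".
func matchV2(pattern, value string) bool {
	return regexp.MustCompile("^(?:" + pattern + ")$").MatchString(value)
}

// matchV3 models a v3-style matcher (also an assumption): the `s` flag makes
// `.` match "\n" as well.
func matchV3(pattern, value string) bool {
	return regexp.MustCompile("^(?s:" + pattern + ")$").MatchString(value)
}

func main() {
	cases := []struct{ pattern, value string }{
		{".*", "foo\n"},
		{".*", "Foo\nBar"},
		{"foo.?bar", "foo\nbar"},
		{"foo.+bar", "foo\nbar"},
		{`foo[^\n]*`, "foo\nbar"}, // the v2-compatible rewrite suggested in the guide
	}
	for _, c := range cases {
		fmt.Printf("%-12q %-12q v2=%-5t v3=%t\n",
			c.pattern, c.value, matchV2(c.pattern, c.value), matchV3(c.pattern, c.value))
	}
}
```

Because `[^\n]` excludes the newline explicitly, it behaves the same with or without the `s` flag. That is why the guide recommends it for restoring v2 semantics.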

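A short Go sketch can make the `le` and `quantile` section of the migration guide concrete.

`normalizeFloatLabel` is a hypothetical helper that produces the float-like form the guide describes (`"1"` becomes `"1.0"`). It is modelled only on that description. Prometheus's own formatting may differ in edge cases, such as very large values.

`relabelReplace` is a simplified, hypothetical model of a `metric_relabel_configs` replace rule. It assumes the default `;` separator, the default `$1` replacement and a fully anchored regex. It is used here to check the two relabel rules from the guide against sample values.

```go
package main

import (
	"fmt"
	"math"
	"regexp"
	"strconv"
	"strings"
)

// normalizeFloatLabel turns a numeric label value into a float-like string,
// e.g. "1" -> "1.0". Unparsable values are returned unchanged.
func normalizeFloatLabel(v string) string {
	f, err := strconv.ParseFloat(v, 64)
	if err != nil {
		return v
	}
	switch {
	case math.IsInf(f, +1):
		return "+Inf"
	case math.IsInf(f, -1):
		return "-Inf"
	case math.IsNaN(f):
		return "NaN"
	}
	s := strconv.FormatFloat(f, 'g', -1, 64)
	if !strings.ContainsAny(s, ".e") {
		s += ".0" // whole numbers get an explicit fractional part
	}
	return s
}

// relabelReplace models a "replace" relabel rule: source label values are
// joined with ";", matched against the anchored regex, and "$1" becomes the
// new target value. ok is false when the regex does not match, in which case
// the target label would be left as it was.
func relabelReplace(sourceValues []string, regex string) (value string, ok bool) {
	re := regexp.MustCompile("^(?:" + regex + ")$")
	joined := strings.Join(sourceValues, ";")
	m := re.FindStringSubmatchIndex(joined)
	if m == nil {
		return "", false
	}
	return string(re.ExpandString(nil, "$1", joined, m)), true
}

func main() {
	for _, v := range []string{"1", "0.25", "10", "+Inf"} {
		fmt.Printf("exposed le=%q -> ingested le=%q\n", v, normalizeFloatLabel(v))
	}
	// The two rules from the guide, applied to already normalized values.
	if v, ok := relabelReplace([]string{"1.0"}, `(\d+)\.0+`); ok {
		fmt.Println("quantile rewritten to", v)
	}
	if v, ok := relabelReplace([]string{"1.0", "my_classic_hist_bucket"}, `(\d+)\.0+;.*_bucket`); ok {
		fmt.Println("le rewritten to", v)
	}
	_, ok := relabelReplace([]string{"0.25", "my_classic_hist_bucket"}, `(\d+)\.0+;.*_bucket`)
	fmt.Println("le=\"0.25\" rewritten:", ok)
}
```

Including `__name__` in the `le` rule's source labels restricts the rewrite to series whose name ends in `_bucket`. Non-integer bounds such as `0.25` and `+Inf` fail the regex and are left alone.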

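The UTF-8 names section of the migration guide contrasts the new default with `metric_name_validation_scheme: legacy`. The Go sketch below shows which metric names each scheme accepts, using the hypothetical helpers `validLegacy` and `validUTF8`. It rests on two assumptions:

- The legacy scheme uses the classic metric-name pattern `[a-zA-Z_:][a-zA-Z0-9_:]*`.
- The UTF-8 scheme accepts any non-empty, valid UTF-8 string.

Label names follow a similar legacy pattern without the colon, which this sketch does not cover.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode/utf8"
)

// legacyMetricName is the classic Prometheus metric-name pattern.
var legacyMetricName = regexp.MustCompile(`^[a-zA-Z_:][a-zA-Z0-9_:]*$`)

// validLegacy reports whether name is accepted under the legacy scheme.
func validLegacy(name string) bool {
	return legacyMetricName.MatchString(name)
}

// validUTF8 approximates the utf8 scheme: any non-empty, valid UTF-8 string.
func validUTF8(name string) bool {
	return name != "" && utf8.ValidString(name)
}

func main() {
	names := []string{"http_requests_total", "http.requests.total", "température_°C", ""}
	for _, n := range names {
		fmt.Printf("%-24q legacy=%-5t utf8=%t\n", n, validLegacy(n), validUTF8(n))
	}
}
```

Names such as `http.requests.total` are the ones that "would have previously been flagged as invalid". Under the legacy scheme they are still rejected.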
@@ -9,7 +9,7 @@ Prometheus promises API stability within a major version, and strives to avoid
 breaking changes for key features. Some features, which are cosmetic, still
 under development, or depend on 3rd party services, are not covered by this.
-Things considered stable for 2.x:
+Things considered stable for 3.x:
 * The query language and data model
 * Alerting and recording rules
@@ -18,21 +18,25 @@ Things considered stable for 2.x:
 * Configuration file format (minus the service discovery remote read/write, see below)
 * Rule/alert file format
 * Console template syntax and semantics
-* Remote write sending, per the [1.0 specification](https://prometheus.io/docs/concepts/remote_write_spec/).
+* Remote write sending and receiving, per the [1.0 specification](https://prometheus.io/docs/concepts/remote_write_spec/)
+* Agent mode
+* OTLP receiver endpoint
-Things considered unstable for 2.x:
+Things considered unstable for 3.x:
 * Any feature listed as experimental or subject to change, including:
-  * The [`holt_winters` PromQL function](https://github.com/prometheus/prometheus/issues/2458)
-* Remote write receiving, remote read and the remote read endpoint
+  * The [`double_exponential_smoothing` PromQL function](https://github.com/prometheus/prometheus/issues/2458)
+* Remote read and the remote read endpoint
 * Server-side HTTPS and basic authentication
-* Service discovery integrations, with the exception of `static_configs` and `file_sd_configs`
+* Service discovery integrations, with the exception of `static_configs`, `file_sd_configs` and `http_sd_configs`
 * Go APIs of packages that are part of the server
 * HTML generated by the web UI
 * The metrics in the /metrics endpoint of Prometheus itself
 * Exact on-disk format. Potential changes however, will be forward compatible and transparently handled by Prometheus
 * The format of the logs
+Prometheus 2.x stability guarantees can be found [in the 2.x documentation](https://prometheus.io/docs/prometheus/2.55/stability/).
 As long as you are not using any features marked as experimental/unstable, an
 upgrade within a major version can usually be performed without any operational
 adjustments and very little risk that anything will break. Any breaking changes
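Both diffs touch logging. The migration guide documents the switch from `go-kit/log` to `log/slog`, and the stability page lists the log format as not covered by the stability guarantees. The sketch below uses the standard library's `log/slog` `TextHandler` (Go 1.21 or newer) to emit a record in the same `key=value` style as the v3 sample lines. Whether this matches how Prometheus actually configures its logger is an assumption, not something these diffs show. `renderStartupLine` is a hypothetical helper. The timestamp is dropped via `ReplaceAttr` so the output is reproducible.

```go
package main

import (
	"bytes"
	"fmt"
	"log/slog"
)

// renderStartupLine logs one record with slog's text handler and returns it.
func renderStartupLine() string {
	var buf bytes.Buffer
	handler := slog.NewTextHandler(&buf, &slog.HandlerOptions{
		// Drop the top-level time attribute so the output is reproducible.
		// A real server would keep it (time=...) and may add source=... via AddSource.
		ReplaceAttr: func(groups []string, a slog.Attr) slog.Attr {
			if len(groups) == 0 && a.Key == slog.TimeKey {
				return slog.Attr{}
			}
			return a
		},
	})
	slog.New(handler).Info("Starting Prometheus Server", "mode", "server")
	return buf.String()
}

func main() {
	fmt.Print(renderStartupLine())
	// Prints: level=INFO msg="Starting Prometheus Server" mode=server
}
```

Compared with the old `ts=... caller=... level=info` lines, the key names and level casing differ. Log parsers that match on `ts=`, `caller=` or lowercase levels will need updating.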