parent
299802dfd0
commit
e6cdc2d355
@ -0,0 +1,417 @@
---
title: HTTP API
sort_rank: 7
---

# HTTP API

The current stable HTTP API is reachable under `/api/v1` on a Prometheus
server. Any non-breaking additions will be added under that endpoint.

## Format overview

The API response format is JSON. Every successful API request returns a `2xx`
status code.

Invalid requests that reach the API handlers return a JSON error object
and one of the following HTTP response codes:

- `400 Bad Request` when parameters are missing or incorrect.
- `422 Unprocessable Entity` when an expression can't be executed
  ([RFC4918](http://tools.ietf.org/html/rfc4918#page-78)).
- `503 Service Unavailable` when queries time out or abort.

Other non-`2xx` codes may be returned for errors occurring before the API
endpoint is reached.

The JSON response envelope format is as follows:

```
{
  "status": "success" | "error",
  "data": <data>,

  // Only set if status is "error". The data field may still hold
  // additional data.
  "errorType": "<string>",
  "error": "<string>"
}
```
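
A client usually checks `status` before reading `data`. The following is a minimal Python sketch, not an official client: `unwrap_response` and `PrometheusAPIError` are hypothetical names chosen for illustration, and the `//` comments in the format above are annotations that do not appear in real responses.

```python
import json


class PrometheusAPIError(Exception):
    """Hypothetical exception carrying the envelope's errorType and error fields."""

    def __init__(self, error_type, message):
        super().__init__(f"{error_type}: {message}")
        self.error_type = error_type
        self.message = message


def unwrap_response(body):
    """Decode a JSON response envelope and return its `data` field.

    Raises PrometheusAPIError when `status` is not "success".
    """
    envelope = json.loads(body)
    if envelope.get("status") != "success":
        raise PrometheusAPIError(
            envelope.get("errorType", "unknown"), envelope.get("error", "")
        )
    return envelope.get("data")
```

For example, `unwrap_response('{"status": "success", "data": [1]}')` returns `[1]`.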

Input timestamps may be provided either in
[RFC3339](https://www.ietf.org/rfc/rfc3339.txt) format or as a Unix timestamp
in seconds, with optional decimal places for sub-second precision. Output
timestamps are always represented as Unix timestamps in seconds.

Names of query parameters that may be repeated end with `[]`.

`<series_selector>` placeholders refer to Prometheus [time series
selectors](basics.md#time-series-selectors) like `http_requests_total` or
`http_requests_total{method=~"^GET|POST$"}` and need to be URL-encoded.

`<duration>` placeholders refer to Prometheus duration strings of the form
`[0-9]+[smhdwy]`. For example, `5m` refers to a duration of 5 minutes.
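
The placeholder conventions above can be sketched in Python as follows. This is an illustration under stated assumptions rather than Prometheus's own parser: `parse_duration` and `encode_selector` are hypothetical helpers, and the unit table assumes a day is 24 hours and a year is 365 days.

```python
import re
from urllib.parse import quote

# Seconds per duration unit. Assumption: 1d = 24h and 1y = 365d.
_UNIT_SECONDS = {
    "s": 1,
    "m": 60,
    "h": 60 * 60,
    "d": 24 * 60 * 60,
    "w": 7 * 24 * 60 * 60,
    "y": 365 * 24 * 60 * 60,
}


def parse_duration(text):
    """Convert a duration string matching [0-9]+[smhdwy] to whole seconds."""
    match = re.fullmatch(r"([0-9]+)([smhdwy])", text)
    if match is None:
        raise ValueError(f"invalid duration: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def encode_selector(selector):
    """Percent-encode a series selector so it can be placed in a query string."""
    return quote(selector, safe="")
```

With `safe=""`, characters such as `{`, `=`, `"` and `|` are all percent-encoded, so the result can follow `query=` or `match[]=` directly.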

## Expression queries

Query language expressions may be evaluated at a single instant or over a range
of time. The sections below describe the API endpoints for each type of
expression query.

### Instant queries

The following endpoint evaluates an instant query at a single point in time:

```
GET /api/v1/query
```

URL query parameters:

- `query=<string>`: Prometheus expression query string.
- `time=<rfc3339 | unix_timestamp>`: Evaluation timestamp. Optional.
- `timeout=<duration>`: Evaluation timeout. Optional. Defaults to and
  is capped by the value of the `-query.timeout` flag.

The current server time is used if the `time` parameter is omitted.

The `data` section of the query result has the following format:

```
{
  "resultType": "matrix" | "vector" | "scalar" | "string",
  "result": <value>
}
```

`<value>` refers to the query result data, which has varying formats
depending on the `resultType`. See the [expression query result
formats](#expression-query-result-formats).

The following example evaluates the expression `up` at the time
`2015-07-01T20:10:51.781Z`:

```json
$ curl 'http://localhost:9090/api/v1/query?query=up&time=2015-07-01T20:10:51.781Z'
{
   "status" : "success",
   "data" : {
      "resultType" : "vector",
      "result" : [
         {
            "metric" : {
               "__name__" : "up",
               "job" : "prometheus",
               "instance" : "localhost:9090"
            },
            "value": [ 1435781451.781, "1" ]
         },
         {
            "metric" : {
               "__name__" : "up",
               "job" : "node",
               "instance" : "localhost:9100"
            },
            "value" : [ 1435781451.781, "0" ]
         }
      ]
   }
}
```
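
A script can build the same request and flatten a `vector` result into a lookup table. The sketch below uses only the Python standard library; `instant_query_url` and `vector_to_dict` are hypothetical helpers, and keying series by their sorted label pairs is one possible design choice, not something the API prescribes.

```python
from urllib.parse import urlencode


def instant_query_url(base_url, query, time=None, timeout=None):
    """Build a GET URL for /api/v1/query; optional parameters are omitted when None."""
    params = {"query": query}
    if time is not None:
        params["time"] = time
    if timeout is not None:
        params["timeout"] = timeout
    return f"{base_url}/api/v1/query?{urlencode(params)}"


def vector_to_dict(data):
    """Map each series' label set (as a sorted tuple) to its float sample value."""
    if data["resultType"] != "vector":
        raise ValueError("expected resultType 'vector'")
    return {
        tuple(sorted(item["metric"].items())): float(item["value"][1])
        for item in data["result"]
    }
```

To send the request, the URL can be passed to `urllib.request.urlopen()`, the body decoded with `json.loads()`, and the `data` field handed to `vector_to_dict`. Sample values such as `"1"` arrive as strings and are converted with `float()`.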

### Range queries

The following endpoint evaluates an expression query over a range of time:

```
GET /api/v1/query_range
```

URL query parameters:

- `query=<string>`: Prometheus expression query string.
- `start=<rfc3339 | unix_timestamp>`: Start timestamp.
- `end=<rfc3339 | unix_timestamp>`: End timestamp.
- `step=<duration>`: Query resolution step width.
- `timeout=<duration>`: Evaluation timeout. Optional. Defaults to and
  is capped by the value of the `-query.timeout` flag.

The `data` section of the query result has the following format:

```
{
  "resultType": "matrix",
  "result": <value>
}
```

For the format of the `<value>` placeholder, see the [range-vector result
format](#range-vectors).

The following example evaluates the expression `up` over a 30-second range with
a query resolution of 15 seconds:

```json
$ curl 'http://localhost:9090/api/v1/query_range?query=up&start=2015-07-01T20:10:30.781Z&end=2015-07-01T20:11:00.781Z&step=15s'
{
   "status" : "success",
   "data" : {
      "resultType" : "matrix",
      "result" : [
         {
            "metric" : {
               "__name__" : "up",
               "job" : "prometheus",
               "instance" : "localhost:9090"
            },
            "values" : [
               [ 1435781430.781, "1" ],
               [ 1435781445.781, "1" ],
               [ 1435781460.781, "1" ]
            ]
         },
         {
            "metric" : {
               "__name__" : "up",
               "job" : "node",
               "instance" : "localhost:9091"
            },
            "values" : [
               [ 1435781430.781, "0" ],
               [ 1435781445.781, "0" ],
               [ 1435781460.781, "1" ]
            ]
         }
      ]
   }
}
```
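
The number of points per series in a range result follows from `start`, `end`, and `step`. The Python sketch below assumes that evaluation happens at `start`, `start + step`, and so on, up to and including `end`. This matches the three samples per series in the example above. `expected_steps` and `matrix_to_series` are hypothetical helpers.

```python
def expected_steps(start, end, step):
    """Count evaluation timestamps start, start+step, ... that do not exceed end.

    Works in whole milliseconds to avoid float rounding on large Unix timestamps.
    """
    span_ms = round((end - start) * 1000)
    step_ms = round(step * 1000)
    if step_ms <= 0 or span_ms < 0:
        raise ValueError("need step > 0 and end >= start")
    return span_ms // step_ms + 1


def matrix_to_series(data):
    """Return a list of (labels, [(timestamp, float_value), ...]) tuples."""
    if data["resultType"] != "matrix":
        raise ValueError("expected resultType 'matrix'")
    return [
        (item["metric"], [(ts, float(v)) for ts, v in item["values"]])
        for item in data["result"]
    ]
```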

## Querying metadata

### Finding series by label matchers

The following endpoint returns the list of time series that match a certain label set.

```
GET /api/v1/series
```

URL query parameters:

- `match[]=<series_selector>`: Repeated series selector argument that selects the
  series to return. At least one `match[]` argument must be provided.
- `start=<rfc3339 | unix_timestamp>`: Start timestamp.
- `end=<rfc3339 | unix_timestamp>`: End timestamp.

The `data` section of the query result consists of a list of objects that
contain the label name/value pairs which identify each series.

The following example returns all series that match either of the selectors
`up` or `process_start_time_seconds{job="prometheus"}`:

```json
$ curl -g 'http://localhost:9090/api/v1/series?match[]=up&match[]=process_start_time_seconds{job="prometheus"}'
{
   "status" : "success",
   "data" : [
      {
         "__name__" : "up",
         "job" : "prometheus",
         "instance" : "localhost:9090"
      },
      {
         "__name__" : "up",
         "job" : "node",
         "instance" : "localhost:9091"
      },
      {
         "__name__" : "process_start_time_seconds",
         "job" : "prometheus",
         "instance" : "localhost:9090"
      }
   ]
}
```
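
Repeated `match[]` parameters can also be produced programmatically. In this minimal Python sketch (the `series_url` helper is hypothetical), `urllib.parse.urlencode` percent-encodes the brackets, braces, and quotes, so nothing equivalent to curl's `-g` (disable globbing) flag is needed.

```python
from urllib.parse import urlencode


def series_url(base_url, selectors, start=None, end=None):
    """Build a /api/v1/series URL with one match[] parameter per selector."""
    if not selectors:
        raise ValueError("at least one match[] selector is required")
    # A list of pairs keeps the repeated key; a dict could hold only one.
    params = [("match[]", s) for s in selectors]
    if start is not None:
        params.append(("start", start))
    if end is not None:
        params.append(("end", end))
    return f"{base_url}/api/v1/series?{urlencode(params)}"
```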

### Querying label values

The following endpoint returns a list of label values for a provided label name:

```
GET /api/v1/label/<label_name>/values
```

The `data` section of the JSON response is a list of string label values.

This example queries for all label values for the `job` label:

```json
$ curl http://localhost:9090/api/v1/label/job/values
{
   "status" : "success",
   "data" : [
      "node",
      "prometheus"
   ]
}
```

## Deleting series

The following endpoint deletes matched series entirely from a Prometheus server:

```
DELETE /api/v1/series
```

URL query parameters:

- `match[]=<series_selector>`: Repeated label matcher argument that selects the
  series to delete. At least one `match[]` argument must be provided.

The `data` section of the JSON response has the following format:

```
{
  "numDeleted": <number of deleted series>
}
```

The following example deletes all series that match either of the selectors
`up` or `process_start_time_seconds{job="prometheus"}`:

```json
$ curl -XDELETE -g 'http://localhost:9090/api/v1/series?match[]=up&match[]=process_start_time_seconds{job="prometheus"}'
{
   "status" : "success",
   "data" : {
      "numDeleted" : 3
   }
}
```
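
Because this endpoint uses the `DELETE` method, a client has to set the method explicitly. The following Python sketch, built around a hypothetical `delete_series_request` helper, only prepares the request. Sending it with `urllib.request.urlopen(req)` would actually delete data, so that step is left out here.

```python
from urllib.parse import urlencode
from urllib.request import Request


def delete_series_request(base_url, selectors):
    """Prepare (but do not send) a DELETE /api/v1/series request."""
    if not selectors:
        raise ValueError("at least one match[] selector is required")
    query = urlencode([("match[]", s) for s in selectors])
    return Request(f"{base_url}/api/v1/series?{query}", method="DELETE")
```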

## Expression query result formats

Expression queries may return the following response values in the `result`
property of the `data` section. `<sample_value>` placeholders are numeric
sample values. JSON does not support special float values such as `NaN`, `Inf`,
and `-Inf`, so sample values are transferred as quoted JSON strings rather than
raw numbers.

### Range vectors

Range vectors are returned as result type `matrix`. The corresponding
`result` property has the following format:

```
[
  {
    "metric": { "<label_name>": "<label_value>", ... },
    "values": [ [ <unix_time>, "<sample_value>" ], ... ]
  },
  ...
]
```

### Instant vectors

Instant vectors are returned as result type `vector`. The corresponding
`result` property has the following format:

```
[
  {
    "metric": { "<label_name>": "<label_value>", ... },
    "value": [ <unix_time>, "<sample_value>" ]
  },
  ...
]
```

### Scalars

Scalar results are returned as result type `scalar`. The corresponding
`result` property has the following format:

```
[ <unix_time>, "<scalar_value>" ]
```

### Strings

String results are returned as result type `string`. The corresponding
`result` property has the following format:

```
[ <unix_time>, "<string_value>" ]
```
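
Since sample values arrive as strings, a client needs to convert them. The following Python sketch dispatches on `resultType` using the four formats above; `parse_sample` and `parse_result` are hypothetical names. It relies on Python's `float()` accepting `"NaN"`, `"+Inf"`, and `"-Inf"`.

```python
def parse_sample(pair):
    """Convert a [<unix_time>, "<sample_value>"] pair to a (float, float) tuple.

    float() accepts "NaN", "+Inf" and "-Inf", which covers the special values
    that plain JSON numbers cannot express.
    """
    timestamp, value = pair
    return float(timestamp), float(value)


def parse_result(data):
    """Dispatch on resultType and convert string sample values to floats."""
    kind, result = data["resultType"], data["result"]
    if kind == "matrix":
        return [(s["metric"], [parse_sample(p) for p in s["values"]]) for s in result]
    if kind == "vector":
        return [(s["metric"], parse_sample(s["value"])) for s in result]
    if kind == "scalar":
        return parse_sample(result)
    if kind == "string":
        # String results keep their value as text; only the timestamp is numeric.
        return float(result[0]), result[1]
    raise ValueError(f"unknown resultType: {kind!r}")
```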

## Targets

> This API is experimental as it is intended to be extended with targets
> dropped due to relabelling in the future.

The following endpoint returns an overview of the current state of the
Prometheus target discovery:

```
GET /api/v1/targets
```

Currently only the active targets are part of the response.

```json
$ curl http://localhost:9090/api/v1/targets
{
  "status": "success",
  "data": {
    "activeTargets": [
      {
        "discoveredLabels": {
          "__address__": "127.0.0.1:9090",
          "__metrics_path__": "/metrics",
          "__scheme__": "http",
          "job": "prometheus"
        },
        "labels": {
          "instance": "127.0.0.1:9090",
          "job": "prometheus"
        },
        "scrapeUrl": "http://127.0.0.1:9090/metrics",
        "lastError": "",
        "lastScrape": "2017-01-17T15:07:44.723715405+01:00",
        "health": "up"
      }
    ]
  }
}
```

## Alertmanagers

> This API is experimental as it is intended to be extended with Alertmanagers
> dropped due to relabelling in the future.

The following endpoint returns an overview of the current state of the
Prometheus Alertmanager discovery:

```
GET /api/v1/alertmanagers
```

Currently only the active Alertmanagers are part of the response.

```json
$ curl http://localhost:9090/api/v1/alertmanagers
{
  "status": "success",
  "data": {
    "activeAlertmanagers": [
      {
        "url": "http://127.0.0.1:9090/api/v1/alerts"
      }
    ]
  }
}
```

@ -0,0 +1,215 @@
---
title: Querying basics
nav_title: Basics
sort_rank: 1
---

# Querying Prometheus

Prometheus provides a functional expression language that lets the user select
and aggregate time series data in real time. The result of an expression can
either be shown as a graph, viewed as tabular data in Prometheus's expression
browser, or consumed by external systems via the [HTTP API](api.md).

## Examples

This document is meant as a reference. For learning, it might be easier to
start with a couple of [examples](examples.md).

## Expression language data types

In Prometheus's expression language, an expression or sub-expression can
evaluate to one of four types:

* **Instant vector** - a set of time series containing a single sample for each time series, all sharing the same timestamp
* **Range vector** - a set of time series containing a range of data points over time for each time series
* **Scalar** - a simple numeric floating point value
* **String** - a simple string value; currently unused

Depending on the use case (e.g. when graphing vs. displaying the output of an
expression), only some of these types are legal as the result from a
user-specified expression. For example, an expression that returns an instant
vector is the only type that can be directly graphed.

## Literals

### String literals

Strings may be specified as literals in single quotes, double quotes or
backticks.

PromQL follows the same [escaping rules as
Go](https://golang.org/ref/spec#String_literals). In single or double quotes a
backslash begins an escape sequence, which may be followed by `a`, `b`, `f`,
`n`, `r`, `t`, `v` or `\`. Specific characters can be provided using octal
(`\nnn`) or hexadecimal (`\xnn`, `\unnnn` and `\Unnnnnnnn`).

No escaping is processed inside backticks. Unlike Go, Prometheus does not discard newlines inside backticks.

Example:

    "this is a string"
    'these are unescaped: \n \\ \t'
    `these are not unescaped: \n ' " \t`

### Float literals

Scalar float values can be literally written as numbers of the form
`[-](digits)[.(digits)]`.

    -2.43

## Time series selectors

### Instant vector selectors

Instant vector selectors allow the selection of a set of time series and a
single sample value for each at a given timestamp (instant): in the simplest
form, only a metric name is specified. This results in an instant vector
containing elements for all time series that have this metric name.

This example selects all time series that have the `http_requests_total` metric
name:

    http_requests_total

It is possible to filter these time series further by appending a set of labels
to match in curly braces (`{}`).

This example selects only those time series with the `http_requests_total`
metric name that also have the `job` label set to `prometheus` and their
`group` label set to `canary`:

    http_requests_total{job="prometheus",group="canary"}

It is also possible to negatively match a label value, or to match label values
against regular expressions. The following label matching operators exist:

* `=`: Select labels that are exactly equal to the provided string.
* `!=`: Select labels that are not equal to the provided string.
* `=~`: Select labels that regex-match the provided string.
* `!~`: Select labels that do not regex-match the provided string.

For example, this selects all `http_requests_total` time series for `staging`,
`testing`, and `development` environments and HTTP methods other than `GET`.

    http_requests_total{environment=~"staging|testing|development",method!="GET"}

Label matchers that match empty label values also select all time series that do
not have the specific label set at all. Regex matches are fully anchored, so a
pattern must match the entire label value.

Vector selectors must either specify a name or at least one label matcher
that does not match the empty string. The following expression is illegal:

    {job=~".*"} # Bad!

In contrast, these expressions are valid as they both have a selector that does not
match empty label values.

    {job=~".+"}              # Good!
    {job=~".*",method="get"} # Good!

Label matchers can also be applied to metric names by matching against the internal
`__name__` label. For example, the expression `http_requests_total` is equivalent to
`{__name__="http_requests_total"}`. Matchers other than `=` (`!=`, `=~`, `!~`) may also be used.
The following expression selects all metrics that have a name starting with `job:`:

    {__name__=~"job:.*"}
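
The matching rules above can be approximated in Python to make the edge cases concrete: missing labels behave like empty values, and regex matchers must match the whole value. This is a sketch only. `label_matches` and `select` are hypothetical, and Prometheus evaluates regexes with Go's RE2 syntax, which differs from Python's `re` module for some constructs.

```python
import re


def label_matches(labels, name, op, value):
    """Evaluate one label matcher against a series' label dict.

    A missing label is treated as the empty string, so matchers that accept ""
    also select series that lack the label. Regex matchers are fully anchored
    via re.fullmatch.
    """
    actual = labels.get(name, "")
    if op == "=":
        return actual == value
    if op == "!=":
        return actual != value
    if op == "=~":
        return re.fullmatch(value, actual) is not None
    if op == "!~":
        return re.fullmatch(value, actual) is None
    raise ValueError(f"unknown matcher operator: {op!r}")


def select(series, matchers):
    """Return the series (label dicts) that satisfy every (name, op, value) matcher."""
    return [s for s in series if all(label_matches(s, *m) for m in matchers)]
```

For instance, `label_matches({"job": "apiserver"}, "job", "=~", "server")` is `False` because the pattern must cover the whole value, whereas `".*server"` matches.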

### Range vector selectors

Range vector literals work like instant vector literals, except that they
select a range of samples back from the current instant. Syntactically, a range
duration is appended in square brackets (`[]`) at the end of a vector selector
to specify how far back in time values should be fetched for each resulting
range vector element.

Time durations are specified as a number, followed immediately by one of the
following units:

* `s` - seconds
* `m` - minutes
* `h` - hours
* `d` - days
* `w` - weeks
* `y` - years

In this example, we select all the values we have recorded within the last 5
minutes for all time series that have the metric name `http_requests_total` and
a `job` label set to `prometheus`:

    http_requests_total{job="prometheus"}[5m]

### Offset modifier

The `offset` modifier allows changing the time offset for individual
instant and range vectors in a query.

For example, the following expression returns the value of
`http_requests_total` 5 minutes in the past relative to the current
query evaluation time:

    http_requests_total offset 5m

Note that the `offset` modifier always needs to follow the selector
immediately, i.e. the following would be correct:

    sum(http_requests_total{method="GET"} offset 5m) # GOOD.

While the following would be *incorrect*:

    sum(http_requests_total{method="GET"}) offset 5m # INVALID.

The same works for range vectors. This returns the 5-minute rate that
`http_requests_total` had a week ago:

    rate(http_requests_total[5m] offset 1w)

## Operators

Prometheus supports many binary and aggregation operators. These are described
in detail in the [expression language operators](operators.md) page.

## Functions

Prometheus supports several functions to operate on data. These are described
in detail in the [expression language functions](functions.md) page.

## Gotchas

### Interpolation and staleness

When queries are run, timestamps at which to sample data are selected
independently of the actual present time series data. This is mainly to support
cases like aggregation (`sum`, `avg`, and so on), where multiple aggregated
time series do not exactly align in time. Because of their independence,
Prometheus needs to assign a value at those timestamps for each relevant time
series. It does so by simply taking the newest sample before this timestamp.

If no stored sample is found (by default) 5 minutes before a sampling timestamp,
no value is assigned for this time series at this point in time. This
effectively means that time series "disappear" from graphs at times where their
latest collected sample is older than 5 minutes.
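
The lookback behaviour described above can be sketched as follows. The sketch assumes that a sample exactly at the sampling timestamp counts and that a sample exactly 5 minutes old is still used; the real boundary handling may differ. `value_at` is a hypothetical helper.

```python
import bisect

LOOKBACK_SECONDS = 5 * 60  # the default 5-minute lookback described above


def value_at(timestamps, values, t, lookback=LOOKBACK_SECONDS):
    """Return the newest sample value at or before t, or None if it is too old.

    `timestamps` must be sorted ascending and aligned index-by-index with `values`.
    """
    # Index of the last timestamp that is <= t.
    i = bisect.bisect_right(timestamps, t) - 1
    if i < 0 or t - timestamps[i] > lookback:
        return None
    return values[i]
```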

NOTE: <b>NOTE:</b> Staleness and interpolation handling might change. See
https://github.com/prometheus/prometheus/issues/398 and
https://github.com/prometheus/prometheus/issues/581.

### Avoiding slow queries and overloads

If a query needs to operate on a very large amount of data, graphing it might
time out or overload the server or browser. Thus, when constructing queries
over unknown data, always start building the query in the tabular view of
Prometheus's expression browser until the result set seems reasonable
(hundreds, not thousands, of time series at most). Only when you have filtered
or aggregated your data sufficiently, switch to graph mode. If the expression
still takes too long to graph ad-hoc, pre-record it via a [recording
rule](rules.md#recording-rules).

This is especially relevant for Prometheus's query language, where a bare
metric name selector like `api_http_requests_total` could expand to thousands
of time series with different labels. Also keep in mind that expressions which
aggregate over many time series will generate load on the server even if the
output is only a small number of time series. This is similar to how it would
be slow to sum all values of a column in a relational database, even if the
output value is only a single number.

@ -0,0 +1,83 @@
---
title: Querying examples
nav_title: Examples
sort_rank: 4
---

# Query examples

## Simple time series selection

Return all time series with the metric `http_requests_total`:

    http_requests_total

Return all time series with the metric `http_requests_total` and the given
`job` and `handler` labels:

    http_requests_total{job="apiserver", handler="/api/comments"}

Return a whole range of time (in this case 5 minutes) for the same vector,
making it a range vector:

    http_requests_total{job="apiserver", handler="/api/comments"}[5m]

Note that an expression resulting in a range vector cannot be graphed directly,
but it can be viewed in the tabular ("Console") view of the expression browser.

Using regular expressions, you could select time series only for jobs whose
names match a certain pattern, in this case, all jobs that end with `server`.
Regex matches are fully anchored, so the pattern has to cover the entire label
value:

    http_requests_total{job=~".*server"}

To select all HTTP status codes except 4xx ones, you could run:

    http_requests_total{status!~"4.."}

## Using functions, operators, etc.

Return the per-second rate for all time series with the `http_requests_total`
metric name, as measured over the last 5 minutes:

    rate(http_requests_total[5m])

Assuming that the `http_requests_total` time series all have the labels `job`
(fanout by job name) and `instance` (fanout by instance of the job), we might
want to sum over the rate of all instances, so we get fewer output time series,
but still preserve the `job` dimension:

    sum(rate(http_requests_total[5m])) by (job)

If we have two different metrics with the same dimensional labels, we can apply
binary operators to them and elements on both sides with the same label set
will get matched and propagated to the output. For example, this expression
returns the unused memory in MiB for every instance (on a fictional cluster
scheduler exposing these metrics about the instances it runs):

    (instance_memory_limit_bytes - instance_memory_usage_bytes) / 1024 / 1024

The same expression, but summed by application and process type, could be
written like this:

    sum(
      instance_memory_limit_bytes - instance_memory_usage_bytes
    ) by (app, proc) / 1024 / 1024

If the same fictional cluster scheduler exposed CPU usage metrics like the
following for every instance:

    instance_cpu_time_ns{app="lion", proc="web", rev="34d0f99", env="prod", job="cluster-manager"}
    instance_cpu_time_ns{app="elephant", proc="worker", rev="34d0f99", env="prod", job="cluster-manager"}
    instance_cpu_time_ns{app="turtle", proc="api", rev="4d3a513", env="prod", job="cluster-manager"}
    instance_cpu_time_ns{app="fox", proc="widget", rev="4d3a513", env="prod", job="cluster-manager"}
    ...

...we could get the top 3 CPU users grouped by application (`app`) and process
type (`proc`) like this:

    topk(3, sum(rate(instance_cpu_time_ns[5m])) by (app, proc))

Assuming this metric contains one time series per running instance, you could
count the number of running instances per application like this:

    count(instance_cpu_time_ns) by (app)

@ -0,0 +1,408 @@
---
title: Query functions
nav_title: Functions
sort_rank: 3
---

# Functions

Some functions have default arguments, e.g. `year(v=vector(time())
instant-vector)`. This means that there is one argument `v`, an instant
vector, which defaults to the value of the expression `vector(time())` if it is
not provided.

## `abs()`

`abs(v instant-vector)` returns the input vector with all sample values converted to
their absolute value.

## `absent()`

`absent(v instant-vector)` returns an empty vector if the vector passed to it
has any elements and a 1-element vector with the value 1 if the vector passed to
it has no elements.

This is useful for alerting when no time series exist for a given metric name
and label combination.

```
absent(nonexistent{job="myjob"})
# => {job="myjob"}

absent(nonexistent{job="myjob",instance=~".*"})
# => {job="myjob"}

absent(sum(nonexistent{job="myjob"}))
# => {}
```

In the first two examples, `absent()` tries to be smart about deriving labels of
the 1-element output vector from the input vector: equality matchers such as
`job="myjob"` are carried over, while the regex matcher on `instance` in the
second example is not.
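
The label derivation can be summarized in a small Python sketch. It assumes that a selector is represented as `(name, operator, value)` triples. `absent_labels` and `absent` are hypothetical helpers, and the rule that only equality matchers on labels other than `__name__` are carried over is inferred from the examples above. An aggregation such as `sum(...)` has no selector of its own, which the sketch models as an empty matcher list.

```python
def absent_labels(matchers):
    """Derive the output labels of absent() from a selector's matchers.

    Only equality matchers on labels other than __name__ are carried over;
    regex and negative matchers are dropped.
    """
    return {
        name: value
        for name, op, value in matchers
        if op == "=" and name != "__name__"
    }


def absent(input_vector, matchers):
    """Return [] if the input has elements, else one (labels, 1.0) element."""
    if input_vector:
        return []
    return [(absent_labels(matchers), 1.0)]
```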

## `ceil()`

`ceil(v instant-vector)` rounds the sample values of all elements in `v` up to
the nearest integer.

## `changes()`

For each input time series, `changes(v range-vector)` returns the number of
times its value has changed within the provided time range as an instant
vector.

## `clamp_max()`

`clamp_max(v instant-vector, max scalar)` clamps the sample values of all
elements in `v` to have an upper limit of `max`.

## `clamp_min()`

`clamp_min(v instant-vector, min scalar)` clamps the sample values of all
elements in `v` to have a lower limit of `min`.

## `count_scalar()`

`count_scalar(v instant-vector)` returns the number of elements in a time series
vector as a scalar. This is in contrast to the `count()`
[aggregation operator](operators.md#aggregation-operators), which
always returns a vector (an empty one if the input vector is empty) and allows
grouping by labels via a `by` clause.

## `day_of_month()`

`day_of_month(v=vector(time()) instant-vector)` returns the day of the month
for each of the given times in UTC. Returned values are from 1 to 31.

## `day_of_week()`

`day_of_week(v=vector(time()) instant-vector)` returns the day of the week for
each of the given times in UTC. Returned values are from 0 to 6, where 0 means
Sunday etc.

## `days_in_month()`

`days_in_month(v=vector(time()) instant-vector)` returns the number of days in the
month for each of the given times in UTC. Returned values are from 28 to 31.
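
The calendar functions above can be mirrored with Python's standard library to check expected values. This sketch operates on single Unix timestamps rather than vectors, and the function names simply reuse the PromQL names for readability; they are not a Prometheus API.

```python
import calendar
from datetime import datetime, timezone


def _utc(ts):
    """Interpret a Unix timestamp in seconds as a UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)


def day_of_month(ts):
    return _utc(ts).day  # 1 to 31


def day_of_week(ts):
    # Python's weekday() uses Monday=0; shift so that Sunday=0 as documented.
    return (_utc(ts).weekday() + 1) % 7


def days_in_month(ts):
    d = _utc(ts)
    return calendar.monthrange(d.year, d.month)[1]  # 28 to 31
```

For example, the Unix epoch (`0`) was a Thursday, so `day_of_week(0)` gives 4.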

## `delta()`

`delta(v range-vector)` calculates the difference between the
first and last value of each time series element in a range vector `v`,
returning an instant vector with the given deltas and equivalent labels.
The delta is extrapolated to cover the full time range as specified in
the range vector selector, so that it is possible to get a non-integer
result even if the sample values are all integers.

The following example expression returns the difference in CPU temperature
between now and 2 hours ago:

```
delta(cpu_temp_celsius{host="zeus"}[2h])
```

`delta` should only be used with gauges.
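
The Python sketch below shows why integer samples can produce a non-integer delta by scaling the first-to-last difference up to the full range. This proportional scaling is a deliberate simplification chosen for illustration (`naive_delta` is hypothetical). Prometheus's actual extrapolation may treat the gaps at the range boundaries differently.

```python
def naive_delta(samples, range_seconds):
    """Difference between the last and first sample, scaled to the full range.

    samples: list of (timestamp_seconds, value) pairs sorted by time.
    This proportional scaling is a simplification, not Prometheus's exact rule.
    """
    if len(samples) < 2:
        return None
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 == t0:
        return None
    return (v1 - v0) * range_seconds / (t1 - t0)
```

With samples of 50, 51 and 53 at 0 s, 60 s and 90 s and a 100-second range, the sketch returns 3 × 100 / 90 ≈ 3.33.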

## `deriv()`

`deriv(v range-vector)` calculates the per-second derivative of the time series in a range
vector `v`, using [simple linear regression](http://en.wikipedia.org/wiki/Simple_linear_regression).

`deriv` should only be used with gauges.
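
Simple linear regression here means fitting a least-squares line through the (timestamp, value) pairs and reporting its slope. Below is a minimal Python sketch with a hypothetical `deriv` helper. It assumes timestamps in seconds, so the slope comes out per second.

```python
def deriv(samples):
    """Per-second slope of the least-squares line through (timestamp, value) pairs."""
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    # Slope = covariance(t, v) / variance(t).
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        return None  # all samples share one timestamp; slope is undefined
    return cov / var
```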

## `drop_common_labels()`

`drop_common_labels(instant-vector)` drops all labels that have the same name
and value across all series in the input vector.

## `exp()`

`exp(v instant-vector)` calculates the exponential function for all elements in `v`.
Special cases are:

* `Exp(+Inf) = +Inf`
* `Exp(NaN) = NaN`

## `floor()`

`floor(v instant-vector)` rounds the sample values of all elements in `v` down
to the nearest integer.

## `histogram_quantile()`

`histogram_quantile(φ float, b instant-vector)` calculates the φ-quantile (0 ≤ φ
≤ 1) from the buckets `b` of a
[histogram](https://prometheus.io/docs/concepts/metric_types/#histogram). (See
[histograms and summaries](https://prometheus.io/docs/practices/histograms) for
a detailed explanation of φ-quantiles and the usage of the histogram metric type
in general.) The samples in `b` are the counts of observations in each bucket.
Each sample must have a label `le` where the label value denotes the inclusive
upper bound of the bucket. (Samples without such a label are silently ignored.)
The [histogram metric type](https://prometheus.io/docs/concepts/metric_types/#histogram)
automatically provides time series with the `_bucket` suffix and the appropriate
labels.

Use the `rate()` function to specify the time window for the quantile
calculation.

Example: A histogram metric is called `http_request_duration_seconds`. To
calculate the 90th percentile of request durations over the last 10m, use the
following expression:

    histogram_quantile(0.9, rate(http_request_duration_seconds_bucket[10m]))

The quantile is calculated for each label combination in
`http_request_duration_seconds`. To aggregate, use the `sum()` aggregator
around the `rate()` function. Since the `le` label is required by
`histogram_quantile()`, it has to be included in the `by` clause. The following
expression aggregates the 90th percentile by `job`:

    histogram_quantile(0.9, sum(rate(http_request_duration_seconds_bucket[10m])) by (job, le))

To aggregate everything, specify only the `le` label:

    histogram_quantile(0.9, sum(rate(http_request_duration_seconds_bucket[10m])) by (le))

The `histogram_quantile()` function interpolates quantile values by
assuming a linear distribution within a bucket. The highest bucket
must have an upper bound of `+Inf`. (Otherwise, `NaN` is returned.) If
a quantile is located in the highest bucket, the upper bound of the
second highest bucket is returned. A lower limit of the lowest bucket
is assumed to be 0 if the upper bound of that bucket is greater than 0.
In that case, the usual linear interpolation is applied within that
bucket. Otherwise, the upper bound of the lowest bucket is returned
for quantiles located in the lowest bucket.

If `b` contains fewer than two buckets, `NaN` is returned. For φ < 0, `-Inf` is
returned. For φ > 1, `+Inf` is returned.
||||
|
||||
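The Python sketch below makes the interpolation rules above concrete for a single group of buckets. It assumes the buckets arrive as hypothetical `(upper_bound, cumulative_count)` pairs, one per `le` value, with non-decreasing counts. It follows the rules described in this section but is an illustration, not Prometheus's own code.

```python
import bisect
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs, one per
    `le` value; counts are assumed to be non-decreasing.
    """
    if q < 0:
        return -math.inf
    if q > 1:
        return math.inf
    if len(buckets) < 2:
        return math.nan
    buckets = sorted(buckets)
    if buckets[-1][0] != math.inf:
        return math.nan  # the highest bucket must have an upper bound of +Inf

    rank = q * buckets[-1][1]
    # First finite bucket whose cumulative count reaches the rank.
    finite_counts = [count for _, count in buckets[:-1]]
    b = bisect.bisect_left(finite_counts, rank)

    if b == len(buckets) - 1:
        # The quantile lies in the +Inf bucket: return the second-highest bound.
        return buckets[-2][0]
    if b == 0 and buckets[0][0] <= 0:
        # Lowest bucket with a non-positive bound: return its upper bound.
        return buckets[0][0]

    # Lower bound and cumulative count of everything below bucket b.
    start, below = (0.0, 0.0) if b == 0 else buckets[b - 1]
    end, count = buckets[b]
    if count == below:
        return math.nan  # empty bucket, nothing to interpolate (sketch choice)
    # Linear interpolation within the bucket [start, end].
    return start + (end - start) * (rank - below) / (count - below)
```

With the buckets `(0.1, 10)`, `(0.5, 60)`, `(1.0, 90)` and `(+Inf, 100)`, the median falls into the `0.5` bucket and is interpolated to about `0.42`. The 0.99-quantile lands in the `+Inf` bucket and is reported as `1.0`, which is the second-highest upper bound.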
## `holt_winters()`

`holt_winters(v range-vector, sf scalar, tf scalar)` produces a smoothed value
for time series based on the range in `v`. The lower the smoothing factor `sf`,
the more importance is given to old data. The higher the trend factor `tf`, the
more trends in the data are considered. Both `sf` and `tf` must be between 0
and 1.

`holt_winters` should only be used with gauges.

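The smoothing described above is Holt's linear (double exponential) smoothing. The Python sketch below shows the recurrence for one series. Seeding the level with the first sample and the trend with the difference of the first two samples is an assumption of this sketch. The `holt_winters` function here is illustrative and is not Prometheus's implementation.

```python
def holt_winters(values, sf, tf):
    """Double exponential smoothing of one series' sample values.

    `sf` is the smoothing factor and `tf` the trend factor, both in (0, 1).
    Returns the last smoothed value.
    """
    if len(values) < 2:
        raise ValueError("need at least two samples")
    if not (0 < sf < 1 and 0 < tf < 1):
        raise ValueError("sf and tf must be between 0 and 1")
    level = values[0]              # initial smoothed value (assumption)
    trend = values[1] - values[0]  # initial trend (assumption)
    for x in values[1:]:
        previous_level = level
        # A lower sf gives more weight to the previous (older) smoothed value.
        level = sf * x + (1 - sf) * (level + trend)
        # A higher tf lets the most recent change in level dominate the trend.
        trend = tf * (level - previous_level) + (1 - tf) * trend
    return level
```

A perfectly linear series such as `[1, 2, 3, 4, 5]` is reproduced exactly, because the trend term keeps up with it. A sudden jump such as `[0, 0, 10]` with `sf = 0.5` is only partially followed, producing `5.0`.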
## `hour()`

`hour(v=vector(time()) instant-vector)` returns the hour of the day
for each of the given times in UTC. Returned values are from 0 to 23.

## `idelta()`

`idelta(v range-vector)` calculates the difference between the last two samples
in the range vector `v`, returning an instant vector with the given deltas and
equivalent labels.

`idelta` should only be used with gauges.

## `increase()`

`increase(v range-vector)` calculates the increase in the
time series in the range vector. Breaks in monotonicity (such as counter
resets due to target restarts) are automatically adjusted for. The
increase is extrapolated to cover the full time range as specified
in the range vector selector, so that it is possible to get a
non-integer result even if a counter increases only by integer
increments.

The following example expression returns the number of HTTP requests as measured
over the last 5 minutes, per time series in the range vector:

```
increase(http_requests_total{job="api-server"}[5m])
```

`increase` should only be used with counters. It is syntactic sugar
for `rate(v)` multiplied by the number of seconds under the specified
time range window, and should be used primarily for human readability.
Use `rate` in recording rules so that increases are tracked consistently
on a per-second basis.

## `irate()`

`irate(v range-vector)` calculates the per-second instant rate of increase of
the time series in the range vector. This is based on the last two data points.
Breaks in monotonicity (such as counter resets due to target restarts) are
automatically adjusted for.

The following example expression returns the per-second rate of HTTP requests
looking up to 5 minutes back for the two most recent data points, per time
series in the range vector:

```
irate(http_requests_total{job="api-server"}[5m])
```

`irate` should only be used when graphing volatile, fast-moving counters.
Use `rate` for alerts and slow-moving counters, as brief changes
in the rate can reset the `FOR` clause and graphs consisting entirely of rare
spikes are hard to read.

Note that when combining `irate()` with an
[aggregation operator](operators.md#aggregation-operators) (e.g. `sum()`)
or a function aggregating over time (any function ending in `_over_time`),
always take an `irate()` first, then aggregate. Otherwise `irate()` cannot detect
counter resets when your target restarts.

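The Python sketch below shows how `idelta()` and `irate()` look only at the last two samples of a series. It assumes the range vector for one series is given as a list of `(timestamp_seconds, value)` pairs in time order. For the counter reset in `irate`, the sketch treats any decrease as a reset and counts the newer value as the increase since that reset. The functions are illustrative, not Prometheus's implementation.

```python
def idelta(samples):
    """Difference between the last two samples of a gauge, or None."""
    if len(samples) < 2:
        return None  # not enough data: no output sample
    (_, prev), (_, last) = samples[-2], samples[-1]
    return last - prev


def irate(samples):
    """Per-second rate from the last two samples of a counter, or None."""
    if len(samples) < 2:
        return None
    (t_prev, prev), (t_last, last) = samples[-2], samples[-1]
    if t_last == t_prev:
        return None
    # A decrease means the counter was reset; count from zero instead.
    delta = last if last < prev else last - prev
    return delta / (t_last - t_prev)
```

For the samples `(0, 10)`, `(15, 40)`, `(30, 100)`, `irate` ignores the first point and returns `60 / 15 = 4.0` per second.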
## `label_join()`

For each time series in `v`, `label_join(v instant-vector, dst_label string, separator string, src_label_1 string, src_label_2 string, ...)` joins all the values of all the `src_labels`
using `separator` and returns the time series with the label `dst_label` containing the joined value.
There can be any number of `src_labels` in this function.

This example will return a vector with each time series having a `foo` label with the value `a,b,c` added to it:

```
label_join(up{job="api-server",src1="a",src2="b",src3="c"}, "foo", ",", "src1", "src2", "src3")
```

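A minimal Python sketch of the same transformation. It assumes a hypothetical representation of an instant vector as a list of `(labels, value)` pairs, where `labels` is a dict. Treating a missing source label as an empty string is an assumption of this sketch.

```python
def label_join(series, dst_label, separator, *src_labels):
    """Return copies of `series` with `dst_label` set to the joined values."""
    result = []
    for labels, value in series:
        # Missing source labels contribute an empty string (assumption).
        joined = separator.join(labels.get(name, "") for name in src_labels)
        new_labels = dict(labels)  # leave the input untouched
        new_labels[dst_label] = joined
        result.append((new_labels, value))
    return result
```

Applied to the example above, the series with `src1="a"`, `src2="b"` and `src3="c"` gains `foo="a,b,c"`.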
## `label_replace()`

For each time series in `v`, `label_replace(v instant-vector, dst_label string,
replacement string, src_label string, regex string)` matches the regular
expression `regex` against the value of the label `src_label`. If it matches,
then the time series is returned with the label `dst_label` replaced by the
expansion of `replacement`. `$1` is replaced with the first matching subgroup,
`$2` with the second, and so on. If the regular expression doesn't match, then
the time series is returned unchanged.

This example will return a vector with each time series having a `foo`
label with the value `a` added to it:

```
label_replace(up{job="api-server",service="a:c"}, "foo", "$1", "service", "(.*):.*")
```

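The Python sketch below mirrors this behavior on the hypothetical `(labels, value)` representation. It makes two assumptions. First, the regular expression must match the entire label value (it is anchored). Second, only the `$N` form of group references is supported. Python spells group references `\g<N>`, so the sketch translates them before expanding.

```python
import re


def label_replace(series, dst_label, replacement, src_label, regex):
    """Return copies of `series` with `dst_label` set where `regex` matches."""
    pattern = re.compile(regex)
    # Translate $1, $2, ... into Python's \g<1>, \g<2>, ... syntax.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    result = []
    for labels, value in series:
        # Anchored match against the whole source label value (assumption).
        match = pattern.fullmatch(labels.get(src_label, ""))
        if match is None:
            result.append((labels, value))  # no match: series unchanged
            continue
        new_labels = dict(labels)
        new_labels[dst_label] = match.expand(template)
        result.append((new_labels, value))
    return result
```

With `service="a:c"` and the regex `(.*):.*`, the first group captures `a`, so `foo="a"` is added. The regex `(.*)-.*` does not match, and the series is returned as-is.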
## `ln()`

`ln(v instant-vector)` calculates the natural logarithm for all elements in `v`.
Special cases are:

* `ln(+Inf) = +Inf`
* `ln(0) = -Inf`
* `ln(x < 0) = NaN`
* `ln(NaN) = NaN`

## `log2()`

`log2(v instant-vector)` calculates the binary logarithm for all elements in `v`.
The special cases are equivalent to those in `ln`.

## `log10()`

`log10(v instant-vector)` calculates the decimal logarithm for all elements in `v`.
The special cases are equivalent to those in `ln`.

## `minute()`

`minute(v=vector(time()) instant-vector)` returns the minute of the hour for each
of the given times in UTC. Returned values are from 0 to 59.

## `month()`

`month(v=vector(time()) instant-vector)` returns the month of the year for each
of the given times in UTC. Returned values are from 1 to 12, where 1 means
January etc.

## `predict_linear()`

`predict_linear(v range-vector, t scalar)` predicts the value of time series
`t` seconds from now, based on the range vector `v`, using [simple linear
regression](http://en.wikipedia.org/wiki/Simple_linear_regression).

`predict_linear` should only be used with gauges.

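A minimal Python sketch of the idea: fit a least-squares line to the samples of one series, then read it off `t` seconds after the evaluation time. Timestamps are measured relative to a hypothetical `now` parameter, which stands in for the evaluation timestamp. With that choice, the intercept is the fitted value at evaluation time. This is an illustration, not Prometheus's code.

```python
def predict_linear(samples, t, now):
    """Predict a gauge's value `t` seconds after `now`, or None.

    `samples` is a list of (timestamp_seconds, value) pairs.
    """
    n = len(samples)
    if n < 2:
        return None
    xs = [ts - now for ts, _ in samples]  # time relative to evaluation time
    ys = [v for _, v in samples]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        return None  # all samples share one timestamp
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x  # fitted value at x = 0, i.e. `now`
    return intercept + slope * t
```

A gauge that drops from 100 to 80 over two minutes falls by 1/6 per second. Evaluated at the last sample and asked 600 seconds ahead, the sketch predicts `80 - 100 = -20`.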
## `rate()`

`rate(v range-vector)` calculates the per-second average rate of increase of the
time series in the range vector. Breaks in monotonicity (such as counter
resets due to target restarts) are automatically adjusted for. Also, the
calculation extrapolates to the ends of the time range, allowing for missed
scrapes or imperfect alignment of scrape cycles with the range's time period.

The following example expression returns the per-second rate of HTTP requests as measured
over the last 5 minutes, per time series in the range vector:

```
rate(http_requests_total{job="api-server"}[5m])
```

`rate` should only be used with counters. It is best suited for alerting,
and for graphing of slow-moving counters.

Note that when combining `rate()` with an aggregation operator (e.g. `sum()`)
or a function aggregating over time (any function ending in `_over_time`),
always take a `rate()` first, then aggregate. Otherwise `rate()` cannot detect
counter resets when your target restarts.

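The Python sketch below illustrates two points from the `increase()` and `rate()` descriptions. First, counter resets are compensated for. Second, the observed increase is extrapolated to the full range, which is why `increase()` can return a non-integer result. The extrapolation here is a deliberately simplified linear scaling and is an assumption of the sketch. Prometheus applies additional limits to how far it extrapolates towards the range boundaries, so real results can differ. All function names are hypothetical.

```python
def reset_adjusted_increase(values):
    """Sum of increases between consecutive counter values.

    A decrease is treated as a reset, so the post-reset value counts as the
    increase since the reset.
    """
    total = 0.0
    for prev, cur in zip(values, values[1:]):
        total += cur if cur < prev else cur - prev
    return total


def simple_increase(samples, range_seconds):
    """Simplified increase(): scale the observed increase to the full range.

    `samples` is a list of (timestamp_seconds, value) pairs inside the range.
    """
    if len(samples) < 2:
        return None
    covered = samples[-1][0] - samples[0][0]  # time actually spanned by samples
    if covered <= 0:
        return None
    observed = reset_adjusted_increase([v for _, v in samples])
    return observed * range_seconds / covered


def simple_rate(samples, range_seconds):
    """Simplified rate(): the increase divided by the range in seconds."""
    inc = simple_increase(samples, range_seconds)
    return None if inc is None else inc / range_seconds
```

Samples one minute apart that rise by exactly 1 each time (`0, 1, 2` over 120 seconds) yield `2.5` for a 150-second range, because the two observed increments are scaled up to cover the whole window.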
## `resets()`

For each input time series, `resets(v range-vector)` returns the number of
counter resets within the provided time range as an instant vector. Any
decrease in the value between two consecutive samples is interpreted as a
counter reset.

`resets` should only be used with counters.

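The rule above fits in a few lines of Python. This sketch assumes the sample values of one series are given as a plain list in time order. An unchanged value does not count as a reset.

```python
def resets(values):
    """Count decreases between consecutive samples of one counter series."""
    return sum(1 for prev, cur in zip(values, values[1:]) if cur < prev)
```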
## `round()`

`round(v instant-vector, to_nearest=1 scalar)` rounds the sample values of all
elements in `v` to the nearest integer. Ties are resolved by rounding up. The
optional `to_nearest` argument allows specifying the nearest multiple to which
the sample values should be rounded. This multiple may also be a fraction.

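A Python sketch of rounding to the nearest multiple with ties rounded up (towards +Inf). Scaling by the inverse of `to_nearest` and applying `floor(x + 0.5)` is the approach assumed here, and `prom_round` is a hypothetical name. Python's built-in `round()` cannot be used directly because it rounds ties to the nearest even number.

```python
import math


def prom_round(value, to_nearest=1.0):
    """Round `value` to the nearest multiple of `to_nearest`, ties upwards."""
    inverse = 1.0 / to_nearest
    # floor(x + 0.5) sends exact halves towards +Inf (2.5 -> 3, -2.5 -> -2).
    return math.floor(value * inverse + 0.5) / inverse
```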
## `scalar()`

Given a single-element input vector, `scalar(v instant-vector)` returns the
sample value of that single element as a scalar. If the input vector does not
have exactly one element, `scalar` will return `NaN`.

## `sort()`

`sort(v instant-vector)` returns vector elements sorted by their sample values,
in ascending order.

## `sort_desc()`

Same as `sort`, but sorts in descending order.

## `sqrt()`

`sqrt(v instant-vector)` calculates the square root of all elements in `v`.

## `time()`

`time()` returns the number of seconds since January 1, 1970 UTC. Note that
this does not actually return the current time, but the time at which the
expression is to be evaluated.

## `vector()`

`vector(s scalar)` returns the scalar `s` as a vector with no labels.

## `year()`

`year(v=vector(time()) instant-vector)` returns the year
for each of the given times in UTC.

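`hour()`, `minute()`, `month()` and `year()` all interpret sample values as Unix timestamps in seconds and read the calendar fields in UTC. When no argument is given, the default `vector(time())` makes that timestamp the evaluation time. The Python sketch below performs the same conversion for a single timestamp with the standard library. The helper name `date_parts` is hypothetical.

```python
from datetime import datetime, timezone


def date_parts(timestamp):
    """UTC calendar fields of a Unix timestamp in seconds."""
    dt = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return {
        "year": dt.year,
        "month": dt.month,    # 1..12, where 1 means January
        "hour": dt.hour,      # 0..23
        "minute": dt.minute,  # 0..59
    }
```

For example, the timestamp `1500000000` corresponds to 2017-07-14 02:40 UTC, so the sketch reports month 7, hour 2 and minute 40.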
## `<aggregation>_over_time()`

The following functions allow aggregating each series of a given range vector
over time and return an instant vector with per-series aggregation results:

* `avg_over_time(range-vector)`: the average value of all points in the specified interval.
* `min_over_time(range-vector)`: the minimum value of all points in the specified interval.
* `max_over_time(range-vector)`: the maximum value of all points in the specified interval.
* `sum_over_time(range-vector)`: the sum of all values in the specified interval.
* `count_over_time(range-vector)`: the count of all values in the specified interval.
* `quantile_over_time(scalar, range-vector)`: the φ-quantile (0 ≤ φ ≤ 1) of the values in the specified interval.
* `stddev_over_time(range-vector)`: the population standard deviation of the values in the specified interval.
* `stdvar_over_time(range-vector)`: the population variance of the values in the specified interval.

Note that all values in the specified interval have the same weight in the
aggregation even if the values are not equally spaced throughout the interval.

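The Python sketch below shows what each `*_over_time` aggregation computes for the sample values of one series. It assumes the values have already been extracted from the range. Timestamps are ignored, which matches the equal-weight behavior noted above. For the quantile, linear interpolation between the closest ranks is an assumption of this sketch. `quantile` and `over_time` are hypothetical helper names.

```python
import math


def quantile(q, values):
    """φ-quantile with linear interpolation between closest ranks."""
    if not values:
        return math.nan
    if q < 0:
        return -math.inf
    if q > 1:
        return math.inf
    ordered = sorted(values)
    rank = q * (len(ordered) - 1)
    lower = math.floor(rank)
    upper = min(lower + 1, len(ordered) - 1)
    weight = rank - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight


def over_time(values):
    """Per-series aggregations over time; assumes at least one sample."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n  # population variance
    return {
        "avg": mean,
        "min": min(values),
        "max": max(values),
        "sum": sum(values),
        "count": n,
        "stdvar": variance,
        "stddev": math.sqrt(variance),
    }
```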
@ -0,0 +1,4 @@
---
title: Querying
sort_rank: 4
---
@ -0,0 +1,250 @@
---
title: Operators
sort_rank: 2
---

# Operators

## Binary operators

Prometheus's query language supports basic logical and arithmetic operators.
For operations between two instant vectors, the [matching behavior](#vector-matching)
can be modified.

### Arithmetic binary operators

The following binary arithmetic operators exist in Prometheus:

* `+` (addition)
* `-` (subtraction)
* `*` (multiplication)
* `/` (division)
* `%` (modulo)
* `^` (power/exponentiation)

Binary arithmetic operators are defined between scalar/scalar, vector/scalar,
and vector/vector value pairs.

**Between two scalars**, the behavior is obvious: they evaluate to another
scalar that is the result of the operator applied to both scalar operands.

**Between an instant vector and a scalar**, the operator is applied to the
value of every data sample in the vector. E.g. if a time series instant vector
is multiplied by 2, the result is another vector in which every sample value of
the original vector is multiplied by 2.

**Between two instant vectors**, a binary arithmetic operator is applied to
each entry in the left-hand-side vector and its [matching element](#vector-matching)
in the right-hand-side vector. The result is propagated into the result vector and the metric
name is dropped. Entries for which no matching entry in the right-hand-side vector can be
found are not part of the result.

### Comparison binary operators

The following binary comparison operators exist in Prometheus:

* `==` (equal)
* `!=` (not-equal)
* `>` (greater-than)
* `<` (less-than)
* `>=` (greater-or-equal)
* `<=` (less-or-equal)

Comparison operators are defined between scalar/scalar, vector/scalar,
and vector/vector value pairs. By default they filter. Their behavior can be
modified by providing `bool` after the operator, which will return `0` or `1`
for the value rather than filtering.

**Between two scalars**, the `bool` modifier must be provided and these
operators result in another scalar that is either `0` (`false`) or `1`
(`true`), depending on the comparison result.

**Between an instant vector and a scalar**, these operators are applied to the
value of every data sample in the vector, and vector elements for which the
comparison result is `false` get dropped from the result vector. If the `bool`
modifier is provided, vector elements that would be dropped instead have the value
`0` and vector elements that would be kept have the value `1`.

**Between two instant vectors**, these operators behave as a filter by default,
applied to matching entries. Vector elements for which the expression is not
true or which do not find a match on the other side of the expression get
dropped from the result, while the others are propagated into a result vector
with their original (left-hand-side) metric names and label values.
If the `bool` modifier is provided, vector elements that would have been
dropped instead have the value `0` and vector elements that would be kept have
the value `1` with the left-hand-side metric names and label values.

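The Python sketch below contrasts filtering with the `bool` modifier when an instant vector is compared with a scalar. It uses a hypothetical representation of the vector as a list of `(labels, value)` pairs. Labels are passed through unchanged, and the sketch does not model how Prometheus treats metric names.

```python
import operator

COMPARISONS = {
    "==": operator.eq, "!=": operator.ne, ">": operator.gt,
    "<": operator.lt, ">=": operator.ge, "<=": operator.le,
}


def compare_vector_scalar(vector, op, scalar, bool_modifier=False):
    """Compare each element's value with `scalar`.

    Without `bool`, elements whose comparison is false are dropped.
    With `bool`, every element is kept with the value 1.0 or 0.0.
    """
    cmp = COMPARISONS[op]
    result = []
    for labels, value in vector:
        ok = cmp(value, scalar)
        if bool_modifier:
            result.append((labels, 1.0 if ok else 0.0))
        elif ok:
            result.append((labels, value))
    return result
```

For a vector with the values `5` and `1`, `> 3` keeps only the element with the value `5`. `> bool 3` keeps both elements, with the values `1` and `0`.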
### Logical/set binary operators

These logical/set binary operators are only defined between instant vectors:

* `and` (intersection)
* `or` (union)
* `unless` (complement)

`vector1 and vector2` results in a vector consisting of the elements of
`vector1` for which there are elements in `vector2` with exactly matching
label sets. Other elements are dropped. The metric name and values are carried
over from the left-hand-side vector.

`vector1 or vector2` results in a vector that contains all original elements
(label sets + values) of `vector1` and additionally all elements of `vector2`
which do not have matching label sets in `vector1`.

`vector1 unless vector2` results in a vector consisting of the elements of
`vector1` for which there are no elements in `vector2` with exactly matching
label sets. All matching elements in both vectors are dropped.

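The Python sketch below models the three set operators on a hypothetical list of `(labels, value)` pairs. It assumes each labels dict omits the metric name, so two elements "match" when their remaining label sets are identical. The function names are illustrative.

```python
def label_key(labels):
    """Hashable identity of a label set."""
    return frozenset(labels.items())


def vector_and(lhs, rhs):
    """Elements of lhs whose label set also appears in rhs."""
    keys = {label_key(labels) for labels, _ in rhs}
    return [(l, v) for l, v in lhs if label_key(l) in keys]


def vector_or(lhs, rhs):
    """All of lhs, plus elements of rhs whose label set is not in lhs."""
    keys = {label_key(labels) for labels, _ in lhs}
    return lhs + [(l, v) for l, v in rhs if label_key(l) not in keys]


def vector_unless(lhs, rhs):
    """Elements of lhs whose label set does not appear in rhs."""
    keys = {label_key(labels) for labels, _ in rhs}
    return [(l, v) for l, v in lhs if label_key(l) not in keys]
```

In each case the values of matched elements come from the left-hand side, as described above.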
## Vector matching

Operations between vectors attempt to find a matching element in the right-hand-side
vector for each entry in the left-hand side. There are two basic types of
matching behavior:

**One-to-one** finds a unique pair of entries from each side of the operation.
In the default case, that is an operation following the format `vector1 <operator> vector2`.
Two entries match if they have the exact same set of labels and corresponding values.
The `ignoring` keyword allows ignoring certain labels when matching, while the
`on` keyword allows reducing the set of considered labels to a provided list:

```
<vector expr> <bin-op> ignoring(<label list>) <vector expr>
<vector expr> <bin-op> on(<label list>) <vector expr>
```

Example input:

```
method_code:http_errors:rate5m{method="get", code="500"}  24
method_code:http_errors:rate5m{method="get", code="404"}  30
method_code:http_errors:rate5m{method="put", code="501"}  3
method_code:http_errors:rate5m{method="post", code="500"} 6
method_code:http_errors:rate5m{method="post", code="404"} 21

method:http_requests:rate5m{method="get"}  600
method:http_requests:rate5m{method="del"}  34
method:http_requests:rate5m{method="post"} 120
```

Example query:

```
method_code:http_errors:rate5m{code="500"} / ignoring(code) method:http_requests:rate5m
```

This returns a result vector containing the fraction of HTTP requests with status code
of 500 for each method, as measured over the last 5 minutes. Without `ignoring(code)` there
would have been no match as the metrics do not share the same set of labels.
The entries with methods `put` and `del` have no match and will not show up in the result:

```
{method="get"}  0.04            //  24 / 600
{method="post"} 0.05            //   6 / 120
```

**Many-to-one** and **one-to-many** matchings refer to the case where each vector element on
the "one"-side can match with multiple elements on the "many"-side. This has to
be explicitly requested using the `group_left` or `group_right` modifier, where
left/right determines which vector has the higher cardinality.

```
<vector expr> <bin-op> ignoring(<label list>) group_left(<label list>) <vector expr>
<vector expr> <bin-op> ignoring(<label list>) group_right(<label list>) <vector expr>
<vector expr> <bin-op> on(<label list>) group_left(<label list>) <vector expr>
<vector expr> <bin-op> on(<label list>) group_right(<label list>) <vector expr>
```

The label list provided with the group modifier contains additional labels from
the "one"-side to be included in the result metrics. For `on` a label can only
appear in one of the lists. Every time series of the result vector must be
uniquely identifiable.

_Grouping modifiers can only be used for
[comparison](#comparison-binary-operators) and
[arithmetic](#arithmetic-binary-operators) operations. Operations such as `and`,
`unless`, and `or` match with all possible entries in the right vector by
default._

Example query:

```
method_code:http_errors:rate5m / ignoring(code) group_left method:http_requests:rate5m
```

In this case the left vector contains more than one entry per `method` label
value. Thus, we indicate this using `group_left`. The elements from the right
side are now matched with multiple elements with the same `method` label on the
left:

```
{method="get", code="500"}  0.04            //  24 / 600
{method="get", code="404"}  0.05            //  30 / 600
{method="post", code="500"} 0.05            //   6 / 120
{method="post", code="404"} 0.175           //  21 / 120
```

_Many-to-one and one-to-many matching are advanced use cases that should be carefully considered.
Often a proper use of `ignoring(<labels>)` provides the desired outcome._

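The Python sketch below reproduces both examples in this section with the division operator. It assumes each side is a list of `(labels, value)` pairs with metric names already removed. Only `ignoring(...)` is supported, and extra labels in `group_left(<label list>)` are not modelled. The function name `match_divide` is hypothetical.

```python
def match_divide(lhs, rhs, ignoring, group_left=False):
    """Divide lhs by rhs using ignoring(...) vector matching.

    Without group_left, each match key must be unique on both sides
    (one-to-one). With group_left, many lhs elements may share one rhs element.
    """
    def key(labels):
        return frozenset((k, v) for k, v in labels.items() if k not in ignoring)

    rhs_by_key = {}
    for labels, value in rhs:
        k = key(labels)
        if k in rhs_by_key:
            raise ValueError("duplicate series on the right-hand side")
        rhs_by_key[k] = value

    result, seen = [], set()
    for labels, value in lhs:
        k = key(labels)
        if k not in rhs_by_key:
            continue  # no match: not part of the result
        if group_left:
            out_labels = dict(labels)  # keep the "many"-side labels
        else:
            if k in seen:
                raise ValueError("many-to-one match requires group_left")
            seen.add(k)
            out_labels = dict(k)  # one-to-one: ignored labels are dropped
        result.append((out_labels, value / rhs_by_key[k]))
    return result
```

Without `group_left`, dividing all error series (which contain two `get` entries) raises an error, much as Prometheus rejects a many-to-one match that was not explicitly requested.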
## Aggregation operators

Prometheus supports the following built-in aggregation operators that can be
used to aggregate the elements of a single instant vector, resulting in a new
vector of fewer elements with aggregated values:

* `sum` (calculate sum over dimensions)
* `min` (select minimum over dimensions)
* `max` (select maximum over dimensions)
* `avg` (calculate the average over dimensions)
* `stddev` (calculate population standard deviation over dimensions)
* `stdvar` (calculate population variance over dimensions)
* `count` (count number of elements in the vector)
* `count_values` (count number of elements with the same value)
* `bottomk` (smallest k elements by sample value)
* `topk` (largest k elements by sample value)
* `quantile` (calculate φ-quantile (0 ≤ φ ≤ 1) over dimensions)

These operators can either be used to aggregate over **all** label dimensions
or preserve distinct dimensions by including a `without` or `by` clause.

```
<aggr-op>([parameter,] <vector expression>) [without|by (<label list>)] [keep_common]
```

`parameter` is only required for `count_values`, `quantile`, `topk` and
`bottomk`. `without` removes the listed labels from the result vector, while
all other labels are preserved in the output. `by` does the opposite and drops
labels that are not listed in the `by` clause, even if their label values are
identical between all elements of the vector. The `keep_common` clause allows
keeping those extra labels (labels that are identical between elements, but not
in the `by` clause).

`count_values` outputs one time series per unique sample value. Each series has
an additional label. The name of that label is given by the aggregation
parameter, and the label value is the unique sample value. The value of each
time series is the number of times that sample value was present.

`topk` and `bottomk` are different from other aggregators in that a subset of
the input samples, including the original labels, are returned in the result
vector. `by` and `without` are only used to bucket the input vector.

Example:

If the metric `http_requests_total` had time series that fan out by
`application`, `instance`, and `group` labels, we could calculate the total
number of seen HTTP requests per application and group over all instances via:

```
sum(http_requests_total) without (instance)
```

If we are just interested in the total of HTTP requests we have seen in **all**
applications, we could simply write:

```
sum(http_requests_total)
```

To count the number of binaries running each build version we could write:

```
count_values("version", build_version)
```

To get the 5 largest HTTP request counts across all instances we could write:

```
topk(5, http_requests_total)
```

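The Python sketch below shows three of these operators on a hypothetical list of `(labels, value)` pairs with metric names omitted. It covers `sum` with `by`/`without` grouping, `count_values`, and `topk`. Formatting the label value in `count_values` is simplified here and is an assumption. All function names are illustrative.

```python
import heapq
from collections import defaultdict


def group_key(labels, by=None, without=None):
    """Labels that survive aggregation, as a sorted tuple usable as a key."""
    if without is not None:
        kept = {k: v for k, v in labels.items() if k not in without}
    elif by is not None:
        kept = {k: v for k, v in labels.items() if k in by}
    else:
        kept = {}  # no clause: aggregate over all label dimensions
    return tuple(sorted(kept.items()))


def sum_agg(vector, by=None, without=None):
    """sum(...) [by|without (...)]: total value per surviving label set."""
    totals = defaultdict(float)
    for labels, value in vector:
        totals[group_key(labels, by, without)] += value
    return dict(totals)


def count_values(label, vector):
    """One output element per distinct value, labelled with that value."""
    counts = defaultdict(int)
    for _, value in vector:
        counts[value] += 1

    def fmt(v):
        # Simplified formatting: integral values are written without ".0".
        return str(int(v)) if float(v).is_integer() else repr(float(v))

    return [({label: fmt(v)}, n) for v, n in counts.items()]


def topk(k, vector):
    """The k largest elements by value, keeping their original labels."""
    return heapq.nlargest(k, vector, key=lambda item: item[1])
```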
## Binary operator precedence

The following list shows the precedence of binary operators in Prometheus, from
highest to lowest.

1. `^`
2. `*`, `/`, `%`
3. `+`, `-`
4. `==`, `!=`, `<=`, `<`, `>=`, `>`
5. `and`, `unless`
6. `or`

Operators on the same precedence level are left-associative. For example,
`2 * 3 % 2` is equivalent to `(2 * 3) % 2`. However `^` is right-associative,
so `2 ^ 3 ^ 2` is equivalent to `2 ^ (3 ^ 2)`.

@ -0,0 +1,66 @@
---
title: Recording rules
sort_rank: 6
---

# Defining recording rules

## Configuring rules

Prometheus supports two types of rules which may be configured and then
evaluated at regular intervals: recording rules and
[alerting rules](https://prometheus.io/docs/alerting/rules/). To include rules
in Prometheus, create a file containing the necessary rule statements and have
Prometheus load the file via the `rule_files` field in the
[Prometheus configuration](../configuration.md).

The rule files can be reloaded at runtime by sending `SIGHUP` to the Prometheus
process. The changes are only applied if all rule files are well-formatted.

## Syntax-checking rules

To quickly check whether a rule file is syntactically correct without starting
a Prometheus server, install and run Prometheus's `promtool` command-line
utility:

```bash
go get github.com/prometheus/prometheus/cmd/promtool
promtool check-rules /path/to/example.rules
```

When the file is syntactically valid, the checker prints a textual
representation of the parsed rules to standard output and then exits with
a `0` return status.

If there are any syntax errors, it prints an error message to standard error
and exits with a `1` return status. On invalid input arguments the exit status
is `2`.

## Recording rules

Recording rules allow you to precompute frequently needed or computationally
expensive expressions and save their result as a new set of time series.
Querying the precomputed result will then often be much faster than executing
the original expression every time it is needed. This is especially useful for
dashboards, which need to query the same expression repeatedly every time they
refresh.

To add a new recording rule, add a line of the following syntax to your rule
file:

```
<new time series name>[{<label overrides>}] = <expression to record>
```

Some examples:

```
# Saving the per-job HTTP in-progress request count as a new set of time series:
job:http_inprogress_requests:sum = sum(http_inprogress_requests) by (job)

# Drop or rewrite labels in the result time series:
new_time_series{label_to_change="new_value",label_to_drop=""} = old_time_series
```

Recording rules are evaluated at the interval specified by the
`evaluation_interval` field in the Prometheus configuration. During each
evaluation cycle, the right-hand-side expression of the rule statement is
evaluated at the current instant in time and the resulting sample vector is
stored as a new set of time series with the current timestamp and a new metric
name (and perhaps an overridden set of labels).