mirror of https://github.com/grafana/loki
Documentation Rewrite (#982)
* docs: create structure of docs overhaul

  This commit removes all old docs and lays out the table of contents and
  framework for how the new documentation is intended to be read.

* docs: add design docs back in
* docs: add community documentation
* docs: add LogQL docs
* docs: port existing operations documentation
* docs: add new placeholder file for promtail configuration docs
* docs: add TOC for operations/storage
* docs: add Loki API documentation
* docs: port troubleshooting document
* docs: add docker-driver documentation
* docs: link to configuration from main docker-driver document
* docs: update API for new paths
* docs: fix broken links in api.md and remove json marker from examples
* docs: incorporate api changes from #1009
* docs: port promtail documentation
* docs: add TOC to promtail configuration reference
* docs: fix promtail spelling errors
* docs: add loki configuration reference
* docs: add TOC to configuration
* docs: add loki configuration example
* docs: add Loki overview with brief explanation about each component
* docs: add comparisons document
* docs: add info on table manager and update storage/README.md
* docs: add getting started
* docs: incorporate config yaml changes from #755
* docs: fix typo in releases url for promtail
* docs: add installation instructions
* docs: add more configuration examples
* docs: add information on fluentd client

  fluent-bit has been temporarily removed until the PR for it is merged.

* docs: PR review feedback
* docs: add architecture document
* docs: add missing information from old docs
* `localy` typo

  Co-Authored-By: Ed Welch <ed@oqqer.com>

* docs: s/ran/run/g
* Typo
* Typo
* Tyop
* Typo
* docs: fixed typo
* docs: PR feedback
* docs: @cyriltovena PR feedback
* docs: add more details to promtail url config option
* docs: expand promtail's pipelines document with extra detail
* docs: remove reference to Stage interface in pipelines.md
* docs: fixed some spelling
* docs: clarify promtail configuration and scraping
* docs: attempt #2 at explaining promtail's usage of machine hostname
* docs: spelling fixes
* docs: add reference to promtail custom metrics and fix silly typo
* docs: cognizant -> aware
* docs: typo
* docs: typos
* docs: add which components expose which API endpoints in microservices mode
* docs: change ksonnet installation to tanka
* docs: address most @pracucci feedback
* docs: fix all spelling errors so reviewers don't have to keep finding them :)
* docs: incorporate changes to API endpoints made in #1022
* docs: add missing loki metrics
* docs: add missing promtail metrics
* docs: @pstribrany feedback
* docs: more @pracucci feedback
* docs: move metrics into a table
* docs: update push path references to /loki/api/v1/push
* docs: add detail to further explain limitations of monolithic mode
* docs: add alternative names to modes_of_operation diagram
* docs: add log ordering requirement
* docs: add procedure for updating docs with latest version
* docs: separate out stages documentation into one document per stage
* docs: list supported stores in storage documentation
* docs: add info on duplicate log lines in pipelines
* docs: add line_format as key feature to fluentd
* docs: hopefully final commit :)

parent f755c59011
commit 65ba42a6e7

@@ -1,42 +1,56 @@

# Loki Documentation

<p align="center"> <img src="logo_and_name.png" alt="Loki Logo"> <br>
<small>Like Prometheus, but for logs!</small> </p>

Grafana Loki is a set of components that can be composed into a fully featured
logging stack.

Unlike other logging systems, Loki is built around the idea of only indexing
metadata about your logs: labels (just like Prometheus labels). Log data itself
is then compressed and stored in chunks in object stores such as S3 or GCS, or
even locally on the filesystem. A small index and highly compressed chunks
simplify operation and significantly lower the cost of Loki.

## Components

- **[Loki](loki/README.md)**: The main server component is called Loki. It is
  responsible for permanently storing the logs shipped to it, and it executes
  the LogQL queries from clients. Loki shares its high-level architecture with
  Cortex, a highly scalable Prometheus backend.
- **[Promtail](promtail/README.md)**: To ship logs to a central place, an
  agent is required. Promtail is deployed to every node that should be
  monitored and sends the logs to Loki. It also performs the important task of
  pre-processing log lines, including attaching labels to them for easier
  querying.
- *Grafana*: The *Explore* feature of Grafana 6.0+ is the primary place of
  contact between a human and Loki. It is used for discovering and analyzing
  logs.

Alongside these main components, there are some others as well:

- **[LogCLI](logcli.md)**: A command line interface to query logs and labels
  from Loki.
- **[Canary](canary/README.md)**: An audit utility to analyze the log-capturing
  performance of Loki. It ingests data into Loki and immediately reads it back
  to check for latency and loss.
- **[Docker Driver](https://github.com/grafana/loki/tree/master/cmd/docker-driver)**:
  A Docker [log driver](https://docs.docker.com/config/containers/logging/configure/)
  to ship logs captured by Docker directly to Loki, without the need for an agent.
- **[Fluentd Plugin](https://github.com/grafana/loki/tree/master/fluentd/fluent-plugin-grafana-loki)**:
  A Fluentd [output plugin](https://docs.fluentd.org/output) to ship logs to
  Loki using Fluentd.

## Table of Contents

1. [Overview](overview/README.md)
    1. [Comparison to other Log Systems](overview/comparisons.md)
2. [Installation](installation/README.md)
    1. [Installing with Tanka](installation/tanka.md)
    2. [Installing with Helm](installation/helm.md)
    3. [Installing Locally](installation/local.md)
3. [Getting Started](getting-started/README.md)
    1. [Grafana](getting-started/grafana.md)
    2. [LogCLI](getting-started/logcli.md)
    3. [Troubleshooting](getting-started/troubleshooting.md)
4. [Configuration](configuration/README.md)
    1. [Examples](configuration/examples.md)
5. [Clients](clients/README.md)
    1. [Promtail](clients/promtail/README.md)
        1. [Installation](clients/promtail/installation.md)
        2. [Configuration](clients/promtail/configuration.md)
        3. [Scraping](clients/promtail/scraping.md)
        4. [Pipelines](clients/promtail/pipelines.md)
        5. [Troubleshooting](clients/promtail/troubleshooting.md)
    2. [Docker Driver](clients/docker-driver/README.md)
        1. [Configuration](clients/docker-driver/configuration.md)
    3. [Fluentd](clients/fluentd.md)
6. [LogQL](logql.md)
7. [Operations](operations/README.md)
    1. [Authentication](operations/authentication.md)
    2. [Observability](operations/observability.md)
    3. [Scalability](operations/scalability.md)
    4. [Storage](operations/storage/README.md)
        1. [Table Manager](operations/storage/table-manager.md)
        2. [Retention](operations/storage/retention.md)
    5. [Multi-tenancy](operations/multi-tenancy.md)
    6. [Loki Canary](operations/loki-canary.md)
8. [HTTP API](api.md)
9. [Architecture](architecture.md)
10. [Community](community/README.md)
    1. [Governance](community/governance.md)
    2. [Getting in Touch](community/getting-in-touch.md)
    3. [Contributing to Loki](community/contributing.md)
11. [Loki Maintainers Guide](./maintaining/README.md)
    1. [Releasing Loki](./maintaining/release.md)

@@ -0,0 +1,639 @@

# Loki's HTTP API

Loki exposes an HTTP API for pushing, querying, and tailing log data.
Note that [authenticating](operations/authentication.md) against the API is
out of scope for Loki.

The HTTP API includes the following endpoints:

- [`GET /loki/api/v1/query`](#get-lokiapiv1query)
- [`GET /loki/api/v1/query_range`](#get-lokiapiv1query_range)
- [`GET /loki/api/v1/label`](#get-lokiapiv1label)
- [`GET /loki/api/v1/label/<name>/values`](#get-lokiapiv1labelnamevalues)
- [`GET /loki/api/v1/tail`](#get-lokiapiv1tail)
- [`POST /loki/api/v1/push`](#post-lokiapiv1push)
- [`GET /api/prom/tail`](#get-apipromtail)
- [`GET /api/prom/query`](#get-apipromquery)
- [`GET /ready`](#get-ready)
- [`POST /flush`](#post-flush)
- [`GET /metrics`](#get-metrics)

## Microservices Mode

When deploying Loki in microservices mode, the set of endpoints exposed by each
component is different.

These endpoints are exposed by all components:

- [`GET /ready`](#get-ready)
- [`GET /metrics`](#get-metrics)

These endpoints are exposed by just the querier:

- [`GET /loki/api/v1/query`](#get-lokiapiv1query)
- [`GET /loki/api/v1/query_range`](#get-lokiapiv1query_range)
- [`GET /loki/api/v1/label`](#get-lokiapiv1label)
- [`GET /loki/api/v1/label/<name>/values`](#get-lokiapiv1labelnamevalues)
- [`GET /loki/api/v1/tail`](#get-lokiapiv1tail)
- [`GET /api/prom/tail`](#get-apipromtail)
- [`GET /api/prom/query`](#get-apipromquery)

While these endpoints are exposed by just the distributor:

- [`POST /loki/api/v1/push`](#post-lokiapiv1push)

And these endpoints are exposed by just the ingester:

- [`POST /flush`](#post-flush)

The API endpoints starting with `/loki/` are
[Prometheus API-compatible](https://prometheus.io/docs/prometheus/latest/querying/api/)
and the result formats can be used interchangeably.

[Example clients](#example-clients) can be found at the bottom of this document.

## Matrix, Vector, And Streams

Some Loki API endpoints return a result of a matrix, a vector, or a stream:

- Matrix: a table of values where each row represents a different label set
  and the columns are each sample value for that row over the queried time.
  Matrix types are only returned when running a query that computes some value.

- Instant Vector: denoted in the type as just `vector`, an Instant Vector
  represents the latest value of a calculation for a given labelset. Instant
  Vectors are only returned when doing a query against a single point in
  time.

- Stream: a Stream is a set of all values (logs) for a given label set over the
  queried time range. Streams are the only type that will result in log lines
  being returned.

## `GET /loki/api/v1/query`

`/loki/api/v1/query` allows for doing queries against a single point in time. The URL
query parameters support the following values:

- `query`: The [LogQL](./logql.md) query to perform
- `limit`: The max number of entries to return
- `time`: The evaluation time for the query as a nanosecond Unix epoch. Defaults to now.
- `direction`: Determines the sort order of logs. Supported values are `forward` or `backward`. Defaults to `backward`.

In microservices mode, `/loki/api/v1/query` is exposed by the querier.

Response:

```
{
  "status": "success",
  "data": {
    "resultType": "vector" | "streams",
    "result": [<vector value>] | [<stream value>]
  }
}
```

Where `<vector value>` is:

```
{
  "metric": {
    <label key-value pairs>
  },
  "value": [
    <number: nanosecond unix epoch>,
    <string: value>
  ]
}
```

And `<stream value>` is:

```
{
  "stream": {
    <label key-value pairs>
  },
  "values": [
    [
      <string: nanosecond unix epoch>,
      <string: log line>
    ],
    ...
  ]
}
```

### Examples

```bash
$ curl -G -s "http://localhost:3100/loki/api/v1/query" --data-urlencode 'query=sum(rate({job="varlogs"}[10m])) by (level)' | jq
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {},
        "value": [
          1559848867745737,
          "1267.1266666666666"
        ]
      },
      {
        "metric": {
          "level": "warn"
        },
        "value": [
          1559848867745737,
          "37.77166666666667"
        ]
      },
      {
        "metric": {
          "level": "info"
        },
        "value": [
          1559848867745737,
          "37.69"
        ]
      }
    ]
  }
}
```

```bash
$ curl -G -s "http://localhost:3100/loki/api/v1/query" --data-urlencode 'query={job="varlogs"}' | jq
{
  "status": "success",
  "data": {
    "resultType": "streams",
    "result": [
      {
        "stream": {
          "filename": "/var/log/myproject.log",
          "job": "varlogs",
          "level": "info"
        },
        "values": [
          [
            "1568234281726420425",
            "foo"
          ],
          [
            "1568234269716526880",
            "bar"
          ]
        ]
      }
    ]
  }
}
```

## `GET /loki/api/v1/query_range`

`/loki/api/v1/query_range` is used to do a query over a range of time and
accepts the following query parameters in the URL:

- `query`: The [LogQL](./logql.md) query to perform
- `limit`: The max number of entries to return
- `start`: The start time for the query as a nanosecond Unix epoch. Defaults to one hour ago.
- `end`: The end time for the query as a nanosecond Unix epoch. Defaults to now.
- `step`: Query resolution step width in seconds. Defaults to 1.
- `direction`: Determines the sort order of logs. Supported values are `forward` or `backward`. Defaults to `backward`.

Requests against this endpoint require Loki to query the index store in order to
find log streams for particular labels. Because the index store is spread out by
time, the time span covered by `start` and `end`, if large, may cause additional
load against the index server and result in a slow query.

In microservices mode, `/loki/api/v1/query_range` is exposed by the querier.

Response:

```
{
  "status": "success",
  "data": {
    "resultType": "matrix" | "streams",
    "result": [<matrix value>] | [<stream value>]
  }
}
```

Where `<matrix value>` is:

```
{
  "metric": {
    <label key-value pairs>
  },
  "values": [
    [
      <number: nanosecond unix epoch>,
      <string: value>
    ],
    ...
  ]
}
```

And `<stream value>` is:

```
{
  "stream": {
    <label key-value pairs>
  },
  "values": [
    [
      <string: nanosecond unix epoch>,
      <string: log line>
    ],
    ...
  ]
}
```

### Examples

```bash
$ curl -G -s "http://localhost:3100/loki/api/v1/query_range" --data-urlencode 'query=sum(rate({job="varlogs"}[10m])) by (level)' --data-urlencode 'step=300' | jq
{
  "status": "success",
  "data": {
    "resultType": "matrix",
    "result": [
      {
        "metric": {
          "level": "info"
        },
        "values": [
          [
            1559848958663735,
            "137.95"
          ],
          [
            1559849258663735,
            "467.115"
          ],
          [
            1559849558663735,
            "658.8516666666667"
          ]
        ]
      },
      {
        "metric": {
          "level": "warn"
        },
        "values": [
          [
            1559848958663735,
            "137.27833333333334"
          ],
          [
            1559849258663735,
            "467.69"
          ],
          [
            1559849558663735,
            "660.6933333333334"
          ]
        ]
      }
    ]
  }
}
```

```bash
$ curl -G -s "http://localhost:3100/loki/api/v1/query_range" --data-urlencode 'query={job="varlogs"}' | jq
{
  "status": "success",
  "data": {
    "resultType": "streams",
    "result": [
      {
        "stream": {
          "filename": "/var/log/myproject.log",
          "job": "varlogs",
          "level": "info"
        },
        "values": [
          [
            "1569266497240578000",
            "foo"
          ],
          [
            "1569266492548155000",
            "bar"
          ]
        ]
      }
    ]
  }
}
```

## `GET /loki/api/v1/label`

`/loki/api/v1/label` retrieves the list of known labels within a given time span. It
accepts the following query parameters in the URL:

- `start`: The start time for the query as a nanosecond Unix epoch. Defaults to 6 hours ago.
- `end`: The end time for the query as a nanosecond Unix epoch. Defaults to now.

In microservices mode, `/loki/api/v1/label` is exposed by the querier.

Response:

```
{
  "values": [
    <label string>,
    ...
  ]
}
```

### Examples

```bash
$ curl -G -s "http://localhost:3100/loki/api/v1/label" | jq
{
  "values": [
    "foo",
    "bar",
    "baz"
  ]
}
```

## `GET /loki/api/v1/label/<name>/values`

`/loki/api/v1/label/<name>/values` retrieves the list of known values for a given
label within a given time span. It accepts the following query parameters in
the URL:

- `start`: The start time for the query as a nanosecond Unix epoch. Defaults to 6 hours ago.
- `end`: The end time for the query as a nanosecond Unix epoch. Defaults to now.

In microservices mode, `/loki/api/v1/label/<name>/values` is exposed by the querier.

Response:

```
{
  "values": [
    <label value>,
    ...
  ]
}
```

### Examples

```bash
$ curl -G -s "http://localhost:3100/loki/api/v1/label/foo/values" | jq
{
  "values": [
    "cat",
    "dog",
    "axolotl"
  ]
}
```

## `GET /loki/api/v1/tail`

`/loki/api/v1/tail` is a WebSocket endpoint that will stream log messages based on
a query. It accepts the following query parameters in the URL:

- `query`: The [LogQL](./logql.md) query to perform
- `delay_for`: The number of seconds to delay retrieving logs to let slow
  loggers catch up. Defaults to 0 and cannot be larger than 5.
- `limit`: The max number of entries to return
- `start`: The start time for the query as a nanosecond Unix epoch. Defaults to one hour ago.

In microservices mode, `/loki/api/v1/tail` is exposed by the querier.

Response (streamed):

```
{
  "streams": [
    {
      "stream": {
        <label key-value pairs>
      },
      "values": [
        [
          <string: nanosecond unix epoch>,
          <string: log line>
        ]
      ]
    }
  ],
  "dropped_entries": [
    {
      "labels": {
        <label key-value pairs>
      },
      "timestamp": "<nanosecond unix epoch>"
    }
  ]
}
```

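Because tailing happens over a WebSocket rather than plain HTTP, `curl` is less
convenient here. As a rough, unofficial sketch, the endpoint can be consumed
from Go using the third-party `github.com/gorilla/websocket` package; the path
and `query` parameter come from this document, while the address and query
expression are placeholders:

```go
package main

import (
	"fmt"
	"log"
	"net/url"

	"github.com/gorilla/websocket"
)

func main() {
	// Build ws://localhost:3100/loki/api/v1/tail?query={job="varlogs"}
	u := url.URL{
		Scheme:   "ws",
		Host:     "localhost:3100",
		Path:     "/loki/api/v1/tail",
		RawQuery: "query=" + url.QueryEscape(`{job="varlogs"}`),
	}

	conn, _, err := websocket.DefaultDialer.Dial(u.String(), nil)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Each message is one instance of the streamed response object above.
	for {
		_, msg, err := conn.ReadMessage()
		if err != nil {
			log.Fatal(err)
		}
		fmt.Printf("%s\n", msg)
	}
}
```
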
## `POST /loki/api/v1/push`

Alias (DEPRECATED): `POST /api/prom/push`

`/loki/api/v1/push` is the endpoint used to send log entries to Loki. The default
behavior is for the POST body to be a snappy-compressed protobuf message:

- [Protobuf definition](/pkg/logproto/logproto.proto)
- [Go client library](/pkg/promtail/client/client.go)

Alternatively, if the `Content-Type` header is set to `application/json`, a
JSON post body can be sent in the following format:

```
{
  "streams": [
    {
      "labels": "<LogQL label key-value pairs>",
      "entries": [
        {
          "ts": "<RFC3339Nano string>",
          "line": "<log line>"
        }
      ]
    }
  ]
}
```

> **NOTE**: Logs sent to Loki for every stream must be in timestamp-ascending
> order, meaning each log line must be more recent than the one last received.
> If logs do not follow this order, Loki will reject the log with an
> out-of-order error.

In microservices mode, `/loki/api/v1/push` is exposed by the distributor.

### Examples

```bash
$ curl -H "Content-Type: application/json" -XPOST -s "http://localhost:3100/loki/api/v1/push" --data-raw \
  '{"streams": [{ "labels": "{foo=\"bar\"}", "entries": [{ "ts": "2018-12-18T08:28:06.801064-04:00", "line": "fizzbuzz" }] }]}'
```
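
For programmatic pushes without the protobuf client, something like the
following standard-library Go sketch sends the same JSON payload as the `curl`
example above. The address, labels, and log line are placeholders, and error
handling is minimal; this is illustrative, not the official client:

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"net/http"
	"time"
)

func main() {
	// One stream with one entry; the timestamp must ascend per stream.
	payload := fmt.Sprintf(
		`{"streams": [{"labels": "{foo=\"bar\"}", "entries": [{"ts": %q, "line": "fizzbuzz"}]}]}`,
		time.Now().Format(time.RFC3339Nano),
	)

	resp, err := http.Post(
		"http://localhost:3100/loki/api/v1/push",
		"application/json",
		bytes.NewBufferString(payload),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status) // a 2xx status means the entry was accepted
}
```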

## `GET /api/prom/tail`

> **DEPRECATED**: `/api/prom/tail` is deprecated. Use `/loki/api/v1/tail`
> instead.

`/api/prom/tail` is a WebSocket endpoint that will stream log messages based on
a query. It accepts the following query parameters in the URL:

- `query`: The [LogQL](./logql.md) query to perform
- `delay_for`: The number of seconds to delay retrieving logs to let slow
  loggers catch up. Defaults to 0 and cannot be larger than 5.
- `limit`: The max number of entries to return
- `start`: The start time for the query as a nanosecond Unix epoch. Defaults to one hour ago.

In microservices mode, `/api/prom/tail` is exposed by the querier.

Response (streamed):

```json
{
  "streams": [
    {
      "labels": "<LogQL label key-value pairs>",
      "entries": [
        {
          "ts": "<RFC3339Nano timestamp>",
          "line": "<log line>"
        }
      ]
    }
  ],
  "dropped_entries": [
    {
      "Timestamp": "<RFC3339Nano timestamp>",
      "Labels": "<LogQL label key-value pairs>"
    }
  ]
}
```

`dropped_entries` will be populated when the tailer could not keep up with the
amount of traffic in Loki. When present, it indicates that the entries received
in the streams are not the full set of logs that are present in Loki. Note
that the keys in `dropped_entries` are sent as uppercase `Timestamp`
and `Labels`, instead of `labels` and `ts` like in the entries for the stream.

As the response is streamed, the object defined by the response format above
will be sent over the WebSocket multiple times.

## `GET /api/prom/query`

> **WARNING**: `/api/prom/query` is DEPRECATED; use `/loki/api/v1/query_range`
> instead.

`/api/prom/query` supports doing general queries. The URL query parameters
support the following values:

- `query`: The [LogQL](./logql.md) query to perform
- `limit`: The max number of entries to return
- `start`: The start time for the query as a nanosecond Unix epoch. Defaults to one hour ago.
- `end`: The end time for the query as a nanosecond Unix epoch. Defaults to now.
- `direction`: Determines the sort order of logs. Supported values are `forward` or `backward`. Defaults to `backward`.
- `regexp`: A regex to filter the returned results

In microservices mode, `/api/prom/query` is exposed by the querier.

Note that a large time span between `start` and `end` will cause additional
load on Loki and the index store, resulting in slower queries.

Response:

```
{
  "streams": [
    {
      "labels": "<LogQL label key-value pairs>",
      "entries": [
        {
          "ts": "<RFC3339Nano string>",
          "line": "<log line>"
        },
        ...
      ]
    },
    ...
  ]
}
```

### Examples

```bash
$ curl -G -s "http://localhost:3100/api/prom/query" --data-urlencode 'query={foo="bar"}' | jq
{
  "streams": [
    {
      "labels": "{filename=\"/var/log/myproject.log\", job=\"varlogs\", level=\"info\"}",
      "entries": [
        {
          "ts": "2019-06-06T19:25:41.972739Z",
          "line": "foo"
        },
        {
          "ts": "2019-06-06T19:25:41.972722Z",
          "line": "bar"
        }
      ]
    }
  ]
}
```

## `GET /ready`

`/ready` returns HTTP 200 when the Loki ingester is ready to accept traffic. If
running Loki on Kubernetes, `/ready` can be used as a readiness probe.

In microservices mode, the `/ready` endpoint is exposed by all components.

## `POST /flush`

`/flush` triggers a flush of all in-memory chunks held by the ingesters to the
backing store. Mainly used for local testing.

In microservices mode, the `/flush` endpoint is exposed by the ingester.

## `GET /metrics`

`/metrics` exposes Prometheus metrics. See
[Observing Loki](operations/observability.md)
for a list of exported metrics.

In microservices mode, the `/metrics` endpoint is exposed by all components.

## Example Clients

Please note that the Loki API is not stable yet, and breaking changes may occur
when using or writing a third-party client.

- [Promtail](https://github.com/grafana/loki/tree/master/pkg/promtail) (Official, Go)
- [promtail-client](https://github.com/afiskon/promtail-client) (Go)
- [push-to-loki.py](https://github.com/sleleko/devops-kb/blob/master/python/push-to-loki.py) (Python 3)

@@ -0,0 +1,306 @@

# Loki's Architecture

This document will expand on the information detailed in the
[Loki Overview](overview/README.md).

## Multi Tenancy

All data - both in memory and in long-term storage - is partitioned by a
tenant ID, pulled from the `X-Scope-OrgID` HTTP header in the request when Loki
is running in multi-tenant mode. When Loki is **not** in multi-tenant mode, the
header is ignored and the tenant ID is set to "fake", which will appear in the
index and in stored chunks.
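
Clients therefore set the tenant in that header on every request when
multi-tenancy is enabled. A minimal illustrative Go sketch (the tenant ID and
address are placeholders):

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	req, err := http.NewRequest("GET", "http://localhost:3100/loki/api/v1/label", nil)
	if err != nil {
		log.Fatal(err)
	}
	// All reads and writes for this request are scoped to tenant "team-a".
	req.Header.Set("X-Scope-OrgID", "team-a")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}
```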

## Modes of Operation

![modes_diagram](modes_of_operation.png)

Loki has a set of components (defined below in [Components](#components)) which
are internally referred to as modules. Each component spawns a gRPC server for
internal traffic and an HTTP/1 server for external API requests. All components
come with an HTTP/1 server, but most only expose readiness, health, and metrics
endpoints.

Which components Loki runs is determined by either the `-target` flag on the
command line (e.g., `-target=querier`) or the `target: <string>` section in
Loki's config file. When the value of `target` is `all`, Loki will run all of
its components in a single process. This is referred to as "single process",
"single binary", or monolithic mode. Monolithic mode is the default deployment
of Loki when Loki is installed using Helm.

When `target` is _not_ set to `all` (i.e., it is set to `querier`, `ingester`,
or `distributor`), then Loki is said to be running in "horizontally scalable",
or microservices, mode.

Loki's components, such as the ingesters and distributors, communicate with
one another over gRPC using the gRPC listen port defined in the Loki config.
When running components in monolithic mode, this is still true: each component,
although running in the same process, will connect to the others over the local
network for inter-component communication.

Single process mode is ideally suited for local development, small workloads,
and evaluation purposes. Monolithic mode can be scaled with multiple
processes with the following limitations:

1. Local index and local storage cannot currently be used when running
   monolithic mode with more than one replica, as each replica must be able to
   access the same storage backend, and local storage is not safe for
   concurrent access.
2. Individual components cannot be scaled independently, so it is not possible
   to have more read components than write components.

## Components

### Distributor

The **distributor** service is responsible for handling incoming streams from
clients. It's the first stop in the write path for log data. Once the
distributor receives a set of streams, each stream is validated for correctness
and to ensure that it is within the configured tenant (or global) limits. The
validated streams are then split into batches and sent to multiple
[ingesters](#ingester) in parallel.

#### Hashing

Distributors use consistent hashing in conjunction with a configurable
replication factor to determine which instances of the ingester service should
receive a given stream.

A stream is a set of logs associated with a tenant and a unique labelset. The
stream is hashed using both the tenant ID and the labelset, and the hash is
then used to find the ingesters to send the stream to.

A hash ring stored in [Consul](https://www.consul.io) is used to achieve
consistent hashing; all [ingesters](#ingester) register themselves into the hash
ring with a set of tokens they own. Each token is a random unsigned 32-bit
number. Along with a set of tokens, ingesters register their state into the
hash ring. Ingesters in the JOINING and ACTIVE states may receive write
requests, while ingesters in the ACTIVE and LEAVING states may receive read
requests. When doing a hash lookup, distributors only use tokens for ingesters
that are in the appropriate state for the request.

To do the hash lookup, distributors find the smallest appropriate token whose
value is larger than the hash of the stream. When the replication factor is
larger than 1, the next subsequent tokens (clockwise in the ring) that belong to
different ingesters will also be included in the result.

The effect of this hash setup is that each token that an ingester owns is
responsible for a range of hashes. If there are three tokens with values 0, 25,
and 50, then a hash of 3 would be given to the ingester that owns the token 25;
the ingester owning token 25 is responsible for the hash range of 1-25.
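
The following Go sketch illustrates that lookup for a replication factor of 1.
It is a simplification for illustration only, not the actual ring
implementation (which Loki inherits from Cortex); all names are hypothetical:

```go
package main

import (
	"fmt"
	"sort"
)

// lookup returns the owner of the smallest token whose value is larger than
// the stream hash, wrapping around the ring. owners[i] owns tokens[i];
// tokens must be sorted in ascending order.
func lookup(tokens []uint32, owners []string, streamHash uint32) string {
	i := sort.Search(len(tokens), func(i int) bool { return tokens[i] > streamHash })
	if i == len(tokens) {
		i = 0 // wrapped past the largest token: back to the start of the ring
	}
	return owners[i]
}

func main() {
	tokens := []uint32{0, 25, 50}
	owners := []string{"ingester-a", "ingester-b", "ingester-c"}
	// Hash 3 lands in the range 1-25, owned by the holder of token 25.
	fmt.Println(lookup(tokens, owners, 3)) // ingester-b
}
```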

#### Quorum consistency

Since all distributors share access to the same hash ring, write requests can be
sent to any distributor.

To ensure consistent query results, Loki uses
[Dynamo-style](https://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf)
quorum consistency on reads and writes. This means that the distributor will
wait for a positive response from at least one half plus one of the ingesters
it sends the sample to before responding to the client that initiated the send.
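
As a concrete instance of the rule: with a replication factor of 3, two of the
three ingester writes must succeed. A one-line sketch of the arithmetic:

```go
package main

import "fmt"

// minSuccess is the number of successful ingester writes a distributor waits
// for before acknowledging the client: one half plus one.
func minSuccess(replicationFactor int) int {
	return replicationFactor/2 + 1
}

func main() {
	fmt.Println(minSuccess(3)) // 2: a write is acknowledged once 2 of 3 ingesters accept it
}
```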

### Ingester

The **ingester** service is responsible for writing log data to long-term
storage backends (DynamoDB, S3, Cassandra, etc.) on the write path and returning
log data for in-memory queries on the read path.

Ingesters contain a _lifecycler_ which manages the lifecycle of an ingester in
the hash ring. Each ingester has a state of either `PENDING`, `JOINING`,
`ACTIVE`, `LEAVING`, or `UNHEALTHY`:

1. `PENDING` is an Ingester's state when it is waiting for a handoff from
   another ingester that is `LEAVING`.

2. `JOINING` is an Ingester's state when it is currently inserting its tokens
   into the ring and initializing itself. It may receive write requests for
   tokens it owns.

3. `ACTIVE` is an Ingester's state when it is fully initialized. It may receive
   both write and read requests for tokens it owns.

4. `LEAVING` is an Ingester's state when it is shutting down. It may receive
   read requests for data it still has in memory.

5. `UNHEALTHY` is an Ingester's state when it has failed to heartbeat to
   Consul. `UNHEALTHY` is set by the distributor when it periodically checks
   the ring.

Each log stream that an ingester receives is built up into a set of many
"chunks" in memory and flushed to the backing storage backend at a configurable
interval.

Chunks are compressed and marked as read-only when:

1. The current chunk has reached capacity (a configurable value).
2. Too much time has passed without the current chunk being updated.
3. A flush occurs.

Whenever a chunk is compressed and marked as read-only, a writable chunk takes
its place.

If an ingester process crashes or exits abruptly, all the data that has not yet
been flushed will be lost. Loki is usually configured to keep multiple replicas
(usually 3) of each log to mitigate this risk.

When a flush occurs to a persistent storage provider, the chunk is hashed based
on its tenant, labels, and contents. This means that multiple ingesters with the
same copy of data will not write the same data to the backing store twice, but
if a write to one of the replicas failed, multiple differing chunk objects will
be created in the backing store. See [Querier](#querier) for how data is
deduplicated.

#### Handoff

By default, when an ingester is shutting down and tries to leave the hash ring,
it will wait to see if a new ingester tries to enter before flushing, and will
try to initiate a handoff. The handoff will transfer all of the tokens and
in-memory chunks owned by the leaving ingester to the new ingester.

Before joining the hash ring, ingesters will wait in the `PENDING` state for a
handoff to occur. After a configurable timeout, ingesters in the `PENDING` state
that have not received a transfer will join the ring normally, inserting a new
set of tokens.

This process is used to avoid flushing all chunks when shutting down, which is a
slow process.

### Querier

The **querier** service handles queries using the [LogQL](./logql.md) query
language, fetching logs both from the ingesters and long-term storage.

Queriers query all ingesters for in-memory data before falling back to
running the same query against the backend store. Because of the replication
factor, it is possible that the querier may receive duplicate data. To resolve
this, the querier internally **deduplicates** data that has the same nanosecond
timestamp, label set, and log message.

## Chunk Format

```
-------------------------------------------------------------------
|                               |                                 |
|        MagicNumber(4b)        |          version(1b)            |
|                               |                                 |
-------------------------------------------------------------------
|         block-1 bytes         |         checksum (4b)           |
-------------------------------------------------------------------
|         block-2 bytes         |         checksum (4b)           |
-------------------------------------------------------------------
|         block-n bytes         |         checksum (4b)           |
-------------------------------------------------------------------
|                       #blocks (uvarint)                         |
-------------------------------------------------------------------
| #entries(uvarint) | mint, maxt (varint) | offset, len (uvarint) |
-------------------------------------------------------------------
| #entries(uvarint) | mint, maxt (varint) | offset, len (uvarint) |
-------------------------------------------------------------------
| #entries(uvarint) | mint, maxt (varint) | offset, len (uvarint) |
-------------------------------------------------------------------
| #entries(uvarint) | mint, maxt (varint) | offset, len (uvarint) |
-------------------------------------------------------------------
|                     checksum(from #blocks)                      |
-------------------------------------------------------------------
|                   #blocks section byte offset                   |
-------------------------------------------------------------------
```

`mint` and `maxt` describe the minimum and maximum Unix nanosecond timestamp,
respectively.

### Block Format

A block is composed of a series of entries, each of which is an individual log
line.

Note that the bytes of a block are stored compressed using Gzip. The following
is their form when uncompressed:

```
-------------------------------------------------------------------
|    ts (varint)    |    len (uvarint)    |     log-1 bytes       |
-------------------------------------------------------------------
|    ts (varint)    |    len (uvarint)    |     log-2 bytes       |
-------------------------------------------------------------------
|    ts (varint)    |    len (uvarint)    |     log-3 bytes       |
-------------------------------------------------------------------
|    ts (varint)    |    len (uvarint)    |     log-n bytes       |
-------------------------------------------------------------------
```

`ts` is the Unix nanosecond timestamp of the logs, while `len` is the length in
bytes of the log entry.
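
To make the layout concrete, here is an illustrative Go sketch (not the actual
Loki implementation) that encodes and then walks a tiny uncompressed block
using the standard `encoding/binary` varint helpers; bounds checking and error
handling are omitted for brevity:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

type entry struct {
	ts   int64  // Unix nanosecond timestamp
	line []byte // raw log bytes
}

// readEntries walks an uncompressed block as laid out above: each entry is
// a varint timestamp, a uvarint length, then that many bytes of log line.
func readEntries(b []byte) []entry {
	var entries []entry
	for len(b) > 0 {
		ts, n := binary.Varint(b) // signed varint timestamp
		b = b[n:]
		l, m := binary.Uvarint(b) // uvarint line length
		b = b[m:]
		entries = append(entries, entry{ts: ts, line: b[:l]})
		b = b[l:]
	}
	return entries
}

func main() {
	// Encode two entries, then read them back.
	var b []byte
	for _, e := range []entry{{1, []byte("foo")}, {2, []byte("bar")}} {
		b = binary.AppendVarint(b, e.ts)
		b = binary.AppendUvarint(b, uint64(len(e.line)))
		b = append(b, e.line...)
	}
	for _, e := range readEntries(b) {
		fmt.Println(e.ts, string(e.line))
	}
}
```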

## Chunk Store

The **chunk store** is Loki's long-term data store, designed to support
interactive querying and sustained writing without the need for background
maintenance tasks. It consists of:

* An index for the chunks. This index can be backed by:
  * [Amazon DynamoDB](https://aws.amazon.com/dynamodb)
  * [Google Bigtable](https://cloud.google.com/bigtable)
  * [Apache Cassandra](https://cassandra.apache.org)
* A key-value (KV) store for the chunk data itself, which can be:
  * [Amazon DynamoDB](https://aws.amazon.com/dynamodb)
  * [Google Bigtable](https://cloud.google.com/bigtable)
  * [Apache Cassandra](https://cassandra.apache.org)
  * [Amazon S3](https://aws.amazon.com/s3)
  * [Google Cloud Storage](https://cloud.google.com/storage/)

> Unlike the other core components of Loki, the chunk store is not a separate
> service, job, or process, but rather a library embedded in the two services
> that need to access Loki data: the [ingester](#ingester) and [querier](#querier).

The chunk store relies on a unified interface to the
"[NoSQL](https://en.wikipedia.org/wiki/NoSQL)" stores (DynamoDB, Bigtable, and
Cassandra) that can be used to back the chunk store index. This interface
assumes that the index is a collection of entries keyed by the following (a
sketch follows the list):

* A **hash key**. This is required for *all* reads and writes.
* A **range key**. This is required for writes and can be omitted for reads,
  which can be queried by prefix or range.
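
A rough Go sketch of that interface's shape; the names and signatures here are
illustrative only, and the real interface (which lives in Cortex) has batching,
contexts, and richer query semantics:

```go
package chunkstore

// IndexClient is an illustrative reduction of the unified index interface
// described above: every operation is addressed by a hash key, and writes
// additionally carry a range key.
type IndexClient interface {
	// Write stores a value under the (hash key, range key) pair.
	Write(hashKey string, rangeKey, value []byte) error

	// QueryPrefix returns all values stored under hashKey whose range keys
	// begin with rangeKeyPrefix; pass nil to read everything for the hash key.
	QueryPrefix(hashKey string, rangeKeyPrefix []byte) ([][]byte, error)
}
```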

The interface works somewhat differently across the supported databases:

* DynamoDB supports range and hash keys natively. Index entries are thus
  modelled directly as DynamoDB entries, with the hash key as the distribution
  key and the range as the DynamoDB range key.
* For Bigtable and Cassandra, index entries are modelled as individual column
  values. The hash key becomes the row key and the range key becomes the column
  key.

A set of schemas is used to map the matchers and label sets used on reads and
writes to the chunk store into appropriate operations on the index. Schemas have
been added as Loki has evolved, mainly in an attempt to better load balance
writes and improve query performance.

> The current schema recommendation is the **v10 schema**.

## Read Path

To summarize, the read path works as follows:

1. The querier receives an HTTP/1 request for data.
2. The querier passes the query to all ingesters for in-memory data.
3. The ingesters receive the read request and return data matching the query,
   if any.
4. The querier lazily loads data from the backing store and runs the query
   against it if no ingesters returned data.
5. The querier iterates over all received data and deduplicates, returning a
   final set of data over the HTTP/1 connection.

## Write Path

![chunk_diagram](chunk_diagram.png)

To summarize, the write path works as follows:

1. The distributor receives an HTTP/1 request to store data for streams.
2. Each stream is hashed using the hash ring.
3. The distributor sends each stream to the appropriate ingesters and their
   replicas (based on the configured replication factor).
4. Each ingester will create a chunk or append to an existing chunk for the
   stream's data. A chunk is unique per tenant and per labelset.
5. The distributor responds with a success code over the HTTP/1 connection.

@@ -1,166 +0,0 @@

# loki-canary

A standalone app to audit the log-capturing performance of Loki.

## How it works

![block_diagram](block.png)

loki-canary writes a log to a file and stores the timestamp in an internal
array. The contents look something like this:

```nohighlight
1557935669096040040 ppppppppppppppppppppppppppppppppppppppppppppppppppppppppppp
```

The relevant part is the timestamp; the `p`'s are just filler bytes to make the
size of the log configurable.

Promtail (or another agent) then reads the log file and ships it to Loki.

Meanwhile, loki-canary opens a WebSocket connection to Loki and listens for the
logs it creates.

When a log is received on the WebSocket, the timestamp in the log message is
compared to the internal array.

If the received log is:

* The next in the array to be received, it is removed from the array and the
  (current time - log timestamp) is recorded in the `response_latency`
  histogram; this is the expected behavior for well-behaving logs.
* Not the next in the array to be received, it is removed from the array, the
  response time is recorded in the `response_latency` histogram, and the
  `out_of_order_entries` counter is incremented.
* Not in the array at all, it is checked against a separate list of received
  logs to either increment the `duplicate_entries` counter or the
  `unexpected_entries` counter.

In the background, loki-canary also runs a timer that iterates through all the
entries in the internal array. If any are older than the duration specified by
the `-wait` flag (default 60s), they are removed from the array and the
`websocket_missing_entries` counter is incremented. An additional query is then
made directly to Loki for these missing entries to determine if they were
actually missing or just didn't make it down the WebSocket. If they are not
found in the follow-up query, the `missing_entries` counter is incremented.
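
An illustrative Go sketch of the bookkeeping described above (not the real
loki-canary code; all names are hypothetical): expected timestamps wait in a
queue, and each received timestamp is matched against it to decide which
counter to bump:

```go
package main

import "fmt"

type canary struct {
	expected                          []int64        // timestamps written, oldest first
	received                          map[int64]bool // timestamps already confirmed
	outOfOrder, duplicate, unexpected int
}

func (c *canary) confirm(ts int64) {
	switch {
	case len(c.expected) > 0 && c.expected[0] == ts:
		c.expected = c.expected[1:] // well-behaved: the next expected entry
	case c.remove(ts):
		c.outOfOrder++ // present in the array, but not next in line
	case c.received[ts]:
		c.duplicate++ // seen before: a duplicate delivery
	default:
		c.unexpected++ // never written by this canary instance
	}
	c.received[ts] = true
}

// remove deletes ts from the expected queue, reporting whether it was found.
func (c *canary) remove(ts int64) bool {
	for i, v := range c.expected {
		if v == ts {
			c.expected = append(c.expected[:i], c.expected[i+1:]...)
			return true
		}
	}
	return false
}

func main() {
	c := &canary{expected: []int64{1, 2, 3}, received: map[int64]bool{}}
	for _, ts := range []int64{1, 3, 2, 2, 9} {
		c.confirm(ts)
	}
	fmt.Println(c.outOfOrder, c.duplicate, c.unexpected) // 1 1 1
}
```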

## Installation

### Binary

Loki Canary is provided as a pre-compiled binary as part of the
[Releases](https://github.com/grafana/loki/releases) on GitHub.

### Docker

Loki Canary is also provided as a Docker container image:

```bash
# change tag to the most recent release
$ docker pull grafana/loki-canary:v0.2.0
```

### Kubernetes

To run on Kubernetes, you can do something simple like:

```bash
kubectl run loki-canary --generator=run-pod/v1 \
  --image=grafana/loki-canary:latest --restart=Never --image-pull-policy=IfNotPresent \
  --labels=name=loki-canary -- -addr=loki:3100
```

Or you can do something more complex, like deploying it as a DaemonSet; there
is a ksonnet setup for this in the `production` folder, which you can import
using jsonnet-bundler:

```shell
jb install github.com/grafana/loki-canary/production/ksonnet/loki-canary
```

Then in your ksonnet environment's `main.jsonnet` you'll want something like
this:

```jsonnet
local loki_canary = import 'loki-canary/loki-canary.libsonnet';

loki_canary {
  loki_canary_args+:: {
    addr: "loki:3100",
    port: 80,
    labelname: "instance",
    interval: "100ms",
    size: 1024,
    wait: "3m",
  },
  _config+:: {
    namespace: "default",
  }
}
```

### From Source

If the other options are not sufficient for your use-case, you can compile
`loki-canary` yourself:

```bash
# clone the source tree
$ git clone https://github.com/grafana/loki

# build the binary
$ make loki-canary

# (optionally build the container image)
$ make loki-canary-image
```

## Configuration

You must pass in the Loki address with the `-addr` flag. If your server uses
TLS, also pass `-tls=true` (this will create a `wss://` instead of a `ws://`
connection).

You should also pass the `-labelname` and `-labelvalue` flags. These are used
by loki-canary to filter the log stream to only process logs for this instance
of loki-canary, so they must be unique for each of your loki-canary instances.
The ksonnet config in this project accomplishes this by passing in the pod name
as the label value.

If you get a high number of `unexpected_entries`, you may not be waiting long
enough and should increase `-wait` from 60s to something larger.

__Be aware__ of the relationship between `pruneinterval` and `interval`.
For example, with an interval of 10ms (100 logs per second) and a prune interval
of 60s, you will write 6000 logs per minute. If those logs are not received
over the WebSocket, the canary will attempt to query Loki directly to see if
they are completely lost. __However__, the query return is limited to 1000
results, so you will not be able to return all the logs even if they did make
it to Loki.

__Likewise__, if you lower the `pruneinterval`, you risk causing a denial of
service attack as all your canaries attempt to query for missing logs at
whatever interval `pruneinterval` is set to.

All options:

```nohighlight
  -addr string
        The Loki server URL:Port, e.g. loki:3100
  -buckets int
        Number of buckets in the response_latency histogram (default 10)
  -interval duration
        Duration between log entries (default 1s)
  -labelname string
        The label name for this instance of loki-canary to use in the log selector (default "name")
  -labelvalue string
        The unique label value for this instance of loki-canary to use in the log selector (default "loki-canary")
  -pass string
        Loki password
  -port int
        Port which loki-canary should expose metrics (default 3500)
  -pruneinterval duration
        Frequency to check sent vs received logs, also the frequency which queries for missing logs will be dispatched to loki (default 1m0s)
  -size int
        Size in bytes of each log line (default 100)
  -tls
        Does the loki connection use TLS?
  -user string
        Loki username
  -wait duration
        Duration to wait for log entries before reporting them lost (default 1m0s)
```

@@ -1,42 +0,0 @@

---
kind: DaemonSet
apiVersion: extensions/v1beta1
metadata:
  labels:
    app: loki-canary
    name: loki-canary
  name: loki-canary
spec:
  template:
    metadata:
      name: loki-canary
      labels:
        app: loki-canary
    spec:
      containers:
      - args:
        - -addr=loki:3100
        image: grafana/loki-canary:latest
        imagePullPolicy: IfNotPresent
        name: loki-canary
        resources: {}
---
apiVersion: v1
kind: Service
metadata:
  name: loki-canary
  labels:
    app: loki-canary
spec:
  type: ClusterIP
  selector:
    app: loki-canary
  ports:
  - name: metrics
    protocol: TCP
    port: 3500
    targetPort: 3500

@@ -1,36 +0,0 @@

---
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: loki-canary
    name: loki-canary
  name: loki-canary
spec:
  containers:
  - args:
    - -addr=loki:3100
    image: grafana/loki-canary:latest
    imagePullPolicy: IfNotPresent
    name: loki-canary
    resources: {}
---
apiVersion: v1
kind: Service
metadata:
  name: loki-canary
  labels:
    app: loki-canary
spec:
  type: ClusterIP
  selector:
    app: loki-canary
  ports:
  - name: metrics
    protocol: TCP
    port: 3500
    targetPort: 3500

@@ -0,0 +1,29 @@

# Loki Clients

Loki supports the following official clients for sending logs:

1. [Promtail](./promtail/README.md)
2. [Docker Driver](./docker-driver/README.md)
3. [Fluentd](./fluentd.md)

## Picking a Client

While all clients can be used simultaneously to cover multiple use cases, which
client is initially picked to send logs depends on your use case.

Promtail is the client of choice when you're running Kubernetes, as you can
configure it to automatically scrape logs from pods running on the same node
that Promtail runs on. Promtail and Prometheus running together in Kubernetes
enables powerful debugging: if Prometheus and Promtail use the same labels,
users can use tools like Grafana to switch between metrics and logs based on
the label set.

Promtail is also the client of choice on bare metal: since it can be configured
to tail logs from all files given a host path, it is the easiest way to send
logs to Loki from plain-text files (e.g., things that log to `/var/log/*.log`).

When using Docker and not Kubernetes, the Docker logging driver should be used,
as it automatically adds labels appropriate to the running container.

The Fluentd plugin is ideal when you already have Fluentd deployed and you
don't need the service discovery capabilities of Promtail.

@@ -0,0 +1,54 @@

# Docker Driver Client

Loki officially supports a Docker plugin that will read logs from Docker
containers and ship them to Loki. The plugin can be configured to send the logs
to a private Loki instance or [Grafana Cloud](https://grafana.com/oss/loki).

> Docker plugins are not yet supported on Windows; see the
> [Docker docs](https://docs.docker.com/engine/extend) for more information.

Documentation on configuring the Loki Docker Driver can be found on the
[configuration page](./configuration.md).

## Installing

The Docker plugin must be installed on each Docker host that will be running
containers you want to collect logs from.

Run the following command to install the plugin:

```bash
docker plugin install grafana/loki-docker-driver:latest --alias loki \
  --grant-all-permissions
```

To check installed plugins, use the `docker plugin ls` command. Plugins that
have started successfully are listed as enabled:

```
$ docker plugin ls
ID                  NAME                DESCRIPTION           ENABLED
ac720b8fcfdb        loki                Loki Logging Driver   true
```

Once the plugin is installed, it can be [configured](./configuration.md).

## Upgrading

The upgrade process involves disabling the existing plugin, upgrading, and then
re-enabling:

```bash
docker plugin disable loki
docker plugin upgrade loki grafana/loki-docker-driver:master
docker plugin enable loki
```

## Uninstalling

To cleanly uninstall the plugin, disable and remove it:

```bash
docker plugin disable loki
docker plugin rm loki
```

@ -0,0 +1,152 @@ |
||||
# Configuring the Docker Driver |
||||
|
||||
The Docker daemon on each machine has a default logging driver and |
||||
each container will use the default driver unless configured otherwise. |
||||
|
||||
## Change the logging driver for a container |
||||
|
||||
The `docker run` command can be configured to use a different logging driver |
||||
than the Docker daemon's default with the `--log-driver` flag. Any options that |
||||
the logging driver supports can be set using the `--log-opt <NAME>=<VALUE>` flag. |
||||
`--log-opt` can be passed multiple times for each option to be set. |
||||
|
||||
The following command will start Grafana in a container and send logs to Grafana |
||||
Cloud, using a batch size of 400 entries and no more than 5 retries if a send |
||||
fails. |
||||
|
||||
```bash |
||||
docker run --log-driver=loki \ |
||||
--log-opt loki-url="https://<user_id>:<password>@logs-us-west1.grafana.net/loki/api/v1/push" \ |
||||
--log-opt loki-retries=5 \ |
||||
--log-opt loki-batch-size=400 \ |
||||
grafana/grafana |
||||
``` |
||||
|
||||
> **Note**: The Loki logging driver still uses the json-log driver in |
||||
> combination with sending logs to Loki. This is mainly useful to keep the |
||||
> `docker logs` command working. You can adjust file size and rotation |
||||
> using the respective log option `max-size` and `max-file`. |
||||
|

## Change the default logging driver

If you want the Loki logging driver to be the default for all containers,
change Docker's `daemon.json` file (located in `/etc/docker` on Linux) and set
the value of `log-driver` to `loki`:

```json
{
    "debug": true,
    "log-driver": "loki"
}
```

Options for the logging driver can also be configured with `log-opts` in the
`daemon.json`:

```json
{
    "debug": true,
    "log-driver": "loki",
    "log-opts": {
        "loki-url": "https://<user_id>:<password>@logs-us-west1.grafana.net/loki/api/v1/push",
        "loki-batch-size": "400"
    }
}
```

> **Note**: `log-opt` configuration options in `daemon.json` must be provided
> as strings. Boolean and numeric values (such as the value for
> `loki-batch-size` in the example above) must therefore be enclosed in
> quotes (`"`).

After changing `daemon.json`, restart the Docker daemon for the changes to take
effect. All containers from that host will then send logs to Loki.
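
For example, on a systemd-based Linux host the daemon can be restarted with the
following command (other platforms use their own service manager):

```bash
# Assumes a systemd-based host; use your platform's service manager otherwise.
sudo systemctl restart docker
```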

## Configure the logging driver for a Swarm service or Compose

You can also configure the logging driver for a [swarm
service](https://docs.docker.com/engine/swarm/how-swarm-mode-works/services/)
directly in your compose file. This also applies for `docker-compose`:

```yaml
version: "3.7"
services:
  logger:
    image: grafana/grafana
    logging:
      driver: loki
      options:
        loki-url: "https://<user_id>:<password>@logs-us-west1.grafana.net/loki/api/v1/push"
```

You can then deploy your stack using:

```bash
docker stack deploy my_stack_name --compose-file docker-compose.yaml
```

Or with `docker-compose`:

```bash
docker-compose -f docker-compose.yaml up
```

Once deployed, the Grafana service will send its logs to Loki.

> **Note**: The stack name and service name for each Swarm service, and the
> project name and service name for each Compose service, are automatically
> discovered and sent as Loki labels. This way you can filter by them in
> Grafana.
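
For example, assuming a stack deployed as `my_stack_name` (the name is
illustrative), a LogQL query in Grafana's Explore view could select its logs by
that label and be narrowed further with `swarm_service`:

```logql
{swarm_stack="my_stack_name"}
```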

## Labels

By default, the Docker driver will add the following labels to each log line:

- `filename`: where the log is written to on disk
- `host`: the hostname where the log has been generated
- `container_name`: the name of the container generating logs
- `swarm_stack`, `swarm_service`: added when deploying from Docker Swarm

Custom labels can be added using the `loki-external-labels`,
`loki-pipeline-stage-file`, `labels`, `env`, and `env-regex` options. See the
next section for all supported options.

## Supported log-opt options

The following options are supported by the Loki logging driver:

| Option | Required? | Default Value | Description |
| ------------------------------- | :-------: | :------------------------: | ----------- |
| `loki-url` | Yes | | Loki HTTP push endpoint. |
| `loki-external-labels` | No | `container_name={{.Name}}` | Additional label value pairs separated by `,` to send with logs. The value is expanded with the [Docker tag template format](https://docs.docker.com/engine/admin/logging/log_tags/). (e.g., `container_name={{.ID}}.{{.Name}},cluster=prod`) |
| `loki-timeout` | No | `10s` | The timeout to use when sending logs to the Loki instance. Valid time units are "ns", "us" (or "µs"), "ms", "s", "m", "h". |
| `loki-batch-wait` | No | `1s` | The amount of time to wait before sending a log batch, complete or not. Valid time units are "ns", "us" (or "µs"), "ms", "s", "m", "h". |
| `loki-batch-size` | No | `102400` | The maximum size of a log batch to send. |
| `loki-min-backoff` | No | `100ms` | The minimum amount of time to wait before retrying a batch. Valid time units are "ns", "us" (or "µs"), "ms", "s", "m", "h". |
| `loki-max-backoff` | No | `10s` | The maximum amount of time to wait before retrying a batch. Valid time units are "ns", "us" (or "µs"), "ms", "s", "m", "h". |
| `loki-retries` | No | `10` | The maximum number of retries for a log batch. |
| `loki-pipeline-stage-file` | No | | The location of a pipeline stage configuration file. Pipeline stages allow parsing log lines to extract more labels. [See the Promtail documentation for more info.](../promtail/pipelines.md) |
| `loki-tls-ca-file` | No | | Set the path to a custom certificate authority. |
| `loki-tls-cert-file` | No | | Set the path to a client certificate file. |
| `loki-tls-key-file` | No | | Set the path to a client key. |
| `loki-tls-server-name` | No | | Name used to validate the server certificate. |
| `loki-tls-insecure-skip-verify` | No | `false` | Allows skipping TLS verification. |
| `loki-proxy-url` | No | | Proxy URL used to connect to Loki. |
| `max-size` | No | `-1` | The maximum size of the log before it is rolled. A positive integer plus a modifier representing the unit of measure (k, m, or g). Defaults to -1 (unlimited). Used by the json-log driver to keep the `docker logs` command working. |
| `max-file` | No | `1` | The maximum number of log files that can be present. If rolling the logs creates excess files, the oldest file is removed. Only effective when `max-size` is also set. A positive integer. Defaults to 1. |
| `labels` | No | | Comma-separated list of label keys to include in the message, if these labels are specified for the container. |
| `env` | No | | Comma-separated list of environment variable keys to include in the message, if they are specified for the container. |
| `env-regex` | No | | A regular expression to match logging-related environment variables. Used for advanced log label options. If there is a collision between a `label` and `env` key, the value of the `env` takes precedence. Both options add additional fields to the labels of a logging message. |
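
As a sketch of how several of these options combine, the hypothetical command
below attaches an extra `cluster` label, forwards a `RACK_ID` environment
variable as a label, and points the driver at a pipeline stage file (the URL,
variable name, and path are illustrative only):

```bash
docker run --log-driver=loki \
    --log-opt loki-url="http://localhost:3100/loki/api/v1/push" \
    --log-opt loki-external-labels="container_name={{.Name}},cluster=dev" \
    --log-opt env=RACK_ID \
    --log-opt loki-pipeline-stage-file=/etc/loki/pipeline.yaml \
    grafana/grafana
```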

## Troubleshooting

Plugin logs can be found in the Docker daemon log. To enable debug mode, refer
to the [Docker daemon documentation](https://docs.docker.com/config/daemon/).

The standard output (`stdout`) of a plugin is redirected to Docker logs. Such
entries are prefixed with `plugin=`.

To find out the plugin ID of the Loki logging driver, use `docker plugin ls`
and look for the `loki` entry.

Depending on your system, the location of the Docker daemon log may vary.
Refer to the
[Docker daemon documentation](https://docs.docker.com/config/daemon/) for the
log location on your specific platform.

# Fluentd

Loki has a [Fluentd](https://fluentd.org/) output plugin called
`fluent-plugin-grafana-loki` that enables shipping logs to a private Loki
instance or [Grafana Cloud](https://grafana.com/oss/loki).

The plugin offers two line formats and uses protobuf to send compressed data to
Loki.

Key features:

* `extra_labels`: Labels to be added to every line of a log file, useful for
  designating environments
* `label_keys`: Customizable list of keys for stream labels
* `line_format`: Format to use when flattening the record to a log line (`json`
  or `key_value`)

## Installation

```bash
$ gem install fluent-plugin-grafana-loki
```

## Usage

In your Fluentd configuration, use `@type loki`. Additional configuration is
optional. Default values look like this:

```
<match **>
  @type loki
  url "https://logs-us-west1.grafana.net"
  username "#{ENV['LOKI_USERNAME']}"
  password "#{ENV['LOKI_PASSWORD']}"
  extra_labels {"env":"dev"}
  flush_interval 10s
  flush_at_shutdown true
  buffer_chunk_limit 1m
</match>
```

### Multi-worker usage

Loki doesn't currently support out-of-order inserts: if you try to insert a log
entry with an earlier timestamp after a log entry with identical labels but a
later timestamp, the insert will fail with the message
`HTTP status code: 500, message: rpc error: code = Unknown desc = Entry out of
order`. Therefore, in order to use this plugin in a multi-worker Fluentd setup,
you'll need to include the worker ID in the labels sent to Loki.

For example, using
[fluent-plugin-record-modifier](https://github.com/repeatedly/fluent-plugin-record-modifier):

```
<filter mytag>
  @type record_modifier
  <record>
    fluentd_worker "#{worker_id}"
  </record>
</filter>

<match mytag>
  @type loki
  # ...
  label_keys "fluentd_worker"
  # ...
</match>
```

## Docker Image

There is a Docker image `grafana/fluent-plugin-grafana-loki:master` which
contains default configuration files to ship log information from a host's
`/var/log` directory and from the host's journald. To use it, you can set the
`LOKI_URL`, `LOKI_USERNAME`, and `LOKI_PASSWORD` environment variables
(`LOKI_USERNAME` and `LOKI_PASSWORD` can be left blank if Loki is not protected
behind an authenticating proxy).

An example Docker Swarm Compose configuration looks like:

```yaml
services:
  fluentd:
    image: grafana/fluent-plugin-grafana-loki:master
    command:
      - "fluentd"
      - "-v"
      - "-p"
      - "/fluentd/plugins"
    environment:
      LOKI_URL: http://loki:3100
      LOKI_USERNAME:
      LOKI_PASSWORD:
    deploy:
      mode: global
    configs:
      - source: loki_config
        target: /fluentd/etc/loki/loki.conf
    networks:
      - loki
    volumes:
      - host_logs:/var/log
      # Needed for journald log ingestion:
      - /etc/machine-id:/etc/machine-id
      - /dev/log:/dev/log
      - /var/run/systemd/journal/:/var/run/systemd/journal/
    logging:
      options:
        tag: infra.monitoring
```

## Configuration

### Proxy Support

Starting with version 0.8.0, this gem uses `excon`, which supports proxies
configured through environment variables; see
[excon's proxy support documentation](https://github.com/excon/excon#proxy-support).

### `url`

The URL of the Loki server to send logs to. When sending data, the publish path
(`/loki/api/v1/push`) will automatically be appended. By default the URL is set
to `https://logs-us-west1.grafana.net`, the URL of the Grafana Labs
[hosted Loki](https://grafana.com/loki) service.

### `username` / `password`

Specify a username and password if the Loki server requires authentication.
If using Grafana Labs' hosted Loki, the username needs to be set to your
instanceId and the password should be a grafana.com API Key.

### `tenant`

Loki is a multi-tenant log storage platform and all requests sent must include
a tenant. For some installations (like Hosted Loki) the tenant will be set
automatically by an authenticating proxy. Otherwise you can define a tenant to
be passed through. The tenant can be any string value.

### Output Format

Loki is intended to index and group log streams using only a small set of
labels and is not intended for full-text indexing. When sending logs to Loki,
the majority of log messages will be sent as a single log "line".

There are a few configuration settings to control the output format:

- `extra_labels`: (default: nil) set of labels to include with every Loki
  stream. (e.g., `{"env":"dev", "datacenter": "dc1"}`)
- `remove_keys`: (default: nil) comma-separated list of record keys to remove.
  All other keys will be placed into the log line.
- `label_keys`: (default: "job,instance") comma-separated list of keys to use
  as stream labels.
- `line_format`: format to use when flattening the record to a log line. Valid
  values are `json` or `key_value`. If set to `json`, the log line sent to
  Loki will be the Fluentd record (excluding any keys extracted out as labels)
  dumped as JSON. If set to `key_value`, the log line will be each item in the
  record concatenated together (separated by a single space) in the format
  `<key>=<value>`.
- `drop_single_key`: if set to true and only a single record key remains after
  extracting `label_keys` and dropping `remove_keys`, the log line sent to
  Loki will be just the value of that single remaining key.
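
Putting a few of these settings together, a hedged sketch of a match section
might look like the following (the URL, label values, and `password` key name
are illustrative only):

```
<match **>
  @type loki
  url "http://loki:3100"
  extra_labels {"env":"dev"}
  label_keys "job,instance"
  remove_keys "password"
  line_format key_value
</match>
```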

### Buffer options

`fluentd-plugin-loki` extends [Fluentd's builtin Output
plugin](https://docs.fluentd.org/v1.0/articles/output-plugin-overview) and uses
the `compat_parameters` plugin helper. It adds the following options:

```
buffer_type memory
flush_interval 10s
retry_limit 17
retry_wait 1.0
num_threads 1
```

# Promtail

Promtail is an agent which ships the contents of local logs to a private Loki
instance or [Grafana Cloud](https://grafana.com/oss/loki). It is usually
deployed to every machine that runs applications which need to be monitored.

It primarily:

1. Discovers targets
2. Attaches labels to log streams
3. Pushes them to the Loki instance.

Currently, Promtail can tail logs from two sources: local log files and the
systemd journal (on AMD64 machines only).

## Log File Discovery

Before Promtail can ship any data from log files to Loki, it needs to find out
information about its environment. Specifically, this means discovering
applications emitting log lines to files that need to be monitored.

Promtail borrows the same
[service discovery mechanism from Prometheus](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config),
although it currently only supports `static` and `kubernetes` service
discovery. This limitation is due to the fact that `promtail` is deployed as a
daemon to every local machine and, as such, does not discover labels from other
machines. `kubernetes` service discovery fetches required labels from the
Kubernetes API server while `static` usually covers all other use cases.

Just like Prometheus, `promtail` is configured using a `scrape_configs` stanza.
`relabel_configs` allows for fine-grained control of what to ingest, what to
drop, and the final metadata to attach to the log line. Refer to the docs for
[configuring Promtail](configuration.md) for more details.

## Labeling and Parsing

During service discovery, metadata is determined (pod name, filename, etc.)
that may be attached to the log line as a label for easier identification when
querying logs in Loki. Through `relabel_configs`, discovered labels can be
mutated into the desired form.

To allow more sophisticated filtering afterwards, Promtail allows setting
labels not only from service discovery, but also based on the contents of each
log line. The `pipeline_stages` can be used to add or update labels, correct
the timestamp, or re-write log lines entirely. Refer to the documentation for
[pipelines](pipelines.md) for more details.

## Shipping

Once Promtail has a set of targets (i.e., things to read from, like files) and
all labels are set correctly, it will start tailing (continuously reading) the
logs from targets. Once enough data is read into memory or after a configurable
timeout, it is flushed as a single batch to Loki.

As Promtail reads data from sources (files and the systemd journal, if
configured), it will track the last offset it read in a positions file. By
default, the positions file is stored at `/var/log/positions.yaml`. The
positions file helps Promtail continue reading from where it left off if the
Promtail instance is restarted.
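
For illustration, the positions file is a small YAML map from tailed file paths
to read offsets; a sketch (the paths and offsets below are hypothetical) might
look like:

```yaml
positions:
  /var/log/syslog: "104857"
  /srv/log/someone_service/app.log: "2048"
```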

## API

Promtail features an embedded web server exposing a web console at `/` and the
following API endpoints:

### `GET /ready`

This endpoint returns 200 when Promtail is up and running, and there's at least
one working target.
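
For example, assuming the default listen port shown in the web server config
below, readiness could be checked with:

```bash
curl -i http://localhost:9080/ready
```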

### `GET /metrics`

This endpoint returns Promtail metrics for Prometheus. See
"[Operations > Observability](../../operations/observability.md)" to get a list
of exported metrics.

### Promtail web server config

The web server exposed by Promtail can be configured in the Promtail `.yaml`
config file:

```yaml
server:
  http_listen_host: 127.0.0.1
  http_listen_port: 9080
```

# Configuring Promtail

Promtail is configured in a YAML file (usually referred to as `config.yaml`)
which contains information on the Promtail server, where positions are stored,
and how to scrape logs from files.

* [Configuration File Reference](#configuration-file-reference)
* [server_config](#server_config)
* [client_config](#client_config)
* [position_config](#position_config)
* [scrape_config](#scrape_config)
  * [pipeline_stages](#pipeline_stages)
    * [regex_stage](#regex_stage)
    * [json_stage](#json_stage)
    * [template_stage](#template_stage)
    * [match_stage](#match_stage)
    * [timestamp_stage](#timestamp_stage)
    * [output_stage](#output_stage)
    * [labels_stage](#labels_stage)
    * [metrics_stage](#metrics_stage)
      * [metric_counter](#metric_counter)
      * [metric_gauge](#metric_gauge)
      * [metric_histogram](#metric_histogram)
  * [journal_config](#journal_config)
  * [relabel_config](#relabel_config)
  * [static_config](#static_config)
  * [file_sd_config](#file_sd_config)
  * [kubernetes_sd_config](#kubernetes_sd_config)
* [target_config](#target_config)
* [Example Docker Config](#example-docker-config)
* [Example Journal Config](#example-journal-config)

## Configuration File Reference

To specify which configuration file to load, pass the `-config.file` flag at
the command line. The file is written in
[YAML format](https://en.wikipedia.org/wiki/YAML), defined by the schema below.
Brackets indicate that a parameter is optional. For non-list parameters the
value is set to the specified default.

For more detailed information on configuring how to discover and scrape logs
from targets, see [Scraping](scraping.md). For more information on transforming
logs from scraped targets, see [Pipelines](pipelines.md).

Generic placeholders are defined as follows:

* `<boolean>`: a boolean that can take the values `true` or `false`
* `<int>`: any integer matching the regular expression `[1-9]+[0-9]*`
* `<duration>`: a duration matching the regular expression `[0-9]+(ms|[smhdwy])`
* `<labelname>`: a string matching the regular expression `[a-zA-Z_][a-zA-Z0-9_]*`
* `<labelvalue>`: a string of Unicode characters
* `<filename>`: a valid path relative to the current working directory or an
  absolute path
* `<host>`: a valid string consisting of a hostname or IP followed by an
  optional port number
* `<string>`: a regular string
* `<secret>`: a regular string that is a secret, such as a password

Supported contents and default values of `config.yaml`:

```yaml
# Configures the server for Promtail.
[server: <server_config>]

# Describes how Promtail connects to multiple instances
# of Loki, sending logs to each.
clients:
  - [<client_config>]

# Describes how to save read file offsets to disk.
[positions: <position_config>]

scrape_configs:
  - [<scrape_config>]

# Configures how tailed targets will be watched.
[target_config: <target_config>]
```

## server_config

The `server_config` block configures Promtail's behavior as an HTTP server:

```yaml
# HTTP server listen host
[http_listen_host: <string>]

# HTTP server listen port
[http_listen_port: <int> | default = 80]

# gRPC server listen host
[grpc_listen_host: <string>]

# gRPC server listen port
[grpc_listen_port: <int> | default = 9095]

# Register instrumentation handlers (/metrics, etc.)
[register_instrumentation: <boolean> | default = true]

# Timeout for graceful shutdowns
[graceful_shutdown_timeout: <duration> | default = 30s]

# Read timeout for HTTP server
[http_server_read_timeout: <duration> | default = 30s]

# Write timeout for HTTP server
[http_server_write_timeout: <duration> | default = 30s]

# Idle timeout for HTTP server
[http_server_idle_timeout: <duration> | default = 120s]

# Max gRPC message size that can be received
[grpc_server_max_recv_msg_size: <int> | default = 4194304]

# Max gRPC message size that can be sent
[grpc_server_max_send_msg_size: <int> | default = 4194304]

# Limit on the number of concurrent streams for gRPC calls (0 = unlimited)
[grpc_server_max_concurrent_streams: <int> | default = 100]

# Log only messages with the given severity or above. Supported values [debug,
# info, warn, error]
[log_level: <string> | default = "info"]

# Base path to serve all API routes from (e.g., /v1/).
[http_path_prefix: <string>]
```

## client_config

The `client_config` block configures how Promtail connects to an instance of
Loki:

```yaml
# The URL where Loki is listening, denoted in Loki as http_listen_host and
# http_listen_port. If Loki is running in microservices mode, this is the HTTP
# URL for the Distributor.
url: <string>

# Maximum amount of time to wait before sending a batch, even if that
# batch isn't full.
[batchwait: <duration> | default = 1s]

# Maximum batch size (in bytes) of logs to accumulate before sending
# the batch to Loki.
[batchsize: <int> | default = 102400]

# If using basic auth, configures the username and password
# sent.
basic_auth:
  # The username to use for basic auth
  [username: <string>]

  # The password to use for basic auth
  [password: <string>]

  # The file containing the password for basic auth
  [password_file: <filename>]

# Bearer token to send to the server.
[bearer_token: <secret>]

# File containing bearer token to send to the server.
[bearer_token_file: <filename>]

# HTTP proxy server to use to connect to the server.
[proxy_url: <string>]

# If connecting to a TLS server, configures how the TLS
# authentication handshake will operate.
tls_config:
  # The CA file to use to verify the server
  [ca_file: <string>]

  # The cert file to send to the server for client auth
  [cert_file: <filename>]

  # The key file to send to the server for client auth
  [key_file: <filename>]

  # Validates that the server name in the server's certificate
  # is this value.
  [server_name: <string>]

  # If true, ignores the server certificate being signed by an
  # unknown CA.
  [insecure_skip_verify: <boolean> | default = false]

# Configures how to retry requests to Loki when a request
# fails.
backoff_config:
  # Initial backoff time between retries
  [minbackoff: <duration> | default = 100ms]

  # Maximum backoff time between retries
  [maxbackoff: <duration> | default = 10s]

  # Maximum number of retries to do
  [maxretries: <int> | default = 10]

# Static labels to add to all logs being sent to Loki.
# Use a map like {"foo": "bar"} to add a label foo with
# value bar.
external_labels:
  [ <labelname>: <labelvalue> ... ]

# Maximum time to wait for a server to respond to a request
[timeout: <duration> | default = 10s]
```

## position_config

The `position_config` block configures where Promtail will save a file
indicating how far it has read into a file. It is needed when Promtail is
restarted, to allow it to continue from where it left off.

```yaml
# Location of positions file
[filename: <string> | default = "/var/log/positions.yaml"]

# How often to update the positions file
[sync_period: <duration> | default = 10s]
```

## scrape_config

The `scrape_config` block configures how Promtail can scrape logs from a series
of targets using a specified discovery method:

```yaml
# Name to identify this scrape config in the Promtail UI.
job_name: <string>

# Describes how to parse log lines. Supported values: [cri docker raw]
[entry_parser: <string> | default = "docker"]

# Describes how to transform logs from targets.
[pipeline_stages: <pipeline_stages>]

# Describes how to scrape logs from the journal.
[journal: <journal_config>]

# Describes how to relabel targets to determine if they should
# be processed.
relabel_configs:
  - [<relabel_config>]

# Static targets to scrape.
static_configs:
  - [<static_config>]

# Files containing targets to scrape.
file_sd_configs:
  - [<file_sd_config>]

# Describes how to discover Kubernetes services running on the
# same host.
kubernetes_sd_configs:
  - [<kubernetes_sd_config>]
```

### pipeline_stages

The [pipeline](./pipelines.md) stages (`pipeline_stages`) are used to transform
log entries and their labels after discovery. The field is simply an array of
the various stages, defined below.

The purpose of most stages is to extract fields and values into a temporary
set of key-value pairs that is passed around from stage to stage.

```yaml
- [
    <regex_stage> |
    <json_stage> |
    <template_stage> |
    <match_stage> |
    <timestamp_stage> |
    <output_stage> |
    <labels_stage> |
    <metrics_stage>
  ]
```

Example:

```yaml
pipeline_stages:
  - regex:
      expr: "./*"
  - json:
      timestamp:
        source: time
        format: RFC3339
      labels:
        stream:
          source: json_key_name.json_sub_key_name
      output:
```

#### regex_stage

The regex stage takes a regular expression and extracts captured named groups
to be used in further stages.

```yaml
regex:
  # The RE2 regular expression. Each capture group must be named.
  expression: <string>

  # Name from extracted data to parse. If empty, uses the log message.
  [source: <string>]
```
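
As an illustrative sketch, the following regex stage extracts a named `level`
capture group from lines such as `level=info msg=...` into the extracted data
(the log format is hypothetical):

```yaml
pipeline_stages:
  - regex:
      expression: ".*level=(?P<level>[a-zA-Z]+).*"
```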

#### json_stage

The JSON stage parses a log line as JSON and takes
[JMESPath](http://jmespath.org/) expressions to extract data from the JSON to
be used in further stages.

```yaml
json:
  # Set of key/value pairs of JMESPath expressions. The key will be
  # the key in the extracted data while the expression will be the value,
  # evaluated as a JMESPath from the source data.
  expressions:
    [ <string>: <string> ... ]

  # Name from extracted data to parse. If empty, uses the log message.
  [source: <string>]
```
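
For instance, a hedged sketch extracting a `level` field and a nested `user.id`
field from a JSON log line (the field names are hypothetical):

```yaml
pipeline_stages:
  - json:
      expressions:
        level: level
        user_id: user.id
```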

#### template_stage

The template stage uses Go's
[`text/template`](https://golang.org/pkg/text/template) language to manipulate
values.

```yaml
template:
  # Name from extracted data to parse. If the key in the extracted data
  # doesn't exist, an entry for it will be created.
  source: <string>

  # Go template string to use. In addition to normal template
  # functions, ToLower, ToUpper, Replace, Trim, TrimLeft, TrimRight,
  # TrimPrefix, TrimSuffix, and TrimSpace are available as functions.
  template: <string>
```

Example:

```yaml
template:
  source: level
  template: '{{ if eq .Value "WARN" }}{{ Replace .Value "WARN" "OK" -1 }}{{ else }}{{ .Value }}{{ end }}'
```

#### match_stage

The match stage conditionally executes a set of stages when a log entry matches
a configurable [LogQL](../../logql.md) stream selector.

```yaml
match:
  # LogQL stream selector.
  selector: <string>

  # Names the pipeline. When defined, creates an additional label in
  # the pipeline_duration_seconds histogram, where the value is
  # concatenated with job_name using an underscore.
  [pipeline_name: <string>]

  # Nested set of pipeline stages, run only if the selector
  # matches the labels of the log entries:
  stages:
    - [
        <regex_stage> |
        <json_stage> |
        <template_stage> |
        <match_stage> |
        <timestamp_stage> |
        <output_stage> |
        <labels_stage> |
        <metrics_stage>
      ]
```
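
A minimal sketch, assuming the scraped target carries an `app` label, that only
parses lines from targets labeled `app="nginx"` (the label and regex are
illustrative):

```yaml
pipeline_stages:
  - match:
      selector: '{app="nginx"}'
      stages:
        - regex:
            expression: ".* (?P<status>\\d{3}) .*"
```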

#### timestamp_stage

The timestamp stage parses data from the extracted map and overrides the final
time value of the log that is stored by Loki. If this stage isn't present,
Promtail will associate the timestamp of the log entry with the time that the
log entry was read.

```yaml
timestamp:
  # Name from extracted data to use for the timestamp.
  source: <string>

  # Determines how to parse the time string. Can use
  # pre-defined formats by name: [ANSIC UnixDate RubyDate RFC822
  # RFC822Z RFC850 RFC1123 RFC1123Z RFC3339 RFC3339Nano Unix
  # UnixMs UnixNs].
  format: <string>

  # IANA Timezone Database string.
  [location: <string>]
```
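
For example, taking a `time` field previously extracted by a JSON or regex
stage and parsing it as RFC3339 (the field name is an assumption):

```yaml
pipeline_stages:
  - timestamp:
      source: time
      format: RFC3339
```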

#### output_stage

The output stage takes data from the extracted map and sets the contents of the
log entry that will be stored by Loki.

```yaml
output:
  # Name from extracted data to use for the log entry.
  source: <string>
```
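
A sketch that replaces the stored log line with a previously extracted
`message` field (the extracted key is hypothetical):

```yaml
pipeline_stages:
  - json:
      expressions:
        message: message
  - output:
      source: message
```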

#### labels_stage

The labels stage takes data from the extracted map and sets additional labels
on the log entry that will be sent to Loki.

```yaml
labels:
  # Key is REQUIRED and the name for the label that will be created.
  # Value is optional and will be the name from extracted data whose value
  # will be used for the value of the label. If empty, the value will be
  # inferred to be the same as the key.
  [ <string>: [<string>] ... ]
```
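
For instance, promoting an extracted `level` value to a label of the same name:

```yaml
pipeline_stages:
  - labels:
      level:
```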

#### metrics_stage

The metrics stage allows for defining metrics from the extracted data.

Created metrics are not pushed to Loki and are instead exposed via Promtail's
`/metrics` endpoint. Prometheus should be configured to scrape Promtail to be
able to retrieve the metrics configured by this stage.

```yaml
# A map where the key is the name of the metric and the value is a specific
# metric type.
metrics:
  [<string>: [ <metric_counter> | <metric_gauge> | <metric_histogram> ] ...]
```

##### metric_counter

Defines a counter metric whose value only goes up.

```yaml
# The metric type. Must be Counter.
type: Counter

# Describes the metric.
[description: <string>]

# Key from the extracted data map to use for the metric,
# defaulting to the metric's name if not present.
[source: <string>]

config:
  # Filters down source data and only changes the metric
  # if the targeted value exactly matches the provided string.
  # If not present, all data will match.
  [value: <string>]

  # Must be either "inc" or "add" (case insensitive). If
  # inc is chosen, the metric value will increase by 1 for each
  # log line received that passed the filter. If add is chosen,
  # the extracted value must be convertible to a positive float
  # and its value will be added to the metric.
  action: <string>
```
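
A hedged sketch of counting log lines whose extracted `level` equals `error`
(the metric and field names are chosen for illustration):

```yaml
pipeline_stages:
  - regex:
      expression: ".*level=(?P<level>[a-zA-Z]+).*"
  - metrics:
      error_lines_total:
        type: Counter
        description: "total number of error-level lines"
        source: level
        config:
          value: error
          action: inc
```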

##### metric_gauge

Defines a gauge metric whose value can go up or down.

```yaml
# The metric type. Must be Gauge.
type: Gauge

# Describes the metric.
[description: <string>]

# Key from the extracted data map to use for the metric,
# defaulting to the metric's name if not present.
[source: <string>]

config:
  # Filters down source data and only changes the metric
  # if the targeted value exactly matches the provided string.
  # If not present, all data will match.
  [value: <string>]

  # Must be either "set", "inc", "dec", "add", or "sub". If
  # add, set, or sub is chosen, the extracted value must be
  # convertible to a positive float. inc and dec will increment
  # or decrement the metric's value by 1 respectively.
  action: <string>
```

##### metric_histogram

Defines a histogram metric whose values are bucketed.

```yaml
# The metric type. Must be Histogram.
type: Histogram

# Describes the metric.
[description: <string>]

# Key from the extracted data map to use for the metric,
# defaulting to the metric's name if not present.
[source: <string>]

config:
  # Filters down source data and only changes the metric
  # if the targeted value exactly matches the provided string.
  # If not present, all data will match.
  [value: <string>]

  # Must be either "inc" or "add" (case insensitive). If
  # inc is chosen, the metric value will increase by 1 for each
  # log line received that passed the filter. If add is chosen,
  # the extracted value must be convertible to a positive float
  # and its value will be added to the metric.
  action: <string>

  # Holds all the numbers in which to bucket the metric.
  buckets:
    - <int>
```

### journal_config

The `journal_config` block configures how Promtail reads from the systemd
journal. It requires a build of Promtail that has journal support _enabled_. If
using the AMD64 Docker image, this is enabled by default.

```yaml
# The oldest relative time from process start that will be read
# and sent to Loki.
[max_age: <duration> | default = 7h]

# Label map to add to every log coming out of the journal.
labels:
  [ <labelname>: <labelvalue> ... ]

# Path to a directory to read entries from. Defaults to the system
# path when empty.
[path: <string>]
```

### relabel_config

Relabeling is a powerful tool to dynamically rewrite the label set of a target
before it gets scraped. Multiple relabeling steps can be configured per scrape
configuration. They are applied to the label set of each target in order of
their appearance in the configuration file.

After relabeling, the `instance` label is set to the value of `__address__` by
default if it was not set during relabeling. The `__scheme__` and
`__metrics_path__` labels are set to the scheme and metrics path of the target
respectively. The `__param_<name>` label is set to the value of the first
passed URL parameter called `<name>`.

Additional labels prefixed with `__meta_` may be available during the
relabeling phase. They are set by the service discovery mechanism that provided
the target and vary between mechanisms.

Labels starting with `__` will be removed from the label set after target
relabeling is completed.

If a relabeling step needs to store a label value only temporarily (as the
input to a subsequent relabeling step), use the `__tmp` label name prefix. This
prefix is guaranteed to never be used by Prometheus itself.

```yaml
# The source labels select values from existing labels. Their content is
# concatenated using the configured separator and matched against the
# configured regular expression for the replace, keep, and drop actions.
[ source_labels: '[' <labelname> [, ...] ']' ]

# Separator placed between concatenated source label values.
[ separator: <string> | default = ; ]

# Label to which the resulting value is written in a replace action.
# It is mandatory for replace actions. Regex capture groups are available.
[ target_label: <labelname> ]

# Regular expression against which the extracted value is matched.
[ regex: <regex> | default = (.*) ]

# Modulus to take of the hash of the source label values.
[ modulus: <uint64> ]

# Replacement value against which a regex replace is performed if the
# regular expression matches. Regex capture groups are available.
[ replacement: <string> | default = $1 ]

# Action to perform based on regex matching.
[ action: <relabel_action> | default = replace ]
```

`<regex>` is any valid
[RE2 regular expression](https://github.com/google/re2/wiki/Syntax). It is
required for the `replace`, `keep`, `drop`, `labelmap`, `labeldrop`, and
`labelkeep` actions. The regex is anchored on both ends. To un-anchor the
regex, use `.*<regex>.*`.

`<relabel_action>` determines the relabeling action to take:

* `replace`: Match `regex` against the concatenated `source_labels`. Then, set
  `target_label` to `replacement`, with match group references
  (`${1}`, `${2}`, ...) in `replacement` substituted by their value. If `regex`
  does not match, no replacement takes place.
* `keep`: Drop targets for which `regex` does not match the concatenated
  `source_labels`.
* `drop`: Drop targets for which `regex` matches the concatenated
  `source_labels`.
* `hashmod`: Set `target_label` to the `modulus` of a hash of the concatenated
  `source_labels`.
* `labelmap`: Match `regex` against all label names. Then copy the values of
  the matching labels to label names given by `replacement` with match group
  references (`${1}`, `${2}`, ...) in `replacement` substituted by their value.
* `labeldrop`: Match `regex` against all label names. Any label that matches
  will be removed from the set of labels.
* `labelkeep`: Match `regex` against all label names. Any label that does not
  match will be removed from the set of labels.

Care must be taken with `labeldrop` and `labelkeep` to ensure that logs are
still uniquely labeled once the labels are removed.
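
As a short sketch of the mechanics, the step below copies a discovered
`__meta_kubernetes_pod_label_app` value into an `app` label (this meta label is
one of those documented under [kubernetes_sd_config](#kubernetes_sd_config)):

```yaml
relabel_configs:
  - source_labels: ['__meta_kubernetes_pod_label_app']
    target_label: 'app'
    action: replace
```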

### static_config

A `static_config` allows specifying a list of targets and a common label set
for them. It is the canonical way to specify static targets in a scrape
configuration.

```yaml
# Configures the discovery to look on the current machine. Must be either
# localhost or the hostname of the current computer.
targets:
  - localhost

# Defines a file to scrape and an optional set of additional labels to apply
# to all streams defined by the files from __path__.
labels:
  # The path to load logs from. Can use glob patterns (e.g., /var/log/*.log).
  __path__: <string>

  # Additional labels to assign to the logs
  [ <labelname>: <labelvalue> ... ]
```

### file_sd_config

File-based service discovery provides a more generic way to configure static
targets and serves as an interface to plug in custom service discovery
mechanisms.

It reads a set of files containing a list of zero or more `<static_config>`s.
Changes to all defined files are detected via disk watches and applied
immediately. Files may be provided in YAML or JSON format. Only changes
resulting in well-formed target groups are applied.

The JSON file must contain a list of static configs, using this format:

```json
[
  {
    "targets": [ "localhost" ],
    "labels": {
      "__path__": "<string>", ...
      "<labelname>": "<labelvalue>", ...
    }
  },
  ...
]
```

As a fallback, the file contents are also re-read periodically at the specified
refresh interval.

Each target has a meta label `__meta_filepath` during the
[relabeling phase](#relabel_config). Its value is set to the
filepath from which the target was extracted.

```yaml
# Patterns for files from which target groups are extracted.
files:
  [ - <filename_pattern> ... ]

# Refresh interval to re-read the files.
[ refresh_interval: <duration> | default = 5m ]
```

Where `<filename_pattern>` may be a path ending in `.json`, `.yml` or `.yaml`.
The last path segment may contain a single `*` that matches any character
sequence, e.g. `my/path/tg_*.json`.
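
Sketching how the pieces fit together, a hypothetical scrape config could watch
a directory of target files like this (the job name and paths are illustrative):

```yaml
scrape_configs:
  - job_name: file-discovery
    file_sd_configs:
      - files:
          - /etc/promtail/targets/*.yaml
        refresh_interval: 5m
```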

### kubernetes_sd_config

Kubernetes SD configurations allow retrieving scrape targets from
[Kubernetes'](https://kubernetes.io/) REST API and always staying synchronized
with the cluster state.

One of the following `role` types can be configured to discover targets:

#### `node`

The `node` role discovers one target per cluster node with the address
defaulting to the Kubelet's HTTP port.

The target address defaults to the first existing address of the Kubernetes
node object in the address type order of `NodeInternalIP`, `NodeExternalIP`,
`NodeLegacyHostIP`, and `NodeHostName`.

Available meta labels:

* `__meta_kubernetes_node_name`: The name of the node object.
* `__meta_kubernetes_node_label_<labelname>`: Each label from the node object.
* `__meta_kubernetes_node_labelpresent_<labelname>`: `true` for each label from the node object.
* `__meta_kubernetes_node_annotation_<annotationname>`: Each annotation from the node object.
* `__meta_kubernetes_node_annotationpresent_<annotationname>`: `true` for each annotation from the node object.
* `__meta_kubernetes_node_address_<address_type>`: The first address for each node address type, if it exists.

In addition, the `instance` label for the node will be set to the node name
as retrieved from the API server.

#### `service`

The `service` role discovers a target for each service port of each service.
This is generally useful for blackbox monitoring of a service.
The address will be set to the Kubernetes DNS name of the service and the
respective service port.

Available meta labels:

* `__meta_kubernetes_namespace`: The namespace of the service object.
* `__meta_kubernetes_service_annotation_<annotationname>`: Each annotation from the service object.
* `__meta_kubernetes_service_annotationpresent_<annotationname>`: `true` for each annotation of the service object.
* `__meta_kubernetes_service_cluster_ip`: The cluster IP address of the service. (Does not apply to services of type ExternalName)
* `__meta_kubernetes_service_external_name`: The DNS name of the service. (Applies to services of type ExternalName)
* `__meta_kubernetes_service_label_<labelname>`: Each label from the service object.
* `__meta_kubernetes_service_labelpresent_<labelname>`: `true` for each label of the service object.
* `__meta_kubernetes_service_name`: The name of the service object.
* `__meta_kubernetes_service_port_name`: Name of the service port for the target.
* `__meta_kubernetes_service_port_protocol`: Protocol of the service port for the target.

#### `pod`

The `pod` role discovers all pods and exposes their containers as targets. For
each declared port of a container, a single target is generated. If a container
has no specified ports, a port-free target per container is created for
manually adding a port via relabeling.

Available meta labels:

* `__meta_kubernetes_namespace`: The namespace of the pod object.
* `__meta_kubernetes_pod_name`: The name of the pod object.
* `__meta_kubernetes_pod_ip`: The pod IP of the pod object.
* `__meta_kubernetes_pod_label_<labelname>`: Each label from the pod object.
* `__meta_kubernetes_pod_labelpresent_<labelname>`: `true` for each label from the pod object.
* `__meta_kubernetes_pod_annotation_<annotationname>`: Each annotation from the pod object.
* `__meta_kubernetes_pod_annotationpresent_<annotationname>`: `true` for each annotation from the pod object.
* `__meta_kubernetes_pod_container_init`: `true` if the container is an [InitContainer](https://kubernetes.io/docs/concepts/workloads/pods/init-containers/)
* `__meta_kubernetes_pod_container_name`: Name of the container the target address points to.
* `__meta_kubernetes_pod_container_port_name`: Name of the container port.
* `__meta_kubernetes_pod_container_port_number`: Number of the container port.
* `__meta_kubernetes_pod_container_port_protocol`: Protocol of the container port.
* `__meta_kubernetes_pod_ready`: Set to `true` or `false` for the pod's ready state.
* `__meta_kubernetes_pod_phase`: Set to `Pending`, `Running`, `Succeeded`, `Failed` or `Unknown`
  in the [lifecycle](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase).
* `__meta_kubernetes_pod_node_name`: The name of the node the pod is scheduled onto.
* `__meta_kubernetes_pod_host_ip`: The current host IP of the pod object.
* `__meta_kubernetes_pod_uid`: The UID of the pod object.
* `__meta_kubernetes_pod_controller_kind`: Object kind of the pod controller.
* `__meta_kubernetes_pod_controller_name`: Name of the pod controller.

#### `endpoints`

The `endpoints` role discovers targets from listed endpoints of a service. For
each endpoint address one target is discovered per port. If the endpoint is
backed by a pod, all additional container ports of the pod, not bound to an
endpoint port, are discovered as targets as well.

Available meta labels:

* `__meta_kubernetes_namespace`: The namespace of the endpoints object.
* `__meta_kubernetes_endpoints_name`: The names of the endpoints object.
* For all targets discovered directly from the endpoints list (those not additionally inferred
  from underlying pods), the following labels are attached:
  * `__meta_kubernetes_endpoint_hostname`: Hostname of the endpoint.
  * `__meta_kubernetes_endpoint_node_name`: Name of the node hosting the endpoint.
  * `__meta_kubernetes_endpoint_ready`: Set to `true` or `false` for the endpoint's ready state.
  * `__meta_kubernetes_endpoint_port_name`: Name of the endpoint port.
  * `__meta_kubernetes_endpoint_port_protocol`: Protocol of the endpoint port.
  * `__meta_kubernetes_endpoint_address_target_kind`: Kind of the endpoint address target.
  * `__meta_kubernetes_endpoint_address_target_name`: Name of the endpoint address target.
* If the endpoints belong to a service, all labels of the `role: service` discovery are attached.
* For all targets backed by a pod, all labels of the `role: pod` discovery are attached.

#### `ingress`

The `ingress` role discovers a target for each path of each ingress.
This is generally useful for blackbox monitoring of an ingress.
The address will be set to the host specified in the ingress spec.

Available meta labels:

* `__meta_kubernetes_namespace`: The namespace of the ingress object.
* `__meta_kubernetes_ingress_name`: The name of the ingress object.
* `__meta_kubernetes_ingress_label_<labelname>`: Each label from the ingress object.
* `__meta_kubernetes_ingress_labelpresent_<labelname>`: `true` for each label from the ingress object.
* `__meta_kubernetes_ingress_annotation_<annotationname>`: Each annotation from the ingress object.
* `__meta_kubernetes_ingress_annotationpresent_<annotationname>`: `true` for each annotation from the ingress object.
* `__meta_kubernetes_ingress_scheme`: Protocol scheme of ingress, `https` if TLS
  config is set. Defaults to `http`.
* `__meta_kubernetes_ingress_path`: Path from ingress spec. Defaults to `/`.

See below for the configuration options for Kubernetes discovery:

```yaml
# The information to access the Kubernetes API.

# The API server addresses. If left empty, Prometheus is assumed to run inside
# of the cluster and will discover API servers automatically and use the pod's
# CA certificate and bearer token file at
# /var/run/secrets/kubernetes.io/serviceaccount/.
[ api_server: <host> ]

# The Kubernetes role of entities that should be discovered.
role: <role>

# Optional authentication information used to authenticate to the API server.
# Note that `basic_auth`, `bearer_token` and `bearer_token_file` options are
# mutually exclusive.
# password and password_file are mutually exclusive.

# Optional HTTP basic authentication information.
basic_auth:
  [ username: <string> ]
  [ password: <secret> ]
  [ password_file: <string> ]

# Optional bearer token authentication information.
[ bearer_token: <secret> ]

# Optional bearer token file authentication information.
[ bearer_token_file: <filename> ]

# Optional proxy URL.
[ proxy_url: <string> ]

# TLS configuration.
tls_config:
  [ <tls_config> ]

# Optional namespace discovery. If omitted, all namespaces are used.
namespaces:
  names:
    [ - <string> ]
```

Where `<role>` must be `endpoints`, `service`, `pod`, `node`, or
`ingress`.

See
[this example Prometheus configuration file](/documentation/examples/prometheus-kubernetes.yml)
for a detailed example of configuring Prometheus for Kubernetes.

You may wish to check out the 3rd-party
[Prometheus Operator](https://github.com/coreos/prometheus-operator),
which automates the Prometheus setup on top of Kubernetes.

## target_config

The `target_config` block controls the behavior of reading files from
discovered targets.

```yaml
# Period to resync directories being watched and files being tailed to discover
# new ones or stop watching removed ones.
sync_period: "10s"
```

## Example Docker Config

```yaml
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://ip_or_hostname_where_loki_runs:3100/loki/api/v1/push

scrape_configs:
  - job_name: system
    pipeline_stages:
      - docker:
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          host: yourhost
          __path__: /var/log/*.log

  - job_name: someone_service
    pipeline_stages:
      - docker:
    static_configs:
      - targets:
          - localhost
        labels:
          job: someone_service
          host: yourhost
          __path__: /srv/log/someone_service/*.log
```

## Example Journal Config

```yaml
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://ip_or_hostname_where_loki_runs:3100/loki/api/v1/push

scrape_configs:
  - job_name: journal
    journal:
      max_age: 12h
      path: /var/log/journal
      labels:
        job: systemd-journal
    relabel_configs:
      - source_labels: ['__journal__systemd_unit']
        target_label: 'unit'
```

# Pipelines

A detailed look at how to set up Promtail to process your log lines, including
extracting metrics and labels.

## Pipeline

A pipeline is used to transform a single log line, its labels, and its
timestamp. A pipeline is made up of a set of **stages**. There are 4 types of
stages:

1. **Parsing stages** parse the current log line and extract data out of it.
   The extracted data is then available for use by other stages.
2. **Transform stages** transform extracted data from previous stages.
3. **Action stages** take extracted data from previous stages and do something
   with it. Actions can:
   1. Add or modify labels on the log line
   2. Change the timestamp of the log line
   3. Change the content of the log line
   4. Create a metric based on the extracted data
4. **Filtering stages** optionally apply a subset of stages based on some
   condition.

Typical pipelines will start with a parsing stage (such as a
[regex](./stages/regex.md) or [json](./stages/json.md) stage) to extract data
from the log line. Then, a series of action stages will be present to do
something with that extracted data. The most common action stage will be a
[labels](./stages/labels.md) stage to turn extracted data into a label.

A common stage will also be the [match](./stages/match.md) stage to selectively
apply stages based on the current labels.

Note that pipelines cannot currently be used to deduplicate logs; Loki will
receive the same log line multiple times if, for example:

1. Two scrape configs read from the same file
2. Duplicate log lines in a file are sent through a pipeline. Deduplication is
   not done.

However, Loki will perform some deduplication at query time for logs that have
the exact same nanosecond timestamp, labels, and log contents.

This documented example gives a good glimpse of what you can achieve with a
pipeline:

```yaml
scrape_configs:
  - job_name: kubernetes-pods-name
    kubernetes_sd_configs: ....
    pipeline_stages:

      # This stage is only going to run if the scraped target has a label
      # of "name" with value "promtail".
      - match:
          selector: '{name="promtail"}'
          stages:
            # The regex stage parses out a level, timestamp, and component. At
            # the end of the stage, the values for level, timestamp, and
            # component are only set internally for the pipeline. Future stages
            # can use these values and decide what to do with them.
            - regex:
                expression: '.*level=(?P<level>[a-zA-Z]+).*ts=(?P<timestamp>[T\d-:.Z]*).*component=(?P<component>[a-zA-Z]+)'

            # The labels stage takes the level and component entries from the
            # previous regex stage and promotes them to labels. For example,
            # level=error may be a label added by this stage.
            - labels:
                level:
                component:

            # Finally, the timestamp stage takes the timestamp extracted from
            # the regex stage and promotes it to be the new timestamp of the
            # log entry, parsing it as an RFC3339Nano-formatted value.
            - timestamp:
                format: RFC3339Nano
                source: timestamp

      # This stage is only going to run if the scraped target has a label of
      # "name" with a value of "nginx".
      - match:
          selector: '{name="nginx"}'
          stages:
            # This regex stage extracts a new output by matching against some
            # values and capturing the rest.
            - regex:
                expression: \w{1,3}.\w{1,3}.\w{1,3}.\w{1,3}(?P<output>.*)

            # The output stage changes the content of the captured log line by
            # setting it to the value of output from the regex stage.
            - output:
                source: output

      # This stage is only going to run if the scraped target has a label of
      # "name" with a value of "jaeger-agent".
      - match:
          selector: '{name="jaeger-agent"}'
          stages:
            # The JSON stage reads the log line as a JSON string and extracts
            # the "level" field from the object for use in further stages.
            - json:
                expressions:
                  level: level

            # The labels stage pulls the value from "level" that was extracted
            # from the previous stage and promotes it to a label.
            - labels:
                level:

  - job_name: kubernetes-pods-app
    kubernetes_sd_configs: ....
    pipeline_stages:

      # This stage will only run if the scraped target has a label of "app"
      # with a value of either "grafana" or "prometheus".
      - match:
          selector: '{app=~"grafana|prometheus"}'
          stages:
            # The regex stage will extract a level and component for use in
            # further stages, allowing the level to be defined as either
            # lvl=<level> or level=<level> and the component to be defined as
            # either logger=<component> or component=<component>.
            - regex:
                expression: ".*(lvl|level)=(?P<level>[a-zA-Z]+).*(logger|component)=(?P<component>[a-zA-Z]+)"

            # The labels stage then promotes the level and component extracted
            # from the regex stage to labels.
            - labels:
                level:
                component:

      # This stage will only run if the scraped target has a label of "app"
      # with a value of "some-app".
      - match:
          selector: '{app="some-app"}'
          stages:
            # The regex stage tries to extract a Go panic by looking for
            # "panic:" in the log message.
            - regex:
                expression: ".*(?P<panic>panic: .*)"

            # The metrics stage is going to increment a panic_total metric
            # counter which Promtail exposes. The counter is only incremented
            # when panic was extracted from the regex stage.
            - metrics:
                panic_total:
                  type: Counter
                  description: "total count of panics"
                  source: panic
                  config:
                    action: inc
```
||||
|
||||
### Data Accessible to Stages |
||||
|
||||
The following sections further describe the types that are accessible to each |
||||
stage (although not all may be used): |
||||
|
||||
#### Label Set
||||
|
||||
The current set of labels for the log line. Initialized to be the set of labels |
||||
that were scraped along with the log line. The label set is only modified by an |
||||
action stage, but filtering stages read from it. |
||||
|
||||
The final label set will be indexed by Loki and can be used for queries.
||||
|
||||
#### Extracted Map
||||
|
||||
A collection of key-value pairs extracted during a parsing stage. Subsequent |
||||
stages operate on the extracted map, either transforming them or taking action |
||||
with them. At the end of a pipeline, the extracted map is discarded; for a |
||||
parsing stage to be useful, it must always be paired with at least one action |
||||
stage. |
||||
|
||||
#### Log Timestamp
||||
|
||||
The current timestamp for the log line. Action stages can modify this value. |
||||
If left unset, it defaults to the time when the log was scraped. |
||||
|
||||
The final value for the timestamp is sent to Loki. |
||||
|
||||
#### Log Line
||||
|
||||
The current log line, represented as text. Initialized to be the text that |
||||
promtail scraped. Action stages can modify this value. |
||||
|
||||
The final value for the log line is sent to Loki as the text content for the |
||||
given log entry. |
||||
|
||||
## Stages |
||||
|
||||
Parsing stages: |
||||
|
||||
* [regex](./stages/regex.md): Extract data using a regular expression. |
||||
* [json](./stages/json.md): Extract data by parsing the log line as JSON. |
||||
|
||||
Transform stages: |
||||
|
||||
* [template](./stages/template.md): Use Go templates to modify extracted data. |
||||
|
||||
Action stages: |
||||
|
||||
* [timestamp](./stages/timestamp.md): Set the timestamp value for the log entry. |
||||
* [output](./stages/output.md): Set the log line text. |
||||
* [labels](./stages/labels.md): Update the label set for the log entry. |
||||
* [metrics](./stages/metrics.md): Calculate metrics based on extracted data. |
||||
|
||||
Filtering stages: |
||||
|
||||
* [match](./stages/match.md): Conditionally run stages based on the label set. |
||||
|
||||
@ -0,0 +1,143 @@ |
||||
# Promtail Scraping (Service Discovery) |
||||
|
||||
## File Target Discovery |
||||
|
||||
Promtail discovers locations of log files and extracts labels from them through
||||
the `scrape_configs` section in the config YAML. The syntax is identical to what |
||||
[Prometheus uses](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config). |
||||
|
||||
`scrape_configs` contains one or more entries which are executed for each |
||||
discovered target (i.e., each container in each new pod running in the |
||||
instance): |
||||
|
||||
```yaml
||||
scrape_configs: |
||||
- job_name: local |
||||
static_configs: |
||||
- ... |
||||
|
||||
- job_name: kubernetes |
||||
kubernetes_sd_configs:
||||
- ... |
||||
``` |
||||
|
||||
If more than one scrape config section matches your logs, you will get duplicate
entries, as the logs are sent in different streams, likely with slightly
different labels.
||||
|
||||
There are different types of labels present in Promtail: |
||||
|
||||
* Labels starting with `__` (two underscores) are internal labels. They usually |
||||
come from dynamic sources like service discovery. Once relabeling is done, |
||||
they are removed from the label set. To persist internal labels so they're |
||||
sent to Loki, rename them so they don't start with `__`. See |
||||
[Relabeling](#relabeling) for more information. |
||||
|
||||
* Labels starting with `__meta_kubernetes_pod_label_*` are "meta labels" which |
||||
are generated based on your Kubernetes pod's labels. |
||||
|
||||
For example, if your Kubernetes pod has a label `name` set to `foobar`, then |
||||
the `scrape_configs` section will receive an internal label |
||||
`__meta_kubernetes_pod_label_name` with a value set to `foobar`. |
||||
|
||||
* Other labels starting with `__meta_kubernetes_*` exist based on other |
||||
Kubernetes metadata, such as the namespace of the pod |
||||
(`__meta_kubernetes_namespace`) or the name of the container inside the pod |
||||
(`__meta_kubernetes_pod_container_name`). Refer to |
||||
[the Prometheus docs](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config) |
||||
for the full list of Kubernetes meta labels. |
||||
|
||||
* The `__path__` label is a special label which Promtail uses after discovery to |
||||
figure out where the file to read is located. Wildcards are allowed. |
||||
|
||||
* The label `filename` is added for every file found in `__path__` to ensure the |
||||
uniqueness of the streams. It is set to the absolute path of the file the line |
||||
was read from. |
||||
|
||||
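For instance, here is a minimal sketch of a `static_configs` entry that uses
`__path__` with a wildcard to tail every `.log` file under `/var/log` (the
`job` label value is an illustrative choice):

```yaml
scrape_configs:
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          __path__: /var/log/*.log
```
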
### Kubernetes Discovery |
||||
|
||||
Note that while Promtail can utilize the Kubernetes API to discover pods as |
||||
targets, it can only read log files from pods that are running on the same node |
||||
as the one Promtail is running on. Promtail looks for a `__host__` label on |
||||
each target and validates that it is set to the same hostname as Promtail's |
||||
(using either `$HOSTNAME` or the hostname reported by the kernel if the |
||||
environment variable is not set). |
||||
|
||||
This means that any time Kubernetes service discovery is used, there must be a |
||||
`relabel_config` that creates the intermediate label `__host__` from |
||||
`__meta_kubernetes_pod_node_name`: |
||||
|
||||
```yaml |
||||
relabel_configs: |
||||
- source_labels: ['__meta_kubernetes_pod_node_name'] |
||||
target_label: '__host__' |
||||
``` |
||||
|
||||
See [Relabeling](#relabeling) for more information. |
||||
|
||||
## Relabeling |
||||
|
||||
Each `scrape_configs` entry can contain a `relabel_configs` stanza. |
||||
`relabel_configs` is a list of operations to transform the labels from discovery |
||||
into another form. |
||||
|
||||
A single entry in `relabel_configs` can also reject targets by using
`action: drop` when a label value matches a specified regex. If a target is
dropped, the owning `scrape_config` will not process logs from that particular
source.
||||
Other `scrape_configs` without the drop action reading from the same target |
||||
may still use and forward logs from it to Loki. |
||||
|
||||
A common use case of `relabel_configs` is to transform an internal label such |
||||
as `__meta_kubernetes_*` into an intermediate internal label such as |
||||
`__service__`. The intermediate internal label may then be dropped based on its
value or transformed to a final external label, such as `job`; the last example
below sketches this full chain.
||||
|
||||
### Examples |
||||
|
||||
* Drop the target if a label (`__service__` in the example) is empty: |
||||
```yaml |
||||
- action: drop |
||||
regex: ^$ |
||||
source_labels: |
||||
- __service__ |
||||
``` |
||||
* Drop the target if any of the `source_labels` contain a value: |
||||
```yaml |
||||
- action: drop |
||||
regex: .+ |
||||
separator: '' |
||||
source_labels: |
||||
- __meta_kubernetes_pod_label_name |
||||
- __meta_kubernetes_pod_label_app |
||||
``` |
||||
* Persist an internal label by renaming it so it will be sent to Loki: |
||||
```yaml |
||||
- action: replace |
||||
source_labels: |
||||
- __meta_kubernetes_namespace |
||||
target_label: namespace |
||||
``` |
||||
* Persist all Kubernetes pod labels by mapping them, for example mapping
  `__meta_kubernetes_pod_label_foo` to `foo`.
||||
```yaml |
||||
- action: labelmap |
||||
regex: __meta_kubernetes_pod_label_(.+) |
||||
``` |
||||
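* Chain a discovery label into a final label through an intermediate internal
  label, as described above (a sketch; the `name` pod label and the `job`
  target label are illustrative choices):
  ```yaml
  - action: replace
    source_labels:
      - __meta_kubernetes_pod_label_name
    target_label: __service__
  - action: drop
    regex: ^$
    source_labels:
      - __service__
  - action: replace
    source_labels:
      - __service__
    target_label: job
  ```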
|
||||
Additional reading: |
||||
|
||||
* [Julien Pivotto's slides from PromConf Munich, 2017](https://www.slideshare.net/roidelapluie/taking-advantage-of-prometheus-relabeling-109483749) |
||||
|
||||
## HTTP client options |
||||
|
||||
Promtail uses the Prometheus HTTP client implementation for all calls to Loki. |
||||
Therefore it can be configured using the `clients` stanza, where one or more |
||||
connections to Loki can be established: |
||||
|
||||
```yaml |
||||
clients: |
||||
- [ <client_option> ] |
||||
``` |
||||
|
||||
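For example, a hypothetical setup that pushes every log entry to two Loki
instances might look like this (the hostnames are illustrative):

```yaml
clients:
  - url: http://loki-primary:3100/loki/api/v1/push
  - url: http://loki-secondary:3100/loki/api/v1/push
```
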
Refer to [`client_config`](./configuration.md#client_config) from the Promtail |
||||
Configuration reference for all available options. |
||||
@ -0,0 +1,25 @@ |
||||
# Stages |
||||
|
||||
This section is a collection of all stages Promtail supports in a |
||||
[Pipeline](../pipelines.md).
||||
|
||||
Parsing stages: |
||||
|
||||
* [regex](./regex.md): Extract data using a regular expression. |
||||
* [json](./json.md): Extract data by parsing the log line as JSON. |
||||
|
||||
Transform stages: |
||||
|
||||
* [template](./template.md): Use Go templates to modify extracted data. |
||||
|
||||
Action stages: |
||||
|
||||
* [timestamp](./timestamp.md): Set the timestamp value for the log entry. |
||||
* [output](./output.md): Set the log line text. |
||||
* [labels](./labels.md): Update the label set for the log entry. |
||||
* [metrics](./metrics.md): Calculate metrics based on extracted data. |
||||
|
||||
Filtering stages: |
||||
|
||||
* [match](./match.md): Conditionally run stages based on the label set. |
||||
|
||||
@ -0,0 +1,91 @@ |
||||
# `json` stage |
||||
|
||||
The `json` stage is a parsing stage that reads the log line as JSON and accepts |
||||
[JMESPath](http://jmespath.org/) expressions to extract data. |
||||
|
||||
## Schema |
||||
|
||||
```yaml |
||||
json: |
||||
# Set of key/value pairs of JMESPath expressions. The key will be |
||||
# the key in the extracted data, while the expression will be the value,
||||
# evaluated as a JMESPath from the source data. |
||||
expressions: |
||||
[ <string>: <string> ... ] |
||||
|
||||
# Name from extracted data to parse. If empty, uses the log message. |
||||
[source: <string>] |
||||
``` |
||||
|
||||
This stage uses the Go JSON unmarshaler, which means non-string types like |
||||
numbers or booleans will be unmarshaled into those types. The extracted data |
||||
can hold non-string values and this stage does not do any type conversions; |
||||
downstream stages will need to perform correct type conversion of these values |
||||
as necessary. Refer to [the `template` stage](./template.md) for how
||||
to do this. |
||||
|
||||
If the value extracted is a complex type, such as an array or a JSON object, it |
||||
will be converted back into a JSON string before being inserted into the |
||||
extracted data. |
||||
|
||||
## Examples |
||||
|
||||
### Using log line |
||||
|
||||
For the given pipeline: |
||||
|
||||
```yaml |
||||
- json: |
||||
expressions: |
||||
output: log |
||||
stream: stream |
||||
timestamp: time |
||||
``` |
||||
|
||||
Given the following log line: |
||||
|
||||
``` |
||||
{"log":"log message\n","stream":"stderr","time":"2019-04-30T02:12:41.8443515Z"} |
||||
``` |
||||
|
||||
The following key-value pairs would be created in the set of extracted data: |
||||
|
||||
- `output`: `log message\n` |
||||
- `stream`: `stderr` |
||||
- `timestamp`: `2019-04-30T02:12:41.8443515Z`
||||
|
||||
### Using extracted data |
||||
|
||||
For the given pipeline: |
||||
|
||||
```yaml |
||||
- json: |
||||
expressions: |
||||
output: log |
||||
stream: stream |
||||
timestamp: time |
||||
extra: |
||||
- json: |
||||
expressions: |
||||
user: |
||||
source: extra |
||||
``` |
||||
|
||||
And the given log line: |
||||
|
||||
``` |
||||
{"log":"log message\n","stream":"stderr","time":"2019-04-30T02:12:41.8443515Z","extra":"{\"user\":\"marco\"}"} |
||||
``` |
||||
|
||||
The first stage would create the following key-value pairs in the set of |
||||
extracted data: |
||||
|
||||
- `output`: `log message\n` |
||||
- `stream`: `stderr` |
||||
- `timestamp`: `2019-04-30T02:12:41.8443515Z`
||||
- `extra`: `{"user": "marco"}` |
||||
|
||||
The second stage will parse the value of `extra` from the extracted data as JSON |
||||
and append the following key-value pairs to the set of extracted data: |
||||
|
||||
- `user`: `marco` |
||||
@ -0,0 +1,37 @@ |
||||
# `labels` stage |
||||
|
||||
The labels stage is an action stage that takes data from the extracted map and |
||||
modifies the label set that is sent to Loki with the log entry. |
||||
|
||||
## Schema |
||||
|
||||
```yaml |
||||
labels: |
||||
# Key is REQUIRED and is the name of the label that will be created.
||||
# Value is optional and will be the name from extracted data whose value |
||||
# will be used for the value of the label. If empty, the value will be |
||||
# inferred to be the same as the key. |
||||
[ <string>: [<string>] ... ] |
||||
``` |
||||
|
||||
### Examples |
||||
|
||||
For the given pipeline: |
||||
|
||||
```yaml |
||||
- json: |
||||
expressions: |
||||
stream: stream |
||||
- labels: |
||||
stream: |
||||
``` |
||||
|
||||
Given the following log line: |
||||
|
||||
``` |
||||
{"log":"log message\n","stream":"stderr","time":"2019-04-30T02:12:41.8443515Z"} |
||||
``` |
||||
|
||||
The first stage would extract `stream` into the extracted map with a value of |
||||
`stderr`. The labels stage would turn that key-value pair into a label, so the |
||||
log line sent to Loki would include the label `stream` with a value of `stderr`. |
||||
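
Since the value part of the mapping is optional, the same pipeline could also
rename the label on the way out; this sketch promotes the extracted `stream`
key to a label named `source_stream` (an illustrative name):

```yaml
- labels:
    source_stream: stream
```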
@ -0,0 +1,85 @@ |
||||
# `match` stage |
||||
|
||||
The match stage is a filtering stage that conditionally applies a set of stages |
||||
when a log entry matches a configurable [LogQL](../../../logql.md) stream |
||||
selector. |
||||
|
||||
## Schema |
||||
|
||||
```yaml |
||||
match: |
||||
# LogQL stream selector. |
||||
selector: <string> |
||||
|
||||
# Names the pipeline. When defined, creates an additional label in |
||||
# the pipeline_duration_seconds histogram, where the value is |
||||
# concatenated with job_name using an underscore. |
||||
[pipeline_name: <string>]
||||
|
||||
# Nested set of pipeline stages only if the selector |
||||
# matches the labels of the log entries: |
||||
stages: |
||||
- [ |
||||
<regex_stage> |
||||
<json_stage> | |
||||
<template_stage> | |
||||
<match_stage> | |
||||
<timestamp_stage> | |
||||
<output_stage> | |
||||
<labels_stage> | |
||||
<metrics_stage> |
||||
] |
||||
``` |
||||
|
||||
Refer to the [Promtail Configuration Reference](../configuration.md) for the |
||||
schema on the various other stages referenced here. |
||||
|
||||
### Example |
||||
|
||||
For the given pipeline: |
||||
|
||||
```yaml |
||||
pipeline_stages: |
||||
- json: |
||||
expressions: |
||||
app: |
||||
- labels: |
||||
app: |
||||
- match: |
||||
selector: "{app=\"loki\"}" |
||||
stages: |
||||
- json: |
||||
expressions: |
||||
msg: message |
||||
- match: |
||||
pipeline_name: "app2" |
||||
selector: "{app=\"pokey\"}" |
||||
stages: |
||||
- json: |
||||
expressions: |
||||
msg: msg |
||||
- output: |
||||
source: msg |
||||
``` |
||||
|
||||
And the given log line: |
||||
|
||||
``` |
||||
{ "time":"2012-11-01T22:08:41+00:00", "app":"loki", "component": ["parser","type"], "level" : "WARN", "message" : "app1 log line" } |
||||
``` |
||||
|
||||
The first stage will add `app` with a value of `loki` into the extracted map, |
||||
while the second stage will add `app` as a label (again with the value of `loki`). |
||||
|
||||
The third stage uses LogQL to only execute the nested stages when there is a |
||||
label of `app` whose value is `loki`. This matches in our case; the nested |
||||
`json` stage then adds `msg` into the extracted map with a value of `app1 log |
||||
line`. |
||||
|
||||
The fourth stage uses LogQL to only execute the nested stages when there is a
label of `app` whose value is `pokey`. This does **not** match in our case, so
the nested `json` stage is not run.
||||
|
||||
The final `output` stage changes the contents of the log line to be the value of |
||||
`msg` from the extracted map. In this case, the log line is changed to `app1 log |
||||
line`. |
||||
@ -0,0 +1,215 @@ |
||||
# `metrics` stage |
||||
|
||||
The `metrics` stage is an action stage that allows for defining and updating |
||||
metrics based on data from the extracted map. Note that created metrics are not |
||||
pushed to Loki and are instead exposed via Promtail's `/metrics` endpoint. |
||||
Prometheus should be configured to scrape Promtail to be able to retrieve the |
||||
metrics configured by this stage. |
||||
|
||||
## Schema |
||||
|
||||
```yaml |
||||
# A map where the key is the name of the metric and the value is a specific |
||||
# metric type. |
||||
metrics: |
||||
[<string>: [ <metric_counter> | <metric_gauge> | <metric_histogram> ] ...] |
||||
``` |
||||
|
||||
### metric_counter |
||||
|
||||
Defines a counter metric whose value only goes up. |
||||
|
||||
```yaml |
||||
# The metric type. Must be Counter. |
||||
type: Counter |
||||
|
||||
# Describes the metric. |
||||
[description: <string>] |
||||
|
||||
# Key from the extracted data map to use for the metric,
||||
# defaulting to the metric's name if not present. |
||||
[source: <string>] |
||||
|
||||
config: |
||||
# Filters down source data and only changes the metric |
||||
# if the targeted value exactly matches the provided string. |
||||
# If not present, all data will match. |
||||
[value: <string>] |
||||
|
||||
# Must be either "inc" or "add" (case insensitive). If |
||||
# inc is chosen, the metric value will increase by 1 for each |
||||
# log line received that passed the filter. If add is chosen,
# the extracted value must be convertible to a positive float
||||
# and its value will be added to the metric. |
||||
action: <string> |
||||
``` |
||||
|
||||
### metric_gauge |
||||
|
||||
Defines a gauge metric whose value can go up or down. |
||||
|
||||
```yaml |
||||
# The metric type. Must be Gauge. |
||||
type: Gauge |
||||
|
||||
# Describes the metric. |
||||
[description: <string>] |
||||
|
||||
# Key from the extracted data map to use for the metric,
||||
# defaulting to the metric's name if not present. |
||||
[source: <string>] |
||||
|
||||
config: |
||||
# Filters down source data and only changes the metric |
||||
# if the targeted value exactly matches the provided string. |
||||
# If not present, all data will match. |
||||
[value: <string>] |
||||
|
||||
# Must be either "set", "inc", "dec", "add", or "sub". If
||||
# add, set, or sub is chosen, the extracted value must be |
||||
# convertible to a positive float. inc and dec will increment |
||||
# or decrement the metric's value by 1 respectively. |
||||
action: <string> |
||||
``` |
||||
|
||||
### metric_histogram |
||||
|
||||
Defines a histogram metric whose values are bucketed. |
||||
|
||||
```yaml |
||||
# The metric type. Must be Histogram. |
||||
type: Histogram |
||||
|
||||
# Describes the metric. |
||||
[description: <string>] |
||||
|
||||
# Key from the extracted data map to use for the metric,
||||
# defaulting to the metric's name if not present. |
||||
[source: <string>] |
||||
|
||||
config: |
||||
# Filters down source data and only changes the metric |
||||
# if the targeted value exactly matches the provided string. |
||||
# If not present, all data will match. |
||||
[value: <string>] |
||||
|
||||
# Must be either "inc" or "add" (case insensitive). If |
||||
# inc is chosen, the metric value will increase by 1 for each |
||||
# log line received that passed the filter. If add is chosen,
# the extracted value must be convertible to a positive float
||||
# and its value will be added to the metric. |
||||
action: <string> |
||||
|
||||
# Holds all the numbers in which to bucket the metric. |
||||
buckets: |
||||
- <float>
||||
``` |
||||
|
||||
## Examples |
||||
|
||||
### Counter |
||||
|
||||
```yaml |
||||
- metrics: |
||||
log_lines_total: |
||||
type: Counter |
||||
description: "total number of log lines" |
||||
source: time |
||||
config: |
||||
action: inc |
||||
``` |
||||
|
||||
This pipeline creates a `log_lines_total` counter that increments whenever the |
||||
extracted map contains a key for `time`. Since every log entry has a timestamp, |
||||
this is a good field to use to count every line. Notice that `value` is not |
||||
defined in the `config` section as we want to count every line and don't need to |
||||
filter the value. Similarly, `inc` is used as the action because we want to |
||||
increment the counter by one rather than by using the value of `time`. |
||||
|
||||
```yaml |
||||
- regex: |
||||
expression: "^.*(?P<order_success>order successful).*$" |
||||
- metrics: |
||||
successful_orders_total:
||||
type: Counter |
||||
description: "log lines with the message `order successful`" |
||||
source: order_success |
||||
config: |
||||
action: inc |
||||
``` |
||||
|
||||
This pipeline first tries to find `order successful` in the log line, extracting |
||||
it as the `order_success` field in the extracted map. The metrics stage then |
||||
creates a metric called `successful_orders_total` whose value only increases when
||||
`order_success` was found in the extracted map. |
||||
|
||||
The result of this pipeline is a metric whose value only increases when a log |
||||
line with the text `order successful` was scraped by Promtail. |
||||
|
||||
```yaml |
||||
- regex: |
||||
expression: "^.* order_status=(?P<order_status>.*?) .*$" |
||||
- metrics: |
||||
successful_orders_total:
||||
type: Counter |
||||
description: "successful orders" |
||||
source: order_status |
||||
config: |
||||
value: success |
||||
action: inc |
||||
failed_orders_total: |
||||
type: Counter |
||||
description: "failed orders" |
||||
source: order_status |
||||
config: |
||||
value: fail
||||
action: inc |
||||
``` |
||||
|
||||
This pipeline first tries to find text in the format `order_status=<value>` in |
||||
the log line, pulling out the `<value>` into the extracted map with the key |
||||
`order_status`. |
||||
|
||||
The metrics stage creates `successful_orders_total` and `failed_orders_total`
||||
metrics that only increment when the value of `order_status` in the extracted |
||||
map is `success` or `fail` respectively. |
||||
|
||||
### Gauge |
||||
|
||||
Gauge examples will be very similar to Counter examples with additional `action` |
||||
values. |
||||
|
||||
```yaml |
||||
- regex: |
||||
expression: "^.* retries=(?P<retries>\d+) .*$" |
||||
- metrics: |
||||
retries_total: |
||||
type: Gauge |
||||
description: "total retries" |
||||
source: retries |
||||
config: |
||||
action: add |
||||
``` |
||||
|
||||
This pipeline first tries to find text in the format `retries=<value>` in the |
||||
log line, pulling out the `<value>` into the extracted map with the key |
||||
`retries`. Note that the regex only parses numbers for the value in `retries`. |
||||
|
||||
The metrics stage then creates a Gauge whose value increases by the number
parsed into the `retries` field of the extracted map.
||||
|
||||
### Histogram |
||||
|
||||
```yaml |
||||
- metrics: |
||||
http_response_time_seconds: |
||||
type: Histogram |
||||
description: "length of each log line" |
||||
source: response_time |
||||
config: |
||||
buckets: [0.001,0.0025,0.005,0.010,0.025,0.050] |
||||
``` |
||||
|
||||
This pipeline creates a histogram that reads `response_time` from the extracted
map and places it into a bucket, incrementing the bucket's count and adding the
value to the histogram's running sum.
||||
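
Note that something must first put `response_time` into the extracted map; a
hypothetical preceding parsing stage might look like:

```yaml
- regex:
    expression: '^.* response_time=(?P<response_time>[0-9.]+) .*$'
```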
@ -0,0 +1,43 @@ |
||||
# `output` stage |
||||
|
||||
The `output` stage is an action stage that takes data from the extracted map and |
||||
changes the log line that will be sent to Loki. |
||||
|
||||
## Schema |
||||
|
||||
```yaml |
||||
output: |
||||
# Name from extracted data to use for the log entry. |
||||
source: <string> |
||||
``` |
||||
|
||||
## Example |
||||
|
||||
For the given pipeline: |
||||
|
||||
```yaml |
||||
- json: |
||||
expressions: |
||||
user:
||||
message:
||||
- labels: |
||||
user: |
||||
- output: |
||||
source: message
||||
``` |
||||
|
||||
And the given log line: |
||||
|
||||
``` |
||||
{"user": "alexis", "message": "hello, world!"} |
||||
``` |
||||
|
||||
Then the first stage will extract the following key-value pairs into the |
||||
extracted map: |
||||
|
||||
- `user`: `alexis` |
||||
- `message`: `hello, world!` |
||||
|
||||
The second stage will then add `user=alexis` to the label set for the outgoing |
||||
log line, and the final `output` stage will change the log line from the |
||||
original JSON to `hello, world!` |
||||
@ -0,0 +1,76 @@ |
||||
# `regex` stage |
||||
|
||||
The `regex` stage is a parsing stage that parses a log line using a regular |
||||
expression. Named capture groups in the regex support adding data into the |
||||
extracted map. |
||||
|
||||
## Schema |
||||
|
||||
```yaml |
||||
regex: |
||||
# The RE2 regular expression. Each capture group must be named. |
||||
expression: <string> |
||||
|
||||
# Name from extracted data to parse. If empty, uses the log message. |
||||
[source: <string>] |
||||
``` |
||||
|
||||
`expression` needs to be a [Go RE2 regex |
||||
string](https://github.com/google/re2/wiki/Syntax). Every capture group `(re)` |
||||
will be set into the `extracted` map; every capture group **must be named:**
||||
`(?P<name>re)`. The name of the capture group will be used as the key in the |
||||
extracted map. |
||||
|
||||
## Example |
||||
|
||||
### Without `source` |
||||
|
||||
Given the pipeline: |
||||
|
||||
```yaml |
||||
- regex: |
||||
expression: "^(?s)(?P<time>\\S+?) (?P<stream>stdout|stderr) (?P<flags>\\S+?) (?P<content>.*)$" |
||||
``` |
||||
|
||||
And the log line: |
||||
|
||||
``` |
||||
2019-01-01T01:00:00.000000001Z stderr P i'm a log message! |
||||
``` |
||||
|
||||
The following key-value pairs would be added to the extracted map: |
||||
|
||||
- `time`: `2019-01-01T01:00:00.000000001Z`
- `stream`: `stderr`
- `flags`: `P`
- `content`: `i'm a log message!`
||||
|
||||
### With `source` |
||||
|
||||
Given the pipeline: |
||||
|
||||
```yaml |
||||
- json: |
||||
expressions: |
||||
time: |
||||
- regex: |
||||
expression: "^(?P<year>\\d+)" |
||||
source: "time" |
||||
``` |
||||
|
||||
And the log line: |
||||
|
||||
``` |
||||
{"time":"2019-01-01T01:00:00.000000001Z"} |
||||
``` |
||||
|
||||
The first stage would add the following key-value pairs into the `extracted` |
||||
map: |
||||
|
||||
- `time`: `2019-01-01T01:00:00.000000001Z` |
||||
|
||||
While the regex stage would then parse the value for `time` in the extracted map |
||||
and append the following key-value pairs back into the extracted map: |
||||
|
||||
- `year`: `2019` |
||||
|
||||
@ -0,0 +1,70 @@ |
||||
# `template` stage |
||||
|
||||
The `template` stage is a transform stage that lets you manipulate the values in
||||
the extracted map using [Go's template |
||||
syntax](https://golang.org/pkg/text/template/). |
||||
|
||||
The `template` stage is primarily useful for manipulating data from other stages |
||||
before setting them as labels, such as to replace spaces with underscores or |
||||
converting an uppercase string into a lowercase one. |
||||
|
||||
The template stage can also create new keys in the extracted map. |
||||
|
||||
## Schema |
||||
|
||||
```yaml |
||||
template: |
||||
# Name from extracted data to parse. If the key in the extracted data doesn't
# exist, an entry for it will be created.
||||
source: <string> |
||||
|
||||
# Go template string to use. In addition to normal template
||||
# functions, ToLower, ToUpper, Replace, Trim, TrimLeft, TrimRight, |
||||
# TrimPrefix, TrimSuffix, and TrimSpace are available as functions. |
||||
template: <string> |
||||
``` |
||||
|
||||
## Examples |
||||
|
||||
```yaml |
||||
- template: |
||||
source: new_key |
||||
template: 'hello world!' |
||||
``` |
||||
|
||||
Assuming no data has been added to the extracted map yet, this stage will first |
||||
add `new_key` with a blank value into the extracted map. Then its value will be |
||||
set to `hello world!`. |
||||
|
||||
```yaml |
||||
- template: |
||||
source: app |
||||
template: '{{ .Value }}_some_suffix' |
||||
``` |
||||
|
||||
This pipeline takes the value of the `app` key in the existing extracted map and |
||||
appends `_some_suffix` to its value. For example, if the extracted map had a |
||||
key of `app` and a value of `loki`, this stage would modify the value from |
||||
`loki` to `loki_some_suffix`. |
||||
|
||||
```yaml |
||||
- template: |
||||
source: app |
||||
template: '{{ ToLower .Value }}' |
||||
``` |
||||
|
||||
This pipeline takes the current value of `app` from the extracted map and |
||||
converts its value to be all lowercase. For example, if the extracted map |
||||
contained `app` with a value of `LOKI`, this pipeline would change its value to |
||||
`loki`. |
||||
|
||||
```yaml |
||||
- template: |
||||
source: app |
||||
template: '{{ Replace .Value "loki" "blokey" 1 }}' |
||||
``` |
||||
|
||||
The template here uses Go's [`strings.Replace`
||||
function](https://golang.org/pkg/strings/#Replace). When the template executes, |
||||
the entire contents of the `app` key from the extracted map will have at most |
||||
`1` instance of `loki` changed to `blokey`. |
||||
@ -0,0 +1,81 @@ |
||||
# `timestamp` stage |
||||
|
||||
The `timestamp` stage is an action stage that can change the timestamp of a log |
||||
line before it is sent to Loki. When a `timestamp` stage is not present, the |
||||
timestamp of a log line defaults to the time when the log entry is scraped. |
||||
|
||||
## Schema |
||||
|
||||
```yaml |
||||
timestamp: |
||||
# Name from extracted data to use for the timestamp. |
||||
source: <string> |
||||
|
||||
# Determines how to parse the time string. Can use |
||||
# pre-defined formats by name: [ANSIC UnixDate RubyDate RFC822 |
||||
# RFC822Z RFC850 RFC1123 RFC1123Z RFC3339 RFC3339Nano Unix |
||||
# UnixMs UnixNs]. |
||||
format: <string> |
||||
|
||||
# IANA Timezone Database string. |
||||
[location: <string>] |
||||
``` |
||||
|
||||
The `format` field can be provided as an "example" of what timestamps look like
(such as `Mon Jan 02 15:04:05 -0700 2006`), but Promtail also accepts a set of
pre-defined format names that represent common forms:
||||
|
||||
- `ANSIC`: `Mon Jan _2 15:04:05 2006` |
||||
- `UnixDate`: `Mon Jan _2 15:04:05 MST 2006` |
||||
- `RubyDate`: `Mon Jan 02 15:04:05 -0700 2006` |
||||
- `RFC822`: `02 Jan 06 15:04 MST` |
||||
- `RFC822Z`: `02 Jan 06 15:04 -0700` |
||||
- `RFC850`: `Monday, 02-Jan-06 15:04:05 MST` |
||||
- `RFC1123`: `Mon, 02 Jan 2006 15:04:05 MST` |
||||
- `RFC1123Z`: `Mon, 02 Jan 2006 15:04:05 -0700` |
||||
- `RFC3339`: `2006-01-02T15:04:05-07:00` |
||||
- `RFC3339Nano`: `2006-01-02T15:04:05.999999999-07:00` |
||||
|
||||
Additionally, common Unix timestamp formats are supported with the following
`format` values:
||||
|
||||
- `Unix`: `1562708916` |
||||
- `UnixMs`: `1562708916414` |
||||
- `UnixNs`: `1562708916000000123` |
||||
|
||||
Custom formats are passed directly to the layout parameter in Go's |
||||
[time.Parse](https://golang.org/pkg/time/#Parse) function. If the custom format |
||||
has no year component specified, Promtail will assume that the current year |
||||
according to the system's clock should be used. |
||||
|
||||
The syntax used by the custom format defines the reference date and time using |
||||
specific values for each component of the timestamp (i.e., `Mon Jan 2 15:04:05 |
||||
-0700 MST 2006`). The following table shows supported reference values which |
||||
should be used in the custom format. |
||||
|
||||
| Timestamp component | Format value | |
||||
| ------------------- | ------------------------------------------------------------------------------------------------------------------------------------ | |
||||
| Year | `06`, `2006` | |
||||
| Month | `1`, `01`, `Jan`, `January` | |
||||
| Day | `2`, `02`, `_2` (two digits right justified) | |
||||
| Day of the week | `Mon`, `Monday` | |
||||
| Hour | `3` (12-hour), `03` (12-hour zero prefixed), `15` (24-hour) | |
||||
| Minute | `4`, `04` | |
||||
| Second | `5`, `05` | |
||||
| Fraction of second | `.000` (ms zero prefixed), `.000000` (μs), `.000000000` (ns), `.999` (ms without trailing zeroes), `.999999` (μs), `.999999999` (ns) | |
||||
| 12-hour period | `pm`, `PM` | |
||||
| Timezone name | `MST` | |
||||
| Timezone offset | `-0700`, `-070000` (with seconds), `-07`, `07:00`, `-07:00:00` (with seconds) | |
||||
| Timezone ISO-8601 | `Z0700` (Z for UTC or time offset), `Z070000`, `Z07`, `Z07:00`, `Z07:00:00` | |
||||
|
||||
## Examples |
||||
|
||||
```yaml |
||||
- timestamp: |
||||
source: time |
||||
format: RFC3339Nano |
||||
``` |
||||
|
||||
This stage looks for a `time` field in the extracted map and reads its value in |
||||
`RFC3339Nano` form (e.g., `2006-01-02T15:04:05.999999999-07:00`). The resulting |
||||
time value will be used as the timestamp sent with the log line to Loki. |
||||
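
As a sketch of a custom layout, assuming log lines carry timestamps like
`2019-07-09 22:08:36.414` in US Eastern time (both assumptions are purely for
illustration):

```yaml
- timestamp:
    source: time
    format: '2006-01-02 15:04:05.999'
    location: 'America/New_York'
```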
@ -0,0 +1,81 @@ |
||||
# Troubleshooting Promtail |
||||
|
||||
This document describes known failure modes of `promtail` on edge cases and the |
||||
adopted trade-offs. |
||||
|
||||
## A tailed file is truncated while `promtail` is not running |
||||
|
||||
Given the following order of events: |
||||
|
||||
1. `promtail` is tailing `/app.log` |
||||
2. `promtail` current position for `/app.log` is `100` (byte offset) |
||||
3. `promtail` is stopped |
||||
4. `/app.log` is truncated and new logs are appended to it |
||||
5. `promtail` is restarted |
||||
|
||||
When `promtail` is restarted, it reads the previous position (`100`) from the |
||||
positions file. Two scenarios are then possible: |
||||
|
||||
- `/app.log` size is less than the position before truncating |
||||
- `/app.log` size is greater than or equal to the position before truncating |
||||
|
||||
If the `/app.log` file size is less than the previous position, then the file is |
||||
detected as truncated and logs will be tailed starting from position `0`. |
||||
Otherwise, if the `/app.log` file size is greater than or equal to the previous |
||||
position, `promtail` can't detect it was truncated while not running and will |
||||
continue tailing the file from position `100`. |
||||
|
||||
Generally speaking, `promtail` uses only the path to the file as the key in the
positions file. Whenever `promtail` is started, for each file path referenced in
the positions file, `promtail` will read the file from the beginning if the file
size is less than the offset stored in the positions file; otherwise, it will
continue from the offset, regardless of whether the file has been truncated or
rolled multiple times while `promtail` was not running.
||||
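
For reference, the positions file is a small YAML map from file path to the
last read byte offset, along the lines of this sketch:

```yaml
positions:
  /app.log: "100"
  /var/log/syslog: "5381"
```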
|
||||
## Loki is unavailable |
||||
|
||||
For each tailed file, `promtail` reads a line, processes it through the
configured `pipeline_stages`, and pushes the log entry to Loki. Log entries are
batched together before getting pushed to Loki, based on the max batch duration
(`client.batch-wait`) and size (`client.batch-size-bytes`), whichever is reached
first.
||||
|
||||
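Both limits are set per client; this sketch spells out what should be the
default values (refer to the `client_config` reference for the authoritative
option names):

```yaml
clients:
  - url: INGESTER-URL
    batchwait: 1s       # send a batch at least this often
    batchsize: 102400   # or as soon as it reaches roughly this many bytes
```
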
In case of any error while sending a batch of log entries, `promtail` adopts a
"retry then discard" strategy:
||||
|
||||
- `promtail` retries sending the batch to the ingester up to `maxretries` times
||||
- If all retries fail, `promtail` discards the batch of log entries (_which will |
||||
be lost_) and proceeds with the next one |
||||
|
||||
You can configure the `maxretries` and the delay between two retries via the |
||||
`backoff_config` in the promtail config file: |
||||
|
||||
```yaml |
||||
clients: |
||||
- url: INGESTER-URL |
||||
backoff_config: |
||||
minbackoff: 100ms |
||||
maxbackoff: 5s |
||||
maxretries: 5 |
||||
``` |
||||
|
||||
## Log entries pushed after a `promtail` crash / panic / abrupt termination
||||
|
||||
When `promtail` shuts down gracefully, it saves the last read offsets in the |
||||
positions file, so that on a subsequent restart it will continue tailing logs |
||||
without duplicating or losing log entries.
||||
|
||||
In the event of a crash or abrupt termination, `promtail` can't save the last
||||
read offsets in the positions file. When restarted, `promtail` will read the |
||||
positions file saved at the last sync period and will continue tailing the files |
||||
from there. This means that if new log entries have been read and pushed to the |
||||
ingester between the last sync period and the crash, these log entries will be |
||||
sent again to the ingester on `promtail` restart. |
||||
|
||||
However, for each log stream (set of unique labels) the Loki ingester skips all |
||||
log entries received out of timestamp order. For this reason, even if duplicated |
||||
logs may be sent from `promtail` to the ingester, entries whose timestamp is |
||||
older than the latest received will be discarded to avoid having duplicated |
||||
logs. To leverage this, it's important that your `pipeline_stages` include |
||||
the `timestamp` stage, parsing the log entry timestamp from the log line instead |
||||
of relying on the default behaviour of setting the timestamp as the point in |
||||
time when the line is read by `promtail`. |
||||
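
For example, a minimal sketch that parses an RFC3339Nano timestamp from the
beginning of each line (the regex is illustrative):

```yaml
pipeline_stages:
  - regex:
      expression: '^(?P<time>\S+) .*$'
  - timestamp:
      source: time
      format: RFC3339Nano
```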
@ -0,0 +1,5 @@ |
||||
# Community |
||||
|
||||
1. [Governance](./governance.md) |
||||
2. [Getting in Touch](./getting-in-touch.md) |
||||
3. [Contributing](./contributing.md) |
||||
@ -0,0 +1,60 @@ |
||||
# Contributing to Loki |
||||
|
||||
Loki uses [GitHub](https://github.com/grafana/loki) to manage reviews of pull requests: |
||||
|
||||
- If you have a trivial fix or improvement, go ahead and create a pull request. |
||||
- If you plan to do something more involved, discuss your ideas on the relevant GitHub issue (creating one if it doesn't exist). |
||||
|
||||
## Steps to contribute |
||||
|
||||
To contribute to Loki, you must clone it into your `$GOPATH` and add your fork |
||||
as a remote. |
||||
|
||||
```bash |
||||
$ git clone https://github.com/grafana/loki.git $GOPATH/src/github.com/grafana/loki |
||||
$ cd $GOPATH/src/github.com/grafana/loki |
||||
$ git remote add fork <FORK_URL> |
||||
|
||||
# Make some changes! |
||||
|
||||
$ git add . |
||||
$ git commit -m "docs: fix spelling error" |
||||
$ git push -u fork HEAD |
||||
|
||||
# Open a PR! |
||||
``` |
||||
|
||||
Note that if you downloaded Loki using `go get`, the message `package github.com/grafana/loki: no Go files in /go/src/github.com/grafana/loki` |
||||
is normal and requires no action to resolve.
||||
|
||||
### Building |
||||
|
||||
While `go install ./cmd/loki` works, the preferred way to build is by using |
||||
`make`: |
||||
|
||||
- `make loki`: builds Loki and outputs the binary to `./cmd/loki/loki` |
||||
|
||||
- `make promtail`: builds Promtail and outputs the binary to |
||||
`./cmd/promtail/promtail` |
||||
|
||||
- `make logcli`: builds LogCLI and outputs the binary to `./cmd/logcli/logcli` |
||||
|
||||
- `make loki-canary`: builds Loki Canary and outputs the binary to |
||||
`./cmd/loki-canary/loki-canary` |
||||
|
||||
- `make docker-driver`: builds the Loki Docker Driver and installs it into |
||||
Docker. |
||||
|
||||
- `make images`: builds all Docker images (optionally suffix the previous binary |
||||
commands with `-image`, e.g., `make loki-image`). |
||||
|
||||
These commands can be chained together to build multiple binaries in one go: |
||||
|
||||
```bash |
||||
# Builds binaries for Loki, Promtail, and LogCLI. |
||||
$ make loki promtail logcli |
||||
``` |
||||
|
||||
## Contribute to the Helm Chart |
||||
|
||||
Please follow the [Helm documentation](../../production/helm/README.md). |
||||
@ -0,0 +1,12 @@ |
||||
# Contacting the Loki Team |
||||
|
||||
If you have any questions or feedback regarding Loki: |
||||
|
||||
- Ask a question on the Loki Slack channel. To invite yourself to the Grafana Slack, visit [http://slack.raintank.io/](http://slack.raintank.io/) and join the #loki channel. |
||||
- [File a GitHub issue](https://github.com/grafana/loki/issues/new) for bugs, issues and feature suggestions. |
||||
- Send an email to [lokiproject@googlegroups.com](mailto:lokiproject@googlegroups.com), or use the [web interface](https://groups.google.com/forum/#!forum/lokiproject). |
||||
|
||||
Please file UI issues directly to the [Grafana repository](https://github.com/grafana/grafana/issues/new). |
||||
|
||||
Your feedback is always welcome. |
||||
|
||||
@ -0,0 +1,833 @@ |
||||
# Configuring Loki |
||||
|
||||
Loki is configured in a YAML file (usually referred to as `loki.yaml`) |
||||
which contains information on the Loki server and its individual components, |
||||
depending on which mode Loki is launched in. |
||||
|
||||
Configuration examples can be found in the [Configuration Examples](examples.md) document. |
||||
|
||||
* [Configuration File Reference](#configuration-file-reference) |
||||
* [server_config](#server_config) |
||||
* [querier_config](#querier_config) |
||||
* [ingester_client_config](#ingester_client_config) |
||||
* [grpc_client_config](#grpc_client_config) |
||||
* [ingester_config](#ingester_config) |
||||
* [lifecycler_config](#lifecycler_config) |
||||
* [ring_config](#ring_config) |
||||
* [storage_config](#storage_config) |
||||
* [cache_config](#cache_config) |
||||
* [chunk_store_config](#chunk_store_config) |
||||
* [schema_config](#schema_config) |
||||
* [period_config](#period_config) |
||||
* [limits_config](#limits_config) |
||||
* [table_manager_config](#table_manager_config) |
||||
* [provision_config](#provision_config) |
||||
* [auto_scaling_config](#auto_scaling_config) |
||||
|
||||
## Configuration File Reference |
||||
|
||||
To specify which configuration file to load, pass the `-config.file` flag at the |
||||
command line. The file is written in [YAML format](https://en.wikipedia.org/wiki/YAML), |
||||
defined by the scheme below. Brackets indicate that a parameter is optional. For |
||||
non-list parameters the value is set to the specified default. |
||||
|
||||
Generic placeholders are defined as follows: |
||||
|
||||
* `<boolean>`: a boolean that can take the values `true` or `false` |
||||
* `<int>`: any integer matching the regular expression `[1-9]+[0-9]*` |
||||
* `<duration>`: a duration matching the regular expression `[0-9]+(ms|[smhdwy])` |
||||
* `<labelname>`: a string matching the regular expression `[a-zA-Z_][a-zA-Z0-9_]*` |
||||
* `<labelvalue>`: a string of unicode characters |
||||
* `<filename>`: a valid path relative to current working directory or an |
||||
absolute path. |
||||
* `<host>`: a valid string consisting of a hostname or IP followed by an optional port number |
||||
* `<string>`: a regular string |
||||
* `<secret>`: a regular string that is a secret, such as a password |
||||
|
||||
Supported contents and default values of `loki.yaml`: |
||||
|
||||
```yaml |
||||
# The module to run Loki with. Supported values |
||||
# all, querier, table-manager, ingester, distributor |
||||
[target: <string> | default = "all"] |
||||
|
||||
# Enables authentication through the X-Scope-OrgID header, which must be present |
||||
# if true. If false, the OrgID will always be set to "fake". |
||||
[auth_enabled: <boolean> | default = true] |
||||
|
||||
# Configures the server of the launched module(s). |
||||
[server: <server_config>] |
||||
|
||||
# Configures the querier. Only appropriate when running all modules or |
||||
# just the querier. |
||||
[querier: <querier_config>] |
||||
|
||||
# Configures how the distributor will connect to ingesters. Only appropriate |
||||
# when running all modules, the distributor, or the querier. |
||||
[ingester_client: <ingester_client_config>] |
||||
|
||||
# Configures the ingester and how the ingester will register itself to a |
||||
# key value store. |
||||
[ingester: <ingester_config>] |
||||
|
||||
# Configures where Loki will store data. |
||||
[storage_config: <storage_config>] |
||||
|
||||
# Configures how Loki will store data in the specific store. |
||||
[chunk_store_config: <chunk_store_config>] |
||||
|
||||
# Configures the chunk index schema and where it is stored. |
||||
[schema_config: <schema_config>] |
||||
|
||||
# Configures limits per-tenant or globally |
||||
[limits_config: <limits_config>] |
||||
|
||||
# Configures the table manager for retention |
||||
[table_manager: <table_manager_config>] |
||||
``` |
||||
|
||||
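As a minimal single-binary sketch tying these blocks together (the paths and
the schema start date are illustrative, not recommendations):

```yaml
auth_enabled: false

server:
  http_listen_port: 3100

ingester:
  lifecycler:
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1

schema_config:
  configs:
    - from: 2019-07-01
      store: boltdb
      object_store: filesystem
      schema: v9
      index:
        prefix: index_
        period: 168h

storage_config:
  boltdb:
    directory: /tmp/loki/index
  filesystem:
    directory: /tmp/loki/chunks
```
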
## server_config |
||||
|
||||
The `server_config` block configures Loki's behavior as an HTTP server:
||||
|
||||
```yaml |
||||
# HTTP server listen host |
||||
[http_listen_host: <string>] |
||||
|
||||
# HTTP server listen port |
||||
[http_listen_port: <int> | default = 80] |
||||
|
||||
# gRPC server listen host |
||||
[grpc_listen_host: <string>] |
||||
|
||||
# gRPC server listen port |
||||
[grpc_listen_port: <int> | default = 9095] |
||||
|
||||
# Register instrumentation handlers (/metrics, etc.) |
||||
[register_instrumentation: <boolean> | default = true] |
||||
|
||||
# Timeout for graceful shutdowns |
||||
[graceful_shutdown_timeout: <duration> | default = 30s] |
||||
|
||||
# Read timeout for HTTP server |
||||
[http_server_read_timeout: <duration> | default = 30s] |
||||
|
||||
# Write timeout for HTTP server |
||||
[http_server_write_timeout: <duration> | default = 30s] |
||||
|
||||
# Idle timeout for HTTP server |
||||
[http_server_idle_timeout: <duration> | default = 120s] |
||||
|
||||
# Max gRPC message size that can be received |
||||
[grpc_server_max_recv_msg_size: <int> | default = 4194304] |
||||
|
||||
# Max gRPC message size that can be sent |
||||
[grpc_server_max_send_msg_size: <int> | default = 4194304] |
||||
|
||||
# Limit on the number of concurrent streams for gRPC calls (0 = unlimited) |
||||
[grpc_server_max_concurrent_streams: <int> | default = 100] |
||||
|
||||
# Log only messages with the given severity or above. Supported values [debug, |
||||
# info, warn, error] |
||||
[log_level: <string> | default = "info"] |
||||
|
||||
# Base path to serve all API routes from (e.g., /v1/).
||||
[http_path_prefix: <string>] |
||||
``` |
||||
|
||||
## querier_config |
||||
|
||||
The `querier_config` block configures the Loki Querier. |
||||
|
||||
```yaml |
||||
# Timeout when querying ingesters or storage during the execution of a |
||||
# query request. |
||||
[query_timeout: <duration> | default = 1m] |
||||
|
||||
# Limit of the duration for which live tailing requests should be |
||||
# served. |
||||
[tail_max_duration: <duration> | default = 1h] |
||||
|
||||
# Configuration options for the LogQL engine. |
||||
engine: |
||||
# Timeout for query execution |
||||
[timeout: <duration> | default = 3m] |
||||
|
||||
# The maximum amount of time to look back for log lines. Only |
||||
# applicable for instant log queries. |
||||
[max_look_back_period: <duration> | default = 30s] |
||||
``` |
||||
|
||||
## ingester_client_config |
||||
|
||||
The `ingester_client_config` block configures how connections to ingesters |
||||
operate. |
||||
|
||||
```yaml |
||||
# Configures how connections are pooled |
||||
pool_config: |
||||
# Whether or not to do health checks. |
||||
[health_check_ingesters: <boolean> | default = false] |
||||
|
||||
# How frequently to clean up clients for servers that have gone away after |
||||
# a health check. |
||||
[client_cleanup_period: <duration> | default = 15s] |
||||
|
||||
# How quickly a dead client will be removed after it has been detected |
||||
# to disappear. Set this to a value to allow time for a secondary |
||||
# health check to recover the missing client. |
||||
[remotetimeout: <duration>] |
||||
|
||||
# The remote request timeout on the client side. |
||||
[remote_timeout: <duration> | default = 5s] |
||||
|
||||
# Configures how the gRPC connection to ingesters work as a |
||||
# client. |
||||
[grpc_client_config: <grpc_client_config>] |
||||
``` |
||||
|
||||
### grpc_client_config |
||||
|
||||
The `grpc_client_config` block configures a client connection to a gRPC service. |
||||
|
||||
```yaml |
||||
# The maximum size in bytes the client can receive
||||
[max_recv_msg_size: <int> | default = 104857600] |
||||
|
||||
# The maximum size in bytes the client can send |
||||
[max_send_msg_size: <int> | default = 16777216] |
||||
|
||||
# Whether or not messages should be compressed |
||||
[use_gzip_compression: <bool> | default = false] |
||||
|
||||
# Rate limit for gRPC client. 0 is disabled |
||||
[rate_limit: <float> | default = 0] |
||||
|
||||
# Rate limit burst for gRPC client. |
||||
[rate_limit_burst: <int> | default = 0] |
||||
|
||||
# Enable backoff and retry when a rate limit is hit. |
||||
[backoff_on_ratelimits: <bool> | default = false] |
||||
|
||||
# Configures backoff when enabled.
||||
backoff_config: |
||||
# Minimum delay when backing off. |
||||
[minbackoff: <duration> | default = 100ms] |
||||
|
||||
# The maximum delay when backing off. |
||||
[maxbackoff: <duration> | default = 10s] |
||||
|
||||
# Number of times to backoff and retry before failing. |
||||
[maxretries: <int> | default = 10] |
||||
``` |
||||
|
||||
## ingester_config |
||||
|
||||
The `ingester_config` block configures Ingesters. |
||||
|
||||
```yaml |
||||
# Configures how the lifecycle of the ingester will operate |
||||
# and where it will register for discovery. |
||||
[lifecycler: <lifecycler_config>] |
||||
|
||||
# Number of times to try and transfer chunks when leaving before |
||||
# falling back to flushing to the store. |
||||
[max_transfer_retries: <int> | default = 10] |
||||
|
||||
# How many flushes can happen concurrently from each stream. |
||||
[concurrent_flushes: <int> | default = 16] |
||||
|
||||
# How often should the ingester see if there are any blocks |
||||
# to flush |
||||
[flush_check_period: <duration> | default = 30s] |
||||
|
||||
# The timeout before a flush is cancelled |
||||
[flush_op_timeout: <duration> | default = 10s] |
||||
|
||||
# How long chunks should be retained in-memory after they've |
||||
# been flushed. |
||||
[chunk_retain_period: <duration> | default = 15m] |
||||
|
||||
# How long chunks should sit in-memory with no updates before |
||||
# being flushed if they don't hit the max block size. This means |
||||
# that half-empty chunks will still be flushed after a certain |
||||
# period as long as they receive no further activity.
||||
[chunk_idle_period: <duration> | default = 30m] |
||||
|
||||
# The maximum size in bytes a chunk can be before it should be flushed. |
||||
[chunk_block_size: <int> | default = 262144] |
||||
``` |
||||
|
||||
### lifecycler_config |
||||
|
||||
The `lifecycler_config` is used by the Ingester to control how that ingester |
||||
registers itself into the ring and manages its lifecycle during its stay in the |
||||
ring. |
||||
|
||||
```yaml |
||||
# Configures the ring the lifecycler connects to |
||||
[ring: <ring_config>] |
||||
|
||||
# The number of tokens the lifecycler will generate and put into the ring if |
||||
# it joined without transferring tokens from another lifecycler.
||||
[num_tokens: <int> | default = 128] |
||||
|
||||
# Period at which to heartbeat to the underlying ring. |
||||
[heartbeat_period: <duration> | default = 5s] |
||||
|
||||
# How long to wait to claim tokens and chunks from another member when |
||||
# that member is leaving. Will join automatically after the duration expires. |
||||
[join_after: <duration> | default = 0s] |
||||
|
||||
# Minimum duration to wait before becoming ready. This is to work around race |
||||
# conditions with ingesters exiting and updating the ring. |
||||
[min_ready_duration: <duration> | default = 1m] |
||||
|
||||
# Store tokens in a normalised fashion to reduce the number of allocations. |
||||
[normalise_tokens: <boolean> | default = false] |
||||
|
||||
# Name of network interfaces to read addresses from. |
||||
interface_names: |
||||
- [<string> ... | default = ["eth0", "en0"]] |
||||
|
||||
# Duration to sleep before exiting to ensure metrics are scraped. |
||||
[final_sleep: <duration> | default = 30s] |
||||
``` |
||||
|
||||
### ring_config |
||||
|
||||
The `ring_config` is used to discover and connect to Ingesters. |
||||
|
||||
```yaml |
||||
kvstore: |
||||
# The backend storage to use for the ring. Supported values are |
||||
# consul, etcd, inmemory |
||||
store: <string> |
||||
|
||||
# The prefix for the keys in the store. Should end with a /. |
||||
[prefix: <string> | default = "collectors/"] |
||||
|
||||
# Configuration for a Consul client. Only applies if store |
||||
# is "consul" |
||||
consul: |
||||
# The hostname and port of Consul. |
||||
[host: <string> | default = "localhost:8500"]
||||
|
||||
# The ACL Token used to interact with Consul. |
||||
[acltoken: <string>] |
||||
|
||||
# The HTTP timeout when communicating with Consul |
||||
[httpclienttimeout: <duration> | default = 20s] |
||||
|
||||
# Whether or not consistent reads to Consul are enabled. |
||||
[consistentreads: <boolean> | default = true] |
||||
|
||||
# Configuration for an ETCD v3 client. Only applies if |
||||
# store is "etcd" |
||||
etcd: |
||||
# The ETCD endpoints to connect to. |
||||
endpoints: |
||||
- <string> |
||||
|
||||
# The Dial timeout for the ETCD connection. |
||||
[dial_timeout: <duration> | default = 10s]
||||
|
||||
# The maximum number of retries to do for failed ops to ETCD. |
||||
[max_retries: <int> | default = 10] |
||||
|
||||
# The heartbeat timeout after which ingesters are skipped for
# reading and writing.
[heartbeat_timeout: <duration> | default = 1m]
||||
|
||||
# The number of ingesters to write to and read from. Must be at least |
||||
# 1. |
||||
[replication_factor: <int> | default = 3] |
||||
``` |
||||
|
||||
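As a minimal sketch, a Consul-backed ring might look like the following; the
Consul address is a placeholder for your own deployment:

```yaml
kvstore:
  store: consul
  prefix: collectors/
  consul:
    host: "consul.service.consul:8500"
    httpclienttimeout: 20s
    consistentreads: true
```
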
## storage_config

The `storage_config` block configures one of many possible stores for both the
index and chunks. Which configuration is read depends on the schema_config
block and what is set for the store value.

```yaml
# Configures storing chunks in AWS. Required options only required when aws is
# present.
aws:
  # S3 or S3-compatible URL to connect to. If only region is specified as a
  # host, the proper endpoint will be deduced. Use inmemory:///<bucket-name> to
  # use a mock in-memory implementation.
  s3: <string>

  # Set to true to force the request to use path-style addressing
  [s3forcepathstyle: <boolean> | default = false]

  # Configure the DynamoDB connection
  dynamodbconfig:
    # URL for DynamoDB with escaped Key and Secret encoded. If only region is
    # specified as a host, the proper endpoint will be deduced. Use
    # inmemory:///<bucket-name> to use a mock in-memory implementation.
    dynamodb: <string>

    # DynamoDB table management requests per-second limit.
    [apilimit: <float> | default = 2.0]

    # DynamoDB rate cap to back off when throttled.
    [throttlelimit: <float> | default = 10.0]

    # Application Autoscaling endpoint URL with escaped Key and Secret
    # encoded.
    [applicationautoscaling: <string>]

    # Metrics-based autoscaling configuration.
    metrics:
      # Use metrics-based autoscaling via this Prometheus query URL.
      [url: <string>]

      # Queue length above which we will scale up capacity.
      [targetqueuelen: <int> | default = 100000]

      # Scale up capacity by this multiple
      [scaleupfactor: <float64> | default = 1.3]

      # Ignore throttling below this level (rate per second)
      [minthrottling: <float64> | default = 1]

      # Query to fetch ingester queue length
      [queuelengthquery: <string> | default = "sum(avg_over_time(cortex_ingester_flush_queue_length{job="cortex/ingester"}[2m]))"]

      # Query to fetch throttle rates per table
      [throttlequery: <string> | default = "sum(rate(cortex_dynamo_throttled_total{operation="DynamoDB.BatchWriteItem"}[1m])) by (table) > 0"]

      # Query to fetch write capacity usage per table
      [usagequery: <string> | default = "sum(rate(cortex_dynamo_consumed_capacity_total{operation="DynamoDB.BatchWriteItem"}[15m])) by (table) > 0"]

      # Query to fetch read capacity usage per table
      [readusagequery: <string> | default = "sum(rate(cortex_dynamo_consumed_capacity_total{operation="DynamoDB.QueryPages"}[1h])) by (table) > 0"]

      # Query to fetch read errors per table
      [readerrorquery: <string> | default = "sum(increase(cortex_dynamo_failures_total{operation="DynamoDB.QueryPages",error="ProvisionedThroughputExceededException"}[1m])) by (table) > 0"]

    # Number of chunks to group together to parallelise fetches (0 to disable)
    [chunkgangsize: <int> | default = 10]

    # Max number of chunk get operations to start in parallel.
    [chunkgetmaxparallelism: <int> | default = 32]

# Configures storing the index in Bigtable. Required fields only required
# when bigtable is defined in config.
bigtable:
  # BigTable project ID
  project: <string>

  # BigTable instance ID
  instance: <string>

  # Configures the gRPC client used to connect to Bigtable.
  [grpc_client_config: <grpc_client_config>]

# Configures storing chunks in GCS. Required fields only required
# when gcs is defined in config.
gcs:
  # Name of GCS bucket to put chunks in.
  bucket_name: <string>

  # The size of the buffer that the GCS client uses for each PUT request. 0
  # to disable buffering.
  [chunk_buffer_size: <int> | default = 0]

  # The duration after which the requests to GCS should be timed out.
  [request_timeout: <duration> | default = 0s]

# Configures storing chunks in Cassandra
cassandra:
  # Comma-separated hostnames or IPs of Cassandra instances
  addresses: <string>

  # Port that Cassandra is running on
  [port: <int> | default = 9042]

  # Keyspace to use in Cassandra
  keyspace: <string>

  # Consistency level for Cassandra
  [consistency: <string> | default = "QUORUM"]

  # Replication factor to use in Cassandra.
  [replication_factor: <int> | default = 1]

  # Instruct the Cassandra driver to not attempt to get host
  # info from the system.peers table.
  [disable_initial_host_lookup: <bool> | default = false]

  # Use SSL when connecting to Cassandra instances.
  [SSL: <boolean> | default = false]

  # Require SSL certificate validation when SSL is enabled.
  [host_verification: <bool> | default = true]

  # Path to certificate file to verify the peer when SSL is
  # enabled.
  [CA_path: <string>]

  # Enable password authentication when connecting to Cassandra.
  [auth: <bool> | default = false]

  # Username for password authentication when auth is true.
  [username: <string>]

  # Password for password authentication when auth is true.
  [password: <string>]

  # Timeout when connecting to Cassandra.
  [timeout: <duration> | default = 600ms]

  # Initial connection timeout during initial dial to server.
  [connect_timeout: <duration> | default = 600ms]

# Configures storing the index in BoltDB. Required fields only
# required when boltdb is present in config.
boltdb:
  # Location of BoltDB index files.
  directory: <string>

# Configures storing the chunks on the local filesystem. Required
# fields only required when filesystem is present in config.
filesystem:
  # Directory to store chunks in.
  directory: <string>

# Cache validity for active index entries. Should be no higher than
# the chunk_idle_period in the ingester settings.
[indexcachevalidity: <duration> | default = 5m]

# The maximum number of chunks to fetch per batch.
[max_chunk_batch_size: <int> | default = 50]

# Config for how the cache for index queries should
# be built.
index_queries_cache_config: <cache_config>
```

### cache_config

The `cache_config` block configures how Loki will cache requests, chunks, and
the index to a backing cache store.

```yaml
# Enable in-memory cache.
[enable_fifocache: <boolean>]

# The default validity of entries for caches unless overridden.
# NOTE: the spelling "defaul" below is intentional; it matches the
# name of the setting in the code.
[defaul_validity: <duration>]

# Configures the background cache when memcached is used.
background:
  # How many goroutines to use to write back to memcached.
  [writeback_goroutines: <int> | default = 10]

  # How many chunks to buffer for background write back to memcached.
  [writeback_buffer: <int> | default = 10000]

# Configures memcached settings.
memcached:
  # Configures how long keys stay in memcached.
  expiration: <duration>

  # Configures how many keys to fetch in each batch request.
  batch_size: <int>

  # Maximum active requests to memcached.
  [parallelism: <int> | default = 100]

# Configures how to connect to one or more memcached servers.
memcached_client:
  # The hostname to use for memcached services when caching chunks. If
  # empty, no memcached will be used. A SRV lookup will be used.
  [host: <string>]

  # SRV service used to discover memcached servers.
  [service: <string> | default = "memcached"]

  # Maximum time to wait before giving up on memcached requests.
  [timeout: <duration> | default = 100ms]

  # The maximum number of idle connections in the memcached client
  # pool.
  [max_idle_conns: <int> | default = 100]

  # The period with which to poll the DNS for memcached servers.
  [update_interval: <duration> | default = 1m]

  # Whether or not to use a consistent hash to discover multiple memcached
  # servers.
  [consistent_hash: <bool>]

fifocache:
  # Number of entries to cache in-memory.
  [size: <int> | default = 0]

  # The expiry duration for the in-memory cache.
  [validity: <duration> | default = 0s]
```

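For illustration, a cache backed by a memcached cluster discovered via DNS
might be configured like this; the hostname below is a placeholder:

```yaml
enable_fifocache: false
memcached:
  expiration: 1h
  batch_size: 256
  parallelism: 100
memcached_client:
  host: memcached.loki.svc.cluster.local
  service: memcached
  consistent_hash: true
```
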
## chunk_store_config

The `chunk_store_config` block configures how chunks will be cached and how long
to wait before saving them to the backing store.

```yaml
# The cache configuration for storing chunks
[chunk_cache_config: <cache_config>]

# The cache configuration for deduplicating writes
[write_dedupe_cache_config: <cache_config>]

# The minimum time between a chunk update and being saved
# to the store.
[min_chunk_age: <duration>]

# Cache index entries older than this period. Default is
# disabled.
[cache_lookups_older_than: <duration>]

# Limit how far back data can be queried. Default is disabled.
[max_look_back_period: <duration>]
```

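For example, to keep recently written chunks in a small in-memory cache, a
sketch using the `cache_config` fields above could look like:

```yaml
chunk_store_config:
  chunk_cache_config:
    enable_fifocache: true
    fifocache:
      size: 1024
      validity: 24h
```
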
## schema_config

The `schema_config` block configures schemas from given dates.

```yaml
# The configuration for chunk index schemas.
configs:
  - [<period_config>]
```

### period_config

The `period_config` block configures which index schema should be used
starting from a specific point in time.

```yaml
# The date of the first day that index buckets should be created. Use
# a date in the past if this is your only period_config, otherwise
# use a date when you want the schema to switch over.
[from: <daytime>]

# store and object_store below affect which <storage_config> key is
# used.

# Which store to use for the index. Either cassandra, bigtable, dynamodb, or
# boltdb
store: <string>

# Which store to use for the chunks. Either gcs, s3, inmemory, filesystem, or
# cassandra. If omitted, defaults to the same value as store.
[object_store: <string>]

# The schema to use. Set to v9 or v10.
schema: <string>

# Configures how the index is updated and stored.
index:
  # Table prefix for all period tables.
  prefix: <string>
  # Table period.
  [period: <duration> | default = 168h]
  # A map to be added to all managed tables.
  tags:
    [<string>: <string> ...]

# Configures how the chunks are updated and stored.
chunks:
  # Table prefix for all period tables.
  prefix: <string>
  # Table period.
  [period: <duration> | default = 168h]
  # A map to be added to all managed tables.
  tags:
    [<string>: <string> ...]

# How many shards will be created. Only used if schema is v10.
[row_shards: <int> | default = 16]
```

Where `daytime` is a value in the format of `yyyy-mm-dd`, like `2006-01-02`.

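As an example of how multiple `period_config` entries combine, a deployment
that started on the v9 schema and later switched to v10 with row shards could
describe both periods; the dates below are placeholders:

```yaml
schema_config:
  configs:
    - from: 2019-01-01
      store: boltdb
      object_store: filesystem
      schema: v9
      index:
        prefix: index_
        period: 168h
    - from: 2019-10-01
      store: boltdb
      object_store: filesystem
      schema: v10
      row_shards: 16
      index:
        prefix: index_
        period: 168h
```
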
## limits_config

The `limits_config` block configures global and per-tenant limits for ingesting
logs in Loki.

```yaml
# Per-user ingestion rate limit in samples per second.
[ingestion_rate: <float> | default = 25000]

# Per-user allowed ingestion burst size (in number of samples).
[ingestion_burst_size: <int> | default = 50000]

# Whether or not, for all users, samples with external labels
# identifying replicas in an HA Prometheus setup will be handled.
[accept_ha_samples: <boolean> | default = false]

# Prometheus label to look for in samples to identify a
# Prometheus HA cluster.
[ha_cluster_label: <string> | default = "cluster"]

# Prometheus label to look for in samples to identify a Prometheus HA
# replica.
[ha_replica_label: <string> | default = "__replica__"]

# Maximum length of a label name.
[max_label_name_length: <int> | default = 1024]

# Maximum length of a label value.
[max_label_value_length: <int> | default = 2048]

# Maximum number of label names per series.
[max_label_names_per_series: <int> | default = 30]

# Whether or not old samples will be rejected.
[reject_old_samples: <bool> | default = false]

# Maximum accepted sample age before rejecting.
[reject_old_samples_max_age: <duration> | default = 336h]

# Duration for a table to be created/deleted before/after it's
# needed. Samples won't be accepted before this time.
[creation_grace_period: <duration> | default = 10m]

# Enforce that every sample has a metric name.
[enforce_metric_name: <boolean> | default = true]

# Maximum number of samples that a query can return.
[max_samples_per_query: <int> | default = 1000000]

# Maximum number of active series per user.
[max_series_per_user: <int> | default = 5000000]

# Maximum number of active series per metric name.
[max_series_per_metric: <int> | default = 50000]

# Maximum number of chunks that can be fetched by a single query.
[max_chunks_per_query: <int> | default = 2000000]

# Limit on the length of chunk store queries. 0 to disable.
[max_query_length: <duration> | default = 0]

# Maximum number of queries that will be scheduled in parallel by the
# frontend.
[max_query_parallelism: <int> | default = 14]

# Cardinality limit for index queries
[cardinality_limit: <int> | default = 100000]

# Filename of the per-tenant overrides file
[per_tenant_override_config: <string>]

# Period with which to reload the overrides file if configured.
[per_tenant_override_period: <duration> | default = 10s]
```

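When `per_tenant_override_config` is set, the referenced file maps tenant IDs
to limit values that override the defaults above. A sketch, assuming a tenant
named `tenant-a` and the `overrides` top-level key used by Cortex-style
override files:

```yaml
overrides:
  tenant-a:
    ingestion_rate: 10000
    ingestion_burst_size: 20000
```
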
## table_manager_config

The `table_manager_config` block configures how the table manager operates
and how to provision tables when DynamoDB is used as the backing store.

```yaml
# Master 'off-switch' for table capacity updates, e.g. when troubleshooting
[throughput_updates_disabled: <boolean> | default = false]

# Master 'on-switch' for table retention deletions
[retention_deletes_enabled: <boolean> | default = false]

# How far back tables will be kept before they are deleted. 0s disables
# deletion.
[retention_period: <duration> | default = 0s]

# Period with which the table manager will poll for tables.
[dynamodb_poll_interval: <duration> | default = 2m]

# Duration a table will be created before it is needed.
[creation_grace_period: <duration> | default = 10m]

# Configures management of the index tables for DynamoDB.
index_tables_provisioning: <provision_config>

# Configures management of the chunk tables for DynamoDB.
chunk_tables_provisioning: <provision_config>
```

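For example, to have the table manager delete tables holding data older than
four weeks (the retention period should be a multiple of the 168h table
period):

```yaml
table_manager:
  retention_deletes_enabled: true
  retention_period: 672h
```
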
### provision_config

The `provision_config` block configures provisioning capacity for DynamoDB.

```yaml
# Enables on-demand throughput provisioning for the storage
# provider, if supported. Applies only to tables which are not autoscaled.
[provisioned_throughput_on_demand_mode: <boolean> | default = false]

# DynamoDB table default write throughput.
[provisioned_write_throughput: <int> | default = 3000]

# DynamoDB table default read throughput.
[provisioned_read_throughput: <int> | default = 300]

# Enables on-demand throughput provisioning for inactive tables on the
# storage provider, if supported. Applies only to tables which are not
# autoscaled.
[inactive_throughput_on_demand_mode: <boolean> | default = false]

# DynamoDB table write throughput for inactive tables.
[inactive_write_throughput: <int> | default = 1]

# DynamoDB table read throughput for inactive tables.
[inactive_read_throughput: <int> | default = 300]

# Active table write autoscale config.
[write_scale: <auto_scaling_config>]

# Inactive table write autoscale config.
[inactive_write_scale: <auto_scaling_config>]

# Number of last inactive tables to enable write autoscale.
[inactive_write_scale_lastn: <int>]

# Active table read autoscale config.
[read_scale: <auto_scaling_config>]

# Inactive table read autoscale config.
[inactive_read_scale: <auto_scaling_config>]

# Number of last inactive tables to enable read autoscale.
[inactive_read_scale_lastn: <int>]
```

#### auto_scaling_config

The `auto_scaling_config` block configures autoscaling for DynamoDB.

```yaml
# Whether or not autoscaling should be enabled.
[enabled: <boolean> | default = false]

# AWS AutoScaling role ARN
[role_arn: <string>]

# DynamoDB minimum provision capacity.
[min_capacity: <int> | default = 3000]

# DynamoDB maximum provision capacity.
[max_capacity: <int> | default = 6000]

# DynamoDB minimum seconds between each autoscale up.
[out_cooldown: <int> | default = 1800]

# DynamoDB minimum seconds between each autoscale down.
[in_cooldown: <int> | default = 1800]

# DynamoDB target ratio of consumed capacity to provisioned capacity.
[target: <float> | default = 80]
```

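Putting `provision_config` and `auto_scaling_config` together, write
autoscaling for active index tables might be enabled like this; the role ARN
is a placeholder:

```yaml
index_tables_provisioning:
  provisioned_write_throughput: 3000
  provisioned_read_throughput: 300
  write_scale:
    enabled: true
    role_arn: arn:aws:iam::123456789012:role/TableManagerAutoScale
    min_capacity: 3000
    max_capacity: 6000
    out_cooldown: 1800
    in_cooldown: 1800
    target: 80
```
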
# Loki Configuration Examples

1. [Complete Local Config](#complete-local-config)
2. [Google Cloud Storage](#google-cloud-storage)
3. [Cassandra Index](#cassandra-index)
4. [AWS](#aws)

## Complete Local Config

```yaml
auth_enabled: false

server:
  http_listen_port: 3100

ingester:
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 5m
  chunk_retain_period: 30s

schema_config:
  configs:
    - from: 2018-04-15
      store: boltdb
      object_store: filesystem
      schema: v9
      index:
        prefix: index_
        period: 168h

storage_config:
  boltdb:
    directory: /tmp/loki/index

  filesystem:
    directory: /tmp/loki/chunks

limits_config:
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 168h

chunk_store_config:
  max_look_back_period: 0

table_manager:
  chunk_tables_provisioning:
    inactive_read_throughput: 0
    inactive_write_throughput: 0
    provisioned_read_throughput: 0
    provisioned_write_throughput: 0
  index_tables_provisioning:
    inactive_read_throughput: 0
    inactive_write_throughput: 0
    provisioned_read_throughput: 0
    provisioned_write_throughput: 0
  retention_deletes_enabled: false
  retention_period: 0
```

## Google Cloud Storage

This is a partial config that uses GCS and Bigtable for the chunk and index
stores, respectively.

```yaml
schema_config:
  configs:
    - from: 2018-04-15
      store: bigtable
      object_store: gcs
      schema: v9
      index:
        prefix: loki_index_
        period: 168h

storage_config:
  bigtable:
    instance: BIGTABLE_INSTANCE
    project: BIGTABLE_PROJECT
  gcs:
    bucket_name: GCS_BUCKET_NAME
```

## Cassandra Index

This is a partial config that uses the local filesystem for chunk storage and
Cassandra for the index storage:

```yaml
schema_config:
  configs:
    - from: 2018-04-15
      store: cassandra
      object_store: filesystem
      schema: v9
      index:
        prefix: cassandra_table
        period: 168h

storage_config:
  cassandra:
    username: cassandra
    password: cassandra
    addresses: 127.0.0.1
    auth: true
    keyspace: lokiindex

  filesystem:
    directory: /tmp/loki/chunks
```

## AWS

This is a partial config that uses S3 for chunk storage and DynamoDB for the
index storage:

```yaml
schema_config:
  configs:
    - from: 0
      store: dynamo
      object_store: s3
      schema: v9
      index:
        prefix: dynamodb_table_name
        period: 0

storage_config:
  aws:
    s3: s3://access_key:secret_access_key@region/bucket_name
    dynamodbconfig:
      dynamodb: dynamodb://access_key:secret_access_key@region
```

If you don't wish to hard-code S3 credentials, you can also configure an EC2
instance role by changing the `storage_config` section:

```yaml
storage_config:
  aws:
    s3: s3://region/bucket_name
    dynamodbconfig:
      dynamodb: dynamodb://region
```

### S3-compatible APIs

S3-compatible APIs (e.g., Ceph Object Storage with an S3-compatible API) can
also be used. If the API supports path-style URLs rather than virtual hosted
bucket addressing, configure the URL in `storage_config` with the custom
endpoint:

```yaml
storage_config:
  aws:
    s3: s3://access_key:secret_access_key@custom_endpoint/bucket_name
    s3forcepathstyle: true
```

# Getting started with Loki

1. [Grafana](grafana.md)
2. [LogCLI](logcli.md)
3. [Troubleshooting](troubleshooting.md)

# Loki in Grafana

Grafana versions [6.0](https://grafana.com/grafana/download/6.0.0) and later
ship with built-in support for Loki. Using
[6.3](https://grafana.com/grafana/download/6.3.0) or later is highly
recommended to take advantage of new LogQL functionality.

1. Log into your Grafana instance. If this is your first time running
   Grafana, the username and password both default to `admin`.
2. In Grafana, go to `Configuration` > `Data Sources` via the cog icon on the
   left sidebar.
3. Click the big <kbd>+ Add data source</kbd> button.
4. Choose Loki from the list.
5. The HTTP URL field should be the address of your Loki server. For example,
   when running locally or with Docker using port mapping, the address is
   likely `http://localhost:3100`. When running with docker-compose or
   Kubernetes, the address is likely `http://loki:3100`.
6. To see the logs, click <kbd>Explore</kbd> on the sidebar, select the Loki
   datasource in the top-left dropdown, and then choose a log stream using the
   <kbd>Log labels</kbd> button.

Read more about Grafana's Explore feature, and about how to search and filter
for logs with Loki, in the
[Grafana documentation](http://docs.grafana.org/features/explore).

> To configure the datasource via provisioning, see [Configuring Grafana via
> Provisioning](http://docs.grafana.org/features/datasources/loki/#configure-the-datasource-with-provisioning)
> in the Grafana documentation and make sure to adjust the URL similarly as
> shown above.

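For example, a minimal provisioned datasource file could look like the
following sketch, assuming Loki is reachable at `http://localhost:3100`:

```yaml
apiVersion: 1

datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://localhost:3100
```
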
# Troubleshooting Loki

## "Loki: Bad Gateway. 502"

This error can appear in Grafana when Loki is added as a
datasource, indicating that Grafana is unable to connect to Loki. There may
be one of many root causes:

- If Loki is deployed with Docker, and Grafana and Loki are not running on the
  same node, check your firewall to make sure the nodes can connect.
- If Loki is deployed with Kubernetes:
    - If Grafana and Loki are in the same namespace, set the Loki URL as
      `http://$LOKI_SERVICE_NAME:$LOKI_PORT`
    - Otherwise, set the Loki URL as
      `http://$LOKI_SERVICE_NAME.$LOKI_NAMESPACE:$LOKI_PORT`

## "Data source connected, but no labels received. Verify that Loki and Promtail is configured properly." |
||||
|
||||
This error can appear in Grafana when Loki is added as a datasource, indicating |
||||
that although Grafana has connected to Loki, Loki hasn't received any logs from |
||||
Promtail yet. There may be one of many root causes: |
||||
|
||||
- Promtail is running and collecting logs but is unable to connect to Loki to |
||||
send the logs. Check Promtail's output. |
||||
- Promtail started sending logs to Loki before Loki was ready. This can |
||||
happen in test environment where Promtail has already read all logs and sent |
||||
them off. Here is what you can do: |
||||
- Start Promtail after Loki, e.g., 60 seconds later. |
||||
- To force Promtail to re-send log messages, delete the positions file |
||||
(default location `/tmp/positions.yaml`). |
||||
- Promtail is ignoring targets and isn't reading any logs because of a |
||||
configuration issue. |
||||
- This can be detected by turning on debug logging in Promtail and looking |
||||
for `dropping target, no labels` or `ignoring target` messages. |
||||
- Promtail cannot find the location of your log files. Check that the |
||||
`scrape_configs` contains valid path settings for finding the logs on your |
||||
worker nodes. |
||||
- Your pods are running with different labels than the ones Promtail is |
||||
configured to read. Check `scrape_configs` to validate. |
||||
|
||||
## Troubleshooting targets

Promtail exposes two web pages that can be used to understand how its service
discovery works.

The service discovery page (`/service-discovery`) shows all
discovered targets with their labels before and after relabeling, as well as
the reason why the target has been dropped.

The targets page (`/targets`) displays only targets that are being actively
scraped and their respective labels, files, and positions.

On Kubernetes, you can access those two pages by port-forwarding the Promtail
port (`9080`, or `3101` if using Helm) locally:

```bash
$ kubectl port-forward loki-promtail-jrfg7 9080
# Then, in a web browser, visit http://localhost:9080/service-discovery
```

## Debug output

Both `loki` and `promtail` support a log level flag on the command line:

```bash
$ loki -log.level=debug
$ promtail -log.level=debug
```

## Failed to create target, `ioutil.ReadDir: readdirent: not a directory`

The Promtail configuration contains a `__path__` entry to a directory that
Promtail cannot find.

## Connecting to a Promtail pod to troubleshoot

First check the [Troubleshooting targets](#troubleshooting-targets) section above.
If that doesn't help answer your questions, you can connect to the Promtail pod
to investigate further.

If you are running Promtail as a DaemonSet in your cluster, you will have a
Promtail pod on each node, so figure out which Promtail you need to debug first:

```shell
$ kubectl get pods --all-namespaces -o wide
NAME                     READY  STATUS   RESTARTS  AGE  IP           NODE       NOMINATED NODE
...
nginx-7b6fb56fb8-cw2cm   1/1    Running  0         41d  10.56.4.12   node-ckgc  <none>
...
promtail-bth9q           1/1    Running  0         3h   10.56.4.217  node-ckgc  <none>
```

That output is truncated to highlight just the two pods we are interested in;
the `-o wide` flag shows the NODE on which each pod is running.

You'll want to match the node for the pod you are interested in, in this example
NGINX, to the Promtail running on the same node.

To debug, you can connect to the Promtail pod:

```shell
kubectl exec -it promtail-bth9q -- /bin/sh
```

Once connected, verify the config in `/etc/promtail/promtail.yml` has the
contents you expect.

Also check `/var/log/positions.yaml` (`/run/promtail/positions.yaml` when
deployed by Helm, or whatever value is specified for `positions.file`) and make
sure Promtail is tailing the logs you would expect.

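The positions file is a small YAML map from file path to the last read offset,
so you can tell at a glance what Promtail has consumed. A sketch of what a
healthy positions file might contain (the path and offset are placeholders):

```yaml
positions:
  /var/log/pods/my-app_my-pod-abc123/app/0.log: "10412"
```
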
You can check the Promtail log by looking in `/var/log/containers` at the
Promtail container log.

## Enable tracing for Loki

Loki can be traced using [Jaeger](https://www.jaegertracing.io/) by setting
the environment variable `JAEGER_AGENT_HOST` on Loki to the hostname and port
of your Jaeger agent.

If you deploy with Helm, use the following command:

```bash
$ helm upgrade --install loki loki/loki --set "loki.tracing.jaegerAgentHost=YOUR_JAEGER_AGENT_HOST"
```

# Installing Loki

1. [Installing using Tanka (recommended)](./tanka.md)
2. [Installing through Helm](./helm.md)
3. [Installing locally](./local.md)

# Installing Loki with Helm

## Prerequisites

Make sure you have Helm [installed](https://helm.sh/docs/using_helm/#installing-helm) and
[deployed](https://helm.sh/docs/using_helm/#installing-tiller) to your cluster. Then add
Loki's chart repository to Helm:

```bash
$ helm repo add loki https://grafana.github.io/loki/charts
```

You can update the chart repository by running:

```bash
$ helm repo update
```

## Deploy Loki to your cluster

### Deploy with default config

```bash
$ helm upgrade --install loki loki/loki
```

### Deploy in a custom namespace

```bash
$ helm upgrade --install loki --namespace=loki loki/loki
```

### Deploy with custom config

```bash
$ helm upgrade --install loki loki/loki --set "key1=val1,key2=val2,..."
```

### Deploy Loki Stack (Loki, Promtail, Grafana, Prometheus)

```bash
$ helm upgrade --install loki loki/loki-stack
```

## Deploy Grafana to your cluster

To install Grafana on your cluster with Helm, use the following command:

```bash
$ helm install stable/grafana -n loki-grafana
```

To get the admin password for the Grafana pod, run the following command:

```bash
$ kubectl get secret --namespace <YOUR-NAMESPACE> loki-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
```

To access the Grafana UI, run the following command:

```bash
$ kubectl port-forward --namespace <YOUR-NAMESPACE> service/loki-grafana 3000:80
```

Navigate to `http://localhost:3000` and log in with `admin` and the password
output above. Then follow the [instructions for adding the Loki Data Source](../getting-started/grafana.md), using the URL
`http://loki:3100/` for Loki.

## Run Loki behind HTTPS ingress

If Loki and Promtail are deployed on different clusters you can add an Ingress
in front of Loki. By adding a certificate you create an HTTPS endpoint. For
extra security you can also enable Basic Authentication on the Ingress.

In Promtail, set the following values to communicate using HTTPS and basic
authentication:

```yaml
loki:
  serviceScheme: https
  user: user
  password: pass
```

Sample Helm template for the Ingress:

```yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: {{ .Values.ingress.class }}
    ingress.kubernetes.io/auth-type: "basic"
    ingress.kubernetes.io/auth-secret: {{ .Values.ingress.basic.secret }}
  name: loki
spec:
  rules:
  - host: {{ .Values.ingress.host }}
    http:
      paths:
      - backend:
          serviceName: loki
          servicePort: 3100
  tls:
  - secretName: {{ .Values.ingress.cert }}
    hosts:
    - {{ .Values.ingress.host }}
```

# Installing Loki Locally

## Release Binaries

Every [Loki release](https://github.com/grafana/loki/releases) includes
prebuilt binaries:

```bash
# download a binary (modify app, os, and arch as needed)
# Installs v0.3.0. Go to the releases page for the latest version
$ curl -fSL -o "/usr/local/bin/loki.gz" "https://github.com/grafana/loki/releases/download/v0.3.0/loki_linux_amd64.gz"
$ gunzip "/usr/local/bin/loki.gz"

# make sure it is executable
$ chmod a+x "/usr/local/bin/loki"
```

## Manual Build

### Prerequisites

- Go 1.11 or later
- Make
- Docker (for updating protobuf files and yacc files)

### Building

Clone Loki to `$GOPATH/src/github.com/grafana/loki`:

```bash
$ git clone https://github.com/grafana/loki $(go env GOPATH)/src/github.com/grafana/loki
```

Then change into that directory and run `make loki`:

```bash
$ cd $(go env GOPATH)/src/github.com/grafana/loki
$ make loki

# A file at ./cmd/loki/loki will be created and is the
# final built binary.
```

# Installing Loki with Tanka

[Tanka](https://tanka.dev) is a reimplementation of
[Ksonnet](https://ksonnet.io) that Grafana Labs created after Ksonnet was
deprecated. Tanka is used by Grafana Labs to run Loki in production.

## Prerequisites

Grab the latest version of Tanka (at least version v0.5.0) for the `tk env`
commands. Prebuilt binaries for Tanka can be found at the [Tanka releases
URL](https://github.com/grafana/tanka/releases).

In your config repo, if you don't have a Tanka application, create a folder and
call `tk init` inside of it. Then create an environment for Loki and provide the
URL for the Kubernetes API server to deploy to (e.g., `https://localhost:6443`):

```
$ mkdir <application name>
$ cd <application name>
$ tk init
$ tk env add environments/loki --namespace=loki --server=<Kubernetes API server>
```

## Deploying

Grab the Loki module using `jb`:

```bash
$ go get -u github.com/jsonnet-bundler/jsonnet-bundler/cmd/jb
$ jb init
$ jb install github.com/grafana/loki/production/ksonnet/loki
```

Replace the contents of `environments/loki/main.jsonnet` with the following,
making sure to set proper values for the username, password, and the `htpasswd`
contents:

```jsonnet
local gateway = import 'loki/gateway.libsonnet';
local loki = import 'loki/loki.libsonnet';
local promtail = import 'promtail/promtail.libsonnet';

loki + promtail + gateway {
  _config+:: {
    namespace: 'loki',
    htpasswd_contents: 'loki:$apr1$H4yGiGNg$ssl5/NymaGFRUvxIV1Nyr.',

    promtail_config: {
      scheme: 'http',
      hostname: 'gateway.%(namespace)s.svc' % $._config,
      username: 'loki',
      password: 'password',
      container_root_path: '/var/lib/docker',
    },

    replication_factor: 3,
    consul_replicas: 1,
  },
}
```

Note that `container_root_path` is the data root for your Docker daemon; run
`docker info | grep "Root Dir"` to get it.

Run `tk show environments/loki` to see the manifests that will be deployed to
the cluster, and finally run `tk apply environments/loki` to deploy them.

# logentry

Both the Docker Driver and Promtail support transformations on received log
entries to control what data is sent to Loki. Please see the documentation
on how to [process log lines](processing-log-lines.md) for more information.

# Processing Log Lines

A detailed look at how to set up Promtail to process your log lines, including
extracting metrics and labels.

* [Pipeline](#pipeline)
* [Stages](#stages)

## Pipeline

Pipeline stages implement the following interface:

```go
type Stage interface {
	Process(labels model.LabelSet, extracted map[string]interface{}, time *time.Time, entry *string)
}
```

Any Stage is capable of modifying the `labels`, `extracted` data, `time`,
and/or `entry`, though generally a Stage should only modify one of those
things to reduce complexity.

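As a hypothetical illustration of the interface (custom stages are compiled
into Promtail, not configured in YAML), a stage that upper-cases every log
line and touches nothing else could look like this:

```go
package stages

import (
	"strings"
	"time"

	"github.com/prometheus/common/model"
)

// uppercaseStage upper-cases the log entry. Labels, extracted data,
// and the timestamp are left untouched.
type uppercaseStage struct{}

func (uppercaseStage) Process(labels model.LabelSet, extracted map[string]interface{}, t *time.Time, entry *string) {
	if entry != nil {
		*entry = strings.ToUpper(*entry)
	}
}
```
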
Typical pipelines will start with a [regex](#regex) or [json](#json) stage to
extract data from the log line. Then any combination of other stages follow to
use the data in the `extracted` map. It may also be common to see the use of
[match](#match) at the start of a pipeline to selectively apply stages based
on labels.

The example below gives a good glimpse of what you can achieve with a pipeline:

```yaml
scrape_configs:
- job_name: kubernetes-pods-name
  kubernetes_sd_configs: ....
  pipeline_stages:
  - match:
      selector: '{name="promtail"}'
      stages:
      - regex:
          expression: '.*level=(?P<level>[a-zA-Z]+).*ts=(?P<timestamp>[T\d-:.Z]*).*component=(?P<component>[a-zA-Z]+)'
      - labels:
          level:
          component:
      - timestamp:
          format: RFC3339Nano
          source: timestamp
  - match:
      selector: '{name="nginx"}'
      stages:
      - regex:
          expression: \w{1,3}.\w{1,3}.\w{1,3}.\w{1,3}(?P<output>.*)
      - output:
          source: output
  - match:
      selector: '{name="jaeger-agent"}'
      stages:
      - json:
          expressions:
            level: level
      - labels:
          level:
- job_name: kubernetes-pods-app
  kubernetes_sd_configs: ....
  pipeline_stages:
  - match:
      selector: '{app=~"grafana|prometheus"}'
      stages:
      - regex:
          expression: ".*(lvl|level)=(?P<level>[a-zA-Z]+).*(logger|component)=(?P<component>[a-zA-Z]+)"
      - labels:
          level:
          component:
  - match:
      selector: '{app="some-app"}'
      stages:
      - regex:
          expression: ".*(?P<panic>panic: .*)"
      - metrics:
          panic_total:
            type: Counter
            description: "total count of panic"
            source: panic
            config:
              action: inc
```

In the first job:

The first `match` stage will only run if a label named `name` equals
`promtail`. It then applies a regex to parse the line, sets two labels (level
and component), and sets the timestamp from the extracted data.

The second `match` stage will only run if a label named `name` equals `nginx`.
It then parses the log line with a regex and extracts the `output`, which is
then set as the log line output sent to Loki.

The third `match` stage will only run if a label named `name` equals
`jaeger-agent`. It then parses this log as JSON, extracting `level`, which is
then set as a label.

In the second job:

The first `match` stage will only run if a label named `app` equals `grafana`
or `prometheus`. It then parses the log line with a regex and sets two new
labels, level and component, from the extracted data.

The second `match` stage will only run if a label named `app` equals
`some-app`. It then parses the log line and creates an extracted key named
panic if it finds `panic: ` in the log line. A metrics stage will then
increment a counter if the extracted key `panic` is found in the `extracted`
map.

More info on each field in the interface:

##### labels model.LabelSet

A set of Prometheus-style labels which will be sent with the log line and will
be indexed by Loki.

##### extracted map[string]interface{}

Metadata extracted during pipeline execution which can be used by subsequent
stages. This data is not sent with the logs and is dropped after the log entry
is processed through the pipeline.

For example, stages like [regex](#regex) and [json](#json) will use expressions
to extract data from a log line and store it in the `extracted` map, which
following stages like [timestamp](#timestamp) or [output](#output) can use to
manipulate the log line's `time` and `entry`.

##### time *time.Time

The timestamp which Loki will store for the log line. If not set within the
pipeline using the [timestamp](#timestamp) stage, it will default to
`time.Now()`.

##### entry *string

The log line which will be stored by Loki. The [output](#output) stage is
capable of modifying this value; if no stage modifies this value, the log line
stored will match what was input to the system and not be modified.

## Stages

Extracting data (for use by other stages):

* [regex](#regex) - use regex to extract data
* [json](#json) - parse a JSON log and extract data

Modifying extracted data:

* [template](#template) - use Go templates to modify extracted data

Filtering stages:

* [match](#match) - apply selectors to conditionally run stages based on labels

Mutating/manipulating output:

* [timestamp](#timestamp) - set the timestamp sent to Loki
* [output](#output) - set the log content sent to Loki

Adding labels:

* [labels](#labels) - add labels to the log stream

Metrics:

* [metrics](#metrics) - calculate metrics from the log content

### regex

A regex stage will take the provided regex and set the named groups as data in
the `extracted` map.

```yaml
- regex:
    expression: ①
    source: ②
```

① `expression` is **required** and needs to be a [golang RE2 regex string](https://github.com/google/re2/wiki/Syntax). Every capture group `(re)` will be set into the `extracted` map, and every capture group **must be named:** `(?P<name>re)`. The name will be used as the key in the map.
② `source` is optional and contains the name of a key in the `extracted` map containing the data to parse. If omitted, the regex stage will parse the log `entry`.

##### Example (without source):

```yaml
- regex:
    expression: "^(?s)(?P<time>\\S+?) (?P<stream>stdout|stderr) (?P<flags>\\S+?) (?P<content>.*)$"
```

Log line: `2019-01-01T01:00:00.000000001Z stderr P i'm a log message!`

Would create the following `extracted` map:

```go
{
	"time": "2019-01-01T01:00:00.000000001Z",
	"stream": "stderr",
	"flags": "P",
	"content": "i'm a log message!",
}
```

These map entries can then be used by other pipeline stages such as
[timestamp](#timestamp) and/or [output](#output).

[Example in unit test](../../pkg/logentry/stages/regex_test.go)

##### Example (with source):

```yaml
- json:
    expressions:
      time:
- regex:
    expression: "^(?P<year>\\d+)"
    source: "time"
```

Log line: `{"time":"2019-01-01T01:00:00.000000001Z"}`

Would create the following `extracted` map:

```go
{
	"time": "2019-01-01T01:00:00.000000001Z",
	"year": "2019"
}
```

These map entries can then be used by other pipeline stages such as
[timestamp](#timestamp) and/or [output](#output).

### json

A json stage will take the provided [JMESPath expressions](http://jmespath.org/)
and set the key/value data in the `extracted` map.

```yaml
- json:
    expressions: ①
      key: expression ②
    source: ③
```

① `expressions` is a required yaml object containing key/value pairs of JMESPath expressions.
② `key: expression` where `key` will be the key in the `extracted` map, and the value will be the evaluated JMESPath expression.
③ `source` is optional and contains the name of a key in the `extracted` map containing the json to parse. If omitted, the json stage will parse the log `entry`.

This stage uses the Go JSON unmarshaller, which means non-string types like
numbers or booleans will be unmarshalled into those types. The `extracted` map
will accept non-string values, and this stage will keep primitive types as they
are unmarshalled (e.g. bool or float64). Downstream stages will need to perform
correct type conversion of these values as necessary.

If the value is a complex type, for example a JSON object, it will be
marshalled back to JSON before being put in the `extracted` map.

##### Example (without source):

```yaml
- json:
    expressions:
      output: log
      stream: stream
      timestamp: time
```

Log line: `{"log":"log message\n","stream":"stderr","time":"2019-04-30T02:12:41.8443515Z"}`

Would create the following `extracted` map:

```go
{
	"output": "log message\n",
	"stream": "stderr",
	"timestamp": "2019-04-30T02:12:41.8443515Z"
}
```

[Example in unit test](../../pkg/logentry/stages/json_test.go)

##### Example (with source):

```yaml
- json:
    expressions:
      output: log
      stream: stream
      timestamp: time
      extra:
- json:
    expressions:
      user:
    source: extra
```

Log line: `{"log":"log message\n","stream":"stderr","time":"2019-04-30T02:12:41.8443515Z","extra":"{\"user\":\"marco\"}"}`

Would create the following `extracted` map:

```go
{
	"output": "log message\n",
	"stream": "stderr",
	"timestamp": "2019-04-30T02:12:41.8443515Z",
	"extra": "{\"user\":\"marco\"}",
	"user": "marco"
}
```

### template

A template stage lets you manipulate the values in the `extracted` data map
using [Go's template package](https://golang.org/pkg/text/template/). This can
be useful if you want to manipulate data extracted by regex or json stages
before setting label values, for example to replace all spaces with
underscores, make everything lowercase, or append some values to the extracted
data.

You can set values in the extracted map for keys that did not previously exist.

```yaml
- template:
    source: ①
    template: ②
```

① `source` is **required** and is the key to the value in the `extracted` data map you wish to modify. This key does __not__ have to be present and will be added if missing.
② `template` is **required** and is a [Go template string](https://golang.org/pkg/text/template/).

The value of the extracted data map is accessed by using `.Value` in your
template.

In addition to normal template syntax, several functions have also been mapped
to use directly or in a pipe configuration:

```go
"ToLower":    strings.ToLower,
"ToUpper":    strings.ToUpper,
"Replace":    strings.Replace,
"Trim":       strings.Trim,
"TrimLeft":   strings.TrimLeft,
"TrimRight":  strings.TrimRight,
"TrimPrefix": strings.TrimPrefix,
"TrimSuffix": strings.TrimSuffix,
"TrimSpace":  strings.TrimSpace,
```

##### Example

```yaml
- template:
    source: app
    template: '{{ .Value }}_some_suffix'
```

This would take the value of the `app` key in the `extracted` data map and
append `_some_suffix` to it. For example, if `app=loki`, the new value for
`app` in the map would be `loki_some_suffix`.

```yaml
- template:
    source: app
    template: '{{ ToLower .Value }}'
```

This would take the value of `app` from the `extracted` data and lowercase all
the letters. If `app=LOKI`, the new value for `app` would be `loki`.

The template syntax passes parameters to functions using space delimiters;
functions taking only a single argument can also use the pipe syntax:

```yaml
- template:
    source: app
    template: '{{ .Value | ToLower }}'
```

A more complicated function example:

```yaml
- template:
    source: app
    template: '{{ Replace .Value "loki" "bloki" 1 }}'
```

The arguments here are as described for the [Replace function](https://golang.org/pkg/strings/#Replace):
in this example we are saying to replace, in the string `.Value` (which is our
extracted value for the `app` key), the occurrence of the string "loki" with
the string "bloki" exactly 1 time.

[More examples in unit test](../../pkg/logentry/stages/template_test.go)

### match

A match stage will take the provided label `selector` and determine whether a
group of provided Stages will be executed or not, based on labels.

```yaml
- match:
    selector: "{app=\"loki\"}" ①
    pipeline_name: loki_pipeline ②
    stages: ③
```

① `selector` is **required** and must be a [LogQL stream selector](../querying.md#log-stream-selector).
② `pipeline_name` is **optional**, but when defined, will create an additional label on the `pipeline_duration_seconds` histogram; the value for `pipeline_name` will be concatenated with the `job_name` using an underscore: `job_name`_`pipeline_name`.
③ `stages` is a **required** list of additional pipeline stages which will only be executed if the defined `selector` matches the labels. The format is a list of pipeline stages, defined exactly the same as the root pipeline.

[Example in unit test](../../pkg/logentry/stages/match_test.go)

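For example, the following sketch only runs its inner stages for streams whose
`app` label is `nginx`, extracting a hypothetical `status` capture group and
promoting it to a label:

```yaml
- match:
    selector: '{app="nginx"}'
    pipeline_name: nginx_pipeline
    stages:
    - regex:
        expression: '.* (?P<status>\d{3}) .*'
    - labels:
        status:
```
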
### timestamp

A timestamp stage will parse data from the `extracted` map and set the `time`
value which will be stored by Loki. The timestamp stage is important for having
log entries in the correct order. In the absence of this stage, Promtail will
associate the current timestamp with the log entry.

```yaml
- timestamp:
    source: ①
    format: ②
    location: ③
```

① `source` is **required** and is the key name to data in the `extracted` map.
② `format` is **required** and is the input to Go's [time.Parse](https://golang.org/pkg/time/#Parse) function.
③ `location` is **optional** and is an IANA Timezone Database string; see the [Go docs](https://golang.org/pkg/time/#LoadLocation) for more info.

Several of Go's pre-defined formats can be used by name:

```go
ANSIC       = "Mon Jan _2 15:04:05 2006"
UnixDate    = "Mon Jan _2 15:04:05 MST 2006"
RubyDate    = "Mon Jan 02 15:04:05 -0700 2006"
RFC822      = "02 Jan 06 15:04 MST"
RFC822Z     = "02 Jan 06 15:04 -0700" // RFC822 with numeric zone
RFC850      = "Monday, 02-Jan-06 15:04:05 MST"
RFC1123     = "Mon, 02 Jan 2006 15:04:05 MST"
RFC1123Z    = "Mon, 02 Jan 2006 15:04:05 -0700" // RFC1123 with numeric zone
RFC3339     = "2006-01-02T15:04:05-07:00"
RFC3339Nano = "2006-01-02T15:04:05.999999999-07:00"
```

Additionally, common Unix timestamp formats are supported:

```go
Unix   = 1562708916
UnixMs = 1562708916414
UnixNs = 1562708916000000123
```

Finally, any custom format can be supplied, and will be passed directly in as
the layout parameter in `time.Parse()`. If the custom format has no year
component specified (e.g. syslog's default log format), Promtail will assume
the current year should be used, correctly handling the edge cases around New
Year's Eve.

The syntax used by the custom format defines the reference date and time using
specific values for each component of the timestamp (i.e.
`Mon Jan 2 15:04:05 -0700 MST 2006`). The following table shows the supported
reference values which should be used in the custom format.

| Timestamp component | Format value |
| ------------------- | ------------ |
| Year | `06`, `2006` |
| Month | `1`, `01`, `Jan`, `January` |
| Day | `2`, `02`, `_2` (two digits right justified) |
| Day of the week | `Mon`, `Monday` |
| Hour | `3` (12-hour), `03` (12-hour zero prefixed), `15` (24-hour) |
| Minute | `4`, `04` |
| Second | `5`, `05` |
| Fraction of second | `.000` (ms zero prefixed), `.000000` (μs), `.000000000` (ns), `.999` (ms without trailing zeroes), `.999999` (μs), `.999999999` (ns) |
| 12-hour period | `pm`, `PM` |
| Timezone name | `MST` |
| Timezone offset | `-0700`, `-070000` (with seconds), `-07`, `-07:00`, `-07:00:00` (with seconds) |
| Timezone ISO-8601 | `Z0700` (Z for UTC or time offset), `Z070000`, `Z07`, `Z07:00`, `Z07:00:00` |

_For more details, read the [`time.Parse()`](https://golang.org/pkg/time/#Parse) docs and [`format.go`](https://golang.org/src/time/format.go) sources._

##### Example:

```yaml
- timestamp:
    source: time
    format: RFC3339Nano
```

This stage would be placed after the [regex](#regex) example stage above, and
the resulting `extracted` map _time_ value would be stored by Loki.

[Example in unit test](../../pkg/logentry/stages/timestamp_test.go)

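A custom layout is supplied the same way; the layout string is simply the Go
reference time written in your logs' format. For instance, for syslog-style
timestamps without a year (a sketch):

```yaml
- timestamp:
    source: time
    format: "Jan 2 15:04:05"
```
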
### output

An output stage will take data from the `extracted` map and set the `entry`
value which will be stored by Loki.

```yaml
- output:
    source: ①
```

① `source` is **required** and is the key name to data in the `extracted` map.

##### Example:

```yaml
- output:
    source: content
```

This stage would be placed after the [regex](#regex) example stage above, and
the resulting `extracted` map _content_ value would be stored as the log value
by Loki.

[Example in unit test](../../pkg/logentry/stages/output_test.go)

### labels |
||||
|
||||
A label stage will take data from the `extracted` map and set additional `labels` on the log line. |
||||
|
||||
```yaml |
||||
- labels: |
||||
label_name: source ①② |
||||
``` |
||||
|
||||
① `label_name` is **required** and will be the name of the label added. |
||||
② `"source"` is **optional**, if not provided the label_name is used as the source key into the `extracted` map |
||||
|
||||
##### Example: |
||||
|
||||
```yaml |
||||
- labels: |
||||
stream: |
||||
``` |
||||
|
||||
This stage, when placed after the [regex](#regex) example stage above, would create the following `labels`:
||||
|
||||
```go |
||||
{ |
||||
"stream": "stderr", |
||||
} |
||||
``` |
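When the desired label name differs from the key in the `extracted` map, the source can be given explicitly. A minimal sketch, assuming a hypothetical earlier stage extracted a `severity` value:

```yaml
- labels:
    # expose the extracted "severity" value as the "level" label
    level: severity
```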
||||
|
||||
[Example in unit test](../../pkg/logentry/stages/labels_test.go) |
||||
|
||||
### metrics |
||||
|
||||
A metrics stage will define and update metrics from `extracted` data. |
||||
|
||||
[Simple example in unit test](../../pkg/logentry/stages/metrics_test.go) |
||||
|
||||
Several metric types are available: |
||||
|
||||
#### Counter |
||||
|
||||
```yaml |
||||
- metrics: |
||||
counter_name: ① |
||||
type: Counter ② |
||||
description: ③ |
||||
source: ④ |
||||
config: |
||||
value: ⑤ |
||||
action: ⑥ |
||||
``` |
||||
|
||||
① `counter_name` is **required** and should be set to the desired counter's name.
② `type` is **required** and should be the word `Counter` (case insensitive).
③ `description` is **optional** but recommended.
④ `source` is **optional** and will be used as the key into the `extracted` data map; if not provided, it defaults to the `counter_name`.
⑤ `value` is **optional**; if present, the metric will only be operated on if `value` == `extracted[source]`. For example, if `value` is _panic_ then the counter will only be modified if `extracted[source] == "panic"`.
⑥ `action` is **required** and must be either `inc` or `add` (case insensitive). If `add` is chosen, the value of the `extracted` data will be used as the parameter to the method and therefore must be convertible to a positive float.
||||
|
||||
##### Examples |
||||
|
||||
```yaml |
||||
- metrics: |
||||
log_lines_total: |
||||
type: Counter |
||||
description: "total number of log lines" |
||||
source: time |
||||
config: |
||||
action: inc |
||||
``` |
||||
|
||||
This counter will increment whenever the _time_ key is present in the `extracted` map. Since every log entry should have a timestamp, this is a good field to pick when you want to count every line. Notice that `value` is omitted here because we want to match every timestamp regardless of its value, and `inc` is used because we are not interested in the value of the extracted _time_ field.
||||
|
||||
```yaml |
||||
- regex: |
||||
expression: "^.*(?P<order_success>order successful).*$" |
||||
- metrics: |
||||
    successful_orders_total:
||||
type: Counter |
||||
description: "log lines with the message `order successful`" |
||||
source: order_success |
||||
config: |
||||
action: inc |
||||
``` |
||||
|
||||
This combination of a regex stage and a counter stage counts any log line containing the words `order successful`.
||||
|
||||
```yaml |
||||
- regex: |
||||
expression: "^.* order_status=(?P<order_status>.*?) .*$" |
||||
- metrics: |
||||
    successful_orders_total:
||||
type: Counter |
||||
description: "successful orders" |
||||
source: order_status |
||||
config: |
||||
value: success |
||||
action: inc |
||||
failed_orders_total: |
||||
type: Counter |
||||
description: "failed orders" |
||||
source: order_status |
||||
config: |
||||
        value: fail
||||
action: inc |
||||
``` |
||||
|
||||
Similarly, this would look for a key=value pair of `order_status=success` or `order_status=fail` and increment each counter respectively. |
||||
|
||||
#### Gauge |
||||
|
||||
```yaml |
||||
- metrics: |
||||
gauge_name: ① |
||||
type: Gauge ② |
||||
description: ③ |
||||
source: ④ |
||||
config: |
||||
value: ⑤ |
||||
action: ⑥ |
||||
``` |
||||
|
||||
① `gauge_name` is **required** and should be set to the desired gauge's name.
② `type` is **required** and should be the word `Gauge` (case insensitive).
③ `description` is **optional** but recommended.
④ `source` is **optional** and will be used as the key into the `extracted` data map; if not provided, it defaults to the `gauge_name`.
⑤ `value` is **optional**; if present, the metric will only be operated on if `value` == `extracted[source]`. For example, if `value` is _panic_ then the gauge will only be modified if `extracted[source] == "panic"`.
⑥ `action` is **required** and must be either `set`, `inc`, `dec`, `add`, or `sub` (case insensitive). If `add`, `set`, or `sub` is chosen, the value of the `extracted` data will be used as the parameter to the method and therefore must be convertible to a positive float.
||||
|
||||
##### Example |
||||
|
||||
Gauge examples are very similar to Counter examples, with the additional `action` values.
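A hedged sketch (the metric name and the extracted `retries` key are illustrative assumptions, not from the original docs):

```yaml
- metrics:
    retries_current:
      type: Gauge
      description: "most recently logged retry count"
      # hypothetical key produced by an earlier regex or json stage
      source: retries
      config:
        # "set" stores the extracted value directly in the gauge
        action: set
```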
||||
|
||||
#### Histogram |
||||
|
||||
```yaml |
||||
- metrics: |
||||
histogram_name: ① |
||||
type: Histogram ② |
||||
description: ③ |
||||
source: ④ |
||||
config: |
||||
value: ⑤ |
||||
buckets: [] ⑥⑦ |
||||
``` |
||||
|
||||
① `histogram_name` is **required** and should be set to the desired histogram's name.
② `type` is **required** and should be the word `Histogram` (case insensitive).
③ `description` is **optional** but recommended.
④ `source` is **optional** and will be used as the key into the `extracted` data map; if not provided, it defaults to the `histogram_name`.
⑤ `value` is **optional**; if present, the metric will only be operated on if `value` == `extracted[source]`. For example, if `value` is _panic_ then the histogram will only be modified if `extracted[source] == "panic"`.
⑥ `action` is **required** and must be either `inc` or `add` (case insensitive). If `add` is chosen, the value of the `extracted` data will be used as the parameter in `add()` and therefore must be convertible to a numeric type.
⑦ `buckets` should be an array of numeric values.
||||
|
||||
##### Example |
||||
|
||||
```yaml |
||||
- metrics: |
||||
http_response_time_seconds: |
||||
type: Histogram |
||||
description: "length of each log line" |
||||
source: response_time |
||||
config: |
||||
buckets: [0.001,0.0025,0.005,0.010,0.025,0.050] |
||||
``` |
||||
|
||||
This would create a Histogram which looks for _response_time_ in the `extracted` data and applies the value to the histogram. |
||||
||||
# LogQL: Log Query Language |
||||
|
||||
Loki comes with its very own language for querying logs called *LogQL*. LogQL |
||||
can be considered a distributed `grep` with labels for filtering. |
||||
|
||||
A basic LogQL query consists of two parts: the **log stream selector** and a |
||||
**filter expression**. Due to Loki's design, all LogQL queries are required to |
||||
contain a log stream selector. |
||||
|
||||
The log stream selector will reduce the number of log streams to a manageable
volume. How many labels you use to filter down the log streams will affect the
relative performance of the query's execution. The filter expression is then
used to do a distributed `grep` over the retrieved log streams.
||||
|
||||
### Log Stream Selector |
||||
|
||||
The log stream selector determines which log streams should be included in your |
||||
query. The stream selector consists of one or more key-value pairs, where
||||
each key is a **log label** and the value is that label's value. |
||||
|
||||
The log stream selector is written by wrapping the key-value pairs in a |
||||
pair of curly braces: |
||||
|
||||
``` |
||||
{app="mysql",name="mysql-backup"} |
||||
``` |
||||
|
||||
In this example, log streams that have a label of `app` whose value is `mysql` |
||||
_and_ a label of `name` whose value is `mysql-backup` will be included in the |
||||
query results. |
||||
|
||||
The `=` operator after the label name is a **label matching operator**. The |
||||
following label matching operators are supported: |
||||
|
||||
- `=`: exactly equal. |
||||
- `!=`: not equal. |
||||
- `=~`: regex matches. |
||||
- `!~`: regex does not match. |
||||
|
||||
Examples: |
||||
|
||||
- `{name=~"mysql.+"}` |
||||
- `{name!~"mysql.+"}` |
||||
|
||||
The same rules that apply for [Prometheus Label |
||||
Selectors](https://prometheus.io/docs/prometheus/latest/querying/basics/#instant-vector-selectors) |
||||
apply for Loki log stream selectors. |
||||
|
||||
### Filter Expression |
||||
|
||||
After writing the log stream selector, the resulting set of logs can be filtered |
||||
further with a search expression. The search expression can be just text or |
||||
regex: |
||||
|
||||
- `{job="mysql"} |= "error"` |
||||
- `{name="kafka"} |~ "tsdb-ops.*io:2003"` |
||||
- `{instance=~"kafka-[23]",name="kafka"} != "kafka.server:type=ReplicaManager"`
||||
|
||||
In the previous examples, `|=`, `|~`, and `!=` act as **filter operators** and |
||||
the following filter operators are supported: |
||||
|
||||
- `|=`: Log line contains string. |
||||
- `!=`: Log line does not contain string. |
||||
- `|~`: Log line matches regular expression. |
||||
- `!~`: Log line does not match regular expression. |
||||
|
||||
Filter operators can be chained and will sequentially filter down the |
||||
expression - resulting log lines must satisfy _every_ filter: |
||||
|
||||
`{job="mysql"} |= "error" != "timeout"` |
||||
|
||||
When using `|~` and `!~`, |
||||
[Go RE2 syntax](https://github.com/google/re2/wiki/Syntax) regex may be used. The |
||||
matching is case-sensitive by default and can be switched to case-insensitive
by prefixing the regex with `(?i)`.
||||
|
||||
## Counting logs |
||||
|
||||
LogQL also supports functions that wrap a query and allow for counting entries |
||||
per stream. |
||||
|
||||
### Range Vector aggregation |
||||
|
||||
LogQL shares the same [range vector](https://prometheus.io/docs/prometheus/latest/querying/basics/#range-vector-selectors) |
||||
concept from Prometheus, except that the selected range of samples contains a
value of 1 for each log entry. An aggregation can be applied over the selected
range to transform it into an instant vector.
||||
|
||||
The currently supported functions for operating over a range vector are:
||||
|
||||
- `rate`: calculates the number of entries per second.
- `count_over_time`: counts the entries for each log stream within the given
  range.
||||
|
||||
> `count_over_time({job="mysql"}[5m])` |
||||
|
||||
This example counts all the log lines within the last five minutes for the |
||||
MySQL job. |
||||
|
||||
> `rate(({job="mysql"} |= "error" != "timeout")[10s])`
||||
|
||||
This example demonstrates that a full LogQL query can be wrapped in the
||||
aggregation syntax, including filter expressions. This example gets the |
||||
per-second rate of all non-timeout errors within the last ten seconds for the |
||||
MySQL job. |
||||
|
||||
### Aggregation operators |
||||
|
||||
Like [PromQL](https://prometheus.io/docs/prometheus/latest/querying/operators/#aggregation-operators), |
||||
LogQL supports a subset of built-in aggregation operators that can be used to |
||||
aggregate the elements of a single vector, resulting in a new vector of fewer
||||
elements but with aggregated values: |
||||
|
||||
- `sum`: Calculate sum over labels |
||||
- `min`: Select minimum over labels |
||||
- `max`: Select maximum over labels |
||||
- `avg`: Calculate the average over labels |
||||
- `stddev`: Calculate the population standard deviation over labels |
||||
- `stdvar`: Calculate the population standard variance over labels |
||||
- `count`: Count number of elements in the vector |
||||
- `bottomk`: Select smallest k elements by sample value |
||||
- `topk`: Select largest k elements by sample value |
||||
|
||||
The aggregation operators can either be used to aggregate over all label
values or a set of distinct label values by including a `without` or a
`by` clause:
||||
|
||||
> `<aggr-op>([parameter,] <vector expression>) [without|by (<label list>)]` |
||||
|
||||
`parameter` is only required when using `topk` and `bottomk`. `topk` and |
||||
`bottomk` are different from other aggregators in that a subset of the input |
||||
samples, including the original labels, are returned in the result vector. `by` |
||||
and `without` are only used to group the input vector. |
||||
|
||||
The `without` clause removes the listed labels from the resulting vector, keeping
||||
all others. The `by` clause does the opposite, dropping labels that are not |
||||
listed in the clause, even if their label values are identical between all |
||||
elements of the vector. |
||||
|
||||
#### Examples |
||||
|
||||
Get the top 10 applications by the highest log throughput: |
||||
|
||||
> `topk(10, sum(rate({region="us-east1"}[5m])) by (name))`
||||
|
||||
Get the count of logs during the last five minutes, grouping |
||||
by level: |
||||
|
||||
> `sum(count_over_time({job="mysql"}[5m])) by (level)` |
||||
|
||||
Get the rate of HTTP GET requests from NGINX logs: |
||||
|
||||
> `avg(rate(({job="nginx"} |= "GET")[10s])) by (region)` |
||||
||||
# Overview |
||||
|
||||
Loki is the heart of the whole logging stack. It is responsible for permanently |
||||
storing the ingested log lines, as well as executing the queries against its |
||||
persistent store to analyze the contents. |
||||
|
||||
## Architecture |
||||
|
||||
Loki mainly consists of three and a half individual services that together
achieve this. The high-level architecture is based on Cortex, so most of the
Cortex documentation applies to Loki as well.
||||
|
||||
### Distributor |
||||
|
||||
The distributor can be considered the "first stop" for the log lines ingested by |
||||
the agents (e.g. Promtail). |
||||
|
||||
It performs validation tasks on the data, splits it into batches and sends it to |
||||
multiple Ingesters in parallel. |
||||
|
||||
Distributors communicate with ingesters via gRPC. They are *stateless* and can be scaled up and down as needed. |
||||
|
||||
Refer to the [Cortex |
||||
docs](https://github.com/cortexproject/cortex/blob/master/docs/architecture.md#distributor) |
||||
for details on the internals. |
||||
|
||||
### Ingester |
||||
|
||||
The ingester service is responsible for de-duplicating and persisting the data |
||||
to long-term storage backends (DynamoDB, S3, Cassandra, etc.). |
||||
|
||||
Ingesters are semi-*stateful*: they maintain the last 12 hours' worth of logs
before flushing them to the [Chunk store](#chunk-store). When restarting
ingesters, care must be taken not to lose this data.
||||
|
||||
More details can be found in the [Cortex |
||||
docs](https://github.com/cortexproject/cortex/blob/master/docs/architecture.md#ingester). |
||||
|
||||
### Chunk store |
||||
|
||||
Loki is not a database, so it needs somewhere to persist the ingested log
lines for a longer period of time.
||||
|
||||
The chunk store is not really a Loki service in the traditional sense, but
rather a storage backend that Loki uses.
||||
|
||||
It consists of a key-value (KV) store for the actual **chunk data** and an |
||||
**index store** to keep track of them. Refer to [Storage](storage.md) for details. |
||||
|
||||
The [Cortex |
||||
docs](https://github.com/cortexproject/cortex/blob/master/docs/architecture.md#chunk-store) |
||||
also have good information about this. |
||||
|
||||
### Querier |
||||
|
||||
The Querier executes the LogQL queries from clients such as Grafana and LogCLI. |
||||
|
||||
It fetches its data directly from the [Chunk store](#chunk-store) and the |
||||
[Ingesters](#ingester). |
||||
|
||||
||||
# Loki API |
||||
|
||||
The Loki server has the following API endpoints (_Note:_ Authentication is out of scope for this project): |
||||
|
||||
- `POST /api/prom/push` |
||||
|
||||
For sending log entries, expects a snappy compressed proto in the HTTP Body: |
||||
|
||||
- [ProtoBuffer definition](/pkg/logproto/logproto.proto) |
||||
- [Golang client library](/pkg/promtail/client/client.go) |
||||
|
||||
Also accepts JSON formatted requests when the header `Content-Type: application/json` is sent. Example of the JSON format: |
||||
|
||||
```json |
||||
{ |
||||
"streams": [ |
||||
{ |
||||
"labels": "{foo=\"bar\"}", |
||||
"entries": [{ "ts": "2018-12-18T08:28:06.801064-04:00", "line": "baz" }] |
||||
} |
||||
] |
||||
} |
||||
|
||||
``` |
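For example, a minimal sketch of pushing a single entry with `curl` (the host, labels, and timestamp are illustrative):

```bash
curl -H "Content-Type: application/json" -XPOST \
  "http://localhost:3100/api/prom/push" \
  --data-raw '{"streams": [{"labels": "{foo=\"bar\"}", "entries": [{"ts": "2018-12-18T08:28:06.801064-04:00", "line": "baz"}]}]}'
```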
||||
|
||||
- `GET /api/v1/query` |
||||
|
||||
For doing instant queries at a single point in time, accepts the following parameters in the query-string: |
||||
|
||||
- `query`: a LogQL query
||||
- `limit`: max number of entries to return (not used for metric queries) |
||||
- `time`: the evaluation time for the query, as a nanosecond Unix epoch (nanoseconds since 1970). Default is always now. |
||||
- `direction`: `forward` or `backward`, useful when specifying a limit. Default is backward. |
||||
|
||||
Loki needs to query the index store in order to find log streams for particular labels and the store is spread out by time, |
||||
so you need to specify the time and labels accordingly. Querying a long time into the history will cause additional |
||||
load on the index server and make the query slower.
||||
|
||||
Responses look like this:
||||
|
||||
```json |
||||
{ |
||||
"status" : "success", |
||||
"data": { |
||||
"resultType": "vector" | "streams", |
||||
"result": <value> |
||||
} |
||||
} |
||||
``` |
||||
|
||||
Examples: |
||||
|
||||
```bash |
||||
$ curl -G -s "http://localhost:3100/api/v1/query" --data-urlencode 'query=sum(rate({job="varlogs"}[10m])) by (level)' | jq |
||||
{ |
||||
"status" : "success", |
||||
"data": { |
||||
"resultType": "vector", |
||||
"result": [ |
||||
{ |
||||
"metric": {}, |
||||
"value": [ |
||||
1559848867745737, |
||||
"1267.1266666666666" |
||||
] |
||||
}, |
||||
{ |
||||
"metric": { |
||||
"level": "warn" |
||||
}, |
||||
"value": [ |
||||
1559848867745737, |
||||
"37.77166666666667" |
||||
] |
||||
}, |
||||
{ |
||||
"metric": { |
||||
"level": "info" |
||||
}, |
||||
"value": [ |
||||
1559848867745737, |
||||
"37.69" |
||||
] |
||||
} |
||||
] |
||||
} |
||||
} |
||||
``` |
||||
|
||||
```bash |
||||
curl -G -s "http://localhost:3100/api/v1/query" --data-urlencode 'query={job="varlogs"}' | jq |
||||
{ |
||||
"status" : "success", |
||||
"data": { |
||||
"resultType": "streams", |
||||
"result": [ |
||||
{ |
||||
"labels": "{filename=\"/var/log/myproject.log\", job=\"varlogs\", level=\"info\"}", |
||||
"entries": [ |
||||
{ |
||||
"ts": "2019-06-06T19:25:41.972739Z", |
||||
"line": "foo" |
||||
}, |
||||
{ |
||||
"ts": "2019-06-06T19:25:41.972722Z", |
||||
"line": "bar" |
||||
} |
||||
] |
||||
} |
||||
] |
||||
} |
||||
} |
||||
``` |
||||
|
||||
- `GET /api/v1/query_range` |
||||
|
||||
For doing queries over a range of time, accepts the following parameters in the query-string: |
||||
|
||||
- `query`: a LogQL query
||||
- `limit`: max number of entries to return (not used for metric queries) |
||||
- `start`: the start time for the query, as a nanosecond Unix epoch (nanoseconds since 1970). Default is always one hour ago. |
||||
- `end`: the end time for the query, as a nanosecond Unix epoch (nanoseconds since 1970). Default is always now. |
||||
- `step`: query resolution step width in seconds. Default 1 second. |
||||
- `direction`: `forward` or `backward`, useful when specifying a limit. Default is backward. |
||||
|
||||
Loki needs to query the index store in order to find log streams for particular labels and the store is spread out by time, |
||||
so you need to specify the time and labels accordingly. Querying a long time into the history will cause additional |
||||
load on the index server and make the query slower.
||||
|
||||
Responses look like this:
||||
|
||||
```json |
||||
{ |
||||
"status" : "success", |
||||
"data": { |
||||
"resultType": "matrix" | "streams", |
||||
"result": <value> |
||||
} |
||||
} |
||||
``` |
||||
|
||||
Examples: |
||||
|
||||
```bash |
||||
$ curl -G -s "http://localhost:3100/api/v1/query_range" --data-urlencode 'query=sum(rate({job="varlogs"}[10m])) by (level)' --data-urlencode 'step=300' | jq |
||||
{ |
||||
"status" : "success", |
||||
"data": { |
||||
"resultType": "matrix", |
||||
"result": [ |
||||
{ |
||||
"metric": { |
||||
"level": "info" |
||||
}, |
||||
"values": [ |
||||
[ |
||||
1559848958663735, |
||||
"137.95" |
||||
], |
||||
[ |
||||
1559849258663735, |
||||
"467.115" |
||||
], |
||||
[ |
||||
1559849558663735, |
||||
"658.8516666666667" |
||||
] |
||||
] |
||||
}, |
||||
{ |
||||
"metric": { |
||||
"level": "warn" |
||||
}, |
||||
"values": [ |
||||
[ |
||||
1559848958663735, |
||||
"137.27833333333334" |
||||
], |
||||
[ |
||||
1559849258663735, |
||||
"467.69" |
||||
], |
||||
[ |
||||
1559849558663735, |
||||
"660.6933333333334" |
||||
] |
||||
] |
||||
} |
||||
] |
||||
} |
||||
} |
||||
``` |
||||
|
||||
```bash |
||||
curl -G -s "http://localhost:3100/api/v1/query_range" --data-urlencode 'query={job="varlogs"}' | jq |
||||
{ |
||||
"status" : "success", |
||||
"data": { |
||||
"resultType": "streams", |
||||
"result": [ |
||||
{ |
||||
"labels": "{filename=\"/var/log/myproject.log\", job=\"varlogs\", level=\"info\"}", |
||||
"entries": [ |
||||
{ |
||||
"ts": "2019-06-06T19:25:41.972739Z", |
||||
"line": "foo" |
||||
}, |
||||
{ |
||||
"ts": "2019-06-06T19:25:41.972722Z", |
||||
"line": "bar" |
||||
} |
||||
] |
||||
} |
||||
] |
||||
} |
||||
} |
||||
``` |
||||
|
||||
- `GET /api/prom/query` |
||||
|
||||
For doing queries, accepts the following parameters in the query-string: |
||||
|
||||
- `query`: a [LogQL query](../querying.md) (eg: `{name=~"mysql.+"}` or `{name=~"mysql.+"} |= "error"`)
||||
- `limit`: max number of entries to return |
||||
- `start`: the start time for the query, as a nanosecond Unix epoch (nanoseconds since 1970) or as RFC3339Nano (eg: "2006-01-02T15:04:05.999999999-07:00"). Default is always one hour ago. |
||||
- `end`: the end time for the query, as a nanosecond Unix epoch (nanoseconds since 1970) or as RFC3339Nano (eg: "2006-01-02T15:04:05.999999999-07:00"). Default is current time. |
||||
- `direction`: `forward` or `backward`, useful when specifying a limit. Default is backward. |
||||
- `regexp`: a regex to filter the returned results |
||||
|
||||
Loki needs to query the index store in order to find log streams for particular labels and the store is spread out by time, |
||||
so you need to specify the start and end times accordingly. Querying a long time into the history will cause additional
load on the index server and make the query slower.
||||
|
||||
> This endpoint will be deprecated in the future; you should use `/api/v1/query_range` instead.
||||
> You can only query for logs; it doesn't accept [queries returning metrics](../querying.md#counting-logs).
||||
|
||||
Responses look like this:
||||
|
||||
```json |
||||
{ |
||||
"streams": [ |
||||
{ |
||||
"labels": "{instance=\"...\", job=\"...\", namespace=\"...\"}", |
||||
"entries": [ |
||||
{ |
||||
"ts": "2018-06-27T05:20:28.699492635Z", |
||||
"line": "..." |
||||
}, |
||||
... |
||||
] |
||||
}, |
||||
... |
||||
] |
||||
} |
||||
``` |
||||
|
||||
- `GET /api/prom/label` |
||||
|
||||
For doing label name queries, accepts the following parameters in the query-string: |
||||
|
||||
- `start`: the start time for the query, as a nanosecond Unix epoch (nanoseconds since 1970). Default is always 6 hours ago.
||||
- `end`: the end time for the query, as a nanosecond Unix epoch (nanoseconds since 1970). Default is current time. |
||||
|
||||
Responses look like this:
||||
|
||||
```json |
||||
{ |
||||
"values": [ |
||||
"instance", |
||||
"job", |
||||
... |
||||
] |
||||
} |
||||
``` |
||||
|
||||
- `GET /api/prom/label/<name>/values` |
||||
|
||||
For doing label values queries, accepts the following parameters in the query-string: |
||||
|
||||
- `start`: the start time for the query, as a nanosecond Unix epoch (nanoseconds since 1970). Default is always 6 hours ago.
||||
- `end`: the end time for the query, as a nanosecond Unix epoch (nanoseconds since 1970). Default is current time. |
||||
|
||||
Responses look like this:
||||
|
||||
```json |
||||
{ |
||||
"values": [ |
||||
"default", |
||||
"cortex-ops", |
||||
... |
||||
] |
||||
} |
||||
``` |
||||
|
||||
- `GET /ready` |
||||
|
||||
This endpoint returns 200 when the Loki ingester is ready to accept traffic. If you're running Loki on Kubernetes, this endpoint can be used as a readiness probe.
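A minimal sketch of a Kubernetes readiness probe using this endpoint, as a container spec excerpt (assuming Loki serves HTTP on its default port 3100):

```yaml
readinessProbe:
  httpGet:
    path: /ready
    port: 3100
  initialDelaySeconds: 15
  timeoutSeconds: 1
```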
||||
|
||||
- `GET /flush` |
||||
|
||||
This endpoint triggers a flush of all in-memory chunks in the ingester. Mainly used for local testing.
||||
|
||||
- `GET /metrics` |
||||
|
||||
This endpoint returns Loki metrics for Prometheus. See "[Operations > Observability > Metrics](./operations.md)" to have a list of exported metrics. |
||||
|
||||
|
||||
## Examples of using the API in a third-party client library |
||||
|
||||
1) Take a look at this Go [client](https://github.com/afiskon/promtail-client), but be aware that the API is not stable yet.
2) A [Python 3 example](https://github.com/sleleko/devops-kb/blob/master/python/push-to-loki.py)
||||
||||
# Operations |
||||
|
||||
This page lists operational aspects of running Loki in alphabetical order: |
||||
|
||||
## Authentication |
||||
|
||||
Loki does not have an authentication layer. |
||||
You are expected to run an authenticating reverse proxy in front of your services, such as NGINX with basic auth or an OAuth2 proxy.
||||
See [client options](../promtail/deployment-methods.md#custom-client-options) for more details about supported authentication methods. |
||||
|
||||
### Multi-tenancy |
||||
|
||||
Loki is a multitenant system; requests and data for tenant A are isolated from tenant B. |
||||
Requests to the Loki API should include an HTTP header (`X-Scope-OrgID`) identifying the tenant for the request. |
||||
Tenant IDs can be any alphanumeric string; limiting them to 20 bytes is reasonable. |
||||
To run in multi-tenant mode, Loki should be started with `auth_enabled: true`.
||||
|
||||
Loki can be run in "single-tenant" mode where the `X-Scope-OrgID` header is not required. |
||||
In this situation, the tenant ID is defaulted to be `fake`. |
||||
|
||||
## Observability |
||||
|
||||
### Metrics |
||||
|
||||
Both Loki and Promtail expose a `/metrics` endpoint for Prometheus metrics.
You will need a local Prometheus configured to scrape your Loki and Promtail instances as targets; [see the Prometheus configuration docs](https://prometheus.io/docs/prometheus/latest/configuration/configuration).
When Prometheus can scrape Loki and Promtail, you get the following metrics:
||||
|
||||
Loki metrics: |
||||
|
||||
- `log_messages_total` Total number of log messages. |
||||
- `loki_distributor_bytes_received_total` The total number of uncompressed bytes received per tenant. |
||||
- `loki_distributor_lines_received_total` The total number of lines received per tenant. |
||||
- `loki_ingester_streams_created_total` The total number of streams created per tenant. |
||||
- `loki_request_duration_seconds_count` Number of received HTTP requests. |
||||
|
||||
Promtail metrics: |
||||
|
||||
- `promtail_read_bytes_total` Number of bytes read. |
||||
- `promtail_read_lines_total` Number of lines read. |
||||
- `promtail_request_duration_seconds_count` Number of send requests. |
||||
- `promtail_encoded_bytes_total` Number of bytes encoded and ready to send. |
||||
- `promtail_sent_bytes_total` Number of bytes sent. |
||||
- `promtail_dropped_bytes_total` Number of bytes dropped because they failed to be sent to the ingester after all retries.
||||
- `promtail_sent_entries_total` Number of log entries sent to the ingester. |
||||
- `promtail_dropped_entries_total` Number of log entries dropped because they failed to be sent to the ingester after all retries.
||||
|
||||
Most of these metrics are counters and should continuously increase during normal operations: |
||||
|
||||
1. Your app emits a log line to a file tracked by promtail. |
||||
2. Promtail reads the new line and increases its counters. |
||||
3. Promtail forwards the line to a Loki distributor, where its received counters should increase.
4. The Loki distributor forwards the line to a Loki ingester, where its request duration counter increases.
||||
|
||||
You can import the dashboard with ID [10004](https://grafana.com/dashboards/10004) to see these metrics in the Grafana UI.
||||
|
||||
### Monitoring Mixins |
||||
|
||||
Check out our [Loki mixin](../../production/loki-mixin) for a set of dashboards, recording rules, and alerts. |
||||
Together these give you a comprehensive package for monitoring Loki in production.
||||
|
||||
For more information about mixins, take a look at the [mixins project docs](https://github.com/monitoring-mixins/docs). |
||||
|
||||
## Retention/Deleting old data |
||||
|
||||
Retention in Loki can be done by configuring the Table Manager. You need to set a retention period and enable deletes for retention using the YAML config as seen [here](https://github.com/grafana/loki/blob/39bbd733be4a0d430986d9513476a91334485e9f/production/ksonnet/loki/config.libsonnet#L128-L129) or using the `table-manager.retention-period` and `table-manager.retention-deletes-enabled` command line args. The retention period needs to be a duration in string format that can be parsed using [time.Duration](https://golang.org/pkg/time/#ParseDuration).
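A minimal sketch of the corresponding YAML (the 336h value is an illustrative choice, not a recommendation):

```yaml
table_manager:
  retention_deletes_enabled: true
  # any duration parseable by time.Duration, e.g. 14 days
  retention_period: 336h
```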
||||
|
||||
**[NOTE]** Retention period should be at least twice the [duration of periodic table config](https://github.com/grafana/loki/blob/347a3e18f4976d799d51a26cee229efbc27ef6c9/production/helm/loki/values.yaml#L53), which currently defaults to 7 days. |
||||
|
||||
In the case of chunks retention when using S3 or GCS, you need to set the expiry policy on the bucket that is configured for storing chunks. For more details check [this](https://docs.aws.amazon.com/AmazonS3/latest/dev/object-lifecycle-mgmt.html) for S3 and [this](https://cloud.google.com/storage/docs/managing-lifecycles) for GCS. |
||||
|
||||
Currently we only support a global retention policy. A per-user retention policy and an API to delete ingested logs are still under development.
||||
Feel free to add your use case to this [GitHub issue](https://github.com/grafana/loki/issues/162). |
||||
|
||||
A design goal of Loki is that storing logs should be cheap, hence a volume-based deletion API was deprioritized. |
||||
|
||||
Until this feature is released: If you suddenly must delete ingested logs, you can delete old chunks in your object store. |
||||
Note that this will only delete the log content while keeping the label index intact. |
||||
You will still be able to see related labels, but the log retrieval of the deleted log content will no longer work. |
||||
|
||||
## Scalability |
||||
|
||||
See this [blog post](https://grafana.com/blog/2018/12/12/loki-prometheus-inspired-open-source-logging-for-cloud-natives/) on a discussion about Loki's scalability. |
||||
|
||||
When scaling Loki, consider running several Loki processes with their respective roles of ingester, distributor, and querier.
||||
Take a look at their respective `.libsonnet` files in [our production setup](../../production/ksonnet/loki) to get an idea about resource usage. |
||||
|
||||
We're happy to get feedback about your resource usage. |
||||
||||
## Installation |
||||
|
||||
Loki is provided as pre-compiled binaries, or as a Docker container image. |
||||
|
||||
### Docker container (Recommended) |
||||
If you want to run in a container, use our Docker image: |
||||
```bash |
||||
$ docker pull "grafana/loki:v0.2.0" |
||||
``` |
||||
|
||||
### Binary |
||||
If you want to use plain binaries instead, head over to the |
||||
[Releases](https://github.com/grafana/loki/releases) on GitHub and download the |
||||
most recent one for your operating system and architecture.
||||
|
||||
Example (Linux, `amd64`), Loki `v0.2.0`: |
||||
```bash |
||||
# download binary (adapt app, os and arch as needed) |
||||
$ curl -fSL -o "/usr/local/bin/loki.gz" "https://github.com/grafana/loki/releases/download/v0.2.0/loki-linux-amd64.gz" |
||||
$ gunzip "/usr/local/bin/loki.gz" |
||||
|
||||
# make sure it is executable |
||||
$ chmod a+x "/usr/local/bin/loki" |
||||
``` |
||||
|
||||
|
||||
## Running |
||||
|
||||
After you have Loki installed, you need to pick one of the two operation modes:
||||
|
||||
You can either run all three components (Distributor, Ingester, and Querier)
together as a single process, which makes operations easy because you do not
need to worry about inter-service communication. This is recommended for most
use cases. This mode still allows you to scale horizontally, but keep in mind
that you scale all three services at once.
||||
|
||||
The other option is the distributed mode, where each component runs on its own |
||||
and communicates with the others using gRPC. |
||||
This especially allows fine-grained control over scaling, because you can scale |
||||
the services individually. |
||||
|
||||
### Single Process |
||||
|
||||
#### `docker-compose` |
||||
To try it out locally, or to run on only a few systems, `docker-compose` is a |
||||
good choice. |
||||
|
||||
Check out the [`docker-compose.yml` file on |
||||
GitHub](https://github.com/grafana/loki/blob/master/production/docker-compose.yaml). |
||||
|
||||
#### `helm` |
||||
If you want to quickly get up and running on Kubernetes, `helm` has you covered:
||||
|
||||
```bash |
||||
# add the loki repository to helm |
||||
$ helm repo add loki https://grafana.github.io/loki/charts |
||||
$ helm repo update
||||
``` |
||||
|
||||
You can then choose between deploying the whole stack (Loki and Promtail), or |
||||
each component individually: |
||||
|
||||
```bash |
||||
# whole stack |
||||
$ helm upgrade --install loki loki/loki-stack |
||||
|
||||
# only Loki |
||||
$ helm upgrade --install loki loki/loki |
||||
|
||||
# only Promtail |
||||
helm upgrade --install promtail loki/promtail --set "loki.serviceName=loki" |
||||
``` |
||||
|
||||
Refer to the
[Chart](https://github.com/grafana/loki/tree/master/production/helm) for more
information.
||||
|
||||
### Distributed |
||||
Running in distributed mode is currently pretty hard because of the multitude
||||
of challenges that one encounters. |
||||
|
||||
We do this internally, but cannot really recommend that anyone else try it yet.
||||
If you really want to, you can take a look at our [production |
||||
setup](https://github.com/grafana/loki/tree/master/production/ksonnet). You have |
||||
been warned :P |
||||
||||
# Storage |
||||
|
||||
Loki needs to store two different types of data: **Chunks** and **Indexes**. |
||||
|
||||
Loki receives logs in separate streams. Each stream is identified by a set of labels. |
||||
As the log entries from a stream arrive, they are gzipped as chunks and saved in |
||||
the chunks store. The chunk format is documented in [`pkg/chunkenc`](../../pkg/chunkenc/README.md). |
||||
|
||||
On the other hand, the index stores the stream's label set and links them to the |
||||
individual chunks. |
||||
|
||||
### Local storage |
||||
|
||||
By default, Loki stores everything on disk. The index is stored in a BoltDB
file under `/tmp/loki/index`, and the chunks are stored under
`/tmp/loki/chunks`.
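A minimal sketch of the corresponding `storage_config` (these are the same default paths; real deployments should point them at persistent directories):

```yaml
storage_config:
  boltdb:
    # directory holding the BoltDB index files
    directory: /tmp/loki/index
  filesystem:
    # directory holding the chunk files
    directory: /tmp/loki/chunks
```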
||||
|
||||
### Google Cloud Storage |
||||
|
||||
Loki supports Google Cloud Storage. Refer to Grafana Labs' |
||||
[production setup](https://github.com/grafana/loki/blob/a422f394bb4660c98f7d692e16c3cc28747b7abd/production/ksonnet/loki/config.libsonnet#L55) |
||||
for the relevant configuration fields. |
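As a hedged sketch (the bucket name is an illustrative assumption, and credentials are expected to come from the environment), GCS can be configured as the chunk store like so:

```yaml
storage_config:
  gcs:
    # hypothetical bucket; replace with your own
    bucket_name: my-loki-chunks
```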
||||
|
||||
### Cassandra |
||||
|
||||
Loki can use Cassandra for the index storage. Example config using Cassandra: |
||||
|
||||
```yaml |
||||
schema_config: |
||||
configs: |
||||
- from: 2018-04-15 |
||||
store: cassandra |
||||
object_store: filesystem |
||||
schema: v9 |
||||
index: |
||||
prefix: cassandra_table |
||||
period: 168h |
||||
|
||||
storage_config: |
||||
cassandra: |
||||
username: cassandra |
||||
password: cassandra |
||||
addresses: 127.0.0.1 |
||||
auth: true |
||||
keyspace: lokiindex |
||||
|
||||
filesystem: |
||||
directory: /tmp/loki/chunks |
||||
``` |
||||
|
||||
### AWS S3 & DynamoDB |
||||
|
||||
Example config for using S3 & DynamoDB: |
||||
|
||||
```yaml |
||||
schema_config: |
||||
configs: |
||||
- from: 0 |
||||
store: dynamo |
||||
object_store: s3 |
||||
schema: v9 |
||||
index: |
||||
prefix: dynamodb_table_name |
||||
period: 0 |
||||
storage_config: |
||||
aws: |
||||
s3: s3://access_key:secret_access_key@region/bucket_name |
||||
dynamodbconfig: |
||||
dynamodb: dynamodb://access_key:secret_access_key@region |
||||
``` |
||||
|
||||
If you don't wish to hard-code S3 credentials, you can also configure an |
||||
EC2 instance role by changing the `storage_config` section: |
||||
|
||||
```yaml |
||||
storage_config: |
||||
aws: |
||||
s3: s3://region/bucket_name |
||||
dynamodbconfig: |
||||
dynamodb: dynamodb://region |
||||
``` |
||||
|
||||
#### S3 |
||||
|
||||
Loki can use S3 as object storage, storing logs within directories based on |
||||
the [OrgID](./operations.md#multi-tenancy). For example, logs from the `faker` |
||||
org will be stored in `s3://BUCKET_NAME/faker/`. |
||||
|
||||
The S3 configuration is set up using the URL format: |
||||
`s3://access_key:secret_access_key@region/bucket_name`. |
||||
|
||||
S3-compatible APIs (e.g., Ceph Object Storage with an S3-compatible API) can |
||||
be used. If the API supports path-style URLs rather than virtual-hosted bucket
||||
addressing, configure the URL in `storage_config` with the custom endpoint: |
||||
|
||||
```yaml |
||||
storage_config: |
||||
aws: |
||||
s3: s3://access_key:secret_access_key@custom_endpoint/bucket_name |
||||
s3forcepathstyle: true |
||||
``` |
||||
|
||||
Loki needs the following permissions to write to an S3 bucket: |
||||
|
||||
* s3:ListBucket |
||||
* s3:PutObject |
||||
* s3:GetObject |
||||
|
||||
#### DynamoDB |
||||
|
||||
Loki can use DynamoDB for storing the index. The index is used for querying |
||||
logs. Throughput to the index should be adjusted to your usage. |
||||
|
||||
Access to DynamoDB is very similar to S3; however, a table name does not |
||||
need to be specified in the storage section, as Loki calculates that for |
||||
you. The table name prefix will need to be configured inside `schema_config` |
||||
for Loki to be able to create new tables. |
||||
|
||||
DynamoDB can be set up manually or automatically through `table-manager`. |
||||
The `table-manager` allows deleting old indices by rotating a number of |
||||
different DynamoDB tables and deleting the oldest one. An example deployment |
||||
of the `table-manager` using ksonnet can be found |
||||
[here](../../production/ksonnet/loki/table-manager.libsonnet) and more information |
||||
about it can be found at the
||||
[Cortex project](https://github.com/cortexproject/cortex). |
||||
|
||||
The `table-manager` client defaults DynamoDB's provisioned capacity to 300
read units and 3000 write units. The defaults can be overwritten in the
config:
||||
|
||||
```yaml |
||||
table_manager: |
||||
index_tables_provisioning: |
||||
provisioned_write_throughput: 10 |
||||
provisioned_read_throughput: 10 |
||||
chunk_tables_provisioning: |
||||
provisioned_write_throughput: 10 |
||||
provisioned_read_throughput: 10 |
||||
``` |
||||
|
||||
If DynamoDB is set up manually, old data cannot be easily erased and the index |
||||
will grow indefinitely. Manual configurations should ensure that the primary |
||||
index key is set to `h` (string) and the sort key is set to `r` (binary). The |
||||
"period" attribute in the yaml should be set to zero. |
||||
|
||||
Loki needs the following permissions to write to DynamoDB: |
||||
|
||||
* dynamodb:BatchGetItem |
||||
* dynamodb:BatchWriteItem |
||||
* dynamodb:DeleteItem |
||||
* dynamodb:DescribeTable |
||||
* dynamodb:GetItem |
||||
* dynamodb:ListTagsOfResource |
||||
* dynamodb:PutItem |
||||
* dynamodb:Query |
||||
* dynamodb:TagResource |
||||
* dynamodb:UntagResource |
||||
* dynamodb:UpdateItem |
||||
* dynamodb:UpdateTable |
||||
||||
# Loki Maintainers Guide |
||||
|
||||
This section details information for maintainers of Loki. |
||||
|
||||
1. [Releasing Loki](./release.md) |
||||
|
||||
# Operating Loki |
||||
|
||||
1. [Authentication](authentication.md) |
||||
2. [Observability](observability.md) |
||||
3. [Scalability](scalability.md) |
||||
4. [Storage](storage/README.md) |
||||
1. [Table Manager](storage/table-manager.md) |
||||
2. [Retention](storage/retention.md) |
||||
5. [Multi-tenancy](multi-tenancy.md) |
||||
6. [Loki Canary](loki-canary.md) |
||||
||||
# Authentication with Loki |
||||
|
||||
Loki does not come with any included authentication layer. Operators are
expected to run an authenticating reverse proxy in front of their services,
such as NGINX using basic auth or an OAuth2 proxy.
||||
|
||||
Note that when using Loki in multi-tenant mode, Loki requires the HTTP header |
||||
`X-Scope-OrgID` to be set to a string identifying the tenant; the responsibility
||||
of populating this value should be handled by the authenticating reverse proxy. |
||||
For more information on multi-tenancy please read its |
||||
[documentation](multi-tenancy.md). |
||||
|
||||
For information on authenticating promtail, please see the docs for [how to |
||||
configure Promtail](../clients/promtail/configuration.md). |
||||
|
||||
# Loki Canary |
||||
|
||||
Loki Canary is a standalone app that audits the log capturing performance of |
||||
Loki. |
||||
|
||||
## How it works |
||||
|
||||
 |
||||
|
||||
Loki Canary writes a log to a file and stores the timestamp in an internal |
||||
array. The contents look something like this: |
||||
|
||||
```nohighlight |
||||
1557935669096040040 ppppppppppppppppppppppppppppppppppppppppppppppppppppppppppp |
||||
``` |
||||
|
||||
The relevant part of the log entry is the timestamp; the `p`s are just filler |
||||
bytes to make the size of the log configurable. |
||||
|
||||
An agent (like Promtail) should be configured to read the log file and ship it |
||||
to Loki. |
||||
|
||||
Meanwhile, Loki Canary will open a WebSocket connection to Loki and will tail |
||||
the logs it creates. When a log is received on the WebSocket, the timestamp |
||||
in the log message is compared to the internal array. |
||||
|
||||
If the received log is: |
||||
|
||||
* The next in the array to be received, it is removed from the array and the |
||||
(current time - log timestamp) is recorded in the `response_latency` |
||||
histogram. This is the expected behavior for well-behaved logs.
||||
* Not the next in the array to be received, it is removed from the array, the |
||||
response time is recorded in the `response_latency` histogram, and the |
||||
`out_of_order_entries` counter is incremented. |
||||
* Not in the array at all, it is checked against a separate list of received |
||||
logs to either increment the `duplicate_entries` counter or the |
||||
`unexpected_entries` counter. |
||||
|
||||
In the background, Loki Canary also runs a timer which iterates through all of |
||||
the entries in the internal array. If any of the entries are older than the |
||||
duration specified by the `-wait` flag (defaulting to 60s), they are removed |
||||
from the array and the `websocket_missing_entries` counter is incremented. An |
||||
additional query is then made directly to Loki for any missing entries to |
||||
determine if they are truly missing or only missing from the WebSocket. If |
||||
missing entries are not found in the direct query, the `missing_entries` counter |
||||
is incremented. |
||||
|
||||
## Installation |
||||
|
||||
### Binary |
||||
|
||||
Loki Canary is provided as a pre-compiled binary as part of the |
||||
[Loki Releases](https://github.com/grafana/loki/releases) on GitHub. |
||||
|
||||
### Docker |
||||
|
||||
Loki Canary is also provided as a Docker container image: |
||||
|
||||
```bash |
||||
# change tag to the most recent release |
||||
$ docker pull grafana/loki-canary:v0.2.0 |
||||
``` |
||||
|
||||
### Kubernetes |
||||
|
||||
To run on Kubernetes, you can do something simple like: |
||||
|
||||
`kubectl run loki-canary --generator=run-pod/v1 |
||||
--image=grafana/loki-canary:latest --restart=Never --image-pull-policy=Never |
||||
--labels=name=loki-canary -- -addr=loki:3100` |
||||
|
||||
Or you can do something more complex, like deploying it as a DaemonSet. There
is a Tanka setup for this in the `production` folder, which you can import
using `jsonnet-bundler`:
||||
|
||||
```shell |
||||
jb install github.com/grafana/loki-canary/production/ksonnet/loki-canary |
||||
``` |
||||
|
||||
Then in your Tanka environment's `main.jsonnet` you'll want something like |
||||
this: |
||||
|
||||
```jsonnet |
||||
local loki_canary = import 'loki-canary/loki-canary.libsonnet'; |
||||
|
||||
loki_canary { |
||||
loki_canary_args+:: { |
||||
addr: "loki:3100", |
||||
port: 80, |
||||
labelname: "instance", |
||||
interval: "100ms", |
||||
size: 1024, |
||||
wait: "3m", |
||||
}, |
||||
_config+:: { |
||||
namespace: "default", |
||||
} |
||||
} |
||||
``` |
||||
|
||||
### From Source |
||||
|
||||
If the other options are not sufficient for your use case, you can compile |
||||
`loki-canary` yourself: |
||||
|
||||
```bash |
||||
# clone the source tree |
||||
$ git clone https://github.com/grafana/loki |
||||
|
||||
# build the binary |
||||
$ make loki-canary |
||||
|
||||
# (optionally build the container image) |
||||
$ make loki-canary-image |
||||
``` |
||||
|
||||
## Configuration |
||||
|
||||
The address of Loki must be passed in with the `-addr` flag, and if your Loki |
||||
server uses TLS, `-tls=true` must also be provided. Note that using TLS will |
||||
cause the WebSocket connection to use `wss://` instead of `ws://`. |
||||
|
||||
The `-labelname` and `-labelvalue` flags should also be provided, as these are |
||||
used by Loki Canary to filter the log stream to only process logs for the |
||||
current instance of the canary. Ensure that the values provided to the flags are |
||||
unique to each instance of Loki Canary. Grafana Labs' Tanka config |
||||
accomplishes this by passing in the pod name as the label value. |
||||
|
||||
If Loki Canary reports a high number of `unexpected_entries`, Loki Canary may |
||||
not be waiting long enough and the value for the `-wait` flag should be |
||||
increased to a larger value than 60s. |
||||
|
||||
__Be aware__ of the relationship between `pruneinterval` and the `interval`. |
||||
For example, with an interval of 10ms (100 logs per second) and a prune interval |
||||
of 60s, you will write 6000 logs per minute. If those logs were not received |
||||
over the WebSocket, the canary will attempt to query Loki directly to see if |
||||
they are completely lost. __However__, the query is limited to 1000 results,
so you will not be able to retrieve all of the logs even if they did make it
to Loki.
||||
|
||||
__Likewise__, if you lower the `pruneinterval`, you risk causing a
denial-of-service attack as all of your canaries attempt to query for missing
logs at whatever frequency your `pruneinterval` defines.
||||
|
||||
All options: |
||||
|
||||
```nohighlight |
||||
-addr string |
||||
The Loki server URL:Port, e.g. loki:3100 |
||||
-buckets int |
||||
Number of buckets in the response_latency histogram (default 10) |
||||
-interval duration |
||||
Duration between log entries (default 1s) |
||||
-labelname string |
||||
The label name for this instance of loki-canary to use in the log selector (default "name") |
||||
-labelvalue string |
||||
The unique label value for this instance of loki-canary to use in the log selector (default "loki-canary") |
||||
-pass string |
||||
Loki password |
||||
-port int |
||||
Port which loki-canary should expose metrics (default 3500) |
||||
-pruneinterval duration |
||||
Frequency to check sent vs received logs, also the frequency which queries for missing logs will be dispatched to loki (default 1m0s) |
||||
-size int |
||||
Size in bytes of each log line (default 100) |
||||
-tls |
||||
Does the loki connection use TLS? |
||||
-user string |
||||
Loki username |
||||
-wait duration |
||||
Duration to wait for log entries before reporting them lost (default 1m0s) |
||||
``` |
||||
||||
# Loki Multi-Tenancy |
||||
|
||||
Loki is a multi-tenant system; requests and data for tenant A are isolated from |
||||
tenant B. Requests to the Loki API should include an HTTP header |
||||
(`X-Scope-OrgID`) that identifies the tenant for the request. |
||||
|
||||
Tenant IDs can be any alphanumeric string that fits within the Go HTTP header |
||||
limit (1MB). Operators are recommended to use a reasonable limit for uniquely |
||||
identifying tenants; 20 bytes is usually enough. |
||||
|
||||
To run in multi-tenant mode, Loki should be started with `auth_enabled: true`. |
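A minimal sketch of the relevant configuration excerpt:

```yaml
# top-level Loki configuration: require the X-Scope-OrgID header on requests
auth_enabled: true
```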
||||
|
||||
Loki can be run in "single-tenant" mode where the `X-Scope-OrgID` header is not |
||||
required. In single-tenant mode, the tenant ID defaults to `fake`. |
||||
|
||||
||||
# Observing Loki |
||||
|
||||
Both Loki and Promtail expose a `/metrics` endpoint that serves Prometheus
metrics. You will need a local Prometheus with Loki and Promtail added as
scrape targets.
||||
See [configuring |
||||
Prometheus](https://prometheus.io/docs/prometheus/latest/configuration/configuration) |
||||
for more information. |
||||
|
||||
All components of Loki expose the following metrics: |
||||
|
||||
| Metric Name | Metric Type | Description | |
||||
| ------------------------------- | ----------- | ---------------------------------------- | |
||||
| `log_messages_total` | Counter | Total number of messages logged by Loki. | |
||||
| `loki_request_duration_seconds` | Histogram | Number of received HTTP requests. | |
||||
|
||||
The Loki Distributors expose the following metrics: |
||||
|
||||
| Metric Name | Metric Type | Description | |
||||
| ------------------------------------------------- | ----------- | ----------------------------------------------------------- | |
||||
| `loki_distributor_ingester_appends_total` | Counter | The total number of batch appends sent to ingesters. | |
||||
| `loki_distributor_ingester_append_failures_total` | Counter | The total number of failed batch appends sent to ingesters. | |
||||
| `loki_distributor_bytes_received_total` | Counter | The total number of uncompressed bytes received per tenant. | |
||||
| `loki_distributor_lines_received_total` | Counter | The total number of lines received per tenant. | |
||||
|
||||
The Loki Ingesters expose the following metrics: |
||||
|
||||
| Metric Name | Metric Type | Description | |
||||
| ----------------------------------------- | ----------- | ------------------------------------------------------------------------------------------- | |
||||
| `cortex_ingester_flush_queue_length` | Gauge | The total number of series pending in the flush queue. | |
||||
| `loki_ingester_chunk_age_seconds` | Histogram | Distribution of chunk ages when flushed. | |
||||
| `loki_ingester_chunk_encode_time_seconds` | Histogram | Distribution of chunk encode times. | |
||||
| `loki_ingester_chunk_entries` | Histogram | Distribution of entries per chunk when flushed. |
||||
| `loki_ingester_chunk_size_bytes` | Histogram | Distribution of chunk sizes when flushed. | |
||||
| `loki_ingester_chunk_stored_bytes_total` | Counter | Total bytes stored in chunks per tenant. | |
||||
| `loki_ingester_chunks_created_total` | Counter | The total number of chunks created in the ingester. | |
||||
| `loki_ingester_chunks_flushed_total` | Counter | The total number of chunks flushed by the ingester. | |
||||
| `loki_ingester_chunks_stored_total` | Counter | Total stored chunks per tenant. | |
||||
| `loki_ingester_received_chunks` | Counter | The total number of chunks sent by this ingester whilst joining during the handoff process. | |
||||
| `loki_ingester_samples_per_chunk` | Histogram | The number of samples in a chunk. | |
||||
| `loki_ingester_sent_chunks` | Counter | The total number of chunks sent by this ingester whilst leaving during the handoff process. | |
||||
| `loki_ingester_streams_created_total` | Counter | The total number of streams created per tenant. | |
||||
| `loki_ingester_streams_removed_total` | Counter | The total number of streams removed per tenant. | |
||||
|
||||
Promtail exposes these metrics: |
||||
|
||||
| Metric Name | Metric Type | Description | |
||||
| ----------------------------------------- | ----------- | ------------------------------------------------------------------------------------------ | |
||||
| `promtail_read_bytes_total` | Gauge | Number of bytes read. | |
||||
| `promtail_read_lines_total` | Counter | Number of lines read. | |
||||
| `promtail_dropped_bytes_total` | Counter | Number of bytes dropped because they failed to be sent to the ingester after all retries. |
||||
| `promtail_dropped_entries_total` | Counter | Number of log entries dropped because they failed to be sent to the ingester after all retries. |
||||
| `promtail_encoded_bytes_total` | Counter | Number of bytes encoded and ready to send. | |
||||
| `promtail_file_bytes_total` | Gauge | Number of bytes read from files. | |
||||
| `promtail_files_active_total` | Gauge | Number of active files. | |
||||
| `promtail_log_entries_bytes` | Histogram | The total count of bytes read. | |
||||
| `promtail_request_duration_seconds_count` | Histogram | Number of send requests. | |
||||
| `promtail_sent_bytes_total` | Counter | Number of bytes sent. | |
||||
| `promtail_sent_entries_total` | Counter | Number of log entries sent to the ingester. | |
||||
| `promtail_targets_active_total` | Gauge | Number of total active targets. | |
||||
| `promtail_targets_failed_total` | Counter | Number of total failed targets. | |
||||
|
||||
Most of these metrics are counters and should continuously increase during normal operations: |
||||
|
||||
1. Your app emits a log line to a file that is tracked by Promtail. |
||||
2. Promtail reads the new line and increases its counters. |
||||
3. Promtail forwards the log line to a Loki distributor, where the received |
||||
counters should increase. |
||||
4. The Loki distributor forwards the log line to a Loki ingester, where the |
||||
request duration counter should increase. |
||||
|
||||
If Promtail uses any pipelines with metrics stages, those metrics will also be |
||||
exposed by Promtail at its `/metrics` endpoint. See Promtail's documentation on |
||||
[Pipelines](../clients/promtail/pipelines.md) for more information. |
||||
|
||||
An example Grafana dashboard was built by the community and is available as |
||||
dashboard [10004](https://grafana.com/dashboards/10004). |
||||
|
||||
## Mixins |
||||
|
||||
The Loki repository has a [mixin](../../production/loki-mixin) that includes a |
||||
set of dashboards, recording rules, and alerts. Together, the mixin gives you a |
||||
comprehensive package for monitoring Loki in production. |
||||
|
||||
For more information about mixins, take a look at the docs for the |
||||
[monitoring-mixins project](https://github.com/monitoring-mixins/docs). |
||||
|
||||
|
||||
||||
# Scaling with Loki |
||||
|
||||
See this |
||||
[blog post](https://grafana.com/blog/2018/12/12/loki-prometheus-inspired-open-source-logging-for-cloud-natives/) |
||||
on a discussion about Loki's scalability. |
||||
|
||||
When scaling Loki, operators should consider running several Loki processes |
||||
partitioned by role (ingester, distributor, querier) rather than a single Loki |
||||
process. Grafana Labs' [production setup](../../production/ksonnet/loki) |
||||
contains `.libsonnet` files that demonstrate configuring separate components
||||
and scaling for resource usage. |
||||
||||
# Loki Storage |
||||
|
||||
Loki needs to store two different types of data: **chunks** and **indexes**. |
||||
|
||||
Loki receives logs in separate streams, where each stream is uniquely identified |
||||
by its tenant ID and its set of labels. As log entries from a stream arrive, |
||||
they are gzipped as "chunks" and saved in the chunks store. See [chunk
||||
format](#chunk-format) for how chunks are stored internally. |
||||
|
||||
The **index** stores each stream's label set and links them to the individual |
||||
chunks. |
||||
|
||||
Refer to Loki's [configuration](../../configuration/README.md) for details on |
||||
how to configure the storage and the index. |
||||
|
||||
For more information: |
||||
|
||||
1. [Table Manager](table-manager.md) |
||||
2. [Retention](retention.md) |
||||
|
||||
## Supported Stores |
||||
|
||||
The following are supported for the index: |
||||
|
||||
* [Amazon DynamoDB](https://aws.amazon.com/dynamodb) |
||||
* [Google Bigtable](https://cloud.google.com/bigtable) |
||||
* [Apache Cassandra](https://cassandra.apache.org) |
||||
* [BoltDB](https://github.com/boltdb/bolt) (doesn't work when clustering Loki) |
||||
|
||||
The following are supported for the chunks: |
||||
|
||||
* [Amazon DynamoDB](https://aws.amazon.com/dynamodb) |
||||
* [Google Bigtable](https://cloud.google.com/bigtable) |
||||
* [Apache Cassandra](https://cassandra.apache.org) |
||||
* [Amazon S3](https://aws.amazon.com/s3) |
||||
* [Google Cloud Storage](https://cloud.google.com/storage/) |
||||
* Filesystem (doesn't work when clustering Loki) |
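
As a minimal sketch of a non-clustered, single-process setup (the directory
paths are illustrative), BoltDB for the index can be paired with the local
filesystem for chunks:

```yaml
storage_config:
  boltdb:
    directory: /tmp/loki/index   # index store; single-process only
  filesystem:
    directory: /tmp/loki/chunks  # chunk store; single-process only
```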

## Cloud Storage Permissions

### S3

When using S3 as object storage, the following permissions are needed (an
example IAM policy is sketched below):

* `s3:ListBucket`
* `s3:PutObject`
* `s3:GetObject`
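
A hedged sketch of an IAM policy granting these permissions; the bucket name
`my-loki-bucket` is a placeholder, and JSON does not allow comments, so note
that `ListBucket` applies to the bucket while `GetObject`/`PutObject` apply to
the objects inside it:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::my-loki-bucket"]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject"],
      "Resource": ["arn:aws:s3:::my-loki-bucket/*"]
    }
  ]
}
```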

### DynamoDB

When using DynamoDB for the index, the following permissions are needed:

* `dynamodb:BatchGetItem`
* `dynamodb:BatchWriteItem`
* `dynamodb:DeleteItem`
* `dynamodb:DescribeTable`
* `dynamodb:GetItem`
* `dynamodb:ListTagsOfResource`
* `dynamodb:PutItem`
* `dynamodb:Query`
* `dynamodb:TagResource`
* `dynamodb:UntagResource`
* `dynamodb:UpdateItem`
* `dynamodb:UpdateTable`

## Chunk Format

```
-------------------------------------------------------------------
|                                |                                |
|         MagicNumber(4b)        |          version(1b)           |
|                                |                                |
-------------------------------------------------------------------
|          block-1 bytes         |         checksum (4b)          |
-------------------------------------------------------------------
|          block-2 bytes         |         checksum (4b)          |
-------------------------------------------------------------------
|          block-n bytes         |         checksum (4b)          |
-------------------------------------------------------------------
| #blocks (uvarint)                                               |
-------------------------------------------------------------------
| #entries(uvarint) | mint, maxt (varint) | offset, len (uvarint) |
-------------------------------------------------------------------
| #entries(uvarint) | mint, maxt (varint) | offset, len (uvarint) |
-------------------------------------------------------------------
| #entries(uvarint) | mint, maxt (varint) | offset, len (uvarint) |
-------------------------------------------------------------------
| #entries(uvarint) | mint, maxt (varint) | offset, len (uvarint) |
-------------------------------------------------------------------
| checksum(from #blocks)                                          |
-------------------------------------------------------------------
| metasOffset - offset to the point with #blocks                  |
-------------------------------------------------------------------
```

@ -0,0 +1,57 @@
# Loki Storage Retention

Retention in Loki is achieved through the [Table Manager](./table-manager.md).
To enable retention support, the Table Manager needs to be configured with
deletions enabled and a retention period. Please refer to the
[`table_manager_config`](../../configuration/README.md#table_manager_config)
section of the Loki configuration reference for all available options.
Alternatively, the `table-manager.retention-period` and
`table-manager.retention-deletes-enabled` command line flags can be used. The
provided retention period needs to be a duration represented as a string that
can be parsed using Go's [time.Duration](https://golang.org/pkg/time/#ParseDuration).
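
For example, a sketch of the flag-based form setting a 28-day retention
(`672h`); the config file name is illustrative:

```bash
# 672h = 28 days; the rest of the configuration comes from loki.yaml.
loki -config.file=loki.yaml \
  -table-manager.retention-deletes-enabled=true \
  -table-manager.retention-period=672h
```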

> **WARNING**: The retention period should be at least twice the [duration of
> the periodic table config](https://github.com/grafana/loki/blob/347a3e18f4976d799d51a26cee229efbc27ef6c9/production/helm/loki/values.yaml#L53),
> which currently defaults to 7 days.

When using S3 or GCS, the bucket storing the chunks needs to have the expiry
policy set correctly. For more details check
[S3's documentation](https://docs.aws.amazon.com/AmazonS3/latest/dev/object-lifecycle-mgmt.html)
or
[GCS's documentation](https://cloud.google.com/storage/docs/managing-lifecycles).

Currently, the retention policy can only be set globally. A per-tenant retention
policy with an API to delete ingested logs is still under development.

Since a design goal of Loki is to make storing logs cheap, a volume-based
deletion API is deprioritized. Until this feature is released, if you suddenly
must delete ingested logs, you can delete old chunks in your object store. Note,
however, that this only deletes the log content and keeps the label index
intact; you will still be able to see related labels but will be unable to
retrieve the deleted log content.

## Example Configuration

Example configuration using GCS with a 30-day retention:

```yaml
schema_config:
  configs:
    - from: 2018-04-15
      store: bigtable
      object_store: gcs
      schema: v9
      index:
        prefix: loki_index_
        period: 168h

storage_config:
  bigtable:
    instance: BIGTABLE_INSTANCE
    project: BIGTABLE_PROJECT
  gcs:
    bucket_name: GCS_BUCKET_NAME

table_manager:
  retention_deletes_enabled: true
  retention_period: 720h
```

@ -0,0 +1,33 @@
# Table Manager

The Table Manager is used to delete old data past a certain retention period.
The Table Manager also includes support for automatically provisioning DynamoDB
tables with autoscaling support.

For detailed information on configuring the Table Manager, refer to the
[table_manager_config](../../configuration/README.md#table_manager_config)
section in the Loki configuration document.

## DynamoDB Provisioning

When configuring DynamoDB with the Table Manager, the default [provisioned
capacity](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadWriteCapacityMode.html)
units are set to 300 for reads and 3000 for writes. The defaults can be
overridden:

```yaml
table_manager:
  index_tables_provisioning:
    provisioned_write_throughput: 10
    provisioned_read_throughput: 10
  chunk_tables_provisioning:
    provisioned_write_throughput: 10
    provisioned_read_throughput: 10
```

If the Table Manager is not automatically managing DynamoDB, old data cannot
easily be erased and the index will grow indefinitely. Manual configurations
should ensure that the primary index key is set to `h` (string) and the sort key
is set to `r` (binary). The "period" attribute in the configuration YAML should
be set to `0`.
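
For illustration only, a manually managed table with this key layout could be
created with the AWS CLI roughly as follows. The table name and billing mode
are placeholders, and the name must match your configured index prefix:

```bash
# Hypothetical table name; h (string) hash key, r (binary) range key.
aws dynamodb create-table \
  --table-name loki_index \
  --attribute-definitions AttributeName=h,AttributeType=S AttributeName=r,AttributeType=B \
  --key-schema AttributeName=h,KeyType=HASH AttributeName=r,KeyType=RANGE \
  --billing-mode PAY_PER_REQUEST
```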

@ -0,0 +1,149 @@
# Overview of Loki

Grafana Loki is a set of components that can be composed into a fully featured
logging stack.

Unlike other logging systems, Loki is built around the idea of only indexing
labels for logs and leaving the original log message unindexed. This means
that Loki is cheaper to operate and can be orders of magnitude more efficient.

For a more detailed version of this same document, please read
[Architecture](../architecture.md).

## Multi Tenancy

Loki supports multi-tenancy so that data between tenants is completely
separated. Multi-tenancy is achieved through a tenant ID (which is represented
as an alphanumeric string). When multi-tenancy mode is disabled, all requests
are internally given a tenant ID of "fake".

## Modes of Operation

Loki is optimized for both running locally (or at small scale) and for scaling
horizontally: Loki comes with a _single process mode_ that runs all of the
required microservices in one process. The single process mode is great for
testing Loki or for running it at a small scale. For horizontal scalability, the
microservices of Loki can be broken out into separate processes, allowing them
to scale independently of each other.

## Components

### Distributor

The **distributor** service is responsible for handling logs written by
[clients](../clients/README.md). It's essentially the "first stop" in the write
path for log data. Once the distributor receives log data, it splits the data
into batches and sends them to multiple [ingesters](#ingester) in parallel.

Distributors communicate with ingesters via [gRPC](https://grpc.io). They are
stateless and can be scaled up and down as needed.

#### Hashing

Distributors use consistent hashing in conjunction with a configurable
replication factor to determine which instances of the ingester service should
receive log data.

The hash is based on a combination of the log's labels and the tenant ID.

A hash ring stored in [Consul](https://www.consul.io) is used to achieve
consistent hashing; all [ingesters](#ingester) register themselves into the
hash ring with a set of tokens they own. Distributors then find the token that
most closely matches the value of the log's hash and send data to that
token's owner.

#### Quorum consistency

Since all distributors share access to the same hash ring, write requests can be
sent to any distributor.

To ensure consistent query results, Loki uses
[Dynamo-style](https://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf)
quorum consistency on reads and writes. This means that the distributor waits
for a positive response from at least one half plus one of the ingesters it
sends the sample to before responding to the user.
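
As a hedged sketch of how the replication factor and the Consul-backed ring
relate in configuration (the nesting follows the ingester lifecycler section of
the configuration reference; values are illustrative):

```yaml
ingester:
  lifecycler:
    ring:
      kvstore:
        store: consul      # the hash ring is stored in Consul
      replication_factor: 3  # each log is written to 3 ingesters
```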

### Ingester

The **ingester** service is responsible for writing log data to long-term
storage backends (DynamoDB, S3, Cassandra, etc.).

The ingester validates that ingested log lines are received in
timestamp-ascending order (i.e., each log has a timestamp that occurs at a later
time than the log before it). When the ingester receives a log that does not
follow this order, the log line is rejected and an error is returned.

Logs from each unique set of labels are built up into "chunks" in memory and
then flushed to the backing storage backend.

If an ingester process crashes or exits abruptly, all the data that has not yet
been flushed will be lost. Loki is usually configured to hold multiple replicas
(usually 3) of each log to mitigate this risk.

#### Handoff

By default, when an ingester is shutting down and tries to leave the hash ring,
it will wait to see if a new ingester tries to enter before flushing, and will
try to initiate a handoff. The handoff transfers all of the tokens and
in-memory chunks owned by the leaving ingester to the new ingester.

This process is used to avoid flushing all chunks when shutting down, which is a
slow process.

#### Filesystem Support

While ingesters do support writing to the filesystem through BoltDB, this only
works in single-process mode, as [queriers](#querier) need access to the same
backend store and BoltDB only allows one process to have a lock on the DB at a
given time.

### Querier

The **querier** service handles the actual [LogQL](../logql.md) evaluation of
logs stored in long-term storage.

It first tries to query all ingesters for in-memory data before falling back to
loading data from the backend store.

## Chunk Store

The **chunk store** is Loki's long-term data store, designed to support
interactive querying and sustained writing without the need for background
maintenance tasks. It consists of:

* An index for the chunks. This index can be backed by
  [DynamoDB from Amazon Web Services](https://aws.amazon.com/dynamodb),
  [Bigtable from Google Cloud Platform](https://cloud.google.com/bigtable), or
  [Apache Cassandra](https://cassandra.apache.org).
* A key-value (KV) store for the chunk data itself, which can be DynamoDB,
  Bigtable, Cassandra again, or an object store such as
  [Amazon S3](https://aws.amazon.com/s3).

> Unlike the other core components of Loki, the chunk store is not a separate
> service, job, or process, but rather a library embedded in the two services
> that need to access Loki data: the [ingester](#ingester) and [querier](#querier).

The chunk store relies on a unified interface to the
"[NoSQL](https://en.wikipedia.org/wiki/NoSQL)" stores (DynamoDB, Bigtable, and
Cassandra) that can be used to back the chunk store index. This interface
assumes that the index is a collection of entries keyed by:

* A **hash key**. This is required for *all* reads and writes.
* A **range key**. This is required for writes and can be omitted for reads,
  which can be queried by prefix or range.

The interface works somewhat differently across the supported databases:

* DynamoDB supports range and hash keys natively. Index entries are thus
  modelled directly as DynamoDB entries, with the hash key as the distribution
  key and the range as the range key.
* For Bigtable and Cassandra, index entries are modelled as individual column
  values. The hash key becomes the row key and the range key becomes the column
  key.

A set of schemas is used to map the matchers and label sets used on reads and
writes to the chunk store into appropriate operations on the index. Schemas have
been added as Loki has evolved, mainly in an attempt to better load balance
writes and improve query performance.

> The current schema recommendation is the **v10 schema**.
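
A hedged sketch of selecting the v10 schema in `schema_config`; the store
choices and start date are illustrative and follow the retention example
earlier in these docs:

```yaml
schema_config:
  configs:
    - from: 2019-10-01    # date from which this schema applies
      store: bigtable     # index store; could also be dynamodb or cassandra
      object_store: gcs   # chunk store
      schema: v10
      index:
        prefix: loki_index_
        period: 168h
```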

@ -0,0 +1,51 @@
# Loki compared to other log systems

## Loki / Promtail / Grafana vs EFK

The EFK (Elasticsearch, Fluentd, Kibana) stack is used to ingest, visualize, and
query logs from various sources.

Data in Elasticsearch is stored on-disk as unstructured JSON objects. Both the
keys for each object and the contents of each key are indexed. Data can then be
queried using a JSON object to define a query (called the Query DSL) or through
the Lucene query language.

In comparison, Loki in single-binary mode can store data on-disk, but in
horizontally-scalable mode data is stored in a cloud storage system such as S3,
GCS, or Cassandra. Logs are stored in plaintext form tagged with a set of label
names and values, where only the label pairs are indexed. This tradeoff makes it
cheaper to operate than a full index and allows developers to aggressively log
from their applications. Logs in Loki are queried using [LogQL](../logql.md).
However, because of this design tradeoff, LogQL queries that filter based on
content (i.e., text within the log lines) require loading all chunks within the
search window that match the labels defined in the query.

Fluentd is usually used to collect and forward logs to Elasticsearch. Fluentd is
called a data collector: it can ingest logs from many sources, process them, and
forward them to one or more targets.

In comparison, Promtail's use case is specifically tailored to Loki. Its main
mode of operation is to discover log files stored on disk and forward them,
associated with a set of labels, to Loki. Promtail can do service discovery for
Kubernetes pods running on the same node as Promtail, act as a container sidecar
or a Docker logging driver, read logs from specified folders, and tail the
systemd journal.

The way Loki represents logs by a set of label pairs is similar to how
[Prometheus](https://prometheus.io) represents metrics. When deployed in an
environment alongside Prometheus, logs from Promtail usually have the same
labels as your application's metrics thanks to using the same service
discovery mechanisms. Having logs and metrics with the same labels enables users
to seamlessly context switch between metrics and logs, helping with root cause
analysis.

Kibana is used to visualize and search Elasticsearch data and is very powerful
for doing analytics on that data. Kibana provides many visualization tools to do
data analysis, such as location maps, machine learning for anomaly detection,
and graphs to discover relationships in data. Alerts can be configured to notify
users when an unexpected condition occurs.

In comparison, Grafana is tailored specifically towards time series data from
sources like Prometheus and Loki. Dashboards can be set up to visualize metrics
(log support coming soon) and an explore view can be used to make ad-hoc queries
against your data. Like Kibana, Grafana supports alerting based on your metrics.

@ -1,115 +0,0 @@

# Promtail

* [Scrape Configs](#scrape-configs)
* [Entry Parsing](#entry-parser)
* [Deployment Methods](./promtail/deployment-methods.md)
* [Promtail API](./promtail/api.md)
* [Config and Usage Examples](./promtail/config-examples.md)
* [Failure modes](./promtail/known-failure-modes.md)
* [Troubleshooting](./troubleshooting.md)

## Scrape Configs

Promtail is an agent which reads log files and sends streams of log data to
the centralised Loki instances along with a set of labels. For example, if you
are running Promtail in Kubernetes, then each container in a single pod will
usually yield a single log stream with a set of labels based on that particular
pod's Kubernetes labels. You can also run Promtail outside Kubernetes, but you
would then need to customise the `scrape_configs` for your particular use case.

Promtail finds log locations and extracts the set of labels through the
*`scrape_configs`* section in the Promtail yaml configuration. The syntax is the
same as what Prometheus uses.

The `scrape_configs` contains one or more *entries* which are all executed for
each container in each new pod running in the instance. If more than one entry
matches your logs, you will get duplicates as the logs are sent in more than one
stream, likely with slightly different labels. Everything is based on different
labels. The term "label" is used here in more than one way, and the meanings can
easily be confused:

* Labels starting with `__` (two underscores) are internal labels. They are not
  stored in the Loki index and are invisible after Promtail. They "magically"
  appear from different sources.
* Labels starting with `__meta_kubernetes_pod_label_*` are "meta labels" which
  are generated based on your Kubernetes pod labels. Example: if your Kubernetes
  pod has a label `name` set to `foobar`, then the `scrape_configs` section will
  have a label `__meta_kubernetes_pod_label_name` with value set to `foobar`.
* There are other `__meta_kubernetes_*` labels based on the Kubernetes metadata,
  such as the namespace the pod is running in (`__meta_kubernetes_namespace`) or
  the name of the container inside the pod (`__meta_kubernetes_pod_container_name`).
* The label `__path__` is a special label which Promtail reads to find out where
  the log files to be read are located.
* The label `filename` is added for every file found in `__path__` to ensure
  uniqueness of the streams. It contains the absolute path of the file being
  tailed.

The most important part of each entry is the `relabel_configs`, which is a list
of operations that create, rename, modify, or alter labels. A single
`scrape_config` can also reject logs by doing an `action: drop` if a label value
matches a specified regex, which means that this particular `scrape_config` will
not forward logs from a particular log source, but another `scrape_config`
might.

Many of the `scrape_configs` read labels from `__meta_kubernetes_*` meta-labels,
assign them to intermediate labels such as `__service__` based on different
logic, possibly drop the processing if the `__service__` was empty, and finally
set visible labels (such as `job`) based on the `__service__` label.

In general, all of the default Promtail `scrape_configs` do the following:

* They read pod logs from under `/var/log/pods/$1/*.log`.
* They set the `namespace` label directly from `__meta_kubernetes_namespace`.
* They expect to see your pod name in the `name` label.
* They set a `job` label which is roughly "your namespace/your job name".

### Idioms and examples for different `relabel_configs`

* Drop the processing if a label is empty:
```yaml
- action: drop
  regex: ^$
  source_labels:
    - __service__
```
* Drop the processing if any of these labels contains a value:
```yaml
- action: drop
  regex: .+
  separator: ''
  source_labels:
    - __meta_kubernetes_pod_label_name
    - __meta_kubernetes_pod_label_app
```
* Rename a metadata label into another so that it will be visible in the final log stream:
```yaml
- action: replace
  source_labels:
    - __meta_kubernetes_namespace
  target_label: namespace
```
* Convert all of the Kubernetes pod labels into visible labels:
```yaml
- action: labelmap
  regex: __meta_kubernetes_pod_label_(.+)
```

Additional reading:

* https://www.slideshare.net/roidelapluie/taking-advantage-of-prometheus-relabeling-109483749

## Entry parser

### Overview

Each job can be configured with `pipeline_stages` to parse and mutate your log
entry. This allows you to add more labels, correct the timestamp, or entirely
rewrite the log line sent to Loki.

> Rewriting labels by parsing the log entry should be done with caution; this
> could increase the cardinality of streams created by Promtail.
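
For instance, a minimal sketch of a pipeline that parses a log level out of
each line and promotes it to a label (the regex and the `level` name are
illustrative):

```yaml
pipeline_stages:
  - regex:
      # extract a named capture group "level" from the start of the line
      expression: '^(?P<level>\w+): .*'
  - labels:
      # promote the extracted "level" value to a label on the stream
      level:
```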

Aside from mutating the log entry, pipeline stages can also generate metrics,
which can be useful in situations where you can't instrument an application.

See [Processing Log Lines](./logentry/processing-log-lines.md) for a detailed
pipeline description.

#### Labels

[The original design doc](./design-documents/labels.md) for labels. Post
implementation we have strayed quite a bit from the config examples, though the
pipeline idea was maintained.

See the [pipeline label docs](./logentry/processing-log-lines.md#labels) for
more info on creating labels from log content.

#### Metrics

Metrics can also be extracted from log line content as a set of Prometheus
metrics. Metrics are exposed on the path `/metrics` in Promtail. By default, a
log size histogram (`log_entries_bytes_bucket`) per stream is computed. This
means you don't need to create metrics to count status codes or log levels;
simply parse the log entry and add them to the labels. All custom metrics are
prefixed with `promtail_custom_`.

There are three [Prometheus metric types](https://prometheus.io/docs/concepts/metric_types/)
available.

`Counter` and `Gauge` record metrics for each line parsed by adding the value,
while `Histograms` observe sampled values by `buckets`.
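
As a hedged sketch (the metric name and regex are illustrative), a pipeline
could count error lines with a `metrics` stage; given the prefix above, the
result would be exposed as `promtail_custom_error_lines_total`:

```yaml
pipeline_stages:
  - regex:
      expression: '.*(?P<error>error).*'  # capture lines containing "error"
  - metrics:
      error_lines_total:
        type: Counter
        description: "total number of lines containing 'error'"
        source: error        # increments only when the capture matched
        config:
          action: inc
```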

See the [pipeline metric docs](./logentry/processing-log-lines.md#metrics) for
more info on creating metrics from log content.

@ -1,45 +0,0 @@
# Overview

Promtail is an agent which ships the content of local log files to Loki. It is
usually deployed to every machine that runs applications which need to be
monitored.

It primarily **discovers** targets, attaches **labels** to log streams, and
**pushes** them to the Loki instance.

### Discovery

Before Promtail is able to ship anything to Loki, it needs to find out about its
environment. This specifically means discovering applications emitting log lines
that need to be monitored.

Promtail borrows the [service discovery mechanism from
Prometheus](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config),
although it currently only supports `static` and `kubernetes` service discovery.
This is due to the fact that `promtail` is deployed as a daemon to every local
machine and does not need to discover labels from other systems. `kubernetes`
service discovery fetches required labels from the api-server, while `static`
usually covers the other use cases.

Just like Prometheus, `promtail` is configured using a `scrape_configs` stanza.
`relabel_configs` allows fine-grained control of what to ingest, what to drop,
and the final metadata to attach to the log line. Refer to the
[configuration](configuration.md) for more details.

### Labeling and Parsing

During service discovery, metadata is determined (pod name, filename, etc.) that
may be attached to the log line as a label for easier identification afterwards.
Using `relabel_configs`, those discovered labels can be mutated into the form
they should have for querying.

To allow more sophisticated filtering afterwards, Promtail allows setting labels
not only from service discovery, but also based on the contents of the log
lines. The so-called `pipeline_stages` can be used to add or update labels,
correct the timestamp, or rewrite the log line entirely. Refer to the [logentry
processing documentation](../logentry/processing-log-lines.md) for more details.

### Shipping

Once Promtail is certain about what to ingest and all labels are set correctly,
it starts *tailing* (continuously reading) the log files from the applications.
Once enough data is read into memory, it is flushed as a batch to Loki.

@ -1,22 +0,0 @@
# API

Promtail features an embedded web server exposing a web console at `/` and the
following API endpoints:

### `GET /ready`

This endpoint returns 200 when Promtail is up and running, and there's at least
one working target.
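
For example (assuming the default listen address shown in the web server config
below), a readiness check could be as simple as:

```bash
# Prints HTTP 200 when Promtail is ready.
curl -i http://localhost:9080/ready
```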

### `GET /metrics`

This endpoint returns Promtail metrics for Prometheus. See
"[Operations > Observability > Metrics](../loki/operations.md)" for a list of
exported metrics.

## Promtail web server config

The web server exposed by Promtail can be configured in the Promtail `.yaml`
config file:

```yaml
server:
  http_listen_host: 127.0.0.1
  http_listen_port: 9080
```

@ -1,141 +0,0 @@
# Promtail Config Examples

## Pipeline Examples

[Pipeline Docs](../logentry/processing-log-lines.md) contains detailed
documentation of the pipeline stages.

## Simple Docker Config

This example Promtail config is based on the original Docker
[config](https://github.com/grafana/loki/blob/master/cmd/promtail/promtail-docker-config.yaml)
and shows how to work with two or more sources:

Example filename: `my-docker-config.yaml`

```yaml
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

client:
  url: http://ip_or_hostname_where_loki_runs:3100/api/prom/push

scrape_configs:
  - job_name: system
    pipeline_stages:
      - docker:
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          host: yourhost
          __path__: /var/log/*.log

  - job_name: someone_service
    pipeline_stages:
      - docker:
    static_configs:
      - targets:
          - localhost
        labels:
          job: someone_service
          host: yourhost
          __path__: /srv/log/someone_service/*.log
```

#### Description

The `scrape_configs` section of `config.yaml` contains the various jobs for
parsing your logs.

`job` and `host` are examples of static labels added to all logs. Labels are
indexed by Loki and are used to help search logs.

`__path__` is the path to the directory where your logs are stored.

If you run Promtail with this `config.yaml` in a Docker container, don't forget
to use Docker volumes to map the real directories containing logs to those
folders in the container.

#### Example Use

1) Create a folder, for example `promtail`, then a new sub-directory
   `build/conf`, and place `my-docker-config.yaml` there.
2) Create a new Dockerfile in the root folder `promtail`, with contents:
```dockerfile
FROM grafana/promtail:latest
COPY build/conf /etc/promtail
```
3) Build your Docker image based on the original Promtail image and tag it, for
   example `mypromtail-image`.
4) After that you can run the Docker container with this command:
`docker run -d --name promtail --network loki_network -p 9080:9080 -v /var/log:/var/log -v /srv/log/someone_service:/srv/log/someone_service mypromtail-image -config.file=/etc/promtail/my-docker-config.yaml`

## Simple Systemd Journal Config

This example demonstrates how to configure Promtail to listen to systemd journal
entries and write them to Loki:

Example filename: `my-systemd-journal-config.yaml`

```yaml
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://ip_or_hostname_where_loki_runs:3100/api/prom/push

scrape_configs:
  - job_name: journal
    journal:
      max_age: 12h
      path: /var/log/journal
      labels:
        job: systemd-journal
    relabel_configs:
      - source_labels: ['__journal__systemd_unit']
        target_label: 'unit'
```

### Description

Just like the Docker example, the `scrape_configs` section holds various
jobs for parsing logs. A job with a `journal` key configures it for systemd
journal reading.

`max_age` is an optional string specifying the earliest entry that will be
read. If unspecified, `max_age` defaults to `7h`. Even if the position in the
journal is saved, if the entry corresponding to that position is older than
`max_age`, the position won't be used.

`path` is an optional string specifying the path to read journal entries
from. If unspecified, it defaults to the system default (`/var/log/journal`).

`labels` is a map of string values specifying labels that should always
be associated with each log entry read from the systemd journal.
In our example, each log will have a label of `job=systemd-journal`.

Every field written to the systemd journal is available for processing
in the `relabel_configs` section. Label names are converted to lowercase
and prefixed with `__journal_`. After `relabel_configs` processes all
labels for a job entry, any label starting with `__` is deleted.

Our example renames the `_SYSTEMD_UNIT` label (available as
`__journal__systemd_unit` in Promtail) to `unit` so it will be available
in Loki. All other labels from the journal entry are dropped.

### Example Use

`promtail` must have access to the journal path (`/var/log/journal`)
where journal entries are stored, as well as the machine ID
(`/etc/machine-id`), for journal support to work correctly.

If running with Docker, that means binding those paths:

```bash
docker run -d --name promtail --network loki_network -p 9080:9080 \
  -v /var/log/journal:/var/log/journal \
  -v /etc/machine-id:/etc/machine-id \
  mypromtail-image -config.file=/etc/promtail/my-systemd-journal-config.yaml
```

@ -1,185 +0,0 @@
# Configuration

## `scrape_configs` (Target Discovery)

Promtail discovers log locations and extracts the set of labels through the
`scrape_configs` section in the `promtail.yaml` configuration file. The syntax
is the same as what [Prometheus
uses](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config).

The `scrape_configs` contains one or more *entries* which are all executed for
each discovered target (read: each container in each new pod running in the
instance):

```yaml
scrape_configs:
  - job_name: local
    static_configs:
      - ...

  - job_name: kubernetes
    kubernetes_sd_config:
      - ...
```

If more than one entry matches your logs, you will get duplicates as the logs
are sent in more than one stream, likely with slightly different labels.

There are different types of labels present in Promtail:

* Labels starting with `__` (two underscores) are internal labels. They usually
  come from dynamic sources like service discovery. Once relabeling is done,
  they are removed from the label set. To persist them, rename them to
  something not starting with `__`.
* Labels starting with `__meta_kubernetes_pod_label_*` are "meta labels" which
  are generated based on your Kubernetes pod labels.
  Example: if your Kubernetes pod has a label `name` set to `foobar`, then the
  `scrape_configs` section will have a label `__meta_kubernetes_pod_label_name`
  with value set to `foobar`.
* There are other `__meta_kubernetes_*` labels based on the Kubernetes
  metadata, such as the namespace the pod is running in
  (`__meta_kubernetes_namespace`) or the name of the container inside the pod
  (`__meta_kubernetes_pod_container_name`). Refer to [the Prometheus
  docs](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config)
  for the full list.
* The label `__path__` is a special label which Promtail uses afterwards to
  figure out where the file to be read is located. Wildcards are allowed.
* The label `filename` is added for every file found in `__path__` to ensure
  uniqueness of the streams. It contains the absolute path of the file the line
  was read from.

## `relabel_configs` (Relabeling)

The most important part of each entry is the `relabel_configs` stanza, which is
a list of operations to create, rename, modify, or alter the labels.

A single `scrape_config` can also reject logs by doing an `action: drop` if a
label value matches a specified regex, which means that this particular
`scrape_config` will not forward logs from a particular log source. This does
not mean that other `scrape_config`s won't, though.

Many of the `scrape_configs` read labels from `__meta_kubernetes_*` meta-labels,
assign them to intermediate labels such as `__service__` based on
different logic, possibly drop the processing if the `__service__` was empty,
and finally set visible labels (such as `job`) based on the `__service__`
label.

In general, all of the default Promtail `scrape_configs` do the following:

* They read pod logs from under `/var/log/pods/$1/*.log`.
* They set the `namespace` label directly from `__meta_kubernetes_namespace`.
* They expect to see your pod name in the `name` label.
* They set a `job` label which is roughly `namespace/job`.

#### Examples

* Drop the processing if a label is empty:
```yaml
- action: drop
  regex: ^$
  source_labels:
    - __service__
```
* Drop the processing if any of these labels contains a value:
```yaml
- action: drop
  regex: .+
  separator: ''
  source_labels:
    - __meta_kubernetes_pod_label_name
    - __meta_kubernetes_pod_label_app
```
* Rename a metadata label into another so that it will be visible in the final log stream:
```yaml
- action: replace
  source_labels:
    - __meta_kubernetes_namespace
  target_label: namespace
```
* Convert all of the Kubernetes pod labels into visible labels:
```yaml
- action: labelmap
  regex: __meta_kubernetes_pod_label_(.+)
```

Additional reading:

* [Julien Pivotto's slides from PromConf Munich, 2017](https://www.slideshare.net/roidelapluie/taking-advantage-of-prometheus-relabeling-109483749)

## `client_option` (HTTP Client)

Promtail uses the Prometheus HTTP client implementation for all calls to Loki.
Therefore, you can configure it using the `client` stanza:

```yaml
client: [ <client_option> ]
```

Reference for `client_option`:

```yaml
# Sets the `url` of the Loki API push endpoint
url: http[s]://<host>:<port>/api/prom/push

# Sets the `Authorization` header on every Promtail request with the
# configured username and password.
# password and password_file are mutually exclusive.
basic_auth:
  username: <string>
  password: <secret>
  password_file: <string>

# Sets the `Authorization` header on every Promtail request with
# the configured bearer token. It is mutually exclusive with `bearer_token_file`.
bearer_token: <secret>

# Sets the `Authorization` header on every Promtail request with the bearer token
# read from the configured file. It is mutually exclusive with `bearer_token`.
bearer_token_file: /path/to/bearer/token/file

# Configures the Promtail request's TLS settings.
tls_config:
  # CA certificate to validate the API server certificate with.
  # If not provided, the system's trusted CAs will be used.
  ca_file: <filename>

  # Certificate and key files for client cert authentication to the server.
  cert_file: <filename>
  key_file: <filename>

  # ServerName extension to indicate the name of the server.
  # https://tools.ietf.org/html/rfc4366#section-3.1
  server_name: <string>

  # Disable validation of the server certificate.
  insecure_skip_verify: <boolean>

# Optional proxy URL.
proxy_url: <string>

# Maximum wait period before sending a batch
batchwait: 1s

# Maximum batch size (in bytes) to accrue before sending
batchsize: 102400

# Maximum time to wait for the server to respond to a request
timeout: 10s

backoff_config:
  # Initial backoff time between retries
  minbackoff: 100ms
  # Maximum backoff time between retries
  maxbackoff: 5s
  # Maximum number of retries when sending batches; 0 means infinite retries
  maxretries: 5

# The labels to add to any time series or alerts when communicating with Loki
external_labels: {}
```

#### Ship to multiple Loki Servers

Promtail is able to push logs to as many different Loki servers as you like. Use
`clients` instead of `client` if needed:

```yaml
# Single Loki instance
client: [ <client_option> ]

# Multiple Loki instances
clients:
  - [ <client_option> ]
```

@ -1,253 +0,0 @@
# Promtail Deployment Methods

## Daemonset method

A DaemonSet will deploy `promtail` on every node within the Kubernetes cluster.

The DaemonSet deployment is great for collecting all of the container logs
within a cluster. It is a great solution for a single tenant; all of the logs
are sent to a single Loki server.

Check the `production` folder for examples of a DaemonSet deployment for
Kubernetes using both Helm and ksonnet.

### Example

```yaml
---Daemonset.yaml
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: promtail-daemonset
  ...
spec:
  ...
  template:
    spec:
      serviceAccount: SERVICE_ACCOUNT
      serviceAccountName: SERVICE_ACCOUNT
      volumes:
        - name: logs
          hostPath: HOST_PATH
        - name: promtail-config
          configMap:
            name: promtail-configmap
      containers:
        - name: promtail-container
          args:
            - -config.file=/etc/promtail/promtail.yaml
          volumeMounts:
            - name: logs
              mountPath: MOUNT_PATH
            - name: promtail-config
              mountPath: /etc/promtail
  ...

---configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: promtail-config
  ...
data:
  promtail.yaml: YOUR CONFIG

---Clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: promtail-clusterrole
rules:
  - apiGroups: [""]
    resources:
      - nodes
      - services
      - pods
    verbs:
      - get
      - watch
      - list

---ServiceAccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: promtail-serviceaccount

---Rolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: promtail-clusterrolebinding
subjects:
  - kind: ServiceAccount
    name: promtail-serviceaccount
roleRef:
  kind: ClusterRole
  name: promtail-clusterrole
  apiGroup: rbac.authorization.k8s.io
```

## Sidecar Method

The sidecar method deploys `promtail` as a container within a pod that a
developer or devops team creates.

This method deploys `promtail` as a sidecar container within a pod.
In a multi-tenant environment, this enables teams to aggregate logs
for specific pods and deployments, for example for all pods in a namespace.

### Example

```yaml
---Deployment.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: my_test_app
  ...
spec:
  ...
  template:
    spec:
      serviceAccount: SERVICE_ACCOUNT
      serviceAccountName: SERVICE_ACCOUNT
      volumes:
        - name: logs
          hostPath: HOST_PATH
        - name: promtail-config
          configMap:
            name: promtail-configmap
      containers:
        - name: promtail-container
          args:
            - -config.file=/etc/promtail/promtail.yaml
          volumeMounts:
            - name: logs
              mountPath: MOUNT_PATH
            - name: promtail-config
              mountPath: /etc/promtail
  ...
```

### Custom Log Paths

Sometimes applications create custom log files. To collect those logs, you need
a customized `__path__` in your `scrape_config`.

Right now, the best way to watch and tail a custom log path is to define the
log file path as a label on the pod.

#### Example

```yaml
---Deployment.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: test-app-deployment
  namespace: your_namespace
  labels:
    logFileName: my_app_log
  ...

---promtail_config.yaml
...
scrape_configs:
  ...
  - job_name: job_name
    kubernetes_sd_config:
      - role: pod
    relabel_configs:
      ...
      - action: replace
        target_label: __path__
        source_labels:
          - __meta_kubernetes_pod_label_logFileName
        replacement: /your_log_file_dir/$1.log
      ...
```

### Custom Client options

The `promtail` client configuration uses the [Prometheus HTTP
client](https://godoc.org/github.com/prometheus/common/config) implementation.
Therefore, you can configure the following authentication parameters in the
`client` or `clients` section.

```yaml
---promtail_config.yaml
...

# Single client
client:
  [ <client_option> ]

# Multiple clients
clients:
  [ - <client_option> ]
...
```

> Note: Passing `-client.url` from the command line is only valid if you use
> the `client` section.

#### `<client_option>`

```yaml
# Sets the `url` of the Loki API push endpoint
url: http[s]://<host>:<port>/api/prom/push

# Sets the `Authorization` header on every Promtail request with the
# configured username and password.
# password and password_file are mutually exclusive.
basic_auth:
  username: <string>
  password: <secret>
  password_file: <string>

# Sets the `Authorization` header on every Promtail request with
# the configured bearer token. It is mutually exclusive with `bearer_token_file`.
bearer_token: <secret>

# Sets the `Authorization` header on every Promtail request with the bearer token
# read from the configured file. It is mutually exclusive with `bearer_token`.
bearer_token_file: /path/to/bearer/token/file

# Configures the Promtail request's TLS settings.
tls_config:
  # CA certificate to validate the API server certificate with.
  # If not provided, the system's trusted CAs will be used.
  ca_file: <filename>

  # Certificate and key files for client cert authentication to the server.
  cert_file: <filename>
  key_file: <filename>

  # ServerName extension to indicate the name of the server.
  # https://tools.ietf.org/html/rfc4366#section-3.1
  server_name: <string>

  # Disable validation of the server certificate.
  insecure_skip_verify: <boolean>

# Optional proxy URL.
proxy_url: <string>

# Maximum wait period before sending a batch
batchwait: 1s

# Maximum batch size (in bytes) to accrue before sending
batchsize: 102400

# Maximum time to wait for the server to respond to a request
timeout: 10s

backoff_config:
  # Initial backoff time between retries
  minbackoff: 100ms
  # Maximum backoff time between retries
  maxbackoff: 5s
  # Maximum number of retries when sending batches; 0 means infinite retries
  maxretries: 5

# The labels to add to any time series or alerts when communicating with Loki
external_labels: {}
```

@ -1,92 +0,0 @@
# Examples

This document shows some example use-cases for Promtail and their configuration.

## Local Config

Using this configuration, all files in `/var/log` and `/srv/log/someone_service`
are ingested into Loki. The labels `job` and `host` are set using
`static_configs`.

When using this configuration with Docker, do not forget to mount the
configuration file, `/var/log`, and `/srv/log/someone_service` using
[volumes](https://docs.docker.com/storage/volumes/).

```yaml
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml  # progress of the individual files

client:
  url: http://ip_or_hostname_where_loki_runs:3100/api/prom/push

scrape_configs:
  - job_name: system
    pipeline_stages:
      - docker:  # Docker wraps logs in JSON. Undo this.
    static_configs:  # running locally here, no need for service discovery
      - targets:
          - localhost
        labels:
          job: varlogs
          host: yourhost
          __path__: /var/log/*.log  # tail all files under /var/log

  - job_name: someone_service
    pipeline_stages:
      - docker:  # Docker wraps logs in JSON. Undo this.
    static_configs:  # running locally here, no need for service discovery
      - targets:
          - localhost
        labels:
          job: someone_service
          host: yourhost
          __path__: /srv/log/someone_service/*.log  # tail all files under /srv/log/someone_service
```

## Systemd Journal

This example shows how to ship the `systemd` journal to Loki.

Just like the Docker example, the `scrape_configs` section holds various
jobs for parsing logs. A job with a `journal` key configures it for systemd
journal reading.

`path` is an optional string specifying the path to read journal entries
from. If unspecified, it defaults to the system default (`/var/log/journal`).

`labels` is a map of string values specifying labels that should always
be associated with each log entry read from the systemd journal.
In our example, each log will have a label of `job=systemd-journal`.

Every field written to the systemd journal is available for processing
in the `relabel_configs` section. Label names are converted to lowercase
and prefixed with `__journal_`. After `relabel_configs` processes all
labels for a job entry, any label starting with `__` is deleted.

Our example renames the `_SYSTEMD_UNIT` label (available as
`__journal__systemd_unit` in Promtail) to `unit` so it will be available
in Loki. All other labels from the journal entry are dropped.

When running using Docker, **remember to bind the journal into the container**.

```yaml
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://ip_or_hostname_where_loki_runs:3100/api/prom/push

scrape_configs:
  - job_name: journal
    journal:
      path: /var/log/journal
      labels:
        job: systemd-journal
    relabel_configs:
      - source_labels: ['__journal__systemd_unit']
        target_label: 'unit'
```

@ -1,53 +0,0 @@
# Promtail Known Failure Modes

This document describes known failure modes of `promtail` on edge cases and the
adopted trade-offs.

## A tailed file is truncated while `promtail` is not running

Given the following order of events:

1. `promtail` is tailing `/app.log`
2. `promtail`'s current position for `/app.log` is `100` (bytes)
3. `promtail` is stopped
4. `/app.log` is truncated and new logs are appended to it
5. `promtail` is restarted

When `promtail` is restarted, it reads the previous position (`100`) from the
positions file. Two scenarios are then possible:

- `/app.log` size is smaller than the position before truncating
- `/app.log` size is greater than or equal to the position before truncating

If the `/app.log` file size is smaller than the previous position, the file is
detected as truncated and logs will be tailed starting from position `0`.
Otherwise, if the `/app.log` file size is greater than or equal to the previous
position, `promtail` can't detect that it was truncated while not running and
will continue tailing the file from position `100`.

Generally speaking, `promtail` uses only the path to the file as the key in the
positions file. Whenever `promtail` is started, for each file path referenced in
the positions file, `promtail` will read the file from the beginning if the file
size is smaller than the offset stored in the positions file, and will otherwise
continue from the offset, regardless of whether the file has been truncated or
rolled multiple times while `promtail` was not running (a sketch of the
positions file is shown below).
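
A hedged sketch of what such a positions file could look like; the path and
offset are illustrative:

```yaml
# /tmp/positions.yaml: maps each tailed file path to its byte offset.
positions:
  /app.log: "100"
```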

## Loki is unavailable

For each tailed file, `promtail` reads a line, processes it through the
configured `pipeline_stages`, and pushes the log entry to Loki. Log entries are
batched together before getting pushed to Loki, based on the max batch duration
`client.batch-wait` and size `client.batch-size-bytes`, whichever comes first.

In case of any error while sending a batch of log entries, `promtail` adopts a
"retry then discard" strategy:

- `promtail` retries sending the log entries to the ingester up to `maxretries`
  times
- if all retries fail, `promtail` discards the batch of log entries (_which
  will be lost_) and proceeds with the next one

You can configure `maxretries` and the delay between two retries via the
`backoff_config` in the Promtail config file:

```yaml
clients:
  - url: INGESTER-URL
    backoff_config:
      minbackoff: 100ms
      maxbackoff: 5s
      maxretries: 5
```

## Log entries pushed after a `promtail` crash / panic / abrupt termination

When `promtail` shuts down gracefully, it saves the last read offsets in the
positions file, so that on a subsequent restart it will continue tailing logs
without duplicates or losses.

In the event of a crash or abrupt termination, `promtail` can't save the last
read offsets in the positions file. When restarted, `promtail` will read the
positions file saved at the last sync period and will continue tailing the files
from there. This means that if new log entries have been read and pushed to the
ingester between the last sync period and the crash, these log entries will be
sent again to the ingester when `promtail` restarts.

However, for each log stream (set of unique labels), the Loki ingester skips all
log entries received out of timestamp order. For this reason, even if duplicated
logs may be sent from `promtail` to the ingester, entries whose timestamp is
older than the latest received will be discarded to avoid duplicated logs. To
leverage this, it's important that your `pipeline_stages` include the
`timestamp` stage, parsing the log entry timestamp from the log line instead of
relying on the default behaviour of setting the timestamp to the point in time
when the line is read by `promtail` (see the sketch below).
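
A minimal sketch of such a pipeline; the regex and time format are illustrative
and assume an RFC3339 timestamp at the start of each line:

```yaml
pipeline_stages:
  - regex:
      # capture the leading timestamp into "ts"
      expression: '^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\S*) .*'
  - timestamp:
      source: ts
      format: RFC3339
```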
||||
@ -1,171 +0,0 @@ |
||||
# Querying |
||||
|
||||
To get the previously ingested logs back from Loki for analysis, you need a |
||||
client that supports LogQL. |
||||
Grafana will be the first choice for most users, |
||||
nevertheless [LogCLI](logcli.md) represents a viable standalone alternative. |
||||
|
||||
## Clients |
||||
### Grafana |
||||
|
||||
Grafana ships with built-in support for Loki in versions
[6.0](https://grafana.com/grafana/download/6.0.0) and later; however, using
[6.3](https://grafana.com/grafana/download/6.3.0) or later is highly
recommended.

1. Log into your Grafana instance, e.g., `http://localhost:3000` (default username:
   `admin`, default password: `admin`).
2. Go to `Configuration` > `Data Sources` via the cog icon on the left sidebar.
3. Click the big <kbd>+ Add data source</kbd> button.
4. Choose Loki from the list.
5. The HTTP URL field should be the address of your Loki server, e.g.,
   `http://localhost:3100` when running locally or with Docker, or
   `http://loki:3100` when running with docker-compose or Kubernetes.
6. To see the logs, click <kbd>Explore</kbd> on the sidebar, select the Loki
   datasource, and then choose a log stream using the <kbd>Log labels</kbd>
   button.

Read more about the Explore feature, and about how to search and filter logs
with Loki, in the [Grafana docs](http://docs.grafana.org/features/explore).

> To configure the datasource via provisioning, see [Configuring Grafana via
> Provisioning](http://docs.grafana.org/features/datasources/loki/#configure-the-datasource-with-provisioning)
> and make sure to adjust the URL as shown above.
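
A minimal provisioning file might look like the following sketch (the datasource
name and URL are examples; adjust them to your deployment):

```yaml
apiVersion: 1

datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://localhost:3100
```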
|
### LogCLI

If you prefer a command-line interface, [LogCLI](logcli.md) also allows you to run
LogQL queries against a Loki server. Refer to its [documentation](logcli.md) for
more details.

## LogQL

Loki has its very own language for querying logs from the Loki server, called *LogQL*. Think of
it as a distributed `grep` with labels for selection.

A log query consists of two parts: a **log stream selector** and a **filter
expression**. For performance reasons you need to start by choosing a set of log
streams using a Prometheus-style log stream selector.

The log stream selector will reduce the number of log streams to a manageable
volume, and then the regex search expression is used to do a distributed `grep`
over those log streams.

### Log Stream Selector

For the label part of the query expression, wrap it in curly braces `{}` and
then use the key-value syntax for selecting labels. Multiple label expressions
are separated by a comma:

`{app="mysql",name="mysql-backup"}`

The following label matching operators are currently supported:

- `=` exactly equal.
- `!=` not equal.
- `=~` regex-match.
- `!~` do not regex-match.

Examples:

- `{name=~"mysql.+"}`
- `{name!~"mysql.+"}`

The same rules that apply to [Prometheus Label
Selectors](https://prometheus.io/docs/prometheus/latest/querying/basics/#instant-vector-selectors)
apply to Loki Log Stream Selectors.

### Filter Expression

After writing the log stream selector, you can filter the results further by
writing a search expression. The search expression can be just text or a regex
expression.

Example queries:

- `{job="mysql"} |= "error"`
- `{name="kafka"} |~ "tsdb-ops.*io:2003"`
- `{instance=~"kafka-[23]",name="kafka"} != "kafka.server:type=ReplicaManager"`

Filter operators can be chained and will sequentially filter down the
expression; resulting log lines must satisfy _every_ filter. E.g.:

`{job="mysql"} |= "error" != "timeout"`

The following filter types have been implemented:

- `|=` line contains string.
- `!=` line does not contain string.
- `|~` line matches regular expression.
- `!~` line does not match regular expression.

The regex expression accepts [RE2
syntax](https://github.com/google/re2/wiki/Syntax). The matching is
case-sensitive by default and can be switched to case-insensitive by prefixing
the regex with `(?i)`.
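
For example, `{job="mysql"} |~ "(?i)timeout"` matches lines containing
`timeout`, `Timeout`, or `TIMEOUT`.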
|
### Query Language Extensions

The query language is still under development to support more features, e.g.:

- `AND` / `NOT` operators
- Number extraction for timeseries based on numbers in log messages
- JSON accessors for filtering of JSON-structured logs
- Context (like `grep -C n`)

## Counting logs

Loki's LogQL supports sample expressions, allowing entries to be counted per stream after the regex filtering stage.

### Range Vector aggregation

The language shares the same [range vector](https://prometheus.io/docs/prometheus/latest/querying/basics/#range-vector-selectors) concept as Prometheus, except that the selected range of samples contains a value of one for each log entry. You can then apply an aggregation over the selected range to transform it into an instant vector.

`rate` calculates the number of entries per second, while `count_over_time` counts the entries for each log stream within the range.

In this example, we count all the log lines recorded within the last 5 minutes for the MySQL job:

> `count_over_time({job="mysql"}[5m])`

A range vector aggregation can also be applied to a [Filter Expression](#filter-expression), allowing you to select only matching log entries:

> `rate(({job="mysql"} |= "error" != "timeout")[10s])`

The query above computes the per-second rate of all errors except those containing `timeout` within the last 10 seconds.

You can then use aggregation operators over the range vector aggregation.

### Aggregation operators

Like [PromQL](https://prometheus.io/docs/prometheus/latest/querying/operators/#aggregation-operators), Loki's LogQL supports a subset of built-in aggregation operators that can be used to aggregate the elements of a single vector, resulting in a new vector of fewer elements with aggregated values:

- `sum` (calculate sum over dimensions)
- `min` (select minimum over dimensions)
- `max` (select maximum over dimensions)
- `avg` (calculate the average over dimensions)
- `stddev` (calculate population standard deviation over dimensions)
- `stdvar` (calculate population standard variance over dimensions)
- `count` (count number of elements in the vector)
- `bottomk` (smallest k elements by sample value)
- `topk` (largest k elements by sample value)

These operators can either be used to aggregate over all label dimensions or preserve distinct dimensions by including a `without` or `by` clause:

> `<aggr-op>([parameter,] <vector expression>) [without|by (<label list>)]`

`parameter` is only required for `topk` and `bottomk`. `without` removes the listed labels from the result vector, while all other labels are preserved in the output. `by` does the opposite and drops labels that are not listed in the `by` clause, even if their label values are identical between all elements of the vector.

`topk` and `bottomk` are different from other aggregators in that a subset of the input samples, including the original labels, are returned in the result vector. `by` and `without` are only used to bucket the input vector.

#### Examples

Get the top 10 applications by highest log throughput:

> `topk(10, sum(rate({region="us-east1"}[5m])) by (name))`

Get the count of logs during the last 5 minutes by level:

> `sum(count_over_time({job="mysql"}[5m])) by (level)`

Get the rate of HTTP GET requests from NGINX logs:

> `avg(rate(({job="nginx"} |= "GET")[10s])) by (region)`
# Troubleshooting

## "Loki: Bad Gateway. 502" |
||||
This error can appear in Grafana when you add Loki as a datasource. It means |
||||
that Grafana cannot connect to Loki. This can have several reasons: |
||||
|
||||
- If you deploy in using Docker, Grafana and Loki are not in same node, check |
||||
iptables or firewalls to ensure they can connect. |
||||
- If you deploy using Kubernetes, please note: |
||||
- If Grafana and Loki are in the same namespace, set Loki url as |
||||
`http://$LOKI_SERVICE_NAME:$LOKI_PORT`. |
||||
- If Grafana and Loki are in different namespaces, set Loki url as |
||||
`http://$LOKI_SERVICE_NAME.$LOKI_NAMESPACE:$LOKI_PORT`. |
||||
|
||||
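
For example, if Loki is exposed by a service named `loki` on port `3100` in the
`monitoring` namespace, a Grafana instance in another namespace would use
`http://loki.monitoring:3100` as the URL.
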
## "Data source connected, but no labels received. Verify that Loki and Promtail is configured properly." |
||||
|
||||
This error can appear in Grafana when you add Loki as a datasource. It means |
||||
that Grafana can connect to Loki, but Loki has not received any logs from |
||||
promtail. This can have several reasons: |
||||
|
||||
- Promtail cannot reach Loki, check promtail's output. |
||||
- Promtail started sending logs before Loki was ready. This can happen in test |
||||
environments where promtail already read all logs and sent them off. Here is |
||||
what you can do: |
||||
- Generally start promtail after Loki, e.g., 60 seconds later. |
||||
- Restarting promtail will not necessarily resend log messages that have been |
||||
read. To force sending all messages again, delete the positions file |
||||
(default location `/tmp/positions.yaml`) or make sure new log messages are |
||||
written after both promtail and Loki have started. |
||||
- Promtail is ignoring targets because of a configuration rule |
||||
- Detect this by turning on debug logging and then look for `dropping target, |
||||
no labels` or `ignoring target` messages. |
||||
- Promtail cannot find the location of your log files. Check that the |
||||
`scrape_configs` contains valid path setting for finding the logs in your worker |
||||
nodes. |
||||
- Your pods are running but not with the labels Promtail is expecting. Check the |
||||
Promtail `scape_configs`. |
||||
|
||||
## Troubleshooting targets

Promtail offers two pages that you can use to understand how its service
discovery works. The service discovery page (`/service-discovery`) shows all
discovered targets with their labels before and after relabeling, as well as
the reason why a target has been dropped. The targets page (`/targets`)
displays only the targets that are being actively scraped, with their
respective labels, files, and positions.

On Kubernetes, you can access those two pages by port-forwarding the promtail
port (`9080`, or `3101` when deployed via helm) locally:

```bash
kubectl port-forward loki-promtail-jrfg7 9080
```
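
With the port-forward in place, open `http://localhost:9080/service-discovery`
and `http://localhost:9080/targets` in a browser to inspect the pages.
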
## Debug output

Both binaries support a log level parameter on the command line, e.g.:

```bash
$ loki --log.level=debug
```

## Failed to create target, `ioutil.ReadDir: readdirent: not a directory`

The promtail configuration contains a `__path__` entry pointing to a directory
that promtail cannot find.
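
For reference, `__path__` should point to a file or glob that exists on the
node. A hypothetical scrape config entry (job name, labels, and glob are
examples):

```yaml
scrape_configs:
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          __path__: /var/log/*.log
```
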
## Connecting to a promtail pod to troubleshoot

First check the [Troubleshooting targets](#troubleshooting-targets) section
above. If that doesn't help answer your questions, you can connect to the
promtail pod to investigate further.

If you are running promtail as a DaemonSet in your cluster, you will have a
promtail pod on each node, so figure out which promtail pod you need to debug
first:

```shell
$ kubectl get pods --all-namespaces -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP            NODE        NOMINATED NODE
...
nginx-7b6fb56fb8-cw2cm   1/1     Running   0          41d   10.56.4.12    node-ckgc   <none>
...
promtail-bth9q           1/1     Running   0          3h    10.56.4.217   node-ckgc   <none>
```

That output is truncated to highlight just the two pods we are interested in;
the `-o wide` flag shows the NODE on which each pod is running.
|
You'll want to match the node of the pod you are interested in (in this
example, nginx) to the promtail pod running on the same node.

To debug, you can connect to that promtail pod:

```shell
kubectl exec -it promtail-bth9q -- /bin/sh
```

Once connected, verify that the config in `/etc/promtail/promtail.yml` is what
you expected.

Also check `/var/log/positions.yaml` (`/run/promtail/positions.yaml` when
deployed via helm, or the value of `positions.file`) and make sure promtail is
tailing the logs you would expect.

You can check the promtail log by looking at the promtail container log in
`/var/log/containers`.

## Enable tracing for Loki

Loki can be traced using [Jaeger](https://www.jaegertracing.io/) by setting
the environment variable `JAEGER_AGENT_HOST` to the hostname and port of a
Jaeger agent reachable from Loki.

If you deploy with helm, refer to the following command:

```bash
$ helm upgrade --install loki loki/loki --set "loki.tracing.jaegerAgentHost=YOUR_JAEGER_AGENT_HOST"
```
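
Outside of helm, a sketch of the same configuration when running the binary
directly (the agent hostname and config file path are examples):

```bash
JAEGER_AGENT_HOST=jaeger.example.com loki -config.file=loki-local-config.yaml
```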