Documentation Rewrite (#982)

* docs: create structure of docs overhaul

This commit removes all old docs and lays out the table of contents and
framework for how the new documentation is intended to be read.

* docs: add design docs back in

* docs: add community documentation

* docs: add LogQL docs

* docs: port existing operations documentation

* docs: add new placeholder file for promtail configuration docs

* docs: add TOC for operations/storage

* docs: add Loki API documentation

* docs: port troubleshooting document

* docs: add docker-driver documentation

* docs: link to configuration from main docker-driver document

* docs: update API for new paths

* docs: fix broken links in api.md and remove json marker from examples

* docs: incorporate api changes from #1009

* docs: port promtail documentation

* docs: add TOC to promtail configuration reference

* docs: fix promtail spelling errors

* docs: add loki configuration reference

* docs: add TOC to configuration

* docs: add loki configuration example

* docs: add Loki overview with brief explanation about each component

* docs: add comparisons document

* docs: add info on table manager and update storage/README.md

* docs: add getting started

* docs: incorporate config yaml changes from #755

* docs: fix typo in releases url for promtail

* docs: add installation instructions

* docs: add more configuration examples

* docs: add information on fluentd client

fluent-bit has been temporarily removed until the PR for it is merged.

* docs: PR review feedback

* docs: add architecture document

* docs: add missing information from old docs

* `localy` typo

Co-Authored-By: Ed Welch <ed@oqqer.com>

* docs: s/ran/run/g

* Typo

* Typo

* Tyop

* Typo

* docs: fixed typo

* docs: PR feedback

* docs: @cyriltovena PR feedback

* docs: add more details to promtail url config option

* docs: expand promtail's pipelines document with extra detail

* docs: remove reference to Stage interface in pipelines.md

* docs: fixed some spelling

* docs: clarify promtail configuration and scraping

* docs: attempt #2 at explaining promtail's usage of machine hostname

* docs: spelling fixes

* docs: add reference to promtail custom metrics and fix silly typo

* docs: cognizant -> aware

* docs: typo

* docs: typos

* docs: add which components expose which API endpoints in microservices mode

* docs: change ksonnet installation to tanka

* docs: address most @pracucci feedback

* docs: fix all spelling errors so reviewers don't have to keep finding them :)

* docs: incorporate changes to API endpoints made in #1022

* docs: add missing loki metrics

* docs: add missing promtail metrics

* docs: @pstribrany feedback

* docs: more @pracucci feedback

* docs: move metrics into a table

* docs: update push path references to /loki/api/v1/push

* docs: add detail to further explain limitations of monolithic mode

* docs: add alternative names to modes_of_operation diagram

* docs: add log ordering requirement

* docs: add procedure for updating docs with latest version

* docs: separate out stages documentation into one document per stage

* docs: list supported stores in storage documentation

* docs: add info on duplicate log lines in pipelines

* docs: add line_format as key feature to fluentd

* docs: hopefully final commit :)
pull/1060/head
Robert Fratto 6 years ago committed by Cyril Tovena
parent f755c59011
commit 65ba42a6e7
  1. 84   docs/README.md
  2. 639  docs/api.md
  3. 306  docs/architecture.md
  4. 166  docs/canary/README.md
  5. 42   docs/canary/loki-canary-daemonset.yml
  6. 36   docs/canary/loki-canary.yml
  7. BIN  docs/chunks_diagram.png
  8. 29   docs/clients/README.md
  9. 54   docs/clients/docker-driver/README.md
  10. 152  docs/clients/docker-driver/configuration.md
  11. 179  docs/clients/fluentd.md
  12. 83   docs/clients/promtail/README.md
  13. 949  docs/clients/promtail/configuration.md
  14. 70   docs/clients/promtail/installation.md
  15. 208  docs/clients/promtail/pipelines.md
  16. 143  docs/clients/promtail/scraping.md
  17. 25   docs/clients/promtail/stages/README.md
  18. 91   docs/clients/promtail/stages/json.md
  19. 37   docs/clients/promtail/stages/labels.md
  20. 85   docs/clients/promtail/stages/match.md
  21. 215  docs/clients/promtail/stages/metrics.md
  22. 43   docs/clients/promtail/stages/output.md
  23. 76   docs/clients/promtail/stages/regex.md
  24. 70   docs/clients/promtail/stages/template.md
  25. 81   docs/clients/promtail/stages/timestamp.md
  26. 81   docs/clients/promtail/troubleshooting.md
  27. 5    docs/community/README.md
  28. 60   docs/community/contributing.md
  29. 12   docs/community/getting-in-touch.md
  30. 6    docs/community/governance.md
  31. 833  docs/configuration/README.md
  32. 163  docs/configuration/examples.md
  33. 6    docs/getting-started/README.md
  34. 29   docs/getting-started/grafana.md
  35. 63   docs/getting-started/logcli.md
  36. 126  docs/getting-started/troubleshooting.md
  37. 5    docs/installation/README.md
  38. 108  docs/installation/helm.md
  39. 42   docs/installation/local.md
  40. 66   docs/installation/tanka.md
  41. 5    docs/logentry/README.md
  42. 621  docs/logentry/processing-log-lines.md
  43. 153  docs/logql.md
  44. 60   docs/loki/README.md
  45. 311  docs/loki/api.md
  46. 88   docs/loki/operations.md
  47. 86   docs/loki/setup.md
  48. 158  docs/loki/storage.md
  49. 5    docs/maintaining/README.md
  50. 5    docs/maintaining/release.md
  51. BIN  docs/modes_of_operation.png
  52. 10   docs/operations/README.md
  53. 14   docs/operations/authentication.md
  54. 0    docs/operations/loki-canary-block.png
  55. 172  docs/operations/loki-canary.md
  56. 15   docs/operations/multi-tenancy.md
  57. 87   docs/operations/observability.md
  58. 11   docs/operations/scalability.md
  59. 95   docs/operations/storage/README.md
  60. 57   docs/operations/storage/retention.md
  61. 33   docs/operations/storage/table-manager.md
  62. 149  docs/overview/README.md
  63. 51   docs/overview/comparisons.md
  64. 115  docs/promtail.md
  65. 45   docs/promtail/README.md
  66. 22   docs/promtail/api.md
  67. 141  docs/promtail/config-examples.md
  68. 185  docs/promtail/configuration.md
  69. 253  docs/promtail/deployment-methods.md
  70. 92   docs/promtail/examples.md
  71. 53   docs/promtail/known-failure-modes.md
  72. 171  docs/querying.md
  73. 117  docs/troubleshooting.md

@ -1,42 +1,56 @@
# Loki Documentation
<p align="center"> <img src="logo_and_name.png" alt="Loki Logo"> <br>
<small>Like Prometheus, but for logs!</small> </p>
Grafana Loki is a set of components that can be composed into a fully featured
logging stack.
It builds around the idea of treating a single log line as-is. This means that
instead of full-text indexing them, related logs are grouped using the same
labels as in Prometheus. This is much more efficient and scales better.
## Components
- **[Loki](loki/README.md)**: The main server component is called Loki. It is
responsible for permanently storing the logs shipped to it and for executing
the LogQL queries from clients.
Loki shares its high-level architecture with Cortex, a highly scalable
Prometheus backend.
- **[Promtail](promtail/README.md)**: To ship logs to a central place, an
agent is required. Promtail
is deployed to every node that should be monitored and sends the logs to Loki.
It also performs the important task of pre-processing the log lines, including
attaching labels to them for easier querying.
- *Grafana*: The *Explore* feature of Grafana 6.0+ is the primary place of
contact between a human and Loki. It is used for discovering and analyzing
logs.
Unlike other logging systems, Loki is built around the idea of only indexing
metadata about your logs: labels (just like Prometheus labels). Log data itself
is then compressed and stored in chunks in object stores such as S3 or GCS, or
even locally on the filesystem. A small index and highly compressed chunks
simplifies the operation and significantly lowers the cost of Loki.
Alongside these main components, there are some other ones as well:
## Table of Contents
- **[LogCLI](logcli.md)**: A command line interface to query logs and labels
from Loki
- **[Canary](canary/README.md)**: An audit utility to analyze the log-capturing
performance of Loki. Ingests data into Loki and immediately reads it back to
check for latency and loss.
- **[Docker
Driver](https://github.com/grafana/loki/tree/master/cmd/docker-driver)**: A
Docker [log
driver](https://docs.docker.com/config/containers/logging/configure/) to ship
logs captured by Docker directly to Loki, without the need for an agent.
- **[Fluentd
Plugin](https://github.com/grafana/loki/tree/master/fluentd/fluent-plugin-grafana-loki)**:
A Fluentd [output plugin](https://docs.fluentd.org/output) to use Fluentd
for shipping logs into Loki.
1. [Overview](overview/README.md)
1. [Comparison to other Log Systems](overview/comparisons.md)
2. [Installation](installation/README.md)
1. [Installing with Tanka](installation/tanka.md)
2. [Installing with Helm](installation/helm.md)
3. [Installing Locally](installation/local.md)
3. [Getting Started](getting-started/README.md)
1. [Grafana](getting-started/grafana.md)
2. [LogCLI](getting-started/logcli.md)
4. [Troubleshooting](getting-started/troubleshooting.md)
4. [Configuration](configuration/README.md)
1. [Examples](configuration/examples.md)
5. [Clients](clients/README.md)
1. [Promtail](clients/promtail/README.md)
1. [Installation](clients/promtail/installation.md)
2. [Configuration](clients/promtail/configuration.md)
3. [Scraping](clients/promtail/scraping.md)
4. [Pipelines](clients/promtail/pipelines.md)
5. [Troubleshooting](clients/promtail/troubleshooting.md)
2. [Docker Driver](clients/docker-driver/README.md)
1. [Configuration](clients/docker-driver/configuration.md)
3. [Fluentd](clients/fluentd.md)
6. [LogQL](logql.md)
7. [Operations](operations/README.md)
1. [Authentication](operations/authentication.md)
2. [Observability](operations/observability.md)
3. [Scalability](operations/scalability.md)
4. [Storage](operations/storage/README.md)
1. [Table Manager](operations/storage/table-manager.md)
2. [Retention](operations/storage/retention.md)
5. [Multi-tenancy](operations/multi-tenancy.md)
6. [Loki Canary](operations/loki-canary.md)
8. [HTTP API](api.md)
9. [Architecture](architecture.md)
10. [Community](community/README.md)
1. [Governance](community/governance.md)
2. [Getting in Touch](community/getting-in-touch.md)
3. [Contributing to Loki](community/contributing.md)
11. [Loki Maintainers Guide](./maintaining/README.md)
1. [Releasing Loki](./maintaining/release.md)

@ -0,0 +1,639 @@
# Loki's HTTP API
Loki exposes an HTTP API for pushing, querying, and tailing log data.
Note that [authenticating](operations/authentication.md) against the API is
out of scope for Loki.
The HTTP API includes the following endpoints:
- [`GET /loki/api/v1/query`](#get-lokiapiv1query)
- [`GET /loki/api/v1/query_range`](#get-lokiapiv1query_range)
- [`GET /loki/api/v1/label`](#get-lokiapiv1label)
- [`GET /loki/api/v1/label/<name>/values`](#get-lokiapiv1labelnamevalues)
- [`GET /loki/api/v1/tail`](#get-lokiapiv1tail)
- [`POST /loki/api/v1/push`](#post-lokiapiv1push)
- [`GET /api/prom/tail`](#get-apipromtail)
- [`GET /api/prom/query`](#get-apipromquery)
- [`GET /ready`](#get-ready)
- [`POST /flush`](#post-flush)
- [`GET /metrics`](#get-metrics)
## Microservices Mode
When deploying Loki in microservices mode, the set of endpoints exposed by each
component is different.
These endpoints are exposed by all components:
- [`GET /ready`](#get-ready)
- [`GET /metrics`](#get-metrics)
These endpoints are exposed by just the querier:
- [`GET /loki/api/v1/query`](#get-lokiapiv1query)
- [`GET /loki/api/v1/query_range`](#get-lokiapiv1query_range)
- [`GET /loki/api/v1/label`](#get-lokiapiv1label)
- [`GET /loki/api/v1/label/<name>/values`](#get-lokiapiv1labelnamevalues)
- [`GET /loki/api/v1/tail`](#get-lokiapiv1tail)
- [`GET /api/prom/tail`](#get-apipromtail)
- [`GET /api/prom/query`](#get-apipromquery)
While these endpoints are exposed by just the distributor:
- [`POST /loki/api/v1/push`](#post-lokiapiv1push)
And these endpoints are exposed by just the ingester:
- [`POST /flush`](#post-flush)
The API endpoints starting with `/loki/` are [Prometheus API-compatible](https://prometheus.io/docs/prometheus/latest/querying/api/) and the result formats can be used interchangeably.
[Example clients](#example-clients) can be found at the bottom of this document.
## Matrix, Vector, And Streams
Some Loki API endpoints return a result of a matrix, a vector, or a stream:
- Matrix: a table of values where each row represents a different label set
and the columns are each sample value for that row over the queried time.
Matrix types are only returned when running a query that computes some value.
- Instant Vector: denoted in the type as just `vector`, an Instant Vector
represents the latest value of a calculation for a given labelset. Instant
Vectors are only returned when doing a query against a single point in
time.
- Stream: a Stream is a set of all values (logs) for a given label set over the
queried time range. Streams are the only type that will result in log lines
being returned.
## `GET /loki/api/v1/query`
`/loki/api/v1/query` allows for doing queries against a single point in time. The URL
query parameters support the following values:
- `query`: The [LogQL](./logql.md) query to perform
- `limit`: The max number of entries to return
- `time`: The evaluation time for the query as a nanosecond Unix epoch. Defaults to now.
- `direction`: Determines the sort order of logs. Supported values are `forward` or `backward`. Defaults to `backward`.
In microservices mode, `/loki/api/v1/query` is exposed by the querier.
Response:
```
{
"status": "success",
"data": {
"resultType": "vector" | "streams",
"result": [<vector value>] | [<stream value>]
}
}
```
Where `<vector value>` is:
```
{
"metric": {
<label key-value pairs>
},
"value": [
<number: nanosecond unix epoch>,
<string: value>
]
}
```
And `<stream value>` is:
```
{
"stream": {
<label key-value pairs>
},
"values": [
[
<string: nanosecond unix epoch>,
<string: log line>
],
...
]
}
```
### Examples
```bash
$ curl -G -s "http://localhost:3100/loki/api/v1/query" --data-urlencode 'query=sum(rate({job="varlogs"}[10m])) by (level)' | jq
{
"status": "success",
"data": {
"resultType": "vector",
"result": [
{
"metric": {},
"value": [
1559848867745737,
"1267.1266666666666"
]
},
{
"metric": {
"level": "warn"
},
"value": [
1559848867745737,
"37.77166666666667"
]
},
{
"metric": {
"level": "info"
},
"value": [
1559848867745737,
"37.69"
]
}
]
}
}
```
```bash
$ curl -G -s "http://localhost:3100/loki/api/v1/query" --data-urlencode 'query={job="varlogs"}' | jq
{
"status": "success",
"data": {
"resultType": "streams",
"result": [
{
"stream": {
"filename": "/var/log/myproject.log",
"job": "varlogs",
"level": "info"
},
"values": [
[
"1568234281726420425",
"foo"
],
[
"1568234269716526880",
"bar"
]
]
}
]
}
}
```
## `GET /loki/api/v1/query_range`
`/loki/api/v1/query_range` is used to do a query over a range of time and
accepts the following query parameters in the URL:
- `query`: The [LogQL](./logql.md) query to perform
- `limit`: The max number of entries to return
- `start`: The start time for the query as a nanosecond Unix epoch. Defaults to one hour ago.
- `end`: The end time for the query as a nanosecond Unix epoch. Defaults to now.
- `step`: Query resolution step width in seconds. Defaults to 1.
- `direction`: Determines the sort order of logs. Supported values are `forward` or `backward`. Defaults to `backward`.
Requests against this endpoint require Loki to query the index store in order to
find log streams for particular labels. Because the index store is spread out by
time, the time span covered by `start` and `end`, if large, may cause additional
load against the index server and result in a slow query.
In microservices mode, `/loki/api/v1/query_range` is exposed by the querier.
Response:
```
{
"status": "success",
"data": {
"resultType": "matrix" | "streams",
"result": [<matrix value>] | [<stream value>]
}
}
```
Where `<matrix value>` is:
```
{
"metric": {
<label key-value pairs>
},
"values": [
<number: nanosecond unix epoch>,
<string: value>
]
}
```
And `<stream value>` is:
```
{
"stream": {
<label key-value pairs>
},
"values": [
[
<string: nanosecond unix epoch>,
<string: log line>
],
...
]
}
```
### Examples
```bash
$ curl -G -s "http://localhost:3100/loki/api/v1/query_range" --data-urlencode 'query=sum(rate({job="varlogs"}[10m])) by (level)' --data-urlencode 'step=300' | jq
{
"status": "success",
"data": {
"resultType": "matrix",
"result": [
{
"metric": {
"level": "info"
},
"values": [
[
1559848958663735,
"137.95"
],
[
1559849258663735,
"467.115"
],
[
1559849558663735,
"658.8516666666667"
]
]
},
{
"metric": {
"level": "warn"
},
"values": [
[
1559848958663735,
"137.27833333333334"
],
[
1559849258663735,
"467.69"
],
[
1559849558663735,
"660.6933333333334"
]
]
}
]
}
}
```
```bash
$ curl -G -s "http://localhost:3100/loki/api/v1/query_range" --data-urlencode 'query={job="varlogs"}' | jq
{
"status": "success",
"data": {
"resultType": "streams",
"result": [
{
"stream": {
"filename": "/var/log/myproject.log",
"job": "varlogs",
"level": "info"
},
"values": [
[
"1569266497240578000",
"foo"
],
[
"1569266492548155000",
"bar"
]
]
}
]
}
}
```
## `GET /loki/api/v1/label`
`/loki/api/v1/label` retrieves the list of known labels within a given time span. It
accepts the following query parameters in the URL:
- `start`: The start time for the query as a nanosecond Unix epoch. Defaults to 6 hours ago.
- `end`: The end time for the query as a nanosecond Unix epoch. Defaults to now.
In microservices mode, `/loki/api/v1/label` is exposed by the querier.
Response:
```
{
"values": [
<label string>,
...
]
}
```
### Examples
```bash
$ curl -G -s "http://localhost:3100/loki/api/v1/label" | jq
{
"values": [
"foo",
"bar",
"baz"
]
}
```
## `GET /loki/api/v1/label/<name>/values`
`/loki/api/v1/label/<name>/values` retrieves the list of known values for a given
label within a given time span. It accepts the following query parameters in
the URL:
- `start`: The start time for the query as a nanosecond Unix epoch. Defaults to 6 hours ago.
- `end`: The end time for the query as a nanosecond Unix epoch. Defaults to now.
In microservices mode, `/loki/api/v1/label/<name>/values` is exposed by the querier.
Response:
```
{
"values": [
<label value>,
...
]
}
```
### Examples
```bash
$ curl -G -s "http://localhost:3100/loki/api/v1/label/foo/values" | jq
{
"values": [
"cat",
"dog",
"axolotl"
]
}
```
## `GET /loki/api/v1/tail`
`/loki/api/v1/tail` is a WebSocket endpoint that will stream log messages based on
a query. It accepts the following query parameters in the URL:
- `query`: The [LogQL](./logql.md) query to perform
- `delay_for`: The number of seconds to delay retrieving logs to let slow
loggers catch up. Defaults to 0 and cannot be larger than 5.
- `limit`: The max number of entries to return
- `start`: The start time for the query as a nanosecond Unix epoch. Defaults to one hour ago.
In microservices mode, `/loki/api/v1/tail` is exposed by the querier.
Response (streamed):
```
{
"streams": [
{
"stream": {
<label key-value pairs>
},
"values": [
[
<string: nanosecond unix epoch>,
<string: log line>
]
]
}
],
"dropped_entries": [
{
"labels": {
<label key-value pairs>
},
"timestamp": "<nanosecond unix epoch>"
}
]
}
```
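The tail endpoint can be exercised with any WebSocket client. As a hedged
sketch, the example below uses `wscat` (assumed to be installed separately)
with a URL-encoded LogQL selector:

```bash
# Illustrative only: tail the stream {job="varlogs"} (the query is URL-encoded).
$ wscat -c 'ws://localhost:3100/loki/api/v1/tail?query=%7Bjob%3D%22varlogs%22%7D'
```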
## `POST /loki/api/v1/push`
Alias (DEPRECATED): `POST /api/prom/push`
`/loki/api/v1/push` is the endpoint used to send log entries to Loki. The default
behavior is for the POST body to be a snappy-compressed protobuf message:
- [Protobuf definition](/pkg/logproto/logproto.proto)
- [Go client library](/pkg/promtail/client/client.go)
Alternatively, if the `Content-Type` header is set to `application/json`, a
JSON post body can be sent in the following format:
```
{
"streams": [
{
"labels": "<LogQL label key-value pairs>",
"entries": [
{
"ts": "<RFC3339Nano string>",
"line": "<log line>"
}
]
}
]
}
```
> **NOTE**: Logs sent to Loki for each stream must be in timestamp-ascending
> order, meaning each log line must be more recent than the last one received.
> If logs do not follow this order, Loki will reject them with an out-of-order
> error.
In microservices mode, `/loki/api/v1/push` is exposed by the distributor.
### Examples
```bash
$ curl -H "Content-Type: application/json" -XPOST -s "https://localhost:3100/loki/api/v1/push" --data-raw \
'{"streams": [{ "labels": "{foo=\"bar\"}", "entries": [{ "ts": "2018-12-18T08:28:06.801064-04:00", "line": "fizzbuzz" }] }]}'
```
## `GET /api/prom/tail`
> **DEPRECATED**: `/api/prom/tail` is deprecated. Use `/loki/api/v1/tail`
> instead.
`/api/prom/tail` is a WebSocket endpoint that will stream log messages based on
a query. It accepts the following query parameters in the URL:
- `query`: The [LogQL](./logql.md) query to perform
- `delay_for`: The number of seconds to delay retrieving logs to let slow
loggers catch up. Defaults to 0 and cannot be larger than 5.
- `limit`: The max number of entries to return
- `start`: The start time for the query as a nanosecond Unix epoch. Defaults to one hour ago.
In microservices mode, `/api/prom/tail` is exposed by the querier.
Response (streamed):
```json
{
"streams": [
{
"labels": "<LogQL label key-value pairs>",
"entries": [
{
"ts": "<RFC3339Nano timestamp>",
"line": "<log line>"
}
]
}
],
"dropped_entries": [
{
"Timestamp": "<RFC3339Nano timestamp>",
"Labels": "<LogQL label key-value pairs>"
}
]
}
```
`dropped_entries` will be populated when the tailer could not keep up with the
amount of traffic in Loki. When present, it indicates that the entries received
in the streams are not the full set of logs present in Loki. Note that the keys
in `dropped_entries` will be sent as uppercase `Timestamp` and `Labels` instead
of `labels` and `ts` like in the entries for the stream.
As the response is streamed, the object defined by the response format above
will be sent over the WebSocket multiple times.
## `GET /api/prom/query`
> **WARNING**: `/api/prom/query` is DEPRECATED; use `/loki/api/v1/query_range`
> instead.
`/api/prom/query` supports doing general queries. The URL query parameters
support the following values:
- `query`: The [LogQL](./logql.md) query to perform
- `limit`: The max number of entries to return
- `start`: The start time for the query as a nanosecond Unix epoch. Defaults to one hour ago.
- `end`: The end time for the query as a nanosecond Unix epoch. Defaults to now.
- `direction`: Determines the sort order of logs. Supported values are `forward` or `backward`. Defaults to `backward`.
- `regexp`: a regex to filter the returned results
In microservices mode, `/api/prom/query` is exposed by the querier.
Note that a larger time span between `start` and `end` will cause additional
load on Loki and the index store, resulting in slower queries.
Response:
```
{
"streams": [
{
"labels": "<LogQL label key-value pairs>",
"entries": [
{
"ts": "<RFC3339Nano string>",
"line": "<log line>"
},
...
],
},
...
]
}
```
### Examples
```bash
$ curl -G -s "http://localhost:3100/api/prom/query" --data-urlencode '{foo="bar"}' | jq
{
"streams": [
{
"labels": "{filename=\"/var/log/myproject.log\", job=\"varlogs\", level=\"info\"}",
"entries": [
{
"ts": "2019-06-06T19:25:41.972739Z",
"line": "foo"
},
{
"ts": "2019-06-06T19:25:41.972722Z",
"line": "bar"
}
]
}
]
}
```
## `GET /ready`
`/ready` returns HTTP 200 when the Loki ingester is ready to accept traffic. If
running Loki on Kubernetes, `/ready` can be used as a readiness probe.
In microservices mode, the `/ready` endpoint is exposed by all components.
## `POST /flush`
`/flush` triggers a flush of all in-memory chunks held by the ingesters to the
backing store. Mainly used for local testing.
In microservices mode, the `/flush` endpoint is exposed by the ingester.
## `GET /metrics`
`/metrics` exposes Prometheus metrics. See
[Observing Loki](operations/observability.md)
for a list of exported metrics.
In microservices mode, the `/metrics` endpoint is exposed by all components.
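As a rough sketch, these operational endpoints can be exercised with `curl`
against a locally running Loki (assuming the default HTTP listen port of 3100):

```bash
# Check readiness; prints the HTTP status code (200 when the ingester is ready).
$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:3100/ready

# Trigger a flush of all in-memory chunks (mainly useful for local testing).
$ curl -s -XPOST http://localhost:3100/flush

# Scrape the Prometheus metrics.
$ curl -s http://localhost:3100/metrics | head
```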
## Example Clients
Please note that the Loki API is not stable yet and breaking changes may occur
when using or writing a third-party client.
- [Promtail](https://github.com/grafana/loki/tree/master/pkg/promtail) (Official, Go)
- [promtail-client](https://github.com/afiskon/promtail-client) (Go)
- [push-to-loki.py](https://github.com/sleleko/devops-kb/blob/master/python/push-to-loki.py) (Python 3)

@ -0,0 +1,306 @@
# Loki's Architecture
This document will expand on the information detailed in the [Loki
Overview](overview/README.md).
## Multi Tenancy
All data - both in memory and in long-term storage - is partitioned by a
tenant ID, pulled from the `X-Scope-OrgID` HTTP header in the request when Loki
is running in multi-tenant mode. When Loki is **not** in multi-tenant mode, the
header is ignored and the tenant ID is set to "fake", which will appear in the
index and in stored chunks.
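As an illustrative sketch (assuming multi-tenant mode is enabled and Loki is
listening on its default port), a client selects the tenant per request through
the `X-Scope-OrgID` header:

```bash
# Push a log line and query it back as tenant "team-a". Without the header in
# single-tenant mode, the tenant ID falls back to "fake".
$ curl -s -H "X-Scope-OrgID: team-a" -H "Content-Type: application/json" -XPOST \
    "http://localhost:3100/loki/api/v1/push" --data-raw \
    '{"streams": [{ "labels": "{job=\"test\"}", "entries": [{ "ts": "2019-10-01T12:00:00Z", "line": "hello" }] }]}'
$ curl -s -G -H "X-Scope-OrgID: team-a" "http://localhost:3100/loki/api/v1/query" \
    --data-urlencode 'query={job="test"}'
```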
## Modes of Operation
![modes_diagram](modes_of_operation.png)
Loki has a set of components (defined below in [Components](#components)) which
are internally referred to as modules. Each component spawns a gRPC server for
internal traffic and an HTTP/1 server for external API requests. All components
come with an HTTP/1 server, but most only expose readiness, health, and metrics
endpoints.
Which component Loki runs is determined by either the `-target` flag at the
command line or the `target: <string>` section in Loki's config file. When the
value of `target` is `all`, Loki will run all of its components in a single
process. This is referred to as "single process", "single binary", or monolithic
mode. Monolithic mode is the default deployment of Loki when Loki is installed
using Helm.
When `target` is _not_ set to `all` (i.e., it is set to `querier`, `ingester`,
or `distributor`), then Loki is said to be running in "horizontally scalable",
or microservices, mode.
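For example (a hedged sketch; the `-target` flag follows the description above
and the `-config.file` flag is assumed), the same binary can be started in
either mode by changing the target:

```bash
# Monolithic ("single binary") mode: run every component in one process.
$ loki -config.file=loki.yaml -target=all

# Microservices mode: run one component per process.
$ loki -config.file=loki.yaml -target=distributor
$ loki -config.file=loki.yaml -target=ingester
$ loki -config.file=loki.yaml -target=querier
```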
The components of Loki, such as the ingesters and distributors, communicate with
one another over gRPC using the gRPC listen port defined in the Loki config.
This is still true when running components in monolithic mode: each component,
although running in the same process, connects to the others over the local
network for inter-component communication.
Single process mode is ideally suited for local development, small workloads,
and for evaluation purposes. Monolithic mode can be scaled with multiple
processes with the following limitations:
1. Local index and local storage cannot currently be used when running
monolithic mode with more than one replica, as each replica must be able to
access the same storage backend, and local storage is not safe for concurrent
access.
2. Individual components cannot be scaled independently, so it is not possible
to have more read components than write components.
## Components
### Distributor
The **distributor** service is responsible for handling incoming streams sent by
clients. It's the first stop in the write path for log data. Once the
distributor receives a set of streams, each stream is validated for correctness
and to ensure that it is within the configured tenant (or global) limits. Valid
chunks are then split into batches and sent to multiple [ingesters](#ingester)
in parallel.
#### Hashing
Distributors use consistent hashing in conjunction with a configurable
replication factor to determine which instances of the ingester service should
receive a given stream.
A stream is a set of logs associated to a tenant and a unique labelset. The
stream is hashed using both the tenant ID and the labelset and then the hash is
used to find the ingesters to send the stream to.
A hash ring stored in [Consul](https://www.consul.io) is used to achieve
consistent hashing; all [ingesters](#ingester) register themselves into the hash
ring with a set of tokens they own. Each token is a random unsigned 32-bit
number. Along with a set of tokens, ingesters register their state into the
hash ring. Ingesters in the JOINING and ACTIVE states may receive write
requests, while ingesters in the ACTIVE and LEAVING states may receive read
requests. When doing a hash lookup, distributors only use tokens for ingesters
that are in the appropriate state for the request.
To do the hash lookup, distributors find the smallest appropriate token whose
value is larger than the hash of the stream. When the replication factor is
larger than 1, the next subsequent tokens (clockwise in the ring) that belong to
different ingesters will also be included in the result.
The effect of this hash setup is that each token that an ingester owns is
responsible for a range of hashes. If there are three tokens with values 0, 25,
and 50, then a hash of 3 would be given to the ingester that owns the token 25;
the ingester owning token 25 is responsible for the hash range of 1-25.
#### Quorum consistency
Since all distributors share access to the same hash ring, write requests can be
sent to any distributor.
To ensure consistent query results, Loki uses
[Dynamo-style](https://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf)
quorum consistency on reads and writes. This means that the distributor will
wait for a positive response from at least one half plus one of the ingesters
it sent the sample to before responding to the client that initiated the send.
For example, with a replication factor of 3, the distributor waits for two
successful responses before acknowledging the write.
### Ingester
The **ingester** service is responsible for writing log data to long-term
storage backends (DynamoDB, S3, Cassandra, etc.) on the write path and returning
log data for in-memory queries on the read path.
Ingesters contain a _lifecycler_ which manages the lifecycle of an ingester in
the hash ring. Each ingester has a state of either `PENDING`, `JOINING`,
`ACTIVE`, `LEAVING`, or `UNHEALTHY`:
1. `PENDING` is an Ingester's state when it is waiting for a handoff from
another ingester that is `LEAVING`.
2. `JOINING` is an Ingester's state when it is currently inserting its tokens
into the ring and initializing itself. It may receive write requests for
tokens it owns.
3. `ACTIVE` is an Ingester's state when it is fully initialized. It may receive
both write and read requests for tokens it owns.
4. `LEAVING` is an Ingester's state when it is shutting down. It may receive
read requests for data it still has in memory.
5. `UNHEALTHY` is an Ingester's state when it has failed to heartbeat to
Consul. `UNHEALTHY` is set by the distributor when it periodically checks the ring.
Each log stream that an ingester receives is built up into a set of many
"chunks" in memory and flushed to the backing storage backend at a configurable
interval.
Chunks are compressed and marked as read-only when:
1. The current chunk has reached capacity (a configurable value).
2. Too much time has passed without the current chunk being updated.
3. A flush occurs.
Whenever a chunk is compressed and marked as read-only, a writable chunk takes
its place.
If an ingester process crashes or exits abruptly, all the data that has not yet
been flushed will be lost. Loki is usually configured to store multiple replicas
(usually 3) of each log to mitigate this risk.
When a flush occurs to a persistent storage provider, the chunk is hashed based
on its tenant, labels, and contents. This means that multiple ingesters with the
same copy of data will not write the same data to the backing store twice, but
if any write failed to one of the replicas, multiple differing chunk objects
will be created in the backing store. See [Querier](#querier) for how data is
deduplicated.
#### Handoff
By default, when an ingester is shutting down and tries to leave the hash ring,
it will wait to see if a new ingester tries to enter before flushing and will
try to initiate a handoff. The handoff will transfer all of the tokens and
in-memory chunks owned by the leaving ingester to the new ingester.
Before joining the hash ring, ingesters will wait in `PENDING` state for a
handoff to occur. After a configurable timeout, ingesters in the `PENDING` state
that have not received a transfer will join the ring normally, inserting a new
set of tokens.
This process is used to avoid flushing all chunks when shutting down, which is a
slow process.
### Querier
The **querier** service handles queries using the [LogQL](./logql.md) query
language, fetching logs both from the ingesters and long-term storage.
Queriers query all ingesters for in-memory data before falling back to
running the same query against the backend store. Because of the replication
factor, it is possible that the querier may receive duplicate data. To resolve
this, the querier internally **deduplicates** data that has the same nanosecond
timestamp, label set, and log message.
## Chunk Format
```
-------------------------------------------------------------------
| | |
| MagicNumber(4b) | version(1b) |
| | |
-------------------------------------------------------------------
| block-1 bytes | checksum (4b) |
-------------------------------------------------------------------
| block-2 bytes | checksum (4b) |
-------------------------------------------------------------------
| block-n bytes | checksum (4b) |
-------------------------------------------------------------------
| #blocks (uvarint) |
-------------------------------------------------------------------
| #entries(uvarint) | mint, maxt (varint) | offset, len (uvarint) |
-------------------------------------------------------------------
| #entries(uvarint) | mint, maxt (varint) | offset, len (uvarint) |
-------------------------------------------------------------------
| #entries(uvarint) | mint, maxt (varint) | offset, len (uvarint) |
-------------------------------------------------------------------
| #entries(uvarint) | mint, maxt (varint) | offset, len (uvarint) |
-------------------------------------------------------------------
| checksum(from #blocks) |
-------------------------------------------------------------------
| #blocks section byte offset |
-------------------------------------------------------------------
```
`mint` and `maxt` describe the minimum and maximum Unix nanosecond timestamp,
respectively.
### Block Format
A block is comprised of a series of entries, each of which is an individual log
line.
Note that the bytes of a block are stored compressed using Gzip. The following
is their form when uncompressed:
```
-------------------------------------------------------------------
| ts (varint) | len (uvarint) | log-1 bytes |
-------------------------------------------------------------------
| ts (varint) | len (uvarint) | log-2 bytes |
-------------------------------------------------------------------
| ts (varint) | len (uvarint) | log-3 bytes |
-------------------------------------------------------------------
| ts (varint) | len (uvarint) | log-n bytes |
-------------------------------------------------------------------
```
`ts` is the Unix nanosecond timestamp of the log entry, while `len` is its
length in bytes.
## Chunk Store
The **chunk store** is Loki's long-term data store, designed to support
interactive querying and sustained writing without the need for background
maintenance tasks. It consists of:
* An index for the chunks. This index can be backed by:
* [Amazon DynamoDB](https://aws.amazon.com/dynamodb)
* [Google Bigtable](https://cloud.google.com/bigtable)
* [Apache Cassandra](https://cassandra.apache.org)
* A key-value (KV) store for the chunk data itself, which can be:
* [Amazon DynamoDB](https://aws.amazon.com/dynamodb)
* [Google Bigtable](https://cloud.google.com/bigtable)
* [Apache Cassandra](https://cassandra.apache.org)
* [Amazon S3](https://aws.amazon.com/s3)
* [Google Cloud Storage](https://cloud.google.com/storage/)
> Unlike the other core components of Loki, the chunk store is not a separate
> service, job, or process, but rather a library embedded in the two services
> that need to access Loki data: the [ingester](#ingester) and [querier](#querier).
The chunk store relies on a unified interface to the
"[NoSQL](https://en.wikipedia.org/wiki/NoSQL)" stores (DynamoDB, Bigtable, and
Cassandra) that can be used to back the chunk store index. This interface
assumes that the index is a collection of entries keyed by:
* A **hash key**. This is required for *all* reads and writes.
* A **range key**. This is required for writes and can be omitted for reads,
which can be queried by prefix or range.
The interface works somewhat differently across the supported databases:
* DynamoDB supports range and hash keys natively. Index entries are thus
modelled directly as DynamoDB entries, with the hash key as the distribution
key and the range as the DynamoDB range key.
* For Bigtable and Cassandra, index entries are modelled as individual column
values. The hash key becomes the row key and the range key becomes the column
key.
A set of schemas are used to map the matchers and label sets used on reads and
writes to the chunk store into appropriate operations on the index. Schemas have
been added as Loki has evolved, mainly in an attempt to better load balance
writes and improve query performance.
> The current schema recommendation is the **v10 schema**.
## Read Path
To summarize, the read path works as follows:
1. The querier receives an HTTP/1 request for data.
2. The querier passes the query to all ingesters for in-memory data.
3. The ingesters receive the read request and return data matching the query, if
any.
4. The querier lazily loads data from the backing store and runs the query
against it if no ingesters returned data.
5. The querier iterates over all received data and deduplicates, returning a
final set of data over the HTTP/1 connection.
## Write Path
![chunk_diagram](chunks_diagram.png)
To summarize, the write path works as follows:
1. The distributor receives an HTTP/1 request to store data for streams.
2. Each stream is hashed using the hash ring.
3. The distributor sends each stream to the appropriate ingesters and their
replicas (based on the configured replication factor).
4. Each ingester will create a chunk or append to an existing chunk for the
stream's data. A chunk is unique per tenant and per labelset.
5. The distributor responds with a success code over the HTTP/1 connection.

@ -1,166 +0,0 @@
# loki-canary
A standalone app to audit the log capturing performance of Loki.
## How it works
![block_diagram](block.png)
loki-canary writes a log to a file and stores the timestamp in an internal
array. The contents look something like this:
```nohighlight
1557935669096040040 ppppppppppppppppppppppppppppppppppppppppppppppppppppppppppp
```
The relevant part is the timestamp; the `p`'s are just filler bytes to make the
size of the log configurable.
Promtail (or another agent) then reads the log file and ships it to Loki.
Meanwhile, loki-canary opens a websocket connection to Loki and listens for the
logs it creates.
When a log is received on the websocket, the timestamp in the log message is
compared to the internal array.
If the received log is:
* The next in the array to be received, it is removed from the array and the
(current time - log timestamp) is recorded in the `response_latency`
histogram, this is the expected behavior for well behaving logs
* Not the next in the array to be received, it is removed from the array, the
response time is recorded in the `response_latency` histogram, and the
`out_of_order_entries` counter is incremented
* Not in the array at all, it is checked against a separate list of received
logs to either increment the `duplicate_entries` counter or the
`unexpected_entries` counter.
In the background, loki-canary also runs a timer which iterates through all the
entries in the internal array; if any are older than the duration specified by
the `-wait` flag (default 60s), they are removed from the array and the
`websocket_missing_entries` counter is incremented. Then an additional query is
made directly to Loki for these missing entries to determine if they were
actually missing or just didn't make it down the websocket. If they are not
found in the follow-up query, the `missing_entries` counter is incremented.
## Installation
### Binary
Loki Canary is provided as a pre-compiled binary as part of the
[Releases](https://github.com/grafana/loki/releases) on GitHub.
### Docker
Loki Canary is also provided as a Docker container image:
```bash
# change tag to the most recent release
$ docker pull grafana/loki-canary:v0.2.0
```
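A hedged sketch of running the container directly, assuming the image's
entrypoint is the loki-canary binary and a Loki instance is reachable as
`loki:3100`:

```bash
# Illustrative only: expose the metrics port and point the canary at Loki.
$ docker run -p 3500:3500 grafana/loki-canary:v0.2.0 \
    -addr=loki:3100 -labelvalue="my-canary-1"
```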
### Kubernetes
To run on Kubernetes, you can do something simple like:
`kubectl run loki-canary --generator=run-pod/v1 --image=grafana/loki-canary:latest --restart=Never --image-pull-policy=IfNotPresent --labels=name=loki-canary -- -addr=loki:3100`
Or you can do something more complex, like deploying it as a DaemonSet; there is
a ksonnet setup for this in the `production` folder, which you can import using
jsonnet-bundler:
```shell
jb install github.com/grafana/loki-canary/production/ksonnet/loki-canary
```
Then in your ksonnet environment's `main.jsonnet` you'll want something like
this:
```jsonnet
local loki_canary = import 'loki-canary/loki-canary.libsonnet';
loki_canary {
loki_canary_args+:: {
addr: "loki:3100",
port: 80,
labelname: "instance",
interval: "100ms",
size: 1024,
wait: "3m",
},
_config+:: {
namespace: "default",
}
}
```
### From Source
If the other options are not sufficient for your use-case, you can compile
`loki-canary` yourself:
```bash
# clone the source tree
$ git clone https://github.com/grafana/loki
# build the binary
$ make loki-canary
# (optionally build the container image)
$ make loki-canary-image
```
## Configuration
It is required to pass in the Loki address with the `-addr` flag. If your server
uses TLS, also pass `-tls=true` (this will create a `wss://` instead of a `ws://`
connection).
You should also pass the `-labelname` and `-labelvalue` flags; these are used by
loki-canary to filter the log stream to only process logs for this instance of
loki-canary, so they must be unique for each of your loki-canary instances. The
ksonnet config in this project accomplishes this by passing in the pod name as
the labelvalue, as shown in the sketch below.
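As a hedged sketch (the flag values are illustrative; see the full list of
options below), a bare-metal invocation might look like:

```bash
# Illustrative only: run loki-canary against a Loki instance at loki:3100,
# using the machine hostname as the unique label value.
$ loki-canary -addr=loki:3100 -labelname=name -labelvalue="$(hostname)" \
    -interval=1s -size=100 -wait=60s
```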
If you get a high number of `unexpected_entries` you may not be waiting long
enough and should increase `-wait` from 60s to something larger.
__Be cognizant__ of the relationship between `pruneinterval` and the `interval`.
For example, with an interval of 10ms (100 logs per second) and a prune interval
of 60s, you will write 6000 logs per minute. If those logs were not received
over the websocket, the canary will attempt to query Loki directly to see if
they are completely lost. __However__, the query return is limited to 1000
results, so you will not be able to return all the logs even if they did make it
to Loki.
__Likewise__, if you lower the `pruneinterval` you risk causing a denial of
service attack as all your canaries attempt to query for missing logs at
whatever interval `pruneinterval` is set to.
All options:
```nohighlight
-addr string
The Loki server URL:Port, e.g. loki:3100
-buckets int
Number of buckets in the response_latency histogram (default 10)
-interval duration
Duration between log entries (default 1s)
-labelname string
The label name for this instance of loki-canary to use in the log selector (default "name")
-labelvalue string
The unique label value for this instance of loki-canary to use in the log selector (default "loki-canary")
-pass string
Loki password
-port int
Port which loki-canary should expose metrics (default 3500)
-pruneinterval duration
Frequency to check sent vs received logs, also the frequency which queries for missing logs will be dispatched to loki (default 1m0s)
-size int
Size in bytes of each log line (default 100)
-tls
Does the loki connection use TLS?
-user string
Loki username
-wait duration
Duration to wait for log entries before reporting them lost (default 1m0s)
```

@ -1,42 +0,0 @@
---
kind: DaemonSet
apiVersion: extensions/v1beta1
metadata:
  labels:
    app: loki-canary
    name: loki-canary
  name: loki-canary
spec:
  template:
    metadata:
      name: loki-canary
      labels:
        app: loki-canary
    spec:
      containers:
      - args:
        - -addr=loki:3100
        image: grafana/loki-canary:latest
        imagePullPolicy: IfNotPresent
        name: loki-canary
        resources: {}
---
apiVersion: v1
kind: Service
metadata:
  name: loki-canary
  labels:
    app: loki-canary
spec:
  type: ClusterIP
  selector:
    app: loki-canary
  ports:
  - name: metrics
    protocol: TCP
    port: 3500
    targetPort: 3500

@ -1,36 +0,0 @@
---
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: loki-canary
    name: loki-canary
  name: loki-canary
spec:
  containers:
  - args:
    - -addr=loki:3100
    image: grafana/loki-canary:latest
    imagePullPolicy: IfNotPresent
    name: loki-canary
    resources: {}
---
apiVersion: v1
kind: Service
metadata:
  name: loki-canary
  labels:
    app: loki-canary
spec:
  type: ClusterIP
  selector:
    app: loki-canary
  ports:
  - name: metrics
    protocol: TCP
    port: 3500
    targetPort: 3500

Binary file not shown (89 KiB).

@ -0,0 +1,29 @@
# Loki Clients
Loki supports the following official clients for sending logs:
1. [Promtail](./promtail/README.md)
2. [Docker Driver](./docker-driver/README.md)
3. [Fluentd](./fluentd.md)
## Picking a Client
While all clients can be used simultaneously to cover multiple use cases, the
client you initially pick to send logs depends on your use case.
Promtail is the client of choice when you're running Kubernetes, as you can
configure it to automatically scrape logs from pods running on the same node
that Promtail runs on. Promtail and Prometheus running together in Kubernetes
enables powerful debugging: if Prometheus and Promtail use the same labels,
users can use tools like Grafana to switch between metrics and logs based on the
label set.
Promtail is also the client of choice on bare-metal: since it can be configured
to tail logs from all files given a host path, it is the easiest way to send
logs to Loki from plain-text files (e.g., things that log to `/var/log/*.log`).
When using Docker and not Kubernetes, the Docker Logging driver should be used,
as it automatically adds labels appropriate to the running container.
The Fluentd plugin is ideal when you already have Fluentd deployed and you don't
need the service discovery capabilities of Promtail.

@ -0,0 +1,54 @@
# Docker Driver Client
Loki officially supports a Docker plugin that will read logs from Docker
containers and ship them to Loki. The plugin can be configured to send the logs
to a private Loki instance or [Grafana Cloud](https://grafana.com/oss/loki).
> Docker plugins are not yet supported on Windows; see the
> [Docker docs](https://docs.docker.com/engine/extend) for more information.
Documentation on configuring the Loki Docker Driver can be found on the
[configuration page](./configuration.md).
## Installing
The Docker plugin must be installed on each Docker host that will be running
containers you want to collect logs from.
Run the following command to install the plugin:
```bash
docker plugin install grafana/loki-docker-driver:latest --alias loki \
  --grant-all-permissions
```
To check installed plugins, use the `docker plugin ls` command. Plugins that
have started successfully are listed as enabled:
```
$ docker plugin ls
ID             NAME    DESCRIPTION           ENABLED
ac720b8fcfdb   loki    Loki Logging Driver   true
```
Once the plugin is installed it can be [configured](./configuration.md).
## Upgrading
The upgrade process involves disabling the existing plugin, upgrading, and then
re-enabling:
```bash
docker plugin disable loki
docker plugin upgrade loki grafana/loki-docker-driver:master
docker plugin enable loki
```
## Uninstalling
To cleanly uninstall the plugin, disable and remove it:
```bash
docker plugin disable loki
docker plugin rm loki
```

@ -0,0 +1,152 @@
# Configuring the Docker Driver
The Docker daemon on each machine has a default logging driver and
each container will use the default driver unless configured otherwise.
## Change the logging driver for a container
The `docker run` command can be configured to use a different logging driver
than the Docker daemon's default with the `--log-driver` flag. Any options that
the logging driver supports can be set using the `--log-opt <NAME>=<VALUE>` flag.
`--log-opt` can be passed multiple times for each option to be set.
The following command will start Grafana in a container and send logs to Grafana
Cloud, using a batch size of 400 entries and no more than 5 retries if a send
fails.
```bash
docker run --log-driver=loki \
--log-opt loki-url="https://<user_id>:<password>@logs-us-west1.grafana.net/loki/api/v1/push" \
--log-opt loki-retries=5 \
--log-opt loki-batch-size=400 \
grafana/grafana
```
> **Note**: The Loki logging driver still uses the json-log driver in
> combination with sending logs to Loki. This is mainly useful to keep the
> `docker logs` command working. You can adjust file size and rotation
> using the respective log options `max-size` and `max-file`.
## Change the default logging driver
If you want the Loki logging driver to be the default for all containers,
change Docker's `daemon.json` file (located in `/etc/docker` on Linux) and set
the value of `log-driver` to `loki`:
```json
{
"debug": true,
"log-driver": "loki"
}
```
Options for the logging driver can also be configured with `log-opts` in the
`daemon.json`:
```json
{
"debug" : true,
"log-driver": "loki",
"log-opts": {
"loki-url": "https://<user_id>:<password>@logs-us-west1.grafana.net/loki/api/v1/push",
"loki-batch-size": "400"
}
}
```
> **Note**: log-opt configuration options in daemon.json must be provided as
> strings. Boolean and numeric values (such as the value for loki-batch-size in
> the example above) must therefore be enclosed in quotes (`"`).
After changing `daemon.json`, restart the Docker daemon for the changes to take
effect. All containers from that host will then send logs to Loki.
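For example (a hedged sketch assuming a systemd-based Linux host), the daemon
can be restarted with:

```bash
# Restart the Docker daemon so the new default logging driver takes effect.
$ sudo systemctl restart docker
```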
## Configure the logging driver for a Swarm service or Compose
You can also configure the logging driver for a [swarm
service](https://docs.docker.com/engine/swarm/how-swarm-mode-works/services/)
directly in your compose file. This also applies for `docker-compose`:
```yaml
version: "3.7"
services:
logger:
image: grafana/grafana
logging:
driver: loki
options:
loki-url: "https://<user_id>:<password>@logs-us-west1.grafana.net/loki/api/v1/push"
```
You can then deploy your stack using:
```bash
docker stack deploy my_stack_name --compose-file docker-compose.yaml
```
Or with `docker-compose`:
```bash
docker-compose -f docker-compose.yaml up
```
Once deployed, the Grafana service will send its logs to Loki.
> **Note**: stack name and service name for each swarm service and project name
> and service name for each compose service are automatically discovered and
> sent as Loki labels; this way you can filter by them in Grafana.
## Labels
By default, the Docker driver will add the following labels to each log line:
- `filename`: where the log is written to on disk
- `host`: the hostname where the log has been generated
- `container_name`: the name of the container generating logs
- `swarm_stack`, `swarm_service`: added when deploying from Docker Swarm.
Custom labels can be added using the `loki-external-labels`,
`loki-pipeline-stage-file`, `labels`, `env`, and `env-regex` options. See the
next section for all supported options.
## Supported log-opt options
The following are the options supported by the Loki logging driver:
| Option | Required? | Default Value | Description
| ------------------------------- | :-------: | :------------------------: | -------------------------------------- |
| `loki-url` | Yes | | Loki HTTP push endpoint.
| `loki-external-labels` | No | `container_name={{.Name}}` | Additional label value pairs separated by `,` to send with logs. The value is expanded with the [Docker tag template format](https://docs.docker.com/engine/admin/logging/log_tags/). (e.g.,: `container_name={{.ID}}.{{.Name}},cluster=prod`)
| `loki-timeout` | No | `10s` | The timeout to use when sending logs to the Loki instance. Valid time units are "ns", "us" (or "µs"), "ms", "s", "m", "h".
| `loki-batch-wait` | No | `1s` | The amount of time to wait before sending a log batch, whether or not it is complete. Valid time units are "ns", "us" (or "µs"), "ms", "s", "m", "h".
| `loki-batch-size` | No | `102400` | The maximum size of a log batch to send.
| `loki-min-backoff` | No | `100ms` | The minimum amount of time to wait before retrying a batch. Valid time units are "ns", "us" (or "µs"), "ms", "s", "m", "h".
| `loki-max-backoff` | No | `10s` | The maximum amount of time to wait before retrying a batch. Valid time units are "ns", "us" (or "µs"), "ms", "s", "m", "h".
| `loki-retries` | No | `10` | The maximum number of retries for a log batch.
| `loki-pipeline-stage-file` | No | | The location of a pipeline stage configuration file. Pipeline stages allow parsing log lines to extract additional labels. [See the Promtail documentation for more info.](../promtail/pipelines.md)
| `loki-tls-ca-file` | No | | Set the path to a custom certificate authority.
| `loki-tls-cert-file` | No | | Set the path to a client certificate file.
| `loki-tls-key-file` | No | | Set the path to a client key.
| `loki-tls-server-name` | No | | Name used to validate the server certificate.
| `loki-tls-insecure-skip-verify` | No | `false` | Allow skipping TLS verification.
| `loki-proxy-url` | No | | Proxy URL used to connect to Loki.
| `max-size` | No | -1 | The maximum size of the log before it is rolled. A positive integer plus a modifier representing the unit of measure (k, m, or g). Defaults to -1 (unlimited). This is used by the json-log driver, which is required to keep the `docker logs` command working.
| `max-file` | No | 1 | The maximum number of log files that can be present. If rolling the logs creates excess files, the oldest file is removed. Only effective when max-size is also set. A positive integer. Defaults to 1.
| `labels` | No | | Comma-separated list of label keys that should be included in the message, if these labels are specified for the container.
| `env` | No | | Comma-separated list of environment variable keys to be included in the message if they are specified for the container.
| `env-regex` | No | | A regular expression to match logging-related environment variables. Used for advanced log label options. If there is a collision between the label and env keys, the value of the env takes precedence. Both options add additional fields to the labels of a logging message.
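As an illustrative sketch combining several of these options (the label values
and limits are arbitrary):

```bash
docker run --log-driver=loki \
    --log-opt loki-url="https://<user_id>:<password>@logs-us-west1.grafana.net/loki/api/v1/push" \
    --log-opt loki-external-labels="container_name={{.Name}},cluster=dev" \
    --log-opt max-size=50m \
    --log-opt max-file=3 \
    grafana/grafana
```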
## Troubleshooting
Plugin logs can be found in the Docker daemon log. To enable debug mode, refer to the
[Docker daemon documentation](https://docs.docker.com/config/daemon/).
The standard output (`stdout`) of a plugin is redirected to Docker logs. Such
entries are prefixed with `plugin=`.
To find out the plugin ID of the Loki logging driver, use `docker plugin ls` and
look for the `loki` entry.
Depending on your system, the location of the Docker daemon logs may vary. Refer
to the [Docker documentation for the Docker daemon](https://docs.docker.com/config/daemon/)
for the log location on your specific platform.
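As an illustrative sketch (assuming a systemd-based host where the daemon logs
go to journald), the plugin's entries can be filtered like this:

```bash
# Show Docker daemon log entries emitted by logging-driver plugins.
$ journalctl -u docker.service | grep "plugin="
```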

@ -0,0 +1,179 @@
# Fluentd
Loki has a [Fluentd](https://fluentd.org/) output plugin called
`fluent-plugin-grafana-loki` that enables shipping logs to a private Loki
instance or [Grafana Cloud](https://grafana.com/oss/loki).
The plugin offers two line formats and uses protobuf to send compressed data to
Loki.
Key features:
* `extra_labels`: Labels to be added to every line of a log file, useful for
designating environments
* `label_keys`: Customizable list of keys for stream labels
* `line_format`: Format to use when flattening the record to a log line (`json`
or `key_value`).
## Installation
```bash
$ gem install fluent-plugin-grafana-loki
```
## Usage
In your Fluentd configuration, use `@type loki`. Additional configuration is
optional. Default values look like this:
```
<match **>
@type loki
url "https://logs-us-west1.grafana.net"
username "#{ENV['LOKI_USERNAME']}"
password "#{ENV['LOKI_PASSWORD']}"
extra_labels {"env":"dev"}
flush_interval 10s
flush_at_shutdown true
buffer_chunk_limit 1m
</match>
```
### Multi-worker usage
Loki doesn't currently support out-of-order inserts - if you try to insert a log
entry with an earlier timestamp after a log entry with identical labels but
a later timestamp, the insert will fail with the message
`HTTP status code: 500, message: rpc error: code = Unknown desc = Entry out of
order`. Therefore, in order to use this plugin in a multi-worker Fluentd setup,
you'll need to include the worker ID in the labels sent to Loki.
For example, using
[fluent-plugin-record-modifier](https://github.com/repeatedly/fluent-plugin-record-modifier):
```
<filter mytag>
@type record_modifier
<record>
fluentd_worker "#{worker_id}"
</record>
</filter>
<match mytag>
@type loki
# ...
label_keys "fluentd_worker"
# ...
</match>
```
## Docker Image
There is a Docker image `grafana/fluent-plugin-grafana-loki:master` which
contains default configuration files to ship log information from a host's
`/var/log` directory and from the host's journald. To use it, you can set
the `LOKI_URL`, `LOKI_USERNAME`, and `LOKI_PASSWORD` environment variables
(`LOKI_USERNAME` and `LOKI_PASSWORD` can be left blank if Loki is not protected
behind an authenticating proxy).
An example Docker Swarm Compose configuration looks like:
```yaml
services:
fluentd:
image: grafana/fluent-plugin-grafana-loki:master
command:
- "fluentd"
- "-v"
- "-p"
- "/fluentd/plugins"
environment:
LOKI_URL: http://loki:3100
LOKI_USERNAME:
LOKI_PASSWORD:
deploy:
mode: global
configs:
- source: loki_config
target: /fluentd/etc/loki/loki.conf
networks:
- loki
volumes:
- host_logs:/var/log
# Needed for journald log ingestion:
- /etc/machine-id:/etc/machine-id
- /dev/log:/dev/log
- /var/run/systemd/journal/:/var/run/systemd/journal/
logging:
options:
tag: infra.monitoring
```
## Configuration
### Proxy Support
Starting with version 0.8.0, this gem uses `excon`, which supports proxies
configured through environment variables. See
[excon's proxy support](https://github.com/excon/excon#proxy-support) for details.
### `url`
The URL of the Loki server to send logs to. When sending data the publish path
(`/loki/api/v1/push`) will automatically be appended. By default the URL is set to
`https://logs-us-west1.grafana.net`, the URL of the Grafana Labs [hosted
Loki](https://grafana.com/loki) service.
### `username` / `password`
Specify a username and password if the Loki server requires authentication.
If using Grafana Labs' hosted Loki, the username needs to be set to your
instance ID and the password should be a Grafana.com API key.
### `tenant`
Loki is a multi-tenant log storage platform and all requests sent must include a
tenant. For some installations (like Hosted Loki) the tenant will be set
automatically by an authenticating proxy. Otherwise you can define a tenant to
be passed through. The tenant can be any string value.
### output format
Loki is intended to index and group log streams using only a small set of
labels and is not intended for full-text indexing. When sending logs to Loki,
the majority of the log message will be sent as a single log "line".
There are a few configuration settings to control the output format:
- `extra_labels`: (default: nil) set of labels to include with every Loki
stream. (e.g., `{"env":"dev", "datacenter": "dc1"}`)
- `remove_keys`: (default: nil) comma separated list of record keys to
remove. All other keys will be placed into the log line.
- `label_keys`: (default: "job,instance") comma separated list of keys to use as
stream labels.
- `line_format`: format to use when flattening the record to a log line. Valid
values are `json` or `key_value`. If set to `json` the log line sent to Loki
will be the fluentd record (excluding any keys extracted out as labels) dumped
as json. If set to `key_value`, the log line will be each item in the record
concatenated together (separated by a single space) in the format
`<key>=<value>`.
- `drop_single_key`: if set to true and only a single key remains in the record
  after extracting `label_keys` and dropping `remove_keys`, the log line sent to
  Loki will just be the value of that remaining key.
### Buffer options
`fluent-plugin-grafana-loki` extends [Fluentd's builtin Output
plugin](https://docs.fluentd.org/v1.0/articles/output-plugin-overview) and uses
the `compat_parameters` plugin helper. It adds the following options:
```
buffer_type memory
flush_interval 10s
retry_limit 17
retry_wait 1.0
num_threads 1
```

@ -0,0 +1,83 @@
# Promtail
Promtail is an agent which ships the contents of local logs to a private Loki
instance or [Grafana Cloud](https://grafana.com/oss/loki). It is usually
deployed to every machine that runs applications which need to be monitored.
It primarily:
1. Discovers targets
2. Attaches labels to log streams
3. Pushes them to the Loki instance.
Currently, Promtail can tail logs from two sources: local log files and the
systemd journal (on AMD64 machines only).
## Log File Discovery
Before Promtail can ship any data from log files to Loki, it needs to find out
information about its environment. Specifically, this means discovering
applications emitting log lines to files that need to be monitored.
Promtail borrows the same
[service discovery mechanism from Prometheus](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config),
although it currently only supports `static` and `kubernetes` service
discovery. This limitation is due to the fact that `promtail` is deployed as a
daemon to every local machine and, as such, does not discover labels from other
machines. `kubernetes` service discovery fetches required labels from the
Kubernetes API server while `static` usually covers all other use cases.
Just like Prometheus, `promtail` is configured using a `scrape_configs` stanza.
`relabel_configs` allows for fine-grained control of what to ingest, what to
drop, and the final metadata to attach to the log line. Refer to the docs for
[configuring Promtail](configuration.md) for more details.
## Labeling and Parsing
During service discovery, metadata is determined (pod name, filename, etc.) that
may be attached to the log line as a label for easier identification when
querying logs in Loki. Through `relabel_configs`, discovered labels can be
mutated into the desired form.
To allow more sophisticated filtering afterwards, Promtail allows setting labels
not only from service discovery, but also based on the contents of each log
line. The `pipeline_stages` can be used to add or update labels, correct the
timestamp, or re-write log lines entirely. Refer to the documentation for
[pipelines](pipelines.md) for more details.
## Shipping
Once Promtail has a set of targets (i.e., things to read from, like files) and
all labels are set correctly, it will start tailing (continuously reading) the
logs from targets. Once enough data is read into memory or after a configurable
timeout, it is flushed as a single batch to Loki.
As Promtail reads data from sources (files and systemd journal, if configured),
it will track the last offset it read in a positions file. By default, the
positions file is stored at `/var/log/positions.yaml`. The positions file helps
Promtail continue reading from where it left off in the case of the Promtail
instance restarting.
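As an illustration, the positions file is a small YAML map from file path to the last read offset; a sketch of its contents might look roughly like the following (the paths and offsets are examples only):
```yaml
# /var/log/positions.yaml (illustrative contents)
positions:
  /var/log/syslog: "104857"
  /var/log/nginx/access.log: "5242880"
```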
## API
Promtail features an embedded web server exposing a web console at `/` and the following API endpoints:
### `GET /ready`
This endpoint returns 200 when Promtail is up and running, and there's at least one working target.
### `GET /metrics`
This endpoint returns Promtail metrics for Prometheus. See
"[Operations > Observability](../../operations/observability.md)" to get a list
of exported metrics.
### Promtail web server config
The web server exposed by Promtail can be configured in the Promtail `.yaml` config file:
```yaml
server:
http_listen_host: 127.0.0.1
http_listen_port: 9080
```

@ -0,0 +1,949 @@
# Configuring Promtail
Promtail is configured in a YAML file (usually referred to as `config.yaml`)
which contains information on the Promtail server, where positions are stored,
and how to scrape logs from files.
* [Configuration File Reference](#configuration-file-reference)
* [server_config](#server_config)
* [client_config](#client_config)
* [position_config](#position_config)
* [scrape_config](#scrape_config)
* [pipeline_stages](#pipeline_stages)
* [regex_stage](#regex_stage)
* [json_stage](#json_stage)
* [template_stage](#template_stage)
* [match_stage](#match_stage)
* [timestamp_stage](#timestamp_stage)
* [output_stage](#output_stage)
* [labels_stage](#labels_stage)
* [metrics_stage](#metrics_stage)
* [metric_counter](#metric_counter)
* [metric_gauge](#metric_gauge)
* [metric_histogram](#metric_histogram)
* [journal_config](#journal_config)
* [relabel_config](#relabel_config)
* [static_config](#static_config)
* [file_sd_config](#file_sd_config)
* [kubernetes_sd_config](#kubernetes_sd_config)
* [target_config](#target_config)
* [Example Docker Config](#example-docker-config)
* [Example Journal Config](#example-journal-config)
## Configuration File Reference
To specify which configuration file to load, pass the `-config.file` flag at the
command line. The file is written in [YAML format](https://en.wikipedia.org/wiki/YAML),
defined by the schema below. Brackets indicate that a parameter is optional. For
non-list parameters the value is set to the specified default.
For more detailed information on configuring how to discover and scrape logs from
targets, see [Scraping](scraping.md). For more information on transforming logs
from scraped targets, see [Pipelines](pipelines.md).
Generic placeholders are defined as follows:
* `<boolean>`: a boolean that can take the values `true` or `false`
* `<int>`: any integer matching the regular expression `[1-9]+[0-9]*`
* `<duration>`: a duration matching the regular expression `[0-9]+(ms|[smhdwy])`
* `<labelname>`: a string matching the regular expression `[a-zA-Z_][a-zA-Z0-9_]*`
* `<labelvalue>`: a string of Unicode characters
* `<filename>`: a valid path relative to current working directory or an
absolute path.
* `<host>`: a valid string consisting of a hostname or IP followed by an optional port number
* `<string>`: a regular string
* `<secret>`: a regular string that is a secret, such as a password
Supported contents and default values of `config.yaml`:
```yaml
# Configures the server for Promtail.
[server: <server_config>]
# Describes how promtail connects to multiple instances
# of Loki, sending logs to each.
clients:
- [<client_config>]
# Describes how to save read file offsets to disk
[positions: <position_config>]
scrape_configs:
- [<scrape_config>]
# Configures how tailed targets will be watched.
[target_config: <target_config>]
```
## server_config
The `server_config` block configures Promtail's behavior as an HTTP server:
```yaml
# HTTP server listen host
[http_listen_host: <string>]
# HTTP server listen port
[http_listen_port: <int> | default = 80]
# gRPC server listen host
[grpc_listen_host: <string>]
# gRPC server listen port
[grpc_listen_port: <int> | default = 9095]
# Register instrumentation handlers (/metrics, etc.)
[register_instrumentation: <boolean> | default = true]
# Timeout for graceful shutdowns
[graceful_shutdown_timeout: <duration> | default = 30s]
# Read timeout for HTTP server
[http_server_read_timeout: <duration> | default = 30s]
# Write timeout for HTTP server
[http_server_write_timeout: <duration> | default = 30s]
# Idle timeout for HTTP server
[http_server_idle_timeout: <duration> | default = 120s]
# Max gRPC message size that can be received
[grpc_server_max_recv_msg_size: <int> | default = 4194304]
# Max gRPC message size that can be sent
[grpc_server_max_send_msg_size: <int> | default = 4194304]
# Limit on the number of concurrent streams for gRPC calls (0 = unlimited)
[grpc_server_max_concurrent_streams: <int> | default = 100]
# Log only messages with the given severity or above. Supported values [debug,
# info, warn, error]
[log_level: <string> | default = "info"]
# Base path to serve all API routes from (e.g., /v1/).
[http_path_prefix: <string>]
```
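For example, a small `server` block that only overrides the HTTP port, sets the gRPC port to 0 (as the example configs at the end of this document do), and raises the log level might look like:
```yaml
server:
  http_listen_port: 9080
  grpc_listen_port: 0
  log_level: debug
```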
## client_config
The `client_config` block configures how Promtail connects to an instance of
Loki:
```yaml
# The URL where Loki is listening, denoted in Loki as http_listen_host and
# http_listen_port. If Loki is running in microservices mode, this is the HTTP
# URL for the Distributor.
url: <string>
# Maximum amount of time to wait before sending a batch, even if that
# batch isn't full.
[batchwait: <duration> | default = 1s]
# Maximum batch size (in bytes) of logs to accumulate before sending
# the batch to Loki.
[batchsize: <int> | default = 102400]
# If using basic auth, configures the username and password
# sent.
basic_auth:
# The username to use for basic auth
[username: <string>]
# The password to use for basic auth
[password: <string>]
# The file containing the password for basic auth
[password_file: <filename>]
# Bearer token to send to the server.
[bearer_token: <secret>]
# File containing bearer token to send to the server.
[bearer_token_file: <filename>]
# HTTP proxy server to use to connect to the server.
[proxy_url: <string>]
# If connecting to a TLS server, configures how the TLS
# authentication handshake will operate.
tls_config:
# The CA file to use to verify the server
[ca_file: <string>]
# The cert file to send to the server for client auth
[cert_file: <filename>]
# The key file to send to the server for client auth
[key_file: <filename>]
# Validates that the server name in the server's certificate
# is this value.
[server_name: <string>]
# If true, ignores the server certificate being signed by an
# unknown CA.
[insecure_skip_verify: <boolean> | default = false]
# Configures how to retry requests to Loki when a request
# fails.
backoff_config:
# Initial backoff time between retries
[minbackoff: <duration> | default = 100ms]
# Maximum backoff time between retries
[maxbackoff: <duration> | default = 10s]
# Maximum number of retries to do
[maxretries: <int> | default = 10]
# Static labels to add to all logs being sent to Loki.
# Use map like {"foo": "bar"} to add a label foo with
# value bar.
external_labels:
[ <labelname>: <labelvalue> ... ]
# Maximum time to wait for a server to respond to a request
[timeout: <duration> | default = 10s]
```
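Putting several of these options together, a sketch of a `clients` entry using basic auth and an explicit backoff policy could look like the following (the hostname, credentials file, and label are placeholders):
```yaml
clients:
  - url: http://loki.example.com:3100/loki/api/v1/push
    batchwait: 1s
    batchsize: 102400
    basic_auth:
      username: promtail
      password_file: /etc/promtail/loki-password
    backoff_config:
      minbackoff: 100ms
      maxbackoff: 10s
      maxretries: 10
    external_labels:
      cluster: example
```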
## position_config
The `position_config` block configures where Promtail will save a file
indicating how far it has read into a file. It is needed for when Promtail
is restarted to allow it to continue from where it left off.
```yaml
# Location of positions file
[filename: <string> | default = "/var/log/positions.yaml"]
# How often to update the positions file
[sync_period: <duration> | default = 10s]
```
## scrape_config
The `scrape_config` block configures how Promtail can scrape logs from a series
of targets using a specified discovery method:
```yaml
# Name to identify this scrape config in the promtail UI.
job_name: <string>
# Describes how to parse log lines. Supported values [cri docker raw]
[entry_parser: <string> | default = "docker"]
# Describes how to transform logs from targets.
[pipeline_stages: <pipeline_stages>]
# Describes how to scrape logs from the journal.
[journal: <journal_config>]
# Describes how to relabel targets to determine if they should
# be processed.
relabel_configs:
- [<relabel_config>]
# Static targets to scrape.
static_configs:
- [<static_config>]
# Files containing targets to scrape.
file_sd_configs:
- [<file_sd_configs>]
# Describes how to discover Kubernetes services running on the
# same host.
kubernetes_sd_configs:
- [<kubernetes_sd_config>]
```
### pipeline_stages
The [pipeline](./pipelines.md) stages (`pipeline_stages`) is used to transform
log entries and their labels after discovery. It is simply an array of various
stages, defined below.
The purpose of most stages is to extract fields and values into a temporary
set of key-value pairs that is passed around from stage to stage.
```yaml
- [
<regex_stage> |
<json_stage> |
<template_stage> |
<match_stage> |
<timestamp_stage> |
<output_stage> |
<labels_stage> |
<metrics_stage>
]
```
Example:
```yaml
pipeline_stages:
- regex:
expression: "./*"
- json:
timestamp:
source: time
format: RFC3339
labels:
stream:
source: json_key_name.json_sub_key_name
output:
```
#### regex_stage
The Regex stage takes a regular expression and extracts captured named groups to
be used in further stages.
```yaml
regex:
# The RE2 regular expression. Each capture group must be named.
expression: <string>
# Name from extracted data to parse. If empty, uses the log message.
[source: <string>]
```
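For instance, a regex stage that pulls a client IP and an HTTP status code out of an access-log style line could be written as below; the capture group names (`ip`, `status`) are arbitrary and become keys in the extracted data:
```yaml
- regex:
    expression: '^(?P<ip>\S+) .* (?P<status>\d{3})$'
```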
#### json_stage
The JSON stage parses a log line as JSON and takes
[JMESPath](http://jmespath.org/) expressions to extract data from the JSON to be
used in further stages.
```yaml
json:
# Set of key/value pairs of JMESPath expressions. The key will be
# the key in the extracted data while the expression will be the value,
# evaluated as a JMESPath from the source data.
expressions:
[ <string>: <string> ... ]
# Name from extracted data to parse. If empty, uses the log message.
[source: <string>]
```
#### template_stage
The template stage uses Go's
[`text/template`](https://golang.org/pkg/text/template) language to manipulate
values.
```yaml
template:
# Name from extracted data to parse. If the key in the extracted data doesn't
# exist, an entry for it will be created.
source: <string>
# Go template string to use. In addition to normal template
# functions, ToLower, ToUpper, Replace, Trim, TrimLeft, TrimRight,
# TrimPrefix, TrimSuffix, and TrimSpace are available as functions.
template: <string>
```
Example:
```yaml
template:
source: level
template: '{{ if eq .Value "WARN" }}{{ Replace .Value "WARN" "OK" -1 }}{{ else }}{{ .Value }}{{ end }}'
```
#### match_stage
The match stage conditionally executes a set of stages when a log entry matches
a configurable [LogQL](../../logql.md) stream selector.
```yaml
match:
# LogQL stream selector.
selector: <string>
# Names the pipeline. When defined, creates an additional label in
# the pipeline_duration_seconds histogram, where the value is
# concatenated with job_name using an underscore.
[pipeline_name: <string>]
# Nested set of pipeline stages only if the selector
# matches the labels of the log entries:
stages:
- [
<regex_stage> |
<json_stage> |
<template_stage> |
<match_stage> |
<timestamp_stage> |
<output_stage> |
<labels_stage> |
<metrics_stage>
]
```
#### timestamp_stage
The timestamp stage parses data from the extracted map and overrides the final
time value of the log that is stored by Loki. If this stage isn't present,
Promtail will associate the timestamp of the log entry with the time that
log entry was read.
```yaml
timestamp:
# Name from extracted data to use for the timestamp.
source: <string>
# Determines how to parse the time string. Can use
# pre-defined formats by name: [ANSIC UnixDate RubyDate RFC822
# RFC822Z RFC850 RFC1123 RFC1123Z RFC3339 RFC3339Nano Unix
# UnixMs UnixNs].
format: <string>
# IANA Timezone Database string.
[location: <string>]
```
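For example, to promote a `time` value produced by an earlier parsing stage to the entry's timestamp, parsing it as RFC3339Nano (mirroring the pipeline example earlier in this document):
```yaml
- timestamp:
    source: time
    format: RFC3339Nano
```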
##### output_stage
The output stage takes data from the extracted map and sets the contents of the
log entry that will be stored by Loki.
```yaml
output:
# Name from extracted data to use for the log entry.
source: <string>
```
#### labels_stage
The labels stage takes data from the extracted map and sets additional labels
on the log entry that will be sent to Loki.
```yaml
labels:
# Key is REQUIRED and the name for the label that will be created.
# Value is optional and will be the name from extracted data whose value
# will be used for the value of the label. If empty, the value will be
# inferred to be the same as the key.
[ <string>: [<string>] ... ]
```
#### metrics_stage
The metrics stage allows for defining metrics from the extracted data.
Created metrics are not pushed to Loki and are instead exposed via Promtail's
`/metrics` endpoint. Prometheus should be configured to scrape Promtail to be
able to retrieve the metrics configured by this stage.
```yaml
# A map where the key is the name of the metric and the value is a specific
# metric type.
metrics:
[<string>: [ <metric_counter> | <metric_gauge> | <metric_histogram> ] ...]
```
##### metric_counter
Defines a counter metric whose value only goes up.
```yaml
# The metric type. Must be Counter.
type: Counter
# Describes the metric.
[description: <string>]
# Key from the extracted data map to use for the metric,
# defaulting to the metric's name if not present.
[source: <string>]
config:
# Filters down source data and only changes the metric
# if the targeted value exactly matches the provided string.
# If not present, all data will match.
[value: <string>]
# Must be either "inc" or "add" (case insensitive). If
# inc is chosen, the metric value will increase by 1 for each
# log line received that passed the filter. If add is chosen,
# the extracted value must be convertible to a positive float
# and its value will be added to the metric.
action: <string>
```
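As a sketch, a counter that increments by one whenever the extracted `level` value is exactly `error` could be defined like this (the metric and field names are illustrative):
```yaml
metrics:
  error_lines_total:
    type: Counter
    description: "count of log lines with level=error"
    source: level
    config:
      value: error
      action: inc
```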
##### metric_gauge
Defines a gauge metric whose value can go up or down.
```yaml
# The metric type. Must be Gauge.
type: Gauge
# Describes the metric.
[description: <string>]
# Key from the extracted data map to use for the metric,
# defaulting to the metric's name if not present.
[source: <string>]
config:
# Filters down source data and only changes the metric
# if the targeted value exactly matches the provided string.
# If not present, all data will match.
[value: <string>]
# Must be either "set", "inc", "dec"," add", or "sub". If
# add, set, or sub is chosen, the extracted value must be
# convertible to a positive float. inc and dec will increment
# or decrement the metric's value by 1 respectively.
action: <string>
```
##### metric_histogram
Defines a histogram metric whose values are bucketed.
```yaml
# The metric type. Must be Histogram.
type: Histogram
# Describes the metric.
[description: <string>]
# Key from the extracted data map to use for the metric,
# defaulting to the metric's name if not present.
[source: <string>]
config:
# Filters down source data and only changes the metric
# if the targeted value exactly matches the provided string.
# If not present, all data will match.
[value: <string>]
# Must be either "inc" or "add" (case insensitive). If
# inc is chosen, the metric value will increase by 1 for each
# log line received that passed the filter. If add is chosen,
# the extracted value must be convertible to a positive float
# and its value will be added to the metric.
action: <string>
# Holds all the numbers in which to bucket the metric.
buckets:
- <int>
```
### journal_config
The `journal_config` block configures reading from the systemd journal from
Promtail. Requires a build of Promtail that has journal support _enabled_. If
using the AMD64 Docker image, this is enabled by default.
```yaml
# The oldest relative time from process start that will be read
# and sent to Loki.
[max_age: <duration> | default = 7h]
# Label map to add to every log coming out of the journal
labels:
[ <labelname>: <labelvalue> ... ]
# Path to a directory to read entries from. Defaults to system
# path when empty.
[path: <string>]
```
### relabel_config
Relabeling is a powerful tool to dynamically rewrite the label set of a target
before it gets scraped. Multiple relabeling steps can be configured per scrape
configuration. They are applied to the label set of each target in order of
their appearance in the configuration file.
After relabeling, the `instance` label is set to the value of `__address__` by
default if it was not set during relabeling. The `__scheme__` and
`__metrics_path__` labels are set to the scheme and metrics path of the target
respectively. The `__param_<name>` label is set to the value of the first passed
URL parameter called `<name>`.
Additional labels prefixed with `__meta_` may be available during the relabeling
phase. They are set by the service discovery mechanism that provided the target
and vary between mechanisms.
Labels starting with `__` will be removed from the label set after target
relabeling is completed.
If a relabeling step needs to store a label value only temporarily (as the
input to a subsequent relabeling step), use the `__tmp` label name prefix. This
prefix is guaranteed to never be used by Prometheus itself.
```yaml
# The source labels select values from existing labels. Their content is concatenated
# using the configured separator and matched against the configured regular expression
# for the replace, keep, and drop actions.
[ source_labels: '[' <labelname> [, ...] ']' ]
# Separator placed between concatenated source label values.
[ separator: <string> | default = ; ]
# Label to which the resulting value is written in a replace action.
# It is mandatory for replace actions. Regex capture groups are available.
[ target_label: <labelname> ]
# Regular expression against which the extracted value is matched.
[ regex: <regex> | default = (.*) ]
# Modulus to take of the hash of the source label values.
[ modulus: <uint64> ]
# Replacement value against which a regex replace is performed if the
# regular expression matches. Regex capture groups are available.
[ replacement: <string> | default = $1 ]
# Action to perform based on regex matching.
[ action: <relabel_action> | default = replace ]
```
`<regex>` is any valid
[RE2 regular expression](https://github.com/google/re2/wiki/Syntax). It is
required for the `replace`, `keep`, `drop`, `labelmap`, `labeldrop` and
`labelkeep` actions. The regex is anchored on both ends. To un-anchor the regex,
use `.*<regex>.*`.
`<relabel_action>` determines the relabeling action to take:
* `replace`: Match `regex` against the concatenated `source_labels`. Then, set
`target_label` to `replacement`, with match group references
(`${1}`, `${2}`, ...) in `replacement` substituted by their value. If `regex`
does not match, no replacement takes place.
* `keep`: Drop targets for which `regex` does not match the concatenated `source_labels`.
* `drop`: Drop targets for which `regex` matches the concatenated `source_labels`.
* `hashmod`: Set `target_label` to the `modulus` of a hash of the concatenated `source_labels`.
* `labelmap`: Match `regex` against all label names. Then copy the values of the matching labels
to label names given by `replacement` with match group references
(`${1}`, `${2}`, ...) in `replacement` substituted by their value.
* `labeldrop`: Match `regex` against all label names. Any label that matches will be
removed from the set of labels.
* `labelkeep`: Match `regex` against all label names. Any label that does not match will be
removed from the set of labels.
Care must be taken with `labeldrop` and `labelkeep` to ensure that logs are
still uniquely labeled once the labels are removed.
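For example, a hypothetical step that removes a noisy `pod_template_hash` label from every target might look like:
```yaml
relabel_configs:
  - action: labeldrop
    regex: pod_template_hash
```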
### static_config
A `static_config` allows specifying a list of targets and a common label set
for them. It is the canonical way to specify static targets in a scrape
configuration.
```yaml
# Configures the discovery to look on the current machine. Must be either
# localhost or the hostname of the current computer.
targets:
- localhost
# Defines a file to scrape and an optional set of additional labels to apply to
# all streams defined by the files from __path__.
labels:
# The path to load logs from. Can use glob patterns (e.g., /var/log/*.log).
__path__: <string>
# Additional labels to assign to the logs
[ <labelname>: <labelvalue> ... ]
```
### file_sd_config
File-based service discovery provides a more generic way to configure static
targets and serves as an interface to plug in custom service discovery
mechanisms.
It reads a set of files containing a list of zero or more
`<static_config>`s. Changes to all defined files are detected via disk watches
and applied immediately. Files may be provided in YAML or JSON format. Only
changes resulting in well-formed target groups are applied.
The JSON file must contain a list of static configs, using this format:
```yaml
[
{
"targets": [ "localhost" ],
"labels": {
"__path__": "<string>", ...
"<labelname>": "<labelvalue>", ...
}
},
...
]
```
As a fallback, the file contents are also re-read periodically at the specified
refresh interval.
Each target has a meta label `__meta_filepath` during the
[relabeling phase](#relabel_config). Its value is set to the
filepath from which the target was extracted.
```yaml
# Patterns for files from which target groups are extracted.
files:
[ - <filename_pattern> ... ]
# Refresh interval to re-read the files.
[ refresh_interval: <duration> | default = 5m ]
```
Where `<filename_pattern>` may be a path ending in `.json`, `.yml` or `.yaml`.
The last path segment may contain a single `*` that matches any character
sequence, e.g. `my/path/tg_*.json`.
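For example, a target file matched by a pattern such as `my/path/tg_*.yaml` could contain the following sketch of a static config; the path and labels are placeholders:
```yaml
- targets:
    - localhost
  labels:
    job: app
    __path__: /var/log/app/*.log
```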
### kubernetes_sd_config
Kubernetes SD configurations allow retrieving scrape targets from
[Kubernetes'](https://kubernetes.io/) REST API and always staying synchronized
with the cluster state.
One of the following `role` types can be configured to discover targets:
#### `node`
The `node` role discovers one target per cluster node with the address
defaulting to the Kubelet's HTTP port.
The target address defaults to the first existing address of the Kubernetes
node object in the address type order of `NodeInternalIP`, `NodeExternalIP`,
`NodeLegacyHostIP`, and `NodeHostName`.
Available meta labels:
* `__meta_kubernetes_node_name`: The name of the node object.
* `__meta_kubernetes_node_label_<labelname>`: Each label from the node object.
* `__meta_kubernetes_node_labelpresent_<labelname>`: `true` for each label from the node object.
* `__meta_kubernetes_node_annotation_<annotationname>`: Each annotation from the node object.
* `__meta_kubernetes_node_annotationpresent_<annotationname>`: `true` for each annotation from the node object.
* `__meta_kubernetes_node_address_<address_type>`: The first address for each node address type, if it exists.
In addition, the `instance` label for the node will be set to the node name
as retrieved from the API server.
#### `service`
The `service` role discovers a target for each service port of each service.
This is generally useful for blackbox monitoring of a service.
The address will be set to the Kubernetes DNS name of the service and respective
service port.
Available meta labels:
* `__meta_kubernetes_namespace`: The namespace of the service object.
* `__meta_kubernetes_service_annotation_<annotationname>`: Each annotation from the service object.
* `__meta_kubernetes_service_annotationpresent_<annotationname>`: "true" for each annotation of the service object.
* `__meta_kubernetes_service_cluster_ip`: The cluster IP address of the service. (Does not apply to services of type ExternalName)
* `__meta_kubernetes_service_external_name`: The DNS name of the service. (Applies to services of type ExternalName)
* `__meta_kubernetes_service_label_<labelname>`: Each label from the service object.
* `__meta_kubernetes_service_labelpresent_<labelname>`: `true` for each label of the service object.
* `__meta_kubernetes_service_name`: The name of the service object.
* `__meta_kubernetes_service_port_name`: Name of the service port for the target.
* `__meta_kubernetes_service_port_protocol`: Protocol of the service port for the target.
#### `pod`
The `pod` role discovers all pods and exposes their containers as targets. For
each declared port of a container, a single target is generated. If a container
has no specified ports, a port-free target per container is created for manually
adding a port via relabeling.
Available meta labels:
* `__meta_kubernetes_namespace`: The namespace of the pod object.
* `__meta_kubernetes_pod_name`: The name of the pod object.
* `__meta_kubernetes_pod_ip`: The pod IP of the pod object.
* `__meta_kubernetes_pod_label_<labelname>`: Each label from the pod object.
* `__meta_kubernetes_pod_labelpresent_<labelname>`: `true` for each label from the pod object.
* `__meta_kubernetes_pod_annotation_<annotationname>`: Each annotation from the pod object.
* `__meta_kubernetes_pod_annotationpresent_<annotationname>`: `true` for each annotation from the pod object.
* `__meta_kubernetes_pod_container_init`: `true` if the container is an [InitContainer](https://kubernetes.io/docs/concepts/workloads/pods/init-containers/)
* `__meta_kubernetes_pod_container_name`: Name of the container the target address points to.
* `__meta_kubernetes_pod_container_port_name`: Name of the container port.
* `__meta_kubernetes_pod_container_port_number`: Number of the container port.
* `__meta_kubernetes_pod_container_port_protocol`: Protocol of the container port.
* `__meta_kubernetes_pod_ready`: Set to `true` or `false` for the pod's ready state.
* `__meta_kubernetes_pod_phase`: Set to `Pending`, `Running`, `Succeeded`, `Failed` or `Unknown`
in the [lifecycle](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase).
* `__meta_kubernetes_pod_node_name`: The name of the node the pod is scheduled onto.
* `__meta_kubernetes_pod_host_ip`: The current host IP of the pod object.
* `__meta_kubernetes_pod_uid`: The UID of the pod object.
* `__meta_kubernetes_pod_controller_kind`: Object kind of the pod controller.
* `__meta_kubernetes_pod_controller_name`: Name of the pod controller.
#### `endpoints`
The `endpoints` role discovers targets from listed endpoints of a service. For
each endpoint address one target is discovered per port. If the endpoint is
backed by a pod, all additional container ports of the pod, not bound to an
endpoint port, are discovered as targets as well.
Available meta labels:
* `__meta_kubernetes_namespace`: The namespace of the endpoints object.
* `__meta_kubernetes_endpoints_name`: The names of the endpoints object.
* For all targets discovered directly from the endpoints list (those not additionally inferred
from underlying pods), the following labels are attached:
* `__meta_kubernetes_endpoint_hostname`: Hostname of the endpoint.
* `__meta_kubernetes_endpoint_node_name`: Name of the node hosting the endpoint.
* `__meta_kubernetes_endpoint_ready`: Set to `true` or `false` for the endpoint's ready state.
* `__meta_kubernetes_endpoint_port_name`: Name of the endpoint port.
* `__meta_kubernetes_endpoint_port_protocol`: Protocol of the endpoint port.
* `__meta_kubernetes_endpoint_address_target_kind`: Kind of the endpoint address target.
* `__meta_kubernetes_endpoint_address_target_name`: Name of the endpoint address target.
* If the endpoints belong to a service, all labels of the `role: service` discovery are attached.
* For all targets backed by a pod, all labels of the `role: pod` discovery are attached.
#### `ingress`
The `ingress` role discovers a target for each path of each ingress.
This is generally useful for blackbox monitoring of an ingress.
The address will be set to the host specified in the ingress spec.
Available meta labels:
* `__meta_kubernetes_namespace`: The namespace of the ingress object.
* `__meta_kubernetes_ingress_name`: The name of the ingress object.
* `__meta_kubernetes_ingress_label_<labelname>`: Each label from the ingress object.
* `__meta_kubernetes_ingress_labelpresent_<labelname>`: `true` for each label from the ingress object.
* `__meta_kubernetes_ingress_annotation_<annotationname>`: Each annotation from the ingress object.
* `__meta_kubernetes_ingress_annotationpresent_<annotationname>`: `true` for each annotation from the ingress object.
* `__meta_kubernetes_ingress_scheme`: Protocol scheme of ingress, `https` if TLS
config is set. Defaults to `http`.
* `__meta_kubernetes_ingress_path`: Path from ingress spec. Defaults to `/`.
See below for the configuration options for Kubernetes discovery:
```yaml
# The information to access the Kubernetes API.
# The API server addresses. If left empty, Prometheus is assumed to run inside
# of the cluster and will discover API servers automatically and use the pod's
# CA certificate and bearer token file at /var/run/secrets/kubernetes.io/serviceaccount/.
[ api_server: <host> ]
# The Kubernetes role of entities that should be discovered.
role: <role>
# Optional authentication information used to authenticate to the API server.
# Note that `basic_auth`, `bearer_token` and `bearer_token_file` options are
# mutually exclusive.
# password and password_file are mutually exclusive.
# Optional HTTP basic authentication information.
basic_auth:
[ username: <string> ]
[ password: <secret> ]
[ password_file: <string> ]
# Optional bearer token authentication information.
[ bearer_token: <secret> ]
# Optional bearer token file authentication information.
[ bearer_token_file: <filename> ]
# Optional proxy URL.
[ proxy_url: <string> ]
# TLS configuration.
tls_config:
[ <tls_config> ]
# Optional namespace discovery. If omitted, all namespaces are used.
namespaces:
names:
[ - <string> ]
```
Where `<role>` must be `endpoints`, `service`, `pod`, `node`, or
`ingress`.
See
[this example Prometheus configuration file](/documentation/examples/prometheus-kubernetes.yml)
for a detailed example of configuring Prometheus for Kubernetes.
You may wish to check out the 3rd party
[Prometheus Operator](https://github.com/coreos/prometheus-operator),
which automates the Prometheus setup on top of Kubernetes.
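Tying Kubernetes discovery together with relabeling, a sketch of a Promtail scrape config that discovers pods and keeps their Kubernetes pod labels might look like the following; the job name is arbitrary, and the `__host__` relabel (explained in [Scraping](scraping.md)) restricts Promtail to pods on its own node:
```yaml
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Only read logs from pods scheduled on the same node as this Promtail.
      - source_labels: ['__meta_kubernetes_pod_node_name']
        target_label: '__host__'
      # Map Kubernetes pod labels (e.g. app) to Loki labels.
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      # Keep the namespace as a label.
      - source_labels: ['__meta_kubernetes_namespace']
        target_label: namespace
```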
## target_config
The `target_config` block controls the behavior of reading files from discovered
targets.
```yaml
# Period to resync directories being watched and files being tailed to discover
# new ones or stop watching removed ones.
sync_period: "10s"
```
## Example Docker Config
```yaml
server:
http_listen_port: 9080
grpc_listen_port: 0
positions:
filename: /tmp/positions.yaml
clients:
- url: http://ip_or_hostname_where_loki_runs:3100/loki/api/v1/push
scrape_configs:
- job_name: system
pipeline_stages:
- docker:
static_configs:
- targets:
- localhost
labels:
job: varlogs
host: yourhost
__path__: /var/log/*.log
- job_name: someone_service
pipeline_stages:
- docker:
static_configs:
- targets:
- localhost
labels:
job: someone_service
host: yourhost
__path__: /srv/log/someone_service/*.log
```
## Example Journal Config
```yaml
server:
http_listen_port: 9080
grpc_listen_port: 0
positions:
filename: /tmp/positions.yaml
clients:
- url: http://ip_or_hostname_where_loki_runs:3100/loki/api/v1/push
scrape_configs:
- job_name: journal
journal:
max_age: 12h
path: /var/log/journal
labels:
job: systemd-journal
relabel_configs:
- source_labels: ['__journal__systemd_unit']
target_label: 'unit'
```

@ -1,41 +1,63 @@
# Installing Promtail
Promtail is distributed as a [binary](#binary), [Docker container](#docker), and
[Helm chart](#helm).
## Binary
Every Loki release includes binaries for Promtail:
```bash
# download a binary (modify app, os, and arch as needed)
# Installs v0.3.0. Go to the releases page for the latest version
$ curl -fSL -o "/usr/local/bin/promtail.gz" "https://github.com/grafana/loki/releases/download/v0.3.0/promtail_linux_amd64.gz"
$ gunzip "/usr/local/bin/promtail.gz"
# make sure it is executable
$ chmod a+x "/usr/local/bin/promtail"
```
Binaries for macOS and Windows are also provided at the
[Releases page](https://github.com/grafana/loki/releases).
## Docker
```bash
# modify tag to most recent version
$ docker pull grafana/promtail:v0.3.0
```
## Helm
Make sure that Helm is
[installed](https://helm.sh/docs/using_helm/#installing-helm) and
[deployed](https://helm.sh/docs/using_helm/#installing-tiller) to your cluster.
Then you can add Loki's chart repository to Helm:
```bash
$ helm repo add loki https://grafana.github.io/loki/charts
```
And the chart repository can be updated by running:
```bash
$ helm repo update
```
Finally, Promtail can be deployed with:
```bash
$ helm upgrade --install promtail loki/promtail --set "loki.serviceName=loki"
```
## Kubernetes
On Kubernetes, you will use the Docker container above. However, you have to
choose beforehand whether you want to run in daemon mode (`DaemonSet`) or
sidecar mode (`Pod container`).
### DaemonSet (Recommended)
A `DaemonSet` will deploy `promtail` on every node within a Kubernetes cluster.
The DaemonSet deployment is great to collect the logs of all containers within a
cluster. It's the best solution for a single-tenant model.
```yaml
---Daemonset.yaml
@ -111,10 +133,11 @@ roleRef:
apiGroup: rbac.authorization.k8s.io
```
### Sidecar
The Sidecar method deploys `promtail` as a sidecar container for a specific pod.
In a multi-tenant environment, this enables teams to aggregate logs for specific
pods and deployments.
```yaml
---Deployment.yaml
@ -146,5 +169,4 @@ spec:
mountPath: /etc/promtail
...
...
```

@ -0,0 +1,208 @@
# Pipelines
A detailed look at how to set up Promtail to process your log lines, including
extracting metrics and labels.
## Pipeline
A pipeline is used to transform a single log line, its labels, and its
timestamp. A pipeline is comprised of a set of **stages**. There are 4 types of
stages:
1. **Parsing stages** parse the current log line and extract data out of it. The
extracted data is then available for use by other stages.
2. **Transform stages** transform extracted data from previous stages.
3. **Action stages** take extracted data from previous stages and do something
with them. Actions can:
1. Add or modify existing labels to the log line
2. Change the timestamp of the log line
3. Change the content of the log line
4. Create a metric based on the extracted data
4. **Filtering stages** optionally apply a subset of stages based on some
condition.
Typical pipelines will start with a parsing stage (such as a
[regex](./stages/regex.md) or [json](./stages/json.md) stage) to extract data
from the log line. Then, a series of action stages will be present to do
something with that extracted data. The most common action stage will be a
[labels](./stages/labels.md) stage to turn extracted data into a label.
A common stage will also be the [match](./stages/match.md) stage to selectively
apply stages based on the current labels.
Note that pipelines can not currently be used to deduplicate logs; Loki will
receive the same log line multiple times if, for example:
1. Two scrape configs read from the same file
2. Duplicate log lines in a file are sent through a pipeline. Deduplication is
not done.
However, Loki will perform some deduplication at query time for logs that have
the exact same nanosecond timestamp, labels, and log contents.
This documented example gives a good glimpse of what you can achieve with a
pipeline:
```yaml
scrape_configs:
- job_name: kubernetes-pods-name
kubernetes_sd_configs: ....
pipeline_stages:
# This stage is only going to run if the scraped target has a label
# of "name" with value "promtail".
- match:
selector: '{name="promtail"}'
stages:
# The regex stage parses out a level, timestamp, and component. At the end
# of the stage, the values for level, timestamp, and component are only
# set internally for the pipeline. Future stages can use these values and
# decide what to do with them.
- regex:
expression: '.*level=(?P<level>[a-zA-Z]+).*ts=(?P<timestamp>[T\d-:.Z]*).*component=(?P<component>[a-zA-Z]+)'
# The labels stage takes the level and component entries from the previous
# regex stage and promotes them to a label. For example, level=error may
# be a label added by this stage.
- labels:
level:
component:
# Finally, the timestamp stage takes the timestamp extracted from the
# regex stage and promotes it to be the new timestamp of the log entry,
# parsing it as an RFC3339Nano-formatted value.
- timestamp:
format: RFC3339Nano
source: timestamp
# This stage is only going to run if the scraped target has a label of
# "name" with a value of "nginx".
- match:
selector: '{name="nginx"}'
stages:
# This regex stage extracts a new output by matching against some
# values and capturing the rest.
- regex:
expression: \w{1,3}.\w{1,3}.\w{1,3}.\w{1,3}(?P<output>.*)
# The output stage changes the content of the captured log line by
# setting it to the value of output from the regex stage.
- output:
source: output
# This stage is only going to run if the scraped target has a label of
# "name" with a value of "jaeger-agent".
- match:
selector: '{name="jaeger-agent"}'
stages:
# The JSON stage reads the log line as a JSON string and extracts
# the "level" field from the object for use in further stages.
- json:
expressions:
level: level
# The labels stage pulls the value from "level" that was extracted
# from the previous stage and promotes it to a label.
- labels:
level:
- job_name: kubernetes-pods-app
kubernetes_sd_configs: ....
pipeline_stages:
# This stage will only run if the scraped target has a label of "app"
# with a name of *either* grafana or prometheus.
- match:
selector: '{app=~"grafana|prometheus"}'
stages:
# The regex stage will extract a level and component for use in further
# stages, allowing the level to be defined as either lvl=<level> or
# level=<level> and the component to be defined as either
# logger=<component> or component=<component>
- regex:
expression: ".*(lvl|level)=(?P<level>[a-zA-Z]+).*(logger|component)=(?P<component>[a-zA-Z]+)"
# The labels stage then promotes the level and component extracted from
# the regex stage to labels.
- labels:
level:
component:
# This stage will only run if the scraped target has a label of "app"
# and a value of "some-app".
- match:
selector: '{app="some-app"}'
stages:
# The regex stage tries to extract a Go panic by looking for panic:
# in the log message.
- regex:
expression: ".*(?P<panic>panic: .*)"
# The metrics stage is going to increment a panic_total metric counter
# which promtail exposes. The counter is only incremented when panic
# was extracted from the regex stage.
- metrics:
panic_total:
type: Counter
description: "total count of panic"
source: panic
config:
action: inc
```
### Data Accessible to Stages
The following sections further describe the types that are accessible to each
stage (although not all may be used):
##### Label Set
The current set of labels for the log line. Initialized to be the set of labels
that were scraped along with the log line. The label set is only modified by an
action stage, but filtering stages read from it.
The final label set will be indexed by Loki and can be used for queries.
##### Extracted Map
A collection of key-value pairs extracted during a parsing stage. Subsequent
stages operate on the extracted map, either transforming them or taking action
with them. At the end of a pipeline, the extracted map is discarded; for a
parsing stage to be useful, it must always be paired with at least one action
stage.
##### Log Timestamp
The current timestamp for the log line. Action stages can modify this value.
If left unset, it defaults to the time when the log was scraped.
The final value for the timestamp is sent to Loki.
##### Log Line
The current log line, represented as text. Initialized to be the text that
promtail scraped. Action stages can modify this value.
The final value for the log line is sent to Loki as the text content for the
given log entry.
## Stages
Parsing stages:
* [regex](./stages/regex.md): Extract data using a regular expression.
* [json](./stages/json.md): Extract data by parsing the log line as JSON.
Transform stages:
* [template](./stages/template.md): Use Go templates to modify extracted data.
Action stages:
* [timestamp](./stages/timestamp.md): Set the timestamp value for the log entry.
* [output](./stages/output.md): Set the log line text.
* [labels](./stages/labels.md): Update the label set for the log entry.
* [metrics](./stages/metrics.md): Calculate metrics based on extracted data.
Filtering stages:
* [match](./stages/match.md): Conditionally run stages based on the label set.

@ -0,0 +1,143 @@
# Promtail Scraping (Service Discovery)
## File Target Discovery
Promtail discovers locations of log files and extracts labels from them through
the `scrape_configs` section in the config YAML. The syntax is identical to what
[Prometheus uses](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config).
`scrape_configs` contains one or more entries which are executed for each
discovered target (i.e., each container in each new pod running in the
instance):
```
scrape_configs:
- job_name: local
static_configs:
- ...
- job_name: kubernetes
kubernetes_sd_configs:
- ...
```
If more than one scrape config section matches your logs, you will get duplicate
entries, as the logs are sent in different streams, likely with slightly
different labels.
There are different types of labels present in Promtail:
* Labels starting with `__` (two underscores) are internal labels. They usually
come from dynamic sources like service discovery. Once relabeling is done,
they are removed from the label set. To persist internal labels so they're
sent to Loki, rename them so they don't start with `__`. See
[Relabeling](#relabeling) for more information.
* Labels starting with `__meta_kubernetes_pod_label_*` are "meta labels" which
are generated based on your Kubernetes pod's labels.
For example, if your Kubernetes pod has a label `name` set to `foobar`, then
the `scrape_configs` section will receive an internal label
`__meta_kubernetes_pod_label_name` with a value set to `foobar`.
* Other labels starting with `__meta_kubernetes_*` exist based on other
Kubernetes metadata, such as the namespace of the pod
(`__meta_kubernetes_namespace`) or the name of the container inside the pod
(`__meta_kubernetes_pod_container_name`). Refer to
[the Prometheus docs](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config)
for the full list of Kubernetes meta labels.
* The `__path__` label is a special label which Promtail uses after discovery to
figure out where the file to read is located. Wildcards are allowed.
* The label `filename` is added for every file found in `__path__` to ensure the
uniqueness of the streams. It is set to the absolute path of the file the line
was read from.
### Kubernetes Discovery
Note that while Promtail can utilize the Kubernetes API to discover pods as
targets, it can only read log files from pods that are running on the same node
as the one Promtail is running on. Promtail looks for a `__host__` label on
each target and validates that it is set to the same hostname as Promtail's
(using either `$HOSTNAME` or the hostname reported by the kernel if the
environment variable is not set).
This means that any time Kubernetes service discovery is used, there must be a
`relabel_config` that creates the intermediate label `__host__` from
`__meta_kubernetes_pod_node_name`:
```yaml
relabel_configs:
- source_labels: ['__meta_kubernetes_pod_node_name']
target_label: '__host__'
```
See [Relabeling](#relabeling) for more information.
## Relabeling
Each `scrape_configs` entry can contain a `relabel_configs` stanza.
`relabel_configs` is a list of operations to transform the labels from discovery
into another form.
A single entry in `relabel_configs` can also reject targets by doing an `action:
drop` if a label value matches a specified regex. When a target is dropped, the
owning `scrape_config` will not process logs from that particular source.
Other `scrape_configs` without the drop action reading from the same target
may still use and forward logs from it to Loki.
A common use case of `relabel_configs` is to transform an internal label such
as `__meta_kubernetes_*` into an intermediate internal label such as
`__service__`. The intermediate internal label may then be dropped based on
value or transformed to a final external label, such as `__job__`.
### Examples
* Drop the target if a label (`__service__` in the example) is empty:
```yaml
- action: drop
regex: ^$
source_labels:
- __service__
```
* Drop the target if any of the `source_labels` contain a value:
```yaml
- action: drop
regex: .+
separator: ''
source_labels:
- __meta_kubernetes_pod_label_name
- __meta_kubernetes_pod_label_app
```
* Persist an internal label by renaming it so it will be sent to Loki:
```yaml
- action: replace
source_labels:
- __meta_kubernetes_namespace
target_label: namespace
```
* Persist all Kubernetes pod labels by mapping them, for example mapping
  `__meta_kubernetes_pod_label_foo` to `foo`.
```yaml
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
```
Additional reading:
* [Julien Pivotto's slides from PromConf Munich, 2017](https://www.slideshare.net/roidelapluie/taking-advantage-of-prometheus-relabeling-109483749)
## HTTP client options
Promtail uses the Prometheus HTTP client implementation for all calls to Loki.
Therefore it can be configured using the `clients` stanza, where one or more
connections to Loki can be established:
```yaml
clients:
- [ <client_option> ]
```
Refer to [`client_config`](./configuration.md#client_config) from the Promtail
Configuration reference for all available options.
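For instance, shipping every log line to two Loki instances could look like the sketch below (hostnames are placeholders):
```yaml
clients:
  - url: http://loki-primary.example.com:3100/loki/api/v1/push
  - url: http://loki-secondary.example.com:3100/loki/api/v1/push
    backoff_config:
      maxretries: 5
```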

@ -0,0 +1,25 @@
# Stages
This section is a collection of all stages Promtail supports in a
[Pipeline](../pipelines.md).
Parsing stages:
* [regex](./regex.md): Extract data using a regular expression.
* [json](./json.md): Extract data by parsing the log line as JSON.
Transform stages:
* [template](./template.md): Use Go templates to modify extracted data.
Action stages:
* [timestamp](./timestamp.md): Set the timestamp value for the log entry.
* [output](./output.md): Set the log line text.
* [labels](./labels.md): Update the label set for the log entry.
* [metrics](./metrics.md): Calculate metrics based on extracted data.
Filtering stages:
* [match](./match.md): Conditionally run stages based on the label set.

@ -0,0 +1,91 @@
# `json` stage
The `json` stage is a parsing stage that reads the log line as JSON and accepts
[JMESPath](http://jmespath.org/) expressions to extract data.
## Schema
```yaml
json:
# Set of key/value pairs of JMESPath expressions. The key will be
# the key in the extracted data while the expression will be the value,
# evaluated as a JMESPath from the source data.
expressions:
[ <string>: <string> ... ]
# Name from extracted data to parse. If empty, uses the log message.
[source: <string>]
```
This stage uses the Go JSON unmarshaler, which means non-string types like
numbers or booleans will be unmarshaled into those types. The extracted data
can hold non-string values and this stage does not do any type conversions;
downstream stages will need to perform correct type conversion of these values
as necessary. Please refer to the [the `template` stage](./template.md) for how
to do this.
If the value extracted is a complex type, such as an array or a JSON object, it
will be converted back into a JSON string before being inserted into the
extracted data.
## Examples
### Using log line
For the given pipeline:
```yaml
- json:
expressions:
output: log
stream: stream
timestamp: time
```
Given the following log line:
```
{"log":"log message\n","stream":"stderr","time":"2019-04-30T02:12:41.8443515Z"}
```
The following key-value pairs would be created in the set of extracted data:
- `output`: `log message\n`
- `stream`: `stderr`
- `timestamp`: `2019-04-30T02:12:41.8443515`
### Using extracted data
For the given pipeline:
```yaml
- json:
expressions:
output: log
stream: stream
timestamp: time
extra:
- json:
expressions:
user:
source: extra
```
And the given log line:
```
{"log":"log message\n","stream":"stderr","time":"2019-04-30T02:12:41.8443515Z","extra":"{\"user\":\"marco\"}"}
```
The first stage would create the following key-value pairs in the set of
extracted data:
- `output`: `log message\n`
- `stream`: `stderr`
- `timestamp`: `2019-04-30T02:12:41.8443515`
- `extra`: `{"user": "marco"}`
The second stage will parse the value of `extra` from the extracted data as JSON
and append the following key-value pairs to the set of extracted data:
- `user`: `marco`

@ -0,0 +1,37 @@
# `labels` stage
The labels stage is an action stage that takes data from the extracted map and
modifies the label set that is sent to Loki with the log entry.
## Schema
```yaml
labels:
# Key is REQUIRED and the name for the label that will be created.
# Value is optional and will be the name from extracted data whose value
# will be used for the value of the label. If empty, the value will be
# inferred to be the same as the key.
[ <string>: [<string>] ... ]
```
### Examples
For the given pipeline:
```yaml
- json:
expressions:
stream: stream
- labels:
stream:
```
Given the following log line:
```
{"log":"log message\n","stream":"stderr","time":"2019-04-30T02:12:41.8443515Z"}
```
The first stage would extract `stream` into the extracted map with a value of
`stderr`. The labels stage would turn that key-value pair into a label, so the
log line sent to Loki would include the label `stream` with a value of `stderr`.

@ -0,0 +1,85 @@
# `match` stage
The match stage is a filtering stage that conditionally applies a set of stages
when a log entry matches a configurable [LogQL](../../../logql.md) stream
selector.
## Schema
```yaml
match:
# LogQL stream selector.
selector: <string>
# Names the pipeline. When defined, creates an additional label in
# the pipeline_duration_seconds histogram, where the value is
# concatenated with job_name using an underscore.
  [pipeline_name: <string>]
  # Nested set of pipeline stages that will only run if the selector
  # matches the labels of the log entries:
stages:
- [
        <regex_stage> |
<json_stage> |
<template_stage> |
<match_stage> |
<timestamp_stage> |
<output_stage> |
<labels_stage> |
<metrics_stage>
]
```
Refer to the [Promtail Configuration Reference](../configuration.md) for the
schema on the various other stages referenced here.
### Example
For the given pipeline:
```yaml
pipeline_stages:
- json:
expressions:
app:
- labels:
app:
- match:
selector: "{app=\"loki\"}"
stages:
- json:
expressions:
msg: message
- match:
pipeline_name: "app2"
selector: "{app=\"pokey\"}"
stages:
- json:
expressions:
msg: msg
- output:
source: msg
```
And the given log line:
```
{ "time":"2012-11-01T22:08:41+00:00", "app":"loki", "component": ["parser","type"], "level" : "WARN", "message" : "app1 log line" }
```
The first stage will add `app` with a value of `loki` into the extracted map,
while the second stage will add `app` as a label (again with the value of `loki`).
The third stage uses LogQL to only execute the nested stages when there is a
label of `app` whose value is `loki`. This matches in our case; the nested
`json` stage then adds `msg` into the extracted map with a value of `app1 log
line`.
The fourth stage uses LogQL to only execute the nested stages when there is a
label of `app` whose value is `pokey`. This does **not** match in our case, so
the nested `json` stage is not run.
The final `output` stage changes the contents of the log line to be the value of
`msg` from the extracted map. In this case, the log line is changed to `app1 log
line`.

@ -0,0 +1,215 @@
# `metrics` stage
The `metrics` stage is an action stage that allows for defining and updating
metrics based on data from the extracted map. Note that created metrics are not
pushed to Loki and are instead exposed via Promtail's `/metrics` endpoint.
Prometheus should be configured to scrape Promtail to be able to retrieve the
metrics configured by this stage.
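A minimal Prometheus scrape configuration for this might look like the
following sketch, assuming Promtail's HTTP server is reachable on port 9080
(the hostname is a placeholder):
```yaml
scrape_configs:
  - job_name: promtail
    static_configs:
      # Promtail's HTTP listen address; metrics are exposed at /metrics.
      - targets: ['promtail.example.com:9080']
```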
## Schema
```yaml
# A map where the key is the name of the metric and the value is a specific
# metric type.
metrics:
[<string>: [ <metric_counter> | <metric_gauge> | <metric_histogram> ] ...]
```
### metric_counter
Defines a counter metric whose value only goes up.
```yaml
# The metric type. Must be Counter.
type: Counter
# Describes the metric.
[description: <string>]
# Key from the extracted data map to use for the metric,
# defaulting to the metric's name if not present.
[source: <string>]
config:
# Filters down source data and only changes the metric
# if the targeted value exactly matches the provided string.
# If not present, all data will match.
[value: <string>]
# Must be either "inc" or "add" (case insensitive). If
# inc is chosen, the metric value will increase by 1 for each
  # log line received that passed the filter. If add is chosen,
  # the extracted value must be convertible to a positive float
# and its value will be added to the metric.
action: <string>
```
### metric_gauge
Defines a gauge metric whose value can go up or down.
```yaml
# The metric type. Must be Gauge.
type: Gauge
# Describes the metric.
[description: <string>]
# Key from the extracted data map to use for the metric,
# defaulting to the metric's name if not present.
[source: <string>]
config:
# Filters down source data and only changes the metric
# if the targeted value exactly matches the provided string.
# If not present, all data will match.
[value: <string>]
# Must be either "set", "inc", "dec"," add", or "sub". If
# add, set, or sub is chosen, the extracted value must be
# convertible to a positive float. inc and dec will increment
# or decrement the metric's value by 1 respectively.
action: <string>
```
### metric_histogram
Defines a histogram metric whose values are bucketed.
```yaml
# The metric type. Must be Histogram.
type: Histogram
# Describes the metric.
[description: <string>]
# Key from the extracted data map to use for the metric,
# defaulting to the metric's name if not present.
[source: <string>]
config:
# Filters down source data and only changes the metric
# if the targeted value exactly matches the provided string.
# If not present, all data will match.
[value: <string>]
# Must be either "inc" or "add" (case insensitive). If
# inc is chosen, the metric value will increase by 1 for each
  # log line received that passed the filter. If add is chosen,
  # the extracted value must be convertible to a positive float
# and its value will be added to the metric.
action: <string>
# Holds all the numbers in which to bucket the metric.
buckets:
- <int>
```
## Examples
### Counter
```yaml
- metrics:
log_lines_total:
type: Counter
description: "total number of log lines"
source: time
config:
action: inc
```
This pipeline creates a `log_lines_total` counter that increments whenever the
extracted map contains a key for `time`. Since every log entry has a timestamp,
this is a good field to use to count every line. Notice that `value` is not
defined in the `config` section as we want to count every line and don't need to
filter the value. Similarly, `inc` is used as the action because we want to
increment the counter by one rather than by using the value of `time`.
```yaml
- regex:
expression: "^.*(?P<order_success>order successful).*$"
- metrics:
    successful_orders_total:
type: Counter
description: "log lines with the message `order successful`"
source: order_success
config:
action: inc
```
This pipeline first tries to find `order successful` in the log line, extracting
it as the `order_success` field in the extracted map. The metrics stage then
creates a metric called `successful_orders_total` whose value only increases when
`order_success` was found in the extracted map.
The result of this pipeline is a metric whose value only increases when a log
line with the text `order successful` was scraped by Promtail.
```yaml
- regex:
expression: "^.* order_status=(?P<order_status>.*?) .*$"
- metrics:
    successful_orders_total:
type: Counter
description: "successful orders"
source: order_status
config:
value: success
action: inc
failed_orders_total:
type: Counter
description: "failed orders"
source: order_status
config:
        value: fail
action: inc
```
This pipeline first tries to find text in the format `order_status=<value>` in
the log line, pulling out the `<value>` into the extracted map with the key
`order_status`.
The metrics stage creates `successful_orders_total` and `failed_orders_total`
metrics that only increment when the value of `order_status` in the extracted
map is `success` or `fail` respectively.
### Gauge
Gauge examples will be very similar to Counter examples with additional `action`
values.
```yaml
- regex:
expression: "^.* retries=(?P<retries>\d+) .*$"
- metrics:
retries_total:
type: Gauge
description: "total retries"
source: retries
config:
action: add
```
This pipeline first tries to find text in the format `retries=<value>` in the
log line, pulling out the `<value>` into the extracted map with the key
`retries`. Note that the regex only parses numbers for the value in `retries`.
The metrics stage then creates a Gauge whose current value will be increased by
the number in the `retries` field from the extracted map.
### Histogram
```yaml
- metrics:
http_response_time_seconds:
type: Histogram
description: "length of each log line"
source: response_time
config:
buckets: [0.001,0.0025,0.005,0.010,0.025,0.050]
```
This pipeline creates a histogram that reads `response_time` from the extracted
map and places it into the appropriate bucket, incrementing that bucket's count
and adding the value to the histogram's sum.
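The `response_time` value itself would typically be populated by a preceding
parsing stage, for example a hypothetical `regex` stage like the following
sketch, assuming log lines contain text like `... response_time=0.015 ...`:
```yaml
- regex:
    # Pull the response time (in seconds) out of the log line.
    expression: "^.* response_time=(?P<response_time>\\S+) .*$"
```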

@ -0,0 +1,43 @@
# `output` stage
The `output` stage is an action stage that takes data from the extracted map and
changes the log line that will be sent to Loki.
## Schema
```yaml
output:
# Name from extracted data to use for the log entry.
source: <string>
```
## Example
For the given pipeline:
```yaml
- json:
expressions:
      user:
      message:
- labels:
user:
- output:
    source: message
```
And the given log line:
```
{"user": "alexis", "message": "hello, world!"}
```
Then the first stage will extract the following key-value pairs into the
extracted map:
- `user`: `alexis`
- `message`: `hello, world!`
The second stage will then add `user=alexis` to the label set for the outgoing
log line, and the final `output` stage will change the log line from the
original JSON to `hello, world!`

@ -0,0 +1,76 @@
# `regex` stage
The `regex` stage is a parsing stage that parses a log line using a regular
expression. Named capture groups in the regex support adding data into the
extracted map.
## Schema
```yaml
regex:
# The RE2 regular expression. Each capture group must be named.
expression: <string>
# Name from extracted data to parse. If empty, uses the log message.
[source: <string>]
```
`expression` needs to be a [Go RE2 regex
string](https://github.com/google/re2/wiki/Syntax). Every capture group `(re)`
will be set into the `extracted` map; every capture group **must be named**:
`(?P<name>re)`. The name of the capture group will be used as the key in the
extracted map.
## Example
### Without `source`
Given the pipeline:
```yaml
- regex:
expression: "^(?s)(?P<time>\\S+?) (?P<stream>stdout|stderr) (?P<flags>\\S+?) (?P<content>.*)$"
```
And the log line:
```
2019-01-01T01:00:00.000000001Z stderr P i'm a log message!
```
The following key-value pairs would be added to the extracted map:
- `time`: `2019-01-01T01:00:00.000000001Z`
- `stream`: `stderr`
- `flags`: `P`
- `content`: `i'm a log message!`
### With `source`
Given the pipeline:
```yaml
- json:
expressions:
time:
- regex:
expression: "^(?P<year>\\d+)"
source: "time"
```
And the log line:
```
{"time":"2019-01-01T01:00:00.000000001Z"}
```
The first stage would add the following key-value pairs into the `extracted`
map:
- `time`: `2019-01-01T01:00:00.000000001Z`
The regex stage would then parse the value for `time` in the extracted map
and append the following key-value pairs back into the extracted map:
- `year`: `2019`

@ -0,0 +1,70 @@
# `template` stage
The `template` stage is a transform stage that lets you manipulate the values in
the extracted map using [Go's template
syntax](https://golang.org/pkg/text/template/).
The `template` stage is primarily useful for manipulating data from other stages
before setting them as labels, such as replacing spaces with underscores or
converting an uppercase string to lowercase.
The template stage can also create new keys in the extracted map.
## Schema
```yaml
template:
  # Name from extracted data to parse. If the key in the extracted data
  # doesn't exist, an entry for it will be created.
source: <string>
  # Go template string to use. In addition to normal template
# functions, ToLower, ToUpper, Replace, Trim, TrimLeft, TrimRight,
# TrimPrefix, TrimSuffix, and TrimSpace are available as functions.
template: <string>
```
## Examples
```yaml
- template:
source: new_key
template: 'hello world!'
```
Assuming no data has been added to the extracted map yet, this stage will first
add `new_key` with a blank value into the extracted map. Then its value will be
set to `hello world!`.
```yaml
- template:
source: app
template: '{{ .Value }}_some_suffix'
```
This pipeline takes the value of the `app` key in the existing extracted map and
appends `_some_suffix` to its value. For example, if the extracted map had a
key of `app` and a value of `loki`, this stage would modify the value from
`loki` to `loki_some_suffix`.
```yaml
- template:
source: app
template: '{{ ToLower .Value }}'
```
This pipeline takes the current value of `app` from the extracted map and
converts its value to be all lowercase. For example, if the extracted map
contained `app` with a value of `LOKI`, this pipeline would change its value to
`loki`.
```yaml
- template:
source: app
template: '{{ Replace .Value "loki" "blokey" 1 }}'
```
The template here uses Go's [`strings.Replace`
function](https://golang.org/pkg/strings/#Replace). When the template executes,
the entire contents of the `app` key from the extracted map will have at most
`1` instance of `loki` changed to `blokey`.

@ -0,0 +1,81 @@
# `timestamp` stage
The `timestamp` stage is an action stage that can change the timestamp of a log
line before it is sent to Loki. When a `timestamp` stage is not present, the
timestamp of a log line defaults to the time when the log entry is scraped.
## Schema
```yaml
timestamp:
# Name from extracted data to use for the timestamp.
source: <string>
# Determines how to parse the time string. Can use
# pre-defined formats by name: [ANSIC UnixDate RubyDate RFC822
# RFC822Z RFC850 RFC1123 RFC1123Z RFC3339 RFC3339Nano Unix
# UnixMs UnixNs].
format: <string>
# IANA Timezone Database string.
[location: <string>]
```
The `format` field can be provided as an "example" of what timestamps look like
(such as `Mon Jan 02 15:04:05 -0700 2006`), but Promtail also accepts a set of
pre-defined format names to represent common forms:
- `ANSIC`: `Mon Jan _2 15:04:05 2006`
- `UnixDate`: `Mon Jan _2 15:04:05 MST 2006`
- `RubyDate`: `Mon Jan 02 15:04:05 -0700 2006`
- `RFC822`: `02 Jan 06 15:04 MST`
- `RFC822Z`: `02 Jan 06 15:04 -0700`
- `RFC850`: `Monday, 02-Jan-06 15:04:05 MST`
- `RFC1123`: `Mon, 02 Jan 2006 15:04:05 MST`
- `RFC1123Z`: `Mon, 02 Jan 2006 15:04:05 -0700`
- `RFC3339`: `2006-01-02T15:04:05-07:00`
- `RFC3339Nano`: `2006-01-02T15:04:05.999999999-07:00`
Additionally, common Unix timestamps are supported with the following `format`
values:
- `Unix`: `1562708916`
- `UnixMs`: `1562708916414`
- `UnixNs`: `1562708916000000123`
Custom formats are passed directly to the layout parameter in Go's
[time.Parse](https://golang.org/pkg/time/#Parse) function. If the custom format
has no year component specified, Promtail will assume that the current year
according to the system's clock should be used.
The syntax used by the custom format defines the reference date and time using
specific values for each component of the timestamp (i.e., `Mon Jan 2 15:04:05
-0700 MST 2006`). The following table shows supported reference values which
should be used in the custom format.
| Timestamp component | Format value |
| ------------------- | ------------------------------------------------------------------------------------------------------------------------------------ |
| Year | `06`, `2006` |
| Month | `1`, `01`, `Jan`, `January` |
| Day | `2`, `02`, `_2` (two digits right justified) |
| Day of the week | `Mon`, `Monday` |
| Hour | `3` (12-hour), `03` (12-hour zero prefixed), `15` (24-hour) |
| Minute | `4`, `04` |
| Second | `5`, `05` |
| Fraction of second | `.000` (ms zero prefixed), `.000000` (μs), `.000000000` (ns), `.999` (ms without trailing zeroes), `.999999` (μs), `.999999999` (ns) |
| 12-hour period | `pm`, `PM` |
| Timezone name | `MST` |
| Timezone offset     | `-0700`, `-070000` (with seconds), `-07`, `-07:00`, `-07:00:00` (with seconds)                                                       |
| Timezone ISO-8601 | `Z0700` (Z for UTC or time offset), `Z070000`, `Z07`, `Z07:00`, `Z07:00:00` |
## Examples
```yaml
- timestamp:
source: time
format: RFC3339Nano
```
This stage looks for a `time` field in the extracted map and reads its value in
`RFC3339Nano` form (e.g., `2006-01-02T15:04:05.999999999-07:00`). The resulting
time value will be used as the timestamp sent with the log line to Loki.
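As a sketch of a custom format, assume log lines carry a `time` field such as
`2019-10-01 13:45:30` with no timezone information; the layout is written using
the Go reference date, and the timezone to assume is supplied via `location`:
```yaml
- timestamp:
    source: time
    # Custom layout written with Go's reference date.
    format: "2006-01-02 15:04:05"
    # IANA timezone in which to interpret the parsed time.
    location: "America/New_York"
```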

@ -0,0 +1,81 @@
# Troubleshooting Promtail
This document describes known failure modes of `promtail` on edge cases and the
adopted trade-offs.
## A tailed file is truncated while `promtail` is not running
Given the following order of events:
1. `promtail` is tailing `/app.log`
2. `promtail` current position for `/app.log` is `100` (byte offset)
3. `promtail` is stopped
4. `/app.log` is truncated and new logs are appended to it
5. `promtail` is restarted
When `promtail` is restarted, it reads the previous position (`100`) from the
positions file. Two scenarios are then possible:
- `/app.log` size is less than the position before truncating
- `/app.log` size is greater than or equal to the position before truncating
If the `/app.log` file size is less than the previous position, then the file is
detected as truncated and logs will be tailed starting from position `0`.
Otherwise, if the `/app.log` file size is greater than or equal to the previous
position, `promtail` can't detect it was truncated while not running and will
continue tailing the file from position `100`.
Generally speaking, `promtail` uses only the path to the file as the key in the
positions file. Whenever `promtail` is started, for each file path referenced in
the positions file, `promtail` will read the file from the beginning if the file
size is less than the offset stored in the positions file; otherwise it will
continue from the offset, regardless of whether the file has been truncated or
rolled multiple times while `promtail` was not running.
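As a rough sketch (the path and offset are illustrative), the positions file
maps each tailed path to the last byte offset read:
```yaml
positions:
  /var/log/app.log: "100"
```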
## Loki is unavailable
For each tailed file, `promtail` reads a line, processes it through the
configured `pipeline_stages`, and pushes the log entry to Loki. Log entries are
batched together before getting pushed to Loki, based on the max batch duration
`client.batch-wait` and max batch size `client.batch-size-bytes`, whichever
comes first.
If an error occurs while sending a batch of log entries, `promtail` adopts a
"retry then discard" strategy:
- `promtail` retries sending the batch to the ingester up to `maxretries` times
- If all retries fail, `promtail` discards the batch of log entries (_which will
be lost_) and proceeds with the next one
You can configure the `maxretries` and the delay between two retries via the
`backoff_config` in the promtail config file:
```yaml
clients:
- url: INGESTER-URL
backoff_config:
minbackoff: 100ms
maxbackoff: 5s
maxretries: 5
```
## Log entries pushed after a `promtail` crash / panic / abrupt termination
When `promtail` shuts down gracefully, it saves the last read offsets in the
positions file, so that on a subsequent restart it will continue tailing logs
with neither duplicates nor losses.
In the event of a crash or abrupt termination, `promtail` can't save the last
read offsets in the positions file. When restarted, `promtail` will read the
positions file saved at the last sync period and will continue tailing the files
from there. This means that if new log entries have been read and pushed to the
ingester between the last sync period and the crash, these log entries will be
sent again to the ingester on `promtail` restart.
However, for each log stream (set of unique labels) the Loki ingester skips all
log entries received out of timestamp order. For this reason, even if duplicated
logs may be sent from `promtail` to the ingester, entries whose timestamp is
older than the latest received will be discarded to avoid having duplicated
logs. To leverage this, it's important that your `pipeline_stages` include
the `timestamp` stage, parsing the log entry timestamp from the log line instead
of relying on the default behaviour of setting the timestamp as the point in
time when the line is read by `promtail`.
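For example, a scrape config could parse the timestamp out of JSON log lines as
in the following partial sketch (the field names are illustrative):
```yaml
scrape_configs:
  - job_name: app
    pipeline_stages:
      - json:
          expressions:
            ts: time
      - timestamp:
          source: ts
          format: RFC3339Nano
```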

@ -0,0 +1,5 @@
# Community
1. [Governance](./governance.md)
2. [Getting in Touch](./getting-in-touch.md)
3. [Contributing](./contributing.md)

@ -0,0 +1,60 @@
# Contributing to Loki
Loki uses [GitHub](https://github.com/grafana/loki) to manage reviews of pull requests:
- If you have a trivial fix or improvement, go ahead and create a pull request.
- If you plan to do something more involved, discuss your ideas on the relevant GitHub issue (creating one if it doesn't exist).
## Steps to contribute
To contribute to Loki, you must clone it into your `$GOPATH` and add your fork
as a remote.
```bash
$ git clone https://github.com/grafana/loki.git $GOPATH/src/github.com/grafana/loki
$ cd $GOPATH/src/github.com/grafana/loki
$ git remote add fork <FORK_URL>
# Make some changes!
$ git add .
$ git commit -m "docs: fix spelling error"
$ git push -u fork HEAD
# Open a PR!
```
Note that if you downloaded Loki using `go get`, the message `package github.com/grafana/loki: no Go files in /go/src/github.com/grafana/loki`
is normal and requires no action to resolve.
### Building
While `go install ./cmd/loki` works, the preferred way to build is by using
`make`:
- `make loki`: builds Loki and outputs the binary to `./cmd/loki/loki`
- `make promtail`: builds Promtail and outputs the binary to
`./cmd/promtail/promtail`
- `make logcli`: builds LogCLI and outputs the binary to `./cmd/logcli/logcli`
- `make loki-canary`: builds Loki Canary and outputs the binary to
`./cmd/loki-canary/loki-canary`
- `make docker-driver`: builds the Loki Docker Driver and installs it into
Docker.
- `make images`: builds all Docker images (optionally suffix the previous binary
commands with `-image`, e.g., `make loki-image`).
These commands can be chained together to build multiple binaries in one go:
```bash
# Builds binaries for Loki, Promtail, and LogCLI.
$ make loki promtail logcli
```
## Contribute to the Helm Chart
Please follow the [Helm documentation](../../production/helm/README.md).

@ -0,0 +1,12 @@
# Contacting the Loki Team
If you have any questions or feedback regarding Loki:
- Ask a question on the Loki Slack channel. To invite yourself to the Grafana Slack, visit [http://slack.raintank.io/](http://slack.raintank.io/) and join the #loki channel.
- [File a GitHub issue](https://github.com/grafana/loki/issues/new) for bugs, issues and feature suggestions.
- Send an email to [lokiproject@googlegroups.com](mailto:lokiproject@googlegroups.com), or use the [web interface](https://groups.google.com/forum/#!forum/lokiproject).
Please file UI issues directly in the [Grafana repository](https://github.com/grafana/grafana/issues/new).
Your feedback is always welcome.

@ -1,3 +1,5 @@
# Loki's Governance Model
This document describes the rules and governance of the Loki project. It is meant to be followed by all the developers of the project and the community.
## Values
@ -34,7 +36,7 @@ The current team members are:
- May also review for more holistic issues, but not a requirement
- Expected to be responsive to review requests in a timely manner
- Assigned PRs to review related based on expertise
- Granted commit access to Loki repo
- Granted commit access to Loki repository
### New members
@ -62,7 +64,7 @@ Technical decisions are made informally by the team member, and [lazy consensus]
### Governance changes
Material changes to this document are discussed publicly on as an issue on the Loki repo. Any change requires a [supermajority](#supermajority-vote) in favor. Editorial changes may be made by [lazy consensus](#consensus) unless challenged.
Material changes to this document are discussed publicly on as an issue on the Loki repository. Any change requires a [supermajority](#supermajority-vote) in favor. Editorial changes may be made by [lazy consensus](#consensus) unless challenged.
### Other matters

@ -0,0 +1,833 @@
# Configuring Loki
Loki is configured in a YAML file (usually referred to as `loki.yaml`)
which contains information on the Loki server and its individual components,
depending on which mode Loki is launched in.
Configuration examples can be found in the [Configuration Examples](examples.md) document.
* [Configuration File Reference](#configuration-file-reference)
* [server_config](#server_config)
* [querier_config](#querier_config)
* [ingester_client_config](#ingester_client_config)
* [grpc_client_config](#grpc_client_config)
* [ingester_config](#ingester_config)
* [lifecycler_config](#lifecycler_config)
* [ring_config](#ring_config)
* [storage_config](#storage_config)
* [cache_config](#cache_config)
* [chunk_store_config](#chunk_store_config)
* [schema_config](#schema_config)
* [period_config](#period_config)
* [limits_config](#limits_config)
* [table_manager_config](#table_manager_config)
* [provision_config](#provision_config)
* [auto_scaling_config](#auto_scaling_config)
## Configuration File Reference
To specify which configuration file to load, pass the `-config.file` flag at the
command line. The file is written in [YAML format](https://en.wikipedia.org/wiki/YAML),
defined by the scheme below. Brackets indicate that a parameter is optional. For
non-list parameters the value is set to the specified default.
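For example, running `loki -config.file=/etc/loki/loki.yaml` would load the
configuration from `/etc/loki/loki.yaml` (the path here is only illustrative).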
Generic placeholders are defined as follows:
* `<boolean>`: a boolean that can take the values `true` or `false`
* `<int>`: any integer matching the regular expression `[1-9]+[0-9]*`
* `<duration>`: a duration matching the regular expression `[0-9]+(ms|[smhdwy])`
* `<labelname>`: a string matching the regular expression `[a-zA-Z_][a-zA-Z0-9_]*`
* `<labelvalue>`: a string of unicode characters
* `<filename>`: a valid path relative to current working directory or an
absolute path.
* `<host>`: a valid string consisting of a hostname or IP followed by an optional port number
* `<string>`: a regular string
* `<secret>`: a regular string that is a secret, such as a password
Supported contents and default values of `loki.yaml`:
```yaml
# The module to run Loki with. Supported values:
# all, querier, table-manager, ingester, distributor
[target: <string> | default = "all"]
# Enables authentication through the X-Scope-OrgID header, which must be present
# if true. If false, the OrgID will always be set to "fake".
[auth_enabled: <boolean> | default = true]
# Configures the server of the launched module(s).
[server: <server_config>]
# Configures the querier. Only appropriate when running all modules or
# just the querier.
[querier: <querier_config>]
# Configures how the distributor will connect to ingesters. Only appropriate
# when running all modules, the distributor, or the querier.
[ingester_client: <ingester_client_config>]
# Configures the ingester and how the ingester will register itself to a
# key value store.
[ingester: <ingester_config>]
# Configures where Loki will store data.
[storage_config: <storage_config>]
# Configures how Loki will store data in the specific store.
[chunk_store_config: <chunk_store_config>]
# Configures the chunk index schema and where it is stored.
[schema_config: <schema_config>]
# Configures limits per-tenant or globally
[limits_config: <limits_config>]
# Configures the table manager for retention
[table_manager: <table_manager_config>]
```
## server_config
The `server_config` block configures Loki's behavior as an HTTP server:
```yaml
# HTTP server listen host
[http_listen_host: <string>]
# HTTP server listen port
[http_listen_port: <int> | default = 80]
# gRPC server listen host
[grpc_listen_host: <string>]
# gRPC server listen port
[grpc_listen_port: <int> | default = 9095]
# Register instrumentation handlers (/metrics, etc.)
[register_instrumentation: <boolean> | default = true]
# Timeout for graceful shutdowns
[graceful_shutdown_timeout: <duration> | default = 30s]
# Read timeout for HTTP server
[http_server_read_timeout: <duration> | default = 30s]
# Write timeout for HTTP server
[http_server_write_timeout: <duration> | default = 30s]
# Idle timeout for HTTP server
[http_server_idle_timeout: <duration> | default = 120s]
# Max gRPC message size that can be received
[grpc_server_max_recv_msg_size: <int> | default = 4194304]
# Max gRPC message size that can be sent
[grpc_server_max_send_msg_size: <int> | default = 4194304]
# Limit on the number of concurrent streams for gRPC calls (0 = unlimited)
[grpc_server_max_concurrent_streams: <int> | default = 100]
# Log only messages with the given severity or above. Supported values [debug,
# info, warn, error]
[log_level: <string> | default = "info"]
# Base path to serve all API routes from (e.g., /v1/).
[http_path_prefix: <string>]
```
## querier_config
The `querier_config` block configures the Loki Querier.
```yaml
# Timeout when querying ingesters or storage during the execution of a
# query request.
[query_timeout: <duration> | default = 1m]
# Limit of the duration for which live tailing requests should be
# served.
[tail_max_duration: <duration> | default = 1h]
# Configuration options for the LogQL engine.
engine:
# Timeout for query execution
[timeout: <duration> | default = 3m]
# The maximum amount of time to look back for log lines. Only
# applicable for instant log queries.
[max_look_back_period: <duration> | default = 30s]
```
## ingester_client_config
The `ingester_client_config` block configures how connections to ingesters
operate.
```yaml
# Configures how connections are pooled
pool_config:
# Whether or not to do health checks.
[health_check_ingesters: <boolean> | default = false]
# How frequently to clean up clients for servers that have gone away after
# a health check.
[client_cleanup_period: <duration> | default = 15s]
# How quickly a dead client will be removed after it has been detected
# to disappear. Set this to a value to allow time for a secondary
# health check to recover the missing client.
[remotetimeout: <duration>]
# The remote request timeout on the client side.
[remote_timeout: <duration> | default = 5s]
# Configures how the gRPC connection to ingesters work as a
# client.
[grpc_client_config: <grpc_client_config>]
```
### grpc_client_config
The `grpc_client_config` block configures a client connection to a gRPC service.
```yaml
# The maximum size in bytes the client can receive
[max_recv_msg_size: <int> | default = 104857600]
# The maximum size in bytes the client can send
[max_send_msg_size: <int> | default = 16777216]
# Whether or not messages should be compressed
[use_gzip_compression: <bool> | default = false]
# Rate limit for gRPC client. 0 is disabled
[rate_limit: <float> | default = 0]
# Rate limit burst for gRPC client.
[rate_limit_burst: <int> | default = 0]
# Enable backoff and retry when a rate limit is hit.
[backoff_on_ratelimits: <bool> | default = false]
# Configures backoff when enabled.
backoff_config:
# Minimum delay when backing off.
[minbackoff: <duration> | default = 100ms]
# The maximum delay when backing off.
[maxbackoff: <duration> | default = 10s]
# Number of times to backoff and retry before failing.
[maxretries: <int> | default = 10]
```
## ingester_config
The `ingester_config` block configures Ingesters.
```yaml
# Configures how the lifecycle of the ingester will operate
# and where it will register for discovery.
[lifecycler: <lifecycler_config>]
# Number of times to try and transfer chunks when leaving before
# falling back to flushing to the store.
[max_transfer_retries: <int> | default = 10]
# How many flushes can happen concurrently from each stream.
[concurrent_flushes: <int> | default = 16]
# How often should the ingester see if there are any blocks
# to flush
[flush_check_period: <duration> | default = 30s]
# The timeout before a flush is cancelled
[flush_op_timeout: <duration> | default = 10s]
# How long chunks should be retained in-memory after they've
# been flushed.
[chunk_retain_period: <duration> | default = 15m]
# How long chunks should sit in-memory with no updates before
# being flushed if they don't hit the max block size. This means
# that half-empty chunks will still be flushed after a certain
# period as long as they receive no further activity.
[chunk_idle_period: <duration> | default = 30m]
# The maximum size in bytes a chunk can be before it should be flushed.
[chunk_block_size: <int> | default = 262144]
```
### lifecycler_config
The `lifecycler_config` is used by the Ingester to control how that ingester
registers itself into the ring and manages its lifecycle during its stay in the
ring.
```yaml
# Configures the ring the lifecycler connects to
[ring: <ring_config>]
# The number of tokens the lifecycler will generate and put into the ring if
# it joined without transferring tokens from another lifecycler.
[num_tokens: <int> | default = 128]
# Period at which to heartbeat to the underlying ring.
[heartbeat_period: <duration> | default = 5s]
# How long to wait to claim tokens and chunks from another member when
# that member is leaving. Will join automatically after the duration expires.
[join_after: <duration> | default = 0s]
# Minimum duration to wait before becoming ready. This is to work around race
# conditions with ingesters exiting and updating the ring.
[min_ready_duration: <duration> | default = 1m]
# Store tokens in a normalised fashion to reduce the number of allocations.
[normalise_tokens: <boolean> | default = false]
# Name of network interfaces to read addresses from.
interface_names:
- [<string> ... | default = ["eth0", "en0"]]
# Duration to sleep before exiting to ensure metrics are scraped.
[final_sleep: <duration> | default = 30s]
```
### ring_config
The `ring_config` is used to discover and connect to Ingesters.
```yaml
kvstore:
# The backend storage to use for the ring. Supported values are
# consul, etcd, inmemory
store: <string>
# The prefix for the keys in the store. Should end with a /.
[prefix: <string> | default = "collectors/"]
# Configuration for a Consul client. Only applies if store
# is "consul"
consul:
# The hostname and port of Consul.
    [host: <string> | default = "localhost:8500"]
# The ACL Token used to interact with Consul.
[acltoken: <string>]
# The HTTP timeout when communicating with Consul
[httpclienttimeout: <duration> | default = 20s]
# Whether or not consistent reads to Consul are enabled.
[consistentreads: <boolean> | default = true]
# Configuration for an ETCD v3 client. Only applies if
# store is "etcd"
etcd:
# The ETCD endpoints to connect to.
endpoints:
- <string>
# The Dial timeout for the ETCD connection.
    [dial_timeout: <duration> | default = 10s]
# The maximum number of retries to do for failed ops to ETCD.
[max_retries: <int> | default = 10]
  # The heartbeat timeout after which ingesters are skipped for
  # reading and writing.
  [heartbeat_timeout: <duration> | default = 1m]
# The number of ingesters to write to and read from. Must be at least
# 1.
[replication_factor: <int> | default = 3]
```
## storage_config
The `storage_config` block configures one of many possible stores for both the
index and chunks. Which configuration is read depends on the `schema_config`
block and what is set for the store value.
```yaml
# Configures storing chunks in AWS. Required options only required when aws is
# present.
aws:
# S3 or S3-compatible URL to connect to. If only region is specified as a
# host, the proper endpoint will be deduced. Use inmemory:///<bucket-name> to
# use a mock in-memory implementation.
s3: <string>
# Set to true to force the request to use path-style addressing
[s3forcepathstyle: <boolean> | default = false]
  # Configure the DynamoDB connection
dynamodbconfig:
# URL for DynamoDB with escaped Key and Secret encoded. If only region is specified as a
# host, the proper endpoint will be deduced. Use inmemory:///<bucket-name> to
# use a mock in-memory implementation.
dynamodb: <string>
# DynamoDB table management requests per-second limit.
[apilimit: <float> | default = 2.0]
# DynamoDB rate cap to back off when throttled.
[throttlelimit: <float> | default = 10.0]
# Application Autoscaling endpoint URL with escaped Key and Secret
# encoded.
[applicationautoscaling: <string>]
    # Metrics-based autoscaling configuration.
metrics:
# Use metrics-based autoscaling via this Prometheus query URL.
[url: <string>]
# Queue length above which we will scale up capacity.
[targetqueuelen: <int> | default = 100000]
# Scale up capacity by this multiple
[scaleupfactor: <float64> | default = 1.3]
# Ignore throttling below this level (rate per second)
[minthrottling: <float64> | default = 1]
# Query to fetch ingester queue length
[queuelengthquery: <string> | default = "sum(avg_over_time(cortex_ingester_flush_queue_length{job="cortex/ingester"}[2m]))"]
# Query to fetch throttle rates per table
[throttlequery: <string> | default = "sum(rate(cortex_dynamo_throttled_total{operation="DynamoDB.BatchWriteItem"}[1m])) by (table) > 0"]
      # Query to fetch write capacity usage per table
[usagequery: <string> | default = "sum(rate(cortex_dynamo_consumed_capacity_total{operation="DynamoDB.BatchWriteItem"}[15m])) by (table) > 0"]
# Query to fetch read capacity usage per table
[readusagequery: <string> | default = "sum(rate(cortex_dynamo_consumed_capacity_total{operation="DynamoDB.QueryPages"}[1h])) by (table) > 0"]
# Query to fetch read errors per table
[readerrorquery: <string> | default = "sum(increase(cortex_dynamo_failures_total{operation="DynamoDB.QueryPages",error="ProvisionedThroughputExceededException"}[1m])) by (table) > 0"]
# Number of chunks to group together to parallelise fetches (0 to disable)
[chunkgangsize: <int> | default = 10]
# Max number of chunk get operations to start in parallel.
[chunkgetmaxparallelism: <int> | default = 32]
# Configures storing the index in Bigtable. Required fields only required
# when bigtable is defined in config.
bigtable:
# BigTable project ID
project: <string>
# BigTable instance ID
instance: <string>
# Configures the gRPC client used to connect to Bigtable.
[grpc_client_config: <grpc_client_config>]
# Configures storing chunks in GCS. Required fields only required
# when gcs is defined in config.
gcs:
# Name of GCS bucket to put chunks in.
bucket_name: <string>
# The size of the buffer that the GCS client uses for each PUT request. 0
# to disable buffering.
[chunk_buffer_size: <int> | default = 0]
# The duration after which the requests to GCS should be timed out.
[request_timeout: <duration> | default = 0s]
# Configures storing chunks in Cassandra
cassandra:
# Comma-separated hostnames or IPs of Cassandra instances
addresses: <string>
# Port that cassandra is running on
[port: <int> | default = 9042]
# Keyspace to use in Cassandra
keyspace: <string>
# Consistency level for Cassandra
[consistency: <string> | default = "QUORUM"]
# Replication factor to use in Cassandra.
[replication_factor: <int> | default = 1]
# Instruct the Cassandra driver to not attempt to get host
# info from the system.peers table.
[disable_initial_host_lookup: <bool> | default = false]
# Use SSL when connecting to Cassandra instances.
[SSL: <boolean> | default = false]
# Require SSL certificate validation when SSL is enabled.
[host_verification: <bool> | default = true]
# Path to certificate file to verify the peer when SSL is
# enabled.
[CA_path: <string>]
# Enable password authentication when connecting to Cassandra.
[auth: <bool> | default = false]
# Username for password authentication when auth is true.
[username: <string>]
# Password for password authentication when auth is true.
[password: <string>]
# Timeout when connecting to Cassandra.
[timeout: <duration> | default = 600ms]
# Initial connection timeout during initial dial to server.
[connect_timeout: <duration> | default = 600ms]
# Configures storing index in BoltDB. Required fields only
# required when boltdb is present in config.
boltdb:
# Location of BoltDB index files.
directory: <string>
# Configures storing the chunks on the local filesystem. Required
# fields only required when filesystem is present in config.
filesystem:
# Directory to store chunks in.
directory: <string>
# Cache validity for active index entries. Should be no higher than
# the chunk_idle_period in the ingester settings.
[indexcachevalidity: <duration> | default = 5m]
# The maximum number of chunks to fetch per batch.
[max_chunk_batch_size: <int> | default = 50]
# Config for how the cache for index queries should
# be built.
index_queries_cache_config: <cache_config>
```
### cache_config
The `cache_config` block configures how Loki will cache requests, chunks, and
the index to a backing cache store.
```yaml
# Enable in-memory cache.
[enable_fifocache: <boolean>]
# The default validity of entries for caches unless overridden.
# "defaul" is correct.
[defaul_validity: <duration>]
# Configures the background cache when memcached is used.
background:
# How many goroutines to use to write back to memcached.
[writeback_goroutines: <int> | default = 10]
# How many chunks to buffer for background write back to memcached.
  [writeback_buffer: <int> | default = 10000]
# Configures memcached settings.
memcached:
# Configures how long keys stay in memcached.
expiration: <duration>
# Configures how many keys to fetch in each batch request.
batch_size: <int>
# Maximum active requests to memcached.
[parallelism: <int> | default = 100]
# Configures how to connect to one or more memcached servers.
memcached_client:
  # The hostname to use for memcached services when caching chunks. If
  # empty, no memcached will be used. An SRV lookup will be used.
[host: <string>]
# SRV service used to discover memcached servers.
[service: <string> | default = "memcached"]
# Maximum time to wait before giving up on memcached requests.
[timeout: <duration> | default = 100ms]
# The maximum number of idle connections in the memcached client
# pool.
[max_idle_conns: <int> | default = 100]
# The period with which to poll the DNS for memcached servers.
[update_interval: <duration> | default = 1m]
# Whether or not to use a consistent hash to discover multiple memcached
# servers.
[consistent_hash: <bool>]
fifocache:
# Number of entries to cache in-memory.
[size: <int> | default = 0]
# The expiry duration for the in-memory cache.
[validity: <duration> | default = 0s]
```
## chunk_store_config
The `chunk_store_config` block configures how chunks will be cached and how long
to wait before saving them to the backing store.
```yaml
# The cache configuration for storing chunks
[chunk_cache_config: <cache_config>]
# The cache configuration for deduplicating writes
[write_dedupe_cache_config: <cache_config>]
# The minimum time between a chunk update and being saved
# to the store.
[min_chunk_age: <duration>]
# Cache index entries older than this period. Default is
# disabled.
[cache_lookups_older_than: <duration>]
# Limit how far back data can be queried. Default is disabled.
[max_look_back_period: <duration>]
```
## schema_config
The `schema_config` block configures schemas from given dates.
```yaml
# The configuration for chunk index schemas.
configs:
- [<period_config>]
```
### period_config
The `period_config` block configures what index schemas should be used for
specific time periods.
```yaml
# The date of the first day that index buckets should be created. Use
# a date in the past if this is your only period_config, otherwise
# use a date when you want the schema to switch over.
[from: <daytime>]
# store and object_store below affect which <storage_config> key is
# used.
# Which store to use for the index. Either cassandra, bigtable, dynamodb, or
# boltdb
store: <string>
# Which store to use for the chunks. Either gcs, s3, inmemory, filesystem,
# cassandra. If omitted, defaults to same value as store.
[object_store: <string>]
# The schema to use. Set to v9 or v10.
schema: <string>
# Configures how the index is updated and stored.
index:
# Table prefix for all period tables.
prefix: <string>
# Table period.
[period: <duration> | default = 168h]
# A map to be added to all managed tables.
tags:
[<string>: <string> ...]
# Configures how the chunks are updated and stored.
chunks:
# Table prefix for all period tables.
prefix: <string>
# Table period.
[period: <duration> | default = 168h]
# A map to be added to all managed tables.
tags:
[<string>: <string> ...]
# How many shards will be created. Only used if schema is v10.
[row_shards: <int> | default = 16]
```
Where `daytime` is a value in the format of `yyyy-mm-dd` like `2006-01-02`.
## limits_config
The `limits_config` block configures global and per-tenant limits for ingesting
logs in Loki.
```yaml
# Per-user ingestion rate limit in samples per second.
[ingestion_rate: <float> | default = 25000]
# Per-user allowed ingestion burst size (in number of samples).
[ingestion_burst_size: <int> | default = 50000]
# Whether or not, for all users, samples with external labels
# identifying replicas in an HA Prometheus setup will be handled.
[accept_ha_samples: <boolean> | default = false]
# Prometheus label to look for in samples to identify a
# Prometheus HA cluster.
[ha_cluster_label: <string> | default = "cluster"]
# Prometheus label to look for in samples to identify a Prometheus HA
# replica.
[ha_replica_label: <string> | default = "__replica__"]
# Maximum length of a label name.
[max_label_name_length: <int> | default = 1024]
# Maximum length of a label value.
[max_label_value_length: <int> | default = 2048]
# Maximum number of label names per series.
[max_label_names_per_series: <int> | default = 30]
# Whether or not old samples will be rejected.
[reject_old_samples: <bool> | default = false]
# Maximum accepted sample age before rejecting.
[reject_old_samples_max_age: <duration> | default = 336h]
# Duration for a table to be created/deleted before/after it's
# needed. Samples won't be accepted before this time.
[creation_grace_period: <duration> | default = 10m]
# Enforce every sample has a metric name.
[enforce_metric_name: <boolean> | default = true]
# Maximum number of samples that a query can return.
[max_samples_per_query: <int> | default = 1000000]
# Maximum number of active series per user.
[max_series_per_user: <int> | default = 5000000]
# Maximum number of active series per metric name.
[max_series_per_metric: <int> | default = 50000]
# Maximum number of chunks that can be fetched by a single query.
[max_chunks_per_query: <int> | default = 2000000]
# The limit to length of chunk store queries. 0 to disable.
[max_query_length: <duration> | default = 0]
# Maximum number of queries that will be scheduled in parallel by the
# frontend.
[max_query_parallelism: <int> | default = 14]
# Cardinality limit for index queries
[cardinality_limit: <int> | default = 100000]
# Filename of per-user overrides file
[per_tenant_override_config: <string>]
# Period with which to reload the overrides file if configured.
[per_tenant_override_period: <duration> | default = 10s]
```
## table_manager_config
The `table_manager_config` block configures how the table manager operates
and how to provision tables when DynamoDB is used as the backing store.
```yaml
# Master 'off-switch' for table capacity updates, e.g. when troubleshooting
[throughput_updates_disabled: <boolean> | default = false]
# Master 'on-switch' for table retention deletions
[retention_deletes_enabled: <boolean> | default = false]
# How far back tables will be kept before they are deleted. 0s disables
# deletion.
[retention_period: <duration> | default = 0s]
# Period with which the table manager will poll for tables.
[dynamodb_poll_interval: <duration> | default = 2m]
# Duration a table will be created before it is needed.
[creation_grace_period: <duration> | default = 10m]
# Configures management of the index tables for DynamoDB.
index_tables_provisioning: <provision_config>
# Configures management of the chunk tables for DynamoDB.
chunk_tables_provisioning: <provision_config>
```
### provision_config
The `provision_config` block configures provisioning capacity for DynamoDB.
```yaml
# Enables on-demand throughput provisioning for the storage
# provider, if supported. Applies only to tables which are not autoscaled.
[provisioned_throughput_on_demand_mode: <boolean> | default = false]
# DynamoDB table default write throughput.
[provisioned_write_throughput: <int> | default = 3000]
# DynamoDB table default read throughput.
[provisioned_read_throughput: <int> | default = 300]
# Enables on-demand throughput provisioning for the storage provider,
# if supported. Applies only to tables which are not autoscaled.
[inactive_throughput_on_demand_mode: <boolean> | default = false]
# DynamoDB table write throughput for inactive tables.
[inactive_write_throughput: <int> | default = 1]
# DynamoDB table read throughput for inactive tables.
[inactive_read_throughput: <int> | default = 300]
# Active table write autoscale config.
[write_scale: <auto_scaling_config>]
# Inactive table write autoscale config.
[inactive_write_scale: <auto_scaling_config>]
# Number of last inactive tables to enable write autoscale.
[inactive_write_scale_lastn: <int>]
# Active table read autoscale config.
[read_scale: <auto_scaling_config>]
# Inactive table read autoscale config.
[inactive_read_scale: <auto_scaling_config>]
# Number of last inactive tables to enable read autoscale.
[inactive_read_scale_lastn: <int>]
```
#### auto_scaling_config
The `auto_scaling_config` block configures autoscaling for DynamoDB.
```yaml
# Whether or not autoscaling should be enabled.
[enabled: <boolean> | default = false]
# AWS AutoScaling role ARN
[role_arn: <string>]
# DynamoDB minimum provision capacity.
[min_capacity: <int> | default = 3000]
# DynamoDB maximum provision capacity.
[max_capacity: <int> | default = 6000]
# DynamoDB minimum seconds between each autoscale up.
[out_cooldown: <int> | default = 1800]
# DynamoDB minimum seconds between each autoscale down.
[in_cooldown: <int> | default = 1800]
# DynamoDB target ratio of consumed capacity to provisioned capacity.
[target: <float> | default = 80]
```

@ -0,0 +1,163 @@
# Loki Configuration Examples
1. [Complete Local Config](#complete-local-config)
2. [Google Cloud Storage](#google-cloud-storage)
3. [Cassandra Index](#cassandra-index)
4. [AWS](#aws)
## Complete Local Config
```yaml
auth_enabled: false
server:
http_listen_port: 3100
ingester:
lifecycler:
address: 127.0.0.1
ring:
kvstore:
store: inmemory
replication_factor: 1
final_sleep: 0s
chunk_idle_period: 5m
chunk_retain_period: 30s
schema_config:
configs:
- from: 2018-04-15
store: boltdb
object_store: filesystem
schema: v9
index:
prefix: index_
period: 168h
storage_config:
boltdb:
directory: /tmp/loki/index
filesystem:
directory: /tmp/loki/chunks
limits_config:
enforce_metric_name: false
reject_old_samples: true
reject_old_samples_max_age: 168h
chunk_store_config:
max_look_back_period: 0
table_manager:
chunk_tables_provisioning:
inactive_read_throughput: 0
inactive_write_throughput: 0
provisioned_read_throughput: 0
provisioned_write_throughput: 0
index_tables_provisioning:
inactive_read_throughput: 0
inactive_write_throughput: 0
provisioned_read_throughput: 0
provisioned_write_throughput: 0
retention_deletes_enabled: false
retention_period: 0
```
## Google Cloud Storage
This is a partial config that uses GCS and Bigtable for the chunk and index
stores, respectively.
```yaml
schema_config:
configs:
- from: 2018-04-15
store: bigtable
object_store: gcs
schema: v9
index:
prefix: loki_index_
period: 168h
storage_config:
bigtable:
instance: BIGTABLE_INSTANCE
project: BIGTABLE_PROJECT
gcs:
bucket_name: GCS_BUCKET_NAME
```
## Cassandra Index
This is a partial config that uses the local filesystem for chunk storage and
Cassandra for the index storage:
```yaml
schema_config:
configs:
- from: 2018-04-15
store: cassandra
object_store: filesystem
schema: v9
index:
prefix: cassandra_table
period: 168h
storage_config:
cassandra:
username: cassandra
password: cassandra
addresses: 127.0.0.1
auth: true
keyspace: lokiindex
filesystem:
directory: /tmp/loki/chunks
```
## AWS
This is a partial config that uses S3 for chunk storage and DynamoDB for the
index storage:
```yaml
schema_config:
configs:
- from: 0
store: dynamo
object_store: s3
schema: v9
index:
prefix: dynamodb_table_name
period: 0
storage_config:
aws:
s3: s3://access_key:secret_access_key@region/bucket_name
dynamodbconfig:
dynamodb: dynamodb://access_key:secret_access_key@region
```
If you don't wish to hard-code S3 credentials, you can also configure an EC2
instance role by changing the `storage_config` section:
```yaml
storage_config:
aws:
s3: s3://region/bucket_name
dynamodbconfig:
dynamodb: dynamodb://region
```
### S3-compatible APIs
S3-compatible APIs (e.g., Ceph Object Storage with an S3-compatible API) can be
used. If the API supports path-style URLs rather than virtual hosted bucket
addressing, configure the URL in `storage_config` with the custom endpoint:
```yaml
storage_config:
aws:
s3: s3://access_key:secret_access_key@custom_endpoint/bucket_name
s3forcepathstyle: true
```

@ -0,0 +1,6 @@
# Getting started with Loki
1. [Grafana](grafana.md)
2. [LogCLI](logcli.md)
3. [Troubleshooting](troubleshooting.md)

@ -0,0 +1,29 @@
# Loki in Grafana
Grafana ships with built-in support for Loki in versions
[6.0](https://grafana.com/grafana/download/6.0.0) and newer. Using
[6.3](https://grafana.com/grafana/download/6.3.0) or later is highly
recommended to take advantage of new LogQL functionality.
1. Log into your Grafana instance. If this is your first time running
   Grafana, the username and password both default to `admin`.
2. In Grafana, go to `Configuration` > `Data Sources` via the cog icon on the
left sidebar.
3. Click the big <kbd>+ Add data source</kbd> button.
4. Choose Loki from the list.
5. The http URL field should be the address of your Loki server. For example,
when running locally or with Docker using port mapping, the address is
likely `http://localhost:3100`. When running with docker-compose or
   Kubernetes, the address is likely `http://loki:3100`.
6. To see the logs, click <kbd>Explore</kbd> on the sidebar, select the Loki
datasource in the top-left dropdown, and then choose a log stream using the
<kbd>Log labels</kbd> button.
Read more about Grafana's Explore feature and how to search and filter for logs
with Loki in the [Grafana documentation](http://docs.grafana.org/features/explore).
> To configure the datasource via provisioning, see [Configuring Grafana via
> Provisioning](http://docs.grafana.org/features/datasources/loki/#configure-the-datasource-with-provisioning)
> in the Grafana documentation and make sure to adjust the URL similarly as
> shown above.
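As a sketch, a provisioned Loki data source might look like the following
(adjust the URL to match your environment):
```yaml
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://localhost:3100
```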

@ -1,15 +1,19 @@
# LogCLI
# Querying Loki with LogCLI
LogCLI is a handy tool to query logs from Loki without having to run a full Grafana instance.
If you prefer a command line interface, LogCLI also allows users to run LogQL
queries against a Loki server.
## Installation
### Binary (Recommended)
Head over to the [Releases](https://github.com/grafana/loki/releases) and download the `logcli` binary for your OS:
Navigate to the [Loki Releases page](https://github.com/grafana/loki/releases)
and download the `logcli` binary for your OS:
```bash
# download a binary (adapt app, os and arch as needed)
# installs v0.2.0. For up to date URLs refer to the release's description
$ curl -fSL -o "/usr/local/bin/logcli.gz" "https://github.com/grafana/logcli/releases/download/v0.2.0/logcli-linux-amd64.gz"
# installs v0.3.0. For up to date URLs refer to the release's description
$ curl -fSL -o "/usr/local/bin/logcli.gz" "https://github.com/grafana/logcli/releases/download/v0.3.0/logcli-linux-amd64.gz"
$ gunzip "/usr/local/bin/logcli.gz"
# make sure it is executable
@ -18,27 +22,34 @@ $ chmod a+x "/usr/local/bin/logcli"
### From source
Use `go get` to install `logcli` to `$GOPATH/bin`:
```
$ go get github.com/grafana/loki/cmd/logcli
```
`logcli` will now be in `$GOPATH/bin`.
## Usage
### Example
If you are running on Grafana cloud, use:
```
If you are running on Grafana Cloud, use:
```bash
$ export GRAFANA_ADDR=https://logs-us-west1.grafana.net
$ export GRAFANA_USERNAME=<username>
$ export GRAFANA_PASSWORD=<password>
```
Otherwise, when running e.g. [locally](https://github.com/grafana/loki/tree/master/production#run-locally-using-docker), point it to your Loki instance:
```
Otherwise you can point LogCLI to a local instance directly
without needing a username and password:
```bash
$ export GRAFANA_ADDR=http://localhost:3100
```
> Note: If you are running loki behind a proxy server and have an authentication setup, you will have to pass URL, username and password accordingly. Please refer to [Authentication](loki/operations.md#authentication) for more info.
> Note: If you are running Loki behind a proxy server and you have
> authentication configured, you will also have to pass in GRAFANA_USERNAME
> and GRAFANA_PASSWORD accordingly.
```bash
$ logcli labels job
@ -48,7 +59,7 @@ cortex-ops/cortex-gw
...
$ logcli query '{job="cortex-ops/consul"}'
https://logs-dev-ops-tools1.grafana.net/api/v1/query_range?query=%7Bjob%3D%22cortex-ops%2Fconsul%22%7D&limit=30&start=1529928228&end=1529931828&direction=backward&regexp=
https://logs-dev-ops-tools1.grafana.net/api/prom/query?query=%7Bjob%3D%22cortex-ops%2Fconsul%22%7D&limit=30&start=1529928228&end=1529931828&direction=backward&regexp=
Common labels: {job="cortex-ops/consul", namespace="cortex-ops"}
2018-06-25T12:52:09Z {instance="consul-8576459955-pl75w"} 2018/06/25 12:52:09 [INFO] raft: Snapshot to 475409 complete
2018-06-25T12:52:09Z {instance="consul-8576459955-pl75w"} 2018/06/25 12:52:09 [INFO] raft: Compacting logs from 456973 to 465169
@ -61,8 +72,6 @@ Configuration values are considered in the following order (lowest to highest):
- Environment variables
- Command line flags
The URLs of the requests are printed to help with integration work.
### Details
```bash
@ -75,8 +84,7 @@ Flags:
--help Show context-sensitive help (also try --help-long and --help-man).
-q, --quiet suppress everything but log lines
-o, --output=default specify output mode [default, raw, jsonl]
-z, --timezone=Local Specify the timezone to use when formatting output timestamps [Local, UTC]
--addr="https://logs-us-west1.grafana.net"
Server address.
--username="" Username for HTTP basic auth.
--password="" Password for HTTP basic auth.
@ -89,17 +97,14 @@ Commands:
help [<command>...]
Show help.
query [<flags>] <query> [<regex>]
Run a LogQL query.
instant-query [<flags>] <query>
Run an instant LogQL query
labels [<label>]
Find values for a given label.
$ logcli help query
usage: logcli query [<flags>] <query> [<regex>]
Run a LogQL query.
@ -107,8 +112,7 @@ Flags:
--help Show context-sensitive help (also try --help-long and --help-man).
-q, --quiet suppress everything but log lines
-o, --output=default specify output mode [default, raw, jsonl]
-z, --timezone=Local Specify the timezone to use when formatting output timestamps [Local, UTC]
--addr="https://logs-us-west1.grafana.net"
Server address.
--username="" Username for HTTP basic auth.
--password="" Password for HTTP basic auth.
@ -121,15 +125,16 @@ Flags:
--from=FROM Start looking for logs at this absolute time (inclusive)
--to=TO Stop looking for logs at this absolute time (exclusive)
--forward Scan forwards through logs.
--no-labels Do not print any labels
--exclude-label=EXCLUDE-LABEL ...
Exclude labels given the provided key during output.
--include-label=INCLUDE-LABEL ...
Include labels given the provided key during output.
--labels-length=0 Set a fixed padding to labels
-t, --tail Tail the logs
--delay-for=0 Delay in tailing by number of seconds to accumulate logs for re-ordering
Args:
<query> eg '{foo="bar",baz="blip"}'
[<regex>]
```

@ -0,0 +1,126 @@
# Troubleshooting Loki
## "Loki: Bad Gateway. 502"
This error can appear in Grafana when Loki is added as a
datasource, indicating that Grafana is unable to connect to Loki. There may
be one of many root causes:
- If Loki is deployed with Docker, and Grafana and Loki are not running in the
same node, check your firewall to make sure the nodes can connect.
- If Loki is deployed with Kubernetes:
- If Grafana and Loki are in the same namespace, set the Loki URL as
`http://$LOKI_SERVICE_NAME:$LOKI_PORT`
- Otherwise, set the Loki URL as
`http://$LOKI_SERVICE_NAME.$LOKI_NAMESPACE:$LOKI_PORT`
## "Data source connected, but no labels received. Verify that Loki and Promtail is configured properly."
This error can appear in Grafana when Loki is added as a datasource, indicating
that although Grafana has connected to Loki, Loki hasn't received any logs from
Promtail yet. There may be one of many root causes:
- Promtail is running and collecting logs but is unable to connect to Loki to
send the logs. Check Promtail's output.
- Promtail started sending logs to Loki before Loki was ready. This can
happen in test environments where Promtail has already read all logs and sent
them off. Here is what you can do:
- Start Promtail after Loki, e.g., 60 seconds later.
- To force Promtail to re-send log messages, delete the positions file
(default location `/tmp/positions.yaml`).
- Promtail is ignoring targets and isn't reading any logs because of a
configuration issue.
- This can be detected by turning on debug logging in Promtail and looking
for `dropping target, no labels` or `ignoring target` messages.
- Promtail cannot find the location of your log files. Check that the
`scrape_configs` contains valid path settings for finding the logs on your
worker nodes.
- Your pods are running with different labels than the ones Promtail is
configured to read. Check `scrape_configs` to validate.
## Troubleshooting targets
Promtail exposes two web pages that can be used to understand how its service
discovery works.
The service discovery page (`/service-discovery`) shows all
discovered targets with their labels before and after relabeling as well as
the reason why the target has been dropped.
The targets page (`/targets`) displays only targets that are being actively
scraped and their respective labels, files, and positions.
On Kubernetes, you can access those two pages by port-forwarding the Promtail
port (`9080` or `3101` if using Helm) locally:
```bash
$ kubectl port-forward loki-promtail-jrfg7 9080
# Then, in a web browser, visit http://localhost:9080/service-discovery
```
## Debug output
Both `loki` and `promtail` support a log level flag on the command-line:
```bash
$ loki -log.level=debug
$ promtail -log.level=debug
```
## Failed to create target, `ioutil.ReadDir: readdirent: not a directory`
The Promtail configuration contains a `__path__` entry to a directory that
Promtail cannot find.
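For reference, a minimal `scrape_configs` entry with a valid `__path__` glob
looks roughly like the following sketch (the job name, labels, and path are
illustrative):

```yaml
scrape_configs:
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          # __path__ must point at the log files to tail; globs are allowed
          __path__: /var/log/*.log
```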
## Connecting to a Promtail pod to troubleshoot
First check the [Troubleshooting targets](#troubleshooting-targets) section above.
If that doesn't help answer your questions, you can connect to the Promtail pod
to investigate further.
If you are running Promtail as a DaemonSet in your cluster, you will have a
Promtail pod on each node, so figure out which Promtail you need to debug first:
```shell
$ kubectl get pods --all-namespaces -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
...
nginx-7b6fb56fb8-cw2cm 1/1 Running 0 41d 10.56.4.12 node-ckgc <none>
...
promtail-bth9q 1/1 Running 0 3h 10.56.4.217 node-ckgc <none>
```
That output is truncated to highlight just the two pods we are interested in;
the `-o wide` flag shows the NODE on which each pod is running.
You'll want to match the node of the pod you are interested in (in this
example, NGINX) to the Promtail pod running on the same node.
To debug you can connect to the Promtail pod:
```shell
kubectl exec -it promtail-bth9q -- /bin/sh
```
Once connected, verify the config in `/etc/promtail/promtail.yml` has the
contents you expect.
Also check `/var/log/positions.yaml` (`/run/promtail/positions.yaml` when
deployed by Helm or whatever value is specified for `positions.file`) and make
sure Promtail is tailing the logs you would expect.
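The positions file is a small YAML map from each tailed file path to the last
read offset, roughly like the following sketch (the path and offset are
illustrative):

```yaml
positions:
  /var/log/nginx/access.log: "48653"
```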
You can check the Promtail log by looking in `/var/log/containers` at the
Promtail container log.
## Enable tracing for Loki
Loki can be traced using [Jaeger](https://www.jaegertracing.io/) by setting
the environment variable `JAEGER_AGENT_HOST` on the Loki process to the
hostname and port of your Jaeger agent.
If you deploy with Helm, use the following command:
```bash
$ helm upgrade --install loki loki/loki --set "loki.tracing.jaegerAgentHost=YOUR_JAEGER_AGENT_HOST"
```
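If you are not deploying with Helm, export the variable in the environment of
the Loki process instead. For example, a hypothetical Docker invocation (the
agent hostname and image tag are placeholders, and the image's default
configuration is used):

```bash
$ docker run -e JAEGER_AGENT_HOST=jaeger-agent.example.com grafana/loki:v0.3.0
```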

@ -0,0 +1,5 @@
# Installing Loki
1. [Installing using Tanka (recommended)](./tanka.md)
2. [Installing through Helm](./helm.md)
3. [Installing locally](./local.md)

@ -0,0 +1,108 @@
# Installing Loki with Helm
## Prerequisites
Make sure you have Helm [installed](https://helm.sh/docs/using_helm/#installing-helm) and
[deployed](https://helm.sh/docs/using_helm/#installing-tiller) to your cluster. Then add
Loki's chart repository to Helm:
```bash
$ helm repo add loki https://grafana.github.io/loki/charts
```
You can update the chart repository by running:
```bash
$ helm repo update
```
## Deploy Loki to your cluster
### Deploy with default config
```bash
$ helm upgrade --install loki loki/loki-stack
```
### Deploy in a custom namespace
```bash
$ helm upgrade --install loki --namespace=loki loki/loki
```
### Deploy with custom config
```bash
$ helm upgrade --install loki loki/loki --set "key1=val1,key2=val2,..."
```
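Alternatively, if you keep your overrides in a local values file (the file
name `values.yaml` here is just an example), you can pass the whole file
instead of individual `--set` flags:

```bash
$ helm upgrade --install loki loki/loki -f values.yaml
```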
### Deploy Loki Stack (Loki, Promtail, Grafana, Prometheus)
```bash
$ helm upgrade --install loki loki/loki-stack
```
## Deploy Grafana to your cluster
To install Grafana on your cluster with Helm, use the following command:
```bash
$ helm install stable/grafana -n loki-grafana
```
To get the admin password for the Grafana pod, run the following command:
```bash
$ kubectl get secret --namespace <YOUR-NAMESPACE> loki-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
```
To access the Grafana UI, run the following command:
```bash
$ kubectl port-forward --namespace <YOUR-NAMESPACE> service/loki-grafana 3000:80
```
Navigate to `http://localhost:3000` and log in with `admin` and the password
output above. Then follow the [instructions for adding the Loki Data Source](../getting-started/grafana.md), using the URL
`http://loki:3100/` for Loki.
## Run Loki behind HTTPS ingress
If Loki and Promtail are deployed on different clusters you can add an Ingress
in front of Loki. By adding a certificate you create an HTTPS endpoint. For
extra security you can also enable Basic Authentication on the Ingress.
In Promtail, set the following values to communicate using HTTPS and basic
authentication:
```yaml
loki:
serviceScheme: https
user: user
password: pass
```
Sample Helm template for Ingress:
```yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
annotations:
kubernetes.io/ingress.class: {{ .Values.ingress.class }}
ingress.kubernetes.io/auth-type: "basic"
ingress.kubernetes.io/auth-secret: {{ .Values.ingress.basic.secret }}
name: loki
spec:
rules:
- host: {{ .Values.ingress.host }}
http:
paths:
- backend:
serviceName: loki
servicePort: 3100
tls:
- secretName: {{ .Values.ingress.cert }}
hosts:
- {{ .Values.ingress.host }}
```
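The `ingress.kubernetes.io/auth-secret` annotation above references a secret
containing the htpasswd data. As a rough sketch (the secret name
`loki-basic-auth` is illustrative and must match `.Values.ingress.basic.secret`),
it could be created like this:

```bash
# create an htpasswd file named "auth" for the user "user" (prompts for a password)
$ htpasswd -c auth user
# create the secret referenced by the Ingress annotations
$ kubectl create secret generic loki-basic-auth --from-file=auth
```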

@ -0,0 +1,42 @@
# Installing Loki Locally
## Release Binaries
Every [Loki release](https://github.com/grafana/loki/releases) includes
prebuilt binaries:
```bash
# download a binary (modify app, os, and arch as needed)
# Installs v0.3.0. Go to the releases page for the latest version
$ curl -fSL -o "/usr/local/bin/loki.gz" "https://github.com/grafana/loki/releases/download/v0.3.0/loki_linux_amd64.gz"
$ gunzip "/usr/local/bin/loki.gz"
# make sure it is executable
$ chmod a+x "/usr/local/bin/loki"
```
## Manual Build
### Prerequisites
- Go 1.11 or later
- Make
- Docker (for updating protobuf files and yacc files)
### Building
Clone Loki to `$GOPATH/src/github.com/grafana/loki`:
```bash
$ git clone https://github.com/grafana/loki $(go env GOPATH)/src/github.com/grafana/loki
```
Then change into that directory and run `make loki`:
```bash
$ cd $(go env GOPATH)/src/github.com/grafana/loki
$ make loki
# A file at ./cmd/loki/loki will be created and is the
# final built binary.
```
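You can then start the freshly built binary; as a sketch, assuming the sample
configuration file shipped in the repository at `cmd/loki/loki-local-config.yaml`
is still present at that path:

```bash
$ ./cmd/loki/loki -config.file=./cmd/loki/loki-local-config.yaml
```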

@ -0,0 +1,66 @@
# Installing Loki with Tanka
[Tanka](https://tanka.dev) is a reimplementation of
[Ksonnet](https://ksonnet.io) that Grafana Labs created after Ksonnet was
deprecated. Tanka is used by Grafana Labs to run Loki in production.
## Prerequisites
Grab the latest version of Tanka (at least version v0.5.0) for the `tk env`
commands. Prebuilt binaries for Tanka can be found at the [Tanka releases
URL](https://github.com/grafana/tanka/releases).
In your config repo, if you don't have a Tanka application, create a folder and
call `tk init` inside of it. Then create an environment for Loki and provide the
URL for the Kubernetes API server to deploy to (e.g., `https://localhost:6443`):
```bash
$ mkdir <application name>
$ cd <application name>
$ tk init
$ tk env add environments/loki --namespace=loki --server=<Kubernetes API server>
```
## Deploying
Grab the Loki module using `jb`:
```bash
$ go get -u github.com/jsonnet-bundler/jsonnet-bundler/cmd/jb
$ jb init
$ jb install github.com/grafana/loki/production/ksonnet/loki
```
Making sure to set the values for username, password, and the `htpasswd`
contents properly, replace the contents of `environments/loki/main.jsonnet` with:
```jsonnet
local gateway = import 'loki/gateway.libsonnet';
local loki = import 'loki/loki.libsonnet';
local promtail = import 'promtail/promtail.libsonnet';
loki + promtail + gateway {
_config+:: {
namespace: 'loki',
htpasswd_contents: 'loki:$apr1$H4yGiGNg$ssl5/NymaGFRUvxIV1Nyr.',
promtail_config: {
scheme: 'http',
hostname: 'gateway.%(namespace)s.svc' % $._config,
username: 'loki',
password: 'password',
container_root_path: '/var/lib/docker',
},
replication_factor: 3,
consul_replicas: 1,
},
}
```
Notice that `container_root_path` should be the data root of your Docker
daemon; run `docker info | grep "Root Dir"` to find it.
Run `tk show environments/loki` to see the manifests that will be deployed to the cluster and
finally run `tk apply environments/loki` to deploy it.

@ -1,5 +0,0 @@
# logentry
Both the Docker Driver and Promtail support transformations on received log
entries to control what data is sent to Loki. Please see the documentation
on how to [process log lines](processing-log-lines.md) for more information.

@ -1,621 +0,0 @@
# Processing Log Lines
A detailed look at how to set up Promtail to process your log lines, including extracting metrics and labels.
* [Pipeline](#pipeline)
* [Stages](#stages)
## Pipeline
Pipeline stages implement the following interface:
```go
type Stage interface {
Process(labels model.LabelSet, extracted map[string]interface{}, time *time.Time, entry *string)
}
```
Any Stage is capable of modifying the `labels`, `extracted` data, `time`, and/or `entry`, though generally a Stage should only modify one of those things to reduce complexity.
Typical pipelines will start with a [regex](#regex) or [json](#json) stage to extract data from the log line. Then any combination of other stages follow to use the data in the `extracted` map. It may also be common to see the use of [match](#match) at the start of a pipeline to selectively apply stages based on labels.
The example below gives a good glimpse of what you can achieve with a pipeline:
```yaml
scrape_configs:
- job_name: kubernetes-pods-name
kubernetes_sd_configs: ....
pipeline_stages:
- match:
selector: '{name="promtail"}'
stages:
- regex:
expression: '.*level=(?P<level>[a-zA-Z]+).*ts=(?P<timestamp>[T\d-:.Z]*).*component=(?P<component>[a-zA-Z]+)'
- labels:
level:
component:
- timestamp:
format: RFC3339Nano
source: timestamp
- match:
selector: '{name="nginx"}'
stages:
- regex:
expression: \w{1,3}.\w{1,3}.\w{1,3}.\w{1,3}(?P<output>.*)
- output:
source: output
- match:
selector: '{name="jaeger-agent"}'
stages:
- json:
expressions:
level: level
- labels:
level:
- job_name: kubernetes-pods-app
kubernetes_sd_configs: ....
pipeline_stages:
- match:
selector: '{app=~"grafana|prometheus"}'
stages:
- regex:
expression: ".*(lvl|level)=(?P<level>[a-zA-Z]+).*(logger|component)=(?P<component>[a-zA-Z]+)"
- labels:
level:
component:
- match:
selector: '{app="some-app"}'
stages:
- regex:
expression: ".*(?P<panic>panic: .*)"
- metrics:
- panic_total:
type: Counter
description: "total count of panic"
source: panic
config:
action: inc
```
In the first job:
The first `match` stage will only run if a label named `name` == `promtail`. It then applies a regex to parse the line, followed by setting two labels (level and component) and the timestamp from the extracted data.
The second `match` stage will only run if a label named `name` == `nginx`. It then parses the log line with a regex and extracts `output`, which is then set as the log line sent to Loki.
The third `match` stage will only run if a label named `name` == `jaeger-agent`. It then parses the log line as JSON, extracting `level`, which is then set as a label.
In the second job:
The first `match` stage will only run if a label named `app` == `grafana` or `prometheus`. It then parses the log line with a regex and sets two new labels, level and component, from the extracted data.
The second `match` stage will only run if a label named `app` == `some-app`. It then parses the log line and creates an extracted key named `panic` if it finds `panic: ` in the log line. A metrics stage then increments a counter if the extracted key `panic` is found in the `extracted` map.
More info on each field in the interface:
##### labels model.LabelSet
A set of prometheus style labels which will be sent with the log line and will be indexed by Loki.
##### extracted map[string]interface{}
metadata extracted during the pipeline execution which can be used by subsequent stages. This data is not sent with the logs and is dropped after the log entry is processed through the pipeline.
For example, stages like [regex](#regex) and [json](#json) will use expressions to extract data from a log line and store it in the `extracted` map, which following stages like [timestamp](#timestamp) or [output](#output) can use to manipulate the log lines `time` and `entry`.
##### time *time.Time
The timestamp which Loki will store for the log line. If not set within the pipeline using the [timestamp](#timestamp) stage, it defaults to `time.Now()`.
##### entry *string
The log line which will be stored by Loki. The [output](#output) stage is capable of modifying this value; if no stage modifies it, the stored log line will match what was input to the system.
## Stages
Extracting data (for use by other stages)
* [regex](#regex) - use regex to extract data
* [json](#json) - parse a JSON log and extract data
Modifying extracted data
* [template](#template) - use Go templates to modify extracted data
Filtering stages
* [match](#match) - apply selectors to conditionally run stages based on labels
Mutating/manipulating output
* [timestamp](#timestamp) - set the timestamp sent to Loki
* [output](#output) - set the log content sent to Loki
Adding Labels
* [labels](#labels) - add labels to the log stream
Metrics
* [metrics](#metrics) - calculate metrics from the log content
### regex
A regex stage will take the provided regex and set the named groups as data in the `extracted` map.
```yaml
- regex:
expression: ①
source: ②
```
`expression` is **required** and needs to be a [golang RE2 regex string](https://github.com/google/re2/wiki/Syntax). Every capture group `(re)` will be set into the `extracted` map, every capture group **must be named:** `(?P<name>re)`, the name will be used as the key in the map.
`source` is optional and contains the name of key in the `extracted` map containing the data to parse. If omitted, the regex stage will parse the log `entry`.
##### Example (without source):
```yaml
- regex:
expression: "^(?s)(?P<time>\\S+?) (?P<stream>stdout|stderr) (?P<flags>\\S+?) (?P<content>.*)$"
```
Log line: `2019-01-01T01:00:00.000000001Z stderr P i'm a log message!`
Would create the following `extracted` map:
```go
{
"time": "2019-01-01T01:00:00.000000001Z",
"stream": "stderr",
"flags": "P",
"content": "i'm a log message",
}
```
These map entries can then be used by other pipeline stages such as [timestamp](#timestamp) and/or [output](#output)
[Example in unit test](../../pkg/logentry/stages/regex_test.go)
##### Example (with source):
```yaml
- json:
expressions:
time:
- regex:
expression: "^(?P<year>\\d+)"
source: "time"
```
Log line: `{"time":"2019-01-01T01:00:00.000000001Z"}`
Would create the following `extracted` map:
```go
{
"time": "2019-01-01T01:00:00.000000001Z",
"year": "2019"
}
```
These map entries can then be used by other pipeline stages such as [timestamp](#timestamp) and/or [output](#output)
### json
A json stage will take the provided [JMESPath expressions](http://jmespath.org/) and set the key/value data in the `extracted` map.
```yaml
- json:
expressions: ①
key: expression ②
source: ③
```
`expressions` is a required yaml object containing key/value pairs of JMESPath expressions
`key: expression` where `key` will be the key in the `extracted` map, and the value will be the evaluated JMESPath expression.
`source` is optional and contains the name of key in the `extracted` map containing the json to parse. If omitted, the json stage will parse the log `entry`.
This stage uses the Go JSON unmarshaller, which means non string types like numbers or booleans will be unmarshalled into those types. The `extracted` map will accept non-string values and this stage will keep primitive types as they are unmarshalled (e.g. bool or float64). Downstream stages will need to perform correct type conversion of these values as necessary.
If the value is a complex type, for example a JSON object, it will be marshalled back to JSON before being put in the `extracted` map.
##### Example (without source):
```yaml
- json:
expressions:
output: log
stream: stream
timestamp: time
```
Log line: `{"log":"log message\n","stream":"stderr","time":"2019-04-30T02:12:41.8443515Z"}`
Would create the following `extracted` map:
```go
{
"output": "log message\n",
"stream": "stderr",
"timestamp": "2019-04-30T02:12:41.8443515"
}
```
[Example in unit test](../../pkg/logentry/stages/json_test.go)
##### Example (with source):
```yaml
- json:
expressions:
output: log
stream: stream
timestamp: time
extra:
- json:
expressions:
user:
source: extra
```
Log line: `{"log":"log message\n","stream":"stderr","time":"2019-04-30T02:12:41.8443515Z","extra":"{\"user\":\"marco\"}"}`
Would create the following `extracted` map:
```go
{
"output": "log message\n",
"stream": "stderr",
"timestamp": "2019-04-30T02:12:41.8443515",
"extra": "{\"user\":\"marco\"}",
"user": "marco"
}
```
### template
A template stage lets you manipulate the values in the `extracted` data map using [Go's template package](https://golang.org/pkg/text/template/). This can be useful if you want to manipulate data extracted by regex or json stages before setting label values. Maybe to replace all spaces with underscores or make everything lowercase, or append some values to the extracted data.
You can set values in the extracted map for keys that did not previously exist.
```yaml
- template:
source: ①
template: ②
```
`source` is **required** and is the key to the value in the `extracted` data map you wish to modify, this key does __not__ have to be present and will be added if missing.
`template` is **required** and is a [Go template string](https://golang.org/pkg/text/template/)
The value of the extracted data map is accessed by using `.Value` in your template
In addition to normal template syntax, several functions have also been mapped to use directly or in a pipe configuration:
```go
"ToLower": strings.ToLower,
"ToUpper": strings.ToUpper,
"Replace": strings.Replace,
"Trim": strings.Trim,
"TrimLeft": strings.TrimLeft,
"TrimRight": strings.TrimRight,
"TrimPrefix": strings.TrimPrefix,
"TrimSuffix": strings.TrimSuffix,
"TrimSpace": strings.TrimSpace,
```
##### Example
```yaml
- template:
source: app
template: '{{ .Value }}_some_suffix'
```
This would take the value of the `app` key in the `extracted` data map and append `_some_suffix` to it. For example, if `app=loki` the new value for `app` in the map would be `loki_some_suffix`
```yaml
- template:
source: app
template: '{{ ToLower .Value }}'
```
This would take the value of `app` from `extracted` data and lowercase all the letters. If `app=LOKI` the new value for `app` would be `loki`.
The template syntax passes parameters to functions using space delimiters; functions taking only a single argument can also use the pipe syntax:
```yaml
- template:
source: app
template: '{{ .Value | ToLower }}'
```
A more complicated function example:
```yaml
- template:
source: app
template: '{{ Replace .Value "loki" "bloki" 1 }}'
```
The arguments here are described for the [Replace function](https://golang.org/pkg/strings/#Replace); in this example we replace, in the string `.Value` (our extracted value for the `app` key), the occurrence of the string "loki" with the string "bloki" exactly 1 time.
[More examples in unit test](../../pkg/logentry/stages/template_test.go)
### match
A match stage will take the provided label `selector` and determine if a group of provided Stages will be executed or not based on labels
```yaml
- match:
selector: "{app=\"loki\"}" ①
pipeline_name: loki_pipeline ②
stages: ③
```
`selector` is **required** and must be a [logql stream selector](../querying.md#log-stream-selector).
`pipeline_name` is **optional** but when defined, will create an additional label on the `pipeline_duration_seconds` histogram, the value for `pipeline_name` will be concatenated with the `job_name` using an underscore: `job_name`_`pipeline_name`
`stages` is a **required** list of additional pipeline stages which will only be executed if the defined `selector` matches the labels. The format is a list of pipeline stages which is defined exactly the same as the root pipeline
[Example in unit test](../../pkg/logentry/stages/match_test.go)
### timestamp
A timestamp stage will parse data from the `extracted` map and set the `time` value which will be stored by Loki. The timestamp stage is important for having log entries in the correct order. In the absence of this stage, Promtail will associate the current timestamp with the log entry.
```yaml
- timestamp:
source: ①
format: ②
location: ③
```
`source` is **required** and is the key name to data in the `extracted` map.
`format` is **required** and is the input to Go's [time.parse](https://golang.org/pkg/time/#Parse) function.
`location` is **optional** and is an IANA Timezone Database string, see the [go docs](https://golang.org/pkg/time/#LoadLocation) for more info
Several of Go's pre-defined formats can be used by their name:
```go
ANSIC = "Mon Jan _2 15:04:05 2006"
UnixDate = "Mon Jan _2 15:04:05 MST 2006"
RubyDate = "Mon Jan 02 15:04:05 -0700 2006"
RFC822 = "02 Jan 06 15:04 MST"
RFC822Z = "02 Jan 06 15:04 -0700" // RFC822 with numeric zone
RFC850 = "Monday, 02-Jan-06 15:04:05 MST"
RFC1123 = "Mon, 02 Jan 2006 15:04:05 MST"
RFC1123Z = "Mon, 02 Jan 2006 15:04:05 -0700" // RFC1123 with numeric zone
RFC3339 = "2006-01-02T15:04:05-07:00"
RFC3339Nano = "2006-01-02T15:04:05.999999999-07:00"
```
Additionally, common Unix timestamp formats are supported:
```go
Unix = 1562708916
UnixMs = 1562708916414
UnixNs = 1562708916000000123
```
Finally any custom format can be supplied, and will be passed directly in as the layout parameter in `time.Parse()`. If the custom format has no year component specified (ie. syslog's default logs), promtail will assume the current year should be used, correctly handling the edge cases around new year's eve.
The syntax used by the custom format defines the reference date and time using specific values for each component of the timestamp (ie. `Mon Jan 2 15:04:05 -0700 MST 2006`). The following table shows supported reference values which should be used in the custom format.
| Timestamp component | Format value |
| ------------------- | ------------ |
| Year | `06`, `2006` |
| Month | `1`, `01`, `Jan`, `January` |
| Day | `2`, `02`, `_2` (two digits right justified) |
| Day of the week | `Mon`, `Monday` |
| Hour | `3` (12-hour), `03` (12-hour zero prefixed), `15` (24-hour) |
| Minute | `4`, `04` |
| Second | `5`, `05` |
| Fraction of second | `.000` (ms zero prefixed), `.000000` (μs), `.000000000` (ns), `.999` (ms without trailing zeroes), `.999999` (μs), `.999999999` (ns) |
| 12-hour period | `pm`, `PM` |
| Timezone name | `MST` |
| Timezone offset | `-0700`, `-070000` (with seconds), `-07`, `07:00`, `-07:00:00` (with seconds) |
| Timezone ISO-8601 | `Z0700` (Z for UTC or time offset), `Z070000`, `Z07`, `Z07:00`, `Z07:00:00`
_For more details, read the [`time.Parse()`](https://golang.org/pkg/time/#Parse) docs and [`format.go`](https://golang.org/src/time/format.go) sources._
##### Example:
```yaml
- timestamp:
source: time
format: RFC3339Nano
```
This stage would be placed after the [regex](#regex) example stage above, and the resulting `extracted` map _time_ value would be stored by Loki.
[Example in unit test](../../pkg/logentry/stages/timestamp_test.go)
### output
An output stage will take data from the `extracted` map and set the `entry` value which will be stored by Loki.
```yaml
- output:
source: ①
```
`source` is **required** and is the key name to data in the `extracted` map.
##### Example:
```yaml
- output:
source: content
```
This stage would be placed after the [regex](#regex) example stage above, and the resulting `extracted` map _content_ value would be stored as the log value by Loki.
[Example in unit test](../../pkg/logentry/stages/output_test.go)
### labels
A label stage will take data from the `extracted` map and set additional `labels` on the log line.
```yaml
- labels:
label_name: source ①②
```
`label_name` is **required** and will be the name of the label added.
`"source"` is **optional**, if not provided the label_name is used as the source key into the `extracted` map
##### Example:
```yaml
- labels:
stream:
```
This stage when placed after the [regex](#regex) example stage above, would create the following `labels`:
```go
{
"stream": "stderr",
}
```
[Example in unit test](../../pkg/logentry/stages/labels_test.go)
### metrics
A metrics stage will define and update metrics from `extracted` data.
[Simple example in unit test](../../pkg/logentry/stages/metrics_test.go)
Several metric types are available:
#### Counter
```yaml
- metrics:
counter_name: ①
type: Counter ②
description: ③
source: ④
config:
value: ⑤
action: ⑥
```
`counter_name` is **required** and should be set to the desired counter name.
`type` is **required** and should be the word `Counter` (case insensitive).
`description` is **optional** but recommended.
`source` is **optional** and will be used as the key in the `extracted` data map; if not provided, it defaults to the `counter_name`.
`value` is **optional**, if present, the metric will only be operated on if `value` == `extracted[source]`. For example, if `value` is _panic_ then the counter will only be modified if `extracted[source] == "panic"`.
`action` is **required** and must be either `inc` or `add` (case insensitive). If `add` is chosen, the value of the `extracted` data will be used as the parameter to the method and therefore must be convertible to a positive float.
##### Examples
```yaml
- metrics:
log_lines_total:
type: Counter
description: "total number of log lines"
source: time
config:
action: inc
```
This counter will increment whenever the _time_ key is present in the `extracted` map, since every log entry should have a timestamp this is a good field to pick if you wanted to count every line. Notice `value` is missing here because we don't care what the value is, we want to match every timestamp. Also we use `inc` because we are not interested in the value of the extracted _time_ field.
```yaml
- regex:
expression: "^.*(?P<order_success>order successful).*$"
- metrics:
succesful_orders_total:
type: Counter
description: "log lines with the message `order successful`"
source: order_success
config:
action: inc
```
This combo regex and counter would count any log line which has the words `order successful` in it.
```yaml
- regex:
expression: "^.* order_status=(?P<order_status>.*?) .*$"
- metrics:
succesful_orders_total:
type: Counter
description: "successful orders"
source: order_status
config:
value: success
action: inc
failed_orders_total:
type: Counter
description: "failed orders"
source: order_status
config:
fail: fail
action: inc
```
Similarly, this would look for a key=value pair of `order_status=success` or `order_status=fail` and increment each counter respectively.
#### Gauge
```yaml
- metrics:
gauge_name: ①
type: Gauge ②
description: ③
source: ④
config:
value: ⑤
action: ⑥
```
`gauge_name` is **required** and should be set to the desired gauge name.
`type` is **required** and should be the word `Gauge` (case insensitive).
`description` is **optional** but recommended.
`source` is **optional** and will be used as the key in the `extracted` data map; if not provided, it defaults to the `gauge_name`.
`value` is **optional**, if present, the metric will only be operated on if `value` == `extracted[source]`. For example, if `value` is _panic_ then the counter will only be modified if `extracted[source] == "panic"`.
`action` is **required** and must be either `set`, `inc`, `dec`, `add` or `sub` (case insensitive). If `add`, `set`, or `sub`, is chosen, the value of the `extracted` data will be used as the parameter to the method and therefore must be convertible to a positive float.
##### Example
Gauge examples will be very similar to Counter examples with additional `action` values
#### Histogram
```yaml
- metrics:
histogram_name: ①
type: Histogram ②
description: ③
source: ④
config:
value: ⑤
buckets: [] ⑥⑦
```
`histogram_name` is **required** and should be set to the desired histogram name.
`type` is **required** and should be the word `Histogram` (case insensitive).
`description` is **optional** but recommended.
`source` is **optional** and will be used as the key in the `extracted` data map; if not provided, it defaults to the `histogram_name`.
`value` is **optional**, if present, the metric will only be operated on if `value` == `extracted[source]`. For example, if `value` is _panic_ then the counter will only be modified if `extracted[source] == "panic"`.
`action` is **required** and must be either `inc` or `add` (case insensitive). If `add` is chosen, the value of the `extracted` data will be used as the parameter in `add()` and therefore must be convertible to a numeric type.
⑦ bucket values should be an array of numeric type
##### Example
```yaml
- metrics:
http_response_time_seconds:
type: Histogram
description: "length of each log line"
source: response_time
config:
buckets: [0.001,0.0025,0.005,0.010,0.025,0.050]
```
This would create a Histogram which looks for _response_time_ in the `extracted` data and applies the value to the histogram.

@ -0,0 +1,153 @@
# LogQL: Log Query Language
Loki comes with its very own language for querying logs called *LogQL*. LogQL
can be considered a distributed `grep` with labels for filtering.
A basic LogQL query consists of two parts: the **log stream selector** and a
**filter expression**. Due to Loki's design, all LogQL queries are required to
contain a log stream selector.
The log stream selector will reduce the number of log streams to a manageable
volume. The number of labels you use to filter down the log streams will
affect the relative performance of the query's execution. The filter expression
is then used to do a distributed `grep` over the retrieved log streams.
### Log Stream Selector
The log stream selector determines which log streams should be included in your
query. The stream selector is comprised of one or more key-value pairs, where
each key is a **log label** and the value is that label's value.
The log stream selector is written by wrapping the key-value pairs in a
pair of curly braces:
```
{app="mysql",name="mysql-backup"}
```
In this example, log streams that have a label of `app` whose value is `mysql`
_and_ a label of `name` whose value is `mysql-backup` will be included in the
query results.
The `=` operator after the label name is a **label matching operator**. The
following label matching operators are supported:
- `=`: exactly equal.
- `!=`: not equal.
- `=~`: regex matches.
- `!~`: regex does not match.
Examples:
- `{name=~"mysql.+"}`
- `{name!~"mysql.+"}`
The same rules that apply for [Prometheus Label
Selectors](https://prometheus.io/docs/prometheus/latest/querying/basics/#instant-vector-selectors)
apply for Loki log stream selectors.
### Filter Expression
After writing the log stream selector, the resulting set of logs can be filtered
further with a search expression. The search expression can be just text or
regex:
- `{job="mysql"} |= "error"`
- `{name="kafka"} |~ "tsdb-ops.*io:2003"`
- `{instance=~"kafka-[23]",name="kafka"} != kafka.server:type=ReplicaManager`
In the previous examples, `|=`, `|~`, and `!=` act as **filter operators** and
the following filter operators are supported:
- `|=`: Log line contains string.
- `!=`: Log line does not contain string.
- `|~`: Log line matches regular expression.
- `!~`: Log line does not match regular expression.
Filter operators can be chained and will sequentially filter down the
expression - resulting log lines must satisfy _every_ filter:
`{job="mysql"} |= "error" != "timeout"`
When using `|~` and `!~`,
[Go RE2 syntax](https://github.com/google/re2/wiki/Syntax) regex may be used. The
matching is case-sensitive by default and can be switched to case-insensitive
by prefixing the regex with `(?i)`.
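For example, the following query matches lines containing `error`, `Error`, or
`ERROR` for the MySQL job:
> `{job="mysql"} |~ "(?i)error"`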
## Counting logs
LogQL also supports functions that wrap a query and allow for counting entries
per stream.
### Range Vector aggregation
LogQL shares the same [range vector](https://prometheus.io/docs/prometheus/latest/querying/basics/#range-vector-selectors)
concept from Prometheus, except the selected range of samples includes a value
of 1 for each log entry. An aggregation can be applied over the selected range
to transform it into an instant vector.
The currently supported functions for operating over a range vector are:
- `rate`: calculate the number of entries per second
- `count_over_time`: counts the entries for each log stream within the given
range.
> `count_over_time({job="mysql"}[5m])`
This example counts all the log lines within the last five minutes for the
MySQL job.
> `rate(({job="mysql"} |= "error" != "timeout")[10s])`
This example demonstrates that a full LogQL query can be wrapped in the
aggregation syntax, including filter expressions. This example gets the
per-second rate of all non-timeout errors within the last ten seconds for the
MySQL job.
### Aggregation operators
Like [PromQL](https://prometheus.io/docs/prometheus/latest/querying/operators/#aggregation-operators),
LogQL supports a subset of built-in aggregation operators that can be used to
aggregate the element of a single vector, resulting in a new vector of fewer
elements but with aggregated values:
- `sum`: Calculate sum over labels
- `min`: Select minimum over labels
- `max`: Select maximum over labels
- `avg`: Calculate the average over labels
- `stddev`: Calculate the population standard deviation over labels
- `stdvar`: Calculate the population standard variance over labels
- `count`: Count number of elements in the vector
- `bottomk`: Select smallest k elements by sample value
- `topk`: Select largest k elements by sample value
The aggregation operators can either be used to aggregate over all label
values or a set of distinct label values by including a `without` or a
`by` clause:
> `<aggr-op>([parameter,] <vector expression>) [without|by (<label list>)]`
`parameter` is only required when using `topk` and `bottomk`. `topk` and
`bottomk` are different from other aggregators in that a subset of the input
samples, including the original labels, are returned in the result vector. `by`
and `without` are only used to group the input vector.
The `without` clause removes the listed labels from the resulting vector, keeping
all others. The `by` clause does the opposite, dropping labels that are not
listed in the clause, even if their label values are identical between all
elements of the vector.
#### Examples
Get the top 10 applications by the highest log throughput:
> `topk(10,sum(rate({region="us-east1"}[5m])) by (name))`
Get the count of logs during the last five minutes, grouping
by level:
> `sum(count_over_time({job="mysql"}[5m])) by (level)`
Get the rate of HTTP GET requests from NGINX logs:
> `avg(rate(({job="nginx"} |= "GET")[10s])) by (region)`

@ -1,60 +0,0 @@
# Overview
Loki is the heart of the whole logging stack. It is responsible for permanently
storing the ingested log lines, as well as executing the queries against its
persistent store to analyze the contents.
## Architecture
Loki mainly consists of three and a half individual services that work
together to achieve this. The high-level architecture is based on Cortex, so
most of the Cortex documentation applies to Loki as well.
### Distributor
The distributor can be considered the "first stop" for the log lines ingested by
the agents (e.g. Promtail).
It performs validation tasks on the data, splits it into batches and sends it to
multiple Ingesters in parallel.
Distributors communicate with ingesters via gRPC. They are *stateless* and can be scaled up and down as needed.
Refer to the [Cortex
docs](https://github.com/cortexproject/cortex/blob/master/docs/architecture.md#distributor)
for details on the internals.
### Ingester
The ingester service is responsible for de-duplicating and persisting the data
to long-term storage backends (DynamoDB, S3, Cassandra, etc.).
Ingesters are semi-*stateful*: they maintain the last 12 hours' worth of logs
before flushing to the [Chunk store](#chunk-store). When restarting ingesters,
care must be taken not to lose this data.
More details can be found in the [Cortex
docs](https://github.com/cortexproject/cortex/blob/master/docs/architecture.md#ingester).
### Chunk store
Loki is not a database, so it needs somewhere to persist the ingested log
lines for a longer period of time.
The chunk store is not really a service of Loki in the traditional way, but
rather some storage backend Loki uses.
It consists of a key-value (KV) store for the actual **chunk data** and an
**index store** to keep track of them. Refer to [Storage](storage.md) for details.
The [Cortex
docs](https://github.com/cortexproject/cortex/blob/master/docs/architecture.md#chunk-store)
also have good information about this.
### Querier
The Querier executes the LogQL queries from clients such as Grafana and LogCLI.
It fetches its data directly from the [Chunk store](#chunk-store) and the
[Ingesters](#ingester).

@ -1,311 +0,0 @@
# Loki API
The Loki server has the following API endpoints (_Note:_ Authentication is out of scope for this project):
- `POST /api/prom/push`
For sending log entries, expects a snappy compressed proto in the HTTP Body:
- [ProtoBuffer definition](/pkg/logproto/logproto.proto)
- [Golang client library](/pkg/promtail/client/client.go)
Also accepts JSON formatted requests when the header `Content-Type: application/json` is sent. Example of the JSON format:
```json
{
"streams": [
{
"labels": "{foo=\"bar\"}",
"entries": [{ "ts": "2018-12-18T08:28:06.801064-04:00", "line": "baz" }]
}
]
}
```
- `GET /api/v1/query`
For doing instant queries at a single point in time, accepts the following parameters in the query-string:
- `query`: a logQL query
- `limit`: max number of entries to return (not used for metric queries)
- `time`: the evaluation time for the query, as a nanosecond Unix epoch (nanoseconds since 1970). Default is always now.
- `direction`: `forward` or `backward`, useful when specifying a limit. Default is backward.
Loki needs to query the index store in order to find log streams for particular labels and the store is spread out by time,
so you need to specify the time and labels accordingly. Querying a long time into the history will cause additional
load to the index server and make the query slower.
Responses look like this:
```json
{
"status" : "success",
"data": {
"resultType": "vector" | "streams",
"result": <value>
}
}
```
Examples:
```bash
$ curl -G -s "http://localhost:3100/api/v1/query" --data-urlencode 'query=sum(rate({job="varlogs"}[10m])) by (level)' | jq
{
"status" : "success",
"data": {
"resultType": "vector",
"result": [
{
"metric": {},
"value": [
1559848867745737,
"1267.1266666666666"
]
},
{
"metric": {
"level": "warn"
},
"value": [
1559848867745737,
"37.77166666666667"
]
},
{
"metric": {
"level": "info"
},
"value": [
1559848867745737,
"37.69"
]
}
]
}
}
```
```bash
curl -G -s "http://localhost:3100/api/v1/query" --data-urlencode 'query={job="varlogs"}' | jq
{
"status" : "success",
"data": {
"resultType": "streams",
"result": [
{
"labels": "{filename=\"/var/log/myproject.log\", job=\"varlogs\", level=\"info\"}",
"entries": [
{
"ts": "2019-06-06T19:25:41.972739Z",
"line": "foo"
},
{
"ts": "2019-06-06T19:25:41.972722Z",
"line": "bar"
}
]
}
]
}
}
```
- `GET /api/v1/query_range`
For doing queries over a range of time, accepts the following parameters in the query-string:
- `query`: a logQL query
- `limit`: max number of entries to return (not used for metric queries)
- `start`: the start time for the query, as a nanosecond Unix epoch (nanoseconds since 1970). Default is always one hour ago.
- `end`: the end time for the query, as a nanosecond Unix epoch (nanoseconds since 1970). Default is always now.
- `step`: query resolution step width in seconds. Default 1 second.
- `direction`: `forward` or `backward`, useful when specifying a limit. Default is backward.
Loki needs to query the index store in order to find log streams for particular labels and the store is spread out by time,
so you need to specify the time and labels accordingly. Querying a long time into the history will cause additional
load to the index server and make the query slower.
Responses look like this:
```json
{
"status" : "success",
"data": {
"resultType": "matrix" | "streams",
"result": <value>
}
}
```
Examples:
```bash
$ curl -G -s "http://localhost:3100/api/v1/query_range" --data-urlencode 'query=sum(rate({job="varlogs"}[10m])) by (level)' --data-urlencode 'step=300' | jq
{
"status" : "success",
"data": {
"resultType": "matrix",
"result": [
{
"metric": {
"level": "info"
},
"values": [
[
1559848958663735,
"137.95"
],
[
1559849258663735,
"467.115"
],
[
1559849558663735,
"658.8516666666667"
]
]
},
{
"metric": {
"level": "warn"
},
"values": [
[
1559848958663735,
"137.27833333333334"
],
[
1559849258663735,
"467.69"
],
[
1559849558663735,
"660.6933333333334"
]
]
}
]
}
}
```
```bash
curl -G -s "http://localhost:3100/api/v1/query_range" --data-urlencode 'query={job="varlogs"}' | jq
{
"status" : "success",
"data": {
"resultType": "streams",
"result": [
{
"labels": "{filename=\"/var/log/myproject.log\", job=\"varlogs\", level=\"info\"}",
"entries": [
{
"ts": "2019-06-06T19:25:41.972739Z",
"line": "foo"
},
{
"ts": "2019-06-06T19:25:41.972722Z",
"line": "bar"
}
]
}
]
}
}
```
- `GET /api/prom/query`
For doing queries, accepts the following parameters in the query-string:
- `query`: a [logQL query](../querying.md) (eg: `{name=~"mysql.+"}` or `{name=~"mysql.+"} |= "error"`)
- `limit`: max number of entries to return
- `start`: the start time for the query, as a nanosecond Unix epoch (nanoseconds since 1970) or as RFC3339Nano (eg: "2006-01-02T15:04:05.999999999-07:00"). Default is always one hour ago.
- `end`: the end time for the query, as a nanosecond Unix epoch (nanoseconds since 1970) or as RFC3339Nano (eg: "2006-01-02T15:04:05.999999999-07:00"). Default is current time.
- `direction`: `forward` or `backward`, useful when specifying a limit. Default is backward.
- `regexp`: a regex to filter the returned results
Loki needs to query the index store in order to find log streams for particular labels and the store is spread out by time,
so you need to specify the start and end times and labels accordingly. Querying a long time into the history will cause additional
load to the index server and make the query slower.
> This endpoint will be deprecated in the future; you should use `api/v1/query_range` instead.
> You can only query for logs; it doesn't accept [queries returning metrics](../querying.md#counting-logs).
Responses look like this:
```json
{
"streams": [
{
"labels": "{instance=\"...\", job=\"...\", namespace=\"...\"}",
"entries": [
{
"ts": "2018-06-27T05:20:28.699492635Z",
"line": "..."
},
...
]
},
...
]
}
```
- `GET /api/prom/label`
For doing label name queries, accepts the following parameters in the query-string:
- `start`: the start time for the query, as a nanosecond Unix epoch (nanoseconds since 1970). Default is always 6 hours ago.
- `end`: the end time for the query, as a nanosecond Unix epoch (nanoseconds since 1970). Default is current time.
Responses look like this:
```json
{
"values": [
"instance",
"job",
...
]
}
```
- `GET /api/prom/label/<name>/values`
For doing label values queries, accepts the following parameters in the query-string:
- `start`: the start time for the query, as a nanosecond Unix epoch (nanoseconds since 1970). Default is always 6 hours ago.
- `end`: the end time for the query, as a nanosecond Unix epoch (nanoseconds since 1970). Default is current time.
Responses look like this:
```json
{
"values": [
"default",
"cortex-ops",
...
]
}
```
- `GET /ready`
This endpoint returns 200 when the Loki ingester is ready to accept traffic. If you're running Loki on Kubernetes, this endpoint can be used as a readiness probe.
- `GET /flush`
This endpoint triggers a flush of all in-memory chunks in the ingester. Mainly used for local testing.
- `GET /metrics`
This endpoint returns Loki metrics for Prometheus. See "[Operations > Observability > Metrics](./operations.md)" for a list of exported metrics.
## Examples of using the API in a third-party client library
1) Take a look at this [client](https://github.com/afiskon/promtail-client), but be aware that the API is not stable yet (Golang).
2) Example on [Python3](https://github.com/sleleko/devops-kb/blob/master/python/push-to-loki.py)

@ -1,88 +0,0 @@
# Operations
This page lists operational aspects of running Loki in alphabetical order:
## Authentication
Loki does not have an authentication layer.
You are expected to run an authenticating reverse proxy in front of your services, such as an Nginx with basic auth or an OAuth2 proxy.
See [client options](../promtail/deployment-methods.md#custom-client-options) for more details about supported authentication methods.
### Multi-tenancy
Loki is a multitenant system; requests and data for tenant A are isolated from tenant B.
Requests to the Loki API should include an HTTP header (`X-Scope-OrgID`) identifying the tenant for the request.
Tenant IDs can be any alphanumeric string; limiting them to 20 bytes is reasonable.
To run in multitenant mode, Loki should be started with `auth_enabled: true`.
Loki can be run in "single-tenant" mode where the `X-Scope-OrgID` header is not required.
In this situation, the tenant ID is defaulted to be `fake`.
## Observability
### Metrics
Both Loki and promtail expose a `/metrics` endpoint for Prometheus metrics.
You need a local Prometheus configured to scrape your Loki and Promtail instances as targets; [see the Prometheus configuration documentation](https://prometheus.io/docs/prometheus/latest/configuration/configuration).
When Prometheus can scrape Loki and Promtail, you get the following metrics:
Loki metrics:
- `log_messages_total` Total number of log messages.
- `loki_distributor_bytes_received_total` The total number of uncompressed bytes received per tenant.
- `loki_distributor_lines_received_total` The total number of lines received per tenant.
- `loki_ingester_streams_created_total` The total number of streams created per tenant.
- `loki_request_duration_seconds_count` Number of received HTTP requests.
Promtail metrics:
- `promtail_read_bytes_total` Number of bytes read.
- `promtail_read_lines_total` Number of lines read.
- `promtail_request_duration_seconds_count` Number of send requests.
- `promtail_encoded_bytes_total` Number of bytes encoded and ready to send.
- `promtail_sent_bytes_total` Number of bytes sent.
- `promtail_dropped_bytes_total` Number of bytes dropped because they failed to be sent to the ingester after all retries.
- `promtail_sent_entries_total` Number of log entries sent to the ingester.
- `promtail_dropped_entries_total` Number of log entries dropped because they failed to be sent to the ingester after all retries.
Most of these metrics are counters and should continuously increase during normal operations:
1. Your app emits a log line to a file tracked by promtail.
2. Promtail reads the new line and increases its counters.
3. Promtail forwards the line to a Loki distributor, its received counters should increase.
4. The Loki distributor forwards it to a Loki ingester, its request duration counter increases.
You can import the dashboard with ID [10004](https://grafana.com/dashboards/10004) to see these metrics in the Grafana UI.
### Monitoring Mixins
Check out our [Loki mixin](../../production/loki-mixin) for a set of dashboards, recording rules, and alerts.
These give you a comprehensive package on how to monitor Loki in production.
For more information about mixins, take a look at the [mixins project docs](https://github.com/monitoring-mixins/docs).
## Retention/Deleting old data
Retention in Loki can be done by configuring Table Manager. You need to set a retention period and enable deletes for retention using yaml config as seen [here](https://github.com/grafana/loki/blob/39bbd733be4a0d430986d9513476a91334485e9f/production/ksonnet/loki/config.libsonnet#L128-L129) or using `table-manager.retention-period` and `table-manager.retention-deletes-enabled` command line args. Retention period needs to be a duration in string format that can be parsed using [time.Duration](https://golang.org/pkg/time/#ParseDuration).
**[NOTE]** Retention period should be at least twice the [duration of periodic table config](https://github.com/grafana/loki/blob/347a3e18f4976d799d51a26cee229efbc27ef6c9/production/helm/loki/values.yaml#L53), which currently defaults to 7 days.
In the case of chunks retention when using S3 or GCS, you need to set the expiry policy on the bucket that is configured for storing chunks. For more details check [this](https://docs.aws.amazon.com/AmazonS3/latest/dev/object-lifecycle-mgmt.html) for S3 and [this](https://cloud.google.com/storage/docs/managing-lifecycles) for GCS.
Currently we only support global retention policy. A per user retention policy and API to delete ingested logs is still under development.
Feel free to add your use case to this [GitHub issue](https://github.com/grafana/loki/issues/162).
A design goal of Loki is that storing logs should be cheap, hence a volume-based deletion API was deprioritized.
Until this feature is released: If you suddenly must delete ingested logs, you can delete old chunks in your object store.
Note that this will only delete the log content while keeping the label index intact.
You will still be able to see related labels, but the log retrieval of the deleted log content will no longer work.
## Scalability
See this [blog post](https://grafana.com/blog/2018/12/12/loki-prometheus-inspired-open-source-logging-for-cloud-natives/) on a discussion about Loki's scalability.
When scaling Loki, consider running several Loki processes with their respective roles of ingester, distributor, and querier.
Take a look at their respective `.libsonnet` files in [our production setup](../../production/ksonnet/loki) to get an idea about resource usage.
We're happy to get feedback about your resource usage.

@ -1,86 +0,0 @@
## Installation
Loki is provided as pre-compiled binaries, or as a Docker container image.
### Docker container (Recommended)
If you want to run in a container, use our Docker image:
```bash
$ docker pull "grafana/loki:v0.2.0"
```
### Binary
If you want to use plain binaries instead, head over to the
[Releases](https://github.com/grafana/loki/releases) on GitHub and download the
most recent one for your operating system and architecture.
Example (Linux, `amd64`), Loki `v0.2.0`:
```bash
# download binary (adapt app, os and arch as needed)
$ curl -fSL -o "/usr/local/bin/loki.gz" "https://github.com/grafana/loki/releases/download/v0.2.0/loki-linux-amd64.gz"
$ gunzip "/usr/local/bin/loki.gz"
# make sure it is executable
$ chmod a+x "/usr/local/bin/loki"
```
## Running
After you have Loki installed, you need to pick one of the two operation modes:
You can either run all three components (Distributor, Ingester and Querier)
together as a single fat process, which simplifies operations because you do
not need to worry about inter-service communication. This is usually
recommended for most use-cases.
This still allows you to scale horizontally, but keep in mind that you scale
all three services at once.
The other option is the distributed mode, where each component runs on its own
and communicates with the others using gRPC.
This especially allows fine-grained control over scaling, because you can scale
the services individually.
### Single Process
#### `docker-compose`
To try it out locally, or to run on only a few systems, `docker-compose` is a
good choice.
Check out the [`docker-compose.yml` file on
GitHub](https://github.com/grafana/loki/blob/master/production/docker-compose.yaml).
#### `helm`
If you want to quickly get up and running on Kubernetes, `helm` has you covered:
```bash
# add the loki repository to helm
$ helm repo add loki https://grafana.github.io/loki/charts
$ helm repo update
```
You can then choose between deploying the whole stack (Loki and Promtail), or
each component individually:
```bash
# whole stack
$ helm upgrade --install loki loki/loki-stack
# only Loki
$ helm upgrade --install loki loki/loki
# only Promtail
$ helm upgrade --install promtail loki/promtail --set "loki.serviceName=loki"
```
Refer to the
[Chart](https://github.com/grafana/loki/tree/master/production/helm) for more
information.
### Distributed
Running in distributed mode is currently quite hard because of the multitude
of challenges it brings.
We run Loki this way internally, but cannot really recommend that others try it yet.
If you really want to, you can take a look at our [production
setup](https://github.com/grafana/loki/tree/master/production/ksonnet). You have
been warned :P

@ -1,158 +0,0 @@
# Storage
Loki needs to store two different types of data: **Chunks** and **Indexes**.
Loki receives logs in separate streams. Each stream is identified by a set of labels.
As the log entries from a stream arrive, they are gzipped as chunks and saved in
the chunks store. The chunk format is documented in [`pkg/chunkenc`](../../pkg/chunkenc/README.md).
On the other hand, the index stores the stream's label set and links them to the
individual chunks.
### Local storage
By default, Loki stores everything on disk. The index is stored in a BoltDB under
`/tmp/loki/index` and the chunks are stored under `/tmp/loki/chunks`.
### Google Cloud Storage
Loki supports Google Cloud Storage. Refer to Grafana Labs'
[production setup](https://github.com/grafana/loki/blob/a422f394bb4660c98f7d692e16c3cc28747b7abd/production/ksonnet/loki/config.libsonnet#L55)
for the relevant configuration fields.
### Cassandra
Loki can use Cassandra for the index storage. Example config using Cassandra:
```yaml
schema_config:
configs:
- from: 2018-04-15
store: cassandra
object_store: filesystem
schema: v9
index:
prefix: cassandra_table
period: 168h
storage_config:
cassandra:
username: cassandra
password: cassandra
addresses: 127.0.0.1
auth: true
keyspace: lokiindex
filesystem:
directory: /tmp/loki/chunks
```
### AWS S3 & DynamoDB
Example config for using S3 & DynamoDB:
```yaml
schema_config:
configs:
- from: 0
store: dynamo
object_store: s3
schema: v9
index:
prefix: dynamodb_table_name
period: 0
storage_config:
aws:
s3: s3://access_key:secret_access_key@region/bucket_name
dynamodbconfig:
dynamodb: dynamodb://access_key:secret_access_key@region
```
If you don't wish to hard-code S3 credentials, you can also configure an
EC2 instance role by changing the `storage_config` section:
```yaml
storage_config:
aws:
s3: s3://region/bucket_name
dynamodbconfig:
dynamodb: dynamodb://region
```
#### S3
Loki can use S3 as object storage, storing logs within directories based on
the [OrgID](./operations.md#multi-tenancy). For example, logs from the `faker`
org will be stored in `s3://BUCKET_NAME/faker/`.
The S3 configuration is set up using the URL format:
`s3://access_key:secret_access_key@region/bucket_name`.
S3-compatible APIs (e.g., Ceph Object Storage with an S3-compatible API) can
be used. If the API supports path-style URL rather than virtual hosted bucket
addressing, configure the URL in `storage_config` with the custom endpoint:
```yaml
storage_config:
aws:
s3: s3://access_key:secret_access_key@custom_endpoint/bucket_name
s3forcepathstyle: true
```
Loki needs the following permissions to write to an S3 bucket:
* s3:ListBucket
* s3:PutObject
* s3:GetObject
#### DynamoDB
Loki can use DynamoDB for storing the index. The index is used for querying
logs. Throughput to the index should be adjusted to your usage.
Access to DynamoDB is very similar to S3; however, a table name does not
need to be specified in the storage section, as Loki calculates that for
you. The table name prefix will need to be configured inside `schema_config`
for Loki to be able to create new tables.
DynamoDB can be set up manually or automatically through `table-manager`.
The `table-manager` allows deleting old indices by rotating a number of
different DynamoDB tables and deleting the oldest one. An example deployment
of the `table-manager` using ksonnet can be found
[here](../../production/ksonnet/loki/table-manager.libsonnet) and more information
about it can be found at the
[Cortex project](https://github.com/cortexproject/cortex).
The `table-manager` client defaults DynamoDB provisioned capacity units to
300 for reads and 3000 for writes. The defaults can be overwritten in the
config:
```yaml
table_manager:
index_tables_provisioning:
provisioned_write_throughput: 10
provisioned_read_throughput: 10
chunk_tables_provisioning:
provisioned_write_throughput: 10
provisioned_read_throughput: 10
```
If DynamoDB is set up manually, old data cannot be easily erased and the index
will grow indefinitely. Manual configurations should ensure that the primary
index key is set to `h` (string) and the sort key is set to `r` (binary). The
"period" attribute in the yaml should be set to zero.
Loki needs the following permissions to write to DynamoDB:
* dynamodb:BatchGetItem
* dynamodb:BatchWriteItem
* dynamodb:DeleteItem
* dynamodb:DescribeTable
* dynamodb:GetItem
* dynamodb:ListTagsOfResource
* dynamodb:PutItem
* dynamodb:Query
* dynamodb:TagResource
* dynamodb:UntagResource
* dynamodb:UpdateItem
* dynamodb:UpdateTable

@ -0,0 +1,5 @@
# Loki Maintainers Guide
This section details information for maintainers of Loki.
1. [Releasing Loki](./release.md)

@ -5,13 +5,14 @@
* Create a new branch for updating changelog and version numbers
* In the changelog, set the version number and release date, create the next release as (unreleased) as a placeholder for people to add notes to the changelog
* List all the merged PRs since the last release; this command is helpful for generating the output: `curl https://api.github.com/search/issues?q=repo:grafana/loki+is:pr+"merged:>=2019-08-02" | jq -r ' .items[] | "* [" + (.number|tostring) + "](" + .html_url + ") **" + .user.login + "**: " + .title'`
* Go through the `docs/` and update references to the previous release version to the new one.
> Until [852](https://github.com/grafana/loki/issues/852) is fixed, updating the Helm and Ksonnet configs has to wait until after the release tag is pushed so that the Helm tests will pass.
* Merge the changelog PR
* Run:
**Note: This step creates the tag and therefore the release; this will trigger CI to build release artifacts (binaries and images) as well as publish them. As soon as this tag is pushed and CI finishes, the new release artifacts will be available to the public at https://github.com/grafana/loki/releases**
```
git pull
@ -27,7 +28,7 @@ make release-prepare
* Set the release version; in most cases the auto-selected Helm version numbers should be fine.
* Commit to another branch, make a PR, and get it merged.
* Go to GitHub and finish manually editing the auto-generated release template and publish it!

@ -0,0 +1,10 @@
# Operating Loki
1. [Authentication](authentication.md)
2. [Observability](observability.md)
3. [Scalability](scalability.md)
4. [Storage](storage/README.md)
1. [Table Manager](storage/table-manager.md)
2. [Retention](storage/retention.md)
5. [Multi-tenancy](multi-tenancy.md)
6. [Loki Canary](loki-canary.md)

@ -0,0 +1,14 @@
# Authentication with Loki
Loki does not come with any included authentication layer. Operators are
expected to run an authenticating reverse proxy in front of Loki, such
as NGINX using basic auth or an OAuth2 proxy.
Note that when using Loki in multi-tenant mode, Loki requires the HTTP header
`X-Scope-OrgID` to be set to a string identifying the user; the responsibility
of populating this value should be handled by the authenticating reverse proxy.
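As a minimal sketch, an NGINX front end that enforces basic auth and injects the tenant header might look like the following (the listen port, tenant ID, and htpasswd path are assumptions for illustration):
```
server {
  listen 8080;

  location / {
    # require basic auth before anything reaches Loki
    auth_basic           "loki";
    auth_basic_user_file /etc/nginx/.htpasswd;

    # identify every request passing through this proxy as tenant "tenant1"
    proxy_set_header X-Scope-OrgID tenant1;
    proxy_pass       http://localhost:3100;
  }
}
```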
For more information on multi-tenancy please read its
[documentation](multi-tenancy.md).
For information on authenticating promtail, please see the docs for [how to
configure Promtail](../clients/promtail/configuration.md).

@ -0,0 +1,172 @@
# Loki Canary
Loki Canary is a standalone app that audits the log capturing performance of
Loki.
## How it works
![block_diagram](loki-canary-block.png)
Loki Canary writes a log to a file and stores the timestamp in an internal
array. The contents look something like this:
```nohighlight
1557935669096040040 ppppppppppppppppppppppppppppppppppppppppppppppppppppppppppp
```
The relevant part of the log entry is the timestamp; the `p`s are just filler
bytes to make the size of the log configurable.
An agent (like Promtail) should be configured to read the log file and ship it
to Loki.
Meanwhile, Loki Canary will open a WebSocket connection to Loki and will tail
the logs it creates. When a log is received on the WebSocket, the timestamp
in the log message is compared to the internal array.
If the received log is:
* The next in the array to be received, it is removed from the array and the
(current time - log timestamp) is recorded in the `response_latency`
histogram. This is the expected behavior for well-behaved logs.
* Not the next in the array to be received, it is removed from the array, the
response time is recorded in the `response_latency` histogram, and the
`out_of_order_entries` counter is incremented.
* Not in the array at all, it is checked against a separate list of received
logs to either increment the `duplicate_entries` counter or the
`unexpected_entries` counter.
In the background, Loki Canary also runs a timer which iterates through all of
the entries in the internal array. If any of the entries are older than the
duration specified by the `-wait` flag (defaulting to 60s), they are removed
from the array and the `websocket_missing_entries` counter is incremented. An
additional query is then made directly to Loki for any missing entries to
determine if they are truly missing or only missing from the WebSocket. If
missing entries are not found in the direct query, the `missing_entries` counter
is incremented.
## Installation
### Binary
Loki Canary is provided as a pre-compiled binary as part of the
[Loki Releases](https://github.com/grafana/loki/releases) on GitHub.
### Docker
Loki Canary is also provided as a Docker container image:
```bash
# change tag to the most recent release
$ docker pull grafana/loki-canary:v0.2.0
```
### Kubernetes
To run on Kubernetes, you can do something simple like:
`kubectl run loki-canary --generator=run-pod/v1
--image=grafana/loki-canary:latest --restart=Never --image-pull-policy=Never
--labels=name=loki-canary -- -addr=loki:3100`
Or you can do something more complex, like deploying it as a DaemonSet. There is a
Tanka setup for this in the `production` folder, which you can import using
`jsonnet-bundler`:
```shell
jb install github.com/grafana/loki/production/ksonnet/loki-canary
```
Then in your Tanka environment's `main.jsonnet` you'll want something like
this:
```jsonnet
local loki_canary = import 'loki-canary/loki-canary.libsonnet';
loki_canary {
loki_canary_args+:: {
addr: "loki:3100",
port: 80,
labelname: "instance",
interval: "100ms",
size: 1024,
wait: "3m",
},
_config+:: {
namespace: "default",
}
}
```
### From Source
If the other options are not sufficient for your use case, you can compile
`loki-canary` yourself:
```bash
# clone the source tree
$ git clone https://github.com/grafana/loki
# build the binary
$ make loki-canary
# (optionally build the container image)
$ make loki-canary-image
```
## Configuration
The address of Loki must be passed in with the `-addr` flag, and if your Loki
server uses TLS, `-tls=true` must also be provided. Note that using TLS will
cause the WebSocket connection to use `wss://` instead of `ws://`.
The `-labelname` and `-labelvalue` flags should also be provided, as these are
used by Loki Canary to filter the log stream to only process logs for the
current instance of the canary. Ensure that the values provided to the flags are
unique to each instance of Loki Canary. Grafana Labs' Tanka config
accomplishes this by passing in the pod name as the label value.
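As a sketch, a single canary instance could be started like this, using the machine's hostname as the unique label value (the Loki address here is an example):
```bash
$ loki-canary -addr=loki:3100 -labelname=instance -labelvalue="$(hostname)"
```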
If Loki Canary reports a high number of `unexpected_entries`, Loki Canary may
not be waiting long enough, and the value of the `-wait` flag should be
increased beyond its default of 60s.
__Be aware__ of the relationship between `pruneinterval` and the `interval`.
For example, with an interval of 10ms (100 logs per second) and a prune interval
of 60s, you will write 6000 logs per minute. If those logs were not received
over the WebSocket, the canary will attempt to query Loki directly to see if
they are completely lost. __However__, the query is limited to 1000
results, so you will not be able to retrieve all of those logs even if they did make it
to Loki.
__Likewise__, if you lower the `pruneinterval` you risk causing a
denial-of-service attack, as all of your canaries will query Loki for missing logs at
whatever interval `pruneinterval` is set to.
All options:
```nohighlight
-addr string
The Loki server URL:Port, e.g. loki:3100
-buckets int
Number of buckets in the response_latency histogram (default 10)
-interval duration
Duration between log entries (default 1s)
-labelname string
The label name for this instance of loki-canary to use in the log selector (default "name")
-labelvalue string
The unique label value for this instance of loki-canary to use in the log selector (default "loki-canary")
-pass string
Loki password
-port int
Port which loki-canary should expose metrics (default 3500)
-pruneinterval duration
Frequency to check sent vs received logs, also the frequency which queries for missing logs will be dispatched to loki (default 1m0s)
-size int
Size in bytes of each log line (default 100)
-tls
Does the loki connection use TLS?
-user string
Loki username
-wait duration
Duration to wait for log entries before reporting them lost (default 1m0s)
```

@ -0,0 +1,15 @@
# Loki Multi-Tenancy
Loki is a multi-tenant system; requests and data for tenant A are isolated from
tenant B. Requests to the Loki API should include an HTTP header
(`X-Scope-OrgID`) that identifies the tenant for the request.
Tenant IDs can be any alphanumeric string that fits within the Go HTTP header
limit (1MB). Operators are recommended to keep tenant IDs reasonably short for uniquely
identifying tenants; 20 bytes is usually enough.
To run in multi-tenant mode, Loki should be started with `auth_enabled: true`.
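As a sketch, a push request against a multi-tenant Loki identifies its tenant via the header (the tenant ID and labels below are made up):
```bash
curl -XPOST "http://localhost:3100/loki/api/v1/push" \
  -H "Content-Type: application/json" \
  -H "X-Scope-OrgID: team-a" \
  --data-raw '{"streams": [{"stream": {"job": "test"}, "values": [["'"$(date +%s%N)"'", "hello from team-a"]]}]}'
```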
Loki can be run in "single-tenant" mode where the `X-Scope-OrgID` header is not
required. In single-tenant mode, the tenant ID defaults to `fake`.

@ -0,0 +1,87 @@
# Observing Loki
Both Loki and Promtail expose a `/metrics` endpoint that exposes Prometheus
metrics. You will need a local Prometheus instance with Loki and Promtail added as scrape targets.
See [configuring
Prometheus](https://prometheus.io/docs/prometheus/latest/configuration/configuration)
for more information.
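A minimal Prometheus scrape configuration might look like this (the hostnames and ports are assumptions and should match your deployment):
```yaml
scrape_configs:
  - job_name: loki
    static_configs:
      - targets: ['loki:3100']
  - job_name: promtail
    static_configs:
      - targets: ['promtail:9080']
```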
All components of Loki expose the following metrics:
| Metric Name | Metric Type | Description |
| ------------------------------- | ----------- | ---------------------------------------- |
| `log_messages_total` | Counter | Total number of messages logged by Loki. |
| `loki_request_duration_seconds` | Histogram | Number of received HTTP requests. |
The Loki Distributors expose the following metrics:
| Metric Name | Metric Type | Description |
| ------------------------------------------------- | ----------- | ----------------------------------------------------------- |
| `loki_distributor_ingester_appends_total` | Counter | The total number of batch appends sent to ingesters. |
| `loki_distributor_ingester_append_failures_total` | Counter | The total number of failed batch appends sent to ingesters. |
| `loki_distributor_bytes_received_total` | Counter | The total number of uncompressed bytes received per tenant. |
| `loki_distributor_lines_received_total` | Counter | The total number of lines received per tenant. |
The Loki Ingesters expose the following metrics:
| Metric Name | Metric Type | Description |
| ----------------------------------------- | ----------- | ------------------------------------------------------------------------------------------- |
| `cortex_ingester_flush_queue_length` | Gauge | The total number of series pending in the flush queue. |
| `loki_ingester_chunk_age_seconds` | Histogram | Distribution of chunk ages when flushed. |
| `loki_ingester_chunk_encode_time_seconds` | Histogram | Distribution of chunk encode times. |
| `loki_ingester_chunk_entries`             | Histogram   | Distribution of entries per chunk when flushed.                                              |
| `loki_ingester_chunk_size_bytes` | Histogram | Distribution of chunk sizes when flushed. |
| `loki_ingester_chunk_stored_bytes_total` | Counter | Total bytes stored in chunks per tenant. |
| `loki_ingester_chunks_created_total` | Counter | The total number of chunks created in the ingester. |
| `loki_ingester_chunks_flushed_total` | Counter | The total number of chunks flushed by the ingester. |
| `loki_ingester_chunks_stored_total` | Counter | Total stored chunks per tenant. |
| `loki_ingester_received_chunks` | Counter | The total number of chunks sent by this ingester whilst joining during the handoff process. |
| `loki_ingester_samples_per_chunk` | Histogram | The number of samples in a chunk. |
| `loki_ingester_sent_chunks` | Counter | The total number of chunks sent by this ingester whilst leaving during the handoff process. |
| `loki_ingester_streams_created_total` | Counter | The total number of streams created per tenant. |
| `loki_ingester_streams_removed_total` | Counter | The total number of streams removed per tenant. |
Promtail exposes these metrics:
| Metric Name | Metric Type | Description |
| ----------------------------------------- | ----------- | ------------------------------------------------------------------------------------------ |
| `promtail_read_bytes_total` | Gauge | Number of bytes read. |
| `promtail_read_lines_total` | Counter | Number of lines read. |
| `promtail_dropped_bytes_total`             | Counter     | Number of bytes dropped because they failed to be sent to the ingester after all retries.    |
| `promtail_dropped_entries_total`           | Counter     | Number of log entries dropped because they failed to be sent to the ingester after all retries. |
| `promtail_encoded_bytes_total` | Counter | Number of bytes encoded and ready to send. |
| `promtail_file_bytes_total` | Gauge | Number of bytes read from files. |
| `promtail_files_active_total` | Gauge | Number of active files. |
| `promtail_log_entries_bytes` | Histogram | The total count of bytes read. |
| `promtail_request_duration_seconds_count` | Histogram | Number of send requests. |
| `promtail_sent_bytes_total` | Counter | Number of bytes sent. |
| `promtail_sent_entries_total` | Counter | Number of log entries sent to the ingester. |
| `promtail_targets_active_total` | Gauge | Number of total active targets. |
| `promtail_targets_failed_total` | Counter | Number of total failed targets. |
Most of these metrics are counters and should continuously increase during normal operations:
1. Your app emits a log line to a file that is tracked by Promtail.
2. Promtail reads the new line and increases its counters.
3. Promtail forwards the log line to a Loki distributor, where the received
counters should increase.
4. The Loki distributor forwards the log line to a Loki ingester, where the
request duration counter should increase.
If Promtail uses any pipelines with metrics stages, those metrics will also be
exposed by Promtail at its `/metrics` endpoint. See Promtail's documentation on
[Pipelines](../clients/promtail/pipelines.md) for more information.
An example Grafana dashboard was built by the community and is available as
dashboard [10004](https://grafana.com/dashboards/10004).
## Mixins
The Loki repository has a [mixin](../../production/loki-mixin) that includes a
set of dashboards, recording rules, and alerts. Together, the mixin gives you a
comprehensive package for monitoring Loki in production.
For more information about mixins, take a look at the docs for the
[monitoring-mixins project](https://github.com/monitoring-mixins/docs).

@ -0,0 +1,11 @@
# Scaling with Loki
See this
[blog post](https://grafana.com/blog/2018/12/12/loki-prometheus-inspired-open-source-logging-for-cloud-natives/)
on a discussion about Loki's scalability.
When scaling Loki, operators should consider running several Loki processes
partitioned by role (ingester, distributor, querier) rather than a single Loki
process. Grafana Labs' [production setup](../../production/ksonnet/loki)
contains `.libsonnet` files that demonstrate configuring separate components
and scaling for resource usage.

@ -0,0 +1,95 @@
# Loki Storage
Loki needs to store two different types of data: **chunks** and **indexes**.
Loki receives logs in separate streams, where each stream is uniquely identified
by its tenant ID and its set of labels. As log entries from a stream arrive,
they are GZipped as "chunks" and saved in the chunks store. See [chunk
format](#chunk-format) for how chunks are stored internally.
The **index** stores each stream's label set and links them to the individual
chunks.
Refer to Loki's [configuration](../../configuration/README.md) for details on
how to configure the storage and the index.
For more information:
1. [Table Manager](table-manager.md)
2. [Retention](retention.md)
## Supported Stores
The following are supported for the index:
* [Amazon DynamoDB](https://aws.amazon.com/dynamodb)
* [Google Bigtable](https://cloud.google.com/bigtable)
* [Apache Cassandra](https://cassandra.apache.org)
* [BoltDB](https://github.com/boltdb/bolt) (doesn't work when clustering Loki)
The following are supported for the chunks:
* [Amazon DynamoDB](https://aws.amazon.com/dynamodb)
* [Google Bigtable](https://cloud.google.com/bigtable)
* [Apache Cassandra](https://cassandra.apache.org)
* [Amazon S3](https://aws.amazon.com/s3)
* [Google Cloud Storage](https://cloud.google.com/storage/)
* Filesystem (doesn't work when clustering Loki)
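As a sketch, a minimal single-process setup could use BoltDB for the index and the local filesystem for chunks (the paths and index prefix are only examples):
```yaml
schema_config:
  configs:
    - from: 2018-04-15
      store: boltdb
      object_store: filesystem
      schema: v9
      index:
        prefix: index_
        period: 168h

storage_config:
  boltdb:
    directory: /tmp/loki/index
  filesystem:
    directory: /tmp/loki/chunks
```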
## Cloud Storage Permissions
### S3
When using S3 as object storage, the following permissions are needed:
* `s3:ListBucket`
* `s3:PutObject`
* `s3:GetObject`
### DynamoDB
When using DynamoDB for the index, the following permissions are needed:
* `dynamodb:BatchGetItem`
* `dynamodb:BatchWriteItem`
* `dynamodb:DeleteItem`
* `dynamodb:DescribeTable`
* `dynamodb:GetItem`
* `dynamodb:ListTagsOfResource`
* `dynamodb:PutItem`
* `dynamodb:Query`
* `dynamodb:TagResource`
* `dynamodb:UntagResource`
* `dynamodb:UpdateItem`
* `dynamodb:UpdateTable`
## Chunk Format
```
-------------------------------------------------------------------
| | |
| MagicNumber(4b) | version(1b) |
| | |
-------------------------------------------------------------------
| block-1 bytes | checksum (4b) |
-------------------------------------------------------------------
| block-2 bytes | checksum (4b) |
-------------------------------------------------------------------
| block-n bytes | checksum (4b) |
-------------------------------------------------------------------
| #blocks (uvarint) |
-------------------------------------------------------------------
| #entries(uvarint) | mint, maxt (varint) | offset, len (uvarint) |
-------------------------------------------------------------------
| #entries(uvarint) | mint, maxt (varint) | offset, len (uvarint) |
-------------------------------------------------------------------
| #entries(uvarint) | mint, maxt (varint) | offset, len (uvarint) |
-------------------------------------------------------------------
| #entries(uvarint) | mint, maxt (varint) | offset, len (uvarint) |
-------------------------------------------------------------------
| checksum(from #blocks) |
-------------------------------------------------------------------
| metasOffset - offset to the point with #blocks |
-------------------------------------------------------------------
```

@ -0,0 +1,57 @@
# Loki Storage Retention
Retention in Loki is achieved through the [Table Manager](./table-manager.md).
In order to enable the retention support, the Table Manager needs to be
configured to enable deletions and a retention period. Please refer to the
[`table_manager_config`](../../configuration/README.md#table_manager_config)
section of the Loki configuration reference for all available options.
Alternatively, the `table-manager.retention-period` and
`table-manager.retention-deletes-enabled` command line flags can be used. The
provided retention period needs to be a duration represented as a string that
can be parsed using Go's [time.Duration](https://golang.org/pkg/time/#ParseDuration).
> **WARNING**: The retention period should be at least twice the [duration of
the periodic table config](https://github.com/grafana/loki/blob/347a3e18f4976d799d51a26cee229efbc27ef6c9/production/helm/loki/values.yaml#L53), which currently defaults to 7 days.
When using S3 or GCS, the bucket storing the chunks needs to have the expiry
policy set correctly. For more details check
[S3's documentation](https://docs.aws.amazon.com/AmazonS3/latest/dev/object-lifecycle-mgmt.html)
or
[GCS's documentation](https://cloud.google.com/storage/docs/managing-lifecycles).
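For example, an S3 lifecycle rule expiring chunk objects after 30 days could be applied with the AWS CLI as sketched below (the bucket name is a placeholder, and the expiry should match your retention period):
```bash
aws s3api put-bucket-lifecycle-configuration \
  --bucket YOUR_CHUNK_BUCKET \
  --lifecycle-configuration '{
    "Rules": [
      {
        "ID": "expire-loki-chunks",
        "Status": "Enabled",
        "Filter": { "Prefix": "" },
        "Expiration": { "Days": 30 }
      }
    ]
  }'
```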
Currently, the retention policy can only be set globally. A per-tenant retention
policy with an API to delete ingested logs is still under development.
Since a design goal of Loki is to make storing logs cheap, a volume-based
deletion API is deprioritized. Until this feature is released, if you suddenly
must delete ingested logs, you can delete old chunks in your object store. Note,
however, that this only deletes the log content and keeps the label index
intact; you will still be able to see related labels but will be unable to
retrieve the deleted log content.
## Example Configuration
Example configuration using GCS with a 30-day retention:
```yaml
schema_config:
configs:
- from: 2018-04-15
store: bigtable
object_store: gcs
schema: v9
index:
prefix: loki_index_
period: 168h
storage_config:
bigtable:
instance: BIGTABLE_INSTANCE
project: BIGTABLE_PROJECT
gcs:
bucket_name: GCS_BUCKET_NAME
table_manager:
retention_deletes_enabled: true
retention_period: 720h
```

@ -0,0 +1,33 @@
# Table Manager
The Table Manager is used to delete old data past a certain retention period.
The Table Manager also includes support for automatically provisioning DynamoDB
tables with autoscaling support.
For detailed information on configuring the Table Manager, refer to the
[table_manager_config](../../configuration/README.md#table_manager_config)
section in the Loki configuration document.
## DynamoDB Provisioning
When configuring DynamoDB with the Table Manager, the default [on-demand
provisioning](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadWriteCapacityMode.html)
capacity units for reads are set to 300 and writes are set to 3000. The
defaults can be overwritten:
```yaml
table_manager:
index_tables_provisioning:
provisioned_write_throughput: 10
provisioned_read_throughput: 10
chunk_tables_provisioning:
provisioned_write_throughput: 10
provisioned_read_throughput: 10
```
If Table Manager is not automatically managing DynamoDB, old data cannot easily
be erased and the index will grow indefinitely. Manual configurations should
ensure that the primary index key is set to `h` (string) and the sort key is set
to `r` (binary). The "period" attribute in the configuration YAML should be set
to `0`.

@ -0,0 +1,149 @@
# Overview of Loki
Grafana Loki is a set of components that can be composed into a fully featured
logging stack.
Unlike other logging systems, Loki is built around the idea of only indexing
labels for logs and leaving the original log message unindexed. This means
that Loki is cheaper to operate and can be orders of magnitude more efficient.
For a more detailed version of this same document, please read
[Architecture](../architecture.md).
## Multi Tenancy
Loki supports multi-tenancy so that data between tenants is completely
separated. Multi-tenancy is achieved through a tenant ID (which is represented
as an alphanumeric string). When multi-tenancy mode is disabled, all requests
are internally given a tenant ID of "fake".
## Modes of Operation
Loki is optimized for both running locally (or at small scale) and for scaling
horizontally: Loki comes with a _single process mode_ that runs all of the required
microservices in one process. The single process mode is great for testing Loki
or for running it at a small scale. For horizontal scalability, the
microservices of Loki can be broken out into separate processes, allowing them
to scale independently of each other.
## Components
### Distributor
The **distributor** service is responsible for handling logs written by
[clients](../clients/README.md). It's essentially the "first stop" in the write
path for log data. Once the distributor receives log data, it splits the data into
batches and sends them to multiple [ingesters](#ingester) in parallel.
Distributors communicate with ingesters via [gRPC](https://grpc.io). They are
stateless and can be scaled up and down as needed.
#### Hashing
Distributors use consistent hashing in conjunction with a configurable
replication factor to determine which instances of the ingester service should
receive log data.
The hash is based on a combination of the log's labels and the tenant ID.
A hash ring stored in [Consul](https://www.consul.io) is used to achieve
consistent hashing; all [ingesters](#ingester) register themselves into the
hash ring with a set of tokens they own. Distributors then find the token that
most closely matches the value of the log's hash and will send data to that
token's owner.
#### Quorum consistency
Since all distributors share access to the same hash ring, write requests can be
sent to any distributor.
To ensure consistent query results, Loki uses
[Dynamo-style](https://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf)
quorum consistency on reads and writes. This means that the distributor will wait
for a positive response from at least one half plus one of the ingesters it sends
the sample to before responding to the user.
### Ingester
The **ingester** service is responsible for writing log data to long-term
storage backends (DynamoDB, S3, Cassandra, etc.).
The ingester validates that ingested log lines are received in
timestamp-ascending order (i.e., each log has a timestamp that occurs at a later
time than the log before it). When the ingester receives a log that does not
follow this order, the log line is rejected and an error is returned.
Logs from each unique set of labels are built up into "chunks" in memory and
then flushed to the backing storage backend.
If an ingester process crashes or exits abruptly, all the data that has not yet
been flushed will be lost. Loki is usually configured to hold multiple
replicas (usually 3) of each log to mitigate this risk.
#### Handoff
By default, when an ingester is shutting down and tries to leave the hash ring,
it will wait to see if a new ingester tries to enter before flushing and will
try to initiate a handoff. The handoff will transfer all of the tokens and
in-memory chunks owned by the leaving ingester to the new ingester.
This process is used to avoid flushing all chunks when shutting down, which is a
slow process.
#### Filesystem Support
While ingesters do support writing to the filesystem through BoltDB, this only
works in single-process mode as [queriers](#querier) need access to the same
back-end store and BoltDB only allows one process to have a lock on the DB at a
given time.
### Querier
The **querier** service handles the actual [LogQL](../logql.md) evaluation of
logs stored in long-term storage.
It first tries to query all ingesters for in-memory data before falling back to
loading data from the backend store.
## Chunk Store
The **chunk store** is Loki's long-term data store, designed to support
interactive querying and sustained writing without the need for background
maintenance tasks. It consists of:
* An index for the chunks. This index can be backed by
[DynamoDB from Amazon Web Services](https://aws.amazon.com/dynamodb),
[Bigtable from Google Cloud Platform](https://cloud.google.com/bigtable), or
[Apache Cassandra](https://cassandra.apache.org).
* A key-value (KV) store for the chunk data itself, which can be DynamoDB,
  Bigtable, Cassandra again, or an object store such as
  [Amazon S3](https://aws.amazon.com/s3).
> Unlike the other core components of Loki, the chunk store is not a separate
> service, job, or process, but rather a library embedded in the two services
> that need to access Loki data: the [ingester](#ingester) and [querier](#querier).
The chunk store relies on a unified interface to the
"[NoSQL](https://en.wikipedia.org/wiki/NoSQL)" stores (DynamoDB, Bigtable, and
Cassandra) that can be used to back the chunk store index. This interface
assumes that the index is a collection of entries keyed by:
* A **hash key**. This is required for *all* reads and writes.
* A **range key**. This is required for writes and can be omitted for reads,
which can be queried by prefix or range.
The interface works somewhat differently across the supported databases:
* DynamoDB supports range and hash keys natively. Index entries are thus
modelled directly as DynamoDB entries, with the hash key as the distribution
key and the range as the range key.
* For Bigtable and Cassandra, index entries are modelled as individual column
values. The hash key becomes the row key and the range key becomes the column
key.
A set of schemas are used to map the matchers and label sets used on reads and
writes to the chunk store into appropriate operations on the index. Schemas have
been added as Loki has evolved, mainly in an attempt to better load balance
writes and improve query performance.
> The current schema recommendation is the **v10 schema**.

@ -0,0 +1,51 @@
# Loki compared to other log systems
## Loki / Promtail / Grafana vs EFK
The EFK (Elasticsearch, Fluentd, Kibana) stack is used to ingest, visualize, and
query for logs from various sources.
Data in Elasticsearch is stored on-disk as unstructured JSON objects. Both the
keys for each object and the contents of each key are indexed. Data can then be
queried using a JSON object to define a query (called the Query DSL) or through
the Lucene query language.
In comparison, Loki in single-binary mode can store data on-disk, but in
horizontally-scalable mode data is stored in a cloud storage system such as S3,
GCS, or Cassandra. Logs are stored in plaintext form tagged with a set of label
names and values, where only the label pairs are indexed. This tradeoff makes it
cheaper to operate than a full index and allows developers to aggressively log
from their applications. Logs in Loki are queried using [LogQL](../logql.md).
However, because of this design tradeoff, LogQL queries that filter based on
content (i.e., text within the log lines) require loading all chunks within the
search window that match the labels defined in the query.
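For example, a query like the following (the label names are hypothetical) first selects streams by their labels and then scans the content of the matching chunks for a substring:
```nohighlight
{namespace="prod", app="api"} |= "timeout"
```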
Fluentd is usually used to collect and forward logs to Elasticsearch. Fluentd is
called a data collector, which can ingest logs from many sources, process them, and
forward them to one or more targets.
In comparison, Promtail's use case is specifically tailored to Loki. Its main mode
of operation is to discover log files stored on disk and forward them, associated
with a set of labels, to Loki. Promtail can do service discovery for Kubernetes
pods running on the same node as Promtail, act as a container sidecar or a
Docker logging driver, read logs from specified folders, and tail the systemd
journal.
The way Loki represents logs by a set of label pairs is similar to how
[Prometheus](https://prometheus.io) represents metrics. When deployed in an
environment alongside Prometheus, logs from Promtail usually have the same
labels as your application's metrics thanks to using the same service
discovery mechanisms. Having logs and metrics with the same labels enables users
to seamlessly context switch between metrics and logs, helping with root cause
analysis.
Kibana is used to visualize and search Elasticsearch data and is very powerful
for doing analytics on that data. Kibana provides many visualization tools to do
data analysis, such as location maps, machine learning for anomaly detection,
and graphs to discover relationships in data. Alerts can be configured to notify
users when an unexpected condition occurs.
In comparison, Grafana is tailored specifically towards time series data from
sources like Prometheus and Loki. Dashboards can be set up to visualize metrics
(log support coming soon) and an explore view can be used to make ad-hoc queries
against your data. Like Kibana, Grafana supports alerting based on your metrics.

@ -1,115 +0,0 @@
# Promtail
* [Scrape Configs](#scrape-configs)
* [Entry Parsing](#entry-parser)
* [Deployment Methods](./promtail/deployment-methods.md)
* [Promtail API](./promtail/api.md)
* [Config and Usage Examples](./promtail/config-examples.md)
* [Failure modes](./promtail/known-failure-modes.md)
* [Troubleshooting](./troubleshooting.md)
## Scrape Configs
Promtail is an agent which reads log files and sends streams of log data to
the centralised Loki instances along with a set of labels. For example, if you are running Promtail in Kubernetes,
then each container in a single pod will usually yield a single log stream with a set of labels
based on that particular pod's Kubernetes labels. You can also run Promtail outside Kubernetes, but you would
then need to customise the `scrape_configs` for your particular use case.
The way Promtail finds the log locations and extracts the set of labels is by using the *`scrape_configs`*
section in the Promtail YAML configuration. The syntax is the same as what Prometheus uses.
The `scrape_configs` section contains one or more *entries* which are all executed for each container in each new pod running
on the instance. If more than one entry matches your logs, you will get duplicates as the logs are sent in more than
one stream, likely with slightly different labels. Everything is based on labels.
The term "label" here is used in more than one different way and they can be easily confused.
* Labels starting with `__` (two underscores) are internal labels. They are not stored in the Loki index and are
invisible after Promtail. They "magically" appear from different sources.
* Labels starting with `__meta_kubernetes_pod_label_*` are "meta labels" which are generated based on your kubernetes
pod labels. Example: If your kubernetes pod has a label "name" set to "foobar" then the scrape_configs section
will have a label `__meta_kubernetes_pod_label_name` with value set to "foobar".
* There are other `__meta_kubernetes_*` labels based on the Kubernetes metadata, such as the namespace the pod is
running (`__meta_kubernetes_namespace`) or the name of the container inside the pod (`__meta_kubernetes_pod_container_name`)
* The label `__path__` is a special label which Promtail will read to find out where the log files are to be read in.
* The label `filename` is added for every file found in `__path__` to ensure uniqueness of the streams. It contains the absolute path of the file being tailed.
The most important part of each entry is the `relabel_configs` which are a list of operations which creates,
renames, modifies or alters labels. A single `scrape_config` can also reject logs by doing an `action: drop` if
a label value matches a specified regex, which means that this particular `scrape_config` will not forward logs
from a particular log source, but another scrape_config might.
Many of the scrape_configs read labels from `__meta_kubernetes_*` meta-labels, assign them to intermediate labels
such as `__service__` based on a few different rules, possibly drop the processing if `__service__` was empty,
and finally set visible labels (such as "job") based on the `__service__` label.
In general, all of the default Promtail scrape_configs do the following:
* They read pod logs from under /var/log/pods/$1/*.log.
* They set "namespace" label directly from the `__meta_kubernetes_namespace.`
* They expect to see your pod name in the "name" label
* They set a "job" label which is roughly "your namespace/your job name"
### Idioms and examples on different `relabel_configs:`
* Drop the processing if a label is empty:
```yaml
- action: drop
regex: ^$
source_labels:
- __service__
```
* Drop the processing if any of these labels contains a value:
```yaml
- action: drop
regex: .+
separator: ''
source_labels:
- __meta_kubernetes_pod_label_name
- __meta_kubernetes_pod_label_app
```
* Rename a metadata label into another so that it will be visible in the final log stream:
```yaml
- action: replace
source_labels:
- __meta_kubernetes_namespace
target_label: namespace
```
* Convert all of the Kubernetes pod labels into visible labels:
```yaml
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
```
Additional reading:
* https://www.slideshare.net/roidelapluie/taking-advantage-of-prometheus-relabeling-109483749
## Entry parser
### Overview
Each job can be configured with a `pipeline_stages` to parse and mutate your log entry.
This allows you to add more labels, correct the timestamp or entirely rewrite the log line sent to Loki.
> Rewriting labels by parsing the log entry should be done with caution; this could increase the cardinality
> of streams created by Promtail.
Aside from mutating the log entry, pipeline stages can also generate metrics, which can be useful in situations where you can't instrument an application.
See [Processing Log Lines](./logentry/processing-log-lines.md) for a detailed pipeline description.
#### Labels
[The original design doc](./design-documents/labels.md) for labels. Post implementation we have strayed quite a bit from the config examples, though the pipeline idea was maintained.
See the [pipeline label docs](./logentry/processing-log-lines.md#labels) for more info on creating labels from log content.
#### Metrics
Metrics can also be extracted from log line content as a set of Prometheus metrics. Metrics are exposed on the path `/metrics` in Promtail. By default, a log size histogram (`log_entries_bytes_bucket`) per stream is computed. This means you don't need to create metrics to count status codes or log levels; simply parse the log entry and add them to the labels. All custom metrics are prefixed with `promtail_custom_`.
There are three [Prometheus metric types](https://prometheus.io/docs/concepts/metric_types/) available.
`Counter` and `Gauge` record metrics for each line parsed by adding the value, while `Histograms` observe sampled values in `buckets`.
See the [pipeline metric docs](./logentry/processing-log-lines.md#metrics) for more info on creating metrics from log content.
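As a sketch (the regex, metric name, and description are made up for illustration), a pipeline that counts log lines containing the word "error" could look like this, producing a counter exposed as `promtail_custom_error_lines_total`:
```yaml
pipeline_stages:
  - regex:
      expression: ".*(?P<error>error).*"
  - metrics:
      error_lines_total:
        type: Counter
        description: "number of log lines containing the word error"
        source: error
        config:
          action: inc
```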

@ -1,45 +0,0 @@
# Overview
Promtail is an agent which ships the contents of local log files to Loki. It is
usually deployed to every machine that has applications that need to be monitored.
It primarily **discovers** targets, attaches **labels** to log streams and
**pushes** them to the Loki instance.
### Discovery
Before Promtail is able to ship anything to Loki, it needs to find out about its
environment. This specifically means discovering applications emitting log lines
that need to be monitored.
Promtail borrows the [service discovery mechanism from
Prometheus](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config),
although it currently only supports `static` and `kubernetes` service discovery.
This is due to the fact that `promtail` is deployed as a daemon to every local
machine and does not need to discover labels from other systems. `kubernetes`
service discovery fetches required labels from the api-server, `static` usually
covers the other use cases.
Just like Prometheus, `promtail` is configured using a `scrape_configs` stanza.
`relabel_configs` allows fine-grained control of what to ingest, what to drop
and the final metadata attached to the log line. Refer to the
[configuration](configuration.md) for more details.
### Labeling and Parsing
During service discovery, metadata is determined (pod name, filename, etc.) that
may be attached to the log line as a label for easier identification afterwards.
Using `relabel_configs`, those discovered labels can be mutated into the form
they should have for querying.
To allow more sophisticated filtering afterwards, Promtail allows setting labels
not only from service discovery, but also based on the contents of the log
lines. The so-called `pipeline_stages` can be used to add or update labels,
correct the timestamp or rewrite the log line entirely. Refer to the [logentry
processing documentation](../logentry/processing-log-lines.md) for more details.
### Shipping
Once Promtail is certain about what to ingest and all labels are set correctly,
it starts *tailing* (continuously reading) the log files from the applications.
Once enough data is read into memory, it is flushed to Loki as a batch.

@ -1,22 +0,0 @@
# API
Promtail features an embedded web server exposing a web console at `/` and the following API endpoints:
### `GET /ready`
This endpoint returns 200 when Promtail is up and running, and there's at least one working target.
### `GET /metrics`
This endpoint returns Promtail metrics for Prometheus. See "[Operations > Observability > Metrics](../loki/operations.md)" for a list of exported metrics.
## Promtail web server config
The web server exposed by Promtail can be configured in the promtail `.yaml` config file:
```
server:
http_listen_host: 127.0.0.1
http_listen_port: 9080
```

@ -1,141 +0,0 @@
# Promtail Config Examples
## Pipeline Examples
[Pipeline Docs](../logentry/processing-log-lines.md) contains detailed documentation of the pipeline stages
## Simple Docker Config
This example Promtail config is based on the original Docker [config](https://github.com/grafana/loki/blob/master/cmd/promtail/promtail-docker-config.yaml)
and shows how to work with two or more sources:
Example filename: my-docker-config.yaml
```yaml
server:
http_listen_port: 9080
grpc_listen_port: 0
positions:
filename: /tmp/positions.yaml
client:
url: http://ip_or_hostname_where_Loki_run:3100/api/prom/push
scrape_configs:
- job_name: system
pipeline_stages:
- docker:
static_configs:
- targets:
- localhost
labels:
job: varlogs
host: yourhost
__path__: /var/log/*.log
- job_name: someone_service
pipeline_stages:
- docker:
static_configs:
- targets:
- localhost
labels:
job: someone_service
host: yourhost
__path__: /srv/log/someone_service/*.log
```
#### Description
The `scrape_configs` section of `config.yaml` contains the various jobs for parsing your logs.
`job` and `host` are examples of static labels added to all logs; labels are indexed by Loki and are used to help search logs.
`__path__` is the path to the directory where your logs are stored.
If you run Promtail with this `config.yaml` in a Docker container, don't forget to use Docker volumes to map the real log directories
to those folders in the container.
#### Example Use
1) Create a folder, for example `promtail`, then a new sub directory `build/conf`, and place `my-docker-config.yaml` there.
2) Create a new Dockerfile in the root folder `promtail`, with contents
```dockerfile
FROM grafana/promtail:latest
COPY build/conf /etc/promtail
```
3) Create your Docker image based on the original Promtail image and tag it, for example `mypromtail-image`.
4) After that you can run the Docker container with this command:
`docker run -d --name promtail --network loki_network -p 9080:9080 -v /var/log:/var/log -v /srv/log/someone_service:/srv/log/someone_service mypromtail-image -config.file=/etc/promtail/my-docker-config.yaml`
## Simple Systemd Journal Config
This example demonstrates how to configure promtail to listen to systemd journal
entries and write them to Loki:
Filename for example: my-systemd-journal-config.yaml
```yaml
server:
http_listen_port: 9080
grpc_listen_port: 0
positions:
filename: /tmp/positions.yaml
clients:
- url: http://ip_or_hostname_where_loki_runs:3100/api/prom/push
scrape_configs:
- job_name: journal
journal:
max_age: 12h
path: /var/log/journal
labels:
job: systemd-journal
relabel_configs:
- source_labels: ['__journal__systemd_unit']
target_label: 'unit'
```
### Description
Just like the Docker example, the `scrape_configs` section holds the various
jobs for parsing logs. A job with a `journal` key configures it for systemd
journal reading.
`max_age` is an optional string specifying the earliest entry that will be
read. If unspecified, `max_age` defaults to `7h`. Even if the position in the
journal is saved, if the entry corresponding to that position is older than
the max_age, the position won't be used.
`path` is an optional string specifying the path to read journal entries
from. If unspecified, defaults to the system default (`/var/log/journal`).
`labels` is a map of string values specifying labels that should always
be associated with each log entry being read from the systemd journal.
In our example, each log will have a label of `job=systemd-journal`.
Every field written to the systemd journal is available for processing
in the `relabel_configs` section. Label names are converted to lowercase
and prefixed with `__journal_`. After `relabel_configs` processes all
labels for a job entry, any label starting with `__` is deleted.
Our example renames the `_SYSTEMD_UNIT` label (available as
`__journal__systemd_unit` in promtail) to `unit` so it will be available
in Loki. All other labels from the journal entry are dropped.
### Example Use
`promtail` must have access to the journal path (`/var/log/journal`)
where journal entries are stored and the machine ID (`/etc/machine-id`) for
journal support to work correctly.
If running with Docker, that means binding those paths:
```bash
docker run -d --name promtail --network loki_network -p 9080:9080 \
-v /var/log/journal:/var/log/journal \
-v /etc/machine-id:/etc/machine-id \
mypromtail-image -config.file=/etc/promtail/my-systemd-journal-config.yaml
```

@ -1,185 +0,0 @@
# Configuration
## `scrape_configs` (Target Discovery)
The way Promtail finds the log locations and extracts the set of labels
is by using the `scrape_configs` section in the `promtail.yaml` configuration
file. The syntax is equal to what [Prometheus
uses](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config).
The `scrape_configs` contains one or more *entries* which are all executed for
each discovered target (read each container in each new pod running in the instance):
```yaml
scrape_configs:
- job_name: local
static_configs:
- ...
- job_name: kubernetes
kubernetes_sd_config:
- ...
```
If more than one entry matches your logs, you will get duplicates as the logs are
sent in more than one stream, likely with slightly different labels.
There are different types of labels present in Promtail:
* Labels starting with `__` (two underscores) are internal labels. They usually
come from dynamic sources like the service discovery. Once relabeling is done,
they are removed from the label set. To persist those, rename them to
something not starting with `__`.
* Labels starting with `__meta_kubernetes_pod_label_*` are "meta labels" which
are generated based on your kubernetes pod labels.
Example: If your kubernetes pod has a label `name` set to `foobar` then the
`scrape_configs` section will have a label `__meta_kubernetes_pod_label_name`
with value set to `foobar`.
* There are other `__meta_kubernetes_*` labels based on the Kubernetes
metadata, such as the namespace the pod is running in
(`__meta_kubernetes_namespace`) or the name of the container inside the pod
(`__meta_kubernetes_pod_container_name`). Refer to [the Prometheus
docs](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config)
for the full list.
* The label `__path__` is a special label which Promtail will use afterwards to
figure out where the file to be read is located. Wildcards are allowed.
* The label `filename` is added for every file found in `__path__` to ensure
uniqueness of the streams. It contains the absolute path of the file the line
was read from.
## `relabel_configs` (Relabeling)
The most important part of each entry is the `relabel_configs` stanza, which is a list
of operations to create, rename, modify or alter the labels.
A single `scrape_config` can also reject logs by doing an `action: drop` if a label value
matches a specified regex, which means that this particular `scrape_config` will
not forward logs from a particular log source.
Other `scrape_config`s might still forward logs from that source, though.
Many of the `scrape_configs` read labels from `__meta_kubernetes_*` meta-labels,
assign them to intermediate labels such as `__service__` based on
different logic, possibly drop the processing if the `__service__` was empty
and finally set visible labels (such as `job`) based on the `__service__`
label.
In general, all of the default Promtail `scrape_configs` do the following:
* They read pod logs from under `/var/log/pods/$1/*.log`.
* They set `namespace` label directly from the `__meta_kubernetes_namespace`.
* They expect to see your pod name in the `name` label
* They set a `job` label which is roughly `namespace/job`
#### Examples
* Drop the processing if a label is empty:
```yaml
- action: drop
regex: ^$
source_labels:
- __service__
```
* Drop the processing if any of these labels contains a value:
```yaml
- action: drop
regex: .+
separator: ''
source_labels:
- __meta_kubernetes_pod_label_name
- __meta_kubernetes_pod_label_app
```
* Rename a metadata label into another so that it will be visible in the final log stream:
```yaml
- action: replace
source_labels:
- __meta_kubernetes_namespace
target_label: namespace
```
* Convert all of the Kubernetes pod labels into visible labels:
```yaml
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
```
Additional reading:
* [Julien Pivotto's slides from PromConf Munich, 2017](https://www.slideshare.net/roidelapluie/taking-advantage-of-prometheus-relabeling-109483749)
## `client_option` (HTTP Client)
Promtail uses the Prometheus HTTP client implementation for all calls to Loki.
Therefore, you can configure it using the `client` stanza:
```yaml
client: [ <client_option> ]
```
Reference for `client_option`:
```yaml
# Sets the `url` of loki api push endpoint
url: http[s]://<host>:<port>/api/prom/push
# Sets the `Authorization` header on every promtail request with the
# configured username and password.
# password and password_file are mutually exclusive.
basic_auth:
username: <string>
password: <secret>
password_file: <string>
# Sets the `Authorization` header on every promtail request with
# the configured bearer token. It is mutually exclusive with `bearer_token_file`.
bearer_token: <secret>
# Sets the `Authorization` header on every promtail request with the bearer token
# read from the configured file. It is mutually exclusive with `bearer_token`.
bearer_token_file: /path/to/bearer/token/file
# Configures the promtail request's TLS settings.
tls_config:
# CA certificate to validate API server certificate with.
# If not provided Trusted CA from system will be used.
ca_file: <filename>
# Certificate and key files for client cert authentication to the server.
cert_file: <filename>
key_file: <filename>
# ServerName extension to indicate the name of the server.
# https://tools.ietf.org/html/rfc4366#section-3.1
server_name: <string>
# Disable validation of the server certificate.
insecure_skip_verify: <boolean>
# Optional proxy URL.
proxy_url: <string>
# Maximum wait period before sending batch
batchwait: 1s
# Maximum batch size to accrue before sending, unit is byte
batchsize: 102400
# Maximum time to wait for server to respond to a request
timeout: 10s
backoff_config:
# Initial backoff time between retries
minbackoff: 100ms
# Maximum backoff time between retries
maxbackoff: 5s
# Maximum number of retries when sending batches, 0 means infinite retries
maxretries: 5
# The labels to add to any time series or alerts when communicating with loki
external_labels: {}
```
#### Ship to multiple Loki Servers
Promtail is able to push logs to as many different Loki servers as you like. Use
`clients` instead of `client` if needed:
```yaml
# Single Loki
client: [ <client_option> ]
# Multiple Loki instances
clients:
- [ <client_option> ]
```
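For example, a hypothetical setup pushing to two Loki instances, one of them behind basic auth, might look like the following (the hostnames, credentials, and labels are placeholders):
```yaml
clients:
  # Primary Loki instance, protected by basic auth.
  - url: https://loki-primary.example.com:3100/api/prom/push
    basic_auth:
      username: promtail
      password_file: /etc/promtail/loki_password
  # Secondary Loki instance, tagged with an extra external label.
  - url: http://loki-secondary.example.com:3100/api/prom/push
    external_labels:
      cluster: dev
```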
# Promtail Deployment Methods
## Daemonset method
A DaemonSet deploys `promtail` on every node within the Kubernetes cluster.
The DaemonSet deployment is great for collecting all of the container logs within the
cluster and is a good fit for single-tenant setups: all of the logs are sent to a
single Loki server.
Check the `production` folder for examples of a DaemonSet deployment for Kubernetes using both Helm and ksonnet.
### Example
```yaml
---Daemonset.yaml
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
name: promtail-daemonset
...
spec:
...
template:
spec:
serviceAccount: SERVICE_ACCOUNT
serviceAccountName: SERVICE_ACCOUNT
volumes:
- name: logs
hostPath: HOST_PATH
- name: promtail-config
        configMap:
          name: promtail-configmap
containers:
- name: promtail-container
args:
- -config.file=/etc/promtail/promtail.yaml
volumeMounts:
- name: logs
mountPath: MOUNT_PATH
- name: promtail-config
mountPath: /etc/promtail
...
---configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: promtail-config
...
data:
promtail.yaml: YOUR CONFIG
---Clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: promtail-clusterrole
rules:
  - apiGroups: [""] # "" indicates the core API group
    resources:
      - nodes
      - services
      - pods
verbs:
- get
- watch
- list
---ServiceAccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: promtail-serviceaccount
---Rolebinding
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: promtail-clusterrolebinding
subjects:
- kind: ServiceAccount
name: promtail-serviceaccount
roleRef:
kind: ClusterRole
name: promtail-clusterrole
apiGroup: rbac.authorization.k8s.io
```
## Sidecar Method
The sidecar method deploys `promtail` as a sidecar container within a pod that
a developer or DevOps team creates.
In a multi-tenant environment, this enables teams to aggregate logs
for specific pods and deployments, for example for all pods in a namespace.
### Example
```yaml
---Deployment.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: my_test_app
...
spec:
...
template:
spec:
serviceAccount: SERVICE_ACCOUNT
serviceAccountName: SERVICE_ACCOUNT
volumes:
- name: logs
hostPath: HOST_PATH
- name: promtail-config
        configMap:
          name: promtail-configmap
containers:
- name: promtail-container
args:
- -config.file=/etc/promtail/promtail.yaml
volumeMounts:
- name: logs
mountPath: MOUNT_PATH
- name: promtail-config
mountPath: /etc/promtail
...
...
```
### Custom Log Paths
Sometimes applications create custom log files. To collect those logs, you
would need a customized `__path__` in your scrape_config.
Right now, the best way to watch and tail a custom log path is to define the log
file path as a label on the pod.
#### Example
```yaml
---Deployment.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: test-app-deployment
namespace: your_namespace
labels:
logFileName: my_app_log
...
---promtail_config.yaml
...
scrape_configs:
...
- job_name: job_name
  kubernetes_sd_configs:
- role: pod
  relabel_configs:
...
- action: replace
target_label: __path__
    source_labels:
- __meta_kubernetes_pod_label_logFileName
replacement: /your_log_file_dir/$1.log
...
```
### Custom Client options
`promtail` client configuration uses the [Prometheus http client](https://godoc.org/github.com/prometheus/common/config) implementation.
Therefore you can configure the following authentication parameters in the `client` or `clients` section.
```yaml
---promtail_config.yaml
...
# Simple client
client:
[ <client_option> ]
# Multiple clients
clients:
[ - <client_option> ]
...
```
> Note: Passing `-client.url` on the command line is only valid if you use the `client` section.
#### `<client_option>`
```yaml
# Sets the `url` of the Loki API push endpoint
url: http[s]://<host>:<port>/api/prom/push
# Sets the `Authorization` header on every promtail request with the
# configured username and password.
# password and password_file are mutually exclusive.
basic_auth:
username: <string>
password: <secret>
password_file: <string>
# Sets the `Authorization` header on every promtail request with
# the configured bearer token. It is mutually exclusive with `bearer_token_file`.
bearer_token: <secret>
# Sets the `Authorization` header on every promtail request with the bearer token
# read from the configured file. It is mutually exclusive with `bearer_token`.
bearer_token_file: /path/to/bearer/token/file
# Configures the promtail request's TLS settings.
tls_config:
# CA certificate to validate API server certificate with.
# If not provided Trusted CA from system will be used.
ca_file: <filename>
# Certificate and key files for client cert authentication to the server.
cert_file: <filename>
key_file: <filename>
# ServerName extension to indicate the name of the server.
# https://tools.ietf.org/html/rfc4366#section-3.1
server_name: <string>
# Disable validation of the server certificate.
insecure_skip_verify: <boolean>
# Optional proxy URL.
proxy_url: <string>
# Maximum wait period before sending batch
batchwait: 1s
# Maximum batch size to accrue before sending, unit is byte
batchsize: 102400
# Maximum time to wait for server to respond to a request
timeout: 10s
backoff_config:
# Initial backoff time between retries
minbackoff: 100ms
# Maximum backoff time between retries
maxbackoff: 5s
# Maximum number of retires when sending batches, 0 means infinite retries
maxretries: 5
# The labels to add to any time series or alerts when communicating with loki
external_labels: {}
```
# Examples
This document shows some example use-cases for promtail and their configuration.
## Local Config
Using this configuration, all files in `/var/log` and `/srv/log/someone_service` are ingested into Loki.
The labels `job` and `host` are set using `static_configs`.
When using this configuration with Docker, do not forget to mount the configuration file, `/var/log`, and `/srv/log/someone_service` using [volumes](https://docs.docker.com/storage/volumes/).
```yaml
server:
http_listen_port: 9080
grpc_listen_port: 0
positions:
filename: /tmp/positions.yaml # progress of the individual files
client:
url: http://ip_or_hostname_where_loki_runs:3100/api/prom/push
scrape_configs:
- job_name: system
pipeline_stages:
- docker: # Docker wraps logs in json. Undo this.
static_configs: # running locally here, no need for service discovery
- targets:
- localhost
labels:
job: varlogs
host: yourhost
__path__: /var/log/*.log # tail all files under /var/log
- job_name: someone_service
pipeline_stages:
- docker: # Docker wraps logs in json. Undo this.
static_configs: # running locally here, no need for service discovery
- targets:
- localhost
labels:
job: someone_service
host: yourhost
__path__: /srv/log/someone_service/*.log # tail all files under /srv/log/someone_service
```
## Systemd Journal
This example shows how to ship the `systemd` journal to Loki.
Just like the Docker example, the `scrape_configs` section holds various
jobs for parsing logs. A job with a `journal` key configures it for systemd
journal reading.
`path` is an optional string specifying the path to read journal entries
from. If unspecified, defaults to the system default (`/var/log/journal`).
`labels` is a map of string values specifying labels that should always
be associated with each log entry read from the systemd journal.
In our example, each log will have a label of `job=systemd-journal`.
Every field written to the systemd journal is available for processing
in the `relabel_configs` section. Label names are converted to lowercase
and prefixed with `__journal_`. After `relabel_configs` processes all
labels for a job entry, any label starting with `__` is deleted.
Our example renames the `_SYSTEMD_UNIT` label (available as
`__journal__systemd_unit` in promtail) to `unit` so it will be available
in Loki. All other labels from the journal entry are dropped.
When running using Docker, **remember to bind the journal into the container**.
```yaml
server:
http_listen_port: 9080
grpc_listen_port: 0
positions:
filename: /tmp/positions.yaml
clients:
  - url: http://ip_or_hostname_where_loki_runs:3100/api/prom/push
scrape_configs:
- job_name: journal
journal:
path: /var/log/journal
labels:
job: systemd-journal
relabel_configs:
- source_labels: ['__journal__systemd_unit']
target_label: 'unit'
```
# Promtail Known Failure Modes
This document describes known failure modes of `promtail` on edge cases and the adopted trade-offs.
## A tailed file is truncated while `promtail` is not running
Given the following order of events:
1. `promtail` is tailing `/app.log`
2. `promtail` current position for `/app.log` is `100` (bytes)
3. `promtail` is stopped
4. `/app.log` is truncated and new logs are appended to it
5. `promtail` is restarted
When `promtail` is restarted, it reads the previous position (`100`) from the positions file. Two scenarios are then possible:
- `/app.log` is smaller than the position stored before the truncation
- `/app.log` is greater than or equal to the position stored before the truncation
If the `/app.log` file size is less than the previous position, the file is detected as truncated and logs will be tailed starting from position `0`. Otherwise, if the `/app.log` file size is greater than or equal to the previous position, `promtail` can't detect that it was truncated while not running and will continue tailing the file from position `100`.
Generally speaking, `promtail` uses only the path to the file as the key in the positions file (sketched below). Whenever `promtail` is started, for each file path referenced in the positions file, `promtail` will read the file from the beginning if the file size is less than the offset stored in the positions file; otherwise it will continue from the stored offset, regardless of whether the file has been truncated or rolled multiple times while `promtail` was not running.
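For illustration, the positions file is roughly a map from file path to byte offset, along these lines (a sketch; the exact format may vary between versions):
```yaml
# /tmp/positions.yaml (sketch)
positions:
  /app.log: "100"          # promtail resumes /app.log from byte offset 100
  /var/log/syslog: "52340"
```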
## Loki is unavailable
For each tailed file, `promtail` reads a line, processes it through the configured `pipeline_stages`, and pushes the log entry to Loki. Log entries are batched together before getting pushed to Loki, based on the max batch duration `client.batch-wait` and max batch size `client.batch-size-bytes`, whichever is reached first.
If an error occurs while sending a batch of log entries, `promtail` adopts a "retry then discard" strategy:
- `promtail` retries sending the batch to the ingester up to `maxretries` times
- if all retries fail, `promtail` discards the batch of log entries (_which will be lost_) and proceeds with the next one
You can configure the `maxretries` and the delay between two retries via the `backoff_config` in the promtail config file:
```yaml
clients:
- url: INGESTER-URL
backoff_config:
minbackoff: 100ms
maxbackoff: 5s
maxretries: 5
```
## Log entries pushed after a `promtail` crash / panic / abrupt termination
When `promtail` shuts down gracefully, it saves the last read offsets in the positions file, so that on a subsequent restart it will continue tailing logs without duplicates or losses.
In the event of a crash or abrupt termination, `promtail` can't save the last read offsets in the positions file. When restarted, `promtail` will read the positions file saved at the last sync period and will continue tailing the files from there. This means that if new log entries have been read and pushed to the ingester between the last sync period and the crash, these log entries will be sent again to the ingester when `promtail` restarts.
However, for each log stream (set of unique labels), the Loki ingester skips all log entries received out of timestamp order. For this reason, even if duplicated logs may be sent from `promtail` to the ingester, entries whose timestamp is older than the latest received one will be discarded to avoid duplicated logs. To leverage this, it's important that your `pipeline_stages` include the `timestamp` stage, parsing the log entry timestamp from the log line instead of relying on the default behaviour of setting the timestamp to the point in time when the line is read by `promtail` (see the sketch below).
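A sketch of such a pipeline, assuming log lines start with an RFC3339 timestamp (the regex, job name, and path are illustrative):
```yaml
scrape_configs:
  - job_name: app
    pipeline_stages:
      # Extract the leading timestamp from each log line.
      - regex:
          expression: '^(?P<time>\S+) (?P<message>.*)$'
      # Use the extracted value as the entry timestamp instead of the read time.
      - timestamp:
          source: time
          format: RFC3339
    static_configs:
      - targets:
          - localhost
        labels:
          job: app
          __path__: /app.log
```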
# Querying
To get the previously ingested logs back from Loki for analysis, you need a
client that supports LogQL.
Grafana will be the first choice for most users;
nevertheless, [LogCLI](logcli.md) represents a viable standalone alternative.
## Clients
### Grafana
Grafana ships with built-in support for Loki in versions
[6.0](https://grafana.com/grafana/download/6.0.0) and newer; however, using
[6.3](https://grafana.com/grafana/download/6.3.0) or later is highly
recommended.
1. Log into your Grafana instance, e.g., `http://localhost:3000` (default username:
   `admin`, default password: `admin`).
2. Go to `Configuration` > `Data Sources` via the cog icon on the left side bar.
3. Click the big <kbd>+ Add data source</kbd> button.
4. Choose Loki from the list.
5. The HTTP URL field should be the address of your Loki server, e.g.,
`http://localhost:3100` when running locally or with docker,
`http://loki:3100` when running with docker-compose or kubernetes.
6. To see the logs, click <kbd>Explore</kbd> on the sidebar, select the Loki
datasource, and then choose a log stream using the <kbd>Log labels</kbd>
button.
Read more about the Explore feature in the [Grafana
docs](http://docs.grafana.org/features/explore) to learn how to search and filter
logs with Loki.
> To configure the datasource via provisioning see [Configuring Grafana via
> Provisioning](http://docs.grafana.org/features/datasources/loki/#configure-the-datasource-with-provisioning)
> and make sure to adjust the URL similarly as shown above.
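For reference, a minimal provisioned Loki datasource could look roughly like this (a sketch; the file path and URL are placeholders for your environment):
```yaml
# grafana/provisioning/datasources/loki.yaml (sketch)
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://localhost:3100
```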
### LogCLI
If you prefer a command line interface, [LogCLI](logcli.md) also allows you to run
LogQL queries against a Loki server. Refer to its [documentation](logcli.md) for
more details.
## LogQL
Loki has its own language for querying logs from the Loki server called *LogQL*. Think of
it as a distributed `grep` with labels for selection.
A log query consists of two parts: a **log stream selector** and a **filter
expression**. For performance reasons you need to start by choosing a set of log
streams using a Prometheus-style log stream selector.
The log stream selector will reduce the number of log streams to a manageable
volume and then the regex search expression is used to do a distributed grep
over those log streams.
### Log Stream Selector
For the label part of the query expression, wrap it in curly braces `{}` and
then use the key value syntax for selecting labels. Multiple label expressions
are separated by a comma:
`{app="mysql",name="mysql-backup"}`
The following label matching operators are currently supported:
- `=` exactly equal.
- `!=` not equal.
- `=~` regex-match.
- `!~` do not regex-match.
Examples:
- `{name=~"mysql.+"}`
- `{name!~"mysql.+"}`
The same rules that apply for [Prometheus Label
Selectors](https://prometheus.io/docs/prometheus/latest/querying/basics/#instant-vector-selectors)
apply for Loki Log Stream Selectors.
### Filter Expression
After writing the Log Stream Selector, you can filter the results further by
writing a search expression. The search expression can be just text or a regex
expression.
Example queries:
- `{job="mysql"} |= "error"`
- `{name="kafka"} |~ "tsdb-ops.*io:2003"`
- `{instance=~"kafka-[23]",name="kafka"} != kafka.server:type=ReplicaManager`
Filter operators can be chained and will sequentially filter down the
expression; the resulting log lines will satisfy _every_ filter. For example:
`{job="mysql"} |= "error" != "timeout"`
The following filter types have been implemented:
- `|=` line contains string.
- `!=` line does not contain string.
- `|~` line matches regular expression.
- `!~` line does not match regular expression.
The regex expression accepts [RE2
syntax](https://github.com/google/re2/wiki/Syntax). The matching is
case-sensitive by default and can be switched to case-insensitive by prefixing the
regex with `(?i)`.
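For example, `{job="mysql"} |~ "(?i)error"` matches log lines containing `error`, `Error`, or `ERROR`.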
### Query Language Extensions
The query language is still under development to support more features, e.g.:
- `AND` / `NOT` operators
- Number extraction for timeseries based on number in log messages
- JSON accessors for filtering of JSON-structured logs
- Context (like `grep -C n`)
## Counting logs
Loki's LogQL supports sample expressions, allowing you to count entries per stream after the regex filtering stage.
### Range Vector aggregation
The language shares the same [range vector](https://prometheus.io/docs/prometheus/latest/querying/basics/#range-vector-selectors) concept from Prometheus, except that the selected range of samples contains a value of one for each log entry. You can then apply an aggregation over the selected range to transform it into an instant vector.
`rate` calculates the number of entries per second, and `count_over_time` counts the entries for each log stream within the range.
In this example, we count all the log lines we have recorded within the last five minutes for the MySQL job.
> `count_over_time({job="mysql"}[5m])`
A range vector aggregation can also be applied to a [Filter Expression](#filter-expression), allowing you to select only matching log entries.
> `rate(({job="mysql"} |= "error" != "timeout")[10s])`
The query above will compute the per-second rate of all errors except those containing `timeout` within the last 10 seconds.
You can then use aggregation operators over the range vector aggregation.
### Aggregation operators
Like [PromQL](https://prometheus.io/docs/prometheus/latest/querying/operators/#aggregation-operators), Loki's LogQL supports a subset of built-in aggregation operators that can be used to aggregate the elements of a single vector, resulting in a new vector of fewer elements with aggregated values:
- `sum` (calculate sum over dimensions)
- `min` (select minimum over dimensions)
- `max` (select maximum over dimensions)
- `avg` (calculate the average over dimensions)
- `stddev` (calculate population standard deviation over dimensions)
- `stdvar` (calculate population standard variance over dimensions)
- `count` (count number of elements in the vector)
- `bottomk` (smallest k elements by sample value)
- `topk` (largest k elements by sample value)
These operators can either be used to aggregate over all label dimensions or preserve distinct dimensions by including a `without` or `by` clause.
> `<aggr-op>([parameter,] <vector expression>) [without|by (<label list>)]`
`parameter` is only required for `topk` and `bottomk`. `without` removes the listed labels from the result vector, while all other labels are preserved in the output. `by` does the opposite and drops labels that are not listed in the `by` clause, even if their label values are identical between all elements of the vector.
`topk` and `bottomk` are different from other aggregators in that a subset of the input samples, including the original labels, is returned in the result vector. `by` and `without` are only used to bucket the input vector.
#### Examples
Get the top 10 applications by highest log throughput:
> `topk(10, sum(rate({region="us-east1"}[5m])) by (name))`
Get the count of logs during the last 5 minutes by level:
> `sum(count_over_time({job="mysql"}[5m])) by (level)`
Get the rate of HTTP GET requests from nginx logs:
> `avg(rate(({job="nginx"} |= "GET")[10s])) by (region)`
# Troubleshooting
## "Loki: Bad Gateway. 502"
This error can appear in Grafana when you add Loki as a datasource. It means
that Grafana cannot connect to Loki. This can have several reasons:
- If you deploy using Docker and Grafana and Loki are not on the same node, check
  iptables or firewall rules to ensure they can connect.
- If you deploy using Kubernetes, please note:
  - If Grafana and Loki are in the same namespace, set the Loki URL to
    `http://$LOKI_SERVICE_NAME:$LOKI_PORT`.
  - If Grafana and Loki are in different namespaces, set the Loki URL to
    `http://$LOKI_SERVICE_NAME.$LOKI_NAMESPACE:$LOKI_PORT`.
## "Data source connected, but no labels received. Verify that Loki and Promtail is configured properly."
This error can appear in Grafana when you add Loki as a datasource. It means
that Grafana can connect to Loki, but Loki has not received any logs from
promtail. This can have several reasons:
- Promtail cannot reach Loki; check promtail's output.
- Promtail started sending logs before Loki was ready. This can happen in test
environments where promtail already read all logs and sent them off. Here is
what you can do:
- Generally start promtail after Loki, e.g., 60 seconds later.
- Restarting promtail will not necessarily resend log messages that have been
read. To force sending all messages again, delete the positions file
(default location `/tmp/positions.yaml`) or make sure new log messages are
written after both promtail and Loki have started.
- Promtail is ignoring targets because of a configuration rule
- Detect this by turning on debug logging and then look for `dropping target,
no labels` or `ignoring target` messages.
- Promtail cannot find the location of your log files. Check that the
  `scrape_configs` contains a valid path setting for finding the logs on your worker
  nodes.
- Your pods are running but not with the labels Promtail is expecting. Check the
  Promtail `scrape_configs`.
## Troubleshooting targets
Promtail offers two pages that you can use to understand how service discovery
works. The service discovery page (`/service-discovery`) shows all discovered
targets with their labels before and after relabeling as well as the reason why
the target has been dropped. The targets page (`/targets`) however displays only
targets being actively scraped with their respective labels, files and
positions.
On Kubernetes, you can access those two pages by port-forwarding the promtail port (`9080`, or
`3101` when deployed via Helm) locally:
```bash
kubectl port-forward loki-promtail-jrfg7 9080
```
## Debug output
Both binaries support a log level parameter on the command-line, e.g.:
``` bash
$ loki -log.level=debug
```
## Failed to create target, `ioutil.ReadDir: readdirent: not a directory`
The promtail configuration contains a `__path__` entry to a directory that
promtail cannot find.
## Connecting to a promtail pod to troubleshoot
First check the [Troubleshooting targets](#troubleshooting-targets) section above;
if that doesn't help answer your questions, you can connect to the promtail pod
to investigate further.
If you are running promtail as a DaemonSet in your cluster, you will have a
promtail pod on each node, so figure out which promtail you need to debug first:
```shell
$ kubectl get pods --all-namespaces -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
...
nginx-7b6fb56fb8-cw2cm 1/1 Running 0 41d 10.56.4.12 node-ckgc <none>
...
promtail-bth9q 1/1 Running 0 3h 10.56.4.217 node-ckgc <none>
```
That output is truncated to highlight just the two pods we are interested in;
with the `-o wide` flag you can see the NODE on which they are running.
You'll want to match the node of the pod you are interested in (in this example,
nginx) to the promtail pod running on the same node.
To debug you can connect to the promtail pod:
```shell
kubectl exec -it promtail-bth9q -- /bin/sh
```
Once connected, verify the config in `/etc/promtail/promtail.yml` is what you
expected.
Also check `/var/log/positions.yaml` (`/run/promtail/positions.yaml` when deployed by Helm, or the value of `positions.file`) and make sure promtail is tailing the logs
you would expect.
You can check the promtail log by looking at the promtail container log in
`/var/log/containers`.
## Enable tracing for Loki
Loki can be traced using [Jaeger](https://www.jaegertracing.io/) by setting
the environment variable `JAEGER_AGENT_HOST` on the Loki process to the hostname
(and port) of a running Jaeger agent.
If you deploy with Helm, use the following command:
```bash
$ helm upgrade --install loki loki/loki --set "loki.tracing.jaegerAgentHost=YOUR_JAEGER_AGENT_HOST"
```