Ruler docs + single binary inclusion (#2637)

* starts alerting docs

* ruler in single binary

* make docs interactive

* alerting docs

* ruler prom alerts endpoint

* Apply suggestions from code review

Co-authored-by: Diana Payton <52059945+oddlittlebird@users.noreply.github.com>

* doc fixes

* capitalize ruler

* removes double spaces

* Update docs/sources/alerting/_index.md

Co-authored-by: Diana Payton <52059945+oddlittlebird@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Ed Welch <ed@oqqer.com>
Co-authored-by: Diana Payton <52059945+oddlittlebird@users.noreply.github.com>

Co-authored-by: Diana Payton <52059945+oddlittlebird@users.noreply.github.com>
Co-authored-by: Ed Welch <ed@oqqer.com>
pull/2657/head
Owen Diehl 6 years ago committed by GitHub
parent 21addf7935
commit 6f8bfe0c79
1. docs/Makefile (4 changes)
2. docs/sources/alerting/_index.md (259 changes)
3. docs/sources/api/_index.md (198 changes)
4. docs/sources/configuration/_index.md (340 changes)
5. pkg/loki/loki.go (2 changes)

@@ -3,9 +3,9 @@ IMAGE = grafana/docs-base:latest
.PHONY: docs
docs:
docker pull ${IMAGE}
docker run -v ${PWD}/sources:/hugo/content/docs/loki/latest -p 3002:3002 --rm $(IMAGE) /bin/bash -c 'mkdir -p content/docs/grafana/latest/ && touch content/docs/grafana/latest/menu.yaml && make server'
docker run --rm -it -v ${PWD}/sources:/hugo/content/docs/loki/latest -p 3002:3002 $(IMAGE) /bin/bash -c 'mkdir -p content/docs/grafana/latest/ && touch content/docs/grafana/latest/menu.yaml && make server'
.PHONY: docs-test
docs-test:
docker pull ${IMAGE}
docker run -v ${PWD}/sources:/hugo/content/docs/loki/latest -p 3002:3002 --rm $(IMAGE) /bin/bash -c 'mkdir -p content/docs/grafana/latest/ && touch content/docs/grafana/latest/menu.yaml && make prod'
docker run --rm -it -v ${PWD}/sources:/hugo/content/docs/loki/latest -p 3002:3002 $(IMAGE) /bin/bash -c 'mkdir -p content/docs/grafana/latest/ && touch content/docs/grafana/latest/menu.yaml && make prod'

@@ -0,0 +1,259 @@
---
title: Alerting
weight: 700
---
# Alerting
Loki includes a component called the Ruler, adapted from our upstream project, Cortex. The Ruler is responsible for continually evaluating a set of configurable queries and then alerting when certain conditions happen, e.g. a high percentage of error logs.
## Prometheus Compatible
When running the Ruler (which runs by default in the single binary), Loki accepts rule files and schedules them for continual evaluation. These are _Prometheus compatible_! This means the rule file has the same structure as in [Prometheus](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/), with the exception that rule expressions are written in LogQL.
Let's see what that looks like:
The syntax of a rule file is:
```yaml
groups:
  [ - <rule_group> ]
```
A simple example file could be:
```yaml
groups:
  - name: example
    rules:
      - alert: HighThroughputLogStreams
        expr: sum by(container) (rate({job=~"loki-dev/.*"}[1m])) > 1000
        for: 2m
```
### `<rule_group>`
```yaml
# The name of the group. Must be unique within a file.
name: <string>
# How often rules in the group are evaluated.
[ interval: <duration> | default = ruler.evaluation_interval || 1m ]
rules:
  [ - <rule> ... ]
```
### `<rule>`
The syntax for alerting rules is (see the LogQL [docs](https://grafana.com/docs/loki/latest/logql/#metric-queries) for more details):
```yaml
# The name of the alert. Must be a valid label value.
alert: <string>
# The LogQL expression to evaluate (must be an instant vector). Every evaluation cycle this is
# evaluated at the current time, and all resultant time series become
# pending/firing alerts.
expr: <string>
# Alerts are considered firing once they have been returned for this long.
# Alerts which have not yet fired for long enough are considered pending.
[ for: <duration> | default = 0s ]
# Labels to add or overwrite for each alert.
labels:
  [ <labelname>: <tmpl_string> ]
# Annotations to add to each alert.
annotations:
  [ <labelname>: <tmpl_string> ]
```
### Example
A full-fledged example of a rules file might look like:
```yaml
groups:
  - name: should_fire
    rules:
      - alert: HighPercentageError
        expr: |
          sum(rate({app="foo", env="production"} |= "error" [5m])) by (job)
            /
          sum(rate({app="foo", env="production"}[5m])) by (job)
            > 0.05
        for: 10m
        labels:
          severity: page
        annotations:
          summary: High request latency
  - name: credentials_leak
    rules:
      - alert: http-credentials-leaked
        annotations:
          message: "{{ $labels.job }} is leaking http basic auth credentials."
        expr: 'sum by (cluster, job, pod) (count_over_time({namespace="prod"} |~ "http(s?)://(\\w+):(\\w+)@" [5m]) > 0)'
        for: 10m
        labels:
          severity: critical
```
## Use cases
The Ruler's Prometheus compatibility further accentuates the marriage between metrics and logs. For those looking to get started alerting based on logs, or wondering why this might be useful, here are a few use cases we think fit very well.
### We aren't using metrics yet
Many nascent projects, apps, or even companies may not have a metrics backend yet. We tend to add logging support before metrics support, so if you're in this stage, alerting based on logs can help bridge the gap. It's easy to start building Loki alerts for things like _the percentage of error logs_, such as the example from earlier:
```yaml
- alert: HighPercentageError
  expr: |
    sum(rate({app="foo", env="production"} |= "error" [5m])) by (job)
      /
    sum(rate({app="foo", env="production"}[5m])) by (job)
      > 0.05
```
### Black box monitoring
We don't always control the source code of applications we run. Think load balancers and the myriad components (both open source and closed third-party) that support our applications; it's a common problem that these don't expose the metrics you want (or any metrics at all). How, then, can we bring them into our observability stack in order to monitor them effectively? Alerting based on logs is a great answer for these problems.
For a sneak peek of how to combine this with the upcoming LogQL v2 functionality, take a look at Ward Bekker's [video](https://www.youtube.com/watch?v=RwQlR3D4Km4) which builds a robust nginx monitoring dashboard entirely from nginx logs.
### Event alerting
Sometimes you want to know whether _any_ instance of something has occurred. Alerting based on logs can be a great way to handle this, such as finding examples of leaked authentication credentials:
```yaml
- name: credentials_leak
  rules:
    - alert: http-credentials-leaked
      annotations:
        message: "{{ $labels.job }} is leaking http basic auth credentials."
      expr: 'sum by (cluster, job, pod) (count_over_time({namespace="prod"} |~ "http(s?)://(\\w+):(\\w+)@" [5m]) > 0)'
      for: 10m
      labels:
        severity: critical
```
### Alerting on high-cardinality sources
Another great use case is alerting on high cardinality sources. These are things which are difficult/expensive to record as metrics because the potential label set is huge. A great example of this is per-tenant alerting in multi-tenanted systems like Loki. It's a common balancing act between the desire to have per-tenant metrics and the cardinality explosion that ensues (adding a single _tenant_ label to an existing Prometheus metric would increase its cardinality by the number of tenants).
Creating these alerts in LogQL is attractive because these metrics can be extracted at _query time_, meaning we don't suffer the cardinality explosion in our metrics store.
> **Note:** To really take advantage of this, we'll need some features from the upcoming LogQL v2 language. Stay tuned.
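For illustration only, a per-namespace variant of this idea might look like the following sketch. The `cluster` selector, the `namespace` label, and the threshold are hypothetical and will differ per deployment:
```yaml
- name: per_namespace_alerts
  rules:
    - alert: NamespaceHighErrorRate
      # The namespace dimension is extracted from the log streams at query time,
      # so no per-namespace series ever has to exist in a metrics store.
      expr: sum by (namespace) (rate({cluster="ops-cluster"} |= "error" [5m])) > 10
      for: 5m
      labels:
        severity: warning
```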
## Interacting with the Ruler
Because the rule files are identical to Prometheus rule files, we can interact with the Loki Ruler via [`cortextool`](https://github.com/grafana/cortex-tools#rules). The CLI is in early development, but it works alongside both Loki and Cortex. Make sure to pass the `--backend=loki` argument to commands when using it with Loki.
> **Note:** Not all commands in cortextool currently support Loki.
An example workflow is included below:
```sh
# diff rules against the currently managed ruleset in Loki
cortextool rules diff --rule-dirs=./output --backend=loki
# ensure the remote ruleset matches your local ruleset, creating/updating/deleting remote rules which differ from your local specification.
cortextool rules sync --rule-dirs=./output --backend=loki
# print the remote ruleset
cortextool rules print --backend=loki
```
There is also a [GitHub Action](https://github.com/grafana/cortex-rules-action) available for `cortextool`, so you can add it into your CI/CD pipelines!
For instance, you can sync rules on master builds via
```yaml
name: sync-cortex-rules-and-alerts
on:
  push:
    branches:
      - master
env:
  CORTEX_ADDRESS: '<fill me in>'
  CORTEX_TENANT_ID: '<fill me in>'
  CORTEX_API_KEY: ${{ secrets.API_KEY }}
  RULES_DIR: 'output/'
jobs:
  sync-loki-alerts:
    runs-on: ubuntu-18.04
    steps:
      - name: Diff rules
        id: diff-rules
        uses: grafana/cortex-rules-action@v0.3.0
        env:
          ACTION: 'diff'
        with:
          args: --backend=loki
      - name: Sync rules
        if: ${{ !contains(steps.diff-rules.outputs.detailed, 'no changes detected') }}
        uses: grafana/cortex-rules-action@v0.3.0
        env:
          ACTION: 'sync'
        with:
          args: --backend=loki
      - name: Print rules
        uses: grafana/cortex-rules-action@v0.3.0
        env:
          ACTION: 'print'
```
## Scheduling and best practices
The Ruler can be scaled horizontally. However, with multiple Ruler instances running, they need to coordinate to determine which instance evaluates which rule. Similar to the ingesters, the Rulers establish a hash ring to divide up the responsibility of evaluating rules.
The possible configurations are listed fully in the configuration [docs](https://grafana.com/docs/loki/latest/configuration/), but in order to shard rules across multiple Rulers, the rules API must be enabled via flag (`-experimental.ruler.enable-api`) or its config file equivalent, and the Ruler requires its own ring to be configured. From there, the Rulers will shard and handle the division of rules automatically. Unlike ingesters, Rulers do not hand over responsibility: all rules are re-sharded randomly every time a Ruler is added to or removed from the ring.
A full Ruler config example is:
```yaml
ruler:
  alertmanager_url: <alertmanager_endpoint>
  enable_alertmanager_v2: true
  enable_api: true
  enable_sharding: true
  ring:
    kvstore:
      consul:
        host: consul.loki-dev.svc.cluster.local:8500
      store: consul
  rule_path: /tmp/rules
  storage:
    gcs:
      bucket_name: <loki-rules-bucket>
```
## Ruler storage
The Ruler supports six kinds of storage: configdb, azure, gcs, s3, swift, and local. Most kinds of storage work with the sharded Ruler configuration in an obvious way, i.e. configure all Rulers to use the same backend.
The local implementation reads the rule files off of the local filesystem. This is a read-only backend that does not support the creation and deletion of rules through [the API](https://grafana.com/docs/loki/latest/api/#ruler). Despite the fact that it reads the local filesystem, this method can still be used in a sharded Ruler configuration if the operator takes care to load the same rules on every Ruler. For instance, this could be accomplished by mounting a [Kubernetes ConfigMap](https://kubernetes.io/docs/concepts/configuration/configmap/) onto every Ruler pod, as sketched below.
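As a rough sketch of that approach, assuming a hypothetical ConfigMap named `loki-rules` and the tenant ID `fake` (the ID Loki uses when multi-tenancy is disabled); all names and paths below are illustrative:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: loki-rules   # hypothetical name
data:
  rules1.yaml: |
    groups:
      - name: example
        rules:
          - alert: HighThroughputLogStreams
            expr: sum by(container) (rate({job=~"loki-dev/.*"}[1m])) > 1000
            for: 2m
---
# Fragment of the Ruler pod spec: mount the ConfigMap so every Ruler sees the
# same rules under the directory configured via -ruler.storage.local.directory.
volumes:
  - name: rules
    configMap:
      name: loki-rules
volumeMounts:
  - name: rules
    mountPath: /tmp/loki/rules/fake   # <directory>/<tenant id>
```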
A typical local configuration might look something like:
```
-ruler.storage.type=local
-ruler.storage.local.directory=/tmp/loki/rules
```
With the above configuration, the Ruler would expect the following layout:
```
/tmp/loki/rules/<tenant id>/rules1.yaml
/tmp/loki/rules/<tenant id>/rules2.yaml
```
YAML files are expected to be in the [Prometheus format](#prometheus-compatible) but contain LogQL expressions, as described at the beginning of this document.
## Future improvements
There are a few things coming to increase the robustness of this service. In no particular order:
- Recording rules.
- Backend metric store adapters for generated alert and recording rule data. The first will likely be Cortex, as Loki is built atop it.
- Introduce LogQL v2.
## Misc Details: Metrics backends vs in-memory
Currently the Loki Ruler is decoupled from a backing Prometheus store. Ordinarily, the results of rule evaluation as well as the history of each alert's state are stored as time series. Loki does not store or retrieve these series, which allows it to run independently of a metrics store such as Prometheus. As a workaround, Loki keeps a small in-memory store whose purpose is to lazy-load past evaluations when rescheduling or resharding Rulers. In the future, Loki will support optional metrics backends, allowing storage of these series for auditing and performance benefits.

@@ -41,8 +41,22 @@ The HTTP API includes the following endpoints:
- [Series](#series)
- [Examples](#examples-9)
- [Statistics](#statistics)
## Microservices Mode
- [`GET /ruler/ring`](#ruler-ring-status)
- [`GET /loki/api/v1/rules`](#list-rule-groups)
- [`GET /loki/api/v1/rules/{namespace}`](#get-rule-groups-by-namespace)
- [`GET /loki/api/v1/rules/{namespace}/{groupName}`](#get-rule-group)
- [`POST /loki/api/v1/rules/{namespace}`](#set-rule-group)
- [`DELETE /loki/api/v1/rules/{namespace}/{groupName}`](#delete-rule-group)
- [`DELETE /loki/api/v1/rules/{namespace}`](#delete-namespace)
- [`GET /api/prom/rules`](#list-rule-groups)
- [`GET /api/prom/rules/{namespace}`](#get-rule-groups-by-namespace)
- [`GET /api/prom/rules/{namespace}/{groupName}`](#get-rule-group)
- [`POST /api/prom/rules/{namespace}`](#set-rule-group)
- [`DELETE /api/prom/rules/{namespace}/{groupName}`](#delete-rule-group)
- [`DELETE /api/prom/rules/{namespace}`](#delete-namespace)
- [`GET /prometheus/api/v1/alerts`](#list-alerts)
## Microservices mode
When deploying Loki in microservices mode, the set of endpoints exposed by each
component is different.
@@ -95,9 +109,28 @@ And these endpoints are exposed by just the ingester:
The API endpoints starting with `/loki/` are [Prometheus API-compatible](https://prometheus.io/docs/prometheus/latest/querying/api/) and the result formats can be used interchangeably.
These endpoints are exposed by the ruler:
- [`GET /ruler/ring`](#ruler-ring-status)
- [`GET /api/v1/rules`](#list-rules)
- [`GET /api/v1/rules`](#list-rule-groups)
- [`GET /api/v1/rules/{namespace}`](#get-rule-groups-by-namespace)
- [`GET /api/v1/rules/{namespace}/{groupName}`](#get-rule-group)
- [`POST /api/v1/rules/{namespace}`](#set-rule-group)
- [`DELETE /api/v1/rules/{namespace}/{groupName}`](#delete-rule-group)
- [`DELETE /api/v1/rules/{namespace}`](#delete-namespace)
- [`GET /api/prom/rules`](#list-rules)
- [`GET /api/prom/rules`](#list-rule-groups)
- [`GET /api/prom/rules/{namespace}`](#get-rule-groups-by-namespace)
- [`GET /api/prom/rules/{namespace}/{groupName}`](#get-rule-group)
- [`POST /api/prom/rules/{namespace}`](#set-rule-group)
- [`DELETE /api/prom/rules/{namespace}/{groupName}`](#delete-rule-group)
- [`DELETE /api/prom/rules/{namespace}`](#delete-namespace)
- [`GET /prometheus/api/v1/alerts`](#list-alerts)
A [list of clients](../clients) can be found in the clients documentation.
## Matrix, Vector, And Streams
## Matrix, vector, and streams
Some Loki API endpoints return a result of a matrix, a vector, or a stream:
@@ -936,3 +969,162 @@ The example belows show all possible statistics returned with their respective d
}
}
```
## Ruler
The ruler API endpoints require a backend object store to be configured, which is used to store the recording rules and alerts. The ruler API uses the concept of a "namespace" when creating rule groups; this is a stand-in for the name of the rule file in Prometheus. Rule groups must be named uniquely within a namespace.
### Ruler ring status
```
GET /ruler/ring
```
Displays a web page with the ruler hash ring status, including the state, health, and last heartbeat time of each ruler.
### List rule groups
```
GET /loki/api/v1/rules
```
Lists all rules configured for the authenticated tenant. This endpoint returns a YAML dictionary with all the rule groups for each namespace and a `200` status code on success.
_This experimental endpoint is disabled by default and can be enabled via the `-experimental.ruler.enable-api` CLI flag (or its respective YAML config option)._
#### Example response
```yaml
---
<namespace1>:
  - name: <string>
    interval: <duration;optional>
    rules:
      - alert: <string>
        expr: <string>
        for: <duration>
        annotations:
          <annotation_name>: <string>
        labels:
          <label_name>: <string>
  - name: <string>
    interval: <duration;optional>
    rules:
      - alert: <string>
        expr: <string>
        for: <duration>
        annotations:
          <annotation_name>: <string>
        labels:
          <label_name>: <string>
<namespace2>:
  - name: <string>
    interval: <duration;optional>
    rules:
      - alert: <string>
        expr: <string>
        for: <duration>
        annotations:
          <annotation_name>: <string>
        labels:
          <label_name>: <string>
```
### Get rule groups by namespace
```
GET /loki/api/v1/rules/{namespace}
```
Returns the rule groups defined for a given namespace.
_This experimental endpoint is disabled by default and can be enabled via the `-experimental.ruler.enable-api` CLI flag (or its respective YAML config option)._
#### Example response
```yaml
name: <string>
interval: <duration;optional>
rules:
  - alert: <string>
    expr: <string>
    for: <duration>
    annotations:
      <annotation_name>: <string>
    labels:
      <label_name>: <string>
```
### Get rule group
```
GET /loki/api/v1/rules/{namespace}/{groupName}
```
Returns the rule group matching the request namespace and group name.
_This experimental endpoint is disabled by default and can be enabled via the `-experimental.ruler.enable-api` CLI flag (or its respective YAML config option)._
### Set rule group
```
POST /loki/api/v1/rules/{namespace}
```
Creates or updates a rule group. This endpoint expects a request with the `Content-Type: application/yaml` header and the rule group **YAML** definition in the request body, and returns `202` on success.
_This experimental endpoint is disabled by default and can be enabled via the `-experimental.ruler.enable-api` CLI flag (or its respective YAML config option)._
#### Example request
Request headers:
- `Content-Type: application/yaml`
Request body:
```yaml
name: <string>
interval: <duration;optional>
rules:
  - alert: <string>
    expr: <string>
    for: <duration>
    annotations:
      <annotation_name>: <string>
    labels:
      <label_name>: <string>
```
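For example, assuming a Loki instance reachable at `localhost:3100` and the tenant ID `fake` (both assumptions for illustration), a rule group stored in `rules.yaml` could be uploaded with:
```sh
# POST the rule group to the "example-namespace" namespace.
# X-Scope-OrgID carries the tenant ID when multi-tenancy is enabled.
curl -X POST \
  -H "Content-Type: application/yaml" \
  -H "X-Scope-OrgID: fake" \
  --data-binary @rules.yaml \
  http://localhost:3100/loki/api/v1/rules/example-namespace
```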
### Delete rule group
```
DELETE /loki/api/v1/rules/{namespace}/{groupName}
```
Deletes a rule group by namespace and group name. This endpoint returns `202` on success.
### Delete namespace
```
DELETE /loki/api/v1/rules/{namespace}
```
Deletes all the rule groups in a namespace (including the namespace itself). This endpoint returns `202` on success.
_This experimental endpoint is disabled by default and can be enabled via the `-experimental.ruler.enable-api` CLI flag (or its respective YAML config option)._
_Requires [authentication](#authentication)._
### List alerts
```
GET /prometheus/api/v1/alerts
```
Prometheus-compatible rules endpoint to list all active alerts.
_For more information, please check out the Prometheus [alerts](https://prometheus.io/docs/prometheus/latest/querying/api/#alerts) documentation._
_This experimental endpoint is disabled by default and can be enabled via the `-experimental.ruler.enable-api` CLI flag (or its respective YAML config option)._

@@ -18,7 +18,8 @@ Configuration examples can be found in the [Configuration Examples](examples/) d
- [querier_config](#querier_config)
- [query_frontend_config](#query_frontend_config)
- [queryrange_config](#queryrange_config)
- [`frontend_worker_config`](#frontend_worker_config)
- [ruler_config](#ruler_config)
- [frontend_worker_config](#frontend_worker_config)
- [ingester_client_config](#ingester_client_config)
- [ingester_config](#ingester_config)
- [consul_config](#consul_config)
@@ -103,6 +104,9 @@ Supported contents and default values of `loki.yaml`:
# query-frontend.
[query_range: <queryrange_config>]
# The ruler_config configures the Loki ruler.
[ruler: <ruler_config>]
# Configures how the distributor will connect to ingesters. Only appropriate
# when running all modules, the distributor, or the querier.
[ingester_client: <ingester_client_config>]
@@ -332,7 +336,339 @@ results_cache:
[parallelise_shardable_queries: <boolean> | default = false]
```
## `frontend_worker_config`
## `ruler_config`
The `ruler_config` configures the Loki ruler.
```yaml
# URL of alerts return path.
# CLI flag: -ruler.external.url
[external_url: <url> | default = ]

ruler_client:
  # Path to the client certificate file, which will be used for authenticating
  # with the server. Also requires the key path to be configured.
  # CLI flag: -ruler.client.tls-cert-path
  [tls_cert_path: <string> | default = ""]

  # Path to the key file for the client certificate. Also requires the client
  # certificate to be configured.
  # CLI flag: -ruler.client.tls-key-path
  [tls_key_path: <string> | default = ""]

  # Path to the CA certificates file to validate server certificate against. If
  # not set, the host's root CA certificates are used.
  # CLI flag: -ruler.client.tls-ca-path
  [tls_ca_path: <string> | default = ""]

  # Skip validating server certificate.
  # CLI flag: -ruler.client.tls-insecure-skip-verify
  [tls_insecure_skip_verify: <boolean> | default = false]

# How frequently to evaluate rules
# CLI flag: -ruler.evaluation-interval
[evaluation_interval: <duration> | default = 1m]

# How frequently to poll for rule changes
# CLI flag: -ruler.poll-interval
[poll_interval: <duration> | default = 1m]
storage:
  # Method to use for backend rule storage (azure, gcs, s3, swift, local)
  # CLI flag: -ruler.storage.type
  [type: <string> ]

  azure:
    # Azure Cloud environment. Supported values are: AzureGlobal,
    # AzureChinaCloud, AzureGermanCloud, AzureUSGovernment.
    # CLI flag: -ruler.storage.azure.environment
    [environment: <string> | default = "AzureGlobal"]

    # Name of the blob container used to store chunks. This container must be
    # created before running cortex.
    # CLI flag: -ruler.storage.azure.container-name
    [container_name: <string> | default = "cortex"]

    # The Microsoft Azure account name to be used
    # CLI flag: -ruler.storage.azure.account-name
    [account_name: <string> | default = ""]

    # The Microsoft Azure account key to use.
    # CLI flag: -ruler.storage.azure.account-key
    [account_key: <string> | default = ""]

    # Preallocated buffer size for downloads.
    # CLI flag: -ruler.storage.azure.download-buffer-size
    [download_buffer_size: <int> | default = 512000]

    # Preallocated buffer size for uploads.
    # CLI flag: -ruler.storage.azure.upload-buffer-size
    [upload_buffer_size: <int> | default = 256000]

    # Number of buffers used to upload a chunk.
    # CLI flag: -ruler.storage.azure.download-buffer-count
    [upload_buffer_count: <int> | default = 1]

    # Timeout for requests made against azure blob storage.
    # CLI flag: -ruler.storage.azure.request-timeout
    [request_timeout: <duration> | default = 30s]

    # Number of retries for a request which times out.
    # CLI flag: -ruler.storage.azure.max-retries
    [max_retries: <int> | default = 5]

    # Minimum time to wait before retrying a request.
    # CLI flag: -ruler.storage.azure.min-retry-delay
    [min_retry_delay: <duration> | default = 10ms]

    # Maximum time to wait before retrying a request.
    # CLI flag: -ruler.storage.azure.max-retry-delay
    [max_retry_delay: <duration> | default = 500ms]
  gcs:
    # Name of GCS bucket to put chunks in.
    # CLI flag: -ruler.storage.gcs.bucketname
    [bucket_name: <string> | default = ""]

    # The size of the buffer that the GCS client uses for each PUT request. 0 to
    # disable buffering.
    # CLI flag: -ruler.storage.gcs.chunk-buffer-size
    [chunk_buffer_size: <int> | default = 0]

    # The duration after which the requests to GCS should be timed out.
    # CLI flag: -ruler.storage.gcs.request-timeout
    [request_timeout: <duration> | default = 0s]
  s3:
    # S3 endpoint URL with escaped Key and Secret encoded. If only region is
    # specified as a host, proper endpoint will be deduced. Use
    # inmemory:///<bucket-name> to use a mock in-memory implementation.
    # CLI flag: -ruler.storage.s3.url
    [s3: <url> | default = ]

    # Set this to `true` to force the request to use path-style addressing.
    # CLI flag: -ruler.storage.s3.force-path-style
    [s3forcepathstyle: <boolean> | default = false]

    # Comma separated list of bucket names to evenly distribute chunks over.
    # Overrides any buckets specified in s3.url flag
    # CLI flag: -ruler.storage.s3.buckets
    [bucketnames: <string> | default = ""]

    # S3 Endpoint to connect to.
    # CLI flag: -ruler.storage.s3.endpoint
    [endpoint: <string> | default = ""]

    # AWS region to use.
    # CLI flag: -ruler.storage.s3.region
    [region: <string> | default = ""]

    # AWS Access Key ID
    # CLI flag: -ruler.storage.s3.access-key-id
    [access_key_id: <string> | default = ""]

    # AWS Secret Access Key
    # CLI flag: -ruler.storage.s3.secret-access-key
    [secret_access_key: <string> | default = ""]

    # Disable https on S3 connection.
    # CLI flag: -ruler.storage.s3.insecure
    [insecure: <boolean> | default = false]

    # Enable AES256 AWS server-side encryption
    # CLI flag: -ruler.storage.s3.sse-encryption
    [sse_encryption: <boolean> | default = false]

    http_config:
      # The maximum amount of time an idle connection will be held open.
      # CLI flag: -ruler.storage.s3.http.idle-conn-timeout
      [idle_conn_timeout: <duration> | default = 1m30s]

      # If non-zero, specifies the amount of time to wait for a server's
      # response headers after fully writing the request.
      # CLI flag: -ruler.storage.s3.http.response-header-timeout
      [response_header_timeout: <duration> | default = 0s]

      # Set to true to skip verifying the certificate chain and hostname.
      # CLI flag: -ruler.storage.s3.http.insecure-skip-verify
      [insecure_skip_verify: <boolean> | default = false]
  swift:
    # Openstack authentication URL.
    # CLI flag: -ruler.storage.swift.auth-url
    [auth_url: <string> | default = ""]

    # Openstack username for the api.
    # CLI flag: -ruler.storage.swift.username
    [username: <string> | default = ""]

    # Openstack user's domain name.
    # CLI flag: -ruler.storage.swift.user-domain-name
    [user_domain_name: <string> | default = ""]

    # Openstack user's domain ID.
    # CLI flag: -ruler.storage.swift.user-domain-id
    [user_domain_id: <string> | default = ""]

    # Openstack user ID for the API.
    # CLI flag: -ruler.storage.swift.user-id
    [user_id: <string> | default = ""]

    # Openstack API key.
    # CLI flag: -ruler.storage.swift.password
    [password: <string> | default = ""]

    # Openstack user's domain ID.
    # CLI flag: -ruler.storage.swift.domain-id
    [domain_id: <string> | default = ""]

    # Openstack user's domain name.
    # CLI flag: -ruler.storage.swift.domain-name
    [domain_name: <string> | default = ""]

    # Openstack project ID (v2,v3 auth only).
    # CLI flag: -ruler.storage.swift.project-id
    [project_id: <string> | default = ""]

    # Openstack project name (v2,v3 auth only).
    # CLI flag: -ruler.storage.swift.project-name
    [project_name: <string> | default = ""]

    # ID of the project's domain (v3 auth only), only needed if it differs from
    # the user domain.
    # CLI flag: -ruler.storage.swift.project-domain-id
    [project_domain_id: <string> | default = ""]

    # Name of the project's domain (v3 auth only), only needed if it differs
    # from the user domain.
    # CLI flag: -ruler.storage.swift.project-domain-name
    [project_domain_name: <string> | default = ""]

    # Openstack region to use, e.g. LON, ORD. Defaults to the first region
    # (v2,v3 auth only).
    # CLI flag: -ruler.storage.swift.region-name
    [region_name: <string> | default = ""]

    # Name of the Swift container to put chunks in.
    # CLI flag: -ruler.storage.swift.container-name
    [container_name: <string> | default = "cortex"]
  local:
    # Directory to scan for rules
    # CLI flag: -ruler.storage.local.directory
    [directory: <string> | default = ""]

# File path to store temporary rule files
# CLI flag: -ruler.rule-path
[rule_path: <string> | default = "/rules"]

# Comma-separated list of Alertmanager URLs to send notifications to.
# Each Alertmanager URL is treated as a separate group in the configuration.
# Multiple Alertmanagers in HA per group can be supported by using DNS
# resolution via -ruler.alertmanager-discovery.
# CLI flag: -ruler.alertmanager-url
[alertmanager_url: <string> | default = ""]

# Use DNS SRV records to discover Alertmanager hosts.
# CLI flag: -ruler.alertmanager-discovery
[enable_alertmanager_discovery: <boolean> | default = false]

# How long to wait between refreshing DNS resolutions of Alertmanager hosts.
# CLI flag: -ruler.alertmanager-refresh-interval
[alertmanager_refresh_interval: <duration> | default = 1m]

# If enabled, then requests to Alertmanager use the v2 API.
# CLI flag: -ruler.alertmanager-use-v2
[enable_alertmanager_v2: <boolean> | default = false]

# Capacity of the queue for notifications to be sent to the Alertmanager.
# CLI flag: -ruler.notification-queue-capacity
[notification_queue_capacity: <int> | default = 10000]

# HTTP timeout duration when sending notifications to the Alertmanager.
# CLI flag: -ruler.notification-timeout
[notification_timeout: <duration> | default = 10s]

# Max time to tolerate outage for restoring "for" state of alert.
# CLI flag: -ruler.for-outage-tolerance
[for_outage_tolerance: <duration> | default = 1h]

# Minimum duration between alert and restored "for" state. This is maintained
# only for alerts with configured "for" time greater than the grace period.
# CLI flag: -ruler.for-grace-period
[for_grace_period: <duration> | default = 10m]

# Minimum amount of time to wait before resending an alert to Alertmanager.
# CLI flag: -ruler.resend-delay
[resend_delay: <duration> | default = 1m]

# Distribute rule evaluation using ring backend.
# CLI flag: -ruler.enable-sharding
[enable_sharding: <boolean> | default = false]

# Time to spend searching for a pending ruler when shutting down.
# CLI flag: -ruler.search-pending-for
[search_pending_for: <duration> | default = 5m]
ring:
  kvstore:
    # Backend storage to use for the ring. Supported values are: consul, etcd,
    # inmemory, memberlist, multi.
    # CLI flag: -ruler.ring.store
    [store: <string> | default = "consul"]

    # The prefix for the keys in the store. Should end with a /.
    # CLI flag: -ruler.ring.prefix
    [prefix: <string> | default = "rulers/"]

    # The consul_config configures the consul client.
    # The CLI flags prefix for this block config is: ruler.ring
    [consul: <consul_config>]

    # The etcd_config configures the etcd client.
    # The CLI flags prefix for this block config is: ruler.ring
    [etcd: <etcd_config>]

    multi:
      # Primary backend storage used by multi-client.
      # CLI flag: -ruler.ring.multi.primary
      [primary: <string> | default = ""]

      # Secondary backend storage used by multi-client.
      # CLI flag: -ruler.ring.multi.secondary
      [secondary: <string> | default = ""]

      # Mirror writes to secondary store.
      # CLI flag: -ruler.ring.multi.mirror-enabled
      [mirror_enabled: <boolean> | default = false]

      # Timeout for storing value to secondary store.
      # CLI flag: -ruler.ring.multi.mirror-timeout
      [mirror_timeout: <duration> | default = 2s]

  # Period at which to heartbeat to the ring.
  # CLI flag: -ruler.ring.heartbeat-period
  [heartbeat_period: <duration> | default = 5s]

  # The heartbeat timeout after which rulers are considered unhealthy within the
  # ring.
  # CLI flag: -ruler.ring.heartbeat-timeout
  [heartbeat_timeout: <duration> | default = 1m]

  # Number of tokens for each ruler.
  # CLI flag: -ruler.ring.num-tokens
  [num_tokens: <int> | default = 128]

# Period with which to attempt to flush rule groups.
# CLI flag: -ruler.flush-period
[flush_period: <duration> | default = 1m]

# Enable the Ruler API.
# CLI flag: -experimental.ruler.enable-api
[enable_api: <boolean> | default = false]
```
## frontend_worker_config
The `frontend_worker_config` configures the worker - running within the Loki querier - picking up and executing queries enqueued by the query-frontend.

@@ -349,7 +349,7 @@ func (t *Loki) setupModuleManager() error {
TableManager: {Server},
Compactor: {Server},
IngesterQuerier: {Ring},
All: {Querier, Ingester, Distributor, TableManager},
All: {Querier, Ingester, Distributor, TableManager, Ruler},
}
// Add IngesterQuerier as a dependency for store when target is either ingester or querier.
