mirror of https://github.com/grafana/loki
Add stream sharding docs (#7901)
This PR adds documentation for the new stream sharding feature. Co-authored-by: Steven Dungan <114922977+stevendungan@users.noreply.github.com>pull/8089/head
parent
55733561f7
commit
0db15efd50
@ -0,0 +1,65 @@ |
|||||||
|
--- |
||||||
|
title: Automatic Stream Sharding |
||||||
|
menuTitle: Automatic stream sharding |
||||||
|
description: Automatic stream sharding can control issues around the per-stream rate limit |
||||||
|
weight: 110 |
||||||
|
--- |
||||||
|
|
||||||
|
# Automatic stream sharding |
||||||
|
|
||||||
|
Automatic stream sharding will attempt to keep streams under a `desired_rate` by adding new labels and values to |
||||||
|
existing streams. When properly tuned, this should eliminate issues where log producers are rate limited due to the |
||||||
|
per-stream rate limit. |
||||||
|
|
||||||
|
## When to use automatic stream sharding |
||||||
|
|
||||||
|
Large log streams present several problems for Loki, namely increased and uneven resource usage on Ingesters and |
||||||
|
Distributors. The general recommendation is to explore existing log streams for additional label values that are both |
||||||
|
useful for querying and sufficiently low cardinality. There are many cases, however, where no more labels can |
||||||
|
be extracted, or cardinality for a label is dangerously large. To protect itself from such volume leading to operational failure, Loki implements per-stream rate limits; |
||||||
|
but the result is that some data is lost. The per-stream limit also needs human intervention to change, which is not ideal when log volumes increase and decrease. |
||||||
|
|
||||||
|
Loki uses automatic stream sharding to avoid rate limiting and large streams for any log stream by ensuring it is close |
||||||
|
to a configured `desired_rate`. |
||||||
|
|
||||||
|
## How automatic stream sharding works |
||||||
|
|
||||||
|
Automatic stream sharding works by adding a new label, `__stream_shard__`, to streams and incrementing its value to try |
||||||
|
and keep all streams below a configured `desired_rate`. |
||||||
|
|
||||||
|
The feature adds a new API to Ingesters that reports the size of all existing log streams. Once per second, Distributors |
||||||
|
query the API to get a picture of all stream rates in the system. Distributors use the existing stream-rate data and a |
||||||
|
configured `desired_rate` to determine how many shards a given stream should have. The desired number of new log streams |
||||||
|
are created with the label `__stream_shard__` and logs are divided evenly among the streams. |
||||||
|
|
||||||
|
Because automatic stream sharding is reactive and relies on successive calls to Ingesters, the view of current rates is |
||||||
|
always somewhat behind. As a result, the actual size of sharded streams will always be higher than the `desired_rate`. |
||||||
|
In practice, this is still sufficient to keep log producers from being rate limited by per-stream rate limits. |
||||||
|
|
||||||
|
## Enabling and configuring automatic stream sharding |
||||||
|
|
||||||
|
Enable automatic sharding by setting the global or per-tenant override `shard_streams`. This configuration contains |
||||||
|
an `enabled` flag to turn the feature on, a `desired_rate` configuration for the desired stream rate, and an |
||||||
|
optional `logging_enabled` flag to enable debug logging of stream sharding. |
||||||
|
|
||||||
|
*NOTE*: Setting `logging_enabled` may affect the ingestion performance of Loki. |
||||||
|
|
||||||
|
## Automatic stream sharding metrics |
||||||
|
|
||||||
|
Use these metrics to help tune Loki so that it is sharding streams aggressively enough to avoid the per-stream rate |
||||||
|
limit: |
||||||
|
|
||||||
|
- `loki_rate_store_refresh_failures_total`: The total number of failed attempts to refresh the distributor's view of |
||||||
|
stream rates. |
||||||
|
- `loki_rate_store_streams`: The number of unique streams reported by all Ingesters. Sharded streams are reported as if |
||||||
|
they were unsharded. |
||||||
|
- `loki_rate_store_max_stream_shards`: The maximum number of shards for any tenant of the system. |
||||||
|
- `loki_rate_store_stream_shards`: A histogram of the distribution of shard counts across all streams. |
||||||
|
- `loki_rate_store_max_stream_rate_bytes`: The maximum stream size in bytes/second for any tenant of the system. Sharded |
||||||
|
streams are reported as if they are unsharded. |
||||||
|
- `loki_rate_store_max_unique_stream_rate_bytes`: The maximum size of any stream across all tenants. Stream shards are |
||||||
|
individually reported. |
||||||
|
- `loki_rate_store_stream_rate_bytes`: A histogram of the distribution of stream sizes across all tenants in |
||||||
|
bytes/second. |
||||||
|
- `loki_stream_sharding_count`: The total number of times that streams have been sharded. Useful for calculating the |
||||||
|
sharding rate. |
Loading…
Reference in new issue