apitech/loki - loki - Gitea: Git with a cup of tea

Commit Graph

Author	SHA1	Message	Date
Ryo Shindo	ed0d15a6bf	loki: enable PathPrefix for queryHandler endpoints (#8406 ) What this PR does / why we need it: Hi, loki developers. I would like to use the option `http_path_prefix` as described in the official documentation. This setting is valid for endpoints defined at `4d55a24ed3/pkg/loki/loki.go (L477-L494)` but not for the endpoints defined at `4d55a24ed3/pkg/loki/modules.go (L366-L392)`. With `http_path_prefix` set, the current latest version returns the expected response for the former endpoints but a 404 Not Found error for the latter. For example, if you define `http_path_prefix = /xxx`, requests for `/xxx/ready` will work as expected, but requests for `/xxx/loki/api/v1/query` will result in 404 errors. I've created a pull request for this issue because I've been experiencing some inconvenience when using loki. Please check it out. Which issue(s) this PR fixes: Fixes https://github.com/grafana/loki/issues/4756 Special notes for your reviewer: Checklist - [x] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (required) - [x] Tests updated --------- Co-authored-by: Michel Hollands <42814411+MichelHollands@users.noreply.github.com> Co-authored-by: Travis Patterson <travis.patterson@grafana.com>	3 years ago
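The failure mode can be reproduced in miniature with Go's standard `net/http`. This is a minimal sketch, not Loki's router: `prefixedPath`, `newRouter` and `statusFor` are hypothetical names. It shows the intended behaviour after the fix, where both a server-level route and a query route are registered under the same prefix, so an unprefixed query path is no longer routed.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// prefixedPath joins an http_path_prefix-style prefix with a route path.
// Hypothetical helper for illustration; not Loki's actual API.
func prefixedPath(prefix, path string) string {
	return strings.TrimSuffix(prefix, "/") + path
}

// newRouter registers a server-level endpoint (/ready) and a query endpoint
// under the same prefix, which is the behaviour the fix is after.
func newRouter(prefix string) *http.ServeMux {
	ok := func(w http.ResponseWriter, _ *http.Request) { w.WriteHeader(http.StatusOK) }
	mux := http.NewServeMux()
	mux.HandleFunc(prefixedPath(prefix, "/ready"), ok)
	mux.HandleFunc(prefixedPath(prefix, "/loki/api/v1/query"), ok)
	return mux
}

// statusFor issues an in-memory GET request and returns the response status.
func statusFor(h http.Handler, target string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, target, nil))
	return rec.Code
}

func main() {
	r := newRouter("/xxx")
	fmt.Println(statusFor(r, "/xxx/ready"))             // 200
	fmt.Println(statusFor(r, "/xxx/loki/api/v1/query")) // 200
	fmt.Println(statusFor(r, "/loki/api/v1/query"))     // 404: unprefixed path is not routed
}
```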
Sandeep Sukhani	3e1f2fc273	caching: do not try to fill the gap in log results cache when the new query interval does not overlap the cached query interval (#9757 ) What this PR does / why we need it: Currently, when we find a relevant cached negative response for a logs query, we do the following: * If the cached query completely covers the new query: * return an empty response. * else: * fill the gaps on either/both sides of the cached query. The problem with filling the gaps is that when the cached query does not overlap at all with the new query, we have to extend the query beyond the range it requested. However, with the logs query, we have a limit on the number of lines we can send back in the response. So, the query response could contain logs which were not requested by the query; these then get filtered out by the [response extractor](`b78d3f0552/pkg/querier/queryrange/log_result_cache.go (L299)`), unexpectedly resulting in an empty response. For example, suppose the query was cached for start=15, end=20 and we get a `backwards` query for start=5, end=10. To fill the gap, the query would be executed for start=5, end=15. Now, if there are more logs than the query `limit` in the range 10-15, we would filter out all the data in the response extractor and send back an empty response to the user. This PR fixes the issue by making the following changes when handling a cache hit: * If the cached query completely covers the new query: * return an empty response[_existing_]. * else if the cached query does not overlap with the new query: * do the new query as requested. 
* If the new query results in an empty response and has a higher interval than the cached query: * update the cache * else: * query the data for missing intervals on both/either side[_existing_] * update the cache with extended intervals if the new queries resulted in an empty response[_existing_] Special notes for your reviewer: We could further improve the handling of queries that do not overlap with the cached query by selectively extending them based on the query direction and on whether the cached query lies before or after the new query. For example, if the new query is a `backwards` query and `cachedQuery.End` < `newQuery.Start`, it should be okay to extend the query to run from `cachedQuery.End` to `newQuery.End` to fill the cache, since the query would first fill the most relevant data before hitting the limits. I did not want to complicate the fix, so I did not implement this approach. We can revisit later if we feel we need to improve our caching. Checklist - [x] Tests updated - [x] `CHANGELOG.md` updated --------- Co-authored-by: Travis Patterson <travis.patterson@grafana.com>	3 years ago
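The cache-hit decision described above can be sketched as a small pure function. This is an illustration under stated assumptions, not the code from the PR: intervals are modelled as half-open `[start, end)` integers, and `interval`, `action`, `decide` and `gaps` are hypothetical names. The "update the cache if the new query is empty" step is left out.

```go
package main

import "fmt"

// interval is a half-open time range [start, end) in arbitrary units.
type interval struct{ start, end int64 }

// action is what the cache layer does on a hit for a cached negative (empty) result.
type action int

const (
	returnEmpty      action = iota // cached interval covers the whole new query
	queryAsRequested               // no overlap: run the new query unchanged
	fillGaps                       // partial overlap: query only the uncovered side(s)
)

// decide picks an action following the rules listed in the PR description.
func decide(cached, req interval) action {
	switch {
	case cached.start <= req.start && req.end <= cached.end:
		return returnEmpty
	case req.end <= cached.start || cached.end <= req.start:
		return queryAsRequested
	default:
		return fillGaps
	}
}

// gaps returns the parts of req that lie outside cached; used for fillGaps.
func gaps(cached, req interval) []interval {
	var out []interval
	if req.start < cached.start {
		out = append(out, interval{req.start, cached.start})
	}
	if cached.end < req.end {
		out = append(out, interval{cached.end, req.end})
	}
	return out
}

func main() {
	cached := interval{15, 20}
	// The example from the PR: before the fix this was widened to [5, 15).
	fmt.Println(decide(cached, interval{5, 10}) == queryAsRequested) // true
	fmt.Println(gaps(cached, interval{10, 25}))                       // [{10 15} {20 25}]
}
```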
Susana Ferreira	35465d0297	Fix instant query summary split stats (#9773 ) What this PR does / why we need it: Fix instant query summary statistic's `splits` corresponding to the number of subqueries a query is split into based on `split_queries_by_interval`. * Update rangemapper with a statistics structure to include the number of split queries a query is mapped into. * In the `split_by_range` middleware once the mapped query is returned update the middleware statistics with the number of split queries. This value will then be merged with the statistics of the Loki response. Checklist - [ ] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (required) - [ ] Documentation added - [x] Tests updated - [x] `CHANGELOG.md` updated - [ ] If the change is worth mentioning in the release notes, add `add-to-release-notes` label - [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md` - [ ] For Helm chart changes bump the Helm chart version in `production/helm/loki/Chart.yaml` and update `production/helm/loki/CHANGELOG.md` and `production/helm/loki/README.md`. [Example PR](`d10549e3ec`)	3 years ago
Travis Patterson	4da0f63789	Remove unused Value field (#9774 ) We didn't end up needing the `Value` field because we can express everything we need to as selectors	3 years ago
Travis Patterson	806674fdaa	Add log-volume feature flag (#9762 ) Adds a feature flag for use with the new log-volume endpoints so associated features can be rolled out incrementally.	3 years ago
Trevor Whitney	dbc3040739	Convert SeriesVolume response to prometheus-style (#9703 ) Changes the response type of the label volume stats endpoint to return volumes as prometheus-style timeseries metrics. It currently only supports instant queries, but is a necessary step to eventually supporting range queries.	3 years ago
Bryan Boreham	904ea0586a	Remove some unused code (#9611 ) What this PR does / why we need it: Remove some functions I came across while making Loki compatible with different `Labels` structures. Move `MetricToLabels` to the only place it is now used, and make it call abstractions instead of assuming `Labels` is a slice. Checklist - [x] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (required) - NA Documentation added - NA Tests updated - NA `CHANGELOG.md` updated - NA Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md` - NA For Helm chart changes bump the Helm chart version in `production/helm/loki/Chart.yaml` and update `production/helm/loki/CHANGELOG.md` and `production/helm/loki/README.md`. [Example PR](`d10549e3ec`)	3 years ago
Salva Corts	b7359c5d53	Revert "Add summary stats and metrics for stats cache (#9536 )" (#9721 ) This reverts commit `af287ac3eb`. There is a bug in this PR that inflates the stats returned for the query since we reuse the stats ctx in the query execution engine.	3 years ago
Travis Patterson	db97058a84	Series volume endpoint (#9704 ) This changes the `label_volume` endpoint to the `series_volume` endpoint. The new endpoint still returns volumes, but now it does so for the requested streams defined by the selector names passed, rather than for individual labels. All relevant non-requested labels are aggregated into the returned results. For example, assume we have the following streams: ``` {cluster="prod", team="A", component="foo"} {cluster="prod", team="B", component="foo"} {cluster="dev", team="A", component="foo"} {cluster="dev", team="B", component="foo"} ``` - requesting `{cluster="prod"}` returns one result for all streams containing `{cluster="prod"}` - requesting `{cluster=~".+"}` returns two results for the streams containing `{cluster="prod"}` and `{cluster="dev"}` - requesting `{cluster=~".+", team=~".+"}` returns four results for the streams containing: ``` {cluster="prod", team="A"} {cluster="prod", team="B"} {cluster="dev", team="A"} {cluster="dev", team="B"} ``` --------- Co-authored-by: Trevor Whitney <trevorjwhitney@gmail.com>	3 years ago
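The grouping rule in the example above can be sketched as follows. This sketch rests on a few assumptions. Matchers are simplified to an equality value, or `""` standing in for `=~".+"`. `matcher`, `stream` and `seriesVolume` are hypothetical names. Volumes are made-up byte counts. Streams are keyed by the requested label names only, so every non-requested label combination is summed into its group.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// matcher is a hypothetical, simplified label matcher: value == "" stands in
// for =~".+" (any non-empty value); otherwise it is an equality match.
type matcher struct{ name, value string }

// stream is a log stream with its label set and ingested volume in bytes.
type stream struct {
	labels map[string]string
	bytes  uint64
}

// seriesVolume groups matching streams by the requested label names only,
// summing the volume of every non-requested label combination into each group.
func seriesVolume(streams []stream, sel []matcher) map[string]uint64 {
	out := make(map[string]uint64)
	for _, s := range streams {
		parts := make([]string, 0, len(sel))
		ok := true
		for _, m := range sel {
			v, found := s.labels[m.name]
			if !found || v == "" || (m.value != "" && v != m.value) {
				ok = false
				break
			}
			parts = append(parts, m.name+"="+strconv.Quote(v))
		}
		if !ok {
			continue
		}
		sort.Strings(parts) // canonical key regardless of matcher order
		out["{"+strings.Join(parts, ", ")+"}"] += s.bytes
	}
	return out
}

func main() {
	streams := []stream{
		{map[string]string{"cluster": "prod", "team": "A", "component": "foo"}, 10},
		{map[string]string{"cluster": "prod", "team": "B", "component": "foo"}, 20},
		{map[string]string{"cluster": "dev", "team": "A", "component": "foo"}, 30},
		{map[string]string{"cluster": "dev", "team": "B", "component": "foo"}, 40},
	}
	fmt.Println(seriesVolume(streams, []matcher{{"cluster", "prod"}}))
	fmt.Println(seriesVolume(streams, []matcher{{"cluster", ""}}))
	fmt.Println(seriesVolume(streams, []matcher{{"cluster", ""}, {"team", ""}}))
}
```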
Trevor Whitney	4a56445686	Upgrade `golangci-lint` and fix linting errors (#9601 ) What this PR does / why we need it: Upgrade `golangci-lint` and fix all the errors. The upgrade includes some stricter linting.	3 years ago
Travis Patterson	065bee7e72	Label Volume Endpoint (#9588 ) For a given set of matchers, returns the top N associated label/value pairs by volume. A query for `{cluster=prod}` will return ``` cluster=prod: size (total logs matching this matcher) . . . nth-label=nth-value ``` This is to serve use cases where users want to understand where their log volume has come from by label without making multiple requests to the stats endpoint. Note: This PR is a monster but it's mostly plumbing. I've pointed out the most interesting bits that actually get the volumes from ingesters/indexes	3 years ago
Salva Corts	73ac208981	Improve docs for empty value in cache compression config (#9649 ) What this PR does / why we need it: Follow up PR for https://github.com/grafana/loki/pull/9535#discussion_r1218167670 Checklist - [x] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (required) - [x] Documentation added - [ ] Tests updated - [ ] `CHANGELOG.md` updated - [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md` - [ ] For Helm chart changes bump the Helm chart version in `production/helm/loki/Chart.yaml` and update `production/helm/loki/CHANGELOG.md` and `production/helm/loki/README.md`. [Example PR](`d10549e3ec`)	3 years ago
Salva Corts	c6fbff26e1	Add config to avoid caching stats for recent data (#9537 ) What this PR does / why we need it: When we query the stats for recent data, we query both the ingesters and the index gateways for the stats. `ebdb2b1800/pkg/storage/async_store.go (L112-L114)` `ebdb2b1800/pkg/storage/async_store.go (L126-L127)` Then we merge all the responses, which means summing up all the stats `ebdb2b1800/pkg/storage/async_store.go (L157-L158)` `ebdb2b1800/pkg/storage/stores/index/stats/stats.go (L23-L26)` Because we have a replication factor of 3, we may get the stats from the ingesters repeated up to 3 times, inflating the stats. In the stats cache, we store the stats for a given matcher set for the whole day, then we extract stats from the cache scaled by the fraction of the cached time range that the request covers: `336283acad/pkg/querier/queryrange/index_stats_cache.go (L33)` `336283acad/pkg/querier/queryrange/index_stats_cache.go (L40)` Inflated stats for recent data will be cached, so subsequent stats extracted from the cache will be inflated regardless of the time. This PR adds a new per-tenant limit `max_stats_cache_freshness` to avoid caching requests whose end time falls within now minus this duration. Here's a scenario illustrating this. The graphs below show the bytes stats queried in the sharding middleware. Every 5 seconds, we run a log filter query spanning 3h that won't match any log. ![image](https://github.com/grafana/loki/assets/8354290/45c2e6e9-185c-4a18-b290-47da27fc3e39) As can be seen, after enabling the stats cache and configuring `do_not_cache_request_within` to not cache stats for requests within 30m, the bytes stats used in the sharding middleware stopped increasing. In both cases the stats cache hit ratio was 100%. 
![image](https://github.com/grafana/loki/assets/8354290/cd35bcb8-0c77-4693-a06b-502741fd6e23) Special notes for your reviewer: - Blocked by https://github.com/grafana/loki/pull/9535 - Note that this PR doesn't fix the root issue of inflated stats from the ingesters, but rather buys us some time to work on that. Checklist - [x] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (required) - [x] Documentation added - [x] Tests updated - [ ] `CHANGELOG.md` updated - [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md` - [ ] For Helm chart changes bump the Helm chart version in `production/helm/loki/Chart.yaml` and update `production/helm/loki/CHANGELOG.md` and `production/helm/loki/README.md`. [Example PR](`d10549e3ec`)	3 years ago
Ivana Huckova	eb7dae4583	Loki: Improve error message when step too low (#9641 ) What this PR does / why we need it: In https://github.com/grafana/grafana/pull/69648 we are introducing a step editor for Loki in Grafana. Unfortunately, the error message shown when a user sets too low a step parameter is hard to understand, so I am proposing the following change to make it more understandable and actionable. Let me know what you think. --------- Co-authored-by: J Stickler <julie.stickler@grafana.com>	3 years ago
Salva Corts	af287ac3eb	Add summary stats and metrics for stats cache (#9536 ) What this PR does / why we need it: When a query finishes, we return (and log) the following stats: ```go Cache.Chunk.Requests 0 Cache.Chunk.EntriesRequested 0 Cache.Chunk.EntriesFound 0 Cache.Chunk.EntriesStored 0 Cache.Chunk.BytesSent 0 B Cache.Chunk.BytesReceived 0 B Cache.Chunk.DownloadTime 0s Cache.Index.Requests 0 Cache.Index.EntriesRequested 0 Cache.Index.EntriesFound 0 Cache.Index.EntriesStored 0 Cache.Index.BytesSent 0 B Cache.Index.BytesReceived 0 B Cache.Index.DownloadTime 0s Cache.Result.Requests 13 Cache.Result.EntriesRequested 13 Cache.Result.EntriesFound 13 Cache.Result.EntriesStored 0 Cache.Result.BytesSent 0 B Cache.Result.BytesReceived 2.5 kB Cache.Result.DownloadTime 4.600266ms ``` In addition to that, we log the following in metrics.go: ``` level=info ts=2023-05-29T09:17:10.93029945Z caller=metrics.go:152 component=frontend org_id=145265 traceID=52d59b78fe6b9221 sampled=true latency=fast query="{cluster=\"dev-us-central-0\", namespace=~\"loki.\", container=~\"distributor|ingester|promtail|index-gateway|compactor\"} |= \"thislinewillnotexist\"" query_hash=1194136170 query_type=filter range_type=range length=3h0m0s start_delta=165h37m24.930289434s end_delta=162h37m24.930289612s step=43s duration=2.473055ms status=200 limit=30 returned_lines=0 throughput=0B total_bytes=0B lines_per_second=0 total_lines=0 total_entries=0 store_chunks_download_time=0s queue_time=0s splits=13 shards=0 cache_chunk_req=0 cache_chunk_hit=0 cache_chunk_bytes_stored=0 cache_chunk_bytes_fetched=0 cache_chunk_download_time=0s cache_index_req=0 cache_index_hit=0 cache_index_download_time=0s cache_result_req=13 cache_result_hit=13 cache_result_download_time=4.600266ms ``` To better monitor how the stats cache is performing, this PR adds stats for the index stats cache, similar to how it's done for the results cache. 
Here's an example of the new stats being returned and printed: ```go ... Cache.StatsResult.Requests 180 Cache.StatsResult.EntriesRequested 129 Cache.StatsResult.EntriesFound 129 Cache.StatsResult.EntriesStored 51 Cache.StatsResult.BytesSent 0 B Cache.StatsResult.BytesReceived 75 kB ... ``` And the new stats from metrics.go: ``` ... caller=metrics.go:155 ... cache_stats_results_req=129 cache_stats_results_hit=129 cache_stats_results_download_time=156.864429ms ... ``` Special notes for your reviewer: - Blocked by https://github.com/grafana/loki/pull/9535 - Note the new `stats.GetOrCreateContext` func. It's used inside the `query.Exec` method so we don't overwrite the stats added in the stats middleware. Checklist - [x] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (required) - [ ] Documentation added - [x] Tests updated - [ ] `CHANGELOG.md` updated - [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md` - [ ] For Helm chart changes bump the Helm chart version in `production/helm/loki/Chart.yaml` and update `production/helm/loki/CHANGELOG.md` and `production/helm/loki/README.md`. [Example PR](`d10549e3ec`)	3 years ago
Salva Corts	1694ad0f9b	Stats cache can be configured independently (#9535 ) What this PR does / why we need it: Before this PR, the index stats cache would use the same config as the query results cache. This was a limitation since: 1. We would not be able to point to a different cache for storing the index stats if needed. 2. We would not be able to add specific settings for this cache without also adding them to the results cache. In this PR, we refactor the index stats cache config to be independently configurable. Note that if it's not configured, it will try to use the results cache settings. Which issue(s) this PR fixes: This is needed for: - https://github.com/grafana/loki/pull/9537 - https://github.com/grafana/loki/pull/9536 Special notes for your reviewer: - This PR also refactors all the tripperwares in roundtrip.go to reuse the same stats tripperware instead of each one creating its own. - Configuring a new cache in roundtrip.go is a requirement for https://github.com/grafana/loki/pull/9536 so the stats summary can distinguish between the stats cache and the results cache. Checklist - [x] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (required) - [x] Documentation added - [x] Tests updated - [x] `CHANGELOG.md` updated - [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md` - [ ] For Helm chart changes bump the Helm chart version in `production/helm/loki/Chart.yaml` and update `production/helm/loki/CHANGELOG.md` and `production/helm/loki/README.md`. [Example PR](`d10549e3ec`)	3 years ago
Sandeep Sukhani	0a5e149ea5	query-scheduler: fix query distribution in SSD mode (#9471 ) What this PR does / why we need it: When we run the `query-scheduler` in `ring` mode, `queriers` and `query-frontend` discover the available `query-scheduler` instances using the ring. However, we have a problem when `query-schedulers` are not running in the same process as queriers and query-frontend since [we try to get the ring client interface from the scheduler instance](`abd6131bba/pkg/loki/modules.go (L358)`). This causes queries not to be spread across all the available queriers when running in SSD mode because [we point querier workers to query frontend when there is no ring client and scheduler address configured](`b05f4fced3/pkg/querier/worker_service.go (L115)`). I have fixed this issue by adding a new hidden target to initialize the ring client in `reader`/`member` mode based on which service is initializing it. `reader` mode will be used by `queriers` and `query-frontend` for discovering `query-scheduler` instances from the ring. `member` mode will be used by `query-schedulers` for registering themselves in the ring. I have also made a couple of changes that are not directly related to the issue but fix some other problems: * [reset metric registry for each integration test](`18c4fe5907`) - Previously we were reusing the same registry for all the tests and just [ignored the attempts to register the same metrics](`01f0ded7fc/integration/cluster/cluster.go (L113)`). This caused the registry to have metrics registered only from the first test, so any updates from subsequent tests weren't reflected in the metrics. Metrics were the only reliable way for me to verify that `query-schedulers` were connected to `queriers` and `query-frontend` when running in ring mode in the integration test that I added to test my changes. This should also help with other tests where earlier it was hard to reliably check the metrics. 
* [load config from cli as well before applying dynamic config](`f9e2448fc7`) - Previously we were applying dynamic config considering just the config from config file. This results in unexpected config changes, for example, [this config change](`4148dd2c51/integration/loki_micro_services_test.go (L66)`) was getting ignored and [dynamic config tuning was unexpectedly turning on ring mode](`52cd0a39b8/pkg/loki/config_wrapper.go (L94)`) in the config. It is better to do any config tuning based on both file and cli args configs. Which issue(s) this PR fixes: Fixes #9195	3 years ago
Salva Corts	87a659a6db	Add span events for index stats and result cache (#9552 ) What this PR does / why we need it: This PR adds events to the traces to have some extra observability for how we compute the index stats. We also add some trace events to the results cache. ![image](https://github.com/grafana/loki/assets/8354290/7566b755-8193-4e46-ba10-37d3377ea31a) ![image](https://github.com/grafana/loki/assets/8354290/d1990150-84b1-4522-9898-6e37c2782c5b) ![image](https://github.com/grafana/loki/assets/8354290/a8c23e7f-a06d-4a47-8cd4-e900fce01e80) ![image](https://github.com/grafana/loki/assets/8354290/d1e15fb6-fb6c-4fe1-9c5f-f1c8164889de) ![image](https://github.com/grafana/loki/assets/8354290/0c0d001e-7083-488c-8809-0446b4b7c852) Special notes for your reviewer: Checklist - [ ] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (required) - [ ] Documentation added - [ ] Tests updated - [ ] `CHANGELOG.md` updated - [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md` - [ ] For Helm chart changes bump the Helm chart version in `production/helm/loki/Chart.yaml` and update `production/helm/loki/CHANGELOG.md` and `production/helm/loki/README.md`. [Example PR](`d10549e3ec`)	3 years ago
Owen Diehl	2efd059b49	Slight improvements to `GetFactorOfTime` (#9473 ) * correctly returns zero for non-overlapping data * adds tests	3 years ago
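A factor-of-time helper like this is what lets whole-day cached stats be scaled down to a shorter request. Below is a hypothetical re-implementation for illustration only. The real `GetFactorOfTime` may differ in signature and in edge-case handling. Returning 0 for an empty sample range is an assumption of this sketch. It includes the non-overlap case that this commit makes return zero.

```go
package main

import "fmt"

// factorOfTime returns the fraction of the sample range [sampleStart, sampleEnd]
// that overlaps the request range [reqStart, reqEnd], or 0 when they do not
// overlap. Hypothetical re-implementation, not Loki's GetFactorOfTime.
func factorOfTime(reqStart, reqEnd, sampleStart, sampleEnd int64) float64 {
	if sampleEnd <= sampleStart {
		return 0 // empty sample range: nothing to attribute (assumption)
	}
	start, end := reqStart, reqEnd
	if sampleStart > start {
		start = sampleStart
	}
	if sampleEnd < end {
		end = sampleEnd
	}
	if end <= start {
		return 0 // non-overlapping (or merely touching) ranges
	}
	return float64(end-start) / float64(sampleEnd-sampleStart)
}

func main() {
	day := int64(24 * 60 * 60) // a cached entry spanning one day, in seconds
	fmt.Println(factorOfTime(0, day/4, 0, day))     // 0.25
	fmt.Println(factorOfTime(2*day, 3*day, 0, day)) // 0
}
```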
Paul Rogers	14370bb8ce	Revert "Augment statistics.." PR 9400. (#9430 ) What this PR does / why we need it: This PR reverts PR 9400. The data collected within that PR was not sufficient. When queries are done, they are filtered before the merge iterator, resulting in an inability to collect an accurate count of duplicated data. Which issue(s) this PR fixes: Special notes for your reviewer: Checklist - [ ] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (required) - [ ] Documentation added - [ ] Tests updated - [ ] `CHANGELOG.md` updated - [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md`	3 years ago
Paul Rogers	1671751cbd	Augment statistics to note how many bytes are in duplicate lines due to replicas (#9400 ) What this PR does / why we need it: This PR is for counting the number of bytes of log lines that were marked as duplicates. This will be utilized to collect better statistics.	3 years ago
Peter Štibraný	90a1d4593e	Update Prometheus dependency (#9205 )	3 years ago
Salva Corts	422560b6b1	Flag to disable index stats cache (#9177 ) What this PR does / why we need it: At https://github.com/grafana/loki/pull/8972 we started caching all index stats requests. If the results cache gets overloaded, it can quickly take down the rest of the loki cell due to all the increased work. This PR adds a new flag so we can easily disable caching index stats requests. Which issue(s) this PR fixes: This PR is a follow up for https://github.com/grafana/loki/pull/8972 Special notes for your reviewer: Checklist - [x] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (required) - [x] Documentation added - [x] Tests updated - [x] `CHANGELOG.md` updated - [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md`	3 years ago
Salva Corts	fd16425062	Cache index stats requests (#8972 ) What this PR does / why we need it: As described in https://github.com/grafana/loki/issues/8973, we are substantially increasing the load of index stat requests we send to our index gateways. Many of these requests could easily be reused through caching. This PR adds caching for index stat requests by reusing the results cache. Here's a demo ([source][1]): ![image](https://user-images.githubusercontent.com/8354290/229104609-4dd26f0a-9260-4f21-85ef-ac4a86ebba7a.png) Which issue(s) this PR fixes: Fixes https://github.com/grafana/loki/issues/8973 Special notes for your reviewer: Checklist - [x] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (required) - [ ] Documentation added - [x] Tests updated - [x] `CHANGELOG.md` updated - [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md` [1]: https://ops.grafana-ops.net/d/afcaef21-e5ad-49e7-ab06-42a9d7d915eb/index-stats?orgId=1&var-datasource=dev-cortex&var-cluster=dev-eu-west-2&var-namespace=loki-dev-009&var-loki_datasource=Grafana%20Logging&from=1680259907288&to=1680260431814&var-operation=All --------- Co-authored-by: Owen Diehl <ow.diehl@gmail.com>	3 years ago
Salva Corts	8cf921a145	Pass engine opts down to middlewares (#9130 ) What this PR does / why we need it: The following middlewares in the query frontend use a downstream engine: - `NewQuerySizeLimiterMiddleware` and `NewQuerierSizeLimiterMiddleware` - `NewQueryShardMiddleware` - `NewSplitByRangeMiddleware` These were all creating the downstream engine as follows: ```go logql.NewDownstreamEngine(logql.EngineOpts{LogExecutingQuery: false}, DownstreamHandler{next: next, limits: limits}, limits, logger), ``` As can be seen, the [engine options configured in Loki][1] were not being used at all. In the case of `NewQuerySizeLimiterMiddleware`, `NewQuerierSizeLimiterMiddleware` and `NewQueryShardMiddleware`, the downstream engine was created to get the `MaxLookBackPeriod`. When creating a new Downstream Engine as above, the `MaxLookBackPeriod` [would always be the default][2] (30 seconds). This PR fixes this by passing down the engine config to these middlewares, so this config is used to create the new downstream engines. Which issue(s) this PR fixes: Addresses some pending tasks from https://github.com/grafana/loki/pull/8670#issuecomment-1507031976. Special notes for your reviewer: Checklist - [x] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (required) - [ ] Documentation added - [x] Tests updated - [x] `CHANGELOG.md` updated - [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md` [1]: `1bcf683513/pkg/querier/querier.go (L52)` [2]: `edc6b0bff7/pkg/logql/engine.go (L136-L140)`	3 years ago
Trevor Whitney	c587b538ed	Fail through to next middleware when querySizeLimit cannot be applied (#9050 ) What this PR does / why we need it: When the query size limiter can't limit the query, fail through to the next middleware instead of erroring. This can happen, for example, when a query spans schemas, which is still a valid query case, so we want to make sure to fall back to existing behavior. --------- Co-authored-by: Owen Diehl <ow.diehl@gmail.com>	3 years ago
Owen Diehl	acb40ed40e	Eager stream merge (#8968 ) This PR introduces a specialized heap-based data structure to merge incoming log results in the frontend. Recently we've experienced an increase in OOMs on frontends due to logs queries which match lots of data. Loki splits sharded requests based on the amount of data we expect, and some queries see thousands of sub-requests. For log queries, we fetch up to `limit` entries from each shard, return them to the frontend, and merge. High shard counts * limit log lines, especially combined with large log lines (in byte terms), are accumulated on the frontend. Once they are all received, the frontend merges them. This creates an opportunity for OOMs, as the frontend can hold a lot of memory. This PR addresses one of these problems by eagerly accumulating responses as they're received and only retaining a total `limit` number of entries. There's still OOM potential due to race conditions between sub-requests returning to the query-frontend and the query-frontend merging other sub-requests, but this definitely improves the situation. I've been able to consistently run large limited queries that touch TBs of data (i.e. `{cluster=~".+"} |= "a"`) that previously OOMed frontends. --------- Signed-off-by: Owen Diehl <ow.diehl@gmail.com>	3 years ago
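The memory bound comes from folding each shard response into a bounded heap as soon as it arrives, instead of buffering all responses first. This is a simplified sketch, not the PR's data structure. It assumes a `backward` query that keeps the newest entries. It also flattens everything into one list of entries rather than per-stream results. The names `entry`, `minHeap` and `eagerMerger` are hypothetical. It uses Go's `container/heap`: a min-heap on timestamp keeps the oldest retained entry at the root, so it can be evicted in O(log limit).

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
)

// entry is a single log line with its timestamp.
type entry struct {
	ts   int64
	line string
}

// minHeap orders entries by ascending timestamp so the oldest is at the root.
type minHeap []entry

func (h minHeap) Len() int           { return len(h) }
func (h minHeap) Less(i, j int) bool { return h[i].ts < h[j].ts }
func (h minHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x any)        { *h = append(*h, x.(entry)) }
func (h *minHeap) Pop() any {
	old := *h
	n := len(old)
	e := old[n-1]
	*h = old[:n-1]
	return e
}

// eagerMerger folds shard responses in as they arrive, retaining only the
// `limit` newest entries, so memory stays bounded by limit, not shards*limit.
type eagerMerger struct {
	limit int
	h     minHeap
}

// Add merges one shard response into the retained set.
func (m *eagerMerger) Add(resp []entry) {
	for _, e := range resp {
		if m.h.Len() < m.limit {
			heap.Push(&m.h, e)
		} else if m.limit > 0 && e.ts > m.h[0].ts {
			m.h[0] = e // evict the oldest retained entry
			heap.Fix(&m.h, 0)
		}
	}
}

// Result returns the retained entries newest-first.
func (m *eagerMerger) Result() []entry {
	out := append([]entry(nil), m.h...)
	sort.Slice(out, func(i, j int) bool { return out[i].ts > out[j].ts })
	return out
}

func main() {
	m := &eagerMerger{limit: 3}
	// Shard responses arrive one at a time and are folded in immediately.
	m.Add([]entry{{1, "a"}, {5, "e"}})
	m.Add([]entry{{3, "c"}, {4, "d"}, {2, "b"}})
	fmt.Println(m.Result()) // [{5 e} {4 d} {3 c}]
}
```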
Owen Diehl	62403350a5	remove redundant splitby middleware (#8996 ) Found this double-copied line, which was a mistake. This PR removes one of them, which won't change behavior (besides removing duplicate spans/etc).	3 years ago
Ed Welch	b892cade6a	Loki: Fixes incorrect query result when querying with start time == end time (#8979 ) What this PR does / why we need it: In several places within Loki we need to determine if a query is a `range query` or `instant query`; this is done by checking whether the start and end times are equal and `step=0`. The downstream handler was not checking for `step=0`, and thus it incorrectly mapped a range query to an instant query when the query's start time was equal to its end time. There are a few other things at play here, mainly that we should really return an error any time someone tries to run an instant query for logs, which would have exposed this error much more easily. But that's something I'd like to handle in a different PR, as it will be considered a breaking change depending on how we do it. This PR uses an existing function we have for testing the query type and addresses the issue found in #8885. Which issue(s) this PR fixes: Fixes #8885 Special notes for your reviewer: Checklist - [ ] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (required) - [ ] Documentation added - [ ] Tests updated - [x] `CHANGELOG.md` updated - [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md` --------- Signed-off-by: Edward Welch <edward.welch@grafana.com>	3 years ago
Ed Welch	edc6b0bff7	Loki: Add a limit for the [range] value on range queries (#8343 ) Signed-off-by: Edward Welch <edward.welch@grafana.com> What this PR does / why we need it: Loki does not currently split queries by time to a value smaller than what's in the [range] of a range query. Example ``` sum(rate({job="foo"}[2d])) ``` Imagine now this query being executed over a longer window of a few days with a step of something like 30m. Every step evaluation would query the last [2d] of data. There are use cases where this is desired, specifically if you force the step to match the value in the range, however what is more common is someone accidentally uses `[$__range]` in here instead of `[$__interval]` within Grafana and then sets the query time selector to a large value like 7 days. This PR adds a limit which will fail queries that set the [range] value higher than the configured limit. It's disabled by default. In the future it may be possible for Loki to perform splits within the [range] and remove the need for this limit, but until then this can be an important safeguard in clusters with a lot of data. Which issue(s) this PR fixes: Fixes #8746 Special notes for your reviewer: Checklist - [ ] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (required) - [ ] Documentation added - [ ] Tests updated - [ ] `CHANGELOG.md` updated - [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md` --------- Signed-off-by: Edward Welch <edward.welch@grafana.com> Co-authored-by: Karsten Jeschkies <karsten.jeschkies@grafana.com> Co-authored-by: Vladyslav Diachenko <82767850+vlad-diachenko@users.noreply.github.com>	3 years ago
Dylan Guedes	9159c1dac3	Loki: Improve spans usage (#8927 ) What this PR does / why we need it: - At different places, inherit the span/spanlogger from the given context instead of instantiating a new one from scratch, which fixes spans being orphaned on a read/write operation. - At different places, turn spans into events. Events are lighter than spans and by having fewer spans in the trace, trace visualization will be cleaner without losing any details. - Adds new spans/events to places that might be a bottleneck for our writes/reads.	3 years ago
Periklis Tsirakidis	1bcf683513	Expose optional label matcher for label values handler (#8824 )	3 years ago
Salva Corts	45775c82f7	Implement `RequiredNumberLabels` query limit (#8918 ) What this PR does / why we need it: As pointed out in https://github.com/grafana/loki/pull/8851, some queries can impose a heavy workload on a cluster by selecting too many streams. Similar to the `RequiredLabels` limit introduced in https://github.com/grafana/loki/pull/8851, here we add a new limit `RequiredNumberLabels` to require queries to specify at least N labels. For example, if the limit is set to 2, then the query should contain at least 2 label matchers. This limit can be configured per tenant and at query time. ![image](https://user-images.githubusercontent.com/8354290/228271398-4b9bcc49-f539-4e94-86c1-071e519a30a9.png) Which issue(s) this PR fixes: Fixes https://github.com/grafana/loki-private/issues/699 Special notes for your reviewer: Checklist - [x] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (required) - [x] Documentation added - [x] Tests updated - [x] `CHANGELOG.md` updated - [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md` --------- Co-authored-by: Dylan Guedes <djmgguedes@gmail.com>	3 years ago
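A minimal sketch of the `RequiredNumberLabels` check from #8918, assuming the stream selector has already been parsed into matchers. `Matcher` and `checkRequiredNumberLabels` are hypothetical stand-ins, not Loki's types, and treating a limit of 0 as disabled is an assumption.

```go
package main

import "fmt"

// Matcher is a hypothetical, simplified stand-in for a parsed label
// matcher such as app="foo" in a LogQL stream selector.
type Matcher struct {
	Name, Op, Value string
}

// checkRequiredNumberLabels returns an error when the selector has fewer
// than `required` matchers. required <= 0 is treated as "limit disabled".
func checkRequiredNumberLabels(matchers []Matcher, required int) error {
	if required <= 0 || len(matchers) >= required {
		return nil
	}
	return fmt.Errorf("stream selector has %d label matchers, but at least %d are required",
		len(matchers), required)
}

func main() {
	sel := []Matcher{{Name: "app", Op: "=", Value: "foo"}}
	fmt.Println(checkRequiredNumberLabels(sel, 2))
}
```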
Salva Corts	ee69f2bd37	Split index request in 24h intervals (#8909 ) What this PR does / why we need it: At https://github.com/grafana/loki/pull/8670, we applied a time split of 24h intervals to all index stats requests to enforce the `max_query_bytes_read` and `max_querier_bytes_read` limits. When the limit is surpassed, the following message gets displayed: ![image](https://user-images.githubusercontent.com/8354290/227960400-b74a0397-13ef-4143-a1fc-48d885af55c0.png) As can be seen, the reported bytes read by the query are not the same as those reported by Grafana in the lower right corner of the query editor. This is because: 1. The index stats request for enforcing the limit is split into subqueries of 24h. The other index stats request is not time split. 2. When enforcing the limit, we are not displaying the bytes in powers of 2, but in powers of 10 ([see here][2]), i.e. 1KB is 1000B, whereas 1KiB is 1024B. This PR adds the same logic to all index stats requests, so all requests that hit the Index Stats API endpoint are also split into 24h intervals. We also use powers of 2 instead of 10 in the message when enforcing `max_query_bytes_read` and `max_querier_bytes_read`. ![image](https://user-images.githubusercontent.com/8354290/227959491-f57cf7d2-de50-4ee6-8737-faeafb528f99.png) Note that the library we use under the hood to print the bytes rounds up and down to the nearest integer ([see][3]); that's why we see 16GiB compared to the 15.5GB in the Grafana query editor. Which issue(s) this PR fixes: Fixes https://github.com/grafana/loki/issues/8910 Special notes for your reviewer: - I refactored the `newQuerySizeLimiter` function and the rest of the _Tripperwares_ in `roundtrip.go` to reuse the new IndexStatsTripperware, so we configure the split-by-time middleware only once. 
Checklist - [x] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (required) - [x] Documentation added - [x] Tests updated - [x] `CHANGELOG.md` updated - [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md` [1]: https://grafana.com/docs/loki/latest/api/#index-stats [2]: https://github.com/grafana/loki/blob/main/pkg/querier/queryrange/limits.go#L367-L368 [3]: https://github.com/dustin/go-humanize/blob/master/bytes.go#L75-L78	3 years ago
Salva Corts	336e08fc4b	Salvacorts/max querier size messaging (#8916 ) What this PR does / why we need it: In https://github.com/grafana/loki/pull/8670 we introduced a new limit `max_querier_bytes_read`. When the limit was surpassed, the following error message was printed: ``` query too large to execute on a single querier, either because parallelization is not enabled, the query is unshardable, or a shard query is too big to execute: (query: %s, limit: %s). Consider adding more specific stream selectors or reduce the time range of the query ``` As pointed out in [this comment][1], a user would have a hard time figuring out whether the cause was `parallelization is not enabled`, `the query is unshardable` or `a shard query is too big to execute`. This PR improves the error messaging for the `max_querier_bytes_read` limit to raise a different error for each of the causes above. Which issue(s) this PR fixes: Followup for https://github.com/grafana/loki/pull/8670 Special notes for your reviewer: Checklist - [x] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (required) - [x] Documentation added - [x] Tests updated - [ ] `CHANGELOG.md` updated - [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md` [1]: https://github.com/grafana/loki/pull/8670#discussion_r1146008266 --------- Co-authored-by: Danny Kopping <danny.kopping@grafana.com>	3 years ago
Bryan Boreham	793a689d1f	Iterators: re-implement mergeEntryIterator using loser.Tree for performance (#8637 ) What this PR does / why we need it: Building on #8351, this re-implements `mergeEntryIterator` using `loser.Tree`; the benchmark says it goes much faster but uses a bit more memory (while building the tree). ``` name old time/op new time/op delta SortIterator/merge_sort-4 10.7ms ± 4% 2.9ms ± 2% -72.74% (p=0.008 n=5+5) name old alloc/op new alloc/op delta SortIterator/merge_sort-4 11.2kB ± 0% 21.7kB ± 0% +93.45% (p=0.008 n=5+5) name old allocs/op new allocs/op delta SortIterator/merge_sort-4 6.00 ± 0% 7.00 ± 0% +16.67% (p=0.008 n=5+5) ``` The implementation is very different: rather than relying on iterators supporting `Peek()`, `mergeEntryIterator` now pulls items into its buffer until it finds one with a different timestamp or stream, and always works off what is in the buffer. The comment `"[we] pop the ones whose common value occurs most often."` did not appear to match the previous implementation, and no attempt was made to match this comment. A `Push()` function was added to `loser.Tree` to support live-streaming. This works by finding or making an empty slot, then re-running the initialize function to find the new winner. A consequence is that the previous "winner" value is lost after calling `Push()`, and users must call `Next()` to see the next item. A couple of tests had to be amended to avoid assuming particular behaviour of the implementation; I recommend that reviewers consider these closely. Checklist - [x] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (required) - NA Documentation added - [x] Tests updated - NA `CHANGELOG.md` updated - NA Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md`	3 years ago
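To illustrate the technique behind #8637, here is a small, hypothetical loser (tournament) tree that k-way merges sorted timestamp slices. It is not Loki's `loser.Tree`: it omits `Push()`, stream labels, and the buffering of equal timestamps described above. It only shows the core idea: after an initial tournament of k-1 matches, each `Next()` replays just the O(log k) matches on the previous winner's path.

```go
package main

import "fmt"

// loserTree merges k individually sorted slices of timestamps.
type loserTree struct {
	lists [][]int64 // sorted inputs
	pos   []int     // next unread index per input
	nodes []int     // nodes[0] = current winner; nodes[1..k-1] = match losers
}

func newLoserTree(lists [][]int64) *loserTree {
	k := len(lists)
	t := &loserTree{lists: lists, pos: make([]int, k), nodes: make([]int, k)}
	if k > 0 {
		t.nodes[0] = t.build(1)
	}
	return t
}

// less reports whether input a's head sorts before input b's head.
// An exhausted input compares as +infinity.
func (t *loserTree) less(a, b int) bool {
	if t.pos[a] >= len(t.lists[a]) {
		return false
	}
	if t.pos[b] >= len(t.lists[b]) {
		return true
	}
	return t.lists[a][t.pos[a]] < t.lists[b][t.pos[b]]
}

// build plays the initial tournament for the subtree rooted at node,
// storing each match's loser and returning its winner. Leaf i sits at k+i.
func (t *loserTree) build(node int) int {
	k := len(t.lists)
	if node >= k {
		return node - k
	}
	a, b := t.build(2*node), t.build(2*node+1)
	if t.less(a, b) {
		t.nodes[node] = b
		return a
	}
	t.nodes[node] = a
	return b
}

// Next returns the smallest remaining value, replaying only the matches
// on the winner's leaf-to-root path.
func (t *loserTree) Next() (int64, bool) {
	k := len(t.lists)
	if k == 0 {
		return 0, false
	}
	w := t.nodes[0]
	if t.pos[w] >= len(t.lists[w]) {
		return 0, false // the overall winner is exhausted, so all inputs are
	}
	v := t.lists[w][t.pos[w]]
	t.pos[w]++
	for n := (w + k) / 2; n > 0; n /= 2 {
		if t.less(t.nodes[n], w) {
			t.nodes[n], w = w, t.nodes[n] // stored loser now wins; old winner stays here
		}
	}
	t.nodes[0] = w
	return v, true
}

func main() {
	t := newLoserTree([][]int64{{1, 4, 7}, {2, 5, 8}, {3, 6, 9}})
	for v, ok := t.Next(); ok; v, ok = t.Next() {
		fmt.Print(v, " ")
	}
	fmt.Println()
}
```

Compared with a binary heap, a loser tree needs only one comparison per level on each replay, because the sibling to compare against is already stored in the node.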
Salva Corts	d24fe3e68b	Max bytes read limit (#8670 ) What this PR does / why we need it: This PR implements two new per-tenant limits that are enforced on log and metric queries (both range and instant) when TSDB is used: - `max_query_bytes_read`: Refuse queries that would read more than the configured bytes here. Overall limit regardless of splitting/sharding. The goal is to refuse queries that would take too long. The default value of 0 disables this limit. - `max_querier_bytes_read`: Refuse queries in which any of their subqueries after splitting and sharding would read more than the configured bytes here. The goal is to avoid a querier from running a query that would load too much data in memory and can potentially get OOMed. The default value of 0 disables this limit. These new limits can be configured per tenant and per query (see https://github.com/grafana/loki/pull/8727). The bytes a query would read are estimated through TSDB's index stats. Even though they are not exact, they are good enough to have a rough estimation of whether a query is too big to run or not. For more details on this refer to this discussion in the PR: https://github.com/grafana/loki/pull/8670#discussion_r1124858508. Both limits are implemented in the frontend. Even though we considered implementing `max_querier_bytes_read` in the querier, this way, the limits for pre and post splitting/sharding queries are enforced close to each other on the same component. Moreover, this way we can reduce the number of index stats requests issued to the index gateways by reusing the stats gathered while sharding the query. With regard to how index stats requests are issued: - We parallelize index stats requests by splitting them into queries that span up to 24h since our indices are sharded by 24h periods. On top of that, this prevents a single index gateway from processing a single huge request like `{app=~".+"} for 30d`. 
- If sharding is enabled and the query is shardable, for `max_querier_bytes_read`, we re-use the stats requests issued by the sharding middleware. Specifically, we look at the [bytesPerShard][1] to enforce this limit. Note that once we merge this PR and enable these limits, the load from index stats requests will increase substantially and we may discover bottlenecks in our index gateways and TSDB. After speaking with @owen-d, we think it should be fine since, if needed, we can scale up our index gateways and support caching index stats requests. Here's a demo of this working: <img width="1647" alt="image" src="https://user-images.githubusercontent.com/8354290/226918478-d4b6c2fd-de4d-478a-9c8b-e38fe148fa95.png"> <img width="1647" alt="image" src="https://user-images.githubusercontent.com/8354290/226918798-a71b1db8-ea68-4d00-933b-e5eb1524d240.png"> Which issue(s) this PR fixes: This PR addresses https://github.com/grafana/loki-private/issues/674. Special notes for your reviewer: - @jeschkies has reviewed the changes related to query-time limits. - I've done some refactoring in this PR: - Extracted logic to get stats for a set of matchers into a new function [getStatsForMatchers][2]. - Extracted the _Handler_ interface implementation for [queryrangebase.roundTripper][3] into a new type [queryrangebase.roundTripperHandler][4]. This is used to create the handler that skips the rest of the configured middlewares when sending index stats requests ([example][5]). 
Checklist - [x] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (required) - [x] Documentation added - [x] Tests updated - [x] `CHANGELOG.md` updated - [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md` [1]: `ff847305af/pkg/querier/queryrange/shard_resolver.go (L179-L186)` [2]: `ff847305af/pkg/querier/queryrange/shard_resolver.go (L72)` [3]: `3d2fff3a2d/pkg/querier/queryrange/queryrangebase/roundtrip.go (L124)` [4]: `3d2fff3a2d/pkg/querier/queryrange/queryrangebase/roundtrip.go (L163)` [5]: `f422e0a52b/pkg/querier/queryrange/roundtrip.go (L521)`	3 years ago
Karsten Jeschkies	94725e7908	Define `RequiredLabels` query limit. (#8851 ) What this PR does / why we need it: Some end-users can impose a heavy workload on a cluster by selecting too many streams in their queries. We should be able to limit them. Therefore we introduce a new limit `RequiredLabelMatchers`, which lists label names that must be included in the stream selectors. The implementation follows the same approach as for the max query limit. Which issue(s) this PR fixes: Fixes #8745 Checklist - [ ] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (required) - [x] Documentation added - [x] Tests updated - [x] `CHANGELOG.md` updated - [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md`	3 years ago
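A sketch of the `RequiredLabelMatchers` idea from #8851, assuming the label names used in the stream selector have already been collected from the parsed query. `missingRequiredLabels` is a hypothetical helper, not Loki's API.

```go
package main

import (
	"fmt"
	"strings"
)

// missingRequiredLabels (hypothetical) returns the required label names
// that do not appear among the selector's matcher names.
func missingRequiredLabels(selectorNames, required []string) []string {
	present := make(map[string]struct{}, len(selectorNames))
	for _, n := range selectorNames {
		present[n] = struct{}{}
	}
	var missing []string
	for _, r := range required {
		if _, ok := present[r]; !ok {
			missing = append(missing, r)
		}
	}
	return missing
}

func main() {
	// {app="foo"} checked against a tenant that requires app and namespace.
	missing := missingRequiredLabels([]string{"app"}, []string{"app", "namespace"})
	if len(missing) > 0 {
		fmt.Printf("stream selector is missing required matchers: %s\n", strings.Join(missing, ", "))
	}
}
```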
Karsten Jeschkies	f5f1753851	Print durations in error messages in a more readable format. (#8816 ) What this PR does / why we need it: The old error messages would only print units up to hours, e.g. `169h30s`. This change will print it as `7d1h30s`. See [model.Duration](`66b493f42b/model/time.go (L259-L290)`) for details. Checklist - [ ] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (required) - [ ] Documentation added - [x] Tests updated - [ ] `CHANGELOG.md` updated - [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md`	3 years ago
Christian Haudum	be8b4eece3	Scheduler: Add query fairness control across multiple actors within a tenant (#8752 ) What this PR does / why we need it: This PR wires up the scheduler with the hierarchical queues. It is the last PR to implement https://github.com/grafana/loki/pull/8585. When these changes are in place, the client performing query requests can control their QoS (query fairness) using the `X-Actor-Path` HTTP header. This header controls in which sub-queue of the tenant's scheduler queue the query request is enqueued. The place within the hierarchy where it is enqueued defines the probability with which the request gets dequeued. A common use-case for this QoS control is giving each Grafana user within a tenant their fair share of query execution time. Documentation is still missing and will be provided in follow-up PRs. Special notes for your reviewer: ```console $ gotest -count=1 -v ./pkg/scheduler/queue/... -test.run=TestQueryFairness === RUN TestQueryFairness === RUN TestQueryFairness/use_hierarchical_queues_=_false dequeue_qos_test.go:109: duration actor a 2.007765568s dequeue_qos_test.go:109: duration actor b 2.209088331s dequeue_qos_test.go:112: total duration 2.209280772s === RUN TestQueryFairness/use_hierarchical_queues_=_true dequeue_qos_test.go:109: duration actor b 605.283144ms dequeue_qos_test.go:109: duration actor a 2.270931324s dequeue_qos_test.go:112: total duration 2.271108551s --- PASS: TestQueryFairness (4.48s) --- PASS: TestQueryFairness/use_hierarchical_queues_=_false (2.21s) --- PASS: TestQueryFairness/use_hierarchical_queues_=_true (2.27s) PASS ok github.com/grafana/loki/pkg/scheduler/queue 4.491s ``` ```console $ gotest -count=5 -v ./pkg/scheduler/queue/... 
-bench=Benchmark -test.run=^$ -benchtime=10000x -benchmem goos: linux goarch: amd64 pkg: github.com/grafana/loki/pkg/scheduler/queue cpu: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz BenchmarkGetNextRequest BenchmarkGetNextRequest/without_sub-queues BenchmarkGetNextRequest/without_sub-queues-8 10000 29337 ns/op 1600 B/op 100 allocs/op BenchmarkGetNextRequest/without_sub-queues-8 10000 21348 ns/op 1600 B/op 100 allocs/op BenchmarkGetNextRequest/without_sub-queues-8 10000 21595 ns/op 1600 B/op 100 allocs/op BenchmarkGetNextRequest/without_sub-queues-8 10000 21189 ns/op 1600 B/op 100 allocs/op BenchmarkGetNextRequest/without_sub-queues-8 10000 21602 ns/op 1600 B/op 100 allocs/op BenchmarkGetNextRequest/with_1_level_of_sub-queues BenchmarkGetNextRequest/with_1_level_of_sub-queues-8 10000 33770 ns/op 2400 B/op 200 allocs/op BenchmarkGetNextRequest/with_1_level_of_sub-queues-8 10000 33596 ns/op 2400 B/op 200 allocs/op BenchmarkGetNextRequest/with_1_level_of_sub-queues-8 10000 34432 ns/op 2400 B/op 200 allocs/op BenchmarkGetNextRequest/with_1_level_of_sub-queues-8 10000 33760 ns/op 2400 B/op 200 allocs/op BenchmarkGetNextRequest/with_1_level_of_sub-queues-8 10000 33664 ns/op 2400 B/op 200 allocs/op BenchmarkGetNextRequest/with_2_levels_of_sub-queues BenchmarkGetNextRequest/with_2_levels_of_sub-queues-8 10000 71405 ns/op 3200 B/op 300 allocs/op BenchmarkGetNextRequest/with_2_levels_of_sub-queues-8 10000 59472 ns/op 3200 B/op 300 allocs/op BenchmarkGetNextRequest/with_2_levels_of_sub-queues-8 10000 117163 ns/op 3200 B/op 300 allocs/op BenchmarkGetNextRequest/with_2_levels_of_sub-queues-8 10000 106505 ns/op 3200 B/op 300 allocs/op BenchmarkGetNextRequest/with_2_levels_of_sub-queues-8 10000 64374 ns/op 3200 B/op 300 allocs/op BenchmarkQueueRequest BenchmarkQueueRequest-8 10000 168391 ns/op 320588 B/op 1156 allocs/op BenchmarkQueueRequest-8 10000 166203 ns/op 320587 B/op 1156 allocs/op BenchmarkQueueRequest-8 10000 149518 ns/op 320584 B/op 1156 allocs/op 
BenchmarkQueueRequest-8 10000 219776 ns/op 320583 B/op 1156 allocs/op BenchmarkQueueRequest-8 10000 185198 ns/op 320597 B/op 1156 allocs/op PASS ok github.com/grafana/loki/pkg/scheduler/queue 64.648s ``` Signed-off-by: Christian Haudum <christian.haudum@gmail.com>	3 years ago
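A deliberately simplified sketch of the fairness idea in #8752: one tenant queue with a single level of per-actor sub-queues, dequeued round-robin so that a busy actor cannot starve the others. The real scheduler uses hierarchical queues addressed by `X-Actor-Path` and dequeues with level-dependent probabilities; the type and method names here are hypothetical, and mapping the header to an actor key is left out.

```go
package main

import "fmt"

// tenantQueue (hypothetical) holds one FIFO sub-queue per actor.
type tenantQueue struct {
	actors []string            // actors in round-robin order
	queues map[string][]string // pending requests per actor
	next   int                 // index of the actor to serve next
}

func newTenantQueue() *tenantQueue {
	return &tenantQueue{queues: map[string][]string{}}
}

// Enqueue appends req to the actor's sub-queue, registering new actors.
func (q *tenantQueue) Enqueue(actor, req string) {
	if _, ok := q.queues[actor]; !ok {
		q.actors = append(q.actors, actor)
	}
	q.queues[actor] = append(q.queues[actor], req)
}

// Dequeue serves actors in turn, skipping those with empty sub-queues.
func (q *tenantQueue) Dequeue() (string, bool) {
	for range q.actors {
		actor := q.actors[q.next]
		q.next = (q.next + 1) % len(q.actors)
		if pending := q.queues[actor]; len(pending) > 0 {
			q.queues[actor] = pending[1:]
			return pending[0], true
		}
	}
	return "", false
}

func main() {
	q := newTenantQueue()
	for _, r := range []string{"a1", "a2", "a3"} {
		q.Enqueue("alice", r)
	}
	q.Enqueue("bob", "b1")
	// bob's single request is served second instead of waiting behind all of alice's.
	for req, ok := q.Dequeue(); ok; req, ok = q.Dequeue() {
		fmt.Println(req)
	}
}
```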
Danny Kopping	33e44ed39d	Ruler: remote rule evaluation (#8744 ) What this PR does / why we need it: Adds the ability to evaluate recording & alerting rules against a given `query-frontend`, allowing these queries to be executed with all the parallelisation & optimisation that regular adhoc queries have. This is important because with `local` evaluation all queries are single-threaded, and rules that evaluate a large range/volume of data may timeout or OOM the `ruler` itself, leading to missed metrics or alerts. When `remote` evaluation mode is enabled, the `ruler` effectively just becomes a gRPC client for the `query-frontend`, which will dramatically improve the reliability of the `ruler` and also drastically reduce its resource requirements. Which issue(s) this PR fixes: This PR implements the feature discussed in https://github.com/grafana/loki/pull/8129 (LID 0002: Remote Rule Evaluation).	3 years ago
Ed Welch	a4eb536fb2	Loki: remove `subqueries` from metrics.go logging and replace it with separate split and shard counters (#8761 ) What this PR does / why we need it: Currently the `metrics.go` log line emitted after every query includes a metric called "subqueries". This currently tracks the number of queries created by the split_by_time operations done in Loki, but does not include any counts for subqueries created as a result of sharding. It becomes difficult to make a single subqueries counter that gives useful information to determine how much a query is split by time and sharded by a shard factor, especially now that sharding in TSDB indexes is dynamic. This PR removes and deprecates the `subqueries` stat and instead creates a `splits` and `shards` statistic which records how much a query was split_by_time and the total number of shards created as well. Which issue(s) this PR fixes: Fixes #<issue number> Special notes for your reviewer: Checklist - [ ] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (required) - [ ] Documentation added - [x] Tests updated - [x] `CHANGELOG.md` updated - [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md` --------- Signed-off-by: Edward Welch <edward.welch@grafana.com>	3 years ago
Callum Styan	5a85f6647e	Add initial implementation of per-query limits (#8727 ) What this PR does / why we need it: Sometimes we want to limit the impact of a single query by imposing limits that are stricter than the current tenant limit. E.g. the maximum query length could be seven days, but based on the query or an admin's decision, a query should only have a maximum length of one day. This is where per-request limits come into play. They are passed via the `X-Loki-Query-Limit` header and extracted into the request's context. It is the responsibility of the operator or admin to ensure that the header is valid. Which issue(s) this PR fixes: Fixes #8762 Checklist - [ ] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (required) - [ ] Documentation added - [x] Tests updated - [x] `CHANGELOG.md` updated - [x] Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md` --------- Signed-off-by: Callum Styan <callumstyan@gmail.com> Co-authored-by: Karsten Jeschkies <karsten.jeschkies@grafana.com>	3 years ago
Callum Styan	9a2a038f43	Allow passing of context to query related limits functions (#8689 ) In this PR we allow passing a `context.Context` via the Limits interfaces (some of which are new, to clean up the hardcoding/embedding of `validation.Overrides`). This is based on work/ideas by @jeschkies. Fixes #8694 --------- Signed-off-by: Callum Styan <callumstyan@gmail.com> Co-authored-by: Karsten Jeschkies <karsten.jeschkies@grafana.com>	3 years ago
Bryan Boreham	6fd4b5e89b	Update prometheus/prometheus from 2.41 to 2.42 (#8571 ) What this PR does / why we need it: Brings in the latest updates from upstream. These open up some opportunities for optimisations in TSDB indexing. Dependencies updated: * github.com/Azure/go-autorest/autorest/adal v0.9.21 -> v0.9.22 (comment-only change) * github.com/docker/docker v20.10.21 -> v20.10.23 (fixing filters bug) * golang.org/x/exp fae10dda9338 -> d38c7dcee874 (optimisations in `BinarySearch` function) Indirect dependencies also updated: * github.com/digitalocean/godo v1.91.1 -> v1.95.0 (nothing alarming in [release notes](https://github.com/digitalocean/godo/releases)) * github.com/google/pprof aee1124e3a93 -> 76d1ae5aea2b (no changes pulled into vendor) * golang.org/x/tools v0.4.0 -> v0.5.0 (relating to Go compiler utilities) * google.golang.org/genproto 76db0878b65f -> 31e0e69b6fc2 * k8s.io/api v0.26.0 -> v0.26.1 (comment-only change) * k8s.io/apimachinery v0.26.0 -> v0.26.1 (no changes pulled into vendor) * k8s.io/client-go v0.26.0 -> v0.26.1 (small fixes) Special notes for your reviewer: A couple of interfaces changed; these have required matching changes in Loki code. Those changes are split into separate commits. I also note that most calls to `relabel` ignore when the rule says "drop". Maybe this is wrong?	3 years ago
Garrett	433d5bf913	fix panics when cloning a special query (#8531 ) Signed-off-by: garrettlish <garrett.li.sh@gmail.com>	3 years ago
Owen Diehl	6a7403c4f5	correctly calculate max shards (#8494 )	3 years ago
Ed Welch	9f0834793b	Loki: set a maximum number of shards for "limited" queries instead of fixed number (#8487 ) Signed-off-by: Edward Welch <edward.welch@grafana.com>	3 years ago
Ed Welch	37169ca444	Loki: Process "Limited" queries sequentially and not in parallel (#8482 ) Signed-off-by: Edward Welch <edward.welch@grafana.com>	3 years ago
Christian Haudum	96d5227532	Fix parsing of vector expression (#8448 ) Signed-off-by: Christian Haudum <christian.haudum@gmail.com>	3 years ago


425 Commits (954df433e98f659d006ced52b23151cb5eb2fdfa)