// loki/pkg/querier/queryrange/log_result_cache_test.go

package queryrange

import (
	"context"
	"fmt"
	"testing"
	"time"

	"github.com/go-kit/log"
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/testutil"
	"github.com/stretchr/testify/mock"
	"github.com/stretchr/testify/require"
	"github.com/weaveworks/common/user"

	"github.com/grafana/loki/pkg/loghttp"
	"github.com/grafana/loki/pkg/logproto"
	"github.com/grafana/loki/pkg/logqlmodel/stats"
	"github.com/grafana/loki/pkg/querier/queryrange/queryrangebase"
	"github.com/grafana/loki/pkg/storage/chunk/cache"
)

// The tests below exercise the log results cache, which stores a single entry
// per time-split interval when a log query returns an empty response. On a
// cache hit the middleware roughly behaves as follows (see #7925 and #9757):
//   - if the cached query fully covers the new query, it returns an empty
//     response immediately;
//   - if the cached query does not overlap the new query at all, it runs the
//     new query as requested and updates the cache only when the result is
//     empty and covers a larger interval than the cached one;
//   - otherwise it queries the missing interval(s) on either side of the
//     cached query, extracts the response down to the requested range before
//     responding, and extends the cached interval when those extra queries
//     also come back empty.

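
// The sketch below is purely illustrative and hypothetical: cacheHitActionSketch
// is not part of the real implementation in log_result_cache.go. It only
// summarises, under the assumptions stated in the comment above, how a cache
// hit is expected to be handled relative to the cached interval.
func cacheHitActionSketch(cachedStart, cachedEnd, reqStart, reqEnd time.Time) string {
	switch {
	case !cachedStart.After(reqStart) && !reqEnd.After(cachedEnd):
		// Cached interval fully covers the request: serve an empty response.
		return "covered"
	case reqEnd.Before(cachedStart) || cachedEnd.Before(reqStart):
		// No overlap: run the request as-is; only update the cache if the
		// result is empty and spans a larger interval than the cached one.
		return "disjoint"
	default:
		// Partial overlap: query the missing interval(s), extract the result
		// down to the requested range, and extend the cache on empty results.
		return "overlap"
	}
}
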
const (
	lblFooBar    = `{foo="bar"}`
	lblFizzBuzz  = `{fizz="buzz"}`
	entriesLimit = 1000
)
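
// Test_LogResultCacheSameRange issues the same request twice: the first,
// empty response is cached, so the second call is served from the cache and
// the downstream handler is only hit once.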
func Test_LogResultCacheSameRange(t *testing.T) {
	var (
		ctx = user.InjectOrgID(context.Background(), "foo")
		lrc = NewLogResultCache(
			log.NewNopLogger(),
			fakeLimits{
				splits: map[string]time.Duration{"foo": time.Minute},
			},
			cache.NewMockCache(),
			nil,
			nil,
			nil,
		)
	)

	req := &LokiRequest{
		StartTs: time.Unix(0, time.Minute.Nanoseconds()),
		EndTs:   time.Unix(0, 2*time.Minute.Nanoseconds()),
		Limit:   entriesLimit,
	}

	fake := newFakeResponse([]mockResponse{
		{
			RequestResponse: queryrangebase.RequestResponse{
				Request:  req,
				Response: emptyResponse(req),
			},
		},
	})

	h := lrc.Wrap(fake)

	resp, err := h.Do(ctx, req)
	require.NoError(t, err)
	require.Equal(t, emptyResponse(req), resp)

	resp, err = h.Do(ctx, req)
	require.NoError(t, err)
	require.Equal(t, emptyResponse(req), resp)

	fake.AssertExpectations(t)
}
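
// Test_LogResultCacheSameRangeNonEmpty issues the same request twice, but the
// downstream handler returns data: non-empty responses are not cached, so
// both calls reach the downstream handler.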
func Test_LogResultCacheSameRangeNonEmpty(t *testing.T) {
	var (
		ctx = user.InjectOrgID(context.Background(), "foo")
		lrc = NewLogResultCache(
			log.NewNopLogger(),
			fakeLimits{
				splits: map[string]time.Duration{"foo": time.Minute},
			},
			cache.NewMockCache(),
			nil,
			nil,
			nil,
		)
	)

	req := &LokiRequest{
		StartTs: time.Unix(0, time.Minute.Nanoseconds()),
		EndTs:   time.Unix(0, 2*time.Minute.Nanoseconds()),
		Limit:   entriesLimit,
	}

	fake := newFakeResponse([]mockResponse{
		{
			RequestResponse: queryrangebase.RequestResponse{
				Request:  req,
				Response: nonEmptyResponse(req, time.Unix(61, 0), time.Unix(61, 0), lblFooBar),
			},
		},
		{
			RequestResponse: queryrangebase.RequestResponse{
				Request:  req,
				Response: nonEmptyResponse(req, time.Unix(62, 0), time.Unix(62, 0), lblFooBar),
			},
		},
	})

	h := lrc.Wrap(fake)

	resp, err := h.Do(ctx, req)
	require.NoError(t, err)
	require.Equal(t, nonEmptyResponse(req, time.Unix(61, 0), time.Unix(61, 0), lblFooBar), resp)

	resp, err = h.Do(ctx, req)
	require.NoError(t, err)
	require.Equal(t, nonEmptyResponse(req, time.Unix(62, 0), time.Unix(62, 0), lblFooBar), resp)

	fake.AssertExpectations(t)
}
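
// Test_LogResultCacheSmallerRange caches an empty response for the full
// range, then issues a query that lies entirely within the cached range: it
// is answered from the cache without hitting the downstream handler again.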
func Test_LogResultCacheSmallerRange(t *testing.T) {
	var (
		ctx = user.InjectOrgID(context.Background(), "foo")
		lrc = NewLogResultCache(
			log.NewNopLogger(),
			fakeLimits{
				splits: map[string]time.Duration{"foo": time.Minute},
			},
			cache.NewMockCache(),
			nil,
			nil,
			nil,
		)
	)

	req := &LokiRequest{
		StartTs: time.Unix(0, time.Minute.Nanoseconds()),
		EndTs:   time.Unix(0, 2*time.Minute.Nanoseconds()),
		Limit:   entriesLimit,
	}

	fake := newFakeResponse([]mockResponse{
		{
			RequestResponse: queryrangebase.RequestResponse{
				Request:  req,
				Response: emptyResponse(req),
			},
		},
	})

	h := lrc.Wrap(fake)

	resp, err := h.Do(ctx, req)
	require.NoError(t, err)
	require.Equal(t, emptyResponse(req), resp)

	resp, err = h.Do(ctx, &LokiRequest{
		StartTs: time.Unix(0, time.Minute.Nanoseconds()+30*time.Second.Nanoseconds()),
		EndTs:   time.Unix(0, 2*time.Minute.Nanoseconds()-30*time.Second.Nanoseconds()),
		Limit:   entriesLimit,
	})
	require.NoError(t, err)
	require.Equal(t, emptyResponse(&LokiRequest{
		StartTs: time.Unix(0, time.Minute.Nanoseconds()+30*time.Second.Nanoseconds()),
		EndTs:   time.Unix(0, 2*time.Minute.Nanoseconds()-30*time.Second.Nanoseconds()),
		Limit:   entriesLimit,
	}), resp)

	fake.AssertExpectations(t)
}
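
// Test_LogResultCacheDifferentRange caches an empty response, then issues a
// query that extends past the cached range on both sides: the middleware only
// queries the missing intervals before and after the cached range and still
// returns an empty response.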
func Test_LogResultCacheDifferentRange(t *testing.T) {
	var (
		ctx = user.InjectOrgID(context.Background(), "foo")
		lrc = NewLogResultCache(
			log.NewNopLogger(),
			fakeLimits{
				splits: map[string]time.Duration{"foo": time.Minute},
			},
			cache.NewMockCache(),
			nil,
			nil,
			nil,
		)
	)

	req1 := &LokiRequest{
		StartTs: time.Unix(0, time.Minute.Nanoseconds()+30*time.Second.Nanoseconds()),
		EndTs:   time.Unix(0, 2*time.Minute.Nanoseconds()-30*time.Second.Nanoseconds()),
		Limit:   entriesLimit,
	}
	req2 := &LokiRequest{
		StartTs: time.Unix(0, time.Minute.Nanoseconds()),
		EndTs:   time.Unix(0, 2*time.Minute.Nanoseconds()),
		Limit:   entriesLimit,
	}

	fake := newFakeResponse([]mockResponse{
		{
			RequestResponse: queryrangebase.RequestResponse{
				Request:  req1,
				Response: emptyResponse(req1),
			},
		},
		{
			RequestResponse: queryrangebase.RequestResponse{
				Request: &LokiRequest{
					StartTs: time.Unix(0, time.Minute.Nanoseconds()),
					EndTs:   time.Unix(0, time.Minute.Nanoseconds()+30*time.Second.Nanoseconds()),
					Limit:   entriesLimit,
				},
				Response: emptyResponse(&LokiRequest{
					StartTs: time.Unix(0, time.Minute.Nanoseconds()),
					EndTs:   time.Unix(0, time.Minute.Nanoseconds()+30*time.Second.Nanoseconds()),
					Limit:   entriesLimit,
				}),
			},
		},
		{
			RequestResponse: queryrangebase.RequestResponse{
				Request: &LokiRequest{
					StartTs: time.Unix(0, 2*time.Minute.Nanoseconds()-30*time.Second.Nanoseconds()),
					EndTs:   time.Unix(0, 2*time.Minute.Nanoseconds()),
					Limit:   entriesLimit,
				},
				Response: emptyResponse(&LokiRequest{
					StartTs: time.Unix(0, 2*time.Minute.Nanoseconds()-30*time.Second.Nanoseconds()),
					EndTs:   time.Unix(0, 2*time.Minute.Nanoseconds()),
					Limit:   entriesLimit,
				}),
			},
		},
	})

	h := lrc.Wrap(fake)

	resp, err := h.Do(ctx, req1)
	require.NoError(t, err)
	require.Equal(t, emptyResponse(req1), resp)

	resp, err = h.Do(ctx, req2)
	require.NoError(t, err)
	require.Equal(t, emptyResponse(req2), resp)

	fake.AssertExpectations(t)
}
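
// Test_LogResultCacheDifferentRangeNonEmpty is the same scenario as above,
// but the queries for the intervals missing from the cache return data, which
// is merged into the response for the wider request.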
func Test_LogResultCacheDifferentRangeNonEmpty(t *testing.T) {
	var (
		ctx = user.InjectOrgID(context.Background(), "foo")
		lrc = NewLogResultCache(
			log.NewNopLogger(),
			fakeLimits{
				splits: map[string]time.Duration{"foo": time.Minute},
			},
			cache.NewMockCache(),
			nil,
			nil,
			nil,
		)
	)

	req1 := &LokiRequest{
		StartTs: time.Unix(0, time.Minute.Nanoseconds()+30*time.Second.Nanoseconds()),
		EndTs:   time.Unix(0, 2*time.Minute.Nanoseconds()-30*time.Second.Nanoseconds()),
		Limit:   entriesLimit,
	}
	req2 := &LokiRequest{
		StartTs: time.Unix(0, time.Minute.Nanoseconds()),
		EndTs:   time.Unix(0, 2*time.Minute.Nanoseconds()),
		Limit:   entriesLimit,
	}

	fake := newFakeResponse([]mockResponse{
		{
			RequestResponse: queryrangebase.RequestResponse{
				Request:  req1,
				Response: emptyResponse(req1),
			},
		},
		{
			RequestResponse: queryrangebase.RequestResponse{
				Request: &LokiRequest{
					StartTs: time.Unix(0, time.Minute.Nanoseconds()),
					EndTs:   time.Unix(0, time.Minute.Nanoseconds()+30*time.Second.Nanoseconds()),
					Limit:   entriesLimit,
				},
				Response: nonEmptyResponse(&LokiRequest{
					StartTs: time.Unix(0, time.Minute.Nanoseconds()),
					EndTs:   time.Unix(0, time.Minute.Nanoseconds()+30*time.Second.Nanoseconds()),
					Limit:   entriesLimit,
				}, time.Unix(61, 0), time.Unix(61, 0), lblFooBar),
			},
		},
		{
			RequestResponse: queryrangebase.RequestResponse{
				Request: &LokiRequest{
					StartTs: time.Unix(0, 2*time.Minute.Nanoseconds()-30*time.Second.Nanoseconds()),
					EndTs:   time.Unix(0, 2*time.Minute.Nanoseconds()),
Limit: entriesLimit,
},
Response: nonEmptyResponse(&LokiRequest{
StartTs: time.Unix(0, 2*time.Minute.Nanoseconds()-30*time.Second.Nanoseconds()),
EndTs: time.Unix(0, 2*time.Minute.Nanoseconds()),
Limit: entriesLimit,
}, time.Unix(62, 0), time.Unix(62, 0), lblFooBar),
},
},
})
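// wrap the fake downstream handler with the log results cache middleware and replay both requests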
h := lrc.Wrap(fake)
resp, err := h.Do(ctx, req1)
require.NoError(t, err)
require.Equal(t, emptyResponse(req1), resp)
resp, err = h.Do(ctx, req2)
require.NoError(t, err)
require.Equal(t, mergeLokiResponse(
nonEmptyResponse(&LokiRequest{
StartTs: time.Unix(0, 2*time.Minute.Nanoseconds()-30*time.Second.Nanoseconds()),
EndTs: time.Unix(0, 2*time.Minute.Nanoseconds()),
Limit: entriesLimit,
}, time.Unix(62, 0), time.Unix(62, 0), lblFooBar),
nonEmptyResponse(&LokiRequest{
StartTs: time.Unix(0, time.Minute.Nanoseconds()),
EndTs: time.Unix(0, time.Minute.Nanoseconds()+30*time.Second.Nanoseconds()),
Limit: entriesLimit,
}, time.Unix(61, 0), time.Unix(61, 0), lblFooBar),
), resp)
fake.AssertExpectations(t)
}
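// Test_LogResultCacheDifferentRangeNonEmptyAndEmpty tests the scenario where the gap queries around a cached empty
// result return a mix of empty and non-empty responses; only the empty sub-range is expected to extend the cached interval.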
func Test_LogResultCacheDifferentRangeNonEmptyAndEmpty(t *testing.T) {
var (
ctx = user.InjectOrgID(context.Background(), "foo")
lrc = NewLogResultCache(
log.NewNopLogger(),
fakeLimits{
splits: map[string]time.Duration{"foo": time.Minute},
},
cache.NewMockCache(),
nil,
nil,
nil,
)
)
req1 := &LokiRequest{
StartTs: time.Unix(0, time.Minute.Nanoseconds()+30*time.Second.Nanoseconds()),
EndTs: time.Unix(0, 2*time.Minute.Nanoseconds()-30*time.Second.Nanoseconds()),
Limit: entriesLimit,
}
req2 := &LokiRequest{
StartTs: time.Unix(0, time.Minute.Nanoseconds()),
EndTs: time.Unix(0, 2*time.Minute.Nanoseconds()),
Limit: entriesLimit,
}
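// the fake downstream handler expects req1 first, then the gap queries on either side of the cached empty range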
fake := newFakeResponse([]mockResponse{
{
RequestResponse: queryrangebase.RequestResponse{
Request: req1,
Response: emptyResponse(req1),
},
},
{
RequestResponse: queryrangebase.RequestResponse{
Request: &LokiRequest{
StartTs: time.Unix(0, time.Minute.Nanoseconds()),
EndTs: time.Unix(0, time.Minute.Nanoseconds()+30*time.Second.Nanoseconds()),
Limit: entriesLimit,
},
Response: emptyResponse(&LokiRequest{
StartTs: time.Unix(0, time.Minute.Nanoseconds()),
EndTs: time.Unix(0, time.Minute.Nanoseconds()+30*time.Second.Nanoseconds()),
Limit: entriesLimit,
}),
},
},
{
RequestResponse: queryrangebase.RequestResponse{
Request: &LokiRequest{
StartTs: time.Unix(0, 2*time.Minute.Nanoseconds()-30*time.Second.Nanoseconds()),
EndTs: time.Unix(0, 2*time.Minute.Nanoseconds()),
Limit: entriesLimit,
},
Response: nonEmptyResponse(&LokiRequest{
StartTs: time.Unix(0, 2*time.Minute.Nanoseconds()-30*time.Second.Nanoseconds()),
EndTs: time.Unix(0, 2*time.Minute.Nanoseconds()),
Limit: entriesLimit,
}, time.Unix(61, 0), time.Unix(61, 0), lblFooBar),
},
},
// expected twice: req2 is issued two times below and the non-empty result is not cached, so this sub-query repeats
{
RequestResponse: queryrangebase.RequestResponse{
Request: &LokiRequest{
StartTs: time.Unix(0, 2*time.Minute.Nanoseconds()-30*time.Second.Nanoseconds()),
EndTs: time.Unix(0, 2*time.Minute.Nanoseconds()),
Limit: entriesLimit,
},
Response: nonEmptyResponse(&LokiRequest{
StartTs: time.Unix(0, 2*time.Minute.Nanoseconds()-30*time.Second.Nanoseconds()),
EndTs: time.Unix(0, 2*time.Minute.Nanoseconds()),
Limit: entriesLimit,
}, time.Unix(61, 0), time.Unix(61, 0), lblFooBar),
},
},
})
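// wrap the fake downstream handler with the cache middleware; req2 is issued twice to exercise the repeated cache hit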
h := lrc.Wrap(fake)
resp, err := h.Do(ctx, req1)
require.NoError(t, err)
require.Equal(t, emptyResponse(req1), resp)
resp, err = h.Do(ctx, req2)
require.NoError(t, err)
require.Equal(t, mergeLokiResponse(
emptyResponse(req1),
nonEmptyResponse(&LokiRequest{
StartTs: time.Unix(0, time.Minute.Nanoseconds()),
EndTs: time.Unix(0, time.Minute.Nanoseconds()+30*time.Second.Nanoseconds()),
Limit: entriesLimit,
}, time.Unix(61, 0), time.Unix(61, 0), lblFooBar),
), resp)
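// issuing req2 again should re-query only the non-empty sub-range; the empty part is served from the cache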
resp, err = h.Do(ctx, req2)
require.NoError(t, err)
require.Equal(t, mergeLokiResponse(
emptyResponse(req1),
nonEmptyResponse(&LokiRequest{
StartTs: time.Unix(0, time.Minute.Nanoseconds()),
EndTs: time.Unix(0, time.Minute.Nanoseconds()+30*time.Second.Nanoseconds()),
Limit: entriesLimit,
}, time.Unix(61, 0), time.Unix(61, 0), lblFooBar),
), resp)
fake.AssertExpectations(t)
}
// Test_LogResultNonOverlappingCache tests the scenario where the cached query does not overlap with the new request
func Test_LogResultNonOverlappingCache(t *testing.T) {
metrics := NewLogResultCacheMetrics(prometheus.NewPedanticRegistry())
mockCache := cache.NewMockCache()
var (
ctx = user.InjectOrgID(context.Background(), "foo")
lrc = NewLogResultCache(
log.NewNopLogger(),
fakeLimits{
splits: map[string]time.Duration{"foo": time.Minute},
},
mockCache,
nil,
nil,
metrics,
)
)
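// helper to assert the cache hit and miss counters recorded by the middleware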
checkCacheMetrics := func(expectedHits, expectedMisses int) {
require.Equal(t, float64(expectedHits), testutil.ToFloat64(metrics.CacheHit))
require.Equal(t, float64(expectedMisses), testutil.ToFloat64(metrics.CacheMiss))
}
// data requested for just 1 sec, resulting in empty response
req1 := &LokiRequest{
StartTs: time.Unix(0, time.Minute.Nanoseconds()+30*time.Second.Nanoseconds()),
EndTs: time.Unix(0, time.Minute.Nanoseconds()+31*time.Second.Nanoseconds()),
Limit: entriesLimit,
}
// data requested for just 1 sec (non-overlapping with req1), resulting in empty response
req2 := &LokiRequest{
StartTs: time.Unix(0, time.Minute.Nanoseconds()+24*time.Second.Nanoseconds()),
EndTs: time.Unix(0, time.Minute.Nanoseconds()+25*time.Second.Nanoseconds()),
Limit: entriesLimit,
}
// data requested for a larger interval than req1 (overlapping with req2), returns empty response
req3 := &LokiRequest{
StartTs: time.Unix(0, time.Minute.Nanoseconds()+24*time.Second.Nanoseconds()),
EndTs: time.Unix(0, time.Minute.Nanoseconds()+29*time.Second.Nanoseconds()),
Limit: entriesLimit,
}
// data requested for a larger interval than req3 (non-overlapping), returns non-empty response
req4 := &LokiRequest{
StartTs: time.Unix(0, time.Minute.Nanoseconds()+10*time.Second.Nanoseconds()),
EndTs: time.Unix(0, time.Minute.Nanoseconds()+20*time.Second.Nanoseconds()),
Limit: entriesLimit,
}
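// Taken together, req1-req4 exercise the cache-hit handling described by the per-request comments
// below: req2 and req3 do not overlap the cached (req1) range, so each should be queried only for
// its own range; only req3 (empty response over a larger interval than req1) should update the
// cache, while req4 returns data and should leave the cache untouched. The fake handler below
// registers the downstream calls the middleware is expected to make for each request (the
// assertion mechanics live in newFakeResponse, defined elsewhere in this file).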
fake := newFakeResponse([]mockResponse{
{
RequestResponse: queryrangebase.RequestResponse{
Request: req1,
Response: emptyResponse(req1),
},
},
// req2 should do query for just its query range and should not update the cache
{
RequestResponse: queryrangebase.RequestResponse{
Request: &LokiRequest{
StartTs: time.Unix(0, time.Minute.Nanoseconds()+24*time.Second.Nanoseconds()),
EndTs: time.Unix(0, time.Minute.Nanoseconds()+25*time.Second.Nanoseconds()),
Limit: entriesLimit,
},
Response: emptyResponse(&LokiRequest{
StartTs: time.Unix(0, time.Minute.Nanoseconds()+24*time.Second.Nanoseconds()),
EndTs: time.Unix(0, time.Minute.Nanoseconds()+25*time.Second.Nanoseconds()),
Limit: entriesLimit,
}),
},
},
// req3 should do query for just its query range and should update the cache
{
RequestResponse: queryrangebase.RequestResponse{
Request: &LokiRequest{
StartTs: time.Unix(0, time.Minute.Nanoseconds()+24*time.Second.Nanoseconds()),
EndTs: time.Unix(0, time.Minute.Nanoseconds()+29*time.Second.Nanoseconds()),
Limit: entriesLimit,
},
Response: emptyResponse(&LokiRequest{
StartTs: time.Unix(0, time.Minute.Nanoseconds()+24*time.Second.Nanoseconds()),
EndTs: time.Unix(0, time.Minute.Nanoseconds()+29*time.Second.Nanoseconds()),
Limit: entriesLimit,
}),
},
},
// req4 should do query for its query range. Data would be non-empty so cache should not be updated
{
RequestResponse: queryrangebase.RequestResponse{
Request: &LokiRequest{
StartTs: time.Unix(0, time.Minute.Nanoseconds()+10*time.Second.Nanoseconds()),
EndTs: time.Unix(0, time.Minute.Nanoseconds()+20*time.Second.Nanoseconds()),
Limit: entriesLimit,
},
Response: nonEmptyResponse(&LokiRequest{
StartTs: time.Unix(0, time.Minute.Nanoseconds()+10*time.Second.Nanoseconds()),
EndTs: time.Unix(0, time.Minute.Nanoseconds()+20*time.Second.Nanoseconds()),
Limit: entriesLimit,
}, time.Unix(71, 0), time.Unix(79, 0), lblFooBar),
},
},
})
h := lrc.Wrap(fake)
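	// req1 should be a cache miss; its empty response gets cached.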
resp, err := h.Do(ctx, req1)
require.NoError(t, err)
require.Equal(t, emptyResponse(req1), resp)
checkCacheMetrics(0, 1)
require.Equal(t, 1, mockCache.NumKeyUpdates())
	// req2 should not update the cache since it has the same length as the previously cached query
resp, err = h.Do(ctx, req2)
require.NoError(t, err)
require.Equal(t, emptyResponse(req2), resp)
checkCacheMetrics(1, 1)
require.Equal(t, 1, mockCache.NumKeyUpdates())
	// req3 should update the cache since it covers a larger interval than the previously cached query
resp, err = h.Do(ctx, req3)
require.NoError(t, err)
require.Equal(t, emptyResponse(req3), resp)
checkCacheMetrics(2, 1)
require.Equal(t, 2, mockCache.NumKeyUpdates())
	// req4 returns a non-empty response, so it should not update the cache
resp, err = h.Do(ctx, req4)
require.NoError(t, err)
require.Equal(t, nonEmptyResponse(req4, time.Unix(71, 0), time.Unix(79, 0), lblFooBar), resp)
checkCacheMetrics(3, 1)
require.Equal(t, 2, mockCache.NumKeyUpdates())
	// req2 should return an empty response straight from the cache, without updating the cache
resp, err = h.Do(ctx, req2)
require.NoError(t, err)
require.Equal(t, emptyResponse(req2), resp)
checkCacheMetrics(4, 1)
require.Equal(t, 2, mockCache.NumKeyUpdates())
fake.AssertExpectations(t)
}
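// TestExtractLokiResponse checks that extractLokiResponse trims a response down to
// the entries whose timestamps fall in the half-open interval [extractFrom, extractThrough).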
func TestExtractLokiResponse(t *testing.T) {
for _, tc := range []struct {
name string
resp *LokiResponse
extractFrom time.Time
extractThrough time.Time
expectedResp *LokiResponse
}{
{
name: "nothing to extract",
resp: &LokiResponse{},
extractFrom: time.Unix(0, 0),
extractThrough: time.Unix(0, 1),
expectedResp: &LokiResponse{
Data: LokiData{Result: logproto.Streams{}},
},
},
{
name: "extract interval within response",
resp: mergeLokiResponse(
nonEmptyResponse(&LokiRequest{Limit: entriesLimit}, time.Unix(0, 0), time.Unix(10, 0), lblFooBar),
nonEmptyResponse(&LokiRequest{Limit: entriesLimit}, time.Unix(2, 0), time.Unix(8, 0), lblFizzBuzz),
),
extractFrom: time.Unix(4, 0),
extractThrough: time.Unix(7, 0),
expectedResp: mergeLokiResponse(
nonEmptyResponse(&LokiRequest{Limit: entriesLimit}, time.Unix(4, 0), time.Unix(6, 0), lblFooBar),
nonEmptyResponse(&LokiRequest{Limit: entriesLimit}, time.Unix(4, 0), time.Unix(6, 0), lblFizzBuzz),
),
},
{
name: "extract part of response in the beginning",
resp: mergeLokiResponse(
nonEmptyResponse(&LokiRequest{Limit: entriesLimit}, time.Unix(0, 0), time.Unix(10, 0), lblFooBar),
nonEmptyResponse(&LokiRequest{Limit: entriesLimit}, time.Unix(2, 0), time.Unix(8, 0), lblFizzBuzz),
nonEmptyResponse(&LokiRequest{Limit: entriesLimit}, time.Unix(5, 0), time.Unix(8, 0), `{not="included"}`),
),
extractFrom: time.Unix(0, 0),
extractThrough: time.Unix(4, 0),
expectedResp: mergeLokiResponse(
nonEmptyResponse(&LokiRequest{Limit: entriesLimit}, time.Unix(0, 0), time.Unix(3, 0), lblFooBar),
nonEmptyResponse(&LokiRequest{Limit: entriesLimit}, time.Unix(2, 0), time.Unix(3, 0), lblFizzBuzz),
&LokiResponse{},
),
},
{
name: "extract part of response in the end",
resp: mergeLokiResponse(
nonEmptyResponse(&LokiRequest{Limit: entriesLimit}, time.Unix(0, 0), time.Unix(10, 0), lblFooBar),
nonEmptyResponse(&LokiRequest{Limit: entriesLimit}, time.Unix(2, 0), time.Unix(8, 0), lblFizzBuzz),
nonEmptyResponse(&LokiRequest{Limit: entriesLimit}, time.Unix(0, 0), time.Unix(2, 0), `{not="included"}`),
),
extractFrom: time.Unix(4, 0),
extractThrough: time.Unix(12, 0),
expectedResp: mergeLokiResponse(
nonEmptyResponse(&LokiRequest{Limit: entriesLimit}, time.Unix(4, 0), time.Unix(10, 0), lblFooBar),
nonEmptyResponse(&LokiRequest{Limit: entriesLimit}, time.Unix(4, 0), time.Unix(8, 0), lblFizzBuzz),
&LokiResponse{},
),
},
{
name: "extract interval out of data range",
resp: mergeLokiResponse(
nonEmptyResponse(&LokiRequest{Limit: entriesLimit}, time.Unix(0, 0), time.Unix(10, 0), lblFooBar),
nonEmptyResponse(&LokiRequest{Limit: entriesLimit}, time.Unix(2, 0), time.Unix(8, 0), lblFizzBuzz),
nonEmptyResponse(&LokiRequest{Limit: entriesLimit}, time.Unix(0, 0), time.Unix(2, 0), `{not="included"}`),
),
extractFrom: time.Unix(50, 0),
extractThrough: time.Unix(52, 0),
expectedResp: mergeLokiResponse(
				// empty responses here avoid failing the test due to a difference in the count of subqueries in the query stats
&LokiResponse{Limit: entriesLimit},
&LokiResponse{Limit: entriesLimit},
&LokiResponse{Limit: entriesLimit},
),
},
} {
t.Run(tc.name, func(t *testing.T) {
require.Equal(t, tc.expectedResp, extractLokiResponse(tc.extractFrom, tc.extractThrough, tc.resp))
})
}
}
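// fakeResponse is a test double for the downstream query handler, backed by testify's mock.Mock.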
type fakeResponse struct {
*mock.Mock
}
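// mockResponse ties an expected request/response pair to an optional error returned by the fake handler.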
type mockResponse struct {
queryrangebase.RequestResponse
err error
}
func newFakeResponse(responses []mockResponse) fakeResponse {
m := &mock.Mock{}
for _, r := range responses {
m.On("Do", mock.Anything, r.Request).Return(r.Response, r.err).Once()
}
return fakeResponse{
Mock: m,
}
}
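// Do replays the canned responses registered via newFakeResponse for the given request.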
func (f fakeResponse) Do(ctx context.Context, r queryrangebase.Request) (queryrangebase.Response, error) {
var (
resp queryrangebase.Response
err error
args = f.Mock.Called(ctx, r)
)
if args.Get(0) != nil {
resp = args.Get(0).(queryrangebase.Response)
}
if args.Get(1) != nil {
err = args.Get(1).(error)
}
return resp, err
}
// nonEmptyResponse builds a response with one entry per second over [start, end], inclusive.
func nonEmptyResponse(lokiReq *LokiRequest, start, end time.Time, labels string) *LokiResponse {
r := &LokiResponse{
Status: loghttp.QueryStatusSuccess,
Statistics: stats.Result{},
Direction: lokiReq.Direction,
Limit: lokiReq.Limit,
Version: uint32(loghttp.GetVersion(lokiReq.Path)),
Data: LokiData{
ResultType: loghttp.ResultTypeStream,
Result: []logproto.Stream{
{
Labels: labels,
},
},
},
}
for ; !start.After(end); start = start.Add(time.Second) {
r.Data.Result[0].Entries = append(r.Data.Result[0].Entries, logproto.Entry{
Timestamp: start,
Line: fmt.Sprintf("%d", start.Unix()),
})
}
return r
}