Like Prometheus, but for logs.
You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
loki/pkg/querier/queryrange/log_result_cache_test.go

672 lines
20 KiB

package queryrange
import (
"context"
"fmt"
"testing"
"time"
"github.com/go-kit/log"
"github.com/stretchr/testify/mock"
"github.com/stretchr/testify/require"
"github.com/weaveworks/common/user"
"github.com/grafana/loki/pkg/loghttp"
"github.com/grafana/loki/pkg/logproto"
"github.com/grafana/loki/pkg/logqlmodel/stats"
"github.com/grafana/loki/pkg/querier/queryrange/queryrangebase"
"github.com/grafana/loki/pkg/storage/chunk/cache"
)
fix bugs in logs results caching and its tests (#7925) **What this PR does / why we need it**: When a logs query results in an empty response, we cache it to avoid doing that query again and respond straight away with an empty response. However, we cache a single entry per time split interval with the query itself to keep things simple. For example, if the time split config for tenant `A` is `30m`, then queries for intervals `10m`-`20m` and `21m`-`25m` would have the same cache key. Here is roughly how cache hit is handled: * If the new query is within the cached query bounds, return empty results * If the start of new query is before the start time of the cached query, do a query from `newQuery.Start` to `cachedQuery.Start` * If the response of last query is also empty, set `cachedQuery.Start` = `newQuery.Start` * If the end of new query is after the end time of the cached query, do a query from `cachedQuery.End` to `newQuery.Start` * If the response of last query is also empty, set `cachedQuery.End` = `newQuery.End` * If we have changes in `cachedQuery.Start/End`, update it in the cache. The problem here is when we do queries to fill the gap, we sometimes do queries for the range outside of what the user requested and respond back without reducing the response to what the user requested. For example, if the cached query is from `21m`-`25m` and the user query is from `10m`-`15m`, we will query for the whole gap i.e `10m`-`21m`. If there are logs from `15m`-`21m` in the response, we will unexpectedly send it back to the user. This PR takes care of this issue by extracting the data and sending back only the user's requested logs. I have also found the tests for logs results cache were incorrect. They heavily use [mergeLokiResponse](https://github.com/grafana/loki/blob/e2842c69c5e089389b697faa792e368f42f84c19/pkg/querier/queryrange/codec.go#L1014) for building the test data, but they were not setting the `limit` in test queries which was causing `mergeLokiResponse` to send back empty results. This means were were always comparing two empty results, which would always be the same. This PR also takes care of fixing it, and relevant changes to correct the test. **Checklist** - [x] Tests updated - [x] `CHANGELOG.md` updated
3 years ago
const (
lblFooBar = `{foo="bar"}`
lblFizzBuzz = `{fizz="buzz"}`
entriesLimit = 1000
)
func Test_LogResultCacheSameRange(t *testing.T) {
var (
ctx = user.InjectOrgID(context.Background(), "foo")
lrc = NewLogResultCache(
log.NewNopLogger(),
fakeLimits{
splits: map[string]time.Duration{"foo": time.Minute},
},
cache.NewMockCache(),
nil,
nil,
nil,
)
)
req := &LokiRequest{
StartTs: time.Unix(0, time.Minute.Nanoseconds()),
EndTs: time.Unix(0, 2*time.Minute.Nanoseconds()),
fix bugs in logs results caching and its tests (#7925) **What this PR does / why we need it**: When a logs query results in an empty response, we cache it to avoid doing that query again and respond straight away with an empty response. However, we cache a single entry per time split interval with the query itself to keep things simple. For example, if the time split config for tenant `A` is `30m`, then queries for intervals `10m`-`20m` and `21m`-`25m` would have the same cache key. Here is roughly how cache hit is handled: * If the new query is within the cached query bounds, return empty results * If the start of new query is before the start time of the cached query, do a query from `newQuery.Start` to `cachedQuery.Start` * If the response of last query is also empty, set `cachedQuery.Start` = `newQuery.Start` * If the end of new query is after the end time of the cached query, do a query from `cachedQuery.End` to `newQuery.Start` * If the response of last query is also empty, set `cachedQuery.End` = `newQuery.End` * If we have changes in `cachedQuery.Start/End`, update it in the cache. The problem here is when we do queries to fill the gap, we sometimes do queries for the range outside of what the user requested and respond back without reducing the response to what the user requested. For example, if the cached query is from `21m`-`25m` and the user query is from `10m`-`15m`, we will query for the whole gap i.e `10m`-`21m`. If there are logs from `15m`-`21m` in the response, we will unexpectedly send it back to the user. This PR takes care of this issue by extracting the data and sending back only the user's requested logs. I have also found the tests for logs results cache were incorrect. They heavily use [mergeLokiResponse](https://github.com/grafana/loki/blob/e2842c69c5e089389b697faa792e368f42f84c19/pkg/querier/queryrange/codec.go#L1014) for building the test data, but they were not setting the `limit` in test queries which was causing `mergeLokiResponse` to send back empty results. This means were were always comparing two empty results, which would always be the same. This PR also takes care of fixing it, and relevant changes to correct the test. **Checklist** - [x] Tests updated - [x] `CHANGELOG.md` updated
3 years ago
Limit: entriesLimit,
}
fake := newFakeResponse([]mockResponse{
{
RequestResponse: queryrangebase.RequestResponse{
Request: req,
Response: emptyResponse(req),
},
},
})
h := lrc.Wrap(fake)
resp, err := h.Do(ctx, req)
require.NoError(t, err)
require.Equal(t, emptyResponse(req), resp)
resp, err = h.Do(ctx, req)
require.NoError(t, err)
require.Equal(t, emptyResponse(req), resp)
fake.AssertExpectations(t)
}
func Test_LogResultCacheSameRangeNonEmpty(t *testing.T) {
var (
ctx = user.InjectOrgID(context.Background(), "foo")
lrc = NewLogResultCache(
log.NewNopLogger(),
fakeLimits{
splits: map[string]time.Duration{"foo": time.Minute},
},
cache.NewMockCache(),
nil,
nil,
nil,
)
)
req := &LokiRequest{
StartTs: time.Unix(0, time.Minute.Nanoseconds()),
EndTs: time.Unix(0, 2*time.Minute.Nanoseconds()),
fix bugs in logs results caching and its tests (#7925) **What this PR does / why we need it**: When a logs query results in an empty response, we cache it to avoid doing that query again and respond straight away with an empty response. However, we cache a single entry per time split interval with the query itself to keep things simple. For example, if the time split config for tenant `A` is `30m`, then queries for intervals `10m`-`20m` and `21m`-`25m` would have the same cache key. Here is roughly how cache hit is handled: * If the new query is within the cached query bounds, return empty results * If the start of new query is before the start time of the cached query, do a query from `newQuery.Start` to `cachedQuery.Start` * If the response of last query is also empty, set `cachedQuery.Start` = `newQuery.Start` * If the end of new query is after the end time of the cached query, do a query from `cachedQuery.End` to `newQuery.Start` * If the response of last query is also empty, set `cachedQuery.End` = `newQuery.End` * If we have changes in `cachedQuery.Start/End`, update it in the cache. The problem here is when we do queries to fill the gap, we sometimes do queries for the range outside of what the user requested and respond back without reducing the response to what the user requested. For example, if the cached query is from `21m`-`25m` and the user query is from `10m`-`15m`, we will query for the whole gap i.e `10m`-`21m`. If there are logs from `15m`-`21m` in the response, we will unexpectedly send it back to the user. This PR takes care of this issue by extracting the data and sending back only the user's requested logs. I have also found the tests for logs results cache were incorrect. They heavily use [mergeLokiResponse](https://github.com/grafana/loki/blob/e2842c69c5e089389b697faa792e368f42f84c19/pkg/querier/queryrange/codec.go#L1014) for building the test data, but they were not setting the `limit` in test queries which was causing `mergeLokiResponse` to send back empty results. This means were were always comparing two empty results, which would always be the same. This PR also takes care of fixing it, and relevant changes to correct the test. **Checklist** - [x] Tests updated - [x] `CHANGELOG.md` updated
3 years ago
Limit: entriesLimit,
}
fake := newFakeResponse([]mockResponse{
{
RequestResponse: queryrangebase.RequestResponse{
Request: req,
fix bugs in logs results caching and its tests (#7925) **What this PR does / why we need it**: When a logs query results in an empty response, we cache it to avoid doing that query again and respond straight away with an empty response. However, we cache a single entry per time split interval with the query itself to keep things simple. For example, if the time split config for tenant `A` is `30m`, then queries for intervals `10m`-`20m` and `21m`-`25m` would have the same cache key. Here is roughly how cache hit is handled: * If the new query is within the cached query bounds, return empty results * If the start of new query is before the start time of the cached query, do a query from `newQuery.Start` to `cachedQuery.Start` * If the response of last query is also empty, set `cachedQuery.Start` = `newQuery.Start` * If the end of new query is after the end time of the cached query, do a query from `cachedQuery.End` to `newQuery.Start` * If the response of last query is also empty, set `cachedQuery.End` = `newQuery.End` * If we have changes in `cachedQuery.Start/End`, update it in the cache. The problem here is when we do queries to fill the gap, we sometimes do queries for the range outside of what the user requested and respond back without reducing the response to what the user requested. For example, if the cached query is from `21m`-`25m` and the user query is from `10m`-`15m`, we will query for the whole gap i.e `10m`-`21m`. If there are logs from `15m`-`21m` in the response, we will unexpectedly send it back to the user. This PR takes care of this issue by extracting the data and sending back only the user's requested logs. I have also found the tests for logs results cache were incorrect. They heavily use [mergeLokiResponse](https://github.com/grafana/loki/blob/e2842c69c5e089389b697faa792e368f42f84c19/pkg/querier/queryrange/codec.go#L1014) for building the test data, but they were not setting the `limit` in test queries which was causing `mergeLokiResponse` to send back empty results. This means were were always comparing two empty results, which would always be the same. This PR also takes care of fixing it, and relevant changes to correct the test. **Checklist** - [x] Tests updated - [x] `CHANGELOG.md` updated
3 years ago
Response: nonEmptyResponse(req, time.Unix(61, 0), time.Unix(61, 0), lblFooBar),
},
},
{
RequestResponse: queryrangebase.RequestResponse{
Request: req,
fix bugs in logs results caching and its tests (#7925) **What this PR does / why we need it**: When a logs query results in an empty response, we cache it to avoid doing that query again and respond straight away with an empty response. However, we cache a single entry per time split interval with the query itself to keep things simple. For example, if the time split config for tenant `A` is `30m`, then queries for intervals `10m`-`20m` and `21m`-`25m` would have the same cache key. Here is roughly how cache hit is handled: * If the new query is within the cached query bounds, return empty results * If the start of new query is before the start time of the cached query, do a query from `newQuery.Start` to `cachedQuery.Start` * If the response of last query is also empty, set `cachedQuery.Start` = `newQuery.Start` * If the end of new query is after the end time of the cached query, do a query from `cachedQuery.End` to `newQuery.Start` * If the response of last query is also empty, set `cachedQuery.End` = `newQuery.End` * If we have changes in `cachedQuery.Start/End`, update it in the cache. The problem here is when we do queries to fill the gap, we sometimes do queries for the range outside of what the user requested and respond back without reducing the response to what the user requested. For example, if the cached query is from `21m`-`25m` and the user query is from `10m`-`15m`, we will query for the whole gap i.e `10m`-`21m`. If there are logs from `15m`-`21m` in the response, we will unexpectedly send it back to the user. This PR takes care of this issue by extracting the data and sending back only the user's requested logs. I have also found the tests for logs results cache were incorrect. They heavily use [mergeLokiResponse](https://github.com/grafana/loki/blob/e2842c69c5e089389b697faa792e368f42f84c19/pkg/querier/queryrange/codec.go#L1014) for building the test data, but they were not setting the `limit` in test queries which was causing `mergeLokiResponse` to send back empty results. This means were were always comparing two empty results, which would always be the same. This PR also takes care of fixing it, and relevant changes to correct the test. **Checklist** - [x] Tests updated - [x] `CHANGELOG.md` updated
3 years ago
Response: nonEmptyResponse(req, time.Unix(62, 0), time.Unix(62, 0), lblFooBar),
},
},
})
h := lrc.Wrap(fake)
resp, err := h.Do(ctx, req)
require.NoError(t, err)
fix bugs in logs results caching and its tests (#7925) **What this PR does / why we need it**: When a logs query results in an empty response, we cache it to avoid doing that query again and respond straight away with an empty response. However, we cache a single entry per time split interval with the query itself to keep things simple. For example, if the time split config for tenant `A` is `30m`, then queries for intervals `10m`-`20m` and `21m`-`25m` would have the same cache key. Here is roughly how cache hit is handled: * If the new query is within the cached query bounds, return empty results * If the start of new query is before the start time of the cached query, do a query from `newQuery.Start` to `cachedQuery.Start` * If the response of last query is also empty, set `cachedQuery.Start` = `newQuery.Start` * If the end of new query is after the end time of the cached query, do a query from `cachedQuery.End` to `newQuery.Start` * If the response of last query is also empty, set `cachedQuery.End` = `newQuery.End` * If we have changes in `cachedQuery.Start/End`, update it in the cache. The problem here is when we do queries to fill the gap, we sometimes do queries for the range outside of what the user requested and respond back without reducing the response to what the user requested. For example, if the cached query is from `21m`-`25m` and the user query is from `10m`-`15m`, we will query for the whole gap i.e `10m`-`21m`. If there are logs from `15m`-`21m` in the response, we will unexpectedly send it back to the user. This PR takes care of this issue by extracting the data and sending back only the user's requested logs. I have also found the tests for logs results cache were incorrect. They heavily use [mergeLokiResponse](https://github.com/grafana/loki/blob/e2842c69c5e089389b697faa792e368f42f84c19/pkg/querier/queryrange/codec.go#L1014) for building the test data, but they were not setting the `limit` in test queries which was causing `mergeLokiResponse` to send back empty results. This means were were always comparing two empty results, which would always be the same. This PR also takes care of fixing it, and relevant changes to correct the test. **Checklist** - [x] Tests updated - [x] `CHANGELOG.md` updated
3 years ago
require.Equal(t, nonEmptyResponse(req, time.Unix(61, 0), time.Unix(61, 0), lblFooBar), resp)
resp, err = h.Do(ctx, req)
require.NoError(t, err)
fix bugs in logs results caching and its tests (#7925) **What this PR does / why we need it**: When a logs query results in an empty response, we cache it to avoid doing that query again and respond straight away with an empty response. However, we cache a single entry per time split interval with the query itself to keep things simple. For example, if the time split config for tenant `A` is `30m`, then queries for intervals `10m`-`20m` and `21m`-`25m` would have the same cache key. Here is roughly how cache hit is handled: * If the new query is within the cached query bounds, return empty results * If the start of new query is before the start time of the cached query, do a query from `newQuery.Start` to `cachedQuery.Start` * If the response of last query is also empty, set `cachedQuery.Start` = `newQuery.Start` * If the end of new query is after the end time of the cached query, do a query from `cachedQuery.End` to `newQuery.Start` * If the response of last query is also empty, set `cachedQuery.End` = `newQuery.End` * If we have changes in `cachedQuery.Start/End`, update it in the cache. The problem here is when we do queries to fill the gap, we sometimes do queries for the range outside of what the user requested and respond back without reducing the response to what the user requested. For example, if the cached query is from `21m`-`25m` and the user query is from `10m`-`15m`, we will query for the whole gap i.e `10m`-`21m`. If there are logs from `15m`-`21m` in the response, we will unexpectedly send it back to the user. This PR takes care of this issue by extracting the data and sending back only the user's requested logs. I have also found the tests for logs results cache were incorrect. They heavily use [mergeLokiResponse](https://github.com/grafana/loki/blob/e2842c69c5e089389b697faa792e368f42f84c19/pkg/querier/queryrange/codec.go#L1014) for building the test data, but they were not setting the `limit` in test queries which was causing `mergeLokiResponse` to send back empty results. This means were were always comparing two empty results, which would always be the same. This PR also takes care of fixing it, and relevant changes to correct the test. **Checklist** - [x] Tests updated - [x] `CHANGELOG.md` updated
3 years ago
require.Equal(t, nonEmptyResponse(req, time.Unix(62, 0), time.Unix(62, 0), lblFooBar), resp)
fake.AssertExpectations(t)
}
func Test_LogResultCacheSmallerRange(t *testing.T) {
var (
ctx = user.InjectOrgID(context.Background(), "foo")
lrc = NewLogResultCache(
log.NewNopLogger(),
fakeLimits{
splits: map[string]time.Duration{"foo": time.Minute},
},
cache.NewMockCache(),
nil,
nil,
nil,
)
)
req := &LokiRequest{
StartTs: time.Unix(0, time.Minute.Nanoseconds()),
EndTs: time.Unix(0, 2*time.Minute.Nanoseconds()),
fix bugs in logs results caching and its tests (#7925) **What this PR does / why we need it**: When a logs query results in an empty response, we cache it to avoid doing that query again and respond straight away with an empty response. However, we cache a single entry per time split interval with the query itself to keep things simple. For example, if the time split config for tenant `A` is `30m`, then queries for intervals `10m`-`20m` and `21m`-`25m` would have the same cache key. Here is roughly how cache hit is handled: * If the new query is within the cached query bounds, return empty results * If the start of new query is before the start time of the cached query, do a query from `newQuery.Start` to `cachedQuery.Start` * If the response of last query is also empty, set `cachedQuery.Start` = `newQuery.Start` * If the end of new query is after the end time of the cached query, do a query from `cachedQuery.End` to `newQuery.Start` * If the response of last query is also empty, set `cachedQuery.End` = `newQuery.End` * If we have changes in `cachedQuery.Start/End`, update it in the cache. The problem here is when we do queries to fill the gap, we sometimes do queries for the range outside of what the user requested and respond back without reducing the response to what the user requested. For example, if the cached query is from `21m`-`25m` and the user query is from `10m`-`15m`, we will query for the whole gap i.e `10m`-`21m`. If there are logs from `15m`-`21m` in the response, we will unexpectedly send it back to the user. This PR takes care of this issue by extracting the data and sending back only the user's requested logs. I have also found the tests for logs results cache were incorrect. They heavily use [mergeLokiResponse](https://github.com/grafana/loki/blob/e2842c69c5e089389b697faa792e368f42f84c19/pkg/querier/queryrange/codec.go#L1014) for building the test data, but they were not setting the `limit` in test queries which was causing `mergeLokiResponse` to send back empty results. This means were were always comparing two empty results, which would always be the same. This PR also takes care of fixing it, and relevant changes to correct the test. **Checklist** - [x] Tests updated - [x] `CHANGELOG.md` updated
3 years ago
Limit: entriesLimit,
}
fake := newFakeResponse([]mockResponse{
{
RequestResponse: queryrangebase.RequestResponse{
Request: req,
Response: emptyResponse(req),
},
},
})
h := lrc.Wrap(fake)
resp, err := h.Do(ctx, req)
require.NoError(t, err)
require.Equal(t, emptyResponse(req), resp)
resp, err = h.Do(ctx, &LokiRequest{
StartTs: time.Unix(0, time.Minute.Nanoseconds()+30*time.Second.Nanoseconds()),
EndTs: time.Unix(0, 2*time.Minute.Nanoseconds()-30*time.Second.Nanoseconds()),
fix bugs in logs results caching and its tests (#7925) **What this PR does / why we need it**: When a logs query results in an empty response, we cache it to avoid doing that query again and respond straight away with an empty response. However, we cache a single entry per time split interval with the query itself to keep things simple. For example, if the time split config for tenant `A` is `30m`, then queries for intervals `10m`-`20m` and `21m`-`25m` would have the same cache key. Here is roughly how cache hit is handled: * If the new query is within the cached query bounds, return empty results * If the start of new query is before the start time of the cached query, do a query from `newQuery.Start` to `cachedQuery.Start` * If the response of last query is also empty, set `cachedQuery.Start` = `newQuery.Start` * If the end of new query is after the end time of the cached query, do a query from `cachedQuery.End` to `newQuery.Start` * If the response of last query is also empty, set `cachedQuery.End` = `newQuery.End` * If we have changes in `cachedQuery.Start/End`, update it in the cache. The problem here is when we do queries to fill the gap, we sometimes do queries for the range outside of what the user requested and respond back without reducing the response to what the user requested. For example, if the cached query is from `21m`-`25m` and the user query is from `10m`-`15m`, we will query for the whole gap i.e `10m`-`21m`. If there are logs from `15m`-`21m` in the response, we will unexpectedly send it back to the user. This PR takes care of this issue by extracting the data and sending back only the user's requested logs. I have also found the tests for logs results cache were incorrect. They heavily use [mergeLokiResponse](https://github.com/grafana/loki/blob/e2842c69c5e089389b697faa792e368f42f84c19/pkg/querier/queryrange/codec.go#L1014) for building the test data, but they were not setting the `limit` in test queries which was causing `mergeLokiResponse` to send back empty results. This means were were always comparing two empty results, which would always be the same. This PR also takes care of fixing it, and relevant changes to correct the test. **Checklist** - [x] Tests updated - [x] `CHANGELOG.md` updated
3 years ago
Limit: entriesLimit,
})
require.NoError(t, err)
require.Equal(t, emptyResponse(&LokiRequest{
StartTs: time.Unix(0, time.Minute.Nanoseconds()+30*time.Second.Nanoseconds()),
EndTs: time.Unix(0, 2*time.Minute.Nanoseconds()-30*time.Second.Nanoseconds()),
fix bugs in logs results caching and its tests (#7925) **What this PR does / why we need it**: When a logs query results in an empty response, we cache it to avoid doing that query again and respond straight away with an empty response. However, we cache a single entry per time split interval with the query itself to keep things simple. For example, if the time split config for tenant `A` is `30m`, then queries for intervals `10m`-`20m` and `21m`-`25m` would have the same cache key. Here is roughly how cache hit is handled: * If the new query is within the cached query bounds, return empty results * If the start of new query is before the start time of the cached query, do a query from `newQuery.Start` to `cachedQuery.Start` * If the response of last query is also empty, set `cachedQuery.Start` = `newQuery.Start` * If the end of new query is after the end time of the cached query, do a query from `cachedQuery.End` to `newQuery.Start` * If the response of last query is also empty, set `cachedQuery.End` = `newQuery.End` * If we have changes in `cachedQuery.Start/End`, update it in the cache. The problem here is when we do queries to fill the gap, we sometimes do queries for the range outside of what the user requested and respond back without reducing the response to what the user requested. For example, if the cached query is from `21m`-`25m` and the user query is from `10m`-`15m`, we will query for the whole gap i.e `10m`-`21m`. If there are logs from `15m`-`21m` in the response, we will unexpectedly send it back to the user. This PR takes care of this issue by extracting the data and sending back only the user's requested logs. I have also found the tests for logs results cache were incorrect. They heavily use [mergeLokiResponse](https://github.com/grafana/loki/blob/e2842c69c5e089389b697faa792e368f42f84c19/pkg/querier/queryrange/codec.go#L1014) for building the test data, but they were not setting the `limit` in test queries which was causing `mergeLokiResponse` to send back empty results. This means were were always comparing two empty results, which would always be the same. This PR also takes care of fixing it, and relevant changes to correct the test. **Checklist** - [x] Tests updated - [x] `CHANGELOG.md` updated
3 years ago
Limit: entriesLimit,
}), resp)
fake.AssertExpectations(t)
}
func Test_LogResultCacheDifferentRange(t *testing.T) {
var (
ctx = user.InjectOrgID(context.Background(), "foo")
lrc = NewLogResultCache(
log.NewNopLogger(),
fakeLimits{
splits: map[string]time.Duration{"foo": time.Minute},
},
cache.NewMockCache(),
nil,
nil,
nil,
)
)
req1 := &LokiRequest{
StartTs: time.Unix(0, time.Minute.Nanoseconds()+30*time.Second.Nanoseconds()),
EndTs: time.Unix(0, 2*time.Minute.Nanoseconds()-30*time.Second.Nanoseconds()),
fix bugs in logs results caching and its tests (#7925) **What this PR does / why we need it**: When a logs query results in an empty response, we cache it to avoid doing that query again and respond straight away with an empty response. However, we cache a single entry per time split interval with the query itself to keep things simple. For example, if the time split config for tenant `A` is `30m`, then queries for intervals `10m`-`20m` and `21m`-`25m` would have the same cache key. Here is roughly how cache hit is handled: * If the new query is within the cached query bounds, return empty results * If the start of new query is before the start time of the cached query, do a query from `newQuery.Start` to `cachedQuery.Start` * If the response of last query is also empty, set `cachedQuery.Start` = `newQuery.Start` * If the end of new query is after the end time of the cached query, do a query from `cachedQuery.End` to `newQuery.Start` * If the response of last query is also empty, set `cachedQuery.End` = `newQuery.End` * If we have changes in `cachedQuery.Start/End`, update it in the cache. The problem here is when we do queries to fill the gap, we sometimes do queries for the range outside of what the user requested and respond back without reducing the response to what the user requested. For example, if the cached query is from `21m`-`25m` and the user query is from `10m`-`15m`, we will query for the whole gap i.e `10m`-`21m`. If there are logs from `15m`-`21m` in the response, we will unexpectedly send it back to the user. This PR takes care of this issue by extracting the data and sending back only the user's requested logs. I have also found the tests for logs results cache were incorrect. They heavily use [mergeLokiResponse](https://github.com/grafana/loki/blob/e2842c69c5e089389b697faa792e368f42f84c19/pkg/querier/queryrange/codec.go#L1014) for building the test data, but they were not setting the `limit` in test queries which was causing `mergeLokiResponse` to send back empty results. This means were were always comparing two empty results, which would always be the same. This PR also takes care of fixing it, and relevant changes to correct the test. **Checklist** - [x] Tests updated - [x] `CHANGELOG.md` updated
3 years ago
Limit: entriesLimit,
}
req2 := &LokiRequest{
StartTs: time.Unix(0, time.Minute.Nanoseconds()),
EndTs: time.Unix(0, 2*time.Minute.Nanoseconds()),
fix bugs in logs results caching and its tests (#7925) **What this PR does / why we need it**: When a logs query results in an empty response, we cache it to avoid doing that query again and respond straight away with an empty response. However, we cache a single entry per time split interval with the query itself to keep things simple. For example, if the time split config for tenant `A` is `30m`, then queries for intervals `10m`-`20m` and `21m`-`25m` would have the same cache key. Here is roughly how cache hit is handled: * If the new query is within the cached query bounds, return empty results * If the start of new query is before the start time of the cached query, do a query from `newQuery.Start` to `cachedQuery.Start` * If the response of last query is also empty, set `cachedQuery.Start` = `newQuery.Start` * If the end of new query is after the end time of the cached query, do a query from `cachedQuery.End` to `newQuery.Start` * If the response of last query is also empty, set `cachedQuery.End` = `newQuery.End` * If we have changes in `cachedQuery.Start/End`, update it in the cache. The problem here is when we do queries to fill the gap, we sometimes do queries for the range outside of what the user requested and respond back without reducing the response to what the user requested. For example, if the cached query is from `21m`-`25m` and the user query is from `10m`-`15m`, we will query for the whole gap i.e `10m`-`21m`. If there are logs from `15m`-`21m` in the response, we will unexpectedly send it back to the user. This PR takes care of this issue by extracting the data and sending back only the user's requested logs. I have also found the tests for logs results cache were incorrect. They heavily use [mergeLokiResponse](https://github.com/grafana/loki/blob/e2842c69c5e089389b697faa792e368f42f84c19/pkg/querier/queryrange/codec.go#L1014) for building the test data, but they were not setting the `limit` in test queries which was causing `mergeLokiResponse` to send back empty results. This means were were always comparing two empty results, which would always be the same. This PR also takes care of fixing it, and relevant changes to correct the test. **Checklist** - [x] Tests updated - [x] `CHANGELOG.md` updated
3 years ago
Limit: entriesLimit,
}
fake := newFakeResponse([]mockResponse{
{
RequestResponse: queryrangebase.RequestResponse{
Request: req1,
Response: emptyResponse(req1),
},
},
{
RequestResponse: queryrangebase.RequestResponse{
Request: &LokiRequest{
StartTs: time.Unix(0, time.Minute.Nanoseconds()),
EndTs: time.Unix(0, time.Minute.Nanoseconds()+30*time.Second.Nanoseconds()),
fix bugs in logs results caching and its tests (#7925) **What this PR does / why we need it**: When a logs query results in an empty response, we cache it to avoid doing that query again and respond straight away with an empty response. However, we cache a single entry per time split interval with the query itself to keep things simple. For example, if the time split config for tenant `A` is `30m`, then queries for intervals `10m`-`20m` and `21m`-`25m` would have the same cache key. Here is roughly how cache hit is handled: * If the new query is within the cached query bounds, return empty results * If the start of new query is before the start time of the cached query, do a query from `newQuery.Start` to `cachedQuery.Start` * If the response of last query is also empty, set `cachedQuery.Start` = `newQuery.Start` * If the end of new query is after the end time of the cached query, do a query from `cachedQuery.End` to `newQuery.Start` * If the response of last query is also empty, set `cachedQuery.End` = `newQuery.End` * If we have changes in `cachedQuery.Start/End`, update it in the cache. The problem here is when we do queries to fill the gap, we sometimes do queries for the range outside of what the user requested and respond back without reducing the response to what the user requested. For example, if the cached query is from `21m`-`25m` and the user query is from `10m`-`15m`, we will query for the whole gap i.e `10m`-`21m`. If there are logs from `15m`-`21m` in the response, we will unexpectedly send it back to the user. This PR takes care of this issue by extracting the data and sending back only the user's requested logs. I have also found the tests for logs results cache were incorrect. They heavily use [mergeLokiResponse](https://github.com/grafana/loki/blob/e2842c69c5e089389b697faa792e368f42f84c19/pkg/querier/queryrange/codec.go#L1014) for building the test data, but they were not setting the `limit` in test queries which was causing `mergeLokiResponse` to send back empty results. This means were were always comparing two empty results, which would always be the same. This PR also takes care of fixing it, and relevant changes to correct the test. **Checklist** - [x] Tests updated - [x] `CHANGELOG.md` updated
3 years ago
Limit: entriesLimit,
},
Response: emptyResponse(&LokiRequest{
StartTs: time.Unix(0, time.Minute.Nanoseconds()),
EndTs: time.Unix(0, time.Minute.Nanoseconds()+30*time.Second.Nanoseconds()),
fix bugs in logs results caching and its tests (#7925) **What this PR does / why we need it**: When a logs query results in an empty response, we cache it to avoid doing that query again and respond straight away with an empty response. However, we cache a single entry per time split interval with the query itself to keep things simple. For example, if the time split config for tenant `A` is `30m`, then queries for intervals `10m`-`20m` and `21m`-`25m` would have the same cache key. Here is roughly how cache hit is handled: * If the new query is within the cached query bounds, return empty results * If the start of new query is before the start time of the cached query, do a query from `newQuery.Start` to `cachedQuery.Start` * If the response of last query is also empty, set `cachedQuery.Start` = `newQuery.Start` * If the end of new query is after the end time of the cached query, do a query from `cachedQuery.End` to `newQuery.Start` * If the response of last query is also empty, set `cachedQuery.End` = `newQuery.End` * If we have changes in `cachedQuery.Start/End`, update it in the cache. The problem here is when we do queries to fill the gap, we sometimes do queries for the range outside of what the user requested and respond back without reducing the response to what the user requested. For example, if the cached query is from `21m`-`25m` and the user query is from `10m`-`15m`, we will query for the whole gap i.e `10m`-`21m`. If there are logs from `15m`-`21m` in the response, we will unexpectedly send it back to the user. This PR takes care of this issue by extracting the data and sending back only the user's requested logs. I have also found the tests for logs results cache were incorrect. They heavily use [mergeLokiResponse](https://github.com/grafana/loki/blob/e2842c69c5e089389b697faa792e368f42f84c19/pkg/querier/queryrange/codec.go#L1014) for building the test data, but they were not setting the `limit` in test queries which was causing `mergeLokiResponse` to send back empty results. This means were were always comparing two empty results, which would always be the same. This PR also takes care of fixing it, and relevant changes to correct the test. **Checklist** - [x] Tests updated - [x] `CHANGELOG.md` updated
3 years ago
Limit: entriesLimit,
}),
},
},
{
RequestResponse: queryrangebase.RequestResponse{
Request: &LokiRequest{
StartTs: time.Unix(0, 2*time.Minute.Nanoseconds()-30*time.Second.Nanoseconds()),
EndTs: time.Unix(0, 2*time.Minute.Nanoseconds()),
fix bugs in logs results caching and its tests (#7925) **What this PR does / why we need it**: When a logs query results in an empty response, we cache it to avoid doing that query again and respond straight away with an empty response. However, we cache a single entry per time split interval with the query itself to keep things simple. For example, if the time split config for tenant `A` is `30m`, then queries for intervals `10m`-`20m` and `21m`-`25m` would have the same cache key. Here is roughly how cache hit is handled: * If the new query is within the cached query bounds, return empty results * If the start of new query is before the start time of the cached query, do a query from `newQuery.Start` to `cachedQuery.Start` * If the response of last query is also empty, set `cachedQuery.Start` = `newQuery.Start` * If the end of new query is after the end time of the cached query, do a query from `cachedQuery.End` to `newQuery.Start` * If the response of last query is also empty, set `cachedQuery.End` = `newQuery.End` * If we have changes in `cachedQuery.Start/End`, update it in the cache. The problem here is when we do queries to fill the gap, we sometimes do queries for the range outside of what the user requested and respond back without reducing the response to what the user requested. For example, if the cached query is from `21m`-`25m` and the user query is from `10m`-`15m`, we will query for the whole gap i.e `10m`-`21m`. If there are logs from `15m`-`21m` in the response, we will unexpectedly send it back to the user. This PR takes care of this issue by extracting the data and sending back only the user's requested logs. I have also found the tests for logs results cache were incorrect. They heavily use [mergeLokiResponse](https://github.com/grafana/loki/blob/e2842c69c5e089389b697faa792e368f42f84c19/pkg/querier/queryrange/codec.go#L1014) for building the test data, but they were not setting the `limit` in test queries which was causing `mergeLokiResponse` to send back empty results. This means were were always comparing two empty results, which would always be the same. This PR also takes care of fixing it, and relevant changes to correct the test. **Checklist** - [x] Tests updated - [x] `CHANGELOG.md` updated
3 years ago
Limit: entriesLimit,
},
Response: emptyResponse(&LokiRequest{
StartTs: time.Unix(0, 2*time.Minute.Nanoseconds()-30*time.Second.Nanoseconds()),
EndTs: time.Unix(0, 2*time.Minute.Nanoseconds()),
fix bugs in logs results caching and its tests (#7925) **What this PR does / why we need it**: When a logs query results in an empty response, we cache it to avoid doing that query again and respond straight away with an empty response. However, we cache a single entry per time split interval with the query itself to keep things simple. For example, if the time split config for tenant `A` is `30m`, then queries for intervals `10m`-`20m` and `21m`-`25m` would have the same cache key. Here is roughly how cache hit is handled: * If the new query is within the cached query bounds, return empty results * If the start of new query is before the start time of the cached query, do a query from `newQuery.Start` to `cachedQuery.Start` * If the response of last query is also empty, set `cachedQuery.Start` = `newQuery.Start` * If the end of new query is after the end time of the cached query, do a query from `cachedQuery.End` to `newQuery.Start` * If the response of last query is also empty, set `cachedQuery.End` = `newQuery.End` * If we have changes in `cachedQuery.Start/End`, update it in the cache. The problem here is when we do queries to fill the gap, we sometimes do queries for the range outside of what the user requested and respond back without reducing the response to what the user requested. For example, if the cached query is from `21m`-`25m` and the user query is from `10m`-`15m`, we will query for the whole gap i.e `10m`-`21m`. If there are logs from `15m`-`21m` in the response, we will unexpectedly send it back to the user. This PR takes care of this issue by extracting the data and sending back only the user's requested logs. I have also found the tests for logs results cache were incorrect. They heavily use [mergeLokiResponse](https://github.com/grafana/loki/blob/e2842c69c5e089389b697faa792e368f42f84c19/pkg/querier/queryrange/codec.go#L1014) for building the test data, but they were not setting the `limit` in test queries which was causing `mergeLokiResponse` to send back empty results. This means were were always comparing two empty results, which would always be the same. This PR also takes care of fixing it, and relevant changes to correct the test. **Checklist** - [x] Tests updated - [x] `CHANGELOG.md` updated
3 years ago
Limit: entriesLimit,
}),
},
},
})
h := lrc.Wrap(fake)
resp, err := h.Do(ctx, req1)
require.NoError(t, err)
require.Equal(t, emptyResponse(req1), resp)
resp, err = h.Do(ctx, req2)
require.NoError(t, err)
require.Equal(t, emptyResponse(req2), resp)
fake.AssertExpectations(t)
}
func Test_LogResultCacheDifferentRangeNonEmpty(t *testing.T) {
var (
ctx = user.InjectOrgID(context.Background(), "foo")
lrc = NewLogResultCache(
log.NewNopLogger(),
fakeLimits{
splits: map[string]time.Duration{"foo": time.Minute},
},
cache.NewMockCache(),
nil,
nil,
nil,
)
)
req1 := &LokiRequest{
StartTs: time.Unix(0, time.Minute.Nanoseconds()+30*time.Second.Nanoseconds()),
EndTs: time.Unix(0, 2*time.Minute.Nanoseconds()-30*time.Second.Nanoseconds()),
fix bugs in logs results caching and its tests (#7925) **What this PR does / why we need it**: When a logs query results in an empty response, we cache it to avoid doing that query again and respond straight away with an empty response. However, we cache a single entry per time split interval with the query itself to keep things simple. For example, if the time split config for tenant `A` is `30m`, then queries for intervals `10m`-`20m` and `21m`-`25m` would have the same cache key. Here is roughly how cache hit is handled: * If the new query is within the cached query bounds, return empty results * If the start of new query is before the start time of the cached query, do a query from `newQuery.Start` to `cachedQuery.Start` * If the response of last query is also empty, set `cachedQuery.Start` = `newQuery.Start` * If the end of new query is after the end time of the cached query, do a query from `cachedQuery.End` to `newQuery.Start` * If the response of last query is also empty, set `cachedQuery.End` = `newQuery.End` * If we have changes in `cachedQuery.Start/End`, update it in the cache. The problem here is when we do queries to fill the gap, we sometimes do queries for the range outside of what the user requested and respond back without reducing the response to what the user requested. For example, if the cached query is from `21m`-`25m` and the user query is from `10m`-`15m`, we will query for the whole gap i.e `10m`-`21m`. If there are logs from `15m`-`21m` in the response, we will unexpectedly send it back to the user. This PR takes care of this issue by extracting the data and sending back only the user's requested logs. I have also found the tests for logs results cache were incorrect. They heavily use [mergeLokiResponse](https://github.com/grafana/loki/blob/e2842c69c5e089389b697faa792e368f42f84c19/pkg/querier/queryrange/codec.go#L1014) for building the test data, but they were not setting the `limit` in test queries which was causing `mergeLokiResponse` to send back empty results. This means were were always comparing two empty results, which would always be the same. This PR also takes care of fixing it, and relevant changes to correct the test. **Checklist** - [x] Tests updated - [x] `CHANGELOG.md` updated
3 years ago
Limit: entriesLimit,
}
req2 := &LokiRequest{
StartTs: time.Unix(0, time.Minute.Nanoseconds()),
EndTs: time.Unix(0, 2*time.Minute.Nanoseconds()),
fix bugs in logs results caching and its tests (#7925) **What this PR does / why we need it**: When a logs query results in an empty response, we cache it to avoid doing that query again and respond straight away with an empty response. However, we cache a single entry per time split interval with the query itself to keep things simple. For example, if the time split config for tenant `A` is `30m`, then queries for intervals `10m`-`20m` and `21m`-`25m` would have the same cache key. Here is roughly how cache hit is handled: * If the new query is within the cached query bounds, return empty results * If the start of new query is before the start time of the cached query, do a query from `newQuery.Start` to `cachedQuery.Start` * If the response of last query is also empty, set `cachedQuery.Start` = `newQuery.Start` * If the end of new query is after the end time of the cached query, do a query from `cachedQuery.End` to `newQuery.Start` * If the response of last query is also empty, set `cachedQuery.End` = `newQuery.End` * If we have changes in `cachedQuery.Start/End`, update it in the cache. The problem here is when we do queries to fill the gap, we sometimes do queries for the range outside of what the user requested and respond back without reducing the response to what the user requested. For example, if the cached query is from `21m`-`25m` and the user query is from `10m`-`15m`, we will query for the whole gap i.e `10m`-`21m`. If there are logs from `15m`-`21m` in the response, we will unexpectedly send it back to the user. This PR takes care of this issue by extracting the data and sending back only the user's requested logs. I have also found the tests for logs results cache were incorrect. They heavily use [mergeLokiResponse](https://github.com/grafana/loki/blob/e2842c69c5e089389b697faa792e368f42f84c19/pkg/querier/queryrange/codec.go#L1014) for building the test data, but they were not setting the `limit` in test queries which was causing `mergeLokiResponse` to send back empty results. This means were were always comparing two empty results, which would always be the same. This PR also takes care of fixing it, and relevant changes to correct the test. **Checklist** - [x] Tests updated - [x] `CHANGELOG.md` updated
3 years ago
Limit: entriesLimit,
}
fake := newFakeResponse([]mockResponse{
{
RequestResponse: queryrangebase.RequestResponse{
Request: req1,
Response: emptyResponse(req1),
},
},
{
RequestResponse: queryrangebase.RequestResponse{
Request: &LokiRequest{
StartTs: time.Unix(0, time.Minute.Nanoseconds()),
EndTs: time.Unix(0, time.Minute.Nanoseconds()+30*time.Second.Nanoseconds()),
fix bugs in logs results caching and its tests (#7925) **What this PR does / why we need it**: When a logs query results in an empty response, we cache it to avoid doing that query again and respond straight away with an empty response. However, we cache a single entry per time split interval with the query itself to keep things simple. For example, if the time split config for tenant `A` is `30m`, then queries for intervals `10m`-`20m` and `21m`-`25m` would have the same cache key. Here is roughly how cache hit is handled: * If the new query is within the cached query bounds, return empty results * If the start of new query is before the start time of the cached query, do a query from `newQuery.Start` to `cachedQuery.Start` * If the response of last query is also empty, set `cachedQuery.Start` = `newQuery.Start` * If the end of new query is after the end time of the cached query, do a query from `cachedQuery.End` to `newQuery.Start` * If the response of last query is also empty, set `cachedQuery.End` = `newQuery.End` * If we have changes in `cachedQuery.Start/End`, update it in the cache. The problem here is when we do queries to fill the gap, we sometimes do queries for the range outside of what the user requested and respond back without reducing the response to what the user requested. For example, if the cached query is from `21m`-`25m` and the user query is from `10m`-`15m`, we will query for the whole gap i.e `10m`-`21m`. If there are logs from `15m`-`21m` in the response, we will unexpectedly send it back to the user. This PR takes care of this issue by extracting the data and sending back only the user's requested logs. I have also found the tests for logs results cache were incorrect. They heavily use [mergeLokiResponse](https://github.com/grafana/loki/blob/e2842c69c5e089389b697faa792e368f42f84c19/pkg/querier/queryrange/codec.go#L1014) for building the test data, but they were not setting the `limit` in test queries which was causing `mergeLokiResponse` to send back empty results. This means were were always comparing two empty results, which would always be the same. This PR also takes care of fixing it, and relevant changes to correct the test. **Checklist** - [x] Tests updated - [x] `CHANGELOG.md` updated
3 years ago
Limit: entriesLimit,
},
Response: nonEmptyResponse(&LokiRequest{
StartTs: time.Unix(0, time.Minute.Nanoseconds()),
EndTs: time.Unix(0, time.Minute.Nanoseconds()+30*time.Second.Nanoseconds()),
fix bugs in logs results caching and its tests (#7925) **What this PR does / why we need it**: When a logs query results in an empty response, we cache it to avoid doing that query again and respond straight away with an empty response. However, we cache a single entry per time split interval with the query itself to keep things simple. For example, if the time split config for tenant `A` is `30m`, then queries for intervals `10m`-`20m` and `21m`-`25m` would have the same cache key. Here is roughly how cache hit is handled: * If the new query is within the cached query bounds, return empty results * If the start of new query is before the start time of the cached query, do a query from `newQuery.Start` to `cachedQuery.Start` * If the response of last query is also empty, set `cachedQuery.Start` = `newQuery.Start` * If the end of new query is after the end time of the cached query, do a query from `cachedQuery.End` to `newQuery.Start` * If the response of last query is also empty, set `cachedQuery.End` = `newQuery.End` * If we have changes in `cachedQuery.Start/End`, update it in the cache. The problem here is when we do queries to fill the gap, we sometimes do queries for the range outside of what the user requested and respond back without reducing the response to what the user requested. For example, if the cached query is from `21m`-`25m` and the user query is from `10m`-`15m`, we will query for the whole gap i.e `10m`-`21m`. If there are logs from `15m`-`21m` in the response, we will unexpectedly send it back to the user. This PR takes care of this issue by extracting the data and sending back only the user's requested logs. I have also found the tests for logs results cache were incorrect. They heavily use [mergeLokiResponse](https://github.com/grafana/loki/blob/e2842c69c5e089389b697faa792e368f42f84c19/pkg/querier/queryrange/codec.go#L1014) for building the test data, but they were not setting the `limit` in test queries which was causing `mergeLokiResponse` to send back empty results. This means were were always comparing two empty results, which would always be the same. This PR also takes care of fixing it, and relevant changes to correct the test. **Checklist** - [x] Tests updated - [x] `CHANGELOG.md` updated
3 years ago
Limit: entriesLimit,
}, time.Unix(61, 0), time.Unix(61, 0), lblFooBar),
},
},
{
RequestResponse: queryrangebase.RequestResponse{
Request: &LokiRequest{
StartTs: time.Unix(0, 2*time.Minute.Nanoseconds()-30*time.Second.Nanoseconds()),
EndTs: time.Unix(0, 2*time.Minute.Nanoseconds()),
fix bugs in logs results caching and its tests (#7925) **What this PR does / why we need it**: When a logs query results in an empty response, we cache it to avoid doing that query again and respond straight away with an empty response. However, we cache a single entry per time split interval with the query itself to keep things simple. For example, if the time split config for tenant `A` is `30m`, then queries for intervals `10m`-`20m` and `21m`-`25m` would have the same cache key. Here is roughly how cache hit is handled: * If the new query is within the cached query bounds, return empty results * If the start of new query is before the start time of the cached query, do a query from `newQuery.Start` to `cachedQuery.Start` * If the response of last query is also empty, set `cachedQuery.Start` = `newQuery.Start` * If the end of new query is after the end time of the cached query, do a query from `cachedQuery.End` to `newQuery.Start` * If the response of last query is also empty, set `cachedQuery.End` = `newQuery.End` * If we have changes in `cachedQuery.Start/End`, update it in the cache. The problem here is when we do queries to fill the gap, we sometimes do queries for the range outside of what the user requested and respond back without reducing the response to what the user requested. For example, if the cached query is from `21m`-`25m` and the user query is from `10m`-`15m`, we will query for the whole gap i.e `10m`-`21m`. If there are logs from `15m`-`21m` in the response, we will unexpectedly send it back to the user. This PR takes care of this issue by extracting the data and sending back only the user's requested logs. I have also found the tests for logs results cache were incorrect. They heavily use [mergeLokiResponse](https://github.com/grafana/loki/blob/e2842c69c5e089389b697faa792e368f42f84c19/pkg/querier/queryrange/codec.go#L1014) for building the test data, but they were not setting the `limit` in test queries which was causing `mergeLokiResponse` to send back empty results. This means were were always comparing two empty results, which would always be the same. This PR also takes care of fixing it, and relevant changes to correct the test. **Checklist** - [x] Tests updated - [x] `CHANGELOG.md` updated
3 years ago
Limit: entriesLimit,
},
Response: nonEmptyResponse(&LokiRequest{
StartTs: time.Unix(0, 2*time.Minute.Nanoseconds()-30*time.Second.Nanoseconds()),
EndTs: time.Unix(0, 2*time.Minute.Nanoseconds()),
fix bugs in logs results caching and its tests (#7925) **What this PR does / why we need it**: When a logs query results in an empty response, we cache it to avoid doing that query again and respond straight away with an empty response. However, we cache a single entry per time split interval with the query itself to keep things simple. For example, if the time split config for tenant `A` is `30m`, then queries for intervals `10m`-`20m` and `21m`-`25m` would have the same cache key. Here is roughly how cache hit is handled: * If the new query is within the cached query bounds, return empty results * If the start of new query is before the start time of the cached query, do a query from `newQuery.Start` to `cachedQuery.Start` * If the response of last query is also empty, set `cachedQuery.Start` = `newQuery.Start` * If the end of new query is after the end time of the cached query, do a query from `cachedQuery.End` to `newQuery.Start` * If the response of last query is also empty, set `cachedQuery.End` = `newQuery.End` * If we have changes in `cachedQuery.Start/End`, update it in the cache. The problem here is when we do queries to fill the gap, we sometimes do queries for the range outside of what the user requested and respond back without reducing the response to what the user requested. For example, if the cached query is from `21m`-`25m` and the user query is from `10m`-`15m`, we will query for the whole gap i.e `10m`-`21m`. If there are logs from `15m`-`21m` in the response, we will unexpectedly send it back to the user. This PR takes care of this issue by extracting the data and sending back only the user's requested logs. I have also found the tests for logs results cache were incorrect. They heavily use [mergeLokiResponse](https://github.com/grafana/loki/blob/e2842c69c5e089389b697faa792e368f42f84c19/pkg/querier/queryrange/codec.go#L1014) for building the test data, but they were not setting the `limit` in test queries which was causing `mergeLokiResponse` to send back empty results. This means were were always comparing two empty results, which would always be the same. This PR also takes care of fixing it, and relevant changes to correct the test. **Checklist** - [x] Tests updated - [x] `CHANGELOG.md` updated
3 years ago
Limit: entriesLimit,
}, time.Unix(62, 0), time.Unix(62, 0), lblFooBar),
},
},
})
h := lrc.Wrap(fake)
resp, err := h.Do(ctx, req1)
require.NoError(t, err)
require.Equal(t, emptyResponse(req1), resp)
resp, err = h.Do(ctx, req2)
require.NoError(t, err)
require.Equal(t, mergeLokiResponse(
nonEmptyResponse(&LokiRequest{
StartTs: time.Unix(0, 2*time.Minute.Nanoseconds()-30*time.Second.Nanoseconds()),
EndTs: time.Unix(0, 2*time.Minute.Nanoseconds()),
fix bugs in logs results caching and its tests (#7925) **What this PR does / why we need it**: When a logs query results in an empty response, we cache it to avoid doing that query again and respond straight away with an empty response. However, we cache a single entry per time split interval with the query itself to keep things simple. For example, if the time split config for tenant `A` is `30m`, then queries for intervals `10m`-`20m` and `21m`-`25m` would have the same cache key. Here is roughly how cache hit is handled: * If the new query is within the cached query bounds, return empty results * If the start of new query is before the start time of the cached query, do a query from `newQuery.Start` to `cachedQuery.Start` * If the response of last query is also empty, set `cachedQuery.Start` = `newQuery.Start` * If the end of new query is after the end time of the cached query, do a query from `cachedQuery.End` to `newQuery.Start` * If the response of last query is also empty, set `cachedQuery.End` = `newQuery.End` * If we have changes in `cachedQuery.Start/End`, update it in the cache. The problem here is when we do queries to fill the gap, we sometimes do queries for the range outside of what the user requested and respond back without reducing the response to what the user requested. For example, if the cached query is from `21m`-`25m` and the user query is from `10m`-`15m`, we will query for the whole gap i.e `10m`-`21m`. If there are logs from `15m`-`21m` in the response, we will unexpectedly send it back to the user. This PR takes care of this issue by extracting the data and sending back only the user's requested logs. I have also found the tests for logs results cache were incorrect. They heavily use [mergeLokiResponse](https://github.com/grafana/loki/blob/e2842c69c5e089389b697faa792e368f42f84c19/pkg/querier/queryrange/codec.go#L1014) for building the test data, but they were not setting the `limit` in test queries which was causing `mergeLokiResponse` to send back empty results. This means were were always comparing two empty results, which would always be the same. This PR also takes care of fixing it, and relevant changes to correct the test. **Checklist** - [x] Tests updated - [x] `CHANGELOG.md` updated
3 years ago
Limit: entriesLimit,
}, time.Unix(62, 0), time.Unix(62, 0), lblFooBar),
nonEmptyResponse(&LokiRequest{
StartTs: time.Unix(0, time.Minute.Nanoseconds()),
EndTs: time.Unix(0, time.Minute.Nanoseconds()+30*time.Second.Nanoseconds()),
fix bugs in logs results caching and its tests (#7925) **What this PR does / why we need it**: When a logs query results in an empty response, we cache it to avoid doing that query again and respond straight away with an empty response. However, we cache a single entry per time split interval with the query itself to keep things simple. For example, if the time split config for tenant `A` is `30m`, then queries for intervals `10m`-`20m` and `21m`-`25m` would have the same cache key. Here is roughly how cache hit is handled: * If the new query is within the cached query bounds, return empty results * If the start of new query is before the start time of the cached query, do a query from `newQuery.Start` to `cachedQuery.Start` * If the response of last query is also empty, set `cachedQuery.Start` = `newQuery.Start` * If the end of new query is after the end time of the cached query, do a query from `cachedQuery.End` to `newQuery.Start` * If the response of last query is also empty, set `cachedQuery.End` = `newQuery.End` * If we have changes in `cachedQuery.Start/End`, update it in the cache. The problem here is when we do queries to fill the gap, we sometimes do queries for the range outside of what the user requested and respond back without reducing the response to what the user requested. For example, if the cached query is from `21m`-`25m` and the user query is from `10m`-`15m`, we will query for the whole gap i.e `10m`-`21m`. If there are logs from `15m`-`21m` in the response, we will unexpectedly send it back to the user. This PR takes care of this issue by extracting the data and sending back only the user's requested logs. I have also found the tests for logs results cache were incorrect. They heavily use [mergeLokiResponse](https://github.com/grafana/loki/blob/e2842c69c5e089389b697faa792e368f42f84c19/pkg/querier/queryrange/codec.go#L1014) for building the test data, but they were not setting the `limit` in test queries which was causing `mergeLokiResponse` to send back empty results. This means were were always comparing two empty results, which would always be the same. This PR also takes care of fixing it, and relevant changes to correct the test. **Checklist** - [x] Tests updated - [x] `CHANGELOG.md` updated
3 years ago
Limit: entriesLimit,
}, time.Unix(61, 0), time.Unix(61, 0), lblFooBar),
), resp)
fake.AssertExpectations(t)
}
func Test_LogResultCacheDifferentRangeNonEmptyAndEmpty(t *testing.T) {
var (
ctx = user.InjectOrgID(context.Background(), "foo")
lrc = NewLogResultCache(
log.NewNopLogger(),
fakeLimits{
splits: map[string]time.Duration{"foo": time.Minute},
},
cache.NewMockCache(),
nil,
nil,
nil,
)
)
req1 := &LokiRequest{
StartTs: time.Unix(0, time.Minute.Nanoseconds()+30*time.Second.Nanoseconds()),
EndTs: time.Unix(0, 2*time.Minute.Nanoseconds()-30*time.Second.Nanoseconds()),
fix bugs in logs results caching and its tests (#7925) **What this PR does / why we need it**: When a logs query results in an empty response, we cache it to avoid doing that query again and respond straight away with an empty response. However, we cache a single entry per time split interval with the query itself to keep things simple. For example, if the time split config for tenant `A` is `30m`, then queries for intervals `10m`-`20m` and `21m`-`25m` would have the same cache key. Here is roughly how cache hit is handled: * If the new query is within the cached query bounds, return empty results * If the start of new query is before the start time of the cached query, do a query from `newQuery.Start` to `cachedQuery.Start` * If the response of last query is also empty, set `cachedQuery.Start` = `newQuery.Start` * If the end of new query is after the end time of the cached query, do a query from `cachedQuery.End` to `newQuery.Start` * If the response of last query is also empty, set `cachedQuery.End` = `newQuery.End` * If we have changes in `cachedQuery.Start/End`, update it in the cache. The problem here is when we do queries to fill the gap, we sometimes do queries for the range outside of what the user requested and respond back without reducing the response to what the user requested. For example, if the cached query is from `21m`-`25m` and the user query is from `10m`-`15m`, we will query for the whole gap i.e `10m`-`21m`. If there are logs from `15m`-`21m` in the response, we will unexpectedly send it back to the user. This PR takes care of this issue by extracting the data and sending back only the user's requested logs. I have also found the tests for logs results cache were incorrect. They heavily use [mergeLokiResponse](https://github.com/grafana/loki/blob/e2842c69c5e089389b697faa792e368f42f84c19/pkg/querier/queryrange/codec.go#L1014) for building the test data, but they were not setting the `limit` in test queries which was causing `mergeLokiResponse` to send back empty results. This means were were always comparing two empty results, which would always be the same. This PR also takes care of fixing it, and relevant changes to correct the test. **Checklist** - [x] Tests updated - [x] `CHANGELOG.md` updated
3 years ago
Limit: entriesLimit,
}
req2 := &LokiRequest{
StartTs: time.Unix(0, time.Minute.Nanoseconds()),
EndTs: time.Unix(0, 2*time.Minute.Nanoseconds()),
fix bugs in logs results caching and its tests (#7925) **What this PR does / why we need it**: When a logs query results in an empty response, we cache it to avoid doing that query again and respond straight away with an empty response. However, we cache a single entry per time split interval with the query itself to keep things simple. For example, if the time split config for tenant `A` is `30m`, then queries for intervals `10m`-`20m` and `21m`-`25m` would have the same cache key. Here is roughly how cache hit is handled: * If the new query is within the cached query bounds, return empty results * If the start of new query is before the start time of the cached query, do a query from `newQuery.Start` to `cachedQuery.Start` * If the response of last query is also empty, set `cachedQuery.Start` = `newQuery.Start` * If the end of new query is after the end time of the cached query, do a query from `cachedQuery.End` to `newQuery.Start` * If the response of last query is also empty, set `cachedQuery.End` = `newQuery.End` * If we have changes in `cachedQuery.Start/End`, update it in the cache. The problem here is when we do queries to fill the gap, we sometimes do queries for the range outside of what the user requested and respond back without reducing the response to what the user requested. For example, if the cached query is from `21m`-`25m` and the user query is from `10m`-`15m`, we will query for the whole gap i.e `10m`-`21m`. If there are logs from `15m`-`21m` in the response, we will unexpectedly send it back to the user. This PR takes care of this issue by extracting the data and sending back only the user's requested logs. I have also found the tests for logs results cache were incorrect. They heavily use [mergeLokiResponse](https://github.com/grafana/loki/blob/e2842c69c5e089389b697faa792e368f42f84c19/pkg/querier/queryrange/codec.go#L1014) for building the test data, but they were not setting the `limit` in test queries which was causing `mergeLokiResponse` to send back empty results. This means were were always comparing two empty results, which would always be the same. This PR also takes care of fixing it, and relevant changes to correct the test. **Checklist** - [x] Tests updated - [x] `CHANGELOG.md` updated
3 years ago
Limit: entriesLimit,
}
fake := newFakeResponse([]mockResponse{
{
RequestResponse: queryrangebase.RequestResponse{
Request: req1,
Response: emptyResponse(req1),
},
},
{
RequestResponse: queryrangebase.RequestResponse{
Request: &LokiRequest{
StartTs: time.Unix(0, time.Minute.Nanoseconds()),
EndTs: time.Unix(0, time.Minute.Nanoseconds()+30*time.Second.Nanoseconds()),
fix bugs in logs results caching and its tests (#7925) **What this PR does / why we need it**: When a logs query results in an empty response, we cache it to avoid doing that query again and respond straight away with an empty response. However, we cache a single entry per time split interval with the query itself to keep things simple. For example, if the time split config for tenant `A` is `30m`, then queries for intervals `10m`-`20m` and `21m`-`25m` would have the same cache key. Here is roughly how cache hit is handled: * If the new query is within the cached query bounds, return empty results * If the start of new query is before the start time of the cached query, do a query from `newQuery.Start` to `cachedQuery.Start` * If the response of last query is also empty, set `cachedQuery.Start` = `newQuery.Start` * If the end of new query is after the end time of the cached query, do a query from `cachedQuery.End` to `newQuery.Start` * If the response of last query is also empty, set `cachedQuery.End` = `newQuery.End` * If we have changes in `cachedQuery.Start/End`, update it in the cache. The problem here is when we do queries to fill the gap, we sometimes do queries for the range outside of what the user requested and respond back without reducing the response to what the user requested. For example, if the cached query is from `21m`-`25m` and the user query is from `10m`-`15m`, we will query for the whole gap i.e `10m`-`21m`. If there are logs from `15m`-`21m` in the response, we will unexpectedly send it back to the user. This PR takes care of this issue by extracting the data and sending back only the user's requested logs. I have also found the tests for logs results cache were incorrect. They heavily use [mergeLokiResponse](https://github.com/grafana/loki/blob/e2842c69c5e089389b697faa792e368f42f84c19/pkg/querier/queryrange/codec.go#L1014) for building the test data, but they were not setting the `limit` in test queries which was causing `mergeLokiResponse` to send back empty results. This means were were always comparing two empty results, which would always be the same. This PR also takes care of fixing it, and relevant changes to correct the test. **Checklist** - [x] Tests updated - [x] `CHANGELOG.md` updated
3 years ago
Limit: entriesLimit,
},
Response: emptyResponse(&LokiRequest{
StartTs: time.Unix(0, time.Minute.Nanoseconds()),
EndTs: time.Unix(0, time.Minute.Nanoseconds()+30*time.Second.Nanoseconds()),
fix bugs in logs results caching and its tests (#7925) **What this PR does / why we need it**: When a logs query results in an empty response, we cache it to avoid doing that query again and respond straight away with an empty response. However, we cache a single entry per time split interval with the query itself to keep things simple. For example, if the time split config for tenant `A` is `30m`, then queries for intervals `10m`-`20m` and `21m`-`25m` would have the same cache key. Here is roughly how cache hit is handled: * If the new query is within the cached query bounds, return empty results * If the start of new query is before the start time of the cached query, do a query from `newQuery.Start` to `cachedQuery.Start` * If the response of last query is also empty, set `cachedQuery.Start` = `newQuery.Start` * If the end of new query is after the end time of the cached query, do a query from `cachedQuery.End` to `newQuery.Start` * If the response of last query is also empty, set `cachedQuery.End` = `newQuery.End` * If we have changes in `cachedQuery.Start/End`, update it in the cache. The problem here is when we do queries to fill the gap, we sometimes do queries for the range outside of what the user requested and respond back without reducing the response to what the user requested. For example, if the cached query is from `21m`-`25m` and the user query is from `10m`-`15m`, we will query for the whole gap i.e `10m`-`21m`. If there are logs from `15m`-`21m` in the response, we will unexpectedly send it back to the user. This PR takes care of this issue by extracting the data and sending back only the user's requested logs. I have also found the tests for logs results cache were incorrect. They heavily use [mergeLokiResponse](https://github.com/grafana/loki/blob/e2842c69c5e089389b697faa792e368f42f84c19/pkg/querier/queryrange/codec.go#L1014) for building the test data, but they were not setting the `limit` in test queries which was causing `mergeLokiResponse` to send back empty results. This means were were always comparing two empty results, which would always be the same. This PR also takes care of fixing it, and relevant changes to correct the test. **Checklist** - [x] Tests updated - [x] `CHANGELOG.md` updated
3 years ago
Limit: entriesLimit,
}),
},
},
{
RequestResponse: queryrangebase.RequestResponse{
Request: &LokiRequest{
StartTs: time.Unix(0, 2*time.Minute.Nanoseconds()-30*time.Second.Nanoseconds()),
EndTs: time.Unix(0, 2*time.Minute.Nanoseconds()),
fix bugs in logs results caching and its tests (#7925) **What this PR does / why we need it**: When a logs query results in an empty response, we cache it to avoid doing that query again and respond straight away with an empty response. However, we cache a single entry per time split interval with the query itself to keep things simple. For example, if the time split config for tenant `A` is `30m`, then queries for intervals `10m`-`20m` and `21m`-`25m` would have the same cache key. Here is roughly how cache hit is handled: * If the new query is within the cached query bounds, return empty results * If the start of new query is before the start time of the cached query, do a query from `newQuery.Start` to `cachedQuery.Start` * If the response of last query is also empty, set `cachedQuery.Start` = `newQuery.Start` * If the end of new query is after the end time of the cached query, do a query from `cachedQuery.End` to `newQuery.Start` * If the response of last query is also empty, set `cachedQuery.End` = `newQuery.End` * If we have changes in `cachedQuery.Start/End`, update it in the cache. The problem here is when we do queries to fill the gap, we sometimes do queries for the range outside of what the user requested and respond back without reducing the response to what the user requested. For example, if the cached query is from `21m`-`25m` and the user query is from `10m`-`15m`, we will query for the whole gap i.e `10m`-`21m`. If there are logs from `15m`-`21m` in the response, we will unexpectedly send it back to the user. This PR takes care of this issue by extracting the data and sending back only the user's requested logs. I have also found the tests for logs results cache were incorrect. They heavily use [mergeLokiResponse](https://github.com/grafana/loki/blob/e2842c69c5e089389b697faa792e368f42f84c19/pkg/querier/queryrange/codec.go#L1014) for building the test data, but they were not setting the `limit` in test queries which was causing `mergeLokiResponse` to send back empty results. This means were were always comparing two empty results, which would always be the same. This PR also takes care of fixing it, and relevant changes to correct the test. **Checklist** - [x] Tests updated - [x] `CHANGELOG.md` updated
3 years ago
Limit: entriesLimit,
},
Response: nonEmptyResponse(&LokiRequest{
StartTs: time.Unix(0, 2*time.Minute.Nanoseconds()-30*time.Second.Nanoseconds()),
EndTs: time.Unix(0, 2*time.Minute.Nanoseconds()),
fix bugs in logs results caching and its tests (#7925) **What this PR does / why we need it**: When a logs query results in an empty response, we cache it to avoid doing that query again and respond straight away with an empty response. However, we cache a single entry per time split interval with the query itself to keep things simple. For example, if the time split config for tenant `A` is `30m`, then queries for intervals `10m`-`20m` and `21m`-`25m` would have the same cache key. Here is roughly how cache hit is handled: * If the new query is within the cached query bounds, return empty results * If the start of new query is before the start time of the cached query, do a query from `newQuery.Start` to `cachedQuery.Start` * If the response of last query is also empty, set `cachedQuery.Start` = `newQuery.Start` * If the end of new query is after the end time of the cached query, do a query from `cachedQuery.End` to `newQuery.Start` * If the response of last query is also empty, set `cachedQuery.End` = `newQuery.End` * If we have changes in `cachedQuery.Start/End`, update it in the cache. The problem here is when we do queries to fill the gap, we sometimes do queries for the range outside of what the user requested and respond back without reducing the response to what the user requested. For example, if the cached query is from `21m`-`25m` and the user query is from `10m`-`15m`, we will query for the whole gap i.e `10m`-`21m`. If there are logs from `15m`-`21m` in the response, we will unexpectedly send it back to the user. This PR takes care of this issue by extracting the data and sending back only the user's requested logs. I have also found the tests for logs results cache were incorrect. They heavily use [mergeLokiResponse](https://github.com/grafana/loki/blob/e2842c69c5e089389b697faa792e368f42f84c19/pkg/querier/queryrange/codec.go#L1014) for building the test data, but they were not setting the `limit` in test queries which was causing `mergeLokiResponse` to send back empty results. This means were were always comparing two empty results, which would always be the same. This PR also takes care of fixing it, and relevant changes to correct the test. **Checklist** - [x] Tests updated - [x] `CHANGELOG.md` updated
3 years ago
Limit: entriesLimit,
}, time.Unix(61, 0), time.Unix(61, 0), lblFooBar),
},
},
// we call it twice
{
RequestResponse: queryrangebase.RequestResponse{
Request: &LokiRequest{
StartTs: time.Unix(0, 2*time.Minute.Nanoseconds()-30*time.Second.Nanoseconds()),
EndTs: time.Unix(0, 2*time.Minute.Nanoseconds()),
fix bugs in logs results caching and its tests (#7925) **What this PR does / why we need it**: When a logs query results in an empty response, we cache it to avoid doing that query again and respond straight away with an empty response. However, we cache a single entry per time split interval with the query itself to keep things simple. For example, if the time split config for tenant `A` is `30m`, then queries for intervals `10m`-`20m` and `21m`-`25m` would have the same cache key. Here is roughly how cache hit is handled: * If the new query is within the cached query bounds, return empty results * If the start of new query is before the start time of the cached query, do a query from `newQuery.Start` to `cachedQuery.Start` * If the response of last query is also empty, set `cachedQuery.Start` = `newQuery.Start` * If the end of new query is after the end time of the cached query, do a query from `cachedQuery.End` to `newQuery.Start` * If the response of last query is also empty, set `cachedQuery.End` = `newQuery.End` * If we have changes in `cachedQuery.Start/End`, update it in the cache. The problem here is when we do queries to fill the gap, we sometimes do queries for the range outside of what the user requested and respond back without reducing the response to what the user requested. For example, if the cached query is from `21m`-`25m` and the user query is from `10m`-`15m`, we will query for the whole gap i.e `10m`-`21m`. If there are logs from `15m`-`21m` in the response, we will unexpectedly send it back to the user. This PR takes care of this issue by extracting the data and sending back only the user's requested logs. I have also found the tests for logs results cache were incorrect. They heavily use [mergeLokiResponse](https://github.com/grafana/loki/blob/e2842c69c5e089389b697faa792e368f42f84c19/pkg/querier/queryrange/codec.go#L1014) for building the test data, but they were not setting the `limit` in test queries which was causing `mergeLokiResponse` to send back empty results. This means were were always comparing two empty results, which would always be the same. This PR also takes care of fixing it, and relevant changes to correct the test. **Checklist** - [x] Tests updated - [x] `CHANGELOG.md` updated
3 years ago
Limit: entriesLimit,
},
Response: nonEmptyResponse(&LokiRequest{
StartTs: time.Unix(0, 2*time.Minute.Nanoseconds()-30*time.Second.Nanoseconds()),
EndTs: time.Unix(0, 2*time.Minute.Nanoseconds()),
fix bugs in logs results caching and its tests (#7925) **What this PR does / why we need it**: When a logs query results in an empty response, we cache it to avoid doing that query again and respond straight away with an empty response. However, we cache a single entry per time split interval with the query itself to keep things simple. For example, if the time split config for tenant `A` is `30m`, then queries for intervals `10m`-`20m` and `21m`-`25m` would have the same cache key. Here is roughly how cache hit is handled: * If the new query is within the cached query bounds, return empty results * If the start of new query is before the start time of the cached query, do a query from `newQuery.Start` to `cachedQuery.Start` * If the response of last query is also empty, set `cachedQuery.Start` = `newQuery.Start` * If the end of new query is after the end time of the cached query, do a query from `cachedQuery.End` to `newQuery.Start` * If the response of last query is also empty, set `cachedQuery.End` = `newQuery.End` * If we have changes in `cachedQuery.Start/End`, update it in the cache. The problem here is when we do queries to fill the gap, we sometimes do queries for the range outside of what the user requested and respond back without reducing the response to what the user requested. For example, if the cached query is from `21m`-`25m` and the user query is from `10m`-`15m`, we will query for the whole gap i.e `10m`-`21m`. If there are logs from `15m`-`21m` in the response, we will unexpectedly send it back to the user. This PR takes care of this issue by extracting the data and sending back only the user's requested logs. I have also found the tests for logs results cache were incorrect. They heavily use [mergeLokiResponse](https://github.com/grafana/loki/blob/e2842c69c5e089389b697faa792e368f42f84c19/pkg/querier/queryrange/codec.go#L1014) for building the test data, but they were not setting the `limit` in test queries which was causing `mergeLokiResponse` to send back empty results. This means were were always comparing two empty results, which would always be the same. This PR also takes care of fixing it, and relevant changes to correct the test. **Checklist** - [x] Tests updated - [x] `CHANGELOG.md` updated
3 years ago
Limit: entriesLimit,
}, time.Unix(61, 0), time.Unix(61, 0), lblFooBar),
},
},
})
h := lrc.Wrap(fake)
resp, err := h.Do(ctx, req1)
require.NoError(t, err)
require.Equal(t, emptyResponse(req1), resp)
resp, err = h.Do(ctx, req2)
require.NoError(t, err)
require.Equal(t, mergeLokiResponse(
emptyResponse(req1),
nonEmptyResponse(&LokiRequest{
StartTs: time.Unix(0, time.Minute.Nanoseconds()),
EndTs: time.Unix(0, time.Minute.Nanoseconds()+30*time.Second.Nanoseconds()),
fix bugs in logs results caching and its tests (#7925) **What this PR does / why we need it**: When a logs query results in an empty response, we cache it to avoid doing that query again and respond straight away with an empty response. However, we cache a single entry per time split interval with the query itself to keep things simple. For example, if the time split config for tenant `A` is `30m`, then queries for intervals `10m`-`20m` and `21m`-`25m` would have the same cache key. Here is roughly how cache hit is handled: * If the new query is within the cached query bounds, return empty results * If the start of new query is before the start time of the cached query, do a query from `newQuery.Start` to `cachedQuery.Start` * If the response of last query is also empty, set `cachedQuery.Start` = `newQuery.Start` * If the end of new query is after the end time of the cached query, do a query from `cachedQuery.End` to `newQuery.Start` * If the response of last query is also empty, set `cachedQuery.End` = `newQuery.End` * If we have changes in `cachedQuery.Start/End`, update it in the cache. The problem here is when we do queries to fill the gap, we sometimes do queries for the range outside of what the user requested and respond back without reducing the response to what the user requested. For example, if the cached query is from `21m`-`25m` and the user query is from `10m`-`15m`, we will query for the whole gap i.e `10m`-`21m`. If there are logs from `15m`-`21m` in the response, we will unexpectedly send it back to the user. This PR takes care of this issue by extracting the data and sending back only the user's requested logs. I have also found the tests for logs results cache were incorrect. They heavily use [mergeLokiResponse](https://github.com/grafana/loki/blob/e2842c69c5e089389b697faa792e368f42f84c19/pkg/querier/queryrange/codec.go#L1014) for building the test data, but they were not setting the `limit` in test queries which was causing `mergeLokiResponse` to send back empty results. This means were were always comparing two empty results, which would always be the same. This PR also takes care of fixing it, and relevant changes to correct the test. **Checklist** - [x] Tests updated - [x] `CHANGELOG.md` updated
3 years ago
Limit: entriesLimit,
}, time.Unix(61, 0), time.Unix(61, 0), lblFooBar),
), resp)
resp, err = h.Do(ctx, req2)
require.NoError(t, err)
require.Equal(t, mergeLokiResponse(
emptyResponse(req1),
nonEmptyResponse(&LokiRequest{
StartTs: time.Unix(0, time.Minute.Nanoseconds()),
EndTs: time.Unix(0, time.Minute.Nanoseconds()+30*time.Second.Nanoseconds()),
fix bugs in logs results caching and its tests (#7925) **What this PR does / why we need it**: When a logs query results in an empty response, we cache it to avoid doing that query again and respond straight away with an empty response. However, we cache a single entry per time split interval with the query itself to keep things simple. For example, if the time split config for tenant `A` is `30m`, then queries for intervals `10m`-`20m` and `21m`-`25m` would have the same cache key. Here is roughly how cache hit is handled: * If the new query is within the cached query bounds, return empty results * If the start of new query is before the start time of the cached query, do a query from `newQuery.Start` to `cachedQuery.Start` * If the response of last query is also empty, set `cachedQuery.Start` = `newQuery.Start` * If the end of new query is after the end time of the cached query, do a query from `cachedQuery.End` to `newQuery.Start` * If the response of last query is also empty, set `cachedQuery.End` = `newQuery.End` * If we have changes in `cachedQuery.Start/End`, update it in the cache. The problem here is when we do queries to fill the gap, we sometimes do queries for the range outside of what the user requested and respond back without reducing the response to what the user requested. For example, if the cached query is from `21m`-`25m` and the user query is from `10m`-`15m`, we will query for the whole gap i.e `10m`-`21m`. If there are logs from `15m`-`21m` in the response, we will unexpectedly send it back to the user. This PR takes care of this issue by extracting the data and sending back only the user's requested logs. I have also found the tests for logs results cache were incorrect. They heavily use [mergeLokiResponse](https://github.com/grafana/loki/blob/e2842c69c5e089389b697faa792e368f42f84c19/pkg/querier/queryrange/codec.go#L1014) for building the test data, but they were not setting the `limit` in test queries which was causing `mergeLokiResponse` to send back empty results. This means were were always comparing two empty results, which would always be the same. This PR also takes care of fixing it, and relevant changes to correct the test. **Checklist** - [x] Tests updated - [x] `CHANGELOG.md` updated
3 years ago
Limit: entriesLimit,
}, time.Unix(61, 0), time.Unix(61, 0), lblFooBar),
), resp)
fake.AssertExpectations(t)
}
fix bugs in logs results caching and its tests (#7925) **What this PR does / why we need it**: When a logs query results in an empty response, we cache it to avoid doing that query again and respond straight away with an empty response. However, we cache a single entry per time split interval with the query itself to keep things simple. For example, if the time split config for tenant `A` is `30m`, then queries for intervals `10m`-`20m` and `21m`-`25m` would have the same cache key. Here is roughly how cache hit is handled: * If the new query is within the cached query bounds, return empty results * If the start of new query is before the start time of the cached query, do a query from `newQuery.Start` to `cachedQuery.Start` * If the response of last query is also empty, set `cachedQuery.Start` = `newQuery.Start` * If the end of new query is after the end time of the cached query, do a query from `cachedQuery.End` to `newQuery.Start` * If the response of last query is also empty, set `cachedQuery.End` = `newQuery.End` * If we have changes in `cachedQuery.Start/End`, update it in the cache. The problem here is when we do queries to fill the gap, we sometimes do queries for the range outside of what the user requested and respond back without reducing the response to what the user requested. For example, if the cached query is from `21m`-`25m` and the user query is from `10m`-`15m`, we will query for the whole gap i.e `10m`-`21m`. If there are logs from `15m`-`21m` in the response, we will unexpectedly send it back to the user. This PR takes care of this issue by extracting the data and sending back only the user's requested logs. I have also found the tests for logs results cache were incorrect. They heavily use [mergeLokiResponse](https://github.com/grafana/loki/blob/e2842c69c5e089389b697faa792e368f42f84c19/pkg/querier/queryrange/codec.go#L1014) for building the test data, but they were not setting the `limit` in test queries which was causing `mergeLokiResponse` to send back empty results. This means were were always comparing two empty results, which would always be the same. This PR also takes care of fixing it, and relevant changes to correct the test. **Checklist** - [x] Tests updated - [x] `CHANGELOG.md` updated
3 years ago
func Test_LogResultFillingGap(t *testing.T) {
var (
ctx = user.InjectOrgID(context.Background(), "foo")
lrc = NewLogResultCache(
log.NewNopLogger(),
fakeLimits{
splits: map[string]time.Duration{"foo": time.Minute},
},
cache.NewMockCache(),
nil,
nil,
nil,
)
)
// data requested for just 1 sec, resulting in empty response
req1 := &LokiRequest{
StartTs: time.Unix(0, time.Minute.Nanoseconds()+30*time.Second.Nanoseconds()),
EndTs: time.Unix(0, time.Minute.Nanoseconds()+31*time.Second.Nanoseconds()),
Limit: entriesLimit,
}
// data requested for just 1 sec, within the same split but couple seconds apart
req2 := &LokiRequest{
StartTs: time.Unix(0, time.Minute.Nanoseconds()+35*time.Second.Nanoseconds()),
EndTs: time.Unix(0, time.Minute.Nanoseconds()+36*time.Second.Nanoseconds()),
Limit: entriesLimit,
}
req3 := &LokiRequest{
StartTs: time.Unix(0, time.Minute.Nanoseconds()+25*time.Second.Nanoseconds()),
EndTs: time.Unix(0, time.Minute.Nanoseconds()+26*time.Second.Nanoseconds()),
Limit: entriesLimit,
}
fake := newFakeResponse([]mockResponse{
{
RequestResponse: queryrangebase.RequestResponse{
Request: req1,
Response: emptyResponse(req1),
},
},
// partial request being made for missing interval at the end
{
RequestResponse: queryrangebase.RequestResponse{
Request: &LokiRequest{
StartTs: time.Unix(0, time.Minute.Nanoseconds()+31*time.Second.Nanoseconds()),
EndTs: time.Unix(0, time.Minute.Nanoseconds()+36*time.Second.Nanoseconds()),
Limit: entriesLimit,
},
Response: nonEmptyResponse(&LokiRequest{
StartTs: time.Unix(0, time.Minute.Nanoseconds()+31*time.Second.Nanoseconds()),
EndTs: time.Unix(0, time.Minute.Nanoseconds()+36*time.Second.Nanoseconds()),
Limit: entriesLimit,
}, time.Unix(31, 0), time.Unix(34, 0), lblFooBar), // data not present for actual query interval i.e req2
},
},
// partial request being made for missing interval at the beginning
{
RequestResponse: queryrangebase.RequestResponse{
Request: &LokiRequest{
StartTs: time.Unix(0, time.Minute.Nanoseconds()+25*time.Second.Nanoseconds()),
EndTs: time.Unix(0, time.Minute.Nanoseconds()+30*time.Second.Nanoseconds()),
Limit: entriesLimit,
},
Response: nonEmptyResponse(&LokiRequest{
StartTs: time.Unix(0, time.Minute.Nanoseconds()+25*time.Second.Nanoseconds()),
EndTs: time.Unix(0, time.Minute.Nanoseconds()+30*time.Second.Nanoseconds()),
Limit: entriesLimit,
}, time.Unix(27, 0), time.Unix(29, 0), lblFooBar), // data not present for actual query interval i.e req3
},
},
})
h := lrc.Wrap(fake)
resp, err := h.Do(ctx, req1)
require.NoError(t, err)
require.Equal(t, emptyResponse(req1), resp)
// although the caching code would request for more data than the actual query, we should have empty response here since we
// do not have any data for the query we made
resp, err = h.Do(ctx, req2)
require.NoError(t, err)
require.Equal(t, mergeLokiResponse(emptyResponse(req1), emptyResponse(req2)), resp)
resp, err = h.Do(ctx, req3)
require.NoError(t, err)
require.Equal(t, mergeLokiResponse(emptyResponse(req1), emptyResponse(req3)), resp)
fake.AssertExpectations(t)
}
func TestExtractLokiResponse(t *testing.T) {
for _, tc := range []struct {
name string
resp *LokiResponse
extractFrom time.Time
extractThrough time.Time
expectedResp *LokiResponse
}{
{
name: "nothing to extract",
resp: &LokiResponse{},
extractFrom: time.Unix(0, 0),
extractThrough: time.Unix(0, 1),
expectedResp: &LokiResponse{
Data: LokiData{Result: logproto.Streams{}},
},
},
{
name: "extract interval within response",
resp: mergeLokiResponse(
nonEmptyResponse(&LokiRequest{Limit: entriesLimit}, time.Unix(0, 0), time.Unix(10, 0), lblFooBar),
nonEmptyResponse(&LokiRequest{Limit: entriesLimit}, time.Unix(2, 0), time.Unix(8, 0), lblFizzBuzz),
),
extractFrom: time.Unix(4, 0),
extractThrough: time.Unix(7, 0),
expectedResp: mergeLokiResponse(
nonEmptyResponse(&LokiRequest{Limit: entriesLimit}, time.Unix(4, 0), time.Unix(6, 0), lblFooBar),
nonEmptyResponse(&LokiRequest{Limit: entriesLimit}, time.Unix(4, 0), time.Unix(6, 0), lblFizzBuzz),
),
},
{
name: "extract part of response in the beginning",
resp: mergeLokiResponse(
nonEmptyResponse(&LokiRequest{Limit: entriesLimit}, time.Unix(0, 0), time.Unix(10, 0), lblFooBar),
nonEmptyResponse(&LokiRequest{Limit: entriesLimit}, time.Unix(2, 0), time.Unix(8, 0), lblFizzBuzz),
nonEmptyResponse(&LokiRequest{Limit: entriesLimit}, time.Unix(5, 0), time.Unix(8, 0), `{not="included"}`),
),
extractFrom: time.Unix(0, 0),
extractThrough: time.Unix(4, 0),
expectedResp: mergeLokiResponse(
nonEmptyResponse(&LokiRequest{Limit: entriesLimit}, time.Unix(0, 0), time.Unix(3, 0), lblFooBar),
nonEmptyResponse(&LokiRequest{Limit: entriesLimit}, time.Unix(2, 0), time.Unix(3, 0), lblFizzBuzz),
&LokiResponse{},
),
},
{
name: "extract part of response in the end",
resp: mergeLokiResponse(
nonEmptyResponse(&LokiRequest{Limit: entriesLimit}, time.Unix(0, 0), time.Unix(10, 0), lblFooBar),
nonEmptyResponse(&LokiRequest{Limit: entriesLimit}, time.Unix(2, 0), time.Unix(8, 0), lblFizzBuzz),
nonEmptyResponse(&LokiRequest{Limit: entriesLimit}, time.Unix(0, 0), time.Unix(2, 0), `{not="included"}`),
),
extractFrom: time.Unix(4, 0),
extractThrough: time.Unix(12, 0),
expectedResp: mergeLokiResponse(
nonEmptyResponse(&LokiRequest{Limit: entriesLimit}, time.Unix(4, 0), time.Unix(10, 0), lblFooBar),
nonEmptyResponse(&LokiRequest{Limit: entriesLimit}, time.Unix(4, 0), time.Unix(8, 0), lblFizzBuzz),
&LokiResponse{},
),
},
{
name: "extract interval out of data range",
resp: mergeLokiResponse(
nonEmptyResponse(&LokiRequest{Limit: entriesLimit}, time.Unix(0, 0), time.Unix(10, 0), lblFooBar),
nonEmptyResponse(&LokiRequest{Limit: entriesLimit}, time.Unix(2, 0), time.Unix(8, 0), lblFizzBuzz),
nonEmptyResponse(&LokiRequest{Limit: entriesLimit}, time.Unix(0, 0), time.Unix(2, 0), `{not="included"}`),
),
extractFrom: time.Unix(50, 0),
extractThrough: time.Unix(52, 0),
expectedResp: mergeLokiResponse(
// empty responses here are to avoid failing test due to difference in count of subqueries in query stats
&LokiResponse{Limit: entriesLimit},
&LokiResponse{Limit: entriesLimit},
&LokiResponse{Limit: entriesLimit},
),
},
} {
t.Run(tc.name, func(t *testing.T) {
require.Equal(t, tc.expectedResp, extractLokiResponse(tc.extractFrom, tc.extractThrough, tc.resp))
})
}
}
type fakeResponse struct {
*mock.Mock
}
type mockResponse struct {
queryrangebase.RequestResponse
err error
}
func newFakeResponse(responses []mockResponse) fakeResponse {
m := &mock.Mock{}
for _, r := range responses {
m.On("Do", mock.Anything, r.Request).Return(r.Response, r.err).Once()
}
return fakeResponse{
Mock: m,
}
}
func (f fakeResponse) Do(ctx context.Context, r queryrangebase.Request) (queryrangebase.Response, error) {
var (
resp queryrangebase.Response
err error
args = f.Mock.Called(ctx, r)
)
if args.Get(0) != nil {
resp = args.Get(0).(queryrangebase.Response)
}
if args.Get(1) != nil {
err = args.Get(1).(error)
}
return resp, err
}
fix bugs in logs results caching and its tests (#7925) **What this PR does / why we need it**: When a logs query results in an empty response, we cache it to avoid doing that query again and respond straight away with an empty response. However, we cache a single entry per time split interval with the query itself to keep things simple. For example, if the time split config for tenant `A` is `30m`, then queries for intervals `10m`-`20m` and `21m`-`25m` would have the same cache key. Here is roughly how cache hit is handled: * If the new query is within the cached query bounds, return empty results * If the start of new query is before the start time of the cached query, do a query from `newQuery.Start` to `cachedQuery.Start` * If the response of last query is also empty, set `cachedQuery.Start` = `newQuery.Start` * If the end of new query is after the end time of the cached query, do a query from `cachedQuery.End` to `newQuery.Start` * If the response of last query is also empty, set `cachedQuery.End` = `newQuery.End` * If we have changes in `cachedQuery.Start/End`, update it in the cache. The problem here is when we do queries to fill the gap, we sometimes do queries for the range outside of what the user requested and respond back without reducing the response to what the user requested. For example, if the cached query is from `21m`-`25m` and the user query is from `10m`-`15m`, we will query for the whole gap i.e `10m`-`21m`. If there are logs from `15m`-`21m` in the response, we will unexpectedly send it back to the user. This PR takes care of this issue by extracting the data and sending back only the user's requested logs. I have also found the tests for logs results cache were incorrect. They heavily use [mergeLokiResponse](https://github.com/grafana/loki/blob/e2842c69c5e089389b697faa792e368f42f84c19/pkg/querier/queryrange/codec.go#L1014) for building the test data, but they were not setting the `limit` in test queries which was causing `mergeLokiResponse` to send back empty results. This means were were always comparing two empty results, which would always be the same. This PR also takes care of fixing it, and relevant changes to correct the test. **Checklist** - [x] Tests updated - [x] `CHANGELOG.md` updated
3 years ago
// nonEmptyResponse builds a response from [start, end] with 1s step.
func nonEmptyResponse(lokiReq *LokiRequest, start, end time.Time, labels string) *LokiResponse {
r := &LokiResponse{
Status: loghttp.QueryStatusSuccess,
Statistics: stats.Result{},
Direction: lokiReq.Direction,
Limit: lokiReq.Limit,
Version: uint32(loghttp.GetVersion(lokiReq.Path)),
Data: LokiData{
ResultType: loghttp.ResultTypeStream,
Result: []logproto.Stream{
{
fix bugs in logs results caching and its tests (#7925) **What this PR does / why we need it**: When a logs query results in an empty response, we cache it to avoid doing that query again and respond straight away with an empty response. However, we cache a single entry per time split interval with the query itself to keep things simple. For example, if the time split config for tenant `A` is `30m`, then queries for intervals `10m`-`20m` and `21m`-`25m` would have the same cache key. Here is roughly how cache hit is handled: * If the new query is within the cached query bounds, return empty results * If the start of new query is before the start time of the cached query, do a query from `newQuery.Start` to `cachedQuery.Start` * If the response of last query is also empty, set `cachedQuery.Start` = `newQuery.Start` * If the end of new query is after the end time of the cached query, do a query from `cachedQuery.End` to `newQuery.Start` * If the response of last query is also empty, set `cachedQuery.End` = `newQuery.End` * If we have changes in `cachedQuery.Start/End`, update it in the cache. The problem here is when we do queries to fill the gap, we sometimes do queries for the range outside of what the user requested and respond back without reducing the response to what the user requested. For example, if the cached query is from `21m`-`25m` and the user query is from `10m`-`15m`, we will query for the whole gap i.e `10m`-`21m`. If there are logs from `15m`-`21m` in the response, we will unexpectedly send it back to the user. This PR takes care of this issue by extracting the data and sending back only the user's requested logs. I have also found the tests for logs results cache were incorrect. They heavily use [mergeLokiResponse](https://github.com/grafana/loki/blob/e2842c69c5e089389b697faa792e368f42f84c19/pkg/querier/queryrange/codec.go#L1014) for building the test data, but they were not setting the `limit` in test queries which was causing `mergeLokiResponse` to send back empty results. This means were were always comparing two empty results, which would always be the same. This PR also takes care of fixing it, and relevant changes to correct the test. **Checklist** - [x] Tests updated - [x] `CHANGELOG.md` updated
3 years ago
Labels: labels,
},
},
},
}
fix bugs in logs results caching and its tests (#7925) **What this PR does / why we need it**: When a logs query results in an empty response, we cache it to avoid doing that query again and respond straight away with an empty response. However, we cache a single entry per time split interval with the query itself to keep things simple. For example, if the time split config for tenant `A` is `30m`, then queries for intervals `10m`-`20m` and `21m`-`25m` would have the same cache key. Here is roughly how cache hit is handled: * If the new query is within the cached query bounds, return empty results * If the start of new query is before the start time of the cached query, do a query from `newQuery.Start` to `cachedQuery.Start` * If the response of last query is also empty, set `cachedQuery.Start` = `newQuery.Start` * If the end of new query is after the end time of the cached query, do a query from `cachedQuery.End` to `newQuery.Start` * If the response of last query is also empty, set `cachedQuery.End` = `newQuery.End` * If we have changes in `cachedQuery.Start/End`, update it in the cache. The problem here is when we do queries to fill the gap, we sometimes do queries for the range outside of what the user requested and respond back without reducing the response to what the user requested. For example, if the cached query is from `21m`-`25m` and the user query is from `10m`-`15m`, we will query for the whole gap i.e `10m`-`21m`. If there are logs from `15m`-`21m` in the response, we will unexpectedly send it back to the user. This PR takes care of this issue by extracting the data and sending back only the user's requested logs. I have also found the tests for logs results cache were incorrect. They heavily use [mergeLokiResponse](https://github.com/grafana/loki/blob/e2842c69c5e089389b697faa792e368f42f84c19/pkg/querier/queryrange/codec.go#L1014) for building the test data, but they were not setting the `limit` in test queries which was causing `mergeLokiResponse` to send back empty results. This means were were always comparing two empty results, which would always be the same. This PR also takes care of fixing it, and relevant changes to correct the test. **Checklist** - [x] Tests updated - [x] `CHANGELOG.md` updated
3 years ago
for ; !start.After(end); start = start.Add(time.Second) {
r.Data.Result[0].Entries = append(r.Data.Result[0].Entries, logproto.Entry{
Timestamp: start,
Line: fmt.Sprintf("%d", start.Unix()),
})
}
return r
}