loki/pkg/storage/stores/tsdb/index.go


package tsdb
import (
	"context"

	"github.com/prometheus/common/model"
	"github.com/prometheus/prometheus/model/labels"

	"github.com/grafana/loki/pkg/storage/chunk"
	"github.com/grafana/loki/pkg/storage/stores/tsdb/index"
)
type Series struct {
	Labels      labels.Labels
	Fingerprint model.Fingerprint
}
type ChunkRef struct {
	User        string
	Fingerprint model.Fingerprint
	Start, End  model.Time
	Checksum    uint32
}
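// Less compares two ChunkRefs by (Start, End).
// It assumes both refs belong to the same User.
// Note that it also returns true when Start and End are both equal.
func (r ChunkRef) Less(x ChunkRef) bool {
	if r.Start != x.Start {
		return r.Start < x.Start
	}
	return r.End <= x.End
}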
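// The (Start, End) ordering above can be sketched as a standalone program.
// This is only an illustration under stated assumptions: the ChunkRef below
// is a simplified stand-in that uses plain integers instead of model.Time and
// model.Fingerprint, and sortChunkRefs is a hypothetical helper that does not
// appear in this package. Because Less also returns true for refs with equal
// Start and End, it is not a strict ordering, so the sample data deliberately
// uses distinct (Start, End) pairs.

```go
package main

import (
	"fmt"
	"sort"
)

// ChunkRef is a simplified stand-in for the type in this file: integer
// fields replace model.Time and model.Fingerprint so that the sketch has
// no external dependencies.
type ChunkRef struct {
	User        string
	Fingerprint uint64
	Start, End  int64
	Checksum    uint32
}

// Less mirrors the comparison in this file: order by Start, then by End.
func (r ChunkRef) Less(x ChunkRef) bool {
	if r.Start != x.Start {
		return r.Start < x.Start
	}
	return r.End <= x.End
}

// sortChunkRefs is a hypothetical helper that orders refs in place
// by (Start, End) using Less.
func sortChunkRefs(refs []ChunkRef) {
	sort.SliceStable(refs, func(i, j int) bool {
		return refs[i].Less(refs[j])
	})
}

func main() {
	refs := []ChunkRef{
		{User: "tenant-a", Start: 8, End: 12},
		{User: "tenant-a", Start: 5, End: 7},
		{User: "tenant-a", Start: 5, End: 6},
	}
	sortChunkRefs(refs)
	for _, r := range refs {
		fmt.Printf("%d-%d\n", r.Start, r.End)
	}
}
```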
type shouldIncludeChunk func(index.ChunkMeta) bool
type Index interface {
	Bounded
	SetChunkFilterer(chunkFilter chunk.RequestChunkFilterer)
	Close() error
	// GetChunkRefs accepts an optional []ChunkRef argument.
	// If it is not nil, that slice is used to build the result,
	// which lets the caller avoid unnecessary allocations.
	// If it is nil, the underlying index implementation must still
	// build the resulting slice (it should not panic),
	// ideally by requesting a slice from the pool.
	// Shard is also optional. If it is not nil, TSDB limits the result to
	// the requested shard. If it is nil, TSDB returns all results,
	// regardless of shard.
	// Note: the shard factor must be a power of two, meaning `0_of_2` and `3_of_4` are fine, but `0_of_3` is not.
	GetChunkRefs(ctx context.Context, userID string, from, through model.Time, res []ChunkRef, shard *index.ShardAnnotation, matchers ...*labels.Matcher) ([]ChunkRef, error)
	// Series follows the same semantics regarding the passed slice and shard as GetChunkRefs.
	Series(ctx context.Context, userID string, from, through model.Time, res []Series, shard *index.ShardAnnotation, matchers ...*labels.Matcher) ([]Series, error)
	LabelNames(ctx context.Context, userID string, from, through model.Time, matchers ...*labels.Matcher) ([]string, error)
	LabelValues(ctx context.Context, userID string, from, through model.Time, name string, matchers ...*labels.Matcher) ([]string, error)
	Stats(ctx context.Context, userID string, from, through model.Time, acc IndexStatsAccumulator, shard *index.ShardAnnotation, shouldIncludeChunk shouldIncludeChunk, matchers ...*labels.Matcher) error
}
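// The optional result slice and the shard rule described on GetChunkRefs can
// be sketched as follows. This is a minimal illustration, not the package's
// implementation: chunkRefsPool, collect, and validShard are hypothetical
// names, ChunkRef is a simplified stand-in, and two behaviours are
// assumptions made for the sketch: a caller-supplied slice is truncated to
// length zero before reuse, and a shard factor of 1 counts as a power of two.

```go
package main

import (
	"fmt"
	"sync"
)

// ChunkRef is a simplified stand-in for the type in this file.
type ChunkRef struct {
	Start, End int64
}

// chunkRefsPool is a hypothetical pool of reusable result slices.
var chunkRefsPool = sync.Pool{
	New: func() interface{} { return make([]ChunkRef, 0, 64) },
}

// collect is a hypothetical helper showing the optional-slice contract:
// a non-nil res is reused as the result buffer, while a nil res is
// replaced by a slice taken from the pool, so the call never panics.
func collect(res []ChunkRef, found []ChunkRef) []ChunkRef {
	if res == nil {
		res = chunkRefsPool.Get().([]ChunkRef)
	}
	// Assumption: previous contents of the buffer are discarded.
	res = res[:0]
	return append(res, found...)
}

// validShard reports whether shard `shard` of `of` satisfies the rule in
// the comment: `of` must be a power of two and shard must be below it.
func validShard(shard, of uint32) bool {
	isPowerOfTwo := of > 0 && of&(of-1) == 0
	return isPowerOfTwo && shard < of
}

func main() {
	found := []ChunkRef{{Start: 5, End: 7}, {Start: 8, End: 12}}

	// The caller supplies a buffer; its backing array is reused.
	buf := make([]ChunkRef, 0, 8)
	out := collect(buf, found)
	fmt.Println(len(out), cap(out))

	// The caller passes nil; a pooled slice is used instead.
	out = collect(nil, found)
	fmt.Println(len(out))

	fmt.Println(validShard(0, 2), validShard(3, 4), validShard(0, 3))
}
```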
type NoopIndex struct{}
func (NoopIndex) Close() error { return nil }
func (NoopIndex) Bounds() (from, through model.Time) { return }
func (NoopIndex) GetChunkRefs(ctx context.Context, userID string, from, through model.Time, res []ChunkRef, shard *index.ShardAnnotation, matchers ...*labels.Matcher) ([]ChunkRef, error) {
	return nil, nil
}

// Series follows the same semantics regarding the passed slice and shard as GetChunkRefs.
func (NoopIndex) Series(ctx context.Context, userID string, from, through model.Time, res []Series, shard *index.ShardAnnotation, matchers ...*labels.Matcher) ([]Series, error) {
	return nil, nil
}

func (NoopIndex) LabelNames(ctx context.Context, userID string, from, through model.Time, matchers ...*labels.Matcher) ([]string, error) {
	return nil, nil
}

func (NoopIndex) LabelValues(ctx context.Context, userID string, from, through model.Time, name string, matchers ...*labels.Matcher) ([]string, error) {
	return nil, nil
}
func (NoopIndex) Stats(ctx context.Context, userID string, from, through model.Time, acc IndexStatsAccumulator, shard *index.ShardAnnotation, shouldIncludeChunk shouldIncludeChunk, matchers ...*labels.Matcher) error {
	return nil
}
func (NoopIndex) SetChunkFilterer(chunkFilter chunk.RequestChunkFilterer) {}