loki/pkg/storage/chunk/bigchunk.go

package chunk
import (
"bytes"
"encoding/binary"
"errors"
"io"
"github.com/prometheus/common/model"
"github.com/prometheus/prometheus/tsdb/chunkenc"
"github.com/grafana/loki/pkg/util/filter"
)
const samplesPerChunk = 120
var errOutOfBounds = errors.New("out of bounds")
type smallChunk struct {
chunkenc.XORChunk
start int64
}
// bigchunk is a set of prometheus/tsdb chunks. It grows over time and has no
// upper bound on the number of samples it can contain.
type bigchunk struct {
chunks []smallChunk
appender chunkenc.Appender
remainingSamples int
}
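// Invariants maintained by Add and addNextChunk below: appender always writes
// into the most recently added subchunk, and remainingSamples counts down from
// samplesPerChunk, triggering a new subchunk when it reaches zero.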
func newBigchunk() *bigchunk {
return &bigchunk{}
}
// TODO(owen-d): remove bigchunk from our code, we don't use it.
// Hack an Entries() impl
func (b *bigchunk) Entries() int { return 0 }
func (b *bigchunk) Add(sample model.SamplePair) (Data, error) {
if b.remainingSamples == 0 {
if bigchunkSizeCapBytes > 0 && b.Size() > bigchunkSizeCapBytes {
return addToOverflowChunk(sample)
}
if err := b.addNextChunk(sample.Timestamp); err != nil {
return nil, err
}
}
b.appender.Append(int64(sample.Timestamp), float64(sample.Value))
b.remainingSamples--
return nil, nil
}
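// Illustrative sketch, not part of the original file: feeding samples through
// Add. With samplesPerChunk = 120, the loop below ends up with three XOR
// subchunks. The first return value of Add (an overflow Data) is ignored here
// for brevity; it is only non-nil once bigchunkSizeCapBytes is exceeded.
func exampleAddSamples() (*bigchunk, error) {
	c := newBigchunk()
	for i := 0; i < 300; i++ {
		if _, err := c.Add(model.SamplePair{
			Timestamp: model.Time(int64(i) * 1000),
			Value:     model.SampleValue(i),
		}); err != nil {
			return nil, err
		}
	}
	return c, nil
}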
// addNextChunk adds a new XOR "subchunk" to the internal list of chunks.
func (b *bigchunk) addNextChunk(start model.Time) error {
// To save memory, we "compact" the previous chunk - the array backing the slice
// can be up to 2x too big, and we can save this space by copying it.
const chunkCapacityExcess = 32 // don't bother copying if it's within this range
if l := len(b.chunks); l > 0 {
oldBuf := b.chunks[l-1].XORChunk.Bytes()
if cap(oldBuf) > len(oldBuf)+chunkCapacityExcess {
buf := make([]byte, len(oldBuf))
copy(buf, oldBuf)
compacted, err := chunkenc.FromData(chunkenc.EncXOR, buf)
if err != nil {
return err
}
b.chunks[l-1].XORChunk = *compacted.(*chunkenc.XORChunk)
}
}
// Explicitly reallocate slice to avoid up to 2x overhead if we let append() do it
if len(b.chunks)+1 > cap(b.chunks) {
newChunks := make([]smallChunk, len(b.chunks), len(b.chunks)+1)
copy(newChunks, b.chunks)
b.chunks = newChunks
}
b.chunks = append(b.chunks, smallChunk{
XORChunk: *chunkenc.NewXORChunk(),
start: int64(start),
})
appender, err := b.chunks[len(b.chunks)-1].Appender()
if err != nil {
return err
}
b.appender = appender
b.remainingSamples = samplesPerChunk
return nil
}
func (b *bigchunk) Rebound(_, _ model.Time, _ filter.Func) (Data, error) {
return nil, errors.New("not implemented")
}
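// Serialized layout produced by Marshal and consumed by UnmarshalFromBuf
// (all integers are 2-byte little endian):
//
//	uint16  number of subchunks
//	then, for each subchunk:
//	  uint16  length of the encoded XOR chunk
//	  bytes   the raw XOR chunk data
//
// Size() below mirrors this: 2 + sum(2 + len(subchunk bytes)).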
func (b *bigchunk) Marshal(wio io.Writer) error {
w := writer{wio}
if err := w.WriteVarInt16(uint16(len(b.chunks))); err != nil {
return err
}
for _, chunk := range b.chunks {
buf := chunk.Bytes()
if err := w.WriteVarInt16(uint16(len(buf))); err != nil {
return err
}
if _, err := w.Write(buf); err != nil {
return err
}
}
return nil
}
func (b *bigchunk) MarshalToBuf(buf []byte) error {
writer := bytes.NewBuffer(buf)
return b.Marshal(writer)
}
func (b *bigchunk) UnmarshalFromBuf(buf []byte) error {
r := reader{buf: buf}
numChunks, err := r.ReadUint16()
if err != nil {
return err
}
b.chunks = make([]smallChunk, 0, numChunks+1) // allow one extra space in case we want to add new data
var reuseIter chunkenc.Iterator
for i := uint16(0); i < numChunks; i++ {
chunkLen, err := r.ReadUint16()
if err != nil {
return err
}
chunkBuf, err := r.ReadBytes(int(chunkLen))
if err != nil {
return err
}
chunk, err := chunkenc.FromData(chunkenc.EncXOR, chunkBuf)
if err != nil {
return err
}
var start int64
start, reuseIter, err = firstTime(chunk, reuseIter)
if err != nil {
return err
}
b.chunks = append(b.chunks, smallChunk{
XORChunk: *chunk.(*chunkenc.XORChunk),
start: start,
})
}
return nil
}
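// Illustrative sketch, not part of the original file: a Marshal/Unmarshal
// round trip using the methods above. A bytes.Buffer stands in for whatever
// io.Writer the store actually uses.
func exampleRoundTrip(src *bigchunk) (*bigchunk, error) {
	var buf bytes.Buffer
	if err := src.Marshal(&buf); err != nil {
		return nil, err
	}
	dst := newBigchunk()
	if err := dst.UnmarshalFromBuf(buf.Bytes()); err != nil {
		return nil, err
	}
	return dst, nil
}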
func (b *bigchunk) Encoding() Encoding {
return Bigchunk
}
func (b *bigchunk) Utilization() float64 {
return 1.0
}
// Len returns the total number of samples across all subchunks.
func (b *bigchunk) Len() int {
sum := 0
for _, c := range b.chunks {
sum += c.NumSamples()
}
return sum
}
// Unused, but for compatibility
func (b *bigchunk) UncompressedSize() int { return b.Size() }
func (b *bigchunk) Size() int {
sum := 2 // For the number of sub chunks.
for _, c := range b.chunks {
sum += 2 // For the length of the sub chunk.
sum += len(c.Bytes())
}
return sum
}
func (b *bigchunk) Slice(start, end model.Time) Data {
i, j := 0, len(b.chunks)
for k := 0; k < len(b.chunks); k++ {
if b.chunks[k].start <= int64(start) {
i = k
}
if b.chunks[k].start > int64(end) {
j = k
break
}
}
return &bigchunk{
chunks: b.chunks[i:j],
}
}
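// Note on Slice above: it operates at subchunk granularity and returns a view
// that shares b's underlying slice, so the result may contain samples earlier
// than start (the last subchunk starting at or before start is included whole).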
type writer struct {
io.Writer
}
// WriteVarInt16 writes i as a fixed-width 2-byte little-endian value; despite
// the name, no variable-length encoding is used.
func (w writer) WriteVarInt16(i uint16) error {
var b [2]byte
binary.LittleEndian.PutUint16(b[:], i)
_, err := w.Write(b[:])
return err
}
type reader struct {
i int
buf []byte
}
func (r *reader) ReadUint16() (uint16, error) {
if r.i+2 > len(r.buf) {
return 0, errOutOfBounds
}
result := binary.LittleEndian.Uint16(r.buf[r.i:])
r.i += 2
return result, nil
}
func (r *reader) ReadBytes(count int) ([]byte, error) {
if r.i+count > len(r.buf) {
return nil, errOutOfBounds
}
result := r.buf[r.i : r.i+count]
r.i += count
return result, nil
}
// addToOverflowChunk is a utility function that creates a new chunk as overflow
// chunk, adds the provided sample to it, and returns a chunk slice containing
// the provided old chunk followed by the new overflow chunk.
func addToOverflowChunk(s model.SamplePair) (Data, error) {
overflowChunk := New()
_, err := overflowChunk.(*bigchunk).Add(s)
if err != nil {
return nil, err
}
return overflowChunk, nil
}
// firstTime returns the timestamp of the first sample in c, reusing iter to
// avoid allocating a new iterator.
func firstTime(c chunkenc.Chunk, iter chunkenc.Iterator) (int64, chunkenc.Iterator, error) {
var first int64
iter = c.Iterator(iter)
if iter.Next() != chunkenc.ValNone {
first, _ = iter.At()
}
return first, iter, iter.Err()
}