Like Prometheus, but for logs.
You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
loki/pkg/chunkenc/memchunk_test.go

1878 lines
56 KiB

Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
package chunkenc
import (
"bytes"
"context"
"encoding/binary"
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
"fmt"
"math"
"math/rand"
"sort"
"strconv"
"strings"
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
"testing"
"time"
"github.com/dustin/go-humanize"
"github.com/prometheus/prometheus/model/labels"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
"github.com/grafana/loki/pkg/chunkenc/testdata"
"github.com/grafana/loki/pkg/iter"
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
"github.com/grafana/loki/pkg/logproto"
"github.com/grafana/loki/pkg/logql/log"
"github.com/grafana/loki/pkg/logql/syntax"
"github.com/grafana/loki/pkg/logqlmodel/stats"
"github.com/grafana/loki/pkg/push"
"github.com/grafana/loki/pkg/storage/chunk"
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
)
var testEncoding = []Encoding{
EncNone,
EncGZIP,
EncLZ4_64k,
EncLZ4_256k,
EncLZ4_1M,
EncLZ4_4M,
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
EncSnappy,
EncFlate,
EncZstd,
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
}
var (
testBlockSize = 256 * 1024
testTargetSize = 1500 * 1024
testBlockSizes = []int{64 * 1024, 256 * 1024, 512 * 1024}
countExtractor = func() log.StreamSampleExtractor {
ex, err := log.NewLineSampleExtractor(log.CountExtractor, nil, nil, false, false)
if err != nil {
panic(err)
}
return ex.ForStream(labels.Labels{})
}()
allPossibleFormats = []struct {
headBlockFmt HeadBlockFmt
chunkFormat byte
}{
{
headBlockFmt: OrderedHeadBlockFmt,
chunkFormat: chunkFormatV2,
},
{
headBlockFmt: OrderedHeadBlockFmt,
chunkFormat: chunkFormatV3,
},
{
headBlockFmt: UnorderedHeadBlockFmt,
chunkFormat: chunkFormatV3,
},
{
headBlockFmt: UnorderedWithNonIndexedLabelsHeadBlockFmt,
chunkFormat: chunkFormatV4,
},
}
)
const DefaultTestHeadBlockFmt = OrderedHeadBlockFmt
func TestBlocksInclusive(t *testing.T) {
chk := NewMemChunk(EncNone, DefaultTestHeadBlockFmt, testBlockSize, testTargetSize)
err := chk.Append(logprotoEntry(1, "1"))
require.Nil(t, err)
err = chk.cut()
require.Nil(t, err)
blocks := chk.Blocks(time.Unix(0, 1), time.Unix(0, 1))
require.Equal(t, 1, len(blocks))
require.Equal(t, 1, blocks[0].Entries())
}
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
func TestBlock(t *testing.T) {
for _, enc := range testEncoding {
enc := enc
for _, format := range allPossibleFormats {
chunkFormat, headBlockFmt := format.chunkFormat, format.headBlockFmt
t.Run(fmt.Sprintf("encoding:%v chunkFormat:%v headBlockFmt:%v", enc, chunkFormat, headBlockFmt), func(t *testing.T) {
t.Parallel()
chk := newMemChunkWithFormat(chunkFormat, enc, headBlockFmt, testBlockSize, testTargetSize)
cases := []struct {
ts int64
str string
lbs []logproto.LabelAdapter
cut bool
}{
{
ts: 1,
str: "hello, world!",
},
{
ts: 2,
str: "hello, world2!",
lbs: []logproto.LabelAdapter{
{Name: "app", Value: "myapp"},
},
},
{
ts: 3,
str: "hello, world3!",
lbs: []logproto.LabelAdapter{
{Name: "a", Value: "a"},
{Name: "b", Value: "b"},
},
},
{
ts: 4,
str: "hello, world4!",
},
{
ts: 5,
str: "hello, world5!",
},
{
ts: 6,
str: "hello, world6!",
cut: true,
},
{
ts: 7,
str: "hello, world7!",
},
{
ts: 8,
str: "hello, worl\nd8!",
},
{
ts: 8,
str: "hello, world 8, 2!",
},
{
ts: 8,
str: "hello, world 8, 3!",
},
{
ts: 9,
str: "",
},
{
ts: 10,
str: "hello, world10!",
lbs: []logproto.LabelAdapter{
{Name: "a", Value: "a2"},
{Name: "b", Value: "b"},
},
},
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
}
Improve metric queries by computing samples at the edges. (#2293) * First pass breaking the code appart. Wondering how we're going to achieve fast mutation of labels. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Work in progress. I realize I need hash for deduping lines. going to benchmark somes. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Tested some hash and decided which one to use. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Wip Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Starting working on ingester. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Trying to find a better hash function. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * More hash testing we have a winner. xxhash it is. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Settle on xxhash Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Better params interfacing. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add interface for queryparams for things that exist in both type of params. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add storage sample iterator implementations. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Fixing tests and verifying we don't get collions for the hashing method. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Fixing ingesters tests and refactoring utility function/tests. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Fixing and testing that stats are still well computed. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Fixing more tests. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * More engine tests finished. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Fixes sharding evaluator. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Fixes more engine tests. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Fix error tests in the engine. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Finish fixing all tests. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Fixes a bug where extractor was not passed in correctly. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add notes about upgrade. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Renamed and fix a bug. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add memchunk tests and starting test for sampleIterator. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Test heap sample iterator. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * working on test. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Finishing testing all new iterators. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Making sure all store functions are tested. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Benchmark and verify everything is working well. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Make the linter happy. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * use xxhash v2. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Fix a flaky test because of map. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * go.mod. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> Co-authored-by: Edward Welch <edward.welch@grafana.com>
6 years ago
for _, c := range cases {
require.NoError(t, chk.Append(logprotoEntryWithNonIndexedLabels(c.ts, c.str, c.lbs)))
if c.cut {
require.NoError(t, chk.cut())
}
}
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
var noopStreamPipeline = log.NewNoopPipeline().ForStream(labels.Labels{})
it, err := chk.Iterator(context.Background(), time.Unix(0, 0), time.Unix(0, math.MaxInt64), logproto.FORWARD, noopStreamPipeline)
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
require.NoError(t, err)
idx := 0
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
for it.Next() {
e := it.Entry()
require.Equal(t, cases[idx].ts, e.Timestamp.UnixNano())
require.Equal(t, cases[idx].str, e.Line)
require.Empty(t, e.NonIndexedLabels)
if chunkFormat < chunkFormatV4 {
require.Equal(t, labels.EmptyLabels().String(), it.Labels())
} else {
expectedLabels := logproto.FromLabelAdaptersToLabels(cases[idx].lbs).String()
require.Equal(t, expectedLabels, it.Labels())
}
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
idx++
}
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
require.NoError(t, it.Error())
require.NoError(t, it.Close())
require.Equal(t, len(cases), idx)
countExtractor = func() log.StreamSampleExtractor {
ex, err := log.NewLineSampleExtractor(log.CountExtractor, nil, nil, false, false)
if err != nil {
panic(err)
}
return ex.ForStream(labels.Labels{})
}()
sampleIt := chk.SampleIterator(context.Background(), time.Unix(0, 0), time.Unix(0, math.MaxInt64), countExtractor)
idx = 0
for sampleIt.Next() {
s := sampleIt.Sample()
require.Equal(t, cases[idx].ts, s.Timestamp)
require.Equal(t, 1., s.Value)
require.NotEmpty(t, s.Hash)
idx++
}
require.NoError(t, sampleIt.Error())
require.NoError(t, sampleIt.Close())
require.Equal(t, len(cases), idx)
t.Run("bounded-iteration", func(t *testing.T) {
it, err := chk.Iterator(context.Background(), time.Unix(0, 3), time.Unix(0, 7), logproto.FORWARD, noopStreamPipeline)
require.NoError(t, err)
idx := 2
for it.Next() {
e := it.Entry()
require.Equal(t, cases[idx].ts, e.Timestamp.UnixNano())
require.Equal(t, cases[idx].str, e.Line)
idx++
}
require.NoError(t, it.Error())
require.Equal(t, 6, idx)
})
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
})
}
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
}
}
chunks: decode varints directly from byte buffer; stop panicing on some corrupt inputs (#7264) **What this PR does / why we need it**: This is much faster, since we don't pay for byte-at-a-time reads from the buffer. For best performance with gzip and lz4, we still want a bufio.Reader. We don't need the pool of buffered readers now. Other compression formats have their own internal buffers so don't need extra buffering. **Checklist** - [x] Reviewed the `CONTRIBUTING.md` guide - NA Documentation added - NA Tests updated - [x] `CHANGELOG.md` updated - NA Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md` (using changes from https://github.com/grafana/loki/pull/7246, ignoring lines that didn't change much) ``` name old time/op new time/op delta Write/ordered-lz4-256k-4 24.1ms ± 7% 22.1ms ± 2% -8.52% (p=0.008 n=5+5) Write/unordered-lz4-4 54.6ms ±29% 49.1ms ± 1% -10.04% (p=0.016 n=5+4) Read/none_66_kB-4 4.16ms ± 6% 2.54ms ± 2% -38.87% (p=0.008 n=5+5) Read/gzip_66_kB-4 177ms ± 4% 161ms ± 4% -9.09% (p=0.008 n=5+5) Read/lz4-64k_66_kB-4 59.5ms ± 2% 49.5ms ± 3% -16.80% (p=0.008 n=5+5) Read/lz4-256k_66_kB-4 62.9ms ± 0% 51.2ms ± 1% -18.58% (p=0.016 n=4+5) Read/lz4-1M_66_kB-4 62.9ms ± 2% 51.5ms ± 1% -18.03% (p=0.008 n=5+5) Read/lz4_66_kB-4 63.7ms ± 4% 51.2ms ± 1% -19.68% (p=0.008 n=5+5) Read/snappy_66_kB-4 51.1ms ± 2% 41.9ms ± 1% -17.95% (p=0.016 n=5+4) Read/flate_66_kB-4 168ms ± 4% 150ms ± 2% -10.49% (p=0.008 n=5+5) Read/zstd_66_kB-4 334ms ±16% 272ms ± 6% -18.52% (p=0.008 n=5+5) Read/none_262_kB-4 3.93ms ± 1% 2.48ms ± 2% -36.91% (p=0.008 n=5+5) Read/gzip_262_kB-4 159ms ± 5% 143ms ± 2% -10.12% (p=0.008 n=5+5) Read/lz4-64k_262_kB-4 53.5ms ± 1% 43.2ms ± 3% -19.35% (p=0.008 n=5+5) Read/lz4-256k_262_kB-4 55.8ms ± 1% 45.3ms ± 2% -18.75% (p=0.008 n=5+5) Read/lz4-1M_262_kB-4 57.5ms ± 2% 45.5ms ± 2% -20.91% (p=0.008 n=5+5) Read/lz4_262_kB-4 56.8ms ± 2% 45.3ms ± 2% -20.26% (p=0.008 n=5+5) Read/snappy_262_kB-4 46.0ms ± 2% 37.5ms ± 1% -18.57% (p=0.008 n=5+5) Read/flate_262_kB-4 153ms ± 3% 134ms ± 2% -12.33% (p=0.008 n=5+5) Read/none_524_kB-4 3.85ms ± 2% 2.46ms ± 2% -36.04% (p=0.008 n=5+5) Read/gzip_524_kB-4 128ms ± 2% 119ms ± 2% -7.64% (p=0.008 n=5+5) Read/lz4-64k_524_kB-4 44.6ms ± 3% 34.7ms ± 1% -22.19% (p=0.008 n=5+5) Read/lz4-256k_524_kB-4 45.4ms ± 1% 36.7ms ± 5% -19.07% (p=0.008 n=5+5) Read/lz4-1M_524_kB-4 47.3ms ± 1% 38.6ms ± 2% -18.36% (p=0.008 n=5+5) Read/lz4_524_kB-4 47.4ms ± 1% 38.0ms ± 1% -19.70% (p=0.008 n=5+5) Read/snappy_524_kB-4 37.3ms ± 1% 31.1ms ± 3% -16.47% (p=0.008 n=5+5) Read/flate_524_kB-4 122ms ± 1% 111ms ± 3% -9.04% (p=0.008 n=5+5) Read/sample_none_66_kB-4 7.81ms ± 1% 6.62ms ± 6% -15.24% (p=0.008 n=5+5) Read/sample_gzip_66_kB-4 237ms ± 2% 219ms ± 1% -7.57% (p=0.008 n=5+5) Read/sample_lz4-64k_66_kB-4 105ms ± 2% 92ms ± 1% -12.03% (p=0.008 n=5+5) Read/sample_lz4-256k_66_kB-4 109ms ± 1% 98ms ± 2% -9.65% (p=0.008 n=5+5) Read/sample_lz4-1M_66_kB-4 113ms ± 8% 98ms ± 2% -13.66% (p=0.008 n=5+5) Read/sample_lz4_66_kB-4 109ms ± 1% 99ms ± 2% -9.29% (p=0.008 n=5+5) Read/sample_snappy_66_kB-4 86.6ms ± 1% 77.2ms ± 6% -10.92% (p=0.008 n=5+5) Read/sample_flate_66_kB-4 236ms ± 8% 214ms ± 3% -9.11% (p=0.008 n=5+5) Read/sample_none_262_kB-4 7.70ms ± 2% 6.34ms ± 2% -17.67% (p=0.008 n=5+5) Read/sample_lz4-64k_262_kB-4 94.7ms ± 1% 84.4ms ± 2% -10.90% (p=0.008 n=5+5) Read/sample_lz4-256k_262_kB-4 98.4ms ± 1% 87.7ms ± 2% -10.93% (p=0.008 n=5+5) Read/sample_lz4-1M_262_kB-4 101ms ± 2% 90ms ± 2% -11.19% (p=0.008 n=5+5) Read/sample_lz4_262_kB-4 100ms ± 2% 90ms ± 3% -10.78% (p=0.008 n=5+5) Read/sample_snappy_262_kB-4 77.9ms ± 2% 68.7ms ± 2% -11.84% (p=0.008 n=5+5) Read/sample_flate_262_kB-4 211ms ± 2% 190ms ± 2% -9.86% (p=0.008 n=5+5) Read/sample_none_524_kB-4 7.82ms ± 2% 6.43ms ± 2% -17.80% (p=0.008 n=5+5) Read/sample_gzip_524_kB-4 176ms ± 1% 164ms ± 2% -6.80% (p=0.008 n=5+5) Read/sample_lz4-64k_524_kB-4 77.7ms ± 1% 69.8ms ± 4% -10.10% (p=0.008 n=5+5) Read/sample_lz4-256k_524_kB-4 81.7ms ± 2% 72.1ms ± 3% -11.79% (p=0.008 n=5+5) Read/sample_lz4-1M_524_kB-4 83.8ms ± 1% 74.4ms ± 1% -11.21% (p=0.008 n=5+5) Read/sample_lz4_524_kB-4 83.2ms ± 0% 76.1ms ± 8% -8.53% (p=0.016 n=4+5) Read/sample_snappy_524_kB-4 63.6ms ± 2% 56.5ms ± 2% -11.23% (p=0.008 n=5+5) Read/sample_flate_524_kB-4 172ms ± 2% 157ms ± 1% -8.88% (p=0.008 n=5+5) name old speed new speed delta Write/ordered-lz4-256k-4 687MB/s ± 6% 749MB/s ± 2% +9.09% (p=0.008 n=5+5) Write/unordered-lz4-4 313MB/s ±24% 341MB/s ± 1% +9.23% (p=0.016 n=5+4) Read/none_66_kB-4 1.82GB/s ± 6% 2.97GB/s ± 2% +63.42% (p=0.008 n=5+5) Read/gzip_66_kB-4 669MB/s ± 4% 736MB/s ± 4% +10.00% (p=0.008 n=5+5) Read/lz4-64k_66_kB-4 1.42GB/s ± 2% 1.71GB/s ± 3% +20.21% (p=0.008 n=5+5) Read/lz4-256k_66_kB-4 1.43GB/s ± 0% 1.76GB/s ± 1% +22.83% (p=0.016 n=4+5) Read/lz4-1M_66_kB-4 1.43GB/s ± 2% 1.74GB/s ± 1% +21.99% (p=0.008 n=5+5) Read/lz4_66_kB-4 1.41GB/s ± 4% 1.76GB/s ± 1% +24.44% (p=0.008 n=5+5) Read/snappy_66_kB-4 1.29GB/s ± 2% 1.57GB/s ± 1% +21.86% (p=0.016 n=5+4) Read/flate_66_kB-4 710MB/s ± 4% 793MB/s ± 2% +11.70% (p=0.008 n=5+5) Read/zstd_66_kB-4 425MB/s ±14% 517MB/s ± 6% +21.74% (p=0.008 n=5+5) Read/none_262_kB-4 1.93GB/s ± 1% 3.06GB/s ± 2% +58.54% (p=0.008 n=5+5) Read/gzip_262_kB-4 701MB/s ± 5% 779MB/s ± 2% +11.20% (p=0.008 n=5+5) Read/lz4-64k_262_kB-4 1.47GB/s ± 1% 1.82GB/s ± 3% +24.03% (p=0.008 n=5+5) Read/lz4-256k_262_kB-4 1.48GB/s ± 1% 1.82GB/s ± 2% +23.09% (p=0.008 n=5+5) Read/lz4-1M_262_kB-4 1.46GB/s ± 2% 1.84GB/s ± 2% +26.44% (p=0.008 n=5+5) Read/lz4_262_kB-4 1.48GB/s ± 2% 1.85GB/s ± 2% +25.42% (p=0.008 n=5+5) Read/snappy_262_kB-4 1.31GB/s ± 2% 1.61GB/s ± 1% +22.80% (p=0.008 n=5+5) Read/flate_262_kB-4 729MB/s ± 3% 831MB/s ± 2% +14.05% (p=0.008 n=5+5) Read/none_524_kB-4 1.97GB/s ± 2% 3.08GB/s ± 2% +56.33% (p=0.008 n=5+5) Read/gzip_524_kB-4 714MB/s ± 2% 773MB/s ± 2% +8.27% (p=0.008 n=5+5) Read/lz4-64k_524_kB-4 1.47GB/s ± 3% 1.88GB/s ± 1% +28.48% (p=0.008 n=5+5) Read/lz4-256k_524_kB-4 1.50GB/s ± 1% 1.86GB/s ± 5% +23.63% (p=0.008 n=5+5) Read/lz4-1M_524_kB-4 1.49GB/s ± 1% 1.83GB/s ± 2% +22.51% (p=0.008 n=5+5) Read/lz4_524_kB-4 1.49GB/s ± 1% 1.86GB/s ± 1% +24.53% (p=0.008 n=5+5) Read/snappy_524_kB-4 1.34GB/s ± 1% 1.60GB/s ± 3% +19.75% (p=0.008 n=5+5) Read/flate_524_kB-4 750MB/s ± 1% 824MB/s ± 3% +9.97% (p=0.008 n=5+5) Read/sample_none_66_kB-4 967MB/s ± 1% 1142MB/s ± 6% +18.13% (p=0.008 n=5+5) Read/sample_gzip_66_kB-4 501MB/s ± 2% 542MB/s ± 1% +8.19% (p=0.008 n=5+5) Read/sample_lz4-64k_66_kB-4 811MB/s ± 2% 922MB/s ± 1% +13.66% (p=0.008 n=5+5) Read/sample_lz4-256k_66_kB-4 825MB/s ± 1% 913MB/s ± 2% +10.68% (p=0.008 n=5+5) Read/sample_lz4-1M_66_kB-4 795MB/s ± 7% 920MB/s ± 1% +15.67% (p=0.008 n=5+5) Read/sample_lz4_66_kB-4 824MB/s ± 1% 909MB/s ± 2% +10.25% (p=0.008 n=5+5) Read/sample_snappy_66_kB-4 759MB/s ± 1% 852MB/s ± 6% +12.37% (p=0.008 n=5+5) Read/sample_flate_66_kB-4 506MB/s ± 7% 556MB/s ± 4% +9.88% (p=0.008 n=5+5) Read/sample_none_262_kB-4 983MB/s ± 2% 1194MB/s ± 2% +21.47% (p=0.008 n=5+5) Read/sample_lz4-64k_262_kB-4 830MB/s ± 1% 932MB/s ± 2% +12.24% (p=0.008 n=5+5) Read/sample_lz4-256k_262_kB-4 839MB/s ± 1% 942MB/s ± 2% +12.28% (p=0.008 n=5+5) Read/sample_lz4-1M_262_kB-4 829MB/s ± 2% 934MB/s ± 2% +12.60% (p=0.008 n=5+5) Read/sample_lz4_262_kB-4 836MB/s ± 2% 937MB/s ± 3% +12.11% (p=0.008 n=5+5) Read/sample_snappy_262_kB-4 774MB/s ± 2% 878MB/s ± 2% +13.44% (p=0.008 n=5+5) Read/sample_flate_262_kB-4 529MB/s ± 2% 587MB/s ± 2% +10.94% (p=0.008 n=5+5) Read/sample_none_524_kB-4 970MB/s ± 2% 1180MB/s ± 2% +21.64% (p=0.008 n=5+5) Read/sample_gzip_524_kB-4 521MB/s ± 1% 559MB/s ± 2% +7.31% (p=0.008 n=5+5) Read/sample_lz4-64k_524_kB-4 842MB/s ± 1% 937MB/s ± 4% +11.30% (p=0.008 n=5+5) Read/sample_lz4-256k_524_kB-4 834MB/s ± 2% 945MB/s ± 3% +13.37% (p=0.008 n=5+5) Read/sample_lz4-1M_524_kB-4 842MB/s ± 1% 949MB/s ± 1% +12.63% (p=0.008 n=5+5) Read/sample_lz4_524_kB-4 849MB/s ± 0% 929MB/s ± 7% +9.53% (p=0.016 n=4+5) Read/sample_snappy_524_kB-4 783MB/s ± 2% 882MB/s ± 2% +12.65% (p=0.008 n=5+5) Read/sample_flate_524_kB-4 534MB/s ± 2% 585MB/s ± 1% +9.73% (p=0.008 n=5+5) name old alloc/op new alloc/op delta Write/unordered-lz4-4 18.8MB ± 4% 17.2MB ± 7% -8.28% (p=0.008 n=5+5) Read/none_66_kB-4 41.9kB ± 0% 43.7kB ± 0% +4.41% (p=0.008 n=5+5) Read/gzip_66_kB-4 713kB ± 0% 741kB ± 0% +3.85% (p=0.016 n=5+4) Read/lz4-64k_66_kB-4 730kB ± 1% 748kB ± 0% +2.53% (p=0.016 n=5+4) Read/lz4-256k_66_kB-4 770kB ± 0% 806kB ± 3% +4.70% (p=0.016 n=4+5) Read/lz4-1M_66_kB-4 770kB ± 0% 812kB ± 4% +5.43% (p=0.016 n=4+5) Read/lz4_66_kB-4 770kB ± 0% 792kB ± 0% +2.83% (p=0.029 n=4+4) Read/snappy_66_kB-4 361kB ± 0% 376kB ± 0% +4.02% (p=0.008 n=5+5) Read/flate_66_kB-4 715kB ± 0% 743kB ± 0% +3.88% (p=0.016 n=5+4) Read/none_262_kB-4 11.2kB ± 0% 11.7kB ± 0% +4.12% (p=0.016 n=5+4) Read/gzip_262_kB-4 198kB ± 0% 203kB ± 0% +2.93% (p=0.016 n=4+5) Read/lz4-64k_262_kB-4 169kB ± 0% 174kB ± 0% +2.75% (p=0.008 n=5+5) Read/lz4-256k_262_kB-4 177kB ± 0% 182kB ± 0% +2.79% (p=0.008 n=5+5) Read/lz4-1M_262_kB-4 181kB ± 0% 186kB ± 0% +2.77% (p=0.008 n=5+5) Read/lz4_262_kB-4 181kB ± 0% 186kB ± 0% +2.73% (p=0.016 n=5+4) Read/snappy_262_kB-4 87.8kB ± 0% 90.1kB ± 0% +2.58% (p=0.008 n=5+5) Read/flate_262_kB-4 197kB ± 0% 203kB ± 0% +2.63% (p=0.016 n=5+4) Read/zstd_262_kB-4 553MB ± 0% 552MB ± 0% -0.15% (p=0.032 n=5+5) Read/none_524_kB-4 5.94kB ± 0% 6.16kB ± 0% +3.76% (p=0.016 n=5+4) Read/gzip_524_kB-4 96.2kB ± 0% 98.3kB ± 0% +2.16% (p=0.016 n=4+5) Read/lz4-64k_524_kB-4 70.9kB ± 0% 72.8kB ± 0% +2.71% (p=0.016 n=4+5) Read/lz4-256k_524_kB-4 73.8kB ± 0% 75.8kB ± 0% +2.68% (p=0.008 n=5+5) Read/lz4-1M_524_kB-4 76.5kB ± 0% 78.7kB ± 0% +2.77% (p=0.016 n=4+5) Read/lz4_524_kB-4 76.5kB ± 0% 78.6kB ± 0% +2.74% (p=0.016 n=5+4) Read/snappy_524_kB-4 39.1kB ± 0% 39.5kB ± 0% +1.19% (p=0.008 n=5+5) Read/flate_524_kB-4 95.4kB ± 0% 97.3kB ± 0% +2.06% (p=0.008 n=5+5) Read/sample_none_66_kB-4 39.9kB ± 0% 41.7kB ± 0% +4.62% (p=0.008 n=5+5) Read/sample_gzip_66_kB-4 686kB ± 0% 715kB ± 0% +4.23% (p=0.016 n=4+5) Read/sample_lz4-64k_66_kB-4 707kB ± 0% 728kB ± 0% +2.88% (p=0.016 n=5+4) Read/sample_lz4-256k_66_kB-4 749kB ± 0% 783kB ± 4% +4.65% (p=0.008 n=5+5) Read/sample_lz4_66_kB-4 748kB ± 0% 770kB ± 0% +2.91% (p=0.016 n=5+4) Read/sample_snappy_66_kB-4 350kB ± 0% 364kB ± 0% +4.04% (p=0.029 n=4+4) Read/sample_flate_66_kB-4 688kB ± 0% 716kB ± 0% +4.08% (p=0.029 n=4+4) Read/sample_none_262_kB-4 10.6kB ± 0% 11.0kB ± 0% +4.23% (p=0.016 n=5+4) Read/sample_gzip_262_kB-4 194kB ± 0% 199kB ± 0% +2.48% (p=0.016 n=4+5) Read/sample_lz4-64k_262_kB-4 165kB ± 0% 169kB ± 0% +2.86% (p=0.016 n=4+5) Read/sample_lz4-256k_262_kB-4 172kB ± 0% 177kB ± 0% +2.92% (p=0.016 n=4+5) Read/sample_lz4-1M_262_kB-4 176kB ± 0% 181kB ± 0% +2.91% (p=0.029 n=4+4) Read/sample_lz4_262_kB-4 176kB ± 0% 181kB ± 0% +2.81% (p=0.016 n=5+4) Read/sample_snappy_262_kB-4 88.1kB ± 0% 90.8kB ± 0% +3.11% (p=0.029 n=4+4) Read/sample_flate_262_kB-4 194kB ± 0% 198kB ± 0% +2.13% (p=0.029 n=4+4) Read/sample_none_524_kB-4 5.56kB ± 0% 5.76kB ± 0% +3.65% (p=0.016 n=5+4) Read/sample_gzip_524_kB-4 95.4kB ± 0% 96.9kB ± 0% +1.56% (p=0.008 n=5+5) Read/sample_lz4-64k_524_kB-4 69.0kB ± 0% 71.0kB ± 0% +2.87% (p=0.008 n=5+5) Read/sample_lz4-256k_524_kB-4 71.8kB ± 0% 73.8kB ± 0% +2.79% (p=0.016 n=4+5) Read/sample_lz4-1M_524_kB-4 74.5kB ± 0% 76.6kB ± 0% +2.75% (p=0.008 n=5+5) Read/sample_lz4_524_kB-4 74.5kB ± 0% 188.4kB ±89% +152.98% (p=0.008 n=5+5) Read/sample_snappy_524_kB-4 40.7kB ± 1% 41.3kB ± 1% +1.58% (p=0.008 n=5+5) Read/sample_flate_524_kB-4 95.2kB ± 0% 96.2kB ± 0% +0.95% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Read/gzip_66_kB-4 13.2k ± 0% 13.2k ± 0% -0.04% (p=0.000 n=5+4) Read/zstd_66_kB-4 18.0k ± 1% 17.8k ± 1% -1.06% (p=0.032 n=5+5) Read/gzip_262_kB-4 4.61k ± 0% 4.61k ± 0% -0.11% (p=0.029 n=4+4) Read/flate_262_kB-4 4.61k ± 0% 4.60k ± 0% -0.11% (p=0.008 n=5+5) Read/zstd_262_kB-4 7.10k ± 2% 6.88k ± 3% -3.01% (p=0.008 n=5+5) Read/gzip_524_kB-4 2.62k ± 0% 2.62k ± 0% -0.06% (p=0.000 n=4+5) Read/zstd_524_kB-4 6.13k ± 2% 5.94k ± 2% -3.03% (p=0.024 n=5+5) Read/sample_flate_66_kB-4 13.2k ± 0% 13.2k ± 0% -0.01% (p=0.029 n=4+4) Read/sample_flate_262_kB-4 4.62k ± 0% 4.61k ± 0% -0.22% (p=0.029 n=4+4) Read/sample_zstd_262_kB-4 7.12k ± 1% 6.97k ± 1% -2.11% (p=0.008 n=5+5) Read/sample_gzip_524_kB-4 2.63k ± 0% 2.63k ± 0% -0.11% (p=0.008 n=5+5) Read/sample_flate_524_kB-4 2.63k ± 0% 2.62k ± 0% -0.15% (p=0.008 n=5+5) Read/sample_zstd_524_kB-4 6.11k ± 2% 5.89k ± 1% -3.68% (p=0.016 n=5+4) ``` Co-authored-by: Owen Diehl <ow.diehl@gmail.com>
3 years ago
func TestCorruptChunk(t *testing.T) {
for _, enc := range testEncoding {
enc := enc
chunks: decode varints directly from byte buffer; stop panicing on some corrupt inputs (#7264) **What this PR does / why we need it**: This is much faster, since we don't pay for byte-at-a-time reads from the buffer. For best performance with gzip and lz4, we still want a bufio.Reader. We don't need the pool of buffered readers now. Other compression formats have their own internal buffers so don't need extra buffering. **Checklist** - [x] Reviewed the `CONTRIBUTING.md` guide - NA Documentation added - NA Tests updated - [x] `CHANGELOG.md` updated - NA Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md` (using changes from https://github.com/grafana/loki/pull/7246, ignoring lines that didn't change much) ``` name old time/op new time/op delta Write/ordered-lz4-256k-4 24.1ms ± 7% 22.1ms ± 2% -8.52% (p=0.008 n=5+5) Write/unordered-lz4-4 54.6ms ±29% 49.1ms ± 1% -10.04% (p=0.016 n=5+4) Read/none_66_kB-4 4.16ms ± 6% 2.54ms ± 2% -38.87% (p=0.008 n=5+5) Read/gzip_66_kB-4 177ms ± 4% 161ms ± 4% -9.09% (p=0.008 n=5+5) Read/lz4-64k_66_kB-4 59.5ms ± 2% 49.5ms ± 3% -16.80% (p=0.008 n=5+5) Read/lz4-256k_66_kB-4 62.9ms ± 0% 51.2ms ± 1% -18.58% (p=0.016 n=4+5) Read/lz4-1M_66_kB-4 62.9ms ± 2% 51.5ms ± 1% -18.03% (p=0.008 n=5+5) Read/lz4_66_kB-4 63.7ms ± 4% 51.2ms ± 1% -19.68% (p=0.008 n=5+5) Read/snappy_66_kB-4 51.1ms ± 2% 41.9ms ± 1% -17.95% (p=0.016 n=5+4) Read/flate_66_kB-4 168ms ± 4% 150ms ± 2% -10.49% (p=0.008 n=5+5) Read/zstd_66_kB-4 334ms ±16% 272ms ± 6% -18.52% (p=0.008 n=5+5) Read/none_262_kB-4 3.93ms ± 1% 2.48ms ± 2% -36.91% (p=0.008 n=5+5) Read/gzip_262_kB-4 159ms ± 5% 143ms ± 2% -10.12% (p=0.008 n=5+5) Read/lz4-64k_262_kB-4 53.5ms ± 1% 43.2ms ± 3% -19.35% (p=0.008 n=5+5) Read/lz4-256k_262_kB-4 55.8ms ± 1% 45.3ms ± 2% -18.75% (p=0.008 n=5+5) Read/lz4-1M_262_kB-4 57.5ms ± 2% 45.5ms ± 2% -20.91% (p=0.008 n=5+5) Read/lz4_262_kB-4 56.8ms ± 2% 45.3ms ± 2% -20.26% (p=0.008 n=5+5) Read/snappy_262_kB-4 46.0ms ± 2% 37.5ms ± 1% -18.57% (p=0.008 n=5+5) Read/flate_262_kB-4 153ms ± 3% 134ms ± 2% -12.33% (p=0.008 n=5+5) Read/none_524_kB-4 3.85ms ± 2% 2.46ms ± 2% -36.04% (p=0.008 n=5+5) Read/gzip_524_kB-4 128ms ± 2% 119ms ± 2% -7.64% (p=0.008 n=5+5) Read/lz4-64k_524_kB-4 44.6ms ± 3% 34.7ms ± 1% -22.19% (p=0.008 n=5+5) Read/lz4-256k_524_kB-4 45.4ms ± 1% 36.7ms ± 5% -19.07% (p=0.008 n=5+5) Read/lz4-1M_524_kB-4 47.3ms ± 1% 38.6ms ± 2% -18.36% (p=0.008 n=5+5) Read/lz4_524_kB-4 47.4ms ± 1% 38.0ms ± 1% -19.70% (p=0.008 n=5+5) Read/snappy_524_kB-4 37.3ms ± 1% 31.1ms ± 3% -16.47% (p=0.008 n=5+5) Read/flate_524_kB-4 122ms ± 1% 111ms ± 3% -9.04% (p=0.008 n=5+5) Read/sample_none_66_kB-4 7.81ms ± 1% 6.62ms ± 6% -15.24% (p=0.008 n=5+5) Read/sample_gzip_66_kB-4 237ms ± 2% 219ms ± 1% -7.57% (p=0.008 n=5+5) Read/sample_lz4-64k_66_kB-4 105ms ± 2% 92ms ± 1% -12.03% (p=0.008 n=5+5) Read/sample_lz4-256k_66_kB-4 109ms ± 1% 98ms ± 2% -9.65% (p=0.008 n=5+5) Read/sample_lz4-1M_66_kB-4 113ms ± 8% 98ms ± 2% -13.66% (p=0.008 n=5+5) Read/sample_lz4_66_kB-4 109ms ± 1% 99ms ± 2% -9.29% (p=0.008 n=5+5) Read/sample_snappy_66_kB-4 86.6ms ± 1% 77.2ms ± 6% -10.92% (p=0.008 n=5+5) Read/sample_flate_66_kB-4 236ms ± 8% 214ms ± 3% -9.11% (p=0.008 n=5+5) Read/sample_none_262_kB-4 7.70ms ± 2% 6.34ms ± 2% -17.67% (p=0.008 n=5+5) Read/sample_lz4-64k_262_kB-4 94.7ms ± 1% 84.4ms ± 2% -10.90% (p=0.008 n=5+5) Read/sample_lz4-256k_262_kB-4 98.4ms ± 1% 87.7ms ± 2% -10.93% (p=0.008 n=5+5) Read/sample_lz4-1M_262_kB-4 101ms ± 2% 90ms ± 2% -11.19% (p=0.008 n=5+5) Read/sample_lz4_262_kB-4 100ms ± 2% 90ms ± 3% -10.78% (p=0.008 n=5+5) Read/sample_snappy_262_kB-4 77.9ms ± 2% 68.7ms ± 2% -11.84% (p=0.008 n=5+5) Read/sample_flate_262_kB-4 211ms ± 2% 190ms ± 2% -9.86% (p=0.008 n=5+5) Read/sample_none_524_kB-4 7.82ms ± 2% 6.43ms ± 2% -17.80% (p=0.008 n=5+5) Read/sample_gzip_524_kB-4 176ms ± 1% 164ms ± 2% -6.80% (p=0.008 n=5+5) Read/sample_lz4-64k_524_kB-4 77.7ms ± 1% 69.8ms ± 4% -10.10% (p=0.008 n=5+5) Read/sample_lz4-256k_524_kB-4 81.7ms ± 2% 72.1ms ± 3% -11.79% (p=0.008 n=5+5) Read/sample_lz4-1M_524_kB-4 83.8ms ± 1% 74.4ms ± 1% -11.21% (p=0.008 n=5+5) Read/sample_lz4_524_kB-4 83.2ms ± 0% 76.1ms ± 8% -8.53% (p=0.016 n=4+5) Read/sample_snappy_524_kB-4 63.6ms ± 2% 56.5ms ± 2% -11.23% (p=0.008 n=5+5) Read/sample_flate_524_kB-4 172ms ± 2% 157ms ± 1% -8.88% (p=0.008 n=5+5) name old speed new speed delta Write/ordered-lz4-256k-4 687MB/s ± 6% 749MB/s ± 2% +9.09% (p=0.008 n=5+5) Write/unordered-lz4-4 313MB/s ±24% 341MB/s ± 1% +9.23% (p=0.016 n=5+4) Read/none_66_kB-4 1.82GB/s ± 6% 2.97GB/s ± 2% +63.42% (p=0.008 n=5+5) Read/gzip_66_kB-4 669MB/s ± 4% 736MB/s ± 4% +10.00% (p=0.008 n=5+5) Read/lz4-64k_66_kB-4 1.42GB/s ± 2% 1.71GB/s ± 3% +20.21% (p=0.008 n=5+5) Read/lz4-256k_66_kB-4 1.43GB/s ± 0% 1.76GB/s ± 1% +22.83% (p=0.016 n=4+5) Read/lz4-1M_66_kB-4 1.43GB/s ± 2% 1.74GB/s ± 1% +21.99% (p=0.008 n=5+5) Read/lz4_66_kB-4 1.41GB/s ± 4% 1.76GB/s ± 1% +24.44% (p=0.008 n=5+5) Read/snappy_66_kB-4 1.29GB/s ± 2% 1.57GB/s ± 1% +21.86% (p=0.016 n=5+4) Read/flate_66_kB-4 710MB/s ± 4% 793MB/s ± 2% +11.70% (p=0.008 n=5+5) Read/zstd_66_kB-4 425MB/s ±14% 517MB/s ± 6% +21.74% (p=0.008 n=5+5) Read/none_262_kB-4 1.93GB/s ± 1% 3.06GB/s ± 2% +58.54% (p=0.008 n=5+5) Read/gzip_262_kB-4 701MB/s ± 5% 779MB/s ± 2% +11.20% (p=0.008 n=5+5) Read/lz4-64k_262_kB-4 1.47GB/s ± 1% 1.82GB/s ± 3% +24.03% (p=0.008 n=5+5) Read/lz4-256k_262_kB-4 1.48GB/s ± 1% 1.82GB/s ± 2% +23.09% (p=0.008 n=5+5) Read/lz4-1M_262_kB-4 1.46GB/s ± 2% 1.84GB/s ± 2% +26.44% (p=0.008 n=5+5) Read/lz4_262_kB-4 1.48GB/s ± 2% 1.85GB/s ± 2% +25.42% (p=0.008 n=5+5) Read/snappy_262_kB-4 1.31GB/s ± 2% 1.61GB/s ± 1% +22.80% (p=0.008 n=5+5) Read/flate_262_kB-4 729MB/s ± 3% 831MB/s ± 2% +14.05% (p=0.008 n=5+5) Read/none_524_kB-4 1.97GB/s ± 2% 3.08GB/s ± 2% +56.33% (p=0.008 n=5+5) Read/gzip_524_kB-4 714MB/s ± 2% 773MB/s ± 2% +8.27% (p=0.008 n=5+5) Read/lz4-64k_524_kB-4 1.47GB/s ± 3% 1.88GB/s ± 1% +28.48% (p=0.008 n=5+5) Read/lz4-256k_524_kB-4 1.50GB/s ± 1% 1.86GB/s ± 5% +23.63% (p=0.008 n=5+5) Read/lz4-1M_524_kB-4 1.49GB/s ± 1% 1.83GB/s ± 2% +22.51% (p=0.008 n=5+5) Read/lz4_524_kB-4 1.49GB/s ± 1% 1.86GB/s ± 1% +24.53% (p=0.008 n=5+5) Read/snappy_524_kB-4 1.34GB/s ± 1% 1.60GB/s ± 3% +19.75% (p=0.008 n=5+5) Read/flate_524_kB-4 750MB/s ± 1% 824MB/s ± 3% +9.97% (p=0.008 n=5+5) Read/sample_none_66_kB-4 967MB/s ± 1% 1142MB/s ± 6% +18.13% (p=0.008 n=5+5) Read/sample_gzip_66_kB-4 501MB/s ± 2% 542MB/s ± 1% +8.19% (p=0.008 n=5+5) Read/sample_lz4-64k_66_kB-4 811MB/s ± 2% 922MB/s ± 1% +13.66% (p=0.008 n=5+5) Read/sample_lz4-256k_66_kB-4 825MB/s ± 1% 913MB/s ± 2% +10.68% (p=0.008 n=5+5) Read/sample_lz4-1M_66_kB-4 795MB/s ± 7% 920MB/s ± 1% +15.67% (p=0.008 n=5+5) Read/sample_lz4_66_kB-4 824MB/s ± 1% 909MB/s ± 2% +10.25% (p=0.008 n=5+5) Read/sample_snappy_66_kB-4 759MB/s ± 1% 852MB/s ± 6% +12.37% (p=0.008 n=5+5) Read/sample_flate_66_kB-4 506MB/s ± 7% 556MB/s ± 4% +9.88% (p=0.008 n=5+5) Read/sample_none_262_kB-4 983MB/s ± 2% 1194MB/s ± 2% +21.47% (p=0.008 n=5+5) Read/sample_lz4-64k_262_kB-4 830MB/s ± 1% 932MB/s ± 2% +12.24% (p=0.008 n=5+5) Read/sample_lz4-256k_262_kB-4 839MB/s ± 1% 942MB/s ± 2% +12.28% (p=0.008 n=5+5) Read/sample_lz4-1M_262_kB-4 829MB/s ± 2% 934MB/s ± 2% +12.60% (p=0.008 n=5+5) Read/sample_lz4_262_kB-4 836MB/s ± 2% 937MB/s ± 3% +12.11% (p=0.008 n=5+5) Read/sample_snappy_262_kB-4 774MB/s ± 2% 878MB/s ± 2% +13.44% (p=0.008 n=5+5) Read/sample_flate_262_kB-4 529MB/s ± 2% 587MB/s ± 2% +10.94% (p=0.008 n=5+5) Read/sample_none_524_kB-4 970MB/s ± 2% 1180MB/s ± 2% +21.64% (p=0.008 n=5+5) Read/sample_gzip_524_kB-4 521MB/s ± 1% 559MB/s ± 2% +7.31% (p=0.008 n=5+5) Read/sample_lz4-64k_524_kB-4 842MB/s ± 1% 937MB/s ± 4% +11.30% (p=0.008 n=5+5) Read/sample_lz4-256k_524_kB-4 834MB/s ± 2% 945MB/s ± 3% +13.37% (p=0.008 n=5+5) Read/sample_lz4-1M_524_kB-4 842MB/s ± 1% 949MB/s ± 1% +12.63% (p=0.008 n=5+5) Read/sample_lz4_524_kB-4 849MB/s ± 0% 929MB/s ± 7% +9.53% (p=0.016 n=4+5) Read/sample_snappy_524_kB-4 783MB/s ± 2% 882MB/s ± 2% +12.65% (p=0.008 n=5+5) Read/sample_flate_524_kB-4 534MB/s ± 2% 585MB/s ± 1% +9.73% (p=0.008 n=5+5) name old alloc/op new alloc/op delta Write/unordered-lz4-4 18.8MB ± 4% 17.2MB ± 7% -8.28% (p=0.008 n=5+5) Read/none_66_kB-4 41.9kB ± 0% 43.7kB ± 0% +4.41% (p=0.008 n=5+5) Read/gzip_66_kB-4 713kB ± 0% 741kB ± 0% +3.85% (p=0.016 n=5+4) Read/lz4-64k_66_kB-4 730kB ± 1% 748kB ± 0% +2.53% (p=0.016 n=5+4) Read/lz4-256k_66_kB-4 770kB ± 0% 806kB ± 3% +4.70% (p=0.016 n=4+5) Read/lz4-1M_66_kB-4 770kB ± 0% 812kB ± 4% +5.43% (p=0.016 n=4+5) Read/lz4_66_kB-4 770kB ± 0% 792kB ± 0% +2.83% (p=0.029 n=4+4) Read/snappy_66_kB-4 361kB ± 0% 376kB ± 0% +4.02% (p=0.008 n=5+5) Read/flate_66_kB-4 715kB ± 0% 743kB ± 0% +3.88% (p=0.016 n=5+4) Read/none_262_kB-4 11.2kB ± 0% 11.7kB ± 0% +4.12% (p=0.016 n=5+4) Read/gzip_262_kB-4 198kB ± 0% 203kB ± 0% +2.93% (p=0.016 n=4+5) Read/lz4-64k_262_kB-4 169kB ± 0% 174kB ± 0% +2.75% (p=0.008 n=5+5) Read/lz4-256k_262_kB-4 177kB ± 0% 182kB ± 0% +2.79% (p=0.008 n=5+5) Read/lz4-1M_262_kB-4 181kB ± 0% 186kB ± 0% +2.77% (p=0.008 n=5+5) Read/lz4_262_kB-4 181kB ± 0% 186kB ± 0% +2.73% (p=0.016 n=5+4) Read/snappy_262_kB-4 87.8kB ± 0% 90.1kB ± 0% +2.58% (p=0.008 n=5+5) Read/flate_262_kB-4 197kB ± 0% 203kB ± 0% +2.63% (p=0.016 n=5+4) Read/zstd_262_kB-4 553MB ± 0% 552MB ± 0% -0.15% (p=0.032 n=5+5) Read/none_524_kB-4 5.94kB ± 0% 6.16kB ± 0% +3.76% (p=0.016 n=5+4) Read/gzip_524_kB-4 96.2kB ± 0% 98.3kB ± 0% +2.16% (p=0.016 n=4+5) Read/lz4-64k_524_kB-4 70.9kB ± 0% 72.8kB ± 0% +2.71% (p=0.016 n=4+5) Read/lz4-256k_524_kB-4 73.8kB ± 0% 75.8kB ± 0% +2.68% (p=0.008 n=5+5) Read/lz4-1M_524_kB-4 76.5kB ± 0% 78.7kB ± 0% +2.77% (p=0.016 n=4+5) Read/lz4_524_kB-4 76.5kB ± 0% 78.6kB ± 0% +2.74% (p=0.016 n=5+4) Read/snappy_524_kB-4 39.1kB ± 0% 39.5kB ± 0% +1.19% (p=0.008 n=5+5) Read/flate_524_kB-4 95.4kB ± 0% 97.3kB ± 0% +2.06% (p=0.008 n=5+5) Read/sample_none_66_kB-4 39.9kB ± 0% 41.7kB ± 0% +4.62% (p=0.008 n=5+5) Read/sample_gzip_66_kB-4 686kB ± 0% 715kB ± 0% +4.23% (p=0.016 n=4+5) Read/sample_lz4-64k_66_kB-4 707kB ± 0% 728kB ± 0% +2.88% (p=0.016 n=5+4) Read/sample_lz4-256k_66_kB-4 749kB ± 0% 783kB ± 4% +4.65% (p=0.008 n=5+5) Read/sample_lz4_66_kB-4 748kB ± 0% 770kB ± 0% +2.91% (p=0.016 n=5+4) Read/sample_snappy_66_kB-4 350kB ± 0% 364kB ± 0% +4.04% (p=0.029 n=4+4) Read/sample_flate_66_kB-4 688kB ± 0% 716kB ± 0% +4.08% (p=0.029 n=4+4) Read/sample_none_262_kB-4 10.6kB ± 0% 11.0kB ± 0% +4.23% (p=0.016 n=5+4) Read/sample_gzip_262_kB-4 194kB ± 0% 199kB ± 0% +2.48% (p=0.016 n=4+5) Read/sample_lz4-64k_262_kB-4 165kB ± 0% 169kB ± 0% +2.86% (p=0.016 n=4+5) Read/sample_lz4-256k_262_kB-4 172kB ± 0% 177kB ± 0% +2.92% (p=0.016 n=4+5) Read/sample_lz4-1M_262_kB-4 176kB ± 0% 181kB ± 0% +2.91% (p=0.029 n=4+4) Read/sample_lz4_262_kB-4 176kB ± 0% 181kB ± 0% +2.81% (p=0.016 n=5+4) Read/sample_snappy_262_kB-4 88.1kB ± 0% 90.8kB ± 0% +3.11% (p=0.029 n=4+4) Read/sample_flate_262_kB-4 194kB ± 0% 198kB ± 0% +2.13% (p=0.029 n=4+4) Read/sample_none_524_kB-4 5.56kB ± 0% 5.76kB ± 0% +3.65% (p=0.016 n=5+4) Read/sample_gzip_524_kB-4 95.4kB ± 0% 96.9kB ± 0% +1.56% (p=0.008 n=5+5) Read/sample_lz4-64k_524_kB-4 69.0kB ± 0% 71.0kB ± 0% +2.87% (p=0.008 n=5+5) Read/sample_lz4-256k_524_kB-4 71.8kB ± 0% 73.8kB ± 0% +2.79% (p=0.016 n=4+5) Read/sample_lz4-1M_524_kB-4 74.5kB ± 0% 76.6kB ± 0% +2.75% (p=0.008 n=5+5) Read/sample_lz4_524_kB-4 74.5kB ± 0% 188.4kB ±89% +152.98% (p=0.008 n=5+5) Read/sample_snappy_524_kB-4 40.7kB ± 1% 41.3kB ± 1% +1.58% (p=0.008 n=5+5) Read/sample_flate_524_kB-4 95.2kB ± 0% 96.2kB ± 0% +0.95% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Read/gzip_66_kB-4 13.2k ± 0% 13.2k ± 0% -0.04% (p=0.000 n=5+4) Read/zstd_66_kB-4 18.0k ± 1% 17.8k ± 1% -1.06% (p=0.032 n=5+5) Read/gzip_262_kB-4 4.61k ± 0% 4.61k ± 0% -0.11% (p=0.029 n=4+4) Read/flate_262_kB-4 4.61k ± 0% 4.60k ± 0% -0.11% (p=0.008 n=5+5) Read/zstd_262_kB-4 7.10k ± 2% 6.88k ± 3% -3.01% (p=0.008 n=5+5) Read/gzip_524_kB-4 2.62k ± 0% 2.62k ± 0% -0.06% (p=0.000 n=4+5) Read/zstd_524_kB-4 6.13k ± 2% 5.94k ± 2% -3.03% (p=0.024 n=5+5) Read/sample_flate_66_kB-4 13.2k ± 0% 13.2k ± 0% -0.01% (p=0.029 n=4+4) Read/sample_flate_262_kB-4 4.62k ± 0% 4.61k ± 0% -0.22% (p=0.029 n=4+4) Read/sample_zstd_262_kB-4 7.12k ± 1% 6.97k ± 1% -2.11% (p=0.008 n=5+5) Read/sample_gzip_524_kB-4 2.63k ± 0% 2.63k ± 0% -0.11% (p=0.008 n=5+5) Read/sample_flate_524_kB-4 2.63k ± 0% 2.62k ± 0% -0.15% (p=0.008 n=5+5) Read/sample_zstd_524_kB-4 6.11k ± 2% 5.89k ± 1% -3.68% (p=0.016 n=5+4) ``` Co-authored-by: Owen Diehl <ow.diehl@gmail.com>
3 years ago
t.Run(enc.String(), func(t *testing.T) {
t.Parallel()
chk := NewMemChunk(enc, DefaultTestHeadBlockFmt, testBlockSize, testTargetSize)
chunks: decode varints directly from byte buffer; stop panicing on some corrupt inputs (#7264) **What this PR does / why we need it**: This is much faster, since we don't pay for byte-at-a-time reads from the buffer. For best performance with gzip and lz4, we still want a bufio.Reader. We don't need the pool of buffered readers now. Other compression formats have their own internal buffers so don't need extra buffering. **Checklist** - [x] Reviewed the `CONTRIBUTING.md` guide - NA Documentation added - NA Tests updated - [x] `CHANGELOG.md` updated - NA Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md` (using changes from https://github.com/grafana/loki/pull/7246, ignoring lines that didn't change much) ``` name old time/op new time/op delta Write/ordered-lz4-256k-4 24.1ms ± 7% 22.1ms ± 2% -8.52% (p=0.008 n=5+5) Write/unordered-lz4-4 54.6ms ±29% 49.1ms ± 1% -10.04% (p=0.016 n=5+4) Read/none_66_kB-4 4.16ms ± 6% 2.54ms ± 2% -38.87% (p=0.008 n=5+5) Read/gzip_66_kB-4 177ms ± 4% 161ms ± 4% -9.09% (p=0.008 n=5+5) Read/lz4-64k_66_kB-4 59.5ms ± 2% 49.5ms ± 3% -16.80% (p=0.008 n=5+5) Read/lz4-256k_66_kB-4 62.9ms ± 0% 51.2ms ± 1% -18.58% (p=0.016 n=4+5) Read/lz4-1M_66_kB-4 62.9ms ± 2% 51.5ms ± 1% -18.03% (p=0.008 n=5+5) Read/lz4_66_kB-4 63.7ms ± 4% 51.2ms ± 1% -19.68% (p=0.008 n=5+5) Read/snappy_66_kB-4 51.1ms ± 2% 41.9ms ± 1% -17.95% (p=0.016 n=5+4) Read/flate_66_kB-4 168ms ± 4% 150ms ± 2% -10.49% (p=0.008 n=5+5) Read/zstd_66_kB-4 334ms ±16% 272ms ± 6% -18.52% (p=0.008 n=5+5) Read/none_262_kB-4 3.93ms ± 1% 2.48ms ± 2% -36.91% (p=0.008 n=5+5) Read/gzip_262_kB-4 159ms ± 5% 143ms ± 2% -10.12% (p=0.008 n=5+5) Read/lz4-64k_262_kB-4 53.5ms ± 1% 43.2ms ± 3% -19.35% (p=0.008 n=5+5) Read/lz4-256k_262_kB-4 55.8ms ± 1% 45.3ms ± 2% -18.75% (p=0.008 n=5+5) Read/lz4-1M_262_kB-4 57.5ms ± 2% 45.5ms ± 2% -20.91% (p=0.008 n=5+5) Read/lz4_262_kB-4 56.8ms ± 2% 45.3ms ± 2% -20.26% (p=0.008 n=5+5) Read/snappy_262_kB-4 46.0ms ± 2% 37.5ms ± 1% -18.57% (p=0.008 n=5+5) Read/flate_262_kB-4 153ms ± 3% 134ms ± 2% -12.33% (p=0.008 n=5+5) Read/none_524_kB-4 3.85ms ± 2% 2.46ms ± 2% -36.04% (p=0.008 n=5+5) Read/gzip_524_kB-4 128ms ± 2% 119ms ± 2% -7.64% (p=0.008 n=5+5) Read/lz4-64k_524_kB-4 44.6ms ± 3% 34.7ms ± 1% -22.19% (p=0.008 n=5+5) Read/lz4-256k_524_kB-4 45.4ms ± 1% 36.7ms ± 5% -19.07% (p=0.008 n=5+5) Read/lz4-1M_524_kB-4 47.3ms ± 1% 38.6ms ± 2% -18.36% (p=0.008 n=5+5) Read/lz4_524_kB-4 47.4ms ± 1% 38.0ms ± 1% -19.70% (p=0.008 n=5+5) Read/snappy_524_kB-4 37.3ms ± 1% 31.1ms ± 3% -16.47% (p=0.008 n=5+5) Read/flate_524_kB-4 122ms ± 1% 111ms ± 3% -9.04% (p=0.008 n=5+5) Read/sample_none_66_kB-4 7.81ms ± 1% 6.62ms ± 6% -15.24% (p=0.008 n=5+5) Read/sample_gzip_66_kB-4 237ms ± 2% 219ms ± 1% -7.57% (p=0.008 n=5+5) Read/sample_lz4-64k_66_kB-4 105ms ± 2% 92ms ± 1% -12.03% (p=0.008 n=5+5) Read/sample_lz4-256k_66_kB-4 109ms ± 1% 98ms ± 2% -9.65% (p=0.008 n=5+5) Read/sample_lz4-1M_66_kB-4 113ms ± 8% 98ms ± 2% -13.66% (p=0.008 n=5+5) Read/sample_lz4_66_kB-4 109ms ± 1% 99ms ± 2% -9.29% (p=0.008 n=5+5) Read/sample_snappy_66_kB-4 86.6ms ± 1% 77.2ms ± 6% -10.92% (p=0.008 n=5+5) Read/sample_flate_66_kB-4 236ms ± 8% 214ms ± 3% -9.11% (p=0.008 n=5+5) Read/sample_none_262_kB-4 7.70ms ± 2% 6.34ms ± 2% -17.67% (p=0.008 n=5+5) Read/sample_lz4-64k_262_kB-4 94.7ms ± 1% 84.4ms ± 2% -10.90% (p=0.008 n=5+5) Read/sample_lz4-256k_262_kB-4 98.4ms ± 1% 87.7ms ± 2% -10.93% (p=0.008 n=5+5) Read/sample_lz4-1M_262_kB-4 101ms ± 2% 90ms ± 2% -11.19% (p=0.008 n=5+5) Read/sample_lz4_262_kB-4 100ms ± 2% 90ms ± 3% -10.78% (p=0.008 n=5+5) Read/sample_snappy_262_kB-4 77.9ms ± 2% 68.7ms ± 2% -11.84% (p=0.008 n=5+5) Read/sample_flate_262_kB-4 211ms ± 2% 190ms ± 2% -9.86% (p=0.008 n=5+5) Read/sample_none_524_kB-4 7.82ms ± 2% 6.43ms ± 2% -17.80% (p=0.008 n=5+5) Read/sample_gzip_524_kB-4 176ms ± 1% 164ms ± 2% -6.80% (p=0.008 n=5+5) Read/sample_lz4-64k_524_kB-4 77.7ms ± 1% 69.8ms ± 4% -10.10% (p=0.008 n=5+5) Read/sample_lz4-256k_524_kB-4 81.7ms ± 2% 72.1ms ± 3% -11.79% (p=0.008 n=5+5) Read/sample_lz4-1M_524_kB-4 83.8ms ± 1% 74.4ms ± 1% -11.21% (p=0.008 n=5+5) Read/sample_lz4_524_kB-4 83.2ms ± 0% 76.1ms ± 8% -8.53% (p=0.016 n=4+5) Read/sample_snappy_524_kB-4 63.6ms ± 2% 56.5ms ± 2% -11.23% (p=0.008 n=5+5) Read/sample_flate_524_kB-4 172ms ± 2% 157ms ± 1% -8.88% (p=0.008 n=5+5) name old speed new speed delta Write/ordered-lz4-256k-4 687MB/s ± 6% 749MB/s ± 2% +9.09% (p=0.008 n=5+5) Write/unordered-lz4-4 313MB/s ±24% 341MB/s ± 1% +9.23% (p=0.016 n=5+4) Read/none_66_kB-4 1.82GB/s ± 6% 2.97GB/s ± 2% +63.42% (p=0.008 n=5+5) Read/gzip_66_kB-4 669MB/s ± 4% 736MB/s ± 4% +10.00% (p=0.008 n=5+5) Read/lz4-64k_66_kB-4 1.42GB/s ± 2% 1.71GB/s ± 3% +20.21% (p=0.008 n=5+5) Read/lz4-256k_66_kB-4 1.43GB/s ± 0% 1.76GB/s ± 1% +22.83% (p=0.016 n=4+5) Read/lz4-1M_66_kB-4 1.43GB/s ± 2% 1.74GB/s ± 1% +21.99% (p=0.008 n=5+5) Read/lz4_66_kB-4 1.41GB/s ± 4% 1.76GB/s ± 1% +24.44% (p=0.008 n=5+5) Read/snappy_66_kB-4 1.29GB/s ± 2% 1.57GB/s ± 1% +21.86% (p=0.016 n=5+4) Read/flate_66_kB-4 710MB/s ± 4% 793MB/s ± 2% +11.70% (p=0.008 n=5+5) Read/zstd_66_kB-4 425MB/s ±14% 517MB/s ± 6% +21.74% (p=0.008 n=5+5) Read/none_262_kB-4 1.93GB/s ± 1% 3.06GB/s ± 2% +58.54% (p=0.008 n=5+5) Read/gzip_262_kB-4 701MB/s ± 5% 779MB/s ± 2% +11.20% (p=0.008 n=5+5) Read/lz4-64k_262_kB-4 1.47GB/s ± 1% 1.82GB/s ± 3% +24.03% (p=0.008 n=5+5) Read/lz4-256k_262_kB-4 1.48GB/s ± 1% 1.82GB/s ± 2% +23.09% (p=0.008 n=5+5) Read/lz4-1M_262_kB-4 1.46GB/s ± 2% 1.84GB/s ± 2% +26.44% (p=0.008 n=5+5) Read/lz4_262_kB-4 1.48GB/s ± 2% 1.85GB/s ± 2% +25.42% (p=0.008 n=5+5) Read/snappy_262_kB-4 1.31GB/s ± 2% 1.61GB/s ± 1% +22.80% (p=0.008 n=5+5) Read/flate_262_kB-4 729MB/s ± 3% 831MB/s ± 2% +14.05% (p=0.008 n=5+5) Read/none_524_kB-4 1.97GB/s ± 2% 3.08GB/s ± 2% +56.33% (p=0.008 n=5+5) Read/gzip_524_kB-4 714MB/s ± 2% 773MB/s ± 2% +8.27% (p=0.008 n=5+5) Read/lz4-64k_524_kB-4 1.47GB/s ± 3% 1.88GB/s ± 1% +28.48% (p=0.008 n=5+5) Read/lz4-256k_524_kB-4 1.50GB/s ± 1% 1.86GB/s ± 5% +23.63% (p=0.008 n=5+5) Read/lz4-1M_524_kB-4 1.49GB/s ± 1% 1.83GB/s ± 2% +22.51% (p=0.008 n=5+5) Read/lz4_524_kB-4 1.49GB/s ± 1% 1.86GB/s ± 1% +24.53% (p=0.008 n=5+5) Read/snappy_524_kB-4 1.34GB/s ± 1% 1.60GB/s ± 3% +19.75% (p=0.008 n=5+5) Read/flate_524_kB-4 750MB/s ± 1% 824MB/s ± 3% +9.97% (p=0.008 n=5+5) Read/sample_none_66_kB-4 967MB/s ± 1% 1142MB/s ± 6% +18.13% (p=0.008 n=5+5) Read/sample_gzip_66_kB-4 501MB/s ± 2% 542MB/s ± 1% +8.19% (p=0.008 n=5+5) Read/sample_lz4-64k_66_kB-4 811MB/s ± 2% 922MB/s ± 1% +13.66% (p=0.008 n=5+5) Read/sample_lz4-256k_66_kB-4 825MB/s ± 1% 913MB/s ± 2% +10.68% (p=0.008 n=5+5) Read/sample_lz4-1M_66_kB-4 795MB/s ± 7% 920MB/s ± 1% +15.67% (p=0.008 n=5+5) Read/sample_lz4_66_kB-4 824MB/s ± 1% 909MB/s ± 2% +10.25% (p=0.008 n=5+5) Read/sample_snappy_66_kB-4 759MB/s ± 1% 852MB/s ± 6% +12.37% (p=0.008 n=5+5) Read/sample_flate_66_kB-4 506MB/s ± 7% 556MB/s ± 4% +9.88% (p=0.008 n=5+5) Read/sample_none_262_kB-4 983MB/s ± 2% 1194MB/s ± 2% +21.47% (p=0.008 n=5+5) Read/sample_lz4-64k_262_kB-4 830MB/s ± 1% 932MB/s ± 2% +12.24% (p=0.008 n=5+5) Read/sample_lz4-256k_262_kB-4 839MB/s ± 1% 942MB/s ± 2% +12.28% (p=0.008 n=5+5) Read/sample_lz4-1M_262_kB-4 829MB/s ± 2% 934MB/s ± 2% +12.60% (p=0.008 n=5+5) Read/sample_lz4_262_kB-4 836MB/s ± 2% 937MB/s ± 3% +12.11% (p=0.008 n=5+5) Read/sample_snappy_262_kB-4 774MB/s ± 2% 878MB/s ± 2% +13.44% (p=0.008 n=5+5) Read/sample_flate_262_kB-4 529MB/s ± 2% 587MB/s ± 2% +10.94% (p=0.008 n=5+5) Read/sample_none_524_kB-4 970MB/s ± 2% 1180MB/s ± 2% +21.64% (p=0.008 n=5+5) Read/sample_gzip_524_kB-4 521MB/s ± 1% 559MB/s ± 2% +7.31% (p=0.008 n=5+5) Read/sample_lz4-64k_524_kB-4 842MB/s ± 1% 937MB/s ± 4% +11.30% (p=0.008 n=5+5) Read/sample_lz4-256k_524_kB-4 834MB/s ± 2% 945MB/s ± 3% +13.37% (p=0.008 n=5+5) Read/sample_lz4-1M_524_kB-4 842MB/s ± 1% 949MB/s ± 1% +12.63% (p=0.008 n=5+5) Read/sample_lz4_524_kB-4 849MB/s ± 0% 929MB/s ± 7% +9.53% (p=0.016 n=4+5) Read/sample_snappy_524_kB-4 783MB/s ± 2% 882MB/s ± 2% +12.65% (p=0.008 n=5+5) Read/sample_flate_524_kB-4 534MB/s ± 2% 585MB/s ± 1% +9.73% (p=0.008 n=5+5) name old alloc/op new alloc/op delta Write/unordered-lz4-4 18.8MB ± 4% 17.2MB ± 7% -8.28% (p=0.008 n=5+5) Read/none_66_kB-4 41.9kB ± 0% 43.7kB ± 0% +4.41% (p=0.008 n=5+5) Read/gzip_66_kB-4 713kB ± 0% 741kB ± 0% +3.85% (p=0.016 n=5+4) Read/lz4-64k_66_kB-4 730kB ± 1% 748kB ± 0% +2.53% (p=0.016 n=5+4) Read/lz4-256k_66_kB-4 770kB ± 0% 806kB ± 3% +4.70% (p=0.016 n=4+5) Read/lz4-1M_66_kB-4 770kB ± 0% 812kB ± 4% +5.43% (p=0.016 n=4+5) Read/lz4_66_kB-4 770kB ± 0% 792kB ± 0% +2.83% (p=0.029 n=4+4) Read/snappy_66_kB-4 361kB ± 0% 376kB ± 0% +4.02% (p=0.008 n=5+5) Read/flate_66_kB-4 715kB ± 0% 743kB ± 0% +3.88% (p=0.016 n=5+4) Read/none_262_kB-4 11.2kB ± 0% 11.7kB ± 0% +4.12% (p=0.016 n=5+4) Read/gzip_262_kB-4 198kB ± 0% 203kB ± 0% +2.93% (p=0.016 n=4+5) Read/lz4-64k_262_kB-4 169kB ± 0% 174kB ± 0% +2.75% (p=0.008 n=5+5) Read/lz4-256k_262_kB-4 177kB ± 0% 182kB ± 0% +2.79% (p=0.008 n=5+5) Read/lz4-1M_262_kB-4 181kB ± 0% 186kB ± 0% +2.77% (p=0.008 n=5+5) Read/lz4_262_kB-4 181kB ± 0% 186kB ± 0% +2.73% (p=0.016 n=5+4) Read/snappy_262_kB-4 87.8kB ± 0% 90.1kB ± 0% +2.58% (p=0.008 n=5+5) Read/flate_262_kB-4 197kB ± 0% 203kB ± 0% +2.63% (p=0.016 n=5+4) Read/zstd_262_kB-4 553MB ± 0% 552MB ± 0% -0.15% (p=0.032 n=5+5) Read/none_524_kB-4 5.94kB ± 0% 6.16kB ± 0% +3.76% (p=0.016 n=5+4) Read/gzip_524_kB-4 96.2kB ± 0% 98.3kB ± 0% +2.16% (p=0.016 n=4+5) Read/lz4-64k_524_kB-4 70.9kB ± 0% 72.8kB ± 0% +2.71% (p=0.016 n=4+5) Read/lz4-256k_524_kB-4 73.8kB ± 0% 75.8kB ± 0% +2.68% (p=0.008 n=5+5) Read/lz4-1M_524_kB-4 76.5kB ± 0% 78.7kB ± 0% +2.77% (p=0.016 n=4+5) Read/lz4_524_kB-4 76.5kB ± 0% 78.6kB ± 0% +2.74% (p=0.016 n=5+4) Read/snappy_524_kB-4 39.1kB ± 0% 39.5kB ± 0% +1.19% (p=0.008 n=5+5) Read/flate_524_kB-4 95.4kB ± 0% 97.3kB ± 0% +2.06% (p=0.008 n=5+5) Read/sample_none_66_kB-4 39.9kB ± 0% 41.7kB ± 0% +4.62% (p=0.008 n=5+5) Read/sample_gzip_66_kB-4 686kB ± 0% 715kB ± 0% +4.23% (p=0.016 n=4+5) Read/sample_lz4-64k_66_kB-4 707kB ± 0% 728kB ± 0% +2.88% (p=0.016 n=5+4) Read/sample_lz4-256k_66_kB-4 749kB ± 0% 783kB ± 4% +4.65% (p=0.008 n=5+5) Read/sample_lz4_66_kB-4 748kB ± 0% 770kB ± 0% +2.91% (p=0.016 n=5+4) Read/sample_snappy_66_kB-4 350kB ± 0% 364kB ± 0% +4.04% (p=0.029 n=4+4) Read/sample_flate_66_kB-4 688kB ± 0% 716kB ± 0% +4.08% (p=0.029 n=4+4) Read/sample_none_262_kB-4 10.6kB ± 0% 11.0kB ± 0% +4.23% (p=0.016 n=5+4) Read/sample_gzip_262_kB-4 194kB ± 0% 199kB ± 0% +2.48% (p=0.016 n=4+5) Read/sample_lz4-64k_262_kB-4 165kB ± 0% 169kB ± 0% +2.86% (p=0.016 n=4+5) Read/sample_lz4-256k_262_kB-4 172kB ± 0% 177kB ± 0% +2.92% (p=0.016 n=4+5) Read/sample_lz4-1M_262_kB-4 176kB ± 0% 181kB ± 0% +2.91% (p=0.029 n=4+4) Read/sample_lz4_262_kB-4 176kB ± 0% 181kB ± 0% +2.81% (p=0.016 n=5+4) Read/sample_snappy_262_kB-4 88.1kB ± 0% 90.8kB ± 0% +3.11% (p=0.029 n=4+4) Read/sample_flate_262_kB-4 194kB ± 0% 198kB ± 0% +2.13% (p=0.029 n=4+4) Read/sample_none_524_kB-4 5.56kB ± 0% 5.76kB ± 0% +3.65% (p=0.016 n=5+4) Read/sample_gzip_524_kB-4 95.4kB ± 0% 96.9kB ± 0% +1.56% (p=0.008 n=5+5) Read/sample_lz4-64k_524_kB-4 69.0kB ± 0% 71.0kB ± 0% +2.87% (p=0.008 n=5+5) Read/sample_lz4-256k_524_kB-4 71.8kB ± 0% 73.8kB ± 0% +2.79% (p=0.016 n=4+5) Read/sample_lz4-1M_524_kB-4 74.5kB ± 0% 76.6kB ± 0% +2.75% (p=0.008 n=5+5) Read/sample_lz4_524_kB-4 74.5kB ± 0% 188.4kB ±89% +152.98% (p=0.008 n=5+5) Read/sample_snappy_524_kB-4 40.7kB ± 1% 41.3kB ± 1% +1.58% (p=0.008 n=5+5) Read/sample_flate_524_kB-4 95.2kB ± 0% 96.2kB ± 0% +0.95% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Read/gzip_66_kB-4 13.2k ± 0% 13.2k ± 0% -0.04% (p=0.000 n=5+4) Read/zstd_66_kB-4 18.0k ± 1% 17.8k ± 1% -1.06% (p=0.032 n=5+5) Read/gzip_262_kB-4 4.61k ± 0% 4.61k ± 0% -0.11% (p=0.029 n=4+4) Read/flate_262_kB-4 4.61k ± 0% 4.60k ± 0% -0.11% (p=0.008 n=5+5) Read/zstd_262_kB-4 7.10k ± 2% 6.88k ± 3% -3.01% (p=0.008 n=5+5) Read/gzip_524_kB-4 2.62k ± 0% 2.62k ± 0% -0.06% (p=0.000 n=4+5) Read/zstd_524_kB-4 6.13k ± 2% 5.94k ± 2% -3.03% (p=0.024 n=5+5) Read/sample_flate_66_kB-4 13.2k ± 0% 13.2k ± 0% -0.01% (p=0.029 n=4+4) Read/sample_flate_262_kB-4 4.62k ± 0% 4.61k ± 0% -0.22% (p=0.029 n=4+4) Read/sample_zstd_262_kB-4 7.12k ± 1% 6.97k ± 1% -2.11% (p=0.008 n=5+5) Read/sample_gzip_524_kB-4 2.63k ± 0% 2.63k ± 0% -0.11% (p=0.008 n=5+5) Read/sample_flate_524_kB-4 2.63k ± 0% 2.62k ± 0% -0.15% (p=0.008 n=5+5) Read/sample_zstd_524_kB-4 6.11k ± 2% 5.89k ± 1% -3.68% (p=0.016 n=5+4) ``` Co-authored-by: Owen Diehl <ow.diehl@gmail.com>
3 years ago
cases := []struct {
data []byte
}{
// Data that should not decode as lines from a chunk in any encoding.
{data: []byte{0}},
{data: []byte{1}},
{data: []byte("asdfasdfasdfqwyteqwtyeq")},
}
ctx, start, end := context.Background(), time.Unix(0, 0), time.Unix(0, math.MaxInt64)
for i, c := range cases {
chk.blocks = []block{{b: c.data}}
it, err := chk.Iterator(ctx, start, end, logproto.FORWARD, noopStreamPipeline)
require.NoError(t, err, "case %d", i)
idx := 0
for it.Next() {
idx++
}
require.Error(t, it.Error(), "case %d", i)
require.NoError(t, it.Close())
}
})
}
}
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
func TestReadFormatV1(t *testing.T) {
t.Parallel()
c := NewMemChunk(EncGZIP, DefaultTestHeadBlockFmt, testBlockSize, testTargetSize)
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
fillChunk(c)
// overrides default v2 format
c.format = chunkFormatV1
b, err := c.Bytes()
if err != nil {
t.Fatal(err)
}
r, err := NewByteChunk(b, testBlockSize, testTargetSize)
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
if err != nil {
t.Fatal(err)
}
it, err := r.Iterator(context.Background(), time.Unix(0, 0), time.Unix(0, math.MaxInt64), logproto.FORWARD, noopStreamPipeline)
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
if err != nil {
t.Fatal(err)
}
i := int64(0)
for it.Next() {
require.Equal(t, i, it.Entry().Timestamp.UnixNano())
require.Equal(t, testdata.LogString(i), it.Entry().Line)
i++
}
}
// Test all encodings by populating a memchunk, serializing it,
// re-loading with NewByteChunk, serializing it again, and re-loading into via NewByteChunk once more.
// This tests the integrity of transfer between the following:
// 1) memory populated chunks <-> []byte loaded chunks
// 2) []byte loaded chunks <-> []byte loaded chunks
func TestRoundtripV2(t *testing.T) {
for _, testData := range allPossibleFormats {
for _, enc := range testEncoding {
enc := enc
t.Run(testNameWithFormats(enc, testData.chunkFormat, testData.headBlockFmt), func(t *testing.T) {
t.Parallel()
c := newMemChunkWithFormat(testData.chunkFormat, enc, testData.headBlockFmt, testBlockSize, testTargetSize)
populated := fillChunk(c)
assertLines := func(c *MemChunk) {
require.Equal(t, enc, c.Encoding())
it, err := c.Iterator(context.Background(), time.Unix(0, 0), time.Unix(0, math.MaxInt64), logproto.FORWARD, noopStreamPipeline)
if err != nil {
t.Fatal(err)
}
i := int64(0)
var data int64
for it.Next() {
require.Equal(t, i, it.Entry().Timestamp.UnixNano())
require.Equal(t, testdata.LogString(i), it.Entry().Line)
data += int64(len(it.Entry().Line))
i++
}
require.Equal(t, populated, data)
}
assertLines(c)
// test MemChunk -> NewByteChunk loading
b, err := c.Bytes()
if err != nil {
t.Fatal(err)
}
r, err := NewByteChunk(b, testBlockSize, testTargetSize)
if err != nil {
t.Fatal(err)
}
assertLines(r)
// test NewByteChunk -> NewByteChunk loading
rOut, err := r.Bytes()
require.Nil(t, err)
loaded, err := NewByteChunk(rOut, testBlockSize, testTargetSize)
require.Nil(t, err)
assertLines(loaded)
})
}
}
}
func testNameWithFormats(enc Encoding, chunkFormat byte, headBlockFmt HeadBlockFmt) string {
return fmt.Sprintf("encoding:%v chunkFormat:%v headBlockFmt:%v", enc, chunkFormat, headBlockFmt)
}
func TestRoundtripV3(t *testing.T) {
for _, f := range HeadBlockFmts {
for _, enc := range testEncoding {
enc := enc
t.Run(fmt.Sprintf("%v-%v", f, enc), func(t *testing.T) {
t.Parallel()
c := NewMemChunk(enc, f, testBlockSize, testTargetSize)
c.format = chunkFormatV3
_ = fillChunk(c)
b, err := c.Bytes()
require.Nil(t, err)
r, err := NewByteChunk(b, testBlockSize, testTargetSize)
require.Nil(t, err)
b2, err := r.Bytes()
require.Nil(t, err)
require.Equal(t, b, b2)
})
}
}
}
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
func TestSerialization(t *testing.T) {
for _, testData := range allPossibleFormats {
for _, enc := range testEncoding {
enc := enc
// run tests with and without non-indexed labels set since it is optional
for _, appendWithNonIndexedLabels := range []bool{false, true} {
appendWithNonIndexedLabels := appendWithNonIndexedLabels
testName := testNameWithFormats(enc, testData.chunkFormat, testData.headBlockFmt)
if appendWithNonIndexedLabels {
testName = fmt.Sprintf("%s - append non-indexed labels", testName)
} else {
testName = fmt.Sprintf("%s - without non-indexed labels", testName)
}
t.Run(testName, func(t *testing.T) {
t.Parallel()
chk := NewMemChunk(enc, testData.headBlockFmt, testBlockSize, testTargetSize)
chk.format = testData.chunkFormat
numSamples := 50000
var entry *logproto.Entry
for i := 0; i < numSamples; i++ {
entry = logprotoEntry(int64(i), strconv.Itoa(i))
if appendWithNonIndexedLabels {
entry.NonIndexedLabels = []logproto.LabelAdapter{{Name: "foo", Value: strconv.Itoa(i)}}
}
require.NoError(t, chk.Append(entry))
}
require.NoError(t, chk.Close())
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
byt, err := chk.Bytes()
require.NoError(t, err)
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
bc, err := NewByteChunk(byt, testBlockSize, testTargetSize)
require.NoError(t, err)
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
it, err := bc.Iterator(context.Background(), time.Unix(0, 0), time.Unix(0, math.MaxInt64), logproto.FORWARD, log.NewNoopPipeline().ForStream(labels.Labels{}))
require.NoError(t, err)
for i := 0; i < numSamples; i++ {
require.True(t, it.Next())
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
e := it.Entry()
require.Equal(t, int64(i), e.Timestamp.UnixNano())
require.Equal(t, strconv.Itoa(i), e.Line)
require.Nil(t, e.NonIndexedLabels)
if appendWithNonIndexedLabels && testData.chunkFormat >= chunkFormatV4 {
require.Equal(t, labels.FromStrings("foo", strconv.Itoa(i)).String(), it.Labels())
} else {
require.Equal(t, labels.EmptyLabels().String(), it.Labels())
}
}
require.NoError(t, it.Error())
Improve metric queries by computing samples at the edges. (#2293) * First pass breaking the code appart. Wondering how we're going to achieve fast mutation of labels. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Work in progress. I realize I need hash for deduping lines. going to benchmark somes. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Tested some hash and decided which one to use. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Wip Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Starting working on ingester. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Trying to find a better hash function. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * More hash testing we have a winner. xxhash it is. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Settle on xxhash Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Better params interfacing. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add interface for queryparams for things that exist in both type of params. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add storage sample iterator implementations. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Fixing tests and verifying we don't get collions for the hashing method. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Fixing ingesters tests and refactoring utility function/tests. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Fixing and testing that stats are still well computed. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Fixing more tests. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * More engine tests finished. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Fixes sharding evaluator. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Fixes more engine tests. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Fix error tests in the engine. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Finish fixing all tests. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Fixes a bug where extractor was not passed in correctly. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add notes about upgrade. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Renamed and fix a bug. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add memchunk tests and starting test for sampleIterator. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Test heap sample iterator. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * working on test. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Finishing testing all new iterators. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Making sure all store functions are tested. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Benchmark and verify everything is working well. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Make the linter happy. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * use xxhash v2. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Fix a flaky test because of map. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * go.mod. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> Co-authored-by: Edward Welch <edward.welch@grafana.com>
6 years ago
countExtractor = func() log.StreamSampleExtractor {
ex, err := log.NewLineSampleExtractor(log.CountExtractor, nil, nil, false, false)
if err != nil {
panic(err)
}
return ex.ForStream(labels.Labels{})
}()
sampleIt := bc.SampleIterator(context.Background(), time.Unix(0, 0), time.Unix(0, math.MaxInt64), countExtractor)
for i := 0; i < numSamples; i++ {
require.True(t, sampleIt.Next(), i)
s := sampleIt.Sample()
require.Equal(t, int64(i), s.Timestamp)
require.Equal(t, 1., s.Value)
if appendWithNonIndexedLabels && testData.chunkFormat >= chunkFormatV4 {
require.Equal(t, labels.FromStrings("foo", strconv.Itoa(i)).String(), sampleIt.Labels())
} else {
require.Equal(t, labels.EmptyLabels().String(), sampleIt.Labels())
}
}
require.NoError(t, sampleIt.Error())
Improve metric queries by computing samples at the edges. (#2293) * First pass breaking the code appart. Wondering how we're going to achieve fast mutation of labels. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Work in progress. I realize I need hash for deduping lines. going to benchmark somes. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Tested some hash and decided which one to use. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Wip Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Starting working on ingester. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Trying to find a better hash function. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * More hash testing we have a winner. xxhash it is. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Settle on xxhash Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Better params interfacing. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add interface for queryparams for things that exist in both type of params. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add storage sample iterator implementations. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Fixing tests and verifying we don't get collions for the hashing method. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Fixing ingesters tests and refactoring utility function/tests. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Fixing and testing that stats are still well computed. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Fixing more tests. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * More engine tests finished. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Fixes sharding evaluator. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Fixes more engine tests. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Fix error tests in the engine. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Finish fixing all tests. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Fixes a bug where extractor was not passed in correctly. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add notes about upgrade. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Renamed and fix a bug. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add memchunk tests and starting test for sampleIterator. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Test heap sample iterator. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * working on test. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Finishing testing all new iterators. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Making sure all store functions are tested. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Benchmark and verify everything is working well. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Make the linter happy. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * use xxhash v2. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Fix a flaky test because of map. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * go.mod. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> Co-authored-by: Edward Welch <edward.welch@grafana.com>
6 years ago
byt2, err := chk.Bytes()
require.NoError(t, err)
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
require.True(t, bytes.Equal(byt, byt2))
})
}
}
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
}
}
func TestChunkFilling(t *testing.T) {
for _, testData := range allPossibleFormats {
for _, enc := range testEncoding {
enc := enc
t.Run(testNameWithFormats(enc, testData.chunkFormat, testData.headBlockFmt), func(t *testing.T) {
t.Parallel()
chk := newMemChunkWithFormat(testData.chunkFormat, enc, testData.headBlockFmt, testBlockSize, 0)
chk.blockSize = 1024
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
// We should be able to append only 10KB of logs.
maxBytes := chk.blockSize * blocksPerChunk
lineSize := 512
lines := maxBytes / lineSize
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
logLine := string(make([]byte, lineSize))
entry := &logproto.Entry{
Timestamp: time.Unix(0, 0),
Line: logLine,
}
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
i := int64(0)
for ; chk.SpaceFor(entry) && i < 30; i++ {
entry.Timestamp = time.Unix(0, i)
require.NoError(t, chk.Append(entry))
}
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
require.Equal(t, int64(lines), i)
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
it, err := chk.Iterator(context.Background(), time.Unix(0, 0), time.Unix(0, 100), logproto.FORWARD, noopStreamPipeline)
require.NoError(t, err)
i = 0
for it.Next() {
entry := it.Entry()
require.Equal(t, i, entry.Timestamp.UnixNano())
i++
}
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
require.Equal(t, int64(lines), i)
})
}
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
}
}
func TestGZIPChunkTargetSize(t *testing.T) {
t.Parallel()
chk := NewMemChunk(EncGZIP, DefaultTestHeadBlockFmt, testBlockSize, testTargetSize)
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
lineSize := 512
entry := &logproto.Entry{
Timestamp: time.Unix(0, 0),
Line: "",
}
// Use a random number to generate random log data, otherwise the gzip compression is way too good
// and the following loop has to run waaayyyyy to many times
// Using the same seed should guarantee the same random numbers and same test data.
r := rand.New(rand.NewSource(99))
i := int64(0)
for ; chk.SpaceFor(entry) && i < 5000; i++ {
logLine := make([]byte, lineSize)
for j := range logLine {
logLine[j] = byte(r.Int())
}
entry = &logproto.Entry{
Timestamp: time.Unix(0, 0),
Line: string(logLine),
}
entry.Timestamp = time.Unix(0, i)
require.NoError(t, chk.Append(entry))
}
// 5000 is a limit ot make sure the test doesn't run away, we shouldn't need this many log lines to make 1MB chunk
require.NotEqual(t, 5000, i)
require.NoError(t, chk.Close())
require.Equal(t, 0, chk.head.UncompressedSize())
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
// Even though the seed is static above and results should be deterministic,
// we will allow +/- 10% variance
minSize := int(float64(testTargetSize) * 0.9)
maxSize := int(float64(testTargetSize) * 1.1)
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
require.Greater(t, chk.CompressedSize(), minSize)
require.Less(t, chk.CompressedSize(), maxSize)
// Also verify our utilization is close to 1.0
ut := chk.Utilization()
require.Greater(t, ut, 0.99)
require.Less(t, ut, 1.01)
}
func TestMemChunk_AppendOutOfOrder(t *testing.T) {
t.Parallel()
type tester func(t *testing.T, chk *MemChunk)
tests := map[string]tester{
"append out of order in the same block": func(t *testing.T, chk *MemChunk) {
assert.NoError(t, chk.Append(logprotoEntry(5, "test")))
assert.NoError(t, chk.Append(logprotoEntry(6, "test")))
if chk.headFmt == OrderedHeadBlockFmt {
assert.EqualError(t, chk.Append(logprotoEntry(1, "test")), ErrOutOfOrder.Error())
} else {
assert.NoError(t, chk.Append(logprotoEntry(1, "test")))
}
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
},
"append out of order in a new block right after cutting the previous one": func(t *testing.T, chk *MemChunk) {
assert.NoError(t, chk.Append(logprotoEntry(5, "test")))
assert.NoError(t, chk.Append(logprotoEntry(6, "test")))
assert.NoError(t, chk.cut())
if chk.headFmt == OrderedHeadBlockFmt {
assert.EqualError(t, chk.Append(logprotoEntry(1, "test")), ErrOutOfOrder.Error())
} else {
assert.NoError(t, chk.Append(logprotoEntry(1, "test")))
}
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
},
"append out of order in a new block after multiple cuts": func(t *testing.T, chk *MemChunk) {
assert.NoError(t, chk.Append(logprotoEntry(5, "test")))
assert.NoError(t, chk.cut())
assert.NoError(t, chk.Append(logprotoEntry(6, "test")))
assert.NoError(t, chk.cut())
if chk.headFmt == OrderedHeadBlockFmt {
assert.EqualError(t, chk.Append(logprotoEntry(1, "test")), ErrOutOfOrder.Error())
} else {
assert.NoError(t, chk.Append(logprotoEntry(1, "test")))
}
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
},
}
for _, f := range HeadBlockFmts {
for testName, tester := range tests {
tester := tester
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
t.Run(testName, func(t *testing.T) {
t.Parallel()
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
tester(t, NewMemChunk(EncGZIP, f, testBlockSize, testTargetSize))
})
}
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
}
}
func TestChunkSize(t *testing.T) {
type res struct {
name string
size uint64
compressedSize uint64
ratio float64
}
var result []res
for _, bs := range testBlockSizes {
for _, f := range HeadBlockFmts {
for _, enc := range testEncoding {
name := fmt.Sprintf("%s_%s", enc.String(), humanize.Bytes(uint64(bs)))
t.Run(name, func(t *testing.T) {
c := NewMemChunk(enc, f, bs, testTargetSize)
inserted := fillChunk(c)
b, err := c.Bytes()
if err != nil {
t.Fatal(err)
}
result = append(result, res{
name: name,
size: uint64(inserted),
compressedSize: uint64(len(b)),
ratio: float64(inserted) / float64(len(b)),
})
})
}
}
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
}
sort.Slice(result, func(i, j int) bool {
return result[i].ratio > result[j].ratio
})
fmt.Printf("%s\t%s\t%s\t%s\n", "name", "uncompressed", "compressed", "ratio")
for _, r := range result {
fmt.Printf("%s\t%s\t%s\t%f\n", r.name, humanize.Bytes(r.size), humanize.Bytes(r.compressedSize), r.ratio)
}
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
}
func TestChunkStats(t *testing.T) {
c := NewMemChunk(EncSnappy, DefaultTestHeadBlockFmt, testBlockSize, 0)
first := time.Now()
entry := &logproto.Entry{
Timestamp: first,
Line: `ts=2020-03-16T13:58:33.459Z caller=dedupe.go:112 component=remote level=debug remote_name=3ea44a url=https:/blan.goo.net/api/prom/push msg=QueueManager.updateShardsLoop lowerBound=45.5 desiredShards=56.724401194003136 upperBound=84.5`,
}
inserted := 0
// fill the chunk with known data size.
for {
if !c.SpaceFor(entry) {
break
}
if err := c.Append(entry); err != nil {
t.Fatal(err)
}
inserted++
entry.Timestamp = entry.Timestamp.Add(time.Nanosecond)
}
expectedSize := (inserted * len(entry.Line)) + (inserted * 2 * binary.MaxVarintLen64)
statsCtx, ctx := stats.NewContext(context.Background())
it, err := c.Iterator(ctx, first.Add(-time.Hour), entry.Timestamp.Add(time.Hour), logproto.BACKWARD, noopStreamPipeline)
if err != nil {
t.Fatal(err)
}
for it.Next() {
}
if err := it.Close(); err != nil {
t.Fatal(err)
}
// test on a chunk filling up
s := statsCtx.Result(time.Since(first), 0, 0)
require.Equal(t, int64(expectedSize), s.Summary.TotalBytesProcessed)
require.Equal(t, int64(inserted), s.Summary.TotalLinesProcessed)
require.Equal(t, int64(expectedSize), s.TotalDecompressedBytes())
require.Equal(t, int64(inserted), s.TotalDecompressedLines())
b, err := c.Bytes()
if err != nil {
t.Fatal(err)
}
// test on a new chunk.
cb, err := NewByteChunk(b, testBlockSize, testTargetSize)
if err != nil {
t.Fatal(err)
}
statsCtx, ctx = stats.NewContext(context.Background())
it, err = cb.Iterator(ctx, first.Add(-time.Hour), entry.Timestamp.Add(time.Hour), logproto.BACKWARD, noopStreamPipeline)
if err != nil {
t.Fatal(err)
}
for it.Next() {
}
if err := it.Close(); err != nil {
t.Fatal(err)
}
s = statsCtx.Result(time.Since(first), 0, 0)
require.Equal(t, int64(expectedSize), s.Summary.TotalBytesProcessed)
require.Equal(t, int64(inserted), s.Summary.TotalLinesProcessed)
require.Equal(t, int64(expectedSize), s.TotalDecompressedBytes())
require.Equal(t, int64(inserted), s.TotalDecompressedLines())
}
func TestIteratorClose(t *testing.T) {
for _, f := range HeadBlockFmts {
for _, enc := range testEncoding {
t.Run(enc.String(), func(t *testing.T) {
for _, test := range []func(iter iter.EntryIterator, t *testing.T){
func(iter iter.EntryIterator, t *testing.T) {
// close without iterating
if err := iter.Close(); err != nil {
t.Fatal(err)
}
},
func(iter iter.EntryIterator, t *testing.T) {
// close after iterating
for iter.Next() {
_ = iter.Entry()
}
if err := iter.Close(); err != nil {
t.Fatal(err)
}
},
func(iter iter.EntryIterator, t *testing.T) {
// close after a single iteration
iter.Next()
_ = iter.Entry()
if err := iter.Close(); err != nil {
t.Fatal(err)
}
},
} {
c := NewMemChunk(enc, f, testBlockSize, testTargetSize)
inserted := fillChunk(c)
iter, err := c.Iterator(context.Background(), time.Unix(0, 0), time.Unix(0, inserted), logproto.BACKWARD, noopStreamPipeline)
if err != nil {
t.Fatal(err)
}
test(iter, t)
}
})
}
}
}
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
func BenchmarkWrite(b *testing.B) {
entry := &logproto.Entry{
Timestamp: time.Unix(0, 0),
Line: testdata.LogString(0),
}
i := int64(0)
for _, f := range HeadBlockFmts {
for _, enc := range testEncoding {
for _, withNonIndexedLabels := range []bool{false, true} {
name := fmt.Sprintf("%v-%v", f, enc)
if withNonIndexedLabels {
name += "-withNonIndexedLabels"
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
}
b.Run(name, func(b *testing.B) {
uncompressedBytes, compressedBytes := 0, 0
for n := 0; n < b.N; n++ {
c := NewMemChunk(enc, f, testBlockSize, testTargetSize)
// adds until full so we trigger cut which serialize using gzip
for c.SpaceFor(entry) {
_ = c.Append(entry)
entry.Timestamp = time.Unix(0, i)
entry.Line = testdata.LogString(i)
if withNonIndexedLabels {
entry.NonIndexedLabels = []logproto.LabelAdapter{
{Name: "foo", Value: fmt.Sprint(i)},
}
}
i++
}
uncompressedBytes += c.UncompressedSize()
compressedBytes += c.CompressedSize()
}
b.SetBytes(int64(uncompressedBytes) / int64(b.N))
b.ReportMetric(float64(compressedBytes)/float64(uncompressedBytes)*100, "%compressed")
})
}
}
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
}
}
type nomatchPipeline struct{}
func (nomatchPipeline) BaseLabels() log.LabelsResult { return log.EmptyLabelsResult }
func (nomatchPipeline) Process(_ int64, line []byte, _ ...labels.Label) ([]byte, log.LabelsResult, bool) {
return line, nil, false
}
func (nomatchPipeline) ProcessString(_ int64, line string, _ ...labels.Label) (string, log.LabelsResult, bool) {
return line, nil, false
}
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
func BenchmarkRead(b *testing.B) {
for _, bs := range testBlockSizes {
for _, enc := range testEncoding {
name := fmt.Sprintf("%s_%s", enc.String(), humanize.Bytes(uint64(bs)))
b.Run(name, func(b *testing.B) {
chunks, size := generateData(enc, 5, bs, testTargetSize)
chunks: improve readability of compression benchmarks (#7246) **What this PR does / why we need it**: ### BenchmarkWrite Use Go runtime facilities to report MB/s for each operation and compression ratio. Before: ``` BenchmarkWrite/ordered-none-4 1018 1104885 ns/op 1431962 B/op 33 allocs/op BenchmarkWrite/ordered-gzip-4 12 87333201 ns/op 3516699 B/op 696 allocs/op BenchmarkWrite/ordered-lz4-64k-4 51 21483048 ns/op 3095117 B/op 649 allocs/op ``` After: ``` BenchmarkWrite/ordered-none-4 841 1202007 ns/op 1054353.53 MB/s 101.9 %compressed 1569953 B/op 34 allocs/op BenchmarkWrite/ordered-gzip-4 13 86299699 ns/op 3357.25 MB/s 6.891 %compressed 3778896 B/op 702 allocs/op BenchmarkWrite/ordered-lz4-64k-4 49 22332214 ns/op 34107.61 MB/s 9.880 %compressed 3407522 B/op 661 allocs/op ``` Stop collecting all compressed chunks, as this blows up the memory of the benchmark and creates unrealistic test conditions for the later encoding. Before: `4724260 maxresident KB`; after: `375804 maxresident KB` ### BenchmarkRead Replace reporting of MB/s which gave several values for each encoding as Go hunted for the right number of benchmark iterations. Add stats to the context for decoding, otherwise a new stats object is created each time round the loop. Re-order so all unsampled results come before all sampled results. Before: ``` BenchmarkRead/none_66_kB-4 278 4555157 ns/op 131744 B/op 717 allocs/op BenchmarkRead/sample_none_66_kB-4 142 8238861 ns/op 129771 B/op 717 allocs/op BenchmarkRead/gzip_66_kB-4 6 179445187 ns/op 2100498 B/op 15010 allocs/op BenchmarkRead/sample_gzip_66_kB-4 4 261040057 ns/op 2076030 B/op 15028 allocs/op BenchmarkRead/lz4-64k_66_kB-4 19 62855240 ns/op 1722979 B/op 14219 allocs/op BenchmarkRead/sample_lz4-64k_66_kB-4 10 115577796 ns/op 1706040 B/op 14220 allocs/op BenchmarkRead/lz4-256k_66_kB-4 18 66144317 ns/op 1880851 B/op 15078 allocs/op BenchmarkRead/sample_lz4-256k_66_kB-4 9 118405993 ns/op 1800396 B/op 15077 allocs/op BenchmarkRead/lz4-1M_66_kB-4 18 67832189 ns/op 1821806 B/op 15076 allocs/op ... none_524 kB: 1828.92 MB/s none_524 kB: 1809.33 MB/s none_262 kB: 1790.79 MB/s none_66 kB: 1673.84 MB/s ... gzip_66 kB: 647.35 MB/s ... ``` After: ``` BenchmarkRead/none_66_kB-4 286 4377585 ns/op 1725.23 MB/s 41896 B/op 600 allocs/op BenchmarkRead/gzip_66_kB-4 6 180770638 ns/op 655.82 MB/s 723426 B/op 13265 allocs/op BenchmarkRead/lz4-64k_66_kB-4 19 65380524 ns/op 1296.10 MB/s 727571 B/op 12925 allocs/op BenchmarkRead/lz4-256k_66_kB-4 16 63737292 ns/op 1409.75 MB/s 786994 B/op 13707 allocs/op BenchmarkRead/lz4-1M_66_kB-4 18 65963825 ns/op 1362.17 MB/s 770441 B/op 13707 allocs/op BenchmarkRead/lz4_66_kB-4 18 64955415 ns/op 1383.31 MB/s 1003277 B/op 13707 allocs/op BenchmarkRead/snappy_66_kB-4 20 51588959 ns/op 1273.38 MB/s 361872 B/op 5020 allocs/op BenchmarkRead/flate_66_kB-4 6 172147502 ns/op 691.74 MB/s 715706 B/op 13242 allocs/op BenchmarkRead/zstd_66_kB-4 3 359473273 ns/op 390.07 MB/s 427247248 B/op 18551 allocs/op ... BenchmarkRead/sample_none_66_kB-4 135 8533283 ns/op 885.04 MB/s 39866 B/op 600 allocs/op BenchmarkRead/sample_gzip_66_kB-4 5 243016949 ns/op 487.84 MB/s 686358 B/op 13212 allocs/op BenchmarkRead/sample_lz4-64k_66_kB-4 10 114330032 ns/op 741.18 MB/s 707104 B/op 12926 allocs/op BenchmarkRead/sample_lz4-256k_66_kB-4 9 117294928 ns/op 766.05 MB/s 778461 B/op 13709 allocs/op ... ``` Note "MB" in previous code was 2^^20 while Go uses 10^6. **Checklist** - [x] Reviewed the `CONTRIBUTING.md` guide - NA Documentation added - [x] Tests updated - NA `CHANGELOG.md` updated - not user-facing - NA Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md`
3 years ago
_, ctx := stats.NewContext(context.Background())
b.ResetTimer()
for n := 0; n < b.N; n++ {
for _, c := range chunks {
// use forward iterator for benchmark -- backward iterator does extra allocations by keeping entries in memory
chunks: improve readability of compression benchmarks (#7246) **What this PR does / why we need it**: ### BenchmarkWrite Use Go runtime facilities to report MB/s for each operation and compression ratio. Before: ``` BenchmarkWrite/ordered-none-4 1018 1104885 ns/op 1431962 B/op 33 allocs/op BenchmarkWrite/ordered-gzip-4 12 87333201 ns/op 3516699 B/op 696 allocs/op BenchmarkWrite/ordered-lz4-64k-4 51 21483048 ns/op 3095117 B/op 649 allocs/op ``` After: ``` BenchmarkWrite/ordered-none-4 841 1202007 ns/op 1054353.53 MB/s 101.9 %compressed 1569953 B/op 34 allocs/op BenchmarkWrite/ordered-gzip-4 13 86299699 ns/op 3357.25 MB/s 6.891 %compressed 3778896 B/op 702 allocs/op BenchmarkWrite/ordered-lz4-64k-4 49 22332214 ns/op 34107.61 MB/s 9.880 %compressed 3407522 B/op 661 allocs/op ``` Stop collecting all compressed chunks, as this blows up the memory of the benchmark and creates unrealistic test conditions for the later encoding. Before: `4724260 maxresident KB`; after: `375804 maxresident KB` ### BenchmarkRead Replace reporting of MB/s which gave several values for each encoding as Go hunted for the right number of benchmark iterations. Add stats to the context for decoding, otherwise a new stats object is created each time round the loop. Re-order so all unsampled results come before all sampled results. Before: ``` BenchmarkRead/none_66_kB-4 278 4555157 ns/op 131744 B/op 717 allocs/op BenchmarkRead/sample_none_66_kB-4 142 8238861 ns/op 129771 B/op 717 allocs/op BenchmarkRead/gzip_66_kB-4 6 179445187 ns/op 2100498 B/op 15010 allocs/op BenchmarkRead/sample_gzip_66_kB-4 4 261040057 ns/op 2076030 B/op 15028 allocs/op BenchmarkRead/lz4-64k_66_kB-4 19 62855240 ns/op 1722979 B/op 14219 allocs/op BenchmarkRead/sample_lz4-64k_66_kB-4 10 115577796 ns/op 1706040 B/op 14220 allocs/op BenchmarkRead/lz4-256k_66_kB-4 18 66144317 ns/op 1880851 B/op 15078 allocs/op BenchmarkRead/sample_lz4-256k_66_kB-4 9 118405993 ns/op 1800396 B/op 15077 allocs/op BenchmarkRead/lz4-1M_66_kB-4 18 67832189 ns/op 1821806 B/op 15076 allocs/op ... none_524 kB: 1828.92 MB/s none_524 kB: 1809.33 MB/s none_262 kB: 1790.79 MB/s none_66 kB: 1673.84 MB/s ... gzip_66 kB: 647.35 MB/s ... ``` After: ``` BenchmarkRead/none_66_kB-4 286 4377585 ns/op 1725.23 MB/s 41896 B/op 600 allocs/op BenchmarkRead/gzip_66_kB-4 6 180770638 ns/op 655.82 MB/s 723426 B/op 13265 allocs/op BenchmarkRead/lz4-64k_66_kB-4 19 65380524 ns/op 1296.10 MB/s 727571 B/op 12925 allocs/op BenchmarkRead/lz4-256k_66_kB-4 16 63737292 ns/op 1409.75 MB/s 786994 B/op 13707 allocs/op BenchmarkRead/lz4-1M_66_kB-4 18 65963825 ns/op 1362.17 MB/s 770441 B/op 13707 allocs/op BenchmarkRead/lz4_66_kB-4 18 64955415 ns/op 1383.31 MB/s 1003277 B/op 13707 allocs/op BenchmarkRead/snappy_66_kB-4 20 51588959 ns/op 1273.38 MB/s 361872 B/op 5020 allocs/op BenchmarkRead/flate_66_kB-4 6 172147502 ns/op 691.74 MB/s 715706 B/op 13242 allocs/op BenchmarkRead/zstd_66_kB-4 3 359473273 ns/op 390.07 MB/s 427247248 B/op 18551 allocs/op ... BenchmarkRead/sample_none_66_kB-4 135 8533283 ns/op 885.04 MB/s 39866 B/op 600 allocs/op BenchmarkRead/sample_gzip_66_kB-4 5 243016949 ns/op 487.84 MB/s 686358 B/op 13212 allocs/op BenchmarkRead/sample_lz4-64k_66_kB-4 10 114330032 ns/op 741.18 MB/s 707104 B/op 12926 allocs/op BenchmarkRead/sample_lz4-256k_66_kB-4 9 117294928 ns/op 766.05 MB/s 778461 B/op 13709 allocs/op ... ``` Note "MB" in previous code was 2^^20 while Go uses 10^6. **Checklist** - [x] Reviewed the `CONTRIBUTING.md` guide - NA Documentation added - [x] Tests updated - NA `CHANGELOG.md` updated - not user-facing - NA Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md`
3 years ago
iterator, err := c.Iterator(ctx, time.Unix(0, 0), time.Now(), logproto.FORWARD, nomatchPipeline{})
if err != nil {
panic(err)
}
for iterator.Next() {
_ = iterator.Entry()
}
if err := iterator.Close(); err != nil {
b.Fatal(err)
}
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
}
}
chunks: improve readability of compression benchmarks (#7246) **What this PR does / why we need it**: ### BenchmarkWrite Use Go runtime facilities to report MB/s for each operation and compression ratio. Before: ``` BenchmarkWrite/ordered-none-4 1018 1104885 ns/op 1431962 B/op 33 allocs/op BenchmarkWrite/ordered-gzip-4 12 87333201 ns/op 3516699 B/op 696 allocs/op BenchmarkWrite/ordered-lz4-64k-4 51 21483048 ns/op 3095117 B/op 649 allocs/op ``` After: ``` BenchmarkWrite/ordered-none-4 841 1202007 ns/op 1054353.53 MB/s 101.9 %compressed 1569953 B/op 34 allocs/op BenchmarkWrite/ordered-gzip-4 13 86299699 ns/op 3357.25 MB/s 6.891 %compressed 3778896 B/op 702 allocs/op BenchmarkWrite/ordered-lz4-64k-4 49 22332214 ns/op 34107.61 MB/s 9.880 %compressed 3407522 B/op 661 allocs/op ``` Stop collecting all compressed chunks, as this blows up the memory of the benchmark and creates unrealistic test conditions for the later encoding. Before: `4724260 maxresident KB`; after: `375804 maxresident KB` ### BenchmarkRead Replace reporting of MB/s which gave several values for each encoding as Go hunted for the right number of benchmark iterations. Add stats to the context for decoding, otherwise a new stats object is created each time round the loop. Re-order so all unsampled results come before all sampled results. Before: ``` BenchmarkRead/none_66_kB-4 278 4555157 ns/op 131744 B/op 717 allocs/op BenchmarkRead/sample_none_66_kB-4 142 8238861 ns/op 129771 B/op 717 allocs/op BenchmarkRead/gzip_66_kB-4 6 179445187 ns/op 2100498 B/op 15010 allocs/op BenchmarkRead/sample_gzip_66_kB-4 4 261040057 ns/op 2076030 B/op 15028 allocs/op BenchmarkRead/lz4-64k_66_kB-4 19 62855240 ns/op 1722979 B/op 14219 allocs/op BenchmarkRead/sample_lz4-64k_66_kB-4 10 115577796 ns/op 1706040 B/op 14220 allocs/op BenchmarkRead/lz4-256k_66_kB-4 18 66144317 ns/op 1880851 B/op 15078 allocs/op BenchmarkRead/sample_lz4-256k_66_kB-4 9 118405993 ns/op 1800396 B/op 15077 allocs/op BenchmarkRead/lz4-1M_66_kB-4 18 67832189 ns/op 1821806 B/op 15076 allocs/op ... none_524 kB: 1828.92 MB/s none_524 kB: 1809.33 MB/s none_262 kB: 1790.79 MB/s none_66 kB: 1673.84 MB/s ... gzip_66 kB: 647.35 MB/s ... ``` After: ``` BenchmarkRead/none_66_kB-4 286 4377585 ns/op 1725.23 MB/s 41896 B/op 600 allocs/op BenchmarkRead/gzip_66_kB-4 6 180770638 ns/op 655.82 MB/s 723426 B/op 13265 allocs/op BenchmarkRead/lz4-64k_66_kB-4 19 65380524 ns/op 1296.10 MB/s 727571 B/op 12925 allocs/op BenchmarkRead/lz4-256k_66_kB-4 16 63737292 ns/op 1409.75 MB/s 786994 B/op 13707 allocs/op BenchmarkRead/lz4-1M_66_kB-4 18 65963825 ns/op 1362.17 MB/s 770441 B/op 13707 allocs/op BenchmarkRead/lz4_66_kB-4 18 64955415 ns/op 1383.31 MB/s 1003277 B/op 13707 allocs/op BenchmarkRead/snappy_66_kB-4 20 51588959 ns/op 1273.38 MB/s 361872 B/op 5020 allocs/op BenchmarkRead/flate_66_kB-4 6 172147502 ns/op 691.74 MB/s 715706 B/op 13242 allocs/op BenchmarkRead/zstd_66_kB-4 3 359473273 ns/op 390.07 MB/s 427247248 B/op 18551 allocs/op ... BenchmarkRead/sample_none_66_kB-4 135 8533283 ns/op 885.04 MB/s 39866 B/op 600 allocs/op BenchmarkRead/sample_gzip_66_kB-4 5 243016949 ns/op 487.84 MB/s 686358 B/op 13212 allocs/op BenchmarkRead/sample_lz4-64k_66_kB-4 10 114330032 ns/op 741.18 MB/s 707104 B/op 12926 allocs/op BenchmarkRead/sample_lz4-256k_66_kB-4 9 117294928 ns/op 766.05 MB/s 778461 B/op 13709 allocs/op ... ``` Note "MB" in previous code was 2^^20 while Go uses 10^6. **Checklist** - [x] Reviewed the `CONTRIBUTING.md` guide - NA Documentation added - [x] Tests updated - NA `CHANGELOG.md` updated - not user-facing - NA Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md`
3 years ago
b.SetBytes(int64(size))
})
chunks: improve readability of compression benchmarks (#7246) **What this PR does / why we need it**: ### BenchmarkWrite Use Go runtime facilities to report MB/s for each operation and compression ratio. Before: ``` BenchmarkWrite/ordered-none-4 1018 1104885 ns/op 1431962 B/op 33 allocs/op BenchmarkWrite/ordered-gzip-4 12 87333201 ns/op 3516699 B/op 696 allocs/op BenchmarkWrite/ordered-lz4-64k-4 51 21483048 ns/op 3095117 B/op 649 allocs/op ``` After: ``` BenchmarkWrite/ordered-none-4 841 1202007 ns/op 1054353.53 MB/s 101.9 %compressed 1569953 B/op 34 allocs/op BenchmarkWrite/ordered-gzip-4 13 86299699 ns/op 3357.25 MB/s 6.891 %compressed 3778896 B/op 702 allocs/op BenchmarkWrite/ordered-lz4-64k-4 49 22332214 ns/op 34107.61 MB/s 9.880 %compressed 3407522 B/op 661 allocs/op ``` Stop collecting all compressed chunks, as this blows up the memory of the benchmark and creates unrealistic test conditions for the later encoding. Before: `4724260 maxresident KB`; after: `375804 maxresident KB` ### BenchmarkRead Replace reporting of MB/s which gave several values for each encoding as Go hunted for the right number of benchmark iterations. Add stats to the context for decoding, otherwise a new stats object is created each time round the loop. Re-order so all unsampled results come before all sampled results. Before: ``` BenchmarkRead/none_66_kB-4 278 4555157 ns/op 131744 B/op 717 allocs/op BenchmarkRead/sample_none_66_kB-4 142 8238861 ns/op 129771 B/op 717 allocs/op BenchmarkRead/gzip_66_kB-4 6 179445187 ns/op 2100498 B/op 15010 allocs/op BenchmarkRead/sample_gzip_66_kB-4 4 261040057 ns/op 2076030 B/op 15028 allocs/op BenchmarkRead/lz4-64k_66_kB-4 19 62855240 ns/op 1722979 B/op 14219 allocs/op BenchmarkRead/sample_lz4-64k_66_kB-4 10 115577796 ns/op 1706040 B/op 14220 allocs/op BenchmarkRead/lz4-256k_66_kB-4 18 66144317 ns/op 1880851 B/op 15078 allocs/op BenchmarkRead/sample_lz4-256k_66_kB-4 9 118405993 ns/op 1800396 B/op 15077 allocs/op BenchmarkRead/lz4-1M_66_kB-4 18 67832189 ns/op 1821806 B/op 15076 allocs/op ... none_524 kB: 1828.92 MB/s none_524 kB: 1809.33 MB/s none_262 kB: 1790.79 MB/s none_66 kB: 1673.84 MB/s ... gzip_66 kB: 647.35 MB/s ... ``` After: ``` BenchmarkRead/none_66_kB-4 286 4377585 ns/op 1725.23 MB/s 41896 B/op 600 allocs/op BenchmarkRead/gzip_66_kB-4 6 180770638 ns/op 655.82 MB/s 723426 B/op 13265 allocs/op BenchmarkRead/lz4-64k_66_kB-4 19 65380524 ns/op 1296.10 MB/s 727571 B/op 12925 allocs/op BenchmarkRead/lz4-256k_66_kB-4 16 63737292 ns/op 1409.75 MB/s 786994 B/op 13707 allocs/op BenchmarkRead/lz4-1M_66_kB-4 18 65963825 ns/op 1362.17 MB/s 770441 B/op 13707 allocs/op BenchmarkRead/lz4_66_kB-4 18 64955415 ns/op 1383.31 MB/s 1003277 B/op 13707 allocs/op BenchmarkRead/snappy_66_kB-4 20 51588959 ns/op 1273.38 MB/s 361872 B/op 5020 allocs/op BenchmarkRead/flate_66_kB-4 6 172147502 ns/op 691.74 MB/s 715706 B/op 13242 allocs/op BenchmarkRead/zstd_66_kB-4 3 359473273 ns/op 390.07 MB/s 427247248 B/op 18551 allocs/op ... BenchmarkRead/sample_none_66_kB-4 135 8533283 ns/op 885.04 MB/s 39866 B/op 600 allocs/op BenchmarkRead/sample_gzip_66_kB-4 5 243016949 ns/op 487.84 MB/s 686358 B/op 13212 allocs/op BenchmarkRead/sample_lz4-64k_66_kB-4 10 114330032 ns/op 741.18 MB/s 707104 B/op 12926 allocs/op BenchmarkRead/sample_lz4-256k_66_kB-4 9 117294928 ns/op 766.05 MB/s 778461 B/op 13709 allocs/op ... ``` Note "MB" in previous code was 2^^20 while Go uses 10^6. **Checklist** - [x] Reviewed the `CONTRIBUTING.md` guide - NA Documentation added - [x] Tests updated - NA `CHANGELOG.md` updated - not user-facing - NA Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md`
3 years ago
}
}
chunks: improve readability of compression benchmarks (#7246) **What this PR does / why we need it**: ### BenchmarkWrite Use Go runtime facilities to report MB/s for each operation and compression ratio. Before: ``` BenchmarkWrite/ordered-none-4 1018 1104885 ns/op 1431962 B/op 33 allocs/op BenchmarkWrite/ordered-gzip-4 12 87333201 ns/op 3516699 B/op 696 allocs/op BenchmarkWrite/ordered-lz4-64k-4 51 21483048 ns/op 3095117 B/op 649 allocs/op ``` After: ``` BenchmarkWrite/ordered-none-4 841 1202007 ns/op 1054353.53 MB/s 101.9 %compressed 1569953 B/op 34 allocs/op BenchmarkWrite/ordered-gzip-4 13 86299699 ns/op 3357.25 MB/s 6.891 %compressed 3778896 B/op 702 allocs/op BenchmarkWrite/ordered-lz4-64k-4 49 22332214 ns/op 34107.61 MB/s 9.880 %compressed 3407522 B/op 661 allocs/op ``` Stop collecting all compressed chunks, as this blows up the memory of the benchmark and creates unrealistic test conditions for the later encoding. Before: `4724260 maxresident KB`; after: `375804 maxresident KB` ### BenchmarkRead Replace reporting of MB/s which gave several values for each encoding as Go hunted for the right number of benchmark iterations. Add stats to the context for decoding, otherwise a new stats object is created each time round the loop. Re-order so all unsampled results come before all sampled results. Before: ``` BenchmarkRead/none_66_kB-4 278 4555157 ns/op 131744 B/op 717 allocs/op BenchmarkRead/sample_none_66_kB-4 142 8238861 ns/op 129771 B/op 717 allocs/op BenchmarkRead/gzip_66_kB-4 6 179445187 ns/op 2100498 B/op 15010 allocs/op BenchmarkRead/sample_gzip_66_kB-4 4 261040057 ns/op 2076030 B/op 15028 allocs/op BenchmarkRead/lz4-64k_66_kB-4 19 62855240 ns/op 1722979 B/op 14219 allocs/op BenchmarkRead/sample_lz4-64k_66_kB-4 10 115577796 ns/op 1706040 B/op 14220 allocs/op BenchmarkRead/lz4-256k_66_kB-4 18 66144317 ns/op 1880851 B/op 15078 allocs/op BenchmarkRead/sample_lz4-256k_66_kB-4 9 118405993 ns/op 1800396 B/op 15077 allocs/op BenchmarkRead/lz4-1M_66_kB-4 18 67832189 ns/op 1821806 B/op 15076 allocs/op ... none_524 kB: 1828.92 MB/s none_524 kB: 1809.33 MB/s none_262 kB: 1790.79 MB/s none_66 kB: 1673.84 MB/s ... gzip_66 kB: 647.35 MB/s ... ``` After: ``` BenchmarkRead/none_66_kB-4 286 4377585 ns/op 1725.23 MB/s 41896 B/op 600 allocs/op BenchmarkRead/gzip_66_kB-4 6 180770638 ns/op 655.82 MB/s 723426 B/op 13265 allocs/op BenchmarkRead/lz4-64k_66_kB-4 19 65380524 ns/op 1296.10 MB/s 727571 B/op 12925 allocs/op BenchmarkRead/lz4-256k_66_kB-4 16 63737292 ns/op 1409.75 MB/s 786994 B/op 13707 allocs/op BenchmarkRead/lz4-1M_66_kB-4 18 65963825 ns/op 1362.17 MB/s 770441 B/op 13707 allocs/op BenchmarkRead/lz4_66_kB-4 18 64955415 ns/op 1383.31 MB/s 1003277 B/op 13707 allocs/op BenchmarkRead/snappy_66_kB-4 20 51588959 ns/op 1273.38 MB/s 361872 B/op 5020 allocs/op BenchmarkRead/flate_66_kB-4 6 172147502 ns/op 691.74 MB/s 715706 B/op 13242 allocs/op BenchmarkRead/zstd_66_kB-4 3 359473273 ns/op 390.07 MB/s 427247248 B/op 18551 allocs/op ... BenchmarkRead/sample_none_66_kB-4 135 8533283 ns/op 885.04 MB/s 39866 B/op 600 allocs/op BenchmarkRead/sample_gzip_66_kB-4 5 243016949 ns/op 487.84 MB/s 686358 B/op 13212 allocs/op BenchmarkRead/sample_lz4-64k_66_kB-4 10 114330032 ns/op 741.18 MB/s 707104 B/op 12926 allocs/op BenchmarkRead/sample_lz4-256k_66_kB-4 9 117294928 ns/op 766.05 MB/s 778461 B/op 13709 allocs/op ... ``` Note "MB" in previous code was 2^^20 while Go uses 10^6. **Checklist** - [x] Reviewed the `CONTRIBUTING.md` guide - NA Documentation added - [x] Tests updated - NA `CHANGELOG.md` updated - not user-facing - NA Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md`
3 years ago
for _, bs := range testBlockSizes {
for _, enc := range testEncoding {
name := fmt.Sprintf("sample_%s_%s", enc.String(), humanize.Bytes(uint64(bs)))
b.Run(name, func(b *testing.B) {
chunks, size := generateData(enc, 5, bs, testTargetSize)
chunks: improve readability of compression benchmarks (#7246) **What this PR does / why we need it**: ### BenchmarkWrite Use Go runtime facilities to report MB/s for each operation and compression ratio. Before: ``` BenchmarkWrite/ordered-none-4 1018 1104885 ns/op 1431962 B/op 33 allocs/op BenchmarkWrite/ordered-gzip-4 12 87333201 ns/op 3516699 B/op 696 allocs/op BenchmarkWrite/ordered-lz4-64k-4 51 21483048 ns/op 3095117 B/op 649 allocs/op ``` After: ``` BenchmarkWrite/ordered-none-4 841 1202007 ns/op 1054353.53 MB/s 101.9 %compressed 1569953 B/op 34 allocs/op BenchmarkWrite/ordered-gzip-4 13 86299699 ns/op 3357.25 MB/s 6.891 %compressed 3778896 B/op 702 allocs/op BenchmarkWrite/ordered-lz4-64k-4 49 22332214 ns/op 34107.61 MB/s 9.880 %compressed 3407522 B/op 661 allocs/op ``` Stop collecting all compressed chunks, as this blows up the memory of the benchmark and creates unrealistic test conditions for the later encoding. Before: `4724260 maxresident KB`; after: `375804 maxresident KB` ### BenchmarkRead Replace reporting of MB/s which gave several values for each encoding as Go hunted for the right number of benchmark iterations. Add stats to the context for decoding, otherwise a new stats object is created each time round the loop. Re-order so all unsampled results come before all sampled results. Before: ``` BenchmarkRead/none_66_kB-4 278 4555157 ns/op 131744 B/op 717 allocs/op BenchmarkRead/sample_none_66_kB-4 142 8238861 ns/op 129771 B/op 717 allocs/op BenchmarkRead/gzip_66_kB-4 6 179445187 ns/op 2100498 B/op 15010 allocs/op BenchmarkRead/sample_gzip_66_kB-4 4 261040057 ns/op 2076030 B/op 15028 allocs/op BenchmarkRead/lz4-64k_66_kB-4 19 62855240 ns/op 1722979 B/op 14219 allocs/op BenchmarkRead/sample_lz4-64k_66_kB-4 10 115577796 ns/op 1706040 B/op 14220 allocs/op BenchmarkRead/lz4-256k_66_kB-4 18 66144317 ns/op 1880851 B/op 15078 allocs/op BenchmarkRead/sample_lz4-256k_66_kB-4 9 118405993 ns/op 1800396 B/op 15077 allocs/op BenchmarkRead/lz4-1M_66_kB-4 18 67832189 ns/op 1821806 B/op 15076 allocs/op ... none_524 kB: 1828.92 MB/s none_524 kB: 1809.33 MB/s none_262 kB: 1790.79 MB/s none_66 kB: 1673.84 MB/s ... gzip_66 kB: 647.35 MB/s ... ``` After: ``` BenchmarkRead/none_66_kB-4 286 4377585 ns/op 1725.23 MB/s 41896 B/op 600 allocs/op BenchmarkRead/gzip_66_kB-4 6 180770638 ns/op 655.82 MB/s 723426 B/op 13265 allocs/op BenchmarkRead/lz4-64k_66_kB-4 19 65380524 ns/op 1296.10 MB/s 727571 B/op 12925 allocs/op BenchmarkRead/lz4-256k_66_kB-4 16 63737292 ns/op 1409.75 MB/s 786994 B/op 13707 allocs/op BenchmarkRead/lz4-1M_66_kB-4 18 65963825 ns/op 1362.17 MB/s 770441 B/op 13707 allocs/op BenchmarkRead/lz4_66_kB-4 18 64955415 ns/op 1383.31 MB/s 1003277 B/op 13707 allocs/op BenchmarkRead/snappy_66_kB-4 20 51588959 ns/op 1273.38 MB/s 361872 B/op 5020 allocs/op BenchmarkRead/flate_66_kB-4 6 172147502 ns/op 691.74 MB/s 715706 B/op 13242 allocs/op BenchmarkRead/zstd_66_kB-4 3 359473273 ns/op 390.07 MB/s 427247248 B/op 18551 allocs/op ... BenchmarkRead/sample_none_66_kB-4 135 8533283 ns/op 885.04 MB/s 39866 B/op 600 allocs/op BenchmarkRead/sample_gzip_66_kB-4 5 243016949 ns/op 487.84 MB/s 686358 B/op 13212 allocs/op BenchmarkRead/sample_lz4-64k_66_kB-4 10 114330032 ns/op 741.18 MB/s 707104 B/op 12926 allocs/op BenchmarkRead/sample_lz4-256k_66_kB-4 9 117294928 ns/op 766.05 MB/s 778461 B/op 13709 allocs/op ... ``` Note "MB" in previous code was 2^^20 while Go uses 10^6. **Checklist** - [x] Reviewed the `CONTRIBUTING.md` guide - NA Documentation added - [x] Tests updated - NA `CHANGELOG.md` updated - not user-facing - NA Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md`
3 years ago
_, ctx := stats.NewContext(context.Background())
b.ResetTimer()
bytesRead := uint64(0)
for n := 0; n < b.N; n++ {
for _, c := range chunks {
chunks: improve readability of compression benchmarks (#7246) **What this PR does / why we need it**: ### BenchmarkWrite Use Go runtime facilities to report MB/s for each operation and compression ratio. Before: ``` BenchmarkWrite/ordered-none-4 1018 1104885 ns/op 1431962 B/op 33 allocs/op BenchmarkWrite/ordered-gzip-4 12 87333201 ns/op 3516699 B/op 696 allocs/op BenchmarkWrite/ordered-lz4-64k-4 51 21483048 ns/op 3095117 B/op 649 allocs/op ``` After: ``` BenchmarkWrite/ordered-none-4 841 1202007 ns/op 1054353.53 MB/s 101.9 %compressed 1569953 B/op 34 allocs/op BenchmarkWrite/ordered-gzip-4 13 86299699 ns/op 3357.25 MB/s 6.891 %compressed 3778896 B/op 702 allocs/op BenchmarkWrite/ordered-lz4-64k-4 49 22332214 ns/op 34107.61 MB/s 9.880 %compressed 3407522 B/op 661 allocs/op ``` Stop collecting all compressed chunks, as this blows up the memory of the benchmark and creates unrealistic test conditions for the later encoding. Before: `4724260 maxresident KB`; after: `375804 maxresident KB` ### BenchmarkRead Replace reporting of MB/s which gave several values for each encoding as Go hunted for the right number of benchmark iterations. Add stats to the context for decoding, otherwise a new stats object is created each time round the loop. Re-order so all unsampled results come before all sampled results. Before: ``` BenchmarkRead/none_66_kB-4 278 4555157 ns/op 131744 B/op 717 allocs/op BenchmarkRead/sample_none_66_kB-4 142 8238861 ns/op 129771 B/op 717 allocs/op BenchmarkRead/gzip_66_kB-4 6 179445187 ns/op 2100498 B/op 15010 allocs/op BenchmarkRead/sample_gzip_66_kB-4 4 261040057 ns/op 2076030 B/op 15028 allocs/op BenchmarkRead/lz4-64k_66_kB-4 19 62855240 ns/op 1722979 B/op 14219 allocs/op BenchmarkRead/sample_lz4-64k_66_kB-4 10 115577796 ns/op 1706040 B/op 14220 allocs/op BenchmarkRead/lz4-256k_66_kB-4 18 66144317 ns/op 1880851 B/op 15078 allocs/op BenchmarkRead/sample_lz4-256k_66_kB-4 9 118405993 ns/op 1800396 B/op 15077 allocs/op BenchmarkRead/lz4-1M_66_kB-4 18 67832189 ns/op 1821806 B/op 15076 allocs/op ... none_524 kB: 1828.92 MB/s none_524 kB: 1809.33 MB/s none_262 kB: 1790.79 MB/s none_66 kB: 1673.84 MB/s ... gzip_66 kB: 647.35 MB/s ... ``` After: ``` BenchmarkRead/none_66_kB-4 286 4377585 ns/op 1725.23 MB/s 41896 B/op 600 allocs/op BenchmarkRead/gzip_66_kB-4 6 180770638 ns/op 655.82 MB/s 723426 B/op 13265 allocs/op BenchmarkRead/lz4-64k_66_kB-4 19 65380524 ns/op 1296.10 MB/s 727571 B/op 12925 allocs/op BenchmarkRead/lz4-256k_66_kB-4 16 63737292 ns/op 1409.75 MB/s 786994 B/op 13707 allocs/op BenchmarkRead/lz4-1M_66_kB-4 18 65963825 ns/op 1362.17 MB/s 770441 B/op 13707 allocs/op BenchmarkRead/lz4_66_kB-4 18 64955415 ns/op 1383.31 MB/s 1003277 B/op 13707 allocs/op BenchmarkRead/snappy_66_kB-4 20 51588959 ns/op 1273.38 MB/s 361872 B/op 5020 allocs/op BenchmarkRead/flate_66_kB-4 6 172147502 ns/op 691.74 MB/s 715706 B/op 13242 allocs/op BenchmarkRead/zstd_66_kB-4 3 359473273 ns/op 390.07 MB/s 427247248 B/op 18551 allocs/op ... BenchmarkRead/sample_none_66_kB-4 135 8533283 ns/op 885.04 MB/s 39866 B/op 600 allocs/op BenchmarkRead/sample_gzip_66_kB-4 5 243016949 ns/op 487.84 MB/s 686358 B/op 13212 allocs/op BenchmarkRead/sample_lz4-64k_66_kB-4 10 114330032 ns/op 741.18 MB/s 707104 B/op 12926 allocs/op BenchmarkRead/sample_lz4-256k_66_kB-4 9 117294928 ns/op 766.05 MB/s 778461 B/op 13709 allocs/op ... ``` Note "MB" in previous code was 2^^20 while Go uses 10^6. **Checklist** - [x] Reviewed the `CONTRIBUTING.md` guide - NA Documentation added - [x] Tests updated - NA `CHANGELOG.md` updated - not user-facing - NA Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md`
3 years ago
iterator := c.SampleIterator(ctx, time.Unix(0, 0), time.Now(), countExtractor)
for iterator.Next() {
_ = iterator.Sample()
}
if err := iterator.Close(); err != nil {
b.Fatal(err)
}
}
bytesRead += size
}
chunks: improve readability of compression benchmarks (#7246) **What this PR does / why we need it**: ### BenchmarkWrite Use Go runtime facilities to report MB/s for each operation and compression ratio. Before: ``` BenchmarkWrite/ordered-none-4 1018 1104885 ns/op 1431962 B/op 33 allocs/op BenchmarkWrite/ordered-gzip-4 12 87333201 ns/op 3516699 B/op 696 allocs/op BenchmarkWrite/ordered-lz4-64k-4 51 21483048 ns/op 3095117 B/op 649 allocs/op ``` After: ``` BenchmarkWrite/ordered-none-4 841 1202007 ns/op 1054353.53 MB/s 101.9 %compressed 1569953 B/op 34 allocs/op BenchmarkWrite/ordered-gzip-4 13 86299699 ns/op 3357.25 MB/s 6.891 %compressed 3778896 B/op 702 allocs/op BenchmarkWrite/ordered-lz4-64k-4 49 22332214 ns/op 34107.61 MB/s 9.880 %compressed 3407522 B/op 661 allocs/op ``` Stop collecting all compressed chunks, as this blows up the memory of the benchmark and creates unrealistic test conditions for the later encoding. Before: `4724260 maxresident KB`; after: `375804 maxresident KB` ### BenchmarkRead Replace reporting of MB/s which gave several values for each encoding as Go hunted for the right number of benchmark iterations. Add stats to the context for decoding, otherwise a new stats object is created each time round the loop. Re-order so all unsampled results come before all sampled results. Before: ``` BenchmarkRead/none_66_kB-4 278 4555157 ns/op 131744 B/op 717 allocs/op BenchmarkRead/sample_none_66_kB-4 142 8238861 ns/op 129771 B/op 717 allocs/op BenchmarkRead/gzip_66_kB-4 6 179445187 ns/op 2100498 B/op 15010 allocs/op BenchmarkRead/sample_gzip_66_kB-4 4 261040057 ns/op 2076030 B/op 15028 allocs/op BenchmarkRead/lz4-64k_66_kB-4 19 62855240 ns/op 1722979 B/op 14219 allocs/op BenchmarkRead/sample_lz4-64k_66_kB-4 10 115577796 ns/op 1706040 B/op 14220 allocs/op BenchmarkRead/lz4-256k_66_kB-4 18 66144317 ns/op 1880851 B/op 15078 allocs/op BenchmarkRead/sample_lz4-256k_66_kB-4 9 118405993 ns/op 1800396 B/op 15077 allocs/op BenchmarkRead/lz4-1M_66_kB-4 18 67832189 ns/op 1821806 B/op 15076 allocs/op ... none_524 kB: 1828.92 MB/s none_524 kB: 1809.33 MB/s none_262 kB: 1790.79 MB/s none_66 kB: 1673.84 MB/s ... gzip_66 kB: 647.35 MB/s ... ``` After: ``` BenchmarkRead/none_66_kB-4 286 4377585 ns/op 1725.23 MB/s 41896 B/op 600 allocs/op BenchmarkRead/gzip_66_kB-4 6 180770638 ns/op 655.82 MB/s 723426 B/op 13265 allocs/op BenchmarkRead/lz4-64k_66_kB-4 19 65380524 ns/op 1296.10 MB/s 727571 B/op 12925 allocs/op BenchmarkRead/lz4-256k_66_kB-4 16 63737292 ns/op 1409.75 MB/s 786994 B/op 13707 allocs/op BenchmarkRead/lz4-1M_66_kB-4 18 65963825 ns/op 1362.17 MB/s 770441 B/op 13707 allocs/op BenchmarkRead/lz4_66_kB-4 18 64955415 ns/op 1383.31 MB/s 1003277 B/op 13707 allocs/op BenchmarkRead/snappy_66_kB-4 20 51588959 ns/op 1273.38 MB/s 361872 B/op 5020 allocs/op BenchmarkRead/flate_66_kB-4 6 172147502 ns/op 691.74 MB/s 715706 B/op 13242 allocs/op BenchmarkRead/zstd_66_kB-4 3 359473273 ns/op 390.07 MB/s 427247248 B/op 18551 allocs/op ... BenchmarkRead/sample_none_66_kB-4 135 8533283 ns/op 885.04 MB/s 39866 B/op 600 allocs/op BenchmarkRead/sample_gzip_66_kB-4 5 243016949 ns/op 487.84 MB/s 686358 B/op 13212 allocs/op BenchmarkRead/sample_lz4-64k_66_kB-4 10 114330032 ns/op 741.18 MB/s 707104 B/op 12926 allocs/op BenchmarkRead/sample_lz4-256k_66_kB-4 9 117294928 ns/op 766.05 MB/s 778461 B/op 13709 allocs/op ... ``` Note "MB" in previous code was 2^^20 while Go uses 10^6. **Checklist** - [x] Reviewed the `CONTRIBUTING.md` guide - NA Documentation added - [x] Tests updated - NA `CHANGELOG.md` updated - not user-facing - NA Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md`
3 years ago
b.SetBytes(int64(bytesRead) / int64(b.N))
})
}
}
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
}
func BenchmarkBackwardIterator(b *testing.B) {
for _, bs := range testBlockSizes {
b.Run(humanize.Bytes(uint64(bs)), func(b *testing.B) {
b.ReportAllocs()
c := NewMemChunk(EncSnappy, DefaultTestHeadBlockFmt, bs, testTargetSize)
_ = fillChunk(c)
b.ResetTimer()
for n := 0; n < b.N; n++ {
iterator, err := c.Iterator(context.Background(), time.Unix(0, 0), time.Now(), logproto.BACKWARD, noopStreamPipeline)
if err != nil {
panic(err)
}
for iterator.Next() {
_ = iterator.Entry()
}
if err := iterator.Close(); err != nil {
b.Fatal(err)
}
}
})
}
}
pkg/chunkenc: BenchmarkRead focuses on reading chunks (#1423) Previously BenchmarkRead asked for every single line from chunks to be returned. This causes a lot of unnnecessary allocations, which dominate the benchmark. Instead of counting bytes when reading, we now count size when generating data for logging speed. Another test was added to show that these two approaches are comparable. This change makes BenchmarkRead to report real time needed to decode chunks: name old time/op new time/op delta Read/none-4 86.2ms ± 0% 33.2ms ± 0% ~ (p=1.000 n=1+1) Read/gzip-4 255ms ± 0% 194ms ± 0% ~ (p=1.000 n=1+1) Read/lz4-4 121ms ± 0% 64ms ± 0% ~ (p=1.000 n=1+1) Read/snappy-4 119ms ± 0% 67ms ± 0% ~ (p=1.000 n=1+1) name old alloc/op new alloc/op delta Read/none-4 134MB ± 0% 0MB ± 0% ~ (p=1.000 n=1+1) Read/gzip-4 135MB ± 0% 0MB ± 0% ~ (p=1.000 n=1+1) Read/lz4-4 136MB ± 0% 1MB ± 0% ~ (p=1.000 n=1+1) Read/snappy-4 135MB ± 0% 0MB ± 0% ~ (p=1.000 n=1+1) name old allocs/op new allocs/op delta Read/none-4 491k ± 0% 3k ± 0% ~ (p=1.000 n=1+1) Read/gzip-4 491k ± 0% 3k ± 0% ~ (p=1.000 n=1+1) Read/lz4-4 491k ± 0% 3k ± 0% ~ (p=1.000 n=1+1) Read/snappy-4 491k ± 0% 3k ± 0% ~ (p=1.000 n=1+1) Decompression speed is now also more correct (esp. None is much higher than LZ4/Snappy) Before (/s) Now (/s) None 1.1 GB (n=1) 4.0 GB (n=1) None 1.5 GB (n=9) 3.8 GB (n=38) None 1.6 GB (n=13) Gzip 516 MB (n=1) 640 MB (n=1) Gzip 509 MB (n=3) 664 MB (n=4) Gzip 514 MB (n=4) 649 MB (n=6) LZ4 1.1 GB (n=1) 1.7 GB (n=1) LZ4 1.1 GB (n=9) 1.9 GB (n=15) Snappy 1.1 GB (n=1) 2.0 GB (n=1) Snappy 1.1 GB (n=9) 1.8 GB (n=16) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
6 years ago
func TestGenerateDataSize(t *testing.T) {
for _, enc := range testEncoding {
t.Run(enc.String(), func(t *testing.T) {
chunks, size := generateData(enc, 50, testBlockSize, testTargetSize)
pkg/chunkenc: BenchmarkRead focuses on reading chunks (#1423) Previously BenchmarkRead asked for every single line from chunks to be returned. This causes a lot of unnnecessary allocations, which dominate the benchmark. Instead of counting bytes when reading, we now count size when generating data for logging speed. Another test was added to show that these two approaches are comparable. This change makes BenchmarkRead to report real time needed to decode chunks: name old time/op new time/op delta Read/none-4 86.2ms ± 0% 33.2ms ± 0% ~ (p=1.000 n=1+1) Read/gzip-4 255ms ± 0% 194ms ± 0% ~ (p=1.000 n=1+1) Read/lz4-4 121ms ± 0% 64ms ± 0% ~ (p=1.000 n=1+1) Read/snappy-4 119ms ± 0% 67ms ± 0% ~ (p=1.000 n=1+1) name old alloc/op new alloc/op delta Read/none-4 134MB ± 0% 0MB ± 0% ~ (p=1.000 n=1+1) Read/gzip-4 135MB ± 0% 0MB ± 0% ~ (p=1.000 n=1+1) Read/lz4-4 136MB ± 0% 1MB ± 0% ~ (p=1.000 n=1+1) Read/snappy-4 135MB ± 0% 0MB ± 0% ~ (p=1.000 n=1+1) name old allocs/op new allocs/op delta Read/none-4 491k ± 0% 3k ± 0% ~ (p=1.000 n=1+1) Read/gzip-4 491k ± 0% 3k ± 0% ~ (p=1.000 n=1+1) Read/lz4-4 491k ± 0% 3k ± 0% ~ (p=1.000 n=1+1) Read/snappy-4 491k ± 0% 3k ± 0% ~ (p=1.000 n=1+1) Decompression speed is now also more correct (esp. None is much higher than LZ4/Snappy) Before (/s) Now (/s) None 1.1 GB (n=1) 4.0 GB (n=1) None 1.5 GB (n=9) 3.8 GB (n=38) None 1.6 GB (n=13) Gzip 516 MB (n=1) 640 MB (n=1) Gzip 509 MB (n=3) 664 MB (n=4) Gzip 514 MB (n=4) 649 MB (n=6) LZ4 1.1 GB (n=1) 1.7 GB (n=1) LZ4 1.1 GB (n=9) 1.9 GB (n=15) Snappy 1.1 GB (n=1) 2.0 GB (n=1) Snappy 1.1 GB (n=9) 1.8 GB (n=16) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
6 years ago
bytesRead := uint64(0)
for _, c := range chunks {
// use forward iterator for benchmark -- backward iterator does extra allocations by keeping entries in memory
iterator, err := c.Iterator(context.TODO(), time.Unix(0, 0), time.Now(), logproto.FORWARD, noopStreamPipeline)
pkg/chunkenc: BenchmarkRead focuses on reading chunks (#1423) Previously BenchmarkRead asked for every single line from chunks to be returned. This causes a lot of unnnecessary allocations, which dominate the benchmark. Instead of counting bytes when reading, we now count size when generating data for logging speed. Another test was added to show that these two approaches are comparable. This change makes BenchmarkRead to report real time needed to decode chunks: name old time/op new time/op delta Read/none-4 86.2ms ± 0% 33.2ms ± 0% ~ (p=1.000 n=1+1) Read/gzip-4 255ms ± 0% 194ms ± 0% ~ (p=1.000 n=1+1) Read/lz4-4 121ms ± 0% 64ms ± 0% ~ (p=1.000 n=1+1) Read/snappy-4 119ms ± 0% 67ms ± 0% ~ (p=1.000 n=1+1) name old alloc/op new alloc/op delta Read/none-4 134MB ± 0% 0MB ± 0% ~ (p=1.000 n=1+1) Read/gzip-4 135MB ± 0% 0MB ± 0% ~ (p=1.000 n=1+1) Read/lz4-4 136MB ± 0% 1MB ± 0% ~ (p=1.000 n=1+1) Read/snappy-4 135MB ± 0% 0MB ± 0% ~ (p=1.000 n=1+1) name old allocs/op new allocs/op delta Read/none-4 491k ± 0% 3k ± 0% ~ (p=1.000 n=1+1) Read/gzip-4 491k ± 0% 3k ± 0% ~ (p=1.000 n=1+1) Read/lz4-4 491k ± 0% 3k ± 0% ~ (p=1.000 n=1+1) Read/snappy-4 491k ± 0% 3k ± 0% ~ (p=1.000 n=1+1) Decompression speed is now also more correct (esp. None is much higher than LZ4/Snappy) Before (/s) Now (/s) None 1.1 GB (n=1) 4.0 GB (n=1) None 1.5 GB (n=9) 3.8 GB (n=38) None 1.6 GB (n=13) Gzip 516 MB (n=1) 640 MB (n=1) Gzip 509 MB (n=3) 664 MB (n=4) Gzip 514 MB (n=4) 649 MB (n=6) LZ4 1.1 GB (n=1) 1.7 GB (n=1) LZ4 1.1 GB (n=9) 1.9 GB (n=15) Snappy 1.1 GB (n=1) 2.0 GB (n=1) Snappy 1.1 GB (n=9) 1.8 GB (n=16) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
6 years ago
if err != nil {
panic(err)
}
for iterator.Next() {
e := iterator.Entry()
bytesRead += uint64(len(e.Line))
}
if err := iterator.Close(); err != nil {
t.Fatal(err)
}
}
require.Equal(t, size, bytesRead)
})
}
}
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
func BenchmarkHeadBlockIterator(b *testing.B) {
for _, j := range []int{100000, 50000, 15000, 10000} {
for _, withNonIndexedLabels := range []bool{false, true} {
b.Run(fmt.Sprintf("size=%d nonIndexedLabels=%v", j, withNonIndexedLabels), func(b *testing.B) {
h := headBlock{}
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
var nonIndexedLabels labels.Labels
if withNonIndexedLabels {
nonIndexedLabels = labels.Labels{{Name: "foo", Value: "foo"}}
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
}
for i := 0; i < j; i++ {
if err := h.Append(int64(i), "this is the append string", nonIndexedLabels); err != nil {
b.Fatal(err)
}
}
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
b.ResetTimer()
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
for n := 0; n < b.N; n++ {
iter := h.Iterator(context.Background(), logproto.BACKWARD, 0, math.MaxInt64, noopStreamPipeline)
for iter.Next() {
_ = iter.Entry()
}
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
}
})
}
Adds configurable compression algorithms for chunks (#1411) * Adds L4Z encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds encoding benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy encoding. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds chunk size test Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds snappy v2 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Improve benchmarks Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove chunkenc Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update lz4 to latest master version. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use temporary buffer in serialise method to avoid allocations when doing string -> byte conversion. It also makes code little more readable. We pool those buffers for reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added gzip -1 for comparison. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Initialize reader and buffered reader lazily. This helps with reader/buffered reader reuse. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't keep entries, extracted generateData function (mostly to get more understandable profile) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve test and benchmark to cover all encodings. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Adds support for a new chunk format with encoding info. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Ingesters now support encoding config. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add support for no compression. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Add docs Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Remove default Gzip for ByteChunk. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Removes none, snappyv2 and gzip-1 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Move log test lines to testdata and add supported encoding stringer Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * got linted Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
6 years ago
}
}
Add ProcessString to Pipeline. (#2972) * Add ProcessString to Pipeline. Most of the time, you have a buffer that you want to test again a log pipeline, when it passed you usually copy via string(buff). In some cases like headchunk and tailer you already have an allocated immutable string, since pipeline never mutate the line passed as parameter, we can create ProcessString pipeline method that will avoid re-allocating the line. benchmp: ``` ❯ benchcmp before.txt after.txt benchmark old ns/op new ns/op delta BenchmarkHeadBlockIterator/Size_100000-16 20264242 16112591 -20.49% BenchmarkHeadBlockIterator/Size_50000-16 10186969 7905259 -22.40% BenchmarkHeadBlockIterator/Size_15000-16 3229052 2202770 -31.78% BenchmarkHeadBlockIterator/Size_10000-16 1916537 1392355 -27.35% BenchmarkHeadBlockSampleIterator/Size_100000-16 18364773 16106425 -12.30% BenchmarkHeadBlockSampleIterator/Size_50000-16 8988422 7730226 -14.00% BenchmarkHeadBlockSampleIterator/Size_15000-16 2788746 2306161 -17.30% BenchmarkHeadBlockSampleIterator/Size_10000-16 1773766 1488861 -16.06% benchmark old allocs new allocs delta BenchmarkHeadBlockIterator/Size_100000-16 200039 39 -99.98% BenchmarkHeadBlockIterator/Size_50000-16 100036 36 -99.96% BenchmarkHeadBlockIterator/Size_15000-16 30031 31 -99.90% BenchmarkHeadBlockIterator/Size_10000-16 20029 29 -99.86% BenchmarkHeadBlockSampleIterator/Size_100000-16 100040 40 -99.96% BenchmarkHeadBlockSampleIterator/Size_50000-16 50036 36 -99.93% BenchmarkHeadBlockSampleIterator/Size_15000-16 15031 31 -99.79% BenchmarkHeadBlockSampleIterator/Size_10000-16 10029 29 -99.71% benchmark old bytes new bytes delta BenchmarkHeadBlockIterator/Size_100000-16 27604042 21203941 -23.19% BenchmarkHeadBlockIterator/Size_50000-16 13860654 10660652 -23.09% BenchmarkHeadBlockIterator/Size_15000-16 4231360 3271363 -22.69% BenchmarkHeadBlockIterator/Size_10000-16 2633436 1993420 -24.30% BenchmarkHeadBlockSampleIterator/Size_100000-16 17799137 14598973 -17.98% BenchmarkHeadBlockSampleIterator/Size_50000-16 7433260 5833137 -21.53% BenchmarkHeadBlockSampleIterator/Size_15000-16 2258099 1778096 -21.26% BenchmarkHeadBlockSampleIterator/Size_10000-16 1393600 1073605 -22.96% ``` Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * add some precision about why Sum64 is not a concern. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update pkg/chunkenc/memchunk.go Co-authored-by: Owen Diehl <ow.diehl@gmail.com> Co-authored-by: Owen Diehl <ow.diehl@gmail.com>
5 years ago
func BenchmarkHeadBlockSampleIterator(b *testing.B) {
for _, j := range []int{20000, 10000, 8000, 5000} {
for _, withNonIndexedLabels := range []bool{false, true} {
b.Run(fmt.Sprintf("size=%d nonIndexedLabels=%v", j, withNonIndexedLabels), func(b *testing.B) {
h := headBlock{}
Add ProcessString to Pipeline. (#2972) * Add ProcessString to Pipeline. Most of the time, you have a buffer that you want to test again a log pipeline, when it passed you usually copy via string(buff). In some cases like headchunk and tailer you already have an allocated immutable string, since pipeline never mutate the line passed as parameter, we can create ProcessString pipeline method that will avoid re-allocating the line. benchmp: ``` ❯ benchcmp before.txt after.txt benchmark old ns/op new ns/op delta BenchmarkHeadBlockIterator/Size_100000-16 20264242 16112591 -20.49% BenchmarkHeadBlockIterator/Size_50000-16 10186969 7905259 -22.40% BenchmarkHeadBlockIterator/Size_15000-16 3229052 2202770 -31.78% BenchmarkHeadBlockIterator/Size_10000-16 1916537 1392355 -27.35% BenchmarkHeadBlockSampleIterator/Size_100000-16 18364773 16106425 -12.30% BenchmarkHeadBlockSampleIterator/Size_50000-16 8988422 7730226 -14.00% BenchmarkHeadBlockSampleIterator/Size_15000-16 2788746 2306161 -17.30% BenchmarkHeadBlockSampleIterator/Size_10000-16 1773766 1488861 -16.06% benchmark old allocs new allocs delta BenchmarkHeadBlockIterator/Size_100000-16 200039 39 -99.98% BenchmarkHeadBlockIterator/Size_50000-16 100036 36 -99.96% BenchmarkHeadBlockIterator/Size_15000-16 30031 31 -99.90% BenchmarkHeadBlockIterator/Size_10000-16 20029 29 -99.86% BenchmarkHeadBlockSampleIterator/Size_100000-16 100040 40 -99.96% BenchmarkHeadBlockSampleIterator/Size_50000-16 50036 36 -99.93% BenchmarkHeadBlockSampleIterator/Size_15000-16 15031 31 -99.79% BenchmarkHeadBlockSampleIterator/Size_10000-16 10029 29 -99.71% benchmark old bytes new bytes delta BenchmarkHeadBlockIterator/Size_100000-16 27604042 21203941 -23.19% BenchmarkHeadBlockIterator/Size_50000-16 13860654 10660652 -23.09% BenchmarkHeadBlockIterator/Size_15000-16 4231360 3271363 -22.69% BenchmarkHeadBlockIterator/Size_10000-16 2633436 1993420 -24.30% BenchmarkHeadBlockSampleIterator/Size_100000-16 17799137 14598973 -17.98% BenchmarkHeadBlockSampleIterator/Size_50000-16 7433260 5833137 -21.53% BenchmarkHeadBlockSampleIterator/Size_15000-16 2258099 1778096 -21.26% BenchmarkHeadBlockSampleIterator/Size_10000-16 1393600 1073605 -22.96% ``` Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * add some precision about why Sum64 is not a concern. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update pkg/chunkenc/memchunk.go Co-authored-by: Owen Diehl <ow.diehl@gmail.com> Co-authored-by: Owen Diehl <ow.diehl@gmail.com>
5 years ago
var nonIndexedLabels labels.Labels
if withNonIndexedLabels {
nonIndexedLabels = labels.Labels{{Name: "foo", Value: "foo"}}
Add ProcessString to Pipeline. (#2972) * Add ProcessString to Pipeline. Most of the time, you have a buffer that you want to test again a log pipeline, when it passed you usually copy via string(buff). In some cases like headchunk and tailer you already have an allocated immutable string, since pipeline never mutate the line passed as parameter, we can create ProcessString pipeline method that will avoid re-allocating the line. benchmp: ``` ❯ benchcmp before.txt after.txt benchmark old ns/op new ns/op delta BenchmarkHeadBlockIterator/Size_100000-16 20264242 16112591 -20.49% BenchmarkHeadBlockIterator/Size_50000-16 10186969 7905259 -22.40% BenchmarkHeadBlockIterator/Size_15000-16 3229052 2202770 -31.78% BenchmarkHeadBlockIterator/Size_10000-16 1916537 1392355 -27.35% BenchmarkHeadBlockSampleIterator/Size_100000-16 18364773 16106425 -12.30% BenchmarkHeadBlockSampleIterator/Size_50000-16 8988422 7730226 -14.00% BenchmarkHeadBlockSampleIterator/Size_15000-16 2788746 2306161 -17.30% BenchmarkHeadBlockSampleIterator/Size_10000-16 1773766 1488861 -16.06% benchmark old allocs new allocs delta BenchmarkHeadBlockIterator/Size_100000-16 200039 39 -99.98% BenchmarkHeadBlockIterator/Size_50000-16 100036 36 -99.96% BenchmarkHeadBlockIterator/Size_15000-16 30031 31 -99.90% BenchmarkHeadBlockIterator/Size_10000-16 20029 29 -99.86% BenchmarkHeadBlockSampleIterator/Size_100000-16 100040 40 -99.96% BenchmarkHeadBlockSampleIterator/Size_50000-16 50036 36 -99.93% BenchmarkHeadBlockSampleIterator/Size_15000-16 15031 31 -99.79% BenchmarkHeadBlockSampleIterator/Size_10000-16 10029 29 -99.71% benchmark old bytes new bytes delta BenchmarkHeadBlockIterator/Size_100000-16 27604042 21203941 -23.19% BenchmarkHeadBlockIterator/Size_50000-16 13860654 10660652 -23.09% BenchmarkHeadBlockIterator/Size_15000-16 4231360 3271363 -22.69% BenchmarkHeadBlockIterator/Size_10000-16 2633436 1993420 -24.30% BenchmarkHeadBlockSampleIterator/Size_100000-16 17799137 14598973 -17.98% BenchmarkHeadBlockSampleIterator/Size_50000-16 7433260 5833137 -21.53% BenchmarkHeadBlockSampleIterator/Size_15000-16 2258099 1778096 -21.26% BenchmarkHeadBlockSampleIterator/Size_10000-16 1393600 1073605 -22.96% ``` Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * add some precision about why Sum64 is not a concern. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update pkg/chunkenc/memchunk.go Co-authored-by: Owen Diehl <ow.diehl@gmail.com> Co-authored-by: Owen Diehl <ow.diehl@gmail.com>
5 years ago
}
for i := 0; i < j; i++ {
if err := h.Append(int64(i), "this is the append string", nonIndexedLabels); err != nil {
b.Fatal(err)
}
}
Add ProcessString to Pipeline. (#2972) * Add ProcessString to Pipeline. Most of the time, you have a buffer that you want to test again a log pipeline, when it passed you usually copy via string(buff). In some cases like headchunk and tailer you already have an allocated immutable string, since pipeline never mutate the line passed as parameter, we can create ProcessString pipeline method that will avoid re-allocating the line. benchmp: ``` ❯ benchcmp before.txt after.txt benchmark old ns/op new ns/op delta BenchmarkHeadBlockIterator/Size_100000-16 20264242 16112591 -20.49% BenchmarkHeadBlockIterator/Size_50000-16 10186969 7905259 -22.40% BenchmarkHeadBlockIterator/Size_15000-16 3229052 2202770 -31.78% BenchmarkHeadBlockIterator/Size_10000-16 1916537 1392355 -27.35% BenchmarkHeadBlockSampleIterator/Size_100000-16 18364773 16106425 -12.30% BenchmarkHeadBlockSampleIterator/Size_50000-16 8988422 7730226 -14.00% BenchmarkHeadBlockSampleIterator/Size_15000-16 2788746 2306161 -17.30% BenchmarkHeadBlockSampleIterator/Size_10000-16 1773766 1488861 -16.06% benchmark old allocs new allocs delta BenchmarkHeadBlockIterator/Size_100000-16 200039 39 -99.98% BenchmarkHeadBlockIterator/Size_50000-16 100036 36 -99.96% BenchmarkHeadBlockIterator/Size_15000-16 30031 31 -99.90% BenchmarkHeadBlockIterator/Size_10000-16 20029 29 -99.86% BenchmarkHeadBlockSampleIterator/Size_100000-16 100040 40 -99.96% BenchmarkHeadBlockSampleIterator/Size_50000-16 50036 36 -99.93% BenchmarkHeadBlockSampleIterator/Size_15000-16 15031 31 -99.79% BenchmarkHeadBlockSampleIterator/Size_10000-16 10029 29 -99.71% benchmark old bytes new bytes delta BenchmarkHeadBlockIterator/Size_100000-16 27604042 21203941 -23.19% BenchmarkHeadBlockIterator/Size_50000-16 13860654 10660652 -23.09% BenchmarkHeadBlockIterator/Size_15000-16 4231360 3271363 -22.69% BenchmarkHeadBlockIterator/Size_10000-16 2633436 1993420 -24.30% BenchmarkHeadBlockSampleIterator/Size_100000-16 17799137 14598973 -17.98% BenchmarkHeadBlockSampleIterator/Size_50000-16 7433260 5833137 -21.53% BenchmarkHeadBlockSampleIterator/Size_15000-16 2258099 1778096 -21.26% BenchmarkHeadBlockSampleIterator/Size_10000-16 1393600 1073605 -22.96% ``` Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * add some precision about why Sum64 is not a concern. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update pkg/chunkenc/memchunk.go Co-authored-by: Owen Diehl <ow.diehl@gmail.com> Co-authored-by: Owen Diehl <ow.diehl@gmail.com>
5 years ago
b.ResetTimer()
Add ProcessString to Pipeline. (#2972) * Add ProcessString to Pipeline. Most of the time, you have a buffer that you want to test again a log pipeline, when it passed you usually copy via string(buff). In some cases like headchunk and tailer you already have an allocated immutable string, since pipeline never mutate the line passed as parameter, we can create ProcessString pipeline method that will avoid re-allocating the line. benchmp: ``` ❯ benchcmp before.txt after.txt benchmark old ns/op new ns/op delta BenchmarkHeadBlockIterator/Size_100000-16 20264242 16112591 -20.49% BenchmarkHeadBlockIterator/Size_50000-16 10186969 7905259 -22.40% BenchmarkHeadBlockIterator/Size_15000-16 3229052 2202770 -31.78% BenchmarkHeadBlockIterator/Size_10000-16 1916537 1392355 -27.35% BenchmarkHeadBlockSampleIterator/Size_100000-16 18364773 16106425 -12.30% BenchmarkHeadBlockSampleIterator/Size_50000-16 8988422 7730226 -14.00% BenchmarkHeadBlockSampleIterator/Size_15000-16 2788746 2306161 -17.30% BenchmarkHeadBlockSampleIterator/Size_10000-16 1773766 1488861 -16.06% benchmark old allocs new allocs delta BenchmarkHeadBlockIterator/Size_100000-16 200039 39 -99.98% BenchmarkHeadBlockIterator/Size_50000-16 100036 36 -99.96% BenchmarkHeadBlockIterator/Size_15000-16 30031 31 -99.90% BenchmarkHeadBlockIterator/Size_10000-16 20029 29 -99.86% BenchmarkHeadBlockSampleIterator/Size_100000-16 100040 40 -99.96% BenchmarkHeadBlockSampleIterator/Size_50000-16 50036 36 -99.93% BenchmarkHeadBlockSampleIterator/Size_15000-16 15031 31 -99.79% BenchmarkHeadBlockSampleIterator/Size_10000-16 10029 29 -99.71% benchmark old bytes new bytes delta BenchmarkHeadBlockIterator/Size_100000-16 27604042 21203941 -23.19% BenchmarkHeadBlockIterator/Size_50000-16 13860654 10660652 -23.09% BenchmarkHeadBlockIterator/Size_15000-16 4231360 3271363 -22.69% BenchmarkHeadBlockIterator/Size_10000-16 2633436 1993420 -24.30% BenchmarkHeadBlockSampleIterator/Size_100000-16 17799137 14598973 -17.98% BenchmarkHeadBlockSampleIterator/Size_50000-16 7433260 5833137 -21.53% BenchmarkHeadBlockSampleIterator/Size_15000-16 2258099 1778096 -21.26% BenchmarkHeadBlockSampleIterator/Size_10000-16 1393600 1073605 -22.96% ``` Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * add some precision about why Sum64 is not a concern. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update pkg/chunkenc/memchunk.go Co-authored-by: Owen Diehl <ow.diehl@gmail.com> Co-authored-by: Owen Diehl <ow.diehl@gmail.com>
5 years ago
for n := 0; n < b.N; n++ {
iter := h.SampleIterator(context.Background(), 0, math.MaxInt64, countExtractor)
for iter.Next() {
_ = iter.Sample()
}
iter.Close()
Add ProcessString to Pipeline. (#2972) * Add ProcessString to Pipeline. Most of the time, you have a buffer that you want to test again a log pipeline, when it passed you usually copy via string(buff). In some cases like headchunk and tailer you already have an allocated immutable string, since pipeline never mutate the line passed as parameter, we can create ProcessString pipeline method that will avoid re-allocating the line. benchmp: ``` ❯ benchcmp before.txt after.txt benchmark old ns/op new ns/op delta BenchmarkHeadBlockIterator/Size_100000-16 20264242 16112591 -20.49% BenchmarkHeadBlockIterator/Size_50000-16 10186969 7905259 -22.40% BenchmarkHeadBlockIterator/Size_15000-16 3229052 2202770 -31.78% BenchmarkHeadBlockIterator/Size_10000-16 1916537 1392355 -27.35% BenchmarkHeadBlockSampleIterator/Size_100000-16 18364773 16106425 -12.30% BenchmarkHeadBlockSampleIterator/Size_50000-16 8988422 7730226 -14.00% BenchmarkHeadBlockSampleIterator/Size_15000-16 2788746 2306161 -17.30% BenchmarkHeadBlockSampleIterator/Size_10000-16 1773766 1488861 -16.06% benchmark old allocs new allocs delta BenchmarkHeadBlockIterator/Size_100000-16 200039 39 -99.98% BenchmarkHeadBlockIterator/Size_50000-16 100036 36 -99.96% BenchmarkHeadBlockIterator/Size_15000-16 30031 31 -99.90% BenchmarkHeadBlockIterator/Size_10000-16 20029 29 -99.86% BenchmarkHeadBlockSampleIterator/Size_100000-16 100040 40 -99.96% BenchmarkHeadBlockSampleIterator/Size_50000-16 50036 36 -99.93% BenchmarkHeadBlockSampleIterator/Size_15000-16 15031 31 -99.79% BenchmarkHeadBlockSampleIterator/Size_10000-16 10029 29 -99.71% benchmark old bytes new bytes delta BenchmarkHeadBlockIterator/Size_100000-16 27604042 21203941 -23.19% BenchmarkHeadBlockIterator/Size_50000-16 13860654 10660652 -23.09% BenchmarkHeadBlockIterator/Size_15000-16 4231360 3271363 -22.69% BenchmarkHeadBlockIterator/Size_10000-16 2633436 1993420 -24.30% BenchmarkHeadBlockSampleIterator/Size_100000-16 17799137 14598973 -17.98% BenchmarkHeadBlockSampleIterator/Size_50000-16 7433260 5833137 -21.53% BenchmarkHeadBlockSampleIterator/Size_15000-16 2258099 1778096 -21.26% BenchmarkHeadBlockSampleIterator/Size_10000-16 1393600 1073605 -22.96% ``` Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * add some precision about why Sum64 is not a concern. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update pkg/chunkenc/memchunk.go Co-authored-by: Owen Diehl <ow.diehl@gmail.com> Co-authored-by: Owen Diehl <ow.diehl@gmail.com>
5 years ago
}
})
}
Add ProcessString to Pipeline. (#2972) * Add ProcessString to Pipeline. Most of the time, you have a buffer that you want to test again a log pipeline, when it passed you usually copy via string(buff). In some cases like headchunk and tailer you already have an allocated immutable string, since pipeline never mutate the line passed as parameter, we can create ProcessString pipeline method that will avoid re-allocating the line. benchmp: ``` ❯ benchcmp before.txt after.txt benchmark old ns/op new ns/op delta BenchmarkHeadBlockIterator/Size_100000-16 20264242 16112591 -20.49% BenchmarkHeadBlockIterator/Size_50000-16 10186969 7905259 -22.40% BenchmarkHeadBlockIterator/Size_15000-16 3229052 2202770 -31.78% BenchmarkHeadBlockIterator/Size_10000-16 1916537 1392355 -27.35% BenchmarkHeadBlockSampleIterator/Size_100000-16 18364773 16106425 -12.30% BenchmarkHeadBlockSampleIterator/Size_50000-16 8988422 7730226 -14.00% BenchmarkHeadBlockSampleIterator/Size_15000-16 2788746 2306161 -17.30% BenchmarkHeadBlockSampleIterator/Size_10000-16 1773766 1488861 -16.06% benchmark old allocs new allocs delta BenchmarkHeadBlockIterator/Size_100000-16 200039 39 -99.98% BenchmarkHeadBlockIterator/Size_50000-16 100036 36 -99.96% BenchmarkHeadBlockIterator/Size_15000-16 30031 31 -99.90% BenchmarkHeadBlockIterator/Size_10000-16 20029 29 -99.86% BenchmarkHeadBlockSampleIterator/Size_100000-16 100040 40 -99.96% BenchmarkHeadBlockSampleIterator/Size_50000-16 50036 36 -99.93% BenchmarkHeadBlockSampleIterator/Size_15000-16 15031 31 -99.79% BenchmarkHeadBlockSampleIterator/Size_10000-16 10029 29 -99.71% benchmark old bytes new bytes delta BenchmarkHeadBlockIterator/Size_100000-16 27604042 21203941 -23.19% BenchmarkHeadBlockIterator/Size_50000-16 13860654 10660652 -23.09% BenchmarkHeadBlockIterator/Size_15000-16 4231360 3271363 -22.69% BenchmarkHeadBlockIterator/Size_10000-16 2633436 1993420 -24.30% BenchmarkHeadBlockSampleIterator/Size_100000-16 17799137 14598973 -17.98% BenchmarkHeadBlockSampleIterator/Size_50000-16 7433260 5833137 -21.53% BenchmarkHeadBlockSampleIterator/Size_15000-16 2258099 1778096 -21.26% BenchmarkHeadBlockSampleIterator/Size_10000-16 1393600 1073605 -22.96% ``` Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * add some precision about why Sum64 is not a concern. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com> * Update pkg/chunkenc/memchunk.go Co-authored-by: Owen Diehl <ow.diehl@gmail.com> Co-authored-by: Owen Diehl <ow.diehl@gmail.com>
5 years ago
}
}
func TestMemChunk_IteratorBounds(t *testing.T) {
createChunk := func() *MemChunk {
t.Helper()
c := NewMemChunk(EncNone, DefaultTestHeadBlockFmt, 1e6, 1e6)
if err := c.Append(&logproto.Entry{
Timestamp: time.Unix(0, 1),
Line: "1",
}); err != nil {
t.Fatal(err)
}
if err := c.Append(&logproto.Entry{
Timestamp: time.Unix(0, 2),
Line: "2",
}); err != nil {
t.Fatal(err)
}
return c
}
for _, tt := range []struct {
mint, maxt time.Time
direction logproto.Direction
expect []bool // array of expected values for next call in sequence
}{
{time.Unix(0, 0), time.Unix(0, 1), logproto.FORWARD, []bool{false}},
{time.Unix(0, 1), time.Unix(0, 2), logproto.FORWARD, []bool{true, false}},
{time.Unix(0, 1), time.Unix(0, 3), logproto.FORWARD, []bool{true, true, false}},
{time.Unix(0, 2), time.Unix(0, 3), logproto.FORWARD, []bool{true, false}},
{time.Unix(0, 0), time.Unix(0, 1), logproto.BACKWARD, []bool{false}},
{time.Unix(0, 1), time.Unix(0, 2), logproto.BACKWARD, []bool{true, false}},
{time.Unix(0, 1), time.Unix(0, 3), logproto.BACKWARD, []bool{true, true, false}},
{time.Unix(0, 2), time.Unix(0, 3), logproto.BACKWARD, []bool{true, false}},
} {
t.Run(
fmt.Sprintf("mint:%d,maxt:%d,direction:%s", tt.mint.UnixNano(), tt.maxt.UnixNano(), tt.direction),
func(t *testing.T) {
tt := tt
c := createChunk()
// testing headchunk
it, err := c.Iterator(context.Background(), tt.mint, tt.maxt, tt.direction, noopStreamPipeline)
require.NoError(t, err)
for idx, expected := range tt.expect {
require.Equal(t, expected, it.Next(), "idx: %s", idx)
}
require.NoError(t, it.Close())
// testing chunk blocks
require.NoError(t, c.cut())
it, err = c.Iterator(context.Background(), tt.mint, tt.maxt, tt.direction, noopStreamPipeline)
require.NoError(t, err)
for i := range tt.expect {
require.Equal(t, tt.expect[i], it.Next())
}
require.NoError(t, it.Close())
})
}
}
func TestMemchunkLongLine(t *testing.T) {
for _, enc := range testEncoding {
enc := enc
t.Run(enc.String(), func(t *testing.T) {
t.Parallel()
c := NewMemChunk(enc, DefaultTestHeadBlockFmt, testBlockSize, testTargetSize)
for i := 1; i <= 10; i++ {
require.NoError(t, c.Append(&logproto.Entry{Timestamp: time.Unix(0, int64(i)), Line: strings.Repeat("e", 200000)}))
}
it, err := c.Iterator(context.Background(), time.Unix(0, 0), time.Unix(0, 100), logproto.FORWARD, noopStreamPipeline)
require.NoError(t, err)
for i := 1; i <= 10; i++ {
require.True(t, it.Next())
}
require.False(t, it.Next())
})
}
}
// Ensure passing a reusable []byte doesn't affect output
func TestBytesWith(t *testing.T) {
t.Parallel()
exp, err := NewMemChunk(EncNone, DefaultTestHeadBlockFmt, testBlockSize, testTargetSize).BytesWith(nil)
require.Nil(t, err)
out, err := NewMemChunk(EncNone, DefaultTestHeadBlockFmt, testBlockSize, testTargetSize).BytesWith([]byte{1, 2, 3})
require.Nil(t, err)
require.Equal(t, exp, out)
}
Adds WAL support (experimental) (#2981) * marshalable chunks * wal record types custom serialization * proto types for wal checkpoints * byteswith output unaffected by buffer * wal & record pool ifcs * wal record can hold entries from multiple series * entry pool * ingester uses noopWal * removes duplicate argument passing in ingester code. adds ingester config validation & derives chunk encoding. * segment writing * [WIP] wal recovery from segments * replay uses sync.Maps & preserves WAL fingerprints * in memory wal recovery * wal segment recovery * ingester metrics struct * wal replay locks streamsMtx in instances, adds checkpoint codec * ingester metrics * checkpointer * WAL checkpoint writer * checkpointwriter can write multiple checkpoints * reorgs checkpointing * wires up checkpointwriter to wal * ingester SeriesIter impl * wires up ingesterRecoverer to consume checkpoints * generic recovery fn * generic recovery fn * recover from both wal types * cleans up old tmp checkpoints & allows aborting in flight checkpoints * wires up wal checkpointing * more granular wal logging * fixes off by 1 wal truncation & removes double logging * adds userID to wal records correctly * wire chunk encoding tests * more granular wal metrics * checkpoint encoding test * ignores debug bins * segment replay ignores out of orders * fixes bug between WAL reading []byte validity and proto unmarshalling refs * conf validations, removes comments * flush on shutdown config * POST /ingester/shutdown * renames flush on shutdown * wal & checkpoint use same segment size * writes entries to wal regardless of tailers * makes wal checkpoing duration default to 5m * recovery metrics * encodes headchunks separately for wal purposes * merge upstream * linting * addresses pr feedback uses entry pool in stream push/tailer removes unnecessary pool interaction checkpointbytes comment fillchunk helper, record resetting in tests via pool redundant comment defers wg done in recovery s/num/count/ checkpoint wal uses a logger encodeWithTypeHeader now creates its own []byte removes pool from decodeEntries wal stop can error * prevent shared access bug with tailers and entry pool * removes stream push entry pool optimization
5 years ago
func TestCheckpointEncoding(t *testing.T) {
t.Parallel()
Adds WAL support (experimental) (#2981) * marshalable chunks * wal record types custom serialization * proto types for wal checkpoints * byteswith output unaffected by buffer * wal & record pool ifcs * wal record can hold entries from multiple series * entry pool * ingester uses noopWal * removes duplicate argument passing in ingester code. adds ingester config validation & derives chunk encoding. * segment writing * [WIP] wal recovery from segments * replay uses sync.Maps & preserves WAL fingerprints * in memory wal recovery * wal segment recovery * ingester metrics struct * wal replay locks streamsMtx in instances, adds checkpoint codec * ingester metrics * checkpointer * WAL checkpoint writer * checkpointwriter can write multiple checkpoints * reorgs checkpointing * wires up checkpointwriter to wal * ingester SeriesIter impl * wires up ingesterRecoverer to consume checkpoints * generic recovery fn * generic recovery fn * recover from both wal types * cleans up old tmp checkpoints & allows aborting in flight checkpoints * wires up wal checkpointing * more granular wal logging * fixes off by 1 wal truncation & removes double logging * adds userID to wal records correctly * wire chunk encoding tests * more granular wal metrics * checkpoint encoding test * ignores debug bins * segment replay ignores out of orders * fixes bug between WAL reading []byte validity and proto unmarshalling refs * conf validations, removes comments * flush on shutdown config * POST /ingester/shutdown * renames flush on shutdown * wal & checkpoint use same segment size * writes entries to wal regardless of tailers * makes wal checkpoing duration default to 5m * recovery metrics * encodes headchunks separately for wal purposes * merge upstream * linting * addresses pr feedback uses entry pool in stream push/tailer removes unnecessary pool interaction checkpointbytes comment fillchunk helper, record resetting in tests via pool redundant comment defers wg done in recovery s/num/count/ checkpoint wal uses a logger encodeWithTypeHeader now creates its own []byte removes pool from decodeEntries wal stop can error * prevent shared access bug with tailers and entry pool * removes stream push entry pool optimization
5 years ago
blockSize, targetSize := 256*1024, 1500*1024
for _, f := range allPossibleFormats {
t.Run(testNameWithFormats(EncSnappy, f.chunkFormat, f.headBlockFmt), func(t *testing.T) {
c := newMemChunkWithFormat(f.chunkFormat, EncSnappy, f.headBlockFmt, blockSize, targetSize)
// add a few entries
for i := 0; i < 5; i++ {
entry := &logproto.Entry{
Timestamp: time.Unix(int64(i), 0),
Line: fmt.Sprintf("hi there - %d", i),
NonIndexedLabels: push.LabelsAdapter{{
Name: fmt.Sprintf("name%d", i),
Value: fmt.Sprintf("val%d", i),
}},
}
require.Equal(t, true, c.SpaceFor(entry))
require.Nil(t, c.Append(entry))
}
Adds WAL support (experimental) (#2981) * marshalable chunks * wal record types custom serialization * proto types for wal checkpoints * byteswith output unaffected by buffer * wal & record pool ifcs * wal record can hold entries from multiple series * entry pool * ingester uses noopWal * removes duplicate argument passing in ingester code. adds ingester config validation & derives chunk encoding. * segment writing * [WIP] wal recovery from segments * replay uses sync.Maps & preserves WAL fingerprints * in memory wal recovery * wal segment recovery * ingester metrics struct * wal replay locks streamsMtx in instances, adds checkpoint codec * ingester metrics * checkpointer * WAL checkpoint writer * checkpointwriter can write multiple checkpoints * reorgs checkpointing * wires up checkpointwriter to wal * ingester SeriesIter impl * wires up ingesterRecoverer to consume checkpoints * generic recovery fn * generic recovery fn * recover from both wal types * cleans up old tmp checkpoints & allows aborting in flight checkpoints * wires up wal checkpointing * more granular wal logging * fixes off by 1 wal truncation & removes double logging * adds userID to wal records correctly * wire chunk encoding tests * more granular wal metrics * checkpoint encoding test * ignores debug bins * segment replay ignores out of orders * fixes bug between WAL reading []byte validity and proto unmarshalling refs * conf validations, removes comments * flush on shutdown config * POST /ingester/shutdown * renames flush on shutdown * wal & checkpoint use same segment size * writes entries to wal regardless of tailers * makes wal checkpoing duration default to 5m * recovery metrics * encodes headchunks separately for wal purposes * merge upstream * linting * addresses pr feedback uses entry pool in stream push/tailer removes unnecessary pool interaction checkpointbytes comment fillchunk helper, record resetting in tests via pool redundant comment defers wg done in recovery s/num/count/ checkpoint wal uses a logger encodeWithTypeHeader now creates its own []byte removes pool from decodeEntries wal stop can error * prevent shared access bug with tailers and entry pool * removes stream push entry pool optimization
5 years ago
// cut it
require.Nil(t, c.cut())
Adds WAL support (experimental) (#2981) * marshalable chunks * wal record types custom serialization * proto types for wal checkpoints * byteswith output unaffected by buffer * wal & record pool ifcs * wal record can hold entries from multiple series * entry pool * ingester uses noopWal * removes duplicate argument passing in ingester code. adds ingester config validation & derives chunk encoding. * segment writing * [WIP] wal recovery from segments * replay uses sync.Maps & preserves WAL fingerprints * in memory wal recovery * wal segment recovery * ingester metrics struct * wal replay locks streamsMtx in instances, adds checkpoint codec * ingester metrics * checkpointer * WAL checkpoint writer * checkpointwriter can write multiple checkpoints * reorgs checkpointing * wires up checkpointwriter to wal * ingester SeriesIter impl * wires up ingesterRecoverer to consume checkpoints * generic recovery fn * generic recovery fn * recover from both wal types * cleans up old tmp checkpoints & allows aborting in flight checkpoints * wires up wal checkpointing * more granular wal logging * fixes off by 1 wal truncation & removes double logging * adds userID to wal records correctly * wire chunk encoding tests * more granular wal metrics * checkpoint encoding test * ignores debug bins * segment replay ignores out of orders * fixes bug between WAL reading []byte validity and proto unmarshalling refs * conf validations, removes comments * flush on shutdown config * POST /ingester/shutdown * renames flush on shutdown * wal & checkpoint use same segment size * writes entries to wal regardless of tailers * makes wal checkpoing duration default to 5m * recovery metrics * encodes headchunks separately for wal purposes * merge upstream * linting * addresses pr feedback uses entry pool in stream push/tailer removes unnecessary pool interaction checkpointbytes comment fillchunk helper, record resetting in tests via pool redundant comment defers wg done in recovery s/num/count/ checkpoint wal uses a logger encodeWithTypeHeader now creates its own []byte removes pool from decodeEntries wal stop can error * prevent shared access bug with tailers and entry pool * removes stream push entry pool optimization
5 years ago
// add a few more to head
for i := 5; i < 10; i++ {
entry := &logproto.Entry{
Timestamp: time.Unix(int64(i), 0),
Line: fmt.Sprintf("hi there - %d", i),
}
require.Equal(t, true, c.SpaceFor(entry))
require.Nil(t, c.Append(entry))
}
Adds WAL support (experimental) (#2981) * marshalable chunks * wal record types custom serialization * proto types for wal checkpoints * byteswith output unaffected by buffer * wal & record pool ifcs * wal record can hold entries from multiple series * entry pool * ingester uses noopWal * removes duplicate argument passing in ingester code. adds ingester config validation & derives chunk encoding. * segment writing * [WIP] wal recovery from segments * replay uses sync.Maps & preserves WAL fingerprints * in memory wal recovery * wal segment recovery * ingester metrics struct * wal replay locks streamsMtx in instances, adds checkpoint codec * ingester metrics * checkpointer * WAL checkpoint writer * checkpointwriter can write multiple checkpoints * reorgs checkpointing * wires up checkpointwriter to wal * ingester SeriesIter impl * wires up ingesterRecoverer to consume checkpoints * generic recovery fn * generic recovery fn * recover from both wal types * cleans up old tmp checkpoints & allows aborting in flight checkpoints * wires up wal checkpointing * more granular wal logging * fixes off by 1 wal truncation & removes double logging * adds userID to wal records correctly * wire chunk encoding tests * more granular wal metrics * checkpoint encoding test * ignores debug bins * segment replay ignores out of orders * fixes bug between WAL reading []byte validity and proto unmarshalling refs * conf validations, removes comments * flush on shutdown config * POST /ingester/shutdown * renames flush on shutdown * wal & checkpoint use same segment size * writes entries to wal regardless of tailers * makes wal checkpoing duration default to 5m * recovery metrics * encodes headchunks separately for wal purposes * merge upstream * linting * addresses pr feedback uses entry pool in stream push/tailer removes unnecessary pool interaction checkpointbytes comment fillchunk helper, record resetting in tests via pool redundant comment defers wg done in recovery s/num/count/ checkpoint wal uses a logger encodeWithTypeHeader now creates its own []byte removes pool from decodeEntries wal stop can error * prevent shared access bug with tailers and entry pool * removes stream push entry pool optimization
5 years ago
// ensure new blocks are not cut
require.Equal(t, 1, len(c.blocks))
Adds WAL support (experimental) (#2981) * marshalable chunks * wal record types custom serialization * proto types for wal checkpoints * byteswith output unaffected by buffer * wal & record pool ifcs * wal record can hold entries from multiple series * entry pool * ingester uses noopWal * removes duplicate argument passing in ingester code. adds ingester config validation & derives chunk encoding. * segment writing * [WIP] wal recovery from segments * replay uses sync.Maps & preserves WAL fingerprints * in memory wal recovery * wal segment recovery * ingester metrics struct * wal replay locks streamsMtx in instances, adds checkpoint codec * ingester metrics * checkpointer * WAL checkpoint writer * checkpointwriter can write multiple checkpoints * reorgs checkpointing * wires up checkpointwriter to wal * ingester SeriesIter impl * wires up ingesterRecoverer to consume checkpoints * generic recovery fn * generic recovery fn * recover from both wal types * cleans up old tmp checkpoints & allows aborting in flight checkpoints * wires up wal checkpointing * more granular wal logging * fixes off by 1 wal truncation & removes double logging * adds userID to wal records correctly * wire chunk encoding tests * more granular wal metrics * checkpoint encoding test * ignores debug bins * segment replay ignores out of orders * fixes bug between WAL reading []byte validity and proto unmarshalling refs * conf validations, removes comments * flush on shutdown config * POST /ingester/shutdown * renames flush on shutdown * wal & checkpoint use same segment size * writes entries to wal regardless of tailers * makes wal checkpoing duration default to 5m * recovery metrics * encodes headchunks separately for wal purposes * merge upstream * linting * addresses pr feedback uses entry pool in stream push/tailer removes unnecessary pool interaction checkpointbytes comment fillchunk helper, record resetting in tests via pool redundant comment defers wg done in recovery s/num/count/ checkpoint wal uses a logger encodeWithTypeHeader now creates its own []byte removes pool from decodeEntries wal stop can error * prevent shared access bug with tailers and entry pool * removes stream push entry pool optimization
5 years ago
var chk, head bytes.Buffer
err := c.SerializeForCheckpointTo(&chk, &head)
require.Nil(t, err)
Adds WAL support (experimental) (#2981) * marshalable chunks * wal record types custom serialization * proto types for wal checkpoints * byteswith output unaffected by buffer * wal & record pool ifcs * wal record can hold entries from multiple series * entry pool * ingester uses noopWal * removes duplicate argument passing in ingester code. adds ingester config validation & derives chunk encoding. * segment writing * [WIP] wal recovery from segments * replay uses sync.Maps & preserves WAL fingerprints * in memory wal recovery * wal segment recovery * ingester metrics struct * wal replay locks streamsMtx in instances, adds checkpoint codec * ingester metrics * checkpointer * WAL checkpoint writer * checkpointwriter can write multiple checkpoints * reorgs checkpointing * wires up checkpointwriter to wal * ingester SeriesIter impl * wires up ingesterRecoverer to consume checkpoints * generic recovery fn * generic recovery fn * recover from both wal types * cleans up old tmp checkpoints & allows aborting in flight checkpoints * wires up wal checkpointing * more granular wal logging * fixes off by 1 wal truncation & removes double logging * adds userID to wal records correctly * wire chunk encoding tests * more granular wal metrics * checkpoint encoding test * ignores debug bins * segment replay ignores out of orders * fixes bug between WAL reading []byte validity and proto unmarshalling refs * conf validations, removes comments * flush on shutdown config * POST /ingester/shutdown * renames flush on shutdown * wal & checkpoint use same segment size * writes entries to wal regardless of tailers * makes wal checkpoing duration default to 5m * recovery metrics * encodes headchunks separately for wal purposes * merge upstream * linting * addresses pr feedback uses entry pool in stream push/tailer removes unnecessary pool interaction checkpointbytes comment fillchunk helper, record resetting in tests via pool redundant comment defers wg done in recovery s/num/count/ checkpoint wal uses a logger encodeWithTypeHeader now creates its own []byte removes pool from decodeEntries wal stop can error * prevent shared access bug with tailers and entry pool * removes stream push entry pool optimization
5 years ago
cpy, err := MemchunkFromCheckpoint(chk.Bytes(), head.Bytes(), f.headBlockFmt, blockSize, targetSize)
require.Nil(t, err)
if f.chunkFormat <= chunkFormatV2 {
for i := range c.blocks {
c.blocks[i].uncompressedSize = 0
}
}
require.Equal(t, c, cpy)
})
}
Adds WAL support (experimental) (#2981) * marshalable chunks * wal record types custom serialization * proto types for wal checkpoints * byteswith output unaffected by buffer * wal & record pool ifcs * wal record can hold entries from multiple series * entry pool * ingester uses noopWal * removes duplicate argument passing in ingester code. adds ingester config validation & derives chunk encoding. * segment writing * [WIP] wal recovery from segments * replay uses sync.Maps & preserves WAL fingerprints * in memory wal recovery * wal segment recovery * ingester metrics struct * wal replay locks streamsMtx in instances, adds checkpoint codec * ingester metrics * checkpointer * WAL checkpoint writer * checkpointwriter can write multiple checkpoints * reorgs checkpointing * wires up checkpointwriter to wal * ingester SeriesIter impl * wires up ingesterRecoverer to consume checkpoints * generic recovery fn * generic recovery fn * recover from both wal types * cleans up old tmp checkpoints & allows aborting in flight checkpoints * wires up wal checkpointing * more granular wal logging * fixes off by 1 wal truncation & removes double logging * adds userID to wal records correctly * wire chunk encoding tests * more granular wal metrics * checkpoint encoding test * ignores debug bins * segment replay ignores out of orders * fixes bug between WAL reading []byte validity and proto unmarshalling refs * conf validations, removes comments * flush on shutdown config * POST /ingester/shutdown * renames flush on shutdown * wal & checkpoint use same segment size * writes entries to wal regardless of tailers * makes wal checkpoing duration default to 5m * recovery metrics * encodes headchunks separately for wal purposes * merge upstream * linting * addresses pr feedback uses entry pool in stream push/tailer removes unnecessary pool interaction checkpointbytes comment fillchunk helper, record resetting in tests via pool redundant comment defers wg done in recovery s/num/count/ checkpoint wal uses a logger encodeWithTypeHeader now creates its own []byte removes pool from decodeEntries wal stop can error * prevent shared access bug with tailers and entry pool * removes stream push entry pool optimization
5 years ago
}
var (
streams = []logproto.Stream{}
series = []logproto.Series{}
)
func BenchmarkBufferedIteratorLabels(b *testing.B) {
for _, f := range HeadBlockFmts {
b.Run(f.String(), func(b *testing.B) {
c := NewMemChunk(EncSnappy, f, testBlockSize, testTargetSize)
_ = fillChunk(c)
labelsSet := []labels.Labels{
{
{Name: "cluster", Value: "us-central1"},
{Name: "stream", Value: "stdout"},
{Name: "filename", Value: "/var/log/pods/loki-prod_query-frontend-6894f97b98-89q2n_eac98024-f60f-44af-a46f-d099bc99d1e7/query-frontend/0.log"},
{Name: "namespace", Value: "loki-dev"},
{Name: "job", Value: "loki-prod/query-frontend"},
{Name: "container", Value: "query-frontend"},
{Name: "pod", Value: "query-frontend-6894f97b98-89q2n"},
},
{
{Name: "cluster", Value: "us-central2"},
{Name: "stream", Value: "stderr"},
{Name: "filename", Value: "/var/log/pods/loki-prod_querier-6894f97b98-89q2n_eac98024-f60f-44af-a46f-d099bc99d1e7/query-frontend/0.log"},
{Name: "namespace", Value: "loki-dev"},
{Name: "job", Value: "loki-prod/querier"},
{Name: "container", Value: "querier"},
{Name: "pod", Value: "querier-6894f97b98-89q2n"},
},
}
for _, test := range []string{
`{app="foo"}`,
`{app="foo"} != "foo"`,
`{app="foo"} != "foo" | logfmt `,
`{app="foo"} != "foo" | logfmt | duration > 10ms`,
`{app="foo"} != "foo" | logfmt | duration > 10ms and component="tsdb"`,
} {
b.Run(test, func(b *testing.B) {
b.ReportAllocs()
expr, err := syntax.ParseLogSelector(test, true)
if err != nil {
b.Fatal(err)
}
p, err := expr.Pipeline()
if err != nil {
b.Fatal(err)
}
var iters []iter.EntryIterator
for _, lbs := range labelsSet {
it, err := c.Iterator(context.Background(), time.Unix(0, 0), time.Now(), logproto.FORWARD, p.ForStream(lbs))
if err != nil {
b.Fatal(err)
}
iters = append(iters, it)
}
b.ResetTimer()
for n := 0; n < b.N; n++ {
for _, it := range iters {
for it.Next() {
streams = append(streams, logproto.Stream{Labels: it.Labels(), Entries: []logproto.Entry{it.Entry()}})
}
}
}
streams = streams[:0]
})
}
for _, test := range []string{
`rate({app="foo"}[1m])`,
`sum by (cluster) (rate({app="foo"}[10s]))`,
`sum by (cluster) (rate({app="foo"} != "foo" [10s]))`,
`sum by (cluster) (rate({app="foo"} != "foo" | logfmt[10s]))`,
`sum by (caller) (rate({app="foo"} != "foo" | logfmt[10s]))`,
`sum by (cluster) (rate({app="foo"} != "foo" | logfmt | duration > 10ms[10s]))`,
`sum by (cluster) (rate({app="foo"} != "foo" | logfmt | duration > 10ms and component="tsdb"[1m]))`,
} {
b.Run(test, func(b *testing.B) {
b.ReportAllocs()
expr, err := syntax.ParseSampleExpr(test)
if err != nil {
b.Fatal(err)
}
ex, err := expr.Extractor()
if err != nil {
b.Fatal(err)
}
var iters []iter.SampleIterator
for _, lbs := range labelsSet {
iters = append(iters, c.SampleIterator(context.Background(), time.Unix(0, 0), time.Now(), ex.ForStream(lbs)))
}
b.ResetTimer()
for n := 0; n < b.N; n++ {
for _, it := range iters {
for it.Next() {
series = append(series, logproto.Series{Labels: it.Labels(), Samples: []logproto.Sample{it.Sample()}})
}
}
}
series = series[:0]
})
}
})
}
}
func Test_HeadIteratorReverse(t *testing.T) {
for _, testData := range allPossibleFormats {
t.Run(testNameWithFormats(EncSnappy, testData.chunkFormat, testData.headBlockFmt), func(t *testing.T) {
c := newMemChunkWithFormat(testData.chunkFormat, EncSnappy, testData.headBlockFmt, testBlockSize, testTargetSize)
genEntry := func(i int64) *logproto.Entry {
return &logproto.Entry{
Timestamp: time.Unix(0, i),
Line: fmt.Sprintf(`msg="%d"`, i),
}
}
var i int64
for e := genEntry(i); c.SpaceFor(e); e, i = genEntry(i+1), i+1 {
require.NoError(t, c.Append(e))
}
assertOrder := func(t *testing.T, total int64) {
expr, err := syntax.ParseLogSelector(`{app="foo"} | logfmt`, true)
require.NoError(t, err)
p, err := expr.Pipeline()
require.NoError(t, err)
it, err := c.Iterator(context.TODO(), time.Unix(0, 0), time.Unix(0, i), logproto.BACKWARD, p.ForStream(labels.Labels{{Name: "app", Value: "foo"}}))
require.NoError(t, err)
for it.Next() {
total--
require.Equal(t, total, it.Entry().Timestamp.UnixNano())
}
}
assertOrder(t, i)
// let's try again without the headblock.
require.NoError(t, c.cut())
assertOrder(t, i)
})
}
}
func TestMemChunk_Rebound(t *testing.T) {
chkFrom := time.Unix(0, 0)
chkThrough := chkFrom.Add(time.Hour)
originalChunk := buildTestMemChunk(t, chkFrom, chkThrough)
for _, tc := range []struct {
name string
sliceFrom, sliceTo time.Time
err error
}{
{
name: "slice whole chunk",
sliceFrom: chkFrom,
sliceTo: chkThrough,
},
{
name: "slice first half",
sliceFrom: chkFrom,
sliceTo: chkFrom.Add(30 * time.Minute),
},
{
name: "slice second half",
sliceFrom: chkFrom.Add(30 * time.Minute),
sliceTo: chkThrough,
},
{
name: "slice in the middle",
sliceFrom: chkFrom.Add(15 * time.Minute),
sliceTo: chkFrom.Add(45 * time.Minute),
},
{
name: "slice interval not aligned with sample intervals",
sliceFrom: chkFrom.Add(time.Second),
sliceTo: chkThrough.Add(-time.Second),
},
{
name: "slice out of bounds without overlap",
err: chunk.ErrSliceNoDataInRange,
sliceFrom: chkThrough.Add(time.Minute),
sliceTo: chkThrough.Add(time.Hour),
},
{
name: "slice out of bounds with overlap",
sliceFrom: chkFrom.Add(10 * time.Minute),
sliceTo: chkThrough.Add(10 * time.Minute),
},
} {
t.Run(tc.name, func(t *testing.T) {
Add filter parameter to rebound so lines can be deleted by the compactor (#5879) * Add filter parameter to rebound Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Fix linting issues Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Add filter function to delete request Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Fix linting issues Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Enable api for filter and delete mode Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Add settings for retention Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Use labels to check and add test Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Simplify filter function Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Also enable filter mode Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Remove test settings in config file for docker Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Add extra (unused) param for ProcessString in filter Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Empty commit to trigger CI again Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Update changelog Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Fix flapping test Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Remove commented out unit tests and add some more Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Add extra test case for delete request without line filter Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Use chunk bounds Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * check if the log selector has a filter if the whole chunk is selected Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * fix lint issue: use correct go-kit import Signed-off-by: Michel Hollands <michel.hollands@grafana.com>
4 years ago
newChunk, err := originalChunk.Rebound(tc.sliceFrom, tc.sliceTo, nil)
if tc.err != nil {
require.Equal(t, tc.err, err)
return
}
require.NoError(t, err)
// iterate originalChunk from slice start to slice end + nanosecond. Adding a nanosecond here to be inclusive of sample at end time.
originalChunkItr, err := originalChunk.Iterator(context.Background(), tc.sliceFrom, tc.sliceTo.Add(time.Nanosecond), logproto.FORWARD, log.NewNoopPipeline().ForStream(labels.Labels{}))
require.NoError(t, err)
// iterate newChunk for whole chunk interval which should include all the samples in the chunk and hence align it with expected values.
newChunkItr, err := newChunk.Iterator(context.Background(), chkFrom, chkThrough, logproto.FORWARD, log.NewNoopPipeline().ForStream(labels.Labels{}))
require.NoError(t, err)
for {
originalChunksHasMoreSamples := originalChunkItr.Next()
newChunkHasMoreSamples := newChunkItr.Next()
// either both should have samples or none of them
require.Equal(t, originalChunksHasMoreSamples, newChunkHasMoreSamples)
if !originalChunksHasMoreSamples {
break
}
require.Equal(t, originalChunkItr.Entry(), newChunkItr.Entry())
}
})
}
}
func buildTestMemChunk(t *testing.T, from, through time.Time) *MemChunk {
chk := NewMemChunk(EncGZIP, DefaultTestHeadBlockFmt, defaultBlockSize, 0)
for ; from.Before(through); from = from.Add(time.Second) {
err := chk.Append(&logproto.Entry{
Line: from.String(),
Timestamp: from,
})
require.NoError(t, err)
}
return chk
}
Add filter parameter to rebound so lines can be deleted by the compactor (#5879) * Add filter parameter to rebound Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Fix linting issues Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Add filter function to delete request Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Fix linting issues Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Enable api for filter and delete mode Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Add settings for retention Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Use labels to check and add test Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Simplify filter function Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Also enable filter mode Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Remove test settings in config file for docker Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Add extra (unused) param for ProcessString in filter Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Empty commit to trigger CI again Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Update changelog Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Fix flapping test Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Remove commented out unit tests and add some more Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Add extra test case for delete request without line filter Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Use chunk bounds Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * check if the log selector has a filter if the whole chunk is selected Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * fix lint issue: use correct go-kit import Signed-off-by: Michel Hollands <michel.hollands@grafana.com>
4 years ago
func TestMemChunk_ReboundAndFilter_with_filter(t *testing.T) {
chkFrom := time.Unix(1, 0) // headBlock.Append treats Unix time 0 as not set so we have to use a later time
chkFromPlus5 := chkFrom.Add(5 * time.Second)
chkThrough := chkFrom.Add(10 * time.Second)
chkThroughPlus1 := chkThrough.Add(1 * time.Second)
filterFunc := func(_ time.Time, in string) bool {
Add filter parameter to rebound so lines can be deleted by the compactor (#5879) * Add filter parameter to rebound Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Fix linting issues Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Add filter function to delete request Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Fix linting issues Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Enable api for filter and delete mode Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Add settings for retention Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Use labels to check and add test Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Simplify filter function Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Also enable filter mode Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Remove test settings in config file for docker Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Add extra (unused) param for ProcessString in filter Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Empty commit to trigger CI again Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Update changelog Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Fix flapping test Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Remove commented out unit tests and add some more Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Add extra test case for delete request without line filter Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Use chunk bounds Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * check if the log selector has a filter if the whole chunk is selected Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * fix lint issue: use correct go-kit import Signed-off-by: Michel Hollands <michel.hollands@grafana.com>
4 years ago
return strings.HasPrefix(in, "matching")
}
for _, tc := range []struct {
name string
matchingSliceFrom, matchingSliceTo *time.Time
err error
nrMatching int
nrNotMatching int
}{
{
name: "no matches",
nrMatching: 0,
nrNotMatching: 10,
},
{
name: "some lines removed",
matchingSliceFrom: &chkFrom,
matchingSliceTo: &chkFromPlus5,
nrMatching: 5,
nrNotMatching: 5,
},
{
name: "all lines match",
err: chunk.ErrSliceNoDataInRange,
matchingSliceFrom: &chkFrom,
matchingSliceTo: &chkThroughPlus1,
},
} {
t.Run(tc.name, func(t *testing.T) {
originalChunk := buildFilterableTestMemChunk(t, chkFrom, chkThrough, tc.matchingSliceFrom, tc.matchingSliceTo)
newChunk, err := originalChunk.Rebound(chkFrom, chkThrough, filterFunc)
if tc.err != nil {
require.Equal(t, tc.err, err)
return
}
require.NoError(t, err)
// iterate originalChunk from slice start to slice end + nanosecond. Adding a nanosecond here to be inclusive of sample at end time.
originalChunkItr, err := originalChunk.Iterator(context.Background(), chkFrom, chkThrough.Add(time.Nanosecond), logproto.FORWARD, log.NewNoopPipeline().ForStream(labels.Labels{}))
require.NoError(t, err)
originalChunkSamples := 0
for originalChunkItr.Next() {
originalChunkSamples++
}
require.Equal(t, tc.nrMatching+tc.nrNotMatching, originalChunkSamples)
// iterate newChunk for whole chunk interval which should include all the samples in the chunk and hence align it with expected values.
newChunkItr, err := newChunk.Iterator(context.Background(), chkFrom, chkThrough.Add(time.Nanosecond), logproto.FORWARD, log.NewNoopPipeline().ForStream(labels.Labels{}))
require.NoError(t, err)
newChunkSamples := 0
for newChunkItr.Next() {
newChunkSamples++
}
require.Equal(t, tc.nrNotMatching, newChunkSamples)
})
}
}
func buildFilterableTestMemChunk(t *testing.T, from, through time.Time, matchingFrom, matchingTo *time.Time) *MemChunk {
chk := NewMemChunk(EncGZIP, DefaultTestHeadBlockFmt, defaultBlockSize, 0)
Add filter parameter to rebound so lines can be deleted by the compactor (#5879) * Add filter parameter to rebound Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Fix linting issues Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Add filter function to delete request Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Fix linting issues Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Enable api for filter and delete mode Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Add settings for retention Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Use labels to check and add test Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Simplify filter function Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Also enable filter mode Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Remove test settings in config file for docker Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Add extra (unused) param for ProcessString in filter Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Empty commit to trigger CI again Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Update changelog Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Fix flapping test Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Remove commented out unit tests and add some more Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Add extra test case for delete request without line filter Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * Use chunk bounds Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * check if the log selector has a filter if the whole chunk is selected Signed-off-by: Michel Hollands <michel.hollands@grafana.com> * fix lint issue: use correct go-kit import Signed-off-by: Michel Hollands <michel.hollands@grafana.com>
4 years ago
t.Logf("from : %v", from.String())
t.Logf("through: %v", through.String())
for from.Before(through) {
// If a line is between matchingFrom and matchingTo add the prefix "matching"
if matchingFrom != nil && matchingTo != nil &&
(from.Equal(*matchingFrom) || (from.After(*matchingFrom) && (from.Before(*matchingTo)))) {
t.Logf("%v matching line", from.String())
err := chk.Append(&logproto.Entry{
Line: fmt.Sprintf("matching %v", from.String()),
Timestamp: from,
})
require.NoError(t, err)
} else {
t.Logf("%v non-match line", from.String())
err := chk.Append(&logproto.Entry{
Line: from.String(),
Timestamp: from,
})
require.NoError(t, err)
}
from = from.Add(time.Second)
}
return chk
}
func TestMemChunk_SpaceFor(t *testing.T) {
for _, tc := range []struct {
desc string
nBlocks int
targetSize int
headSize int
cutBlockSize int
entry logproto.Entry
expect bool
expectFunc func(chunkFormat byte, headFmt HeadBlockFmt) bool
}{
{
desc: "targetSize not defined",
nBlocks: blocksPerChunk - 1,
entry: logproto.Entry{
Timestamp: time.Unix(0, 0),
Line: "a",
},
expect: true,
},
{
desc: "targetSize not defined and too many blocks",
nBlocks: blocksPerChunk + 1,
entry: logproto.Entry{
Timestamp: time.Unix(0, 0),
Line: "a",
},
expect: false,
},
{
desc: "head too big",
targetSize: 10,
headSize: 100,
cutBlockSize: 0,
entry: logproto.Entry{
Timestamp: time.Unix(0, 0),
Line: "a",
},
expect: false,
},
{
desc: "cut blocks too big",
targetSize: 10,
headSize: 0,
cutBlockSize: 100,
entry: logproto.Entry{
Timestamp: time.Unix(0, 0),
Line: "a",
},
expect: false,
},
{
desc: "entry fits",
targetSize: 10,
headSize: 0,
cutBlockSize: 0,
entry: logproto.Entry{
Timestamp: time.Unix(0, 0),
Line: strings.Repeat("a", 9),
},
expect: true,
},
{
desc: "entry fits with non-indexed labels",
targetSize: 10,
headSize: 0,
cutBlockSize: 0,
entry: logproto.Entry{
Timestamp: time.Unix(0, 0),
Line: strings.Repeat("a", 2),
NonIndexedLabels: []logproto.LabelAdapter{
{Name: "foo", Value: strings.Repeat("a", 2)},
},
},
expect: true,
},
{
desc: "entry too big",
targetSize: 10,
headSize: 0,
cutBlockSize: 0,
entry: logproto.Entry{
Timestamp: time.Unix(0, 0),
Line: strings.Repeat("a", 100),
},
expect: false,
},
{
desc: "entry too big because non-indexed labels",
targetSize: 10,
headSize: 0,
cutBlockSize: 0,
entry: logproto.Entry{
Timestamp: time.Unix(0, 0),
Line: strings.Repeat("a", 5),
NonIndexedLabels: []logproto.LabelAdapter{
{Name: "foo", Value: strings.Repeat("a", 5)},
},
},
expectFunc: func(chunkFormat byte, _ HeadBlockFmt) bool {
// Succeed unless we're using chunk format v4, which should
// take the non-indexed labels into account.
return chunkFormat < chunkFormatV4
},
},
} {
t.Run(tc.desc, func(t *testing.T) {
for _, format := range allPossibleFormats {
t.Run(fmt.Sprintf("chunk_v%d_head_%s", format.chunkFormat, format.headBlockFmt), func(t *testing.T) {
chk := newMemChunkWithFormat(format.chunkFormat, EncNone, format.headBlockFmt, 1024, tc.targetSize)
chk.blocks = make([]block, tc.nBlocks)
chk.cutBlockSize = tc.cutBlockSize
for i := 0; i < tc.headSize; i++ {
require.NoError(t, chk.head.Append(int64(i), "a", nil))
}
expect := tc.expect
if tc.expectFunc != nil {
expect = tc.expectFunc(format.chunkFormat, format.headBlockFmt)
}
require.Equal(t, expect, chk.SpaceFor(&tc.entry))
})
}
})
}
}
Fix buffered iterator (#9976) **What this PR does / why we need it**: In https://github.com/grafana/loki/pull/9700, we added support for writing non-indexed labels from entries into chunks. This PR introduced two bugs: - When the buffered iterator is closed, the buffer to read metadata labels is put back into the pool but not set to nil. Subsequent uses of the iterator may write on top of the buffer that was put back on the pool. This may lead to inconsistent/incorrect results. - Inside the buffered iterator, we were not correctly handling EOFs while reading the number of labels and each label length. This was due to the `lastAttemp` variable not being reset. This PR adds a new test for these two bugs and also fixes them. **Which issue(s) this PR fixes**: Fixes https://github.com/grafana/loki/pull/9700 **Special notes for your reviewer**: **Checklist** - [ ] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (**required**) - [ ] Documentation added - [ ] Tests updated - [ ] `CHANGELOG.md` updated - [ ] If the change is worth mentioning in the release notes, add `add-to-release-notes` label - [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md` - [ ] For Helm chart changes bump the Helm chart version in `production/helm/loki/Chart.yaml` and update `production/helm/loki/CHANGELOG.md` and `production/helm/loki/README.md`. [Example PR](https://github.com/grafana/loki/commit/d10549e3ece02120974929894ee333d07755d213)
2 years ago
func TestMemChunk_IteratorWithNonIndexedLabels(t *testing.T) {
for _, enc := range testEncoding {
enc := enc
t.Run(enc.String(), func(t *testing.T) {
streamLabels := labels.Labels{
{Name: "job", Value: "fake"},
}
chk := newMemChunkWithFormat(chunkFormatV4, enc, UnorderedWithNonIndexedLabelsHeadBlockFmt, testBlockSize, testTargetSize)
require.NoError(t, chk.Append(logprotoEntryWithNonIndexedLabels(1, "lineA", []logproto.LabelAdapter{
Fix buffered iterator (#9976) **What this PR does / why we need it**: In https://github.com/grafana/loki/pull/9700, we added support for writing non-indexed labels from entries into chunks. This PR introduced two bugs: - When the buffered iterator is closed, the buffer to read metadata labels is put back into the pool but not set to nil. Subsequent uses of the iterator may write on top of the buffer that was put back on the pool. This may lead to inconsistent/incorrect results. - Inside the buffered iterator, we were not correctly handling EOFs while reading the number of labels and each label length. This was due to the `lastAttemp` variable not being reset. This PR adds a new test for these two bugs and also fixes them. **Which issue(s) this PR fixes**: Fixes https://github.com/grafana/loki/pull/9700 **Special notes for your reviewer**: **Checklist** - [ ] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (**required**) - [ ] Documentation added - [ ] Tests updated - [ ] `CHANGELOG.md` updated - [ ] If the change is worth mentioning in the release notes, add `add-to-release-notes` label - [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md` - [ ] For Helm chart changes bump the Helm chart version in `production/helm/loki/Chart.yaml` and update `production/helm/loki/CHANGELOG.md` and `production/helm/loki/README.md`. [Example PR](https://github.com/grafana/loki/commit/d10549e3ece02120974929894ee333d07755d213)
2 years ago
{Name: "traceID", Value: "123"},
{Name: "user", Value: "a"},
})))
require.NoError(t, chk.Append(logprotoEntryWithNonIndexedLabels(2, "lineB", []logproto.LabelAdapter{
Fix buffered iterator (#9976) **What this PR does / why we need it**: In https://github.com/grafana/loki/pull/9700, we added support for writing non-indexed labels from entries into chunks. This PR introduced two bugs: - When the buffered iterator is closed, the buffer to read metadata labels is put back into the pool but not set to nil. Subsequent uses of the iterator may write on top of the buffer that was put back on the pool. This may lead to inconsistent/incorrect results. - Inside the buffered iterator, we were not correctly handling EOFs while reading the number of labels and each label length. This was due to the `lastAttemp` variable not being reset. This PR adds a new test for these two bugs and also fixes them. **Which issue(s) this PR fixes**: Fixes https://github.com/grafana/loki/pull/9700 **Special notes for your reviewer**: **Checklist** - [ ] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (**required**) - [ ] Documentation added - [ ] Tests updated - [ ] `CHANGELOG.md` updated - [ ] If the change is worth mentioning in the release notes, add `add-to-release-notes` label - [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md` - [ ] For Helm chart changes bump the Helm chart version in `production/helm/loki/Chart.yaml` and update `production/helm/loki/CHANGELOG.md` and `production/helm/loki/README.md`. [Example PR](https://github.com/grafana/loki/commit/d10549e3ece02120974929894ee333d07755d213)
2 years ago
{Name: "traceID", Value: "456"},
{Name: "user", Value: "b"},
})))
require.NoError(t, chk.cut())
require.NoError(t, chk.Append(logprotoEntryWithNonIndexedLabels(3, "lineC", []logproto.LabelAdapter{
Fix buffered iterator (#9976) **What this PR does / why we need it**: In https://github.com/grafana/loki/pull/9700, we added support for writing non-indexed labels from entries into chunks. This PR introduced two bugs: - When the buffered iterator is closed, the buffer to read metadata labels is put back into the pool but not set to nil. Subsequent uses of the iterator may write on top of the buffer that was put back on the pool. This may lead to inconsistent/incorrect results. - Inside the buffered iterator, we were not correctly handling EOFs while reading the number of labels and each label length. This was due to the `lastAttemp` variable not being reset. This PR adds a new test for these two bugs and also fixes them. **Which issue(s) this PR fixes**: Fixes https://github.com/grafana/loki/pull/9700 **Special notes for your reviewer**: **Checklist** - [ ] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (**required**) - [ ] Documentation added - [ ] Tests updated - [ ] `CHANGELOG.md` updated - [ ] If the change is worth mentioning in the release notes, add `add-to-release-notes` label - [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md` - [ ] For Helm chart changes bump the Helm chart version in `production/helm/loki/Chart.yaml` and update `production/helm/loki/CHANGELOG.md` and `production/helm/loki/README.md`. [Example PR](https://github.com/grafana/loki/commit/d10549e3ece02120974929894ee333d07755d213)
2 years ago
{Name: "traceID", Value: "789"},
{Name: "user", Value: "c"},
})))
require.NoError(t, chk.Append(logprotoEntryWithNonIndexedLabels(4, "lineD", []logproto.LabelAdapter{
Fix buffered iterator (#9976) **What this PR does / why we need it**: In https://github.com/grafana/loki/pull/9700, we added support for writing non-indexed labels from entries into chunks. This PR introduced two bugs: - When the buffered iterator is closed, the buffer to read metadata labels is put back into the pool but not set to nil. Subsequent uses of the iterator may write on top of the buffer that was put back on the pool. This may lead to inconsistent/incorrect results. - Inside the buffered iterator, we were not correctly handling EOFs while reading the number of labels and each label length. This was due to the `lastAttemp` variable not being reset. This PR adds a new test for these two bugs and also fixes them. **Which issue(s) this PR fixes**: Fixes https://github.com/grafana/loki/pull/9700 **Special notes for your reviewer**: **Checklist** - [ ] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (**required**) - [ ] Documentation added - [ ] Tests updated - [ ] `CHANGELOG.md` updated - [ ] If the change is worth mentioning in the release notes, add `add-to-release-notes` label - [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md` - [ ] For Helm chart changes bump the Helm chart version in `production/helm/loki/Chart.yaml` and update `production/helm/loki/CHANGELOG.md` and `production/helm/loki/README.md`. [Example PR](https://github.com/grafana/loki/commit/d10549e3ece02120974929894ee333d07755d213)
2 years ago
{Name: "traceID", Value: "123"},
{Name: "user", Value: "d"},
})))
// The expected bytes is the sum of bytes decompressed and bytes read from the head chunk.
// First we add the bytes read from the store (aka decompressed). That's
// nonIndexedLabelsBytes = n. lines * (n. labels <int> + (2 * n. nonIndexedLabelsSymbols * symbol <int>))
// lineBytes = n. lines * (ts <int> + line length <int> + line)
expectedNonIndexedLabelsBytes := 2 * (binary.MaxVarintLen64 + (2 * 2 * binary.MaxVarintLen64))
lineBytes := 2 * (2*binary.MaxVarintLen64 + len("lineA"))
// Now we add the bytes read from the head chunk. That's
// nonIndexedLabelsBytes = n. lines * (2 * n. nonIndexedLabelsSymbols * symbol <uint32>)
// lineBytes = n. lines * (line)
expectedNonIndexedLabelsBytes += 2 * (2 * 2 * 4)
lineBytes += 2 * (len("lineC"))
// Finally, the expected total bytes is the line bytes + non-indexed labels bytes
expectedBytes := lineBytes + expectedNonIndexedLabelsBytes
for _, tc := range []struct {
name string
query string
expectedLines []string
expectedStreams []string
}{
{
name: "no-filter",
query: `{job="fake"}`,
expectedLines: []string{"lineA", "lineB", "lineC", "lineD"},
expectedStreams: []string{
labels.FromStrings("job", "fake", "traceID", "123", "user", "a").String(),
labels.FromStrings("job", "fake", "traceID", "456", "user", "b").String(),
labels.FromStrings("job", "fake", "traceID", "789", "user", "c").String(),
labels.FromStrings("job", "fake", "traceID", "123", "user", "d").String(),
},
},
{
name: "filter",
query: `{job="fake"} | traceID="789"`,
expectedLines: []string{"lineC"},
expectedStreams: []string{
labels.FromStrings("job", "fake", "traceID", "789", "user", "c").String(),
},
},
{
name: "filter-regex-or",
query: `{job="fake"} | traceID=~"456|789"`,
expectedLines: []string{"lineB", "lineC"},
expectedStreams: []string{
labels.FromStrings("job", "fake", "traceID", "456", "user", "b").String(),
labels.FromStrings("job", "fake", "traceID", "789", "user", "c").String(),
},
},
{
name: "filter-regex-contains",
query: `{job="fake"} | traceID=~".*5.*"`,
expectedLines: []string{"lineB"},
expectedStreams: []string{
labels.FromStrings("job", "fake", "traceID", "456", "user", "b").String(),
},
},
{
name: "filter-regex-complex",
query: `{job="fake"} | traceID=~"^[0-9]2.*"`,
expectedLines: []string{"lineA", "lineD"},
expectedStreams: []string{
labels.FromStrings("job", "fake", "traceID", "123", "user", "a").String(),
labels.FromStrings("job", "fake", "traceID", "123", "user", "d").String(),
},
},
{
name: "multiple-filters",
query: `{job="fake"} | traceID="123" | user="d"`,
expectedLines: []string{"lineD"},
expectedStreams: []string{
labels.FromStrings("job", "fake", "traceID", "123", "user", "d").String(),
},
},
{
name: "keep",
query: `{job="fake"} | keep job, user`,
expectedLines: []string{"lineA", "lineB", "lineC", "lineD"},
expectedStreams: []string{
labels.FromStrings("job", "fake", "user", "a").String(),
labels.FromStrings("job", "fake", "user", "b").String(),
labels.FromStrings("job", "fake", "user", "c").String(),
labels.FromStrings("job", "fake", "user", "d").String(),
},
},
{
name: "keep-filter",
query: `{job="fake"} | keep job, user="b"`,
expectedLines: []string{"lineA", "lineB", "lineC", "lineD"},
expectedStreams: []string{
labels.FromStrings("job", "fake").String(),
labels.FromStrings("job", "fake", "user", "b").String(),
labels.FromStrings("job", "fake").String(),
labels.FromStrings("job", "fake").String(),
},
},
{
name: "drop",
query: `{job="fake"} | drop traceID`,
expectedLines: []string{"lineA", "lineB", "lineC", "lineD"},
expectedStreams: []string{
labels.FromStrings("job", "fake", "user", "a").String(),
labels.FromStrings("job", "fake", "user", "b").String(),
labels.FromStrings("job", "fake", "user", "c").String(),
labels.FromStrings("job", "fake", "user", "d").String(),
},
},
{
name: "drop-filter",
query: `{job="fake"} | drop traceID="123"`,
expectedLines: []string{"lineA", "lineB", "lineC", "lineD"},
expectedStreams: []string{
labels.FromStrings("job", "fake", "user", "a").String(),
labels.FromStrings("job", "fake", "traceID", "456", "user", "b").String(),
labels.FromStrings("job", "fake", "traceID", "789", "user", "c").String(),
labels.FromStrings("job", "fake", "user", "d").String(),
},
},
} {
t.Run(tc.name, func(t *testing.T) {
t.Run("log", func(t *testing.T) {
expr, err := syntax.ParseLogSelector(tc.query, true)
require.NoError(t, err)
pipeline, err := expr.Pipeline()
require.NoError(t, err)
// We will run the test twice so the iterator will be created twice.
// This is to ensure that the iterator is correctly closed.
for i := 0; i < 2; i++ {
sts, ctx := stats.NewContext(context.Background())
it, err := chk.Iterator(ctx, time.Unix(0, 0), time.Unix(0, math.MaxInt64), logproto.FORWARD, pipeline.ForStream(streamLabels))
require.NoError(t, err)
var lines []string
var streams []string
for it.Next() {
require.NoError(t, it.Error())
e := it.Entry()
lines = append(lines, e.Line)
streams = append(streams, it.Labels())
// We don't want to send back the non-indexed labels since
// they are already part of the returned labels.
require.Empty(t, e.NonIndexedLabels)
}
assert.ElementsMatch(t, tc.expectedLines, lines)
assert.ElementsMatch(t, tc.expectedStreams, streams)
resultStats := sts.Result(0, 0, len(lines))
require.Equal(t, int64(expectedBytes), resultStats.Summary.TotalBytesProcessed)
require.Equal(t, int64(expectedNonIndexedLabelsBytes), resultStats.Summary.TotalNonIndexedLabelsBytesProcessed)
}
})
t.Run("metric", func(t *testing.T) {
query := fmt.Sprintf(`count_over_time(%s [1d])`, tc.query)
expr, err := syntax.ParseSampleExpr(query)
require.NoError(t, err)
Fix buffered iterator (#9976) **What this PR does / why we need it**: In https://github.com/grafana/loki/pull/9700, we added support for writing non-indexed labels from entries into chunks. This PR introduced two bugs: - When the buffered iterator is closed, the buffer to read metadata labels is put back into the pool but not set to nil. Subsequent uses of the iterator may write on top of the buffer that was put back on the pool. This may lead to inconsistent/incorrect results. - Inside the buffered iterator, we were not correctly handling EOFs while reading the number of labels and each label length. This was due to the `lastAttemp` variable not being reset. This PR adds a new test for these two bugs and also fixes them. **Which issue(s) this PR fixes**: Fixes https://github.com/grafana/loki/pull/9700 **Special notes for your reviewer**: **Checklist** - [ ] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (**required**) - [ ] Documentation added - [ ] Tests updated - [ ] `CHANGELOG.md` updated - [ ] If the change is worth mentioning in the release notes, add `add-to-release-notes` label - [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md` - [ ] For Helm chart changes bump the Helm chart version in `production/helm/loki/Chart.yaml` and update `production/helm/loki/CHANGELOG.md` and `production/helm/loki/README.md`. [Example PR](https://github.com/grafana/loki/commit/d10549e3ece02120974929894ee333d07755d213)
2 years ago
extractor, err := expr.Extractor()
require.NoError(t, err)
// We will run the test twice so the iterator will be created twice.
// This is to ensure that the iterator is correctly closed.
for i := 0; i < 2; i++ {
sts, ctx := stats.NewContext(context.Background())
it := chk.SampleIterator(ctx, time.Unix(0, 0), time.Unix(0, math.MaxInt64), extractor.ForStream(streamLabels))
var sumValues int
var streams []string
for it.Next() {
require.NoError(t, it.Error())
e := it.Sample()
sumValues += int(e.Value)
streams = append(streams, it.Labels())
}
require.Equal(t, len(tc.expectedLines), sumValues)
assert.ElementsMatch(t, tc.expectedStreams, streams)
resultStats := sts.Result(0, 0, 0)
require.Equal(t, int64(expectedBytes), resultStats.Summary.TotalBytesProcessed)
require.Equal(t, int64(expectedNonIndexedLabelsBytes), resultStats.Summary.TotalNonIndexedLabelsBytesProcessed)
}
})
})
Fix buffered iterator (#9976) **What this PR does / why we need it**: In https://github.com/grafana/loki/pull/9700, we added support for writing non-indexed labels from entries into chunks. This PR introduced two bugs: - When the buffered iterator is closed, the buffer to read metadata labels is put back into the pool but not set to nil. Subsequent uses of the iterator may write on top of the buffer that was put back on the pool. This may lead to inconsistent/incorrect results. - Inside the buffered iterator, we were not correctly handling EOFs while reading the number of labels and each label length. This was due to the `lastAttemp` variable not being reset. This PR adds a new test for these two bugs and also fixes them. **Which issue(s) this PR fixes**: Fixes https://github.com/grafana/loki/pull/9700 **Special notes for your reviewer**: **Checklist** - [ ] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (**required**) - [ ] Documentation added - [ ] Tests updated - [ ] `CHANGELOG.md` updated - [ ] If the change is worth mentioning in the release notes, add `add-to-release-notes` label - [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md` - [ ] For Helm chart changes bump the Helm chart version in `production/helm/loki/Chart.yaml` and update `production/helm/loki/CHANGELOG.md` and `production/helm/loki/README.md`. [Example PR](https://github.com/grafana/loki/commit/d10549e3ece02120974929894ee333d07755d213)
2 years ago
}
})
}
}
Fix bugs in non-indexed-labels (#10142) **What this PR does / why we need it**: This PR fixes some bugs discovered while setting the default chunk format yo _V4_ and the default head format to _unordered with non-indexed labels_. Note that this PR doesn't change the defaults, but fixes some bugs that would prevent us from changing them in the future. The bugs it fixes are: - Index out-of-range panic when symbols' labels are empty and we do a Lookup. **Test added**. - When applying retention, new chunks wouldn't contain non-indexed labels (caused by https://github.com/grafana/loki/pull/10090). **Test added**. - New chunks would fail to serialize non-indexed labels when applying retention because the head format would be the [dummy one (ordered)][1] [(used here via `c.headFmt`)][2]. **Cannot test unless we change the default chunk/head format** but tested manually. **Checklist** - [ ] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (**required**) - [ ] Documentation added - [ ] Tests updated - [ ] `CHANGELOG.md` updated - [ ] If the change is worth mentioning in the release notes, add `add-to-release-notes` label - [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/setup/upgrade/_index.md` - [ ] For Helm chart changes bump the Helm chart version in `production/helm/loki/Chart.yaml` and update `production/helm/loki/CHANGELOG.md` and `production/helm/loki/README.md`. [Example PR](https://github.com/grafana/loki/commit/d10549e3ece02120974929894ee333d07755d213) [1]: https://github.com/grafana/loki/blob/457f2e6ec0f626dfec17faa8ba424a04818168b8/pkg/chunkenc/memchunk.go#L375 [2]: https://github.com/grafana/loki/blob/24fd56683328514b58257972d6ca5baa9aeec51b/pkg/chunkenc/memchunk.go#L1088-L1095
2 years ago
func TestMemChunk_IteratorOptions(t *testing.T) {
chk := newMemChunkWithFormat(chunkFormatV4, EncNone, UnorderedWithNonIndexedLabelsHeadBlockFmt, testBlockSize, testTargetSize)
require.NoError(t, chk.Append(logprotoEntryWithNonIndexedLabels(0, "0", logproto.FromLabelsToLabelAdapters(
labels.FromStrings("a", "0"),
))))
require.NoError(t, chk.Append(logprotoEntryWithNonIndexedLabels(1, "1", logproto.FromLabelsToLabelAdapters(
labels.FromStrings("a", "1"),
))))
require.NoError(t, chk.cut())
require.NoError(t, chk.Append(logprotoEntryWithNonIndexedLabels(2, "2", logproto.FromLabelsToLabelAdapters(
labels.FromStrings("a", "2"),
))))
require.NoError(t, chk.Append(logprotoEntryWithNonIndexedLabels(3, "3", logproto.FromLabelsToLabelAdapters(
labels.FromStrings("a", "3"),
))))
for _, tc := range []struct {
name string
options []iter.EntryIteratorOption
expectNonIndexedLabels bool
}{
{
name: "No options",
expectNonIndexedLabels: false,
},
{
name: "WithKeepNonIndexedLabels",
options: []iter.EntryIteratorOption{
iter.WithKeepNonIndexedLabels(),
},
expectNonIndexedLabels: true,
},
} {
t.Run(tc.name, func(t *testing.T) {
it, err := chk.Iterator(context.Background(), time.Unix(0, 0), time.Unix(0, math.MaxInt64), logproto.FORWARD, noopStreamPipeline, tc.options...)
require.NoError(t, err)
var idx int64
for it.Next() {
expectedLabels := labels.FromStrings("a", fmt.Sprintf("%d", idx))
expectedEntry := logproto.Entry{
Timestamp: time.Unix(0, idx),
Line: fmt.Sprintf("%d", idx),
}
if tc.expectNonIndexedLabels {
expectedEntry.NonIndexedLabels = logproto.FromLabelsToLabelAdapters(expectedLabels)
}
require.Equal(t, expectedEntry, it.Entry())
require.Equal(t, expectedLabels.String(), it.Labels())
idx++
}
})
}
}