**What this PR does / why we need it**:
Previously, we read the info for all the chunks of a series from the index and filtered them in a layer above, within the tsdb code. This wastes a lot of resources when the index contains many chunks but the query range only needs a few of them.
Before jumping into how and why I went with chunk sampling, here are
some points to consider:
* Chunks in the index are sorted by their start time. Since this tells us
nothing about chunk end times, we can only skip chunks that start after
the query end time; a query touching chunks near the end of the table
boundary would therefore still process a large number of chunks.
* Data is written to tsdb with variable-length encoding. This means we
can't skip/jump over chunks, since each chunk's info may occupy a
different number of bytes.
Here is how I have implemented the sampling approach:
* Chunks are sampled based on their end times read from the index, and
the samples are stored in memory.
* Here is how `chunkSample` is defined:
```go
type chunkSample struct {
	largestMaxt   int64 // the largest chunk end time seen so far, i.e. all earlier chunks have maxt <= largestMaxt
	idx           int   // index of the chunk in the list, which helps determine the position of the sampled chunk
	offset        int   // offset relative to the beginning of the chunk info block, i.e. after the series labels, chunk count, etc.
	prevChunkMaxt int64 // chunk times are stored as deltas; this is used to calculate the mint of the sampled chunk
}
```
* When a query comes in, we find the `chunkSample` with the largest
`largestMaxt` that is still below the query start time. In other words,
we find a chunk sample that skips all/most of the chunks that end before
the query start time.
* Starting from that chunk sample, we sequentially go through the chunks
and consider only the ones that overlap with the query range. Since the
chunks are sorted by start time, we stop processing as soon as we see a
chunk that starts after the query end time.
* Sampling of chunks is done lazily for only the series that are
queried, so we do not waste any resources on sampling series that are
not queried.
* To avoid sampling too many chunks, I am sampling chunks at `1h` steps,
i.e. given a sampled chunk with end time `t`, the next sampled chunk
would have an end time >= `t + 1h`. This means we should typically have
~28 chunks sampled per series queried from each index file, considering
the default 2h chunk length and chunks overlapping multiple tables.
Here are the benchmark results showing the difference it makes:
```
benchmark old ns/op new ns/op delta
BenchmarkTSDBIndex_GetChunkRefs-10 12420741 4764309 -61.64%
BenchmarkTSDBIndex_GetChunkRefs-10 12412014 4794156 -61.37%
BenchmarkTSDBIndex_GetChunkRefs-10 12382716 4748571 -61.65%
BenchmarkTSDBIndex_GetChunkRefs-10 12391397 4691054 -62.14%
BenchmarkTSDBIndex_GetChunkRefs-10 12272200 5023567 -59.07%
benchmark old allocs new allocs delta
BenchmarkTSDBIndex_GetChunkRefs-10 345653 40 -99.99%
BenchmarkTSDBIndex_GetChunkRefs-10 345653 40 -99.99%
BenchmarkTSDBIndex_GetChunkRefs-10 345653 40 -99.99%
BenchmarkTSDBIndex_GetChunkRefs-10 345653 40 -99.99%
BenchmarkTSDBIndex_GetChunkRefs-10 345653 40 -99.99%
benchmark old bytes new bytes delta
BenchmarkTSDBIndex_GetChunkRefs-10 27286536 6398855 -76.55%
BenchmarkTSDBIndex_GetChunkRefs-10 27286571 6399276 -76.55%
BenchmarkTSDBIndex_GetChunkRefs-10 27286566 6400699 -76.54%
BenchmarkTSDBIndex_GetChunkRefs-10 27286561 6399158 -76.55%
BenchmarkTSDBIndex_GetChunkRefs-10 27286580 6399643 -76.55%
```
**Checklist**
- [x] Tests updated