In many cases we only need a few labels from a line. Because most of our
parsers parse lines incrementally, we can stop parsing a line after we
have all the labels we want from it.
This PR uses `ParserHints` to keep track of the number of extracted
labels. It also provides a way for parsers to know when they should stop
parsing.
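A minimal sketch of the idea, not the actual implementation: the `hints`
type, its methods, and `parseLogfmt` below are hypothetical stand-ins for
`ParserHints` and a real parser, and the line format is simplified to
space-separated `key=value` pairs.

```go
package main

import (
	"fmt"
	"strings"
)

// hints is a hypothetical stand-in for ParserHints. It knows which labels
// the query needs and counts how many have been extracted so far.
type hints struct {
	required  map[string]struct{}
	extracted int
}

func newHints(labels ...string) *hints {
	h := &hints{required: make(map[string]struct{}, len(labels))}
	for _, l := range labels {
		h.required[l] = struct{}{}
	}
	return h
}

// shouldExtract reports whether the query needs this key.
func (h *hints) shouldExtract(key string) bool {
	_, ok := h.required[key]
	return ok
}

// recordExtracted bumps the count of extracted labels.
func (h *hints) recordExtracted() { h.extracted++ }

// allExtracted reports whether every required label has been found,
// which is the signal for the parser to stop.
func (h *hints) allExtracted() bool {
	return len(h.required) > 0 && h.extracted == len(h.required)
}

// parseLogfmt walks the key=value pairs of a line in order and stops as
// soon as the hints say nothing else is needed. It returns the extracted
// labels and the number of pairs it had to visit.
func parseLogfmt(line string, h *hints) (map[string]string, int) {
	out := map[string]string{}
	visited := 0
	for _, pair := range strings.Fields(line) {
		visited++
		key, val, ok := strings.Cut(pair, "=")
		if !ok || !h.shouldExtract(key) {
			continue
		}
		if _, dup := out[key]; dup {
			continue // keep the first value for a repeated key
		}
		out[key] = val
		h.recordExtracted()
		if h.allExtracted() {
			break // short circuit: the rest of the line is not needed
		}
	}
	return out, visited
}

func main() {
	line := "app=foo namespace=prod pod=abc level=info"
	labels, visited := parseLogfmt(line, newHints("namespace"))
	fmt.Println(labels["namespace"], visited)
}
```

In this sketch only two of the four pairs are visited before the parser
stops. If the requested label were missing from the line, all four would
be visited.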
Notes:
- Parsers were inconsistent about the order of the `ShouldExtract` call and
appending the `_extracted` suffix to duplicate labels. This PR always
appends `_extracted` before calling `ShouldExtract`, so the count of
extracted labels stays consistent with the expected labels.
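The ordering in the note above can be sketched as follows. This is an
illustration only: `resolveKey`, `duplicateSuffix`, and the plain maps are
hypothetical and do not mirror the real parser code.

```go
package main

import "fmt"

// duplicateSuffix is the suffix added to an extracted key that collides
// with an existing stream label.
const duplicateSuffix = "_extracted"

// resolveKey first renames a colliding key, then checks the renamed key
// against the set of wanted labels. Doing the rename first means the
// check always sees the final label name.
func resolveKey(key string, streamLabels map[string]string, wanted map[string]struct{}) (string, bool) {
	if _, dup := streamLabels[key]; dup {
		key += duplicateSuffix
	}
	_, ok := wanted[key]
	return key, ok
}

func main() {
	stream := map[string]string{"app": "foo"}
	wanted := map[string]struct{}{"app_extracted": {}}
	name, ok := resolveKey("app", stream, wanted)
	fmt.Println(name, ok)
}
```

With the check done before the rename, the same query would look up
`app`, miss, and never count `app_extracted` as extracted.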
Next Steps:
- When the user specifies a query with a grouping containing the
`_extracted` label but there is no duplicate between the passed line and
labels, short-circuiting will not work. I'll address this in a follow-up
PR.
Benchmarks:
To show a balanced view of what this buys us, the benchmark picks a
label out of the middle of a line. In the best case the improvement might
be much larger. In the worst case, we still have to parse the whole line.
```
benchstat short_circuit_old.txt short_circuit_new.txt
name                               old time/op    new time/op    delta
KeyExtraction/json-8                  456ns ± 3%     256ns ± 2%  -43.84%  (p=0.000 n=9+10)
KeyExtraction/logfmt-8                347ns ± 4%     171ns ± 2%  -50.86%  (p=0.000 n=10+10)
KeyExtraction/logfmt-expression-8     552ns ± 2%     368ns ± 2%  -33.23%  (p=0.000 n=9+10)

name                               old alloc/op   new alloc/op   delta
KeyExtraction/json-8                  5.00B ± 0%     5.00B ± 0%     ~     (all equal)
KeyExtraction/logfmt-8                5.00B ± 0%     5.00B ± 0%     ~     (all equal)
KeyExtraction/logfmt-expression-8     16.0B ± 0%     16.0B ± 0%     ~     (all equal)

name                               old allocs/op  new allocs/op  delta
KeyExtraction/json-8                   1.00 ± 0%      1.00 ± 0%     ~     (all equal)
KeyExtraction/logfmt-8                 1.00 ± 0%      1.00 ± 0%     ~     (all equal)
KeyExtraction/logfmt-expression-8      2.00 ± 0%      2.00 ± 0%     ~     (all equal)
```
---------
Co-authored-by: J Stickler <julie.stickler@grafana.com>
@@ -285,7 +285,7 @@ For instance, the pipeline `| json` will produce the following mapping:
In case of errors, for instance if the line is not in the expected format, the log line won't be filtered but instead will get a new `__error__` label added.
-If an extracted label key name already exists in the original log stream, the extracted label key will be suffixed with the `_extracted` keyword to make the distinction between the two labels. You can forcefully override the original label using a [label formatter expression](#labels-format-expression). However if an extracted key appears twice, only the latest label value will be kept.
+If an extracted label key name already exists in the original log stream, the extracted label key will be suffixed with the `_extracted` keyword to make the distinction between the two labels. You can forcefully override the original label using a [label formatter expression](#labels-format-expression). However, if an extracted key appears twice, only the first label value will be kept.
Loki supports [JSON](#json), [logfmt](#logfmt), [pattern](#pattern), [regexp](#regular-expression) and [unpack](#unpack) parsers.
testLine:=[]byte(`{"app":"foo","field with space":"value","field with ÜFT8👌":"value","null_field":null,"bool_field":false,"namespace":"prod","pod":{"uuid":"foo","deployment":{"ref":"foobar", "params": [1,2,3]}}}`)