LogQL: Simple JSON expressions (#3280)

* New approach, still rough
* Adding benchmark
* Adding tests
* Minor refactoring
* Appeasing the linter
* Further appeasing the linter
* Adding more tests
* Adding documentation
* Docs fixup
* Removing unnecessary condition
* Adding extra tests from suggestion in review
* Adding JSONParseErr
* Adding test to cover invalid JSON line
* Adding equivalent benchmarks for JSON and JSONExpression parsing
* Adding suffix if label would be overridden
* Reparenting jsonexpr directory to more appropriate location
* Setting empty label on non-matching expression, to retain parity with label_format
* Adding statement about returned complex JSON types
* Added check for valid label name
* Making json expressions shardable

Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
pull/3321/head
Danny Kopping 4 years ago committed by GitHub
parent 6c8fdd68aa
commit feb7fb470b
  1. 128  docs/sources/logql/_index.md
  2. 33  pkg/logql/ast.go
  3. 1  pkg/logql/ast_test.go
  4. 18  pkg/logql/expr.y
  5. 813  pkg/logql/expr.y.go
  6. 1  pkg/logql/lex_test.go
  7. 13  pkg/logql/log/json_expr.go
  8. 56  pkg/logql/log/jsonexpr/jsonexpr.y
  9. 517  pkg/logql/log/jsonexpr/jsonexpr.y.go
  10. 131  pkg/logql/log/jsonexpr/jsonexpr_test.go
  11. 164  pkg/logql/log/jsonexpr/lexer.go
  12. 19  pkg/logql/log/jsonexpr/parser.go
  13. 51  pkg/logql/log/parser.go
  14. 293  pkg/logql/log/parser_test.go
  15. 86  pkg/logql/log/pipeline_test.go
  16. 34  pkg/logql/parser_test.go

@ -145,40 +145,100 @@ If an extracted label key name already exists in the original log stream, the ex
We currently support json, logfmt and regexp parsers.
The **json** parsers take no parameters and can be added using the expression `| json` in your pipeline. It will extract all json properties as labels if the log line is a valid json document. Nested properties are flattened into label keys using the `_` separator. **Arrays are skipped**.
For example the json parsers will extract from the following document:
```json
{
"protocol": "HTTP/2.0",
"servers": ["129.0.1.1","10.2.1.3"],
"request": {
"time": "6.032",
"method": "GET",
"host": "foo.grafana.net",
"size": "55"
},
"response": {
"status": 401,
"size": "228",
"latency_seconds": "6.031"
}
}
```
The following list of labels:
```kv
"protocol" => "HTTP/2.0"
"request_time" => "6.032"
"request_method" => "GET"
"request_host" => "foo.grafana.net"
"request_size" => "55"
"response_status" => "401"
"response_size" => "228"
"response_latency_seconds" => "6.031"
```
The **json** parser operates in two modes:
1. **without** parameters:
Adding `| json` to your pipeline will extract all json properties as labels if the log line is a valid json document.
Nested properties are flattened into label keys using the `_` separator.
Note: **Arrays are skipped**.
For example the json parsers will extract from the following document:
```json
{
"protocol": "HTTP/2.0",
"servers": ["129.0.1.1","10.2.1.3"],
"request": {
"time": "6.032",
"method": "GET",
"host": "foo.grafana.net",
"size": "55",
"headers": {
"Accept": "*/*",
"User-Agent": "curl/7.68.0"
}
},
"response": {
"status": 401,
"size": "228",
"latency_seconds": "6.031"
}
}
```
The following list of labels:
```kv
"protocol" => "HTTP/2.0"
"request_time" => "6.032"
"request_method" => "GET"
"request_host" => "foo.grafana.net"
"request_size" => "55"
"response_status" => "401"
"response_size" => "228"
"response_latency_seconds" => "6.031"
```
2. **with** parameters:
Using `| json label="expression", another="expression"` in your pipeline will extract only the
specified json fields to labels. You can specify one or more expressions in this way, the same
as [`label_format`](#labels-format-expression); all expressions must be quoted.
Currently, we only support field access (`my.field`, `my["field"]`) and array access (`list[0]`), and any combination
of these in any level of nesting (`my.list[0]["field"]`).
For example, `| json first_server="servers[0]", ua="request.headers[\"User-Agent\"]"` will extract from the following document:
```json
{
"protocol": "HTTP/2.0",
"servers": ["129.0.1.1","10.2.1.3"],
"request": {
"time": "6.032",
"method": "GET",
"host": "foo.grafana.net",
"size": "55",
"headers": {
"Accept": "*/*",
"User-Agent": "curl/7.68.0"
}
},
"response": {
"status": 401,
"size": "228",
"latency_seconds": "6.031"
}
}
```
The following list of labels:
```kv
"first_server" => "129.0.1.1"
"ua" => "curl/7.68.0"
```
If an expression returns an array or an object, it will be assigned to the label in json format.
For example, `| json server_list="servers", headers="request.headers"` will extract:
```kv
"server_list" => `["129.0.1.1","10.2.1.3"]`
"headers" => `{"Accept": "*/*", "User-Agent": "curl/7.68.0"}`
```
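The path semantics described above (field access by name, array access by index, and complex values rendered as JSON) can be sketched in plain Go. This is only an illustration built on the standard library's `encoding/json`; the `lookup` helper is hypothetical, and the parser in this change delegates to jsoniter instead. Non-matching paths and `null` values are assumed to yield an empty string, as the tests in this change suggest.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// lookup walks a decoded JSON document along path, where each element is
// either a string (object key) or an int (array index). It returns the
// matched value as a string, or "" when nothing matches.
// Hypothetical helper for illustration only; not part of Loki.
func lookup(line []byte, path ...interface{}) string {
	var cur interface{}
	if err := json.Unmarshal(line, &cur); err != nil {
		return ""
	}
	for _, step := range path {
		switch s := step.(type) {
		case string:
			obj, ok := cur.(map[string]interface{})
			if !ok {
				return ""
			}
			cur, ok = obj[s]
			if !ok {
				return ""
			}
		case int:
			arr, ok := cur.([]interface{})
			if !ok || s < 0 || s >= len(arr) {
				return ""
			}
			cur = arr[s]
		default:
			return ""
		}
	}
	switch v := cur.(type) {
	case nil:
		return "" // null is treated like a missing value
	case string:
		return v
	default:
		// Numbers, booleans, arrays and objects are re-encoded as JSON.
		out, err := json.Marshal(v)
		if err != nil {
			return ""
		}
		return string(out)
	}
}

func main() {
	line := []byte(`{"servers":["129.0.1.1","10.2.1.3"],"request":{"headers":{"User-Agent":"curl/7.68.0"}}}`)
	fmt.Println(lookup(line, "servers", 0))
	fmt.Println(lookup(line, "request", "headers", "User-Agent"))
	fmt.Println(lookup(line, "servers"))
}
```

Note that re-encoding through `encoding/json` normalises whitespace and key order, whereas the tests in this change show the original bytes of the matched object being kept.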
The **logfmt** parser can be added using `| logfmt` and will extract all keys and values from the [logfmt](https://brandur.org/logfmt) formatted log line.

@ -416,6 +416,39 @@ func (e *labelFmtExpr) String() string {
return sb.String()
}
type jsonExpressionParser struct {
expressions []log.JSONExpression
implicit
}
func newJSONExpressionParser(expressions []log.JSONExpression) *jsonExpressionParser {
return &jsonExpressionParser{
expressions: expressions,
}
}
func (j *jsonExpressionParser) Shardable() bool { return true }
func (j *jsonExpressionParser) Stage() (log.Stage, error) {
return log.NewJSONExpressionParser(j.expressions)
}
func (j *jsonExpressionParser) String() string {
var sb strings.Builder
sb.WriteString(fmt.Sprintf("%s %s ", OpPipe, OpParserTypeJSON))
for i, exp := range j.expressions {
sb.WriteString(exp.Identifier)
sb.WriteString("=")
sb.WriteString(strconv.Quote(exp.Expression))
if i+1 != len(j.expressions) {
sb.WriteString(",")
}
}
return sb.String()
}
func mustNewMatcher(t labels.MatchType, n, v string) *labels.Matcher {
m, err := labels.NewMatcher(t, n, v)
if err != nil {
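
Reviewer note on the `String()` method above: the rendering can be reproduced in isolation to see what a round-tripped stage looks like. The sketch below is a standalone approximation, not the code in `ast.go`; the `jsonExpression` type and `render` function are hypothetical, and it assumes `OpPipe` is `|` and `OpParserTypeJSON` is `json`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// jsonExpression mirrors the Identifier/Expression pair from this diff.
type jsonExpression struct {
	Identifier string
	Expression string
}

// render builds the textual form of a `| json ...` stage: each expression
// becomes identifier="quoted expression", separated by commas.
func render(exprs []jsonExpression) string {
	var sb strings.Builder
	sb.WriteString("| json ") // assumed values of OpPipe and OpParserTypeJSON
	for i, e := range exprs {
		sb.WriteString(e.Identifier)
		sb.WriteString("=")
		// strconv.Quote escapes embedded double quotes, so an expression
		// such as request.headers["User-Agent"] can be parsed back.
		sb.WriteString(strconv.Quote(e.Expression))
		if i+1 != len(exprs) {
			sb.WriteString(",")
		}
	}
	return sb.String()
}

func main() {
	fmt.Println(render([]jsonExpression{
		{"first_server", "servers[0]"},
		{"ua", `request.headers["User-Agent"]`},
	}))
}
```

The separator is a bare comma without a trailing space, so the rendered query is not byte-identical to hand-written input such as the case added to `Test_SampleExpr_String`.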

@ -127,6 +127,7 @@ func Test_SampleExpr_String(t *testing.T) {
`,
`10 / (5/2)`,
`10 / (count_over_time({job="postgres"}[5m])/2)`,
`{app="foo"} | json response_status="response.status.code", first_param="request.params[0]"`,
} {
t.Run(tc, func(t *testing.T) {
expr, err := ParseExpr(tc)

@ -46,6 +46,9 @@ import (
LabelFormatExpr *labelFmtExpr
LabelFormat log.LabelFmt
LabelsFormat []log.LabelFmt
JSONExpressionParser *jsonExpressionParser
JSONExpression log.JSONExpression
JSONExpressionList []log.JSONExpression
UnwrapExpr *unwrapExpr
}
@ -82,6 +85,9 @@ import (
%type <LabelFormatExpr> labelFormatExpr
%type <LabelFormat> labelFormat
%type <LabelsFormat> labelsFormat
%type <JSONExpressionParser> jsonExpressionParser
%type <JSONExpression> jsonExpression
%type <JSONExpressionList> jsonExpressionList
%type <UnwrapExpr> unwrapExpr
%type <UnitFilter> unitFilter
@ -211,6 +217,7 @@ pipelineExpr:
pipelineStage:
lineFilters { $$ = $1 }
| PIPE labelParser { $$ = $2 }
| PIPE jsonExpressionParser { $$ = $2 }
| PIPE labelFilter { $$ = &labelFilterExpr{LabelFilterer: $2 }}
| PIPE lineFormatExpr { $$ = $2 }
| PIPE labelFormatExpr { $$ = $2 }
@ -226,6 +233,9 @@ labelParser:
| REGEXP STRING { $$ = newLabelParserExpr(OpParserTypeRegexp, $2) }
;
jsonExpressionParser:
JSON jsonExpressionList { $$ = newJSONExpressionParser($2) }
lineFormatExpr: LINE_FMT STRING { $$ = newLineFmtExpr($2) };
labelFormat:
@ -252,6 +262,14 @@ labelFilter:
| labelFilter OR labelFilter { $$ = log.NewOrLabelFilter($1, $3 ) }
;
jsonExpression:
IDENTIFIER EQ STRING { $$ = log.NewJSONExpr($1, $3) }
jsonExpressionList:
jsonExpression { $$ = []log.JSONExpression{$1} }
| jsonExpressionList COMMA jsonExpression { $$ = append($1, $3) }
;
unitFilter:
durationFilter { $$ = $1 }
| bytesFilter { $$ = $1 }

File diff suppressed because it is too large.

@ -56,6 +56,7 @@ func TestLex(t *testing.T) {
{`{foo="bar"}
# |~ "\\w+"
| json`, []int{OPEN_BRACE, IDENTIFIER, EQ, STRING, CLOSE_BRACE, PIPE, JSON}},
{`{foo="bar"} | json code="response.code", param="request.params[0]"`, []int{OPEN_BRACE, IDENTIFIER, EQ, STRING, CLOSE_BRACE, PIPE, JSON, IDENTIFIER, EQ, STRING, COMMA, IDENTIFIER, EQ, STRING}},
} {
t.Run(tc.input, func(t *testing.T) {
actual := []int{}

@ -0,0 +1,13 @@
package log
type JSONExpression struct {
Identifier string
Expression string
}
func NewJSONExpr(identifier, expression string) JSONExpression {
return JSONExpression{
Identifier: identifier,
Expression: expression,
}
}

@ -0,0 +1,56 @@
// Inspired by https://github.com/sjjian/yacc-examples
%{
package jsonexpr
func setScannerData(lex interface{}, data []interface{}) {
lex.(*Scanner).data = data
}
%}
%union {
empty struct{}
str string
field string
list []interface{}
int int
}
%token<empty> DOT LSB RSB
%token<str> STRING
%token<field> FIELD
%token<int> INDEX
%type<int> index index_access
%type<str> field key key_access
%type<list> values
%%
json:
values { setScannerData(JSONExprlex, $1) }
values:
field { $$ = []interface{}{$1} }
| key_access { $$ = []interface{}{$1} }
| index_access { $$ = []interface{}{$1} }
| values key_access { $$ = append($1, $2) }
| values index_access { $$ = append($1, $2) }
| values DOT field { $$ = append($1, $3) }
;
key_access:
LSB key RSB { $$ = $2 }
index_access:
LSB index RSB { $$ = $2 }
field:
FIELD { $$ = $1 }
key:
STRING { $$ = $1 }
index:
INDEX { $$ = $1 }
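
Reviewer note on the grammar above: a path starts with a field, a `["key"]` access or an `[index]` access, and continues with further bracket accesses or `.field` steps. The hand-written sketch below accepts roughly the same language and produces the same kind of `[]interface{}` path. It is an illustration only, not the goyacc parser added in this change; `parsePath` and `isIdent` are hypothetical names, whitespace is not skipped, and keys containing `]` are not supported.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// isIdent reports whether c may appear in a bare field name
// (letters and underscore only, matching the lexer in this diff).
func isIdent(c byte) bool {
	return c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
}

// parsePath turns an expression such as pod.params[0]["param"] into a
// path of string keys and int indexes.
func parsePath(expr string) ([]interface{}, error) {
	var path []interface{}
	i := 0
	// readField consumes a bare identifier starting at offset i.
	readField := func() error {
		start := i
		for i < len(expr) && isIdent(expr[i]) {
			i++
		}
		if start == i {
			return fmt.Errorf("expected field at offset %d", start)
		}
		path = append(path, expr[start:i])
		return nil
	}
	for i < len(expr) {
		switch {
		case expr[i] == '[':
			end := strings.IndexByte(expr[i:], ']')
			if end < 0 {
				return nil, fmt.Errorf("missing ] after offset %d", i)
			}
			inner := expr[i+1 : i+end]
			i += end + 1
			if len(inner) >= 2 && inner[0] == '"' && inner[len(inner)-1] == '"' {
				path = append(path, inner[1:len(inner)-1]) // ["key"]
			} else if n, err := strconv.ParseUint(inner, 10, 31); err == nil {
				path = append(path, int(n)) // [index]
			} else {
				return nil, fmt.Errorf("invalid subscript %q", inner)
			}
		case expr[i] == '.' && len(path) > 0:
			i++ // a dot must be followed by a field
			if err := readField(); err != nil {
				return nil, err
			}
		case isIdent(expr[i]) && len(path) == 0:
			if err := readField(); err != nil {
				return nil, err
			}
		default:
			return nil, fmt.Errorf("unexpected %q at offset %d", expr[i], i)
		}
	}
	return path, nil
}

func main() {
	path, err := parsePath(`pod.deployment.params[0]["param"]`)
	fmt.Println(path, err)
}
```

The error messages differ from the generated parser's "syntax error: unexpected ..." strings; only the accepted and rejected inputs are meant to line up with the test cases in this change.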

@ -0,0 +1,517 @@
// Code generated by goyacc -p JSONExpr -o pkg/logql/log/jsonexpr/jsonexpr.y.go pkg/logql/log/jsonexpr/jsonexpr.y. DO NOT EDIT.
//line pkg/logql/log/jsonexpr/jsonexpr.y:4
package jsonexpr
import __yyfmt__ "fmt"
//line pkg/logql/log/jsonexpr/jsonexpr.y:4
func setScannerData(lex interface{}, data []interface{}) {
lex.(*Scanner).data = data
}
//line pkg/logql/log/jsonexpr/jsonexpr.y:12
type JSONExprSymType struct {
yys int
empty struct{}
str string
field string
list []interface{}
int int
}
const DOT = 57346
const LSB = 57347
const RSB = 57348
const STRING = 57349
const FIELD = 57350
const INDEX = 57351
var JSONExprToknames = [...]string{
"$end",
"error",
"$unk",
"DOT",
"LSB",
"RSB",
"STRING",
"FIELD",
"INDEX",
}
var JSONExprStatenames = [...]string{}
const JSONExprEofCode = 1
const JSONExprErrCode = 2
const JSONExprInitialStackSize = 16
//line yacctab:1
var JSONExprExca = [...]int{
-1, 1,
1, -1,
-2, 0,
}
const JSONExprPrivate = 57344
const JSONExprLast = 19
var JSONExprAct = [...]int{
3, 13, 7, 14, 6, 6, 17, 16, 10, 7,
4, 15, 5, 8, 1, 9, 2, 11, 12,
}
var JSONExprPact = [...]int{
-3, -1000, 4, -1000, -1000, -1000, -1000, -6, -1000, -1000,
-4, 1, 0, -1000, -1000, -1000, -1000, -1000,
}
var JSONExprPgo = [...]int{
0, 18, 12, 0, 17, 10, 16, 14,
}
var JSONExprR1 = [...]int{
0, 7, 6, 6, 6, 6, 6, 6, 5, 2,
3, 4, 1,
}
var JSONExprR2 = [...]int{
0, 1, 1, 1, 1, 2, 2, 3, 3, 3,
1, 1, 1,
}
var JSONExprChk = [...]int{
-1000, -7, -6, -3, -5, -2, 8, 5, -5, -2,
4, -4, -1, 7, 9, -3, 6, 6,
}
var JSONExprDef = [...]int{
0, -2, 1, 2, 3, 4, 10, 0, 5, 6,
0, 0, 0, 11, 12, 7, 8, 9,
}
var JSONExprTok1 = [...]int{
1,
}
var JSONExprTok2 = [...]int{
2, 3, 4, 5, 6, 7, 8, 9,
}
var JSONExprTok3 = [...]int{
0,
}
var JSONExprErrorMessages = [...]struct {
state int
token int
msg string
}{}
//line yaccpar:1
/* parser for yacc output */
var (
JSONExprDebug = 0
JSONExprErrorVerbose = false
)
type JSONExprLexer interface {
Lex(lval *JSONExprSymType) int
Error(s string)
}
type JSONExprParser interface {
Parse(JSONExprLexer) int
Lookahead() int
}
type JSONExprParserImpl struct {
lval JSONExprSymType
stack [JSONExprInitialStackSize]JSONExprSymType
char int
}
func (p *JSONExprParserImpl) Lookahead() int {
return p.char
}
func JSONExprNewParser() JSONExprParser {
return &JSONExprParserImpl{}
}
const JSONExprFlag = -1000
func JSONExprTokname(c int) string {
if c >= 1 && c-1 < len(JSONExprToknames) {
if JSONExprToknames[c-1] != "" {
return JSONExprToknames[c-1]
}
}
return __yyfmt__.Sprintf("tok-%v", c)
}
func JSONExprStatname(s int) string {
if s >= 0 && s < len(JSONExprStatenames) {
if JSONExprStatenames[s] != "" {
return JSONExprStatenames[s]
}
}
return __yyfmt__.Sprintf("state-%v", s)
}
func JSONExprErrorMessage(state, lookAhead int) string {
const TOKSTART = 4
if !JSONExprErrorVerbose {
return "syntax error"
}
for _, e := range JSONExprErrorMessages {
if e.state == state && e.token == lookAhead {
return "syntax error: " + e.msg
}
}
res := "syntax error: unexpected " + JSONExprTokname(lookAhead)
// To match Bison, suggest at most four expected tokens.
expected := make([]int, 0, 4)
// Look for shiftable tokens.
base := JSONExprPact[state]
for tok := TOKSTART; tok-1 < len(JSONExprToknames); tok++ {
if n := base + tok; n >= 0 && n < JSONExprLast && JSONExprChk[JSONExprAct[n]] == tok {
if len(expected) == cap(expected) {
return res
}
expected = append(expected, tok)
}
}
if JSONExprDef[state] == -2 {
i := 0
for JSONExprExca[i] != -1 || JSONExprExca[i+1] != state {
i += 2
}
// Look for tokens that we accept or reduce.
for i += 2; JSONExprExca[i] >= 0; i += 2 {
tok := JSONExprExca[i]
if tok < TOKSTART || JSONExprExca[i+1] == 0 {
continue
}
if len(expected) == cap(expected) {
return res
}
expected = append(expected, tok)
}
// If the default action is to accept or reduce, give up.
if JSONExprExca[i+1] != 0 {
return res
}
}
for i, tok := range expected {
if i == 0 {
res += ", expecting "
} else {
res += " or "
}
res += JSONExprTokname(tok)
}
return res
}
func JSONExprlex1(lex JSONExprLexer, lval *JSONExprSymType) (char, token int) {
token = 0
char = lex.Lex(lval)
if char <= 0 {
token = JSONExprTok1[0]
goto out
}
if char < len(JSONExprTok1) {
token = JSONExprTok1[char]
goto out
}
if char >= JSONExprPrivate {
if char < JSONExprPrivate+len(JSONExprTok2) {
token = JSONExprTok2[char-JSONExprPrivate]
goto out
}
}
for i := 0; i < len(JSONExprTok3); i += 2 {
token = JSONExprTok3[i+0]
if token == char {
token = JSONExprTok3[i+1]
goto out
}
}
out:
if token == 0 {
token = JSONExprTok2[1] /* unknown char */
}
if JSONExprDebug >= 3 {
__yyfmt__.Printf("lex %s(%d)\n", JSONExprTokname(token), uint(char))
}
return char, token
}
func JSONExprParse(JSONExprlex JSONExprLexer) int {
return JSONExprNewParser().Parse(JSONExprlex)
}
func (JSONExprrcvr *JSONExprParserImpl) Parse(JSONExprlex JSONExprLexer) int {
var JSONExprn int
var JSONExprVAL JSONExprSymType
var JSONExprDollar []JSONExprSymType
_ = JSONExprDollar // silence set and not used
JSONExprS := JSONExprrcvr.stack[:]
Nerrs := 0 /* number of errors */
Errflag := 0 /* error recovery flag */
JSONExprstate := 0
JSONExprrcvr.char = -1
JSONExprtoken := -1 // JSONExprrcvr.char translated into internal numbering
defer func() {
// Make sure we report no lookahead when not parsing.
JSONExprstate = -1
JSONExprrcvr.char = -1
JSONExprtoken = -1
}()
JSONExprp := -1
goto JSONExprstack
ret0:
return 0
ret1:
return 1
JSONExprstack:
/* put a state and value onto the stack */
if JSONExprDebug >= 4 {
__yyfmt__.Printf("char %v in %v\n", JSONExprTokname(JSONExprtoken), JSONExprStatname(JSONExprstate))
}
JSONExprp++
if JSONExprp >= len(JSONExprS) {
nyys := make([]JSONExprSymType, len(JSONExprS)*2)
copy(nyys, JSONExprS)
JSONExprS = nyys
}
JSONExprS[JSONExprp] = JSONExprVAL
JSONExprS[JSONExprp].yys = JSONExprstate
JSONExprnewstate:
JSONExprn = JSONExprPact[JSONExprstate]
if JSONExprn <= JSONExprFlag {
goto JSONExprdefault /* simple state */
}
if JSONExprrcvr.char < 0 {
JSONExprrcvr.char, JSONExprtoken = JSONExprlex1(JSONExprlex, &JSONExprrcvr.lval)
}
JSONExprn += JSONExprtoken
if JSONExprn < 0 || JSONExprn >= JSONExprLast {
goto JSONExprdefault
}
JSONExprn = JSONExprAct[JSONExprn]
if JSONExprChk[JSONExprn] == JSONExprtoken { /* valid shift */
JSONExprrcvr.char = -1
JSONExprtoken = -1
JSONExprVAL = JSONExprrcvr.lval
JSONExprstate = JSONExprn
if Errflag > 0 {
Errflag--
}
goto JSONExprstack
}
JSONExprdefault:
/* default state action */
JSONExprn = JSONExprDef[JSONExprstate]
if JSONExprn == -2 {
if JSONExprrcvr.char < 0 {
JSONExprrcvr.char, JSONExprtoken = JSONExprlex1(JSONExprlex, &JSONExprrcvr.lval)
}
/* look through exception table */
xi := 0
for {
if JSONExprExca[xi+0] == -1 && JSONExprExca[xi+1] == JSONExprstate {
break
}
xi += 2
}
for xi += 2; ; xi += 2 {
JSONExprn = JSONExprExca[xi+0]
if JSONExprn < 0 || JSONExprn == JSONExprtoken {
break
}
}
JSONExprn = JSONExprExca[xi+1]
if JSONExprn < 0 {
goto ret0
}
}
if JSONExprn == 0 {
/* error ... attempt to resume parsing */
switch Errflag {
case 0: /* brand new error */
JSONExprlex.Error(JSONExprErrorMessage(JSONExprstate, JSONExprtoken))
Nerrs++
if JSONExprDebug >= 1 {
__yyfmt__.Printf("%s", JSONExprStatname(JSONExprstate))
__yyfmt__.Printf(" saw %s\n", JSONExprTokname(JSONExprtoken))
}
fallthrough
case 1, 2: /* incompletely recovered error ... try again */
Errflag = 3
/* find a state where "error" is a legal shift action */
for JSONExprp >= 0 {
JSONExprn = JSONExprPact[JSONExprS[JSONExprp].yys] + JSONExprErrCode
if JSONExprn >= 0 && JSONExprn < JSONExprLast {
JSONExprstate = JSONExprAct[JSONExprn] /* simulate a shift of "error" */
if JSONExprChk[JSONExprstate] == JSONExprErrCode {
goto JSONExprstack
}
}
/* the current p has no shift on "error", pop stack */
if JSONExprDebug >= 2 {
__yyfmt__.Printf("error recovery pops state %d\n", JSONExprS[JSONExprp].yys)
}
JSONExprp--
}
/* there is no state on the stack with an error shift ... abort */
goto ret1
case 3: /* no shift yet; clobber input char */
if JSONExprDebug >= 2 {
__yyfmt__.Printf("error recovery discards %s\n", JSONExprTokname(JSONExprtoken))
}
if JSONExprtoken == JSONExprEofCode {
goto ret1
}
JSONExprrcvr.char = -1
JSONExprtoken = -1
goto JSONExprnewstate /* try again in the same state */
}
}
/* reduction by production JSONExprn */
if JSONExprDebug >= 2 {
__yyfmt__.Printf("reduce %v in:\n\t%v\n", JSONExprn, JSONExprStatname(JSONExprstate))
}
JSONExprnt := JSONExprn
JSONExprpt := JSONExprp
_ = JSONExprpt // guard against "declared and not used"
JSONExprp -= JSONExprR2[JSONExprn]
// JSONExprp is now the index of $0. Perform the default action. Iff the
// reduced production is ε, $1 is possibly out of range.
if JSONExprp+1 >= len(JSONExprS) {
nyys := make([]JSONExprSymType, len(JSONExprS)*2)
copy(nyys, JSONExprS)
JSONExprS = nyys
}
JSONExprVAL = JSONExprS[JSONExprp+1]
/* consult goto table to find next state */
JSONExprn = JSONExprR1[JSONExprn]
JSONExprg := JSONExprPgo[JSONExprn]
JSONExprj := JSONExprg + JSONExprS[JSONExprp].yys + 1
if JSONExprj >= JSONExprLast {
JSONExprstate = JSONExprAct[JSONExprg]
} else {
JSONExprstate = JSONExprAct[JSONExprj]
if JSONExprChk[JSONExprstate] != -JSONExprn {
JSONExprstate = JSONExprAct[JSONExprg]
}
}
// dummy call; replaced with literal code
switch JSONExprnt {
case 1:
JSONExprDollar = JSONExprS[JSONExprpt-1 : JSONExprpt+1]
//line pkg/logql/log/jsonexpr/jsonexpr.y:32
{
setScannerData(JSONExprlex, JSONExprDollar[1].list)
}
case 2:
JSONExprDollar = JSONExprS[JSONExprpt-1 : JSONExprpt+1]
//line pkg/logql/log/jsonexpr/jsonexpr.y:35
{
JSONExprVAL.list = []interface{}{JSONExprDollar[1].str}
}
case 3:
JSONExprDollar = JSONExprS[JSONExprpt-1 : JSONExprpt+1]
//line pkg/logql/log/jsonexpr/jsonexpr.y:36
{
JSONExprVAL.list = []interface{}{JSONExprDollar[1].str}
}
case 4:
JSONExprDollar = JSONExprS[JSONExprpt-1 : JSONExprpt+1]
//line pkg/logql/log/jsonexpr/jsonexpr.y:37
{
JSONExprVAL.list = []interface{}{JSONExprDollar[1].int}
}
case 5:
JSONExprDollar = JSONExprS[JSONExprpt-2 : JSONExprpt+1]
//line pkg/logql/log/jsonexpr/jsonexpr.y:38
{
JSONExprVAL.list = append(JSONExprDollar[1].list, JSONExprDollar[2].str)
}
case 6:
JSONExprDollar = JSONExprS[JSONExprpt-2 : JSONExprpt+1]
//line pkg/logql/log/jsonexpr/jsonexpr.y:39
{
JSONExprVAL.list = append(JSONExprDollar[1].list, JSONExprDollar[2].int)
}
case 7:
JSONExprDollar = JSONExprS[JSONExprpt-3 : JSONExprpt+1]
//line pkg/logql/log/jsonexpr/jsonexpr.y:40
{
JSONExprVAL.list = append(JSONExprDollar[1].list, JSONExprDollar[3].str)
}
case 8:
JSONExprDollar = JSONExprS[JSONExprpt-3 : JSONExprpt+1]
//line pkg/logql/log/jsonexpr/jsonexpr.y:44
{
JSONExprVAL.str = JSONExprDollar[2].str
}
case 9:
JSONExprDollar = JSONExprS[JSONExprpt-3 : JSONExprpt+1]
//line pkg/logql/log/jsonexpr/jsonexpr.y:47
{
JSONExprVAL.int = JSONExprDollar[2].int
}
case 10:
JSONExprDollar = JSONExprS[JSONExprpt-1 : JSONExprpt+1]
//line pkg/logql/log/jsonexpr/jsonexpr.y:50
{
JSONExprVAL.str = JSONExprDollar[1].field
}
case 11:
JSONExprDollar = JSONExprS[JSONExprpt-1 : JSONExprpt+1]
//line pkg/logql/log/jsonexpr/jsonexpr.y:53
{
JSONExprVAL.str = JSONExprDollar[1].str
}
case 12:
JSONExprDollar = JSONExprS[JSONExprpt-1 : JSONExprpt+1]
//line pkg/logql/log/jsonexpr/jsonexpr.y:56
{
JSONExprVAL.int = JSONExprDollar[1].int
}
}
goto JSONExprstack /* stack new state and value */
}

@ -0,0 +1,131 @@
package jsonexpr
import (
"fmt"
"testing"
"github.com/stretchr/testify/require"
)
func TestJSONExpressionParser(t *testing.T) {
// {"app":"foo","field with space":"value","field with ÜFT8👌":true,"namespace":"prod","pod":{"uuid":"foo","deployment":{"ref":"foobar", "params": [{"param": true},2,3]}}}
tests := []struct {
name string
expression string
want []interface{}
error error
}{
{
"single field",
"app",
[]interface{}{"app"},
nil,
},
{
"top-level field with spaces",
`["field with space"]`,
[]interface{}{"field with space"},
nil,
},
{
"top-level field with UTF8",
`["field with ÜFT8👌"]`,
[]interface{}{"field with ÜFT8👌"},
nil,
},
{
"top-level array access",
`[0]`,
[]interface{}{0},
nil,
},
{
"nested field",
`pod.uuid`,
[]interface{}{"pod", "uuid"},
nil,
},
{
"nested field alternate syntax",
`pod["uuid"]`,
[]interface{}{"pod", "uuid"},
nil,
},
{
"nested field alternate syntax 2",
`["pod"]["uuid"]`,
[]interface{}{"pod", "uuid"},
nil,
},
{
"array access",
`pod.deployment.params[0]`,
[]interface{}{"pod", "deployment", "params", 0},
nil,
},
{
"multi-level array access",
`pod.deployment.params[0].param`,
[]interface{}{"pod", "deployment", "params", 0, "param"},
nil,
},
{
"multi-level array access alternate syntax",
`pod.deployment.params[0]["param"]`,
[]interface{}{"pod", "deployment", "params", 0, "param"},
nil,
},
{
"empty",
``,
nil,
nil,
},
{
"invalid field access",
`field with space`,
nil,
fmt.Errorf("syntax error: unexpected FIELD"),
},
{
"missing opening square bracket",
`"pod"]`,
nil,
fmt.Errorf("syntax error: unexpected STRING, expecting LSB or FIELD"),
},
{
"missing closing square bracket",
`["pod"`,
nil,
fmt.Errorf("syntax error: unexpected $end, expecting RSB"),
},
{
"missing closing square bracket",
`["pod""deployment"]`,
nil,
fmt.Errorf("syntax error: unexpected STRING, expecting RSB"),
},
{
"invalid nesting",
`pod..uuid`,
nil,
fmt.Errorf("syntax error: unexpected DOT, expecting FIELD"),
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
parsed, err := Parse(tt.expression, false)
require.Equal(t, tt.want, parsed)
if tt.error == nil {
return
}
require.NotNil(t, err)
require.Equal(t, tt.error.Error(), err.Error())
})
}
}

@ -0,0 +1,164 @@
package jsonexpr
import (
"bufio"
"fmt"
"io"
"strconv"
"text/scanner"
)
type Scanner struct {
buf *bufio.Reader
data []interface{}
err error
debug bool
}
func NewScanner(r io.Reader, debug bool) *Scanner {
return &Scanner{
buf: bufio.NewReader(r),
debug: debug,
}
}
func (sc *Scanner) Error(s string) {
// use an explicit format verb so a '%' in the message is not interpreted as a directive
sc.err = fmt.Errorf("%s", s)
fmt.Printf("syntax error: %s\n", s)
}
func (sc *Scanner) Reduced(rule, state int, lval *JSONExprSymType) bool {
if sc.debug {
fmt.Printf("rule: %v; state %v; lval: %v\n", rule, state, lval)
}
return false
}
func (sc *Scanner) Lex(lval *JSONExprSymType) int {
return sc.lex(lval)
}
func (sc *Scanner) lex(lval *JSONExprSymType) int {
for {
r := sc.read()
if r == 0 {
return 0
}
if isWhitespace(r) {
continue
}
if isDigit(r) {
sc.unread()
val, err := sc.scanInt()
if err != nil {
sc.err = err
return 0
}
lval.int = val
return INDEX
}
switch {
case r == '[':
return LSB
case r == ']':
return RSB
case r == '.':
return DOT
case isIdentifier(r):
sc.unread()
lval.field = sc.scanField()
return FIELD
case r == '"':
sc.unread()
lval.str = sc.scanStr()
return STRING
default:
sc.err = fmt.Errorf("unexpected char %c", r)
return 0
}
}
}
func isIdentifier(r rune) bool {
return (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z') || r == '_'
}
func (sc *Scanner) scanField() string {
var str []rune
for {
r := sc.read()
if !isIdentifier(r) {
sc.unread()
break
}
if r == '.' || r == scanner.EOF || r == rune(0) {
sc.unread()
break
}
str = append(str, r)
}
return string(str)
}
func (sc *Scanner) scanStr() string {
var str []rune
//begin with ", end with "
r := sc.read()
if r != '"' {
sc.err = fmt.Errorf("unexpected char %c", r)
return ""
}
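for {
r := sc.read()
if r == '"' || r == ']' {
break
}
if r == 0 {
// read returns 0 at end of input; stop rather than loop forever on an unterminated string
sc.err = fmt.Errorf("unterminated string")
break
}
str = append(str, r)
}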
return string(str)
}
func (sc *Scanner) scanInt() (int, error) {
var number []rune
for {
r := sc.read()
if r == '.' && len(number) > 0 {
return 0, fmt.Errorf("cannot use float as array index")
}
if isWhitespace(r) || r == '.' || r == ']' {
sc.unread()
break
}
if !isDigit(r) {
return 0, fmt.Errorf("non-integer value: %c", r)
}
number = append(number, r)
}
return strconv.Atoi(string(number))
}
func (sc *Scanner) read() rune {
ch, _, _ := sc.buf.ReadRune()
return ch
}
func (sc *Scanner) unread() { _ = sc.buf.UnreadRune() }
func isWhitespace(ch rune) bool { return ch == ' ' || ch == '\t' || ch == '\n' }
func isDigit(r rune) bool {
return r >= '0' && r <= '9'
}

@ -0,0 +1,19 @@
package jsonexpr
import (
"strings"
)
func init() {
JSONExprErrorVerbose = true
}
func Parse(expr string, debug bool) ([]interface{}, error) {
s := NewScanner(strings.NewReader(expr), debug)
JSONExprParse(s)
if s.err != nil {
return nil, s.err
}
return s.data, nil
}

@ -6,6 +6,7 @@ import (
"io"
"regexp"
"github.com/grafana/loki/pkg/logql/log/jsonexpr"
"github.com/grafana/loki/pkg/logql/log/logfmt"
jsoniter "github.com/json-iterator/go"
@ -259,3 +260,53 @@ func (l *LogfmtParser) Process(line []byte, lbs *LabelsBuilder) ([]byte, bool) {
}
func (l *LogfmtParser) RequiredLabelNames() []string { return []string{} }
type JSONExpressionParser struct {
expressions map[string][]interface{}
}
func NewJSONExpressionParser(expressions []JSONExpression) (*JSONExpressionParser, error) {
var paths = make(map[string][]interface{})
for _, exp := range expressions {
path, err := jsonexpr.Parse(exp.Expression, false)
if err != nil {
return nil, fmt.Errorf("cannot parse expression [%s]: %w", exp.Expression, err)
}
if !model.LabelName(exp.Identifier).IsValid() {
return nil, fmt.Errorf("invalid extracted label name '%s'", exp.Identifier)
}
paths[exp.Identifier] = path
}
return &JSONExpressionParser{
expressions: paths,
}, nil
}
func (j *JSONExpressionParser) Process(line []byte, lbs *LabelsBuilder) ([]byte, bool) {
if lbs.ParserLabelHints().NoLabels() {
return line, true
}
if !jsoniter.ConfigFastest.Valid(line) {
lbs.SetErr(errJSON)
return line, true
}
for identifier, paths := range j.expressions {
result := jsoniter.ConfigFastest.Get(line, paths...).ToString()
if lbs.BaseHas(identifier) {
identifier = identifier + duplicateSuffix
}
lbs.Set(identifier, result)
}
return line, true
}
func (j *JSONExpressionParser) RequiredLabelNames() []string { return []string{} }
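
Reviewer note on `Process` above: when an extracted label would collide with a label already present on the stream, the extracted one gets a suffix instead of overwriting it. A minimal sketch of that rule using plain maps follows; `setExtracted` is a hypothetical helper, and `duplicateSuffix` is assumed to be `_extracted`, as implied by the "label override" test case in this change.

```go
package main

import "fmt"

// duplicateSuffix is assumed to be "_extracted" (see the uuid_extracted
// expectation in the "label override" test).
const duplicateSuffix = "_extracted"

// setExtracted stores value under name, unless name already exists among
// the stream's base labels, in which case the suffixed name is used.
// Hypothetical stand-in for the LabelsBuilder calls in Process.
func setExtracted(base, extracted map[string]string, name, value string) {
	if _, exists := base[name]; exists {
		name += duplicateSuffix
	}
	extracted[name] = value
}

func main() {
	base := map[string]string{"uuid": "bar"}
	extracted := map[string]string{}
	setExtracted(base, extracted, "uuid", "foo") // collides with a base label
	setExtracted(base, extracted, "nope", "")    // non-matching expression still sets an empty label
	fmt.Println(extracted)
}
```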

@ -1,6 +1,7 @@
package log
import (
"fmt"
"sort"
"testing"
@ -87,6 +88,298 @@ func Test_jsonParser_Parse(t *testing.T) {
}
}
func TestJSONExpressionParser(t *testing.T) {
testLine := []byte(`{"app":"foo","field with space":"value","field with ÜFT8👌":"value","null_field":null,"bool_field":false,"namespace":"prod","pod":{"uuid":"foo","deployment":{"ref":"foobar", "params": [1,2,3]}}}`)
tests := []struct {
name string
line []byte
expressions []JSONExpression
lbs labels.Labels
want labels.Labels
}{
{
"single field",
testLine,
[]JSONExpression{
NewJSONExpr("app", "app"),
},
labels.Labels{},
labels.Labels{
{Name: "app", Value: "foo"},
},
},
{
"alternate syntax",
testLine,
[]JSONExpression{
NewJSONExpr("test", `["field with space"]`),
},
labels.Labels{},
labels.Labels{
{Name: "test", Value: "value"},
},
},
{
"multiple fields",
testLine,
[]JSONExpression{
NewJSONExpr("app", "app"),
NewJSONExpr("namespace", "namespace"),
},
labels.Labels{},
labels.Labels{
{Name: "app", Value: "foo"},
{Name: "namespace", Value: "prod"},
},
},
{
"utf8",
testLine,
[]JSONExpression{
NewJSONExpr("utf8", `["field with ÜFT8👌"]`),
},
labels.Labels{},
labels.Labels{
{Name: "utf8", Value: "value"},
},
},
{
"nested field",
testLine,
[]JSONExpression{
NewJSONExpr("uuid", "pod.uuid"),
},
labels.Labels{},
labels.Labels{
{Name: "uuid", Value: "foo"},
},
},
{
"nested field alternate syntax",
testLine,
[]JSONExpression{
NewJSONExpr("uuid", `pod["uuid"]`),
},
labels.Labels{},
labels.Labels{
{Name: "uuid", Value: "foo"},
},
},
{
"nested field alternate syntax 2",
testLine,
[]JSONExpression{
NewJSONExpr("uuid", `["pod"]["uuid"]`),
},
labels.Labels{},
labels.Labels{
{Name: "uuid", Value: "foo"},
},
},
{
"nested field alternate syntax 3",
testLine,
[]JSONExpression{
NewJSONExpr("uuid", `["pod"].uuid`),
},
labels.Labels{},
labels.Labels{
{Name: "uuid", Value: "foo"},
},
},
{
"array element",
testLine,
[]JSONExpression{
NewJSONExpr("param", `pod.deployment.params[0]`),
},
labels.Labels{},
labels.Labels{
{Name: "param", Value: "1"},
},
},
{
"full array",
testLine,
[]JSONExpression{
NewJSONExpr("params", `pod.deployment.params`),
},
labels.Labels{},
labels.Labels{
{Name: "params", Value: "[1,2,3]"},
},
},
{
"full object",
testLine,
[]JSONExpression{
NewJSONExpr("deployment", `pod.deployment`),
},
labels.Labels{},
labels.Labels{
{Name: "deployment", Value: `{"ref":"foobar", "params": [1,2,3]}`},
},
},
{
"expression matching nothing",
testLine,
[]JSONExpression{
NewJSONExpr("nope", `pod.nope`),
},
labels.Labels{},
labels.Labels{
labels.Label{Name: "nope", Value: ""},
},
},
{
"null field",
testLine,
[]JSONExpression{
NewJSONExpr("nf", `null_field`),
},
labels.Labels{},
labels.Labels{
labels.Label{Name: "nf", Value: ""}, // null is coerced to an empty string
},
},
{
"boolean field",
testLine,
[]JSONExpression{
NewJSONExpr("bool", `bool_field`),
},
labels.Labels{},
labels.Labels{
{Name: "bool", Value: `false`},
},
},
{
"label override",
testLine,
[]JSONExpression{
NewJSONExpr("uuid", `pod.uuid`),
},
labels.Labels{
{Name: "uuid", Value: "bar"},
},
labels.Labels{
{Name: "uuid", Value: "bar"},
{Name: "uuid_extracted", Value: "foo"},
},
},
{
"non-matching expression",
testLine,
[]JSONExpression{
NewJSONExpr("request_size", `request.size.invalid`),
},
labels.Labels{
{Name: "uuid", Value: "bar"},
},
labels.Labels{
{Name: "uuid", Value: "bar"},
{Name: "request_size", Value: ""},
},
},
{
"empty line",
[]byte("{}"),
[]JSONExpression{
NewJSONExpr("uuid", `pod.uuid`),
},
labels.Labels{},
labels.Labels{
labels.Label{Name: "uuid", Value: ""},
},
},
{
"existing labels are not affected",
testLine,
[]JSONExpression{
NewJSONExpr("uuid", `will.not.work`),
},
labels.Labels{
{Name: "foo", Value: "bar"},
},
labels.Labels{
{Name: "foo", Value: "bar"},
{Name: "uuid", Value: ""},
},
},
{
"invalid JSON line",
[]byte(`invalid json`),
[]JSONExpression{
NewJSONExpr("uuid", `will.not.work`),
},
labels.Labels{
{Name: "foo", Value: "bar"},
},
labels.Labels{
{Name: "foo", Value: "bar"},
{Name: ErrorLabel, Value: errJSON},
},
},
}
for _, tt := range tests {
j, err := NewJSONExpressionParser(tt.expressions)
if err != nil {
t.Fatalf("cannot create JSON expression parser: %s", err.Error())
}
t.Run(tt.name, func(t *testing.T) {
b := NewBaseLabelsBuilder().ForLabels(tt.lbs, tt.lbs.Hash())
b.Reset()
_, _ = j.Process(tt.line, b)
sort.Sort(tt.want)
require.Equal(t, tt.want, b.Labels())
})
}
}
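
The table above pins down several behaviours: an array index yields a scalar, a whole array or object comes back as raw JSON, a missing path or a `null` yields an empty label, and a name that collides with an existing label gets an `_extracted` suffix. The sketch below reproduces those rules with only the standard library. It is not Loki's implementation: `lookup`, `render`, `setLabel`, and the sample line are hypothetical names and data invented for illustration, and the path is passed as pre-split segments rather than parsed from an expression string.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// lookup walks raw JSON along path. String segments index objects and int
// segments index arrays. ok is false when the path matches nothing.
// Hypothetical helper, not part of Loki.
func lookup(raw []byte, path ...interface{}) (value []byte, ok bool) {
	cur := json.RawMessage(raw)
	for _, p := range path {
		switch key := p.(type) {
		case string:
			var obj map[string]json.RawMessage
			if err := json.Unmarshal(cur, &obj); err != nil {
				return nil, false
			}
			next, found := obj[key]
			if !found {
				return nil, false
			}
			cur = next
		case int:
			var arr []json.RawMessage
			if err := json.Unmarshal(cur, &arr); err != nil {
				return nil, false
			}
			if key < 0 || key >= len(arr) {
				return nil, false
			}
			cur = arr[key]
		default:
			return nil, false
		}
	}
	return cur, true
}

// render converts a matched JSON value into a label value: null becomes "",
// strings are unquoted, and everything else keeps its raw JSON text.
func render(raw []byte) string {
	if bytes.Equal(raw, []byte("null")) {
		return ""
	}
	if len(raw) > 0 && raw[0] == '"' {
		var s string
		if err := json.Unmarshal(raw, &s); err == nil {
			return s
		}
	}
	return string(raw)
}

// setLabel stores value under name, or under name+"_extracted" when the
// label already exists, so existing labels are never overwritten.
func setLabel(lbs map[string]string, name, value string) {
	if _, exists := lbs[name]; exists {
		name += "_extracted"
	}
	lbs[name] = value
}

func main() {
	// Made-up line, shaped after the values the tests expect.
	line := []byte(`{"pod":{"uuid":"foo","deployment":{"ref":"foobar","params":[1,2,3]}},"null_field":null}`)
	lbs := map[string]string{"uuid": "bar"}

	extract := func(name string, path ...interface{}) {
		v, ok := lookup(line, path...)
		if !ok {
			setLabel(lbs, name, "") // non-matching expression: empty label
			return
		}
		setLabel(lbs, name, render(v))
	}

	extract("uuid", "pod", "uuid")                       // collides with existing label
	extract("param", "pod", "deployment", "params", 0)   // array element
	extract("params", "pod", "deployment", "params")     // full array
	extract("nope", "pod", "nope")                       // matches nothing
	extract("nf", "null_field")                          // null field
	fmt.Println(lbs)
}
```

Walking with `json.RawMessage` keeps the matched bytes untouched, which is why a full array or object can be returned verbatim instead of being re-encoded.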
func TestJSONExpressionParserFailures(t *testing.T) {
tests := []struct {
name string
expression JSONExpression
error string
}{
{
"invalid field name",
NewJSONExpr("app", `field with space`),
"unexpected FIELD",
},
{
"missing opening square bracket",
NewJSONExpr("app", `"pod"]`),
"unexpected STRING, expecting LSB or FIELD",
},
{
"missing closing square bracket",
NewJSONExpr("app", `["pod"`),
"unexpected $end, expecting RSB",
},
{
"missing closing square bracket before second key",
NewJSONExpr("app", `["pod""uuid"]`),
"unexpected STRING, expecting RSB",
},
{
"invalid nesting",
NewJSONExpr("app", `pod..uuid`),
"unexpected DOT, expecting FIELD",
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
_, err := NewJSONExpressionParser([]JSONExpression{tt.expression})
require.Error(t, err)
// testify expects (expected, actual); the original call had them swapped.
require.Equal(t, fmt.Sprintf("cannot parse expression [%s]: syntax error: %s", tt.expression.Expression, tt.error), err.Error())
})
}
}
func Benchmark_Parser(b *testing.B) {
lbs := labels.Labels{
{Name: "cluster", Value: "qa-us-central1"},

@@ -93,3 +93,89 @@ func mustFilter(f Filterer, err error) Filterer {
}
return f
}
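// jsonBenchmark runs parser behind a line filter against a valid JSON line
// and fails unless the context_file label is extracted.
func jsonBenchmark(b *testing.B, parser Stage) {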
b.ReportAllocs()
p := NewPipeline([]Stage{
mustFilter(NewFilter("metrics.go", labels.MatchEqual)).ToStage(),
parser,
})
line := []byte(`{"ts":"2020-12-27T09:15:54.333026285Z","error":"action could not be completed", "context":{"file": "metrics.go"}}`)
lbs := labels.Labels{
{Name: "cluster", Value: "ops-tool1"},
{Name: "name", Value: "querier"},
{Name: "pod", Value: "querier-5896759c79-q7q9h"},
{Name: "stream", Value: "stderr"},
{Name: "container", Value: "querier"},
{Name: "namespace", Value: "loki-dev"},
{Name: "job", Value: "loki-dev/querier"},
{Name: "pod_template_hash", Value: "5896759c79"},
}
sp := p.ForStream(lbs)
b.ResetTimer() // reset after all setup so only Process is timed
for n := 0; n < b.N; n++ {
resLine, resLbs, resOK = sp.Process(line)
if !resOK {
b.Fatalf("resulting line not ok: %s\n", line)
}
if resLbs.Labels().Get("context_file") != "metrics.go" {
b.Fatalf("label was not extracted correctly! %+v\n", resLbs)
}
}
}
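// invalidJSONBenchmark runs parser against a line that is not valid JSON
// and fails unless the error label is set to errJSON.
func invalidJSONBenchmark(b *testing.B, parser Stage) {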
b.ReportAllocs()
p := NewPipeline([]Stage{
mustFilter(NewFilter("invalid json", labels.MatchEqual)).ToStage(),
parser,
})
line := []byte(`invalid json`)
sp := p.ForStream(labels.Labels{})
b.ResetTimer() // reset after all setup so only Process is timed
for n := 0; n < b.N; n++ {
resLine, resLbs, resOK = sp.Process(line)
if !resOK {
b.Fatalf("resulting line not ok: %s\n", line)
}
if resLbs.Labels().Get(ErrorLabel) != errJSON {
b.Fatalf("no %s label found: %+v\n", ErrorLabel, resLbs.Labels())
}
}
}
func BenchmarkJSONParser(b *testing.B) {
jsonBenchmark(b, NewJSONParser())
}
func BenchmarkJSONParserInvalidLine(b *testing.B) {
invalidJSONBenchmark(b, NewJSONParser())
}
func BenchmarkJSONExpressionParser(b *testing.B) {
parser, err := NewJSONExpressionParser([]JSONExpression{
NewJSONExpr("context_file", "context.file"),
})
if err != nil {
b.Fatalf("cannot create new JSON expression parser: %s", err.Error())
}
jsonBenchmark(b, parser)
}
func BenchmarkJSONExpressionParserInvalidLine(b *testing.B) {
parser, err := NewJSONExpressionParser([]JSONExpression{
NewJSONExpr("context_file", "some.expression"),
})
if err != nil {
b.Fatalf("cannot create new JSON expression parser: %s", err.Error())
}
invalidJSONBenchmark(b, parser)
}
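
The benchmarks above assign the result of `sp.Process` to `resLine`, `resLbs`, and `resOK` with `=` rather than `:=`, so those are presumably package-level variables declared elsewhere in the file. Writing results to a package-level "sink" is a common Go idiom that keeps the compiler from optimising the measured call away. The sketch below shows the same shape on a trivial workload. `sink` and `containsBenchmark` are hypothetical names, and `strings.Contains` stands in for the pipeline, which is not reproduced here.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// sink is package-level so the result of the measured call stays observable
// and the call cannot be eliminated as dead code.
var sink bool

// containsBenchmark mirrors the structure of jsonBenchmark: report
// allocations, do the setup, reset the timer, then loop b.N times and
// validate each result.
func containsBenchmark(b *testing.B) {
	b.ReportAllocs()
	line := `{"error":"action could not be completed","context":{"file":"metrics.go"}}`
	b.ResetTimer()
	for n := 0; n < b.N; n++ {
		sink = strings.Contains(line, "metrics.go")
		if !sink {
			b.Fatalf("substring not found in line: %s", line)
		}
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside of `go test`.
	res := testing.Benchmark(containsBenchmark)
	fmt.Println(res.String(), res.MemString())
}
```

Sharing one helper between the JSON and JSON-expression parsers, as the diff does, means both are measured on the same line and the same label set, so their numbers are directly comparable.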

@@ -2193,6 +2193,40 @@ func TestParse(t *testing.T) {
},
},
},
{
in: `{app="foo"} | json bob="top.sub[\"index\"]"`,
exp: &pipelineExpr{
left: newMatcherExpr([]*labels.Matcher{{Type: labels.MatchEqual, Name: "app", Value: "foo"}}),
pipeline: MultiStageExpr{
newJSONExpressionParser([]log.JSONExpression{
log.NewJSONExpr("bob", `top.sub["index"]`),
}),
},
},
},
{
in: `{app="foo"} | json bob="top.params[0]"`,
exp: &pipelineExpr{
left: newMatcherExpr([]*labels.Matcher{{Type: labels.MatchEqual, Name: "app", Value: "foo"}}),
pipeline: MultiStageExpr{
newJSONExpressionParser([]log.JSONExpression{
log.NewJSONExpr("bob", `top.params[0]`),
}),
},
},
},
{
in: `{app="foo"} | json response_code="response.code", api_key="request.headers[\"X-API-KEY\"]"`,
exp: &pipelineExpr{
left: newMatcherExpr([]*labels.Matcher{{Type: labels.MatchEqual, Name: "app", Value: "foo"}}),
pipeline: MultiStageExpr{
newJSONExpressionParser([]log.JSONExpression{
log.NewJSONExpr("response_code", `response.code`),
log.NewJSONExpr("api_key", `request.headers["X-API-KEY"]`),
}),
},
},
},
} {
t.Run(tc.in, func(t *testing.T) {
ast, err := ParseExpr(tc.in)
