# Processing Log Lines

A detailed look at how to set up promtail to process your log lines, including extracting metrics and labels.

* [Pipeline](#pipeline)
* [Stages](#stages)

## Pipeline

Pipeline stages implement the following interface:

```go
type Stage interface {
	Process(labels model.LabelSet, extracted map[string]interface{}, time *time.Time, entry *string)
}
```

Any Stage is capable of modifying the `labels`, `extracted` data, `time`, and/or `entry`, though generally a Stage should only modify one of those things to reduce complexity.

Typical pipelines will start with a [regex](#regex) or [json](#json) stage to extract data from the log line. Then any combination of other stages follow to use the data in the `extracted` map. It may also be common to see [match](#match) at the start of a pipeline to selectively apply stages based on labels.

The example below gives a good glimpse of what you can achieve with a pipeline:

```yaml
scrape_configs:
- job_name: kubernetes-pods-name
  kubernetes_sd_configs: ....
  pipeline_stages:
  - match:
      selector: '{name="promtail"}'
      stages:
      - regex:
          expression: '.*level=(?P<level>[a-zA-Z]+).*ts=(?P<timestamp>[T\d-:.Z]*).*component=(?P<component>[a-zA-Z]+)'
      - labels:
          level:
          component:
      - timestamp:
          format: RFC3339Nano
          source: timestamp
  - match:
      selector: '{name="nginx"}'
      stages:
      - regex:
          expression: \w{1,3}.\w{1,3}.\w{1,3}.\w{1,3}(?P<output>.*)
      - output:
          source: output
  - match:
      selector: '{name="jaeger-agent"}'
      stages:
      - json:
          expressions:
            level: level
      - labels:
          level:
- job_name: kubernetes-pods-app
  kubernetes_sd_configs: ....
  pipeline_stages:
  - match:
      selector: '{app=~"grafana|prometheus"}'
      stages:
      - regex:
          expression: ".*(lvl|level)=(?P<level>[a-zA-Z]+).*(logger|component)=(?P<component>[a-zA-Z]+)"
      - labels:
          level:
          component:
  - match:
      selector: '{app="some-app"}'
      stages:
      - regex:
          expression: ".*(?P<panic>panic: .*)"
      - metrics:
        - panic_total:
            type: Counter
            description: "total count of panic"
            source: panic
            config:
              action: inc
```

In the first job:

The first `match` stage will only run if a label named `name` has the value `promtail`; it then applies a regex to parse the line, sets two labels (`level` and `component`), and sets the timestamp from the extracted data.

The second `match` stage will only run if a label named `name` has the value `nginx`; it then parses the log line with a regex and extracts `output`, which is then set as the log line sent to Loki.

The third `match` stage will only run if a label named `name` has the value `jaeger-agent`; it then parses the log line as JSON, extracting `level`, which is then set as a label.

In the second job:

The first `match` stage will only run if a label named `app` has the value `grafana` or `prometheus`; it then parses the log line with a regex and sets two new labels, `level` and `component`, from the extracted data.

The second `match` stage will only run if a label named `app` has the value `some-app`; it then parses the log line and creates an extracted key named `panic` if it finds `panic: ` in the log line. A metrics stage then increments a counter if the key `panic` is found in the `extracted` map.
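Each of the YAML stages above is backed by an implementation of the `Stage` interface. As a rough illustration (a hypothetical toy stage, not code from promtail's source), the sketch below promotes a value from the `extracted` map to a label, much as the built-in `labels` stage does; it assumes the `model.LabelSet` type from `github.com/prometheus/common/model`:

```go
package main

import (
	"fmt"
	"time"

	"github.com/prometheus/common/model"
)

// Stage is the interface shown above.
type Stage interface {
	Process(labels model.LabelSet, extracted map[string]interface{}, time *time.Time, entry *string)
}

// levelLabelStage is a hypothetical stage that copies extracted["level"]
// into the "level" label, leaving time and entry untouched (per the
// "modify only one thing" guideline).
type levelLabelStage struct{}

func (levelLabelStage) Process(labels model.LabelSet, extracted map[string]interface{}, t *time.Time, entry *string) {
	if v, ok := extracted["level"].(string); ok {
		labels[model.LabelName("level")] = model.LabelValue(v)
	}
}

func main() {
	labels := model.LabelSet{"name": "promtail"}
	extracted := map[string]interface{}{"level": "info"}
	now := time.Now()
	entry := `level=info msg="hello"`

	var s Stage = levelLabelStage{}
	s.Process(labels, extracted, &now, &entry)
	fmt.Println(labels) // {level="info", name="promtail"}
}
```

Because `labels` is a map and `time`/`entry` are pointers, a stage can mutate any of the four in place, and later stages in the pipeline see the modified values.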
More info on each field in the interface:

##### labels

A set of Prometheus-style labels which will be sent with the log line and will be indexed by Loki.

##### extracted

Metadata extracted during pipeline execution which can be used by subsequent stages. This data is not sent with the logs and is dropped after the log entry is processed through the pipeline.

For example, stages like [regex](#regex) and [json](#json) use expressions to extract data from a log line and store it in the `extracted` map, which following stages like [timestamp](#timestamp) or [output](#output) can use to manipulate the log line's `time` and `entry`.

##### time

The timestamp which Loki will store for the log line. If not set within the pipeline using the [timestamp](#timestamp) stage, it defaults to time.Now().

##### entry

The log line which will be stored by Loki. The [output](#output) stage is capable of modifying this value; if no stage modifies it, the log line stored will match what was input to the system.

## Stages

Extracting data (for use by other stages)

* [regex](#regex) - use regex to extract data
* [json](#json) - parse a JSON log and extract data

Modifying extracted data

* [template](#template) - use Go templates to modify extracted data

Filtering stages

* [match](#match) - apply selectors to conditionally run stages based on labels

Mutating/manipulating output

* [timestamp](#timestamp) - set the timestamp sent to Loki
* [output](#output) - set the log content sent to Loki

Adding Labels

* [labels](#labels) - add labels to the log stream

Metrics

* [metrics](#metrics) - calculate metrics from the log content

### regex

A regex stage will take the provided regex and set the named groups as data in the `extracted` map.

```yaml
- regex:
    expression: ①
    source: ②
```

① `expression` is **required** and needs to be a [golang RE2 regex string](https://github.com/google/re2/wiki/Syntax). Every capture group `(re)` will be set into the `extracted` map, and every capture group **must be named:** `(?P<name>re)`; the name is used as the key in the map.

② `source` is optional and contains the name of the key in the `extracted` map containing the data to parse. If omitted, the regex stage parses the log `entry`.

##### Example (without source):

```yaml
- regex:
    expression: "^(?s)(?P
```
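The named-capture behaviour described above maps directly onto Go's standard `regexp` package, which implements RE2. The following standalone sketch (an illustration under that assumption, not promtail's actual implementation; the expression and log line are invented) shows how each named group becomes a key, mirroring how a regex stage populates the `extracted` map:

```go
package main

import (
	"fmt"
	"regexp"
)

func main() {
	// Every capture group must be named: (?P<name>re).
	expr := regexp.MustCompile(`.*level=(?P<level>[a-zA-Z]+).*component=(?P<component>[a-zA-Z]+)`)
	line := `ts=2019-01-01T01:00:00Z level=info component=server msg="listening"`

	// Collect named groups into a map, mirroring the extracted map.
	extracted := map[string]interface{}{}
	if match := expr.FindStringSubmatch(line); match != nil {
		for i, name := range expr.SubexpNames() {
			if name != "" { // index 0 is the full match and has no name
				extracted[name] = match[i]
			}
		}
	}
	fmt.Println(extracted) // map[component:server level:info]
}
```

Unnamed groups are skipped in this sketch, which is consistent with the stage's requirement that every capture group you want extracted be written as `(?P<name>re)`.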