--- title: Scraping --- # Promtail Scraping (Service Discovery) ## File Target Discovery Promtail discovers locations of log files and extract labels from them through the `scrape_configs` section in the config YAML. The syntax is identical to what [Prometheus uses](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config). `scrape_configs` contains one or more entries which are executed for each discovered target (i.e., each container in each new pod running in the instance): ``` scrape_configs: - job_name: local static_configs: - ... - job_name: kubernetes kubernetes_sd_config: - ... ``` If more than one scrape config section matches your logs, you will get duplicate entries as the logs are sent in different streams likely with slightly different labels. There are different types of labels present in Promtail: - Labels starting with `__` (two underscores) are internal labels. They usually come from dynamic sources like service discovery. Once relabeling is done, they are removed from the label set. To persist internal labels so they're sent to Grafana Loki, rename them so they don't start with `__`. See [Relabeling](#relabeling) for more information. - Labels starting with `__meta_kubernetes_pod_label_*` are "meta labels" which are generated based on your Kubernetes pod's labels. For example, if your Kubernetes pod has a label `name` set to `foobar`, then the `scrape_configs` section will receive an internal label `__meta_kubernetes_pod_label_name` with a value set to `foobar`. - Other labels starting with `__meta_kubernetes_*` exist based on other Kubernetes metadata, such as the namespace of the pod (`__meta_kubernetes_namespace`) or the name of the container inside the pod (`__meta_kubernetes_pod_container_name`). Refer to [the Prometheus docs](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config) for the full list of Kubernetes meta labels. - The `__path__` label is a special label which Promtail uses after discovery to figure out where the file to read is located. Wildcards are allowed, for example `/var/log/*.log` to get all files with a `log` extension in the specified directory, and `/var/log/**/*.log` for matching files and directories recursively. For a full list of options check out the docs for the [library](https://github.com/bmatcuk/doublestar) Promtail uses. - The `__path_exclude__` label is another special label Promtail uses after discovery, to exclude a subset of the files discovered using `__path__` from being read in the current scrape_config block. It uses the same [library](https://github.com/bmatcuk/doublestar) to enable usage of wildcards and glob patterns. - The label `filename` is added for every file found in `__path__` to ensure the uniqueness of the streams. It is set to the absolute path of the file the line was read from. ### Kubernetes Discovery Note that while Promtail can utilize the Kubernetes API to discover pods as targets, it can only read log files from pods that are running on the same node as the one Promtail is running on. Promtail looks for a `__host__` label on each target and validates that it is set to the same hostname as Promtail's (using either `$HOSTNAME` or the hostname reported by the kernel if the environment variable is not set). This means that any time Kubernetes service discovery is used, there must be a `relabel_config` that creates the intermediate label `__host__` from `__meta_kubernetes_pod_node_name`: ```yaml relabel_configs: - source_labels: ['__meta_kubernetes_pod_node_name'] target_label: '__host__' ``` See [Relabeling](#relabeling) for more information. For more information on how to configure the service discovery see the [Kubernetes Service Discovery configuration](../configuration/#kubernetes_sd_config). ## Journal Scraping (Linux Only) On systems with `systemd`, Promtail also supports reading from the journal. Unlike file scraping which is defined in the `static_configs` stanza, journal scraping is defined in a `journal` stanza: ```yaml scrape_configs: - job_name: journal journal: json: false max_age: 12h path: /var/log/journal matches: _TRANSPORT=kernel labels: job: systemd-journal relabel_configs: - source_labels: ['__journal__systemd_unit'] target_label: 'unit' ``` All fields defined in the `journal` section are optional, and are just provided here for reference. The `max_age` field ensures that no older entry than the time specified will be sent to Loki; this circumvents "entry too old" errors. The `path` field tells Promtail where to read journal entries from. The labels map defines a constant list of labels to add to every journal entry that Promtail reads. The `matches` field adds journal filters. If multiple filters are specified matching different fields, the log entries are filtered by both, if two filters apply to the same field, then they are automatically matched as alternatives. When the `json` field is set to `true`, messages from the journal will be passed through the pipeline as JSON, keeping all of the original fields from the journal entry. This is useful when you don't want to index some fields but you still want to know what values they contained. By default, Promtail reads from the journal by looking in the `/var/log/journal` and `/run/log/journal` paths. If running Promtail inside of a Docker container, the path appropriate to your distribution should be bind mounted inside of Promtail along with binding `/etc/machine-id`. Bind mounting `/etc/machine-id` to the path of the same name is required for the journal reader to know which specific journal to read from. For example: ```bash docker run \ -v /var/log/journal/:/var/log/journal/ \ -v /run/log/journal/:/run/log/journal/ \ -v /etc/machine-id:/etc/machine-id \ grafana/promtail:latest \ -config.file=/path/to/config/file.yaml ``` When Promtail reads from the journal, it brings in all fields prefixed with `__journal_` as internal labels. Like in the example above, the `_SYSTEMD_UNIT` field from the journal was transformed into a label called `unit` through `relabel_configs`. See [Relabeling](#relabeling) for more information, also look at [the systemd man pages](https://www.freedesktop.org/software/systemd/man/systemd.journal-fields.html) for a list of fields exposed by the journal. Here's an example where the `SYSTEMD_UNIT`, `HOSTNAME`, and `SYSLOG_IDENTIFIER` are relabeled for use in Loki. Keep in mind that labels prefixed with `__` will be dropped, so relabeling is required to keep these labels. ```yaml - job_name: systemd-journal journal: labels: cluster: ops-tools1 job: default/systemd-journal path: /var/log/journal relabel_configs: - source_labels: - __journal__systemd_unit target_label: systemd_unit - source_labels: - __journal__hostname target_label: nodename - source_labels: - __journal_syslog_identifier target_label: syslog_identifier ``` ## Windows Event Log On Windows Promtail supports reading from the event log. Windows event targets can be configured using the `windows_events` stanza: ```yaml scrape_configs: - job_name: windows windows_events: use_incoming_timestamp: false bookmark_path: "./bookmark.xml" eventlog_name: "Application" xpath_query: '*' labels: job: windows relabel_configs: - source_labels: ['computer'] target_label: 'host' ``` When Promtail receives an event it will attach the `channel` and `computer` labels and serialize the event in json. You can relabel default labels via [Relabeling](#relabeling) if required. Providing a path to a bookmark is mandatory, it will be used to persist the last event processed and allow resuming the target without skipping logs. see the [configuration](https://grafana.com/docs/loki/latest/clients/promtail/configuration/#windows_events) section for more information. ## GCP Log scraping Promtail supports scraping cloud resource logs such as GCS bucket logs, load balancer logs, and Kubernetes cluster logs from GCP. Configuration is specified in the `gcplog` section, within `scrape_config`. There are two kind of scraping strategies: `pull` and `push`. ### Pull ```yaml - job_name: gcplog gcplog: subscription_type: "pull" # If the `subscription_type` field is empty, defaults to `pull` project_id: "my-gcp-project" subscription: "my-pubsub-subscription" use_incoming_timestamp: false # default rewrite timestamps. labels: job: "gcplog" relabel_configs: - source_labels: ['__gcp_resource_type'] target_label: 'resource_type' - source_labels: ['__gcp_resource_labels_project_id'] target_label: 'project' ``` Here `project_id` and `subscription` are the only required fields. - `project_id` is the GCP project id. - `subscription` is the GCP pubsub subscription where Promtail can consume log entries from. Before using `gcplog` target, GCP should be [configured](../gcplog-cloud) with pubsub subscription to receive logs from. It also supports `relabeling` and `pipeline` stages just like other targets. When Promtail receives GCP logs, various internal labels are made available for [relabeling](#relabeling): - `__gcp_logname` - `__gcp_resource_type` - `__gcp_resource_labels_` In the example above, the `project_id` label from a GCP resource was transformed into a label called `project` through `relabel_configs`. ### Push ```yaml - job_name: gcplog gcplog: subscription_type: "push" use_incoming_timestamp: false labels: job: "gcplog-push" server: http_listen_address: 0.0.0.0 http_listen_port: 8080 relabel_configs: - source_labels: ['__gcp_message_id'] target_label: 'message_id' - source_labels: ['__gcp_attributes_logging_googleapis_com_timestamp'] target_label: 'incoming_ts' ``` When configuring the GCP Log push target, Promtail will start an HTTP server listening on port `8080`, as configured in the `server` section. This server exposes the single endpoint `POST /gcp/api/v1/push`, responsible for receiving logs from GCP. For Google's PubSub to be able to send logs, **Promtail server must be publicly accessible, and support HTTPS**. For that, Promtail can be deployed as part of a larger orchestration service like Kubernetes, which can handle HTTPS traffic through an ingress, or it can be hosted behind a proxy/gateway, offloading the HTTPS to that component and routing the request to Promtail. Once that's solved, GCP can be [configured](../gcplog-cloud) to send logs to Promtail. It also supports `relabeling` and `pipeline` stages. When Promtail receives GCP logs, various internal labels are made available for [relabeling](#relabeling): - `__gcp_message_id` - `__gcp_subscription_name` - `__gcp_attributes_` - `__gcp_logname` - `__gcp_resource_type` - `__gcp_resource_labels_` In the example above, the `__gcp_message_id` and the `__gcp_attributes_logging_googleapis_com_timestamp` labels are transformed to `message_id` and `incoming_ts` through `relabel_configs`. All other internal labels, for example some other attribute, will be dropped by the target if not transformed. ## Syslog Receiver Promtail supports receiving [IETF Syslog (RFC5424)](https://tools.ietf.org/html/rfc5424) messages from a TCP or UDP stream. Receiving syslog messages is defined in a `syslog` stanza: ```yaml scrape_configs: - job_name: syslog syslog: listen_address: 0.0.0.0:1514 listen_protocol: tcp idle_timeout: 60s label_structured_data: yes labels: job: "syslog" relabel_configs: - source_labels: ['__syslog_message_hostname'] target_label: 'host' ``` The only required field in the syslog section is the `listen_address` field, where a valid network address must be provided. The default protocol for receiving messages is TCP. To change the protocol, the `listen_protocol` field can be changed to `udp`. Note, that UDP does not support TLS. The `idle_timeout` can help with cleaning up stale syslog connections. If `label_structured_data` is set, [structured data](https://tools.ietf.org/html/rfc5424#section-6.3) in the syslog header will be translated to internal labels in the form of `__syslog_message_sd__`. The labels map defines a constant list of labels to add to every journal entry that Promtail reads. Note that it is recommended to deploy a dedicated syslog forwarder like **syslog-ng** or **rsyslog** in front of Promtail. The forwarder can take care of the various specifications and transports that exist (UDP, BSD syslog, ...). See recommended output configurations for [syslog-ng](#syslog-ng-output-configuration) and [rsyslog](#rsyslog-output-configuration). When Promtail receives syslog messages, it brings in all header fields, parsed from the received message, prefixed with `__syslog_` as internal labels. Like in the example above, the `__syslog_message_hostname` field from the journal was transformed into a label called `host` through `relabel_configs`. See [Relabeling](#relabeling) for more information. ### Syslog-NG Output Configuration ``` destination d_loki { syslog("localhost" transport("tcp") port()); }; ``` ### Rsyslog Output Configuration For sending messages via TCP: ``` *.* action(type="omfwd" protocol="tcp" target="" port="" Template="RSYSLOG_SyslogProtocol23Format" TCP_Framing="octet-counted" KeepAlive="on") ``` For sending messages via UDP: ``` *.* action(type="omfwd" protocol="udp" target="" port="" Template="RSYSLOG_SyslogProtocol23Format") ``` ## Kafka Promtail supports reading message from Kafka using a consumer group. The Kafka targets can be configured using the `kafka` stanza: ```yaml scrape_configs: - job_name: kafka kafka: brokers: - my-kafka-0.org:50705 - my-kafka-1.org:50705 topics: - ^promtail.* - some_fixed_topic labels: job: kafka relabel_configs: - action: replace source_labels: - __meta_kafka_topic target_label: topic - action: replace source_labels: - __meta_kafka_partition target_label: partition - action: replace source_labels: - __meta_kafka_group_id target_label: group - action: replace source_labels: - __meta_kafka_message_key target_label: message_key ``` Only the `brokers` and `topics` is required. see the [configuration](../../configuration/#kafka) section for more information. ## GELF GELF support in Promtail is an experimental feature. Promtail supports listening message using the [GELF](https://docs.graylog.org/docs/gelf) UDP protocol. The GELF targets can be configured using the `gelf` stanza: ```yaml scrape_configs: - job_name: gelf gelf: listen_address: "0.0.0.0:12201" use_incoming_timestamp: true labels: job: gelf relabel_configs: - action: replace source_labels: - __gelf_message_host target_label: host - action: replace source_labels: - __gelf_message_level target_label: level - action: replace source_labels: - __gelf_message_facility target_label: facility ``` ## Cloudflare Promtail supports pulling HTTP log messages from Cloudflare using the [Logpull API](https://developers.cloudflare.com/logs/logpull). The Cloudflare targets can be configured with a `cloudflare` block: ```yaml scrape_configs: - job_name: cloudflare cloudflare: api_token: REDACTED zone_id: REDACTED fields_type: all labels: job: cloudflare-foo.com ``` Only `api_token` and `zone_id` are required. Refer to the [Cloudfare](configuration/#cloudflare) configuration section for details. ## Heroku Drain Promtail supports receiving logs from a Heroku application by using a [Heroku HTTPS Drain](https://devcenter.heroku.com/articles/log-drains#https-drains). Configuration is specified in a`heroku_drain` block within the Promtail `scrape_config` configuration. ```yaml - job_name: heroku_drain heroku_drain: server: http_listen_address: 0.0.0.0 http_listen_port: 8080 labels: job: heroku_drain_docs use_incoming_timestamp: true relabel_configs: - source_labels: ['__heroku_drain_host'] target_label: 'host' - source_labels: ['__heroku_drain_app'] target_label: 'app' - source_labels: ['__heroku_drain_proc'] target_label: 'proc' - source_labels: ['__heroku_drain_log_id'] target_label: 'log_id' ``` Within the `scrape_configs` configuration for a Heroku Drain target, the `job_name` must be a Prometheus-compatible [metric name](https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels). The [server](../configuration.md#server) section configures the HTTP server created for receiving logs. `labels` defines a static set of label values added to each received log entry. `use_incoming_timestamp` can be used to pass the timestamp received from Heroku. Before using a `heroku_drain` target, Heroku should be configured with the URL where the Promtail instance will be listening. Follow the steps in [Heroku HTTPS Drain docs](https://devcenter.heroku.com/articles/log-drains#https-drains) for using the Heroku CLI with a command like the following: ``` heroku drains:add [http|https]://HOSTNAME:8080/heroku/api/v1/drain -a HEROKU_APP_NAME ``` It also supports `relabeling` and `pipeline` stages just like other targets. When Promtail receives Heroku Drain logs, various internal labels are made available for [relabeling](#relabeling): - `__heroku_drain_host` - `__heroku_drain_app` - `__heroku_drain_proc` - `__heroku_drain_log_id` In the example above, the `project_id` label from a GCP resource was transformed into a label called `project` through `relabel_configs`. ## Relabeling Each `scrape_configs` entry can contain a `relabel_configs` stanza. `relabel_configs` is a list of operations to transform the labels from discovery into another form. A single entry in `relabel_configs` can also reject targets by doing an `action: drop` if a label value matches a specified regex. When a target is dropped, the owning `scrape_config` will not process logs from that particular source. Other `scrape_configs` without the drop action reading from the same target may still use and forward logs from it to Loki. A common use case of `relabel_configs` is to transform an internal label such as `__meta_kubernetes_*` into an intermediate internal label such as `__service__`. The intermediate internal label may then be dropped based on value or transformed to a final external label, such as `__job__`. ### Examples - Drop the target if a label (`__service__` in the example) is empty: ```yaml - action: drop regex: '' source_labels: - __service__ ``` - Drop the target if any of the `source_labels` contain a value: ```yaml - action: drop regex: .+ separator: '' source_labels: - __meta_kubernetes_pod_label_name - __meta_kubernetes_pod_label_app ``` - Persist an internal label by renaming it so it will be sent to Loki: ```yaml - action: replace source_labels: - __meta_kubernetes_namespace target_label: namespace ``` - Persist all Kubernetes pod labels by mapping them, like by mapping `__meta_kube__meta_kubernetes_pod_label_foo` to `foo`. ```yaml - action: labelmap regex: __meta_kubernetes_pod_label_(.+) ``` Additional reading: - [Julien Pivotto's slides from PromConf Munich, 2017](https://www.slideshare.net/roidelapluie/taking-advantage-of-prometheus-relabeling-109483749) ## HTTP client options Promtail uses the Prometheus HTTP client implementation for all calls to Loki. Therefore it can be configured using the `clients` stanza, where one or more connections to Loki can be established: ```yaml clients: - [ ] ``` Refer to [`client_config`](./configuration#client_config) from the Promtail Configuration reference for all available options.