---
title: Upgrading
weight: 250
---
# Upgrading Grafana Loki

Every attempt is made to keep Grafana Loki backwards compatible, such that upgrades should be low risk and low friction.

Unfortunately Loki is software and software is hard and sometimes we are forced to make decisions between ease of use and ease of maintenance.

If we have any expectation of difficulty upgrading, we will document it here.

As more versions are released it becomes more likely unexpected problems arise moving between multiple versions at once.
If possible, try to stay current and do sequential updates. If you want to skip versions, try it in a development environment before attempting to upgrade production.

# Checking for config changes

Using docker you can check changes between two versions of Loki with a command like this:

```
export OLD_LOKI=2.3.0
export NEW_LOKI=2.4.1
export CONFIG_FILE=loki-local-config.yaml
diff --color=always --side-by-side <(docker run --rm -t -v "${PWD}":/config grafana/loki:${OLD_LOKI} -config.file=/config/${CONFIG_FILE} -print-config-stderr 2>&1 | sed '/Starting Loki/q' | tr -d '\r') <(docker run --rm -t -v "${PWD}":/config grafana/loki:${NEW_LOKI} -config.file=/config/${CONFIG_FILE} -print-config-stderr 2>&1 | sed '/Starting Loki/q' | tr -d '\r') | less -R
```

The `tr -d '\r'` is likely not necessary for most people; it works around Windows newline characters that WSL2 can sneak in.

The output is incredibly verbose as it shows the entire internal config struct used to run Loki; you can adjust the diff command if you prefer to only show changes or a different style of output.

## Main / Unreleased

### Promtail

#### The go build tag `promtail_journal_enabled` was introduced

The go build tag `promtail_journal_enabled` should be passed to include Journal support in the promtail binary.
If you need Journal support you will need to run go build with the tag `promtail_journal_enabled`:

```shell
go build --tags=promtail_journal_enabled ./clients/cmd/promtail
```
Introducing this tag aims to relieve Linux/CentOS users with CGO enabled from installing the libsystemd-dev/systemd-devel libraries if they don't need Journal support.

## 2.7.0

### Loki

#### Loki Canary Permission

The new `push` mode to [Loki canary](https://grafana.com/docs/loki/latest/operations/loki-canary/) can push logs that are generated by a Loki canary directly to a given Loki URL. Previously, it only wrote to a local file and you needed some agent, such as promtail, to scrape and push it to Loki.
So if you run Loki behind some proxy with different authorization policies to read and write to Loki, the auth credentials you pass to the Loki canary now need to have both `READ` and `WRITE` permissions.

#### Engine query timeout is deprecated

Previously, we had two configurations to define a query timeout: `engine.timeout` and `querier.query-timeout`.
As they were conflicting and `engine.timeout` isn't as expressive as `querier.query-timeout`, we're deprecating it in favor of relying on `querier.query-timeout` only.

#### `fifocache` has been renamed

The in-memory `fifocache` has been renamed to `embedded-cache`. This allows us to replace the implementation (currently a simple FIFO data structure) with something else in the future without causing confusion.
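If your configuration referenced the old cache, the renamed cache is configured in the same places. Below is a minimal sketch for caching query results; the field names `embedded_cache`, `enabled`, and `max_size_mb` are assumed from the 2.7 cache options, so verify them against the configuration reference for your version:

```yaml
# Sketch only: query results kept in the renamed in-memory cache.
query_range:
  cache_results: true
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100
```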
#### Evenly spread Memcached pods for chunks across kubernetes nodes

We now evenly spread memcached_chunks pods across the available kubernetes nodes, but allow more than one pod to be scheduled onto the same node.
If you want to run at most a single pod per node, set `$.memcached.memcached_chunks.use_topology_spread` to false.

While we attempt to schedule at most 1 memcached_chunks pod per Kubernetes node with the `topology_spread_max_skew: 1` field, if no more nodes are available then multiple pods will be scheduled on the same node.
This can potentially impact your service's reliability, so consider tuning these values according to your risk tolerance.

#### Evenly spread distributors across kubernetes nodes

We now evenly spread distributors across the available kubernetes nodes, but allow more than one distributor to be scheduled onto the same node.
If you want to run at most a single distributor per node, set `$._config.distributors.use_topology_spread` to false.

While we attempt to schedule at most 1 distributor per Kubernetes node with the `topology_spread_max_skew: 1` field, if no more nodes are available then multiple distributors will be scheduled on the same node.
This can potentially impact your service's reliability, so consider tuning these values according to your risk tolerance.

#### Evenly spread queriers across kubernetes nodes

We now evenly spread queriers across the available kubernetes nodes, but allow more than one querier to be scheduled onto the same node.
If you want to run at most a single querier per node, set `$._config.querier.use_topology_spread` to false.

While we attempt to schedule at most 1 querier per Kubernetes node with the `topology_spread_max_skew: 1` field, if no more nodes are available then multiple queriers will be scheduled on the same node.
This can potentially impact your service's reliability, so consider tuning these values according to your risk tolerance.

#### Default value for `server.http-listen-port` changed

This value now defaults to 3100, so the Loki process doesn't require special privileges. Previously, it had been set to port 80, which is a privileged port. If you need Loki to listen on port 80, you can set it back to the previous default using `-server.http-listen-port=80`.

#### docker-compose setup has been updated

The docker-compose [setup](https://github.com/grafana/loki/blob/main/production/docker) has been updated to **v2.6.0** and includes many improvements.

Notable changes include:
- authentication (multi-tenancy) is **enabled** by default; you can disable it in `production/docker/config/loki.yaml` by setting `auth_enabled: false`
- storage is now using Minio instead of local filesystem
  - move your current storage into `.data/minio` and it should work transparently
- log-generator was added - if you don't need it, simply remove the service from `docker-compose.yaml` or don't start the service

#### Configuration for deletes has changed

The global `deletion_mode` option in the compactor configuration moved to runtime configurations.

- The `deletion_mode` option needs to be removed from your compactor configuration
- The `deletion_mode` global override needs to be set to the desired mode: `disabled`, `filter-only`, or `filter-and-delete`. By default, `filter-and-delete` is enabled.
- Any `allow_delete` per-tenant overrides need to be removed or changed to `deletion_mode` overrides with the desired mode, as sketched below.
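For example, a minimal sketch of the new layout, assuming the global default is set as a limit and per-tenant modes live in the runtime configuration (overrides) file; the tenant ID `29` is only a placeholder:

```yaml
# Loki config (sketch): global default set via limits.
limits_config:
  deletion_mode: filter-and-delete
---
# Runtime configuration / overrides file (sketch): per-tenant mode,
# replacing any previous allow_delete override.
overrides:
  "29":
    deletion_mode: disabled
```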
#### Metric name for `loki_log_messages_total` changed

The name of this metric was changed to `loki_internal_log_messages_total` to reduce ambiguity. The previous name is still present but is deprecated.

#### Usage Report / Telemetry config has changed name

The configuration for anonymous usage statistics reporting to Grafana has changed from `usage_report` to `analytics`.

#### TLS `cipher_suites` and `tls_min_version` have moved

These were previously configurable under `server.http_tls_config` and `server.grpc_tls_config` separately. They are now under `server.tls_cipher_suites` and `server.tls_min_version`. These values are also now configurable for individual clients, for example: `distributor.ring.etcd` or `querier.ingester_client.grpc_client_config`.

#### Querier `query_timeout` default changed

The default value of `querier.query_timeout` has changed from `1m` to `0s`.

#### `ruler.storage.configdb` has been removed

ConfigDB was disallowed as a Ruler storage option back in 2.0. The config struct has finally been removed.

#### `ruler.remote_write.client` has been removed

You can no longer specify a remote write client for the ruler.

### Promtail

#### `gcp_push_target_parsing_errors_total` has a new `reason` label

The `gcp_push_target_parsing_errors_total` GCP Push Target metric has a new label named `reason`. This includes detail on what might have caused the parsing to fail.

#### Windows event logs: now correctly includes `user_data`

The contents of the `user_data` field were erroneously set to the same value as `event_data` in previous versions. This was fixed in [#7461](https://github.com/grafana/loki/pull/7461) and log queries relying on this broken behaviour may be impacted.

## 2.6.0

### Loki

#### Implementation of unwrapped `rate` aggregation changed

The implementation of the `rate()` aggregation function changed back to the previous implementation prior to [#5013](https://github.com/grafana/loki/pulls/5013). This means that the rate per second is calculated based on the sum of the extracted values, instead of the average increase over time.

If you want the extracted values to be treated as a [Counter](https://prometheus.io/docs/concepts/metric_types/#counter) metric, you should use the new `rate_counter()` aggregation function, which calculates the per-second average rate of increase of the vector.

#### Default value for `azure.container-name` changed

This value now defaults to `loki`; it was previously set to `cortex`. If you are relying on this container name for your chunks or ruler storage, you will have to manually specify `-azure.container-name=cortex` or `-ruler.storage.azure.container-name=cortex` respectively.

## 2.5.0

### Loki

#### `split_queries_by_interval` yaml configuration has moved

It was previously possible to define this value in two places:

```yaml
query_range:
  split_queries_by_interval: 10m
```

and/or

```yaml
limits_config:
  split_queries_by_interval: 10m
```

In 2.5.0 it can only be defined in the `limits_config` section. **Loki will fail to start if you do not remove the `split_queries_by_interval` config from the `query_range` section.**

Additionally, it has a new default value of `30m` rather than `0`.

The CLI flag is not changed and remains `querier.split-queries-by-interval`.

#### Dropped support for old Prometheus rules configuration format

Alerting rules previously could be specified in two formats: 1.x format (legacy one, named `v0` internally) and 2.x.
We decided to drop support for format `1.x` as it is fairly old and keeping support for it required a lot of code.
In case you're still using the legacy format, take a look at [Alerting Rules](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/) for instructions on how to write alerting rules in the new format.

For reference, the newer format follows a structure similar to the one below:

```yaml
groups:
  - name: example
    rules:
    - alert: HighErrorRate
      expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
      for: 10m
      labels:
        severity: page
      annotations:
        summary: High request latency
```

Meanwhile, the legacy format is a string in the following format:

```
ALERT <alert name>
  IF <expression>
  [ FOR <duration> ]
  [ LABELS <label set> ]
  [ ANNOTATIONS <label set> ]
```
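For readers migrating rules by hand, here is a sketch of how the hypothetical `HighErrorRate` rule above would have been written in the legacy format, next to its new-format equivalent:

```yaml
# Legacy (1.x) form, shown as comments for comparison:
#
#   ALERT HighErrorRate
#     IF job:request_latency_seconds:mean5m{job="myjob"} > 0.5
#     FOR 10m
#     LABELS { severity = "page" }
#     ANNOTATIONS { summary = "High request latency" }
#
# The same rule in the new (2.x) format: ALERT becomes `alert`, IF becomes
# `expr`, FOR becomes `for`, and LABELS / ANNOTATIONS become nested maps.
groups:
  - name: example
    rules:
    - alert: HighErrorRate
      expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
      for: 10m
      labels:
        severity: page
      annotations:
        summary: High request latency
```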