By default, this Helm chart configures meta-monitoring of metrics (service monitoring) and logs (self monitoring).
The `ServiceMonitor` resource works with either the Prometheus Operator or the Grafana Agent Operator, and defines how Loki's metrics should be scraped. Scraping this Loki cluster using the scrape config defined in the `ServiceMonitor` resource is required for the included dashboards to work. A `MetricsInstance` can be configured to write the metrics to a remote Prometheus instance such as Grafana Cloud Metrics.
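For example, a minimal `MetricsInstance` that remote-writes the scraped metrics to Grafana Cloud Metrics might look like the following sketch (the URL, secret name, and selector label are placeholders for your environment):

```yaml
apiVersion: monitoring.grafana.com/v1alpha1
kind: MetricsInstance
metadata:
  name: primary
  namespace: default
spec:
  # Forward everything scraped via the selected ServiceMonitors to a
  # remote Prometheus-compatible endpoint (placeholder URL).
  remoteWrite:
    - url: https://prometheus-us-central1.grafana.net/api/prom/push
      basicAuth:
        username:
          name: primary-credentials-metrics
          key: username
        password:
          name: primary-credentials-metrics
          key: password
  # Select the ServiceMonitor resources to scrape (label is illustrative).
  serviceMonitorSelector:
    matchLabels:
      instance: primary
```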
*Self monitoring* is enabled by default. This deploys the `GrafanaAgent`, `LogsInstance`, and `PodLogs` resources, which instruct the Grafana Agent Operator (installed separately) on how to scrape this Loki cluster's logs and send them back to itself. Scraping this Loki cluster using the scrape config defined in the `PodLogs` resource is required for the included dashboards to work.
Rules and alerts are automatically deployed.
- Helm 3 or above. See [Installing Helm](https://helm.sh/docs/intro/install/).
- A running Kubernetes cluster with a running Loki deployment.
- A running Grafana instance.
- A running Prometheus Operator installed using the `kube-prometheus-stack` Helm chart.
**Prometheus Operator Prerequisites**
The dashboards require certain metric labels to display Kubernetes metrics. The best way to accomplish this is to install the `kube-prometheus-stack` Helm chart with the following values file, replacing `<CLUSTER_NAME>` with the name of your cluster. The cluster name is what you specify during the Helm installation, so a cluster installed with the command `helm install loki-cluster grafana/loki` would be called `loki-cluster`.
```yaml
kubelet:
  serviceMonitor:
    cAdvisorRelabelings:
      - action: replace
        replacement: <CLUSTER_NAME>
        targetLabel: cluster
      - targetLabel: metrics_path
        sourceLabels:
          - "__metrics_path__"
      - targetLabel: "instance"
        sourceLabels:
          - "node"

defaultRules:
  additionalRuleLabels:
    cluster: <CLUSTER_NAME>

"kube-state-metrics":
  prometheus:
    monitor:
      relabelings:
        - action: replace
          replacement: <CLUSTER_NAME>
          targetLabel: cluster
        - targetLabel: "instance"
          sourceLabels:
            - "__meta_kubernetes_pod_node_name"

"prometheus-node-exporter":
  prometheus:
    monitor:
      relabelings:
        - action: replace
          replacement: <CLUSTER_NAME>
          targetLabel: cluster
        - targetLabel: "instance"
          sourceLabels:
            - "__meta_kubernetes_pod_node_name"

prometheus:
  monitor:
    relabelings:
      - action: replace
        replacement: <CLUSTER_NAME>
        targetLabel: cluster
```
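Assuming the values above are saved as `kube-prometheus-stack-values.yaml` (an illustrative filename), the chart could be installed with something like:

```bash
# Add the prometheus-community repo and install kube-prometheus-stack
# with the relabeling values shown above.
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack \
  -f kube-prometheus-stack-values.yaml
```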
The `kube-prometheus-stack` chart installs `ServiceMonitor` and `PrometheusRule` resources for monitoring Kubernetes, and it depends on the `kube-state-metrics` and `prometheus-node-exporter` Helm charts, which also install `ServiceMonitor` resources for collecting `kubelet` and `node-exporter` metrics. The values file above adds the additional labels required for these metrics to work with the included dashboards.

If you are using this Helm chart in an environment which does not allow for the installation of `kube-prometheus-stack` or custom CRDs, you should run `helm template` on the `kube-prometheus-stack` Helm chart with the above values file and review all generated `ServiceMonitor` and `PrometheusRule` resources. These resources may have to be modified with the correct ports and selectors to find the various services such as `kubelet` and `node-exporter` in your environment.
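For example, the manifests could be rendered for review with something like this (the output filename is illustrative):

```bash
# Render the chart locally so the generated ServiceMonitor and
# PrometheusRule resources can be reviewed and adjusted before applying.
helm template prometheus prometheus-community/kube-prometheus-stack \
  -f kube-prometheus-stack-values.yaml > kube-prometheus-stack-rendered.yaml
```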
**To install the dashboards:**
1. Dashboards are enabled by default. Set `monitoring.dashboards.namespace` to the namespace of the Grafana instance if it is in a different namespace than this Loki cluster.
1. Dashboards must be mounted to your Grafana container. The dashboards are in `ConfigMap`s named `loki-dashboards-1` and `loki-dashboards-2` for Loki, and `enterprise-logs-dashboards-1` and `enterprise-logs-dashboards-2` for GEL. Mount them to `/var/lib/grafana/dashboards/loki-1` and `/var/lib/grafana/dashboards/loki-2` in your Grafana container (see the sketch after this list).
1. Create a dashboard provisioning file called `dashboards.yaml` in `/etc/grafana/provisioning/dashboards` of your Grafana container with the following contents (_note_: you may need to edit the `orgId`):
```yaml
---
apiVersion: 1
providers:
  - disableDeletion: true
    editable: false
    folder: Loki
    name: loki-1
    options:
      path: /var/lib/grafana/dashboards/loki-1
    orgId: 1
    type: file
  - disableDeletion: true
    editable: false
    folder: Loki
    name: loki-2
    options:
      path: /var/lib/grafana/dashboards/loki-2
    orgId: 1
    type: file
```
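For the mounting step above, here is a sketch of the relevant parts of a Grafana pod spec, assuming the Loki `ConfigMap` names listed earlier (only the volume wiring is shown):

```yaml
# Illustrative excerpt of a Grafana Deployment's pod template.
volumes:
  - name: loki-dashboards-1
    configMap:
      name: loki-dashboards-1
  - name: loki-dashboards-2
    configMap:
      name: loki-dashboards-2
containers:
  - name: grafana
    volumeMounts:
      - name: loki-dashboards-1
        mountPath: /var/lib/grafana/dashboards/loki-1
      - name: loki-dashboards-2
        mountPath: /var/lib/grafana/dashboards/loki-2
```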
**To add additional Prometheus rules:**
1. Modify the configuration file `values.yaml`. Additional rule groups go under `monitoring.rules.additionalGroups`, for example:

```yaml
monitoring:
  rules:
    additionalGroups:
      - name: additional-loki-rules
        rules:
          - record: node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate
            expr: sum(rate(container_cpu_usage_seconds_total[1m])) by (node, namespace, pod, container)
```
**To disable monitoring:**
1. Modify the configuration file `values.yaml`:
```yaml
monitoring:
  selfMonitoring:
    enabled: false
  serviceMonitor:
    enabled: false
```
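Then roll out the change with a standard upgrade, for example (the release name is illustrative):

```bash
helm upgrade loki grafana/loki -f values.yaml
```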
**To use a remote Prometheus and Loki instance such as Grafana Cloud:**
1. Create a `secret.yaml` file containing the credentials for the remote instances:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: primary-credentials-metrics
  namespace: default
stringData:
  username: "<instanceID>"
  password: "<APIkey>"
---
apiVersion: v1
kind: Secret
metadata:
  name: primary-credentials-logs
  namespace: default
stringData:
  username: "<instanceID>"
  password: "<APIkey>"
```
2. Add the secret to Kubernetes with `kubectl create -f secret.yaml`.
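With the secrets in place, the chart can be pointed at the remote endpoints. Here is a sketch of the relevant `values.yaml` settings, assuming this chart's `monitoring.serviceMonitor.metricsInstance.remoteWrite` and `monitoring.selfMonitoring.logsInstance.clients` values (the URLs are placeholders):

```yaml
monitoring:
  serviceMonitor:
    metricsInstance:
      remoteWrite:
        - url: <prometheus-remote-write-url>
          basicAuth:
            username:
              name: primary-credentials-metrics
              key: username
            password:
              name: primary-credentials-metrics
              key: password
  selfMonitoring:
    logsInstance:
      clients:
        - url: <loki-push-url>
          basicAuth:
            username:
              name: primary-credentials-logs
              key: username
            password:
              name: primary-credentials-logs
              key: password
```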
Entries should be ordered as follows:
- [CHANGE]
- [FEATURE]
- [ENHANCEMENT]
- [BUGFIX]
Entries should include a reference to the pull request that introduced the change.
## 4.4.1
- [BUGFIX] Fix a few problems with the included dashboards and allow the rules to be created in a different namespace (which may be necessary based on how your Prometheus Operator is deployed).
## 4.1.1
- [FEATURE] Added the `loki.runtimeConfig` Helm value to provide a reloadable runtime configuration.
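A sketch of how the new value might be used, assuming Loki's standard runtime-config format (the tenant and limit are illustrative):

```yaml
loki:
  runtimeConfig:
    # Per-tenant overrides are reloaded by Loki without a restart.
    overrides:
      tenant1:
        ingestion_rate_mb: 10
```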
"expr":"topk(5, sum by (name,level) (rate(promtail_custom_bad_words_total{cluster=\"$cluster\", exported_namespace=\"$namespace\"}[$__interval])) - \nsum by (name,level) (rate(promtail_custom_bad_words_total{cluster=\"$cluster\", exported_namespace=\"$namespace\"}[$__interval] offset 1h)))",
"legendFormat":"{{name}}-{{level}}",
"refId":"A"
}
],
"thresholds":[],
"timeFrom":null,
"timeRegions":[],
"timeShift":null,
"title":"Bad Words",
"tooltip":{
"shared":true,
"sort":2,
"value_type":"individual"
},
"type":"graph",
"xaxis":{
"buckets":null,
"mode":"time",
"name":null,
"show":true,
"values":[]
},
"yaxes":[
{
"format":"short",
"label":null,
"logBase":1,
"max":null,
"min":null,
"show":true
},
{
"format":"short",
"label":null,
"logBase":1,
"max":null,
"min":null,
"show":true
}
],
"yaxis":{
"align":false,
"alignLevel":null
}
},
{
"aliasColors":{},
"bars":false,
@ -663,17 +567,17 @@
"steppedLine":false,
"targets":[
{
"expr":"histogram_quantile(0.99, sum by (le) (job_route:loki_request_duration_seconds_bucket:sum_rate{job=~\"($namespace)/cortex-gw\", route=~\"api_prom_push|loki_api_v1_push\", cluster=~\"$cluster\"})) * 1e3",
"expr":"histogram_quantile(0.99, sum by (le) (job_route:loki_request_duration_seconds_bucket:sum_rate{job=~\"($namespace)/(loki|enterprise-logs)-write\", route=~\"api_prom_push|loki_api_v1_push\", cluster=~\"$cluster\"})) * 1e3",
"legendFormat":".99",
"refId":"A"
},
{
"expr":"histogram_quantile(0.75, sum by (le) (job_route:loki_request_duration_seconds_bucket:sum_rate{job=~\"($namespace)/cortex-gw\", route=~\"api_prom_push|loki_api_v1_push\", cluster=~\"$cluster\"})) * 1e3",
"expr":"histogram_quantile(0.75, sum by (le) (job_route:loki_request_duration_seconds_bucket:sum_rate{job=~\"($namespace)/(loki|enterprise-logs)-write\", route=~\"api_prom_push|loki_api_v1_push\", cluster=~\"$cluster\"})) * 1e3",
"legendFormat":".9",
"refId":"B"
},
{
"expr":"histogram_quantile(0.5, sum by (le) (job_route:loki_request_duration_seconds_bucket:sum_rate{job=~\"($namespace)/cortex-gw\", route=~\"api_prom_push|loki_api_v1_push\", cluster=~\"$cluster\"})) * 1e3",
"expr":"histogram_quantile(0.5, sum by (le) (job_route:loki_request_duration_seconds_bucket:sum_rate{job=~\"($namespace)/(loki|enterprise-logs)-write\", route=~\"api_prom_push|loki_api_v1_push\", cluster=~\"$cluster\"})) * 1e3",
The chart also guards generated resource names against the Kubernetes 253-character limit:

```
{{- if gt (len $resourceName) 253 -}}
{{- printf "Resource name (%s) exceeds kubernetes limit of 253 characters. To fix: shorten release name if this will be a fresh install or shorten zone names (e.g. \"a\" instead of \"zone-a\") if using zone-awareness." $resourceName | fail -}}
{{- end -}}
```