Loki cloud integration instructions (and necessary mixin changes) (#8492)

**What this PR does / why we need it**:

This PR is a first stab at documentation for setting up the Grafana
Cloud Loki integration to monitor a self-hosted Loki cluster installed
in Kubernetes with the Helm chart.

In addition to instructions on how to collect the necessary Kubernetes
metrics, there were also some issues with the Loki mixin (which provides
the dashboards, alerts, and rules for the integration) that needed to be
fixed. For example, our dashboards rely on a `cluster` label being
present on our recording rules, yet our mixin does not add one. We were
also still including `cortex-gw` panels even when the mixin is compiled
with `internal_components: false`.

**Special notes for your reviewer**:

The changes to the mixin will not be reflected in the current version of
the integration. We need to merge this PR first and then update the
integration.

**Checklist**
- [ ] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)
- [ ] Documentation added
- [ ] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/upgrading/_index.md`

---------

Co-authored-by: Michel Hollands <42814411+MichelHollands@users.noreply.github.com>
Co-authored-by: J Stickler <julie.stickler@grafana.com>
Branch: pull/8636/head
Trevor Whitney committed (via GitHub)
parent 9f8aa4b98a, commit dac3b84d08
Changed files (lines changed):
  1. docs/sources/installation/helm/monitor-and-alert/_index.md (19)
  2. docs/sources/installation/helm/monitor-and-alert/with-grafana-cloud.md (100)
  3. docs/sources/installation/helm/monitor-and-alert/with-local-monitoring.md (29)
  4. docs/sources/installation/helm/reference.md (4)
  5. production/helm/loki/values.yaml (4)
  6. production/loki-mixin-compiled-ssd/dashboards/loki-operational.json (106)
  7. production/loki-mixin-compiled/dashboards/loki-operational.json (10)
  8. production/loki-mixin/config.libsonnet (5)
  9. production/loki-mixin/dashboards/loki-operational.libsonnet (37)
  10. production/loki-mixin/mixin-ssd.libsonnet (7)
  11. tools/dev/k3d/Makefile (12)
  12. tools/dev/k3d/environments/helm-cluster/empty.jsonnet (34)
  13. tools/dev/k3d/environments/helm-cluster/spec.json (2)
  14. tools/dev/k3d/environments/helm-cluster/values/enterprise-logs-cloud-monitoring.yaml (43)
  15. tools/dev/k3d/environments/helm-cluster/values/kube-state-metrics.yaml (1)

@@ -0,0 +1,19 @@
---
title: Monitoring
description: Monitoring Loki installed with the Helm chart
weight: 200
aliases:
- /docs/installation/helm/monitor-and-alert
keywords:
- helm
- scalable
- simple-scalable
- monitor
---
# Monitoring
There are two common ways to monitor Loki:
- [Monitor using Grafana Cloud (recommended)]({{<relref "with-grafana-cloud">}})
- [Monitor using local monitoring]({{<relref "with-local-monitoring">}})

@@ -0,0 +1,100 @@
---
title: Configure Monitoring and Alerting of Loki Using Grafana Cloud
menuTitle: Monitor Loki with Grafana Cloud
description: Set up monitoring and alerts for Loki using Grafana Cloud
aliases:
- /docs/installation/helm/monitoring/with-grafana-cloud
weight: 100
keywords:
- monitoring
- alert
- alerting
- grafana cloud
---
# Configure Monitoring and Alerting of Loki Using Grafana Cloud
This topic will walk you through using Grafana Cloud to monitor a Loki installation deployed with the Helm chart. This approach leverages many of the chart's _self monitoring_ features, but instead of sending logs back to Loki itself, it sends them to a Grafana Cloud Logs instance. It also does not require the installation of the Prometheus Operator, and instead sends metrics to a Grafana Cloud Metrics instance. Using Grafana Cloud to monitor Loki has the added benefit that you can troubleshoot problems with Loki even when the Helm-installed Loki is down, as the logs will still be available in the Grafana Cloud Logs instance.
**Before you begin:**
- Helm 3 or above. See [Installing Helm](https://helm.sh/docs/intro/install/).
- A Grafana Cloud account and stack (including Cloud Grafana, Cloud Metrics, and Cloud Logs)
- [Grafana Kubernetes Monitoring using Agent](/docs/grafana-cloud/kubernetes-monitoring/configuration/config-k8s-agent-guide/) configured for the Kubernetes cluster
- A running Loki deployment installed in that Kubernetes cluster via the Helm chart
**Prerequisites for monitoring Loki:**
You must set up the Grafana Kubernetes integration by following the instructions in [Grafana Kubernetes Monitoring using Agent](/docs/grafana-cloud/kubernetes-monitoring/configuration/config-k8s-agent-guide/), as this installs the components necessary for collecting metrics about your Kubernetes cluster and sending them to Grafana Cloud. Many of the dashboards installed as part of the Loki integration rely on these metrics.
Walking through this installation will create two Grafana Agent configurations, one for metrics and one for logs, that add the external label `cluster: cloud`. In order for the dashboards in the self-hosted Grafana Loki integration to work, the cluster name needs to match your Helm installation name. If you installed Loki using the command `helm install best-loki-cluster grafana/loki`, you would need to change the `cluster` value in both Grafana Agent configurations from `cloud` to `best-loki-cluster` when setting up the Grafana Kubernetes integration, as in the sketch below.
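A minimal sketch of that change in the Agent's metrics configuration (the logs configuration takes the same `external_labels` block; `best-loki-cluster` is just the example release name from above):
```yaml
metrics:
  global:
    external_labels:
      # Must match the Helm release name instead of the default "cloud"
      cluster: best-loki-cluster
```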
**To set up the Loki integration in Grafana Cloud:**
1. Get valid Push credentials for your Cloud Metrics and Cloud Logs instances.
1. Create a secret in the same namespace as Loki to store your Cloud Logs credentials.
```bash
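# The data values must be base64-encoded, e.g. generate them with: echo -n "$CLOUD_LOGS_USERNAME" | base64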
cat <<'EOF' | NAMESPACE=loki /bin/sh -c 'kubectl apply -n $NAMESPACE -f -'
apiVersion: v1
data:
  password: <BASE64_ENCODED_CLOUD_LOGS_PASSWORD>
  username: <BASE64_ENCODED_CLOUD_LOGS_USERNAME>
kind: Secret
metadata:
  name: grafana-cloud-logs-credentials
type: Opaque
EOF
```
1. Create a secret to store your Cloud Metrics credentials.
```bash
cat <<'EOF' | NAMESPACE=loki /bin/sh -c 'kubectl apply -n $NAMESPACE -f -'
apiVersion: v1
data:
  password: <BASE64_ENCODED_CLOUD_METRICS_PASSWORD>
  username: <BASE64_ENCODED_CLOUD_METRICS_USERNAME>
kind: Secret
metadata:
  name: grafana-cloud-metrics-credentials
type: Opaque
EOF
```
1. Enable monitoring of the Loki installation and send the resulting metrics and logs to your Grafana Cloud instances by adding the following to your Helm `values.yaml` file (the `helm upgrade` command to apply it is shown after these steps):
```yaml
---
monitoring:
  dashboards:
    enabled: false
  rules:
    enabled: false
  selfMonitoring:
    logsInstance:
      clients:
        - url: <CLOUD_LOGS_URL>
          basicAuth:
            username:
              name: grafana-cloud-logs-credentials
              key: username
            password:
              name: grafana-cloud-logs-credentials
              key: password
  serviceMonitor:
    metricsInstance:
      remoteWrite:
        - url: <CLOUD_METRICS_URL>
          basicAuth:
            username:
              name: grafana-cloud-metrics-credentials
              key: username
            password:
              name: grafana-cloud-metrics-credentials
              key: password
```
1. Install the self-hosted Grafana Loki integration by going to your hosted Grafana instance, clicking the lightning bolt icon labeled **Integrations and Connections**, and then searching for and installing the **Self-hosted Grafana Loki** integration.
1. Once the self-hosted Grafana Loki integration is installed, click the **View Dashboards** button to see the installed dashboards.
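To apply the updated `values.yaml`, a sketch of the upgrade command, assuming the release from the earlier example (`best-loki-cluster`) installed in the `loki` namespace:
```bash
helm upgrade best-loki-cluster grafana/loki -n loki --values values.yaml
```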

@@ -3,7 +3,7 @@ title: Configure monitoring and alerting
menuTitle: Configure monitoring and alerting
description: Set up monitoring and alerts for the Helm chart
aliases:
- /docs/installation/helm/monitoring
- /docs/installation/helm/monitoring/with-local-monitoring
weight: 100
keywords:
- monitoring
@@ -13,9 +13,9 @@ keywords:
# Configure monitoring and alerting
By default this Helm Chart configures meta-monitoring of metrics (service monitoring) and logs (self monitoring). This topic will walk you through configuring monitoring using a monitoring solution that is local to the same cluster where Loki is installed.
The `ServiceMonitor` resource works with either the Prometheus Operator or the Grafana Agent Operator, and defines how Loki's metrics should be scraped. Scraping this Loki cluster using the scrape config defined in the `ServiceMonitor` resource is required for the included dashboards to work. A `MetricsInstance` can be configured to write the metrics to a remote Prometheus instance such as Grafana Cloud Metrics.
_Self monitoring_ is enabled by default. This will deploy a `GrafanaAgent`, `LogsInstance`, and `PodLogs` resource which will instruct the Grafana Agent Operator (installed separately) on how to scrape this Loki cluster's logs and send them back to itself. Scraping this Loki cluster using the scrape config defined in the `PodLogs` resource is required for the included dashboards to work.
@@ -80,25 +80,6 @@ prometheus:
targetLabel: cluster
```
In order to make sure the Prometheus Operator discovers the `ServiceMonitor` resources deployed by the `loki` chart, you will need to make sure those resources have the labels the Prometheus Operator is configured to look for. By default this is the key-value label pair `release: prometheus`. Make sure the Loki `ServiceMonitor`s have this label by adding the following to the `values.yaml` for your Loki helm chart:
```yaml
monitoring:
  serviceMonitor:
    labels:
      release: prometheus
```
This is also true for the `PrometheusRule` resource deployed by the Helm chart, which, in addition to a label, needs to be in the same namespace as the Prometheus Operator. For example, if you installed the Prometheus Operator in the `monitoring` namespace, you would also need to add the following to the `values.yaml` for your Loki helm chart to ensure recording rules are properly discovered:
```yaml
monitoring:
  rules:
    namespace: monitoring
    labels:
      release: prometheus
```
The `kube-prometheus-stack` installs `ServiceMonitor` and `PrometheusRule` resources for monitoring Kubernetes, and it depends on the `kube-state-metrics` and `prometheus-node-exporter` helm charts, which also install `ServiceMonitor` resources for collecting `kubelet` and `node-exporter` metrics. The above values file adds the additional labels required for these metrics to work with the included dashboards.
If you are using this helm chart in an environment which does not allow for the installation of `kube-prometheus-stack` or custom CRDs, you should run `helm template` on the `kube-prometheus-stack` helm chart with the above values file, and review all generated `ServiceMonitor` and `PrometheusRule` resources. These resources may have to be modified with the correct ports and selectors to find the various services such as `kubelet` and `node-exporter` in your environment.
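A sketch of that review step (the chart reference and values file name are illustrative):
```bash
helm template kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --values kube-prometheus-stack-values.yaml > rendered.yaml
# Inspect the ServiceMonitor and PrometheusRule resources in rendered.yaml and
# adjust ports/selectors for kubelet and node-exporter as needed.
```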
@@ -131,7 +112,7 @@ If you are using this helm chart in an environment which does not allow for the
type: file
```
**To add additional Prometheus rules:**
1. Modify the configuration file `values.yaml`:
@@ -229,4 +210,4 @@ If you are using this helm chart in an environment which does not allow for the
enabled: false
```
5. Install the `Loki meta-monitoring` connection on Grafana Cloud.

@@ -2404,9 +2404,9 @@ true
<tr>
<td>monitoring.serviceMonitor.interval</td>
<td>string</td>
<td>ServiceMonitor scrape interval. Default is 15s because the included recording rules use a 1m rate, and the scrape interval must be at most 1/4 of the rate interval (15s = 1m / 4).</td>
<td><pre lang="json">
"15s"
</pre>
</td>
</tr>

@@ -550,7 +550,9 @@ monitoring:
    # -- Additional ServiceMonitor labels
    labels: {}
    # -- ServiceMonitor scrape interval
    # Default is 15s because the included recording rules use a 1m rate, and the scrape interval
    # must be at most 1/4 of the rate interval (15s = 1m / 4).
    interval: 15s
    # -- ServiceMonitor scrape timeout in Go duration format (e.g. 15s)
    scrapeTimeout: null
    # -- ServiceMonitor relabel configs to apply to samples before scraping

@@ -87,7 +87,7 @@
"steppedLine": false,
"targets": [
{
"expr": "sum by (status) (\nlabel_replace(\n label_replace(\n rate(loki_request_duration_seconds_count{cluster=\"$cluster\", job=~\"$namespace/cortex-gw(-internal)?\", route=~\"api_prom_query|api_prom_label|api_prom_label_name_values|loki_api_v1_query|loki_api_v1_query_range|loki_api_v1_label|loki_api_v1_label_name_values\"}[5m]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n\"status\", \"${1}\", \"status_code\", \"([a-z]+)\")\n)",
"expr": "sum by (status) (\nlabel_replace(\n label_replace(\n rate(loki_request_duration_seconds_count{cluster=\"$cluster\", job=~\"($namespace)/(loki|enterprise-logs)-read\", route=~\"api_prom_query|api_prom_label|api_prom_label_name_values|loki_api_v1_query|loki_api_v1_query_range|loki_api_v1_label|loki_api_v1_label_name_values\"}[5m]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n\"status\", \"${1}\", \"status_code\", \"([a-z]+)\")\n)",
"legendFormat": "{{status}}",
"refId": "A"
}
@@ -183,7 +183,7 @@
"steppedLine": false,
"targets": [
{
"expr": "sum by (status) (\nlabel_replace(\n label_replace(\n rate(loki_request_duration_seconds_count{cluster=\"$cluster\", job=~\"$namespace/cortex-gw(-internal)?\", route=~\"api_prom_push|loki_api_v1_push\"}[5m]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n\"status\", \"${1}\", \"status_code\", \"([a-z]+)\"))",
"expr": "sum by (status) (\nlabel_replace(\n label_replace(\n rate(loki_request_duration_seconds_count{cluster=\"$cluster\", job=~\"($namespace)/(loki|enterprise-logs)-write\", route=~\"api_prom_push|loki_api_v1_push\"}[5m]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n\"status\", \"${1}\", \"status_code\", \"([a-z]+)\"))",
"legendFormat": "{{status}}",
"refId": "A"
}
@@ -229,102 +229,6 @@
"alignLevel": null
}
},
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fieldConfig": {
"defaults": {
"custom": { }
},
"overrides": [ ]
},
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 5,
"w": 4,
"x": 8,
"y": 1
},
"hiddenSeries": false,
"id": 11,
"legend": {
"avg": false,
"current": false,
"hideEmpty": false,
"hideZero": false,
"max": false,
"min": false,
"show": false,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"nullPointMode": "null",
"options": {
"dataLinks": [ ]
},
"panels": [ ],
"percentage": false,
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "topk(5, sum by (name,level) (rate(promtail_custom_bad_words_total{cluster=\"$cluster\", exported_namespace=\"$namespace\"}[$__interval])) - \nsum by (name,level) (rate(promtail_custom_bad_words_total{cluster=\"$cluster\", exported_namespace=\"$namespace\"}[$__interval] offset 1h)))",
"legendFormat": "{{name}}-{{level}}",
"refId": "A"
}
],
"thresholds": [ ],
"timeFrom": null,
"timeRegions": [ ],
"timeShift": null,
"title": "Bad Words",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": { },
"bars": false,
@@ -662,17 +566,17 @@
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum by (le) (job_route:loki_request_duration_seconds_bucket:sum_rate{job=~\"$namespace/cortex-gw(-internal)?\", route=~\"api_prom_push|loki_api_v1_push\", cluster=~\"$cluster\"})) * 1e3",
"expr": "histogram_quantile(0.99, sum by (le) (job_route:loki_request_duration_seconds_bucket:sum_rate{job=~\"($namespace)/(loki|enterprise-logs)-write\", route=~\"api_prom_push|loki_api_v1_push\", cluster=~\"$cluster\"})) * 1e3",
"legendFormat": ".99",
"refId": "A"
},
{
"expr": "histogram_quantile(0.75, sum by (le) (job_route:loki_request_duration_seconds_bucket:sum_rate{job=~\"$namespace/cortex-gw(-internal)?\", route=~\"api_prom_push|loki_api_v1_push\", cluster=~\"$cluster\"})) * 1e3",
"expr": "histogram_quantile(0.75, sum by (le) (job_route:loki_request_duration_seconds_bucket:sum_rate{job=~\"($namespace)/(loki|enterprise-logs)-write\", route=~\"api_prom_push|loki_api_v1_push\", cluster=~\"$cluster\"})) * 1e3",
"legendFormat": ".9",
"refId": "B"
},
{
"expr": "histogram_quantile(0.5, sum by (le) (job_route:loki_request_duration_seconds_bucket:sum_rate{job=~\"$namespace/cortex-gw(-internal)?\", route=~\"api_prom_push|loki_api_v1_push\", cluster=~\"$cluster\"})) * 1e3",
"expr": "histogram_quantile(0.5, sum by (le) (job_route:loki_request_duration_seconds_bucket:sum_rate{job=~\"($namespace)/(loki|enterprise-logs)-write\", route=~\"api_prom_push|loki_api_v1_push\", cluster=~\"$cluster\"})) * 1e3",
"legendFormat": ".5",
"refId": "C"
}

@@ -87,7 +87,7 @@
"steppedLine": false,
"targets": [
{
"expr": "sum by (status) (\nlabel_replace(\n label_replace(\n rate(loki_request_duration_seconds_count{cluster=\"$cluster\", job=~\"$namespace/cortex-gw(-internal)?\", route=~\"api_prom_query|api_prom_label|api_prom_label_name_values|loki_api_v1_query|loki_api_v1_query_range|loki_api_v1_label|loki_api_v1_label_name_values\"}[5m]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n\"status\", \"${1}\", \"status_code\", \"([a-z]+)\")\n)",
"expr": "sum by (status) (\nlabel_replace(\n label_replace(\n rate(loki_request_duration_seconds_count{cluster=\"$cluster\", job=~\"($namespace)/query-frontend\", route=~\"api_prom_query|api_prom_label|api_prom_label_name_values|loki_api_v1_query|loki_api_v1_query_range|loki_api_v1_label|loki_api_v1_label_name_values\"}[5m]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n\"status\", \"${1}\", \"status_code\", \"([a-z]+)\")\n)",
"legendFormat": "{{status}}",
"refId": "A"
}
@@ -183,7 +183,7 @@
"steppedLine": false,
"targets": [
{
"expr": "sum by (status) (\nlabel_replace(\n label_replace(\n rate(loki_request_duration_seconds_count{cluster=\"$cluster\", job=~\"$namespace/cortex-gw(-internal)?\", route=~\"api_prom_push|loki_api_v1_push\"}[5m]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n\"status\", \"${1}\", \"status_code\", \"([a-z]+)\"))",
"expr": "sum by (status) (\nlabel_replace(\n label_replace(\n rate(loki_request_duration_seconds_count{cluster=\"$cluster\", job=~\"($namespace)/distributor\", route=~\"api_prom_push|loki_api_v1_push\"}[5m]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n\"status\", \"${1}\", \"status_code\", \"([a-z]+)\"))",
"legendFormat": "{{status}}",
"refId": "A"
}
@@ -662,17 +662,17 @@
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum by (le) (job_route:loki_request_duration_seconds_bucket:sum_rate{job=~\"$namespace/cortex-gw(-internal)?\", route=~\"api_prom_push|loki_api_v1_push\", cluster=~\"$cluster\"})) * 1e3",
"expr": "histogram_quantile(0.99, sum by (le) (job_route:loki_request_duration_seconds_bucket:sum_rate{job=~\"($namespace)/distributor\", route=~\"api_prom_push|loki_api_v1_push\", cluster=~\"$cluster\"})) * 1e3",
"legendFormat": ".99",
"refId": "A"
},
{
"expr": "histogram_quantile(0.75, sum by (le) (job_route:loki_request_duration_seconds_bucket:sum_rate{job=~\"$namespace/cortex-gw(-internal)?\", route=~\"api_prom_push|loki_api_v1_push\", cluster=~\"$cluster\"})) * 1e3",
"expr": "histogram_quantile(0.75, sum by (le) (job_route:loki_request_duration_seconds_bucket:sum_rate{job=~\"($namespace)/distributor\", route=~\"api_prom_push|loki_api_v1_push\", cluster=~\"$cluster\"})) * 1e3",
"legendFormat": ".9",
"refId": "B"
},
{
"expr": "histogram_quantile(0.5, sum by (le) (job_route:loki_request_duration_seconds_bucket:sum_rate{job=~\"$namespace/cortex-gw(-internal)?\", route=~\"api_prom_push|loki_api_v1_push\", cluster=~\"$cluster\"})) * 1e3",
"expr": "histogram_quantile(0.5, sum by (le) (job_route:loki_request_duration_seconds_bucket:sum_rate{job=~\"($namespace)/distributor\", route=~\"api_prom_push|loki_api_v1_push\", cluster=~\"$cluster\"})) * 1e3",
"legendFormat": ".5",
"refId": "C"
}

@@ -15,6 +15,11 @@
// Enable dashboard and panels for Grafana Labs internal components.
internal_components: false,
promtail: {
// Whether or not to include promtail-specific dashboards
enabled: true,
},
// SSD related configuration for dashboards.
ssd: {
// Support Loki SSD mode on dashboards.

@@ -19,11 +19,16 @@ local utils = import 'mixin-utils/utils.libsonnet';
'Ingester',
],
hiddenPanels:: if $._config.promtail.enabled then [] else [
'Bad Words',
],
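// Component job matchers; when SSD mode is enabled, the read-path and write-path
// components collapse onto the shared '-read'/'-write' job names.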
jobMatchers:: {
cortexgateway: [utils.selector.re('job', '($namespace)/cortex-gw(-internal)?')],
distributor: [utils.selector.re('job', '($namespace)/%s' % (if $._config.ssd.enabled then '%s-write' % $._config.ssd.pod_prefix_matcher else 'distributor'))],
ingester: [utils.selector.re('job', '($namespace)/%s' % (if $._config.ssd.enabled then '%s-write' % $._config.ssd.pod_prefix_matcher else 'ingester.*'))],
querier: [utils.selector.re('job', '($namespace)/%s' % (if $._config.ssd.enabled then '%s-read' % $._config.ssd.pod_prefix_matcher else 'querier'))],
queryFrontend: [utils.selector.re('job', '($namespace)/%s' % (if $._config.ssd.enabled then '%s-read' % $._config.ssd.pod_prefix_matcher else 'query-frontend'))],
},
podMatchers:: {
@@ -136,6 +141,7 @@ local utils = import 'mixin-utils/utils.libsonnet';
std.rstripChars(matcherStr('querier'), ',')
),
local replaceAllMatchers(expr) =
replaceMatchers(replaceClusterMatchers(expr)),
@@ -147,12 +153,33 @@ local utils = import 'mixin-utils/utils.libsonnet';
local isRowHidden(row) =
std.member(dashboards['loki-operational.json'].hiddenRows, row),
local isPanelHidden(panelTitle) =
std.member(dashboards['loki-operational.json'].hiddenPanels, panelTitle),
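// When internal components are excluded (internal_components: false), rewrite the
// cortex-gw job matcher to the user-facing component that serves the same traffic;
// see removeInternalComponents below for the per-panel mapping.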
local replaceCortexGateway(expr, replacement) = if $._config.internal_components then
expr
else
std.strReplace(
expr,
'job=~"$namespace/cortex-gw(-internal)?"',
matcherStr(replacement, matcher='job', sep='')
),
local removeInternalComponents(title, expr) = if (title == 'Queries/Second') then
replaceCortexGateway(expr, 'queryFrontend')
else if (title == 'Pushes/Second') then
replaceCortexGateway(expr, 'distributor')
else if (title == 'Push Latency') then
replaceCortexGateway(expr, 'distributor')
else
replaceAllMatchers(expr),
panels: [
p {
datasource: selectDatasource(super.datasource),
targets: if std.objectHas(p, 'targets') then [
e {
expr: removeInternalComponents(p.title, e.expr),
}
for e in p.targets
] else [],
@@ -161,7 +188,7 @@ local utils = import 'mixin-utils/utils.libsonnet';
datasource: selectDatasource(super.datasource),
targets: if std.objectHas(sp, 'targets') then [
e {
expr: removeInternalComponents(p.title, e.expr),
}
for e in sp.targets
] else [],
@@ -170,15 +197,17 @@ local utils = import 'mixin-utils/utils.libsonnet';
datasource: selectDatasource(super.datasource),
targets: if std.objectHas(ssp, 'targets') then [
e {
expr: removeInternalComponents(p.title, e.expr),
}
for e in ssp.targets
] else [],
}
for ssp in sp.panels
if !(isPanelHidden(ssp.title))
] else [],
}
for sp in p.panels
if !(isPanelHidden(sp.title))
] else [],
title: if !($._config.ssd.enabled && p.type == 'row') then p.title else
if p.title == 'Distributor' then 'Write Path'
@@ -186,7 +215,7 @@ local utils = import 'mixin-utils/utils.libsonnet';
else p.title,
}
for p in super.panels
if !(p.type == 'row' && isRowHidden(p.title)) && !(isPanelHidden(p.title))
],
} +
$.dashboard('Loki / Operational', uid='operational')

@@ -4,6 +4,13 @@
grafanaDashboardFolder: 'Loki SSD',
_config+:: {
internal_components: false,
// By default the helm chart uses the Grafana Agent instead of promtail
promtail+: {
enabled: false,
},
ssd+: {
enabled: true,
},

@@ -57,6 +57,9 @@ apply-enterprise-helm-cluster:

apply-loki-helm-cluster:
	tk apply --ext-code enterprise=false environments/helm-cluster

apply-empty-helm-cluster:
	tk apply --ext-code enterprise=false environments/helm-cluster/empty.jsonnet

down:
	k3d cluster delete helm-cluster
@@ -151,3 +154,12 @@ helm-upgrade-loki-ha-single-binary:

helm-uninstall-loki-binary:
	$(HELM) uninstall loki-single-binary -n loki

helm-install-kube-state-metrics:
	helm install kube-state-metrics --create-namespace --values "$(CURDIR)/environments/helm-cluster/values/kube-state-metrics.yaml"

helm-install-enterprise-logs-cloud-monitoring:
	helm install enterprise-logs-test-fixture "$(HELM_DIR)" -n loki --create-namespace --values "$(CURDIR)/environments/helm-cluster/values/enterprise-logs-cloud-monitoring.yaml"

helm-upgrade-enterprise-logs-cloud-monitoring:
	helm upgrade enterprise-logs-test-fixture "$(HELM_DIR)" -n loki --values "$(CURDIR)/environments/helm-cluster/values/enterprise-logs-cloud-monitoring.yaml"

@@ -0,0 +1,34 @@
local k = import 'github.com/grafana/jsonnet-libs/ksonnet-util/kausal.libsonnet';
local tanka = import 'github.com/grafana/jsonnet-libs/tanka-util/main.libsonnet';
local configMap = k.core.v1.configMap;
local spec = (import './spec.json').spec;
{
  _config+:: {
    namespace: spec.namespace,
  },

  lokiNamespace: k.core.v1.namespace.new('loki'),

  gelLicenseSecret: k.core.v1.secret.new('gel-license', {}, type='Opaque')
    + k.core.v1.secret.withStringData({
      'license.jwt': importstr '../../secrets/gel.jwt',
    })
    + k.core.v1.secret.metadata.withNamespace('loki'),

  local grafanaCloudCredentials = import '../../secrets/grafana-cloud-credentials.json',

  grafanaCloudMetricsCredentials: k.core.v1.secret.new('grafana-cloud-metrics-credentials', {}, type='Opaque')
    + k.core.v1.secret.withStringData({
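      // Grafana Cloud usernames are numeric instance IDs, so they are formatted
      // as strings here (and for the logs secret below).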
      username: '%d' % grafanaCloudCredentials.metrics.username,
      password: grafanaCloudCredentials.metrics.password,
    })
    + k.core.v1.secret.metadata.withNamespace('loki'),

  grafanaCloudLogsCredentials: k.core.v1.secret.new('grafana-cloud-logs-credentials', {}, type='Opaque')
    + k.core.v1.secret.withStringData({
      username: '%d' % grafanaCloudCredentials.logs.username,
      password: grafanaCloudCredentials.logs.password,
    })
    + k.core.v1.secret.metadata.withNamespace('loki'),
}

@@ -6,7 +6,7 @@
"namespace": "environments/helm-cluster/main.jsonnet"
},
"spec": {
"apiServer": "https://0.0.0.0:38539",
"apiServer": "https://0.0.0.0:33931",
"namespace": "k3d-helm-cluster",
"resourceDefaults": {},
"expectVersions": {}

@@ -0,0 +1,43 @@
---
loki:
  querier:
    multi_tenant_queries_enabled: true
enterprise:
  enabled: true
  adminToken:
    secret: "gel-admin-token"
  useExternalLicense: true
  externalLicenseName: gel-license
  provisioner:
    provisionedSecretPrefix: "provisioned-secret"
monitoring:
  dashboards:
    enabled: false
  rules:
    enabled: false
  selfMonitoring:
    tenant:
      name: loki
    logsInstance:
      clients:
        - url: https://logs-prod-us-central1.grafana.net/loki/api/v1/push
          basicAuth:
            username:
              name: grafana-cloud-logs-credentials
              key: username
            password:
              name: grafana-cloud-logs-credentials
              key: password
  serviceMonitor:
    metricsInstance:
      remoteWrite:
        - url: https://prometheus-blocks-prod-us-central1.grafana.net/api/prom/push
          basicAuth:
            username:
              name: grafana-cloud-metrics-credentials
              key: username
            password:
              name: grafana-cloud-metrics-credentials
              key: password
minio:
  enabled: true