View a history of all alert events generated by your Grafana-managed alert rules.
An alert event is displayed each time an alert instance changes its state over a period of time. All alert events are displayed regardless of whether silences or mute timings are set, so you’ll see a complete set of your data history even if you’re not necessarily being notified.
{{< admonition type="note" >}}
Grafana OSS and Grafana Enterprise users must [configure alert state history in Loki](/docs/grafana/<GRAFANA_VERSION>/alerting/set-up/configure-alert-state-history/) to view the **History page** and **State history view**.
{{< /admonition >}}
## View from the History page
The History page shows the history and state changes of all Grafana-managed alert rules. You can filter by labels and alert states.

Users can only view the history and state transitions of alert rules they have permission to access (RBAC).

{{< admonition type="note" >}}
For Grafana OSS and Grafana Enterprise users, this feature is available starting with Grafana 11.2.

To try out the new alert history page, enable the [`alertingCentralAlertHistory`](/docs/grafana/<GRAFANA_VERSION>/setup-grafana/configure-grafana/feature-toggles/) feature toggle and configure [alert state history in Loki](https://grafana.com/docs/grafana/<GRAFANA_VERSION>/alerting/set-up/configure-alert-state-history/).
{{< /admonition >}}

To access the History page, complete the following steps.
1. Navigate to **Alerts & IRM** -> **Alerting** -> **History**.
## View from the State history view

Use the State history view to get insight into how your individual alert instances have changed state over time.
View information on when a state change occurred, the previous and current state, any other alert instances that changed their state at the same time, and the query value that triggered the change.
{{< admonition type="note" >}}
Grafana OSS and Grafana Enterprise users must [configure alert state history](/docs/grafana/<GRAFANA_VERSION>/alerting/set-up/configure-alert-state-history/) to access this view.
{{< /admonition >}}
To access the State history view, complete the following steps.
Alerting can record all alert rule state changes for your Grafana-managed alert rules in a Loki or Prometheus instance, or in both.
This allows you to explore the behavior of your alert rules over time and enhances the existing State history view:
- With Prometheus, you can query the `GRAFANA_ALERTS` metric for alert state changes in **Grafana Explore**.
- With Loki, you can query and view alert state changes in **Grafana Explore** and the [Grafana Alerting History views](/docs/grafana/<GRAFANA_VERSION>/alerting/monitor-status/view-alert-state-history/).
## Configure Loki for alert state
To set up alert state history, make sure to have a Loki instance Grafana can write data to. The following steps describe a basic configuration:

1. **Configure Loki**

   The default Loki settings might need some tweaking, as the state history view might query up to 30 days of data.

   The following change to the default configuration should work for most instances, but review the full Loki configuration settings and adjust them according to your needs.

   ```yaml
   limits_config:
     split_queries_by_interval: '24h'
     max_query_parallelism: 32
   ```

   As this might impact the performance of an existing Loki instance, use a separate Loki instance for the alert state history.
1. **Configure Grafana**
   The following Grafana configuration instructs Alerting to write alert state history to a Loki instance:

   ```toml
   [unified_alerting.state_history]
   enabled = true
   backend = loki
   # The URL of the Loki server
   loki_remote_url = http://localhost:3100
   ```
1. **Configure the Loki data source in Grafana**
   Add the [Loki data source](/docs/grafana/<GRAFANA_VERSION>/datasources/loki/) to Grafana.

If everything is set up correctly, you can access the [History page and State history view](/docs/grafana/<GRAFANA_VERSION>/alerting/monitor-status/view-alert-state-history/) to view and filter alert state history. You can also use **Grafana Explore** to query the Loki instance. See [Alerting Meta monitoring](/docs/grafana/<GRAFANA_VERSION>/alerting/monitor/) for details.
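As a quick check that state history data is being written to Loki, you can run a query like the following in **Grafana Explore**. This is a minimal sketch; it assumes the state history log stream carries a `from="state-history"` label, so adjust the selector to the labels present in your instance.

```logql
{from="state-history"} | json
```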
## Configure Prometheus for alert state (GRAFANA_ALERTS metric)
You can also configure a Prometheus instance to store alert state changes for your Grafana-managed alert rules. However, unlike Loki, this setup does not enable the **Grafana Alerting History views**.
Instead, Grafana Alerting writes alert state data to the `GRAFANA_ALERTS` metric, similar to how Prometheus Alerting writes to the `ALERTS` metric.
The following steps describe a basic configuration:

1. **Configure Prometheus**

   Enable the remote write receiver in your Prometheus instance by setting the `--web.enable-remote-write-receiver` command-line flag. This enables the endpoint to receive alert state data from Grafana Alerting.
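   For example, a minimal way to start Prometheus with the receiver enabled might look like the following. The configuration file path is an assumption; use your own.

   ```bash
   # --web.enable-remote-write-receiver exposes the endpoint that Grafana Alerting writes alert state data to
   prometheus --config.file=/etc/prometheus/prometheus.yml --web.enable-remote-write-receiver
   ```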
1. **Configure the Prometheus data source in Grafana**

   Add the [Prometheus data source](/docs/grafana/<GRAFANA_VERSION>/datasources/prometheus/) to Grafana.

   In the [Prometheus data source configuration options](/docs/grafana/<GRAFANA_VERSION>/datasources/prometheus/configure/), set the **Prometheus type** to match your Prometheus instance type. Grafana Alerting uses this option to identify the remote write endpoint.
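   If you provision data sources from YAML instead of the UI, a minimal sketch might look like the following. The `jsonData.prometheusType` key and the URL are assumptions here; check the provisioning reference for your Grafana version.

   ```yaml
   apiVersion: 1
   datasources:
     - name: Prometheus
       type: prometheus
       # Assumed local Prometheus URL; replace with your instance.
       url: http://localhost:9090
       jsonData:
         # Assumed key for the Prometheus type option (for example Prometheus, Mimir, or Thanos).
         prometheusType: Prometheus
   ```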
1. **Configure Grafana**
   The following Grafana configuration instructs Alerting to write alert state history to a Prometheus instance:

   ```toml
   [unified_alerting.state_history]
   enabled = true
   backend = prometheus
   # Target data source UID for writing alert state changes.
   # (Optional) Metric name for the alert state metric. Default is "GRAFANA_ALERTS".
   # prometheus_metric_name = GRAFANA_ALERTS
   # (Optional) Timeout for writing alert state data to the target data source. Default is 10s.
   # prometheus_write_timeout = 10s
   ```
You can then use **Grafana Explore** to query the alert state metric. For details, refer to [Alerting Meta monitoring](/docs/grafana/<GRAFANA_VERSION>/alerting/monitor/).
```promql
GRAFANA_ALERTS{alertstate='firing'}
```
## Configure Loki and Prometheus for alert state

You can also configure both Loki and Prometheus to record alert state changes for your Grafana-managed alert rules.

Start with the same setup steps as shown in the previous [Loki](#configure-loki-for-alert-state) and [Prometheus](#configure-prometheus-for-alert-state-grafana_alerts-metric) sections. Then, adjust your Grafana configuration as follows:

```toml
[unified_alerting.state_history]
enabled = true
backend = multiple
primary = loki
# URL of the Loki server.
loki_remote_url = http://localhost:3100
secondaries = prometheus
# Target data source UID for writing alert state changes.
```
You can use meta-monitoring metrics to understand the health of your alerting system.
## Metrics for Grafana-managed alerts
To meta monitor Grafana-managed alerts, you can collect two types of metrics in a Prometheus instance:

- **State history metric (`GRAFANA_ALERTS`)**: Exported by Grafana Alerting as part of alert state history.
- **Scraped metrics**: Exported by Grafana's `/metrics` endpoint to monitor alerting activity and performance.

You need a Prometheus-compatible server to collect and store these metrics.
### `GRAFANA_ALERTS` metric
If you have configured [Prometheus for alert state history](/docs/grafana/<GRAFANA_VERSION>/alerting/set-up/configure-alert-state-history/), Grafana writes alert state changes to the `GRAFANA_ALERTS` metric.
This `GRAFANA_ALERTS` metric is compatible with the `ALERTS` metric used by Prometheus Alerting and includes two additional labels:
1. A new `grafana_rule_uid` label for the UID of the Grafana rule.
2. A new `grafana_alertstate` label for the Grafana alert state, which differs slightly from the equivalent Prometheus state included in the `alertstate` label.
| Grafana state | `alertstate` | `grafana_alertstate` |
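For example, you can query the metric together with these labels in **Grafana Explore** or Prometheus. The rule UID below is an illustrative placeholder.

```promql
GRAFANA_ALERTS{alertstate="firing", grafana_rule_uid="<rule UID>"}
```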
### Scraped metrics

To collect scraped Alerting metrics, configure Prometheus to scrape metrics from Grafana.
```yaml
- job_name: grafana
  honor_timestamps: true
  metrics_path: /metrics
  static_configs:
    - targets:
        - grafana:3000
```
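To confirm that the scrape job is working, a quick check is to query the standard `up` metric for the job name used above:

```promql
up{job="grafana"}
```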
### List of available metrics
The Grafana ruler, which is responsible for evaluating alert rules, and the Grafana Alertmanager, which is responsible for sending notifications of firing and resolved alerts, provide a number of metrics that let you observe them.
#### grafana_alerting_alerts
This metric is a histogram that shows you the number of seconds taken to send notifications for firing and resolved alerts. This metric lets you observe slow or over-utilized integrations, such as an SMTP server that is being given emails faster than it can send them.
## Logs for Grafana-managed alerts
If you have configured [Loki for alert state history](/docs/grafana/<GRAFANA_VERSION>/alerting/set-up/configure-alert-state-history/), logs related to state changes in Grafana-managed alerts are stored in the Loki data source.
You can use **Grafana Explore** and the Loki query editor to search for alert state changes.
In the **Logs** view, you can review details for individual alerts by selecting fields such as:
- `previous`: previous alert instance state.
- `current`: current alert instance state.
- `ruleTitle`: alert rule title.
- `ruleID` and `ruleUID`.
- `labels_alertname`, `labels_new_label`, and `labels_grafana_folder`.
- Additional available fields.
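For example, a query along these lines filters for state changes whose current state is `Alerting`. It's a sketch: it assumes the state history stream carries a `from="state-history"` label and that the JSON fields match those listed above.

```logql
{from="state-history"} | json | current="Alerting"
```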
Alternatively, you can access the [History page](/docs/grafana/<GRAFANA_VERSION>/alerting/monitor-status/view-alert-state-history/) in Grafana to visualize and filter state changes for individual alerts or all alerts.
## Metrics for Mimir-managed alerts
To meta monitor Grafana Mimir-managed alerts, open source and on-premise users need a Prometheus/Mimir server, or another metrics database to collect and store metrics exported by the Mimir ruler.
## Metrics for the Alertmanager

To meta monitor the Alertmanager, you need a Prometheus/Mimir server, or another metrics database to collect and store metrics exported by the Alertmanager.
For example, if you are using Prometheus, add a `scrape_config` to Prometheus to scrape metrics from your Alertmanager.
### Example
```yaml
- job_name: alertmanager
  honor_timestamps: true
  metrics_path: /metrics
  static_configs:
    - targets:
        - alertmanager:9093
```
### List of available metrics
The following is a list of available metrics for Alertmanager.