mirror of https://github.com/grafana/grafana
docs(alerting): Add two common examples in `Learn` section (#105325)
* docs(alerting): Add two common examples in `Learn` section
* Update docs/sources/alerting/learn/examples/multi-dimensional-alerts.md
* Update docs/sources/alerting/learn/examples/multi-dimensional-alerts.md
* mention `summary` annotation in multi-dimensional alerts example
* Remove note about alert grouping
* minor edits to section: `Differences with time series`
* minor grammar change

Co-authored-by: Johnny Kartheiser <140559259+JohnnyK-Grafana@users.noreply.github.com>

pull/105414/head
parent
0c699d4a72
commit
4ae91715df
@ -0,0 +1,20 @@
---
canonical: https://grafana.com/docs/grafana/latest/alerting/learn/examples/
description: This section provides practical examples of alert rules for common monitoring scenarios.
keywords:
  - grafana
labels:
  products:
    - cloud
    - enterprise
    - oss
menuTitle: Examples
title: Grafana Alerting Examples
weight: 1100
---

# Grafana Alerting Examples

This section provides practical examples of alert rules for common monitoring scenarios. Each example focuses on a specific use case, showing how to structure queries, evaluate conditions, and understand how Grafana generates alert instances.

{{< section >}}
@ -0,0 +1,156 @@
---
canonical: https://grafana.com/docs/grafana/latest/alerting/learn/examples/multi-dimensional-alerts/
description: This example shows how a single alert rule can generate multiple alert instances using time series data.
keywords:
  - grafana
labels:
  products:
    - cloud
    - enterprise
    - oss
menuTitle: Multi-dimensional alerts
title: Example of multi-dimensional alerts on time series data
weight: 1101
refs:
  testdata-data-source:
    - pattern: /docs/grafana/
      destination: /docs/grafana/<GRAFANA_VERSION>/datasources/testdata/
    - pattern: /docs/grafana-cloud/
      destination: /docs/grafana-cloud/connect-externally-hosted/data-sources/testdata/
  table-data-example:
    - pattern: /docs/grafana/
      destination: /docs/grafana/<GRAFANA_VERSION>/alerting/learn/examples/multi-dimensional-alerts/table-data/
    - pattern: /docs/grafana-cloud/
      destination: /docs/grafana-cloud/alerting-and-irm/alerting/learn/examples/multi-dimensional-alerts/table-data/
  annotations:
    - pattern: /docs/grafana/
      destination: /docs/grafana/<GRAFANA_VERSION>/alerting/fundamentals/alert-rules/annotation-label/#annotations
    - pattern: /docs/grafana-cloud/
      destination: /docs/grafana-cloud/alerting-and-irm/alerting/fundamentals/alert-rules/annotation-label/#annotations
  reduce-expression:
    - pattern: /docs/grafana/
      destination: /docs/grafana/<GRAFANA_VERSION>/alerting/fundamentals/alert-rules/queries-conditions/#reduce
    - pattern: /docs/grafana-cloud/
      destination: /docs/grafana-cloud/alerting-and-irm/alerting/fundamentals/alert-rules/queries-conditions/#reduce
  alert-grouping:
    - pattern: /docs/grafana/
      destination: /docs/grafana/<GRAFANA_VERSION>/alerting/fundamentals/notifications/group-alert-notifications/
    - pattern: /docs/grafana-cloud/
      destination: /docs/grafana-cloud/alerting-and-irm/alerting/fundamentals/notifications/group-alert-notifications/
---

# Example of multi-dimensional alerts on time series data

This example shows how a single alert rule can generate multiple alert instances — one for each label set (or time series). This is called **multi-dimensional alerting**: one alert rule, many alert instances.

In Prometheus, each unique combination of labels defines a distinct time series. Grafana Alerting uses the same model: each label set is evaluated independently, and a separate alert instance is created for each series.

This pattern is common in dynamic environments when monitoring a group of components like multiple CPUs, containers, or per-host availability. Instead of defining individual alert rules or aggregated alerts, you alert on _each dimension_ — so you can detect particular issues and include that level of detail in notifications.

For example, a query returns one series per CPU:

| `cpu` label value | CPU percent usage |
| :---------------- | :---------------- |
| cpu-0             | 95                |
| cpu-1             | 30                |
| cpu-2             | 85                |
With a threshold of `> 80`, this would trigger two alert instances: one for `cpu-0` and one for `cpu-2`.

## Examples overview

Imagine you want to trigger alerts when CPU usage goes above 80%, and you want to track each CPU core independently.

You can use a Prometheus query like this:

```promql
sum by(cpu) (
  rate(node_cpu_seconds_total{mode!="idle"}[1m])
)
```

This query returns the active CPU usage rate per CPU core, averaged over the past minute.

| CPU core | Active usage rate |
| :------- | :---------------- |
| cpu-0    | 95                |
| cpu-1    | 30                |
| cpu-2    | 85                |

This produces one series for each existing CPU.

When Grafana Alerting evaluates the query, it creates an individual alert instance for each returned series.

| Alert instance | Value |
| :------------- | :---- |
| {cpu="cpu-0"}  | 95    |
| {cpu="cpu-1"}  | 30    |
| {cpu="cpu-2"}  | 85    |

With a threshold condition like `$A > 80`, Grafana evaluates each instance separately and fires alerts only where the condition is met:

| Alert instance | Value | State  |
| :------------- | :---- | :----- |
| {cpu="cpu-0"}  | 95    | Firing |
| {cpu="cpu-1"}  | 30    | Normal |
| {cpu="cpu-2"}  | 85    | Firing |
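The per-series evaluation shown in these tables can be sketched in a few lines of Python. This is an illustration of the model only, not Grafana internals; the series values are the hypothetical CPU numbers from the example:

```python
# One entry per label set (time series); values mirror the tables above.
series = {
    ("cpu", "cpu-0"): 95,
    ("cpu", "cpu-1"): 30,
    ("cpu", "cpu-2"): 85,
}

THRESHOLD = 80  # the alert condition: $A > 80

def evaluate(series, threshold):
    """Evaluate each label set independently: one alert instance per series."""
    instances = []
    for (label_name, label_value), value in series.items():
        state = "Firing" if value > threshold else "Normal"
        instances.append(({label_name: label_value}, value, state))
    return instances

for labels, value, state in evaluate(series, THRESHOLD):
    print(labels, value, state)
# {'cpu': 'cpu-0'} 95 Firing
# {'cpu': 'cpu-1'} 30 Normal
# {'cpu': 'cpu-2'} 85 Firing
```

Each instance keeps its own state: a single rule evaluation can leave some instances firing while others stay normal.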

Multi-dimensional alerts help you surface issues on individual components—problems that might be missed when alerting on aggregated data (like total CPU usage).

Each alert instance targets a specific component, identified by its unique label set. This makes alerts more specific and actionable. For example, you can set a [`summary` annotation](ref:annotations) in your alert rule that identifies the affected CPU:

```
High CPU usage on {{$labels.cpu}}
```

In the previous example, the two firing alert instances would display summaries indicating the affected CPUs:

- High CPU usage on `cpu-0`
- High CPU usage on `cpu-2`
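To see how that per-instance interpolation behaves, here is a small Python imitation of the `{{$labels.cpu}}` substitution. It mimics the expansion with a regular expression and is not Grafana's actual template engine:

```python
import re

template = "High CPU usage on {{$labels.cpu}}"

def render(template, labels):
    # Replace each {{$labels.<name>}} placeholder with that instance's label value.
    return re.sub(r"\{\{\$labels\.(\w+)\}\}",
                  lambda m: labels[m.group(1)], template)

# The two firing instances from the example each get their own summary.
firing_instances = [{"cpu": "cpu-0"}, {"cpu": "cpu-2"}]
summaries = [render(template, labels) for labels in firing_instances]
# ['High CPU usage on cpu-0', 'High CPU usage on cpu-2']
```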

## Try it with TestData

You can quickly experiment with multi-dimensional alerts using the [**TestData** data source](ref:testdata-data-source), which can generate multiple random time series.

1. Add the **TestData** data source through the **Connections** menu.
1. Go to **Alerting** and create an alert rule.
1. Select **TestData** as the data source.
1. Configure the TestData scenario:

   1. Scenario: **Random Walk**
   1. Series count: 3
   1. Start value: 70, Max: 100
   1. Labels: `cpu=cpu-$seriesIndex`

{{< figure src="/media/docs/alerting/testdata-random-series.png" max-width="750px" alt="Generating random time series data using the TestData data source" >}}

## Reduce time series data for comparison

The example returns three time series, as shown above, with values across the selected time range.

To alert on each series, you need to reduce the time series to a single value that the alert condition can evaluate to determine the alert instance state.

Grafana Alerting provides several ways to reduce time series data:

- **Data source query functions**. The earlier example used the Prometheus `sum` function to sum the rate results by `cpu`, producing a single value per CPU core.
- **Reduce expression**. In the query and condition section, Grafana provides the `Reduce` expression to aggregate time series data.
  - In **Default mode**, the **When** input selects a reducer (like `last`, `mean`, or `min`), and the threshold compares that reduced value.
  - In **Advanced mode**, you can add the [**Reduce** expression](ref:reduce-expression) (e.g., `last()`, `mean()`) before defining the threshold (alert condition).

For demo purposes, this example uses the **Advanced mode** with a **Reduce** expression:

1. Toggle **Advanced mode** in the top right section of the query panel to enable adding additional expressions.
1. Add the **Reduce** expression using a function like `mean()` to reduce each time series to a single value.
1. Define the alert condition using a **Threshold** like `$reducer > 80`.
1. Click **Preview** to evaluate the alert rule.
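The reduce-then-threshold steps can be sketched as follows. The per-series data points are made up for illustration, and the reducer functions are plain Python stand-ins for Grafana's `last()`, `mean()`, and `max()`:

```python
# Hypothetical raw series: several data points per CPU over the time range.
series = {
    "cpu-0": [90, 97, 95],
    "cpu-1": [25, 35, 30],
    "cpu-2": [80, 88, 85],
}

# Stand-ins for the Reduce expression's functions.
reducers = {
    "last": lambda points: points[-1],
    "mean": lambda points: sum(points) / len(points),
    "max": max,
}

# Step 1: reduce each series to a single value (here, with mean()).
reduced = {cpu: reducers["mean"](points) for cpu, points in series.items()}

# Step 2: the threshold ($reducer > 80) now compares one number per series.
states = {cpu: ("Firing" if value > 80 else "Normal")
          for cpu, value in reduced.items()}
# {'cpu-0': 'Firing', 'cpu-1': 'Normal', 'cpu-2': 'Firing'}
```

Without the reduce step, there would be many values per series and nothing unambiguous for the threshold to compare.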

{{< figure src="/media/docs/alerting/using-expressions-with-multiple-series.png" max-width="750px" caption="The alert condition evaluates the reduced value for each alert instance and shows whether each instance is Firing or Normal." alt="Alert preview using a Reduce expression and a threshold condition" >}}

## Learn more

This example shows how Grafana Alerting implements a multi-dimensional alerting model (one rule, many alert instances) and why reducing time series data to a single value is required for evaluation.

For additional learning resources, check out:

- [Get started with Grafana Alerting – Part 2](https://grafana.com/tutorials/alerting-get-started-pt2/)
- [Example of alerting on tabular data](ref:table-data-example)
@ -0,0 +1,128 @@
---
canonical: https://grafana.com/docs/grafana/latest/alerting/learn/examples/table-data
description: This example shows how to create an alert rule using table data.
keywords:
  - grafana
labels:
  products:
    - cloud
    - enterprise
    - oss
menuTitle: Table data
title: Example of alerting on tabular data
weight: 1102
refs:
  testdata-data-source:
    - pattern: /docs/grafana/
      destination: /docs/grafana/<GRAFANA_VERSION>/datasources/testdata/
    - pattern: /docs/grafana-cloud/
      destination: /docs/grafana-cloud/connect-externally-hosted/data-sources/testdata/
  multi-dimensional-example:
    - pattern: /docs/grafana/
      destination: /docs/grafana/<GRAFANA_VERSION>/alerting/learn/examples/multi-dimensional-alerts/
    - pattern: /docs/grafana-cloud/
      destination: /docs/grafana-cloud/alerting-and-irm/alerting/learn/examples/multi-dimensional-alerts/
---

# Example of alerting on tabular data

Not all data sources return time series data. SQL databases, CSV files, and some APIs often return results as rows or arrays of columns or fields — commonly referred to as tabular data.

This example shows how to create an alert rule using data in table format. Grafana treats each row as a separate alert instance, as long as the data meets the expected format.

## How Grafana Alerting evaluates tabular data

When a query returns data in table format, Grafana transforms each row into a separate alert instance.

To evaluate each row (alert instance), it expects:

1. **Only one numeric column.** This is the value used for evaluating the alert condition.
1. **Non-numeric columns.** These columns define the label set: each column name becomes a label name, and each cell value becomes a label value.
1. **Unique label sets per row.** Each row must be uniquely identifiable by its labels. This ensures each row represents a distinct alert instance.
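These three checks can be sketched in Python. The validation below is an illustration of the rules, not Grafana's implementation; the column names and rows are hypothetical:

```python
def validate_table(columns, rows):
    """Check the three conditions expected of tabular alert data."""
    # 1. Exactly one numeric column (the value for the alert condition).
    numeric = [c for c in columns
               if all(isinstance(row[c], (int, float)) for row in rows)]
    if len(numeric) != 1:
        raise ValueError("expected exactly one numeric column, got %r" % numeric)
    # 2. The remaining (non-numeric) columns define the label set.
    label_cols = [c for c in columns if c != numeric[0]]
    # 3. Each row's label set must be unique.
    label_sets = [tuple(row[c] for c in label_cols) for row in rows]
    if len(label_sets) != len(set(label_sets)):
        raise ValueError("rows must have unique label sets")
    return numeric[0], label_cols

columns = ["Host", "Disk", "PercentFree"]
rows = [
    {"Host": "web1", "Disk": "/etc", "PercentFree": 3},
    {"Host": "web2", "Disk": "/var", "PercentFree": 4},
    {"Host": "web3", "Disk": "/var", "PercentFree": 8},
]
value_column, label_columns = validate_table(columns, rows)
# value_column == "PercentFree"; label_columns == ["Host", "Disk"]
```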

{{< admonition type="caution" >}}
These three conditions must be met—otherwise, Grafana can’t evaluate the table data and the rule will fail.
{{< /admonition >}}

## Example overview

Imagine you store disk usage in a `DiskSpace` table and you want to trigger alerts when the available space drops below 5%.

| Time       | Host | Disk | PercentFree |
| ---------- | ---- | ---- | ----------- |
| 2021-06-07 | web1 | /etc | 3           |
| 2021-06-07 | web2 | /var | 4           |
| 2021-06-07 | web3 | /var | 8           |

To calculate the free space per Host and Disk, you can use `$__timeFilter` to filter by time without returning the date column to Grafana:

```sql
SELECT
  Host,
  Disk,
  AVG(PercentFree) AS PercentFree
FROM DiskSpace
WHERE $__timeFilter(Time)
GROUP BY Host, Disk
```

This query returns the following table response:

| Host | Disk | PercentFree |
| ---- | ---- | ----------- |
| web1 | /etc | 3           |
| web2 | /var | 4           |
| web3 | /var | 8           |

When Alerting evaluates the query response, the data is transformed into three alert instances as previously detailed:

- The numeric column becomes the value for the alert condition.
- Additional columns define the label set for each alert instance.

| Alert instance               | Value |
| ---------------------------- | ----- |
| `{Host="web1", Disk="/etc"}` | 3     |
| `{Host="web2", Disk="/var"}` | 4     |
| `{Host="web3", Disk="/var"}` | 8     |

Finally, an alert condition that checks for less than 5% of free space (`$A < 5`) would result in two alert instances firing:

| Alert instance               | Value | State  |
| ---------------------------- | ----- | ------ |
| `{Host="web1", Disk="/etc"}` | 3     | Firing |
| `{Host="web2", Disk="/var"}` | 4     | Firing |
| `{Host="web3", Disk="/var"}` | 8     | Normal |
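The row-to-instance transformation in these tables can be sketched as follows: non-numeric columns become labels, the numeric column becomes the value, and the `$A < 5` condition decides each state. This is a Python illustration of the model, not Grafana's code:

```python
# Rows mirror the DiskSpace query result above.
rows = [
    {"Host": "web1", "Disk": "/etc", "PercentFree": 3},
    {"Host": "web2", "Disk": "/var", "PercentFree": 4},
    {"Host": "web3", "Disk": "/var", "PercentFree": 8},
]

instances = []
for row in rows:
    # Non-numeric columns supply the label set.
    labels = {name: value for name, value in row.items() if name != "PercentFree"}
    value = row["PercentFree"]                    # the single numeric column
    state = "Firing" if value < 5 else "Normal"   # the condition: $A < 5
    instances.append((labels, value, state))

# Two instances fire (web1 and web2); web3 stays Normal.
```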

## Try it with TestData

To test this quickly, you can simulate the table using the [**TestData** data source](ref:testdata-data-source):

1. Add the **TestData** data source through the **Connections** menu.
1. Go to **Alerting** and create an alert rule.
1. Select **TestData** as the data source.
1. From **Scenario**, select **CSV Content** and paste this CSV:

   ```csv
   host, disk, percentFree
   web1, /etc, 3
   web2, /var, 4
   web3, /var, 8
   ```

1. Set a condition like `$A < 5` and **Preview** the alert.

Grafana evaluates the table data and fires the first two alert instances.

{{< figure src="/media/docs/alerting/example-table-data-preview.png" max-width="750px" alt="Alert preview with tabular data using the TestData data source" >}}

## Differences with time series data

Working with time series is similar—each series is treated as a separate alert instance, based on its label set.

The key difference is the data format:

- **Time series data** contains multiple values over time, each with its own timestamp. To evaluate the alert condition, alert rules **must reduce each series to a single number** using a function like `last()`, `avg()`, or `max()`.
- **Tabular data** doesn’t require reduction, as each row contains only a single numeric value used to evaluate the alert condition.

For comparison, see the [multi-dimensional time series data example](ref:multi-dimensional-example).