mirror of https://github.com/grafana/grafana
Docs: Cleanup alerting documentation, part 1 (#40737)
* First commit.
* Adding shared content.
* More changes.
* More changes
* Updated few more topics, fixed broken relrefs.
* Checking in changes.
* Some more topics scrubbed.
* Minor update.
* Few more changes.
* Index pages are finally somewhat sorted. Added relevant information and new topics.
* Updated Alert grouping.
* Last bunch of changes for today.
* Updated folder names, relrefs, and some topic weights.
* Fixed typo in L37, notifications topic.
* Fixed another typo.
* Run prettier.
* Fixed remaining broken relrefs.
* Minor reorg, added link to basics some overview topic.
* Some more re-org of the basics section.
* Some more changes.
* More changes.
* Update docs/sources/shared/alerts/grafana-managed-alerts.md Co-authored-by: Fiona Artiaga <89225282+GrafanaWriter@users.noreply.github.com>
* Update docs/sources/alerting/unified-alerting/_index.md Co-authored-by: Eve Meelan <81647476+Eve832@users.noreply.github.com>
* Update docs/sources/alerting/unified-alerting/_index.md Co-authored-by: Eve Meelan <81647476+Eve832@users.noreply.github.com>
* Update docs/sources/alerting/unified-alerting/opt-in.md Co-authored-by: Eve Meelan <81647476+Eve832@users.noreply.github.com>
* Update docs/sources/alerting/unified-alerting/notification-policies.md Co-authored-by: Eve Meelan <81647476+Eve832@users.noreply.github.com>
* Update docs/sources/alerting/unified-alerting/alert-groups.md Co-authored-by: Eve Meelan <81647476+Eve832@users.noreply.github.com>
* Update docs/sources/alerting/unified-alerting/alerting-rules/_index.md Co-authored-by: Fiona Artiaga <89225282+GrafanaWriter@users.noreply.github.com>
* Update docs/sources/alerting/unified-alerting/alerting-rules/alert-annotation-label.md Co-authored-by: Eve Meelan <81647476+Eve832@users.noreply.github.com>
* Update docs/sources/alerting/unified-alerting/alerting-rules/alert-annotation-label.md Co-authored-by: Fiona Artiaga <89225282+GrafanaWriter@users.noreply.github.com>
* Update docs/sources/alerting/unified-alerting/alerting-rules/alert-annotation-label.md Co-authored-by: Fiona Artiaga <89225282+GrafanaWriter@users.noreply.github.com>
* Update docs/sources/alerting/unified-alerting/alerting-rules/create-cortex-loki-managed-recording-rule.md Co-authored-by: Fiona Artiaga <89225282+GrafanaWriter@users.noreply.github.com>
* Ran prettier and applied suggestion from code review.
* Update docs/sources/alerting/unified-alerting/message-templating/_index.md Co-authored-by: Fiona Artiaga <89225282+GrafanaWriter@users.noreply.github.com>
* Update docs/sources/alerting/unified-alerting/contact-points.md Co-authored-by: Fiona Artiaga <89225282+GrafanaWriter@users.noreply.github.com>
* Update docs/sources/alerting/unified-alerting/contact-points.md Co-authored-by: Fiona Artiaga <89225282+GrafanaWriter@users.noreply.github.com>
* Change from code review. Also fixed typo "bos" in playlist topic.
* Ran prettier to fix formatting issues.
* Update docs/sources/alerting/unified-alerting/alerting-rules/edit-cortex-loki-namespace-group.md Co-authored-by: Fiona Artiaga <89225282+GrafanaWriter@users.noreply.github.com>
* Update docs/sources/alerting/unified-alerting/contact-points.md Co-authored-by: Fiona Artiaga <89225282+GrafanaWriter@users.noreply.github.com>
* Update docs/sources/alerting/unified-alerting/basics/alertmanager.md Co-authored-by: Fiona Artiaga <89225282+GrafanaWriter@users.noreply.github.com>
* Update docs/sources/alerting/unified-alerting/basics/alertmanager.md Co-authored-by: Fiona Artiaga <89225282+GrafanaWriter@users.noreply.github.com>
* Update docs/sources/alerting/unified-alerting/basics/evaluate-grafana-alerts.md Co-authored-by: Fiona Artiaga <89225282+GrafanaWriter@users.noreply.github.com>
* Update docs/sources/alerting/unified-alerting/contact-points.md Co-authored-by: Fiona Artiaga <89225282+GrafanaWriter@users.noreply.github.com>
* More changes from code review.
* Replaced drop down with drop-down
* Fix broken relrefs
* Update docs/sources/alerting/unified-alerting/alerting-rules/create-cortex-loki-managed-rule.md Co-authored-by: Eve Meelan <81647476+Eve832@users.noreply.github.com>
* Update docs/sources/alerting/unified-alerting/alerting-rules/rule-list.md Co-authored-by: Eve Meelan <81647476+Eve832@users.noreply.github.com>
* Few more.
* Couple more.

Co-authored-by: Fiona Artiaga <89225282+GrafanaWriter@users.noreply.github.com>
Co-authored-by: Eve Meelan <81647476+Eve832@users.noreply.github.com>

pull/40906/head
parent
c9654c4bc0
commit
14225b07b2
@@ -1,26 +0,0 @@
+++
title = "What's New with Grafana 8 alerts"
description = "What's New with Grafana 8 Alerts"
keywords = ["grafana", "alerting", "guide"]
weight = 112
+++

# What's New with Grafana 8 alerts

The alerts released with Grafana 8.0 centralizes alerting information for Grafana managed alerts and alerts from Prometheus-compatible datasources in one UI and API. You can create and edit alerting rules for Grafana managed alerts, Cortex alerts, and Loki alerts as well as see alerting information from prometheus-compatible datasources in a single, searchable view.

## Multi-dimensional alerting

Create alerts that will give you system-wide visibility with a single alerting rule. With Grafana 8 alerts, you are able to generate multiple alert instances from a single rule eg. creating a rule to monitor disk usage for multiple mount points on a single host. The evaluation engine is able to return multiple time series from a single query. Each time series is identified by its label set.

## Create alerts outside of Dashboards

Grafana legacy alerts were tied to a dashboard. Grafana 8 Alerts allow you to create queries and expressions that can combine data from multiple sources, in unique ways. You are still able to link dashboards and panels to alerting rules, allowing you to quickly troubleshoot the system under observation, by linking a dashboard and/or panel ID to the alerting rule.

## Create Loki and Cortex alerting rules

With Grafana 8 Alerts you are able to manage your Loki and Cortex alerting rules using the same UI and API as your Grafana managed alerts.

## View and search for alerts from Prometheus

You can now display all of your alerting information in one, searchable UI. Alerts for Prometheus compatible datasources are listed below Grafana managed alerts. Search for labels across multiple datasources to quickly find all of the relevant alerts.
@@ -0,0 +1,60 @@
+++
title = "Annotations and labels for alerting rules"
description = "Annotations and labels for alerting"
keywords = ["grafana", "alerting", "guide", "rules", "create"]
weight = 401
+++

# Annotations and labels for alerting rules

Annotations and labels help customize alert messages so that you can quickly identify the service or application that needs attention.

## Annotations

Annotations are key-value pairs that provide additional meta-information about an alert, for example, a description, a summary, or a runbook URL. They are displayed in rule and alert details in the UI and can be used in contact type message templates. Annotations can also be templated; for example, `Instance {{ $labels.instance }} down` will have the evaluated `instance` label value added for every alert this rule produces.
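
As a sketch, the annotations on such a rule might look like the following (keys and values here are hypothetical; only `summary` uses templating):

```
summary     = Instance {{ $labels.instance }} down
description = The instance has stopped reporting metrics and may be offline.
runbook_url = https://example.com/runbooks/instance-down
```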

## Labels

Labels are key-value pairs that categorize or identify an alert. Labels are used to match alerts in silences, or to match and group alerts in notification policies. Labels are also shown in rule and alert details in the UI and can be used in contact type message templates. For example, you can add a `severity` label and then configure a separate notification policy for each severity. You can also add a `team` label and configure notification policies specific to the team, or silence all alerts for a particular team. Labels can be templated like annotations; for example, `{{ $labels.namespace }}/{{ $labels.job }}` produces a new rule label whose value is built from the evaluated `namespace` and `job` label values for every alert this rule produces. The rule labels take precedence over the labels produced by the query or condition.
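
For instance, a rule might carry a mix of static and templated labels along these lines (all names are illustrative):

```
severity = critical
team     = backend
service  = {{ $labels.namespace }}/{{ $labels.job }}
```

A notification policy or silence can then match on `severity` or `team`, while `service` is derived from the labels returned by the query.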

{{< figure src="/static/img/docs/alerting/unified/rule-edit-details-8-0.png" max-width="550px" caption="Alert details" >}}

#### Template variables

The following template variables are available when expanding annotations and labels.

| Name    | Description |
| ------- | ----------- |
| $labels | The labels from the query or condition. For example, `{{ $labels.instance }}` and `{{ $labels.job }}`. This is unavailable when the rule uses a classic condition. |
| $values | The values of all reduce and math expressions that were evaluated for this alert rule. For example, `{{ $values.A }}`, `{{ $values.A.Labels }}` and `{{ $values.A.Value }}` where `A` is the `refID` of the expression. This is unavailable when the rule uses a [classic condition]({{< relref "./create-grafana-managed-rule/#single-and-multi-dimensional-rule" >}}). |
| $value  | The value string of the alert instance. For example, `[ var='A' labels={instance=foo} value=10 ]`. |
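
To make this concrete, a `summary` annotation might combine these variables. A minimal sketch, assuming the query returns an `instance` label and the rule has a reduce expression with refID `B`:

```
CPU usage on {{ $labels.instance }} is {{ $values.B.Value }} (threshold 80)
```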

#### Template functions

The following template functions are available when expanding annotations and labels.

| Name               | Argument                   | Return                 | Description |
| ------------------ | -------------------------- | ---------------------- | ----------- |
| humanize           | number or string           | string                 | Converts a number to a more readable format, using metric prefixes. |
| humanize1024       | number or string           | string                 | Like humanize, but uses 1024 as the base rather than 1000. |
| humanizeDuration   | number or string           | string                 | Converts a duration in seconds to a more readable format. |
| humanizePercentage | number or string           | string                 | Converts a ratio value to a fraction of 100. |
| humanizeTimestamp  | number or string           | string                 | Converts a Unix timestamp in seconds to a more readable format. |
| title              | string                     | string                 | strings.Title, capitalises first character of each word. |
| toUpper            | string                     | string                 | strings.ToUpper, converts all characters to upper case. |
| toLower            | string                     | string                 | strings.ToLower, converts all characters to lower case. |
| match              | pattern, text              | boolean                | regexp.MatchString Tests for an unanchored regexp match. |
| reReplaceAll       | pattern, replacement, text | string                 | Regexp.ReplaceAllString Regexp substitution, unanchored. |
| graphLink          | expr                       | string                 | Not supported |
| tableLink          | expr                       | string                 | Not supported |
| args               | []interface{}              | map[string]interface{} | Converts a list of objects to a map with keys, for example, arg0, arg1. Use this function to pass multiple arguments to templates. |
| externalURL        | nothing                    | string                 | Returns a string representing the external URL. |
| pathPrefix         | nothing                    | string                 | Returns the path of the external URL. |
| tmpl               | string, []interface{}      | nothing                | Not supported |
| safeHtml           | string                     | string                 | Not supported |
| query              | query string               | []sample               | Not supported |
| first              | []sample                   | sample                 | Not supported |
| label              | label, sample              | string                 | Not supported |
| strvalue           | []sample                   | string                 | Not supported |
| value              | sample                     | float64                | Not supported |
| sortByLabel        | label, []samples           | []sample               | Not supported |
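
These functions can be combined with the template variables above. A minimal sketch of a `description` annotation, assuming a reduce expression with refID `B` that returns a disk-usage ratio between 0 and 1:

```
Disk usage on {{ $labels.instance }} is {{ humanizePercentage $values.B.Value }}
```

The same pattern applies to `humanize`, `humanizeDuration`, and the other supported functions.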
@@ -1,34 +1,39 @@
+++
-title = "Edit Cortex or Loki rule groups and namespaces"
+title = "Cortex or Loki rule groups and namespaces"
description = "Edit Cortex or Loki rule groups and namespaces"
keywords = ["grafana", "alerting", "guide", "group", "namespace", "cortex", "loki"]
-weight = 400
+weight = 405
+++

-# Edit Cortex or Loki rule groups and namespaces
+# Cortex or Loki rule groups and namespaces

-You can rename Cortex or Loki rule namespaces and groups and edit group evaluation intervals.
+A namespace contains one or more groups. The rules within a group are run sequentially at a regular interval. The default interval is one (1) minute. You can rename Cortex or Loki rule namespaces and groups, and edit group evaluation intervals.

+{{< figure src="/static/img/docs/alerting/unified/rule-list-edit-cortex-loki-icon-8-2.png" max-width="550px" caption="Alert details" >}}

## Rename a namespace

-A namespace contains one or more groups. To rename a namespace, find a group that belongs to the namespace, then update the namespace.
+To rename a namespace:

-1. Hover your cursor over the Alerting (bell) icon in the side menu.
+1. In the Grafana menu, click the **Alerting** (bell) icon to open the Alerting page listing existing alerts.
-1. Locate a group that belongs to the namespace you want to edit and click the edit (pen) icon.
+1. Find a Cortex or Loki managed rule with the group that belongs to the namespace you want to edit.
+1. Click the **Edit** (pen) icon.
1. Enter a new name in the **Namespace** field, then click **Save changes**.

A new namespace is created and all groups are copied into this namespace from the old one. The old namespace is deleted.

-## Rename rule group or change rule group evaluation interval
+## Rename rule group or change the rule group evaluation interval

The rules within a group are run sequentially at a regular interval, the default interval is one (1) minute. You can modify this interval using the following instructions.

-1. Hover your cursor over the Alerting (bell) icon in the side menu.
+1. In the Grafana menu, click the **Alerting** (bell) icon to open the Alerting page listing existing alerts.
-1. Find the group you want to edit and click the edit (pen) icon.
+1. Find a Cortex or Loki managed rule with the group you want to edit.
+1. Click the **Edit** (pen) icon.
1. Modify the **Rule group** and **Rule group evaluation interval** information as necessary.
1. Click **Save changes**.

-If you remaned the group, a new group is created that has all the rules from the old group, and the old group deleted.
+When you rename the group, a new group with all the rules from the old group is created. The old group is deleted.
@@ -1,54 +1,55 @@
+++
-title = "View alert rules"
+title = "Manage alerting rules"
-description = "View alert rules"
+description = "Manage alerting rules"
keywords = ["grafana", "alerting", "guide", "rules", "view"]
-weight = 400
+weight = 402
+++

-# View alert rules
+# Manage alerting rules

-To view alerts:
+The Alerting page lists existing Grafana 8 alerting rules. By default, rules are grouped by types of data sources. The Grafana section lists all Grafana managed rules. Alerting rules for Prometheus compatible data sources are also listed here. You can view alerting rules for Prometheus compatible data sources but you cannot edit them.

-1. In the Grafana menu hover your cursor over the Alerting (bell) icon.
+The Cortex/Loki rules section lists all rules for external Prometheus or Loki data sources. Cloud alerting rules are also listed in this section.
-1. Click **Alert Rules**. You can see all configured Grafana alert rules as well as any rules from Loki or Prometheus data sources.
-By default, the group view is shown. You can toggle between group or state views by clicking the relevant **View as** buttons in the options area at the top of the page.

-### Group view
+- [View alerting rules](#view-alerting-rules)
+- [Filter alerting rules](#filter-alerting-rules)
-Group view shows Grafana alert rules grouped by folder and Loki or Prometheus alert rules grouped by `namespace` + `group`. This is the default rule list view, intended for managing rules. You can expand each group to view a list of rules in this group. Each rule can be further expanded to view its details. Action buttons and any alerts spawned by this rule, and each alert can be further expanded to view its details.
+- [Edit or delete an alerting rule](#edit-or-delete-an-alerting-rule)

-
+## View alerting rules

-### State view
+To view alerting details:

-State view shows alert rules grouped by state. Use this view to get an overview of which rules are in what state. Each rule can be expanded to view its details. Action buttons and any alerts spawned by this rule, and each alert can be further expanded to view its details.
+1. In the Grafana menu, click the **Alerting** (bell) icon to open the Alerting page. By default, the group view displays.
+1. In **View as**, toggle between group or state views by clicking the relevant option. See [Group view](#group-view) and [State view](#state-view) for more information.
+1. Expand the rule row to view the rule labels, annotations, data sources the rule queries, and a list of alert instances resulting from this rule.

-
+{{< figure src="/static/img/docs/alerting/unified/rule-details-8-0.png" max-width="650px" caption="Alerting rule details" >}}

-## Filter alert rules
+### Group view

-You can use the following filters to view only alert rules that match specific criteria:
+Group view shows Grafana alert rules grouped by folder and Loki or Prometheus alert rules grouped by `namespace` + `group`. This is the default rule list view, intended for managing rules. You can expand each group to view a list of rules in this group. Expand a rule further to view its details. You can also expand action buttons and alerts resulting from the rule to view their details.

-- **Filter alerts by label -** Search by alert labels using label selectors in the **Search** input. eg: `environment=production,region=~US|EU,severity!=warning`
+{{< figure src="/static/img/docs/alerting/unified/rule-list-group-view-8-0.png" max-width="800px" caption="Alerting grouped view" >}}
-- **Filter alerts by state -** In **States** Select which alert states you want to see. All others are hidden.
-- **Filter alerts by data source -** Click the **Select data source** and select an alerting data source. Only alert rules that query selected data source will be visible.

-## Rule details
+### State view

-A rule row shows the rule state, health, and summary annotation if the rule has one. You can expand the rule row to display rule labels, all annotations, data sources this rule queries, and a list of alert instances spawned from this rule.
+State view shows alert rules grouped by state. Use this view to get an overview of which rules are in what state. Each rule can be expanded to view its details. You can also expand action buttons and alerts generated by the rule to view their details.

-
+{{< figure src="/static/img/docs/alerting/unified/rule-list-state-view-8-0.png" max-width="800px" caption="Alerting state view" >}}

-### Edit or delete rule
+## Filter alerting rules

-Grafana rules can only be edited or deleted by users with Edit permissions for the folder which contains the rule. Prometheus or Loki rules can be edited or deleted by users with Editor or Admin roles.
+To filter alerting rules:

-To edit or delete a rule:
+- From **Select data sources**, select a data source. You can see alerting rules that query the selected data source.
+- In **Search by label**, enter search criteria using label selectors. For example, `environment=production,region=~US|EU,severity!=warning`.
+- From **Filter alerts by state**, select an alerting state you want to see. You can see alerting rules that match the state. Rules matching other states are hidden.
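
For reference, the label selector above is a comma-separated list of matchers. A few illustrative patterns in the same style (label names and values are hypothetical):

```
environment=production
severity!=warning
region=~US|EU
environment=production,region=~US|EU,severity!=warning
```

Here `=` requires an exact value, `!=` excludes a value, and `=~` matches a regular expression.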

-1. Expand this rule to reveal rule controls.
+## Edit or delete an alerting rule
-1. Click **Edit** to go to the rule editing form. Make changes following [instructions listed here]({{< relref "./create-grafana-managed-rule.md" >}}).
-1. Click **Delete"** to delete a rule.

-## Opt-out a Loki or Prometheus data source
+Grafana managed alerting rules can only be edited or deleted by users with Edit permissions for the folder storing the rules. Alerting rules for an external Cortex or Loki instance can be edited or deleted by users with Editor or Admin roles.
+To edit or delete a rule:

-If you do not want rules to be loaded from a Prometheus or Loki data source, go to its settings page and clear the **Manage alerts via Alerting UI** checkbox.
+1. Expand a rule row until you can see the rule controls of **View**, **Edit**, and **Delete**.
+1. Click **Edit** to open the create rule page. Make updates following instructions in [Create a Grafana managed alerting rule]({{< relref "./create-grafana-managed-rule.md" >}}) or [Create a Cortex or Loki managed alerting rule]({{< relref "./create-cortex-loki-managed-rule.md" >}}).
+1. Click **Delete** to delete a rule.
@@ -1,35 +0,0 @@
+++
title = "State and Health of alerting rules"
description = "State and Health of alerting rules"
keywords = ["grafana", "alerting", "guide", "state"]
+++

# State and Health of alerting rule

The concepts of state and health for alerting rules help you understand, at a glance, several key status indicators about your alerts. Alert state, alerting rule state, and alerting rule health are related, but they each convey subtly different information.

## Alerting rule state

Indicates whether any of the timeseries resulting from evaluation of the alerting rule are in an alerting state. Alerting rule state only requires a single alerting instance to be in a pending or firing state for the alerting rule state to not be normal.

- Normal: none of the timeseries returned are in an alerting state.
- Pending: at least one of the timeseries returned are in a pending state.
- Firing: at least one of the timeseries returned are in an alerting state.

## Alert state

Alert state is an indication of the output of the alerting evaluation engine.

- Normal: the condition for the alerting rule has evaluated to **false** for every timeseries returned by the evaluation engine.
- Alerting: the condition for the alerting rule has evaluated to **true** for at least one timeseries returned by the evaluation engine and the duration, if set, **has** been met or exceeded.
- Pending: the condition for the alerting rule has evaluated to **true** for at least one timeseries returned by the evaluation engine and the duration, if set, **has not** been met or exceeded.
- NoData: the alerting rule has not returned a timeseries, all values for the timeseries are null, or all values for the timeseries are zero.
- Error: There was an error encountered when attempting to evaluate the alerting rule.

## Alerting rule health

Indicates the status of alerting rule evaluation.

- Ok: the rule is being evaluated, data is being returned, and no errors have been encountered.
- Error: an error was encountered when evaluating the alerting rule.
- NoData: at least one of the timeseries returned during evaluation is in a NoData state.
@@ -0,0 +1,26 @@
+++
title = "What's new in Grafana 8 alerting"
description = "What's New with Grafana 8 Alerts"
keywords = ["grafana", "alerting", "guide"]
weight = 114
+++

# What's new in Grafana 8 alerting

Grafana 8.0 alerting has several enhancements over legacy dashboard alerting.

## Multi-dimensional alerting

You can now create alerts that give you system-wide visibility with a single alerting rule. Generate multiple alert instances from a single alert rule. For example, you can create a rule to monitor the disk usage of multiple mount points on a single host. The evaluation engine returns multiple time series from a single query, with each time series identified by its label set.

## Create alerts outside of Dashboards

Unlike legacy dashboard alerts, Grafana 8 alerts allow you to create queries and expressions that combine data from multiple sources in unique ways. You can still link dashboards and panels to alerting rules using their ID and quickly troubleshoot the system under observation.

## Create Loki and Cortex alerting rules

In Grafana 8 alerting, you can manage Loki and Cortex alerting rules using the same UI and API as your Grafana managed alerts.

## View and search for alerts from Prometheus compatible data sources

Alerts for Prometheus compatible data sources are now listed under the Grafana alerts section. You can search for labels across multiple data sources to quickly find relevant alerts.
@@ -0,0 +1,13 @@
+++
title = "Alerting fundamentals"
aliases = ["/docs/grafana/latest/alerting/metrics/"]
weight = 120
+++

# Alerting fundamentals

This section covers the fundamental concepts of Grafana 8 alerting.

- [Alertmanager]({{< relref "./alertmanager.md" >}})
- [State and health of alerting rules]({{< relref "./state-and-health.md" >}})
- [Evaluating Grafana managed alerts]({{< relref "./evaluate-grafana-alerts.md" >}})
@@ -0,0 +1,17 @@
+++
title = "Alertmanager"
aliases = ["/docs/grafana/latest/alerting/metrics/"]
weight = 116
+++

# Alertmanager

The Alertmanager helps both group and manage alert rules, adding a layer of orchestration on top of the alerting engines. To learn more, see [Prometheus Alertmanager documentation](https://prometheus.io/docs/alerting/latest/alertmanager/).

Grafana includes built-in support for Prometheus Alertmanager. By default, notifications for Grafana managed alerts are handled by the embedded Alertmanager that is part of core Grafana. You can configure the Alertmanager's contact points, notification policies, silences, and templates from the alerting UI by selecting the `Grafana` option from the Alertmanager drop-down.

> **Note:** Before v8.2, the configuration of the embedded Alertmanager was shared across organizations. If you are on an older Grafana version, we recommend that you use Grafana 8 Alerts only if you have one organization. Otherwise, your contact points are visible to all organizations.

Grafana 8 alerting added support for external Alertmanager configuration. When you add an [Alertmanager data source]({{< relref "../../../datasources/alertmanager.md" >}}), the Alertmanager drop-down shows a list of available external Alertmanager data sources. Select a data source to create and manage alerting for standalone Cortex or Loki data sources.

{{< figure src="/static/img/docs/alerting/unified/contact-points-select-am-8-0.gif" max-width="250px" caption="Select Alertmanager" >}}
@@ -0,0 +1,95 @@
+++
title = "Alerting on numeric data"
aliases = ["/docs/grafana/latest/alerting/metrics/"]
weight = 116
+++

# Alerting on numeric data

This topic describes how Grafana managed alerts are evaluated by the backend engine, as well as how Grafana handles alerting on numeric rather than time series data.

- [Alert evaluation](#alert-evaluation)
- [Alerting on numeric data](#alerting-on-numeric-data)

## Alert evaluation

Grafana managed alerts query the following backend data sources that have alerting enabled:

- built-in data sources or those developed and maintained by Grafana: `Graphite`, `Prometheus`, `Loki`, `InfluxDB`, `Elasticsearch`,
  `Google Cloud Monitoring`, `Cloudwatch`, `Azure Monitor`, `MySQL`, `PostgreSQL`, `MSSQL`, `OpenTSDB`, `Oracle`, and `Azure Data Explorer`
- community developed backend data sources with alerting enabled (`backend` and `alerting` properties are set in the [plugin.json]({{< relref "../../../developers/plugins/metadata.md" >}}))

### Metrics from the alerting engine

The alerting engine publishes some internal metrics about itself. You can read more about how Grafana publishes [internal metrics]({{< relref "../../../administration/view-server/internal-metrics.md" >}}). See also, [View alert rules and their current state]({{< relref "../alerting-rules/rule-list.md" >}}).

| Metric Name                                        | Type      | Description |
| -------------------------------------------------- | --------- | ----------- |
| `grafana_alerting_alerts`                          | gauge     | How many alerts by state |
| `grafana_alerting_request_duration`                | histogram | Histogram of requests to the Alerting API |
| `grafana_alerting_active_configurations`           | gauge     | The number of active, non-default Alertmanager configurations for Grafana managed alerts |
| `grafana_alerting_rule_evaluations_total`          | counter   | The total number of rule evaluations |
| `grafana_alerting_rule_evaluation_failures_total`  | counter   | The total number of rule evaluation failures |
| `grafana_alerting_rule_evaluation_duration`        | summary   | The duration for a rule to execute |
| `grafana_alerting_rule_group_rules`                | gauge     | The number of rules |

## Alerting on numeric data

For certain data sources, numeric data that is not time series can be alerted on directly, or passed into Server Side Expressions (SSE). This allows for more processing and resulting efficiency within the data source, and it can also simplify alert rules.
When alerting on numeric data instead of time series data, there is no need to reduce each labeled time series into a single number. Instead, labeled numbers are returned to Grafana.

### Tabular Data

This feature is supported with backend data sources that query tabular data:

- SQL data sources such as MySQL, Postgres, MSSQL, and Oracle.
- The Azure Kusto based services: Azure Monitor (Logs), Azure Monitor (Azure Resource Graph), and Azure Data Explorer.

A query with Grafana managed alerts or SSE is considered numeric with these data sources if:

- The "Format AS" option is set to "Table" in the data source query.
- The table response returned to Grafana from the query includes only one numeric (e.g. int, double, float) column, and optionally additional string columns.

If there are string columns, those columns become labels. The name of the column becomes the label name, and the value for each row becomes the value of the corresponding label. If multiple rows are returned, then each row must be uniquely identified by its labels.

### Example

For a MySQL table called "DiskSpace":

| Time        | Host | Disk | PercentFree |
| ----------- | ---- | ---- | ----------- |
| 2021-June-7 | web1 | /etc | 3           |
| 2021-June-7 | web2 | /var | 4           |
| 2021-June-7 | web3 | /var | 8           |
| ...         | ...  | ...  | ...         |

You can query the data filtering on time, but without returning the time series to Grafana. For example, an alert that triggers per Host and Disk when there is less than 5% free space:

```sql
-- Average free space per Host and Disk within the dashboard time range,
-- then map healthy rows to 0 so only low-disk rows carry a non-zero value.
SELECT
  Host,
  Disk,
  CASE WHEN PercentFree < 5.0 THEN PercentFree ELSE 0 END AS PercentFree
FROM (
  SELECT
    Host,
    Disk,
    Avg(PercentFree) AS PercentFree
  FROM DiskSpace
  WHERE $__timeFilter(Time)
  GROUP BY
    Host,
    Disk
) AS avg_disk
```

This query returns the following Table response to Grafana:

| Host | Disk | PercentFree |
| ---- | ---- | ----------- |
| web1 | /etc | 3           |
| web2 | /var | 4           |
| web3 | /var | 0           |

When this query is used as the **condition** in an alert rule, the rows with a non-zero value are alerting. As a result, three alert instances are produced:

| Labels                | Status   |
| --------------------- | -------- |
| {Host=web1,disk=/etc} | Alerting |
| {Host=web2,disk=/var} | Alerting |
| {Host=web3,disk=/var} | Normal   |
@@ -0,0 +1,30 @@
+++
title = "State and health of alerting rules"
description = "State and Health of alerting rules"
keywords = ["grafana", "alerting", "guide", "state"]
aliases = ["/docs/grafana/latest/alerting/unified-alerting/alerting-rules/state-and-health/"]
+++

# State and health of alerting rules

The state and health of alerting rules help you understand several key status indicators about your alerts. There are three key components: alert state, alerting rule state, and alerting rule health. Although related, each component conveys subtly different information.

## Alerting rule state

- **Normal**: None of the time series returned by the evaluation engine is in a Pending or Firing state.
- **Pending**: At least one time series returned by the evaluation engine is Pending.
- **Firing**: At least one time series returned by the evaluation engine is Firing.

## Alert state

- **Normal**: Condition for the alerting rule is **false** for every time series returned by the evaluation engine.
- **Alerting**: Condition of the alerting rule is **true** for at least one time series returned by the evaluation engine. The duration for which the condition must be true before an alert fires, if set, has been met or exceeded.
- **Pending**: Condition of the alerting rule is **true** for at least one time series returned by the evaluation engine. The duration for which the condition must be true before an alert fires, if set, **has not** been met.
- **NoData**: The alerting rule has not returned a time series, all values for the time series are null, or all values for the time series are zero.
- **Error**: Error when attempting to evaluate an alerting rule.

## Alerting rule health

- **Ok**: No error when evaluating an alerting rule.
- **Error**: Error when evaluating an alerting rule.
- **NoData**: The absence of data in at least one time series returned during a rule evaluation.
@@ -1,67 +0,0 @@
+++
title = "Grafana managed alert rules for numeric data"
description = "Grafana managed alert rules for numeric data"
keywords = ["grafana", "alerting", "guide", "rules", "create"]
weight = 400
+++

# Alerting on numeric data

Among certain data sources numeric data that is not time series can be directly alerted on, or passed into Server Side Expressions (SSE). This allows for more processing and resulting efficiency within the data source, and it can also simplify alert rules.
When alerting on numeric data instead of time series data, there is no need to reduce each labeled time series into a single number. Instead labeled numbers are returned to Grafana instead.

## Tabular Data

This feature is supported with backend data sources that query tabular data:

- SQL data sources such as MySQL, Postgres, MSSQL, and Oracle.
- The Azure Kusto based services: Azure Monitor (Logs), Azure Monitor (Azure Resource Graph), and Azure Data Explorer.

A query with Grafana managed alerts or SSE is considered numeric with these data sources, if:

- The "Format AS" option is set to "Table" in the data source query.
- The table response returned to Grafana from the query includes only one numeric (e.g. int, double, float) column, and optionally additional string columns.

If there are string columns then those columns become labels. The name of column becomes the label name, and the value for each row becomes the value of the corresponding label. If multiple rows are returned, then each row should be uniquely identified their labels.

## Example

For a MySQL table called "DiskSpace":

| Time        | Host | Disk | PercentFree |
| ----------- | ---- | ---- | ----------- |
| 2021-June-7 | web1 | /etc | 3           |
| 2021-June-7 | web2 | /var | 4           |
| 2021-June-7 | web3 | /var | 8           |
| ...         | ...  | ...  | ...         |

You can query the data filtering on time, but without returning the time series to Grafana. For example, an alert that would trigger per Host, Disk when there is less than 5% free space:

```sql
SELECT Host, Disk, CASE WHEN PercentFree < 5.0 THEN PercentFree ELSE 0 END FROM (
SELECT
Host,
Disk,
Avg(PercentFree)
FROM DiskSpace
Group By
Host,
Disk
Where __timeFilter(Time)
```

This query returns the following Table response to Grafana:

| Host | Disk | PercentFree |
| ---- | ---- | ----------- |
| web1 | /etc | 3           |
| web2 | /var | 4           |
| web3 | /var | 0           |

When this query is used as the **condition** in an alert rule, then the non-zero will be alerting. As a result, three alert instances are produced:

| Labels                | Status   |
| --------------------- | -------- |
| {Host=web1,disk=/etc} | Alerting |
| {Host=web2,disk=/var} | Alerting |
| {Host=web3,disk=/var} | Normal   |
@@ -0,0 +1,31 @@
---
title: Grafana managed alerts
---

## Clustering

The current alerting system doesn't support high availability. Alert notifications are not deduplicated and load balancing is not supported between instances; for example, silences from one instance will not appear in the other.

## Alert evaluation

Grafana managed alerts are evaluated by the Grafana backend. Rule evaluations are scheduled, according to the alert rule configuration, and queries are evaluated by an engine that is part of core Grafana.

Alerting rules can only query backend data sources with alerting enabled:

- built-in or developed and maintained by Grafana: `Graphite`, `Prometheus`, `Loki`, `InfluxDB`, `Elasticsearch`,
  `Google Cloud Monitoring`, `Cloudwatch`, `Azure Monitor`, `MySQL`, `PostgreSQL`, `MSSQL`, `OpenTSDB`, `Oracle`, and `Azure Data Explorer`
- any community backend data sources with alerting enabled (`backend` and `alerting` properties are set in the [plugin.json]({{< relref "../../developers/plugins/metadata.md" >}}))

## Metrics from the alerting engine

The alerting engine publishes some internal metrics about itself. You can read more about how Grafana publishes [internal metrics]({{< relref "../../administration/view-server/internal-metrics.md" >}}). See also, [View alert rules and their current state]({{< relref "../../alerting/old-alerting/view-alerts.md" >}}).

| Metric Name                                 | Type      | Description |
| ------------------------------------------- | --------- | ----------- |
| `alerting.alerts`                           | gauge     | How many alerts by state |
| `alerting.request_duration_seconds`         | histogram | Histogram of requests to the Alerting API |
| `alerting.active_configurations`            | gauge     | The number of active, non-default Alertmanager configurations for Grafana managed alerts |
| `alerting.rule_evaluations_total`           | counter   | The total number of rule evaluations |
| `alerting.rule_evaluation_failures_total`   | counter   | The total number of rule evaluation failures |
| `alerting.rule_evaluation_duration_seconds` | summary   | The duration for a rule to execute |
| `alerting.rule_group_rules`                 | gauge     | The number of rules |