---
Feedback Link: https://github.com/grafana/tutorials/issues/new
categories:
  - alerting
description: Create alerts using Prometheus data and link them to your visualizations.
labels:
  products:
    - enterprise
    - oss
    - cloud
tags:
  - beginner
title: Get started with Grafana Alerting - Link alerts to visualizations
weight: 67
killercoda:
  title: Get started with Grafana Alerting - Link alerts to visualizations
  description: Create alerts using Prometheus data and link them to your visualizations.
  backend:
    imageid: ubuntu
---

This tutorial is a continuation of the Get started with Grafana Alerting - Route alerts using dynamic labels tutorial.

In this tutorial you will learn how to:

  • Link alert rules to time series panels for better visualization
  • View alert annotations directly on dashboards for better context
  • Write Prometheus queries

Before you begin

You can complete this tutorial in either of two environments:

  • Interactive learning environment: follow the tutorial in a pre-configured sandbox, with nothing to install locally.

  • Grafana OSS: if you opt to run the Grafana stack locally, ensure you have the following applications installed:

    • Docker Compose (included in Docker Desktop for macOS and Windows)
    • Git

Set up the Grafana stack

To observe data using the Grafana stack, download and run the following files.

  1. Clone the tutorial environment repository.

    git clone https://github.com/tonypowa/grafana-prometheus-alerting-demo.git
    
  2. Change to the directory where you cloned the repository:

    cd grafana-prometheus-alerting-demo
    
  3. Build the Grafana stack:

    docker compose build
    

    {{< docs/ignore >}}

    docker-compose build
    

    {{< /docs/ignore >}}

  4. Bring up the containers:

    docker compose up -d
    

    {{< docs/ignore >}}

    docker-compose up -d
    

    {{< /docs/ignore >}}

    The first time you run docker compose up -d, Docker downloads all the necessary resources for the tutorial. This might take a few minutes, depending on your internet connection.

{{< admonition type="note" >}} If you already have Grafana, Loki, or Prometheus running on your system, you might see errors, because the Docker image is trying to use ports that your local installations are already using. If this is the case, stop the services, then run the command again. {{< /admonition >}}

{{< docs/ignore >}}

NOTE:

If you already have Grafana, Loki, or Prometheus running on your system, you might see errors, because the Docker image is trying to use ports that your local installations are already using. If this is the case, stop the services, then run the command again.

{{< /docs/ignore >}}

Use case: monitoring and alerting for system health with Prometheus and Grafana

In this use case, we focus on monitoring the system's CPU, memory, and disk usage. The demo app launches a stack that includes a Python script to simulate metrics, which Prometheus collects and Grafana displays in a time series visualization.

The script simulates random CPU and memory usage values (10% to 100%) every 10 seconds and exposes them as Prometheus metrics.
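
The demo repository contains the actual simulation script; the following is a minimal sketch of the idea, using the prometheus_client library. The metric names match the ones queried later in this tutorial, while the port (8000) and the update loop are illustrative assumptions rather than the demo's exact configuration.

    # Minimal sketch of the metric simulator (illustrative; the cloned repo
    # contains the real script used by the demo stack).
    import random
    import time

    from prometheus_client import Gauge, start_http_server

    # Gauges named after the metrics queried later in this tutorial.
    cpu_usage = Gauge("flask_app_cpu_usage", "Simulated CPU usage in percent")
    memory_usage = Gauge("flask_app_memory_usage", "Simulated memory usage in percent")

    if __name__ == "__main__":
        # Expose a /metrics endpoint for Prometheus to scrape.
        # Port 8000 is an assumption; the demo stack defines its own ports.
        start_http_server(8000)
        while True:
            # Random values between 10% and 100%, refreshed every 10 seconds.
            cpu_usage.set(random.uniform(10, 100))
            memory_usage.set(random.uniform(10, 100))
            time.sleep(10)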

Objective

You'll build a time series visualization to monitor CPU and memory usage, define alert rules with threshold-based conditions, and link those alerts to your dashboards to display real-time annotations when thresholds are breached.

Step 1: Create a visualization to monitor metrics

To keep track of these metrics, you can set up a visualization for CPU usage and memory consumption. This makes it easier to see how the system is performing.

The time-series visualization supports alert rules to provide more context in the form of annotations and alert rule state. Follow these steps to create a visualization to monitor the application’s metrics.

  1. Log in to Grafana.

  2. Create a time series panel:

    • Navigate to Dashboards.
    • Click + Create dashboard.
    • Click + Add visualization.
    • Select Prometheus as the data source (provided with the demo).
    • Enter a title for your panel, e.g., CPU and Memory Usage.
  3. Add queries for metrics:

    • In the query area, copy and paste the following PromQL query:

      **Switch to Code mode if it is not already selected.**

      flask_app_cpu_usage{instance="flask-prod:5000"}
      
    • Click Run queries.

    This query should display the simulated CPU usage data for the prod environment.

  4. Add memory usage query:

    • Click + Add query.

    • In the query area, paste the following PromQL query:

      flask_app_memory_usage{instance="flask-prod:5000"}
      

    {{< figure src="/media/docs/alerting/cpu-mem-dash.png" max-width="1200px" caption="Time-series panel displaying CPU and memory usage metrics in production." >}}

  5. Click Save dashboard. Name it: cpu-and-memory-metrics.

We have our time-series panel ready. Feel free to combine metrics with labels, such as flask_app_cpu_usage{instance="flask-staging:5000"}, or other labels like deployment.

Step 2: Create alert rules

Follow these steps to manually create alert rules and link them to a visualization.

Create an alert rule for CPU usage

  1. Navigate to Alerts & IRM > Alerting > Alert rules from the Grafana sidebar.
  2. Click + New alert rule to create a new alert.

Enter alert rule name

Make it short and descriptive, as this will appear in your alert notification. For instance, cpu-usage.

Define query and alert condition

  1. Select the Prometheus data source from the drop-down menu.

  2. In the query section, enter the following query:

    **Switch to Code mode if it is not already selected.**

    flask_app_cpu_usage{instance="flask-prod:5000"}
    
  3. Alert condition

    • Enter 75 as the value for WHEN QUERY IS ABOVE to set the threshold for the alert.

    • Click Preview alert rule condition to run the queries.

      {{< figure src="/media/docs/alerting/alert-condition-details-prod.png" max-width="1200px" caption="Preview of a query returning alert instances in Grafana." >}}

    The query returns the CPU usage of the Flask application in the production environment. In this case, the usage is 86.01%, which exceeds the configured threshold of 75%, causing the alert to fire.
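
If you want to see what this condition checks outside the Grafana UI, the sketch below queries the Prometheus HTTP API directly and compares the returned value against the same threshold. It assumes Prometheus is reachable at http://localhost:9090 (adjust the URL and port to match your stack) and uses the requests library.

    # Conceptual check of the alert condition against the Prometheus HTTP API.
    # Assumes Prometheus is reachable at http://localhost:9090.
    import requests

    PROM_URL = "http://localhost:9090/api/v1/query"
    QUERY = 'flask_app_cpu_usage{instance="flask-prod:5000"}'
    THRESHOLD = 75  # the same value entered for WHEN QUERY IS ABOVE

    response = requests.get(PROM_URL, params={"query": QUERY}, timeout=10)
    response.raise_for_status()

    for series in response.json()["data"]["result"]:
        # Each instant-query result carries a [timestamp, value] pair; value is a string.
        value = float(series["value"][1])
        state = "Alerting" if value > THRESHOLD else "Normal"
        instance = series["metric"].get("instance", "unknown")
        print(f"{instance}: {value:.2f}% -> {state}")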

Add folders and labels

  1. In Folder, click + New folder and enter a name. For example: system-metrics. This folder contains our alert rules.

Set evaluation behavior

  1. Click + New evaluation group. Name it system-usage.
  2. Choose an Evaluation interval (how often the alert rule is evaluated). For this tutorial, choose 1m.
  3. Set the pending period to 0s (None), so the alert rule fires the moment the condition is met. This minimizes the waiting time for the demonstration; see the sketch after these steps.
  4. Set Keep firing for to 0s, so the alert stops firing immediately after the condition is no longer true.
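
To make the effect of these settings concrete, here is a small, non-authoritative sketch of how an evaluation interval and pending period interact. It is not Grafana's implementation; it only illustrates why a 0s pending period makes the rule fire on the first evaluation that breaches the threshold.

    # Conceptual sketch: evaluation interval vs. pending period.
    # Not Grafana's implementation; the state machine is simplified.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class RuleState:
        breached_since: Optional[float] = None  # time the condition first held
        firing: bool = False

    def evaluate(state: RuleState, value: float, now: float,
                 threshold: float = 75.0, pending_period: float = 0.0) -> RuleState:
        if value > threshold:
            if state.breached_since is None:
                state.breached_since = now
            # With a 0s pending period, the rule fires immediately.
            state.firing = (now - state.breached_since) >= pending_period
        else:
            state.breached_since = None
            state.firing = False
        return state

    # Evaluate every 60 seconds (the 1m interval chosen above) against sample values.
    state = RuleState()
    for step, cpu in enumerate([50, 80, 90, 60]):
        state = evaluate(state, cpu, now=step * 60)
        print(f"t={step}m cpu={cpu}% firing={state.firing}")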

Configure notifications

  • Select a Contact point. If you don’t have any contact points, add a Contact point.

    For a quick test, you can use a public webhook from webhook.site to capture and inspect alert notifications. If you choose this method, select Webhook from the drop-down menu in contact points.

Configure notification message

To link this alert rule to our visualization, click Link dashboard and panel.

  • Select the folder that contains the dashboard. In this case: system-metrics.
  • Select the cpu-and-memory-metrics visualization.
  • Click Confirm.

You have successfully linked this alert rule to your visualization!

When the CPU usage exceeds the defined threshold, an annotation should appear on the graph to mark the event. Similarly, when the alert is resolved, another annotation is added to indicate the moment it returned to normal.

Try adding a second alert rule using the memory usage metric (flask_app_memory_usage{instance="flask-prod:5000"}) to see how combining multiple alerts can enhance your dashboard.

Check how your dashboard looks now that your alert has been linked to your dashboard panel.

Step 3: Visualize metrics and alert annotations

After the alert rules are linked to the visualization, they should appear as health indicators (colored heart icons: a red heart when the alert is in Alerting state, and a green heart when in Normal state) on the linked panel. In addition, annotations provide helpful context, such as the time the alert was triggered.

{{< figure src="/media/docs/alerting/alert-in-panel.png" max-width="1200px" caption="Time series panel displaying health indicators and annotations." >}}

Step 4: Receive notifications

Finally, as part of the alerting process, you should receive notifications at the associated contact point. If you're receiving alerts via email, the default email template will include two buttons:

  • View dashboard: links to the dashboard that contains the alerting panel

  • View panel: links directly to the individual panel where the alert was triggered

{{< figure src="/media/docs/alerting/email-notification-w-url.png" max-width="1200px" caption="Alert notification with links to panel and dashboard." >}}

Clicking either button opens Grafana with a pre-applied time range relevant to the alert.

By default, this URL includes from and to query parameters that reflect the time window around the alert event (one hour before and after the alert). This helps you land directly in the time window where the alert occurred, making it easier to analyze what happened.

If you want to define a more intentional time range, you can customize your notifications using a notification template. With a template, you can explicitly set from and to values for more precise control over what users see when they follow the dashboard link. The final URL is constructed using a custom annotation (e.g., MyDashboardURL) along with the from and to parameters, which are calculated in the notification template.
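
As a conceptual illustration of that calculation, the sketch below builds a dashboard link with from and to set to epoch milliseconds one hour on either side of the alert's firing time. Real notification templates do this with Go templating inside Grafana; the dashboard URL here is a placeholder, not one from the demo.

    # Conceptual illustration of the from/to window a notification template computes.
    # The base URL is a placeholder; Grafana expects from/to as epoch milliseconds.
    from datetime import datetime, timedelta, timezone
    from urllib.parse import urlencode

    def dashboard_link(base_url: str, fired_at: datetime,
                       window: timedelta = timedelta(hours=1)) -> str:
        start_ms = int((fired_at - window).timestamp() * 1000)
        end_ms = int((fired_at + window).timestamp() * 1000)
        return f"{base_url}?{urlencode({'from': start_ms, 'to': end_ms})}"

    # Example: an alert that fired at 12:00 UTC links to an 11:00-13:00 window.
    fired_at = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(dashboard_link("https://my-grafana.example.com/d/abc123/cpu-and-memory-metrics",
                         fired_at))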

Conclusion

You’ve now linked Prometheus-based alert rules to your Grafana visualizations, giving your dashboards real-time context with alert annotations and health indicators. By visualizing alerts alongside metrics, responders can quickly understand what’s happening and when. You also saw how alert notifications can include direct links to the affected dashboard or panel, helping teams jump straight into the right time window for faster troubleshooting.

Have feedback or ideas to improve this tutorial? Let us know.