- Setting Up Alerts in Grafana for Latency and Errors Panels
- Deployment
- Tip for Infrastructure as Code (IaC) with Ansible
- Final Objective
Grafana alerts can notify you when a certain metric exceeds a defined threshold. Here’s how to create alerts for Latency and Errors in your Grafana dashboard.
From an infrastructure point of view, all the elements remain the same, because the new alerts are defined inside Grafana. The only change on the Loki side is enabling the ruler, the Loki component that evaluates a configurable set of alerting and recording rules against your logs and forwards firing alerts to an Alertmanager (see https://grafana.com/docs/loki/latest/alert/). It is enabled in the Loki configuration like this:
ruler:
  storage:
    type: local
    local:
      directory: /loki/rules
  rule_path: /tmp/rules/fake/
  alertmanager_url: http://localhost:9093
  ring:
    kvstore:
      store: inmemory
  enable_api: true
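With the ruler enabled, rule files are loaded from the directory configured above. As a minimal sketch of what such a file could look like (the file path, alert name, `for` duration, and labels are assumptions, not part of the exercise; the threshold mirrors the availability SLO defined further down):

```yaml
# /loki/rules/fake/rules.yaml -- hypothetical file; "fake" is the tenant
# directory Loki uses when multi-tenancy is disabled, matching rule_path above.
groups:
  - name: log-error-alerts
    rules:
      - alert: HighErrorLogRate            # assumed alert name
        # LogQL: count log lines containing "err" over the last 5 minutes
        expr: sum(count_over_time({service_name="unknown_service"} |= "err" [5m])) by (service_name) > 5500
        for: 2m                            # assumed pending period
        labels:
          severity: warning
        annotations:
          summary: "High rate of error logs in {{ $labels.service_name }}"
```

The Grafana-managed alerts described below do not depend on this file; it only illustrates how the ruler would consume rules from the configured directory.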
To create an alert for the Latency panel, follow these steps:
- Open the Latency Panel: In your Grafana dashboard, go to the Latency panel.
- Edit the Panel: Click on the panel title, select Edit.
- Go to the Alerts Tab: Switch to the Alerts tab within the panel editor.
- Create a New Alert:
- Click on Create Alert.
- Define query and alert condition: Set a condition that checks whether latency exceeds a given threshold. For example:
sum(rate(otel_collector_span_metrics_duration_milliseconds_bucket[5m])) by (span_name)
- Time Range: Set the evaluation period to check latency, such as over the last 10 minutes.
- Expressions:
  - Input: Set `A` to use the expression above.
  - Is above: 1.4 (1 400 milliseconds).
- Set evaluation behavior:
- Folder: Create a folder to store the alert rule.
- Evaluation group and interval: Create a group and set how often the rule is evaluated.
- Configure labels and notifications:
- In the Contact point section, choose the notification channel that should receive alerts, such as Slack, Email, or PagerDuty. For now, `grafana-default-email` is enough.
- Save Alert: Save your changes to enable the alert.
This will send a notification whenever latency exceeds the defined threshold, represented by the red line in the image below.
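The same rule can also be captured as code. Below is a minimal sketch of a Grafana file-provisioned alert rule, assuming alert provisioning files are mounted under `/etc/grafana/provisioning/alerting/`; the datasource UID, folder name, rule UID, and `for` duration are assumptions and would need to match your setup. The UI steps above are all you need for the exercise.

```yaml
# latency-alert.yaml -- hypothetical provisioning file
apiVersion: 1
groups:
  - orgId: 1
    name: latency                      # evaluation group
    folder: slo-alerts                 # assumed folder name
    interval: 1m                       # evaluation interval
    rules:
      - uid: latency-above-1400ms      # assumed UID
        title: Endpoint latency above 1 400 ms
        condition: C
        for: 5m
        data:
          # A: the Prometheus query used in the panel
          - refId: A
            relativeTimeRange: { from: 600, to: 0 }
            datasourceUid: prometheus   # must match your Prometheus datasource UID
            model:
              refId: A
              expr: sum(rate(otel_collector_span_metrics_duration_milliseconds_bucket[5m])) by (span_name)
          # C: threshold expression -- fire when A is above 1400
          - refId: C
            relativeTimeRange: { from: 600, to: 0 }
            datasourceUid: __expr__
            model:
              refId: C
              type: threshold
              expression: A
              conditions:
                - evaluator: { type: gt, params: [1400] }
        labels:
          severity: warning
        annotations:
          summary: Endpoint duration exceeded the 1 400 ms SLO
```

Provisioning the rule as a file keeps it in version control, while the UI flow above achieves the same result interactively.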
Following the SLO definitions, create at least one alert for each SLO.
- Endpoint Duration Below 1,400 Milliseconds:
Set a threshold to trigger an alert if endpoint duration goes above 1,400 milliseconds.
sum(rate(otel_collector_span_metrics_duration_milliseconds_bucket[5m])) by (span_name)
- Receive Bytes Below 250,000 Bytes:
Set a threshold to trigger an alert if received bytes go below 250,000 bytes.
sum(rate(container_network_receive_bytes_total[5m])) by (container_label_k8s_app)
- Availability with Fewer Than 5,500 Errors:
Set a threshold to trigger an alert if the number of errors exceeds 5,500.
sum(count_over_time({service_name="unknown_service"} |= "err" [5m])) by (service_name)
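If you prefer to keep the two metric-based SLO alerts next to Prometheus instead of managing them in Grafana, they could equally be expressed as standard Prometheus alerting rules. The sketch below copies the thresholds from the list above; the alert names, `for` durations, and severity labels are assumptions, and the log-based availability SLO would live in the Loki ruler rule file shown earlier.

```yaml
# slo-rules.yaml -- hypothetical Prometheus rule file
groups:
  - name: slo-alerts
    rules:
      - alert: EndpointDurationAboveSLO
        expr: sum(rate(otel_collector_span_metrics_duration_milliseconds_bucket[5m])) by (span_name) > 1400
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: Endpoint duration is above 1 400 ms
      - alert: ReceiveBytesBelowSLO
        expr: sum(rate(container_network_receive_bytes_total[5m])) by (container_label_k8s_app) < 250000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: Received bytes dropped below the 250 000 SLO threshold
```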
Before deploying all the new stuff, it is important to clean up the changes from the previous exercises and then apply the new settings with a short script like this one:
#!/bin/bash
# Clean up resources from the previous exercises
kubectl delete ns application
kubectl delete ns opentelemetry
kubectl delete ns monitoring
kubectl delete pv --all
kubectl delete pvc --all
sleep 5
echo "-------------------------------------------------------------------------"
echo "Start creating"
echo "-------------------------------------------------------------------------"
# Apply the manifests for this exercise
kubectl apply -f ../exercise10/storage.yaml
kubectl apply -f ../exercise10/deployment.yaml
kubectl apply -f ../exercise10/otel-collector.yaml
kubectl apply -f ../exercise8/jaeger.yaml
kubectl apply -f ../exercise9/prometheus.yaml
kubectl apply -f ./grafana-loki.yaml
kubectl apply -f ./grafana.yaml
echo "-------------------------------------------------------------------------"
echo "wait"
echo "-------------------------------------------------------------------------"
sleep 5
kubectl get pods -A
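Assuming the script is saved as, say, `deploy.sh` (a hypothetical file name), it can be run like this:

```bash
chmod +x deploy.sh   # make the script executable (hypothetical name)
./deploy.sh
```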
Tip
A more efficient Infrastructure as Code (IaC) approach is to use Ansible to apply the new configuration and start its service in Minikube. Below is an example of how a YAML playbook for this could be structured.
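A minimal sketch of such a playbook (`infra.yaml`) is shown below. It assumes the playbook runs against localhost with the `kubernetes.core` collection (and its Python Kubernetes client) installed; the manifest list simply mirrors the script above.

```yaml
# infra.yaml -- a minimal sketch; hosts, connection, and module choice are assumptions
- name: Deploy the observability stack to Minikube
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    manifests:
      - ../exercise10/storage.yaml
      - ../exercise10/deployment.yaml
      - ../exercise10/otel-collector.yaml
      - ../exercise8/jaeger.yaml
      - ../exercise9/prometheus.yaml
      - ./grafana-loki.yaml
      - ./grafana.yaml
  tasks:
    - name: Apply each Kubernetes manifest (requires the kubernetes.core collection)
      kubernetes.core.k8s:
        state: present
        src: "{{ item }}"
      loop: "{{ manifests }}"
```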
- Run the Playbook
ansible-playbook -i ../exercise4.1/ansible_quickstart/inventory.ini infra.yaml
minikube service grafana-service -n monitoring
By the end of this document, you should have accomplished the following:
Important
The idea is to receive an alert if any of the previous thresholds are exceeded. In this link, you will find all the required configurations to validate the results, which should generate something like this:
Eventually, over time, the alerts will begin to trigger, as arbitrary conditions have been implemented in the functions `goo`, `foo`, and `zoo` to simulate errors or service degradations.