The monitoring stack consists of the following components:
- Prometheus is the server that collects metrics from various sources and stores them such that they are available for querying. We will use three exporters: node_exporter, cadvisor and postgres_exporter.
- node_exporter exposes system-level metrics, such as CPU and memory usage.
- cadvisor provides container-level metrics.
- postgres_exporter exposes PostgreSQL metrics.
- Grafana is the dashboard that fetches data from the Prometheus server and displays it in a customizable UI.
The logging stack consists of the following components:
- Promtail collects logs from various sources, and forwards them to the Loki server for storage.
- Loki efficiently stores the logs, indexes them for fast retrieval, and provides APIs for querying them from the Grafana dashboard.
These are some of the key points of the setup:
- Prometheus scrapes metrics from the exporters, so it needs to know their addresses (hostname + port). The scraping happens every few seconds (configurable). Prometheus then exposes the collected data via an HTTP API at localhost:9090 (behind the /prometheus/ route prefix in this setup); a quick way to verify this is shown right after this list.
- The exporters in turn read their data from files on the host system, so those paths need to be mounted into the exporter containers as read-only volumes.
- Promtail allows us to attach various metadata to logs, such as labels (see the pipeline_stages section of the Promtail config file). This makes logs easy to query.
- Grafana exposes the dashboard at localhost:3000. Requests to rides.jurajmajerik.com/grafana are reverse-proxied to localhost:3000, returning the dashboard to the user.
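As a quick sanity check once everything is running, the Prometheus HTTP API can be queried directly. A minimal sketch, assuming curl on the host and the /prometheus/ route prefix configured in the compose file below:

curl -s http://localhost:9090/prometheus/api/v1/targets
curl -s 'http://localhost:9090/prometheus/api/v1/query?query=up'

The first call lists every scrape target and its health; the second runs an instant query for the up metric, which is 1 for each exporter Prometheus can currently reach.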
# prometheus-prod.yml (mounted into the container as /etc/prometheus/prometheus.yml)
global:
  scrape_interval: 10s

scrape_configs:
  - job_name: 'prometheus'
    metrics_path: /metrics
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node_exporter'
    static_configs:
      - targets: ['node_exporter:9100']

  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']

  - job_name: 'postgres_exporter'
    static_configs:
      - targets: ['postgres_exporter:9187']
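Before (re)starting the stack, this file can be validated with promtool, which is bundled in the prom/prometheus image. A small sketch, assuming the file sits in the current directory under the same name the compose file below mounts:

docker run --rm -v $(pwd)/prometheus-prod.yml:/prometheus.yml --entrypoint promtool prom/prometheus:latest check config /prometheus.yml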
# Promtail config (e.g. promtail-config.yml)
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: local
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          __path__: /var/log/*log

  - job_name: docker
    docker_sd_configs:
      - host: unix:///run/docker.sock
        refresh_interval: 5s
    pipeline_stages:
      - match:
          selector: '{job="docker"}'
          stages:
            - regex:
                expression: '.*level=(?P<level>[a-zA-Z]+).*ts=(?P<timestamp>[T\d-:.Z]*).*msg=(?P<msg>[a-zA-Z]+).*err=(?P<err>[a-zA-Z]+)'
            - labels:
                level:
                msg:
                err:
            - timestamp:
                format: RFC3339Nano
                source: timestamp
    relabel_configs:
      - source_labels: ['__meta_docker_container_name']
        regex: '/(.*)'
        target_label: 'container'
      - source_labels: ['level']
        target_label: 'level'
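To see what this pipeline produces without actually shipping anything, Promtail can be started with its --dry-run flag, which prints the entries and their labels instead of pushing them to Loki. A sketch, assuming the config above is saved as promtail-config.yml and that the Docker socket and /var/log are mounted as the config expects:

docker run --rm \
  -v $(pwd)/promtail-config.yml:/etc/promtail/config.yml \
  -v /run/docker.sock:/run/docker.sock \
  -v /var/log:/var/log:ro \
  grafana/promtail:latest -config.file=/etc/promtail/config.yml --dry-run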
# Loki config (e.g. loki-config.yml)
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

common:
  path_prefix: /tmp/loki
  storage:
    filesystem:
      chunks_directory: /tmp/loki/chunks
      rules_directory: /tmp/loki/rules
  replication_factor: 1
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory

query_range:
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h

ruler:
  alertmanager_url: http://localhost:9093

analytics:
  reporting_enabled: false
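Once Loki is up, two endpoints are handy for a quick check: /ready reports readiness and the labels API lists every label Loki has indexed so far (plain curl):

curl -s http://localhost:3100/ready
curl -s http://localhost:3100/loki/api/v1/labels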
# Custom postgres_exporter queries (e.g. queries.yaml)
pg_stat_activity:
  query: "SELECT COUNT(*) AS active_connections, now() as timestamp FROM pg_stat_activity WHERE state = 'active'"
  metrics:
    - active_connections:
        usage: "GAUGE"  # the number of active connections goes up and down, so GAUGE fits better than COUNTER
        description: "Number of active connections"
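This queries file has to be handed to postgres_exporter. One way to do it, sketched here as a standalone docker run rather than as part of the compose file below, is via the DATA_SOURCE_NAME and PG_EXPORTER_EXTEND_QUERY_PATH environment variables (connection string and file paths are placeholders):

docker run -d --name postgres_exporter -p 9187:9187 \
  -e DATA_SOURCE_NAME="postgresql://user:password@db-host:5432/postgres?sslmode=disable" \
  -e PG_EXPORTER_EXTEND_QUERY_PATH="/etc/postgres_exporter/queries.yaml" \
  -v $(pwd)/queries.yaml:/etc/postgres_exporter/queries.yaml:ro \
  quay.io/prometheuscommunity/postgres-exporter

With the query name above, the result is exposed as pg_stat_activity_active_connections on port 9187, which is the target the prometheus.yml scrape config points at.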
# docker-compose.yml
services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus-prod.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
    restart: unless-stopped
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--web.external-url=/prometheus/"
      - "--web.route-prefix=/prometheus/"

  node_exporter:
    image: quay.io/prometheus/node-exporter:latest
    container_name: node_exporter
    command:
      - '--path.rootfs=/host'
    pid: host
    restart: unless-stopped
    volumes:
      - '/:/host:ro,rslave'

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:v0.45.0
    container_name: cadvisor
    ports:
      - "9092:8080"
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    devices:
      - /dev/kmsg
    restart: unless-stopped
    privileged: true

  grafana:
    image: grafana/grafana-oss:latest
    user: "0:0"
    container_name: grafana
    ports:
      - "3000:3000"
    volumes:
      - grafana-data:/var/lib/grafana
      - ./grafana.ini:/etc/grafana/grafana.ini
      - /etc/letsencrypt:/etc/letsencrypt
    restart: unless-stopped

volumes:
  prometheus-data:
    driver: local
  grafana-data:
    driver: local
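With this compose file, cAdvisor is also published on host port 9092 (mapped to container port 8080), so its raw metrics endpoint can be spot-checked once the stack is up (assuming curl on the host):

curl -s http://localhost:9092/metrics | head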
3. Configure volume mounting for cAdvisor (after a reboot, WSL2 drops the symlink mount, so it has to be recreated):
sudo mount -t drvfs '\\wsl$\docker-desktop-data\data\docker' /mnt/docker_data
docker-compose up -d grafana prometheus node_exporter cadvisor
Check that each endpoint responds:
- Prometheus UI: http://host.docker.internal:9090/prometheus/ or http://localhost:9090/prometheus/
- Prometheus metrics endpoint: http://host.docker.internal:9090/prometheus/metrics
- Loki metrics endpoint: http://host.docker.internal:3100/metrics
- Loki readiness check: http://host.docker.internal:3100/ready
- Add Prometheus Data Source
  Name: Prometheus
  Prometheus server URL: http://host.docker.internal:9090/prometheus/ or http://host.docker.internal:9090/prometheus
  Everything else is default; click Save & Test.
- Import Dashboard (check Grafana Labs for ready-made dashboards)
- Add Loki Data Source
  Name: Loki
  URL: http://loki:3100
- Loki Query
  According to this link, you must add "| logfmt" so that the logfmt key=value pairs (such as level) are parsed into labels that can be filtered on:
  {job="varlogs"} | logfmt | level=`error` |= "docker"
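The same query can also be run against Loki's HTTP API outside Grafana, which is useful for debugging label names. A sketch with curl over the default time range:

curl -G -s http://localhost:3100/loki/api/v1/query_range \
  --data-urlencode 'query={job="varlogs"} | logfmt | level=`error` |= "docker"' \
  --data-urlencode 'limit=10'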
- Install the Loki Docker driver plugin on each target machine that runs containers:
docker plugin install grafana/loki-docker-driver:2.8.2 --alias loki --grant-all-permissions
- Check plugin status:
docker plugin ls
- Change the default logging driver
If you want the Loki logging driver to be the default for all containers, change Docker’s daemon.json file (located in /etc/docker on Linux) and set the value of log-driver to loki:
Note:
The Loki logging driver still uses the json-file driver alongside sending logs to Loki; this is mainly useful to keep the docker logs command working. You can adjust file size and rotation using the respective log options max-size and max-file. Keep in mind that the default values for these options are not taken from the json-file configuration. You can deactivate this behavior by setting the log option no-file to true.
{
  "log-driver": "loki",
  "log-opts": {
    "loki-url": "http://localhost:3100/loki/api/v1/push",
    "loki-batch-size": "400"
  }
}
(daemon.json has to be valid JSON and cannot contain comments; for Grafana Cloud the loki-url would instead be of the form https://<user_id>:<password>@logs-us-west1.grafana.net/loki/api/v1/push.)
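After editing daemon.json, Docker has to be restarted for the new default to take effect; the active driver can then be confirmed, and individual containers can also opt in per run without changing the global default (a sketch for a systemd-based host):

sudo systemctl restart docker
docker info --format '{{.LoggingDriver}}'
docker run --rm --log-driver=loki --log-opt loki-url=http://localhost:3100/loki/api/v1/push alpine echo "hello loki"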
- Monitoring Stack
- fix cadvisor mounting with WSL2 + Docker Desktop Engine in /var/lib/docker
- choose between auto-mounting docker_data or executing sudo mount -t drvfs manually
- Logging Stack
- add logging level labels: "error", "debug", "warn", and "info"
- MySQL Exporter
- Up to Cloud Infra
- Config in Cloud Infra
- Monitoring
- add/install/deploy the EXPORTERS (node_exporter, cAdvisor, etc.) on the TARGET_SERVER
- connect it to the PROMETHEUS MASTER SERVER (add the target IP to prometheus.yml)
- Logging
- add/install the Docker driver plugin on the TARGET_SERVER (remote VMs) to push all container logs, according to this link:
docker plugin install grafana/loki-docker-driver:2.8.2 --alias loki --grant-all-permissions