Hi! 🐤🐤 This repo contains two docker compose "projects".
One is called "client-monitor". It is a simple compose file with an associated Promtail config, and it is used to connect to the server-monitor.
The other is called "server-monitor". It is a Grafana Loki, Prometheus, and Grafana (GPG) stack with a pre-built dashboard that visualizes a Livepeer Orchestrator's block status. Several alerts are also provided with this repo.
- Prometheus Alerts (~/speedy-livepeer/server-monitor/configs/prometheus/rules.yml)
- InstanceDown -> Orchestrator metrics are not reachable
- HighLoadSessionCapacity -> Orchestrator sessions exceeded 85% of the max available sessions.
- ProcessTooManyRestarts -> A process has had more than two restarts within the last 15 minutes. This indicates potential stability issues.
- DailyWinningTicketSummary -> A daily summary of tickets received. This is an approximation due to Prometheus scrape intervals.
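
For readers who have not written Prometheus alerting rules before, the fragment below sketches how two of the alerts above could be expressed. It is a minimal illustration and not a copy of this repo's `rules.yml`: the group name, the `for` durations, and the severity labels are assumptions, and the actual expressions in the repo may differ.

```yaml
# Illustrative sketch only; names and thresholds are assumptions.
groups:
  - name: orchestrator-examples        # hypothetical group name
    rules:
      # Fires when a scrape target has been unreachable for 5 minutes.
      - alert: InstanceDown
        expr: up == 0
        for: 5m                        # assumed duration
        labels:
          severity: critical           # assumed label
        annotations:
          summary: "Metrics for {{ $labels.instance }} are not reachable"

      # Fires when a process start time changed more than twice in 15 minutes.
      - alert: ProcessTooManyRestarts
        expr: changes(process_start_time_seconds[15m]) > 2
        labels:
          severity: warning            # assumed label
        annotations:
          summary: "{{ $labels.instance }} restarted more than twice in 15 minutes"
```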
- Loki Alerts (~/speedy-livepeer/server-monitor/configs/loki/rules/fake/rules.yml)
- BlockwatchFailure -> Blockwatch errors indicate a potential issue with your Arbitrum RPC endpoint.
- OrchestratorOverloaded -> The Orchestrator is overloaded and is throwing "OrchestratorBusy" errors.
- GasPriceTooHigh -> Gas prices are too high to execute transactions (per Orchestrator configuration). Unlikely to occur since the Arbitrum Nitro upgrade in 2022.
- FailedSegementUpload -> A segment upload failed. This may indicate bandwidth issues.
- InsufficientFunds -> The configured address does not have enough funds to operate (e.g. redeem tickets).
- TicketExpired -> Expired tickets were found. You may need to review them manually and potentially mark them as redeemed.
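
Loki alert rules use the same YAML layout as Prometheus rules, but the expression is a LogQL query over log lines. The fragment below is a hedged sketch of how an alert such as OrchestratorOverloaded might look. The `job="livepeer"` label selector, the 5-minute window, and the group name are assumptions rather than values taken from this repo. The `fake` directory in the rules path is the tenant ID Loki uses when multi-tenancy is disabled.

```yaml
# Illustrative sketch only; the label selector and window are assumptions.
groups:
  - name: orchestrator-log-examples    # hypothetical group name
    rules:
      - alert: OrchestratorOverloaded
        # Count log lines containing "OrchestratorBusy" over the last 5 minutes.
        expr: |
          sum(count_over_time({job="livepeer"} |= "OrchestratorBusy" [5m])) > 0
        labels:
          severity: warning            # assumed label
        annotations:
          summary: "Orchestrator is returning OrchestratorBusy errors"
```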
Before running the command below, replace the values prefixed with dollar signs according to the following table.
Value to Replace | New Value |
---|---|
$ORCHESTRATOR_IP_DNS | The IP address or DNS hostname of the Orchestrator node. Include the port if using something other than 80 or 443. This only supports one Orchestrator. Update the Prometheus configuration manually to add more. |
$TG_BOT_TOKEN | The token provided by the BotFather (see the section "Create a Telegram Bot") |
$TG_CHAT_ID | The chat ID where you want to send alerts (see the section "Configure Alertmanager Telegram Receiver") |
$LOKI_DNS | The DNS hostname for your Loki system on Host Machine 2. Do not supply a port. Port 443 is assumed. |
$LOKI_DNS_Email | The email associated with your DNS provider for the Loki DNS |
curl -sL https://raw.githubusercontent.com/0xspeedybird/livepeer-monitoring-stack/main/server-monitor/bootstrap-server.sh | sudo bash -s -- -i $ORCHESTRATOR_IP_DNS -b $TG_BOT_TOKEN -c $TG_CHAT_ID -d $LOKI_DNS -e $LOKI_DNS_Email
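
To show where the Telegram values typically end up, here is a minimal sketch of an Alertmanager configuration using its native `telegram_configs` receiver (available in recent Alertmanager releases). This is an assumption about the general shape only. The bootstrap script may generate a different file or use another delivery mechanism, and both values below are placeholders.

```yaml
# Illustrative sketch only; not necessarily what bootstrap-server.sh generates.
route:
  receiver: telegram                   # hypothetical receiver name
receivers:
  - name: telegram
    telegram_configs:
      - bot_token: "123456:REPLACE_WITH_TG_BOT_TOKEN"   # placeholder for $TG_BOT_TOKEN
        chat_id: -1001234567890                          # placeholder for $TG_CHAT_ID (an integer)
        parse_mode: HTML
        send_resolved: true
```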
Value to Replace | New Value |
---|---|
$LOKI_DNS | The DNS hostname for your Loki system on Host Machine 2. Do not supply a port. Port 443 is assumed. |
curl -sL https://raw.githubusercontent.com/0xspeedybird/livepeer-monitoring-stack/main/client-monitor/bootstrap-client.sh | sudo bash -s -- -i $LOKI_DNS
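
For orientation, a Promtail config that ships logs to a remote Loki over HTTPS generally looks like the sketch below. The hostname, job label, and log path are hypothetical placeholders, and the file written by the bootstrap script may be structured differently.

```yaml
# Illustrative sketch only; hostname, labels, and paths are assumptions.
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml        # where Promtail records read offsets

clients:
  # $LOKI_DNS without a port; HTTPS implies port 443.
  - url: https://loki.example.com/loki/api/v1/push

scrape_configs:
  - job_name: livepeer                 # hypothetical job name
    static_configs:
      - targets: [localhost]
        labels:
          job: livepeer
          __path__: /var/log/livepeer/*.log   # hypothetical log location
```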
See the detailed install guide in this repo.
All scrape configs are located at: ./configs/prometheus/prometheus.yml
All Prometheus alert rules are located at: ./configs/prometheus/rules.yml
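
The bootstrap script configures a single Orchestrator. Adding more means editing the scrape config by hand. A minimal sketch of that change, assuming a static target list, could look like this. The job name, hostnames, and port are placeholders, so match them to what is already in your `prometheus.yml`.

```yaml
# Illustrative sketch only; job name, hosts, and port are assumptions.
scrape_configs:
  - job_name: livepeer-orchestrator    # hypothetical job name
    metrics_path: /metrics
    static_configs:
      - targets:
          - orchestrator-1.example.com:7935   # existing Orchestrator (placeholder)
          - orchestrator-2.example.com:7935   # additional Orchestrator (placeholder)
```

Prometheus needs to reload its configuration (or be restarted) before new targets are scraped.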
Prometheus data source is located at: ./configs/grafana/provisioning/datasources/prometheus_ds.yml
Dashboards and providers located at: ./configs/grafana/provisioning/dashboards
Plugins are located at: ./configs/grafana/plugins
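
A Grafana data source provisioning file such as `prometheus_ds.yml` usually follows the shape sketched below. The data source name and the URL (which assumes a compose service called `prometheus` on port 9090) are assumptions and may not match the file in this repo.

```yaml
# Illustrative sketch only; name and URL are assumptions.
apiVersion: 1

datasources:
  - name: Prometheus                   # hypothetical display name
    type: prometheus
    access: proxy                      # Grafana backend proxies the queries
    url: http://prometheus:9090        # assumed compose service name and port
    isDefault: true
```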
All Loki alert rules are located at: ./configs/loki/rules/fake/rules.yml
Mike Zupper - for his awesome support and Livepeer contributions