This repository has been archived by the owner on Aug 4, 2021. It is now read-only.

Unable to configure High Availability of Prometheus with timescaleDB #49

Open
MohanSaiTeki opened this issue Mar 9, 2020 · 4 comments

Comments

@MohanSaiTeki

I am trying to set up high availability for Prometheus using TimescaleDB with the configuration below.

Node exporter

docker run -d -p 9100:9100 quay.io/prometheus/node-exporter

Prometheus

  • prometheus-1
    docker run -it -p 9090:9090 -v /root/prometheus/prometheus1.yml:/etc/prometheus/prometheus.yml prom/prometheus
  • prometheus1.yml

global:
  scrape_interval: 5s
  evaluation_interval: 10s
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['10.128.15.221:9100']
remote_write:
  - url: "http://10.128.15.221:9201/write"
remote_read:
  - url: "http://10.128.15.221:9201/read"
    read_recent: true

  • prometheus-2
    docker run -it -p 9091:9090 -v /root/prometheus/prometheus2.yml:/etc/prometheus/prometheus.yml prom/prometheus
  • prometheus2.yml

global:
  scrape_interval: 5s
  evaluation_interval: 10s
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['10.128.15.221:9100']
remote_write:
  - url: "http://10.128.15.221:9202/write"
remote_read:
  - url: "http://10.128.15.221:9202/read"
    read_recent: true

Prometheus adapter

  • prometheus-adapter-1
    docker run -it -p 9201:9201 timescale/prometheus-postgresql-adapter:latest -pg-host=10.128.15.221 -pg-password=secret -leader-election-pg-advisory-lock-id=2 -leader-election-pg-advisory-lock-prometheus-timeout=7s

  • prometheus-adapter-2
    docker run -it -p 9202:9201 timescale/prometheus-postgresql-adapter:latest -pg-host=10.128.15.221 -pg-password=secret -leader-election-pg-advisory-lock-id=2 -leader-election-pg-advisory-lock-prometheus-timeout=7s

pg_prometheus

docker run --name pg_prometheus -e POSTGRES_PASSWORD=secret -it -p 5432:5432 timescale/pg_prometheus:latest-pg11 postgres -csynchronous_commit=off

When I start everything, it all works, with the following status.

  • prometheus-adapter-1 -> Leader
    log

{"caller":"log.go:27","count":100,"duration":0.007022144,"level":"debug","msg":"Wrote samples","ts":"2020-03-09T10:02:01.146Z"}
{"caller":"log.go:27","count":100,"duration":0.007113201,"level":"debug","msg":"Wrote samples","ts":"2020-03-09T10:02:06.119Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":100,"ts":"2020-03-09T10:02:06.119Z"}
{"caller":"log.go:27","count":100,"duration":0.006514815,"level":"debug","msg":"Wrote samples","ts":"2020-03-09T10:02:06.128Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":200,"ts":"2020-03-09T10:02:06.128Z"}
{"caller":"log.go:27","count":100,"duration":0.00611504,"level":"debug","msg":"Wrote samples","ts":"2020-03-09T10:02:06.136Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":100,"ts":"2020-03-09T10:02:06.136Z"}
{"caller":"log.go:27","count":100,"duration":0.006294438,"level":"debug","msg":"Wrote samples","ts":"2020-03-09T10:02:06.144Z"}

  • prometheus-adapter-2 -> Not a leader
    log

{"caller":"log.go:27","level":"debug","msg":"Election id 1: Instance is not a leader. Can't write data","ts":"2020-03-09T10:01:33.135Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:01:33.135Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 1: Instance is not a leader. Can't write data","ts":"2020-03-09T10:01:33.138Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:01:33.138Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 1: Instance is not a leader. Can't write data","ts":"2020-03-09T10:01:33.140Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 1: Instance is not a leader. Can't write data","ts":"2020-03-09T10:01:38.133Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:01:38.133Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 1: Instance is not a leader. Can't write data","ts":"2020-03-09T10:01:38.135Z"}

But when I stop prometheus-1, prometheus-adapter-2 does not take over leadership. The adapter logs are below.

prometheus-adapter-1

{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":100,"ts":"2020-03-09T10:29:56.513Z"}
{"caller":"log.go:27","count":93,"duration":0.005575618,"level":"debug","msg":"Wrote samples","ts":"2020-03-09T10:29:59.668Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":100,"ts":"2020-03-09T10:29:59.668Z"}
{"caller":"log.go:35","level":"warn","msg":"Prometheus timeout exceeded","timeout":"7s","ts":"2020-03-09T10:30:06.960Z"}
{"caller":"log.go:35","level":"warn","msg":"Scheduled election is paused. Instance is removed from election pool.","ts":"2020-03-09T10:30:06.960Z"}
{"caller":"log.go:31","level":"info","msg":"Instance is no longer a leader","ts":"2020-03-09T10:30:06.962Z"}
{"caller":"log.go:27","level":"debug","msg":"Scheduled election is paused. Instance can't become a leader until scheduled election is resumed (Prometheus comes up again)","ts":"2020-03-09T10:30:10.958Z"}
{"caller":"log.go:27","level":"debug","msg":"Scheduled election is paused. Instance can't become a leader until scheduled election is resumed (Prometheus comes up again)","ts":"2020-03-09T10:30:15.958Z"}
{"caller":"log.go:27","level":"debug","msg":"Scheduled election is paused. Instance can't become a leader until scheduled election is resumed (Prometheus comes up again)","ts":"2020-03-09T10:30:20.958Z"}
{"caller":"log.go:27","level":"debug","msg":"Scheduled election is paused. Instance can't become a leader until scheduled election is resumed (Prometheus comes up again)","ts":"2020-03-09T10:30:25.958Z"}

prometheus-adapter-2

{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:30:55.046Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:30:55.047Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:30:55.048Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:00.041Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:31:00.041Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:00.043Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:31:00.044Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:00.045Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:31:00.046Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:05.041Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:31:05.041Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:05.044Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:31:05.044Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:05.046Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:31:05.046Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:05.048Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:10.041Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:31:10.042Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:10.043Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:31:10.044Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:10.045Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:31:10.045Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:10.046Z"}
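For anyone comparing the two adapter logs above, here is a minimal Python sketch that pulls the leadership-related entries out of the JSON log lines and measures how long after the last successful write the timeout warning appeared. It is not part of the adapter: the function names and the list of message prefixes are assumptions for illustration, built only from the messages shown in this issue.

```python
import json
from datetime import datetime

# Message prefixes copied from the adapter logs in this issue. Grouping them
# as "leadership events" is an assumption made for this sketch.
LEADERSHIP_PREFIXES = (
    "Prometheus timeout exceeded",
    "Scheduled election is paused. Instance is removed",
    "Instance is no longer a leader",
    "Prometheus seems alive",
)


def parse_ts(ts):
    """Parse timestamps such as 2020-03-09T10:30:06.960Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")


def parse_lines(lines):
    """Decode non-empty JSON log lines into dictionaries."""
    return [json.loads(line) for line in lines if line.strip()]


def leadership_events(lines):
    """Return (timestamp, message) pairs for leadership-related entries."""
    return [
        (parse_ts(entry["ts"]), entry["msg"])
        for entry in parse_lines(lines)
        if entry["msg"].startswith(LEADERSHIP_PREFIXES)
    ]


def seconds_from_last_write_to_timeout(lines):
    """Seconds between the last 'Wrote samples' entry and the first timeout warning.

    Returns None if either entry is missing from the given lines.
    """
    entries = parse_lines(lines)
    writes = [parse_ts(e["ts"]) for e in entries if e["msg"] == "Wrote samples"]
    timeouts = [
        parse_ts(e["ts"]) for e in entries
        if e["msg"] == "Prometheus timeout exceeded"
    ]
    if not writes or not timeouts:
        return None
    return (min(timeouts) - max(writes)).total_seconds()


# Three lines taken verbatim from the prometheus-adapter-1 log above.
SAMPLE = [
    '{"caller":"log.go:27","count":93,"duration":0.005575618,"level":"debug","msg":"Wrote samples","ts":"2020-03-09T10:29:59.668Z"}',
    '{"caller":"log.go:35","level":"warn","msg":"Prometheus timeout exceeded","timeout":"7s","ts":"2020-03-09T10:30:06.960Z"}',
    '{"caller":"log.go:31","level":"info","msg":"Instance is no longer a leader","ts":"2020-03-09T10:30:06.962Z"}',
]

if __name__ == "__main__":
    for ts, msg in leadership_events(SAMPLE):
        print(ts.isoformat(), msg)
    print("gap:", seconds_from_last_write_to_timeout(SAMPLE), "seconds")
```

On the sample lines the gap is 7.292 seconds, which lines up with the -leader-election-pg-advisory-lock-prometheus-timeout=7s flag: prometheus-adapter-1 gave up leadership roughly 7 seconds after its last write. The same functions can be run over a saved prometheus-adapter-2 log to check whether any leadership change was ever logged there.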

But when I stop prometheus-adapter-1, prometheus-adapter-2 does take over leadership.

Another interesting thing: when I start prometheus-1 again, I see "Election id 2: Instance is not a leader. Can't write data" in the prometheus-adapter-1 log. Please see the log below.

{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:33:34.566Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":93,"ts":"2020-03-09T10:33:34.571Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:33:34.576Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:33:34.576Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:33:34.578Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:33:34.579Z"}
{"caller":"log.go:31","level":"info","msg":"Prometheus seems alive. Resuming scheduled election.","ts":"2020-03-09T10:33:34.959Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:33:39.550Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:33:39.551Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:33:39.553Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:33:39.553Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:33:39.555Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:33:39.556Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:33:39.558Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:33:44.551Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:33:44.551Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:33:44.554Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:33:44.554Z"}

So, did I miss a step while setting this up, or is this a bug?

Please help me resolve this issue.

@msarm

msarm commented Jun 16, 2021

@MohanSai1997 - Were you able to make any progress setting up the HA instance?

@MohanSaiTeki
Author

> @MohanSai1997 - Were you able to make any progress setting up the HA instance?

This project is SUNSET. Please refer to the README.md file.

@msarm

msarm commented Jun 17, 2021

> > @MohanSai1997 - Were you able to make any progress setting up the HA instance?
>
> This project is SUNSET. Please refer to the README.md file.

Oh yeah, I see it. Thank you!

@Harkishen-Singh
Member

https://github.com/timescale/promscale is the recommended project to use.
