Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add clustering_max_lag to /metrics #805

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from
Draft
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions src/lavinmq/http/controller/prometheus.cr
Original file line number Diff line number Diff line change
Expand Up @@ -264,6 +264,10 @@ module LavinMQ
value: @amqp_server.followers.size,
type: "gauge",
help: "Amount of follower nodes connected"})
writer.write({name: "clustering_max_lag",
value: Config.instance.clustering_max_lag,
type: "gauge",
help: "Max unsynced replicated messages"})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYT?

Suggested change
help: "Max unsynced replicated messages"})
help: "Maximum allowed unsynced replicated messages"})

Also, here it says "actions"

property clustering_max_lag = 8192 # number of clustering actions
, not sure if we want to be consistent :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just copied the text from here https://github.com/cloudamqp/lavinmq/blob/main/src/lavinmq/config.cr#L136 🙈
But yeah, we should be consistent!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe easiest to update the comment then, hehe :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah :)
Hmm, this all might be a little confusing. clustering_max_lag is the maximum number of actions that a follower can lag behind, but the lag we show in GUI/metrics is how many actual bytes the follower is lagging behind. So the numbers don't really have much to do with each other... 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that it can be confusing. Could we name it something like max_allowed_cluster_actions, and let lag be lag_bytes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it might be reasonable to rename at least one of them.
Maybe rename
lag -> lag_in_bytes,
max_lag -> max_lag_in_actions,
and add lag_in_actions 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's PRs for the changes mentioned in this thread
#810
#811

@amqp_server.followers.each do |f|
writer.write({name: "follower_lag",
labels: {id: f.id.to_s(36)},
Expand Down
Loading