Collector for storage content metrics ctime, size and verification state #317

Open
wants to merge 2 commits into main

Conversation


@pehlert pehlert commented Feb 9, 2025

Hi all,

This PR adds a few metrics for storage contents: creation time (ctime), size in bytes, and verification state.

Example:

# HELP pve_storage_contents_ctime Proxmox storage contents ctime
# TYPE pve_storage_contents_ctime gauge
pve_storage_contents_ctime{content="backup",node="srv02",storage="pbs",vmid="100",volid="pbs:backup/vm/100/2025-02-09T02:33:33Z"} 1.739068413e+09
pve_storage_contents_ctime{content="backup",node="srv02",storage="pbs",vmid="100",volid="pbs:backup/vm/100/2025-02-09T07:36:08Z"} 1.739086568e+09

# HELP pve_storage_contents_bytes Proxmox storage contents size in bytes
# TYPE pve_storage_contents_bytes gauge
pve_storage_contents_bytes{content="backup",node="srv02",storage="pbs",vmid="100",volid="pbs:backup/vm/100/2025-02-09T02:33:33Z"} 2.17109760043e+011
pve_storage_contents_bytes{content="backup",node="srv02",storage="pbs",vmid="100",volid="pbs:backup/vm/100/2025-02-09T07:36:08Z"} 2.17109760045e+011

# HELP pve_storage_contents_verification Proxmox storage contents verification state
# TYPE pve_storage_contents_verification gauge
pve_storage_contents_verification{content="backup",node="srv02",state="ok",storage="pbs",vmid="100",volid="pbs:backup/vm/100/2025-02-09T02:33:33Z"} 1.0
pve_storage_contents_verification{content="backup",node="srv02",state="ok",storage="pbs",vmid="100",volid="pbs:backup/vm/100/2025-02-09T07:36:08Z"} 1.0

We are using these in our Prometheus alerting to validate that backups have been created on time (by checking the newest ctime and making sure it is in sync with the respective VM's backup policies).
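As an illustration of that check, a PromQL expression along the following lines would flag VMs whose most recent backup is older than a day (the 24h threshold is a placeholder, not part of this PR):

time() - max by (vmid) (pve_storage_contents_ctime{content="backup"}) > 86400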

I decided to expose those under a separate endpoint which is scraped like this:

GET /storage?target=10.10.10.10&node=srv02&storage=pbs

This allows flexible scraping of multiple storages and nodes.
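For reference, a Prometheus scrape job for this endpoint could look roughly like the sketch below. It follows the usual multi-target relabeling pattern; the exporter address pve-exporter.example.com:9221 is a placeholder, not something this PR defines:

scrape_configs:
  - job_name: pve-storage
    metrics_path: /storage
    params:
      node: [srv02]
      storage: [pbs]
    static_configs:
      - targets: [10.10.10.10]
    relabel_configs:
      # Move the target into the ?target= query parameter, keep it as the
      # instance label, and point Prometheus at the exporter itself.
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: pve-exporter.example.com:9221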

It seems the API user needs the PVEDatastoreAdmin role for the scraping to work; I will try to narrow down the required permissions further.

Please let me know what you think about the approach of having a separate endpoint and whether you'd be interested in merging this. If so, I'd happily add some documentation to the PR.

znerol commented Feb 9, 2025

Thank you for filing this PR.

I'm not too happy with the metric structure though. This project follows the common practice of exposing an X_info gauge carrying the complete set of labels, while the actual metrics only have an id label. The following excerpt from the metrics example in the README demonstrates the approach nicely:

# HELP pve_uptime_seconds Number of seconds since the last boot
# TYPE pve_uptime_seconds gauge
pve_uptime_seconds{id="qemu/100"} 315039.0
[...]
# HELP pve_guest_info VM/CT info
# TYPE pve_guest_info gauge
pve_guest_info{id="qemu/100",name="samplevm1",node="proxmox",type="qemu"} 1.0

Hence, try to come up with an id label that is unique across the cluster, and use that as the only label on the actual metrics. Additionally, add a pve_storage_contents_info metric with all the metadata, along the lines of the sketch below. See also #243 for more hints.
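Applied to this PR, the restructured output could then look something like the following (the volume/<node>/<volid> scheme for the id label is merely one possible choice to illustrate the idea):

# HELP pve_storage_contents_ctime Proxmox storage contents ctime
# TYPE pve_storage_contents_ctime gauge
pve_storage_contents_ctime{id="volume/srv02/pbs:backup/vm/100/2025-02-09T02:33:33Z"} 1.739068413e+09
[...]
# HELP pve_storage_contents_info Proxmox storage contents info
# TYPE pve_storage_contents_info gauge
pve_storage_contents_info{content="backup",id="volume/srv02/pbs:backup/vm/100/2025-02-09T02:33:33Z",node="srv02",storage="pbs",vmid="100",volid="pbs:backup/vm/100/2025-02-09T02:33:33Z"} 1.0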
