This repository has been archived by the owner on Apr 22, 2020. It is now read-only.
How will this affect metrics that have SUM() aggregates? Sampling out some data points will skew the monitoring.
For example, a metric that reports throughput (req/sec) is often aggregated by summing all the data points. If some of them are suppressed by sampling, the aggregated result will show a drop in throughput that never actually happened.
Was this considered?
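To make the concern concrete, here is a small sketch with made-up numbers: ten hypothetical workers each report 100 req/sec, and the data points are summed. The function name and the `keep_every` parameter are illustrative only and are not part of ZMON. Keeping every second point stands in for a 50% sampling rate so the result is reproducible.

```python
def summed_throughput(points, keep_every=1):
    """Sum the data points that survive sampling.

    keep_every=1 keeps everything; keep_every=2 keeps every second
    point, a deterministic stand-in for a 50% random sample.
    """
    kept = points[::keep_every]
    return sum(kept)


# Hypothetical data: 10 workers, each reporting 100 req/sec.
points = [100] * 10

full = summed_throughput(points)                   # 1000
sampled = summed_throughput(points, keep_every=2)  # 500

print(full, sampled)
```

The real throughput is unchanged, but the SUM() over the sampled points is halved.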
There is support for critical_checks which can be used to overcome sampling. The idea in general is to throttle spammy workers and avoid collateral damage.
The critical check flag is not currently supported in ZMON, but if it were, we would have the notion of system/out-of-the-box/critical checks that are essential to the health of the systems.
@lmineiro Created another issue for more aligned sampling as a feature in ZMON, with more deterministic results.
Also re-editing this one to clarify its purpose.
mohabusama changed the title from "Implement sampling for check results metrics" to "Implement throttling for check results metrics" on Dec 4, 2018
By enabling throttling (via random sampling), we can roughly control the percentage of metrics stored in the ZMON timeseries database per worker.
The worker could get the sampling rate from:
There should be a list of critical checks that can be excluded from sampling. Critical checks can be defined in:
A flag in the check
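A minimal sketch of the proposal, assuming the worker already knows its sampling rate and the set of critical check IDs. The function `should_store` and its parameters are hypothetical names used for illustration and are not existing ZMON worker API.

```python
import random


def should_store(check_id, sampling_rate, critical_checks, rng=random):
    """Decide whether a check result's metrics should be stored.

    check_id        -- identifier of the check (hypothetical)
    sampling_rate   -- fraction of results to keep, between 0.0 and 1.0
    critical_checks -- set of check IDs that bypass sampling
    rng             -- source of randomness; pass random.Random(seed)
                       for reproducible behaviour in tests
    """
    # Critical checks are never throttled.
    if check_id in critical_checks:
        return True
    # Keep roughly `sampling_rate` of all other results.
    return rng.random() < sampling_rate


# Usage: keep about 25% of non-critical results on this worker.
critical = {1, 2}
rng = random.Random(0)
stored = sum(should_store(42, 0.25, critical, rng) for _ in range(10000))
print(stored)  # roughly 2500; the exact count depends on the seed
```

Because the decision is random per result, the stored percentage is only approximate, which matches the "roughly control" wording above.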