Metricbeat: The beat/stats module will frequently log errors about missing cluster UUIDs #34217
Comments
Pinging @elastic/elastic-agent (Team:Elastic-Agent) |
I'm facing this problem too! Elastic version 8.6.0 |
Same here with 8.6.0 |
I'm facing the same issues with 8.6.0 self managed |
The cluster UUID is required for the Stack Monitoring application to properly tie a Beat to its Elasticsearch cluster. This is mainly driven by the business logic of SM, as without this information the application would show an incorrect state for the impacted Beat processes. Given that this issue should be transient and disappear once the Beat successfully connects to ES, is there a need to suppress this warning? If the issue persists, it would point to a deeper problem in the monitored Beat process, and at that point it is valuable to get that logged. Should we consider a lower logging level? Should the Beats API not return a successful response unless it is consistent with its configuration? |
I think the root cause here is that the Beats lazily connect to Elasticsearch when they have events to send. So Filebeat for example will not connect for the first time until there is data to send. This can lead to valid situations where we are repeatedly seeing this log message because the file being monitored hasn't updated since the last time Filebeat was started. @belimawr and I spoke and a better solution to this problem is likely to make an initial connection attempt as soon as the Beat is initialized so we can grab the cluster UUID and also detect if something is wrong in the output configuration much earlier. |
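To illustrate the difference between the lazy behaviour described above and the proposed early connection attempt, here is a minimal Go sketch. It is not libbeat code: the `connector` type, its `connect` callback, and the `stats` method are hypothetical stand-ins for an output client and a monitoring endpoint that needs the cluster UUID.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoUUID mimics the "missing cluster UUID" condition discussed in this issue.
var errNoUUID = errors.New("cluster UUID not available yet")

// connector is a hypothetical stand-in for an Elasticsearch output client.
type connector struct {
	connect     func() (string, error) // assumed to return the cluster UUID
	clusterUUID string
}

// ensureConnected connects once and caches the cluster UUID.
func (c *connector) ensureConnected() error {
	if c.clusterUUID != "" {
		return nil
	}
	uuid, err := c.connect()
	if err != nil {
		return err
	}
	c.clusterUUID = uuid
	return nil
}

// publish is the lazy path: the connection is only made when there is an event to send.
func (c *connector) publish(event string) error {
	if err := c.ensureConnected(); err != nil {
		return fmt.Errorf("publish %q: %w", event, err)
	}
	return nil
}

// stats fails until a connection has been made, like the monitoring endpoint.
func (c *connector) stats() (string, error) {
	if c.clusterUUID == "" {
		return "", errNoUUID
	}
	return c.clusterUUID, nil
}

func main() {
	fakeConnect := func() (string, error) { return "example-uuid", nil }

	// Lazy: no events yet, so stats keeps failing.
	lazy := &connector{connect: fakeConnect}
	_, err := lazy.stats()
	fmt.Println("lazy, before first event:", err)

	// Eager: connect at startup, so stats works before any event is published.
	eager := &connector{connect: fakeConnect}
	if err := eager.ensureConnected(); err != nil {
		fmt.Println("output misconfigured:", err)
		return
	}
	uuid, _ := eager.stats()
	fmt.Println("eager, before first event:", uuid)
}
```

In this sketch the eager path also surfaces a broken output configuration at startup rather than when the first event arrives, which is the second benefit mentioned above.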
Generally this log message is harmless and is just log spam, because if the Beat has tried and failed to connect to Elasticsearch there will be other more obvious errors related to that in the logs. |
@cmacknz the importance of the message is not questioned. The problem is the flood of error severity messages in the agent log that creates way too much noise. |
I'll look into reducing how often the message is logged and lowering its severity, considering that a failure to connect to the ES output would already be logged. |
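One possible way to reduce how often such a message appears is to throttle it, so that it is emitted at most once per interval. The Go sketch below only illustrates that idea; it is not the change that was made in Beats, and the `throttle` type and the log text are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// throttle permits an action at most once per interval (hypothetical helper).
type throttle struct {
	mu       sync.Mutex
	interval time.Duration
	last     time.Time
}

func newThrottle(interval time.Duration) *throttle {
	return &throttle{interval: interval}
}

// allow reports whether enough time has passed since the last permitted call.
func (t *throttle) allow() bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	now := time.Now()
	if !t.last.IsZero() && now.Sub(t.last) < t.interval {
		return false
	}
	t.last = now
	return true
}

func main() {
	th := newThrottle(time.Minute)
	// Simulate five collection periods in quick succession: only the first one logs.
	for i := 0; i < 5; i++ {
		if th.allow() {
			// Logged at a lower severity than error, since a failed output
			// connection would already be reported elsewhere.
			fmt.Println(`level=debug msg="cluster UUID not available yet"`)
		}
	}
}
```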
@cmacknz Is there some way to verify which Beat is still waiting to connect to Elasticsearch? |
All the Beats lazily connect as far as I know, Metricbeat and Filebeat certainly do. If you can modify the Beat code for this experiment, I would just add a log statement when the beats/libbeat/cmd/instance/beat.go Line 1135 in 0587bb0
Without modifying the Beat, in the agent logs you'll see something like the following when a Beat does eventually connect to Elasticsearch: {"log.level":"info","@timestamp":"2023-03-22T08:54:21.468Z","message":"Connection to backoff(elasticsearch(https://$domain.europe-west1.gcp.cloud.es.io:443)) established","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"log-default","type":"log"},"log":{"source":"log-default"},"service.name":"filebeat","ecs.version":"1.6.0","log.logger":"publisher_pipeline_output","log.origin":{"file.line":147,"file.name":"pipeline/client_worker.go"},"ecs.version":"1.6.0"} |
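As a rough way to check this across a large log file, the following Go sketch reads newline-delimited JSON agent logs from standard input and prints the components that have logged an established connection. It assumes each entry has the `message`, `component.id`, and `component.binary` fields shown in the sample above; `parseConnected` is a hypothetical helper, not part of Elastic Agent.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
	"strings"
)

// logEntry holds only the fields of an agent log line that this sketch needs.
type logEntry struct {
	Message   string `json:"message"`
	Component struct {
		Binary string `json:"binary"`
		ID     string `json:"id"`
	} `json:"component"`
}

// parseConnected reports whether a log line is a "Connection to ... established"
// message, and if so which component wrote it.
func parseConnected(line []byte) (id, binary string, ok bool) {
	var e logEntry
	if err := json.Unmarshal(line, &e); err != nil {
		return "", "", false // not a JSON log line, skip it
	}
	if !strings.HasPrefix(e.Message, "Connection to ") ||
		!strings.HasSuffix(e.Message, " established") {
		return "", "", false
	}
	return e.Component.ID, e.Component.Binary, true
}

func main() {
	sc := bufio.NewScanner(os.Stdin)
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // allow long log lines
	for sc.Scan() {
		if id, binary, ok := parseConnected(sc.Bytes()); ok {
			fmt.Printf("%s (%s) has connected to its output\n", id, binary)
		}
	}
	if err := sc.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "reading logs:", err)
		os.Exit(1)
	}
}
```

Any component that is running but never appears in the output has not logged a connection in the scanned logs, which makes it a candidate for a Beat that is still waiting to connect.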
Elastic Agent uses the Metricbeat beat stats module to collect metrics for the Beats it starts. Until those Beats connect to Elasticsearch, the agent logs will be full of errors like the one below that aren't particularly helpful. The Beats only obtain a cluster UUID when they publish their first event, so if, for example, there is a log source that never updates or is slow to change, this error can appear in the agent logs quite frequently.
This error is coming from this code:
beats/metricbeat/module/beat/stats/stats.go
Lines 79 to 100 in 64f98ca
Why do we need an ES cluster UUID to collect beat stats? Is there a way to bypass this or suppress this warning?