
Metricbeat: The beat/stats module will frequently log errors about missing cluster UUIDs #34217

Open
cmacknz opened this issue Jan 9, 2023 · 13 comments
cmacknz commented Jan 9, 2023

The Elastic Agent uses the Metricbeat beat/stats module to collect metrics for the Beats it starts. Until those Beats connect to Elasticsearch, the agent logs fill up with errors like the one below, which aren't particularly helpful. A Beat only obtains a cluster UUID when it publishes its first event, so if, for example, a log source never updates or changes slowly, this error can appear in the agent logs quite frequently.

```json
{"log.level":"error","@timestamp":"2022-12-22T14:26:36.306Z","message":"Error fetching data for metricset beat.stats: monitored beat is using Elasticsearch output but cluster UUID cannot be determined","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"beat/metrics-monitoring","type":"beat/metrics"},"ecs.version":"1.6.0","log.origin":{"file.line":256,"file.name":"module/wrapper.go"},"service.name":"metricbeat","ecs.version":"1.6.0"}
```

This error is coming from this code:

```go
func (m *MetricSet) getClusterUUID() (string, error) {
	state, err := beat.GetState(m.MetricSet)
	if err != nil {
		return "", errors.Wrap(err, "could not get state information")
	}

	clusterUUID := state.Monitoring.ClusterUUID
	if clusterUUID != "" {
		return clusterUUID, nil
	}

	if state.Output.Name != "elasticsearch" {
		return "", nil
	}

	clusterUUID = state.Outputs.Elasticsearch.ClusterUUID
	if clusterUUID == "" {
		// Output is ES but cluster UUID could not be determined. No point sending monitoring
		// data with empty cluster UUID since it will not be associated with the correct ES
		// production cluster. Log error instead.
		return "", beat.ErrClusterUUID
	}

	return clusterUUID, nil
}
```

Why do we need an ES cluster UUID to collect beat stats? Is there a way to bypass this or suppress this warning?

@cmacknz added the Agent, Team:Elastic-Agent, and Team:Infra Monitoring UI - DEPRECATED labels Jan 9, 2023
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@JAmorimNeon

I'm facing this problem too! Elastic version 8.6.0.

@herbc2

herbc2 commented Jan 20, 2023

Same here with 8.6.0

@engarpe

engarpe commented Jan 21, 2023

I'm facing the same issues with 8.6.0 self managed

@belimawr
Contributor

@cmacknz I'm not quite sure, but there seems to be a related issue that leads to panic: #34384

@klacabane
Contributor

klacabane commented Jan 25, 2023

The cluster UUID is required for the Stack Monitoring application to properly tie a Beat to its Elasticsearch cluster. This is mainly driven by the business logic of Stack Monitoring; without this information the application would show an incorrect state for the affected Beat processes.

Given that this issue should be transient and disappear once the Beat successfully connects to ES, is there a need to suppress this warning? If the issue persists, it would surface a deeper problem in the monitored Beat process, and at that point it is valuable to have it logged. Should we consider a lower logging level? Should the Beats API not return a successful response unless it is consistent with its configuration?

@cmacknz
Member Author

cmacknz commented Jan 25, 2023

I think the root cause here is that the Beats lazily connect to Elasticsearch when they have events to send. So Filebeat for example will not connect for the first time until there is data to send.

This can lead to valid situations where we are repeatedly seeing this log message because the file being monitored hasn't updated since the last time Filebeat was started.

@belimawr and I spoke, and a better solution to this problem is likely to make an initial connection attempt as soon as the Beat is initialized, so we can grab the cluster UUID and also detect problems in the output configuration much earlier.

@cmacknz
Member Author

cmacknz commented Jan 25, 2023

Generally this log message is harmless and is just log spam, because if the Beat has tried and failed to connect to Elasticsearch there will be other more obvious errors related to that in the logs.

@yevgenytrcloudzone

@cmacknz the importance of the message is not in question. The problem is the flood of error-severity messages in the agent log, which creates way too much noise.

@klacabane
Contributor

I'll look into reducing how often this log occurs and lowering the severity of the message, considering that a failure to connect to the ES output is already logged separately.

@miltonhultgren
Contributor

miltonhultgren commented Mar 16, 2023

@cmacknz Is there some way to verify which Beat is still waiting to connect to Elasticsearch?
And is there some Beat setup in the default Agent settings that would lazily connect like this?
So that we can check that the error indeed goes away once that Beat has a reason to send its first document.

@cmacknz
Member Author

cmacknz commented Mar 23, 2023

All the Beats lazily connect as far as I know; Metricbeat and Filebeat certainly do.

If you can modify the Beat code for this experiment, I would just add a log statement when the clusterUUIDFetchingCallback is registered and another one when it is actually executed.

```go
func (b *Beat) clusterUUIDFetchingCallback() elasticsearch.ConnectCallback {
```

Without modifying the Beat, in the agent logs you'll see something like the following when a Beat does eventually connect to Elasticsearch:

```json
{"log.level":"info","@timestamp":"2023-03-22T08:54:21.468Z","message":"Connection to backoff(elasticsearch(https://$domain.europe-west1.gcp.cloud.es.io:443)) established","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"log-default","type":"log"},"log":{"source":"log-default"},"service.name":"filebeat","ecs.version":"1.6.0","log.logger":"publisher_pipeline_output","log.origin":{"file.line":147,"file.name":"pipeline/client_worker.go"},"ecs.version":"1.6.0"}
```

@smith removed the Team:Infra Monitoring UI - DEPRECATED label Nov 9, 2023
@botelastic

botelastic bot commented Nov 8, 2024

Hi!
We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1:.
Thank you for your contribution!

@botelastic bot added the Stalled label Nov 8, 2024