Hi,
After upgrading to Spring Boot 3.4 we have noticed a memory leak. I can reproduce it when running performance tests in a production-like environment after about 30 minutes, but not on my local machine, so apologies in advance for not providing a reproducer. Perhaps you can give me some hints and I can then dig further on my own and provide one.
The application is based on Spring Cloud Gateway with Spring Boot 3.4.2 and Spring Cloud 2024.0.0. I can make the issue go away by force-downgrading both io.micrometer:micrometer-tracing-bridge-brave and io.micrometer:micrometer-tracing to 1.3.5 (we were on Spring Boot 3.3 before). I did this by forcing that version in my build and providing custom beans for BraveTracer and BaggagePropagation.FactoryBuilder, because the two versions are not compatible (a new constructor was added).
I have captured a heap dump (which I unfortunately can't share, but I'm happy to provide pieces of it), and it points to io.micrometer.observation.SimpleObservation:
Shortest path and incoming references: (heap dump screenshots omitted)
Furthermore, when investigating these observations, we found that all the metrics in them were of a specific type. We have many types of metrics and collect them in several ways, but only the ones we collect via Micrometer.observation seem to show up there. Importantly, removing these metrics also makes the memory leak go away. This is how we collect them:
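(The snippet below is a minimal sketch of the pattern rather than the exact production code, assuming the metric is recorded by tapping the reactive sequence with Reactor's Micrometer.observation(...) listener; the CacheMetrics class and the redisLookup parameter are illustrative, while the observation name and tags mirror the key values visible in the heap dump above.)

import io.micrometer.observation.ObservationRegistry;
import reactor.core.observability.micrometer.Micrometer;
import reactor.core.publisher.Mono;

class CacheMetrics {

    private final ObservationRegistry observationRegistry;

    CacheMetrics(ObservationRegistry observationRegistry) {
        this.observationRegistry = observationRegistry;
    }

    // Wraps a cache lookup in an Observation named "redis.cache.latency",
    // matching the observations seen in the heap dump.
    Mono<String> observedGet(Mono<String> redisLookup) {
        return redisLookup
                .name("redis.cache.latency")                 // becomes the observation name
                .tag("method", "get")                        // low-cardinality key values
                .tag("serviceId", "LoginWebClientBean")
                .tap(Micrometer.observation(observationRegistry));
    }
}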
Also, we get tons of warnings in the logs like the following one:
Observation <{
name=redis.cache.latency(redis.cache.latency),
error=null,
context=
name='redis.cache.latency',
contextualName='redis.cache.latency',
error='null',
lowCardinalityKeyValues=[method='get', reactor.status='completed', reactor.type='Mono', serviceId='LoginWebClientBean'],
highCardinalityKeyValues=[],
map=[
class io.micrometer.core.instrument.Timer$Sample='io.micrometer.core.instrument.Timer$Sample@6db2ad5d',
class io.micrometer.tracing.handler.TracingObservationHandler$TracingContext='TracingContext{span=679b469bcb6ed8a9787039b7703fb61b/027e17a438ddf74d}',
class io.micrometer.core.instrument.LongTaskTimer$Sample='SampleImpl{duration(seconds)=-1.0E-9, duration(nanos)=-1.0, startTimeNanos=3342086237708105}'
],
parentObservation={
name=spring.security.filterchains(security filterchain after),
error=null,
context=name='spring.security.filterchains',
contextualName='security filterchain after',
error='null',
lowCardinalityKeyValues=[spring.security.filterchain.position='13', spring.security.filterchain.size='13', spring.security.reached.filter.name='ServerWebExchangeReactorContextWebFilter', spring.security.reached.filter.section='after'],
highCardinalityKeyValues=[],
map=[
class io.micrometer.core.instrument.Timer$Sample='io.micrometer.core.instrument.Timer$Sample@20ce500d',
class io.micrometer.tracing.handler.TracingObservationHandler$TracingContext='TracingContext{span=679b469bcb6ed8a9787039b7703fb61b/799bf843490626c5}',
class io.micrometer.core.instrument.LongTaskTimer$Sample='SampleImpl{duration(seconds)=-1.0E-9, duration(nanos)=-1.0, startTimeNanos=3342086239679203}'
],
parentObservation={
name=http.server.requests(http get),
error=null, context=name='http.server.requests',
contextualName='http get',
error='null',
lowCardinalityKeyValues=[client.id='frontend', exception='none', method='GET', outcome='CLIENT_ERROR', status='401', uri='UNKNOWN'],
highCardinalityKeyValues=[http.url='/user'],
map=[
class io.micrometer.core.instrument.Timer$Sample='io.micrometer.core.instrument.Timer$Sample@665e5579',
class io.micrometer.tracing.handler.TracingObservationHandler$TracingContext='TracingContext{span=679b469bcb6ed8a9787039b7703fb61b/787039b7703fb61b}',
class io.micrometer.core.instrument.LongTaskTimer$Sample='SampleImpl{duration(seconds)=-1.0E-9, duration(nanos)=-1.0, startTimeNanos=3342086234182238}'
],
parentObservation=null
}
}
}>
to which we're restoring is not the same as the one set as this scope's parent observation
<{
name=redis.cache.latency(redis.cache.latency),
error=null,
context=name='redis.cache.latency',
contextualName='redis.cache.latency',
error='null',
lowCardinalityKeyValues=[method='get', reactor.status='completed', reactor.type='Mono', serviceId='LoginWebClientBean'],
highCardinalityKeyValues=[],
map=[
class io.micrometer.core.instrument.Timer$Sample='io.micrometer.core.instrument.Timer$Sample@6433cb08',
class io.micrometer.tracing.handler.TracingObservationHandler$TracingContext='TracingContext{span=5d7ed346ab277fea4f3a20599dce3820/0b6e3e4ddc8eed49}',
class io.micrometer.core.instrument.LongTaskTimer$Sample='SampleImpl{duration(seconds)=-1.0E-9, duration(nanos)=-1.0, startTimeNanos=3342090767706628}'
],
parentObservation={
name=spring.security.filterchains(null),
error=null,
context=name='spring.security.filterchains',
contextualName='null',
error='null',
lowCardinalityKeyValues=[spring.security.filterchain.position='0', spring.security.filterchain.size='0', spring.security.reached.filter.name='none', spring.security.reached.filter.section='before'],
highCardinalityKeyValues=[],
map=[
class io.micrometer.core.instrument.Timer$Sample='io.micrometer.core.instrument.Timer$Sample@3f6d09bc',
class io.micrometer.tracing.handler.TracingObservationHandler$TracingContext='TracingContext{span=5d7ed346ab277fea4f3a20599dce3820/064d8ada01512255}',
class io.micrometer.core.instrument.LongTaskTimer$Sample='SampleImpl{duration(seconds)=0.001866874, duration(nanos)=1866874.0, startTimeNanos=3342090766784193}'
],
parentObservation={
name=http.server.requests(null),
error=null,
context=name='http.server.requests',
contextualName='null',
error='null',
lowCardinalityKeyValues=[client.id='schema', exception='none', method='GET', outcome='SUCCESS', status='200', uri='UNKNOWN'],
highCardinalityKeyValues=[http.url='/user/preference'],
map=[class io.micrometer.core.instrument.Timer$Sample='io.micrometer.core.instrument.Timer$Sample@e7b3687', class io.micrometer.tracing.handler.TracingObservationHandler$TracingContext='TracingContext{span=5d7ed346ab277fea4f3a20599dce3820/853e50085c3180f2}', class io.micrometer.core.instrument.LongTaskTimer$Sample='SampleImpl{duration(seconds)=0.003921056, duration(nanos)=3921056.0, startTimeNanos=3342090764743947}'],
parentObservation=null
}
}
}>
Most likely a manually created Observation has a scope opened that was never closed. This may lead to thread polluting and memory leaks.
My suspicion is this PR 😜 #688
If anybody could give me some hints on how to further investigate, that would be much appreciated. Thanks!
So I have no idea why this is happening, but I understand what it means: most likely one of the observations is put into scope and the scope is then never closed. With so many moving pieces it's hard to guess why; any of Redis, Micrometer, Tracing, Spring Security and WebFlux could be the culprit...
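For reference, this is the usage pattern the warning refers to (a generic sketch of the Observation scope API, not our code; the observation name is illustrative): a scope obtained via openScope() must always be closed, otherwise the previously current observation is never restored on that thread.

import io.micrometer.observation.Observation;
import io.micrometer.observation.ObservationRegistry;

class ScopeUsage {

    void observed(ObservationRegistry registry) {
        Observation observation = Observation.start("some.operation", registry);
        // openScope() makes this observation the current one on the thread;
        // if close() is skipped on some code path, the previous observation can
        // never be restored, which is exactly what the warning complains about.
        try (Observation.Scope scope = observation.openScope()) {
            // work that should be attributed to the observation
        }
        finally {
            observation.stop();
        }
    }
}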
Is there an option to, e.g., remove Spring Security from the classpath and see whether the problem is still there, so that we can provide the most probable pair of potential culprits?
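A possibly simpler experiment than changing the classpath would be to exclude Boot's reactive security auto-configuration for a test run. This is a hypothetical sketch (GatewayApplication is an illustrative class name), and whether it actually removes the spring.security.filterchains observations depends on how security is configured in the application:

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.boot.autoconfigure.security.reactive.ReactiveSecurityAutoConfiguration;

// Excludes the default reactive security auto-configuration for a test run.
@SpringBootApplication(exclude = ReactiveSecurityAutoConfiguration.class)
public class GatewayApplication {

    public static void main(String[] args) {
        SpringApplication.run(GatewayApplication.class, args);
    }
}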