We are investigating a hard-to-debug resource leak in our components that use the Elasticsearch client. The symptoms appear on deployments after several days or weeks and cause the pod running the deployment to become "stuck": requests that reach the pod hang, and no new requests are routed to that pod until it is restarted.
Unfortunately, the issue is hard to reproduce locally. From a heap dump of an affected instance I was able to retrieve the following information, which pointed us toward the Apache async HTTP client used by the low-level Elasticsearch REST client.
One instance of org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager loaded by org.springframework.boot.loader.launch.LaunchedClassLoader @ 0xa515d078 occupies 131,150,024 (67.48%) bytes. The memory is accumulated in one instance of java.util.LinkedList, loaded by <system class loader>, which occupies 130,898,736 (67.36%) bytes.
Thread java.lang.Thread @ 0xa6ef74e0 elasticsearch-rest-client-0-thread-1 has a local variable or reference to org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager @ 0xa6ef7730 which is on the shortest path to java.util.LinkedList @ 0xa703f370. The thread java.lang.Thread @ 0xa6ef74e0 elasticsearch-rest-client-0-thread-1 keeps local variables with total size 1,928 (0.00%) bytes.
Significant stack frames and local variables
org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(Lorg/apache/http/nio/reactor/IOEventDispatch;)V (PoolingNHttpClientConnectionManager.java:221)
org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager @ 0xa6ef7730 retains 131,150,024 (67.48%) bytes
The stacktrace of this Thread is available. See stacktrace. See stacktrace with involved local variables.
Keywords
org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager
org.springframework.boot.loader.launch.LaunchedClassLoader
java.util.LinkedList
org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(Lorg/apache/http/nio/reactor/IOEventDispatch;)V
PoolingNHttpClientConnectionManager.java:221
elasticsearch-rest-client-0-thread-1
at sun.nio.ch.EPoll.wait(IJII)I (EPoll.java(Native Method))
at sun.nio.ch.EPollSelectorImpl.doSelect(Ljava/util/function/Consumer;J)I (EPollSelectorImpl.java:121)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(Ljava/util/function/Consumer;J)I (SelectorImpl.java:130)
at sun.nio.ch.SelectorImpl.select(J)I (SelectorImpl.java:142)
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(Lorg/apache/http/nio/reactor/IOEventDispatch;)V (AbstractMultiworkerIOReactor.java:343)
at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(Lorg/apache/http/nio/reactor/IOEventDispatch;)V (PoolingNHttpClientConnectionManager.java:221)
at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run()V (CloseableHttpAsyncClientBase.java:64)
at java.lang.Thread.runWith(Ljava/lang/Object;Ljava/lang/Runnable;)V (Thread.java:1596)
at java.lang.Thread.run()V (Thread.java:1583)
Class Name | Shallow Heap (bytes) | Retained Heap (bytes)
-- | -- | --
java.util.LinkedList @ 0xa703f370 | 32 | 130,898,736
└─ leasingRequests org.apache.http.impl.nio.conn.CPool @ 0xa703e868 | 88 | 131,084,168
└─ pool org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager @ 0xa6ef7730 | 32 | 131,150,024
+ <Java Local> java.lang.Thread @ 0xa6ef74e0 elasticsearch-rest-client-0-thread-1 Thread | 104 | 1,928
+ val$connmgr org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1 @ 0xa6ef77c0 | 32 | 32
+ connmgr, connmgr org.apache.http.impl.nio.client.InternalHttpAsyncClient @ 0xa618fb18 | 72 | 112
We are using the Elasticsearch client in a conventional way from a reactive context like this:
Have you observed similar issues in the past, or do you have an idea what the source of the issue could be?
The issue also seems to have been present in versions prior to 8.16.1.
Thanks a lot!
Best Regards
Sven S.
Edit:
A workaround we have found so far is to specify a short TTL for the connections of the HTTP client itself.
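For anyone wanting to try the same workaround, here is a minimal sketch of how a connection TTL can be set on the low-level REST client's underlying Apache `HttpAsyncClientBuilder`. The host, port, the 60-second value, and the class and method names `TtlRestClientFactory` and `create` are illustrative assumptions, not the settings from our deployment; the right TTL likely depends on the environment.

```java
import java.util.concurrent.TimeUnit;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;

// Hypothetical helper class, named here only for illustration.
public final class TtlRestClientFactory {

    private TtlRestClientFactory() {
    }

    // Builds a low-level REST client whose pooled connections are
    // discarded once they are older than the given TTL.
    public static RestClient create(String host, int port, long ttlSeconds) {
        return RestClient.builder(new HttpHost(host, port, "http"))
                .setHttpClientConfigCallback(httpClientBuilder ->
                        // Caps the total lifetime of each pooled connection.
                        httpClientBuilder.setConnectionTimeToLive(ttlSeconds, TimeUnit.SECONDS))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Assumed values: local node on port 9200, 60-second TTL.
        try (RestClient client = create("localhost", 9200, 60)) {
            System.out.println("REST client created with a 60s connection TTL");
        }
    }
}
```

The resulting `RestClient` can then be passed to the transport of the Java API client as usual. This only limits how long a pooled connection may live; it does not explain why the lease requests pile up in the first place.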
Java API client version
8.16.1
Java version
21
Elasticsearch Version
8.16.1