Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix thread-local leaks (alternate approach) #205

Closed
wants to merge 2 commits into from

Conversation

jkni
Copy link
Collaborator

@jkni jkni commented Feb 6, 2024

We can't allow object-level thread-local values to reference the object, since this means there will always be a strong reference to the key in the thread-local map, preventing these thread-locals for long-lived threads from being reclaimed.

This PR fixes two thread-local leak sources. The first was introduced with resumable search, where the GraphSearcher stores the score function. In several cases, the scoreFunction is a lambda with an implicit this referring to a GraphIndexBuilder. If the last search run by a GraphIndexBuilder is one of these cases, the GraphSearcher thread-local will be kept alive indefinitely by this circular reference. The second source is much older, but very similar. A GraphIndexBuilder's thread-local GraphSearcher references a view of an OnHeapGraphIndex initialized in the GraphIndexBuilder constructor. This OnHeapGraphIndex was initialized with a neighborFactory containing an implicit this to the GraphIndexBuilder instance. This loop also keep the GraphSearcher thread-local alive indefinitely.

In the process of working this PR, I discovered several cases where a try-with-resources block over a PoolingSupport retained the contents received from a RandomAccessVectorValues outside the context of the try-with-resources. If this PoolingSupport was ever a queued version, this would violate the contract of the autocloseable and potentially cause incorrect behavior. Since we don't currently use queued versions of the PoolingSupport, I opted to simplify PoolingSupport to just shared/thread-local values, which eliminates the need for autocloseable entirely. I find this cleaner given that we don't use queueing, but as an alternative, I could fix all cases where a pooled value leaks contents outside a try-with-resources (which would come at runtime cost with no benefit, since we don't use queuing).

jkni added 2 commits February 5, 2024 14:17
…sting usages used the interface incorrectly and leaked contents outside the context of the try-with-resources block. Refactor as StorageSupport, which lets us drop the autocloseable wrapper object.
Copy link
Contributor

@dlg99 dlg99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm after a quick glance, just some random thoughts about array pooling
I assume testing shows that the leak is gone.

}
var localRavv = ravvCopy.get();
float[] v = localRavv.vectorValue(targetOrd);
return localRavv.isValueShared() ? Arrays.copyOf(v, v.length) : v;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

random thoughts:
will we benefit from pooling the float[]? E.g. we can use multiple pools can be for small arrays (< 256 len), med (< 768), large (< 2048), and unpooled for others, adjust to the most common cases.
will vector instructions benefit from off-heap (direct memory) arrays?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We definitely could benefit from pooling that (or a similar solution); this area of code is something I've looked at in the native branch WIP, which changes vector representation and many of the details of our computations around them.

@jbellis
Copy link
Owner

jbellis commented Feb 7, 2024

Since we don't currently use queued versions of the PoolingSupport, I opted to simplify PoolingSupport to just shared/thread-local values, which eliminates the need for autocloseable entirely.

The main point of PS was to add support for virtual threads via QueuedPooling. If we don't care about supporting VT then we should just rip the whole thing out and use ETL instead which has a much friendlier API (doesn't require explicit close).

@jbellis
Copy link
Owner

jbellis commented Feb 7, 2024

Re the motivation for this approach over ExplicitThreadLocal,

over the lifetime of a GIB, you don't get any clean up of the searcher/vv objects associated with a thread that terminates

I think it's pretty reasonable to say "we assume that you call add from a bounded number of threads and allocate per-thread state that will persist until the GIB is GC'd", and if we want to be super proactive about it we could add some kind of release method so you could say "I'm done calling add from this thread."

I don't want to leave TL+TLM in the code, it's too fragile both wrt to the circular references you identified and also even without circular references there are just no guarantees when Entry references get expunged without explicit calls to remove that are difficult to impossible to fit into jvector's design.

@jkni
Copy link
Collaborator Author

jkni commented Feb 7, 2024

I agree with that conclusion.

@jkni jkni closed this Feb 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants