-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Annotate JFR leak profiler with BasedOnJDKFile
#10206
base: master
Are you sure you want to change the base?
Annotate JFR leak profiler with BasedOnJDKFile
#10206
Conversation
Thanks. Did you compare the current implementation with the HotSpot sources that you mentioned in the |
Thanks for catching that Christian. While looking into this, I actually discovered a bug that runs a lot deeper. What's happening in Hotspot is that the thread and stacktrace constant data are being cached and only written out when the OldObjectSample events get emitted. OldObjectSample events are special because they are only emitted at the end of a recording, not every chunk rotation. However, in SVM we are serializing the thread and stacktrace data normally like we do for any other event. This is bad because the constant data may get written out too early in chunk1 while the recording later finishes in chunk2. Thus separating the constant and event data. We probably didn't catch this before in tests because test recordings are generally only 1 chunk long. The fix for this will likely not be trivial. I am happy to work on it but it might be best to do it in a separate PR. I can open a bug report for it too. What do you think? |
Very good catch, thanks for checking! Yes, please open a bug report so that this problem is documented. It would be great if you could look into that. Regarding PRs, please do the fix and the |
Ok I'll add all the changes to this one PR. In the meantime, I've set this PR to "draft" and I'll open a bug report. |
c8cdc26
to
e5bcf34
Compare
e5bcf34
to
692a0be
Compare
Hi @christianhaeubl. I have updated this PR with a fix for the bug we talked about. I've tried to reuse as much of the existing SubstrateVM JFR infrastructure as possible. My general approach is to cache the thread and stacktrace data within the sample instances themselves so that they can later be serialized at the time OldObjectSample events are emitted. I've updated the existing tests and added a new test to catch this type of problem in the future. Hotspot handles stacktrace and thread data slightly differently. Stacktraces are deduplicated at the time of sampling using a second stacktrace repo dedicated to the leak profiler. Then at the time of chunk rotation, the stacktraces are retrieved, serialized, and cached in the old object samples. Multiple samples may reference the same cached stacktrace so a reference count is kept. If all the samples referencing a serialized stacktrace are evicted from the queue, the stacktrace is freed. Finally, at the time of event emission, the serialized stacktraces are retrieved from the surviving samples and written to the chunk. The goal is to (1) provide deduplication and (2) only keep stacktraces that correspond to surviving samples. I think my implementation accomplishes both goals while still allowing us to reuse most of the existing SubstrateVM JFR code. We can avoid introducing a second stacktrace repo that functions differently and avoid creating a caching/referencing system. The main practical difference of my implementation is that we delay deduplication and serialization until the time of event emission. I think this is okay, since we are bounded by the number of elements in the queue. The JFR CPU sampler also delays deduplication too, so this is not so different. Hotspot handles thread data similarly to stacktraces. It serializes it upfront and caches it in the sample. I decided it was better to instead delay serialization until the epoch that the events get emitted. This lets us reuse the JfrThreadRepository without any changes. |
This is just a minor change that adds
BasedOnJDKFile
to theOldObjectSampler
code.