Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compact GitProvenance.Committer representation #4096

Merged
merged 9 commits into from
Jan 27, 2025
Merged

Conversation

knutwannheden
Copy link
Contributor

The in-memory representation of the GitProvenance.Committer requires quite a lot of memory for the commit statistics as captured by the TreeMap<LocalDate, Integer> structure. Replacing with two more compact int[] fields.

The in-memory representation of the `GitProvenance.Committer` requires quite a lot of memory for the commit statistics as captured by the `TreeMap<LocalDate, Integer>` structure. Replacing with two more compact `int[]` fields.
# Conflicts:
#	rewrite-core/src/main/java/org/openrewrite/marker/GitProvenance.java
#	rewrite-core/src/main/java/org/openrewrite/search/FindCommitters.java
#	rewrite-core/src/test/java/org/openrewrite/search/FindCommittersTest.java
@knutwannheden
Copy link
Contributor Author

knutwannheden commented Jan 26, 2025

Comparing for the example of dotnet/runtime the difference in retained memory by the GitProvenance marker is 5594800 vs 1233192 bytes. In the serialized LST the difference is 79009 bytes. This was for 831 commits and 164 committers over the past 90 days:

$ echo "Commits: $(git rev-list --count --since="90 days ago" HEAD)" && \
echo "Committers: $(git log --since="90 days ago" --format='%ae' | sort | uniq | wc -l)"
Commits: 831
Committers:      164

Although maybe what is more important is the total:

$ echo "Commits: $(git rev-list --count HEAD)" && \
echo "Committers: $(git log --format='%ae' | sort | uniq | wc -l)"
Commits: 137292
Committers:     3054

While dotnet/runtime is not the Linux kernel, it is still a substantially large repo. Looking at tensorflow/tensorflow:

$ echo "Commits: $(git rev-list --count HEAD)" && \
echo "Committers: $(git log --format='%ae' | sort | uniq | wc -l)"
Commits: 175122
Committers:     4468

we get: 6028288 vs 1523496 bytes.

More optimizations are possible by only keeping one array. This would in the previous example yield an in-memory size of 1391976. Or alternatively replace the int[] used for the number of commits with short[] which would result in a size of 1370160.

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some suggestions could not be made:

  • rewrite-gradle/src/main/groovy/RewriteGradleProject.groovy
    • lines 196-195

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some suggestions could not be made:

  • rewrite-gradle/src/main/groovy/RewriteGradleProject.groovy
    • lines 196-195

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some suggestions could not be made:

  • rewrite-gradle/src/main/groovy/RewriteGradleProject.groovy
    • lines 196-195

@knutwannheden knutwannheden marked this pull request as ready for review January 27, 2025 08:14
@knutwannheden knutwannheden merged commit c1c7bd1 into main Jan 27, 2025
2 checks passed
@knutwannheden knutwannheden deleted the compact-committers branch January 27, 2025 08:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

2 participants