Elude parser allocation #673
Merged
byroot force-pushed the elude-parser-allocation branch from 3ea49e4 to 3ae8c8b (November 1, 2024 17:14)
Similar to ruby#662, but here we don't even need to spill on the heap, because the parser is never exposed.

Before:

```
== Parsing small hash (65 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json   188.233k i/100ms
                  oj   213.985k i/100ms
           oj strict   242.564k i/100ms
          Oj::Parser   448.682k i/100ms
           rapidjson   291.925k i/100ms
Calculating -------------------------------------
                json      1.983M (± 0.5%) i/s  (504.32 ns/i) -      9.976M in   5.031352s
                  oj      2.334M (± 0.2%) i/s  (428.48 ns/i) -     11.769M in   5.042839s
           oj strict      2.689M (± 0.2%) i/s  (371.85 ns/i) -     13.584M in   5.051044s
          Oj::Parser      4.662M (± 1.2%) i/s  (214.50 ns/i) -     23.331M in   5.005414s
           rapidjson      3.110M (± 0.7%) i/s  (321.57 ns/i) -     15.764M in   5.069531s

Comparison:
                json:  1982878.1 i/s
          Oj::Parser:  4661924.8 i/s - 2.35x  faster
           rapidjson:  3109722.2 i/s - 1.57x  faster
           oj strict:  2689277.0 i/s - 1.36x  faster
                  oj:  2333852.9 i/s - 1.18x  faster
```

After:

```
== Parsing small hash (65 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json   223.083k i/100ms
                  oj   214.400k i/100ms
           oj strict   243.519k i/100ms
          Oj::Parser   445.445k i/100ms
           rapidjson   293.936k i/100ms
Calculating -------------------------------------
                json      2.279M (± 4.5%) i/s  (438.71 ns/i) -     11.377M in   5.002132s
                  oj      2.315M (± 0.3%) i/s  (431.96 ns/i) -     11.578M in   5.001141s
           oj strict      2.665M (± 0.9%) i/s  (375.19 ns/i) -     13.394M in   5.025562s
          Oj::Parser      4.703M (± 0.3%) i/s  (212.63 ns/i) -     23.609M in   5.019913s
           rapidjson      3.129M (± 0.4%) i/s  (319.55 ns/i) -     15.873M in   5.072213s

Comparison:
                json:  2279385.2 i/s
          Oj::Parser:  4703032.3 i/s - 2.06x  faster
           rapidjson:  3129356.1 i/s - 1.37x  faster
           oj strict:  2665318.3 i/s - 1.17x  faster
                  oj:  2315009.3 i/s - same-ish: difference falls within error
```
This is the same strategy used for the generator: if we assume that at most a couple of options are passed, we might as well traverse the options hash rather than check every possible key.

```
== Parsing small hash (65 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) [arm64-darwin23]
Warming up --------------------------------------
                json   229.732k i/100ms
                  oj   221.571k i/100ms
           oj strict   255.080k i/100ms
          Oj::Parser   427.514k i/100ms
           rapidjson   282.252k i/100ms
Calculating -------------------------------------
                json      2.185M (± 3.3%) i/s  (457.68 ns/i) -     11.027M in   5.052670s
                  oj      2.227M (± 0.4%) i/s  (449.10 ns/i) -     11.300M in   5.074920s
           oj strict      2.532M (± 1.4%) i/s  (394.97 ns/i) -     12.754M in   5.038527s
          Oj::Parser      4.309M (± 0.5%) i/s  (232.10 ns/i) -     21.803M in   5.060621s
           rapidjson      2.811M (± 0.2%) i/s  (355.78 ns/i) -     14.113M in   5.020940s

Comparison:
                json:  2184913.9 i/s
          Oj::Parser:  4308534.8 i/s - 1.97x  faster
           rapidjson:  2810757.1 i/s - 1.29x  faster
           oj strict:  2531841.6 i/s - 1.16x  faster
                  oj:  2226694.4 i/s - same-ish: difference falls within error
```
This very significantly reduces the overhead on smaller benchmarks:

```
== Parsing small hash (65 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) [arm64-darwin23]
Warming up --------------------------------------
                json   304.417k i/100ms
                  oj   219.431k i/100ms
           oj strict   254.532k i/100ms
          Oj::Parser   431.309k i/100ms
           rapidjson   281.703k i/100ms
Calculating -------------------------------------
                json      3.046M (± 0.1%) i/s  (328.25 ns/i) -     15.525M in   5.096243s
                  oj      2.225M (± 0.2%) i/s  (449.50 ns/i) -     11.191M in   5.030429s
           oj strict      2.553M (± 0.5%) i/s  (391.75 ns/i) -     12.981M in   5.085538s
          Oj::Parser      4.280M (± 0.8%) i/s  (233.64 ns/i) -     21.565M in   5.038834s
           rapidjson      2.826M (± 0.3%) i/s  (353.83 ns/i) -     14.367M in   5.083480s

Comparison:
                json:  3046420.8 i/s
          Oj::Parser:  4280132.7 i/s - 1.40x  faster
           rapidjson:  2826209.4 i/s - 1.08x  slower
           oj strict:  2552619.7 i/s - 1.19x  slower
                  oj:  2224670.7 i/s - 1.37x  slower
```
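To illustrate the option-traversal idea from the commit message above, here is a minimal sketch in plain C. It is not the code from this PR: the `option` and `parser_config` types, the field names, and `config_from_options` are hypothetical stand-ins for a Ruby options hash and the parser's configuration. The point is only that the loop visits the options the caller actually passed, instead of performing one lookup per option the parser knows about.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parser configuration; field names are illustrative only. */
typedef struct {
    int  max_nesting;
    bool allow_nan;
    bool symbolize_names;
    bool freeze;
} parser_config;

/* Hypothetical stand-in for one key/value entry of an options hash. */
typedef struct {
    const char *key;
    int         value;
} option;

/* Fill `cfg` with defaults, then walk only the options that were passed.
 * The cost grows with the number of supplied options (usually 0 to 2),
 * not with the number of options the parser supports. */
static void config_from_options(parser_config *cfg, const option *opts, size_t count)
{
    cfg->max_nesting     = 100;   /* assumed default for this sketch */
    cfg->allow_nan       = false;
    cfg->symbolize_names = false;
    cfg->freeze          = false;

    for (size_t i = 0; i < count; i++) {
        const char *key = opts[i].key;
        if (strcmp(key, "max_nesting") == 0) {
            cfg->max_nesting = opts[i].value;
        } else if (strcmp(key, "allow_nan") == 0) {
            cfg->allow_nan = opts[i].value != 0;
        } else if (strcmp(key, "symbolize_names") == 0) {
            cfg->symbolize_names = opts[i].value != 0;
        } else if (strcmp(key, "freeze") == 0) {
            cfg->freeze = opts[i].value != 0;
        }
        /* Unknown keys are silently ignored in this sketch. */
    }
}
```

With an empty options list the loop body never runs, which is presumably where most of the gain on small inputs comes from; the alternative design pays one hash lookup per supported key on every call, even when no options are given.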
byroot force-pushed the elude-parser-allocation branch from 3ae8c8b to e660b61 (November 1, 2024 17:24)
Are you publishing these benchmarks somewhere?
You mean the source? It's right there in the repo: https://github.com/ruby/json/tree/master/benchmark
Thank you! It was not referenced anywhere in this PR so I did not know where it came from.
Similar to the optimizations done for the generator, but here it's even simpler because we never need to spill the `Parser` object to the heap. We also start the parsing buffer with 512B of stack memory, so that in most cases no allocations other than the returned value are necessary.
This mostly helps on micro-benchmarks, but isn't detrimental to more real-world benchmarks.
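The stack-first buffer described above can be sketched roughly as follows. This is an assumption-laden illustration in plain C, not the actual ruby/json implementation: `scratch_buffer`, `scratch_init`, `scratch_append`, and `scratch_free` are hypothetical names, and only the 512-byte starting size comes from the description. The buffer starts on caller-provided stack memory and moves to the heap only if the content outgrows it.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define STACK_BUFFER_SIZE 512 /* starting size taken from the PR description */

/* Hypothetical growable buffer that begins life on the caller's stack. */
typedef struct {
    char  *ptr;
    size_t len;
    size_t capa;
    bool   on_heap;
} scratch_buffer;

static void scratch_init(scratch_buffer *b, char *stack_mem, size_t stack_size)
{
    b->ptr     = stack_mem;
    b->len     = 0;
    b->capa    = stack_size;
    b->on_heap = false;
}

/* Append n bytes; returns false if a needed heap allocation fails.
 * Capacity overflow is not handled in this sketch. */
static bool scratch_append(scratch_buffer *b, const char *data, size_t n)
{
    if (n > b->capa - b->len) {
        size_t new_capa = b->capa ? b->capa : 1;
        while (new_capa - b->len < n) {
            new_capa *= 2;
        }

        char *heap;
        if (b->on_heap) {
            heap = realloc(b->ptr, new_capa);
            if (!heap) return false;
        } else {
            /* First spill: copy the stack contents into fresh heap memory. */
            heap = malloc(new_capa);
            if (!heap) return false;
            memcpy(heap, b->ptr, b->len);
            b->on_heap = true;
        }
        b->ptr  = heap;
        b->capa = new_capa;
    }
    memcpy(b->ptr + b->len, data, n);
    b->len += n;
    return true;
}

/* Only heap memory needs releasing; stack memory goes away with the frame. */
static void scratch_free(scratch_buffer *b)
{
    if (b->on_heap) free(b->ptr);
}
```

A caller would declare `char stack_mem[STACK_BUFFER_SIZE];` in the parse function's frame, pass it to `scratch_init`, and call `scratch_free` before returning. For small documents the heap path is never taken, so the only allocation left is the returned value.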
Before:
After: