Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve lookup tables for string escaping. #724

Merged
merged 1 commit into from
Jan 3, 2025
Merged

Conversation

byroot
Copy link
Member

@byroot byroot commented Jan 3, 2025

Introduce a simplified table for the most common case, which is script_safe: false, ascii_only: false.

On the script_safe table, now only 0xE2 does a multi-byte check.

Merge back convert_ASCII_to_JSON, as it no longer help much with the simplified escape table.

== Encoding mixed utf8 (5003001 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
               after    38.000 i/100ms
Calculating -------------------------------------
               after    398.220 (± 3.0%) i/s    (2.51 ms/i) -      2.014k in   5.061659s

Comparison:
              before:      381.8 i/s
               after:      398.2 i/s - same-ish: difference falls within error

== Encoding mostly utf8 (5001001 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
               after    39.000 i/100ms
Calculating -------------------------------------
               after    393.337 (± 2.5%) i/s    (2.54 ms/i) -      1.989k in   5.059397s

Comparison:
              before:      304.3 i/s
               after:      393.3 i/s - 1.29x  faster

== Encoding twitter.json (466906 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
               after   244.000 i/100ms
Calculating -------------------------------------
               after      2.436k (± 0.9%) i/s  (410.43 μs/i) -     12.200k in   5.007702s

Comparison:
              before:     2125.9 i/s
               after:     2436.5 i/s - 1.15x  faster

Introduce a simplified table for the most common case, which is
`script_safe: false, ascii_only: false`.

On the `script_safe` table, now only `0xE2` does a multi-byte check.

Merge back `convert_ASCII_to_JSON`, as it no longer help much with
the simplified escape table.

```
== Encoding mixed utf8 (5003001 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
               after    38.000 i/100ms
Calculating -------------------------------------
               after    398.220 (± 3.0%) i/s    (2.51 ms/i) -      2.014k in   5.061659s

Comparison:
              before:      381.8 i/s
               after:      398.2 i/s - same-ish: difference falls within error

== Encoding mostly utf8 (5001001 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
               after    39.000 i/100ms
Calculating -------------------------------------
               after    393.337 (± 2.5%) i/s    (2.54 ms/i) -      1.989k in   5.059397s

Comparison:
              before:      304.3 i/s
               after:      393.3 i/s - 1.29x  faster

== Encoding twitter.json (466906 bytes)
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
               after   244.000 i/100ms
Calculating -------------------------------------
               after      2.436k (± 0.9%) i/s  (410.43 μs/i) -     12.200k in   5.007702s

Comparison:
              before:     2125.9 i/s
               after:     2436.5 i/s - 1.15x  faster
```
@byroot byroot merged commit 12965b9 into ruby:master Jan 3, 2025
40 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant