Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

High cost encoding and decoding in the League library #23

Open
mgrojo opened this issue Jan 24, 2023 · 5 comments
Open

High cost encoding and decoding in the League library #23

mgrojo opened this issue Jan 24, 2023 · 5 comments
Labels
enhancement New feature or request

Comments

@mgrojo
Copy link
Collaborator

mgrojo commented Jan 24, 2023

League.Text_Codecs.Codec

Similar to #20.

I was going to apply the same solution than in #20. I see two possibilities:

  • Use another holder instance in PB_Support.IO.
  • Share the holder instance moving it from PB_Support.Internal to PB_Support (private part) and use it from both packages.

Do you see any problem in sharing this in PB_Support?

@mgrojo
Copy link
Collaborator Author

mgrojo commented Jan 24, 2023

Well, I see PB_Support is pure, so maybe it's not a good idea to change that to include the variable. I'll go for the other option, unless you have other proposal, @reznikmm.

@reznikmm
Copy link
Owner

I propose to move

Codec : Text_Codec_Holders.Holder;

into a public part of PB_Support.Internal and reuse it in the body of PB_Support.IO if this works.

@mgrojo
Copy link
Collaborator Author

mgrojo commented Jan 24, 2023

It doesn't make much difference after the change, this time. Looking more closely at the report, most of the time is gone encoding and decoding the UTF-8 strings.

@mgrojo mgrojo changed the title High cost calling League.Text_Codecs.Codec in PB_Support.IO High cost encoding and decoding in the League library Jan 26, 2023
@mgrojo
Copy link
Collaborator Author

mgrojo commented Jan 26, 2023

What I've gathered is that League.Universal_String is using UTF-16 and we need to convert to UTF-8 forth and back as required by Protobuf.

I was looking at https://web.archive.org/web/20220817170400/https://forge.ada-ru.org/matreshka/wiki/League/Performance which says

  • use of special algorithms and utilization of SIMD operations (when available) significantly improve performance.

and I wonder how could I check that it is actually using the SIMD operations in my build. I don't see any clue about it in the Callgrind report.

I was also wondering if there could be a way to speed up conversion if you know that the input is always going to be in the US-ASCII subset, which is my particular case.

@mgrojo mgrojo added the enhancement New feature or request label Jan 26, 2023
@reznikmm
Copy link
Owner

We could try to replace Matreshka with VSS. VSS uses utf-8 inside and keeps short strings inline...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants