Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optionally switch Identifiers form std::String to a different type in order to support string interning #30

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

nacaclanga
Copy link

Hello,

as you probably are aware of, many compilers use string interning on identifier and type name tokens in order to avoid heap allocation and to allow for faster equality comparison between identifier names. Generally Strings are interned during tolkinization. Unfortunately lang-c currently only allows for identifiers and type names to be stored in std::String. As I don't want to change the default behaviour, introduce dependencies on external crates or bless a particular string interning library, I came up with a solution where the ast and parser are converted into generic functions. The only issue with this solution is, that I somehow have to work around rust-peg's limitations. This is done by some string substitution on the parser.rs.raw. While I tried my best to modify the code in a fashion, that avoids breaking the public API, I cannot guarantee, whether I succeeded.

I am fully aware, that this use of rust-peg could be considered a hack and you might be reluctant to merge this. Please fell free to close this pull request in this case. Given that you considered merging #18 and I wasn't certain, how this would effect my string substitution, I also made a version, where this change is applied, https://github.com/nacaclanga/lang-c/tree/generic_ident_conditional, which you could consider merging instead.

Merry Christmas

@vickenty
Copy link
Owner

vickenty commented Jan 4, 2021

Hi. Thank you for making this pull request.

Just out of curiosity, did you make any benchmarks to compare performance with and without an interner? I wonder what kind of impact this would have.

Do you have an example how to plug an interner into this using the new trait? I think not having a receiver in Name::get_from_str might be too restrictive, and force the interner to be stored in a thread-local or a global static.

I would be also fine just vendoring a simple interner, given it is justified in terms of performance.

@nacaclanga
Copy link
Author

Hi, yes I was using a lasy_static RwCell global for the reciver, which isn't a very flexible approach. I now reworked my commit in such a way that the the parse function takes a interner by mutable reference that is then used to to intern the symbols.
I did not do any benchmarks so far.

@vickenty
Copy link
Owner

I did some quick tests with dump (see 28fca4b) and on a randomly generated source file. I got a modest ~4% improvement in instruction counts (664M vs 690M IRs), and a slightly larger ~10% improvement in memory usage (27Mb vs 30Mb). This is on parsing only, there isn't any kind of AST traversal.

Feel free to do your own benchmarks and add results here. I suspect benefits may be larger for the code that actually makes use of the AST.

But as it is now, to be honest, this doesn't seem to me like a big enough improvement to justify all the effort and the added complexity of making AST fully generic, and the extra work necessary to merge this (e.g. printer support).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants