Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid stack overflows by computing tree depth and giving up before attempting OCaml translation #73

Open
mjambon opened this issue Jan 19, 2024 · 0 comments
Labels
bug Something isn't working enhancement New feature or request priority:medium to do, not blocking users

Comments

@mjambon
Copy link
Member

mjambon commented Jan 19, 2024

Typically, long sequences of items such as long array literals or long sequences of statements are found only in generated files. Such files may not be interesting to analyze (e.g. in the case of semgrep) so it would be fine to give up on them. The problem is that at least the translation to the OCaml tree uses stack space that's proportional to the length of the tree (including lists) and it results in segfaults on some platforms while on other platforms it raises a Stack_overflow exception. To avoid such a crash, a solution may be to calculate the depth of the tree returned by the tree-sitter parser and return an error if it exceeds some limit. This assumes the tree-sitter parser itself doesn't crash due to insufficient stack space.

Here's an example of a large generated C++ file whose parsing results in stack overflows: https://github.com/juce-framework/JUCE/blob/d054f0d14dcac387aebda44ce5d792b5e7a625b3/extras/Projucer/JuceLibraryCode/BinaryData.cpp

Tasks:

  1. Check whether we can parse the input file above with just tree-sitter-cpp (e.g. with tree-sitter parse).
  2. If so, add a pass to calculate/estimate the depth of the tree returned by tree-sitter before its translation to OCaml.
  3. Add an option to fail if the tree depth exceeds a limit.

Ideas to avoid complete failure:

  • Truncate excessively deep trees/lists without aborting when possible (e.g. tree-sitter's repeat() and repeat1() constructs).
  • Increase the system's stack size limit or ask the user to do so in a last-gasp error message.
@mjambon mjambon added bug Something isn't working enhancement New feature or request priority:medium to do, not blocking users labels Jan 19, 2024
@mjambon mjambon changed the title Avoid stack overflows by computing tree depth and giving up before OCaml translation Avoid stack overflows by computing tree depth and giving up before attempting OCaml translation Jan 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request priority:medium to do, not blocking users
Development

No branches or pull requests

1 participant