Update the ocaml-tree-sitter runtime library (#510)
* Use new OCaml code generator and runtime with support for extras

* Add lists of languages that need this or that version of tree-sitter.

* Fix gitignore

* Update core submodule

* Add missing languages to lists of languages

* Add missing language

* Restore version of tree-sitter-julia supported by semgrep

* Update core, requires regenerating all languages

* Add lists of languages for the 'release' command

* Add readme

* Undo PR #488
because the corresponding changes weren't made in semgrep

* Revert 2 recent semgrep-go commits by Brandon and Yosef because
the accompanying changes in semgrep are not ready.
We're reverting the contents of semgrep-go to
commit 9b59bf4

* Update core

* Promote html parser from tree-sitter 0.20.6 to 0.22.6

* Use a patched version of tree-sitter-vue that avoids HTML parsing errors
in programs where both semgrep-vue and semgrep-html are used.

* Add safeguard against running into the same tree-sitter-vue issue again

* Update core
mjambon authored Sep 18, 2024
1 parent d14c9ac commit e4bd859
Showing 18 changed files with 145 additions and 336 deletions.
4 changes: 4 additions & 0 deletions .circleci/config.yml
@@ -139,6 +139,7 @@ workflows:
- bash
- c
- cairo
- circom
- clojure
- cpp
- c-sharp
@@ -158,11 +159,14 @@ workflows:
- kotlin
- lua
- make
- move-on-aptos
- move-on-sui
- ocaml
- php
- promql
- proto
- python
- ql
- r
- ruby
- rust
2 changes: 1 addition & 1 deletion .gitmodules
@@ -48,7 +48,7 @@
url = https://github.com/returntocorp/ocaml-tree-sitter-core.git
[submodule "lang/semgrep-grammars/src/tree-sitter-vue"]
path = lang/semgrep-grammars/src/tree-sitter-vue
-url = https://github.com/ikatyang/tree-sitter-vue.git
+url = https://github.com/semgrep/tree-sitter-vue.git
[submodule "lang/semgrep-grammars/src/tree-sitter-html"]
path = lang/semgrep-grammars/src/tree-sitter-html
url = https://github.com/tree-sitter/tree-sitter-html.git
2 changes: 1 addition & 1 deletion core
Submodule core updated from c085bc to e063ec
5 changes: 4 additions & 1 deletion lang/Makefile
@@ -37,11 +37,14 @@ SUPPORTED_TS_LANGUAGES = \
kotlin \
lua \
make \
move-on-aptos \
move-on-sui \
ocaml \
php \
promql \
proto \
python \
ql \
r \
ruby \
rust \
@@ -86,8 +89,8 @@ SUPPORTED_DIALECTS = \
kotlin \
lua \
make \
-move-on-sui \
move-on-aptos \
+move-on-sui \
ocaml \
php \
promql \
21 changes: 21 additions & 0 deletions lang/language-variants-0.20.6
@@ -0,0 +1,21 @@
apex
bash
c-sharp
dart
elixir
fsharp
hack
hcl
java
javascript
lua
php
python
r
ruby
rust
sml
solidity
tsx
typescript
vue
21 changes: 21 additions & 0 deletions lang/language-variants-0.22.6
@@ -0,0 +1,21 @@
c
cairo
circom
clojure
cpp
dockerfile
go
haskell
html
jsonnet
julia
kotlin
make
move-on-aptos
move-on-sui
ocaml
promql
proto
ql
sqlite
swift
20 changes: 20 additions & 0 deletions lang/languages-0.20.6
@@ -0,0 +1,20 @@
bash
c-sharp
dart
elixir
fsharp
hack
hcl
java
javascript
lua
php
python
r
ruby
rust
sfapex
sml
solidity
typescript
vue
14 changes: 14 additions & 0 deletions lang/languages-0.20.6.readme
@@ -0,0 +1,14 @@
The files languages-0.20.6, languages-0.22.6, language-variants-0.20.6,
and language-variants-0.22.6 contain lists of languages that are useful
when regenerating the code for all the languages when necessary.

We're in a situation where some languages are stuck with tree-sitter 0.20.6.
The language names in languages-* are suitable for the `test-lang` script.
The dialect names in language-variants-* are suitable for the `release`
script.

Sample Bash commands iterating over languages:

$ for x in $(cat languages-0.22.6); do ./test-lang $x || break; done

$ for x in $(cat language-variants-0.22.6); do ./release $x || break; done
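
The sample loops in the readme rely on word splitting of the `cat` output. A minimal sketch of a more defensive variant follows, assuming the list files hold one name per line and that the commands are run from the lang/ directory. The function name `for_each_language` is hypothetical and not part of the repository.

```shell
# Hypothetical helper (not part of the repository): run a command once per
# name listed in a file, stopping at the first failure.
for_each_language() {
  list_file=$1
  shift
  # Read the list on file descriptor 3 so that the invoked command can
  # still use standard input without consuming the list.
  while IFS= read -r name <&3 || [ -n "$name" ]; do
    # Skip blank lines.
    [ -n "$name" ] || continue
    "$@" "$name" || return 1
  done 3< "$list_file"
}

# Usage, assuming the current directory is lang/:
#   for_each_language languages-0.22.6 ./test-lang
#   for_each_language language-variants-0.22.6 ./release
```

Like the `|| break` in the original commands, the helper stops at the first language whose command fails, and it returns a nonzero status in that case.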
21 changes: 21 additions & 0 deletions lang/languages-0.22.6
@@ -0,0 +1,21 @@
c
cairo
circom
clojure
cpp
dockerfile
go
haskell
html
jsonnet
julia
kotlin
make
move-on-aptos
move-on-sui
ocaml
promql
proto
ql
sqlite
swift
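
The languages-0.20.6 and languages-0.22.6 lists appear to partition the languages by tree-sitter version. A minimal sketch of a check that no name is listed under both versions follows, assuming one name per line and no repeated names within a single file. `languages_in_both` is a hypothetical helper, not part of the repository.

```shell
# Hypothetical helper (not part of the repository): print every name that
# occurs more than once across the two list files. If neither file repeats
# a name internally, these are exactly the names present in both lists.
languages_in_both() {
  sort "$1" "$2" | uniq -d
}

# Usage, assuming the current directory is lang/:
#   languages_in_both languages-0.20.6 languages-0.22.6
```

Empty output means no language is assigned to both tree-sitter versions.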
2 changes: 1 addition & 1 deletion lang/semgrep-grammars/src/.gitignore
@@ -2,4 +2,4 @@
/semgrep-*/**/index.js
/semgrep-*/**/src
/semgrep-*/**/inherited
-/semgrep-*/test.log
+/semgrep-*/**/test.log
53 changes: 10 additions & 43 deletions lang/semgrep-grammars/src/semgrep-go/grammar.js
@@ -17,48 +17,15 @@ module.exports = grammar(base_grammar, {
if they're not already part of the base grammar.
*/
rules: {
semgrep_ellipsis: $ => "...",

semgrep_ellipsis_metavar : $ => /\$\.\.\.[a-zA-Z_][a-zA-Z_0-9]*/,
semgrep_deep_ellipsis: $ => seq("<...", $._expression, "...>"),

// The parser tries to wrap ellipsis with expression statements since we
// list ellipsis as expressions and usually we use them in a statement
// position (i.e `if(true) {...}`)
_statement: ($, previous) => choice(
previous,
prec(1,$.semgrep_ellipsis_metavar),
prec(1,$.semgrep_deep_ellipsis),
prec(1,$.semgrep_ellipsis)
),

_expression: ($, previous) => choice(
previous,
$.semgrep_ellipsis_metavar,
$.semgrep_deep_ellipsis,
$.semgrep_ellipsis,
$.typed_metavar
),

typed_metavar: $ => seq(
"(", $.identifier, ":", $._type, ")"
),

identifier: ($, previous) => token(choice(
previous,
// inline this here so we can stay inside of the `token`, because
// `identifier` is the word token
/\$[A-Z_][A-Z_0-9]*/
)),

parameter_declaration: ($, previous) => choice(
$.semgrep_ellipsis,
$.semgrep_ellipsis_metavar,
previous
),

// slightly more precedence so we bump this up over using `...`
// for a semgrep ellipsis
implicit_length_array_type: ($, previous) => prec(1, previous)
/*
semgrep_ellipsis: $ => '...',
_expression: ($, previous) => {
return choice(
$.semgrep_ellipsis,
...previous.members
);
}
*/
}
});