Global, incremental, syntax based, fuzzy index of all types/function in the project #73

matklad · 2024-12-02T11:01:21Z

This shouldn't be too hard, and is quite cute! That's what you need to do:

A function that takes a file, parses it into tree sitter tree, and extracts set of symbols and meta information about the symbol (so, string, line/column, flag for pub, enum(u8) for kind (function, constant, struct, etc))
Some infra to discover all .zig files in project and run the above processing for all of them, in parallel
A function that, for each file, converts the set of strings to an FSM (an optionally compressed trie)
A lookup function that takes two state machines and builds their "intersection state machine", an FSM containing strings which belong to both input state machines. This is how you do fuzzy search --- the trie from the previous step holds all the strings you have, a fuzzy query can be thought of as a subsequence state machine, and the intersection of the two is the query result.
Secret Sauce: an incremental update procedure which, when a single file changes, doesn't re-build the entire index, and only re-builds the state machine for this specific file.

In general, because this stuff is trivial to make both parallel & incremental, it can feel very, very fast. It kills me that VS Code has an artificial debouncing delay between when you type the first letter of the query, and when the results are returned. It would be stupid to not get first dozen results in under 8 milliseconds!
File watcher to kick the above when appropriate
Some glue code to take user's input and thread it through all FSMs, appropriately waiting for FSMs to get built!

The original implementation in rust-analyzer, which took advantage of the existing fsm crate, was about 200 lines of code!

The text was updated successfully, but these errors were encountered:

neurocyte added the enhancement New feature or request label Dec 2, 2024

neurocyte added this to the v1.0.0 milestone Dec 21, 2024

Provide feedback