Questions about algo used in relevant code snippet search #2950

alannesta · 2024-01-15T22:06:46Z

Hello sweepai team, thanks for all this inspiring work!

In the repository, I see two ways of getting relevant code snippets during the search phase.

prep_snippets in ticket_utils.py

Line 23 in 3408bf0

def prep_snippets(

According to the scoring algorithm described in https://docs.sweep.dev/blogs/building-code-search, this is the one currently used for code search. However, this algorithm uses only the inverted index; the code embeddings in the vector db are not searched at all.
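For illustration, a lexical-only lookup of the kind described above can be sketched as follows. This is not sweep's implementation: `tokenize`, `build_inverted_index`, and `lexical_search` are hypothetical names, and the score is a plain count of matching query tokens rather than whatever weighting `prep_snippets` actually applies.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase and split into identifier-like tokens (assumed tokenizer).
    return re.findall(r"[a-z0-9_]+", text.lower())


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    # Map each token to the set of snippet ids that contain it.
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def lexical_search(index: dict[str, set[str]], query: str) -> list[tuple[str, int]]:
    # Score each snippet by how many distinct query tokens it contains.
    scores: dict[str, int] = defaultdict(int)
    for token in set(tokenize(query)):
        for doc_id in index.get(token, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by snippet id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


docs = {
    "a.py": "def prep_snippets(query): pass",
    "b.py": "def get_relevant_snippets(query, index): pass",
}
index = build_inverted_index(docs)
results = lexical_search(index, "prep_snippets query")
```

Note that no embeddings are consulted anywhere in this path: a snippet that is semantically related but shares no tokens with the query is never returned.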

get_relevant_snippets in vector_db.py

sweep/sweepai/core/vector_db.py

Line 374 in 3408bf0

def get_relevant_snippets(

This method does not seem to be used at the moment, but it combines the vector db embeddings with lexical search and then uses a different scoring algorithm to rerank the snippets.

This method seems to make more sense to me given all the previous effort to index code embeddings in the vector db.
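To make the idea concrete, here is a minimal sketch of combining lexical and embedding scores before reranking. It is an assumption for illustration only, not the scoring used in `get_relevant_snippets`: the function names, the min-max normalization, and the `alpha` weight are all hypothetical, and the vector scores are assumed to be precomputed similarities.

```python
def normalize(scores: dict[str, float]) -> dict[str, float]:
    # Min-max scale scores to [0, 1] so the two signals are comparable.
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 1.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def hybrid_rank(
    lexical_scores: dict[str, float],
    vector_scores: dict[str, float],
    alpha: float = 0.5,
) -> list[tuple[str, float]]:
    # alpha weights the lexical signal; (1 - alpha) weights the embedding signal.
    lex = normalize(lexical_scores)
    vec = normalize(vector_scores)
    combined = {
        snippet_id: alpha * lex.get(snippet_id, 0.0)
        + (1 - alpha) * vec.get(snippet_id, 0.0)
        for snippet_id in set(lex) | set(vec)
    }
    # Highest combined score first; ties broken by snippet id.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))


lexical = {"a.py": 3.0, "b.py": 1.0}
vector = {"b.py": 0.9, "c.py": 0.1}
ranked = hybrid_rank(lexical, vector, alpha=0.5)
```

With `alpha=1.0` this degenerates to lexical-only ranking and with `alpha=0.0` to embedding-only ranking, which is one simple way to compare the two approaches on the same queries.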

Is there any reason why the first method is preferable to the second? Does lexical search alone provide better results than a combination of lexical and embedding search? It would be interesting to know how the code search results are evaluated. Thanks.

wwzeng1 (Maintainer) · 2024-03-13T22:21:14Z

We recently added embedding search back. The right combination is really important for us, and the latest version should handle that.
