Replies: 1 comment
-
We recently added embedding search back. Getting the right combination of lexical and embedding search is really important for us, and the latest version should handle that.
-
Hello sweepai team, thanks for all this inspiring work!
In the repository, I see two ways of getting relevant code snippets during the search phase.
prep_snippets in ticket_utils.py (sweep/sweepai/utils/ticket_utils.py, line 23 in 3408bf0)
According to the scoring algorithms described in https://docs.sweep.dev/blogs/building-code-search, this is the one currently used for code search. However, this search algorithm uses only the inverted index; the code embeddings in the vector DB are not searched at all.
get_relevant_snippets in vector_db.py (sweep/sweepai/core/vector_db.py, line 374 in 3408bf0)
This method does not seem to be used at the moment, but it combines the vector DB embeddings with lexical search and then uses a different scoring algorithm to rerank the snippets.
It makes more sense to me, given all the earlier effort that went into indexing code embeddings in the vector DB.
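To make concrete what I mean by combining the two signals, here is a minimal sketch. It is not the code in vector_db.py: the function names, the min-max normalization, and the `alpha` weight are all assumptions chosen for illustration, and the input scores are expected to come from whatever lexical and embedding searches are already in place.

```python
def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Min-max scale scores to [0, 1] so the two signals are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: this signal carries no ranking information.
        return {name: 0.0 for name in scores}
    return {name: (value - lo) / (hi - lo) for name, value in scores.items()}


def hybrid_rank(
    lexical: dict[str, float],
    vector: dict[str, float],
    alpha: float = 0.5,  # hypothetical weight: 1.0 = lexical only, 0.0 = embeddings only
) -> list[str]:
    """Rank snippets by a weighted sum of normalized lexical and vector scores."""
    lex_n, vec_n = normalize(lexical), normalize(vector)
    names = set(lex_n) | set(vec_n)
    combined = {
        name: alpha * lex_n.get(name, 0.0) + (1 - alpha) * vec_n.get(name, 0.0)
        for name in names
    }
    # Sort by descending score; break ties by name for a stable order.
    return sorted(names, key=lambda name: (-combined[name], name))


# Made-up scores for three snippets, purely to show the effect of alpha.
lexical = {"a.py": 3.0, "b.py": 1.0, "c.py": 0.0}
vector = {"a.py": 0.2, "b.py": 0.9, "c.py": 0.8}
ranking = hybrid_rank(lexical, vector)
```

With `alpha=1.0` this reduces to the lexical-only ranking, so comparing rankings across a few `alpha` values on a labeled set of queries would be one way to check whether the embeddings actually help.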
Is there a reason why the first method is preferable to the second one? Does lexical search alone give better results than a combination of lexical and embedding search? It would also be interesting to know how the code search results are evaluated. Thanks.