The current Voyager HNSW index works great but does not scale, which limits usage and testing to small databases.
To allow evaluation on all the BEIR datasets and other larger databases, we need to implement PLAID.
I started building it on this branch.
For now, it will mostly be a wrapper around the stanford-nlp implementation, adapted to take precomputed embeddings as input instead of computing them internally. That makes it model-agnostic, so it works with PyLate (and with ColPali, for that matter).
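To make the "embeddings in, no model inside" idea concrete, here is a minimal sketch of what such an interface could look like. Everything in it is an assumption for illustration: the class name `PrecomputedEmbeddingIndex`, its `add_documents` and `search` methods, and the brute-force MaxSim scoring are hypothetical stand-ins, not the stanford-nlp code and not PLAID itself (which adds clustering, compression and candidate pruning on top of this scoring).

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
TokenEmbeddings = Sequence[Vector]  # one vector per token


def maxsim(query: TokenEmbeddings, document: TokenEmbeddings) -> float:
    """Late-interaction score: for each query token, take its best dot
    product against any document token, then sum over query tokens."""
    return sum(
        max(sum(q * d for q, d in zip(q_vec, d_vec)) for d_vec in document)
        for q_vec in query
    )


class PrecomputedEmbeddingIndex:
    """Hypothetical index that only ever sees embeddings, never a model."""

    def __init__(self) -> None:
        self._documents: Dict[str, TokenEmbeddings] = {}

    def add_documents(
        self, ids: Sequence[str], embeddings: Sequence[TokenEmbeddings]
    ) -> None:
        if len(ids) != len(embeddings):
            raise ValueError("ids and embeddings must have the same length")
        for doc_id, doc_embeddings in zip(ids, embeddings):
            if len(doc_embeddings) == 0:
                raise ValueError(f"document {doc_id!r} has no token embeddings")
            self._documents[doc_id] = doc_embeddings

    def search(
        self, query_embeddings: TokenEmbeddings, k: int = 10
    ) -> List[Tuple[str, float]]:
        # Brute force over every document: a placeholder for the real
        # PLAID search, used here only to show the calling convention.
        scored = [
            (doc_id, maxsim(query_embeddings, doc_embeddings))
            for doc_id, doc_embeddings in self._documents.items()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


# Toy usage with hand-written 2-d "token embeddings".
index = PrecomputedEmbeddingIndex()
index.add_documents(
    ids=["a", "b"],
    embeddings=[[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5]]],
)
results = index.search([[1.0, 0.0]], k=1)
```

Because the index never calls an encoder, anything that produces per-token vectors can feed it, which is the property that would let PyLate and ColPali share the same backend.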
The first step was to make it work with all of the stanford-nlp code embedded, which is now done.
Now I have to make sure every parameter is correctly wired through and clean up the integration in general.
After that, we'll clean up unused code from stanford-nlp and it should be good to go.
The whole pipeline will be fairly black-box compared to the rest of the codebase, but recoding PLAID from the ground up would take much more time (and will actually be easier starting from this base), so I think it is a good target for a v1 that lets people use PLAID.