Background
When I benchmarked with a dataset of 10 million entries, Lucene often showed varying performance and precision. In partial loading experiment #2401, Lucene's median query latency was about 45% lower than FAISS's: 30.39 ms for Lucene versus 55.92 ms for FAISS.
With temporary debugging code, I found that Lucene visited far fewer vectors per query than FAISS: approximately 30,000 vectors per query for Lucene, versus 60,000 to 75,000 for FAISS.
Despite the faster search, Lucene's recall@k was only 68% (FAISS reached 89%), which is typically below what users expect from vector search.
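As a rough sanity check on the numbers above, the sketch below recomputes the relative latency difference and a per-visited-vector cost. It assumes the 50% latency is the relevant figure and that query time is dominated by vector visits, neither of which has been verified here; all variable names are illustrative.

```python
# Figures copied from the measurements reported in this issue.
lucene_p50_ms = 30.39
faiss_p50_ms = 55.92
lucene_visited = 30_000
faiss_visited_range = (60_000, 75_000)

# Relative latency reduction of Lucene versus FAISS at the median.
latency_reduction = 1 - lucene_p50_ms / faiss_p50_ms


def us_per_visit(latency_ms, visited):
    """Average microseconds spent per visited vector (assumes visits dominate)."""
    return latency_ms * 1000 / visited


lucene_us = us_per_visit(lucene_p50_ms, lucene_visited)
faiss_us = tuple(us_per_visit(faiss_p50_ms, v) for v in faiss_visited_range)

print(f"latency reduction: {latency_reduction:.1%}")
print(f"Lucene: {lucene_us:.2f} us/visit")
print(f"FAISS:  {faiss_us[1]:.2f} to {faiss_us[0]:.2f} us/visit")
```

If these assumptions hold, most of the latency gap would come from the number of visited vectors rather than from per-vector cost, which is consistent with the lower recall.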
| Engine | 50% Latency (ms) | 90% Latency (ms) | 99% Latency (ms) | 99.9% Latency (ms) | 100% Latency (ms) | recall@k / recall@1 |
| -- | -- | -- | -- | -- | -- | -- |
| Lucene | 30.39 | 33.18 | 35.82 | 38.78 | 81.12 | 0.68 / 0.85 |
| FAISS | 55.92 | 58.93 | 63.87 | 66.56 | 92.87 | 0.89 / 0.97 |
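For reference, here is a minimal sketch of one common way to compute recall@k and recall@1 from returned ids and ground-truth neighbors. The benchmark tool's exact definition may differ, and the function names are hypothetical.

```python
def recall_at_k(retrieved, truth, k):
    """Fraction of the true top-k neighbors found among the top-k retrieved ids."""
    return len(set(retrieved[:k]) & set(truth[:k])) / k


def mean_recall(all_retrieved, all_truth, k):
    """Average recall@k over all queries."""
    scores = [recall_at_k(r, t, k) for r, t in zip(all_retrieved, all_truth)]
    return sum(scores) / len(scores)


# Toy example: one query, three of the four true neighbors were returned.
retrieved = [1, 2, 3, 9]
truth = [1, 2, 3, 4]
print(recall_at_k(retrieved, truth, 4))  # recall@k with k = 4
print(recall_at_k(retrieved, truth, 1))  # recall@1: is the true nearest neighbor ranked first?
```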
This is somewhat confusing, because both indices were built with the same efConstruction parameter and force-merged into exactly 6 final segments after indexing.
Goal
Determine which factors in the indexing process contribute to the differing search results.
Re-evaluate performance as efSearch increases. For example, observe how performance evolves in the case above, where higher efSearch values are expected to improve precision but increase latency as the HNSW searcher visits more vectors to refine candidates.
Action Items
Compare the indexing logic of both engines and document the differences in the comments.
Convert the FAISS index into a Lucene index, then rerun the performance benchmark to observe latency changes. This should be relatively straightforward because the logical layout of the HNSW index is identical between the two, even though Lucene uses variable-length integer encoding, among other format differences.
Gradually increase efSearch to analyze how performance changes with the Lucene engine.
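The efSearch sweep in the last action item could be driven by a small harness like the one below. This is only a sketch: `search_fn(query, k, ef_search)` is a hypothetical callable standing in for the real engine query, and `fake_search` is a placeholder that exists only to make the example runnable.

```python
import statistics
import time


def sweep_ef_search(search_fn, queries, ground_truth, k, ef_values):
    """Record median latency and mean recall@k for each efSearch value."""
    rows = []
    for ef in ef_values:
        latencies_ms, recalls = [], []
        for query, truth in zip(queries, ground_truth):
            start = time.perf_counter()
            ids = search_fn(query, k, ef)
            latencies_ms.append((time.perf_counter() - start) * 1000)
            recalls.append(len(set(ids[:k]) & set(truth[:k])) / k)
        rows.append({
            "ef_search": ef,
            "p50_ms": statistics.median(latencies_ms),
            "recall": statistics.fmean(recalls),
        })
    return rows


def fake_search(query, k, ef_search):
    """Placeholder engine: returns more correct ids as ef_search grows."""
    hits = min(k, ef_search // 50)
    return list(range(hits)) + [-1 - i for i in range(k - hits)]


queries = [None] * 3                # placeholder queries
ground_truth = [list(range(10))] * 3
rows = sweep_ef_search(fake_search, queries, ground_truth, k=10, ef_values=[100, 200, 500])
for row in rows:
    print(row)
```

Replacing `fake_search` with a call into the Lucene engine, and repeating the run for FAISS, would give recall and latency curves that can be compared at matching recall levels rather than at matching efSearch values.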