Which Lucene version does the master branch correspond to? #5

Open
wenma opened this issue Dec 25, 2019 · 3 comments


@wenma

wenma commented Dec 25, 2019

I built an index with the master branch, and after reading the segments file I found the version is 6.4.18? Which Lucene version is this compatible with?

>> read(segments_1)

header length: 35
lucene version: 6.4.18
version: 4
nameCounter: 1
segCount: 1
...
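  1. Is the index generated by Rucene fully compatible with native Lucene?
  2. Is there benchmark data comparing Rucene with native Lucene?
  3. Is there test data for building indexes on distributed storage (I saw the slides you shared earlier)?
  4. How does Rucene handle IO-heavy segment merging, especially on distributed storage? Is there any data?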
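
For readers wondering where the fields in the dump above come from, here is a minimal Python sketch of decoding the start of a `segments_N` file. It assumes the Lucene 6.x layout as we understand it: a codec index header (magic int, codec name, format version, 16-byte segment ID, generation suffix), then the writing Lucene version as three VInts, a long version, an int name counter and an int segment count. Lucene 7 and later changed some of these fields, and `Reader` and `parse_segments_header` are hypothetical helper names, so treat this as a reading aid rather than a reference decoder. Under this assumed layout, a header length of 35 is 4 (magic) + 9 (VInt length plus "segments") + 4 (format version) + 16 (ID) + 2 (suffix length plus "1").

```python
import struct

CODEC_MAGIC = 0x3FD76C17  # magic int that opens Lucene codec headers


class Reader:
    """Hypothetical minimal reader mirroring Lucene's big-endian DataInput primitives."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def read_bytes(self, n: int) -> bytes:
        if self.pos + n > len(self.data):
            raise ValueError("unexpected end of input")
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def read_byte(self) -> int:
        return self.read_bytes(1)[0]

    def read_int(self) -> int:
        return struct.unpack(">i", self.read_bytes(4))[0]  # 4-byte big-endian int

    def read_long(self) -> int:
        return struct.unpack(">q", self.read_bytes(8))[0]  # 8-byte big-endian long

    def read_vint(self) -> int:
        # VInt: 7 payload bits per byte, low-order group first; high bit set = more bytes follow
        result, shift = 0, 0
        while True:
            b = self.read_byte()
            result |= (b & 0x7F) << shift
            if not b & 0x80:
                return result
            shift += 7

    def read_string(self) -> str:
        return self.read_bytes(self.read_vint()).decode("utf-8")  # VInt length + UTF-8 bytes


def parse_segments_header(data: bytes) -> dict:
    """Decode the leading fields of a segments_N file (assumed Lucene 6.x layout)."""
    r = Reader(data)
    if r.read_int() != CODEC_MAGIC:
        raise ValueError("not a Lucene codec header")
    codec = r.read_string()        # expected to be "segments"
    format_version = r.read_int()  # SegmentInfos format version
    r.read_bytes(16)               # 16-byte segment ID
    r.read_bytes(r.read_byte())    # suffix: generation in base 36, e.g. "1"
    header_length = r.pos
    # Lucene version of the writer, stored as major/minor/bugfix VInts
    major, minor, bugfix = r.read_vint(), r.read_vint(), r.read_vint()
    version = r.read_long()        # counts how often the index has been changed
    name_counter = r.read_int()    # counter used to name new segments
    seg_count = r.read_int()       # number of segments in this commit
    return {
        "header_length": header_length,
        "codec": codec,
        "format_version": format_version,
        "lucene_version": f"{major}.{minor}.{bugfix}",
        "version": version,
        "name_counter": name_counter,
        "seg_count": seg_count,
    }
```

For example, `parse_segments_header(open("segments_1", "rb").read())` should return the same five fields as the dump, provided the file matches the assumed layout.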
@sunxiaoguang
Contributor

Thanks for your interest. To make this conversation useful for non-Chinese speakers, I'm going to answer your questions in English.

Q: Does Rucene produce indexes compatible with Lucene?
A: Yes, it is fully compatible at this time. For a long time before write support was implemented, we used Rucene to serve online search with indices built by Lucene.

Q: Do you have benchmarks against official Lucene?
A: There is no thorough benchmark yet, but I would say that performance-wise they are similar as long as there is enough memory for the JVM. The biggest advantage of Rucene is deterministic response time: there are no GC-induced pauses and no cluster outages caused by frequent full GCs.

Q: Do you have benchmarks for building indexes on a distributed file system?
A: The simple answer is no; we run it on locally provisioned volumes with K8S. However, we plan to deploy our search engine on Ceph in the future.

Q: Does segment merging pose a significant burden on the system?
A: We run online index updates with it, and yes, the cost of indexing and segment merging is not trivial. I would say it is pretty much the same as with Lucene.

@fulmicoton

fulmicoton commented Dec 27, 2019

> But I would say performance wise they are similar as long as there are memory for JVM.

For intersections, Rucene is a bit slower but very close to Lucene's performance.
For unions, on the other hand, Lucene is much faster: typically more than 10x faster when the block-max WAND optimisation can be used, and more than 2x faster otherwise.

Source:
https://github.com/tantivy-search/search-benchmark-game
https://tantivy-search.github.io/bench/
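
To make the intersection/union distinction concrete, here is a toy Python sketch over sorted doc-ID lists. It is not Lucene's or Rucene's code, and `intersect` and `union` are hypothetical names. A conjunction can let the shorter list drive and skip ahead in the longer one (the role `advance()` plays on Lucene's doc ID iterators), whereas a plain disjunction has to merge every posting. Lucene's block-max WAND narrows that gap for top-k unions by skipping blocks whose maximum possible score cannot enter the current top k; that pruning is not modelled here.

```python
import heapq
from bisect import bisect_left


def intersect(a, b):
    """AND of two sorted doc-ID lists: the shorter list drives, the longer is skipped into."""
    if len(a) > len(b):
        a, b = b, a
    out, visited, j = [], 0, 0
    for doc in a:
        visited += 1                  # position landed on in the driving list
        j = bisect_left(b, doc, j)    # skip ahead, like advance(doc); search comparisons not counted
        if j == len(b):
            break                     # longer list exhausted: no more matches possible
        visited += 1                  # position landed on in the other list
        if b[j] == doc:
            out.append(doc)
    return out, visited


def union(*lists):
    """OR of sorted doc-ID lists: an exhaustive k-way merge touches every posting."""
    out, visited = [], 0
    for doc in heapq.merge(*lists):
        visited += 1
        if not out or out[-1] != doc:  # drop doc IDs shared by several lists
            out.append(doc)
    return out, visited


rare = [3, 500, 999]
common = list(range(0, 1000, 3))
print(intersect(rare, common))   # ([3, 999], 6)
print(union(rare, common)[1])    # 337
```

With a 3-entry list and a 334-entry list, the intersection lands on only 6 positions while the exhaustive union visits all 337 postings, which is the gap that score-based pruning such as block-max WAND tries to close.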

@sunxiaoguang
Contributor

Thanks a lot for your great work, Paul. @tongjianlin, can we take a look at the benchmark and investigate the new optimizations introduced into Lucene that we have been missing?
