Indexing causes SEARCH to become very slow #38

f-prime · 2019-07-01T13:27:57Z

While attempting to index a lot of Documents, I also tried to send a few SEARCH commands to see how things would behave. While the indexing was still going on, SEARCH commands became very very slow.

00-matt · 2019-07-01T13:32:10Z

I think that this is just the price to pay for a single-threaded architecture. I'm not sure which part is slowest, but if it was something like the indexer we could try and move that to a new thread.

f-prime · 2019-07-01T13:34:44Z

The question too is, will there be a need for concurrent connections? If I am creating a search engine, I am going to have a crawler and the search engine running in parallel. So info will be constantly indexed, but I still want my searches to be fast. I like the idea of moving the indexer into a new thread and keeping the connection handling single-threaded.
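A minimal sketch of what "moving the indexer into a new thread" could look like, assuming POSIX threads are available. The connection thread only enqueues documents and the indexer thread drains the queue. All names here (`index_queue`, `queue_push`, `indexer_main`) are hypothetical and not taken from the fist source, and the `indexed` counter is just a stand-in for the real index. Sharing the index itself safely with SEARCH is a separate problem that this does not solve.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAP 64

/* Hypothetical bounded queue between the connection thread (producer)
   and the indexer thread (consumer). */
typedef struct {
    const char *docs[QUEUE_CAP];
    size_t head, tail, count;
    bool closed;
    size_t indexed;            /* stand-in for the real index */
    pthread_mutex_t lock;
    pthread_cond_t not_empty;
    pthread_cond_t not_full;
} index_queue;

void queue_init(index_queue *q) {
    q->head = q->tail = q->count = 0;
    q->closed = false;
    q->indexed = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

/* Called by the connection thread on INDEX. Blocks only if the queue is full. */
void queue_push(index_queue *q, const char *doc) {
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAP)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->docs[q->tail] = doc;
    q->tail = (q->tail + 1) % QUEUE_CAP;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Tell the indexer that no more documents will arrive. */
void queue_close(index_queue *q) {
    pthread_mutex_lock(&q->lock);
    q->closed = true;
    pthread_cond_broadcast(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Indexer thread body: drain the queue until it is closed and empty. */
void *indexer_main(void *arg) {
    index_queue *q = arg;
    for (;;) {
        pthread_mutex_lock(&q->lock);
        while (q->count == 0 && !q->closed)
            pthread_cond_wait(&q->not_empty, &q->lock);
        if (q->count == 0) {   /* closed and fully drained */
            pthread_mutex_unlock(&q->lock);
            return NULL;
        }
        const char *doc = q->docs[q->head];
        q->head = (q->head + 1) % QUEUE_CAP;
        q->count--;
        pthread_cond_signal(&q->not_full);
        pthread_mutex_unlock(&q->lock);

        (void)doc;  /* real code would tokenize and index doc here, outside the lock */

        pthread_mutex_lock(&q->lock);
        q->indexed++;
        pthread_mutex_unlock(&q->lock);
    }
}
```

The server would start `indexer_main` once with `pthread_create`, call `queue_push` from the INDEX handler, and call `queue_close` followed by `pthread_join` on shutdown.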

deepanprabhu · 2019-07-10T23:11:02Z

Nice conversation, folks.
One idea I had was something similar to the double buffering used in graphics systems.
The index used for searching can be read-only. Whenever a new index is ready, we point to it, and all new searches happen on the new index going forward. This keeps the crawler separate from the querying engine.
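A rough sketch of the pointer swap described above, assuming C11 atomics. Searches read whichever snapshot is currently published, and a builder publishes a finished snapshot with a single atomic exchange. The names (`index_snapshot`, `snapshot_publish`, `search`) and the flat term list are hypothetical simplifications, not the project's real index structure. Deciding when the old snapshot can safely be freed while searches may still be reading it is deliberately left out.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical read-only index snapshot: a flat list of terms. */
typedef struct {
    size_t n_terms;
    const char **terms;
} index_snapshot;

/* The one pointer that searches read; swapped when a new snapshot is ready. */
static _Atomic(index_snapshot *) live_index;

/* Build a new snapshot off to the side; nothing can search it yet. */
index_snapshot *snapshot_build(const char **terms, size_t n) {
    index_snapshot *s = malloc(sizeof *s);
    if (!s) return NULL;
    s->terms = malloc(n * sizeof *s->terms);
    if (!s->terms && n > 0) { free(s); return NULL; }
    if (n > 0) memcpy(s->terms, terms, n * sizeof *s->terms);
    s->n_terms = n;
    return s;
}

void snapshot_free(index_snapshot *s) {
    if (s) { free(s->terms); free(s); }
}

/* Publish a finished snapshot. Returns the previous one so the caller can
   free it once no search is still using it. */
index_snapshot *snapshot_publish(index_snapshot *next) {
    return atomic_exchange(&live_index, next);
}

/* Search only ever touches the snapshot that was live when it started. */
bool search(const char *term) {
    index_snapshot *s = atomic_load(&live_index);
    if (!s) return false;
    for (size_t i = 0; i < s->n_terms; i++)
        if (strcmp(s->terms[i], term) == 0) return true;
    return false;
}
```

Documents indexed after a snapshot was built only become searchable at the next `snapshot_publish`, so how often snapshots are rebuilt decides how stale results can be.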

f-prime · 2019-07-10T23:54:06Z

Interesting idea. The problem though is that each read and write is a blocking operation, because of the single-threaded architecture and the fact that we currently process all requests from a single connection until its `read()` is complete. This conversation was continued on the Slack channel, and @00-matt suggested that instead of handling one connection at a time until the `read()` is complete, we handle a single command per connection, in a similar fashion to the Node.js event loop. This will prevent other connections from hanging for longer than they need to.
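To illustrate the "single command per connection" idea, here is a small sketch that leaves out sockets entirely. Each connection is modelled as a buffer of already-received, newline-terminated commands, and one pass of the loop serves at most one command from each. The types and function names (`conn`, `serve_one_round`, `record`) are hypothetical, and the assumption that commands are newline-terminated is only for the example.

```c
#include <stddef.h>
#include <string.h>

#define MAX_LINE 128

/* Hypothetical per-connection state: input already read from the socket
   but not yet processed, as newline-terminated commands. */
typedef struct {
    int id;
    const char *pending;
} conn;

/* Pop one complete command from c into out.
   Returns 1 if a command was taken, 0 if none is complete yet. */
int take_one_command(conn *c, char *out, size_t out_sz) {
    const char *nl = strchr(c->pending, '\n');
    if (!nl) return 0;
    size_t len = (size_t)(nl - c->pending);
    if (len >= out_sz) len = out_sz - 1;   /* truncate oversized commands */
    memcpy(out, c->pending, len);
    out[len] = '\0';
    c->pending = nl + 1;
    return 1;
}

/* One pass of the loop: every connection gets at most one command served,
   so a client sending many INDEX commands cannot starve a SEARCH client.
   Returns the number of commands served in this pass. */
size_t serve_one_round(conn *conns, size_t n,
                       void (*handle)(int conn_id, const char *cmd, void *ctx),
                       void *ctx) {
    size_t served = 0;
    char line[MAX_LINE];
    for (size_t i = 0; i < n; i++) {
        if (take_one_command(&conns[i], line, sizeof line)) {
            handle(conns[i].id, line, ctx);
            served++;
        }
    }
    return served;
}

/* Example handler that only records which connection was served, in order. */
typedef struct { int ids[32]; size_t n; } trace;

void record(int conn_id, const char *cmd, void *ctx) {
    trace *t = ctx;
    (void)cmd;
    if (t->n < 32) t->ids[t->n++] = conn_id;
}
```

With connection 1 holding three queued INDEX commands and connection 2 holding one SEARCH, repeated calls to `serve_one_round` serve them in the order 1, 2, 1, 1, so the SEARCH waits behind one INDEX rather than three.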

Would your idea of a double buffer fix the blocking issue? I also don't know if this would be possible or realistic to do within a single thread.

deepanprabhu · 2019-07-11T03:56:56Z

Hmmm, got it.
Blocking I/O is usually handled using async methods, and yes, they have an event loop inside. What @00-matt said makes sense.

The double buffer technique is used to decouple index creation from querying.
While queries are served from one index, an updated index is built in a different buffer (maybe in a different process, like a service or daemon). When it is complete, the two are swapped and querying starts to happen on the new index.

Another idea, @f-prime:
Did you think about a callback model for giving out the results?
You take a query and a callback in one request, and when the results are ready, you pass them through the callback to the client which requested the search. Clients don't synchronously wait for you to reply, but get fed the results when they are ready.

I find the discussion very interesting!

00-matt · 2019-07-11T08:31:40Z

On Thursday, 11 July 2019 00:11:04 BST Deepan Prabhu Babu wrote:

> One idea i had was, something similar to double buffer of graphics systems. The index for searching can be read only. When a new index is ready each moment, we can point to the new index and all new searches would happen on the new index (going forward). This will keep the crawler separate from the querying engine.

I think that this is definitely worth trying. Although I think it should be made optional, because it could be worse for people with large data sets that don't change very often.

On Thursday, 11 July 2019 00:54:07 BST Frankie Primerano wrote:

> instead of handling one connection at a time until the `read()` is complete we handle a single command per connection in a similar fashion to the NodeJS event loop. This will prevent other connections from hanging for longer periods of time than they need to.

Yeah it would be like using `process.nextTick()` in Node.js so that all users get treated fairly. It wouldn't really make anything faster though.

> Would your idea of a double buffer fix the blocking issue? I also don't know if this would be possible or realistic to do within a single thread.

It should do, but it will require another thread to do the indexing in the background.

On Thursday, 11 July 2019 04:56:58 BST Deepan Prabhu Babu wrote:

> Did you think about, callback model for giving out the results ? So you take **a query and a callback** on one request, and when the results are ready, you pass the result through a callback to the client which requested the search. Client don't synchronously wait for you to reply, but actually get fed the results when they are ready.

I don't think that doing this on the server would provide any value. Clients can already be non-blocking (see the Node.js client library). Having replies happen out of order from the server would mean that we would need to give each request a unique ID to match it with a later response, similar to what JSON-RPC does.
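For illustration, this is roughly the bookkeeping that out-of-order replies would force on a client: a table of in-flight requests keyed by a unique ID, so that a later response can be matched to the query that caused it. It is a sketch under assumptions. The names (`pending_table`, `pending_add`, `pending_complete`) are hypothetical, and no wire format for carrying the ID is implied.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_PENDING 16

/* Hypothetical record of one request that has been sent but not answered. */
typedef struct {
    uint32_t id;
    const char *query;
    int in_use;
} pending_req;

typedef struct {
    pending_req slots[MAX_PENDING];
    uint32_t next_id;
} pending_table;

void pending_init(pending_table *t) {
    memset(t->slots, 0, sizeof t->slots);
    t->next_id = 1;   /* 0 is reserved to mean "no request" */
}

/* Register a request before sending it.
   Returns its unique ID, or 0 if too many requests are in flight. */
uint32_t pending_add(pending_table *t, const char *query) {
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (!t->slots[i].in_use) {
            t->slots[i].in_use = 1;
            t->slots[i].query = query;
            t->slots[i].id = t->next_id++;
            if (t->next_id == 0) t->next_id = 1;   /* skip 0 on wraparound */
            return t->slots[i].id;
        }
    }
    return 0;
}

/* Match a response's ID to its request and release the slot.
   Returns the original query, or NULL if the ID is unknown. */
const char *pending_complete(pending_table *t, uint32_t id) {
    if (id == 0) return NULL;
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (t->slots[i].in_use && t->slots[i].id == id) {
            t->slots[i].in_use = 0;
            return t->slots[i].query;
        }
    }
    return NULL;
}
```

With in-order replies none of this is needed, because the oldest unanswered request is always the one being answered.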

deepanprabhu · 2019-07-12T19:39:43Z

It's been a while since I touched C and C++.
I moved to Java, but I feel like coming back :).
Any bugs that I can start with?

f-prime · 2019-07-15T03:05:19Z

Hey @deepanprabhu glad you are interested! I'd suggest joining the slack channel here https://join.slack.com/t/fist-global/shared_invite/enQtNjcyNzY4MTUwMDg0LTRiYzM5ZWNkOTMwODYzODRjNDQzNThiYjdhNjgzZDUxZGYxODRjOTI4NTcwYmYzYmI5MTViYjFiNGFlNWEwYjY

You can also check the current open tasks here: https://github.com/f-prime/fist/projects/1

Just claim something in Todo and make a PR for the fix and we can discuss the implementation there :)

f-prime added the bug label ("Something isn't working") on Jul 1, 2019.
