Indexing causes SEARCH to become very slow #38

Open
f-prime opened this issue Jul 1, 2019 · 8 comments
Labels
bug Something isn't working

Comments

@f-prime
Owner

f-prime commented Jul 1, 2019

While indexing a large number of Documents, I also sent a few SEARCH commands to see how things would behave. While the indexing was still going on, the SEARCH commands became very slow.

f-prime added the bug label Jul 1, 2019
@00-matt
Contributor

00-matt commented Jul 1, 2019

I think this is just the price to pay for a single-threaded architecture. I'm not sure which part is slowest, but if it turns out to be something like the indexer, we could try moving that to a new thread.

@f-prime
Owner Author

f-prime commented Jul 1, 2019

The question, too, is whether there will be a need for concurrent connections. If I am creating a search engine, I am going to have a crawler and the search engine running in parallel. So information will be constantly indexed, but I still want my searches to be fast. I like the idea of moving the indexer into a new thread and keeping the connection handling single-threaded.
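A minimal sketch of that split, written in Python purely for illustration. `BackgroundIndexer`, `submit`, `search`, and `wait_until_idle` are hypothetical names, not part of the project; the point is only that INDEX work is queued and applied by a worker thread while the connection-handling thread stays free to answer SEARCH.

```python
import queue
import threading


class BackgroundIndexer:
    """Hypothetical sketch: indexing runs on a worker thread, searches do not wait for it."""

    def __init__(self):
        self._index = {}                 # word -> set of document ids
        self._lock = threading.Lock()    # protects self._index
        self._pending = queue.Queue()    # documents waiting to be indexed
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def submit(self, doc_id, text):
        """Called by the connection-handling thread; returns immediately."""
        self._pending.put((doc_id, text))

    def _run(self):
        while True:
            doc_id, text = self._pending.get()
            for word in text.lower().split():
                # Hold the lock per word only, so a search can slip in
                # between two words of a large document.
                with self._lock:
                    self._index.setdefault(word, set()).add(doc_id)
            self._pending.task_done()

    def search(self, word):
        with self._lock:
            return sorted(self._index.get(word.lower(), ()))

    def wait_until_idle(self):
        """Block until every submitted document has been indexed."""
        self._pending.join()
```

In CPython the global interpreter lock limits true parallelism, so this shows the structure rather than a guaranteed speedup; in a language with real threads the same shape applies, with the lock (or a finer-grained scheme) guarding the shared index.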

@deepanprabhu

Nice conversation, folks.
One idea I had was something similar to the double buffer used in graphics systems.
The index used for searching can be read-only. Whenever a new index is ready, we point to it, and all new searches happen on the new index from then on. This keeps the crawler separate from the querying engine.
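The swap described above might look roughly like this. It is a sketch in Python under stated assumptions: `DoubleBufferedIndex` and its methods are hypothetical names, the index is a plain word-to-documents mapping, and the rebuild reprocesses every document rather than merging incrementally.

```python
class DoubleBufferedIndex:
    """Hypothetical sketch: searches read a frozen snapshot; rebuilds happen off to the side."""

    def __init__(self):
        self._live = {}   # snapshot served to searches, never mutated after the swap
        self._docs = {}   # every document received so far: doc_id -> text

    def add_document(self, doc_id, text):
        # New documents are only recorded here; the live snapshot is untouched.
        self._docs[doc_id] = text

    def search(self, word):
        snapshot = self._live            # one reference read, no locking needed
        return sorted(snapshot.get(word.lower(), ()))

    def rebuild_and_swap(self):
        # Build the replacement index in a separate buffer...
        fresh = {}
        for doc_id, text in self._docs.items():
            for word in text.lower().split():
                fresh.setdefault(word, set()).add(doc_id)
        # ...then publish it with a single reference assignment.
        self._live = fresh
```

A search that started before the swap keeps using the old snapshot, and every later search sees the new one. The cost is that newly added documents are invisible until the next `rebuild_and_swap()`.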

@f-prime
Owner Author

f-prime commented Jul 10, 2019

Interesting idea. The problem, though, is that each read and write is a blocking operation, both because of the single-threaded architecture and because we currently process all requests from a single connection until its read() is complete. This conversation continued on the Slack channel, where @00-matt suggested that, instead of handling one connection at a time until the read() is complete, we handle a single command per connection, in a similar fashion to the Node.js event loop. This would prevent other connections from hanging longer than they need to.
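To make the one-command-per-connection idea concrete, here is a small Python simulation. It does no real socket I/O: `serve_round_robin` is a hypothetical helper, and each connection is modelled as a list of commands that have already arrived.

```python
from collections import deque


def serve_round_robin(connections, handle):
    """Handle one command per connection per turn (hypothetical sketch).

    connections: dict mapping a connection name to its pending commands.
    handle:      callable invoked as handle(name, command).
    Returns the order in which commands were handled.
    """
    ready = deque((name, deque(cmds)) for name, cmds in connections.items())
    order = []
    while ready:
        name, pending = ready.popleft()
        if not pending:
            continue                      # nothing to do for this connection
        command = pending.popleft()       # exactly one command per turn
        handle(name, command)
        order.append((name, command))
        if pending:
            ready.append((name, pending)) # go to the back of the line
    return order
```

With a crawler that has three INDEX commands queued and a user with one SEARCH, the SEARCH is handled second instead of last:

```python
order = serve_round_robin(
    {"crawler": ["INDEX a", "INDEX b", "INDEX c"], "user": ["SEARCH x"]},
    lambda name, command: None,
)
# order[1] is ("user", "SEARCH x")
```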

Would your idea of a double buffer fix the blocking issue? I also don't know if this would be possible or realistic to do within a single thread.

@deepanprabhu

Hmm, got it.
Blocking IO is usually handled using async methods, and yes, they have an event loop inside. What @00-matt said makes sense.

The double buffer technique is used to decouple index creation from querying.
While queries are served from one index, an updated index is built in a different buffer (maybe in a different process, such as a service or daemon). When it is complete, the two are swapped and querying starts to happen on the new index.

Another idea, @f-prime:
have you thought about a callback model for returning the results?
You take a query and a callback in one request, and when the results are ready you pass them through the callback to the client that requested the search. Clients don't wait synchronously for your reply; they are fed the results when they are ready.

I find this discussion very interesting!
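An in-process sketch of that callback model, in Python. `CallbackSearchServer`, `submit`, and `process_one` are hypothetical names; a real server would deliver the results over the client's connection rather than by calling a local function.

```python
from collections import deque


class CallbackSearchServer:
    """Hypothetical sketch: queries are queued with a callback and answered later."""

    def __init__(self, index):
        self._index = index       # word -> set of document ids
        self._pending = deque()   # (word, callback) pairs not yet answered

    def submit(self, word, callback):
        # The caller does not wait here; the query is only recorded.
        self._pending.append((word, callback))

    def process_one(self):
        """Answer the oldest pending query. Returns False if there was none."""
        if not self._pending:
            return False
        word, callback = self._pending.popleft()
        callback(word, sorted(self._index.get(word, ())))
        return True
```

The server decides when to call `process_one()`, for example between chunks of indexing work, so a slow index update delays the answer but never blocks the client inside the request itself.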

@00-matt
Contributor

00-matt commented Jul 11, 2019 via email

@deepanprabhu

It's been a while since I touched C and C++.
I moved to Java, but I feel like coming back :).
Are there any bugs I can start with?

@f-prime
Owner Author

f-prime commented Jul 15, 2019

Hey @deepanprabhu, glad you are interested! I'd suggest joining the Slack channel here: https://join.slack.com/t/fist-global/shared_invite/enQtNjcyNzY4MTUwMDg0LTRiYzM5ZWNkOTMwODYzODRjNDQzNThiYjdhNjgzZDUxZGYxODRjOTI4NTcwYmYzYmI5MTViYjFiNGFlNWEwYjY

You can also check the current open tasks here: https://github.com/f-prime/fist/projects/1

Just claim something in Todo and open a PR with the fix, and we can discuss the implementation there :)
