Continuously stream input/output data #65

This is more of a feature request. Is it possible to intermittently stream data via a pipe to Centrifuge, so that indexes do not need to be reloaded and it can therefore be set up as a server?

Comments
Also very interested in this feature. I was under the impression that Centrifuge was able to continuously stream data, and today realized that it in fact waits for all data to be loaded before beginning to process and output results.
Any updates on this? I would need to run Centrifuge (or Kraken) on some 10k bacterial genomes as part of QC, and cannot find any options that would help to minimize the I/O (since I am using the NT database, loading it takes quite a while). I would be thankful for any suggestions.
The access pattern over the Centrifuge index is essentially random, so the whole index needs to be loaded into memory before processing the data. If you mean you want to process multiple data sets while loading the index only once, there are two ways.
Is this what you mean?
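For illustration only, here is a minimal sketch of two approaches along these lines, assuming Centrifuge's Bowtie 2-style command-line interface; the index prefix, read file names, and output names below are placeholders, not taken from this thread:

```bash
# Approach 1: give a single Centrifuge run a comma-separated list of read
# files, so the index is loaded into memory only once for all of them.
centrifuge -x nt_index -U genome1.fq,genome2.fq,genome3.fq -S combined_report.txt

# Approach 2: run with memory-mapped I/O (--mm) so that several Centrifuge
# processes on the same machine can share one in-memory copy of the index
# instead of each loading it from disk separately.
centrifuge --mm -x nt_index -U genome1.fq -S genome1_report.txt
centrifuge --mm -x nt_index -U genome2.fq -S genome2_report.txt
```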
Ah, the second option is exactly what I needed. Thank you very much for pointing this out.
We are using a different solution for this issue: a ramdisk.
With the ramdisk, one only has to copy the database into memory once at every system start. When the data is accessed afterwards, the operating system automatically serves it from the copy already held in memory.
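A rough sketch of that kind of setup follows; the mount point, tmpfs size, index location, and file names are assumptions for illustration, and mounting a tmpfs requires root privileges:

```bash
# Create a RAM-backed filesystem large enough to hold the index (needs root;
# mount point and size are placeholders).
sudo mkdir -p /mnt/ramdisk
sudo mount -t tmpfs -o size=200G tmpfs /mnt/ramdisk

# Copy the Centrifuge index files into the ramdisk once after boot.
cp /data/centrifuge/nt.*.cf /mnt/ramdisk/

# Subsequent runs read the index straight from RAM, so repeated runs no
# longer pay the disk I/O cost of loading the NT index.
centrifuge -x /mnt/ramdisk/nt -U sample.fq -S sample_report.txt
```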
Thank you for the suggestion. Unfortunately I am stuck with the cluster environment we use for all bioinformatic processing, and I don't think I have the permissions to set up things like that. I could use a standalone Unix machine, but processing a few thousand genomes would take forever. PS: the link is broken; it references the wrong URL.
Sorry for the broken URL. Fixed it.