Question regarding the numThreads parameter #303
Hi @mhamaneh,

The `numThreads` parameter and `batchSize` together determine the maximum memory consumption. The preset configuration should be fine in almost any scenario on a personal computer. If you run Oktoberfest on large server machines, you can raise the batch size or the number of threads, but don't use more than 10 threads at any time, which we found to be problematic on the servers; use a larger batch size instead.

### Explanation

Imagine you had only one thread and an infinite batch size. In this scenario, all the predictions would be retrieved, stored in memory, and then written to the file. This would likely require far more memory than you have available on your machine, since spectral libraries can easily exceed 100 GB in size depending on what you do. It would also waste time: while you retrieve predictions, you could already start writing them to disk. This is why Oktoberfest does both in parallel: one thread is always waiting for predictions to write, while the remaining threads retrieve predictions. But you could still predict faster than you write, in which case your memory would blow up over time. At the same time, you want to finish as fast as possible. The strategy Oktoberfest uses is a fixed number of memory slots for batches. A figure in the original issue illustrates how this works.

### How to optimize the two parameters

What you want is to keep the writer process busy, i.e. there should always be data available in memory to write to disk, and you should have enough prediction threads to fill up the available memory slots. If the slots are full, the prediction threads wait until a slot is freed up by the writer process. This way you generate a spectral library as fast as possible, restricted only by how fast your machine can write data to disk.
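The fixed-memory-slot scheme described above is essentially a producer/consumer pattern with a bounded queue. Here is a minimal Python sketch of that idea; the names and numbers (`NUM_THREADS`, `BATCH_SIZE`, the fake "predictions") are illustrative and do not reflect Oktoberfest's actual implementation:

```python
import queue
import threading

NUM_THREADS = 4    # prediction (producer) threads
BATCH_SIZE = 1000  # spectra per batch
NUM_BATCHES = 12   # total batches to process, for the demo

# Fixed number of memory slots: a bounded queue. Producers block
# when it is full, so memory use is capped instead of growing.
slots = queue.Queue(maxsize=NUM_THREADS)

def predict(batch_ids):
    """Producer: 'retrieve predictions' for each batch and park them in a slot."""
    for i in batch_ids:
        predictions = [f"spectrum-{i}-{j}" for j in range(BATCH_SIZE)]
        slots.put(predictions)  # blocks while all slots are occupied

written = []

def write():
    """Consumer: the single writer thread, draining slots to 'disk'."""
    for _ in range(NUM_BATCHES):
        batch = slots.get()  # blocks until a batch is available
        written.append(len(batch))
        slots.task_done()

writer = threading.Thread(target=write)
writer.start()
producers = [
    threading.Thread(target=predict, args=(range(t, NUM_BATCHES, NUM_THREADS),))
    for t in range(NUM_THREADS)
]
for p in producers:
    p.start()
for p in producers:
    p.join()
writer.join()

total = sum(written)
print(total)  # total spectra "written"
```

Because the queue's `maxsize` caps how many batches can sit in memory at once, raising the batch size (rather than the thread count) is how you trade memory for throughput without oversubscribing the server.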
### Local predictions

While Oktoberfest relies on Koina to retrieve its predictions, supporting multiple state-of-the-art peptide property prediction models, there are efforts to support local predictions as well. This works by designing a local model through the DLomix package, but the feature is not yet 100% ready. Once it is, people can either use Prosit-based template models or design their own models using DLomix: https://dlomix.readthedocs.io/en/main/

As an alternative, you can also set up your own local Koina server. We provide a docker container that should be relatively easy to set up. You can then simply use your local server's address in the config file.
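For illustration, pointing the config at a local server might look like the fragment below. The key name `prediction_server`, the port, and the `ssl` flag are assumptions based on typical Koina/Triton setups, not taken from this issue; check the Oktoberfest documentation for the exact schema:

```json
{
  "prediction_server": "localhost:8500",
  "ssl": false
}
```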
Hi,

I have a question regarding the `numThreads` parameter: the documentation states that this parameter "needs to be balanced with batchsize". What does this mean? Also, is there a way to run Oktoberfest/Prosit on a local machine without using the server?

Thanks