Motivation
The first time we spin up a new inference job on the cluster, it takes over a minute because Ray has to hit HuggingFace, load the model into cache, instantiate the artifact, and then start inference. Model instantiation happens here:
- `lumigator/lumigator/python/mzai/jobs/inference/inference.py`, line 105 (commit `09799ee`)
- `lumigator/lumigator/python/mzai/jobs/evaluator/evaluator/jobs/model_clients.py`, line 78 (commit `09799ee`)
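For context, the instantiation step boils down to something like the sketch below (simplified, not the exact Lumigator code; the model name is only an example). On a cold cache, `from_pretrained()` has to download the weights from the HuggingFace Hub first, which is what dominates the startup time:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Example model name only; the user may pick any HuggingFace model.
model_name = "facebook/bart-large-cnn"

# On the first call these hit the Hub and populate the local cache
# (~/.cache/huggingface by default); subsequent calls load from disk.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
```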
At the cluster level, we can mitigate this by pre-populating the model cache:
https://huggingface.co/docs/datasets/en/cache
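As a rough sketch of what pre-populating the cache could look like (the shared cache path and the model list are assumptions, not something Lumigator does today), we could download the weights into a cache directory that every Ray worker points at:

```python
from huggingface_hub import snapshot_download

# Hypothetical shared cache location mounted on every Ray node.
SHARED_CACHE = "/mnt/shared/hf_cache"

# Example models to pre-fetch; this downloads the weight/tokenizer files
# into the cache without loading anything into memory.
for repo_id in ["facebook/bart-large-cnn"]:
    snapshot_download(repo_id=repo_id, cache_dir=SHARED_CACHE)
```

The workers would then need `HF_HOME` (or `TRANSFORMERS_CACHE`) pointed at the same directory, which could be done through the Ray job's `runtime_env`, e.g. `runtime_env={"env_vars": {"HF_HOME": "/mnt/shared/hf_cache"}}`.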
Given that we don't know in advance which model the user will select or their machine specs, we can try a couple of strategies here, depending on our product outcomes:

- Pre-populate the cache with a small model and point the user to try out that model locally.
- Load a small model as soon as the user instantiates and loads the UI, and give an error message.

A minimal sketch of the second option follows the list.
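This assumes a background prefetch at UI startup; the default model name and the threading approach are illustrative, not the current implementation:

```python
import threading

from huggingface_hub import snapshot_download

# Hypothetical small default model to warm up; swap for whatever we ship.
DEFAULT_SMALL_MODEL = "sshleifer/distilbart-cnn-12-6"

def prefetch_default_model() -> None:
    """Pull the default model into the local HuggingFace cache so the user's
    first inference run against it skips the download step."""
    snapshot_download(repo_id=DEFAULT_SMALL_MODEL)

# Kick off the prefetch when the UI comes up, without blocking startup.
threading.Thread(target=prefetch_default_model, daemon=True).start()
```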
Alternatives
No response
Contribution
Happy to work on this
Have you searched for similar issues before submitting this one?
Yes, I have searched for similar issues