Motivation
The first time we spin up a new inference job on the cluster, it takes over a minute because Ray has to hit HuggingFace, load the model into cache, instantiate the artifact, and then start inference. Model instantiation happens here:
- `lumigator/lumigator/python/mzai/jobs/inference/inference.py`, line 105 (commit `09799ee`)
- `lumigator/lumigator/python/mzai/jobs/evaluator/evaluator/jobs/model_clients.py`, line 78 (commit `09799ee`)
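For context, the instantiation step boils down to something like the sketch below (simplified, not the exact Lumigator code; the model name is only an example). On a cold cache, `from_pretrained()` has to download the weights from the HuggingFace Hub first, which is what dominates the startup time:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Example model name only; the user may pick any HuggingFace model.
model_name = "facebook/bart-large-cnn"

# On the first call these hit the Hub and populate the local cache
# (~/.cache/huggingface by default); subsequent calls load from disk.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
```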
At the cluster level, we can mitigate this by pre-populating the model cache:
https://huggingface.co/docs/datasets/en/cache
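As a rough sketch of what pre-populating the cache could look like (the shared cache path and the model list are assumptions, not something Lumigator does today), we could download the weights into a cache directory that every Ray worker points at:

```python
from huggingface_hub import snapshot_download

# Hypothetical shared cache location mounted on every Ray node.
SHARED_CACHE = "/mnt/shared/hf_cache"

# Example models to pre-fetch; this downloads the weight/tokenizer files
# into the cache without loading anything into memory.
for repo_id in ["facebook/bart-large-cnn"]:
    snapshot_download(repo_id=repo_id, cache_dir=SHARED_CACHE)
```

The workers would then need `HF_HOME` (or `TRANSFORMERS_CACHE`) pointed at the same directory, which could be done through the Ray job's `runtime_env`, e.g. `runtime_env={"env_vars": {"HF_HOME": "/mnt/shared/hf_cache"}}`.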
Given that we don't know in advance which model the user will select or their machine specs, we can try a couple of strategies here, depending on our product outcomes:

- Pre-populate the cache with a small model and point the user to try out that model locally.
- Load a small model as soon as the user instantiates and loads the UI, and give an error message.

A minimal sketch of the second option follows the list.
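This assumes a background prefetch at UI startup; the default model name and the threading approach are illustrative, not the current implementation:

```python
import threading

from huggingface_hub import snapshot_download

# Hypothetical small default model to warm up; swap for whatever we ship.
DEFAULT_SMALL_MODEL = "sshleifer/distilbart-cnn-12-6"

def prefetch_default_model() -> None:
    """Pull the default model into the local HuggingFace cache so the user's
    first inference run against it skips the download step."""
    snapshot_download(repo_id=DEFAULT_SMALL_MODEL)

# Kick off the prefetch when the UI comes up, without blocking startup.
threading.Thread(target=prefetch_default_model, daemon=True).start()
```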
Alternatives
No response
Contribution
Happy to work on this
Have you searched for similar issues before submitting this one?
Yes, I have searched for similar issues