Is there a way to cap the resources (e.g. CPU cores, CUDA MPS threads) assigned to each model in a multi-model TensorFlow server?

The only straightforward way I can think of to allocate resources to microservices like model servers (not counting lower-level tools like CPU limits) is containerization or VMs, so I suspect there is no such option. Is that true?
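For what it's worth, here is a minimal sketch of the containerization approach, running one CPU-capped `tensorflow/serving` container per model via Docker's `--cpus` flag. The model names, paths, ports, and CPU quotas below are hypothetical placeholders; adjust them to your setup:

```python
import subprocess

# Hypothetical models and per-container CPU caps.
models = [
    {"name": "model_a", "path": "/abs/path/models/model_a", "port": 8501, "cpus": "2.0"},
    {"name": "model_b", "path": "/abs/path/models/model_b", "port": 8502, "cpus": "1.0"},
]

for m in models:
    # Docker's --cpus flag enforces a hard CPU quota on the container,
    # so each model server gets its own resource cap.
    subprocess.run([
        "docker", "run", "-d",
        "--name", f"tf_serving_{m['name']}",
        "--cpus", m["cpus"],
        "-p", f"{m['port']}:8501",                 # expose the REST API port
        "-v", f"{m['path']}:/models/{m['name']}",  # mount the SavedModel directory
        "-e", f"MODEL_NAME={m['name']}",           # tensorflow/serving serves this model
        "tensorflow/serving",
    ], check=True)
```

The trade-off is that each model then runs in its own server process, so you lose the single multi-model endpoint.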
A similar feature request, #2097, is already in progress. Please follow and +1 that thread for updates.
Yes, there is currently no option to configure resources per model in a multi-model serving setup. However, you can try the `--rest_api_num_threads` flag, as mentioned here, if that helps.
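For reference, a minimal sketch of launching the server with that flag (the `models.config` path is a hypothetical placeholder). Note that the flag caps the REST request-processing thread pool for the whole server, not for individual models:

```python
import subprocess

# Launch tensorflow_model_server with a capped REST thread pool,
# assuming the binary is on PATH.
subprocess.run([
    "tensorflow_model_server",
    "--model_config_file=/path/to/models.config",  # multi-model configuration
    "--rest_api_port=8501",
    "--rest_api_num_threads=8",  # cap REST worker threads server-wide
])
```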
Limiting CPU usage/cores per model in a multi-model setup is not currently on our roadmap, but it sounds like a good feature to implement. I will keep this as a feature request and discuss implementation internally with the team. Once we have an update, we will post it in this thread.

Thank you for bringing this to our attention.