Currently we have --per_process_gpu_memory_fraction to limit the memory usage of the model server, but there is no method available to limit GPU usage at the model level. Ref: /pull/694
Please try using --per_process_gpu_memory_fraction as shown below and let us know if it works for you. If not, please help us understand the use case for a GPU limit at the model level. Thank you!
Example command to run the model server image with the memory limit enabled:
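The original command was not preserved in this thread; below is a minimal sketch assuming the standard tensorflow/serving GPU image, a placeholder model name my_model, and an illustrative fraction of 0.5:

# Cap the serving process at ~50% of GPU memory (illustrative value)
docker run --gpus all -p 8501:8501 \
  --mount type=bind,source=/path/to/my_model,target=/models/my_model \
  -e MODEL_NAME=my_model \
  -t tensorflow/serving:latest-gpu \
  --per_process_gpu_memory_fraction=0.5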
@singhniraj08 No, per_process_gpu_memory_fraction does not work for me. In Kubernetes we allocate one GPU per container, and each container serves multiple models. per_process_gpu_memory_fraction applies at the container (serving) level.
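For context, a minimal sketch of the kind of container spec described above (names, image, and config path are placeholders), where exactly one GPU is allocated to a container that serves multiple models:

# Fragment of a Kubernetes pod spec: one GPU per serving container
containers:
- name: tf-serving
  image: tensorflow/serving:latest-gpu
  args: ["--model_config_file=/models/models.config"]
  resources:
    limits:
      nvidia.com/gpu: 1   # this single GPU is shared by all models in the container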
How can I configure a GPU memory limit for each model within one serving instance?
I just found that I can configure the GPU memory limit for one serving instance via platform_config_file -> per_process_gpu_memory_fraction.
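For reference, a sketch of such a platform_config_file (passed with --platform_config_file), assuming the SavedModelBundleSourceAdapterConfig layout; the 0.5 fraction is illustrative, and the limit still applies to the whole serving process, not to individual models:

platform_configs {
  key: "tensorflow"
  value {
    source_adapter_config {
      [type.googleapis.com/tensorflow.serving.SavedModelBundleSourceAdapterConfig] {
        legacy_config {
          session_config {
            gpu_options {
              # Process-wide cap: applies to every model loaded by this server
              per_process_gpu_memory_fraction: 0.5
            }
          }
        }
      }
    }
  }
}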
How can I configure GPU memory per model within one serving instance? I do not see any such option in model_config; it only accepts fields like:
model_config_list {
  config {
    name: 'model-1'
    base_path: '/models/model-1'
    model_platform: 'tensorflow'
  }
  config {
    name: 'model-2'
    base_path: '/models/model-2'
    model_platform: 'tensorflow'
  }
}