The Feature
I would like to express my sincere thanks to the LiteLLM project for providing such a great unified interface for managing large models. It's a fantastic tool that simplifies many aspects of model management. I have been using the LiteLLM Docker image to manage large models, and it has been working well overall.
Motivation, pitch
However, I've noticed that throughput when routing requests through LiteLLM is significantly lower than when routing them directly through Nginx. During these lower-throughput periods, the container's CPU usage stays steady at around 110%, which makes me suspect that Python's Global Interpreter Lock (GIL) is limiting LiteLLM's multi-threaded throughput and that a single process has become the bottleneck. I'm considering using multiple processes to improve throughput, but I would like to confirm whether this approach is suitable for LiteLLM, or whether there is a better solution. Could you advise whether LiteLLM supports multi-process execution, or whether something in my current usage might be causing this issue?
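To illustrate the kind of multi-processing I have in mind, one workaround I am considering (just a rough sketch; the second port and the container names below are placeholders, not part of my current setup) is to run several identical proxy containers and let the Nginx instance I already have round-robin across them:
# start two identical LiteLLM proxy containers on different host ports
docker run -d -v /vdb/configs/llm_32b.yaml:/app/config.yaml -p 38085:4000 --name litellm_1 ghcr.io/berriai/litellm:main-latest --config /app/config.yaml
docker run -d -v /vdb/configs/llm_32b.yaml:/app/config.yaml -p 38086:4000 --name litellm_2 ghcr.io/berriai/litellm:main-latest --config /app/config.yaml
# Nginx would then balance requests between 127.0.0.1:38085 and 127.0.0.1:38086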
Are you a ML Ops Team?
No
Twitter / LinkedIn details
No response
To clarify, I am using the Docker deployment of LiteLLM, and the startup command is:
docker run -v /vdb/configs/llm_32b.yaml:/app/config.yaml -p 38085:4000 --log-driver json-file --log-opt max-size=1g --restart always --name test_litellm ghcr.io/berriai/litellm:main-latest --config /app/config.yaml
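On the multi-process question: if the proxy CLI exposes a --num_workers option for spinning up multiple Uvicorn worker processes (I believe it does, but please correct me if not), a sketch of how I would try it is to append it to the same command; the worker count here is arbitrary:
# same deployment, but asking the proxy for multiple worker processes
# (assumes the --num_workers flag is available in this image; please verify against the docs)
docker run -v /vdb/configs/llm_32b.yaml:/app/config.yaml -p 38085:4000 --log-driver json-file --log-opt max-size=1g --restart always --name test_litellm ghcr.io/berriai/litellm:main-latest --config /app/config.yaml --num_workers 4
Would that be the recommended way around the single-process limit, or is there a better approach?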