The Feature
I would like to express my sincere thanks to the LiteLLM project for providing such a great unified interface for managing large models. It's a fantastic tool that simplifies many aspects of model management. I have been using the LiteLLM Docker image to manage large models, and it has been working well overall.
Motivation, pitch
However, I've noticed that throughput when routing requests through LiteLLM is significantly lower than when routing them directly through Nginx. During these lower-throughput periods, the container's CPU usage stays steady at around 110%, which makes me suspect that Python's Global Interpreter Lock (GIL) is limiting LiteLLM's multi-threaded throughput and that a single process has become the bottleneck. I'm considering using multiple processes to improve throughput, but I would like to confirm whether this approach is suitable for LiteLLM, or whether there is a better solution. Could you advise whether LiteLLM supports multi-process execution, or whether something in my current usage might be causing this issue?
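To illustrate the kind of multi-processing I have in mind, one workaround I am considering (just a rough sketch; the second port and the container names below are placeholders, not part of my current setup) is to run several identical proxy containers and let the Nginx instance I already have round-robin across them:
# start two identical LiteLLM proxy containers on different host ports
docker run -d -v /vdb/configs/llm_32b.yaml:/app/config.yaml -p 38085:4000 --name litellm_1 ghcr.io/berriai/litellm:main-latest --config /app/config.yaml
docker run -d -v /vdb/configs/llm_32b.yaml:/app/config.yaml -p 38086:4000 --name litellm_2 ghcr.io/berriai/litellm:main-latest --config /app/config.yaml
# Nginx would then balance requests between 127.0.0.1:38085 and 127.0.0.1:38086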
Are you a ML Ops Team?
No
Twitter / LinkedIn details
No response
To clarify, I am using the Docker deployment of LiteLLM, and the startup command is:
docker run -v /vdb/configs/llm_32b.yaml:/app/config.yaml -p 38085:4000 --log-driver json-file --log-opt max-size=1g --restart always --name test_litellm ghcr.io/berriai/litellm:main-latest --config /app/config.yaml
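On the multi-process question: if the proxy CLI exposes a --num_workers option for spinning up multiple Uvicorn worker processes (I believe it does, but please correct me if not), a sketch of how I would try it is to append it to the same command; the worker count here is arbitrary:
# same deployment, but asking the proxy for multiple worker processes
# (assumes the --num_workers flag is available in this image; please verify against the docs)
docker run -v /vdb/configs/llm_32b.yaml:/app/config.yaml -p 38085:4000 --log-driver json-file --log-opt max-size=1g --restart always --name test_litellm ghcr.io/berriai/litellm:main-latest --config /app/config.yaml --num_workers 4
Would that be the recommended way around the single-process limit, or is there a better approach?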