Add vLLM and DeepSpeed MII for Throughput Benchmarking #117
This PR integrates vLLM and DeepSpeed MII for convenient benchmarking.
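For context, this is roughly how the two engines are driven from Python — a minimal sketch, assuming recent vLLM and DeepSpeed MII (FastGen-style pipeline) APIs; the model name and generation settings are placeholders, not the benchmark's actual configuration:

```python
# Minimal sketch of the two backends' Python APIs (placeholder model and
# settings, not this benchmark's actual configuration). In practice only
# one engine would be loaded per process.

# vLLM: offline batched generation.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-1.3b")  # placeholder model
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)
outputs = llm.generate(["Hello, my name is"], params)

# DeepSpeed MII: non-persistent pipeline.
import mii

pipe = mii.pipeline("facebook/opt-1.3b")  # placeholder model
responses = pipe(["Hello, my name is"], max_new_tokens=128)
```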
Usage:
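A hypothetical invocation is sketched below. Only `--greedy-sampling-ratio` and `--max-output-tokens` come from this PR; the `--backend` flag and all default values are illustrative assumptions about what the script's interface might look like:

```python
# Hypothetical sketch of the benchmark's CLI surface; only
# --greedy-sampling-ratio and --max-output-tokens are mentioned in this PR,
# everything else here is assumed for illustration.
import argparse

parser = argparse.ArgumentParser(description="LLM throughput benchmark")
parser.add_argument("--backend", choices=["vllm", "mii"], default="vllm",
                    help="inference engine to benchmark (assumed flag)")
parser.add_argument("--greedy-sampling-ratio", type=float,
                    help="fraction of requests decoded greedily; the rest "
                         "use random sampling, the more expensive path")
parser.add_argument("--max-output-tokens", type=int,
                    help="force the generation length globally (MII has no "
                         "per-request max output tokens)")
args = parser.parse_args()
```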
Some performance results with the new benchmark script.
Currently, I have problems running DeepSpeed MII with multiple GPUs, and running vLLM with the Llama model family.
With this PR, you won't be able to compare against the previous numbers, because:

- `--greedy-sampling-ratio` was added to account for the performance of random sampling, which is a more expensive path than greedy decoding. Prior to this PR, the greedy sampling ratio was effectively 0.0.
- `--max-output-tokens` was added to force the generation length globally, since MII does not seem to support a per-request max output tokens setting. A sketch of how both flags could map onto sampling parameters follows below.

The follow-up PR will bring a latency benchmark.
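As referenced above, here is an illustrative sketch (assuming vLLM's `SamplingParams`; the actual script's logic may differ) of how the two flags could translate into per-request sampling parameters:

```python
# Illustrative only, not the PR's actual code: pick greedy vs. random
# sampling per request according to --greedy-sampling-ratio, and force the
# output length via --max-output-tokens. In vLLM, temperature=0.0 means
# greedy decoding, and ignore_eos=True keeps generating up to max_tokens
# regardless of the EOS token.
import random

from vllm import SamplingParams

def make_sampling_params(greedy_sampling_ratio: float,
                         max_output_tokens: int) -> SamplingParams:
    if random.random() < greedy_sampling_ratio:
        # Greedy decoding: the cheaper, deterministic path.
        return SamplingParams(temperature=0.0,
                              max_tokens=max_output_tokens,
                              ignore_eos=True)
    # Random sampling: the more expensive path the new flag accounts for.
    return SamplingParams(temperature=1.0, top_p=0.95,
                          max_tokens=max_output_tokens,
                          ignore_eos=True)
```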