Support Emoji w/o Perf Regression #107

sunggg · 2023-12-12T01:21:38Z

Follow-up on: #104

The previous approach had to be reverted because it caused a significant performance regression.
This PR instead follows the approach used by vLLM and TGI, and I observed no performance regression in local testing.

Benchmark with Llama 7B fp16 on an A100:
Before: Throughput: 10.96 requests/s, 5241.24 tokens/s
After: Throughput: 11.00 requests/s, 5259.99 tokens/s
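
For readers unfamiliar with the vLLM/TGI-style approach mentioned above, here is a minimal sketch of incremental detokenization as I understand it; it is not the code from this PR. The idea is to decode only a small window of recent tokens at each step and hold back output while the decoded text ends in the Unicode replacement character (U+FFFD), which signals an incomplete multi-byte sequence such as a partially generated emoji. `toy_decode`, `decode_step`, and `stream` are hypothetical names, and the toy tokenizer (one token id per UTF-8 byte) is an assumption made only so the example runs without a real tokenizer.

```python
from typing import List, Tuple

REPLACEMENT = "\ufffd"  # emitted by the decoder for incomplete UTF-8 sequences


def toy_decode(token_ids: List[int]) -> str:
    """Hypothetical stand-in for tokenizer.decode: one token id == one byte."""
    return bytes(token_ids).decode("utf-8", errors="replace")


def decode_step(
    token_ids: List[int], prefix_offset: int, read_offset: int
) -> Tuple[str, int, int]:
    """Return (new_text, new_prefix_offset, new_read_offset).

    Only tokens from prefix_offset onward are decoded, so the per-step
    cost does not grow with the full sequence length.
    """
    prefix_text = toy_decode(token_ids[prefix_offset:read_offset])
    new_text = toy_decode(token_ids[prefix_offset:])
    if len(new_text) > len(prefix_text) and not new_text.endswith(REPLACEMENT):
        # The tail is complete text: emit the delta and advance the window.
        return new_text[len(prefix_text):], read_offset, len(token_ids)
    # Incomplete character: emit nothing and keep the offsets unchanged.
    return "", prefix_offset, read_offset


def stream(all_ids: List[int]) -> List[str]:
    """Feed tokens one at a time and collect the text emitted at each step."""
    ids: List[int] = []
    prefix_offset, read_offset = 0, 0
    pieces: List[str] = []
    for tok in all_ids:
        ids.append(tok)
        delta, prefix_offset, read_offset = decode_step(
            ids, prefix_offset, read_offset
        )
        pieces.append(delta)
    return pieces


if __name__ == "__main__":
    pieces = stream(list("hi 😀".encode("utf-8")))
    print(pieces)
```

In this sketch the four bytes of the emoji arrive as four separate tokens; the first three steps emit an empty string, and the emoji is emitted whole once its last byte arrives. With a real subword tokenizer the same offsets would index token ids rather than bytes.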

sunggg added 3 commits December 11, 2023 22:35

wip

604993f

works

f34811d

done

26ca476

masahi approved these changes Dec 12, 2023

mypy

845dfd5

sunggg merged commit 73926cb into octoml:batch-serving Dec 12, 2023
1 check passed

masahi mentioned this pull request Jan 3, 2024

[Bug] Empty token can appear at the beginning of a generated sequence #140

Open

sunggg mentioned this pull request Jan 10, 2024

Enable Logprobs in MLC Batch Serving #82

Merged

Lunderberg pushed a commit to Lunderberg/mlc-llm that referenced this pull request Jan 30, 2024

update pointer (octoml#107)

cfaa5ee
