
Support Emoji w/o Perf Regression #107

Merged

Conversation

sunggg
Member

@sunggg sunggg commented Dec 12, 2023

Follow-up on: #104

The previous approach had to be reverted due to a significant performance regression.
This PR is instead based on vLLM/TGI's approach, and no performance regression is observed in my local testing.

w/ llama 7b fp16 on A100,
Before: Throughput: 10.96 requests/s, 5241.24 tokens/s
After: Throughput: 11.00 requests/s, 5259.99 tokens/s
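For context, the vLLM/TGI-style streaming detokenization referenced above works by decoding the accumulated token sequence and holding back output while the tail is an incomplete multi-byte UTF-8 character (which decodes to U+FFFD). A minimal sketch of that idea, using a hypothetical mock byte-level tokenizer rather than the actual HF tokenizer used in the PR:

```python
# Sketch of vLLM/TGI-style incremental detokenization (the technique this
# PR adopts). The tokenizer here is a mock: tokens are raw UTF-8 byte
# fragments; the real code decodes HF tokenizer token IDs instead.
REPLACEMENT_CHAR = "\ufffd"


def decode(token_bytes_list):
    # Mock detokenizer: concatenate byte fragments and decode as UTF-8,
    # mapping incomplete/invalid sequences to U+FFFD.
    return b"".join(token_bytes_list).decode("utf-8", errors="replace")


class StreamingDetokenizer:
    """Emit only fully decoded characters while streaming tokens."""

    def __init__(self):
        self.tokens = []
        self.emitted_len = 0  # length of text already sent to the client

    def put(self, token_bytes):
        self.tokens.append(token_bytes)
        text = decode(self.tokens)
        # An emoji (or any multi-byte character) split across tokens
        # decodes to U+FFFD until all of its bytes arrive; hold back
        # output until the tail is complete.
        if text.endswith(REPLACEMENT_CHAR):
            return ""
        delta = text[self.emitted_len:]
        self.emitted_len = len(text)
        return delta


# "🙂" is 4 bytes in UTF-8; pretend the tokenizer split it into two tokens.
emoji = "🙂".encode("utf-8")
d = StreamingDetokenizer()
out = [d.put(b"Hi "), d.put(emoji[:2]), d.put(emoji[2:])]
# out == ["Hi ", "", "🙂"]  -- the half-emoji step emits nothing.
```

The avoided regression comes from the same property: each step is a cheap decode plus a string-suffix check, with no per-token re-tokenization of the full output.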

@sunggg sunggg merged commit 73926cb into octoml:batch-serving Dec 12, 2023
1 check passed
Lunderberg pushed a commit to Lunderberg/mlc-llm that referenced this pull request Jan 30, 2024