Stars

🌟 LLM

7 repositories

Awesome-LLM: a curated list of Large Language Model resources

21,798 stars · 1,783 forks · Updated Feb 2, 2025

📖 A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, Flash-Attention, Paged-Attention, Parallelism, etc. 🎉🎉

3,548 stars · 244 forks · Updated Mar 3, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · 39,935 stars · 5,979 forks · Updated Mar 3, 2025
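This description matches the vLLM project. A minimal offline-inference sketch, assuming the vllm package is installed and using a small placeholder model name:

```python
# Minimal offline inference with vLLM (sketch; model name is a placeholder).
from vllm import LLM, SamplingParams

prompts = ["Explain the KV cache in one sentence."]
sampling_params = SamplingParams(temperature=0.8, max_tokens=64)

# vLLM's PagedAttention manages the KV cache in fixed-size blocks under the hood,
# which is where the memory efficiency in the description comes from.
llm = LLM(model="facebook/opt-125m")
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.outputs[0].text)
```

The same engine can also be started as an OpenAI-compatible server for online serving; the snippet above only shows the offline batch path.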

LLM KV cache compression made easy

Python · 412 stars · 27 forks · Updated Feb 18, 2025

Fast and memory-efficient exact attention

Python · 16,024 stars · 1,512 forks · Updated Mar 2, 2025
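This description matches FlashAttention. A small usage sketch of the fused kernel, assuming a CUDA GPU, the flash-attn package, and illustrative tensor shapes in (batch, seqlen, heads, head_dim) layout:

```python
# Calling the fused FlashAttention kernel directly (sketch; shapes are illustrative).
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 1024, 8, 64
q = torch.randn(batch, seqlen, nheads, headdim, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# Exact (not approximate) attention, computed without materializing the full
# seqlen x seqlen score matrix in GPU memory.
out = flash_attn_func(q, k, v, causal=True)  # (batch, seqlen, nheads, headdim)
print(out.shape)
```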

A framework for serving and evaluating LLM routers: save on LLM costs without compromising quality

Python · 3,669 stars · 275 forks · Updated Aug 10, 2024

SGLang is a fast serving framework for large language models and vision language models.

Python · 11,162 stars · 1,115 forks · Updated Mar 3, 2025
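A minimal sketch of querying an SGLang server through its OpenAI-compatible endpoint, assuming the server has already been launched locally (the model path, port, and prompt below are placeholders):

```python
# Query a running SGLang server via its OpenAI-compatible API (sketch).
# Assumes the server was started separately, for example:
#   python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --port 30000
import openai

client = openai.OpenAI(base_url="http://127.0.0.1:30000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="default",  # SGLang serves the launched model under this name
    messages=[{"role": "user", "content": "Summarize what a serving framework does."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```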