Stars

🌟 LLM

7 repositories

Awesome-LLM: a curated list of Large Language Model resources

21,798 stars · 1,783 forks · Updated Feb 2, 2025

📖 A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, Flash-Attention, Paged-Attention, Parallelism, etc. 🎉🎉

3,548 stars · 244 forks · Updated Mar 3, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · 39,935 stars · 5,979 forks · Updated Mar 3, 2025
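This description matches the vLLM project. A minimal offline-inference sketch, assuming the vllm package is installed and using a small placeholder model name:

```python
# Minimal offline inference with vLLM (sketch; model name is a placeholder).
from vllm import LLM, SamplingParams

prompts = ["Explain the KV cache in one sentence."]
sampling_params = SamplingParams(temperature=0.8, max_tokens=64)

# vLLM's PagedAttention manages the KV cache in fixed-size blocks under the hood,
# which is where the memory efficiency in the description comes from.
llm = LLM(model="facebook/opt-125m")
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.outputs[0].text)
```

The same engine can also be started as an OpenAI-compatible server for online serving; the snippet above only shows the offline batch path.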

LLM KV cache compression made easy

Python · 412 stars · 27 forks · Updated Feb 18, 2025

Fast and memory-efficient exact attention

Python · 16,024 stars · 1,512 forks · Updated Mar 2, 2025
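This description matches FlashAttention. A small usage sketch of the fused kernel, assuming a CUDA GPU, the flash-attn package, and illustrative tensor shapes in (batch, seqlen, heads, head_dim) layout:

```python
# Calling the fused FlashAttention kernel directly (sketch; shapes are illustrative).
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 1024, 8, 64
q = torch.randn(batch, seqlen, nheads, headdim, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# Exact (not approximate) attention, computed without materializing the full
# seqlen x seqlen score matrix in GPU memory.
out = flash_attn_func(q, k, v, causal=True)  # (batch, seqlen, nheads, headdim)
print(out.shape)
```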

A framework for serving and evaluating LLM routers: save on LLM costs without compromising quality

Python · 3,669 stars · 275 forks · Updated Aug 10, 2024

SGLang is a fast serving framework for large language models and vision language models.

Python · 11,162 stars · 1,115 forks · Updated Mar 3, 2025
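A minimal sketch of querying an SGLang server through its OpenAI-compatible endpoint, assuming the server has already been launched locally (the model path, port, and prompt below are placeholders):

```python
# Query a running SGLang server via its OpenAI-compatible API (sketch).
# Assumes the server was started separately, for example:
#   python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --port 30000
import openai

client = openai.OpenAI(base_url="http://127.0.0.1:30000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="default",  # SGLang serves the launched model under this name
    messages=[{"role": "user", "content": "Summarize what a serving framework does."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```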