CUDA Templates for Linear Algebra Subroutines

C++ 6,916 1,130 Updated Feb 28, 2025

MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.

Python 1,975 178 Updated Feb 25, 2025

LLM inference in C/C++

C++ 75,632 10,929 Updated Mar 1, 2025

The official repo of the Qwen (通义千问) chat and pretrained large language models proposed by Alibaba Cloud.

Python 17,120 1,420 Updated Feb 25, 2025

📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).

Cuda 2,587 267 Updated Feb 24, 2025

LaTeX source for a personal Chinese-language résumé https://hijiangtao.github.io/

TeX 2,081 568 Updated Sep 4, 2024

FlashInfer: Kernel Library for LLM Serving

Cuda 2,229 232 Updated Feb 28, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python 11,130 1,112 Updated Mar 2, 2025

A framework for serving and evaluating LLM routers - save LLM costs without compromising quality

Python 3,666 275 Updated Aug 10, 2024

Fast and memory-efficient exact attention

Python 16,006 1,509 Updated Mar 2, 2025

LLM KV cache compression made easy

Python 412 27 Updated Feb 18, 2025

A self-study guide to computer science

HTML 60,722 7,106 Updated Feb 27, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 39,868 5,972 Updated Mar 2, 2025

📖A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, Flash-Attention, Paged-Attention, Parallelism, etc. 🎉🎉

3,539 244 Updated Mar 1, 2025

Awesome-LLM: a curated list of Large Language Models

21,780 1,780 Updated Feb 2, 2025