
# Papers-Reading

🐬 Some papers & books I've read.

## Algorithm

| Date | Paper | Key Words |
| --- | --- | --- |
| 1972 | Reducibility Among Combinatorial Problems | Karp's 21 NP-complete problems |
| 1973 | An n^{5/2} Algorithm for Maximum Matchings in Bipartite Graphs | Hopcroft-Karp algorithm |
| 2002 | A 27/26-Approximation Algorithm for the Chromatic Sum Coloring of Bipartite Graphs | Chromatic sum coloring of bipartite graphs |
| 2015.6.16 | EERTREE: An Efficient Data Structure for Processing Palindromes in Strings | Palindromic tree |
| 2017.8.11 | An Introduction to Quantum Computing, Without the Physics | Quantum computing, without the physics |
| 2018.7.30 | A Simple Near-Linear Pseudopolynomial Time Randomized Algorithm for Subset Sum | Randomized subset sum |
| 2021.2.11 | Hybrid Neural Fusion for Full-frame Video Stabilization | Video stabilization |
| 2022.11.21 | The Berlekamp-Massey Algorithm revisited | Berlekamp-Massey algorithm |
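
The Berlekamp-Massey row is the easiest entry above to experiment with directly. Below is a minimal sketch (my own toy version, not code from the paper) that recovers the shortest linear recurrence of a sequence over a prime field; the function name and structure are my choices, and `mod` must be prime because discrepancies are divided via Fermat inverses.

```python
def berlekamp_massey(s, mod):
    """Shortest c with s[i] == sum(c[j] * s[i-1-j]) % mod for all i >= len(c)."""
    cur = []                 # current recurrence coefficients
    ls, lf, ld = [], 0, 0    # last failing recurrence, its index, its discrepancy
    for i in range(len(s)):
        t = sum(cur[j] * s[i - 1 - j] for j in range(len(cur))) % mod
        if (t - s[i]) % mod == 0:
            continue                      # recurrence still holds at position i
        if not cur:                       # first mismatch: start a trivial recurrence
            cur = [0] * (i + 1)
            lf, ld = i, (t - s[i]) % mod
            continue
        # Blend in the recurrence that last failed, scaled to cancel the new error.
        k = (t - s[i]) * pow(ld, mod - 2, mod) % mod
        c = [0] * (i - lf - 1) + [k] + [-k * x % mod for x in ls]
        if len(c) < len(cur):
            c += [0] * (len(cur) - len(c))
        for j in range(len(cur)):
            c[j] = (c[j] + cur[j]) % mod
        if i - len(cur) > lf - len(ls):   # keep whichever failure shortens future fixes
            ls, lf, ld = cur, i, (t - s[i]) % mod
        cur = c
    return cur

MOD = 998244353
print(berlekamp_massey([1, 1, 2, 3, 5, 8, 13], MOD))  # [1, 1], i.e. Fibonacci
```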

## LLM

| Date | Paper | Key Words |
| --- | --- | --- |
| 2017.6.12 | Attention Is All You Need | Transformer & attention |
| 2018.6.11 | Improving Language Understanding by Generative Pre-Training | Generative pre-trained transformer (GPT) |
| 2022.5.27 | FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness | FlashAttention |
| 2022.6.4 | ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers | INT8 weights and INT8 activations |
| 2022.8.15 | LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale | LLM.int8() |
| 2022.11.18 | SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models | 8-bit weights, 8-bit activations (W8A8) |
| 2023.3.13 | FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU | 4-bit KV cache |
| 2023.6.13 | SqueezeLLM: Dense-and-Sparse Quantization | 3-bit dense-and-sparse weight quantization |
| 2023.7.18 | FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning | FlashAttention-2 |
| 2024.1.31 | KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization | 2-, 3-, and 4-bit KV cache |
| 2024.2.5 | KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache | 2-bit KV cache |
| 2024.3.19 | When Do We Not Need Larger Vision Models? | Scaling on Scales (S²) |
| 2024.7.10 | PaliGemma: A versatile 3B VLM for transfer | Google's small VLM: PaliGemma |
| 2024.7.12 | FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision | FlashAttention-3, optimized for Hopper GPUs (e.g. H100) |
| 2024.7.28 | Enhancing Taobao Display Advertising with Multimodal Representations: Challenges, Approaches and Insights | Multimodal representations for advertising |
| 2024.8.22 | NanoFlow: Towards Optimal Large Language Model Serving Throughput | NanoFlow serving framework |
| 2024.10.3 | SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration | SageAttention |
| 2024.11.17 | SageAttention2: Efficient Attention with Thorough Outlier Smoothing and Per-thread INT4 Quantization | SageAttention2 |
| 2024.12.27 | DeepSeek-V3 Technical Report | DeepSeek-V3 |
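
A recurring idea in the attention rows above (FlashAttention 1-3, SageAttention) is that softmax attention can be computed block by block with a running max and running denominator, so the full attention-score matrix is never materialized. The NumPy sketch below is my own toy illustration of that online-softmax trick, not any paper's kernel:

```python
import numpy as np

def attention_reference(Q, K, V):
    """Plain scaled dot-product attention; materializes the full score matrix."""
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    return (P / P.sum(axis=-1, keepdims=True)) @ V

def attention_streaming(Q, K, V, block=4):
    """Same result, but K/V are consumed in blocks with an online softmax."""
    n, d = Q.shape
    out = np.zeros_like(Q)
    m = np.full(n, -np.inf)   # running row-max of scores seen so far
    l = np.zeros(n)           # running softmax denominator
    for j in range(0, K.shape[0], block):
        S = Q @ K[j:j + block].T / np.sqrt(d)   # scores against this block only
        m_new = np.maximum(m, S.max(axis=-1))
        scale = np.exp(m - m_new)               # rescale earlier accumulators
        P = np.exp(S - m_new[:, None])
        l = l * scale + P.sum(axis=-1)
        out = out * scale[:, None] + P @ V[j:j + block]
        m = m_new
    return out / l[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 16)) for _ in range(3))
assert np.allclose(attention_reference(Q, K, V), attention_streaming(Q, K, V))
```

Similarly, the KV-cache rows (FlexGen, KVQuant, KIVI) all rest on low-bit quantization of cached keys and values. The round trip below sketches asymmetric per-channel quantization, loosely in the spirit of KIVI's per-channel key quantization; the helper names and the 1e-8 floor are my own choices:

```python
import numpy as np

def quantize_per_channel(x, bits=2):
    """Map each column of x to integers in [0, 2**bits - 1] with its own
    scale and zero point (asymmetric quantization)."""
    qmax = 2 ** bits - 1
    lo, hi = x.min(axis=0), x.max(axis=0)
    scale = np.maximum(hi - lo, 1e-8) / qmax   # avoid division by zero
    zero = np.round(-lo / scale)
    q = np.clip(np.round(x / scale + zero), 0, qmax).astype(np.uint8)
    return q, scale, zero

def dequantize(q, scale, zero):
    return (q.astype(np.float32) - zero) * scale

kv = np.random.default_rng(1).standard_normal((128, 64)).astype(np.float32)
q, s, z = quantize_per_channel(kv, bits=2)
print("max abs error:", np.abs(kv - dequantize(q, s, z)).max())
```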

## Engineering
