
v0.2.3

@KMSorSMS released this 06 Mar 09:05

We're excited to announce KTransformers v0.2.3! You can now compile it from the GitHub source code. Release packages and Docker images are being built and uploaded; stay tuned!

Key Updates:

  1. Low-Precision Inference Optimization #754

    1. Added IQ1_S/IQ2_XXS quantized matmul support, now compatible with Unsloth's DeepSeek-R1 1.58-bit/2.51-bit dynamically quantized weights (see the sketch after this item)

    2. Released a mixed-precision DeepSeek-R1 model (IQ1 + FP8) with strong efficiency and accuracy:

      • 19 GB VRAM and 140 GB system memory consumption

      • MMLU score of 83.6, slightly outperforming full-precision DeepSeek-V3

      • Ongoing benchmarks: View Details (Special thanks to @moonshadow-25 and @godrosev for their huge contributions to v0.2.3)
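
To illustrate the idea behind these low-bit kernels, here is a minimal NumPy sketch of block-wise quantized matmul. This is not the actual IQ1_S/IQ2_XXS format (those pack weights into codebook-indexed blocks with fused dequantization kernels); the block size, 1-bit sign codes, and helper names below are assumptions for illustration only.

```python
# Minimal NumPy sketch of block-wise low-bit quantized matmul.
# NOT the real IQ1_S/IQ2_XXS format (those use packed codebooks); this
# only illustrates the general idea: weights are stored as tiny integer
# codes plus one scale per block, and dequantized on the fly for matmul.
import numpy as np

BLOCK = 32  # elements per quantization block (assumed for illustration)

def quantize_1bit(w: np.ndarray):
    """Quantize one weight row to +/-1 codes with one scale per block."""
    blocks = w.reshape(-1, BLOCK)
    scales = np.abs(blocks).mean(axis=1, keepdims=True)   # per-block scale
    signs = np.where(blocks >= 0, 1, -1).astype(np.int8)  # 1-bit codes
    return signs, scales

def dequantize(signs: np.ndarray, scales: np.ndarray) -> np.ndarray:
    return (signs * scales).reshape(-1)

def quantized_matvec(quantized_rows, x: np.ndarray) -> np.ndarray:
    """y = W @ x where each row of W is stored in quantized form."""
    return np.array([dequantize(s, sc) @ x for s, sc in quantized_rows])

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 128)).astype(np.float32)
x = rng.normal(size=128).astype(np.float32)

W_q = [quantize_1bit(row) for row in W]
print("full precision: ", W @ x)
print("1-bit quantized:", quantized_matvec(W_q, x))
```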

  2. Long Context Handling Enhancement #750

    1. Implemented a chunked prefill mechanism, supporting 139K-token contexts with DeepSeek-R1 on 24 GB of VRAM (see the sketch after this list)

    2. Note: As DeepSeek's native context window supports only 128K tokens, we will pause further optimization of extended context handling.
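
To show why chunking bounds memory, here is a minimal Python sketch of the chunked-prefill idea. The `model` object, its `new_kv_cache()` helper, the `forward(chunk, kv_cache)` signature, and the chunk size are all hypothetical, not KTransformers' actual API; the point is only that each forward pass materializes activations for one chunk, so peak VRAM scales with the chunk size rather than the full 139K-token prompt.

```python
# Minimal sketch of chunked prefill. The `model` interface below
# (new_kv_cache(), forward(chunk, kv_cache)) is hypothetical and only
# illustrates the mechanism; KTransformers' real implementation differs.

CHUNK_SIZE = 4096  # tokens per forward pass (assumed value)

def chunked_prefill(model, prompt_tokens, chunk_size=CHUNK_SIZE):
    """Prefill a long prompt chunk by chunk instead of in one pass."""
    kv_cache = model.new_kv_cache()  # hypothetical: start with an empty cache
    logits = None
    for start in range(0, len(prompt_tokens), chunk_size):
        chunk = prompt_tokens[start:start + chunk_size]
        # The chunk attends over everything cached so far, then its own
        # keys/values are appended to the cache. Activation memory is
        # proportional to chunk_size, never to the full prompt length.
        logits, kv_cache = model.forward(chunk, kv_cache)
    return logits, kv_cache  # the last chunk's logits seed token generation
```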


Coming Next: v0.2.4 Preview

The upcoming v0.2.4 will be the final minor release in the 0.2 series, delivering the most crucial update for transforming KTransformers from "a toy project" into "a practical solution": multi-concurrency support.

Scheduled for release within two weeks, this update will be followed by development of version 0.3, featuring:

  • AMX-powered optimizations for enhanced performance

  • Expanded hardware support, including AMD, XPU, MetaX (沐曦), Moore Threads (摩尔线程), and Ascend (昇腾) GPUs