
v0.2.3

@KMSorSMS released this 06 Mar 09:05

We're excited to announce KTransformers v0.2.3! You can now compile it from the GitHub source code. Release packages and Docker images are being built and uploaded; stay tuned!

Key Updates:

  1. Low-Precision Inference Optimization #754

    1. Added IQ1_S/IQ2_XXS quantized matmul support, now compatible with Unsloth's DeepSeek-R1 1.58-bit/2.51-bit dynamically quantized weights (see the sketch after this item)

    2. Released a mixed-precision DeepSeek-R1 model (IQ1 + FP8) with strong efficiency and accuracy:

      • 19 GB VRAM and 140 GB system memory consumption

      • MMLU score of 83.6, slightly outperforming full-precision DeepSeek-V3

      • Ongoing benchmarks: View Details (Special thanks to @moonshadow-25 and @godrosev for their huge contributions to v0.2.3)
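
To illustrate the idea behind these low-bit kernels, here is a minimal NumPy sketch of block-wise quantized matmul. This is not the actual IQ1_S/IQ2_XXS format (those pack weights into codebook-indexed blocks with fused dequantization kernels); the block size, 1-bit sign codes, and helper names below are assumptions for illustration only.

```python
# Minimal NumPy sketch of block-wise low-bit quantized matmul.
# NOT the real IQ1_S/IQ2_XXS format (those use packed codebooks); this
# only illustrates the general idea: weights are stored as tiny integer
# codes plus one scale per block, and dequantized on the fly for matmul.
import numpy as np

BLOCK = 32  # elements per quantization block (assumed for illustration)

def quantize_1bit(w: np.ndarray):
    """Quantize one weight row to +/-1 codes with one scale per block."""
    blocks = w.reshape(-1, BLOCK)
    scales = np.abs(blocks).mean(axis=1, keepdims=True)   # per-block scale
    signs = np.where(blocks >= 0, 1, -1).astype(np.int8)  # 1-bit codes
    return signs, scales

def dequantize(signs: np.ndarray, scales: np.ndarray) -> np.ndarray:
    return (signs * scales).reshape(-1)

def quantized_matvec(quantized_rows, x: np.ndarray) -> np.ndarray:
    """y = W @ x where each row of W is stored in quantized form."""
    return np.array([dequantize(s, sc) @ x for s, sc in quantized_rows])

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 128)).astype(np.float32)
x = rng.normal(size=128).astype(np.float32)

W_q = [quantize_1bit(row) for row in W]
print("full precision: ", W @ x)
print("1-bit quantized:", quantized_matvec(W_q, x))
```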

  2. Long Context Handling Enhancement #750

    1. Implemented a chunked prefill mechanism, supporting 139K-token contexts with DeepSeek-R1 on 24 GB of VRAM (see the sketch after this list)

    2. Note: As DeepSeek's native context window supports only 128K tokens, we will pause further optimization of extended context handling.
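
To show why chunking bounds memory, here is a minimal Python sketch of the chunked-prefill idea. The `model` object, its `new_kv_cache()` helper, the `forward(chunk, kv_cache)` signature, and the chunk size are all hypothetical, not KTransformers' actual API; the point is only that each forward pass materializes activations for one chunk, so peak VRAM scales with the chunk size rather than the full 139K-token prompt.

```python
# Minimal sketch of chunked prefill. The `model` interface below
# (new_kv_cache(), forward(chunk, kv_cache)) is hypothetical and only
# illustrates the mechanism; KTransformers' real implementation differs.

CHUNK_SIZE = 4096  # tokens per forward pass (assumed value)

def chunked_prefill(model, prompt_tokens, chunk_size=CHUNK_SIZE):
    """Prefill a long prompt chunk by chunk instead of in one pass."""
    kv_cache = model.new_kv_cache()  # hypothetical: start with an empty cache
    logits = None
    for start in range(0, len(prompt_tokens), chunk_size):
        chunk = prompt_tokens[start:start + chunk_size]
        # The chunk attends over everything cached so far, then its own
        # keys/values are appended to the cache. Activation memory is
        # proportional to chunk_size, never to the full prompt length.
        logits, kv_cache = model.forward(chunk, kv_cache)
    return logits, kv_cache  # the last chunk's logits seed token generation
```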


Coming Next: v0.2.4 Preview

The upcoming v0.2.4 will be the final minor release in the 0.2 series, delivering the most crucial update for transforming KTransformers from "a toy project" into "a practical solution": multi-concurrency support.

Scheduled for release within two weeks, this update will be followed by development of version 0.3, featuring:

  • AMX-powered optimizations for enhanced performance

  • Expanded hardware support, including AMD, XPU, MetaX (沐曦), Moore Threads (摩尔线程), and Ascend (昇腾) GPUs