
Commit

deploy: 3d5db7a
n1o committed Dec 2, 2024
1 parent 42dc8a2 commit 9abc6a1
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion posts/hymba-new-ssm-att-hybrid-breed/index.html
@@ -43,7 +43,7 @@
<span class=sr-only>Link to heading</span></a></h3><p>Performance on retrieving a specific value &ldquo;needle&rdquo; from the input &ldquo;haystack&rdquo;:</p><p><img src=/images/hymba_needle_in_the_haystack.png></p><h3 id=instruction-tuning>Instruction Tuning
<a class=heading-link href=#instruction-tuning><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h3><p>Taking it further and applying Direct Preference Optimization (DPO), we can compare its performance to current state-of-the-art sub-2B language models, where Hymba is the clear winner.
-<img src=./images/hymba_instruct.png></p><h1 id=recap>Recap
+<img src=/images/hymba_instruct.png></p><h1 id=recap>Recap
<a class=heading-link href=#recap><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
<span class=sr-only>Link to heading</span></a></h1><p>By fusing Mamba2 and Attention in the same layer we get a synergistic effect: Mamba is responsible for long-range recall, while Attention serves as a short-term memory with perfect recall. Since Attention is not responsible for modeling long-range dependencies, we use Sliding Window Attention everywhere except the first, middle, and last layers. By incorporating meta tokens we avoid attention sinks in the attention layers, focus attention on the tokens that actually matter, and reinforce the long-range modeling behavior of Mamba. By sharing the KV cache between two adjacent SWA layers we reduce the cache size 20x, and overall throughput is 3x higher than a comparable vanilla Attention model.</p><h1 id=final-impressions>Final impressions
<a class=heading-link href=#final-impressions><i class="fa-solid fa-link" aria-hidden=true title="Link to heading"></i>
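The recap in the diff above describes attention heads and Mamba2 (SSM) heads operating on the same input in parallel inside a single layer, with their outputs fused, and Sliding Window Attention used in most layers. Below is a minimal PyTorch sketch of that parallel-fusion idea. It is not the authors' implementation: the gated EMA recurrence standing in for Mamba2, the plain averaging fusion, the sizes, and all names are illustrative assumptions, and the global/SWA layer placement, meta tokens, and cross-layer KV-cache sharing mentioned in the recap are not modeled here.

# Conceptual sketch of a parallel hybrid block (not the official Hymba code):
# the same hidden states flow through a sliding-window attention branch and a
# simplified SSM-style recurrent branch; the two normalized outputs are averaged.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParallelHybridBlock(nn.Module):
    def __init__(self, d_model=256, n_heads=4, window=64):
        super().__init__()
        self.n_heads, self.window = n_heads, window
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.ssm_in = nn.Linear(d_model, 2 * d_model)    # value and gate for the toy recurrence
        self.decay = nn.Parameter(torch.zeros(d_model))  # per-channel decay, squashed by sigmoid below
        self.norm_attn = nn.LayerNorm(d_model)
        self.norm_ssm = nn.LayerNorm(d_model)
        self.out = nn.Linear(d_model, d_model)

    def _sliding_window_attention(self, x):
        B, T, D = x.shape
        H, hd = self.n_heads, D // self.n_heads
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(B, T, H, hd).transpose(1, 2) for t in (q, k, v))
        # causal mask restricted to the last `window` positions
        idx = torch.arange(T, device=x.device)
        mask = (idx[None, :] <= idx[:, None]) & (idx[:, None] - idx[None, :] < self.window)
        scores = (q @ k.transpose(-2, -1)) / hd ** 0.5
        scores = scores.masked_fill(~mask, float("-inf"))
        y = F.softmax(scores, dim=-1) @ v
        return y.transpose(1, 2).reshape(B, T, D)

    def _toy_ssm(self, x):
        # stand-in for Mamba2: a per-channel gated exponential moving average,
        # playing the "long-range fading memory" role of the SSM branch
        v, g = self.ssm_in(x).chunk(2, dim=-1)
        a = torch.sigmoid(self.decay)            # decay in (0, 1)
        h = torch.zeros_like(v[:, 0])
        outs = []
        for t in range(x.shape[1]):
            h = a * h + (1 - a) * v[:, t]
            outs.append(h)
        return torch.stack(outs, dim=1) * torch.sigmoid(g)

    def forward(self, x):
        attn = self.norm_attn(self._sliding_window_attention(x))
        ssm = self.norm_ssm(self._toy_ssm(x))
        return x + self.out(0.5 * (attn + ssm))  # fuse the two branches by simple averaging

x = torch.randn(2, 128, 256)
print(ParallelHybridBlock()(x).shape)            # torch.Size([2, 128, 256])

In the full model a Mamba2 head replaces the toy recurrence; the sketch only illustrates the per-layer "attention branch plus SSM branch, then fuse" structure that the recap describes.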
