Showing 10 changed files with 3,562 additions and 94 deletions.
Empty file.
@@ -0,0 +1,24 @@
---
title: "Awesome T5"
date: 2024-05-06T17:31:29+02:00
draft: true
---

Here is a bunch of my posts about a super cool model called T5, its extensions, and its applications.

# Basics
- [T5, FLAN-T5, UL2]({{< relref "posts/t5-the-old-new-thing.md" >}})
- [LongT5, CoLT5]({{< relref "posts/longer-context-for-t5.md" >}})

# Coding T5
- CodeT5, CodeT5+
- AST-T5
- CodeFusion

# Cyber Security
- LENS

# With Graphs
- Graph Neural Prompting
@@ -0,0 +1,60 @@
---
title: "Longer Context for T5"
date: 2024-04-29T14:10:36+02:00
draft: true
---

# Why does T5 need a longer context?
In my previous post
[T5 the Old New Thing]({{< relref "posts/t5-the-old-new-thing.md" >}})
we already explored why T5 is awesome. One downside, however, is its limited context length of 512 tokens. Technically this cannot be compared directly to the context length of a decoder-only model: T5 is an encoder-decoder, so the encoder can process inputs of up to 512 tokens and the decoder can generate outputs of up to 512 tokens, making the total context length 1024 tokens (the short sketch after the list below shows the 512-token default in practice). We will cover two extensions:
1. [LongT5](https://arxiv.org/abs/2112.07916)
2. [CoLT5](https://arxiv.org/abs/2303.09752)
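To make the 512-token budget concrete, here is a minimal sketch; it assumes the Hugging Face `transformers` package and the public `t5-base` checkpoint, which are not part of the discussion above.

```python
# Minimal sketch (assumes Hugging Face `transformers` and the public "t5-base"
# checkpoint): the tokenizer reports the 512-token length T5 was trained with,
# and encoder inputs get truncated to it.
from transformers import T5TokenizerFast

tok = T5TokenizerFast.from_pretrained("t5-base")
print(tok.model_max_length)  # 512

enc = tok("summarize: " + "a very long document " * 500,
          truncation=True, max_length=tok.model_max_length)
print(len(enc.input_ids))    # 512 -- anything past the limit is simply cut off
```

On the decoder side the 512 tokens are what T5 was trained to produce; at generation time the output length is typically capped through arguments like `max_new_tokens`.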
Both LongT5 and CoLT5 explore ways to extend the context length of the encoder part of T5. In other words, we are looking at how to process longer inputs, not necessarily at how to generate longer outputs. This is especially useful for tasks like text summarization or document question answering.
# LongT5

Originally published in 2022, LongT5 uses a pretraining strategy called Pegasus and explores Local Attention and a novel TGlobal attention mechanism in the encoder.
## Pegasus
Pegasus is a pretraining strategy specially designed for abstractive summarization: we mask out key (principal) sentences from a document and ask the model to reproduce them as a single string, as if it were a summary.
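As a toy illustration (my own example sentences, mask token, and choice of key sentences; the real Pegasus objective selects principal sentences with an importance heuristic such as ROUGE), the source/target pair could be built like this:

```python
# Toy sketch of gap-sentence masking, not the original Pegasus code:
# key sentences are replaced with a mask token and become the target "summary".
doc = [
    "T5 casts every NLP task as text-to-text.",      # pretend this is a key sentence
    "It was trained on the C4 corpus.",
    "The encoder-decoder setup makes it flexible.",  # and this one too
]
key_idx = {0, 2}  # chosen by hand here; Pegasus picks them with a scoring heuristic

source = " ".join("<mask_sent>" if i in key_idx else s for i, s in enumerate(doc))
target = " ".join(doc[i] for i in sorted(key_idx))

print(source)  # document with the key sentences masked out
print(target)  # the pseudo-summary the model must generate
```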
## Local Attention
This is simply sliding-window attention: any given token can attend to a neighborhood of $l$ tokens around it.
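Here is a minimal NumPy sketch of such a mask; the window size and sequence length are made-up values for illustration:

```python
# Sliding-window (local) attention mask: token i may attend to token j
# only if |i - j| <= l.
import numpy as np

def local_attention_mask(seq_len: int, l: int) -> np.ndarray:
    idx = np.arange(seq_len)
    return np.abs(idx[:, None] - idx[None, :]) <= l

print(local_attention_mask(6, 1).astype(int))
# Each row has at most 2*l + 1 ones, so the attention cost grows as
# O(seq_len * l) instead of O(seq_len^2).
```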
## TGlobal
In TGlobal we partition the input tokens into chunks of length $k$ and, for each chunk, compute a global token by summing up the individual token embeddings. When we perform attention, each token attends to its local window of $l$ tokens with all of the global tokens appended to it.
 | ||
|
||
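As a rough sketch of the chunk-and-sum step (my own NumPy simplification, not the LongT5 code; the chunk length `k` and dimensions are made up):

```python
# Form one "global" token per chunk of k input tokens by summing embeddings.
import numpy as np

def tglobal_tokens(x: np.ndarray, k: int) -> np.ndarray:
    """x: (seq_len, d_model) token embeddings; returns (seq_len // k, d_model)."""
    seq_len, d_model = x.shape
    usable = (seq_len // k) * k                      # drop the ragged tail for simplicity
    return x[:usable].reshape(-1, k, d_model).sum(axis=1)

x = np.random.randn(512, 64)
g = tglobal_tokens(x, k=16)
print(g.shape)  # (32, 64) -- every token attends to its local window plus these 32 globals
```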
### Cons
Since we do not perform full attention, there is some minor performance degradation, and we need a couple of extra parameters. Computation-wise, the global tokens are computed on the fly, but they only need to be computed once per input sequence per layer and can be cached.
### Pros
We are able to process much longer inputs.
## Notes
It is worth noting that there is a variant of LongT5 that uses only Local Attention, without global tokens. This variant can be scaled to even longer sequences, but unfortunately the performance drop is also more pronounced.
# CoLT5
CoLT5 is a paper from 2023 that builds upon LongT5 by bringing in ideas like [Mixture of Experts](https://arxiv.org/abs/2101.03961) and [Multi-Query Attention](https://arxiv.org/abs/1911.02150).
## Conditional Computation

Each CoLT5 layer applies a cheap light branch to every token and a heavier branch only to a routed subset of "important" tokens, in both the attention and the feed-forward blocks.


### Routing
A learned router scores every token, and only the top-scoring tokens are sent through the heavy branch; a toy sketch of this routing follows below.
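Here is that toy sketch (my own NumPy simplification, not the official CoLT5 code; the sigmoid gate stands in for the paper's normalized soft top-k scores):

```python
# Conditional feed-forward: every token gets the light branch, only the
# q highest-scoring tokens also pay for the heavy branch.
import numpy as np

def conditional_ffn(x, w_light, w_heavy, router_emb, q):
    """x: (seq, d); w_light/w_heavy: (d, d); router_emb: (d,); q: routed tokens."""
    scores = x @ router_emb                        # one routing score per token
    top = np.argsort(scores)[-q:]                  # indices of the q "important" tokens
    out = np.maximum(x @ w_light, 0.0)             # cheap light branch for all tokens
    gate = 1.0 / (1.0 + np.exp(-scores[top]))      # crude gate; CoLT5 normalizes scores
    out[top] += gate[:, None] * np.maximum(x[top] @ w_heavy, 0.0)  # heavy branch, routed only
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 16))
y = conditional_ffn(x, rng.normal(size=(16, 16)), rng.normal(size=(16, 16)),
                    rng.normal(size=16), q=4)
print(y.shape)  # (32, 16) -- all tokens updated, only 4 paid the heavy-branch cost
```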
## Decoder
During output generation, long input sequences cause a memory-bandwidth bottleneck, so CoLT5 uses Multi-Query Attention (MQA) to speed up the decoding process. In MQA all the query heads share the same key-value pair.
 | ||
|
||
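A minimal NumPy sketch of the MQA idea (my own illustration with made-up dimensions, not CoLT5's actual implementation):

```python
# Multi-Query Attention: h query heads, but a single shared key head and a
# single shared value head, which shrinks the K/V cache streamed per decode step.
import numpy as np

def mqa(x, w_q, w_k, w_v):
    """x: (seq, d); w_q: (h, d, d_head); w_k, w_v: (d, d_head) shared by all heads."""
    q = np.einsum("sd,hde->hse", x, w_q)            # per-head queries: (h, seq, d_head)
    k = x @ w_k                                      # shared keys:      (seq, d_head)
    v = x @ w_v                                      # shared values:    (seq, d_head)
    scores = q @ k.T / np.sqrt(k.shape[-1])          # (h, seq, seq)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)        # softmax over keys
    return weights @ v                               # (h, seq, d_head)

rng = np.random.default_rng(0)
out = mqa(rng.normal(size=(10, 32)), rng.normal(size=(4, 32, 8)),
          rng.normal(size=(32, 8)), rng.normal(size=(32, 8)))
print(out.shape)  # (4, 10, 8) -- the K/V cache is h times smaller than in MHA
```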
### Performance
Vanilla Multi-Head Attention tends to have the highest accuracy, but it requires the most memory and is the slowest at generation. By letting query heads share key-value pairs we reduce the memory requirements and improve token-generation speed. Unfortunately this speedup comes at a cost: we lose some accuracy.
## Cons
Whereas LongT5 has an open-source implementation on Hugging Face, CoLT5 lacks a comparable official public implementation.