Add initial draft for extending t5
n1o committed May 7, 2024
1 parent 655a63a commit b045edf
Showing 10 changed files with 3,562 additions and 94 deletions.
Empty file added .hugo_build.lock
Empty file.
9 changes: 7 additions & 2 deletions config.toml
Original file line number Diff line number Diff line change
@@ -110,11 +110,16 @@ disqusShortname = "mbarak-io"
url = "posts/"

[[languages.en.menu.main]]
name = "Projects"
name = "Awesome T5"
weight = 3
url = "awesome-t5/"

[[languages.en.menu.main]]
name = "Projects"
weight = 4
url = "projects/"

[[languages.en.menu.main]]
name = "Contact me"
weight = 4
weight = 5
url = "contact/"
24 changes: 24 additions & 0 deletions content/awesome-t5.md
@@ -0,0 +1,24 @@
---
title: "Awesome T5"
date: 2024-05-06T17:31:29+02:00
draft: true
---

Here is a collection of my posts about a super cool model called T5, its extensions, and its applications.

# Basics
- [T5, FLAN-T5, UL2]({{< relref "posts/t5-the-old-new-thing.md" >}})
- [LongT5, CoLT5]({{< relref "posts/longer-context-for-t5.md" >}})


# Coding T5
- CodeT5, CodeT5+
- AST-T5
- CodeFusion

# Cyber Security
- LENS

# With Graphs
- Graph Neural Prompting

60 changes: 60 additions & 0 deletions content/posts/longer-context-for-t5.md
@@ -0,0 +1,60 @@
---
title: "Longer Context for T5"
date: 2024-04-29T14:10:36+02:00
draft: true
---

# Why does T5 need a longer context?

In my previous post
[T5 the Old New Thing]({{< relref "posts/t5-the-old-new-thing.md" >}})
we already explored why T5 is awesome. One downside is its limited context length of 512 tokens. Technically, this cannot be compared directly to the context length of a decoder-only model: T5 is an encoder-decoder model, so the encoder can process up to 512 input tokens and the decoder can generate up to 512 output tokens, for a total context length of 1024 tokens. We will cover two extensions:
1. [LongT5](https://arxiv.org/abs/2112.07916)
2. [CoLT5](https://arxiv.org/abs/2303.09752)

Both LongT5 and CoLT5 explore ways to extend the context length of the encoder part of T5. This means we are looking at how to process longer inputs, not necessarily how to generate longer texts. This approach is especially useful for problems like text summarization or document question answering.

# LongT5

LongT5 was published in 2022. It adopts the pretraining strategy from Pegasus and explores using Local Attention and a novel TGlobal (Transient Global) attention in the encoder.

## Pegasus
Pegasus is a pretraining strategy specially designed for abstractive summarization: we mask out key (principal) sentences from a document and ask the model to reproduce them as a single string, as if it were a summary.
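
To make the masking step concrete, here is a minimal sketch in plain Python. It is only an illustration under my own assumptions: the sentence splitter is naive, the `<mask_1>` sentinel is a placeholder, and the unigram F1 score is a crude stand-in for the ROUGE-based sentence scoring. All function names are hypothetical.

```python
import re
from collections import Counter

MASK = "<mask_1>"  # assumed placeholder; a real tokenizer defines its own sentinel


def split_sentences(document):
    """Very naive sentence splitter, good enough for an illustration."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]


def overlap_f1(candidate, reference):
    """Unigram F1 between two token lists, a crude stand-in for ROUGE-1."""
    common = sum((Counter(candidate) & Counter(reference)).values())
    if common == 0:
        return 0.0
    precision = common / len(candidate)
    recall = common / len(reference)
    return 2 * precision * recall / (precision + recall)


def gap_sentence_example(document, num_masked=1):
    """Mask the sentences that overlap most with the rest of the document."""
    sentences = split_sentences(document)
    tokens = [s.lower().split() for s in sentences]
    scores = []
    for i, sent in enumerate(tokens):
        # Score each sentence against all the other sentences.
        rest = [tok for j, other in enumerate(tokens) if j != i for tok in other]
        scores.append(overlap_f1(sent, rest))
    # Pick the highest scoring ("principal") sentences, kept in document order.
    ranked = sorted(range(len(sentences)), key=lambda i: -scores[i])
    top = sorted(ranked[:num_masked])
    model_input = " ".join(MASK if i in top else s for i, s in enumerate(sentences))
    target = " ".join(sentences[i] for i in top)
    return model_input, target
```

The function returns the pair the model would be trained on: the document with the selected sentences replaced by the mask, and the masked sentences concatenated into one target string.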

## Local Attention
This is just sliding-window attention: any given token can attend only to a neighborhood of $l$ tokens around it.
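
A sliding window is easiest to see as a boolean mask. The sketch below is a toy in plain Python, not the LongT5 implementation; I assume the neighborhood is given as a `radius` reaching that many tokens to each side, which is my own parameterization of $l$.

```python
def local_attention_mask(seq_len, radius):
    """mask[i][j] is True when token i may attend to token j."""
    return [[abs(i - j) <= radius for j in range(seq_len)] for i in range(seq_len)]


def count_attended_pairs(mask):
    """Number of (query, key) pairs that are actually computed."""
    return sum(sum(row) for row in mask)


if __name__ == "__main__":
    mask = local_attention_mask(seq_len=6, radius=1)
    for row in mask:
        # Print the band structure: x = attended, . = skipped.
        print("".join("x" if allowed else "." for allowed in row))
```

The number of attended pairs grows roughly as `seq_len * (2 * radius + 1)` instead of `seq_len ** 2`, which is where the savings for long inputs come from.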

## TGlobal
In TGlobal we partition the input tokens into chunks of length $k$. For each chunk we compute a global token by summing up the individual token embeddings. When we perform attention, we take a local window $l$ and append all the global tokens to it.

![TGlobal](/images/t_global_attention.png)
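
The chunk-and-append idea described above can be sketched in a few lines of plain Python. This is a simplified illustration: embeddings are lists of floats, the function names and the `radius` parameter are my own, and the sketch skips any normalization or learned parameters the real model applies to the global tokens.

```python
def transient_global_tokens(embeddings, chunk_size):
    """Sum the token embeddings of every chunk into one global token."""
    global_tokens = []
    for start in range(0, len(embeddings), chunk_size):
        chunk = embeddings[start:start + chunk_size]
        # zip(*chunk) groups values by embedding dimension.
        global_tokens.append([sum(dim) for dim in zip(*chunk)])
    return global_tokens


def tglobal_keys(embeddings, position, radius, chunk_size):
    """Keys visible to one token: its local window plus all global tokens."""
    lo = max(0, position - radius)
    hi = min(len(embeddings), position + radius + 1)
    return embeddings[lo:hi] + transient_global_tokens(embeddings, chunk_size)
```

Each token therefore sees at most `2 * radius + 1` local keys plus one global key per chunk, so the number of keys per token stays far below the full sequence length.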

### Cons
Since we do not perform full attention, we experience some minor performance degradation, and we need a couple of extra parameters. Computation-wise, the global tokens are computed on the fly, but they only need to be computed once per input per layer and can then be cached.

### Pros
We are able to process much longer inputs.

## Notes
It is worth noting that there is a variant of LongT5 that uses only Local Attention without global tokens. This variant can be scaled up to even longer sequences; unfortunately, it also suffers a more pronounced performance drop.

# CoLT5
CoLT5 is a paper from 2023 that builds upon LongT5 by bringing in ideas like [Mixture of Experts](https://arxiv.org/abs/2101.03961) and [Multi Query Attention](https://arxiv.org/abs/1911.02150).

## Conditional Computation

![ColT5 Attention](/images/colt5_transformer_layer.png)

### Routing

## Decoder
During output generation, long input sequences cause a memory-bandwidth bottleneck. CoLT5 uses Multi Query Attention (MQA) to speed up the decoding process. In MQA, all query heads share a single set of keys and values.

![Grouped Attention Variants](/images/klu-gqa-grouped-query-attention.webp)
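
The sharing in MQA can be shown with a toy implementation in plain Python. This is a sketch under simplifying assumptions (no learned projections, no batching, vectors as lists of floats), and the function names are mine, not from any library.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(len(values[0]))]


def multi_query_attention(head_queries, shared_keys, shared_values):
    """Every query head reads the same keys and values."""
    return [attend(q, shared_keys, shared_values) for q in head_queries]
```

In vanilla multi-head attention each head would receive its own `keys` and `values`; here only the queries differ per head, so a single key-value set has to be stored and read during decoding.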

### Performance
Vanilla Multi Head Attention tends to have the highest accuracy, but it requires the most memory and is the slowest at generation. By allowing query heads to share key-value pairs, we decrease the memory requirements and improve token generation speed. Unfortunately, this speedup comes at a cost: we lose some accuracy.
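
A quick back-of-the-envelope calculation shows where the memory savings come from. The model dimensions below are made-up illustrative values, not the configuration of any released checkpoint, and the formula only counts the cached keys and values the decoder has to read at every step.

```python
def kv_cache_bytes(seq_len, num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Size of the cached keys and values for one sequence."""
    # The leading factor 2 accounts for storing both keys and values.
    return 2 * seq_len * num_layers * num_kv_heads * head_dim * bytes_per_value


if __name__ == "__main__":
    # Hypothetical model: 12 layers, 12 heads of size 64, 16k input tokens, 16-bit values.
    shape = dict(seq_len=16384, num_layers=12, head_dim=64)
    multi_head = kv_cache_bytes(num_kv_heads=12, **shape)
    multi_query = kv_cache_bytes(num_kv_heads=1, **shape)
    print(f"multi-head:  {multi_head / 2**20:.0f} MiB")
    print(f"multi-query: {multi_query / 2**20:.0f} MiB")
```

With a single shared key-value head, the cache shrinks by a factor equal to the number of query heads, and so does the amount of memory that has to be streamed for every generated token.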


## Cons
Where with LongT5 we have an open source implementation at hug
1 change: 1 addition & 0 deletions content/posts/t5-the-old-new-thing.md
@@ -129,3 +129,4 @@ Furthermore, T5 boasts runtime requirements that are approximately half those of

# Disclaimer
Since I am not a native English speaker, I use ChatGPT to help me with the text (formatting, spelling, etc.). However, I did write every single word in this blog post. If you are interested, you can check the original text [here](https://github.com/n1o/n1o.github.io/blob/master/content/posts/t5-the-old-new-thing.md)

Original file line number Diff line number Diff line change
@@ -10,16 +10,65 @@ body.colorscheme-dark {
body.colorscheme-dark h5,
body.colorscheme-dark h6 {
color: #dadada; }
body.colorscheme-dark h1:hover .heading-link,
body.colorscheme-dark h2:hover .heading-link,
body.colorscheme-dark h3:hover .heading-link,
body.colorscheme-dark h4:hover .heading-link,
body.colorscheme-dark h5:hover .heading-link,
body.colorscheme-dark h6:hover .heading-link {
visibility: visible; }
body.colorscheme-dark h1 .heading-link,
body.colorscheme-dark h2 .heading-link,
body.colorscheme-dark h3 .heading-link,
body.colorscheme-dark h4 .heading-link,
body.colorscheme-dark h5 .heading-link,
body.colorscheme-dark h6 .heading-link {
color: #42a5f5;
font-weight: inherit;
text-decoration: none;
font-size: 80%;
visibility: hidden; }
body.colorscheme-dark h1 .title-link,
body.colorscheme-dark h2 .title-link,
body.colorscheme-dark h3 .title-link,
body.colorscheme-dark h4 .title-link,
body.colorscheme-dark h5 .title-link,
body.colorscheme-dark h6 .title-link {
color: inherit;
font-weight: inherit;
text-decoration: none; }
body.colorscheme-dark code {
background-color: #424242;
color: #dadada; }
body.colorscheme-dark pre code {
body.colorscheme-dark .highlight pre {
background-color: #424242;
color: #212121; }
body.colorscheme-dark .highlight pre code {
background-color: inherit;
color: inherit; }
body.colorscheme-dark :not(.highlight) > pre code {
background-color: inherit;
color: inherit; }
body.colorscheme-dark blockquote {
border-left: 2px solid #424242; }
body.colorscheme-dark table td, body.colorscheme-dark table th {
body.colorscheme-dark th,
body.colorscheme-dark td {
padding: 1.6rem; }
body.colorscheme-dark table {
border-collapse: collapse; }
body.colorscheme-dark table td,
body.colorscheme-dark table th {
border: 2px solid #dadada; }
body.colorscheme-dark table tr:first-child th {
border-top: 0; }
body.colorscheme-dark table tr:last-child td {
border-bottom: 0; }
body.colorscheme-dark table tr td:first-child,
body.colorscheme-dark table tr th:first-child {
border-left: 0; }
body.colorscheme-dark table tr td:last-child,
body.colorscheme-dark table tr th:last-child {
border-right: 0; }

@media (prefers-color-scheme: dark) {
body.colorscheme-auto {
@@ -34,16 +83,72 @@ body.colorscheme-dark {
body.colorscheme-auto h5,
body.colorscheme-auto h6 {
color: #dadada; }
body.colorscheme-auto h1:hover .heading-link,
body.colorscheme-auto h2:hover .heading-link,
body.colorscheme-auto h3:hover .heading-link,
body.colorscheme-auto h4:hover .heading-link,
body.colorscheme-auto h5:hover .heading-link,
body.colorscheme-auto h6:hover .heading-link {
visibility: visible; }
body.colorscheme-auto h1 .heading-link,
body.colorscheme-auto h2 .heading-link,
body.colorscheme-auto h3 .heading-link,
body.colorscheme-auto h4 .heading-link,
body.colorscheme-auto h5 .heading-link,
body.colorscheme-auto h6 .heading-link {
color: #42a5f5;
font-weight: inherit;
text-decoration: none;
font-size: 80%;
visibility: hidden; }
body.colorscheme-auto h1 .title-link,
body.colorscheme-auto h2 .title-link,
body.colorscheme-auto h3 .title-link,
body.colorscheme-auto h4 .title-link,
body.colorscheme-auto h5 .title-link,
body.colorscheme-auto h6 .title-link {
color: inherit;
font-weight: inherit;
text-decoration: none; }
body.colorscheme-auto code {
background-color: #424242;
color: #dadada; }
body.colorscheme-auto pre code {
body.colorscheme-auto .highlight pre {
background-color: #424242;
color: #212121; }
body.colorscheme-auto .highlight pre code {
background-color: inherit;
color: inherit; }
body.colorscheme-auto :not(.highlight) > pre code {
background-color: inherit;
color: inherit; }
body.colorscheme-auto blockquote {
border-left: 2px solid #424242; }
body.colorscheme-auto table td, body.colorscheme-auto table th {
border: 2px solid #dadada; } }
body.colorscheme-auto th,
body.colorscheme-auto td {
padding: 1.6rem; }
body.colorscheme-auto table {
border-collapse: collapse; }
body.colorscheme-auto table td,
body.colorscheme-auto table th {
border: 2px solid #dadada; }
body.colorscheme-auto table tr:first-child th {
border-top: 0; }
body.colorscheme-auto table tr:last-child td {
border-bottom: 0; }
body.colorscheme-auto table tr td:first-child,
body.colorscheme-auto table tr th:first-child {
border-left: 0; }
body.colorscheme-auto table tr td:last-child,
body.colorscheme-auto table tr th:last-child {
border-right: 0; } }

body.colorscheme-dark .content .post .tags .tag {
background-color: #424242; }
body.colorscheme-dark .content .post .tags .tag a {
color: #dadada; }
body.colorscheme-dark .content .post .tags .tag a:active {
color: #dadada; }

body.colorscheme-dark .content .list ul li .title {
color: #dadada; }
@@ -56,6 +161,12 @@ body.colorscheme-dark .content .centered .about ul li a {
color: #42a5f5; }

@media (prefers-color-scheme: dark) {
body.colorscheme-auto .content .post .tags .tag {
background-color: #424242; }
body.colorscheme-auto .content .post .tags .tag a {
color: #dadada; }
body.colorscheme-auto .content .post .tags .tag a:active {
color: #dadada; }
body.colorscheme-auto .content .list ul li .title {
color: #dadada; }
body.colorscheme-auto .content .list ul li .title:hover, body.colorscheme-auto .content .list ul li .title:focus {
@@ -65,7 +176,15 @@ body.colorscheme-dark .content .centered .about ul li a {
body.colorscheme-auto .content .centered .about ul li a:hover, body.colorscheme-auto .content .centered .about ul li a:focus {
color: #42a5f5; } }

body.colorscheme-dark .navigation a, body.colorscheme-dark .navigation span {
body.colorscheme-dark .notice .notice-title {
border-bottom: 1px solid #212121; }

@media (prefers-color-scheme: dark) {
body.colorscheme-auto .notice .notice-title {
border-bottom: 1px solid #212121; } }

body.colorscheme-dark .navigation a,
body.colorscheme-dark .navigation span {
color: #dadada; }

body.colorscheme-dark .navigation a:hover, body.colorscheme-dark .navigation a:focus {
@@ -94,7 +213,8 @@ body.colorscheme-dark .navigation .menu-button i:hover, body.colorscheme-dark .n
color: #dadada; }

@media (prefers-color-scheme: dark) {
body.colorscheme-auto .navigation a, body.colorscheme-auto .navigation span {
body.colorscheme-auto .navigation a,
body.colorscheme-auto .navigation span {
color: #dadada; }
body.colorscheme-auto .navigation a:hover, body.colorscheme-auto .navigation a:focus {
color: #42a5f5; } }
@@ -121,6 +241,42 @@ body.colorscheme-dark .navigation .menu-button i:hover, body.colorscheme-dark .n
body.colorscheme-auto .navigation .menu-button i:hover, body.colorscheme-auto .navigation .menu-button i:focus {
color: #dadada; } }

body.colorscheme-dark .tabs label.tab-label {
background-color: #424242;
border-color: #4f4f4f; }

body.colorscheme-dark .tabs input.tab-input:checked + label.tab-label {
background-color: #212121; }

body.colorscheme-dark .tabs .tab-content {
background-color: #212121;
border-color: #4f4f4f; }

@media (prefers-color-scheme: dark) {
body.colorscheme-auto .tabs label.tab-label {
background-color: #424242;
border-color: #4f4f4f; }
body.colorscheme-auto .tabs input.tab-input:checked + label.tab-label {
background-color: #212121; }
body.colorscheme-auto .tabs .tab-content {
background-color: #212121;
border-color: #4f4f4f; } }

body.colorscheme-dark .taxonomy-element {
background-color: #424242; }
body.colorscheme-dark .taxonomy-element a {
color: #dadada; }
body.colorscheme-dark .taxonomy-element a:active {
color: #dadada; }

@media (prefers-color-scheme: dark) {
body.colorscheme-auto .taxonomy-element {
background-color: #424242; }
body.colorscheme-auto .taxonomy-element a {
color: #dadada; }
body.colorscheme-auto .taxonomy-element a:active {
color: #dadada; } }

body.colorscheme-dark .footer a {
color: #42a5f5; }
