Commit

minor
vaibhavad committed Apr 9, 2024
1 parent fdc51a5 commit afd9a67
Showing 2 changed files with 11 additions and 4 deletions.
7 changes: 7 additions & 0 deletions docs/_includes/head/custom.html
@@ -45,4 +45,11 @@
const theme = sessionStorage.getItem('theme');
updateNodesRel(theme);
</script>
+<!-- MathJax -->
+
+<script type="text/javascript"
+
+  src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.3/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
+
+</script>
{% endif %}
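The added script loads MathJax 2.7.3 with the TeX-AMS-MML_HTMLorMML combined configuration, whose default delimiters are \( ... \) for inline math and $$ ... $$ (or \[ ... \]) for display math. As a minimal sketch of what a docs page can now render (the equation below is only an illustration, using the standard scaled dot-product attention formula rather than anything taken from the tutorial):

```latex
% Display math inside a docs Markdown page; MathJax picks up $$ ... $$ by default.
$$ \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^\top}{\sqrt{d_k}}\right) V $$
```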
8 changes: 4 additions & 4 deletions docs/_pages/tutorial.md
@@ -3,15 +3,15 @@ title: "LLM2Vec Tutorial: Steps for transforming any decoder-only model into a t
permalink: /tutorial/
---

-LLM2Vec consists of 3 simple steps to transform decoder-only LLMs into text encoders: 1) enabling bidirectional attention, 2) training with masked next token prediction, and 3) unsupervised contrastive learning. The model can be further fine-tuned with supervised data. Here, we provide a tutorial on how to use the LlaMA models.
+LLM2Vec consists of 3 simple steps to transform decoder-only LLMs into text encoders: 1) enabling bidirectional attention, 2) training with masked next token prediction, and 3) unsupervised contrastive learning. After the LLM2Vec transformation, the model can be further fine-tuned with supervised data. Here, we provide a tutorial on how to use the LLaMA models.

-This tutorial will focus on the first two steps. After these steps, the model can be trained for unsupervised or supervised contrastive learning like any other encoder model.
+This tutorial will focus on the first two steps. After completing these steps, the model can be trained for unsupervised or supervised contrastive learning like any other encoder model.

-In this tutorial, we will transform LlaMA models into text encoders, however, transforming Mistral will require similar steps. We will focus on modifying the flash attention implementation as it requires the least changes in the codebase, and the implementation is consistent across models and transformers versions. Our tutorial is based on transformers version 4.39.3.
+For the scope of this tutorial, we will showcase how to apply LLM2Vec to models from the LLaMA-2 model family. For simplicity, we focus on the FlashAttention implementation. The following steps have been tested using transformers version 4.39.3.
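Before touching any attention code, here is a minimal setup sketch of the environment this assumes: transformers 4.39.3 with the FlashAttention-2 implementation selected at load time. The checkpoint id and dtype below are our own choices for illustration, not prescribed by LLM2Vec.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint: the standard gated LLaMA-2 7B weights on the Hugging Face Hub.
model_id = "meta-llama/Llama-2-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,               # FlashAttention requires fp16/bf16 weights
    attn_implementation="flash_attention_2",  # select the FlashAttention-2 code path
)
```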

## 1) Enabling Bidirectional Attention

-A decoder-only causal LLM consists of multiple decoder layers, each of which has a self-attention mechanism.
+A decoder-only causal LLM consists of multiple decoder layers, each of which has a self-attention sub-layer.
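To see where that sub-layer lives, a short inspection sketch (assuming the model loaded above): in transformers, `LlamaForCausalLM` wraps a `LlamaModel` whose `layers` list holds the decoder layers, and each layer exposes its attention as `self_attn`.

```python
# Decoder stack: LlamaForCausalLM -> LlamaModel -> list of LlamaDecoderLayer modules.
decoder_layers = model.model.layers
print(f"number of decoder layers: {len(decoder_layers)}")  # 32 for LLaMA-2 7B

first_layer = decoder_layers[0]
# Each decoder layer contains a self-attention sub-layer and an MLP sub-layer.
print(type(first_layer.self_attn).__name__)  # LlamaFlashAttention2 with flash_attention_2
print(type(first_layer.mlp).__name__)        # LlamaMLP
```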

<p align="center">
<img src="https://github.com/McGill-NLP/llm2vec/blob/weblog/docs/assets/images/LLM2Vec-tutorial.png?raw=true" width="75%" alt="Llama Conceptual overview"/>
