Mistral's architecture is very similar to LLaMA's, with the following differences:
- Grouped-query attention (GQA), which reduces the number of attention heads used for keys and values.
- Sliding window attention (SWA) with a window of 4096 tokens, which attends to a local window of the sequence rather than the full sequence.
- A higher default maximum sequence length (MSL) of 32k, rather than 4k.
For more details on each technique, we refer to the original papers in the References section.
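To illustrate how GQA and SWA fit together, here is a minimal PyTorch sketch. This is not the implementation in this repository; the function names, shapes, and the toy window size are hypothetical, chosen only to show the two ideas: each key/value head is shared by a group of query heads, and the attention mask is restricted to a causal window of fixed width.

```python
import torch
import torch.nn.functional as F

def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    # True where position i may attend to position j: causal (j <= i) and
    # within the last `window` tokens (j > i - window).
    i = torch.arange(seq_len).unsqueeze(1)
    j = torch.arange(seq_len).unsqueeze(0)
    return (j <= i) & (j > i - window)

def gqa_swa_attention(q, k, v, window):
    # q: [batch, n_q_heads, seq, head_dim]; k, v: [batch, n_kv_heads, seq, head_dim]
    groups = q.shape[1] // k.shape[1]          # query heads per KV head
    k = k.repeat_interleave(groups, dim=1)     # share each KV head across its group
    v = v.repeat_interleave(groups, dim=1)
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    mask = sliding_window_causal_mask(q.shape[-2], window)
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

# Toy sizes: 8 query heads share 2 KV heads; a window of 4 stands in for 4096.
q = torch.randn(1, 8, 16, 64)
k = torch.randn(1, 2, 16, 64)
v = torch.randn(1, 2, 16, 64)
print(gqa_swa_attention(q, k, v, window=4).shape)  # torch.Size([1, 8, 16, 64])
```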
The Mistral code uses the same infrastructure as our GPT-2 implementation; we refer to the GPT-2 README for most instructions. This directory contains:
- configs/: YAML configuration files.
- run.py: Training script. Performs training and validation.
For convenience, we provide different configurations of common model setups for Mistral.
- params_mistral_7B.yaml: A 7B parameter model configured as described in the original paper.
- params_mistral_7B_msl128k.yaml: A 7B parameter model configured as above, but with support for a much higher maximum sequence length of 128k. Sliding window attention allows Mistral to remain efficient at these longer sequence lengths, as sketched below.
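The efficiency at long sequence lengths follows from the fixed attention window: as described in the Mistral 7B paper, the key/value cache used during generation can be a rolling buffer whose size is bounded by the window rather than by the sequence length. The sketch below illustrates that idea only; it is not this repository's implementation, and the class and parameter names are hypothetical.

```python
import torch

class RollingKVCache:
    """Fixed-size rolling key/value cache for a sliding window of `window` tokens."""

    def __init__(self, window: int, n_kv_heads: int, head_dim: int):
        self.window = window
        self.k = torch.zeros(window, n_kv_heads, head_dim)
        self.v = torch.zeros(window, n_kv_heads, head_dim)
        self.pos = 0  # total tokens seen so far

    def append(self, k_t: torch.Tensor, v_t: torch.Tensor) -> None:
        # Tokens more than `window` positions old can never be attended to again,
        # so their slots are simply overwritten in place.
        slot = self.pos % self.window
        self.k[slot], self.v[slot] = k_t, v_t
        self.pos += 1

cache = RollingKVCache(window=4096, n_kv_heads=8, head_dim=128)
for _ in range(32768):   # cache memory stays at 4096 slots regardless of length
    cache.append(torch.randn(8, 128), torch.randn(8, 128))
print(cache.k.shape)     # torch.Size([4096, 8, 128])
```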
Reference: Touvron, Hugo, et al. (2023). LLaMA: Open and Efficient Foundation Language Models.
Reference: Jiang, Albert, et al. (2023). Mistral 7B.
Reference: Ainslie, Joshua, et al. (2023). GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints.
Reference: Child, Rewon, et al. (2019). Generating Long Sequences with Sparse Transformers.