Fixed All Typos in docs (#2185)
* docs: fixed minor typo in quicktour.mdx

* docs: fixed missing closing quotation mark in onnxruntime/usage_guides/models.mdx

* docs: removed extra quotation mark at the end in onnxruntime/usage_guides/optimization.mdx

* docs: fixed spelling of NVIDIA in bettertransformer/overview.mdx

* docs: fixed other typos in bettertransformer/overview.mdx

* docs: fixed multiple typos in bettertransformer/tutorials/contribute.mdx

* docs: corrected minor typos and grammar in bettertransformer/tutorials/convert.mdx
ruidazeng authored Feb 11, 2025
1 parent afff2fa commit 180bddf
Showing 6 changed files with 11 additions and 11 deletions.
6 changes: 3 additions & 3 deletions docs/source/bettertransformer/overview.mdx
@@ -16,7 +16,7 @@ specific language governing permissions and limitations under the License.

## Quickstart

-Since its 1.13 version, [PyTorch released](https://pytorch.org/blog/PyTorch-1.13-release/) the stable version of a fast path for its standard Transformer APIs that provides out of the box performance improvements for transformer-based models. You can benefit from interesting speedup on most consumer-type devices, including CPUs, older and newer versions of NIVIDIA GPUs.
+Since its 1.13 version, [PyTorch released](https://pytorch.org/blog/PyTorch-1.13-release/) the stable version of a fast path for its standard Transformer APIs that provides out of the box performance improvements for transformer-based models. You can benefit from interesting speedup on most consumer-type devices, including CPUs, older and newer versions of NVIDIA GPUs.
You can now use this feature in 🤗 Optimum together with Transformers and use it for major models in the Hugging Face ecosystem.

In the 2.0 version, PyTorch includes a native scaled dot-product attention operator (SDPA) as part of `torch.nn.functional`. This function encompasses several implementations that can be applied depending on the inputs and the hardware in use. See the [official documentation](https://pytorch.org/docs/master/generated/torch.nn.functional.scaled_dot_product_attention) for more information, and [this blog post](https://pytorch.org/blog/out-of-the-box-acceleration/) for benchmarks.
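As a quick illustration of SDPA (not part of this commit's diff): a minimal sketch assuming PyTorch >= 2.0, with toy shapes chosen purely for the example.

```python
import torch
import torch.nn.functional as F

# Toy attention inputs: (batch, heads, sequence_length, head_dim).
query = torch.rand(1, 8, 16, 64)
key = torch.rand(1, 8, 16, 64)
value = torch.rand(1, 8, 16, 64)

# PyTorch dispatches to the fastest available backend for these inputs
# (FlashAttention, memory-efficient attention, or the math fallback).
output = F.scaled_dot_product_attention(query, key, value, is_causal=False)
print(output.shape)  # torch.Size([1, 8, 16, 64])
```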
@@ -54,13 +54,13 @@ The list of supported model below:
- [DeiT](https://arxiv.org/abs/2012.12877)
- [Electra](https://arxiv.org/abs/2003.10555)
- [Ernie](https://arxiv.org/abs/1904.09223)
-- [Falcon](https://arxiv.org/abs/2306.01116) (No need to use BetterTransformer, it is [directy supported by Transformers](https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-and-memory-efficient-attention-through-pytorchs-scaleddotproductattention))
+- [Falcon](https://arxiv.org/abs/2306.01116) (No need to use BetterTransformer, it is [directly supported by Transformers](https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-and-memory-efficient-attention-through-pytorchs-scaleddotproductattention))
- [FSMT](https://arxiv.org/abs/1907.06616)
- [GPT2](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)
- [GPT-j](https://huggingface.co/EleutherAI/gpt-j-6B)
- [GPT-neo](https://github.com/EleutherAI/gpt-neo)
- [GPT-neo-x](https://arxiv.org/abs/2204.06745)
-- [GPT BigCode](https://arxiv.org/abs/2301.03988) (SantaCoder, StarCoder - no need to use BetterTransformer, it is [directy supported by Transformers](https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-and-memory-efficient-attention-through-pytorchs-scaleddotproductattention))
+- [GPT BigCode](https://arxiv.org/abs/2301.03988) (SantaCoder, StarCoder - no need to use BetterTransformer, it is [directly supported by Transformers](https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-and-memory-efficient-attention-through-pytorchs-scaleddotproductattention))
- [HuBERT](https://arxiv.org/pdf/2106.07447.pdf)
- [LayoutLM](https://arxiv.org/abs/1912.13318)
- [Llama & Llama2](https://arxiv.org/abs/2302.13971) (No need to use BetterTransformer, it is [directy supported by Transformers](https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-and-memory-efficient-attention-through-pytorchs-scaleddotproductattention))
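For the models flagged above as directly supported by Transformers, a hedged sketch of what that looks like in practice (the model id is assumed for illustration; `attn_implementation="sdpa"` requires a recent Transformers release):

```python
from transformers import AutoModelForCausalLM

# For architectures such as Falcon or GPT BigCode, recent Transformers
# versions dispatch to PyTorch's SDPA natively, so BetterTransformer
# is not needed.
model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b",          # assumed model id for illustration
    attn_implementation="sdpa",  # request the scaled dot-product attention path
)
```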
4 changes: 2 additions & 2 deletions docs/source/bettertransformer/tutorials/contribute.mdx
@@ -112,7 +112,7 @@ Now, make sure to fill all the necessary attributes, the list of attributes are:

Note that these attributes correspond to all the components that are necessary to run a Transformer Encoder module, check the figure 1 on the ["Attention Is All You Need"](https://arxiv.org/pdf/1706.03762.pdf) paper.

-Once you filled all these attributes (sometimes the `query`, `key` and `value` layers needs to be "contigufied", check the [`modeling_encoder.py`](https://github.com/huggingface/optimum/blob/main/optimum/bettertransformer/models/encoder_models.py) file to understand more.)
+Once you filled all these attributes (sometimes the `query`, `key` and `value` layers needs to be "contiguified", check the [`modeling_encoder.py`](https://github.com/huggingface/optimum/blob/main/optimum/bettertransformer/models/encoder_models.py) file to understand more.)
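For context on "contiguified": the fastpath consumes the query/key/value projections packed into one contiguous tensor. A minimal sketch under assumed layer names and sizes (see `encoder_models.py` for the real code):

```python
import torch
import torch.nn as nn

# Stand-ins for a model's separate projection layers (names assumed).
query = nn.Linear(768, 768)
key = nn.Linear(768, 768)
value = nn.Linear(768, 768)

# Pack the three projections into single contiguous parameters, as the
# PyTorch fastpath expects one fused in-projection weight and bias.
in_proj_weight = torch.cat([query.weight, key.weight, value.weight]).contiguous()
in_proj_bias = torch.cat([query.bias, key.bias, value.bias]).contiguous()
```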

Make sure also to add the lines:
```python
@@ -125,7 +125,7 @@ self.validate_bettertransformer()

First of all, start with the line `super().forward_checker()`, this is needed so that the parent class can run all the safety checkers before.

-After the first forward pass, the hidden states needs to be *nested* using the attention mask. Once they are nested, the attention mask is not needed anymore, therefore can be set to `None`. This is how the forward pass is built for `Bert`, these lines should remain pretty much similar accross models, but sometimes the shapes of the attention masks are different across models.
+After the first forward pass, the hidden states needs to be *nested* using the attention mask. Once they are nested, the attention mask is not needed anymore, therefore can be set to `None`. This is how the forward pass is built for `Bert`, these lines should remain pretty much similar across models, but sometimes the shapes of the attention masks are different across models.
```python
super().forward_checker()

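To make the nesting step concrete, a rough sketch using PyTorch's public nested-tensor API (the actual Optimum implementation may use internal helpers; shapes and mask are toy values):

```python
import torch

batch_size, seq_len, hidden_size = 2, 4, 8
hidden_states = torch.rand(batch_size, seq_len, hidden_size)
# Boolean mask where True marks real (non-padding) tokens.
attention_mask = torch.tensor([[True, True, True, False],
                               [True, True, False, False]])

# Keep only the valid tokens of each sequence in a nested tensor; once
# nested, the attention mask carries no extra information, so drop it.
hidden_states = torch.nested.nested_tensor(
    [states[mask] for states, mask in zip(hidden_states, attention_mask)]
)
attention_mask = None
```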
6 changes: 3 additions & 3 deletions docs/source/bettertransformer/tutorials/convert.mdx
@@ -45,7 +45,7 @@ Sometimes you can directly load your model on your GPU devices using `accelerate

## Step 2: Set your model on your preferred device

-If you did not used `device_map="auto"` to load your model (or if your model does not support `device_map="auto"`), you can manually set your model to a GPU:
+If you did not use `device_map="auto"` to load your model (or if your model does not support `device_map="auto"`), you can manually set your model to a GPU:
```python
>>> model = model.to(0) # or model.to("cuda:0")
```
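For the `device_map="auto"` path mentioned above, a minimal sketch (requires `accelerate` to be installed; the model id is an assumption for illustration):

```python
>>> from transformers import AutoModelForCausalLM

>>> # With accelerate available, weights are dispatched automatically
>>> # across the detected devices (GPUs first, then CPU).
>>> model = AutoModelForCausalLM.from_pretrained("gpt2", device_map="auto")
```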
@@ -92,7 +92,7 @@ You can also use `transformers.pipeline` as usual and pass the converted model d
>>> ...
```
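The pipeline snippet is truncated in this hunk; a hedged reconstruction of the idea (task and checkpoint are assumptions for illustration, not the exact lines from the file):

```python
>>> from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline
>>> from optimum.bettertransformer import BetterTransformer

>>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
>>> model = AutoModelForMaskedLM.from_pretrained("distilbert-base-uncased")
>>> model = BetterTransformer.transform(model)

>>> # Pass the converted model directly to the pipeline.
>>> unmasker = pipeline("fill-mask", model=model, tokenizer=tokenizer)
>>> unmasker("BetterTransformer is a [MASK] optimization.")
```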

-Please refer to the [official documentation of `pipeline`](https://huggingface.co/docs/transformers/main_classes/pipelines) for further usage. If you face into any issue, do not hesitate to open an isse on GitHub!
+Please refer to the [official documentation of `pipeline`](https://huggingface.co/docs/transformers/main_classes/pipelines) for further usage. If you run into any issue, do not hesitate to open an issue on GitHub!

## Training compatibility

@@ -113,4 +113,4 @@ model = BetterTransformer.transform(model)
model = BetterTransformer.reverse(model)
model.save_pretrained("fine_tuned_model")
model.push_to_hub("fine_tuned_model")
-```
+```
2 changes: 1 addition & 1 deletion docs/source/onnxruntime/usage_guides/models.mdx
@@ -16,7 +16,7 @@ Once your model was [exported to the ONNX format](https://huggingface.co/docs/op
- from transformers import AutoModelForCausalLM
+ from optimum.onnxruntime import ORTModelForCausalLM

-- model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B) # PyTorch checkpoint
+- model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B") # PyTorch checkpoint
+ model = ORTModelForCausalLM.from_pretrained("onnx-community/Llama-3.2-1B", subfolder="onnx") # ONNX checkpoint
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")

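As a follow-up to the loading snippet above (not part of the diff), the ONNX model can then be used like its PyTorch counterpart; a minimal sketch with an assumed prompt:

```python
inputs = tokenizer("Once upon a time,", return_tensors="pt")
# ORTModelForCausalLM exposes generate(), mirroring the Transformers API.
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```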
2 changes: 1 addition & 1 deletion docs/source/onnxruntime/usage_guides/optimization.mdx
@@ -132,7 +132,7 @@ Below you will find an easy end-to-end example on how to optimize [distilbert-ba
```


-Below you will find an easy end-to-end example on how to optimize a Seq2Seq model [sshleifer/distilbart-cnn-12-6"](https://huggingface.co/sshleifer/distilbart-cnn-12-6).
+Below you will find an easy end-to-end example on how to optimize a Seq2Seq model [sshleifer/distilbart-cnn-12-6](https://huggingface.co/sshleifer/distilbart-cnn-12-6).

```python
>>> from transformers import AutoTokenizer
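The end-to-end example is truncated by the diff; a rough sketch of the typical `ORTOptimizer` flow (the save directory and optimization level are assumptions for illustration):

```python
>>> from optimum.onnxruntime import ORTModelForSeq2SeqLM, ORTOptimizer
>>> from optimum.onnxruntime.configuration import OptimizationConfig

>>> # Export the PyTorch checkpoint to ONNX and build an optimizer for it.
>>> model = ORTModelForSeq2SeqLM.from_pretrained("sshleifer/distilbart-cnn-12-6", export=True)
>>> optimizer = ORTOptimizer.from_pretrained(model)

>>> optimization_config = OptimizationConfig(optimization_level=2)  # assumed level
>>> optimizer.optimize(save_dir="distilbart_optimized", optimization_config=optimization_config)
```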
2 changes: 1 addition & 1 deletion docs/source/quicktour.mdx
@@ -185,7 +185,7 @@ Check out the [documentation](https://huggingface.co/docs/optimum/exporters/onnx

## PyTorch's BetterTransformer support

-[BetterTransformer](https://pytorch.org/blog/a-better-transformer-for-fast-transformer-encoder-inference/) is a free-lunch PyTorch-native optimization to gain x1.25 - x4 speedup on the inference of Transformer-based models. It has been marked as stable in [PyTorch 1.13](https://pytorch.org/blog/PyTorch-1.13-release/). We integrated BetterTransformer with the most-used models from the 🤗 Transformers libary, and using the integration is as simple as:
+[BetterTransformer](https://pytorch.org/blog/a-better-transformer-for-fast-transformer-encoder-inference/) is a free-lunch PyTorch-native optimization to gain x1.25 - x4 speedup on the inference of Transformer-based models. It has been marked as stable in [PyTorch 1.13](https://pytorch.org/blog/PyTorch-1.13-release/). We integrated BetterTransformer with the most-used models from the 🤗 Transformers library, and using the integration is as simple as:

```python
>>> from optimum.bettertransformer import BetterTransformer
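The quicktour snippet is cut off by the diff; based on the lines visible in convert.mdx above, the integration boils down to the following (model id assumed for illustration):

```python
>>> from transformers import AutoModel
>>> from optimum.bettertransformer import BetterTransformer

>>> model = AutoModel.from_pretrained("bert-base-uncased")
>>> model = BetterTransformer.transform(model)  # returns the accelerated model
```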
