diff --git a/docs/EasyAttentionExample.rst b/docs/attentionmodule_example.rst
similarity index 95%
rename from docs/EasyAttentionExample.rst
rename to docs/attentionmodule_example.rst
index 2c021b819..2db7a13aa 100644
--- a/docs/EasyAttentionExample.rst
+++ b/docs/attentionmodule_example.rst
@@ -1,7 +1,8 @@
-# AttentionModule
-
-## what is `AttentionModule`
+AttentionModule
+===============
+what is `AttentionModule`
+-------------------------
 AttentionModule is a EasyDeL module that can perform attention operation with different strategies to help user achieve
 the best possible performance and numerical stability, here are some strategies supported right now.
@@ -14,8 +15,8 @@ the best possible performance and numerical stability, here are some strategies
 7. Wise Ring attention via "wise_ring"
 8. sharded Attention with shard map known as "sharded_vanilla"
 
-## Example of Using Flash Attention on TPU
-
+Example of Using Flash Attention on TPU
+---------------------------------------
 ```python
 import jax
 import flax.linen.attention as flt
diff --git a/docs/CONTRIBUTING.rst b/docs/contributing.rst
similarity index 100%
rename from docs/CONTRIBUTING.rst
rename to docs/contributing.rst
diff --git a/docs/DataProcessing.rst b/docs/data_processing.rst
similarity index 100%
rename from docs/DataProcessing.rst
rename to docs/data_processing.rst
diff --git a/docs/EasyStateExample.md b/docs/easydelstate.rst
similarity index 100%
rename from docs/EasyStateExample.md
rename to docs/easydelstate.rst
diff --git a/docs/FineTuningExample.rst b/docs/finetuning_example.rst
similarity index 92%
rename from docs/FineTuningExample.rst
rename to docs/finetuning_example.rst
index 18a1770de..61b30bee2 100644
--- a/docs/FineTuningExample.rst
+++ b/docs/finetuning_example.rst
@@ -1,11 +1,11 @@
-## FineTuning Causal Language Model 🥵
-
+FineTuning Causal Language Model 🥵
+===================================
 with using EasyDeL FineTuning LLM (CausalLanguageModels) are easy as much as possible with using Jax and Flax
 and having the benefit of `TPUs` for the best speed here's a simple code to use in order to finetune your
 own Model
 
-_Days Has Been Passed and now using easydel in Jax is way more similar to HF/PyTorch Style
-now it's time to finetune our model_.
+Days Has Been Passed and now using easydel in Jax is way more similar to HF/PyTorch Style
+now it's time to finetune our model.
 
 ```python
 import jax.numpy
diff --git a/docs/LoRA-TransferLearningExample.rst b/docs/lora_transferlearning_example.rst
similarity index 96%
rename from docs/LoRA-TransferLearningExample.rst
rename to docs/lora_transferlearning_example.rst
index b0b50ee74..a6cf7b2fd 100644
--- a/docs/LoRA-TransferLearningExample.rst
+++ b/docs/lora_transferlearning_example.rst
@@ -1,5 +1,5 @@
-## EasyDeLXRapTure for layer tuning and LoRA
-
+EasyDeLXRapTure for layer tuning and LoRA
+-----------------------------------------
 in case of using LoRA and applying that on the EasyDeL models there are some other things
 that you might need to config on your own but a lot of things being handled by EasyDeL so let just jump into an example
 for LoRA fine-tuning section and use _EasyDeLXRapTure_ in for mistral models with flash attention example
diff --git a/docs/Parameter-Quantization.md b/docs/parameterquantization.rst
similarity index 100%
rename from docs/Parameter-Quantization.md
rename to docs/parameterquantization.rst
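
Reviewer note: for anyone cross-checking the renamed `attentionmodule_example.rst`, the sketch below is a minimal, runnable baseline using stock `flax.linen.attention` dot-product attention, the kind of reference that alternative strategies such as "flash" or "sharded_vanilla" are typically validated against. The tensor shapes are illustrative assumptions, not taken from this patch, and EasyDeL's actual `AttentionModule` constructor is intentionally not shown here; consult the doc itself for the real API.

```python
# Minimal sketch (assumed shapes, not from the patch): a "vanilla"
# dot-product attention baseline built from stock flax primitives.
import jax
import jax.numpy as jnp
import flax.linen.attention as flt

batch, seq_len, num_heads, head_dim = 2, 128, 8, 64
kq, kk, kv = jax.random.split(jax.random.PRNGKey(0), 3)

# flax expects inputs shaped [batch, length, num_heads, head_dim]
q = jax.random.normal(kq, (batch, seq_len, num_heads, head_dim))
k = jax.random.normal(kk, (batch, seq_len, num_heads, head_dim))
v = jax.random.normal(kv, (batch, seq_len, num_heads, head_dim))

# Reference output; a fused or sharded attention strategy should
# match this numerically within floating-point tolerance.
baseline = flt.dot_product_attention(q, k, v)
print(baseline.shape)  # (2, 128, 8, 64)
```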