From 2c752dde81633047b46e3ee942df33bd94e8011d Mon Sep 17 00:00:00 2001
From: erfanzar
Date: Tue, 28 May 2024 16:10:19 +0330
Subject: [PATCH] fixing documentation errors

---
 ...tentionExample.rst => attentionmodule_example.rst} | 11 ++++++-----
 docs/{CONTRIBUTING.rst => contributing.rst}           |  0
 docs/{DataProcessing.rst => data_processing.rst}      |  0
 docs/{EasyStateExample.md => easydelstate.rst}        |  0
 .../{FineTuningExample.rst => finetuning_example.rst} |  8 ++++----
 ...gExample.rst => lora_transferlearning_example.rst} |  4 ++--
 ...eter-Quantization.md => parameterquantization.rst} |  0
 7 files changed, 12 insertions(+), 11 deletions(-)
 rename docs/{EasyAttentionExample.rst => attentionmodule_example.rst} (95%)
 rename docs/{CONTRIBUTING.rst => contributing.rst} (100%)
 rename docs/{DataProcessing.rst => data_processing.rst} (100%)
 rename docs/{EasyStateExample.md => easydelstate.rst} (100%)
 rename docs/{FineTuningExample.rst => finetuning_example.rst} (92%)
 rename docs/{LoRA-TransferLearningExample.rst => lora_transferlearning_example.rst} (96%)
 rename docs/{Parameter-Quantization.md => parameterquantization.rst} (100%)

diff --git a/docs/EasyAttentionExample.rst b/docs/attentionmodule_example.rst
similarity index 95%
rename from docs/EasyAttentionExample.rst
rename to docs/attentionmodule_example.rst
index 2c021b819..2db7a13aa 100644
--- a/docs/EasyAttentionExample.rst
+++ b/docs/attentionmodule_example.rst
@@ -1,7 +1,8 @@
-# AttentionModule
-
-## what is `AttentionModule`
+AttentionModule
+========
+what is `AttentionModule`
+--------
 
 AttentionModule is a EasyDeL module that can perform attention operation with different strategies to help user achieve
 the best possible performance and numerical stability, here are some strategies supported right now.
 
@@ -14,8 +15,8 @@ the best possible performance and numerical stability, here are some strategies
 7. Wise Ring attention via "wise_ring"
 8. sharded Attention with shard map known as "sharded_vanilla"
 
-## Example of Using Flash Attention on TPU
-
+Example of Using Flash Attention on TPU
+--------
 ```python
 import jax
 import flax.linen.attention as flt
diff --git a/docs/CONTRIBUTING.rst b/docs/contributing.rst
similarity index 100%
rename from docs/CONTRIBUTING.rst
rename to docs/contributing.rst
diff --git a/docs/DataProcessing.rst b/docs/data_processing.rst
similarity index 100%
rename from docs/DataProcessing.rst
rename to docs/data_processing.rst
diff --git a/docs/EasyStateExample.md b/docs/easydelstate.rst
similarity index 100%
rename from docs/EasyStateExample.md
rename to docs/easydelstate.rst
diff --git a/docs/FineTuningExample.rst b/docs/finetuning_example.rst
similarity index 92%
rename from docs/FineTuningExample.rst
rename to docs/finetuning_example.rst
index 18a1770de..61b30bee2 100644
--- a/docs/FineTuningExample.rst
+++ b/docs/finetuning_example.rst
@@ -1,11 +1,11 @@
-## FineTuning Causal Language Model 🥵
-
+FineTuning Causal Language Model 🥵
+=====
 with using EasyDeL FineTuning LLM (CausalLanguageModels) are easy as much as possible with using Jax and Flax
 and having the benefit of `TPUs` for the best speed here's a simple code to use in order to finetune your
 own Model
 
-_Days Has Been Passed and now using easydel in Jax is way more similar to HF/PyTorch Style
-now it's time to finetune our model_.
+Days Has Been Passed and now using easydel in Jax is way more similar to HF/PyTorch Style
+now it's time to finetune our model.
 
 ```python
 import jax.numpy
diff --git a/docs/LoRA-TransferLearningExample.rst b/docs/lora_transferlearning_example.rst
similarity index 96%
rename from docs/LoRA-TransferLearningExample.rst
rename to docs/lora_transferlearning_example.rst
index b0b50ee74..a6cf7b2fd 100644
--- a/docs/LoRA-TransferLearningExample.rst
+++ b/docs/lora_transferlearning_example.rst
@@ -1,5 +1,5 @@
-## EasyDeLXRapTure for layer tuning and LoRA
-
+EasyDeLXRapTure for layer tuning and LoRA
+---------
 in case of using LoRA and applying that on the EasyDeL models there are some other things
 that you might need to config on your own but a lot of things being handled by EasyDeL so let just jump into an example
 for LoRA fine-tuning section and use _EasyDeLXRapTure_ in for mistral models with flash attention example
diff --git a/docs/Parameter-Quantization.md b/docs/parameterquantization.rst
similarity index 100%
rename from docs/Parameter-Quantization.md
rename to docs/parameterquantization.rst
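The attentionmodule_example.rst hunk above is cut off by the hunk boundary right after `import jax` and `import flax.linen.attention as flt`. As a minimal sketch of the plain dot-product attention those imports provide (this is not EasyDeL's `AttentionModule` API and not the TPU flash-attention path the renamed document goes on to show; all shapes and values below are assumed purely for illustration):

```python
import jax
import jax.numpy as jnp
import flax.linen.attention as flt

# Illustrative shapes only: (batch, sequence_length, num_heads, head_dim).
batch, seq_len, num_heads, head_dim = 2, 128, 8, 64
rng = jax.random.PRNGKey(0)
q_rng, k_rng, v_rng = jax.random.split(rng, 3)

query = jax.random.normal(q_rng, (batch, seq_len, num_heads, head_dim), dtype=jnp.float32)
key = jax.random.normal(k_rng, (batch, seq_len, num_heads, head_dim), dtype=jnp.float32)
value = jax.random.normal(v_rng, (batch, seq_len, num_heads, head_dim), dtype=jnp.float32)

# flax.linen.attention.dot_product_attention computes softmax(q @ k^T / sqrt(head_dim)) @ v.
out = flt.dot_product_attention(query, key, value)
print(out.shape)  # (2, 128, 8, 64)
```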