diff --git a/docs/EasyAttentionExample.rst b/docs/attentionmodule_example.rst
similarity index 95%
rename from docs/EasyAttentionExample.rst
rename to docs/attentionmodule_example.rst
index 2c021b819..2db7a13aa 100644
--- a/docs/EasyAttentionExample.rst
+++ b/docs/attentionmodule_example.rst
@@ -1,7 +1,8 @@
-# AttentionModule
-
-## what is `AttentionModule`
+AttentionModule
+===============
+what is `AttentionModule`
+-------------------------
 AttentionModule is a EasyDeL module that can perform attention operation with different strategies to help user achieve
 the best possible performance and numerical stability, here are some strategies supported right now.
@@ -14,8 +15,8 @@ the best possible performance and numerical stability, here are some strategies
 7. Wise Ring attention via "wise_ring"
 8. sharded Attention with shard map known as "sharded_vanilla"
 
-## Example of Using Flash Attention on TPU
-
+Example of Using Flash Attention on TPU
+---------------------------------------
 ```python
 import jax
 import flax.linen.attention as flt
diff --git a/docs/CONTRIBUTING.rst b/docs/contributing.rst
similarity index 100%
rename from docs/CONTRIBUTING.rst
rename to docs/contributing.rst
diff --git a/docs/DataProcessing.rst b/docs/data_processing.rst
similarity index 100%
rename from docs/DataProcessing.rst
rename to docs/data_processing.rst
diff --git a/docs/EasyStateExample.md b/docs/easydelstate.rst
similarity index 100%
rename from docs/EasyStateExample.md
rename to docs/easydelstate.rst
diff --git a/docs/FineTuningExample.rst b/docs/finetuning_example.rst
similarity index 92%
rename from docs/FineTuningExample.rst
rename to docs/finetuning_example.rst
index 18a1770de..61b30bee2 100644
--- a/docs/FineTuningExample.rst
+++ b/docs/finetuning_example.rst
@@ -1,11 +1,11 @@
-## FineTuning Causal Language Model 🥵
-
+FineTuning Causal Language Model 🥵
+===================================
 with using EasyDeL FineTuning LLM (CausalLanguageModels) are easy as much as possible with using Jax and Flax
 and having the benefit of `TPUs` for the best speed here's a simple code to use in order to finetune your
 own Model
 
-_Days Has Been Passed and now using easydel in Jax is way more similar to HF/PyTorch Style
-now it's time to finetune our model_.
+Days Has Been Passed and now using easydel in Jax is way more similar to HF/PyTorch Style
+now it's time to finetune our model.
 
 ```python
 import jax.numpy
diff --git a/docs/LoRA-TransferLearningExample.rst b/docs/lora_transferlearning_example.rst
similarity index 96%
rename from docs/LoRA-TransferLearningExample.rst
rename to docs/lora_transferlearning_example.rst
index b0b50ee74..a6cf7b2fd 100644
--- a/docs/LoRA-TransferLearningExample.rst
+++ b/docs/lora_transferlearning_example.rst
@@ -1,5 +1,5 @@
-## EasyDeLXRapTure for layer tuning and LoRA
-
+EasyDeLXRapTure for layer tuning and LoRA
+-----------------------------------------
 in case of using LoRA and applying that on the EasyDeL models there are some other things
 that you might need to config on your own but a lot of things being handled by EasyDeL so let just jump into an example
 for LoRA fine-tuning section and use _EasyDeLXRapTure_ in for mistral models with flash attention example
diff --git a/docs/Parameter-Quantization.md b/docs/parameterquantization.rst
similarity index 100%
rename from docs/Parameter-Quantization.md
rename to docs/parameterquantization.rst
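
Reviewer note: for anyone cross-checking the renamed `attentionmodule_example.rst`, the sketch below is a minimal, runnable baseline using stock `flax.linen.attention` dot-product attention, the kind of reference that alternative strategies such as "flash" or "sharded_vanilla" are typically validated against. The tensor shapes are illustrative assumptions, not taken from this patch, and EasyDeL's actual `AttentionModule` constructor is intentionally not shown here; consult the doc itself for the real API.

```python
# Minimal sketch (assumed shapes, not from the patch): a "vanilla"
# dot-product attention baseline built from stock flax primitives.
import jax
import jax.numpy as jnp
import flax.linen.attention as flt

batch, seq_len, num_heads, head_dim = 2, 128, 8, 64
kq, kk, kv = jax.random.split(jax.random.PRNGKey(0), 3)

# flax expects inputs shaped [batch, length, num_heads, head_dim]
q = jax.random.normal(kq, (batch, seq_len, num_heads, head_dim))
k = jax.random.normal(kk, (batch, seq_len, num_heads, head_dim))
v = jax.random.normal(kv, (batch, seq_len, num_heads, head_dim))

# Reference output; a fused or sharded attention strategy should
# match this numerically within floating-point tolerance.
baseline = flt.dot_product_attention(q, k, v)
print(baseline.shape)  # (2, 128, 8, 64)
```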