diff --git a/.github/workflows/build_documentation.yml b/.github/workflows/build_documentation.yml
new file mode 100644
index 000000000000..4e016d07d55d
--- /dev/null
+++ b/.github/workflows/build_documentation.yml
@@ -0,0 +1,17 @@
+name: Build documentation
+
+on:
+  push:
+    branches:
+      - main
+      - doc-builder*
+      - v*-release
+
+jobs:
+  build:
+    uses: huggingface/doc-builder/.github/workflows/build_main_documentation.yml@main
+    with:
+      commit_sha: ${{ github.sha }}
+      package: diffusers
+    secrets:
+      token: ${{ secrets.HUGGINGFACE_PUSH }}
diff --git a/.github/workflows/build_pr_documentation.yml b/.github/workflows/build_pr_documentation.yml
new file mode 100644
index 000000000000..d51623e735c5
--- /dev/null
+++ b/.github/workflows/build_pr_documentation.yml
@@ -0,0 +1,16 @@
+name: Build PR Documentation
+
+on:
+  pull_request:
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
+  cancel-in-progress: true
+
+jobs:
+  build:
+    uses: huggingface/doc-builder/.github/workflows/build_pr_documentation.yml@main
+    with:
+      commit_sha: ${{ github.event.pull_request.head.sha }}
+      pr_number: ${{ github.event.number }}
+      package: diffusers
diff --git a/.github/workflows/delete_doc_comment.yml b/.github/workflows/delete_doc_comment.yml
new file mode 100644
index 000000000000..238dc0bdbabf
--- /dev/null
+++ b/.github/workflows/delete_doc_comment.yml
@@ -0,0 +1,13 @@
+name: Delete dev documentation
+
+on:
+  pull_request:
+    types: [ closed ]
+
+
+jobs:
+  delete:
+    uses: huggingface/doc-builder/.github/workflows/delete_doc_comment.yml@main
+    with:
+      pr_number: ${{ github.event.number }}
+      package: diffusers
diff --git a/docs/source/_toctree.yml b/docs/source/_toctree.yml
new file mode 100644
index 000000000000..2c1fb407939a
--- /dev/null
+++ b/docs/source/_toctree.yml
@@ -0,0 +1,40 @@
+- sections:
+  - local: index
+    title: 🧨 Diffusers
+  - local: quicktour
+    title: Quicktour
+  - local: philosophy
+    title: Philosophy
+  title: Get started
+- sections:
+  - sections:
+    - local: examples/diffusers_for_vision
+      title: Diffusers for Vision
+    - local: examples/diffusers_for_audio
+      title: Diffusers for Audio
+    - local: examples/diffusers_for_other
+      title: Diffusers for Other Modalities
+    title: Examples
+  title: Using Diffusers
+- sections:
+  - sections:
+    - local: pipelines
+      title: Pipelines
+    - local: schedulers
+      title: Schedulers
+    - local: models
+      title: Models
+    title: Main Classes
+  - sections:
+    - local: pipelines/glide
+      title: "Glide"
+    title: Pipelines
+  - sections:
+    - local: schedulers/ddpm
+      title: "DDPM"
+    title: Schedulers
+  - sections:
+    - local: models/unet
+      title: "UNet"
+    title: Models
+  title: API
diff --git a/docs/source/examples/diffusers_for_audio.mdx b/docs/source/examples/diffusers_for_audio.mdx
new file mode 100644
index 000000000000..d16980586848
--- /dev/null
+++ b/docs/source/examples/diffusers_for_audio.mdx
@@ -0,0 +1,13 @@
+
+
+# Diffusers for audio
\ No newline at end of file
diff --git a/docs/source/examples/diffusers_for_other.mdx b/docs/source/examples/diffusers_for_other.mdx
new file mode 100644
index 000000000000..79fc7d551540
--- /dev/null
+++ b/docs/source/examples/diffusers_for_other.mdx
@@ -0,0 +1,20 @@
+
+
+# Diffusers for other modalities
+
+Diffusers offers support for modalities other than vision and audio.
+Currently, some examples include:
+- [Diffuser](https://diffusion-planning.github.io/) for planning in reinforcement learning (currently inference only): [Colab notebook](https://colab.research.google.com/drive/1TmBmlYeKUZSkUZoJqfBmaicVTKx6nN1R?usp=sharing)
+
+If you are interested in contributing to under-construction examples, you can explore:
+- [GeoDiff](https://github.com/MinkaiXu/GeoDiff) for generating 3D configurations of molecule diagrams: [Colab notebook](https://colab.research.google.com/drive/1pLYYWQhdLuv1q-JtEHGZybxp2RBF8gPs?usp=sharing).
\ No newline at end of file
diff --git a/docs/source/examples/diffusers_for_vision.mdx b/docs/source/examples/diffusers_for_vision.mdx
new file mode 100644
index 000000000000..624938f59d46
--- /dev/null
+++ b/docs/source/examples/diffusers_for_vision.mdx
@@ -0,0 +1,149 @@
+
+
+# Diffusers for vision
+
+## Direct image generation
+
+#### **Example image generation with PNDM**
+
+```python
+from diffusers import PNDM, UNetModel, PNDMScheduler
+import PIL.Image
+import numpy as np
+import torch
+
+model_id = "fusing/ddim-celeba-hq"
+
+model = UNetModel.from_pretrained(model_id)
+scheduler = PNDMScheduler()
+
+# load model and scheduler
+pndm = PNDM(unet=model, noise_scheduler=scheduler)
+
+# run pipeline in inference (sample random noise and denoise)
+with torch.no_grad():
+ image = pndm()
+
+# process image to PIL
+image_processed = image.cpu().permute(0, 2, 3, 1)
+image_processed = (image_processed + 1.0) / 2
+image_processed = torch.clamp(image_processed, 0.0, 1.0)
+image_processed = image_processed * 255
+image_processed = image_processed.numpy().astype(np.uint8)
+image_pil = PIL.Image.fromarray(image_processed[0])
+
+# save image
+image_pil.save("test.png")
+```
+
+#### **Example 1024x1024 image generation with SDE VE**
+
+See [paper](https://arxiv.org/abs/2011.13456) for more information on SDE VE.
+
+```python
+from diffusers import DiffusionPipeline
+import torch
+import PIL.Image
+import numpy as np
+
+torch.manual_seed(32)
+
+score_sde_ve = DiffusionPipeline.from_pretrained("fusing/ffhq_ncsnpp")
+
+# Note this might take up to 3 minutes on a GPU
+image = score_sde_ve(num_inference_steps=2000)
+
+image = image.permute(0, 2, 3, 1).cpu().numpy()
+image = np.clip(image * 255, 0, 255).astype(np.uint8)
+image_pil = PIL.Image.fromarray(image[0])
+
+# save image
+image_pil.save("test.png")
+```
+
+#### **Example 32x32 image generation with SDE VP**
+
+See [paper](https://arxiv.org/abs/2011.13456) for more information on SDE VP.
+
+```python
+from diffusers import DiffusionPipeline
+import torch
+import PIL.Image
+import numpy as np
+
+torch.manual_seed(32)
+
+score_sde_vp = DiffusionPipeline.from_pretrained("fusing/cifar10-ddpmpp-deep-vp")
+
+# Note this might take up to 3 minutes on a GPU
+image = score_sde_vp(num_inference_steps=1000)
+
+image = image.permute(0, 2, 3, 1).cpu().numpy()
+image = np.clip(image * 255, 0, 255).astype(np.uint8)
+image_pil = PIL.Image.fromarray(image[0])
+
+# save image
+image_pil.save("test.png")
+```
+
+
+#### **Text to Image generation with Latent Diffusion**
+
+_Note: To use latent diffusion install transformers from [this branch](https://github.com/patil-suraj/transformers/tree/ldm-bert)._
+
+```python
+import numpy as np
+import PIL.Image
+import torch
+
+from diffusers import DiffusionPipeline
+
+ldm = DiffusionPipeline.from_pretrained("fusing/latent-diffusion-text2im-large")
+
+generator = torch.manual_seed(42)
+
+prompt = "A painting of a squirrel eating a burger"
+image = ldm([prompt], generator=generator, eta=0.3, guidance_scale=6.0, num_inference_steps=50)
+
+image_processed = image.cpu().permute(0, 2, 3, 1)
+image_processed = image_processed * 255.
+image_processed = image_processed.numpy().astype(np.uint8)
+image_pil = PIL.Image.fromarray(image_processed[0])
+
+# save image
+image_pil.save("test.png")
+```
+
+
+## Text to speech generation
+
+```python
+import torch
+from diffusers import BDDMPipeline, GradTTSPipeline
+
+torch_device = "cuda"
+
+# load grad tts and bddm pipelines
+grad_tts = GradTTSPipeline.from_pretrained("fusing/grad-tts-libri-tts")
+bddm = BDDMPipeline.from_pretrained("fusing/diffwave-vocoder-ljspeech")
+
+text = "Hello world, I missed you so much."
+
+# generate mel spectrograms from text
+mel_spec = grad_tts(text, torch_device=torch_device)
+
+# generate speech by passing the mel spectrograms to the BDDM pipeline
+generator = torch.manual_seed(42)
+audio = bddm(mel_spec, generator, torch_device=torch_device)
+
+# save generated audio
+from scipy.io.wavfile import write as wavwrite
+sampling_rate = 22050
+wavwrite("generated_audio.wav", sampling_rate, audio.squeeze().cpu().numpy())
+```
+
diff --git a/docs/source/index.mdx b/docs/source/index.mdx
new file mode 100644
index 000000000000..28459117e0e4
--- /dev/null
+++ b/docs/source/index.mdx
@@ -0,0 +1,110 @@
+
+
+
+
+
+
+
+
+# 🧨 Diffusers
+
+
+🤗 Diffusers provides pretrained diffusion models across multiple modalities, such as vision and audio, and serves
+as a modular toolbox for inference and training of diffusion models.
+
+More precisely, 🤗 Diffusers offers:
+
+- State-of-the-art diffusion pipelines that can be run in inference with just a couple of lines of code (see [src/diffusers/pipelines](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines) and the example below).
+- Various noise schedulers that can be used interchangeably for the preferred speed vs. quality trade-off in inference (see [src/diffusers/schedulers](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers)).
+- Multiple types of models, such as UNet, that can be used as building blocks in an end-to-end diffusion system (see [src/diffusers/models](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models)).
+- Training examples to show how to train the most popular diffusion models (see [examples](https://github.com/huggingface/diffusers/tree/main/examples)).
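+
+As a quick example, an end-to-end diffusion system can be loaded and run in just a few lines (a minimal sketch reusing the `fusing/ffhq_ncsnpp` checkpoint from the vision examples; see the quicktour for a complete script):
+
+```python
+from diffusers import DiffusionPipeline
+
+# load the full diffusion system (model + scheduler) from the Hugging Face Hub
+pipeline = DiffusionPipeline.from_pretrained("fusing/ffhq_ncsnpp")
+
+# sample random noise and denoise it into a batch of image tensors
+# (this can take several minutes on a GPU)
+image = pipeline(num_inference_steps=2000)
+```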
+
+# Installation
+
+Install 🤗 Diffusers with PyTorch. Support for other frameworks will come in the future.
+
+🤗 Diffusers is tested on Python 3.6+ and PyTorch 1.4.0+.
+
+## Install with pip
+
+You should install 🤗 Diffusers in a [virtual environment](https://docs.python.org/3/library/venv.html).
+If you're unfamiliar with Python virtual environments, take a look at this [guide](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/).
+A virtual environment makes it easier to manage different projects and avoid compatibility issues between dependencies.
+
+Start by creating a virtual environment in your project directory:
+
+```bash
+python -m venv .env
+```
+
+Activate the virtual environment:
+
+```bash
+source .env/bin/activate
+```
+
+Now you're ready to install 🤗 Diffusers with the following command:
+
+```bash
+pip install diffusers
+```
+
+## Install from source
+
+Install 🤗 Diffusers from source with the following command:
+
+```bash
+pip install git+https://github.com/huggingface/diffusers
+```
+
+This command installs the bleeding edge `main` version rather than the latest `stable` version.
+The `main` version is useful for staying up-to-date with the latest developments, for instance when a bug has been fixed since the last official release but a new release hasn't been rolled out yet.
+However, this means the `main` version may not always be stable.
+We strive to keep the `main` version operational, and most issues are usually resolved within a few hours or a day.
+If you run into a problem, please open an [Issue](https://github.com/huggingface/diffusers/issues) so we can fix it even sooner!
+
+## Editable install
+
+You will need an editable install if you'd like to:
+
+* Use the `main` version of the source code.
+* Contribute to 🤗 Diffusers and need to test changes in the code.
+
+Clone the repository and install 🤗 Diffusers with the following commands:
+
+```bash
+git clone https://github.com/huggingface/diffusers.git
+cd diffusers
+pip install -e .
+```
+
+These commands link the folder you cloned the repository into with your Python library paths.
+Python will now look inside the folder you cloned to in addition to the normal library paths.
+For example, if your Python packages are typically installed in `~/anaconda3/envs/main/lib/python3.7/site-packages/`, Python will also search the folder you cloned to: `~/diffusers/`.
+
+<Tip warning={true}>
+
+You must keep the `diffusers` folder if you want to keep using the library.
+
+</Tip>
+
+Now you can easily update your clone to the latest version of 🤗 Diffusers with the following command:
+
+```bash
+cd ~/diffusers/
+git pull
+```
+
+Your Python environment will find the `main` version of 🤗 Diffusers on the next run.
+
diff --git a/docs/source/models.mdx b/docs/source/models.mdx
new file mode 100644
index 000000000000..5c435dc8e1f1
--- /dev/null
+++ b/docs/source/models.mdx
@@ -0,0 +1,28 @@
+
+
+# Models
+
+Diffusers contains pretrained models for popular algorithms and modules for creating the next set of diffusion models.
+The primary function of these models is to denoise an input sample by modeling the distribution $p_\theta(\mathbf{x}_{t-1}|\mathbf{x}_t)$.
+The models are built on the base class [`ModelMixin`], which is a `torch.nn.Module` with basic functionality for saving and loading models both locally and from the Hugging Face Hub.
+
+## API
+
+Models should provide the `forward` method and the model's initialization.
+All saving, loading, and related utilities should live in the base [`ModelMixin`] class.
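+
+Since all serialization lives in [`ModelMixin`], any model can be saved and re-loaded with the same two calls (a minimal sketch; the `save_pretrained`/`from_pretrained` pair follows the usual Hugging Face convention and the `UNetModel` usage shown in the examples):
+
+```python
+from diffusers import UNetModel
+
+# download a pretrained denoising model from the Hugging Face Hub
+model = UNetModel.from_pretrained("fusing/ddim-celeba-hq")
+
+# save it to a local folder ...
+model.save_pretrained("./my-unet")
+
+# ... and load it back from disk
+model = UNetModel.from_pretrained("./my-unet")
+```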
+
+## Examples
+
+- The [`UNetModel`] was proposed in [TODO](https://arxiv.org/) and has been used in paper1, paper2, paper3.
+- Extensions of the [`UNetModel`] include the [`UNetGlideModel`] that uses attention and timestep embeddings for the [GLIDE](https://arxiv.org/abs/2112.10741) paper, the [`UNetGradTTS`] model from this [paper](https://arxiv.org/abs/2105.06337) for text-to-speech, the [`UNetLDMModel`] for latent diffusion models in this [paper](https://arxiv.org/abs/2112.10752), and the [`TemporalUNet`] used for time-series prediction in this reinforcement learning [paper](https://arxiv.org/abs/2205.09991).
+- TODO: mention VAE / SDE score estimation
\ No newline at end of file
diff --git a/docs/source/models/unet.mdx b/docs/source/models/unet.mdx
new file mode 100644
index 000000000000..948562d3ae2a
--- /dev/null
+++ b/docs/source/models/unet.mdx
@@ -0,0 +1,4 @@
+# UNet
+
+The UNet is a model architecture often used in diffusion models.
+It was originally proposed in [U-Net: Convolutional Networks for Biomedical Image Segmentation](https://arxiv.org/abs/1505.04597).
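+
+As elsewhere in these docs, a pretrained UNet can be instantiated directly from the Hub (a minimal sketch reusing the checkpoint from the vision examples):
+
+```python
+from diffusers import UNetModel
+
+# load the pretrained UNet used in the image-generation examples
+model = UNetModel.from_pretrained("fusing/ddim-celeba-hq")
+```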
diff --git a/docs/source/philosophy.mdx b/docs/source/philosophy.mdx
new file mode 100644
index 000000000000..a0d4f95ddb06
--- /dev/null
+++ b/docs/source/philosophy.mdx
@@ -0,0 +1,17 @@
+
+
+# Philosophy
+
+- Readability and clarity are preferred over highly optimized code. A strong emphasis is placed on providing readable, intuitive, and elementary code design. *E.g.*, the provided [schedulers](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers) are separated from the provided [models](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models) and provide well-commented code that can be read alongside the original paper.
+- Diffusers is **modality independent** and focuses on providing pretrained models and tools to build systems that generate **continuous outputs**, *e.g.* vision and audio.
+- Diffusion models and schedulers are provided as concise, elementary building blocks, whereas diffusion pipelines are collections of end-to-end diffusion systems that can be used out-of-the-box, should stay as close as possible to their original implementation, and can include components of other libraries, such as text encoders. Examples of diffusion pipelines are [Glide](https://github.com/openai/glide-text2im) and [Latent Diffusion](https://github.com/CompVis/latent-diffusion).
diff --git a/docs/source/pipelines.mdx b/docs/source/pipelines.mdx
new file mode 100644
index 000000000000..993b4edf43fc
--- /dev/null
+++ b/docs/source/pipelines.mdx
@@ -0,0 +1,31 @@
+
+
+# Pipelines
+
+- Pipelines are a collection of end-to-end diffusion systems that can be used out-of-the-box (see the sketch below).
+- Pipelines should stay as close as possible to their original implementation.
+- Pipelines can include components of other libraries, such as text encoders.
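+
+Concretely, a pipeline bundles a denoising model and a noise scheduler into a single callable (a minimal sketch mirroring the PNDM example in the vision docs; class names and constructor arguments follow that example):
+
+```python
+import torch
+
+from diffusers import PNDM, PNDMScheduler, UNetModel
+
+# a pipeline is assembled from a denoising model and a noise scheduler
+model = UNetModel.from_pretrained("fusing/ddim-celeba-hq")
+scheduler = PNDMScheduler()
+pndm = PNDM(unet=model, noise_scheduler=scheduler)
+
+# calling the pipeline samples random noise and denoises it into an image tensor
+with torch.no_grad():
+    image = pndm()
+```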
+
+## API
+
+TODO(Patrick, Anton, Suraj)
+
+## Examples
+
+- DDPM for unconditional image generation in [pipeline_ddpm](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_ddpm.py).
+- DDIM for unconditional image generation in [pipeline_ddim](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_ddim.py).
+- PNDM for unconditional image generation in [pipeline_pndm](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_pndm.py).
+- Latent diffusion for text to image generation / conditional image generation in [pipeline_latent_diffusion](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_latent_diffusion.py).
+- Glide for text to image generation / conditional image generation in [pipeline_glide](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_glide.py).
+- BDDMPipeline for spectrogram-to-sound vocoding in [pipeline_bddm](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_bddm.py).
+- Grad-TTS for text to audio generation / conditional audio generation in [pipeline_grad_tts](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_grad_tts.py).
diff --git a/docs/source/pipelines/glide.mdx b/docs/source/pipelines/glide.mdx
new file mode 100644
index 000000000000..330b11f6a87f
--- /dev/null
+++ b/docs/source/pipelines/glide.mdx
@@ -0,0 +1 @@
+# GLIDE
\ No newline at end of file
diff --git a/docs/source/quicktour.mdx b/docs/source/quicktour.mdx
new file mode 100644
index 000000000000..044f3937b9bb
--- /dev/null
+++ b/docs/source/quicktour.mdx
@@ -0,0 +1,32 @@
+
+
+
+
+# Quicktour
+
+Start using 🧨 Diffusers quickly!
+To get started, use the [`DiffusionPipeline`] for quick inference and sample generation!
+
+```bash
+pip install diffusers
+```
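+
+With the package installed, generating a sample is a matter of loading a pretrained system and calling it (a minimal sketch condensed from the vision examples; the `fusing/ffhq_ncsnpp` checkpoint and the step count are taken from there, and sampling can take a few minutes on a GPU):
+
+```python
+import numpy as np
+import PIL.Image
+import torch
+
+from diffusers import DiffusionPipeline
+
+torch.manual_seed(32)
+
+# load model and scheduler as one end-to-end pipeline
+pipeline = DiffusionPipeline.from_pretrained("fusing/ffhq_ncsnpp")
+
+# run the diffusion loop: sample random noise and denoise it
+image = pipeline(num_inference_steps=2000)
+
+# convert the output tensor to a PIL image and save it
+image = image.permute(0, 2, 3, 1).cpu().numpy()
+image = np.clip(image * 255, 0, 255).astype(np.uint8)
+PIL.Image.fromarray(image[0]).save("sample.png")
+```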
+
+## Main classes
+
+### Models
+
+### Schedulers
+
+### Pipelines
+
+
diff --git a/docs/source/schedulers.mdx b/docs/source/schedulers.mdx
new file mode 100644
index 000000000000..18ef10b1936f
--- /dev/null
+++ b/docs/source/schedulers.mdx
@@ -0,0 +1,33 @@
+
+
+# Schedulers
+
+The base class [`SchedulerMixin`] implements low-level utilities shared by multiple schedulers.
+At a high level:
+- Schedulers are the algorithms for using diffusion models in inference as well as for training. They include the noise schedules and define algorithm-specific diffusion steps.
+- Schedulers can be used interchangeably between diffusion models in inference to find the preferred trade-off between speed and generation quality.
+- Schedulers are available in NumPy, but can easily be transformed into PyTorch.
+
+## API
+
+- Schedulers should provide one or more `def step(...)` functions that are called iteratively to unroll the diffusion loop during
+the forward pass (see the sketch below).
+- Schedulers should be framework-agnostic, but provide a simple functionality to convert the scheduler into a specific framework, such as PyTorch,
+with a `set_format(...)` method.
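+
+Put together, an inference loop built on these two methods looks roughly as follows (a schematic sketch only: the `DDPMScheduler()` default construction mirrors `PNDMScheduler()` in the examples, while the sample shape, the number of steps, and the exact `model(...)` and `step(...)` signatures are illustrative assumptions, not the final API):
+
+```python
+import torch
+
+from diffusers import DDPMScheduler, UNetModel
+
+# model and scheduler are separate, interchangeable components
+model = UNetModel.from_pretrained("fusing/ddim-celeba-hq")
+scheduler = DDPMScheduler()  # assumed default construction
+
+# start from pure noise (assumed sample shape) and denoise step by step
+sample = torch.randn(1, 3, 256, 256)
+for t in reversed(range(1000)):  # assumed number of diffusion steps
+    with torch.no_grad():
+        residual = model(sample, t)  # assumed model call signature
+    sample = scheduler.step(residual, t, sample)  # assumed step(...) signature
+```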
+
+## Examples
+
+- The [`DDPMScheduler`] was proposed in [Denoising Diffusion Probabilistic Models](https://arxiv.org/abs/2006.11239) and can be found in [scheduling_ddpm.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/schedulers/scheduling_ddpm.py).
+An example of how to use this scheduler can be found in [pipeline_ddpm.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_ddpm.py).
+- The [`DDIMScheduler`] was proposed in [Denoising Diffusion Implicit Models](https://arxiv.org/abs/2010.02502) and can be found in [scheduling_ddim.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/schedulers/scheduling_ddim.py). An example of how to use this scheduler can be found in [pipeline_ddim.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_ddim.py).
+- The [`PNDMScheduler`] was proposed in [Pseudo Numerical Methods for Diffusion Models on Manifolds](https://arxiv.org/abs/2202.09778) and can be found in [scheduling_pndm.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/schedulers/scheduling_pndm.py). An example of how to use this scheduler can be found in [pipeline_pndm.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_pndm.py).
\ No newline at end of file
diff --git a/docs/source/schedulers/ddpm.mdx b/docs/source/schedulers/ddpm.mdx
new file mode 100644
index 000000000000..4050a03cdcfe
--- /dev/null
+++ b/docs/source/schedulers/ddpm.mdx
@@ -0,0 +1,3 @@
+# DDPM
+
+DDPM is the scheduler introduced in [Denoising Diffusion Probabilistic Models](https://arxiv.org/abs/2006.11239).
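+
+It can be constructed with default parameters (a minimal sketch; the no-argument constructor mirrors the `PNDMScheduler()` usage in the examples):
+
+```python
+from diffusers import DDPMScheduler
+
+# assumed: default construction, mirroring PNDMScheduler() in the examples
+scheduler = DDPMScheduler()
+```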