diff --git a/.github/workflows/build_documentation.yml b/.github/workflows/build_documentation.yml
new file mode 100644
index 000000000000..4e016d07d55d
--- /dev/null
+++ b/.github/workflows/build_documentation.yml
@@ -0,0 +1,17 @@
+name: Build documentation
+
+on:
+  push:
+    branches:
+      - main
+      - doc-builder*
+      - v*-release
+
+jobs:
+  build:
+    uses: huggingface/doc-builder/.github/workflows/build_main_documentation.yml@main
+    with:
+      commit_sha: ${{ github.sha }}
+      package: diffusers
+    secrets:
+      token: ${{ secrets.HUGGINGFACE_PUSH }}
diff --git a/.github/workflows/build_pr_documentation.yml b/.github/workflows/build_pr_documentation.yml
new file mode 100644
index 000000000000..d51623e735c5
--- /dev/null
+++ b/.github/workflows/build_pr_documentation.yml
@@ -0,0 +1,16 @@
+name: Build PR Documentation
+
+on:
+  pull_request:
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
+  cancel-in-progress: true
+
+jobs:
+  build:
+    uses: huggingface/doc-builder/.github/workflows/build_pr_documentation.yml@main
+    with:
+      commit_sha: ${{ github.event.pull_request.head.sha }}
+      pr_number: ${{ github.event.number }}
+      package: diffusers
diff --git a/.github/workflows/delete_doc_comment.yml b/.github/workflows/delete_doc_comment.yml
new file mode 100644
index 000000000000..238dc0bdbabf
--- /dev/null
+++ b/.github/workflows/delete_doc_comment.yml
@@ -0,0 +1,13 @@
+name: Delete dev documentation
+
+on:
+  pull_request:
+    types: [ closed ]
+
+
+jobs:
+  delete:
+    uses: huggingface/doc-builder/.github/workflows/delete_doc_comment.yml@main
+    with:
+      pr_number: ${{ github.event.number }}
+      package: diffusers
diff --git a/docs/source/_toctree.yml b/docs/source/_toctree.yml
new file mode 100644
index 000000000000..2c1fb407939a
--- /dev/null
+++ b/docs/source/_toctree.yml
@@ -0,0 +1,40 @@
+- sections:
+  - local: index
+    title: 🧨 Diffusers
+  - local: quicktour
+    title: Quicktour
+  - local: philosophy
+    title: Philosophy
+  title: Get started
+- sections:
+  - sections:
+    - local: examples/diffusers_for_vision
+      title: Diffusers for Vision
+    - local: examples/diffusers_for_audio
+      title: Diffusers for Audio
+    - local: examples/diffusers_for_other
+      title: Diffusers for Other Modalities
+    title: Examples
+  title: Using Diffusers
+- sections:
+  - sections:
+    - local: pipelines
+      title: Pipelines
+    - local: schedulers
+      title: Schedulers
+    - local: models
+      title: Models
+    title: Main Classes
+  - sections:
+    - local: pipelines/glide
+      title: "Glide"
+    title: Pipelines
+  - sections:
+    - local: schedulers/ddpm
+      title: "DDPM"
+    title: Schedulers
+  - sections:
+    - local: models/unet
+      title: "UNet"
+    title: Models
+  title: API
diff --git a/docs/source/examples/diffusers_for_audio.mdx b/docs/source/examples/diffusers_for_audio.mdx
new file mode 100644
index 000000000000..d16980586848
--- /dev/null
+++ b/docs/source/examples/diffusers_for_audio.mdx
@@ -0,0 +1,13 @@
+
+
+# Diffusers for audio
\ No newline at end of file
diff --git a/docs/source/examples/diffusers_for_other.mdx b/docs/source/examples/diffusers_for_other.mdx
new file mode 100644
index 000000000000..79fc7d551540
--- /dev/null
+++ b/docs/source/examples/diffusers_for_other.mdx
@@ -0,0 +1,20 @@
+
+
+# Diffusers for other modalities
+
+Diffusers offers support for modalities other than vision and audio.
+Currently, some examples include:
+- [Diffuser](https://diffusion-planning.github.io/) for planning in reinforcement learning (currently only inference): [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1TmBmlYeKUZSkUZoJqfBmaicVTKx6nN1R?usp=sharing)
+
+If you are interested in contributing to under-construction examples, you can explore:
+- [GeoDiff](https://github.com/MinkaiXu/GeoDiff) for generating 3D conformations of molecules [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1pLYYWQhdLuv1q-JtEHGZybxp2RBF8gPs?usp=sharing).
\ No newline at end of file
diff --git a/docs/source/examples/diffusers_for_vision.mdx b/docs/source/examples/diffusers_for_vision.mdx
new file mode 100644
index 000000000000..624938f59d46
--- /dev/null
+++ b/docs/source/examples/diffusers_for_vision.mdx
@@ -0,0 +1,149 @@
+
+
+# Diffusers for vision
+
+## Direct image generation
+
+#### **Example image generation with PNDM**
+
+```python
+from diffusers import PNDM, UNetModel, PNDMScheduler
+import PIL.Image
+import numpy as np
+import torch
+
+model_id = "fusing/ddim-celeba-hq"
+
+# load model and scheduler
+model = UNetModel.from_pretrained(model_id)
+scheduler = PNDMScheduler()
+
+# create the pipeline from the model and scheduler
+pndm = PNDM(unet=model, noise_scheduler=scheduler)
+
+# run pipeline in inference (sample random noise and denoise)
+with torch.no_grad():
+    image = pndm()
+
+# process image to PIL
+image_processed = image.cpu().permute(0, 2, 3, 1)
+image_processed = (image_processed + 1.0) / 2
+image_processed = torch.clamp(image_processed, 0.0, 1.0)
+image_processed = image_processed * 255
+image_processed = image_processed.numpy().astype(np.uint8)
+image_pil = PIL.Image.fromarray(image_processed[0])
+
+# save image
+image_pil.save("test.png")
+```
+
+#### **Example 1024x1024 image generation with SDE VE**
+
+See [paper](https://arxiv.org/abs/2011.13456) for more information on SDE VE.
+
+```python
+from diffusers import DiffusionPipeline
+import torch
+import PIL.Image
+import numpy as np
+
+torch.manual_seed(32)
+
+score_sde_sv = DiffusionPipeline.from_pretrained("fusing/ffhq_ncsnpp")
+
+# Note this might take up to 3 minutes on a GPU
+image = score_sde_sv(num_inference_steps=2000)
+
+image = image.permute(0, 2, 3, 1).cpu().numpy()
+image = np.clip(image * 255, 0, 255).astype(np.uint8)
+image_pil = PIL.Image.fromarray(image[0])
+
+# save image
+image_pil.save("test.png")
+```
+#### **Example 32x32 image generation with SDE VP**
+
+See [paper](https://arxiv.org/abs/2011.13456) for more information on SDE VP.
+
+```python
+from diffusers import DiffusionPipeline
+import torch
+import PIL.Image
+import numpy as np
+
+torch.manual_seed(32)
+
+score_sde_sv = DiffusionPipeline.from_pretrained("fusing/cifar10-ddpmpp-deep-vp")
+
+# Note this might take up to 3 minutes on a GPU
+image = score_sde_sv(num_inference_steps=1000)
+
+image = image.permute(0, 2, 3, 1).cpu().numpy()
+image = np.clip(image * 255, 0, 255).astype(np.uint8)
+image_pil = PIL.Image.fromarray(image[0])
+
+# save image
+image_pil.save("test.png")
+```
+
+
+#### **Text-to-image generation with Latent Diffusion**
+
+_Note: To use latent diffusion, install transformers from [this branch](https://github.com/patil-suraj/transformers/tree/ldm-bert)._
+
+```python
+from diffusers import DiffusionPipeline
+import torch
+import PIL.Image
+import numpy as np
+
+ldm = DiffusionPipeline.from_pretrained("fusing/latent-diffusion-text2im-large")
+
+generator = torch.manual_seed(42)
+
+prompt = "A painting of a squirrel eating a burger"
+image = ldm([prompt], generator=generator, eta=0.3, guidance_scale=6.0, num_inference_steps=50)
+
+image_processed = image.cpu().permute(0, 2, 3, 1)
+image_processed = image_processed * 255.
+image_processed = image_processed.numpy().astype(np.uint8)
+image_pil = PIL.Image.fromarray(image_processed[0])
+
+# save image
+image_pil.save("test.png")
+```
+
+
+## Text-to-speech generation
+
+```python
+import torch
+from diffusers import BDDMPipeline, GradTTSPipeline
+
+torch_device = "cuda"
+
+# load grad tts and bddm pipelines
+grad_tts = GradTTSPipeline.from_pretrained("fusing/grad-tts-libri-tts")
+bddm = BDDMPipeline.from_pretrained("fusing/diffwave-vocoder-ljspeech")
+
+text = "Hello world, I missed you so much."
+
+# generate mel spectrograms from text
+mel_spec = grad_tts(text, torch_device=torch_device)
+
+# generate the speech by passing the mel spectrograms to the BDDM pipeline
+generator = torch.manual_seed(42)
+audio = bddm(mel_spec, generator, torch_device=torch_device)
+
+# save generated audio
+from scipy.io.wavfile import write as wavwrite
+sampling_rate = 22050
+wavwrite("generated_audio.wav", sampling_rate, audio.squeeze().cpu().numpy())
+```
+
diff --git a/docs/source/index.mdx b/docs/source/index.mdx
new file mode 100644
index 000000000000..28459117e0e4
--- /dev/null
+++ b/docs/source/index.mdx
@@ -0,0 +1,110 @@
+
+
+# 🧨 Diffusers
+
+
+🤗 Diffusers provides pretrained diffusion models across multiple modalities, such as vision and audio, and serves
+as a modular toolbox for inference and training of diffusion models.
+
+More precisely, 🤗 Diffusers offers:
+
+- State-of-the-art diffusion pipelines that can be run in inference with just a couple of lines of code (see [src/diffusers/pipelines](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines)).
+- Various noise schedulers that can be used interchangeably for the preferred speed vs. quality trade-off in inference (see [src/diffusers/schedulers](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers)).
+- Multiple types of models, such as UNet, that can be used as building blocks in an end-to-end diffusion system (see [src/diffusers/models](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models)).
+- Training examples to show how to train the most popular diffusion models (see [examples](https://github.com/huggingface/diffusers/tree/main/examples)).
+
+# Installation
+
+Install 🤗 Diffusers with PyTorch. Support for other libraries will come in the future.
+
+🤗 Diffusers is tested on Python 3.6+ and PyTorch 1.4.0+.
+
+## Install with pip
+
+You should install 🤗 Diffusers in a [virtual environment](https://docs.python.org/3/library/venv.html).
+If you're unfamiliar with Python virtual environments, take a look at this [guide](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/).
+A virtual environment makes it easier to manage different projects and avoids compatibility issues between dependencies.
+
+Start by creating a virtual environment in your project directory:
+
+```bash
+python -m venv .env
+```
+
+Activate the virtual environment:
+
+```bash
+source .env/bin/activate
+```
+
+Now you're ready to install 🤗 Diffusers with the following command:
+
+```bash
+pip install diffusers
+```
+
+## Install from source
+
+Install 🤗 Diffusers from source with the following command:
+
+```bash
+pip install git+https://github.com/huggingface/diffusers
+```
+
+This command installs the bleeding edge `main` version rather than the latest `stable` version.
+The `main` version is useful for staying up-to-date with the latest developments, for instance if a bug has been fixed since the last official release but a new release hasn't been rolled out yet.
+However, this means the `main` version may not always be stable.
+We strive to keep the `main` version operational, and most issues are usually resolved within a few hours or a day.
+If you run into a problem, please open an [Issue](https://github.com/huggingface/diffusers/issues) so we can fix it even sooner!
+
+## Editable install
+
+You will need an editable install if you'd like to:
+
+* Use the `main` version of the source code.
+* Contribute to 🤗 Diffusers and need to test changes in the code.
+
+Clone the repository and install 🤗 Diffusers with the following commands:
+
+```bash
+git clone https://github.com/huggingface/diffusers.git
+cd diffusers
+pip install -e .
+```
+
+These commands link the folder you cloned the repository to with your Python library paths.
+Python will now look inside the folder you cloned to in addition to the normal library paths.
+For example, if your Python packages are typically installed in `~/anaconda3/envs/main/lib/python3.7/site-packages/`, Python will also search the folder you cloned to: `~/diffusers/`.
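+
+To sanity-check an install (editable or otherwise), you can import the package and print its version. This snippet only assumes that `diffusers` exposes the standard `__version__` attribute:
+
+```python
+import diffusers
+
+# an editable install should report the version defined in your cloned repository
+print(diffusers.__version__)
+```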
+
+
+You must keep the `diffusers` folder if you want to keep using the library.
+
+
+Now you can easily update your clone to the latest version of 🤗 Diffusers with the following command:
+
+```bash
+cd ~/diffusers/
+git pull
+```
+
+Your Python environment will find the `main` version of 🤗 Diffusers on the next run.
+
diff --git a/docs/source/models.mdx b/docs/source/models.mdx
new file mode 100644
index 000000000000..5c435dc8e1f1
--- /dev/null
+++ b/docs/source/models.mdx
@@ -0,0 +1,28 @@
+
+
+# Models
+
+Diffusers contains pretrained models for popular algorithms and modules for creating the next set of diffusion models.
+The primary function of these models is to denoise an input sample by modeling the distribution $p_\theta(\mathbf{x}_{t-1}|\mathbf{x}_t)$.
+The models are built on the base class [`ModelMixin`], which is a `torch.nn.Module` with basic functionality for saving and loading models both locally and from the Hugging Face Hub.
+
+## API
+
+Models should provide the `forward` function and the initialization of the model.
+All saving, loading, and utilities should be in the base [`ModelMixin`] class.
+
+## Examples
+
+- The [`UNetModel`] was proposed in [TODO](https://arxiv.org/) and has been used in paper1, paper2, paper3.
+- Extensions of the [`UNetModel`] include the [`UNetGlideModel`] that uses attention and timestep embeddings for the [GLIDE](https://arxiv.org/abs/2112.10741) paper, the [`UNetGradTTS`] model from this [paper](https://arxiv.org/abs/2105.06337) for text-to-speech, the [`UNetLDMModel`] for latent diffusion models in this [paper](https://arxiv.org/abs/2112.10752), and the [`TemporalUNet`] used for time-series prediction in this reinforcement learning [paper](https://arxiv.org/abs/2205.09991).
+- TODO: mention VAE / SDE score estimation
\ No newline at end of file
diff --git a/docs/source/models/unet.mdx b/docs/source/models/unet.mdx
new file mode 100644
index 000000000000..948562d3ae2a
--- /dev/null
+++ b/docs/source/models/unet.mdx
@@ -0,0 +1,4 @@
+# UNet
+
+The UNet is an architecture often used in diffusion models.
+It was originally published [here](https://www.google.com).
\ No newline at end of file
diff --git a/docs/source/philosophy.mdx b/docs/source/philosophy.mdx
new file mode 100644
index 000000000000..a0d4f95ddb06
--- /dev/null
+++ b/docs/source/philosophy.mdx
@@ -0,0 +1,17 @@
+
+
+# Philosophy
+
+- Readability and clarity are preferred over highly optimized code. Strong emphasis is placed on providing readable, intuitive and elementary code design. *E.g.*, the provided [schedulers](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers) are separated from the provided [models](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models) and provide well-commented code that can be read alongside the original paper.
+- Diffusers is **modality independent** and focuses on providing pretrained models and tools to build systems that generate **continuous outputs**, *e.g.* vision and audio.
+- Diffusion models and schedulers are provided as concise, elementary building blocks, whereas diffusion pipelines are collections of end-to-end diffusion systems that can be used out-of-the-box, should stay as close as possible to their original implementation, and can include components of other libraries, such as text encoders. Examples of diffusion pipelines are [Glide](https://github.com/openai/glide-text2im) and [Latent Diffusion](https://github.com/CompVis/latent-diffusion).
diff --git a/docs/source/pipelines.mdx b/docs/source/pipelines.mdx
new file mode 100644
index 000000000000..993b4edf43fc
--- /dev/null
+++ b/docs/source/pipelines.mdx
@@ -0,0 +1,31 @@
+
+
+# Pipelines
+
+- Pipelines are a collection of end-to-end diffusion systems that can be used out-of-the-box.
+- Pipelines should stay as close as possible to their original implementation.
+- Pipelines can include components of other libraries, such as text encoders.
+
+## API
+
+TODO(Patrick, Anton, Suraj)
+
+## Examples
+
+- DDPM for unconditional image generation in [pipeline_ddpm](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_ddpm.py).
+- DDIM for unconditional image generation in [pipeline_ddim](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_ddim.py).
+- PNDM for unconditional image generation in [pipeline_pndm](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_pndm.py).
+- Latent diffusion for text-to-image generation / conditional image generation in [pipeline_latent_diffusion](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_latent_diffusion.py).
+- GLIDE for text-to-image generation / conditional image generation in [pipeline_glide](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_glide.py).
+- BDDMPipeline for spectrogram-to-sound vocoding in [pipeline_bddm](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_bddm.py).
+- Grad-TTS for text-to-audio generation / conditional audio generation in [pipeline_grad_tts](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_grad_tts.py).
diff --git a/docs/source/pipelines/glide.mdx b/docs/source/pipelines/glide.mdx
new file mode 100644
index 000000000000..330b11f6a87f
--- /dev/null
+++ b/docs/source/pipelines/glide.mdx
@@ -0,0 +1 @@
+# GLIDE
\ No newline at end of file
diff --git a/docs/source/quicktour.mdx b/docs/source/quicktour.mdx
new file mode 100644
index 000000000000..044f3937b9bb
--- /dev/null
+++ b/docs/source/quicktour.mdx
@@ -0,0 +1,32 @@
+
+
+
+
+# Quicktour
+
+Start using 🧨 Diffusers quickly!
+To start, use the [`DiffusionPipeline`] for quick inference and sample generation!
+
+```bash
+pip install diffusers
+```
+
+## Main classes
+
+### Models
+
+### Schedulers
+
+### Pipelines
+
+
diff --git a/docs/source/schedulers.mdx b/docs/source/schedulers.mdx
new file mode 100644
index 000000000000..18ef10b1936f
--- /dev/null
+++ b/docs/source/schedulers.mdx
@@ -0,0 +1,33 @@
+
+
+# Schedulers
+
+The base class [`SchedulerMixin`] implements low-level utilities used by multiple schedulers.
+At a high level:
+- Schedulers are the algorithms for using diffusion models in inference as well as for training. They include the noise schedules and define algorithm-specific diffusion steps.
+- Schedulers can be used interchangeably between diffusion models in inference to find the preferred trade-off between speed and generation quality.
+- Schedulers are available in NumPy, but can easily be transformed into PyTorch.
+
+## API
+
+- Schedulers should provide one or more `step(...)` functions that are called iteratively to unroll the diffusion loop during
+the forward pass.
+- Schedulers should be framework-agnostic, but provide simple functionality to convert the scheduler into a specific framework, such as PyTorch,
+with a `set_format(...)` method.
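+
+To make the `step(...)` contract above more concrete, below is a minimal, hypothetical sketch of an unrolled denoising loop. It reuses the `UNetModel` and `DDPMScheduler` names that appear elsewhere in this documentation, but the exact call signatures (`set_format(...)`, `step(...)`, the model's forward arguments) are assumptions for illustration rather than the definitive API:
+
+```python
+import torch
+
+from diffusers import DDPMScheduler, UNetModel
+
+# model id taken from the vision examples; any compatible UNet checkpoint works
+model = UNetModel.from_pretrained("fusing/ddim-celeba-hq")
+scheduler = DDPMScheduler()
+scheduler.set_format("pt")  # assumed helper converting the NumPy schedule to PyTorch tensors
+
+num_inference_steps = 1000
+sample = torch.randn(1, 3, 256, 256)  # start from pure noise; shape chosen for illustration
+
+for t in reversed(range(num_inference_steps)):
+    with torch.no_grad():
+        noise_residual = model(sample, t)  # assumed forward signature: (sample, timestep)
+    # one reverse-diffusion update; assumed step signature: (model_output, timestep, sample)
+    sample = scheduler.step(noise_residual, t, sample)
+```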
+
+## Examples
+
+- The [`DDPMScheduler`] was proposed in [Denoising Diffusion Probabilistic Models](https://arxiv.org/abs/2006.11239) and can be found in [scheduling_ddpm.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/schedulers/scheduling_ddpm.py).
+An example of how to use this scheduler can be found in [pipeline_ddpm.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_ddpm.py).
+- The [`DDIMScheduler`] was proposed in [Denoising Diffusion Implicit Models](https://arxiv.org/abs/2010.02502) and can be found in [scheduling_ddim.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/schedulers/scheduling_ddim.py). An example of how to use this scheduler can be found in [pipeline_ddim.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_ddim.py).
+- The [`PNDMScheduler`] was proposed in [Pseudo Numerical Methods for Diffusion Models on Manifolds](https://arxiv.org/abs/2202.09778) and can be found in [scheduling_pndm.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/schedulers/scheduling_pndm.py). An example of how to use this scheduler can be found in [pipeline_pndm.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_pndm.py).
\ No newline at end of file
diff --git a/docs/source/schedulers/ddpm.mdx b/docs/source/schedulers/ddpm.mdx
new file mode 100644
index 000000000000..4050a03cdcfe
--- /dev/null
+++ b/docs/source/schedulers/ddpm.mdx
@@ -0,0 +1,3 @@
+# DDPM
+
+DDPM is a scheduler proposed in [Denoising Diffusion Probabilistic Models](https://arxiv.org/abs/2006.11239).
\ No newline at end of file
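+
+As a minimal illustration, the scheduler can be instantiated directly and its noise schedule inspected. The constructor defaults and the `betas` attribute are assumptions here; see [scheduling_ddpm.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/schedulers/scheduling_ddpm.py) for the actual parameters:
+
+```python
+from diffusers import DDPMScheduler
+
+# instantiate the scheduler with its default configuration
+scheduler = DDPMScheduler()
+
+# the DDPM noise schedule is defined by the betas; print the first few values
+print(scheduler.betas[:5])
+```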