Add macOS Compatibility #69

Open · wants to merge 27 commits into main from macos-compatibility

Commits (27)
c132cc0 · Update README.md · WanX-Video-1 · Feb 25, 2025
656b915 · Update README.md · WanX-Video-1 · Feb 26, 2025
9ab8f96 · Update requirements.txt · WanX-Video-1 · Feb 26, 2025
fb6dbad · Update text2video.py to reduce GPU memory by emptying cache (#44) · g7adrian · Feb 26, 2025
4c503a8 · os.path.sep instead of / (#12) · cocktailpeanut · Feb 26, 2025
4169800 · update gradio (#58) · WanX-Video-1 · Feb 26, 2025
8d75c01 · add modelscope download cli · yingdachen · Feb 26, 2025
9d3d4d7 · Check for cuda is available for macos · bakhti-uzb · Feb 26, 2025
04bf739 · Merge remote-tracking branch 'myfork/main' · bakhti-uzb · Feb 26, 2025
5872652 · Adapted model for macOS with M1 Pro chip and other improvements · bakhti-uzb · Feb 26, 2025
b6a0d1e · Update README with macOS setup and usage instructions · bakhti-uzb · Feb 26, 2025
e2287e5 · Update text2video.py to reduce GPU memory by emptying cache (#44) · g7adrian · Feb 26, 2025
1881816 · os.path.sep instead of / (#12) · cocktailpeanut · Feb 26, 2025
3f0dde1 · update gradio (#58) · WanX-Video-1 · Feb 26, 2025
b562f86 · add modelscope download cli · yingdachen · Feb 26, 2025
cf578ab · Adapted model for macOS with M1 Pro chip and other improvements · bakhti-uzb · Feb 26, 2025
60ecbf4 · Update README with macOS setup and usage instructions · bakhti-uzb · Feb 26, 2025
2beb726 · Update README.md · bakhti-uzb · Feb 27, 2025
68ae718 · Merge main branch and resolve conflicts in README.md · bakhti-uzb · Feb 27, 2025
e0317b2 · Merge branch 'main' into macos-compatibility · bakhti-uzb · Feb 27, 2025
5cb67c6 · Fix MPS compatibility for I2V by adjusting device usage and dtype · bakhti-uzb · Feb 27, 2025
7ae5058 · Merge remote-tracking branch 'myfork/macos-compatibility' into macos-… · bakhti-uzb · Feb 27, 2025
ac1bcfa · Merge branch 'main' into macos-compatibility · bakhti-ai · Mar 3, 2025
f7bd4d1 · Merge branch 'main' into macos-compatibility · bakhti-ai · Mar 4, 2025
bd180d1 · Merge branch 'main' into macos-compatibility · bakhti-ai · Mar 5, 2025
abdcd2b · Merge branch 'main' into macos-compatibility · bakhti-ai · Mar 6, 2025
b3e6943 · Merge branch 'main' into macos-compatibility · bakhti-ai · Mar 7, 2025
2 changes: 2 additions & 0 deletions .gitignore
@@ -34,3 +34,5 @@ Wan2.1-T2V-14B/
Wan2.1-T2V-1.3B/
Wan2.1-I2V-14B-480P/
Wan2.1-I2V-14B-720P/
venv_wan/
venv_wan_py310/
106 changes: 88 additions & 18 deletions README.md
@@ -1,31 +1,78 @@
# Wan2.1
# Wan2.1 Text-to-Video Model

<p align="center">
<img src="assets/logo.png" width="400"/>
<p>
This repository contains the Wan2.1 text-to-video model, adapted for macOS with the M1 Pro chip. The adaptation lets macOS users run the model efficiently by working around CUDA-specific limitations.

<p align="center">
💜 <a href=""><b>Wan</b></a> &nbsp&nbsp | &nbsp&nbsp 🖥️ <a href="https://github.com/Wan-Video/Wan2.1">GitHub</a> &nbsp&nbsp | &nbsp&nbsp🤗 <a href="https://huggingface.co/Wan-AI/">Hugging Face</a>&nbsp&nbsp | &nbsp&nbsp🤖 <a href="https://modelscope.cn/organization/Wan-AI">ModelScope</a>&nbsp&nbsp | &nbsp&nbsp 📑 <a href="">Paper (Coming soon)</a> &nbsp&nbsp | &nbsp&nbsp 📑 <a href="https://wanxai.com">Blog</a> &nbsp&nbsp | &nbsp&nbsp💬 <a href="https://gw.alicdn.com/imgextra/i2/O1CN01tqjWFi1ByuyehkTSB_!!6000000000015-0-tps-611-1279.jpg">WeChat Group</a>&nbsp&nbsp | &nbsp&nbsp 📖 <a href="https://discord.gg/AKNgpMK4Yj">Discord</a>&nbsp&nbsp
<br>

-----
## Introduction
The Wan2.1 model is an open-source text-to-video generation model. It transforms textual descriptions into video sequences, leveraging advanced machine learning techniques.

## Changes for macOS

This version includes modifications to make the model compatible with macOS, specifically for systems using the M1 Pro chip. Key changes include:

- Adaptation of CUDA-specific code to work with MPS (Metal Performance Shaders) on macOS.
- Environment variable settings so that MPS falls back to the CPU for unsupported operations (see the quick check after this list).
- Adjustments to command-line arguments for better compatibility with macOS.
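Before running anything heavy, it is worth confirming that PyTorch can actually see the Metal backend (a minimal check; it assumes a PyTorch build recent enough to ship MPS support):

```bash
# Check that the MPS backend is built and the device is reachable
python3 -c "import torch; print('MPS available:', torch.backends.mps.is_available())"

# Route operations that MPS does not yet implement to the CPU
export PYTORCH_ENABLE_MPS_FALLBACK=1
```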

## Installation Instructions

Follow these steps to set up the environment on macOS:

1. **Install Homebrew**: If it is not already installed, install it to manage the remaining packages.
```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```

2. **Install Python 3.10+**:
```bash
brew install python@3.10
```

3. **Create and Activate a Virtual Environment**:
```bash
python3.10 -m venv venv_wan
source venv_wan/bin/activate
```
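Optionally, verify the virtual environment is active before installing anything (the expected path assumes the `venv_wan` name used above):

```bash
# Should resolve inside the virtual environment, e.g. .../venv_wan/bin/python3
which python3
```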

4. **Install Dependencies**:
```bash
pip install -r requirements.txt
pip install einops
```
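A quick import check can confirm the dependencies installed cleanly (a minimal sketch using only packages named above):

```bash
# Import the core packages and print the installed torch version
python3 -c "import torch, einops; print('torch', torch.__version__)"
```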

5. **Download models using huggingface-cli**:
```bash
pip install "huggingface_hub[cli]"
huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B --local-dir ./Wan2.1-T2V-1.3B
```
**Or download models using modelscope**:
```bash
pip install modelscope
modelscope download Wan-AI/Wan2.1-T2V-1.3B --local_dir ./Wan2.1-T2V-1.3B
```
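Either way, the checkpoint directory should now contain the model weights (exact filenames may vary between releases):

```bash
# List the downloaded checkpoint files
ls -lh ./Wan2.1-T2V-1.3B
```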

## Usage

To generate a video, use the following command:

```bash
export PYTORCH_ENABLE_MPS_FALLBACK=1
python generate.py --task t2v-1.3B --size "480*832" --frame_num 16 --sample_steps 25 --ckpt_dir ./Wan2.1-T2V-1.3B --offload_model True --t5_cpu --device mps --prompt "Lion running under snow in Samarkand" --save_file output_video.mp4
```
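Here `--device mps` selects the Metal backend, `--t5_cpu` keeps the T5 text encoder on the CPU, and `--offload_model True` moves model weights out of GPU memory when they are not in use; all three should lower peak memory at some cost in speed. Once generation finishes, the output lands wherever `--save_file` points:

```bash
# Confirm the rendered video exists (path given by --save_file above)
ls -lh output_video.mp4
```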

[**Wan: Open and Advanced Large-Scale Video Generative Models**]("") <br>
## Optimization Tips

In this repository, we present **Wan2.1**, a comprehensive and open suite of video foundation models that pushes the boundaries of video generation. **Wan2.1** offers these key features:
- 👍 **SOTA Performance**: **Wan2.1** consistently outperforms existing open-source models and state-of-the-art commercial solutions across multiple benchmarks.
- 👍 **Supports Consumer-grade GPUs**: The T2V-1.3B model requires only 8.19 GB VRAM, making it compatible with almost all consumer-grade GPUs. It can generate a 5-second 480P video on an RTX 4090 in about 4 minutes (without optimization techniques like quantization). Its performance is even comparable to some closed-source models.
- 👍 **Multiple Tasks**: **Wan2.1** excels in Text-to-Video, Image-to-Video, Video Editing, Text-to-Image, and Video-to-Audio, advancing the field of video generation.
- 👍 **Visual Text Generation**: **Wan2.1** is the first video model capable of generating both Chinese and English text, featuring robust text generation that enhances its practical applications.
- 👍 **Powerful Video VAE**: **Wan-VAE** delivers exceptional efficiency and performance, encoding and decoding 1080P videos of any length while preserving temporal information, making it an ideal foundation for video and image generation.
- **Use CPU for Large Models**: If you encounter memory issues, run with `--device cpu`.
- **Reduce Resolution and Frame Count**: Use smaller resolutions and fewer frames to reduce memory usage (see the sketch after this list).
- **Monitor System Resources**: Keep an eye on memory usage and adjust parameters as needed.
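A scaled-down invocation might look like the following (an illustrative sketch: the flag values are smaller variants of the Usage command above, not tuned recommendations):

```bash
export PYTORCH_ENABLE_MPS_FALLBACK=1

# Halve the frame count and trim sampling steps relative to the Usage example
python generate.py --task t2v-1.3B --size "480*832" --frame_num 8 --sample_steps 15 \
  --ckpt_dir ./Wan2.1-T2V-1.3B --offload_model True --t5_cpu --device mps \
  --prompt "Lion running under snow in Samarkand" --save_file output_low_mem.mp4

# In another terminal, sample overall memory pressure while it runs
top -l 1 | grep PhysMem
```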

## Video Demos
## Acknowledgments

<div align="center">
<video src="https://github.com/user-attachments/assets/4aca6063-60bf-4953-bfb7-e265053f49ef" width="70%" poster=""> </video>
</div>
This project is based on the original Wan2.1 model. Special thanks to the original authors and contributors for their work.

## 🔥 Latest News!!

* Mar 3, 2025: 👋 **Wan2.1**'s T2V and I2V have been integrated into Diffusers ([T2V](https://huggingface.co/docs/diffusers/main/en/api/pipelines/wan#diffusers.WanPipeline) | [I2V](https://huggingface.co/docs/diffusers/main/en/api/pipelines/wan#diffusers.WanImageToVideoPipeline)). Feel free to give it a try!
* Feb 27, 2025: 👋 **Wan2.1** has been integrated into [ComfyUI](https://comfyanonymous.github.io/ComfyUI_examples/wan/). Enjoy!
@@ -36,6 +83,10 @@ If your work has improved **Wan2.1** and you would like more people to see it, p
- [TeaCache](https://github.com/ali-vilab/TeaCache) now supports **Wan2.1** acceleration, capable of increasing speed by approximately 2x. Feel free to give it a try!
- [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio) provides more support for **Wan2.1**, including video-to-video, FP8 quantization, VRAM optimization, LoRA training, and more. Please refer to [their examples](https://github.com/modelscope/DiffSynth-Studio/tree/main/examples/wanvideo).

<div align="center">
<video src="https://github.com/user-attachments/assets/4aca6063-60bf-4953-bfb7-e265053f49ef" width="70%" poster=""> </video>
</div>


## 📑 Todo List
- Wan2.1 Text-to-Video
@@ -49,6 +100,8 @@ If your work has improved **Wan2.1** and you would like more people to see it, p
- [x] Multi-GPU Inference code of the 14B model
- [x] Checkpoints of the 14B model
- [x] Gradio demo
- [x] ComfyUI integration
- [x] Diffusers integration
- [ ] Diffusers + Multi-GPU Inference
@@ -157,6 +210,14 @@ torchrun --nproc_per_node=8 generate.py --task t2v-14B --size 1280*720 --ckpt_di

Extending the prompts can effectively enrich the details in the generated videos, further enhancing the video quality. Therefore, we recommend enabling prompt extension. We provide the following two methods for prompt extension:

- Use the Dashscope API for extension.
- Apply for a `dashscope.api_key` in advance ([EN](https://www.alibabacloud.com/help/en/model-studio/getting-started/first-api-call-to-qwen) | [CN](https://help.aliyun.com/zh/model-studio/getting-started/first-api-call-to-qwen)).
- Configure the environment variable `DASH_API_KEY` to specify the Dashscope API key. For users of Alibaba Cloud's international site, you also need to set the environment variable `DASH_API_URL` to 'https://dashscope-intl.aliyuncs.com/api/v1'. For more detailed instructions, please refer to the [dashscope document](https://www.alibabacloud.com/help/en/model-studio/developer-reference/use-qwen-by-calling-api?spm=a2c63.p38356.0.i1).
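Both variables can be set in the shell before running `generate.py` (a sketch; the key value is a placeholder):

```bash
# Placeholder: substitute the key issued by Model Studio
export DASH_API_KEY=your_dashscope_api_key
# Only needed for Alibaba Cloud international-site accounts
export DASH_API_URL='https://dashscope-intl.aliyuncs.com/api/v1'
```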
@@ -483,9 +544,18 @@ The models in this repository are licensed under the Apache 2.0 License. We clai

## Acknowledgements

We would like to thank the contributors to the [SD3](https://huggingface.co/stabilityai/stable-diffusion-3-medium), [Qwen](https://huggingface.co/Qwen), [umt5-xxl](https://huggingface.co/google/umt5-xxl), [diffusers](https://github.com/huggingface/diffusers) and [HuggingFace](https://huggingface.co) repositories, for their open research.


## Contact Us
If you would like to leave a message to our research or product teams, feel free to join our [Discord](https://discord.gg/AKNgpMK4Yj) or [WeChat groups](https://gw.alicdn.com/imgextra/i2/O1CN01tqjWFi1ByuyehkTSB_!!6000000000015-0-tps-611-1279.jpg)!
