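The Mini Sora open-source community is organized spontaneously by community members (it is free, charges no fees of any kind, and does not profiteer from its members). Mini Sora plans to explore how Sora could be implemented and where it may go next: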
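- Regular Sora roundtables will be held to explore possibilities together with the community
- Discussion of existing technical approaches to video generation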
- Sora: Creating video from text (technical report: Video generation models as world simulators)
- DiT: Scalable Diffusion Models with Transformers
- Latte: Latte: Latent Diffusion Transformer for Video Generation
- More to come...

Paper | Link |
---|---|
1) Guided-Diffusion: Diffusion Models Beat GANs on Image Synthesis | Paper, Github |
2) Latent Diffusion: High-Resolution Image Synthesis with Latent Diffusion Models | Paper, Github |
3) EDM: Elucidating the Design Space of Diffusion-Based Generative Models | Paper, Github |
4) DDPM: Denoising Diffusion Probabilistic Models | Paper, Github |
5) DDIM: Denoising Diffusion Implicit Models | Paper, Github |
6) Score-Based Diffusion: Score-Based Generative Modeling through Stochastic Differential Equations | Paper, Github, Blog |

Paper | Link |
---|---|
1) UViT: All are Worth Words: A ViT Backbone for Diffusion Models | Paper, Github, ModelScope |
2) DiT: Scalable Diffusion Models with Transformers | Paper, Github, ModelScope |
3) SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers | Paper, Github, ModelScope |
4) FiT: Flexible Vision Transformer for Diffusion Model | Paper, Github |
5) k-diffusion: Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers | Paper, Github |

Paper | Link |
---|---|
1) Animatediff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning | Paper, Github, ModelScope |
2) I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models | Paper, Github, ModelScope |
4) Imagen Video: High Definition Video Generation with Diffusion Models | Paper |
5) MoCoGAN: Decomposing Motion and Content for Video Generation | Paper |
6) Adversarial Video Generation on Complex Datasets | Paper |
7) Photorealistic Video Generation with Diffusion Models | Paper |
8) VideoGPT: Video Generation using VQ-VAE and Transformers | Paper, Github |
9) Video Diffusion Models | Paper, Github, Project |
10) MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation | Paper, Github, Project, Blog |
11) VideoPoet: A Large Language Model for Zero-Shot Video Generation | Paper |

Paper | Link |
---|---|
1) World Model on Million-Length Video And Language With RingAttention | Paper, Github |
2) Ring Attention with Blockwise Transformers for Near-Infinite Context | Paper, Github |
3) Extending LLMs' Context Window with 100 Samples | Paper, Github |
4) Efficient Streaming Language Models with Attention Sinks | Paper, Github |
5) The What, Why, and How of Context Length Extension Techniques in Large Language Models – A Detailed Survey | Paper |

Paper | Link |
---|---|
1) ViViT: A Video Vision Transformer | Paper, Github |
2) VideoLDM: Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models | Paper |