Image Generation

Models

text2image:

text to 3d:

image to 3d:

  • Wonder3D A cross-domain diffusion model for 3D reconstruction from a single image
  • DreamCraft3D Official implementation of DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior

image to text (OCR):

other:

  • facebookresearch/segment-anything image segmentation
    • YOLOv8 SOTA object detection, segmentation, classification and tracking
    • DINOv2 1B-parameter ViT model to generate robust all-purpose visual features that outperform OpenCLIP benchmarks at image and pixel levels
  • Final2x Image super-resolution through interpolation, supporting multiple models such as RealCUGAN, ESRGAN, Waifu2x, and SRMD
  • text-to-room generates 3D room scenes from a text prompt
  • DragGAN Interactive Point-based Manipulation on the Generative Image Manifold, demo
  • DragDiffusion Harnessing Diffusion Models for Interactive Point-based Image Editing
  • HQTrack Tracking Anything in High Quality, a framework for high-performance video object tracking and segmentation
  • CoTracker It is Better to Track Together. A fast transformer-based model that can track any point in a video
  • ZeroNVS Zero-shot 360-degree view synthesis from a single image
  • x-stable-diffusion Real-time inference for Stable Diffusion - 0.88s latency

Wrappers & GUIs

Fine Tuning

Research

  • Speed Is All You Need up to 50% speed increase for Latent Diffusion Models
  • ORCa converts glossy objects into radiance-field cameras, enabling depth estimation and novel-view synthesis, project, code
  • cocktail Mixing Multi-Modality Controls for Text-Conditional Image Generation, project, code
  • SnapFusion Fast text-to-image diffusion on mobile phones in 2 seconds
  • Objaverse-xl dataset of 10 million annotated high-quality 3D objects, hf
  • LightGlue Local Feature Matching at Light Speed, a lightweight feature matcher with high accuracy and blazing fast inference. It takes as input a set of keypoints and descriptors for each image and returns the indices of corresponding points