Vision Encoder

A list of commonly used vision encoders.

Table of contents

Image Encoder

| Paper | Framework | Data | Code | Publication | Preprint | Affiliation |
| --- | --- | --- | --- | --- | --- | --- |
| DINOv2: Learning Robust Visual Features without Supervision | DINO + iBOT | 142M images | DINOv2 | TMLR | 2304.07193 | Meta |
| Sigmoid Loss for Language Image Pre-Training | Contrastive (sigmoid) | 900M image-text pairs | SigLIP | ICCV 2023 | 2303.15343 | Google |
| Learning Transferable Visual Models From Natural Language Supervision | Contrastive (softmax) | 400M image-text pairs | CLIP | ICML 2021 | 2103.00020 | OpenAI |
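
The "Contrastive (softmax)" and "Contrastive (sigmoid)" entries in the Framework column refer to two ways of scoring an image-text similarity matrix. The following is a minimal pure-Python sketch of the two objectives, not code from either paper's repository. The function names, the `scale` and `bias` parameters, and the toy similarity matrices are assumptions made for illustration; real implementations work on batched tensors and learn the scale (and, for the sigmoid variant, the bias).

```python
import math


def softmax_contrastive_loss(sim, scale=1.0):
    """CLIP-style loss: symmetric cross-entropy over rows and columns.

    sim[i][j] is the similarity of image i and text j; matching pairs
    sit on the diagonal. `scale` is a hypothetical logit scale.
    """
    n = len(sim)
    logits = [[scale * s for s in row] for row in sim]

    def diagonal_cross_entropy(mat):
        total = 0.0
        for i, row in enumerate(mat):
            m = max(row)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / n

    transposed = [list(col) for col in zip(*logits)]
    # Average the image-to-text and text-to-image directions.
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(transposed))


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def sigmoid_contrastive_loss(sim, scale=1.0, bias=0.0):
    """SigLIP-style loss: every (image, text) pair is an independent
    binary classification, labelled +1 on the diagonal and -1 elsewhere.
    `scale` and `bias` are hypothetical stand-ins for learned parameters.
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        for j in range(n):
            label = 1.0 if i == j else -1.0
            total -= log_sigmoid(label * (scale * sim[i][j] + bias))
    return total / n


# Toy similarity matrices (made up for illustration).
uninformative = [[0.0, 0.0], [0.0, 0.0]]
well_aligned = [[10.0, -10.0], [-10.0, 10.0]]

softmax_flat = softmax_contrastive_loss(uninformative)
softmax_good = softmax_contrastive_loss(well_aligned)
sigmoid_flat = sigmoid_contrastive_loss(uninformative)
sigmoid_good = sigmoid_contrastive_loss(well_aligned)
```

The softmax version normalizes each row and column across the whole batch, while the sigmoid version treats each pair on its own, so it needs no batch-wide normalization.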

Video Encoder

Encoder Analysis

  • [2024-06] Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs arXiv | compares different image encoders as vision backbones for multimodal LLMs