Vision Encoder

Commonly used vision encoders.

Image Encoder

Paper	Framework	Data	Code	Publication	Preprint	Affiliation
DINOv2: Learning Robust Visual Features without Supervision	DINO+iBOT	142M images	DINOv2	TMLR	2304.07193	Meta
Sigmoid Loss for Language Image Pre-Training	Contrastive (sigmoid)	900M image-text pairs	SigLIP	ICCV 2023	2303.15343	Google
Learning Transferable Visual Models From Natural Language Supervision	Contrastive (softmax)	400M image-text pairs	CLIP	ICML 2021	2103.00020	OpenAI
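
The "Contrastive (softmax)" and "Contrastive (sigmoid)" entries in the Framework column differ mainly in how the image-text similarity matrix is turned into a loss. The following is a minimal pure-Python sketch of that difference, not the reference implementation of either paper. The function names, the toy embeddings, and the default `temperature` and `bias` values are assumptions chosen for illustration; in both methods these scalars are learned during training.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def similarity_matrix(img, txt):
    """Cosine similarity between every image and every text embedding."""
    img = [l2_normalize(v) for v in img]
    txt = [l2_normalize(v) for v in txt]
    return [[sum(a * b for a, b in zip(i, t)) for t in txt] for i in img]


def log_softmax_row(row):
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def softmax_contrastive_loss(img, txt, temperature=100.0):
    """CLIP-style loss: cross-entropy over rows (image to text) and
    columns (text to image), averaged. `temperature` multiplies the
    similarities; its default here is illustrative only."""
    sims = similarity_matrix(img, txt)
    n = len(sims)
    logits = [[temperature * s for s in row] for row in sims]
    logits_t = [list(col) for col in zip(*logits)]
    i2t = -sum(log_softmax_row(logits[i])[i] for i in range(n)) / n
    t2i = -sum(log_softmax_row(logits_t[i])[i] for i in range(n)) / n
    return (i2t + t2i) / 2


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x)), computed as -softplus(-x)."""
    return -(max(-x, 0.0) + math.log1p(math.exp(-abs(x))))


def sigmoid_pairwise_loss(img, txt, temperature=10.0, bias=-10.0):
    """SigLIP-style loss: every image-text pair is an independent binary
    classification (label +1 on the diagonal, -1 elsewhere), so no
    normalization across the batch is needed. Defaults are illustrative."""
    sims = similarity_matrix(img, txt)
    n = len(sims)
    total = 0.0
    for i in range(n):
        for j in range(n):
            label = 1.0 if i == j else -1.0
            total += log_sigmoid(label * (temperature * sims[i][j] + bias))
    return -total / n


# Toy batch: two orthogonal embeddings, first correctly paired, then swapped.
images = [[1.0, 0.0], [0.0, 1.0]]
texts_matched = [[1.0, 0.0], [0.0, 1.0]]
texts_swapped = [[0.0, 1.0], [1.0, 0.0]]

print(softmax_contrastive_loss(images, texts_matched))
print(softmax_contrastive_loss(images, texts_swapped))
print(sigmoid_pairwise_loss(images, texts_matched))
print(sigmoid_pairwise_loss(images, texts_swapped))
```

In both cases the correctly paired batch yields a lower loss than the swapped one. The practical difference is that the softmax loss normalizes over the whole batch, while the sigmoid loss scores each pair on its own.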

[2024-06] Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs arXiv | comparison of different image encoders in multimodal LLMs