Vision Language Models: MoE-LLaVA, MOBILE-AGENT, and more
Like 👍. Comment 💬. Subscribe 🟥. 🏘 Discord: https://discord.gg/pPAFwndTJd
YouTube: https://youtube.com/live/uYb38g-weEY
X: https://twitter.com/i/broadcasts/1zqKVqkYELnxB
Twitch: https://www.twitch.tv/hu_po
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models https://arxiv.org/pdf/2401.15947.pdf
Routers in Vision Mixture of Experts: An Empirical Study https://arxiv.org/pdf/2401.15969.pdf
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Models https://arxiv.org/pdf/2401.16420.pdf
LLaVA-1.6: Improved reasoning, OCR, and world knowledge https://llava-vl.github.io/blog/2024-01-30-llava-1-6/
MouSi: Poly-Visual-Expert Vision-Language Models https://arxiv.org/pdf/2401.17221.pdf
moondream: https://github.com/vikhyat/moondream
MoE-LLaVA-Phi2-2.7B-4e-384 weights: https://huggingface.co/LanguageBind/MoE-LLaVA-Phi2-2.7B-4e-384
LLaVA-v1.6-Mistral-7B demo: https://replicate.com/yorickvp/llava-v1.6-mistral-7b
Qwen-VL blog: https://qwenlm.github.io/blog/qwen-vl/
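The MoE papers above (MoE-LLaVA and the routers study) both hinge on a learned router that sends each token to its top-k experts and renormalizes the kept gate weights. A minimal numpy sketch of that routing step, with all names hypothetical and not the exact implementation from either paper:

```python
import numpy as np

def top_k_route(token_embeddings, router_weights, k=2):
    """Route each token to its top-k experts (generic MoE routing sketch,
    not the exact MoE-LLaVA code)."""
    # Router produces one logit per expert for each token.
    logits = token_embeddings @ router_weights           # (tokens, experts)
    # Keep only the k highest-scoring experts per token.
    top_idx = np.argsort(logits, axis=-1)[:, -k:]        # (tokens, k)
    top_logits = np.take_along_axis(logits, top_idx, axis=-1)
    # Softmax over just the kept logits so gate weights sum to 1.
    gates = np.exp(top_logits - top_logits.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)
    return top_idx, gates

# Toy example: 4 tokens, hidden size 8, 4 experts, top-2 routing.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))
router = rng.normal(size=(8, 4))
idx, gates = top_k_route(tokens, router, k=2)
print(idx.shape, gates.shape)  # (4, 2) (4, 2)
```

Each token's output would then be the gate-weighted sum of its k selected expert FFNs; only those k experts run, which is how MoE keeps activated parameters low while total parameters grow.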