Vision Language Models: MoE-LLaVA, MOBILE-AGENT, and more

Like 👍. Comment 💬. Subscribe 🟥. 🏘 Discord: https://discord.gg/pPAFwndTJd

YouTube: https://youtube.com/live/uYb38g-weEY

X: https://twitter.com/i/broadcasts/1zqKVqkYELnxB

Twitch: https://www.twitch.tv/hu_po

References

MoE-LLaVA: Mixture of Experts for Large Vision-Language Models https://arxiv.org/pdf/2401.15947.pdf

Routers in Vision Mixture of Experts: An Empirical Study https://arxiv.org/pdf/2401.15969.pdf

InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Models https://arxiv.org/pdf/2401.16420.pdf

LLaVA-1.6: Improved reasoning, OCR, and world knowledge https://llava-vl.github.io/blog/2024-01-30-llava-1-6/

MouSi: Poly-Visual-Expert Vision-Language Models https://arxiv.org/pdf/2401.17221.pdf

https://github.com/vikhyat/moondream

https://huggingface.co/LanguageBind/MoE-LLaVA-Phi2-2.7B-4e-384

https://replicate.com/yorickvp/llava-v1.6-mistral-7b

https://qwenlm.github.io/blog/qwen-vl/
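
MoE-LLaVA and the routers paper above both revolve around sparsely routing tokens to a small subset of experts. As a rough illustration of that mechanism only (a minimal sketch, not code from either paper; all names and dimensions are made up for the example), here is a top-k token router in PyTorch:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKRouter(nn.Module):
    """Minimal top-k token router, the basic gating step behind sparse MoE layers.
    Illustrative sketch only, not the MoE-LLaVA implementation."""

    def __init__(self, dim: int, num_experts: int, k: int = 2):
        super().__init__()
        self.k = k
        # linear gate scores every token against every expert
        self.gate = nn.Linear(dim, num_experts, bias=False)

    def forward(self, tokens: torch.Tensor):
        # tokens: (num_tokens, dim) -> logits: (num_tokens, num_experts)
        logits = self.gate(tokens)
        # keep only the k highest-scoring experts per token,
        # then softmax over that subset to get mixing weights
        topk_logits, topk_idx = logits.topk(self.k, dim=-1)
        weights = F.softmax(topk_logits, dim=-1)
        return topk_idx, weights  # expert assignments and their weights

if __name__ == "__main__":
    router = TopKRouter(dim=16, num_experts=4, k=2)
    idx, w = router(torch.randn(8, 16))
    print(idx.shape, w.shape)  # torch.Size([8, 2]) torch.Size([8, 2])
```

Each token's output is then a weighted sum of only its selected experts, which is what keeps activated parameters low even as total parameters grow.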