TinyGPT-V, MobileVLM
Like 👍. Comment 💬. Subscribe 🟥. 🏘 Discord: https://discord.gg/pPAFwndTJd
YouTube: https://youtube.com/live/gfRHoqdZqxw
X: https://twitter.com/i/broadcasts/1ZkJzjQNPWoJv
Twitch: https://www.twitch.tv/hu_po
MobileVLM: A Fast, Strong and Open Vision Language Assistant for Mobile Devices https://arxiv.org/pdf/2312.16886.pdf
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones https://arxiv.org/pdf/2312.16862v1.pdf
GPT-4V(ision) is a Generalist Web Agent, if Grounded https://arxiv.org/pdf/2401.01614.pdf
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training https://arxiv.org/pdf/2401.00849.pdf
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models https://arxiv.org/abs/2301.12597
Conditional Positional Encodings for Vision Transformers https://arxiv.org/abs/2102.10882
Batch Norm, Layer Norm https://i.stack.imgur.com/fAowJ.png