- Shenzhen, China
Highlights
- Pro
Stars
[ICLR 2025] MLLM for On-Demand Spatial-Temporal Understanding at Arbitrary Resolution
adefossez / demucs
Forked from facebookresearch/demucs
Code for the paper Hybrid Spectrogram and Waveform Source Separation
LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning
✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean.
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Excalidraw app for mac. Powered by pure SwiftUI.
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
A curated list of awesome places to learn and/or practice algorithms.
✨✨Latest Advances on Multimodal Large Language Models
Latitude is the open-source prompt engineering platform to build, evaluate, and refine your prompts with AI
Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high …
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
The swiss army knife of lossless video/audio editing
R1-onevision, a visual language model capable of deep CoT reasoning.
A curated list of awesome computer vision resources
Wan: Open and Advanced Large-Scale Video Generative Models
An efficient video loader for deep learning with smart shuffling that's super easy to digest
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
verl: Volcano Engine Reinforcement Learning for LLMs
🐙 Guides, papers, lecture, notebooks and resources for prompt engineering
Janus-Series: Unified Multimodal Understanding and Generation Models
A Zotero plugin for syncing items and notes into Notion
Making large AI models cheaper, faster and more accessible