  • Shenzhen, China

[ICLR 2025] MLLM for On-Demand Spatial-Temporal Understanding at Arbitrary Resolution

Python 295 16 Updated Feb 27, 2025

Code for the paper Hybrid Spectrogram and Waveform Source Separation

Python 1,319 128 Updated Jul 15, 2024

LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning

Python 1,887 70 Updated Jan 22, 2025

✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Python 2,146 164 Updated Feb 13, 2025

CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image

Jupyter Notebook 27,790 3,484 Updated Jul 23, 2024

High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean.

Python 5,731 764 Updated Dec 24, 2024

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Python 38,321 4,802 Updated Aug 16, 2024

A Conversational Speech Generation Model

6,083 197 Updated Feb 26, 2025

https://hf.co/hexgrad/Kokoro-82M

JavaScript 1,540 154 Updated Mar 1, 2025

Excalidraw app for mac. Powered by pure SwiftUI.

Swift 376 23 Updated Mar 10, 2025

Implement a ChatGPT-like LLM in PyTorch from scratch, step by step

Jupyter Notebook 41,733 5,668 Updated Mar 9, 2025

Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.

Python 291,466 48,430 Updated Dec 2, 2024

A curated list of awesome places to learn and/or practice algorithms.

21,947 2,739 Updated Nov 16, 2024

✨✨Latest Advances on Multimodal Large Language Models

14,192 915 Updated Mar 5, 2025

Latitude is the open-source prompt engineering platform to build, evaluate, and refine your prompts with AI

TypeScript 1,720 103 Updated Mar 10, 2025

Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high …

Python 526 186 Updated Mar 10, 2025

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

Python 1,727 103 Updated Feb 27, 2025

The swiss army knife of lossless video/audio editing

TypeScript 30,008 1,430 Updated Feb 20, 2025

R1-onevision, a visual language model capable of deep CoT reasoning.

399 10 Updated Feb 28, 2025

Ultralytics YOLO11 🚀

Python 37,720 7,323 Updated Mar 11, 2025

A curated list of awesome computer vision resources

21,415 4,275 Updated May 17, 2024

Wan: Open and Advanced Large-Scale Video Generative Models

Python 7,894 812 Updated Mar 7, 2025

An efficient video loader for deep learning with smart shuffling that's super easy to digest

C++ 2,048 171 Updated Jul 17, 2024

VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling

Python 342 7 Updated Mar 9, 2025

EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL

Python 1,282 70 Updated Mar 10, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 4,574 431 Updated Mar 10, 2025

🐙 Guides, papers, lecture, notebooks and resources for prompt engineering

MDX 54,019 5,275 Updated Jan 21, 2025

Janus-Series: Unified Multimodal Understanding and Generation Models

Python 16,656 2,182 Updated Feb 1, 2025

A Zotero plugin for syncing items and notes into Notion

TypeScript 2,620 112 Updated Mar 1, 2025

Making large AI models cheaper, faster and more accessible

Python 40,573 4,477 Updated Mar 11, 2025