maestro is a tool designed to streamline and accelerate the fine-tuning process for multimodal models. It provides ready-to-use recipes for fine-tuning popular vision-language models (VLMs) such as Florence-2, PaliGemma, and Phi-3.5 Vision on downstream vision-language tasks.
Install the maestro package with pip in a Python>=3.8 environment.
pip install maestro
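To confirm the installation, a quick check like the following can help. This is a minimal sketch using only the Python standard library; the helper name `check_environment` is hypothetical and not part of maestro, and it assumes the distribution is registered under the name `maestro`, as in the pip command above.

```python
import sys
from importlib import metadata


def check_environment(package="maestro", min_python=(3, 8)):
    """Return (python_ok, installed_version_or_None) for the given package.

    `check_environment` is an illustrative helper, not a maestro API.
    """
    # Compare the running interpreter against the minimum supported version.
    python_ok = sys.version_info >= min_python
    try:
        # Look up the installed distribution's version string.
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        # The package is not installed in this environment.
        version = None
    return python_ok, version
```

Calling `check_environment()` after installation should return a version string in the second position; `None` there suggests the package was installed into a different environment.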
Documentation and Florence-2 fine-tuning examples for object detection and visual question answering (VQA) are coming soon.
- Release a CLI for predefined fine-tuning recipes.
- Add multi-GPU fine-tuning support.
- Allow multi-dataset fine-tuning and support multiple tasks at the same time.