streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL

roboflow/maestro

maestro

coming: when it's ready...

👋 hello

maestro is a tool designed to streamline and accelerate the fine-tuning process for multimodal models. It provides ready-to-use recipes for fine-tuning popular vision-language models (VLMs) such as Florence-2, PaliGemma, and Phi-3.5 Vision on downstream vision-language tasks.

💻 install

Pip install the maestro package in a Python>=3.8 environment.

pip install maestro
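
If you are unsure whether your environment meets the Python>=3.8 requirement, a quick check like the one below can help before installing. This is only a sketch using the standard library; the helper name `python_is_supported` is hypothetical and not part of maestro.

```python
import sys

# Minimum interpreter version taken from the install note above.
MIN_PYTHON = (3, 8)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum.

    Hypothetical helper for illustration; it is not part of maestro.
    """
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if python_is_supported():
        print("Python version OK; run: pip install maestro")
    else:
        print("Python >= 3.8 is required to install maestro")
```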

🚀 example

Documentation and Florence-2 fine-tuning examples for object detection and VQA coming soon.

🚧 roadmap

  • Release a CLI for predefined fine-tuning recipes.
  • Multi-GPU fine-tuning support.
  • Allow multi-dataset fine-tuning and support multiple tasks at the same time.