Merge pull request #33 from roboflow/feature/foundations_of_training
maestro Florence-2 fine-tuning
SkalskiP authored Sep 11, 2024
2 parents 20933c6 + 3a82c11 commit ccd268c
Showing 36 changed files with 1,921 additions and 2,031 deletions.
21 changes: 21 additions & 0 deletions .pre-commit-config.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v2.3.0
    hooks:
      - id: check-yaml
      - id: end-of-file-fixer
      - id: trailing-whitespace
  - repo: https://github.com/psf/black
    rev: 24.8.0
    hooks:
      - id: black
        args: [--line-length=120]
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.11.2
    hooks:
      - id: mypy
  - repo: https://github.com/PyCQA/flake8
    rev: 7.1.1
    hooks:
      - id: flake8
        args: [--max-line-length=120]
143 changes: 31 additions & 112 deletions README.md
@@ -1,142 +1,61 @@

<div align="center">

<h1>multimodal-maestro</h1>

<br>
<h1>maestro</h1>

[![version](https://badge.fury.io/py/maestro.svg)](https://badge.fury.io/py/maestro)
[![license](https://img.shields.io/pypi/l/maestro)](https://github.com/roboflow/multimodal-maestro/blob/main/LICENSE)
[![python-version](https://img.shields.io/pypi/pyversions/maestro)](https://badge.fury.io/py/maestro)
[![Gradio](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/Roboflow/SoM)
[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/roboflow/multimodal-maestro/blob/develop/cookbooks/multimodal_maestro_gpt_4_vision.ipynb)
<p>coming: when it's ready...</p>

</div>

## 👋 hello

Multimodal-Maestro gives you more control over large multimodal models to get the
outputs you want. With more effective prompting tactics, you can get multimodal models
to do tasks you didn't know (or think!) were possible. Curious how it works? Try our
[HF space](https://huggingface.co/spaces/Roboflow/SoM)!
**maestro** is a tool designed to streamline and accelerate the fine-tuning process for
multimodal models. It provides ready-to-use recipes for fine-tuning popular
vision-language models (VLMs) such as **Florence-2**, **PaliGemma**, and
**Phi-3.5 Vision** on downstream vision-language tasks.

## 💻 install

⚠️ Our package has been renamed to `maestro`. Install the package in a
[**3.11>=Python>=3.8**](https://www.python.org/) environment.
Pip install the maestro package in a
[**Python>=3.8**](https://www.python.org/) environment.

```bash
pip install maestro
```

## 🔌 API
## 🔥 quickstart

🚧 The project is still under construction. The redesigned API is coming soon.
### CLI

![maestro-docs-Snap](https://github.com/roboflow/multimodal-maestro/assets/26109316/a787b7c0-527e-465a-9ca9-d46f4d63ea53)
VLMs can be fine-tuned on downstream tasks directly from the command line with
the `maestro` command:

## 🧑‍🍳 prompting cookbooks
```bash
maestro florence2 train --dataset='<DATASET_PATH>' --epochs=10 --batch-size=8
```

| Description | Colab |
|:----------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|
| Prompt LMMs with Multimodal Maestro | [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/roboflow/multimodal-maestro/blob/develop/cookbooks/multimodal_maestro_gpt_4_vision.ipynb) |
| Manually annotate ONE image and let GPT-4V annotate ALL of them | [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/roboflow/multimodal-maestro/blob/develop/cookbooks/grounding_dino_and_gpt4_vision.ipynb) |
### SDK

Alternatively, you can fine-tune VLMs using the Python SDK, which accepts the same
arguments as the CLI example above:

## 🚀 example
```python
from maestro.trainer.common import MeanAveragePrecisionMetric
from maestro.trainer.models.florence_2 import train, TrainingConfiguration

```
Find dog.
config = TrainingConfiguration(
    dataset='<DATASET_PATH>',
    epochs=10,
    batch_size=8,
    metrics=[MeanAveragePrecisionMetric()]
)

>>> The dog is prominently featured in the center of the image with the label [9].
train(config)
```

<details close>
<summary>👉 read more</summary>

<br>

- **load image**

```python
import cv2

image = cv2.imread("...")
```

- **create and refine marks**

```python
import maestro

generator = maestro.SegmentAnythingMarkGenerator(device='cuda')
marks = generator.generate(image=image)
marks = maestro.refine_marks(marks=marks)
```

- **visualize marks**

```python
mark_visualizer = maestro.MarkVisualizer()
marked_image = mark_visualizer.visualize(image=image, marks=marks)
```
![image-vs-marked-image](https://github.com/roboflow/multimodal-maestro/assets/26109316/92951ed2-65c0-475a-9279-6fd344757092)

- **prompt**

```python
prompt = "Find dog."

response = maestro.prompt_image(api_key=api_key, image=marked_image, prompt=prompt)
```

```
>>> "The dog is prominently featured in the center of the image with the label [9]."
```

- **extract related marks**

```python
masks = maestro.extract_relevant_masks(text=response, detections=refined_marks)
```

```
>>> {'6': array([
... [False, False, False, ..., False, False, False],
... [False, False, False, ..., False, False, False],
... [False, False, False, ..., False, False, False],
... ...,
... [ True, True, True, ..., False, False, False],
... [ True, True, True, ..., False, False, False],
... [ True, True, True, ..., False, False, False]])
... }
```

</details>

![multimodal-maestro](https://github.com/roboflow/multimodal-maestro/assets/26109316/c04f2b18-2a1d-4535-9582-e5d3ec0a926e)

## 🚧 roadmap

- [ ] Rewriting the `maestro` API.
- [ ] Update [HF space](https://huggingface.co/spaces/Roboflow/SoM).
- [ ] Documentation page.
- [ ] Add GroundingDINO prompting strategy.
- [ ] CogVLM demo.
- [ ] Qwen-VL demo.

## 💜 acknowledgement

- [Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding
in GPT-4V](https://arxiv.org/abs/2310.11441) by Jianwei Yang, Hao Zhang, Feng Li, Xueyan
Zou, Chunyuan Li, Jianfeng Gao.
- [The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)](https://arxiv.org/abs/2309.17421)
by Zhengyuan Yang, Linjie Li, Kevin Lin, Jianfeng Wang, Chung-Ching Lin, Zicheng Liu,
Lijuan Wang

## 🦸 contribution

We would love your help in making this repository even better! If you noticed any bug,
or if you have any suggestions for improvement, feel free to open an
We would love your help in making this repository even better! We are especially
looking for contributors with experience in fine-tuning vision-language models (VLMs).
If you notice any bugs or have suggestions for improvement, feel free to open an
[issue](https://github.com/roboflow/multimodal-maestro/issues) or submit a
[pull request](https://github.com/roboflow/multimodal-maestro/pulls).
1 change: 1 addition & 0 deletions maestro/cli/__init__.py
@@ -0,0 +1 @@

2 changes: 2 additions & 0 deletions maestro/cli/env.py
@@ -0,0 +1,2 @@
DISABLE_RECIPE_IMPORTS_WARNINGS_ENV = "DISABLE_RECIPE_IMPORTS_WARNINGS"
DEFAULT_DISABLE_RECIPE_IMPORTS_WARNINGS_ENV = "False"
37 changes: 37 additions & 0 deletions maestro/cli/introspection.py
@@ -0,0 +1,37 @@
import os

import typer

from maestro.cli.env import (
    DISABLE_RECIPE_IMPORTS_WARNINGS_ENV,
    DEFAULT_DISABLE_RECIPE_IMPORTS_WARNINGS_ENV,
)
from maestro.cli.utils import str2bool


def find_training_recipes(app: typer.Typer) -> None:
    try:
        from maestro.trainer.models.florence_2.entrypoint import florence_2_app

        app.add_typer(florence_2_app, name="florence2")
    except Exception:
        _warn_about_recipe_import_error(model_name="Florence 2")

    try:
        from maestro.trainer.models.paligemma.entrypoint import paligemma_app

        app.add_typer(paligemma_app, name="paligemma")
    except Exception:
        _warn_about_recipe_import_error(model_name="PaliGemma")


def _warn_about_recipe_import_error(model_name: str) -> None:
    disable_warnings = str2bool(
        os.getenv(
            DISABLE_RECIPE_IMPORTS_WARNINGS_ENV,
            DEFAULT_DISABLE_RECIPE_IMPORTS_WARNINGS_ENV,
        )
    )
    if disable_warnings:
        return None
    warning = typer.style("WARNING", fg=typer.colors.RED, bold=True)
    message = "🚧 " + warning + f" cannot import recipe for {model_name}"
    typer.echo(message)
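
The optional-recipe pattern above (try to import each recipe, warn instead of failing) can be sketched with the standard library alone. This is a minimal illustration, not maestro's code: `load_optional_modules`, the `on_failure` callback, and the module name `hypothetical_missing_recipe_module` are all assumptions made for the example.

```python
import importlib
from typing import Callable, List, Tuple


def load_optional_modules(
    module_names: List[str],
    on_failure: Callable[[str], None],
) -> Tuple[List[str], List[str]]:
    """Import each module by name; report failures instead of raising."""
    loaded: List[str] = []
    failed: List[str] = []
    for name in module_names:
        try:
            importlib.import_module(name)
            loaded.append(name)
        except Exception:  # broad catch, mirroring find_training_recipes
            on_failure(name)
            failed.append(name)
    return loaded, failed


# Hypothetical usage: "json" exists, the second module is assumed not to.
collected_warnings: List[str] = []
loaded, failed = load_optional_modules(
    ["json", "hypothetical_missing_recipe_module"],
    on_failure=lambda name: collected_warnings.append(f"cannot import recipe for {name}"),
)
print(loaded, failed, collected_warnings)
```

Catching `Exception` rather than only `ImportError` means a recipe whose dependencies import but fail during initialization is also skipped, at the cost of hiding the underlying error message.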
15 changes: 15 additions & 0 deletions maestro/cli/main.py
@@ -0,0 +1,15 @@
import typer

from maestro.cli.introspection import find_training_recipes

app = typer.Typer()
find_training_recipes(app=app)


@app.command(help="Display information about maestro")
def info():
    typer.echo("Welcome to maestro CLI. Let's train some VLM! 🏋")


if __name__ == "__main__":
    app()
2 changes: 2 additions & 0 deletions maestro/cli/utils.py
@@ -0,0 +1,2 @@
def str2bool(value: str) -> bool:
    return value.lower() in {"y", "t", "yes", "true"}
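
To make the accepted values concrete, here is a small self-contained check of `str2bool` as defined above, combined with the `DISABLE_RECIPE_IMPORTS_WARNINGS` variable from `maestro/cli/env.py`. The list of sample inputs is chosen for illustration only.

```python
import os


def str2bool(value: str) -> bool:
    # Same definition as in maestro/cli/utils.py above.
    return value.lower() in {"y", "t", "yes", "true"}


# Case is ignored, but only the four listed spellings count as true.
samples = ["true", "YES", "t", "y", "1", "on", "False", ""]
results = {sample: str2bool(sample) for sample in samples}

# Reading the toggle the same way _warn_about_recipe_import_error does.
os.environ["DISABLE_RECIPE_IMPORTS_WARNINGS"] = "True"
warnings_disabled = str2bool(os.getenv("DISABLE_RECIPE_IMPORTS_WARNINGS", "False"))

print(results)
print(warnings_disabled)
```

Note that common spellings such as `1` and `on` are treated as false by this helper, so `DISABLE_RECIPE_IMPORTS_WARNINGS=1` would not silence the warning.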
Empty file added maestro/trainer/__init__.py
Empty file.
1 change: 1 addition & 0 deletions maestro/trainer/common/__init__.py
@@ -0,0 +1 @@
from maestro.trainer.common.utils.metrics import MeanAveragePrecisionMetric
Empty file.
5 changes: 5 additions & 0 deletions maestro/trainer/common/configuration/env.py
@@ -0,0 +1,5 @@
SEED_ENV = "SEED"
DEFAULT_SEED = "42"
CUDA_DEVICE_ENV = "CUDA_DEVICE"
DEFAULT_CUDA_DEVICE = "cuda:0"
HF_TOKEN_ENV = "HF_TOKEN"
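
The defaults above are strings because environment variables are always strings. How maestro consumes them is not shown in this diff; the sketch below is one plausible way to resolve them, and the helper names `resolve_seed` and `resolve_device_name` are hypothetical.

```python
import os

# Constants copied from maestro/trainer/common/configuration/env.py above.
SEED_ENV = "SEED"
DEFAULT_SEED = "42"
CUDA_DEVICE_ENV = "CUDA_DEVICE"
DEFAULT_CUDA_DEVICE = "cuda:0"


def resolve_seed() -> int:
    # Hypothetical helper: fall back to the default, then convert to int.
    return int(os.getenv(SEED_ENV, DEFAULT_SEED))


def resolve_device_name() -> str:
    # Hypothetical helper: returns a device string such as "cuda:0".
    return os.getenv(CUDA_DEVICE_ENV, DEFAULT_CUDA_DEVICE)


print(resolve_seed(), resolve_device_name())
```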
Empty file.
50 changes: 50 additions & 0 deletions maestro/trainer/common/data_loaders/datasets.py
@@ -0,0 +1,50 @@
import json
import os
from typing import List, Dict, Any, Tuple

from PIL import Image
from transformers.pipelines.base import Dataset


class JSONLDataset:
    def __init__(self, jsonl_file_path: str, image_directory_path: str):
        self.jsonl_file_path = jsonl_file_path
        self.image_directory_path = image_directory_path
        self.entries = self._load_entries()

    def _load_entries(self) -> List[Dict[str, Any]]:
        entries = []
        with open(self.jsonl_file_path, "r") as file:
            for line in file:
                data = json.loads(line)
                entries.append(data)
        return entries

    def __len__(self) -> int:
        return len(self.entries)

    def __getitem__(self, idx: int) -> Tuple[Image.Image, Dict[str, Any]]:
        if idx < 0 or idx >= len(self.entries):
            raise IndexError("Index out of range")

        entry = self.entries[idx]
        image_path = os.path.join(self.image_directory_path, entry["image"])
        try:
            image = Image.open(image_path)
            return (image, entry)
        except FileNotFoundError:
            raise FileNotFoundError(f"Image file {image_path} not found.")


class DetectionDataset(Dataset):
    def __init__(self, jsonl_file_path: str, image_directory_path: str):
        self.dataset = JSONLDataset(jsonl_file_path, image_directory_path)

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        image, data = self.dataset[idx]
        prefix = data["prefix"]
        suffix = data["suffix"]
        return prefix, suffix, image
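
The code above implies an annotation file with one JSON object per line, each carrying `image`, `prefix`, and `suffix` keys. The sketch below writes and re-reads such a file using only the standard library. The file name `annotations.jsonl`, the image names, and the `prefix`/`suffix` values are placeholders invented for illustration, not data from the repository.

```python
import json
import os
import tempfile

# Placeholder entries: only the three keys are taken from the diff above.
entries = [
    {"image": "0001.jpg", "prefix": "<OD>", "suffix": "dog<loc_120><loc_80><loc_640><loc_720>"},
    {"image": "0002.jpg", "prefix": "<OD>", "suffix": "cat<loc_10><loc_20><loc_300><loc_400>"},
]

with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "annotations.jsonl")

    # One JSON document per line, which is what _load_entries expects.
    with open(path, "w") as file:
        for entry in entries:
            file.write(json.dumps(entry) + "\n")

    # Same parsing loop as JSONLDataset._load_entries.
    with open(path, "r") as file:
        loaded = [json.loads(line) for line in file]

# DetectionDataset returns (prefix, suffix, image); the image is replaced
# by its file name here so the example does not need PIL or real images.
triples = [(entry["prefix"], entry["suffix"], entry["image"]) for entry in loaded]
print(triples)
```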
31 changes: 31 additions & 0 deletions maestro/trainer/common/data_loaders/jsonl.py
@@ -0,0 +1,31 @@
from __future__ import annotations

import random
from typing import List

from torch.utils.data import Dataset

from maestro.trainer.common.utils.file_system import read_jsonl


class JSONLDataset(Dataset):
    # TODO: implementation could be better - avoiding loading
    #  whole files to memory

    @classmethod
    def from_jsonl_file(cls, path: str) -> JSONLDataset:
        file_content = read_jsonl(path=path)
        random.shuffle(file_content)
        return cls(jsons=file_content)

    def __init__(self, jsons: List[dict]):
        self.jsons = jsons

    def __getitem__(self, index):
        return self.jsons[index]

    def __len__(self):
        return len(self.jsons)

    def shuffle(self):
        random.shuffle(self.jsons)
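
The TODO above mentions avoiding loading whole files into memory. One possible approach, sketched here under the assumption that records are newline-delimited, is to index the byte offset of each line once and parse a record only when it is requested. `LazyJSONLDataset` is a hypothetical class written for this example; it is not part of maestro and deliberately does not depend on `torch`.

```python
import json
import os
import tempfile
from typing import Any, Dict, List


class LazyJSONLDataset:
    """Keeps only line offsets in memory; parses records on access."""

    def __init__(self, path: str):
        self.path = path
        self.offsets: List[int] = []
        offset = 0
        # Binary mode so that offsets are exact byte positions.
        with open(path, "rb") as file:
            for line in file:
                if line.strip():  # skip blank lines
                    self.offsets.append(offset)
                offset += len(line)

    def __len__(self) -> int:
        return len(self.offsets)

    def __getitem__(self, index: int) -> Dict[str, Any]:
        # Raises IndexError for out-of-range indices via the list lookup.
        with open(self.path, "rb") as file:
            file.seek(self.offsets[index])
            return json.loads(file.readline().decode("utf-8"))


with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "annotations.jsonl")
    with open(path, "w", encoding="utf-8") as file:
        for i in range(3):
            file.write(json.dumps({"id": i}) + "\n")

    dataset = LazyJSONLDataset(path)
    size = len(dataset)
    second = dataset[1]

print(size, second)
```

Shuffling in this design would reorder `self.offsets` instead of the records themselves. The trade-off is one file open and seek per access, which may matter for very small records.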
Empty file.