maestro Florence-2 fine-tuning #33

Merged: 76 commits, Sep 11, 2024
Changes from 72 commits
a494bd2
Add initial setup
PawelPeczek-Roboflow Aug 26, 2024
b3eee67
Add basic utils
PawelPeczek-Roboflow Aug 26, 2024
8a0713e
Add basic training utils
PawelPeczek-Roboflow Aug 26, 2024
2d6e4cf
Fix wrong typing
PawelPeczek-Roboflow Aug 27, 2024
aded4dd
use reproducibility utils
PawelPeczek-Roboflow Aug 27, 2024
263b7fe
Add basic utils for florence
PawelPeczek-Roboflow Aug 29, 2024
f9abed8
Fix prompting
PawelPeczek-Roboflow Aug 29, 2024
c65fc44
Fix prompting
PawelPeczek-Roboflow Aug 29, 2024
0c83e90
Fix prompting
PawelPeczek-Roboflow Aug 29, 2024
9fbfc10
Fix prompting
PawelPeczek-Roboflow Aug 29, 2024
453d4d1
Fix prompting
PawelPeczek-Roboflow Aug 29, 2024
45a04da
Fix prompting
PawelPeczek-Roboflow Aug 29, 2024
13d9d36
Fix prompting
PawelPeczek-Roboflow Aug 29, 2024
e4e9fee
Scratch of training loop
PawelPeczek-Roboflow Aug 29, 2024
785014c
Fix bug
PawelPeczek-Roboflow Aug 29, 2024
bf2b3e0
Fix bug
PawelPeczek-Roboflow Aug 29, 2024
d21333e
Fix bug
PawelPeczek-Roboflow Aug 29, 2024
b870667
Fix loss display
PawelPeczek-Roboflow Aug 29, 2024
d4bb6f9
Fix bug with model loading
PawelPeczek-Roboflow Aug 29, 2024
4a24e97
Add training summary
PawelPeczek-Roboflow Aug 29, 2024
44a2d0b
Fix visualisation bug
PawelPeczek-Roboflow Aug 29, 2024
5048079
Fix visualisation bug
PawelPeczek-Roboflow Aug 29, 2024
7c62cae
Add metrics plots
PawelPeczek-Roboflow Aug 29, 2024
783dc88
Add metrics plots
PawelPeczek-Roboflow Aug 29, 2024
dbe7593
Add metrics plots
PawelPeczek-Roboflow Aug 29, 2024
2fd4d7d
Fix minor issues
PawelPeczek-Roboflow Aug 29, 2024
6ff1926
fix: current Florence-2 training pipeline is missing `timm`
SkalskiP Sep 3, 2024
47aef06
update maestro `README.md`
SkalskiP Sep 4, 2024
aa0708c
Merge pull request #31 from roboflow/feature/foundations_of_training_…
SkalskiP Sep 4, 2024
50b4876
Merge pull request #32 from roboflow/feature/foundations_of_training_…
SkalskiP Sep 4, 2024
36251e5
wip
SkalskiP Sep 4, 2024
5f025af
ready for test
SkalskiP Sep 4, 2024
166f4ec
small fix
SkalskiP Sep 4, 2024
7ab38eb
ready for test
SkalskiP Sep 4, 2024
e0eca6b
Merge pull request #34 from roboflow/feature/foundations_of_training_…
SkalskiP Sep 5, 2024
8219f5f
initial refactor
SkalskiP Sep 5, 2024
1b4a224
small fix
SkalskiP Sep 5, 2024
ee1a6fa
Add first scratch of implementation for maestro CLI
PawelPeczek-Roboflow Sep 5, 2024
9e52912
up
SkalskiP Sep 9, 2024
4c3fbd0
test mAP metric
SkalskiP Sep 9, 2024
ad6e6c9
mAP refactor
SkalskiP Sep 9, 2024
3580ec8
small fix
SkalskiP Sep 9, 2024
4543159
debug
SkalskiP Sep 9, 2024
3fe24ce
debug
SkalskiP Sep 9, 2024
e1554e7
debug
SkalskiP Sep 9, 2024
5b42aec
retest
SkalskiP Sep 9, 2024
40a8bfa
debug
SkalskiP Sep 9, 2024
1db75ad
ready for test
SkalskiP Sep 9, 2024
bc0aff5
debug
SkalskiP Sep 9, 2024
2498de9
am I this stupid?
SkalskiP Sep 9, 2024
d363c66
clean up
SkalskiP Sep 9, 2024
9aedb29
test validation set result render
SkalskiP Sep 9, 2024
bb44d11
updated results display
SkalskiP Sep 9, 2024
ec2a324
cleanup
SkalskiP Sep 10, 2024
2723987
test new checkpoint management system
SkalskiP Sep 10, 2024
27c3cfd
more cleanup
SkalskiP Sep 10, 2024
758c72a
Merge pull request #36 from roboflow/feature/foundations_of_training_…
SkalskiP Sep 10, 2024
3e00b40
Merge branch 'feature/foundations_of_training' into feature/foundatio…
SkalskiP Sep 10, 2024
dad39ba
TrainingConfiguration field names refactor
SkalskiP Sep 10, 2024
672f27e
final tests before plugging in CLI
SkalskiP Sep 10, 2024
4a339a4
initial tests of CLI mode
SkalskiP Sep 10, 2024
c7c63b7
fix
SkalskiP Sep 10, 2024
5cc4220
fix `No such option: --mode `
SkalskiP Sep 10, 2024
518323c
fix 2 `No such option: --mode `
SkalskiP Sep 10, 2024
fb212ea
fix 3 `No such option: --mode `
SkalskiP Sep 10, 2024
f15b7a9
fix 4 `No such option: --mode `
SkalskiP Sep 10, 2024
566d9ca
fix 5 `No such option: --mode `
SkalskiP Sep 10, 2024
fb1c826
fix 6 `No such option: --mode `
SkalskiP Sep 10, 2024
d556a88
bring back Pawel's code with improvements
SkalskiP Sep 10, 2024
f46049e
remove Literal from command definitions
SkalskiP Sep 10, 2024
a2850ac
remove Union from command definitions
SkalskiP Sep 10, 2024
278918c
Merge pull request #35 from roboflow/feature/foundations_of_cli
SkalskiP Sep 10, 2024
d614d25
initial evaluate implementation
SkalskiP Sep 11, 2024
5aba660
adding `quickstart` section to `README.md`
SkalskiP Sep 11, 2024
50751d5
plug in `mean_average_precision` to Florence-2 CLI
SkalskiP Sep 11, 2024
3a82c11
Merge pull request #38 from roboflow/feature/florence_2_evaluate
SkalskiP Sep 11, 2024
21 changes: 21 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,21 @@
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v2.3.0
    hooks:
      - id: check-yaml
      - id: end-of-file-fixer
      - id: trailing-whitespace
  - repo: https://github.com/psf/black
    rev: 24.8.0
    hooks:
      - id: black
        args: [--line-length=120]
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.11.2
    hooks:
      - id: mypy
  - repo: https://github.com/PyCQA/flake8
    rev: 7.1.1
    hooks:
      - id: flake8
        args: [--max-line-length=120]
133 changes: 13 additions & 120 deletions README.md
@@ -1,142 +1,35 @@

<div align="center">

<h1>multimodal-maestro</h1>
<h1>maestro</h1>

<br>

[![version](https://badge.fury.io/py/maestro.svg)](https://badge.fury.io/py/maestro)
[![license](https://img.shields.io/pypi/l/maestro)](https://github.com/roboflow/multimodal-maestro/blob/main/LICENSE)
[![python-version](https://img.shields.io/pypi/pyversions/maestro)](https://badge.fury.io/py/maestro)
[![Gradio](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/Roboflow/SoM)
[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/roboflow/multimodal-maestro/blob/develop/cookbooks/multimodal_maestro_gpt_4_vision.ipynb)
<p>coming: when it's ready...</p>

</div>

## 👋 hello

Multimodal-Maestro gives you more control over large multimodal models to get the
outputs you want. With more effective prompting tactics, you can get multimodal models
to do tasks you didn't know (or think!) were possible. Curious how it works? Try our
[HF space](https://huggingface.co/spaces/Roboflow/SoM)!
**maestro** is a tool designed to streamline and accelerate the fine-tuning process for
multimodal models. It provides ready-to-use recipes for fine-tuning popular
vision-language models (VLMs) such as **Florence-2**, **PaliGemma**, and
**Phi-3.5 Vision** on downstream vision-language tasks.

## 💻 install

⚠️ Our package has been renamed to `maestro`. Install the package in a
[**3.11>=Python>=3.8**](https://www.python.org/) environment.
Pip install the maestro package in a
[**Python>=3.8**](https://www.python.org/) environment.

```bash
pip install maestro
```

## 🔌 API

🚧 The project is still under construction. The redesigned API is coming soon.

![maestro-docs-Snap](https://github.com/roboflow/multimodal-maestro/assets/26109316/a787b7c0-527e-465a-9ca9-d46f4d63ea53)

## 🧑‍🍳 prompting cookbooks

| Description | Colab |
|:----------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|
| Prompt LMMs with Multimodal Maestro | [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/roboflow/multimodal-maestro/blob/develop/cookbooks/multimodal_maestro_gpt_4_vision.ipynb) |
| Manually annotate ONE image and let GPT-4V annotate ALL of them | [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/roboflow/multimodal-maestro/blob/develop/cookbooks/grounding_dino_and_gpt4_vision.ipynb) |


## 🚀 example

```
Find dog.

>>> The dog is prominently featured in the center of the image with the label [9].
```

<details close>
<summary>👉 read more</summary>

<br>

- **load image**

```python
import cv2

image = cv2.imread("...")
```

- **create and refine marks**

```python
import maestro

generator = maestro.SegmentAnythingMarkGenerator(device='cuda')
marks = generator.generate(image=image)
marks = maestro.refine_marks(marks=marks)
```

- **visualize marks**

```python
mark_visualizer = maestro.MarkVisualizer()
marked_image = mark_visualizer.visualize(image=image, marks=marks)
```
![image-vs-marked-image](https://github.com/roboflow/multimodal-maestro/assets/26109316/92951ed2-65c0-475a-9279-6fd344757092)

- **prompt**

```python
prompt = "Find dog."

response = maestro.prompt_image(api_key=api_key, image=marked_image, prompt=prompt)
```

```
>>> "The dog is prominently featured in the center of the image with the label [9]."
```

- **extract related marks**

```python
masks = maestro.extract_relevant_masks(text=response, detections=marks)
```

```
>>> {'6': array([
... [False, False, False, ..., False, False, False],
... [False, False, False, ..., False, False, False],
... [False, False, False, ..., False, False, False],
... ...,
... [ True, True, True, ..., False, False, False],
... [ True, True, True, ..., False, False, False],
... [ True, True, True, ..., False, False, False]])
... }
```

</details>

![multimodal-maestro](https://github.com/roboflow/multimodal-maestro/assets/26109316/c04f2b18-2a1d-4535-9582-e5d3ec0a926e)
Documentation and Florence-2 fine-tuning examples for object detection and VQA coming
soon.

## 🚧 roadmap

- [ ] Rewriting the `maestro` API.
- [ ] Update [HF space](https://huggingface.co/spaces/Roboflow/SoM).
- [ ] Documentation page.
- [ ] Add GroundingDINO prompting strategy.
- [ ] CogVLM demo.
- [ ] Qwen-VL demo.

## 💜 acknowledgement

- [Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding
in GPT-4V](https://arxiv.org/abs/2310.11441) by Jianwei Yang, Hao Zhang, Feng Li, Xueyan
Zou, Chunyuan Li, Jianfeng Gao.
- [The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)](https://arxiv.org/abs/2309.17421)
by Zhengyuan Yang, Linjie Li, Kevin Lin, Jianfeng Wang, Chung-Ching Lin, Zicheng Liu,
Lijuan Wang

## 🦸 contribution

We would love your help in making this repository even better! If you noticed any bug,
or if you have any suggestions for improvement, feel free to open an
[issue](https://github.com/roboflow/multimodal-maestro/issues) or submit a
[pull request](https://github.com/roboflow/multimodal-maestro/pulls).
- [ ] Release a CLI for predefined fine-tuning recipes.
- [ ] Multi-GPU fine-tuning support.
- [ ] Allow multi-dataset fine-tuning and support multiple tasks at the same time.
1 change: 1 addition & 0 deletions maestro/cli/__init__.py
@@ -0,0 +1 @@

2 changes: 2 additions & 0 deletions maestro/cli/env.py
@@ -0,0 +1,2 @@
DISABLE_RECIPE_IMPORTS_WARNINGS_ENV = "DISABLE_RECIPE_IMPORTS_WARNINGS"
DEFAULT_DISABLE_RECIPE_IMPORTS_WARNINGS_ENV = "False"
37 changes: 37 additions & 0 deletions maestro/cli/introspection.py
@@ -0,0 +1,37 @@
import os

import typer

from maestro.cli.env import DISABLE_RECIPE_IMPORTS_WARNINGS_ENV, \
    DEFAULT_DISABLE_RECIPE_IMPORTS_WARNINGS_ENV
from maestro.cli.utils import str2bool


def find_training_recipes(app: typer.Typer) -> None:
    try:
        from maestro.trainer.models.florence_2.entrypoint import florence_2_app

        app.add_typer(florence_2_app, name="florence2")
    except Exception:
        _warn_about_recipe_import_error(model_name="Florence 2")

    try:
        from maestro.trainer.models.paligemma.entrypoint import paligemma_app

        app.add_typer(paligemma_app, name="paligemma")
    except Exception:
        _warn_about_recipe_import_error(model_name="PaliGemma")


def _warn_about_recipe_import_error(model_name: str) -> None:
    disable_warnings = str2bool(
        os.getenv(
            DISABLE_RECIPE_IMPORTS_WARNINGS_ENV,
            DEFAULT_DISABLE_RECIPE_IMPORTS_WARNINGS_ENV,
        )
    )
    if disable_warnings:
        return None
    warning = typer.style("WARNING", fg=typer.colors.RED, bold=True)
    message = "🚧 " + warning + f" cannot import recipe for {model_name}"
    typer.echo(message)
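The registration pattern in `find_training_recipes` (try to import each recipe, and on failure warn unless an environment variable disables the warning) can be illustrated without `typer`. The following is a minimal standard-library sketch, not the PR's code: `load_optional_modules` and the module names passed to it are hypothetical, and only the environment variable name and `str2bool` come from the diff.

```python
import importlib
import os
from types import ModuleType
from typing import Dict, List

DISABLE_RECIPE_IMPORTS_WARNINGS_ENV = "DISABLE_RECIPE_IMPORTS_WARNINGS"


def str2bool(value: str) -> bool:
    return value.lower() in {"y", "t", "yes", "true"}


def load_optional_modules(module_names: List[str]) -> Dict[str, ModuleType]:
    """Import each module if possible and return the ones that loaded (hypothetical helper)."""
    loaded: Dict[str, ModuleType] = {}
    for name in module_names:
        try:
            loaded[name] = importlib.import_module(name)
        except Exception:
            # Mirror the diff: stay silent only when the env var is set to a truthy string.
            if not str2bool(os.getenv(DISABLE_RECIPE_IMPORTS_WARNINGS_ENV, "False")):
                print(f"WARNING cannot import recipe for {name}")
    return loaded


# "json" ships with Python; the second name is deliberately bogus.
recipes = load_optional_modules(["json", "hypothetical_missing_recipe"])
print(sorted(recipes))
```

A missing optional dependency therefore hides one sub-command instead of breaking the whole CLI. The trade-off is that `except Exception` also swallows genuine bugs inside a recipe module, so the warning is the only signal that something went wrong.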
15 changes: 15 additions & 0 deletions maestro/cli/main.py
@@ -0,0 +1,15 @@
import typer

from maestro.cli.introspection import find_training_recipes

app = typer.Typer()
find_training_recipes(app=app)


@app.command(help="Display information about maestro")
def info():
    typer.echo("Welcome to maestro CLI. Let's train some VLM! 🏋")


if __name__ == "__main__":
    app()
2 changes: 2 additions & 0 deletions maestro/cli/utils.py
@@ -0,0 +1,2 @@
def str2bool(value: str) -> bool:
    return value.lower() in {"y", "t", "yes", "true"}
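A quick check of which strings `str2bool` accepts may help when setting `DISABLE_RECIPE_IMPORTS_WARNINGS`. The function below is copied from the diff; the list of sample inputs is only illustrative.

```python
def str2bool(value: str) -> bool:
    return value.lower() in {"y", "t", "yes", "true"}


# Matching is case-insensitive, but "1", "on" and values with surrounding whitespace are not accepted.
samples = ["True", "YES", "t", "1", "on", " true", "false", ""]
truthy = [value for value in samples if str2bool(value)]
print(truthy)
```

Only `"True"`, `"YES"` and `"t"` pass here, so a shell export such as `DISABLE_RECIPE_IMPORTS_WARNINGS=1` would leave the warnings enabled.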
Empty file added maestro/trainer/__init__.py
Empty file.
Empty file.
Empty file.
5 changes: 5 additions & 0 deletions maestro/trainer/common/configuration/env.py
@@ -0,0 +1,5 @@
SEED_ENV = "SEED"
DEFAULT_SEED = "42"
CUDA_DEVICE_ENV = "CUDA_DEVICE"
DEFAULT_CUDA_DEVICE = "cuda:0"
HF_TOKEN_ENV = "HF_TOKEN"
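This file only declares variable names and string defaults; the code that reads them is not part of this section. As a sketch of how such constants are typically consumed, assuming a plain mapping lookup: `resolve_seed` and `resolve_device` are hypothetical helpers, not functions from the PR.

```python
import os
from typing import Mapping

SEED_ENV = "SEED"
DEFAULT_SEED = "42"
CUDA_DEVICE_ENV = "CUDA_DEVICE"
DEFAULT_CUDA_DEVICE = "cuda:0"


def resolve_seed(env: Mapping[str, str] = os.environ) -> int:
    # Defaults are kept as strings because environment values are always strings.
    return int(env.get(SEED_ENV, DEFAULT_SEED))


def resolve_device(env: Mapping[str, str] = os.environ) -> str:
    return env.get(CUDA_DEVICE_ENV, DEFAULT_CUDA_DEVICE)


# Explicit mappings are passed so the example does not depend on the real environment.
print(resolve_seed({}), resolve_device({"CUDA_DEVICE": "cuda:1"}))
```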
Empty file.
50 changes: 50 additions & 0 deletions maestro/trainer/common/data_loaders/datasets.py
@@ -0,0 +1,50 @@
import json
import os
from typing import List, Dict, Any, Tuple

from PIL import Image
# Import Dataset from torch directly instead of relying on the re-export in transformers.pipelines.base.
from torch.utils.data import Dataset


class JSONLDataset:
    def __init__(self, jsonl_file_path: str, image_directory_path: str):
        self.jsonl_file_path = jsonl_file_path
        self.image_directory_path = image_directory_path
        self.entries = self._load_entries()

    def _load_entries(self) -> List[Dict[str, Any]]:
        entries = []
        with open(self.jsonl_file_path, "r") as file:
            for line in file:
                data = json.loads(line)
                entries.append(data)
        return entries

    def __len__(self) -> int:
        return len(self.entries)

    def __getitem__(self, idx: int) -> Tuple[Image.Image, Dict[str, Any]]:
        if idx < 0 or idx >= len(self.entries):
            raise IndexError("Index out of range")

        entry = self.entries[idx]
        image_path = os.path.join(self.image_directory_path, entry["image"])
        try:
            image = Image.open(image_path)
            return (image, entry)
        except FileNotFoundError:
            raise FileNotFoundError(f"Image file {image_path} not found.")


class DetectionDataset(Dataset):
Review comment (Collaborator Author): I think it is more Florence-2 Dataset than Detection Dataset. I assume we won't change that structure for other tasks?

Reply (Collaborator): yeah, let's stick to model-datasets for now

    def __init__(self, jsonl_file_path: str, image_directory_path: str):
        self.dataset = JSONLDataset(jsonl_file_path, image_directory_path)

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        image, data = self.dataset[idx]
        prefix = data["prefix"]
        suffix = data["suffix"]
        return prefix, suffix, image
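`DetectionDataset` expects every JSONL line to be an object with `image`, `prefix` and `suffix` keys. The sketch below writes one such line and reads it back. The record values are made up for illustration (the `<OD>` prompt and `<loc_...>` tokens follow the Florence-2 convention, but this annotation is not from the PR), `iter_samples` is a hypothetical helper, and it yields the image path instead of opening the file with PIL so the example has no third-party dependencies.

```python
import json
import os
import tempfile
from typing import Iterator, Tuple


def iter_samples(jsonl_file_path: str, image_directory_path: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (prefix, suffix, image_path) for each non-empty JSONL line (hypothetical helper)."""
    with open(jsonl_file_path, "r") as file:
        for line in file:
            if not line.strip():
                continue
            entry = json.loads(line)
            image_path = os.path.join(image_directory_path, entry["image"])
            yield entry["prefix"], entry["suffix"], image_path


with tempfile.TemporaryDirectory() as root:
    annotations_path = os.path.join(root, "annotations.jsonl")
    # Illustrative record only; the file name and annotation are invented.
    record = {"image": "0001.jpg", "prefix": "<OD>", "suffix": "dog<loc_100><loc_200><loc_500><loc_800>"}
    with open(annotations_path, "w") as file:
        file.write(json.dumps(record) + "\n")
    samples = list(iter_samples(annotations_path, root))

print(samples[0][:2])
```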
31 changes: 31 additions & 0 deletions maestro/trainer/common/data_loaders/jsonl.py
@@ -0,0 +1,31 @@
from __future__ import annotations

import random
from typing import List

from torch.utils.data import Dataset

from maestro.trainer.common.utils.file_system import read_jsonl


class JSONLDataset(Dataset):
Review comment (Collaborator Author): I'm pretty sure we do not need shuffle. torch DataLoader allows you to pass shuffle=True

Review comment (Collaborator Author): It also feels like JSONLDataset should be unified across all models.

Reply (Collaborator): 👍

    # TODO: implementation could be better - avoiding loading
    #  whole files to memory

    @classmethod
    def from_jsonl_file(cls, path: str) -> JSONLDataset:
        file_content = read_jsonl(path=path)
        random.shuffle(file_content)
        return cls(jsons=file_content)

    def __init__(self, jsons: List[dict]):
        self.jsons = jsons

    def __getitem__(self, index):
        return self.jsons[index]

    def __len__(self):
        return len(self.jsons)

    def shuffle(self):
        random.shuffle(self.jsons)
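The TODO in `JSONLDataset` mentions avoiding loading whole files into memory. One possible approach, which is not part of this PR, is to record the byte offset of each line once and parse a line only when it is requested. `LazyJSONLDataset` below is a hypothetical sketch written as a plain class to stay dependency-free; because it implements `__len__` and `__getitem__`, it should also be usable as a map-style dataset, with shuffling delegated to the data loader as suggested in the review.

```python
import json
import os
import tempfile
from typing import List


class LazyJSONLDataset:
    """Map-style dataset that keeps only line offsets in memory (hypothetical sketch)."""

    def __init__(self, path: str):
        self.path = path
        self.offsets: List[int] = []
        # Binary mode keeps offsets in bytes, so seek() later lands on exact line starts.
        with open(path, "rb") as file:
            offset = 0
            for line in file:
                if line.strip():  # skip blank lines such as a trailing newline
                    self.offsets.append(offset)
                offset += len(line)

    def __len__(self) -> int:
        return len(self.offsets)

    def __getitem__(self, index: int) -> dict:
        with open(self.path, "rb") as file:
            file.seek(self.offsets[index])
            return json.loads(file.readline().decode("utf-8"))


with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, "data.jsonl")
    with open(path, "w", encoding="utf-8") as file:
        for i in range(3):
            file.write(json.dumps({"id": i}) + "\n")
    dataset = LazyJSONLDataset(path)
    size, last = len(dataset), dataset[2]

print(size, last)
```

The cost is one file open and one parse per access, which trades throughput for memory; for small annotation files the eager version in the diff is likely simpler and fast enough.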
Empty file.
59 changes: 59 additions & 0 deletions maestro/trainer/common/utils/file_system.py
@@ -0,0 +1,59 @@
import json
import os
from glob import glob
from typing import Union, List


def read_jsonl(path: str) -> List[dict]:
    file_lines = read_file(
        path=path,
        split_lines=True,
    )
    # Skip blank lines: a trailing newline otherwise yields an empty string that json.loads rejects.
    return [json.loads(line) for line in file_lines if line.strip()]


def read_file(
    path: str,
    split_lines: bool = False,
    strip_white_spaces: bool = False,
    line_separator: str = "\n",
) -> Union[str, List[str]]:
    with open(path, "r") as f:
        file_content = f.read()
    if strip_white_spaces:
        file_content = file_content.strip()
    if not split_lines:
        return file_content
    lines = file_content.split(line_separator)
    if not strip_white_spaces:
        return lines
    return [line.strip() for line in lines]


def save_json(path: str, content: dict) -> None:
    ensure_parent_dir_exists(path=path)
    with open(path, "w") as f:
        json.dump(content, f, indent=4)


def ensure_parent_dir_exists(path: str) -> None:
    parent_dir = os.path.dirname(os.path.abspath(path))
    os.makedirs(parent_dir, exist_ok=True)


def create_new_run_directory(base_output_dir: str) -> str:
    """
    Creates a new numbered directory for the current training run.

    Args:
        base_output_dir (str): The base directory where all run directories are stored.

    Returns:
        str: The path to the newly created run directory.
    """
    base_output_dir = os.path.abspath(base_output_dir)
    existing_run_dirs = [d for d in glob(os.path.join(base_output_dir, "*")) if os.path.isdir(d)]
    new_run_number = len(existing_run_dirs) + 1
    new_run_dir = os.path.join(base_output_dir, str(new_run_number))
    os.makedirs(new_run_dir, exist_ok=True)
    return new_run_dir
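`create_new_run_directory` numbers a run as the count of existing subdirectories plus one. If an earlier run directory is deleted, or the base directory contains non-numeric folders, that number can match an existing run, and `exist_ok=True` then reuses the directory without complaint. A possible alternative, shown only as a sketch and not taken from the PR, is to take the highest numeric name plus one; `next_run_directory` is a hypothetical name.

```python
import os
import tempfile


def next_run_directory(base_output_dir: str) -> str:
    """Create and return <base>/<N>, where N is one above the highest numeric subdirectory."""
    base_output_dir = os.path.abspath(base_output_dir)
    os.makedirs(base_output_dir, exist_ok=True)
    run_numbers = [
        int(entry.name)
        for entry in os.scandir(base_output_dir)
        if entry.is_dir() and entry.name.isdecimal()
    ]
    new_run_dir = os.path.join(base_output_dir, str(max(run_numbers, default=0) + 1))
    # exist_ok=False: fail loudly rather than silently reuse an existing run directory.
    os.makedirs(new_run_dir, exist_ok=False)
    return new_run_dir


with tempfile.TemporaryDirectory() as base:
    first = next_run_directory(base)
    second = next_run_directory(base)
    os.rmdir(first)  # simulate a deleted run; counting directories would now produce "2" again
    third = next_run_directory(base)
    names = [os.path.basename(path) for path in (first, second, third)]

print(names)
```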