Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #33 from roboflow/feature/foundations_of_training
maestro Florence-2 fine-tuning
Showing 36 changed files with 1,921 additions and 2,031 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
repos: | ||
- repo: https://github.com/pre-commit/pre-commit-hooks | ||
rev: v2.3.0 | ||
hooks: | ||
- id: check-yaml | ||
- id: end-of-file-fixer | ||
- id: trailing-whitespace | ||
- repo: https://github.com/psf/black | ||
rev: 24.8.0 | ||
hooks: | ||
- id: black | ||
args: [--line-length=120] | ||
- repo: https://github.com/pre-commit/mirrors-mypy | ||
rev: v1.11.2 | ||
hooks: | ||
- id: mypy | ||
- repo: https://github.com/PyCQA/flake8 | ||
rev: 7.1.1 | ||
hooks: | ||
- id: flake8 | ||
args: [--max-line-length=120] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,142 +1,61 @@ | ||
|
||
<div align="center"> | ||
|
||
<h1>multimodal-maestro</h1> | ||
|
||
<br> | ||
<h1>maestro</h1> | ||
|
||
[![version](https://badge.fury.io/py/maestro.svg)](https://badge.fury.io/py/maestro) | ||
[![license](https://img.shields.io/pypi/l/maestro)](https://github.com/roboflow/multimodal-maestro/blob/main/LICENSE) | ||
[![python-version](https://img.shields.io/pypi/pyversions/maestro)](https://badge.fury.io/py/maestro) | ||
[![Gradio](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/Roboflow/SoM) | ||
[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/roboflow/multimodal-maestro/blob/develop/cookbooks/multimodal_maestro_gpt_4_vision.ipynb) | ||
<p>coming: when it's ready...</p> | ||
|
||
</div> | ||
|
||
## 👋 hello | ||
|
||
Multimodal-Maestro gives you more control over large multimodal models to get the | ||
outputs you want. With more effective prompting tactics, you can get multimodal models | ||
to do tasks you didn't know (or think!) were possible. Curious how it works? Try our | ||
[HF space](https://huggingface.co/spaces/Roboflow/SoM)! | ||
**maestro** is a tool designed to streamline and accelerate the fine-tuning process for | ||
multimodal models. It provides ready-to-use recipes for fine-tuning popular | ||
vision-language models (VLMs) such as **Florence-2**, **PaliGemma**, and | ||
**Phi-3.5 Vision** on downstream vision-language tasks. | ||
|
||
## 💻 install | ||
|
||
⚠️ Our package has been renamed to `maestro`. Install the package in a | ||
[**3.11>=Python>=3.8**](https://www.python.org/) environment. | ||
Pip install the maestro package in a | ||
[**Python>=3.8**](https://www.python.org/) environment. | ||
|
||
```bash | ||
pip install maestro | ||
``` | ||
|
||
## 🔌 API | ||
## 🔥 quickstart | ||
|
||
🚧 The project is still under construction. The redesigned API is coming soon. | ||
### CLI | ||
|
||
![maestro-docs-Snap](https://github.com/roboflow/multimodal-maestro/assets/26109316/a787b7c0-527e-465a-9ca9-d46f4d63ea53) | ||
VLMs can be fine-tuned on downstream tasks directly from the command line with the | ||
`maestro` command: | ||
|
||
## 🧑🍳 prompting cookbooks | ||
```bash | ||
maestro florence2 train --dataset='<DATASET_PATH>' --epochs=10 --batch-size=8 | ||
``` | ||
|
||
| Description | Colab | | ||
|:----------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:| | ||
| Prompt LMMs with Multimodal Maestro | [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/roboflow/multimodal-maestro/blob/develop/cookbooks/multimodal_maestro_gpt_4_vision.ipynb) | | ||
| Manually annotate ONE image and let GPT-4V annotate ALL of them | [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/roboflow/multimodal-maestro/blob/develop/cookbooks/grounding_dino_and_gpt4_vision.ipynb) | | ||
### SDK | ||
|
||
Alternatively, you can fine-tune VLMs using the Python SDK, which accepts the same | ||
arguments as the CLI example above: | ||
|
||
## 🚀 example | ||
```python | ||
from maestro.trainer.common import MeanAveragePrecisionMetric | ||
from maestro.trainer.models.florence_2 import train, TrainingConfiguration | ||
|
||
``` | ||
Find dog. | ||
config = TrainingConfiguration( | ||
dataset='<DATASET_PATH>', | ||
epochs=10, | ||
batch_size=8, | ||
metrics=[MeanAveragePrecisionMetric()] | ||
) | ||
|
||
>>> The dog is prominently featured in the center of the image with the label [9]. | ||
train(config) | ||
``` | ||
|
||
<details close> | ||
<summary>👉 read more</summary> | ||
|
||
<br> | ||
|
||
- **load image** | ||
|
||
```python | ||
import cv2 | ||
|
||
image = cv2.imread("...") | ||
``` | ||
|
||
- **create and refine marks** | ||
|
||
```python | ||
import maestro | ||
|
||
generator = maestro.SegmentAnythingMarkGenerator(device='cuda') | ||
marks = generator.generate(image=image) | ||
marks = maestro.refine_marks(marks=marks) | ||
``` | ||
|
||
- **visualize marks** | ||
|
||
```python | ||
mark_visualizer = maestro.MarkVisualizer() | ||
marked_image = mark_visualizer.visualize(image=image, marks=marks) | ||
``` | ||
![image-vs-marked-image](https://github.com/roboflow/multimodal-maestro/assets/26109316/92951ed2-65c0-475a-9279-6fd344757092) | ||
|
||
- **prompt** | ||
|
||
```python | ||
prompt = "Find dog." | ||
|
||
response = maestro.prompt_image(api_key=api_key, image=marked_image, prompt=prompt) | ||
``` | ||
|
||
``` | ||
>>> "The dog is prominently featured in the center of the image with the label [9]." | ||
``` | ||
|
||
- **extract related marks** | ||
|
||
```python | ||
masks = maestro.extract_relevant_masks(text=response, detections=refined_marks) | ||
``` | ||
|
||
``` | ||
>>> {'6': array([ | ||
... [False, False, False, ..., False, False, False], | ||
... [False, False, False, ..., False, False, False], | ||
... [False, False, False, ..., False, False, False], | ||
... ..., | ||
... [ True, True, True, ..., False, False, False], | ||
... [ True, True, True, ..., False, False, False], | ||
... [ True, True, True, ..., False, False, False]]) | ||
... } | ||
``` | ||
|
||
</details> | ||
|
||
![multimodal-maestro](https://github.com/roboflow/multimodal-maestro/assets/26109316/c04f2b18-2a1d-4535-9582-e5d3ec0a926e) | ||
|
||
## 🚧 roadmap | ||
|
||
- [ ] Rewriting the `maestro` API. | ||
- [ ] Update [HF space](https://huggingface.co/spaces/Roboflow/SoM). | ||
- [ ] Documentation page. | ||
- [ ] Add GroundingDINO prompting strategy. | ||
- [ ] CogVLM demo. | ||
- [ ] Qwen-VL demo. | ||
|
||
## 💜 acknowledgement | ||
|
||
- [Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding | ||
in GPT-4V](https://arxiv.org/abs/2310.11441) by Jianwei Yang, Hao Zhang, Feng Li, Xueyan | ||
Zou, Chunyuan Li, Jianfeng Gao. | ||
- [The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)](https://arxiv.org/abs/2309.17421) | ||
by Zhengyuan Yang, Linjie Li, Kevin Lin, Jianfeng Wang, Chung-Ching Lin, Zicheng Liu, | ||
Lijuan Wang | ||
|
||
## 🦸 contribution | ||
|
||
We would love your help in making this repository even better! If you noticed any bug, | ||
or if you have any suggestions for improvement, feel free to open an | ||
We would love your help in making this repository even better! We are especially | ||
looking for contributors with experience in fine-tuning vision-language models (VLMs). | ||
If you notice any bugs or have suggestions for improvement, feel free to open an | ||
[issue](https://github.com/roboflow/multimodal-maestro/issues) or submit a | ||
[pull request](https://github.com/roboflow/multimodal-maestro/pulls). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
DISABLE_RECIPE_IMPORTS_WARNINGS_ENV = "DISABLE_RECIPE_IMPORTS_WARNINGS" | ||
DEFAULT_DISABLE_RECIPE_IMPORTS_WARNINGS_ENV = "False" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
import os | ||
|
||
import typer | ||
|
||
from maestro.cli.env import DISABLE_RECIPE_IMPORTS_WARNINGS_ENV, \ | ||
DEFAULT_DISABLE_RECIPE_IMPORTS_WARNINGS_ENV | ||
from maestro.cli.utils import str2bool | ||
|
||
|
||
def find_training_recipes(app: typer.Typer) -> None: | ||
try: | ||
from maestro.trainer.models.florence_2.entrypoint import florence_2_app | ||
|
||
app.add_typer(florence_2_app, name="florence2") | ||
except Exception: | ||
_warn_about_recipe_import_error(model_name="Florence 2") | ||
|
||
try: | ||
from maestro.trainer.models.paligemma.entrypoint import paligemma_app | ||
|
||
app.add_typer(paligemma_app, name="paligemma") | ||
except Exception: | ||
_warn_about_recipe_import_error(model_name="PaliGemma") | ||
|
||
|
||
def _warn_about_recipe_import_error(model_name: str) -> None: | ||
disable_warnings = str2bool( | ||
os.getenv( | ||
DISABLE_RECIPE_IMPORTS_WARNINGS_ENV, | ||
DEFAULT_DISABLE_RECIPE_IMPORTS_WARNINGS_ENV, | ||
) | ||
) | ||
if disable_warnings: | ||
return None | ||
warning = typer.style("WARNING", fg=typer.colors.RED, bold=True) | ||
message = "🚧 " + warning + f" cannot import recipe for {model_name}" | ||
typer.echo(message) |
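Note on the warning toggle: the lookup in `_warn_about_recipe_import_error` can be reproduced without `typer`. The sketch below copies the two constants and `str2bool` from this commit; `warnings_disabled` is a hypothetical helper introduced only for illustration, not part of the diff. It suggests that only `y`, `t`, `yes`, and `true` (case-insensitive) silence the warning, so a value such as `1` would leave warnings on.

```python
import os

# Constants copied from maestro/cli/env.py as added in this commit.
DISABLE_RECIPE_IMPORTS_WARNINGS_ENV = "DISABLE_RECIPE_IMPORTS_WARNINGS"
DEFAULT_DISABLE_RECIPE_IMPORTS_WARNINGS_ENV = "False"


def str2bool(value: str) -> bool:
    # Same truthy set as maestro.cli.utils.str2bool in this commit.
    return value.lower() in {"y", "t", "yes", "true"}


def warnings_disabled() -> bool:
    # Hypothetical helper: mirrors the env lookup done inside
    # _warn_about_recipe_import_error, minus the typer output.
    raw_value = os.getenv(
        DISABLE_RECIPE_IMPORTS_WARNINGS_ENV,
        DEFAULT_DISABLE_RECIPE_IMPORTS_WARNINGS_ENV,
    )
    return str2bool(raw_value)


# Variable unset: the default "False" applies, so warnings are shown.
os.environ.pop(DISABLE_RECIPE_IMPORTS_WARNINGS_ENV, None)
default_state = warnings_disabled()

# A recognized truthy value silences the warning.
os.environ[DISABLE_RECIPE_IMPORTS_WARNINGS_ENV] = "true"
silenced_state = warnings_disabled()

# "1" is not in the truthy set, so warnings stay enabled.
os.environ[DISABLE_RECIPE_IMPORTS_WARNINGS_ENV] = "1"
numeric_state = warnings_disabled()
```

Whether `1` and `on` should also count as truthy is a design choice the diff does not discuss.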
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
import typer | ||
|
||
from maestro.cli.introspection import find_training_recipes | ||
|
||
app = typer.Typer() | ||
find_training_recipes(app=app) | ||
|
||
|
||
@app.command(help="Display information about maestro") | ||
def info(): | ||
typer.echo("Welcome to maestro CLI. Let's train some VLM! 🏋") | ||
|
||
|
||
if __name__ == "__main__": | ||
app() |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
def str2bool(value: str) -> bool: | ||
return value.lower() in {"y", "t", "yes", "true"} |
Empty file.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
from maestro.trainer.common.utils.metrics import MeanAveragePrecisionMetric |
Empty file.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
SEED_ENV = "SEED" | ||
DEFAULT_SEED = "42" | ||
CUDA_DEVICE_ENV = "CUDA_DEVICE" | ||
DEFAULT_CUDA_DEVICE = "cuda:0" | ||
HF_TOKEN_ENV = "HF_TOKEN" |
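Note on the defaults above: they are stored as strings, presumably because `os.getenv` returns strings and the fallback should have the same type as a user-supplied value. The diff shown here does not include the code that consumes these constants, so the two `resolve_*` helpers below are hypothetical and only sketch one way the values might be read.

```python
import os

# Constants copied from the env module added in this commit.
SEED_ENV = "SEED"
DEFAULT_SEED = "42"
CUDA_DEVICE_ENV = "CUDA_DEVICE"
DEFAULT_CUDA_DEVICE = "cuda:0"


def resolve_seed() -> int:
    # Hypothetical helper: read the seed from the environment and
    # convert it to int, falling back to the string default.
    return int(os.getenv(SEED_ENV, DEFAULT_SEED))


def resolve_device_name() -> str:
    # Hypothetical helper: return the device string as-is.
    return os.getenv(CUDA_DEVICE_ENV, DEFAULT_CUDA_DEVICE)


# With neither variable set, the defaults apply.
os.environ.pop(SEED_ENV, None)
os.environ.pop(CUDA_DEVICE_ENV, None)
default_seed = resolve_seed()
default_device = resolve_device_name()

# An exported SEED overrides the default.
os.environ[SEED_ENV] = "7"
overridden_seed = resolve_seed()
```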
Empty file.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,50 @@ | ||
import json | ||
import os | ||
from typing import List, Dict, Any, Tuple | ||
|
||
from PIL import Image | ||
from torch.utils.data import Dataset | ||
|
||
|
||
class JSONLDataset: | ||
def __init__(self, jsonl_file_path: str, image_directory_path: str): | ||
self.jsonl_file_path = jsonl_file_path | ||
self.image_directory_path = image_directory_path | ||
self.entries = self._load_entries() | ||
|
||
def _load_entries(self) -> List[Dict[str, Any]]: | ||
entries = [] | ||
with open(self.jsonl_file_path, "r") as file: | ||
for line in file: | ||
data = json.loads(line) | ||
entries.append(data) | ||
return entries | ||
|
||
def __len__(self) -> int: | ||
return len(self.entries) | ||
|
||
def __getitem__(self, idx: int) -> Tuple[Image.Image, Dict[str, Any]]: | ||
if idx < 0 or idx >= len(self.entries): | ||
raise IndexError("Index out of range") | ||
|
||
entry = self.entries[idx] | ||
image_path = os.path.join(self.image_directory_path, entry["image"]) | ||
try: | ||
image = Image.open(image_path) | ||
return (image, entry) | ||
except FileNotFoundError: | ||
raise FileNotFoundError(f"Image file {image_path} not found.") | ||
|
||
|
||
class DetectionDataset(Dataset): | ||
def __init__(self, jsonl_file_path: str, image_directory_path: str): | ||
self.dataset = JSONLDataset(jsonl_file_path, image_directory_path) | ||
|
||
def __len__(self): | ||
return len(self.dataset) | ||
|
||
def __getitem__(self, idx): | ||
image, data = self.dataset[idx] | ||
prefix = data["prefix"] | ||
suffix = data["suffix"] | ||
return prefix, suffix, image |
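Note on the expected annotation format: the two classes above read one JSON object per line and access the keys `image`, `prefix`, and `suffix`. The sketch below writes and re-reads such a file using only the standard library. The file name and the field values (including the `<OD>` prompt and `<loc_…>` tokens) are illustrative assumptions, not data from this repository, and no image is opened.

```python
import json
import os
import tempfile

# Hypothetical annotation records. Only the key names come from the diff;
# the values are made up for illustration.
records = [
    {"image": "dog.jpg", "prefix": "<OD>", "suffix": "dog<loc_100><loc_200><loc_300><loc_400>"},
    {"image": "cat.jpg", "prefix": "<OD>", "suffix": "cat<loc_10><loc_20><loc_30><loc_40>"},
]

with tempfile.TemporaryDirectory() as directory:
    annotations_path = os.path.join(directory, "annotations.jsonl")

    # JSONL: one serialized JSON object per line.
    with open(annotations_path, "w") as file:
        for record in records:
            file.write(json.dumps(record) + "\n")

    # Same parsing loop as JSONLDataset._load_entries.
    entries = []
    with open(annotations_path, "r") as file:
        for line in file:
            entries.append(json.loads(line))

# DetectionDataset.__getitem__ returns (prefix, suffix, image); here the
# image file name stands in for the PIL image.
triples = [(entry["prefix"], entry["suffix"], entry["image"]) for entry in entries]
```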
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
from __future__ import annotations | ||
|
||
import random | ||
from typing import List | ||
|
||
from torch.utils.data import Dataset | ||
|
||
from maestro.trainer.common.utils.file_system import read_jsonl | ||
|
||
|
||
class JSONLDataset(Dataset): | ||
# TODO: implementation could be better - avoiding loading | ||
# whole files to memory | ||
|
||
@classmethod | ||
def from_jsonl_file(cls, path: str) -> JSONLDataset: | ||
file_content = read_jsonl(path=path) | ||
random.shuffle(file_content) | ||
return cls(jsons=file_content) | ||
|
||
def __init__(self, jsons: List[dict]): | ||
self.jsons = jsons | ||
|
||
def __getitem__(self, index): | ||
return self.jsons[index] | ||
|
||
def __len__(self): | ||
return len(self.jsons) | ||
|
||
def shuffle(self): | ||
random.shuffle(self.jsons) |
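Note on `from_jsonl_file`: it relies on `read_jsonl`, whose body is not part of the diff shown here, and it shuffles with the global `random` module. The sketch below assumes `read_jsonl` returns one `dict` per non-empty line, which is a guess at its behavior. It then shows that seeding the global generator makes the shuffle order repeatable.

```python
import json
import os
import random
import tempfile
from typing import List


def read_jsonl(path: str) -> List[dict]:
    # Assumed behavior of maestro's read_jsonl (not shown in this diff):
    # parse one JSON object per non-empty line.
    with open(path, "r") as file:
        return [json.loads(line) for line in file if line.strip()]


with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "data.jsonl")
    with open(path, "w") as file:
        for index in range(10):
            file.write(json.dumps({"id": index}) + "\n")
    rows = read_jsonl(path)

# random.shuffle uses the global generator, so the order produced by
# from_jsonl_file depends on how (and whether) that generator was seeded.
random.seed(42)
first_order = rows.copy()
random.shuffle(first_order)

random.seed(42)
second_order = rows.copy()
random.shuffle(second_order)

same_order = first_order == second_order
```

Both shuffles start from the same seed, so `same_order` is `True`. Without a seed, two runs would generally produce different orders.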
Empty file.