Merge pull request #32 from roboflow/feature/foundations_of_training_readme

updated project `README` to showcase new project profile
SkalskiP authored Sep 4, 2024
2 parents aa0708c + 47aef06 commit 50b4876
Showing 1 changed file: README.md (13 additions, 120 deletions).

<div align="center">

<h1>maestro</h1>

<br>

[![version](https://badge.fury.io/py/maestro.svg)](https://badge.fury.io/py/maestro)
[![license](https://img.shields.io/pypi/l/maestro)](https://github.com/roboflow/multimodal-maestro/blob/main/LICENSE)
[![python-version](https://img.shields.io/pypi/pyversions/maestro)](https://badge.fury.io/py/maestro)
[![Gradio](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/Roboflow/SoM)
[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/roboflow/multimodal-maestro/blob/develop/cookbooks/multimodal_maestro_gpt_4_vision.ipynb)
<p>coming: when it's ready...</p>

</div>

## 👋 hello

Multimodal-Maestro gives you more control over large multimodal models to get the
outputs you want. With more effective prompting tactics, you can get multimodal models
to do tasks you didn't know (or think!) were possible. Curious how it works? Try our
[HF space](https://huggingface.co/spaces/Roboflow/SoM)!
**maestro** is a tool designed to streamline and accelerate the fine-tuning process for
multimodal models. It provides ready-to-use recipes for fine-tuning popular
vision-language models (VLMs) such as **Florence-2**, **PaliGemma**, and
**Phi-3.5 Vision** on downstream vision-language tasks.
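
For a sense of the models involved, here is a minimal sketch that loads **Florence-2** with Hugging Face `transformers`, following its public model card. It illustrates the kind of VLM maestro targets; it is not maestro's own recipe API, which is still being redesigned (see below).

```python
# Sketch only — not maestro's API. Loads a Florence-2 checkpoint as described
# on its Hugging Face model card; the model ships custom code, hence
# trust_remote_code=True.
from transformers import AutoModelForCausalLM, AutoProcessor

CHECKPOINT = "microsoft/Florence-2-base-ft"

model = AutoModelForCausalLM.from_pretrained(CHECKPOINT, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(CHECKPOINT, trust_remote_code=True)

# Florence-2 frames downstream tasks as text prompts, e.g. "<OD>" for
# object detection; fine-tuning teaches it new prompt-to-output mappings.
```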

## 💻 install

⚠️ Our package has been renamed to `maestro`. Pip install the `maestro` package in a
[**Python>=3.8**](https://www.python.org/) environment.

```bash
pip install maestro
```
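
To verify the install, you can query the package metadata with the standard library (nothing maestro-specific is assumed here):

```python
from importlib.metadata import version

# prints the installed maestro version string
print(version("maestro"))
```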

## 🔌 API

🚧 The project is still under construction. The redesigned API is coming soon.

![maestro-docs-Snap](https://github.com/roboflow/multimodal-maestro/assets/26109316/a787b7c0-527e-465a-9ca9-d46f4d63ea53)

## 🧑‍🍳 prompting cookbooks

| Description | Colab |
|:----------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|
| Prompt LMMs with Multimodal Maestro | [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/roboflow/multimodal-maestro/blob/develop/cookbooks/multimodal_maestro_gpt_4_vision.ipynb) |
| Manually annotate ONE image and let GPT-4V annotate ALL of them | [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/roboflow/multimodal-maestro/blob/develop/cookbooks/grounding_dino_and_gpt4_vision.ipynb) |


## 🚀 example

```
Find dog.
>>> The dog is prominently featured in the center of the image with the label [9].
```

<details close>
<summary>👉 read more</summary>

<br>

- **load image**

```python
import cv2

# load the source image with OpenCV (path elided in the original)
image = cv2.imread("...")
```

- **create and refine marks**

```python
import maestro

# segment the image with SAM; every resulting mask becomes a numbered mark
generator = maestro.SegmentAnythingMarkGenerator(device='cuda')
marks = generator.generate(image=image)
# clean up the raw masks (e.g. drop small or redundant segments)
marks = maestro.refine_marks(marks=marks)
```

- **visualize marks**

```python
# draw the numbered marks on top of the original image
mark_visualizer = maestro.MarkVisualizer()
marked_image = mark_visualizer.visualize(image=image, marks=marks)
```
![image-vs-marked-image](https://github.com/roboflow/multimodal-maestro/assets/26109316/92951ed2-65c0-475a-9279-6fd344757092)

- **prompt**

```python
prompt = "Find dog."

# api_key is an OpenAI API key; the marked image is sent to GPT-4 Vision
response = maestro.prompt_image(api_key=api_key, image=marked_image, prompt=prompt)
```

```
>>> "The dog is prominently featured in the center of the image with the label [9]."
```

- **extract related marks**

```python
# pull out the masks whose mark labels are mentioned in the response
masks = maestro.extract_relevant_masks(text=response, detections=marks)
```

```
>>> {'9': array([
... [False, False, False, ..., False, False, False],
... [False, False, False, ..., False, False, False],
... [False, False, False, ..., False, False, False],
... ...,
... [ True, True, True, ..., False, False, False],
... [ True, True, True, ..., False, False, False],
... [ True, True, True, ..., False, False, False]])
... }
```
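
Each returned mask is a boolean array aligned with the input image, so it can be consumed directly with NumPy. A small follow-up sketch, assuming `image` and `masks` from the steps above:

```python
import numpy as np

for label, mask in masks.items():
    area = int(mask.sum())                        # pixels covered by this mark
    cutout = np.where(mask[..., None], image, 0)  # keep only the masked region
    print(f"mark {label} covers {area} px")
```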

</details>

![multimodal-maestro](https://github.com/roboflow/multimodal-maestro/assets/26109316/c04f2b18-2a1d-4535-9582-e5d3ec0a926e)
Documentation and Florence-2 fine-tuning examples for object detection and VQA coming
soon.
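
Until those examples land, this rough sketch shows how Florence-2 handles an object-detection prompt with `transformers`. It assumes `model` and `processor` loaded as in the earlier sketch and a hypothetical local image `dog.jpeg`; it is not the forthcoming maestro example.

```python
from PIL import Image

image = Image.open("dog.jpeg")  # hypothetical local image

inputs = processor(text="<OD>", images=image, return_tensors="pt")
generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=256,
)
raw = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]

# parse the raw generation into boxes and labels for the "<OD>" task
detections = processor.post_process_generation(raw, task="<OD>", image_size=image.size)
print(detections)
```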

## 🚧 roadmap

- [ ] Rewrite the `maestro` API.
- [ ] Update the [HF space](https://huggingface.co/spaces/Roboflow/SoM).
- [ ] Add a documentation page.
- [ ] Add a GroundingDINO prompting strategy.
- [ ] Add a CogVLM demo.
- [ ] Add a Qwen-VL demo.
- [ ] Release a CLI for predefined fine-tuning recipes.
- [ ] Add multi-GPU fine-tuning support.
- [ ] Allow multi-dataset fine-tuning and support multiple tasks at the same time.

## 💜 acknowledgement

- [Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V](https://arxiv.org/abs/2310.11441) by Jianwei Yang, Hao Zhang, Feng Li, Xueyan Zou, Chunyuan Li, Jianfeng Gao.
- [The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)](https://arxiv.org/abs/2309.17421) by Zhengyuan Yang, Linjie Li, Kevin Lin, Jianfeng Wang, Chung-Ching Lin, Zicheng Liu, Lijuan Wang.

## 🦸 contribution

We would love your help in making this repository even better! If you noticed any bug,
or if you have any suggestions for improvement, feel free to open an
[issue](https://github.com/roboflow/multimodal-maestro/issues) or submit a
[pull request](https://github.com/roboflow/multimodal-maestro/pulls).
