[docs] Fix transformers_vision multiple Images example #1370

Merged
merged 1 commit into from
Jan 12, 2025

Conversation

gante
Contributor

@gante gante commented Jan 9, 2025

transformers maintainer here 👋

Was trying out your transformers_vision example and ran into a bug in it. In a nutshell, the original demo prompted the model with 3 images, but only two were actually being passed.

Stand-alone script for reproduction, based on the example:
import outlines
from transformers import LlavaNextForConditionalGeneration
import torch

model = outlines.models.transformers_vision(
    "llava-hf/llava-v1.6-mistral-7b-hf",
    model_class=LlavaNextForConditionalGeneration,
)


from PIL import Image
from io import BytesIO
from urllib.request import urlopen

def img_from_url(url):
    img_byte_stream = BytesIO(urlopen(url).read())
    return Image.open(img_byte_stream).convert("RGB")


description_generator = outlines.generate.text(model)
foo = description_generator(
    "<image> detailed description:",
    [img_from_url("https://upload.wikimedia.org/wikipedia/commons/2/25/Siam_lilacpoint.jpg")]
)
print(foo)


image_urls = [
    "https://cdn1.byjus.com/wp-content/uploads/2020/08/ShapeArtboard-1-copy-3.png",  # triangle
    "https://cdn1.byjus.com/wp-content/uploads/2020/08/ShapeArtboard-1-copy-11.png",  # hexagon
]
description_generator = outlines.generate.text(model)
foo = description_generator(
    "<image><image>What shapes are present?",
    list(map(img_from_url, image_urls)),
)
print(foo)


pattern = "Mercury|Venus|Earth|Mars|Saturn|Jupiter|Neptune|Uranus|Pluto"
planet_generator = outlines.generate.regex(model, pattern)

foo = planet_generator(
    "What planet is this: <image>",
    [img_from_url("https://upload.wikimedia.org/wikipedia/commons/e/e3/Saturn_from_Cassini_Orbiter_%282004-10-06%29.jpg")]
)
print(foo)


from pydantic import BaseModel
from typing import List, Optional

class ImageData(BaseModel):
    caption: str
    tags_list: List[str]
    object_list: List[str]
    is_photo: bool

image_data_generator = outlines.generate.json(model, ImageData)

foo = image_data_generator(
    "<image> detailed JSON metadata:",
    [img_from_url("https://upload.wikimedia.org/wikipedia/commons/9/98/Aldrin_Apollo_11_original.jpg")]
)
print(foo)
Original traceback error:
Traceback (most recent call last):
  File "/home/joao/transformers/../joao_scripts/dbg.py", line 38, in <module>
    foo = description_generator(
  File "/home/joao/venvs/hf/lib/python3.10/site-packages/outlines/generate/api.py", line 556, in __call__
    completions = self.model.generate(
  File "/home/joao/venvs/hf/lib/python3.10/site-packages/outlines/models/transformers_vision.py", line 46, in generate
    inputs = self.processor(
  File "/home/joao/transformers/src/transformers/models/llava_next/processing_llava_next.py", line 165, in __call__
    image_size = next(image_sizes)
StopIteration
    image_size = next(image_sizes)
StopIteration

The other fixes are typos that Grammarly detected :)
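
For anyone hitting the same StopIteration: the traceback is consistent with the prompt containing more `<image>` placeholders than there are images in the list, so the processor runs out of image sizes. A minimal sketch of a pre-flight check is below. The helper name `check_prompt_matches_images` is hypothetical (it is not part of outlines or transformers), and it assumes the placeholder string is literally `<image>`, as in the example.

```python
def check_prompt_matches_images(prompt, images, placeholder="<image>"):
    """Raise early if the placeholder count and image count disagree.

    Hypothetical helper, not an outlines/transformers API. Assumes each
    image is referenced in the prompt by one literal `placeholder` string.
    """
    expected = prompt.count(placeholder)
    if expected != len(images):
        raise ValueError(
            f"prompt has {expected} {placeholder!r} placeholder(s) "
            f"but {len(images)} image(s) were passed"
        )


# Two placeholders, two images: passes silently.
check_prompt_matches_images("<image><image>What shapes are present?", ["img_a", "img_b"])

# Three placeholders, two images: the mismatch described above.
try:
    check_prompt_matches_images("<image><image><image>What shapes are present?", ["img_a", "img_b"])
except ValueError as err:
    print(err)
```

Calling something like this before the generator gives a readable error instead of a bare StopIteration from deep inside the processor.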

Contributor

@cpfiffer cpfiffer left a comment

LGTM!

@rlouf rlouf merged commit 03d3878 into dottxt-ai:main Jan 12, 2025
Member

rlouf commented Jan 12, 2025

Much appreciated, thank you!

@gante gante deleted the patch-2 branch January 13, 2025 09:50