
[Bugfix] Fix image input for Pixtral-HF #11741

Merged: 11 commits merged into vllm-project:main on Jan 8, 2025

Conversation

DarkLight1337 (Member):

FIX #11726

@DarkLight1337 DarkLight1337 requested a review from mgoin January 4, 2025 16:40

github-actions bot commented Jan 4, 2025

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add the ready label to the PR
  • Enable auto-merge.

🚀

@DarkLight1337 DarkLight1337 changed the title from "[Bugfix] Fix single-image input for Pixtral" to "[Bugfix] Fix image input for Pixtral-HF" on Jan 4, 2025
@DarkLight1337 DarkLight1337 added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) on Jan 7, 2025
@DarkLight1337 DarkLight1337 marked this pull request as ready for review January 7, 2025 13:39
mgoin (Member) left a comment:

Thank you for your work! The outputs seem reasonable now and don't crash for either single- or multi-image input.

However, I uncovered another issue with accuracy once I started running evals to validate: pixtral_hf accuracy appears to have regressed since 0.6.5. I think we can consider it unrelated, so I will open a separate issue for it and consider this PR done; just FYI.

For reference, the HF model card reports `MMMU (CoT) ~= 51%`. Evals were run using mistral-evals.

vLLM 0.6.4.post1, server and eval:

> uv pip install vllm==0.6.4.post1
> vllm serve nm-testing/pixtral-12b-FP8-dynamic --max-num-seqs 30 --max-model-len 30000 --limit-mm-per-prompt image=5 --port 9000

> python -m eval.run eval_vllm --model_name nm-testing/pixtral-12b-FP8-dynamic --url http://0.0.0.0:9000 --output_dir output/ --eval_name "mmmu"
...
================================================================================
Metrics:
{
    "explicit_prompt_relaxed_correctness": 0.5044444444444445,
    "anywhere_in_answer_relaxed_correctness": 0.5044444444444445
}
================================================================================

vLLM 0.6.5, server and eval:

> uv pip install vllm==0.6.5
> vllm serve nm-testing/pixtral-12b-FP8-dynamic --max-num-seqs 30 --max-model-len 30000 --limit-mm-per-prompt image=5 --port 9000

> python -m eval.run eval_vllm --model_name nm-testing/pixtral-12b-FP8-dynamic --url http://0.0.0.0:9000 --output_dir output/ --eval_name "mmmu"
...
================================================================================
Metrics:
{
    "explicit_prompt_relaxed_correctness": 0.0011111111111111111,
    "anywhere_in_answer_relaxed_correctness": 0.3466666666666667
}
================================================================================

vLLM on this branch, server and eval:

> uv pip install vllm==0.6.5
> vllm serve nm-testing/pixtral-12b-FP8-dynamic --max-num-seqs 30 --max-model-len 30000 --limit-mm-per-prompt image=5 --port 9000

> python -m eval.run eval_vllm --model_name nm-testing/pixtral-12b-FP8-dynamic --url http://0.0.0.0:9000 --output_dir output/ --eval_name "mmmu"
...
================================================================================
Metrics:
{
    "explicit_prompt_relaxed_correctness": 0.0011111111111111111,
    "anywhere_in_answer_relaxed_correctness": 0.3466666666666667
}
================================================================================

@@ -209,6 +212,31 @@ def load_nvlm_d(question: str, image_urls: List[str]):
)


def load_pixtral_hf(question: str, image_urls: List[str]) -> ModelRequestData:
model_name = "mistral-community/pixtral-12b"
A Member commented:

If you want to use a chat template, you could use this model where I added one: https://huggingface.co/mgoin/pixtral-12b

DarkLight1337 (Member, Author) replied:

I think the prompt format is straightforward enough that we don't need a chat template for this.
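
For context, here is a minimal sketch of what a completed load_pixtral_hf helper might look like, assuming the ModelRequestData dataclass, the fetch_image utility, and the engine arguments used by the other loaders in this example script. The [IMG] placeholder layout and the exact field names are assumptions for illustration, not the verbatim contents of this PR.

```python
from dataclasses import dataclass
from typing import List, Optional

from PIL import Image
from vllm import LLM
from vllm.multimodal.utils import fetch_image


@dataclass
class ModelRequestData:
    # Stand-in for the dataclass defined earlier in the example script;
    # the real field names may differ.
    llm: LLM
    prompt: str
    image_data: List[Image.Image]
    stop_token_ids: Optional[List[int]] = None
    chat_template: Optional[str] = None


def load_pixtral_hf(question: str, image_urls: List[str]) -> ModelRequestData:
    model_name = "mistral-community/pixtral-12b"

    # Allow as many images per prompt as were passed in.
    llm = LLM(
        model=model_name,
        max_model_len=8192,
        max_num_seqs=2,
        limit_mm_per_prompt={"image": len(image_urls)},
    )

    # Pixtral-HF consumes plain [INST]...[/INST] text with one [IMG]
    # placeholder per image, so the prompt can be assembled with string
    # formatting rather than a chat template.
    placeholders = "[IMG]" * len(image_urls)
    prompt = f"<s>[INST]{question}\n{placeholders}[/INST]"

    return ModelRequestData(
        llm=llm,
        prompt=prompt,
        image_data=[fetch_image(url) for url in image_urls],
    )
```

The point of the exchange above is that the placeholders can be built with plain string formatting, so a chat template is a convenience rather than a requirement for this model.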

@DarkLight1337 DarkLight1337 merged commit 91445c7 into vllm-project:main Jan 8, 2025
54 checks passed
@DarkLight1337 DarkLight1337 deleted the fix-pixtral-hf branch January 8, 2025 02:17
Labels: ready (ONLY add when PR is ready to merge/full CI is needed)

Linked issue: [Bug]: PixtralHF inference broken since #11396

3 participants