
[Bugfix] Fix image input for Pixtral-HF #11741

Merged: 11 commits merged into vllm-project:main on Jan 8, 2025

Conversation

DarkLight1337 (Member):

FIX #11726

@DarkLight1337 DarkLight1337 requested a review from mgoin January 4, 2025 16:40

github-actions bot commented Jan 4, 2025

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add the ready label to the PR
  • Enable auto-merge.

🚀

@DarkLight1337 DarkLight1337 changed the title from "[Bugfix] Fix single-image input for Pixtral" to "[Bugfix] Fix image input for Pixtral-HF" on Jan 4, 2025
@DarkLight1337 DarkLight1337 added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) on Jan 7, 2025
@DarkLight1337 DarkLight1337 marked this pull request as ready for review January 7, 2025 13:39
mgoin (Member) left a comment:

Thank you for your work! The outputs seem reasonable now and don't crash for either single- or multi-image input.

However, I uncovered another issue with accuracy once I started running evals to validate: pixtral_hf accuracy appears to have regressed since 0.6.5. I think we can consider it unrelated, so I will open a separate issue for it and consider this PR done; just FYI.

For reference, the HF model card reports `MMMU (CoT) ~= 51%`. Evals were run using mistral-evals.

vLLM 0.6.4.post1, server and eval:

> uv pip install vllm==0.6.4.post1
> vllm serve nm-testing/pixtral-12b-FP8-dynamic --max-num-seqs 30 --max-model-len 30000 --limit-mm-per-prompt image=5 --port 9000

> python -m eval.run eval_vllm --model_name nm-testing/pixtral-12b-FP8-dynamic --url http://0.0.0.0:9000 --output_dir output/ --eval_name "mmmu"
...
================================================================================
Metrics:
{
    "explicit_prompt_relaxed_correctness": 0.5044444444444445,
    "anywhere_in_answer_relaxed_correctness": 0.5044444444444445
}
================================================================================

vLLM 0.6.5, server and eval:

> uv pip install vllm==0.6.5
> vllm serve nm-testing/pixtral-12b-FP8-dynamic --max-num-seqs 30 --max-model-len 30000 --limit-mm-per-prompt image=5 --port 9000

> python -m eval.run eval_vllm --model_name nm-testing/pixtral-12b-FP8-dynamic --url http://0.0.0.0:9000 --output_dir output/ --eval_name "mmmu"
...
================================================================================
Metrics:
{
    "explicit_prompt_relaxed_correctness": 0.0011111111111111111,
    "anywhere_in_answer_relaxed_correctness": 0.3466666666666667
}
================================================================================

vLLM on this branch, server and eval:

> uv pip install vllm==0.6.5
> vllm serve nm-testing/pixtral-12b-FP8-dynamic --max-num-seqs 30 --max-model-len 30000 --limit-mm-per-prompt image=5 --port 9000

> python -m eval.run eval_vllm --model_name nm-testing/pixtral-12b-FP8-dynamic --url http://0.0.0.0:9000 --output_dir output/ --eval_name "mmmu"
...
================================================================================
Metrics:
{
    "explicit_prompt_relaxed_correctness": 0.0011111111111111111,
    "anywhere_in_answer_relaxed_correctness": 0.3466666666666667
}
================================================================================

@@ -209,6 +212,31 @@ def load_nvlm_d(question: str, image_urls: List[str]):
)


def load_pixtral_hf(question: str, image_urls: List[str]) -> ModelRequestData:
model_name = "mistral-community/pixtral-12b"
A Member commented:

If you want to use a chat template, you could use this model where I added one: https://huggingface.co/mgoin/pixtral-12b

DarkLight1337 (Member, Author) replied:

I think the prompt format is straightforward enough that we don't need a chat template for this.
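
For context, here is a minimal sketch of what a completed load_pixtral_hf helper might look like, assuming the ModelRequestData dataclass, the fetch_image utility, and the engine arguments used by the other loaders in this example script. The [IMG] placeholder layout and the exact field names are assumptions for illustration, not the verbatim contents of this PR.

```python
from dataclasses import dataclass
from typing import List, Optional

from PIL import Image
from vllm import LLM
from vllm.multimodal.utils import fetch_image


@dataclass
class ModelRequestData:
    # Stand-in for the dataclass defined earlier in the example script;
    # the real field names may differ.
    llm: LLM
    prompt: str
    image_data: List[Image.Image]
    stop_token_ids: Optional[List[int]] = None
    chat_template: Optional[str] = None


def load_pixtral_hf(question: str, image_urls: List[str]) -> ModelRequestData:
    model_name = "mistral-community/pixtral-12b"

    # Allow as many images per prompt as were passed in.
    llm = LLM(
        model=model_name,
        max_model_len=8192,
        max_num_seqs=2,
        limit_mm_per_prompt={"image": len(image_urls)},
    )

    # Pixtral-HF consumes plain [INST]...[/INST] text with one [IMG]
    # placeholder per image, so the prompt can be assembled with string
    # formatting rather than a chat template.
    placeholders = "[IMG]" * len(image_urls)
    prompt = f"<s>[INST]{question}\n{placeholders}[/INST]"

    return ModelRequestData(
        llm=llm,
        prompt=prompt,
        image_data=[fetch_image(url) for url in image_urls],
    )
```

The point of the exchange above is that the placeholders can be built with plain string formatting, so a chat template is a convenience rather than a requirement for this model.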

@DarkLight1337 DarkLight1337 merged commit 91445c7 into vllm-project:main Jan 8, 2025
54 checks passed
@DarkLight1337 DarkLight1337 deleted the fix-pixtral-hf branch January 8, 2025 02:17
Labels: ready (ONLY add when PR is ready to merge/full CI is needed)

Linked issue: [Bug]: PixtralHF inference broken since #11396

3 participants