[Bugfix] Fix image input for Pixtral-HF #11741
Conversation
Signed-off-by: DarkLight1337 <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
DarkLight1337 force-pushed the branch from c4e0beb to aac372e (commits signed off by DarkLight1337 <[email protected]>).
Thank you for your work! The outputs look reasonable now and don't crash for either single- or multi-image input.
However, I uncovered another issue with accuracy once I started running evals to validate: pixtral_hf accuracy appears to have regressed starting with 0.6.5. I think we can consider it unrelated, so I will open a separate issue for it and consider this PR done; just FYI.
For reference, the HF model card reports MMMU (CoT) ~= 51%. Evals were run using mistral-evals.
vLLM 0.6.4.post1, server and eval:
> uv pip install vllm==0.6.4.post1
> vllm serve nm-testing/pixtral-12b-FP8-dynamic --max-num-seqs 30 --max-model-len 30000 --limit-mm-per-prompt image=5 --port 9000
> python -m eval.run eval_vllm --model_name nm-testing/pixtral-12b-FP8-dynamic --url http://0.0.0.0:9000 --output_dir output/ --eval_name "mmmu"
...
================================================================================
Metrics:
{
"explicit_prompt_relaxed_correctness": 0.5044444444444445,
"anywhere_in_answer_relaxed_correctness": 0.5044444444444445
}
================================================================================
vLLM 0.6.5, server and eval:
> uv pip install vllm==0.6.5
> vllm serve nm-testing/pixtral-12b-FP8-dynamic --max-num-seqs 30 --max-model-len 30000 --limit-mm-per-prompt image=5 --port 9000
> python -m eval.run eval_vllm --model_name nm-testing/pixtral-12b-FP8-dynamic --url http://0.0.0.0:9000 --output_dir output/ --eval_name "mmmu"
...
================================================================================
Metrics:
{
"explicit_prompt_relaxed_correctness": 0.0011111111111111111,
"anywhere_in_answer_relaxed_correctness": 0.3466666666666667
}
================================================================================
vLLM on this branch, server and eval:
> uv pip install vllm==0.6.5
> vllm serve nm-testing/pixtral-12b-FP8-dynamic --max-num-seqs 30 --max-model-len 30000 --limit-mm-per-prompt image=5 --port 9000
> python -m eval.run eval_vllm --model_name nm-testing/pixtral-12b-FP8-dynamic --url http://0.0.0.0:9000 --output_dir output/ --eval_name "mmmu"
...
================================================================================
Metrics:
{
"explicit_prompt_relaxed_correctness": 0.0011111111111111111,
"anywhere_in_answer_relaxed_correctness": 0.3466666666666667
}
================================================================================
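As a quick manual sanity check that image input works against any of the servers started above, here is a minimal request via vLLM's OpenAI-compatible API. This is a sketch: the port matches the serve commands above, while the image URL is an arbitrary placeholder to substitute with any reachable image.

```python
from openai import OpenAI

# vLLM exposes an OpenAI-compatible API; this targets the server
# started above with `vllm serve ... --port 9000`.
client = OpenAI(base_url="http://0.0.0.0:9000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="nm-testing/pixtral-12b-FP8-dynamic",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one sentence."},
            # Placeholder image URL; substitute any reachable image.
            {"type": "image_url",
             "image_url": {"url": "https://example.com/sample.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```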
@@ -209,6 +212,31 @@ def load_nvlm_d(question: str, image_urls: List[str]):
    )


def load_pixtral_hf(question: str, image_urls: List[str]) -> ModelRequestData:
    model_name = "mistral-community/pixtral-12b"
If you want to use a chat template, you could use this model, where I added one: https://huggingface.co/mgoin/pixtral-12b
I think the prompt format is straightforward enough that we don't need a chat template for this.
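For context, a hedged sketch of how the rest of this helper plausibly reads, following the pattern of the neighboring `load_*` helpers in this example file. The engine arguments and the `ModelRequestData` fields are assumptions based on that file's conventions, not a copy of the diff; `ModelRequestData` and `fetch_image` come from the surrounding example code.

```python
from typing import List

from vllm import LLM
from vllm.multimodal.utils import fetch_image

# ModelRequestData is the small result container defined in this example file.

def load_pixtral_hf(question: str, image_urls: List[str]) -> ModelRequestData:
    model_name = "mistral-community/pixtral-12b"

    # Illustrative engine arguments; adjust to fit your GPU.
    llm = LLM(
        model=model_name,
        max_model_len=8192,
        max_num_seqs=2,
        limit_mm_per_prompt={"image": len(image_urls)},
    )

    # Pixtral-HF takes one [IMG] placeholder per input image inside
    # Mistral's plain [INST]...[/INST] format, which is why no chat
    # template is needed here.
    placeholders = "[IMG]" * len(image_urls)
    prompt = f"<s>[INST]{question}\n{placeholders}[/INST]"

    return ModelRequestData(
        llm=llm,
        prompt=prompt,
        stop_token_ids=None,
        image_data=[fetch_image(url) for url in image_urls],
        chat_template=None,
    )
```

The key point mirrored from the diff is the prompt construction: one `[IMG]` token per image, with no chat template applied.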
FIX #11726