
[Misc] add mm_processor_kwargs to extra_body for Qwen2.5-VL #13533

Merged · 6 commits · Feb 20, 2025

Conversation

@wulipc (Contributor) commented Feb 19, 2025

In Qwen2.5-VL online inference, the fps parameter in mm_processor_kwargs is essential for accurately calculating the second_per_grid_t value. However, the OpenAI-compatible interface currently does not support passing mm_processor_kwargs via extra_body. This PR fixes that (see the earlier issue #11652).

You can now interact with the vLLM server using the following example, which I have self-tested and verified to work correctly. If additional test cases are required, please let me know where they should be added. @DarkLight1337

FIX #11652

import base64
import numpy as np
from PIL import Image
from io import BytesIO
from openai import OpenAI
from qwen_vl_utils import process_vision_info


# Set OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8899/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)


video_messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": [
        {"type": "text", "text": "请用表格总结一下视频中的商品特点"},
        {
            "type": "video",
            "video": "https://duguang-labelling.oss-cn-shanghai.aliyuncs.com/qiansun/video_ocr/videos/50221078283.mp4",
            "total_pixels": 20480 * 28 * 28, "min_pixels": 16 * 28 * 2, 
            'fps': 3.0  # The default value is 2.0, but for demonstration purposes, we set it to 3.0.
        }]
    },
]


def prepare_message_for_vllm(content_messages):
    """
    The frame extraction logic for videos in `vLLM` differs from that of `qwen_vl_utils`.
    Here, we use `qwen_vl_utils` to extract video frames, with the video's `media_type`
    explicitly set to `video/jpeg`. This way, vLLM will not attempt to re-extract frames
    from the input base64-encoded images.
    """
    vllm_messages, fps_list = [], []
    for message in content_messages:
        message_content_list = message["content"]
        if not isinstance(message_content_list, list):
            vllm_messages.append(message)
            continue

        new_content_list = []
        for part_message in message_content_list:
            if 'video' in part_message:
                video_message = [{'content': [part_message]}]
                image_inputs, video_inputs, video_kwargs = process_vision_info(video_message, return_video_kwargs=True)
                assert video_inputs is not None, "video_inputs should not be None"
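                # qwen_vl_utils returns each video as a (T, C, H, W) tensor;
                # reorder to (T, H, W, C) and cast to uint8 so PIL can read each frame.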
                video_input = (video_inputs.pop()).permute(0, 2, 3, 1).numpy().astype(np.uint8)
                print("video_kwargs", video_kwargs, video_input.shape)
                fps_list.extend(video_kwargs.get('fps', []))

                # encode image with base64
                base64_frames = []
                for frame in video_input:
                    img = Image.fromarray(frame)
                    output_buffer = BytesIO()
                    img.save(output_buffer, format="jpeg")
                    byte_data = output_buffer.getvalue()
                    base64_str = base64.b64encode(byte_data).decode("utf-8")
                    base64_frames.append(base64_str)

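                # vLLM treats a comma-joined list of base64 JPEG frames under the
                # video/jpeg media type as pre-extracted frames (see docstring above).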
                part_message = {
                    "type": "video_url",
                    "video_url": {"url": f"data:video/jpeg;base64,{','.join(base64_frames)}"}
                }
            new_content_list.append(part_message)
        message["content"] = new_content_list
        vllm_messages.append(message)
    return vllm_messages, {'fps': fps_list}


video_messages, video_kwargs = prepare_message_for_vllm(video_messages)
chat_response = client.chat.completions.create(
    model="Qwen/Qwen2.5-VL-7B-Instruct",
    messages=video_messages,
    extra_body={
        "mm_processor_kwargs": video_kwargs
    }
)
print("Chat response:", chat_response)


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, covering a small, essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the frontend label Feb 19, 2025
@DarkLight1337 (Member)

Thanks, can you update the docs with this example? We don't have to test this as the code is straightforward enough.

@wulipc (Contributor, Author) commented Feb 19, 2025

> Thanks, can you update the docs with this example? We don't have to test this as the code is straightforward enough.

OK, which docs in vLLM should be updated? This example has already been added to our Qwen repo.

@DarkLight1337 (Member)

You can add it under the Online Serving section of this page.

@DarkLight1337 (Member)

Also, it would be best to verify that your PR works with #13516 once it's merged.

@wulipc (Contributor, Author) commented Feb 19, 2025

> You can add it under the Online Serving section of this page.

Currently, only the Qwen2.5-VL model requires passing the mm_processor_kwargs parameter, and the example above is a bit cumbersome. Given how specific this use case is, and to keep the vLLM documentation simple, I would prefer not to include it there. Users with related needs can refer to this issue or the official Qwen documentation for details.

@ywang96 (Member) commented Feb 19, 2025

> > You can add it under the Online Serving section of this page.
>
> Currently, only the Qwen2.5-VL model requires passing the mm_processor_kwargs parameter, and the example above is a bit cumbersome. Given how specific this use case is, and to keep the vLLM documentation simple, I would prefer not to include it there. Users with related needs can refer to this issue or the official Qwen documentation for details.

I think it's okay to include this example in https://github.com/vllm-project/vllm/blob/main/examples/online_serving/openai_chat_completion_client_for_multimodal.py (maybe as another chat type, video_with_kwargs), but as you mentioned, it's probably a better idea to document this in the README of https://github.com/QwenLM/Qwen2.5-VL, since it is only relevant to Qwen2.5-VL.

@wulipc (Contributor, Author) commented Feb 19, 2025

> update this in the README of https://github.com/QwenLM/Qwen2.5-VL since this is only relevant to Qwen2.5-VL

Let's keep it simple and only update this in the Qwen2.5-VL README: https://github.com/QwenLM/Qwen2.5-VL.

@DarkLight1337 (Member) commented Feb 19, 2025

I just realized that fps can be a list of floats. Can you update the model file with the correct type annotation? Otherwise LGTM.
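For illustration, the widened hint could look roughly like this (a sketch only; the attribute name and default are assumptions, not the actual vLLM model file):

from typing import Union

# Hypothetical sketch: fps may be a single rate for all videos or one rate per video.
fps: Union[float, list[float]] = 2.0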

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) February 20, 2025 02:14
@github-actions github-actions bot added the ready label Feb 20, 2025
@wulipc (Contributor, Author) commented Feb 20, 2025

> Also, it would be best to verify that your PR works with #13516 once it's merged.

@DarkLight1337 I found that after the fix was merged into the main branch, passing the fps parameter through mm_processor_kwargs (mm_processor_kwargs: {'fps': []}) results in the following error. I traced it to the fps parameter being of list type.

[screenshot of the error traceback]
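The likely root cause (a general Python fact, not spelled out in the thread): lists are unhashable, so a list-valued fps cannot be part of a hash-based cache key, which is what the tuple/HashableList fixes below address.

# Minimal reproduction of the underlying Python behavior:
hash((3.0,))  # fine: a tuple of hashable elements is hashable
hash([3.0])   # raises TypeError: unhashable type: 'list'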

@DarkLight1337 (Member) commented Feb 20, 2025

Can you try converting it into a tuple inside the processor?
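For illustration only, such a conversion might look roughly like the sketch below (the helper name freeze_mm_kwargs is hypothetical; per the discussion further down, the actual change was made in the merge_mm_kwargs function):

# Hypothetical helper: freeze list-valued mm_processor_kwargs into tuples
# so they can participate in hash-based cache keys.
def freeze_mm_kwargs(mm_kwargs: dict) -> dict:
    return {
        key: tuple(value) if isinstance(value, list) else value
        for key, value in mm_kwargs.items()
    }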

@DarkLight1337 (Member)

Another way would be to construct a HashableList class that uses the tuple of its elements as its hash (similar to HashableDict).
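A minimal sketch of that idea, assuming it mirrors the HashableDict pattern (the actual vLLM implementation may differ):

# Sketch of the suggested HashableList: hash as the tuple of its elements,
# so list-typed HF kwargs stay lists while becoming usable as cache keys.
class HashableList(list):
    def __hash__(self) -> int:  # plain lists are unhashable
        return hash(tuple(self))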

@wulipc (Contributor, Author) commented Feb 20, 2025

> HashableList

@DarkLight1337 I changed the list to a tuple in the merge_mm_kwargs function. Using a tuple instead of HashableList should also be fine, right? Everything seems to be working so far, and after merging, the PR works with #13516.

@DarkLight1337 (Member)

To follow HF's type hints, let's use HashableList

@wulipc (Contributor, Author) commented Feb 20, 2025

> To follow HF's type hints, let's use HashableList

Done.

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) February 20, 2025 03:40
@ywang96 (Member) left a comment

LGTM - thanks for the continued contributions to vLLM!

Labels: frontend, ready