
[Model] add vllm compatible models #544

Merged
@Luodian merged 6 commits into main from dev/add_vllm on Feb 20, 2025
Conversation

@Luodian (Contributor) commented Feb 19, 2025

Add vLLM model integration and update configurations

vLLM offers exceptional speed in evaluation. Moving forward, we should prioritize using this approach.


- Introduce VLLM model in the model registry.
- Update AVAILABLE_MODELS to include new models:
  - models/__init__.py: Added "aria", "internvideo2", "llama_vision", "oryx", "ross", "slime", "videochat2", "vllm", "xcomposer2_4KHD", "xcomposer2d5".
- Create vllm.py for VLLM model implementation:
  - Implemented encoding for images and videos.
  - Added methods for generating responses and handling multi-round generation (a rough sketch of this wrapper follows the lists below).
- Update MMMU tasks with new prompt formats and evaluation metrics:
  - mmmu_val.yaml: Added specific kwargs for prompt types.
  - mmmu_val_reasoning.yaml: Enhanced prompts for reasoning tasks.
  - utils.py: Adjusted evaluation rules and scoring for predictions.
- Added script for easy model execution:
  - vllm_qwen2vl.sh: Script to run VLLM with specified parameters.
- Configure environment for better performance and debugging.
- Added variables to control multiprocessing and NCCL behavior.

miscs/vllm_qwen2vl.sh:
- Set `VLLM_WORKER_MULTIPROC_METHOD` to `spawn` for compatibility.
- Enabled `NCCL_BLOCKING_WAIT` to avoid hangs.
- Increased `NCCL_TIMEOUT` to 18000000 for long-running processes.
- Set `NCCL_DEBUG` to `DEBUG` for detailed logs.
- Renamed representation scripts for clarity.
  - miscs/repr_scripts.sh -> miscs/model_dryruns/llava_1_5.sh
  - miscs/cicd_qwen2vl.sh -> miscs/model_dryruns/qwen2vl.sh
  - miscs/tinyllava_repr_scripts.sh -> miscs/model_dryruns/tinyllava.sh
  - miscs/vllm_qwen2vl.sh -> miscs/model_dryruns/vllm_qwen2vl.sh
- Updated parameters in the vllm_qwen2vl.sh script.
  - miscs/model_dryruns/vllm_qwen2vl.sh: Added `--limit=64` to output path command.
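
For orientation, here is a rough, hypothetical sketch of how the wrapper described above could be laid out. Only the `@register_model("vllm")` decorator and the `class VLLM(lmms)` declaration come from the PR (they are quoted in the review below); the import paths, default model name, and method bodies are assumptions, not the merged code.

# Hypothetical skeleton -- only the decorator and the class declaration come
# from the PR; every other name here is an illustrative assumption.
from lmms_eval.api.model import lmms                # assumed import path
from lmms_eval.api.registry import register_model   # assumed import path

from vllm import LLM


@register_model("vllm")
class VLLM(lmms):
    """Routes lmms-eval requests through a vLLM engine."""

    def __init__(self, model_version: str = "Qwen/Qwen2-VL-7B-Instruct", **engine_kwargs):
        super().__init__()
        self.client = LLM(model=model_version, **engine_kwargs)

    def encode_image(self, image):
        # The merged version returns PIL images directly (RGB); an earlier
        # revision base64-encoded them (see the review thread below).
        return image.convert("RGB")

    def generate_until(self, requests):
        # Build OpenAI-format messages (text plus images / video frames) and
        # let vLLM apply the model-specific chat template.
        raise NotImplementedError

    def generate_until_multi_round(self, requests):
        # Multi-round generation: feed earlier turns back into the messages.
        raise NotImplementedError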


@register_model("vllm")
class VLLM(lmms):


Since this processing is specific to VLMs, consider renaming to vllm_vlm

@Luodian (Contributor, Author) replied on Feb 20, 2025:

I think this can remain as is: vLLM supports both vision/language models and vision/language tasks. For example, you can actually use Qwen/Qwen2.5-0.5B-Instruct to evaluate mmlu_flan_n_shot_generative within our framework.

So it's better to support vLLM through a single class.

import base64
from io import BytesIO

# Serialize the PIL image to PNG bytes, then base64-encode for transport
output_buffer = BytesIO()
img.save(output_buffer, format="PNG")
byte_data = output_buffer.getvalue()
base64_str = base64.b64encode(byte_data).decode("utf-8")


Is base64 specific to qwen_vl? The vLLM VLM interface accepts PIL images and then does model-specific processing.

@Luodian (Contributor, Author) replied:

great, let me change it

@Luodian (Contributor, Author) replied:

Oh, after checking the doc, I feel like we should keep the OpenAI-format messages so we don't need to apply a chat template.

Otherwise, if we use this format, we need to apply a model-specific chat template:

outputs = llm.generate({
    "prompt": prompt,
    "multi_modal_data": {"image": image_embeds},
})

https://github.com/vllm-project/vllm/blob/0d243f2a54fbd1c56da8a571f0899c30b6aba5d9/docs/source/serving/multimodal_inputs.md
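
For context, a minimal sketch of the OpenAI-format path discussed above, assuming vLLM's `LLM.chat()` entry point (which applies the model's chat template internally) and a base64 data URL for the image; the model name and prompt are illustrative, not taken from the PR:

# Sketch only: assumes LLM.chat() accepts OpenAI-style multimodal messages
# and applies the chat template itself, so no model-specific templating here.
import base64
from io import BytesIO

from PIL import Image
from vllm import LLM, SamplingParams


def to_data_url(img: Image.Image) -> str:
    # Encode a PIL image as a base64 PNG data URL for an image_url part.
    buf = BytesIO()
    img.convert("RGB").save(buf, format="PNG")
    return "data:image/png;base64," + base64.b64encode(buf.getvalue()).decode("utf-8")


llm = LLM(model="Qwen/Qwen2-VL-7B-Instruct")  # illustrative model choice
messages = [{
    "role": "user",
    "content": [
        {"type": "image_url", "image_url": {"url": to_data_url(Image.open("example.png"))}},
        {"type": "text", "text": "Describe the image."},
    ],
}]
outputs = llm.chat(messages, SamplingParams(temperature=0, max_tokens=256))
print(outputs[0].outputs[0].text)

Keeping the messages in this shape means adding a new vision/language model does not require wiring up its chat template by hand, which is the trade-off being weighed above.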

@fzyzcjy commented Feb 20, 2025

+1 This looks great and thanks for the support!

I wonder whether this PR is ready for use or not / when it will be ready for use? i.e., can I just use this branch and run it?

- Simplify image conversion in the `to_base64` method:
  - vllm.py: Directly convert input image to RGB format instead of copying it.
- Remove unnecessary base64 encoding for images:
  - vllm.py: Return the PIL image directly instead of converting it to base64.
- Update video frame processing to return PIL images:
  - vllm.py: Replace base64 encoding of frames with returning the PIL frames directly.
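
A small sketch of what those simplifications amount to (the helper names are illustrative, not the PR's actual function names):

from PIL import Image


def to_pil(image: Image.Image) -> Image.Image:
    # Convert the input image to RGB and return the PIL image directly,
    # instead of copying it and round-tripping through base64.
    return image.convert("RGB")


def frames_to_pil(frames) -> list:
    # Sampled video frames (e.g. numpy arrays) are likewise returned as
    # PIL images rather than base64-encoded strings.
    return [Image.fromarray(frame).convert("RGB") for frame in frames]
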
@Luodian (Contributor, Author) commented Feb 20, 2025

> +1 This looks great and thanks for the support!
>
> I wonder whether this PR is ready for use or not / when it will be ready for use? i.e., can I just use this branch and run it?

I think immediately.

@Luodian (Contributor, Author) commented Feb 20, 2025

[screenshot]

@Luodian merged commit 968d5f1 into main on Feb 20, 2025 (2 checks passed).
@kcz358 deleted the dev/add_vllm branch on February 20, 2025 at 05:14.