
Can the vLLM proc part be placed on two GPUs? #138

Open
rrustlee opened this issue Feb 25, 2025 · 0 comments

Comments

@rrustlee

In the code below, torch.distributed.get_world_size is patched to return 1. However, a single 80 GB GPU cannot hold a relatively long prompt_length, so I would like to run this part on two GPUs. If I directly change the "torch.distributed.get_world_size", return_value=1 patch to return 2 and set the corresponding environment variables, it causes communication problems and simply hangs. Is there a way to solve this?

world_size_patch = patch(
    "torch.distributed.get_world_size", return_value=1
)
profiling_patch = patch(
    "vllm.worker.worker.Worker._assert_memory_footprint_increased_during_profiling",
    return_value=None,
)
with world_size_patch, profiling_patch:
    print("vllm is running on: ", vllm_device)
    self.llm = LLM(
        model=model.name_or_path,
        device=vllm_device,
        gpu_memory_utilization=self.args.vllm_gpu_memory_utilization,
        dtype=torch.bfloat16,
        # Automatic Prefix Caching caches the KV cache of existing queries, so that a new query can
        # directly reuse the KV cache if it shares the same prefix with one of the existing queries.
        # This is particularly useful here because we generate completions from the same prompts.
        enable_prefix_caching=True,
        enforce_eager=True,
        # Ensure that training and inference use the same processor for images.
        mm_processor_kwargs=(
            {
                "max_pixels": max_pixels,
                "min_pixels": min_pixels,
            }
            if "Qwen2-VL" in model_id or "Qwen2.5-VL" in model_id
            else None
        ),
        max_model_len=args.max_completion_length,
    )
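
For reference, a minimal sketch of the change described above that leads to the hang: the same patch with return_value=2 plus the usual torch.distributed environment variables. The specific address, port, and rank values below are assumptions for illustration only, not values from the original setup.

import os
from unittest.mock import patch

# The change described above: report a world size of 2 instead of 1.
world_size_patch = patch("torch.distributed.get_world_size", return_value=2)

# Typical torch.distributed environment variables; the values below are
# assumptions for illustration, not taken from the original setup.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
os.environ.setdefault("WORLD_SIZE", "2")
os.environ.setdefault("RANK", "0")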
@rrustlee reopened this Feb 26, 2025