```python
from unittest.mock import patch

import torch
from vllm import LLM

# Excerpt from the trainer's __init__: patch out the training process group's
# world size and vLLM's memory-profiling assertion before building the engine.
world_size_patch = patch("torch.distributed.get_world_size", return_value=1)
profiling_patch = patch(
    "vllm.worker.worker.Worker._assert_memory_footprint_increased_during_profiling",
    return_value=None,
)
with world_size_patch, profiling_patch:
    print("vllm is running on:", vllm_device)
    self.llm = LLM(
        model=model.name_or_path,
        device=vllm_device,
        gpu_memory_utilization=self.args.vllm_gpu_memory_utilization,
        dtype=torch.bfloat16,
        # Automatic prefix caching reuses the KV cache of existing queries, so a new
        # query that shares a prefix with one of them skips recomputing it. This is
        # particularly useful here because we generate completions from the same prompts.
        enable_prefix_caching=True,
        enforce_eager=True,
        # Ensure that training and inference use the same image processor settings.
        mm_processor_kwargs=(
            {
                "max_pixels": max_pixels,
                "min_pixels": min_pixels,
            }
            if "Qwen2-VL" in model_id or "Qwen2.5-VL" in model_id
            else None
        ),
        max_model_len=args.max_completion_length,
    )
```
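Since the comment above calls out automatic prefix caching, here is a minimal sketch of the generation pattern it benefits: several completions sampled from one prompt, so later requests reuse the cached prompt prefix. `SamplingParams` and `llm.generate` are standard vLLM API; the prompt string and the sampling settings below are placeholders.

```python
from vllm import SamplingParams

# Sample several completions per prompt; with enable_prefix_caching=True the
# shared prompt prefix is computed once and reused by subsequent requests.
sampling_params = SamplingParams(n=8, temperature=1.0, max_tokens=256)
outputs = self.llm.generate(["<placeholder prompt>"], sampling_params)
for completion in outputs[0].outputs:
    print(completion.text)
```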
In this part of the code, `torch.distributed.get_world_size` is patched to return 1, but a single 80 GB card cannot fit a relatively long prompt_length, so I would like to place the model on two cards. If I directly change `return_value=1` to 2 in the `"torch.distributed.get_world_size"` patch and set the corresponding environment variables, it causes communication problems and the program simply hangs. Is there a way to solve this?
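For context on what multi-GPU vLLM normally looks like: outside of a patched training process, vLLM shards a model across cards through its own `tensor_parallel_size` argument and spawns its own worker processes, rather than reading the world size of an already-initialized process group. Below is a minimal standalone sketch, assuming both GPUs are exposed via `CUDA_VISIBLE_DEVICES` (the model name is a placeholder); whether this composes cleanly with the world-size patch inside the trainer is exactly what this question is asking.

```python
from vllm import LLM, SamplingParams

# Standalone two-GPU vLLM: weights and KV cache are split across the cards
# by tensor parallelism; vLLM launches its own workers for this.
llm = LLM(
    model="Qwen/Qwen2-VL-7B-Instruct",  # placeholder model name
    tensor_parallel_size=2,             # shard the model across 2 GPUs
    gpu_memory_utilization=0.9,
    enforce_eager=True,
)
print(llm.generate(["Hello"], SamplingParams(max_tokens=16))[0].outputs[0].text)
```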