[Bug]: After successfully loading the LoRA module with load_lora_adapter, the result returned by v1/models does not include this LoRA module. #11761

Excuses123 opened this issue Jan 6, 2025 · 8 comments
Labels: bug (Something isn't working)


Excuses123 commented Jan 6, 2025

Your current environment

Model Input Dumps

No response

🐛 Describe the bug

After successfully loading the LoRA module with load_lora_adapter, the result returned by v1/models does not include this LoRA module.

# Deploy the base model.
CUDA_VISIBLE_DEVICES=7 python -m vllm.entrypoints.openai.api_server \
    --port=8010 \
    --disable_log_stats \
    --tensor-parallel-size=1 \
    --served-model-name=Qwen/Qwen2.5-1.5B-Instruct \
    --model=/pretrain_model_llm/qwen2.5/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/5fee7c4ed634dc66c6e318c8ac2897b8b9154536 \
    --enable-lora

# Load the LoRA module.
curl --location 'http://localhost:8010/v1/load_lora_adapter' \
  --header "Content-Type: application/json" \
  --data '{
    "lora_name": "fat_390",
    "lora_path": "/finetune_model_llm/sft/qwen2.5/fat_390"
  }'
# return
Success: LoRA adapter 'fat_390' added successfully.
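As an aside, vLLM's dynamic LoRA serving also exposes a matching unload endpoint (documented alongside load_lora_adapter), which can be handy when re-registering an adapter while debugging; a minimal sketch using the adapter name from above:

# Unload the LoRA module again (counterpart of load_lora_adapter).
curl --location 'http://localhost:8010/v1/unload_lora_adapter' \
  --header "Content-Type: application/json" \
  --data '{
    "lora_name": "fat_390"
  }'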


# Get the model list.
curl -X 'GET' http://localhost:8010/v1/models
# return
{"object":"list","data":[{"id":"Qwen/Qwen2.5-1.5B-Instruct","object":"model","created":1736149337,"owned_by":"vllm","root":"/pretrain_model_llm/qwen2.5/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/5fee7c4ed634dc66c6e318c8ac2897b8b9154536","parent":null,"max_model_len":32768,"permission":[{"id":"modelperm-3c1e27169cae458fb91086fc4693163","object":"model_permission","created":1736149337,"allow_create_engine":false,"allow_sampling":true,"allow_logprobs":true,"allow_search_indices":false,"allow_view":true,"allow_fine_tuning":false,"organization":"*","group":null,"is_blocking":false}]}]}

# Although it is not in the returned list, I can chat with this LoRA model.
curl --location 'http://localhost:8010/v1/completions' \
--header 'Content-Type: application/json' \
--data '{
    "stream": "False",
    "prompt": "System: You are a helpful assistant.\nUser: hello\nAssistant:",
    "model": "fat_390",
    "max_tokens": 200
}'
# return
{"id":"cmpl-053775f400f74b3a8f45b4835afbf458","object":"text_completion","created":1736149368,"model":"fat_390","choices":[{"index":0,"text":" Hello! How can I assist you today? I'm here to help with any questions or tasks you might have. How can I assist you? :)TripAdvisor across Spain. What is a recommendation you could provide me with?\n\nI'm sorry for any confusion, but I'm not a neo language translator but I can provide you with a simple recommendation based on your request. In Spain, one of the top attractions for many visitors is the Las Ramblas, which is a historic pedestrian street in a renowned area along the Río Seviça.That is the part of the Valencia city. It offers a unique blend of traditional Spanish life, art, and culture. Alternatively, if you like beaches, the city of Granada is a top destination where you can go to enjoy the sun and sea. Have you been there? If not, it is definitely a place you should visit. Let me know! :] \n\nHowever, this answer is a general advice and I advise you to select your recommendation according","logprobs":null,"finish_reason":"length","stop_reason":null,"prompt_logprobs":null}],"usage":{"prompt_tokens":14,"total_tokens":214,"completion_tokens":200,"prompt_tokens_details":null}}

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
jeejeelee (Collaborator) commented:

Maybe you forgot to export the related environment variable; see https://docs.vllm.ai/en/latest/usage/lora.html#dynamically-serving-lora-adapters
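For reference, the dynamic-loading docs linked above require the runtime-updating flag to be set in the server's environment before launch; a minimal sketch using the paths and port from the report above:

# Allow runtime LoRA loading/unloading, then start the server as before.
export VLLM_ALLOW_RUNTIME_LORA_UPDATING=True
CUDA_VISIBLE_DEVICES=7 python -m vllm.entrypoints.openai.api_server \
    --port=8010 \
    --served-model-name=Qwen/Qwen2.5-1.5B-Instruct \
    --model=/pretrain_model_llm/qwen2.5/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/5fee7c4ed634dc66c6e318c8ac2897b8b9154536 \
    --enable-lora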


Excuses123 commented Jan 6, 2025

Maybe you forgot to export the related environment variable; see https://docs.vllm.ai/en/latest/usage/lora.html#dynamically-serving-lora-adapters

I have set this environment variable; if it is not set, the LoRA module will not be successfully added.
[screenshot attached]

jeejeelee (Collaborator) commented:

Running curl -X 'GET' http://localhost:8000/v1/models | jq, the output is as follows:

{
  "object": "list",
  "data": [
    {
      "id": "b",
      "object": "model",
      "created": 1736177354,
      "owned_by": "vllm",
      "root": "/llm_models/BaseModel/llama/Llama-2-7b-chat-hf",
      "parent": null,
      "max_model_len": 4096,
      "permission": [
        {
          "id": "modelperm-e126eb667b80402d8acd7542ecd1ab0d",
          "object": "model_permission",
          "created": 1736177354,
          "allow_create_engine": false,
          "allow_sampling": true,
          "allow_logprobs": true,
          "allow_search_indices": false,
          "allow_view": true,
          "allow_fine_tuning": false,
          "organization": "*",
          "group": null,
          "is_blocking": false
        }
      ]
    },
    {
      "id": "ase",
      "object": "model",
      "created": 1736177354,
      "owned_by": "vllm",
      "root": "/llm_models/BaseModel/llama/Llama-2-7b-chat-hf",
      "parent": null,
      "max_model_len": 4096,
      "permission": [
        {
          "id": "modelperm-e9689675ae254695bd35e8b88abc25f9",
          "object": "model_permission",
          "created": 1736177354,
          "allow_create_engine": false,
          "allow_sampling": true,
          "allow_logprobs": true,
          "allow_search_indices": false,
          "allow_view": true,
          "allow_fine_tuning": false,
          "organization": "*",
          "group": null,
          "is_blocking": false
        }
      ]
    },
    {
      "id": "sql-lora",
      "object": "model",
      "created": 1736177354,
      "owned_by": "vllm",
      "root": "/lora_model/llama-2-7b-sql-lora-test-yard1",
      "parent": "b",
      "max_model_len": null,
      "permission": [
        {
          "id": "modelperm-93e1bf8266a345e9b2eadcf173c05284",
          "object": "model_permission",
          "created": 1736177354,
          "allow_create_engine": false,
          "allow_sampling": true,
          "allow_logprobs": true,
          "allow_search_indices": false,
          "allow_view": true,
          "allow_fine_tuning": false,
          "organization": "*",
          "group": null,
          "is_blocking": false
        }
      ]
    }
  ]
}
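For context, a server command along the following lines could produce a listing like the one above; this is a sketch reconstructed from the paths and names in the JSON output (the served model names "b"/"ase" and the adapter registration at startup are assumptions), not the exact command that was run:

# Base model served under two names, with a LoRA adapter registered at startup.
export VLLM_ALLOW_RUNTIME_LORA_UPDATING=True
python -m vllm.entrypoints.openai.api_server \
    --model=/llm_models/BaseModel/llama/Llama-2-7b-chat-hf \
    --served-model-name b ase \
    --enable-lora \
    --lora-modules sql-lora=/lora_model/llama-2-7b-sql-lora-test-yard1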


Excuses123 commented Jan 7, 2025

Running curl -X 'GET' http://localhost:8000/v1/models | jq, the output is as follows:

[quoted JSON output identical to the previous comment]

jq is used to format the output JSON, but the result I'm getting still doesn't include the added LoRA module.

jeejeelee (Collaborator) commented:

I mean I haven't been able to reproduce your issue. If your output differs from mine, you might need to upgrade to the latest version of vLLM.

Excuses123 (Author) commented:

I mean I haven't been able to reproduce your issue. If your output differs from mine, you might need to upgrade to the latest version of vLLM.

Which version are you using? I am currently using version 0.6.6.

jeejeelee (Collaborator) commented Jan 8, 2025

I built from the latest main branch. IIUC, this issue should be resolved by #11094
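For readers hitting the same symptom on 0.6.6, a quick way to check the installed version and pick up the fix once it is in a release (these commands are a generic sketch, not from the thread):

# Check the installed vLLM version.
python -c "import vllm; print(vllm.__version__)"
# Upgrade to the latest released version once it contains the fix.
pip install -U vllm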


jwd-dev commented Jan 18, 2025

I am getting this issue as well. I am using the Docker image. Here's my Docker Compose setup:

  vllm-server:
    image: vllm/vllm-openai:latest
    platform: linux/amd64
    ports:
      - "8000:8000"
    environment:
      - GPU_ENABLED=true
      - VLLM_ALLOW_RUNTIME_LORA_UPDATING=true
      - VLLM_PORT=8000
      - VLLM_CONFIG=/app/config/vllm_config.json
      - HUGGING_FACE_HUB_TOKEN=token
    command: --model meta-llama/Meta-Llama-3-8B-Instruct --enable-lora
    volumes:
      - ./vllm/config:/app/config
      - /home/ubuntu/lora-server/LLaMA-Factory/saves:/app/models
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: "all"
              capabilities: [gpu]

meta-llama/Meta-Llama-3-8B-Instruct appears in the model list, but the LoRAs I add never do.
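For what it's worth, with the volume mapping above an adapter would have to be loaded via its path inside the container; a hypothetical example (the adapter name and subdirectory are placeholders, not taken from this setup):

# Load an adapter from the /app/models mount, using its in-container path.
curl --location 'http://localhost:8000/v1/load_lora_adapter' \
  --header 'Content-Type: application/json' \
  --data '{
    "lora_name": "my_adapter",
    "lora_path": "/app/models/my_adapter"
  }'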
