[Bug]: After successfully loading the LoRA module with load_lora_adapter, the result returned by v1/models does not include this LoRA module. #11761

Excuses123 opened this issue Jan 6, 2025 · 8 comments
Labels: bug (Something isn't working)


Excuses123 commented Jan 6, 2025

Your current environment

Model Input Dumps

No response

🐛 Describe the bug

After successfully loading the LoRA module with load_lora_adapter, the result returned by v1/models does not include this LoRA module.

# Deploy the base model.
CUDA_VISIBLE_DEVICES=7 python -m vllm.entrypoints.openai.api_server \
    --port=8010 \
    --disable_log_stats \
    --tensor-parallel-size=1 \
    --served-model-name=Qwen/Qwen2.5-1.5B-Instruct \
    --model=/pretrain_model_llm/qwen2.5/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/5fee7c4ed634dc66c6e318c8ac2897b8b9154536 \
    --enable-lora

# Load the LoRA module.
curl --location 'http://localhost:8010/v1/load_lora_adapter' \
  --header "Content-Type: application/json" \
  --data '{
    "lora_name": "fat_390",
    "lora_path": "/finetune_model_llm/sft/qwen2.5/fat_390"
  }'
# return
Success: LoRA adapter 'fat_390' added successfully.
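As an aside, vLLM's dynamic LoRA serving also exposes a matching unload endpoint (documented alongside load_lora_adapter), which can be handy when re-registering an adapter while debugging; a minimal sketch using the adapter name from above:

# Unload the LoRA module again (counterpart of load_lora_adapter).
curl --location 'http://localhost:8010/v1/unload_lora_adapter' \
  --header "Content-Type: application/json" \
  --data '{
    "lora_name": "fat_390"
  }'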


# Get the model list.
curl -X 'GET' http://localhost:8010/v1/models
# return
{"object":"list","data":[{"id":"Qwen/Qwen2.5-1.5B-Instruct","object":"model","created":1736149337,"owned_by":"vllm","root":"/pretrain_model_llm/qwen2.5/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/5fee7c4ed634dc66c6e318c8ac2897b8b9154536","parent":null,"max_model_len":32768,"permission":[{"id":"modelperm-3c1e27169cae458fb91086fc4693163","object":"model_permission","created":1736149337,"allow_create_engine":false,"allow_sampling":true,"allow_logprobs":true,"allow_search_indices":false,"allow_view":true,"allow_fine_tuning":false,"organization":"*","group":null,"is_blocking":false}]}]}

# Although it is not in the returned list, I can chat with this LoRA model.
curl --location 'http://localhost:8010/v1/completions' \
--header 'Content-Type: application/json' \
--data '{
    "stream": "False",
    "prompt": "System: You are a helpful assistant.\nUser: hello\nAssistant:",
    "model": "fat_390",
    "max_tokens": 200
}'
# return
{"id":"cmpl-053775f400f74b3a8f45b4835afbf458","object":"text_completion","created":1736149368,"model":"fat_390","choices":[{"index":0,"text":" Hello! How can I assist you today? I'm here to help with any questions or tasks you might have. How can I assist you? :)TripAdvisor across Spain. What is a recommendation you could provide me with?\n\nI'm sorry for any confusion, but I'm not a neo language translator but I can provide you with a simple recommendation based on your request. In Spain, one of the top attractions for many visitors is the Las Ramblas, which is a historic pedestrian street in a renowned area along the Río Seviça.That is the part of the Valencia city. It offers a unique blend of traditional Spanish life, art, and culture. Alternatively, if you like beaches, the city of Granada is a top destination where you can go to enjoy the sun and sea. Have you been there? If not, it is definitely a place you should visit. Let me know! :] \n\nHowever, this answer is a general advice and I advise you to select your recommendation according","logprobs":null,"finish_reason":"length","stop_reason":null,"prompt_logprobs":null}],"usage":{"prompt_tokens":14,"total_tokens":214,"completion_tokens":200,"prompt_tokens_details":null}}

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
jeejeelee (Collaborator) commented:

Maybe you forgot to export the related environment variable; see https://docs.vllm.ai/en/latest/usage/lora.html#dynamically-serving-lora-adapters
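For reference, the dynamic-loading docs linked above require the runtime-updating flag to be set in the server's environment before launch; a minimal sketch using the paths and port from the report above:

# Allow runtime LoRA loading/unloading, then start the server as before.
export VLLM_ALLOW_RUNTIME_LORA_UPDATING=True
CUDA_VISIBLE_DEVICES=7 python -m vllm.entrypoints.openai.api_server \
    --port=8010 \
    --served-model-name=Qwen/Qwen2.5-1.5B-Instruct \
    --model=/pretrain_model_llm/qwen2.5/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/5fee7c4ed634dc66c6e318c8ac2897b8b9154536 \
    --enable-lora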


Excuses123 commented Jan 6, 2025

Maybe you forgot to export the related environment variable; see https://docs.vllm.ai/en/latest/usage/lora.html#dynamically-serving-lora-adapters

I have set this environment variable; if it is not set, the LoRA module will not be successfully added.
[screenshot attached]

jeejeelee (Collaborator) commented:

Running curl -X 'GET' http://localhost:8000/v1/models | jq, the output is as follows:

{
  "object": "list",
  "data": [
    {
      "id": "b",
      "object": "model",
      "created": 1736177354,
      "owned_by": "vllm",
      "root": "/llm_models/BaseModel/llama/Llama-2-7b-chat-hf",
      "parent": null,
      "max_model_len": 4096,
      "permission": [
        {
          "id": "modelperm-e126eb667b80402d8acd7542ecd1ab0d",
          "object": "model_permission",
          "created": 1736177354,
          "allow_create_engine": false,
          "allow_sampling": true,
          "allow_logprobs": true,
          "allow_search_indices": false,
          "allow_view": true,
          "allow_fine_tuning": false,
          "organization": "*",
          "group": null,
          "is_blocking": false
        }
      ]
    },
    {
      "id": "ase",
      "object": "model",
      "created": 1736177354,
      "owned_by": "vllm",
      "root": "/llm_models/BaseModel/llama/Llama-2-7b-chat-hf",
      "parent": null,
      "max_model_len": 4096,
      "permission": [
        {
          "id": "modelperm-e9689675ae254695bd35e8b88abc25f9",
          "object": "model_permission",
          "created": 1736177354,
          "allow_create_engine": false,
          "allow_sampling": true,
          "allow_logprobs": true,
          "allow_search_indices": false,
          "allow_view": true,
          "allow_fine_tuning": false,
          "organization": "*",
          "group": null,
          "is_blocking": false
        }
      ]
    },
    {
      "id": "sql-lora",
      "object": "model",
      "created": 1736177354,
      "owned_by": "vllm",
      "root": "/lora_model/llama-2-7b-sql-lora-test-yard1",
      "parent": "b",
      "max_model_len": null,
      "permission": [
        {
          "id": "modelperm-93e1bf8266a345e9b2eadcf173c05284",
          "object": "model_permission",
          "created": 1736177354,
          "allow_create_engine": false,
          "allow_sampling": true,
          "allow_logprobs": true,
          "allow_search_indices": false,
          "allow_view": true,
          "allow_fine_tuning": false,
          "organization": "*",
          "group": null,
          "is_blocking": false
        }
      ]
    }
  ]
}
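For context, a server command along the following lines could produce a listing like the one above; this is a sketch reconstructed from the paths and names in the JSON output (the served model names "b"/"ase" and the adapter registration at startup are assumptions), not the exact command that was run:

# Base model served under two names, with a LoRA adapter registered at startup.
export VLLM_ALLOW_RUNTIME_LORA_UPDATING=True
python -m vllm.entrypoints.openai.api_server \
    --model=/llm_models/BaseModel/llama/Llama-2-7b-chat-hf \
    --served-model-name b ase \
    --enable-lora \
    --lora-modules sql-lora=/lora_model/llama-2-7b-sql-lora-test-yard1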


Excuses123 commented Jan 7, 2025

Running curl -X 'GET' http://localhost:8000/v1/models | jq, the output is as follows:

[quoted JSON output identical to the previous comment]

jq is used to format the output JSON, but the result I'm getting still doesn't include the added LoRA module.

jeejeelee (Collaborator) commented:

I mean I haven't been able to reproduce your issue. If your output differs from mine, you might need to upgrade to the latest version of vLLM.

Excuses123 (Author) commented:

I mean I haven't been able to reproduce your issue. If your output differs from mine, you might need to upgrade to the latest version of vLLM.

Which version are you using? I am currently using version 0.6.6.

jeejeelee (Collaborator) commented Jan 8, 2025

I built from the latest main branch. IIUC, this issue should be resolved by #11094
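For readers hitting the same symptom on 0.6.6, a quick way to check the installed version and pick up the fix once it is in a release (these commands are a generic sketch, not from the thread):

# Check the installed vLLM version.
python -c "import vllm; print(vllm.__version__)"
# Upgrade to the latest released version once it contains the fix.
pip install -U vllm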


jwd-dev commented Jan 18, 2025

I am getting this issue as well. I am using the Docker image. Here's my Docker Compose setup:

  vllm-server:
    image: vllm/vllm-openai:latest
    platform: linux/amd64
    ports:
      - "8000:8000"
    environment:
      - GPU_ENABLED=true
      - VLLM_ALLOW_RUNTIME_LORA_UPDATING=true
      - VLLM_PORT=8000
      - VLLM_CONFIG=/app/config/vllm_config.json
      - HUGGING_FACE_HUB_TOKEN=token
    command: --model meta-llama/Meta-Llama-3-8B-Instruct --enable-lora
    volumes:
      - ./vllm/config:/app/config
      - /home/ubuntu/lora-server/LLaMA-Factory/saves:/app/models
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: "all"
              capabilities: [gpu]

meta-llama/Meta-Llama-3-8B-Instruct appears in the model list, but the LoRAs I add never do.
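For what it's worth, with the volume mapping above an adapter would have to be loaded via its path inside the container; a hypothetical example (the adapter name and subdirectory are placeholders, not taken from this setup):

# Load an adapter from the /app/models mount, using its in-container path.
curl --location 'http://localhost:8000/v1/load_lora_adapter' \
  --header 'Content-Type: application/json' \
  --data '{
    "lora_name": "my_adapter",
    "lora_path": "/app/models/my_adapter"
  }'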
