Can't infer Qwen2-1.5B with a LoRA #1186
Comments
The documentation for this is pretty bad, so I had the same issue. You have to convert the base model using Olive. Try this command:
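(Presumably an `olive auto-opt` invocation along the lines of the one quoted in full in the reply below; the model and adapter paths here are placeholders:)

```bash
python3 -m olive auto-opt \
    --model_name_or_path <base model, e.g. Qwen2-1.5B-Instruct> \
    --adapter_path <path to your LoRA adapter> \
    --device cpu \
    --provider CPUExecutionProvider \
    --use_ort_genai \
    --output_path ./output \
    --precision int4 \
    --use_model_builder
```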
I have the same problem.
No one replied to me.
@busishengui were you able to give the above suggestion a try? In addition to converting the LoRA adapters to ONNX format, you also need to convert the base model using Olive so it has the necessary LoRA nodes. Let us know if you're still having issues.
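For the adapter half of that, Olive's `convert-adapters` command produces the `.onnx_adapter` file that onnxruntime-genai consumes; a minimal sketch, assuming a recent Olive release (paths below are placeholders):

```bash
# Convert PEFT-format LoRA weights into the .onnx_adapter format
# used by onnxruntime-genai. Paths are placeholders.
olive convert-adapters \
    --adapter_path ./my_lora_adapter \
    --output_path ./adapter_weights \
    --dtype float32
```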
Thanks very much, that works, but there is another problem. I run:

```bash
python3 -m olive auto-opt \
    --model_name_or_path Qwen2-1.5B-Instruct \
    --adapter_path ./prc_slm_v1170_best_lora \
    --device cpu \
    --provider CPUExecutionProvider \
    --use_ort_genai \
    --output_path ./release \
    --precision int4 \
    --use_model_builder \
    --log_level 4
```

then I get these files:

```
release/
├── model
│ ├── adapter_weights.onnx_adapter
│ ├── added_tokens.json
│ ├── config.json
│ ├── genai_config.json
│ ├── generation_config.json
│ ├── merges.txt
│ ├── model.onnx
│ ├── release
│ ├── special_tokens_map.json
│ ├── tokenizer_config.json
│ ├── tokenizer.json
│ └── vocab.json
└── model_config.json
```

Then I use the `model` folder as my base LLM and `adapter_weights.onnx_adapter` as my LoRA model, but it doesn't work.
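For anyone stuck at the same point: the converted base model and the `.onnx_adapter` file are loaded through onnxruntime-genai's `Adapters` API. A minimal sketch, assuming onnxruntime-genai >= 0.5; the adapter name `qwen_lora` and the prompt are placeholders:

```python
import onnxruntime_genai as og

# Load the Olive-converted base model (the "model" folder above).
model = og.Model("./release/model")

# Register the converted LoRA weights under a name of our choosing.
adapters = og.Adapters(model)
adapters.load("./release/model/adapter_weights.onnx_adapter", "qwen_lora")

tokenizer = og.Tokenizer(model)
params = og.GeneratorParams(model)
params.set_search_options(max_length=256)

generator = og.Generator(model, params)
# Activate the LoRA weights for this generator.
generator.set_active_adapter(adapters, "qwen_lora")

generator.append_tokens(tokenizer.encode("Hello, who are you?"))
while not generator.is_done():
    generator.generate_next_token()
print(tokenizer.decode(generator.get_sequence(0)))
```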
I use the Qwen2-1.5B model and a LoRA:

[Infer code]

but it crashes and I don't know why.