IPEX v2.5.10: Fail to run inference with quantized Phi-3 #756
Comments
Let me check it. Could you provide more detail about your environment, such as the GPU driver version and the transformers version you used? Thanks!
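A quick way to collect that information from inside the Python environment (the XPU query at the end assumes an IPEX XPU build is installed):

```python
# Collect the version info requested above.
import torch
import transformers
import intel_extension_for_pytorch as ipex

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("ipex:", ipex.__version__)

# On an XPU build, the reported device name also confirms the GPU
# driver is visible to the runtime.
if torch.xpu.is_available():
    print("xpu device:", torch.xpu.get_device_name(0))
```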
I'm running on Windows 11 with an Intel(R) Core(TM) Ultra 5. I set up a fresh conda environment with Python 3.12 and ran the install commands. Then I ran the sample from the link I sent above, replacing model_id = "microsoft/Phi-3-mini-4k-instruct". Thank you.
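For reference, a minimal sketch of the kind of run described, assuming the loading pattern of the linked example; the prompt and generation settings here are placeholders, not the example's actual values:

```python
# Rough sketch of the run described above; the real example script in the
# IPEX repo differs in its details (prompt handling, generation args, etc.).
import torch
import intel_extension_for_pytorch as ipex  # registers the XPU backend
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"  # swapped in as described

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    trust_remote_code=True,  # Phi-3 ships custom modeling code on the Hub
).to("xpu")

inputs = tokenizer("Hello, my name is", return_tensors="pt").to("xpu")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```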
What's the script you used to run the model? Please try running the model with this script:
Hi, thank you for the reference. I used the following script: https://github.com/intel/intel-extension-for-pytorch/tree/release/xpu/2.5.10/examples/gpu/llm/inference#learn-to-quantize-llm-and-save-quantized-model-then-run-inference-with-quantized-model, and line 10 there sets use_hf_code = True. That setting is the cause of my failure with Phi-3; you might want to debug the use_hf_code = True path.
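If use_hf_code maps onto transformers' trust_remote_code flag (an assumption; the thread does not confirm how the script wires it up), the two settings would select different Phi-3 implementations:

```python
# Assumed mapping of the example's use_hf_code flag onto transformers'
# trust_remote_code option.
from transformers import AutoModelForCausalLM

model_id = "microsoft/Phi-3-mini-4k-instruct"

# use_hf_code = True: execute the custom modeling code published on the
# Hub alongside the checkpoint.
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# use_hf_code = False: use the Phi-3 implementation bundled with the
# installed transformers release (requires a version that includes it).
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=False)
```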
That's interesting.
Which logs do you need?
Describe the issue
I tried the example at https://github.com/intel/intel-extension-for-pytorch/tree/release/xpu/2.5.10/examples/gpu/llm/inference#learn-to-quantize-llm-and-save-quantized-model-then-run-inference-with-quantized-model, using the microsoft/Phi-3-mini-4k-instruct model. It fails with: