IPEX v2.5.10: Fail to run inference with quantized Phi-3 #756
Comments
Let me check it. Could you provide more detail about your environment, such as the GPU driver version and the transformers version you used? Thanks!
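A quick way to collect that information from inside the Python environment (the XPU query at the end assumes an IPEX XPU build is installed):

```python
# Collect the version info requested above.
import torch
import transformers
import intel_extension_for_pytorch as ipex

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("ipex:", ipex.__version__)

# On an XPU build, the reported device name also confirms the GPU
# driver is visible to the runtime.
if torch.xpu.is_available():
    print("xpu device:", torch.xpu.get_device_name(0))
```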
I'm running on Windows 11 with an Intel(R) Core(TM) Ultra 5. I set up a fresh conda environment with Python 3.12 and ran the install commands. Then I ran the sample from the link I sent above, replacing model_id = "microsoft/Phi-3-mini-4k-instruct". Thank you.
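For reference, a minimal sketch of the kind of run described, assuming the loading pattern of the linked example; the prompt and generation settings here are placeholders, not the example's actual values:

```python
# Rough sketch of the run described above; the real example script in the
# IPEX repo differs in its details (prompt handling, generation args, etc.).
import torch
import intel_extension_for_pytorch as ipex  # registers the XPU backend
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"  # swapped in as described

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    trust_remote_code=True,  # Phi-3 ships custom modeling code on the Hub
).to("xpu")

inputs = tokenizer("Hello, my name is", return_tensors="pt").to("xpu")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```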
What's the script you used to run the model? Please try running the model with this script:
Hi, thank you for the reference. I used the following script: https://github.com/intel/intel-extension-for-pytorch/tree/release/xpu/2.5.10/examples/gpu/llm/inference#learn-to-quantize-llm-and-save-quantized-model-then-run-inference-with-quantized-model, and line 10 there sets use_hf_code = True. That setting is the cause of my failure with Phi-3; you might want to debug the use_hf_code = True path.
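If use_hf_code maps onto transformers' trust_remote_code flag (an assumption; the thread does not confirm how the script wires it up), the two settings would select different Phi-3 implementations:

```python
# Assumed mapping of the example's use_hf_code flag onto transformers'
# trust_remote_code option.
from transformers import AutoModelForCausalLM

model_id = "microsoft/Phi-3-mini-4k-instruct"

# use_hf_code = True: execute the custom modeling code published on the
# Hub alongside the checkpoint.
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# use_hf_code = False: use the Phi-3 implementation bundled with the
# installed transformers release (requires a version that includes it).
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=False)
```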
That's interesting.
Which logs do you need?
Describe the issue
I tried the example at https://github.com/intel/intel-extension-for-pytorch/tree/release/xpu/2.5.10/examples/gpu/llm/inference#learn-to-quantize-llm-and-save-quantized-model-then-run-inference-with-quantized-model, using the microsoft/Phi-3-mini-4k-instruct model. It fails with: