
How to run Qwen using Executorch? #7467

Open
Arya-Hari opened this issue Jan 2, 2025 · 3 comments
Assignees
SS-JIA
Labels
module: llm LLM examples and apps, and the extensions/llm libraries

Comments

@Arya-Hari

📚 The doc issue

Hi! I just wanted to ask how I would go about running Qwen using ExecuTorch. I was able to create the .pte file for Qwen. The example for Llama has a step 'Create a llama runner for Android'. Do we have to do something similar for Qwen by creating a custom runner? Also, the Qwen repository on the Hugging Face Hub does not have a 'tokenizer.model' file, but the Llama example requires it for running inference using the adb shell. How do I work around this?
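
On the tokenizer point: Qwen's Hugging Face repos ship a byte-level BPE tokenizer (tokenizer.json plus vocab/merges files) rather than a sentencepiece tokenizer.model, so the tokenizer itself loads fine through transformers. A minimal sketch, assuming any Qwen repo id (the one below is just an example, not a repo named in this issue):

```python
# Minimal sketch: load Qwen's tokenizer from the Hugging Face Hub.
# "Qwen/Qwen2.5-0.5B-Instruct" is an example repo id, not one from this thread.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
ids = tok.encode("Hello, Qwen!")  # token ids for the prompt
print(ids)
print(tok.decode(ids))            # round-trip back to text
```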

Suggest a potential alternative/fix

No response

@kimishpatel
Contributor

I don't know the details of how to run Qwen, or whether there is any significant difference compared to Llama as far as the model's interface is concerned.

Also, when you say you were able to export the model, can you detail the steps you took? If you can run the exported Qwen model using https://github.com/pytorch/executorch/blob/main/examples/models/llama/runner/eager.py#L103, then it is highly likely that you can run it via the C++ runner. But you do need a tokenizer, so I am not sure how HF runs this model.
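
As a hedged sanity check (an assumption, not a step from this thread): once you have a .pte, you can try loading and executing it with ExecuTorch's Python runtime bindings before touching any C++ runner. The file name and input shape below are hypothetical, and the exact input signature depends on how the model was exported (e.g. with or without a KV cache):

```python
# Minimal sketch: load an exported .pte with ExecuTorch's Python runtime
# and run a single forward call. "qwen.pte" is a hypothetical path.
import torch
from executorch.runtime import Runtime

runtime = Runtime.get()
program = runtime.load_program("qwen.pte")
method = program.load_method("forward")

# A single prefill step: token ids shaped (batch, seq_len). The real input
# signature depends on the export configuration (KV cache, dtypes, etc.).
tokens = torch.tensor([[1, 2, 3]], dtype=torch.long)
outputs = method.execute([tokens])
print(outputs[0].shape)  # logits, if the export matches this signature
```

If this executes and produces logits of the expected shape, what remains is tokenization and the decode loop, which is what the llama_runner binary provides for Llama.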

@kimishpatel kimishpatel added the module: llm LLM examples and apps, and the extensions/llm libraries label Jan 3, 2025
@SS-JIA SS-JIA moved this to To triage in ExecuTorch DevX improvements Jan 6, 2025
@SS-JIA SS-JIA self-assigned this Jan 6, 2025
@SS-JIA
Contributor

SS-JIA commented Jan 6, 2025

@Arya-Hari for some more context, the llama_runner binary used in our examples is heavily tailored to the Llama model architecture. So, as Kimish mentioned, depending on how Qwen's interface compares to Llama's, you may not be able to re-use the llama_runner binary. If you are familiar with the interface of the model, the best way would be to fork or modify the llama_runner binary for the Qwen model, essentially creating a custom runner as you mentioned.

@mergennachin
Contributor

@guangy10, are there guidelines on how to leverage the recent Hugging Face (huggingface/transformers#32253, huggingface/transformers#34102) and Optimum integrations (https://huggingface.co/docs/optimum/main/en/exporters/executorch/usage_guides/export_a_model)?
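
For reference, the Optimum export guide linked above sketches a path roughly like the following. This is a hedged sketch based on that guide, not an instruction from this thread: the model id and "xnnpack" recipe are example values, and whether Qwen was supported by optimum-executorch at the time is an assumption.

```python
# Sketch of the optimum-executorch path from the linked usage guide.
# The model id is an example; Qwen support here is an assumption.
from transformers import AutoTokenizer
from optimum.executorch import ExecuTorchModelForCausalLM

model_id = "Qwen/Qwen2.5-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# from_pretrained exports the model to ExecuTorch on the fly when no .pte
# is already present; "xnnpack" is the CPU recipe shown in the guide.
model = ExecuTorchModelForCausalLM.from_pretrained(model_id, recipe="xnnpack")

generated = model.text_generation(
    tokenizer=tokenizer,
    prompt="Simply put, the theory of relativity states that",
    max_seq_len=64,
)
print(generated)
```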
