
Add Qwen2.5 vLLM generator (based on LlamaGenerator), fix batch 1 issue with generator's decode forward #17422

Merged

merged 2 commits into main from skhorasgani/integrate_qwen2 on Jan 31, 2025

Conversation

@skhorasganiTT (Contributor) commented on Jan 31, 2025:

Ticket

Batch 1 issue: tenstorrent/vllm#54

Problem description

  • A Qwen vLLM generator did not exist
  • LlamaGenerator would crash on batch=1 decode inputs

What's changed

  • Added Qwen2.5 vLLM generator (based on LlamaGenerator)
  • Updated CCL topology in process_output_decode
  • Padded decode tokens to tile size to fix the batch 1 issue with the generator (see the sketch after this list)
  • Note: these changes currently only affect vLLM tests
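
The batch 1 fix pads the decode token tensor's batch dimension up to the device tile size before the forward pass, since tt-metal operates on 32x32 tiles and a batch of 1 leaves a tile underfilled. The helper below is a minimal, self-contained sketch of that idea; `pad_batch_to_tile`, `TILE_SIZE`, the `[batch, 1]` decode layout, and the `decode_forward` call are illustrative assumptions, not the exact names or shapes used in this PR.

```python
import torch

TILE_SIZE = 32  # assumed tile dimension; tt-metal tiles are 32x32

def pad_batch_to_tile(tokens: torch.Tensor) -> tuple[torch.Tensor, int]:
    """Pad the batch dim of decode tokens [batch, 1] up to a multiple of TILE_SIZE.

    Returns the padded tensor plus the original batch size, so outputs can be
    sliced back down after the decode forward pass.
    """
    batch = tokens.shape[0]
    if batch % TILE_SIZE == 0:
        return tokens, batch
    padded_batch = TILE_SIZE * ((batch + TILE_SIZE - 1) // TILE_SIZE)
    pad_rows = torch.zeros(padded_batch - batch, *tokens.shape[1:], dtype=tokens.dtype)
    return torch.cat([tokens, pad_rows], dim=0), batch

# Usage: pad before the decode forward, slice results back to the real batch.
tokens = torch.randint(0, 32000, (1, 1))             # batch=1 decode input
padded_tokens, real_batch = pad_batch_to_tile(tokens)
assert padded_tokens.shape == (32, 1)
# logits = generator.decode_forward(padded_tokens, ...)[:real_batch]  # hypothetical call
```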

Checklist

  • Post commit CI passes
  • Blackhole Post commit (if applicable)
  • Model regression CI testing passes (if applicable)
  • Device performance regression CI testing passes (if applicable)
  • (For models and ops writers) Full new model tests pass
  • New/Existing tests provide coverage for changes

Commits

  • …logy in process_output_decode
    Signed-off-by: Salar Hosseini <[email protected]>
    (cherry picked from commit 4fbdcc3)
  • …erator
    Signed-off-by: Salar Hosseini <[email protected]>
    (cherry picked from commit 4221d8a)
@mtairum (Contributor) left a comment:

@skhorasganiTT pre-approving this one.

Please rebase after #17421 gets merged later today, as it might cause a conflict in llama_model.py.

@skhorasganiTT (Author) replied:

> @skhorasganiTT pre-approving this one.
>
> Please rebase after #17421 gets merged later today, as it might cause a conflict in llama_model.py.

No conflicts, already checked

@skhorasganiTT skhorasganiTT merged commit 41d4b36 into main Jan 31, 2025
9 checks passed
@skhorasganiTT skhorasganiTT deleted the skhorasgani/integrate_qwen2 branch January 31, 2025 19:11
skhorasganiTT added a commit that referenced this pull request on Jan 31, 2025:
…ue with generator's decode forward (#17422) (cherry picked from commit 41d4b36)

nikileshx pushed a commit to nikileshx/tt-metal that referenced this pull request on Feb 3, 2025.