
Add Qwen2.5 vLLM generator (based on LlamaGenerator), fix batch 1 issue with generator's decode forward #17422

Merged

merged 2 commits into main from skhorasgani/integrate_qwen2 on Jan 31, 2025

Conversation

@skhorasganiTT (Contributor) commented on Jan 31, 2025:

Ticket

Batch 1 issue: tenstorrent/vllm#54

Problem description

  • A Qwen vLLM generator did not exist
  • LlamaGenerator would crash on batch=1 decode inputs

What's changed

  • Added Qwen2.5 vLLM generator (based on LlamaGenerator)
  • Updated CCL topology in process_output_decode
  • Padded decode tokens to tile size to fix the batch 1 issue with the generator (see the sketch after this list)
  • Note: these changes currently only affect vLLM tests
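
The batch 1 fix pads the decode token tensor's batch dimension up to the device tile size before the forward pass, since tt-metal operates on 32x32 tiles and a batch of 1 leaves a tile underfilled. The helper below is a minimal, self-contained sketch of that idea; `pad_batch_to_tile`, `TILE_SIZE`, the `[batch, 1]` decode layout, and the `decode_forward` call are illustrative assumptions, not the exact names or shapes used in this PR.

```python
import torch

TILE_SIZE = 32  # assumed tile dimension; tt-metal tiles are 32x32

def pad_batch_to_tile(tokens: torch.Tensor) -> tuple[torch.Tensor, int]:
    """Pad the batch dim of decode tokens [batch, 1] up to a multiple of TILE_SIZE.

    Returns the padded tensor plus the original batch size, so outputs can be
    sliced back down after the decode forward pass.
    """
    batch = tokens.shape[0]
    if batch % TILE_SIZE == 0:
        return tokens, batch
    padded_batch = TILE_SIZE * ((batch + TILE_SIZE - 1) // TILE_SIZE)
    pad_rows = torch.zeros(padded_batch - batch, *tokens.shape[1:], dtype=tokens.dtype)
    return torch.cat([tokens, pad_rows], dim=0), batch

# Usage: pad before the decode forward, slice results back to the real batch.
tokens = torch.randint(0, 32000, (1, 1))             # batch=1 decode input
padded_tokens, real_batch = pad_batch_to_tile(tokens)
assert padded_tokens.shape == (32, 1)
# logits = generator.decode_forward(padded_tokens, ...)[:real_batch]  # hypothetical call
```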

Checklist

  • Post commit CI passes
  • Blackhole Post commit (if applicable)
  • Model regression CI testing passes (if applicable)
  • Device performance regression CI testing passes (if applicable)
  • (For models and ops writers) Full new model tests pass
  • New/Existing tests provide coverage for changes

Commits

  • …logy in process_output_decode
    Signed-off-by: Salar Hosseini <[email protected]>
    (cherry picked from commit 4fbdcc3)
  • …erator
    Signed-off-by: Salar Hosseini <[email protected]>
    (cherry picked from commit 4221d8a)
@mtairum (Contributor) left a comment:

@skhorasganiTT pre-approving this one.

Please rebase after #17421 gets merged later today, as it might cause a conflict in llama_model.py.

@skhorasganiTT (Author) replied:

> @skhorasganiTT pre-approving this one.
>
> Please rebase after #17421 gets merged later today, as it might cause a conflict in llama_model.py.

No conflicts, already checked

@skhorasganiTT skhorasganiTT merged commit 41d4b36 into main Jan 31, 2025
9 checks passed
@skhorasganiTT skhorasganiTT deleted the skhorasgani/integrate_qwen2 branch January 31, 2025 19:11
skhorasganiTT added a commit that referenced this pull request on Jan 31, 2025:
…ue with generator's decode forward (#17422) (cherry picked from commit 41d4b36)

nikileshx pushed a commit to nikileshx/tt-metal that referenced this pull request on Feb 3, 2025.