
[Bug]: max_seqs_in_batch=1 induces a FATAL #54

Closed
rpavlovicTT opened this issue Jan 18, 2025 · 1 comment
Labels: bug (Something isn't working)


Your current environment

vLLM branch: dev (last verified commit: 2f33504)

tt-metal branch: main (last verified commit: 47fb1a2)

Model Input Dumps

No response

🐛 Describe the bug

Running `examples/offline_inference_tt.py` with `max_seqs_in_batch=1` results in a FATAL error:

FATAL    | ttnn.pad: For sharded inputs, only height-sharding is supported.

Repro on T3K machine:

python examples/offline_inference_tt.py --model "meta-llama/Meta-Llama-3.1-8B-Instruct" --measure_perf --max_seqs_in_batch 1
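
For comparison, here is a minimal offline-inference sketch using vLLM's standard Python API with the batch limited to a single sequence. This assumes the example's `--max_seqs_in_batch` flag ultimately constrains the scheduler the same way vLLM's `max_num_seqs` engine argument does; the TT-specific device setup in `offline_inference_tt.py` is omitted:

```python
from vllm import LLM, SamplingParams

# Hypothetical minimal repro: limit the scheduler to one sequence per batch.
# NOTE: --max_seqs_in_batch in offline_inference_tt.py is assumed to correspond
# to max_num_seqs here; this sketch does not include the TT device plumbing.
llm = LLM(model="meta-llama/Meta-Llama-3.1-8B-Instruct", max_num_seqs=1)
params = SamplingParams(max_tokens=32)

outputs = llm.generate(["What is the capital of France?"], params)
for out in outputs:
    print(out.outputs[0].text)
```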

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.