
[Bug]: max_seqs_in_batch=1 induces a FATAL #54

Closed
rpavlovicTT opened this issue Jan 18, 2025 · 1 comment
Labels: bug (Something isn't working)


Your current environment

vLLM branch: dev (last verified commit: 2f33504)

tt-metal branch: main (last verified commit: 47fb1a2)

Model Input Dumps

No response

🐛 Describe the bug

Running `examples/offline_inference_tt.py` with `max_seqs_in_batch=1` results in a FATAL error:

FATAL    | ttnn.pad: For sharded inputs, only height-sharding is supported.

Repro on T3K machine:

python examples/offline_inference_tt.py --model "meta-llama/Meta-Llama-3.1-8B-Instruct" --measure_perf --max_seqs_in_batch 1
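
For comparison, here is a minimal offline-inference sketch using vLLM's standard Python API with the batch limited to a single sequence. This assumes the example's `--max_seqs_in_batch` flag ultimately constrains the scheduler the same way vLLM's `max_num_seqs` engine argument does; the TT-specific device setup in `offline_inference_tt.py` is omitted:

```python
from vllm import LLM, SamplingParams

# Hypothetical minimal repro: limit the scheduler to one sequence per batch.
# NOTE: --max_seqs_in_batch in offline_inference_tt.py is assumed to correspond
# to max_num_seqs here; this sketch does not include the TT device plumbing.
llm = LLM(model="meta-llama/Meta-Llama-3.1-8B-Instruct", max_num_seqs=1)
params = SamplingParams(max_tokens=32)

outputs = llm.generate(["What is the capital of France?"], params)
for out in outputs:
    print(out.outputs[0].text)
```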

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.