Inference Mixtral on Gaudi #249
Please only run this on a single card; multiple cards are not supported according to Habana's documentation.
The result of running with a single card is noted above.
By the way, the config for running the Mixtral model on Habana without Ray is:
Model: mistralai/Mixtral-8x7B-Instruct-v0.1
Deployed with a single card, it reports an OOM error:
Before the error occurred, memory usage looked like:
With 8 cards and DeepSpeed, the model deploys successfully.
Memory usage looked like:
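A rough back-of-the-envelope estimate (my own figures, not from this thread) is consistent with the behavior above: the bf16 weights of Mixtral-8x7B alone nearly fill one Gaudi2 card's HBM, leaving no headroom for activations or KV cache, while sharding across 8 cards with DeepSpeed leaves ample room per card. The parameter count and HBM size below are assumptions for illustration.

```python
# Hypothetical memory estimate for Mixtral-8x7B on Gaudi2 (assumed numbers).
PARAMS = 46.7e9          # approximate total parameter count of Mixtral-8x7B
BYTES_PER_PARAM = 2      # bf16 weights
HBM_PER_CARD_GB = 96     # Gaudi2 HBM per card (assumption)
NUM_CARDS = 8

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
per_card_gb = weights_gb / NUM_CARDS

# Weights alone consume nearly all of a single card's 96 GB, so a single-card
# deployment OOMs once activations and KV cache are added; sharded over 8
# cards, each card holds only a small fraction of the weights.
print(f"weights alone:          {weights_gb:.1f} GB")
print(f"sharded over {NUM_CARDS} cards:    {per_card_gb:.1f} GB per card")
```

This matches the observed pattern: near-full memory usage just before the single-card OOM, and modest per-card usage in the 8-card DeepSpeed run.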
I suspect queries sometimes fail because there are not enough cards available for deployment; it runs well once I kill all other parallel tasks.
A correct result looks like: