Llama 3.1 405B on Gaudi #646
ppatel-eng asked this question in Q&A · Unanswered

We are trying to run Llama 3.1 405B on Gaudi and are running into memory constraints when following the guide below on 8 Gaudi 2 HPUs. Our end goal is to use vllm-fork to serve Llama 3.1 405B, ideally with as little quantization as possible.

https://github.com/HabanaAI/vllm-hpu-extension/blob/main/calibration/README.md
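For rough sizing (an editorial back-of-envelope, not part of the original post): 405B parameters in BF16 need roughly 810 GB for the weights alone, while 8 Gaudi 2 cards provide about 8 × 96 GB = 768 GB of HBM in total. Unquantized weights therefore already exceed capacity before any KV cache or activations are counted, which is why FP8 quantization (~405 GB of weights) keeps coming up below.

```python
# Back-of-envelope HBM check. 96 GB of HBM per Gaudi 2 card is an assumption
# taken from public specs, not from this thread; numbers are approximate.
PARAMS = 405e9
CARDS = 8
HBM_PER_CARD_GB = 96

total_hbm_gb = CARDS * HBM_PER_CARD_GB
for dtype, bytes_per_param in (("bf16", 2), ("fp8", 1)):
    weights_gb = PARAMS * bytes_per_param / 1e9
    verdict = "does not fit" if weights_gb > total_hbm_gb else "fits, before KV cache/activations"
    print(f"{dtype}: weights ~{weights_gb:.0f} GB vs {total_hbm_gb} GB total HBM -> {verdict}")
```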
Replies: 2 comments
- Hi @ppatel-eng, for the time being we don't provide a multi-node solution. Did you have any issues with running the model calibration procedure?
- Please use https://github.com/HabanaAI/vllm-hpu-extension/tree/main/calibration to quantize the model. Make sure you are using the latest vllm and vllm-hpu-extension versions with the Gaudi PyTorch 1.19 image.
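A minimal serving sketch, under assumptions not confirmed in this thread: calibration has already produced an FP8 quantization config (the JSON emitted by the calibration flow), the HabanaAI vllm-fork's Intel Neural Compressor ("inc") quantization path is used, and the model id and file path shown are placeholders. Option names may differ between versions, so check the fork's README for the exact flags.

```python
import os

from vllm import LLM, SamplingParams

# Placeholder path to the quantization config produced by the calibration step
# (assumed to be consumed via the QUANT_CONFIG environment variable).
os.environ["QUANT_CONFIG"] = "/path/to/maxabs_quant.json"

llm = LLM(
    model="meta-llama/Llama-3.1-405B-Instruct",  # illustrative model id
    tensor_parallel_size=8,       # shard weights across the 8 Gaudi 2 HPUs
    quantization="inc",           # FP8 via Intel Neural Compressor (assumed vllm-fork option)
    kv_cache_dtype="fp8_inc",     # FP8 KV cache to reduce HBM pressure (assumed vllm-fork option)
    max_model_len=4096,           # keep context modest to limit KV-cache memory
)

out = llm.generate(["Hello from Gaudi"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```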