Llama 3.1 405B on Gaudi #646
ppatel-eng asked this question in Q&A · Unanswered

We are trying to run Llama 3.1 405B on Gaudi and are running into memory constraints when following the guide below on 8 Gaudi 2 HPUs. Our end goal is to use vllm-fork to serve Llama 3.1 405B, ideally with as little quantization as possible.

https://github.com/HabanaAI/vllm-hpu-extension/blob/main/calibration/README.md
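For rough sizing (an editorial back-of-envelope, not part of the original post): 405B parameters in BF16 need roughly 810 GB for the weights alone, while 8 Gaudi 2 cards provide about 8 × 96 GB = 768 GB of HBM in total. Unquantized weights therefore already exceed capacity before any KV cache or activations are counted, which is why FP8 quantization (~405 GB of weights) keeps coming up below.

```python
# Back-of-envelope HBM check. 96 GB of HBM per Gaudi 2 card is an assumption
# taken from public specs, not from this thread; numbers are approximate.
PARAMS = 405e9
CARDS = 8
HBM_PER_CARD_GB = 96

total_hbm_gb = CARDS * HBM_PER_CARD_GB
for dtype, bytes_per_param in (("bf16", 2), ("fp8", 1)):
    weights_gb = PARAMS * bytes_per_param / 1e9
    verdict = "does not fit" if weights_gb > total_hbm_gb else "fits, before KV cache/activations"
    print(f"{dtype}: weights ~{weights_gb:.0f} GB vs {total_hbm_gb} GB total HBM -> {verdict}")
```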
Replies: 2 comments
- Hi @ppatel-eng, for the time being we don't provide a multi-node solution. Did you have any issues with running the model calibration procedure?
- Please use https://github.com/HabanaAI/vllm-hpu-extension/tree/main/calibration to quantize the model. Make sure you are using the latest vllm and vllm-hpu-extension versions with the Gaudi PyTorch 1.19 image.
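A minimal serving sketch, under assumptions not confirmed in this thread: calibration has already produced an FP8 quantization config (the JSON emitted by the calibration flow), the HabanaAI vllm-fork's Intel Neural Compressor ("inc") quantization path is used, and the model id and file path shown are placeholders. Option names may differ between versions, so check the fork's README for the exact flags.

```python
import os

from vllm import LLM, SamplingParams

# Placeholder path to the quantization config produced by the calibration step
# (assumed to be consumed via the QUANT_CONFIG environment variable).
os.environ["QUANT_CONFIG"] = "/path/to/maxabs_quant.json"

llm = LLM(
    model="meta-llama/Llama-3.1-405B-Instruct",  # illustrative model id
    tensor_parallel_size=8,       # shard weights across the 8 Gaudi 2 HPUs
    quantization="inc",           # FP8 via Intel Neural Compressor (assumed vllm-fork option)
    kv_cache_dtype="fp8_inc",     # FP8 KV cache to reduce HBM pressure (assumed vllm-fork option)
    max_model_len=4096,           # keep context modest to limit KV-cache memory
)

out = llm.generate(["Hello from Gaudi"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```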