Update documentation on support of fp8 (vllm-project#288)
Update documentation on support of fp8
michalkuligowski authored Sep 17, 2024
1 parent a9de5ba commit d39298c
Showing 2 changed files with 4 additions and 2 deletions.
3 changes: 2 additions & 1 deletion README_GAUDI.md
@@ -81,14 +81,15 @@ Supported Features
- Inference with [HPU
Graphs](https://docs.habana.ai/en/latest/PyTorch/Inference_on_PyTorch/Inference_Using_HPU_Graphs.html)
for accelerating low-batch latency and throughput
+- INC quantization

Unsupported Features
====================

- Beam search
- LoRA adapters
- Attention with Linear Biases (ALiBi)
-- Quantization (AWQ, FP8 E5M2, FP8 E4M3)
+- AWQ quantization
- Prefill chunking (mixed-batch inferencing)

Supported Configurations
3 changes: 2 additions & 1 deletion docs/source/getting_started/gaudi-installation.rst
@@ -76,14 +76,15 @@ Supported Features
- Tensor parallelism support for multi-card inference
- Inference with `HPU Graphs <https://docs.habana.ai/en/latest/PyTorch/Inference_on_PyTorch/Inference_Using_HPU_Graphs.html>`__
for accelerating low-batch latency and throughput
+- INC quantization

Unsupported Features
====================

- Beam search
- LoRA adapters
- Attention with Linear Biases (ALiBi)
-- Quantization (AWQ, FP8 E5M2, FP8 E4M3)
+- AWQ quantization
- Prefill chunking (mixed-batch inferencing)

Supported Configurations