INT8 Quantization of dinov2 TensorRT Model is Not Faster than FP16 Quantization #489

mr-lz · 2024-12-06T10:34:58Z

Hello,

I used PyTorch-Quantization for post-training INT8 quantization of the dinov2-base model and then converted it to a TensorRT model. However, the INT8 model is slightly slower than the FP16 model, and I see the same result on A100, V100, and A10. Is this behavior expected?
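For context, a latency comparison of this kind can be sketched as below. This is a minimal, standard-library-only sketch and not the exact script behind the numbers above: `benchmark`, `run_fp16`, and `run_int8` are hypothetical names, and each callable is assumed to run one full inference and block until the GPU has finished.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(run_once: Callable[[], None], warmup: int = 10, iters: int = 100) -> Dict[str, float]:
    """Time a zero-argument inference callable and return latency stats in milliseconds.

    Assumption: run_once() only returns after the device work has completed
    (for example, it synchronizes the CUDA stream); otherwise only the launch
    overhead is measured.
    """
    # Warm-up runs are discarded so one-time setup costs do not skew the stats.
    for _ in range(warmup):
        run_once()

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1e3)

    samples.sort()
    p99_index = min(len(samples) - 1, int(0.99 * len(samples)))
    return {
        "mean_ms": statistics.fmean(samples),
        "median_ms": statistics.median(samples),
        "p99_ms": samples[p99_index],
    }


# Hypothetical usage: run_fp16 and run_int8 each wrap one engine's inference call.
# fp16_stats = benchmark(run_fp16)
# int8_stats = benchmark(run_int8)
```

Using the same batch size, input shape, warm-up count, and iteration count for both engines keeps the two sets of numbers comparable.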

Thank you.
