Config refactor: Remove gemm_plugin from config, add quantization config with calibration size #1398

Merged
michaelfeil merged 11 commits into main from michaelfeil/drop-gemm-config
Feb 21, 2025


michaelfeil
Contributor

🚀 What

  • The options currently available for configuring gemm_plugin are ["auto", "float16", "bfloat16"]. The default is "auto". There is no case where you would want to configure fp16 when the weights are bfloat16.
  • The only choices that make sense are "auto" when using fp16/bf16, and None / null (RE: performance guide) when using "fp8" or "fp8kv". The latter configuration is currently not possible. We should set this automatically in engine-builder, as the choice is straightforward.
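
A minimal sketch of the automatic selection described above, not the actual engine-builder code. The names `QuantizationType` and `resolve_gemm_plugin` and the enum values are hypothetical and chosen only for illustration; the rule itself ("auto" for fp16/bf16, None for fp8 or fp8kv) is the one stated in this description.

```python
from enum import Enum
from typing import Optional


class QuantizationType(str, Enum):
    """Hypothetical quantization options; the real config may use other names."""

    NO_QUANT = "no_quant"  # plain fp16/bf16 weights
    FP8 = "fp8"
    FP8_KV = "fp8kv"


def resolve_gemm_plugin(quant: QuantizationType) -> Optional[str]:
    """Pick the gemm_plugin value from the quantization type.

    Assumed rule, taken from the PR description:
    - fp8 / fp8kv -> None (plugin disabled)
    - fp16 / bf16 -> "auto" (follow the dtype of the weights)
    """
    if quant in (QuantizationType.FP8, QuantizationType.FP8_KV):
        return None
    return "auto"


if __name__ == "__main__":
    for quant in QuantizationType:
        print(f"{quant.value}: gemm_plugin={resolve_gemm_plugin(quant)!r}")
```

Deriving the value this way means users no longer set gemm_plugin at all, so a mismatch such as float16 on bfloat16 weights cannot be expressed in the config.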

💻 How

🔬 Testing

@michaelfeil michaelfeil changed the title Update trt_llm_config.py Remove gemm_plugin from config Feb 18, 2025
@michaelfeil michaelfeil changed the title Remove gemm_plugin from config Remove gemm_plugin from config, add quantization config with calibration size Feb 21, 2025
@michaelfeil michaelfeil changed the title Remove gemm_plugin from config, add quantization config with calibration size Config refactor: Remove gemm_plugin from config, add quantization config with calibration size Feb 21, 2025

@aspctu aspctu self-requested a review February 21, 2025 18:02
@michaelfeil michaelfeil merged commit 0970b7d into main Feb 21, 2025
1 check passed
@michaelfeil michaelfeil deleted the michaelfeil/drop-gemm-config branch February 21, 2025 18:09