The key idea of dynamic quantization, as described here, is that the scale factor for activations is determined dynamically, based on the data range observed at runtime. This ensures that the scale factor is “tuned” so that as much signal as possible about each observed dataset is preserved.
Dynamic quantization is relatively free of tuning parameters, which makes it well suited to be added into production pipelines as a standard part of NLP models.
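To make this concrete, the following NumPy sketch (a simplification for illustration, not the actual onnxruntime kernel) derives a per-tensor scale and zero point from the range observed in an activation tensor at inference time:

```python
import numpy as np

def dynamic_quantize_uint8(x):
    """Quantize a float tensor to uint8, deriving scale and zero_point
    from the tensor's own runtime range (a minimal sketch of the idea)."""
    # Observe the data range at runtime; include 0 so that zero stays
    # exactly representable after quantization.
    rmin = min(float(x.min()), 0.0)
    rmax = max(float(x.max()), 0.0)
    scale = (rmax - rmin) / 255.0
    if scale == 0.0:               # constant tensor: any scale works
        scale = 1.0
    zero_point = int(round(-rmin / scale))
    q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

# The scale is recomputed for every input, so each batch uses the full
# uint8 range regardless of how its values are distributed.
activations = np.random.randn(2, 4).astype(np.float32)
q, scale, zp = dynamic_quantize_uint8(activations)
print(scale, zp, q)
```

Because the scale is recomputed from each input rather than fixed ahead of time by a calibration pass over representative data, activation ranges need no offline tuning, which is one reason dynamic quantization has so few knobs.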
Taking the onnxruntime bert_base model as an example, users can specify the quantization method in a yaml configuration like the following:
```yaml
model:                                     # mandatory. lpot uses this model name and framework name to decide where to save snapshot if tuning.snapshot field is empty.
  name: bert
  framework: onnxrt_integerops             # mandatory. possible values are tensorflow, mxnet, pytorch, onnxrt_integerops, onnxrt_qlinearops.

quantization:
  approach: post_training_dynamic_quant    # optional. default value is post_training_static_quant.
                                           # possible values are post_training_static_quant,
                                           # post_training_dynamic_quant,
                                           # quant_aware_training.
  calibration:
    sampling_size: 8, 16, 32

tuning:
  accuracy_criterion:
    relative: 0.01                         # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
  exit_policy:
    timeout: 0                             # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
  random_seed: 9527                        # optional. random seed for deterministic tuning.
```
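With the yaml in place, quantization is typically driven from a short Python script. The following is a minimal sketch assuming LPOT's experimental API; the file names `bert_dynamic.yaml` (the configuration above), `bert.onnx`, and `bert_quantized.onnx` are hypothetical placeholders, and `eval_func` stands in for a user-supplied accuracy evaluation:

```python
import onnx
from lpot.experimental import Quantization, common

def eval_func(model):
    # User-supplied evaluation: run the (quantized) model on a validation
    # set and return a scalar accuracy, which the tuning loop compares
    # against the accuracy_criterion in the yaml. Stubbed out here.
    raise NotImplementedError("replace with a real evaluation loop")

# Load the FP32 ONNX model (hypothetical path).
fp32_model = onnx.load('bert.onnx')

# Point the quantizer at the yaml configuration shown above
# (hypothetical file name) and hand it the model and metric.
quantizer = Quantization('bert_dynamic.yaml')
quantizer.model = common.Model(fp32_model)
quantizer.eval_func = eval_func
q_model = quantizer()                  # runs the tuning loop defined in the yaml
q_model.save('bert_quantized.onnx')    # hypothetical output path
```

Since the approach is post_training_dynamic_quant, the tuning loop only needs the accuracy feedback from `eval_func`; activation scales are computed on the fly at inference time as described above.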