# Dynamic Quantization

Currently, only the onnxruntime backend supports dynamic quantization.

The key idea of dynamic quantization, as described here[^1], is to determine the scale factor for activations dynamically, based on the data range observed at runtime. This ensures that the scale factor is "tuned" so that as much signal as possible about each observed dataset is preserved.

Dynamic quantization is relatively free of tuning parameters, which makes it well suited to being added into production pipelines as a standard part of NLP models.
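As a minimal illustration of the idea above (a toy sketch, not LPOT's or onnxruntime's actual kernel), the per-tensor scale and zero point can be derived from the min/max range observed in the activation at runtime:

```python
def quantize_dynamic(x, num_bits=8):
    """Asymmetric uint8 quantization whose scale is chosen from the
    data range observed at runtime (illustrative sketch only)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # include 0.0 in the range so that zero is exactly representable
    x_min, x_max = min(min(x), 0.0), max(max(x), 0.0)
    scale = (x_max - x_min) / (qmax - qmin) or 1.0  # avoid div-by-zero for all-zero input
    zero_point = int(round(qmin - x_min / scale))
    q = [min(qmax, max(qmin, int(round(v / scale)) + zero_point)) for v in x]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map quantized integers back to floats for error inspection."""
    return [(v - zero_point) * scale for v in q]
```

Because the scale is recomputed per input, each batch's quantization error stays bounded by that batch's own scale, instead of a scale fixed once at calibration time.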

Taking the onnxruntime bert_base model as an example, users can specify the quantization method in a YAML configuration like the following:

```yaml
model:                                               # mandatory. lpot uses this model name and framework name to decide where to save the snapshot if the tuning.snapshot field is empty.
  name: bert
  framework: onnxrt_integerops                       # possible values are tensorflow, mxnet, pytorch or onnxrt

quantization:
  approach: post_training_dynamic_quant              # optional. default value is post_training_static_quant.
                                                     # possible values are post_training_static_quant,
                                                     # post_training_dynamic_quant and quant_aware_training
  calibration:
    sampling_size: 8, 16, 32

tuning:
  accuracy_criterion:
    relative: 0.01                                   # optional. default criterion is relative; the other option is absolute. this example allows a relative accuracy loss of 1%.
  exit_policy:
    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0, which means early stop. combine with the max_trials field to decide when to exit.
  random_seed: 9527                                  # optional. random seed for deterministic tuning.
```
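The relative accuracy criterion in the config can be read as the following check (an illustrative sketch of the semantics, not LPOT's internal code): tuning accepts a quantized model whose accuracy drop, relative to the FP32 baseline, is at most 1%.

```python
def meets_accuracy_criterion(baseline_acc, quantized_acc, relative_tol=0.01):
    # relative accuracy loss: (baseline - quantized) / baseline must not exceed the tolerance
    return (baseline_acc - quantized_acc) / baseline_acc <= relative_tol
```

For example, with a baseline accuracy of 0.90, a quantized model at 0.895 (loss ~0.56%) would be accepted, while one at 0.88 (loss ~2.2%) would be rejected and tuning would continue.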

[^1]: https://pytorch.org/tutorials/recipes/recipes/dynamic_quantization.html