Intel® Low Precision Optimization Tool aims to help users to fast deploy low-precision inference solution on popular DL frameworks including TensorFlow, Pytorch, MxNet etc. With built-in strategies, it will automatically optimized low-precision recipes for deep learning models to achieve optimal product objectives like inference performance and memory usage with expected accuracy criteria. For now, it support Basic
, Bayesian
, Exhaustive
, MSE
, Random
and TPE
strategies. And the Basic
strategy is default one.
Strategies need to generate the next quantization configuration according to itself logic and last time quantization result. So the function of strategies can be shown in below graph.
Strategies need two sides information. One side comes from adator layer, user pass the framework specific model to initial the quantizator, then strategies will call the self.adaptor.query_fw_capablpoty(model)
to get the framework and model specific quantization capablpoties. On the other hand, strategy will merge the model specific configurations in yaml
configuration file to filter some capablpoty in first step to generate the tuning space. And then, each strategy will generate the quantiztion config according to itself location and logic with tuning strategy configurations in yaml
configuration file. All of strategies will finish the tuning processing if the timeout
or max_trails
has been reached. The default value of timeout
is 0, if so, the tuning phase will early stop when the accuracy
has met the criteria.
The detail configuration templates can be found in here
.
For model specific configurations, users can set the quantization approach, and for post training static quantization, we also can set calibration and quantization related parameters for model-wise and op-wise.
quantization: # optional. tuning constraints on model-wise for advance user to reduce tuning space.
approach: post_training_static_quant # optional. default value is post_training_static_quant.
calibration:
sampling_size: 1000, 2000 # optional. default value is the size of whole dataset. used to set how many portions of calibration dataset is used. exclusive with iterations field.
dataloader: # optional. if not specified, user need construct a q_dataloader in code for lpot.Quantization.
dataset:
TFRecordDataset:
root: /path/to/tf_record
transform:
Resize:
size: 256
CenterCrop:
size: 224
model_wise: # optional. tuning constraints on model-wise for advance user to reduce tuning space.
weight:
granularity: per_channel
scheme: asym
dtype: int8
algorithm: minmax
activation:
granularity: per_tensor
scheme: asym
dtype: int8, fp32
algorithm: minmax, kl
op_wise: { # optional. tuning constraints on op-wise for advance user to reduce tuning space.
'conv1': {
'activation': {'dtype': ['uint8', 'fp32'], 'algorithm': ['minmax', 'kl'], 'scheme':['sym']},
'weight': {'dtype': ['int8', 'fp32'], 'algorithm': ['kl']}
},
'pool1': {
'activation': {'dtype': ['int8'], 'scheme': ['sym'], 'granularity': ['per_tensor'], 'algorithm': ['minmax', 'kl']},
},
'conv2': {
'activation': {'dtype': ['fp32']},
'weight': {'dtype': ['fp32']}
}
}
In strategy tuning tuning part related configurations, user can choose the the specific tuning strategy and set the accuracy criterion and optimization objective for the tuning. And also can set the stop condition for the tuning by change the exit_policy
.
tuning:
strategy:
name: basic # optional. default value is basic. other values are bayesian, mse...
accuracy_criterion:
relative: 0.01 # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
objective: performance # optional. objective with accuracy constraint guaranteed. default value is performance. other values are modelsize and footprint.
exit_policy:
timeout: 0 # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
max_trials: 100 # optional. max tune times. default value is 100. combine with timeout field to decide when to exit.
random_seed: 9527 # optional. random seed for deterministic tuning.
tensorboard: True # optional. dump tensor distribution in evaluation phase for debug purpose. default value is False.
USers can based on the basic TuneStrategy
class to enable a new strategy with a new self.next_tune_cfg()
function implement. If the new strategy need more information, user can try to override the self.traverse()
in the new strategy, such as TPE
strategy.
Basic strategy is design for most of models to do the quantization. It can be divided to three steps. Firstly, Basic
strategy will try all of model wise tuning configs and get the best quantized model. If all of the model wise tuning configs can't meet the accuracy loss criteria, then it will go to second step. In this step, do OP
high precision (FP32
, BF16
...) fallback one-by-one based on the best model-wise tuning config, and record the impact of each OP
on accuracy and sort accordingly. In the finnal step, strategy will try to incrementally fallback multipul OP
to high-precision according to the sorted OP
list generated in step two till achieving the accuracy goal.
Basic
strategy is the default strategy, so we can use it by default if we don't add the strategy
filed in our yaml
configuration file. For example, the classical setting in congiguration file as below.
tuning:
accuracy_criterion:
relative: 0.01
exit_policy:
timeout: 0
random_seed: 9527
Bayesian optimization is a sequential design strategy for global optimization of black-box functions. The strategy refers to the Bayesian optimization package bayesian-optimization and changes it to a discrete version that complies with the strategy standard of Intel® Low Precision Optimization Tool. It uses Gaussian Processes to define the prior/posterior distribution over the black-box function with the tuning history, and then finds the tuning configuration that maximizes the expected improvement. For now bayesian strategy just tune op-wise quantize configs, it don't included fallback datatype config. If it add fallback datatype config into the param space of bayesian, it will generate a full FP32 model finally, because in generally, fallback datatype has better acc. At the end [DEBUG] Tuning config was evaluated, skip!
will be printed endlessly. Because the param space very small for this tuning and can't get good acc result. If we change the timeout from 0 to a integrate, it can be ended after reach the timeout.
If we want to use Bayesian
strategy, we need to set the timeout
or max_trials
as non-zore value as below example. Because sometimes the param space for baysian
are very small and can't reach the accuracy goal. In this case, the tuning part will hang on. If we set the log level to debug
by LOGLEVEL=DEBUG
in the enviroment, [DEBUG] Tuning config was evaluated, skip!
will be printed endlessly.
tuning:
strategy:
name: bayesian
accuracy_criterion:
relative: 0.01
objective: performance
exit_policy:
timeout: 0
max_trials: 100
MSE
and Basic
has similar ideas. The mainly defierence of those two strategies is in the step 2, which to generate a sorted op lists. In MSE
strategy step 2, it needs to get the tensors for each Operator of raw FP32 models and the quantized model based on best model-wise tuning configuration. And then calculate the MSE (Mean Squared Error) for each operator, sort those operators according to the MSE value, finally do the op-wise fallback in this order.
Similar with Basic
but need specific the strategy name to mse
.
tuning:
strategy:
name: mse
accuracy_criterion:
relative: 0.01
exit_policy:
timeout: 0
random_seed: 9527
Sequential model-based optimization methods (SMBO) are a formalization of Bayesian optimization. The sequential refers to running trials one after another, each time trying better hyperparameters by applying Bayesian reasoning and updating a surrogate model. There are five main aspects of SMBO:
- Domain: A domain of hyperparameters or search space
- Objective function: An objective function which takes hyperparameters as input and outputs a score that needs to minimize (or maximize)
- Surrogate function: The representative surrogate model of the objective function
- Selection function: A selection criterion to evaluate which hyperparameters to choose next trial from the surrogate model
- History: A history consisting of (score, hyperparameter) pairs used by the algorithm to update the surrogate model
There are several variants of SMBO, which differ in how to build a surrogate and the selection criteria (steps 3–4). The TPE builds a surrogate model by applying Bayes's rule.
NOTE: TPE requires many iterations to converge to an optimal solution, and it is recommended to run it for at least 200 iterations. Because every iteration requires evaluation of a generated model, which means accuracy measurements on a dataset and latency measurements using a benchmark, this process may take from 24 hours up to few days to complete, depending on a model.
TPE algorithm consists of multiple (the following) steps:
- Define a domain of hyperparameter search space,
- Create an objective function which takes in hyperparameters and outputs a score (e.g., loss, RMSE, cross-entropy) that we want to minimize,
- Get a couple of observations (score) using randomly selected set of hyperparameters,
- Sort the collected observations by score and divide them into two groups based on some quantile. The first group (x1) contains observations that gave the best scores and the second one (x2) - all other observations,
- Two densities l(x1) and g(x2) are modeled using Parzen Estimators (also known as kernel density estimators) which are a simple average of kernels centered on existing data points,
- Draw sample hyperparameters from l(x1), evaluating them in terms of l(x1)/g(x2), and returning the set that yields the minimum value under l(x1)/g(x1) corresponding to the greatest expected improvement. These hyperparameters are then evaluated on the objective function.
- Update the observation list in step 3
- Repeat step 4-7 with a fixed number of trials
TPE
's usage is similar with Bayesian
.
tuning:
strategy:
name: bayesian
accuracy_criterion:
relative: 0.01
objective: performance
exit_policy:
timeout: 0
max_trials: 100
This strategy is used to sequentially traverse all the possible tuning configurations in tuning space. For now, from the perspective of the impact on performance, we only traverse all possible quantize tuning configs, don't included fall back data type.
The usage is also similar with Basic
.
tuning:
strategy:
name: exhaustive
accuracy_criterion:
relative: 0.01
exit_policy:
timeout: 0
random_seed: 9527
This strategy is used to randomly choose tuning configuration from the tuning space. Same with Exhaustive
strategy, it also just consider quantize tuning configs to generate a better performance quantized model.
The usage is also similar with Basic
.
tuning:
strategy:
name: random
accuracy_criterion:
relative: 0.01
exit_policy:
timeout: 0
random_seed: 9527